KafkaReporter issues
I've moved some of the issues from reed.essick/iDQ#66 over here because they've been identified as specific to KafkaReporter and may be related to each other.
stream.calibrate:
- Calibration maps are too big to be passed around Kafka using a KafkaReporter as is. I've lowered DEFAULT_NUM_MCMC to 100 as a workaround. The soft limit on a single buffer is 10 MB. Although I could raise this, I've been warned that pushing really big messages around is mostly uncharted territory, and I would like to avoid it if possible.
stream.timeseries:
- Since the recent commit, something in the process appears to stall for a very long time (~1 minute) on each stride. The process then falls behind real time, skips ahead without acquiring the data it needs, and repeats. This same process used to keep up with real time without any problems. Example below:
2018-09-28 20:22:30,385 | chunk_8-timeseries : INFO : using config : idq.ini
2018-09-28 20:22:30,386 | chunk_8-timeseries : INFO : approx_kernel_svm -> flavor:ApproximateKernelSVM safe_channels_path:safe_channel_list.txt window:0.1 time:trigger_time significance:snr num_proc:1
2018-09-28 20:22:30,386 | chunk_8-timeseries : INFO : classifier_data -> flavor:KafkaClassifierData ignore_segdb:True topic:synchronizer_online_test port:10.21.2.20:9182 poll_timeout:0.05 latency_timeout:10 retry_cadence:0.01 sample_rate:16 stride:1 direct:True time:trigger_time significance:snr columns:['trigger_time', 'snr', 'frequency'] feature_columns:['delta_t', 'snr']
2018-09-28 20:22:30,386 | chunk_8-timeseries : INFO : generating timeseries sampled at 256.000 Hz
2018-09-28 20:22:30,386 | chunk_8-timeseries : INFO : stream_processor -> cadence:1 delay:0.1 max_iters:2 max_latency:12
2018-09-28 20:22:33,896 | chunk_8-timeseries : INFO : starting streaming timeseries
2018-09-28 20:22:33,896 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226568.000, 1222226569.000) ---
2018-09-28 20:23:35,515 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226633, skipping ahead
2018-09-28 20:23:35,516 | chunk_8-timeseries : INFO : acquired 0.000 sec of data at 1222226633
2018-09-28 20:23:35,516 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226633.000, 1222226634.000) ---
2018-09-28 20:24:37,330 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226695, skipping ahead
2018-09-28 20:24:37,331 | chunk_8-timeseries : INFO : acquired 0.000 sec of data at 1222226695
2018-09-28 20:24:37,331 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226695.000, 1222226696.000) ---
2018-09-28 20:25:39,345 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226757, skipping ahead
2018-09-28 20:25:39,346 | chunk_8-timeseries : INFO : acquired 0.000 sec of data at 1222226757
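To put a number on the "~1 minute" stall, the wall-clock gap between each stride start and the following "too far behind realtime" warning can be read straight off the log. The snippet below is only a rough sketch: the helper name `stall_durations` is made up for illustration, and the log lines are copied from the example above.

```python
from datetime import datetime

# Lines copied from the log above (stride starts and the warnings that follow).
LOG = """\
2018-09-28 20:22:33,896 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226568.000, 1222226569.000) ---
2018-09-28 20:23:35,515 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226633, skipping ahead
2018-09-28 20:23:35,516 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226633.000, 1222226634.000) ---
2018-09-28 20:24:37,330 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226695, skipping ahead
2018-09-28 20:24:37,331 | chunk_8-timeseries : INFO : --- timeseries stride: [1222226695.000, 1222226696.000) ---
2018-09-28 20:25:39,345 | chunk_8-timeseries : WARNING : too far behind realtime at 1222226757, skipping ahead
"""

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def stall_durations(log_text):
    """Seconds between each stride start and the next 'too far behind' warning.

    Hypothetical helper for reading this log; not part of iDQ.
    """
    stalls = []
    stride_start = None
    for line in log_text.splitlines():
        stamp, _, message = line.partition(" | ")
        if "timeseries stride:" in message:
            stride_start = datetime.strptime(stamp, STAMP_FORMAT)
        elif "too far behind realtime" in message and stride_start is not None:
            warned_at = datetime.strptime(stamp, STAMP_FORMAT)
            stalls.append((warned_at - stride_start).total_seconds())
            stride_start = None
    return stalls


stalls = stall_durations(LOG)
for seconds in stalls:
    print("stalled for %.3f s" % seconds)
```

Each of the three strides in the excerpt stalls for between 61 and 63 seconds, well beyond the max_latency of 12 reported in the stream_processor config line.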
General:
- Issues with seeking in KafkaReporter: after a seek to the location where the data is, it won't grab the data the first time around, causing the pipeline to fail because of missing models / calibration maps.
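One possible stopgap for the seek problem, until the root cause is found, would be to retry the read a few times after seeking instead of failing on the first empty result. The sketch below only illustrates that idea: `poll_with_retry` and `FlakyConsumer` are hypothetical names and do not reflect the actual KafkaReporter API, and the retry count and cadence are illustrative values.

```python
import time


def poll_with_retry(poll, max_attempts=5, retry_cadence=0.01):
    """Call poll() until it returns something other than None, or give up.

    `poll` is any zero-argument callable standing in for a read on a consumer
    that has already been seeked to the wanted offset (hypothetical interface).
    Returns the data and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        result = poll()
        if result is not None:
            return result, attempt
        time.sleep(retry_cadence)  # brief pause before trying again
    raise RuntimeError("no data after %d attempts" % max_attempts)


class FlakyConsumer:
    """Toy stand-in that returns nothing on the first read after a seek."""

    def __init__(self, payload):
        self.payload = payload
        self.reads_since_seek = 0

    def seek(self):
        self.reads_since_seek = 0

    def poll(self):
        self.reads_since_seek += 1
        return None if self.reads_since_seek == 1 else self.payload


consumer = FlakyConsumer(payload=b"calibration-map")
consumer.seek()
data, attempts = poll_with_retry(consumer.poll)
print(data, attempts)
```

This only masks the symptom. If the first read after a seek is empty for a structural reason, that still needs to be tracked down separately.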