KafkaReporter throws 'reported timestamp greater than x' in streaming jobs
Here is the traceback:
File "/home/patrick.godwin/local/master/iDQ/bin/idq-streaming_train", line 41, in <module>
stream.train(config_path, gps_start=gps_start, gps_end=gps_end, **vars(opts))
File "/home/patrick.godwin/local/master/iDQ/lib/python2.7/site-packages/idq/stream.py", line 355, in train
path = trainreporter.report(train_nicknames[nickname], model, preferred=True) ### report the resulting model
File "/home/patrick.godwin/local/master/iDQ/lib/python2.7/site-packages/idq/io.py", line 1823, in report
raise ValueError('KafkaReporter has already reported a timestamp larger than self.start=%.3f'%self.start)
ValueError: KafkaReporter has already reported a timestamp larger than self.start=1221006861.000
What's going on is that for any streaming job, the start and end times are initially set to span some very large range. When you later update the start and end bounds of, say, a training job, KafkaReporter checks whether the timestamp of the new range has moved backwards, where the timestamp is defined as (start + end) / 2. Because the initial range is so large, narrowing it to realistic bounds always pulls that midpoint backwards, so the new timestamp is always smaller than the previously recorded one (KafkaReporter.old), the check trips, and you can never pass data in again.
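Here is a minimal sketch of the failure mode. The class, attribute names, and sentinel values below are illustrative, not the actual implementation in idq/io.py; it only shows how a midpoint-based timestamp combined with a monotonicity check rejects the first real update:

```python
class MonotonicReporter(object):
    """Toy stand-in for KafkaReporter's timestamp check (names are illustrative)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.old = None  # largest timestamp reported so far

    @property
    def timestamp(self):
        # the reporter's notion of "when" a report happens:
        # the midpoint of its current (start, end) bounds
        return (self.start + self.end) / 2.

    def report(self):
        # reject anything earlier than what we have already reported
        if self.old is not None and self.timestamp < self.old:
            raise ValueError(
                'already reported a timestamp larger than %.3f' % self.timestamp)
        self.old = self.timestamp


# initial bounds span a huge (sentinel-sized) range, so the midpoint is enormous
reporter = MonotonicReporter(0, 1e10)
reporter.report()  # records timestamp = 5e9

# narrowing to a realistic GPS stride pulls the midpoint backwards...
reporter.start, reporter.end = 1221006861, 1221007861
reporter.report()  # ...so this raises ValueError, and every later report does too
```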