
Simplify stream workflow

Patrick Godwin requested to merge stream_updates into main

This PR attempts to simplify the streaming workflow by removing unused functionality, reducing complexity, and addressing various LSP warnings. In particular:

  • Only allow a single classifier to run at a time for streaming. We ran in this configuration for O3 due to extra overhead in pickling when multiprocessing. This makes the 'block' workflow the only one that needs to be supported, and we can remove the 'fork' workflow entirely.
  • Remove KafkaReporter. This implementation had its own host of problems and we switched to using the various DiskReporters for O3. Instead of improving its implementation, we call the DiskReporter-based implementations 'good enough' for use.
  • Remove custom Kafka Producer/Consumers in utils.py. These were only needed due to pickling (why?) but since we don't need to support the fork workflow, they are not needed anymore. Instead, fold the relevant functionality into the single place that uses it, SNAXKafkaDataLoader.
  • Remove passing in logger to StreamProcessor, as it's pointless to do so.
  • Flatten conditionals in SNAXKafkaDataLoader.
  • Actually raise errors in streaming reporting jobs instead of only issuing a warning. Not sure why warnings were used in the first place.
  • Address flaky StreamProcessor tests.
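
For reviewers, here is a minimal sketch of the pattern behind two of the items above: using a module-level logger rather than passing one in, and re-raising a reporting failure rather than downgrading it to a warning. `StreamProcessorSketch` and `flaky_report` are hypothetical names made up for illustration; this is not the code in this MR and does not reflect the real StreamProcessor signature.

```python
import logging

# Module-level logger, so callers do not need to pass one in.
logger = logging.getLogger(__name__)


class StreamProcessorSketch:
    """Hypothetical stand-in, not the project's StreamProcessor."""

    def __init__(self, name):
        self.name = name

    def process(self, job, payload):
        # Old behaviour (roughly): catch the failure, warn, carry on.
        # New behaviour: log for context, then let the error propagate.
        try:
            return job(payload)
        except Exception:
            logger.exception("reporting job failed in %s", self.name)
            raise


def flaky_report(payload):
    """Hypothetical reporting job that rejects empty payloads."""
    if not payload:
        raise ValueError("empty payload")
    return len(payload)
```

With this shape, a failing reporting job stops the caller with the original exception instead of continuing silently after a warning.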

Closes #67 (closed).

Edited by Patrick Godwin
