
Simplify stream workflow

Merged Patrick Godwin requested to merge stream_updates into main
11 files changed: +405 −1377
@@ -152,36 +152,12 @@ As such, you will also have to manage a config file for your synthetic data in a
Streaming Tasks
====================================================================================================
To run the streaming pipeline, you will need an instance of Kafka running in the
background, which ``KafkaReporter`` requires in order to run correctly. This can be
done in one of two ways:
1. Connect to an already-running instance of Kafka, set up in an existing LDG cluster.

   * If this option is available, no further configuration is necessary; the default iDQ
     configuration file contains everything needed to get up and running.
2. Run an instance of Kafka as a background process.

   * Running a separate instance requires Kafka to be installed, as well as its C and Python
     bindings, librdkafka and confluent-kafka-python, respectively. These dependencies can all
     be installed using the Makefile provided in /etc.
   * After sourcing the environment script built alongside the iDQ dependencies, background
     instances of Zookeeper and Kafka can be started with the following commands, each of which
     takes the path to its respective configuration file:

     a. ``zookeeper-server-start.sh zookeeper.properties``
     b. ``kafka-server-start.sh kafka.properties``

     Both instances take a few seconds to start up. Sample configuration files for both are
     provided within /etc, and a quick connectivity check using the Python bindings is sketched below.
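Once Kafka appears to be up, a minimal producer check with the confluent-kafka Python bindings
can confirm that the broker is reachable. This is only an illustrative sketch: the broker address
assumes Kafka's default listener on ``localhost:9092``, and the topic name ``idq_test`` is
hypothetical; adjust both to match your ``kafka.properties``.

.. code-block:: python

    from confluent_kafka import Producer

    # Assumes the default Kafka listener; adjust to match kafka.properties.
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    def delivery_report(err, msg):
        """Report whether the test message reached the broker."""
        if err is not None:
            print(f"delivery failed: {err}")
        else:
            print(f"delivered to {msg.topic()} [partition {msg.partition()}]")

    # Topic name is illustrative; any writable topic will do.
    producer.produce("idq_test", value=b"connectivity check", callback=delivery_report)

    # flush() waits for outstanding deliveries and triggers the callback.
    producer.flush(10)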
TODO:
Describe how to manage (asynchronous) processes via ``idq-stream``, and how it will manage

* ``idq-streaming_train``
* ``idq-streaming_calibrate``
* ``idq-streaming_evaluate``
* ``idq-streaming_timeseries``

Describe the input/output data streams for each.
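Until that section is written, the sketch below gives a rough idea of how a supervisor such as
``idq-stream`` might launch and monitor these jobs as asynchronous subprocesses. It is not iDQ's
actual implementation: the ``--config-path`` flag, the restart policy, and the job ordering are
assumptions made purely for illustration.

.. code-block:: python

    import subprocess
    import time

    # Streaming jobs named in this section; invocation details are assumed.
    JOBS = [
        "idq-streaming_train",
        "idq-streaming_evaluate",
        "idq-streaming_calibrate",
        "idq-streaming_timeseries",
    ]

    def launch(job, config="config.yaml"):
        """Start a streaming job as a background (asynchronous) process."""
        return subprocess.Popen([job, "--config-path", config])

    procs = {job: launch(job) for job in JOBS}

    # Simple supervision loop: restart any job that exits unexpectedly.
    while True:
        for job, proc in procs.items():
            if proc.poll() is not None:
                print(f"{job} exited with code {proc.returncode}; restarting")
                procs[job] = launch(job)
        time.sleep(10)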