Get data from kafka (!554) · Merge requests · lscsoft / bilby_pipe

Jacob Golomb requested to merge jacob.golomb/bilby_pipe:get_data_from_kafka into master May 23, 2023

This MR fixes the problem described in the linked issue, which caused two online PE jobs to fail. Instead of getting strain data from the cache (via TimeSeries.get()), the pre-generation step of gracedb.py now queries the /dev/shm/kafka/ directory (which will now store 1 hour of data) for strain data, saves the relevant data to the job's rundir, and updates the data-dict of the config to point there. This is the intended method for online/low-latency data fetching.
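A minimal sketch of the lookup step, for illustration only. It assumes the buffered files follow the usual OBS-TYPE-GPSSTART-DURATION.gwf naming convention and models the config as a plain dict with a "data-dict" entry. The helper names frames_covering and point_data_dict_at are hypothetical and not part of bilby_pipe, and reading, cropping and writing the strain itself (e.g. with gwpy) is left out.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional, Union

# Assumed file naming: OBS-TYPE-GPSSTART-DURATION.gwf, e.g. H-H1_llhoft-1368000000-4.gwf.
# The real layout of /dev/shm/kafka may differ.
FRAME_NAME = re.compile(
    r"^(?P<obs>[A-Z0-9]+)-(?P<tag>[^-]+)-(?P<start>\d+)-(?P<dur>\d+)\.gwf$"
)


def frames_covering(
    directory: Union[str, Path], start: int, end: int
) -> Optional[List[Path]]:
    """Hypothetical helper: return the files in `directory` that cover the GPS
    span [start, end) without gaps, or None if the span is not fully covered."""
    spans = []
    for path in Path(directory).iterdir():
        match = FRAME_NAME.match(path.name)
        if match is None:
            continue  # skip anything that is not a frame file
        f_start = int(match["start"])
        f_end = f_start + int(match["dur"])
        if f_end > start and f_start < end:  # file overlaps the requested span
            spans.append((f_start, f_end, path))
    spans.sort()

    covered_until = start
    selected: List[Path] = []
    for f_start, f_end, path in spans:
        if f_start > covered_until:
            return None  # gap in the buffer: caller should fall back
        selected.append(path)
        covered_until = max(covered_until, f_end)
        if covered_until >= end:
            return selected
    return None  # buffer does not reach the end of the span


def point_data_dict_at(config: Dict[str, object], ifo: str, path: Path) -> None:
    """Hypothetical helper: make the config's data-dict entry for `ifo` point at
    the file written to the run directory (config modelled as a plain dict)."""
    data_dict = dict(config.get("data-dict") or {})
    data_dict[ifo] = str(path)
    config["data-dict"] = data_dict
```

Returning None both when the buffer has a gap and when it has already rolled past the requested times gives the caller a single signal for switching to the TimeSeries.get() fallback.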

If the data are not present in the /dev/shm/kafka directory, it falls back to TimeSeries.get() to save the data. If grabbing the data fails (e.g. because the filesystem queue is overloaded), it retries up to 10 times before giving up on writing the data files.
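The fallback-and-retry behaviour could look roughly like the following. This is a hypothetical sketch: read_buffer and read_fallback stand in for the /dev/shm/kafka read and the TimeSeries.get() call, and treating OSError as the transient failure and sleeping a fixed delay between attempts are assumptions. This MR does not say whether the real code retries both reads or only the write step.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_strain(
    read_buffer: Callable[[], Optional[T]],
    read_fallback: Callable[[], T],
    max_attempts: int = 10,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Hypothetical helper: try the low-latency buffer first, fall back if the
    data are not there, and retry transient failures up to max_attempts times."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            data = read_buffer()
            if data is None:  # data not (or no longer) in the buffer
                data = read_fallback()
            return data
        except retry_on as err:  # assumed transient, e.g. an overloaded filesystem
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay)
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

Injecting the two readers as callables keeps the retry policy separate from how the data are actually read, so the same loop covers both the buffer and the fallback path.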

I have tested this in my own directory /home/jacob.golomb/o4_pe/test_online_PE/ by running bilby_pipe_gracedb --gracedb G407830 and confirming the resulting job in /home/jacob.golomb/o4_pe/test_online_PE/outdir_G407830 proceeds as intended.

Edited May 23, 2023 by Jacob Golomb