Get data from kafka

This MR fixes the problem described in this issue, which caused two online PE jobs to fail. Instead of getting strain data from cache (via TimeSeries.get()), the pre-generation step of gracedb.py now queries the /dev/shm/kafka/ directory (which now stores 1 hour of data) for strain data, saves the relevant data to the job's rundir, and updates the data-dict of the config to point there. This is the intended method for online/low-latency data fetching.
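For reviewers who want a picture of the lookup step, here is a minimal sketch of selecting and copying frame files that cover a GPS interval. It is not the code in this MR. The function names (find_frames, covers, copy_frames) are hypothetical, and it assumes the files follow the usual `<site>-<description>-<gps_start>-<duration>.gwf` naming pattern, which may not match the actual layout under /dev/shm/kafka/.

```python
import re
import shutil
from pathlib import Path

# Assumption: files are named <site>-<description>-<gps_start>-<duration>.gwf
FRAME_RE = re.compile(r"^[A-Z]+-[^-]+-(?P<start>\d+)-(?P<dur>\d+)\.gwf$")


def find_frames(directory, start, end):
    """Return (t0, t1, path) for frame files overlapping [start, end), sorted by t0."""
    matches = []
    for path in Path(directory).glob("*.gwf"):
        m = FRAME_RE.match(path.name)
        if m is None:
            continue  # ignore files that do not follow the assumed pattern
        t0 = int(m["start"])
        t1 = t0 + int(m["dur"])
        if t0 < end and t1 > start:
            matches.append((t0, t1, path))
    matches.sort(key=lambda item: item[0])
    return matches


def covers(frames, start, end):
    """True if the sorted frames cover [start, end) without gaps."""
    cursor = start
    for t0, t1, _ in frames:
        if t0 > cursor:
            return False  # gap before this frame
        cursor = max(cursor, t1)
    return cursor >= end


def copy_frames(frames, rundir):
    """Copy the selected frames into rundir and return the new paths."""
    dest = Path(rundir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(path, dest / path.name)) for _, _, path in frames]
```

A caller would check covers(...) first and only copy the files (and point the data-dict at the copies) when the interval is fully available.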

If the data do not exist in the /dev/shm/kafka/ directory, the pre-generation step falls back to TimeSeries.get() to save the data. If there is a problem grabbing the data (e.g. an overloaded queue to the filesystem), it tries 10 times before giving up on writing the data files.
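The fallback and retry behaviour can be sketched roughly as follows. This is an illustration rather than the code in this MR. The helper fetch_with_fallback, its callables, and the delay between attempts are all hypothetical, and the exact ordering of fallback and retry in gracedb.py may differ. Only the limit of 10 attempts comes from the description above.

```python
import time


def fetch_with_fallback(primary, fallback, attempts=10, delay=1.0, sleep=time.sleep):
    """Try primary(); if it returns None (no data), use fallback().

    Any exception counts as a failed attempt. After `attempts` failures
    a RuntimeError is raised, chained to the last underlying error.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            data = primary()
            if data is None:
                # Data not present in the primary location: use the fallback source.
                data = fallback()
            return data
        except Exception as err:  # e.g. a transient filesystem problem
            last_error = err
            if attempt < attempts:
                sleep(delay)  # assumed pause between attempts
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

Here primary would wrap the read from /dev/shm/kafka/ and fallback would wrap the TimeSeries.get() call. Passing them as callables keeps the retry logic independent of either data source.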

I have tested this in my own directory /home/jacob.golomb/o4_pe/test_online_PE/ by running bilby_pipe_gracedb --gracedb G407830 and confirming the resulting job in /home/jacob.golomb/o4_pe/test_online_PE/outdir_G407830 proceeds as intended.

Edited by Jacob Golomb
