
Coinc file ingestion improvements

Alexander Pace requested to merge coinc-upload-improvements into master

A bunch of things to address in this MR:

  • "Coinc Table Created" log messages pointed to the wrong version of coinc.xml when the initial upload was itself named coinc.xml.
  • Why is the entire upload being copied to an identical file (coinc.xml) instead of just the coinc table? (see: https://git.ligo.org/computing/gracedb/server/-/blame/master/gracedb/events/translator.py#L135-138)
  • Implement a PartialLIGOLWContentHandler with datatype switching (int8 <--> ilwd:char) that reads in just the coinc_inspiral and sngl_inspiral tables needed for parsing and populating the database. Note: testing showed that reading the entire ligolw XML file, with the PSD included, added an extra 700+ ms before the event upload alert went out.
  • Assuming coinc.xml is meant to be the whole file and not just the coinc table, use a more efficient copy than round-tripping it through ligolw.
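The partial-read idea from the list above can be sketched with only the standard library. This is a hypothetical illustration, not ligo.lw's PartialLIGOLWContentHandler or the handler added in this MR: `PartialTableHandler`, `read_tables`, and `copy_upload` are made-up names. It assumes the usual LIGO_LW layout (`Table` containing `Column` and `Stream`, PSDs stored as `Array` elements), and it assumes the datatype switching should turn both `ilwd:char` IDs such as `"coinc_event:coinc_event_id:7"` and plain `int_8s` IDs into the same Python int. Stream parsing is simplified (every row but the last is assumed to end with the delimiter).

```python
import csv
import io
import shutil
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical default: the tables the MR description says GraceDB needs.
WANTED_TABLES = {"coinc_inspiral", "coinc_event", "sngl_inspiral"}
INT_TYPES = {"int_2s", "int_2u", "int_4s", "int_4u", "int_8s", "int_8u"}
FLOAT_TYPES = {"real_4", "real_8"}


def _table_name(name):
    """'coinc_inspiral:table' -> 'coinc_inspiral' (newer files omit ':table')."""
    return name[: -len(":table")] if name.endswith(":table") else name


def _column_name(name):
    """'coinc_inspiral:snr' -> 'snr' (newer files omit the table prefix)."""
    return name.split(":")[-1]


def _convert(token, col_type):
    """Convert one stream token; ilwd:char and int_8s IDs both become int."""
    if token == "":
        return None  # an empty token is a null value
    if col_type == "ilwd:char":
        return int(token.rsplit(":", 1)[-1])  # "coinc_event:coinc_event_id:7" -> 7
    if col_type in INT_TYPES:
        return int(token)
    if col_type in FLOAT_TYPES:
        return float(token)
    return token  # lstring and anything else stays a string


class PartialTableHandler(ContentHandler):
    """Collect rows from wanted tables only; skip everything else (e.g. PSD arrays)."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.tables = {}       # table name -> list of row dicts
        self._table = None     # wanted table we are currently inside, if any
        self._columns = []     # (column name, type) pairs for that table
        self._delimiter = ","
        self._in_stream = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "Table":
            table = _table_name(attrs.get("Name", ""))
            self._table = table if table in self.wanted else None
            self._columns = []
        elif self._table is None:
            return  # Column/Stream/Array/etc. outside a wanted table: ignore
        elif name == "Column":
            self._columns.append((_column_name(attrs["Name"]), attrs["Type"]))
        elif name == "Stream":
            self._delimiter = attrs.get("Delimiter", ",")
            self._in_stream = True
            self._chunks = []

    def characters(self, content):
        if self._in_stream:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "Stream" and self._in_stream:
            self._in_stream = False
            self.tables[self._table] = self._parse_stream("".join(self._chunks))
        elif name == "Table":
            self._table = None

    def _parse_stream(self, text):
        # Simplification: joining the stripped lines gives one flat token run.
        flat = "".join(line.strip() for line in text.splitlines())
        reader = csv.reader(io.StringIO(flat), delimiter=self._delimiter,
                            quotechar='"', skipinitialspace=True)
        tokens = [tok for row in reader for tok in row]
        ncol = len(self._columns)
        if ncol == 0:
            return []
        usable = len(tokens) - len(tokens) % ncol  # drop a stray trailing delimiter
        return [
            {col: _convert(tok, typ)
             for (col, typ), tok in zip(self._columns, tokens[i:i + ncol])}
            for i in range(0, usable, ncol)
        ]


def read_tables(xml_bytes, wanted=WANTED_TABLES):
    """Parse LIGO_LW bytes and return {table name: [row dict, ...]}."""
    handler = PartialTableHandler(wanted)
    xml.sax.parseString(xml_bytes, handler)
    return handler.tables


def copy_upload(src, dst):
    """If coinc.xml is just the whole upload, copy bytes instead of re-parsing."""
    shutil.copyfile(src, dst)
```

Everything outside the wanted tables, including the PSD arrays, is dropped at the SAX event level, so no objects are built for it (the bytes still have to be scanned). For the last bullet, a plain byte copy like `shutil.copyfile` avoids parsing altogether.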

EDIT:

I tested this in isolation, outside of GraceDB, to measure how quickly each handler opens and reads some sample XML files. Part of this test was checking that the two content handlers (FlexibleLIGOLWContentHandler and the new GraceDBFlexibleContentHandler) produce the same data from the coinc_inspiral, coinc_event, and sngl_inspiral tables. Results (time to open and read each file, in seconds) are below:

Sample file (ID type, PSD)   FlexibleLIGOLWContentHandler   GraceDBFlexibleContentHandler
int8, no psd                 0.04317 s                      0.02675 s
int8, psd                    0.4768 s                       0.05307 s
ilwd:char, no psd            0.0496 s                       0.03960 s
ilwd:char, psd               0.4739 s                       0.05168 s
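The isolation test described above (time both handlers on the same files, then check that they return identical tables) could be scripted roughly like this. `time_parser`, `compare_parsers`, and `speedups` are hypothetical helpers; `parse_old`/`parse_new` stand in for whatever wraps each content handler, and median-of-N timing is an assumption, not necessarily how the numbers in the table were collected. `REPORTED` only copies the table's numbers.

```python
import statistics
import time

# Timings copied from the table above: (FlexibleLIGOLWContentHandler,
# GraceDBFlexibleContentHandler), in seconds.
REPORTED = {
    "int8, no psd": (0.04317, 0.02675),
    "int8, psd": (0.4768, 0.05307),
    "ilwd:char, no psd": (0.0496, 0.03960),
    "ilwd:char, psd": (0.4739, 0.05168),
}


def time_parser(parse, payload, repeat=20):
    """Return (median seconds per call, result of the last call)."""
    samples, result = [], None
    for _ in range(repeat):
        start = time.perf_counter()
        result = parse(payload)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), result


def compare_parsers(parse_old, parse_new, payloads, repeat=20):
    """Time both parsers on each labelled payload and require identical output.

    payloads maps a label such as "int8, psd" to the raw file contents.
    Returns {label: (old seconds, new seconds)}.
    """
    report = {}
    for label, payload in payloads.items():
        t_old, out_old = time_parser(parse_old, payload, repeat)
        t_new, out_new = time_parser(parse_new, payload, repeat)
        if out_old != out_new:
            raise AssertionError(f"{label}: parsers returned different tables")
        report[label] = (t_old, t_new)
    return report


def speedups(report):
    """Old time divided by new time for each label."""
    return {label: old / new for label, (old, new) in report.items()}


if __name__ == "__main__":
    for label, factor in speedups(REPORTED).items():
        print(f"{label}: {factor:.2f}x")
```

Applied to the reported numbers, the new handler is roughly 9x faster on files that include a PSD and about 1.25x to 1.6x faster on files without one.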

Also, the old ligolw parser really choked on MBTA uploads, which are around twice as big as gstlal uploads (O(1.5M) vs O(800K) in file size). Here's the event log for a sample MBTA upload with the old content handler:

[Screenshot: event log for a sample MBTA upload parsed with the old content handler]

It took over five seconds to parse the initial upload! Also bear in mind that the new event alert is sent out at the end of this process, so GraceDB spinning its gears churning through ligolw files can add seconds of latency to every event follow-up. Switching to the new content handler on the same machine produced the following log:

[Screenshot: event log for the same MBTA upload parsed with the new content handler]

Edited by Alexander Pace
