Coinc file ingestion improvements
A bunch of things to address in this:
- "Coinc Table Created" log messages pointed to the wrong version of coinc.xml when the initial upload was called coinc.xml.
- Why is the entire upload being copied to an identical file (coinc.xml) instead of just the coinc table? (see: https://git.ligo.org/computing/gracedb/server/-/blame/master/gracedb/events/translator.py#L135-138)
- Implement PartialLIGOLWContentHandler with datatype switching (int8 <--> ilwd:char) to read in just the coinc_inspiral and sngl_inspiral tables needed for parsing and populating the database. Note: testing this out showed that reading the entire ligolw xml file, with the psd included, added an extra 700+ ms before the event upload alert would go out.
- Assuming coinc.xml is meant to be the whole file and not just the coinc table, use a more efficient copy than reading it with ligolw.
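To illustrate the last two items, here is a minimal sketch using only the Python standard library. It is not the handler implemented in this change and does not use the ligolw API: `TableFilterHandler`, `read_tables`, `copy_upload`, and `SAMPLE` are hypothetical names, and the table-name convention (`coinc_inspiral:table`) is an assumption about the file layout. The idea is to store only the wanted tables while ignoring everything else (such as PSD arrays), and to copy the upload byte-for-byte instead of parsing and re-serializing it.

```python
import shutil
import xml.sax


class TableFilterHandler(xml.sax.ContentHandler):
    """Keep the <Stream> text of selected tables; ignore all other elements."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.streams = {}       # table name -> raw stream text
        self._table = None      # wanted table currently being read, if any
        self._in_stream = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "Table":
            # Assumed naming: "coinc_inspiral:table" (optionally prefixed).
            parts = attrs.get("Name", "").split(":")
            if parts and parts[-1] == "table":
                parts = parts[:-1]
            table = parts[-1] if parts else ""
            self._table = table if table in self.wanted else None
        elif name == "Stream" and self._table is not None:
            self._in_stream = True
            self._chunks = []

    def characters(self, content):
        # Text is only accumulated inside a wanted table's stream.
        if self._in_stream:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "Stream" and self._in_stream:
            self.streams[self._table] = "".join(self._chunks).strip()
            self._in_stream = False
        elif name == "Table":
            self._table = None


def read_tables(xml_bytes, wanted):
    """Parse xml_bytes and return {table name: stream text} for wanted tables."""
    handler = TableFilterHandler(wanted)
    xml.sax.parseString(xml_bytes, handler)
    return handler.streams


def copy_upload(src_path, dst_path):
    """Copy the uploaded file as raw bytes, without parsing it."""
    return shutil.copyfile(src_path, dst_path)


# Tiny made-up document: one wanted table plus a PSD-like array to skip.
SAMPLE = b"""<?xml version='1.0'?>
<LIGO_LW>
  <Table Name="coinc_inspiral:table">
    <Column Name="mass" Type="real_8"/>
    <Stream Name="coinc_inspiral:table" Delimiter=",">1.4,1.4,2.8</Stream>
  </Table>
  <LIGO_LW Name="psd">
    <Array Name="PSD:array" Type="real_8">
      <Stream Delimiter=" ">1e-40 2e-40 3e-40</Stream>
    </Array>
  </LIGO_LW>
</LIGO_LW>"""

tables = read_tables(SAMPLE, {"coinc_inspiral", "sngl_inspiral"})
```

In this sketch the parser still tokenizes the whole file; the saving comes from never storing the PSD data. A real partial handler would also need to build typed rows and handle the int8 / ilwd:char switch, which is left out here.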
EDIT:
I tested this in isolation outside of GraceDB to see how quickly each contenthandler opens and reads some sample xml files. Part of the test checks that the two contenthandlers (FlexibleLIGOLWContentHandler and the new GraceDBFlexibleContentHandler) spit out the same data from the coinc_inspiral, coinc_event, and sngl_inspiral tables. Results are below:
Sample file | FlexibleLIGOLWContentHandler | GraceDBFlexibleContentHandler |
---|---|---|
int8, no psd | 0.04317s | 0.02675s |
int8, psd | 0.4768s | 0.05307s |
ilwd:char, no psd | 0.0496s | 0.03960s |
ilwd:char, psd | 0.4739s | 0.05168s |
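A comparison like the one above can be sketched as a small harness that runs each parser on the same payload, checks that the outputs agree, and records the best wall-clock time. This is not the script used to produce the numbers in the table; `compare_parsers` and the parser callables passed to it are hypothetical stand-ins for the two contenthandlers.

```python
import time


def compare_parsers(parsers, payload, repeats=5):
    """Time each parser on the same payload and check their outputs match.

    parsers: dict mapping a label to a callable that takes the payload.
    Returns a dict mapping each label to its best time in seconds.
    """
    results = {}
    timings = {}
    for label, parse in parsers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            results[label] = parse(payload)
            best = min(best, time.perf_counter() - start)
        timings[label] = best

    # Every parser must return the same table data as the first one.
    outputs = list(results.values())
    if any(out != outputs[0] for out in outputs[1:]):
        raise AssertionError("parsers disagree on table contents")
    return timings
```

Taking the minimum over a few repeats reduces noise from disk caching and other processes; a single cold-cache run may be closer to what a live upload actually sees.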
Also, the old ligolw parser really choked on MBTA uploads, which are around twice as big as gstlal uploads (O(1.5M) vs O(800K)). Here's the event log for a sample MBTA upload with the old contenthandler:
It took over five seconds to parse the initial upload! Also bear in mind that the new event alert gets sent out at the end of this process, so seconds of latency can be added to every event follow-up because GraceDB is spinning its gears churning through ligolw files. Changing to the new contenthandler on the same machine resulted in the following log: