Speed up reading and writing of coinc files

added to epic &14

I performed a GraceDB event search for CBC created: 2024-11-01 .. 2024-11-02 (coincinspiral.mass < 10) to find a sample of 111 low-mass events.

I used the following script to profile the coinc.xml input stage of BAYESTAR:

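#!/usr/bin/env python
# Profile the coinc.xml input stage of BAYESTAR with pyinstrument.
import io
import pathlib

from ligo.lw.utils import load_fileobj
from ligo.skymap.io import events
from pyinstrument import Profiler

# Read every file into memory first so that disk I/O stays out of the profile.
coincs = [path.read_bytes() for path in pathlib.Path(".").glob("*.xml")]

profiler = Profiler()
profiler.start()
for coinc_psd in coincs:
    # Parse the XML document once from the in-memory bytes.
    doc = load_fileobj(
        io.BytesIO(coinc_psd), contenthandler=events.ligolw.ContentHandler
    )
    # The same document serves as both the event source and the PSD source.
    event_source = events.ligolw.open(doc, psd_file=doc, coinc_def=None)
    (event,) = event_source.values()
    # Access each detector's SNR series and PSD so that reading them is profiled too.
    series = [(sngl.snr_series, sngl.psd) for sngl in event.singles]
profiler.stop()
profiler.write_html("profile.html")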

Here is the profile output: profile.html

This is the hot line of code: https://git.ligo.org/kipp/python-ligo-lw/-/blob/1.8.3/ligo/lw/array.py?ref_type=tags#L106

It's spending most of its time in C code. Profiling the C code with perf reveals that most of the time is spent in wcstod, a C standard library function to convert a wide char array to a double.

I recalled that since Python 3.3 (PEP 393), Python chooses a flexible, memory-efficient internal representation for each string. Using PyUnicode_KIND, I confirmed that in this inner loop the strings to be parsed are stored with one byte per character (plain ASCII text, which is also valid UTF-8), which can usually be handled safely by strtod.
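
The per-string storage width can also be observed from pure Python, without the C API. The following is only a rough sketch: it relies on CPython's implementation details (the flexible string representation and what sys.getsizeof reports), and bytes_per_char is a hypothetical helper written for this illustration, not part of python-ligo-lw.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate the bytes CPython uses per character for strings made of `ch`.

    Taking the difference between two lengths cancels the fixed object header.
    Assumption: CPython 3.3 or later; other interpreters may behave differently.
    """
    shorter = ch * 1
    longer = ch * 1001
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // 1000


if __name__ == "__main__":
    # The digits, signs, and exponents of a text-encoded array are all ASCII.
    print("ASCII digit:", bytes_per_char("7"))
    print("Euro sign:  ", bytes_per_char("€"))
    print("Emoji:      ", bytes_per_char("😀"))
```

Strings that contain only ASCII characters land in the narrowest storage class, which is the case that matters for text-encoded numeric arrays.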

I reasoned that strtod is probably a lot faster than wcstod, and so optimizing python-ligo-lw for ASCII characters might help. To test that idea, I wrote this little benchmark C program: test.c

Indeed, on my Mac, I get the following output:

Called strtod 100000 times 0.000233 seconds - sum was 352431
Called wcstod 100000 times 0.733835 seconds - sum was 352431
strtod is 3149.51 times faster than wcstod

However, on one of our computing cluster's x86_64 hosts, I get the following:

Called strtod 100000 times 0.00100611 seconds - sum was 1.67891e+06
Called wcstod 100000 times 0.00101183 seconds - sum was 1.67891e+06
strtod is 1.00569 times faster than wcstod

Since the two functions perform essentially identically on the cluster host, the idea of rewriting the C parts of python-ligo-lw to use UTF-8 and replacing wcstod with strtod is ruled out.

I dug up LIGO-T990023, which documents the LIGO-LW format. The format supports encoding float and double arrays as little-endian or big-endian byte strings that are embedded in the XML as encoded text, rather than as formatted decimal numbers. python-ligo-lw does not implement this encoding. Initial benchmarks of this approach, though, look very promising.
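
To illustrate why a binary encoding avoids the string-to-double parsing cost, here is a minimal sketch using only the Python standard library. It assumes base64 as the text encoding (which the draft implementation mentioned later in this issue uses); the function names are hypothetical, and this is neither the python-ligo-lw implementation nor the exact stream syntax from LIGO-T990023.

```python
import base64
import struct


def encode_doubles(values, byteorder="<"):
    """Pack floats as IEEE 754 doubles and return them as base64 text.

    byteorder is "<" for little-endian or ">" for big-endian.
    """
    raw = struct.pack(f"{byteorder}{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text, byteorder="<"):
    """Invert encode_doubles: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"{byteorder}{len(raw) // 8}d", raw))


if __name__ == "__main__":
    values = [0.1 * i for i in range(1000)]
    as_decimal_text = " ".join(repr(v) for v in values)
    as_base64_text = encode_doubles(values)
    # Decoding copies bytes; no per-number decimal parsing is needed.
    assert decode_doubles(as_base64_text) == values
    print(len(as_decimal_text), "characters as decimal text")
    print(len(as_base64_text), "characters as base64")
```

The round trip is bit-exact because the doubles are never converted to or from decimal digits.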

mentioned in commit leo-singer/python-ligo-lw@e5495cd1

mentioned in merge request kipp/python-ligo-lw!38 (closed)

mentioned in commit leo-singer/python-ligo-lw@dd72a580

mentioned in commit leo-singer/python-ligo-lw@8d7c1577

mentioned in commit leo-singer/python-ligo-lw@68f7342f

mentioned in commit leo-singer/python-ligo-lw@5fc6872f

mentioned in commit leo-singer/python-ligo-lw@d0a0cd56

mentioned in commit leo-singer/python-ligo-lw@71b2361b

Update: kipp/python-ligo-lw!38 (closed) is a draft implementation of base64-encoding of arrays in coinc.xml files.

I benchmarked it on the sample that I described at the top of this issue on the host ldas-pcdev2.ligo.caltech.edu. Here are the results:

Measurement	Old format	New format
Time to read 111 files	14.3 s	8.64 s
Time to write 111 files	2.13 s	0.423 s

During the process of creating an alert, these files are written once (by the pipeline) and read twice (once by GraceDB, once by BAYESTAR). So the expected speedup per event is (2 * (14.3 - 8.64) + (2.13 - 0.423)) / 111 = 0.12 s.

The coinc.xml file sizes also decrease by an average of 38%.

Update: there is significant variation in runtime per event. Here are the slowest events:

Measurement	Old format	New format
Time to read (worst case)	0.548 s	0.039 s
Time to write (worst case)	0.402 s	0.017 s

The expected speedup for the worst events is 2 * (0.548 - 0.039) + 0.402 - 0.017 = 1.403 s.
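
The worst-case arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the accounting: alert_speedup is a hypothetical helper, and the defaults of two reads and one write per alert are taken from the description earlier in this issue.

```python
def alert_speedup(read_old, read_new, write_old, write_new, reads=2, writes=1):
    """Time saved per alert, in seconds, when switching to the new format.

    Each alert involves `reads` reads and `writes` writes of the coinc.xml file.
    """
    return reads * (read_old - read_new) + writes * (write_old - write_new)


if __name__ == "__main__":
    # Worst-case per-event timings from the table above, in seconds.
    worst = alert_speedup(
        read_old=0.548, read_new=0.039, write_old=0.402, write_new=0.017
    )
    print(f"{worst:.3f} s")  # prints "1.403 s"
```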

mentioned in commit leo-singer/python-ligo-lw@b783440a

Here's more benchmark data (from my M3 Mac) for a larger sample of GraceDB uploads.

mentioned in commit leo-singer/igwn-ligolw@77d94e77

mentioned in merge request computing/software/igwn-ligolw!21

mentioned in commit leo-singer/igwn-ligolw@8377e630

mentioned in commit leo-singer/igwn-ligolw@09d3d8dd
