Expanded API calls for analytics
From an email chain with @andrew.toivonen, @michael-coughlin, @sushant.sharma-chaudhary:
Alex,
Following up on your email, we had a discussion as a group about what GraceDB API changes could be useful.
For some context, these are the scripts (and what they fetch) that we have used in the past to fetch from GraceDB/GraceDB Playground:
Playground:
All MDC events (from a range of gpstimes): https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/events_from_gracedb.py
MDC Skymaps: https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/fetch_skymaps.py
MDC Posterior Samples (from a range of gpstimes): https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/fetch_all_PE.py
GraceDB:
All data products from a superevent: https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/fetch_superevent.py
Posterior Samples from a single event: https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/fetch_PE.py
GCN latencies: https://git.ligo.org/emfollow/em-properties/mdc-analytics/-/blob/main/fetch_data/fetch_O4_gcn.py
First off, if you feel any of these scripts are poorly optimized, feel free to let us know. This brings me to my next thought: we know that bulk fetching from Playground for the MDC is very resource intensive and has caused issues in the past. However, I think there will always be a need for bulk fetching when it comes to the MDC, simply due to the nature of the study and the large number of triggers. Part of the strain also came from the fact that we did not fetch in an optimized manner (and our method could perhaps be optimized even further), so one possible addition to the API would be a call that fetches a table of all event quantities, as we did, but implemented the way you would optimize such a query. The same could be said for event data products, such as PE and skymaps. We were also wondering whether there is a way to add a call that would simply download a file, without having to save it, or a list of files, as an object?
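To make the bulk-fetch pattern above concrete, here is a minimal sketch of splitting a long GPS range into smaller windows so that each request stays small. The helper names `gps_windows` and `window_query` are hypothetical and not part of any existing script, and the `gpstime: lo .. hi` query form is an assumption about the GraceDB query syntax that should be checked against the server documentation.

```python
from typing import Iterator, Tuple


def gps_windows(start: int, stop: int, width: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (lo, hi) GPS windows that together cover start..stop."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo = start
    while lo < stop:
        # The last window is clipped so it never runs past `stop`.
        hi = min(lo + width, stop)
        yield lo, hi
        lo = hi


def window_query(lo: int, hi: int, extra: str = "") -> str:
    """Build one query string per window.

    Assumption: the server accepts 'gpstime: lo .. hi'. `extra` is any
    additional search term (for example a search type) placed in front.
    """
    return f"{extra} gpstime: {lo} .. {hi}".strip()


if __name__ == "__main__":
    for lo, hi in gps_windows(0, 25, 10):
        print(window_query(lo, hi, "MDC"))
```

Each query string could then be passed to whatever client call is used for the event search. Adjacent windows share a boundary value, so if the server treats the range as inclusive, results should be de-duplicated by event ID.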
As for fetching from GraceDB, I think our studies will generally focus on specific events or a small subset of events. What could be most useful would be a call to download the latest skymap or the latest posterior samples for a given event. Finally, I know latency was added to the GraceDB page; how is that latency defined, and is there an easy way to fetch that value? Fetching all the latencies for a range of gpstimes, or for the entire observing run, would be useful as well. It might also be good to include the ability to fetch all superevents, or just the significant ones.
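As an illustration of the "latest skymap" request above, here is a minimal sketch of what has to be done on the client side today. It assumes the file listing for an event contains versioned names of the form `name,N` (for example `bayestar.multiorder.fits,2`) alongside the unversioned name; the function `latest_version` is hypothetical and only picks the highest version of one given filename.

```python
from typing import Iterable, Optional


def latest_version(filenames: Iterable[str], basename: str) -> Optional[str]:
    """Return the entry 'basename,N' with the largest integer N, or None."""
    best_name, best_version = None, -1
    for name in filenames:
        # Split on the last comma: 'a.fits,3' -> ('a.fits', ',', '3').
        base, sep, version = name.rpartition(",")
        if sep and base == basename and version.isdigit():
            if int(version) > best_version:
                best_name, best_version = name, int(version)
    return best_name


if __name__ == "__main__":
    listing = [
        "bayestar.multiorder.fits",
        "bayestar.multiorder.fits,0",
        "bayestar.multiorder.fits,2",
        "bayestar.multiorder.fits,10",
    ]
    print(latest_version(listing, "bayestar.multiorder.fits"))
```

Versions are compared as integers, so `,10` ranks above `,2`. This does not decide which of several differently named skymaps is the most recent; that would need upload times, which is part of why a dedicated call would help.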
These were our initial thoughts, without a clear idea of which of these would be easiest to implement or would make the most difference.
Let us know what you think,
Andrew