figure out why event queries are so convoluted

there's something going on with how gracedb handles event searches, in particular when there are bulk searches with lots of results. so, for example if a user searches for all events for a given pipeline during an MDC period.

Example: There's this line that gets called when a user does an event query. GraceDB by default returns event results in batches of 10, and so in addition to pulling results from the database, it does that count() every time it collects a batch of 10 events.

That count() for a sample query gets translated into the following SQL:

SELECT COUNT(*) FROM (SELECT DISTINCT "events_event"."id" AS Col1, "events_event"."submitter_id" AS Col2, "events_event"."created" AS Col3, "events_event"."group_id" AS Col4, "events_event"."superevent_id" AS Col5, "events_event"."pipeline_preferred_id" AS Col6, "events_event"."pipeline_id" AS Col7, "events_event"."search_id" AS Col8, "events_event"."instruments" AS Col9, "events_event"."nevents" AS Col10, "events_event"."far" AS Col11, "events_event"."likelihood" AS Col12, "events_event"."gpstime" AS Col13, "events_event"."perms" AS Col14, "events_event"."offline" AS Col15, "events_event"."graceid" AS Col16, "events_event"."reporting_latency" AS Col17 FROM "events_event" INNER JOIN "events_group" ON ("events_event"."group_id" = "events_group"."id") INNER JOIN "events_pipeline" ON ("events_event"."pipeline_id" = "events_pipeline"."id") LEFT OUTER JOIN "events_search" ON ("events_event"."search_id" = "events_search"."id") WHERE ("events_group"."name" IN ('CBC') AND NOT ("events_group"."name" = 'Test') AND "events_pipeline"."name" IN ('pycbc') AND NOT ("events_search"."name" = 'MDC' AND "events_search"."name" IS NOT NULL) AND ("events_event"."id" IN (SELECT CAST(U0."object_pk" AS bigint) AS "obj_pk" FROM "guardian_userobjectpermission" U0 INNER JOIN "auth_permission" U2 ON (U0."permission_id" = U2."id") WHERE (U0."user_id" = 3901 AND U2."content_type_id" = 3 AND U2."codename" IN ('view_event'))) OR "events_event"."id" IN (SELECT CAST(U0."object_pk" AS bigint) AS "obj_pk" FROM "guardian_groupobjectpermission" U0 INNER JOIN "auth_group" U1 ON (U0."group_id" = U1."id") INNER JOIN "auth_user_groups" U2 ON (U1."id" = U2."group_id") INNER JOIN "auth_permission" U4 ON (U0."permission_id" = U4."id") WHERE (U2."user_id" = 3901 AND U4."codename" IN ('view_event') AND U4."content_type_id" = 3))))) subquery

Which on gracedb-playground, takes 1682.343ms to do, which is way long to begin with. Further, since it's doing it once for every 10 events, in this scenario where were were 80,000 events in the query, that's 80,000/10 = 8000 counts, and at 1.7 seconds per, that's like 13,600 seconds where the database is needless work and the user is just sitting there. Crazy.

So, I would start by:

Figure out why the ORM is turning a simple query (ref https://git.ligo.org/computing/gracedb/server/-/blob/8dcbbbfeff28ad195b8bf6128aec726d971ef227/gracedb/api/v1/events/views.py#L404) into that that monstrosity. I've attached as a file an example of what it looks like. D26C0A10006C1BF220AA6B90D05B0611391D9431.txt
Figure out why that count() takes so long
Reverse engineer the query response to see if we can move that count() outside of the iteration loop so it only does it once, stores the value, and then loops over the batches of 10 events.

I'm hoping that reverse engineering the count() will elucidate why the event query ends up being so taxing to the database.

Edited Feb 07, 2023 by Alexander Pace

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information