query and database optimizations
-
Review changes -
-
Download -
Patches
-
Plain diff
I (@alexander.pace) was digging through postgres queries that were taking place when querying for events, and returning the event dictionary (here) using the API.
-
Superevent Index Improvements: When event serializer fetches for nearby superevents (here, it performs a query that looks like this one:
SELECT "superevents_superevent"."id", "superevents_superevent"."submitter_id", "superevents_superevent"."created", "superevents_superevent"."category", "superevents_superevent"."preferred_event_id", "superevents_superevent"."t_start", "superevents_superevent"."t_0", "superevents_superevent"."t_end", "superevents_superevent"."t_0_date", "superevents_superevent"."base_date_number", "superevents_superevent"."base_letter_suffix", "superevents_superevent"."gw_date_number", "superevents_superevent"."gw_letter_suffix", "superevents_superevent"."gw_id", "superevents_superevent"."is_gw", "superevents_superevent"."is_exposed", "superevents_superevent"."time_coinc_far", "superevents_superevent"."space_coinc_far", "superevents_superevent"."em_type", "superevents_superevent"."superevent_id", "superevents_superevent"."default_superevent_id", "events_event"."id", "events_event"."submitter_id", "events_event"."created", "events_event"."group_id", "events_event"."superevent_id", "events_event"."pipeline_preferred_id", "events_event"."pipeline_id", "events_event"."search_id", "events_event"."instruments", "events_event"."nevents", "events_event"."far", "events_event"."likelihood", "events_event"."gpstime", "events_event"."perms", "events_event"."offline", "events_event"."graceid", "events_event"."reporting_latency" FROM "superevents_superevent" INNER JOIN "events_event" ON ("superevents_superevent"."preferred_event_id" = "events_event"."id") WHERE ("superevents_superevent"."category" = 'P' AND "superevents_superevent"."t_0" >= 1269006750.300833 AND "superevents_superevent"."t_0" <= 1269006950.300833) ORDER BY "superevents_superevent"."id" DESC
So it's just filtering by t_0
. When you run the query through EXPLAIN ANALYZE
:
Sort (cost=336.83..336.84 rows=1 width=385) (actual time=2.324..2.330 rows=0 loops=1)
Sort Key: superevents_superevent.id DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.42..336.82 rows=1 width=385) (actual time=2.312..2.315 rows=0 loops=1)
-> Seq Scan on superevents_superevent (cost=0.00..328.38 rows=1 width=144) (actual time=2.308..2.309 rows=0 loops=1)
Filter: ((t_0 >= 1269006750.300833) AND (t_0 <= 1269006950.300833) AND ((category)::text = 'P'::text))
Rows Removed by Filter: 9222
-> Index Scan using events_event_pkey on events_event (cost=0.42..8.44 rows=1 width=241) (never executed)
Index Cond: (id = superevents_superevent.preferred_event_id)
Planning Time: 0.859 ms
Execution Time: 2.463 ms
(11 rows)
You can see that there's never actually an index scan performed against t_0
because it's not actually indexed! This is confirmed by describing the table:
Table "public.superevents_superevent"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+----------------------------------------------------
id | integer | | not null | nextval('superevents_superevent_id_seq'::regclass)
created | timestamp with time zone | | not null |
category | character varying(1) | | not null |
t_start | numeric(16,6) | | not null |
t_0 | numeric(16,6) | | not null |
t_end | numeric(16,6) | | not null |
t_0_date | date | | not null |
base_date_number | bigint | | not null |
base_letter_suffix | character varying(10) | | not null |
gw_date_number | bigint | | |
gw_letter_suffix | character varying(10) | | |
is_gw | boolean | | not null |
is_exposed | boolean | | not null |
preferred_event_id | integer | | not null |
submitter_id | integer | | not null |
time_coinc_far | double precision | | |
em_type | character varying(100) | | |
space_coinc_far | double precision | | |
default_superevent_id | character varying(32) | | |
gw_id | character varying(25) | | |
superevent_id | character varying(32) | | |
Indexes:
"superevents_superevent_pkey" PRIMARY KEY, btree (id)
"public_superevents_superevent_preferred_event_id1_idx" UNIQUE CONSTRAINT, btree (preferred_event_id)
"public_superevents_superevent_t_0_date2_idx" UNIQUE CONSTRAINT, btree (t_0_date, base_date_number, category)
"public_superevents_superevent_t_0_date3_idx" UNIQUE CONSTRAINT, btree (t_0_date, base_letter_suffix, category)
"public_superevents_superevent_t_0_date4_idx" UNIQUE CONSTRAINT, btree (t_0_date, gw_letter_suffix, category)
"public_superevents_superevent_t_0_date5_idx" UNIQUE CONSTRAINT, btree (t_0_date, gw_date_number, category)
"superevents_superevent_gw_id_key" UNIQUE CONSTRAINT, btree (gw_id)
"public_superevents_superevent_submitter_id6_idx" btree (submitter_id)
"superevents_superev_f49162_idx" btree (superevent_id)
"superevents_superevent_gw_id_1c30ece3_like" btree (gw_id varchar_pattern_ops)
Referenced by:
TABLE "events_event" CONSTRAINT "events_event_pipeline_preferred_i_29b0503f_fk_supereven" FOREIGN KEY (pipeline_preferred_id) REFERENCES superevents_superevent(id) DEFERRABLE INITIALLY DEFERRED
So I added an index not only for t_0
, but also for t_start
, t_end
, created
, and gw_id
. gracedb-2.17.0
added it for superevent_id
already, but those should pretty much cover any parameters that users or django would query against. So after migrating, the database contains the following new indexes:
"superevents_superevent_gw_id_key" UNIQUE CONSTRAINT, btree (gw_id)
"superevents_created_9703fb_idx" btree (created)
"superevents_gw_id_5ef90d_idx" btree (gw_id)
"superevents_superev_f49162_idx" btree (superevent_id)
"superevents_t_0_d2d6db_idx" btree (t_0)
"superevents_t_end_6afecc_idx" btree (t_end)
"superevents_t_start_801e07_idx" btree (t_start)
And the resulting EXPLAIN ANALYZE
:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=16.76..16.76 rows=1 width=385) (actual time=0.052..0.058 rows=0 loops=1)
Sort Key: superevents_superevent.id DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.71..16.75 rows=1 width=385) (actual time=0.018..0.022 rows=0 loops=1)
-> Index Scan using superevents_t_0_d2d6db_idx on superevents_superevent (cost=0.29..8.31 rows=1 width=144) (actual time=0.015..0.016 rows=0 loops=1)
Index Cond: ((t_0 >= 1269006750.300833) AND (t_0 <= 1269006950.300833))
Filter: ((category)::text = 'P'::text)
-> Index Scan using events_event_pkey on events_event (cost=0.42..8.44 rows=1 width=241) (never executed)
Index Cond: (id = superevents_superevent.preferred_event_id)
Planning Time: 1.521 ms
Execution Time: 0.143 ms
(11 rows)
-
Event Index Improvements:
I indexed more "frequently searched" parameters for events. It has a pretty negligible effect on querying a single event, but for searches who ware querying against stuff like channel names (anything that involves a character field comparison, jeez), far, snr, etc should see some significant speedup. I did see a speed up of 20% with a simple 'gpstime: 1346850000 .. 9999999999 far < 2.3e-5'
query on gracedb-dev1, but there's still more testing to do.
Summary:
The event query/serialization speedup for a single event is shown below:
Baseline | +Superevent Reindexing | +Event Reindexing |
---|---|---|
127ms | 115ms | 111ms |