Removing duplicate label entries from g-events
There is a bug in GraceDB: there is no database-level uniqueness constraint between Labels and Events. This exposes a race condition: if two processes try to add a label to one g-event at the same time, the label can be applied more than once. There is a database-level constraint for superevents, so the bug doesn't exist on that side.
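To illustrate why the missing constraint matters, here is a minimal sketch using the standard-library sqlite3 module. The `labelling` table and its columns are hypothetical stand-ins, not GraceDB's actual schema, and the two sequential inserts only approximate two processes racing to add the same label.

```python
import sqlite3


def count_after_double_add(with_constraint: bool) -> int:
    """Apply the same label to the same event twice and return the row count."""
    conn = sqlite3.connect(":memory:")
    unique = ", UNIQUE (event_id, label_id)" if with_constraint else ""
    # Hypothetical join table; not GraceDB's actual schema.
    conn.execute(
        "CREATE TABLE labelling ("
        "id INTEGER PRIMARY KEY, event_id INTEGER, label_id INTEGER"
        f"{unique})"
    )
    for _ in range(2):  # stands in for two processes adding the label at once
        try:
            conn.execute(
                "INSERT INTO labelling (event_id, label_id) VALUES (1, 7)"
            )
        except sqlite3.IntegrityError:
            # With the constraint, the database itself rejects the second row.
            pass
    (count,) = conn.execute("SELECT COUNT(*) FROM labelling").fetchone()
    conn.close()
    return count


print(count_after_double_add(with_constraint=False))  # duplicate row is stored
print(count_after_double_add(with_constraint=True))   # duplicate row is rejected
```

Without the constraint, an application-level "does this label already exist?" check can pass in both processes before either one writes, which is why the check needs to live in the database.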
This Django management command scans the database for g-events that have duplicate labels, then removes one of the duplicates and writes a log message noting that a label was removed.
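The scan-and-remove logic can be sketched in plain Python as follows. This is not the actual management command: the row layout `(row_id, event, label)`, the function names, the log wording, and the choice to keep the lowest row id are all assumptions made for illustration.

```python
from collections import defaultdict


def find_redundant_rows(rows):
    """Return ids of redundant rows, keeping the lowest id per (event, label)."""
    groups = defaultdict(list)
    for row_id, event, label in rows:
        groups[(event, label)].append(row_id)
    redundant = []
    for ids in groups.values():
        redundant.extend(sorted(ids)[1:])  # everything after the first is extra
    return sorted(redundant)


def fix_duplicates(rows, dry_run=True):
    """Return (remaining_rows, log_messages); a dry run leaves rows untouched."""
    redundant = set(find_redundant_rows(rows))
    log = [
        f"Removed duplicate label {label} from {event}"
        for row_id, event, label in rows
        if row_id in redundant
    ]
    if dry_run:
        return list(rows), log
    return [row for row in rows if row[0] not in redundant], log


# Hypothetical data: G1 carries three copies of EM_READY.
rows = [
    (1, "G1", "EM_READY"),
    (2, "G1", "EM_READY"),
    (3, "G1", "EM_READY"),
    (4, "G2", "EM_COINC"),
]
remaining, log = fix_duplicates(rows, dry_run=False)
print(remaining)
print(log)
```

Running `fix_duplicates` a second time on `remaining` finds nothing to remove, so in this sketch the operation is safe to repeat.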
I have already run this successfully on gracedb-dev1 and gracedb-test, and I'm using this ticket to document the output on gracedb-playground ahead of it being released into production. Note: when I last checked on prod, there were only 19 events with duplicate labels (all of which were on spiir events...).
Doing a --dry-run on gracedb-playground:
# python3 manage.py fix-labelling-duplicates --dry-run
Looking for duplicate labels for INJ
--> No duplicates found for label INJ
Looking for duplicate labels for DQV
--> No duplicates found for label DQV
Looking for duplicate labels for LUMIN_GO
--> No duplicates found for label LUMIN_GO
Looking for duplicate labels for LUMIN_NO
--> No duplicates found for label LUMIN_NO
Looking for duplicate labels for SWIFT_GO
--> No duplicates found for label SWIFT_GO
Looking for duplicate labels for SWIFT_NO
--> No duplicates found for label SWIFT_NO
Looking for duplicate labels for EM_READY
--> 416 duplicate labels detected.
Looking for duplicate labels for PE_READY
--> No duplicates found for label PE_READY
Looking for duplicate labels for H1NO
--> No duplicates found for label H1NO
Looking for duplicate labels for H1OK
--> No duplicates found for label H1OK
Looking for duplicate labels for H1OPS
--> No duplicates found for label H1OPS
Looking for duplicate labels for L1NO
--> No duplicates found for label L1NO
Looking for duplicate labels for L1OK
--> No duplicates found for label L1OK
Looking for duplicate labels for L1OPS
--> No duplicates found for label L1OPS
Looking for duplicate labels for ADVREQ
--> No duplicates found for label ADVREQ
Looking for duplicate labels for ADVOK
--> No duplicates found for label ADVOK
Looking for duplicate labels for ADVNO
--> No duplicates found for label ADVNO
Looking for duplicate labels for EM_Throttled
--> No duplicates found for label EM_Throttled
Looking for duplicate labels for EM_Selected
--> No duplicates found for label EM_Selected
Looking for duplicate labels for EM_Superseded
--> No duplicates found for label EM_Superseded
Looking for duplicate labels for EM_COINC
--> 30 duplicate labels detected.
Looking for duplicate labels for GRB_ONLINE
--> No duplicates found for label GRB_ONLINE
Looking for duplicate labels for GRB_OFFLINE
--> No duplicates found for label GRB_OFFLINE
Looking for duplicate labels for EM_SENT
--> No duplicates found for label EM_SENT
Looking for duplicate labels for V1OPS
--> No duplicates found for label V1OPS
Looking for duplicate labels for V1OK
--> No duplicates found for label V1OK
Looking for duplicate labels for V1NO
--> No duplicates found for label V1NO
Looking for duplicate labels for SKYMAP_READY
--> 108 duplicate labels detected.
Looking for duplicate labels for EMBRIGHT_READY
--> 14 duplicate labels detected.
Looking for duplicate labels for PASTRO_READY
--> 597 duplicate labels detected.
Looking for duplicate labels for DQOK
--> No duplicates found for label DQOK
Looking for duplicate labels for GCN_PRELIM_SENT
--> No duplicates found for label GCN_PRELIM_SENT
Looking for duplicate labels for HWINJREQ
--> No duplicates found for label HWINJREQ
Looking for duplicate labels for HWINJOK
--> No duplicates found for label HWINJOK
Looking for duplicate labels for HWINJNO
--> No duplicates found for label HWINJNO
Looking for duplicate labels for RAVEN_ALERT
--> 71 duplicate labels detected.
Looking for duplicate labels for NOT_GRB
--> No duplicates found for label NOT_GRB
Looking for duplicate labels for EXT_SKYMAP_READY
--> No duplicates found for label EXT_SKYMAP_READY
Looking for duplicate labels for 2022_LENSING_MDC
--> No duplicates found for label 2022_LENSING_MDC
Looking for duplicate labels for DQR_REQUEST
--> No duplicates found for label DQR_REQUEST
Looking for duplicate labels for COMBINEDSKYMAP_READY
--> 12 duplicate labels detected.
Looking for duplicate labels for SOG_READY
--> No duplicates found for label SOG_READY
Looking for duplicate labels for EM_SelectedConfident
--> No duplicates found for label EM_SelectedConfident
Looking for duplicate labels for HIGH_PROFILE
--> No duplicates found for label HIGH_PROFILE
Looking for duplicate labels for LLAMA_COMPLETE
--> No duplicates found for label LLAMA_COMPLETE
Looking for duplicate labels for LOW_SIGNIF_PRELIM_SENT
--> No duplicates found for label LOW_SIGNIF_PRELIM_SENT
Looking for duplicate labels for SIGNIF_LOCKED
--> No duplicates found for label SIGNIF_LOCKED
Looking for duplicate labels for LOW_SIGNIF_LOCKED
--> No duplicates found for label LOW_SIGNIF_LOCKED
Looking for duplicate labels for EARLY_WARNING
--> No duplicates found for label EARLY_WARNING
Looking for duplicate labels for cWB_r
--> No duplicates found for label cWB_r
Looking for duplicate labels for cWB_s
--> No duplicates found for label cWB_s
Looking for duplicate labels for MOCK
--> No duplicates found for label MOCK
Looking for duplicate labels for SUBSOLAR_MASS
--> No duplicates found for label SUBSOLAR_MASS
Looking for duplicate labels for SNR_OPTIMIZED
--> No duplicates found for label SNR_OPTIMIZED
Looking for duplicate labels for LENSED_CANDIDATE
--> No duplicates found for label LENSED_CANDIDATE
There are hundreds of duplicates. On test I noticed that the majority were from MDC "first two years" superevents (where gwcelery uploads 16 events simultaneously and annotates them). After running the script for real, we can examine one of the MDC superevents (https://gracedb-playground.ligo.org/events/M1729409/view/) and see that it had three copies of EM_READY.
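To put a number on the dry-run output above, the per-label counts can be tallied with a short script. This is a sketch that assumes the exact line format shown in the output; the `tally` helper is hypothetical, and `SAMPLE` copies only the labels that reported duplicates plus one clean label for contrast.

```python
import re

LOOKING = re.compile(r"^Looking for duplicate labels for (\S+)$")
DETECTED = re.compile(r"^--> (\d+) duplicate labels detected\.$")

# Excerpt copied from the dry-run output on gracedb-playground.
SAMPLE = """\
Looking for duplicate labels for EM_READY
--> 416 duplicate labels detected.
Looking for duplicate labels for PE_READY
--> No duplicates found for label PE_READY
Looking for duplicate labels for EM_COINC
--> 30 duplicate labels detected.
Looking for duplicate labels for SKYMAP_READY
--> 108 duplicate labels detected.
Looking for duplicate labels for EMBRIGHT_READY
--> 14 duplicate labels detected.
Looking for duplicate labels for PASTRO_READY
--> 597 duplicate labels detected.
Looking for duplicate labels for RAVEN_ALERT
--> 71 duplicate labels detected.
Looking for duplicate labels for COMBINEDSKYMAP_READY
--> 12 duplicate labels detected.
"""


def tally(output):
    """Map each label to its reported duplicate count, skipping clean labels."""
    counts = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        match = LOOKING.match(line)
        if match:
            current = match.group(1)
            continue
        match = DETECTED.match(line)
        if match and current is not None:
            counts[current] = int(match.group(1))
    return counts


counts = tally(SAMPLE)
print(counts)
print(sum(counts.values()))  # 1248 duplicate labels across seven label types
```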
Running the script a subsequent time finds no duplicates, and after running the uniqueness migration, users should get a 400 error when adding a duplicate label.