GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues
2023-07-28T19:19:26Z

https://git.ligo.org/computing/gracedb/server/-/issues/323
Consider increasing the configuration parameter "max_wal_size".
2023-07-28T19:19:26Z
Alexander Pace

There were some timeouts on `gracedb-playground` this afternoon (2023-07-28) from around 18:40-18:43 UTC that I think were triggered in part by a `VACUUM FULL` I ran while doing some exploratory maintenance on playground's db. During the period in question there were the following lines in `gracedb-playground`'s RDS logs:
```
2023-07-28 18:35:50 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:12 UTC::@:[393]:LOG: checkpoint complete: wrote 39902 buffers (16.5%); 0 WAL file(s) added, 0 removed, 16 recycled; write=20.183 s, sync=1.326 s, total=21.691 s; sync files=211, longest=1.323 s, average=0.007 s; distance=1048579 kB, estimate=1048579 kB
2023-07-28 18:36:13 UTC::@:[393]:LOG: checkpoints are occurring too frequently (23 seconds apart)
2023-07-28 18:36:13 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:36:13 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:39 UTC::@:[393]:LOG: checkpoint complete: wrote 231 buffers (0.1%); 0 WAL file(s) added, 0 removed, 13 recycled; write=25.661 s, sync=0.420 s, total=26.123 s; sync files=112, longest=0.399 s, average=0.004 s; distance=1048586 kB, estimate=1048586 kB
2023-07-28 18:36:49 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:14 UTC::@:[393]:LOG: checkpoint complete: wrote 2019 buffers (0.8%); 0 WAL file(s) added, 2 removed, 17 recycled; write=24.321 s, sync=0.191 s, total=25.505 s; sync files=138, longest=0.190 s, average=0.002 s; distance=1049475 kB, estimate=1049475 kB
2023-07-28 18:37:17 UTC::@:[393]:LOG: checkpoints are occurring too frequently (28 seconds apart)
2023-07-28 18:37:17 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:37:17 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:24 UTC::@:[393]:LOG: checkpoint complete: wrote 69 buffers (0.0%); 0 WAL file(s) added, 0 removed, 10 recycled; write=6.996 s, sync=0.342 s, total=7.539 s; sync files=34, longest=0.342 s, average=0.011 s; distance=1065103 kB, estimate=1065103 kB
2023-07-28 18:37:30 UTC::@:[393]:LOG: checkpoints are occurring too frequently (13 seconds apart)
2023-07-28 18:37:30 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:37:30 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:33 UTC::@:[393]:LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 9 recycled; write=0.480 s, sync=0.190 s, total=2.933 s; sync files=4, longest=0.190 s, average=0.048 s; distance=1056458 kB, estimate=1064239 kB
2023-07-28 18:38:33 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:38:49 UTC::@:[393]:LOG: checkpoint complete: wrote 171 buffers (0.1%); 0 WAL file(s) added, 0 removed, 19 recycled; write=15.533 s, sync=0.120 s, total=16.420 s; sync files=89, longest=0.120 s, average=0.002 s; distance=1034294 kB, estimate=1061244 kB
2023-07-28 18:39:19 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:39:36 UTC::@:[393]:LOG: checkpoint complete: wrote 171 buffers (0.1%); 0 WAL file(s) added, 0 removed, 14 recycled; write=17.051 s, sync=0.006 s, total=17.104 s; sync files=94, longest=0.006 s, average=0.001 s; distance=1063328 kB, estimate=1063328 kB
2023-07-28 18:40:59 UTC::@:[393]:LOG: checkpoint complete: wrote 517 buffers (0.2%); 0 WAL file(s) added, 11 removed, 17 recycled; write=28.949 s, sync=0.112 s, total=29.842 s; sync files=181, longest=0.111 s, average=0.001 s; distance=1040638 kB, estimate=1061059 kB
2023-07-28 18:41:00 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:41:11 UTC::@:[393]:LOG: checkpoint complete: wrote 118 buffers (0.0%); 0 WAL file(s) added, 0 removed, 14 recycled; write=10.732 s, sync=0.280 s, total=11.601 s; sync files=47, longest=0.280 s, average=0.006 s; distance=1084223 kB, estimate=1084223 kB
2023-07-28 18:41:14 UTC::@:[393]:LOG: checkpoints are occurring too frequently (14 seconds apart)
2023-07-28 18:41:14 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:41:14 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:41:16 UTC::@:[393]:LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 5 recycled; write=1.227 s, sync=0.054 s, total=2.786 s; sync files=2, longest=0.054 s, average=0.027 s; distance=1037553 kB, estimate=1079556 kB
2023-07-28 18:42:12 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:42:16 UTC::@:[393]:LOG: checkpoint complete: wrote 34 buffers (0.0%); 0 WAL file(s) added, 0 removed, 18 recycled; write=3.448 s, sync=0.090 s, total=3.948 s; sync files=22, longest=0.090 s, average=0.005 s; distance=1012093 kB, estimate=1072810 kB
2023-07-28 18:43:39 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:43:41 UTC::@:[393]:LOG: checkpoint complete: wrote 11 buffers (0.0%); 0 WAL file(s) added, 0 removed, 16 recycled; write=1.116 s, sync=0.181 s, total=2.198 s; sync files=8, longest=0.181 s, average=0.023 s; distance=1103069 kB, estimate=1103069 kB
```
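As a rough way to quantify the excerpt above, here is a minimal Python sketch that extracts the spacing between `checkpoint starting: wal` lines and the `distance=` values from RDS log text. The function names and regular expressions are assumptions written against the lines shown here; they are not part of any GraceDB or AWS tooling.

```python
import re
from datetime import datetime

# First few lines of the excerpt above, copied verbatim.
SAMPLE = """\
2023-07-28 18:35:50 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:12 UTC::@:[393]:LOG: checkpoint complete: wrote 39902 buffers (16.5%); 0 WAL file(s) added, 0 removed, 16 recycled; write=20.183 s, sync=1.326 s, total=21.691 s; sync files=211, longest=1.323 s, average=0.007 s; distance=1048579 kB, estimate=1048579 kB
2023-07-28 18:36:13 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:39 UTC::@:[393]:LOG: checkpoint complete: wrote 231 buffers (0.1%); 0 WAL file(s) added, 0 removed, 13 recycled; write=25.661 s, sync=0.420 s, total=26.123 s; sync files=112, longest=0.399 s, average=0.004 s; distance=1048586 kB, estimate=1048586 kB
2023-07-28 18:36:49 UTC::@:[393]:LOG: checkpoint starting: wal
"""

# Assumed patterns, matched against the log format shown in this issue.
START_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC.*LOG:\s+checkpoint starting: wal"
)
DISTANCE_RE = re.compile(r"distance=(\d+) kB")


def checkpoint_intervals(lines):
    """Seconds between consecutive WAL-triggered checkpoint starts."""
    starts = []
    for line in lines:
        match = START_RE.match(line)
        if match:
            starts.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(later - earlier).total_seconds() for earlier, later in zip(starts, starts[1:])]


def wal_distances_kb(lines):
    """WAL volume (kB) covered by each completed checkpoint."""
    distances = []
    for line in lines:
        match = DISTANCE_RE.search(line)
        if match:
            distances.append(int(match.group(1)))
    return distances


intervals = checkpoint_intervals(SAMPLE.splitlines())
distances = wal_distances_kb(SAMPLE.splitlines())
print(intervals)
print(distances)
```

On these sample lines the checkpoint starts are 23 and 36 seconds apart, and each completed checkpoint covers about 1048579 kB (roughly 1 GiB) of WAL.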
This also occurred during a period of high relational load in the database:
![Screen_Shot_2023-07-28_at_3.13.56_PM](/uploads/95a62730a64f5a8d0c75d39d8c809705/Screen_Shot_2023-07-28_at_3.13.56_PM.png)
I haven't seen these hints and warnings on production, even when the database gets `VACUUM`'ed, so hopefully chalk it up to another example of playground's growing pains. Either way, consider some of the recommendations that the internet has to offer:
* https://www.crunchydata.com/blog/tuning-your-postgres-database-for-high-write-loads
* https://www.enterprisedb.com/blog/tuning-maxwalsize-postgresql
* https://stackoverflow.com/questions/75134262/why-do-i-have-the-message-max-wal-size-suddenly-appearing-in-my-postgres-logs
Once those parameters are tuned and validated in the `gracedb-postgresql-dev` parameter group, apply them to production.

https://git.ligo.org/computing/gracedb/server/-/issues/322
apache returns 502 (bad gateway) instead of 403 unauthorized
2023-07-19T14:49:59Z
Alexander Pace

Here's the scenario: a client attempts to upload an event (`POST /api/events`) without a proper cert or valid auth. Gunicorn properly returns a `403 Unauthorized` error, but when it gets sent back to apache and then to the client, it gets turned into a `502 Bad Gateway` error. For instance, this happened on CIT early this morning. Here's an example of the gracedb log line with the 502 and the two lines before it.
```
Jul 18 09:40:12 : DJANGO | 2023-07-18 09:40:12.610 | 9e6074462859 | 10.0.1.42 | performance | INFO | middleware.py, line 58 | create: 403:
Jul 18 09:40:12 : GUNICORN | 131.215.113.168 - - [18/Jul/2023:09:40:12 +0000] "POST /api/events/ HTTP/1.1" 403 58 "-" "gracedb-client/2.10.0"
Jul 18 09:40:12 : APACHE | 10.0.1.35 - - [18/Jul/2023:09:40:12 +0000] "POST /api/events/ HTTP/1.1" 502 315 "-" "gracedb-client/2.10.0"
```
The `DJANGO` performance middleware recognizes it as a `403`, `GUNICORN` says it's a `403`, `APACHE` says `502`.
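To find other occurrences without reading the logs by hand, here is a minimal Python sketch that pairs `GUNICORN` and `APACHE` access lines and reports any pair whose status codes differ. The regular expression and the pairing rule (same timestamp and same request line) are assumptions based only on the three lines above; with one-second timestamps the pairing is a heuristic and could mis-pair concurrent requests.

```python
import re

# The GUNICORN and APACHE lines from the excerpt above, copied verbatim.
SAMPLE = """\
Jul 18 09:40:12 : GUNICORN | 131.215.113.168 - - [18/Jul/2023:09:40:12 +0000] "POST /api/events/ HTTP/1.1" 403 58 "-" "gracedb-client/2.10.0"
Jul 18 09:40:12 : APACHE | 10.0.1.35 - - [18/Jul/2023:09:40:12 +0000] "POST /api/events/ HTTP/1.1" 502 315 "-" "gracedb-client/2.10.0"
"""

# Assumed pattern for the access-log portion of each line.
ACCESS_RE = re.compile(
    r'(?P<source>GUNICORN|APACHE) \| \S+ - - \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def find_status_mismatches(lines):
    """Return (timestamp, request, gunicorn_status, apache_status) tuples
    for requests where apache reported a different status than gunicorn."""
    pending = {}  # (timestamp, request) -> status reported by gunicorn
    mismatches = []
    for line in lines:
        match = ACCESS_RE.search(line)
        if not match:
            continue
        key = (match["ts"], match["request"])
        status = int(match["status"])
        if match["source"] == "GUNICORN":
            pending[key] = status
        elif key in pending:
            upstream = pending.pop(key)
            if upstream != status:
                mismatches.append((key[0], key[1], upstream, status))
    return mismatches


mismatches = find_status_mismatches(SAMPLE.splitlines())
for ts, request, upstream, downstream in mismatches:
    print(f"{ts} {request}: gunicorn={upstream} apache={downstream}")
```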
What's going to happen in this scenario is that a user will see `502` in their error logs when the issue isn't with GraceDB, per se; rather, apache is returning a catch-all error instead of the proper `403`. Getting the user the correct information then requires manually looking through the gracedb error logs.
I think I remember seeing this before, and the issue was a parameter in apache that controlled the maximum size of an unauthorized request. Since `POST` requests to create new events are ~O(1Mb), they exceed this value, so when gunicorn says the request is unauthorized, apache returns the bad gateway error because the request is too large.
This isn't a showstopper, but more of a fix for dev sanity. As in, the complaint will be "we got another 502 when creating an event, gracedb is broken", but the real issue was the user wasn't authorized.

https://git.ligo.org/computing/gracedb/server/-/issues/321
documentation for cwb_r and cwb_s
2023-07-28T18:28:52Z
Alexander Pace

@roberto.depietri @marek.szczepanczyk
There is essentially [no documentation](https://git.ligo.org/computing/gracedb/server/-/blob/master/docs/user_docs/source/labels.rst?plain=1#L23-27) for what the `cWB_r` and `cWB_s` actually represent.
Could you please provide one concise sentence for each label, so that it can go in GraceDB's documentation and tooltip?

https://git.ligo.org/computing/gracedb/server/-/issues/320
Phone number not recognised
2023-07-19T10:51:36Z
Daniela Pascucci

When I submit my mobile number in the contact list I get the error "Not a valid phone number".
I have a Belgian phone number and I use the format "+32*********".
It might be that the problem is the country code. Just as a check, I tried to change it from +32 to +33 (keeping the same number) and in that case there was no error.

https://git.ligo.org/computing/gracedb/server/-/issues/319
Fermi, Swift external triggers not being received from IGWN alerts
2023-07-10T16:33:09Z
Keith Thorne

## Description of problem
Both Livingston and Hanford control rooms are using the igwn-alerts through scimma to get alerts for IFO stand down, etc.
We are getting events from gracedb.superevent and gracedb.test_snews. But we are not receiving any Fermi and Swift events (from gracedb.external_fermi, gracedb.external_swift streams)
## Expected behavior
Fermi and Swift events that can be retrieved by polling the GraceDB database should also be received as events from igwn-alert; at present they are not.
## Context/environment
Using igwn-alert conda environment with scimma credentials

https://git.ligo.org/computing/gracedb/server/-/issues/318
Change public page default view to hide insignificant by default
2023-07-18T15:49:20Z
Alexander Pace

The public page absolutely crawls right now because there are too many insignificant events. Changing the caching policy (https://git.ligo.org/computing/gracedb/server/-/merge_requests/150) will help, but I am also going to propose the following:
1) Keep the wording and table structure the same
2) Hide insignificant events by default
3) Change the backend behavior so that the "show significant events only" button triggers a new database transaction, instead of loading everything at once and then hiding the html elements.
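
Point 3 above can be sketched in plain Python as follows. This is only an illustration of filtering on the server before anything is rendered; `PublicSuperevent`, `rows_for_public_page`, and the `show_insignificant` flag are hypothetical names, and the IDs are made up. A real implementation would put the condition into the database query and not filter a list in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicSuperevent:
    """Hypothetical stand-in for one row of the public alerts table."""
    superevent_id: str
    significant: bool


def rows_for_public_page(superevents, show_insignificant=False):
    """Pick the rows to render on the server side.

    By default only significant events are returned, so insignificant
    ones are never sent to the browser and then hidden.
    """
    if show_insignificant:
        return list(superevents)
    return [s for s in superevents if s.significant]


# Made-up rows for illustration only.
events = [
    PublicSuperevent("S-example-1", significant=True),
    PublicSuperevent("S-example-2", significant=False),
]
default_rows = rows_for_public_page(events)
all_rows = rows_for_public_page(events, show_insignificant=True)
print([s.superevent_id for s in default_rows])
print([s.superevent_id for s in all_rows])
```
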
Thoughts on this, @keita.kawabe?

https://git.ligo.org/computing/gracedb/server/-/issues/317
Document meaning of "coherence"
2023-07-04T14:53:10Z
Jacopo Tissino

I was looking at the coherence report in the superevent page; as far as I can tell there is no reference to where this quantity is defined. I've looked in the gracedb page, the gracedb docs, and the ligo.skymap docs.
The closest thing I found was in the docs for [`ligo-skymap-stats`](https://lscsoft.docs.ligo.org/ligo.skymap/tool/ligo_skymap_stats.html),
where however it's only stated that `log_bci` is the "natural log Bayes factor, coherent vs. incoherent".
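
A hedged reading of that one-line description: a Bayes factor is a ratio of evidences, so `log_bci` would be the natural log of the evidence for a signal that is coherent across the detector network divided by the evidence for the incoherent alternative. The symbols $Z_{\mathrm{coh}}$ and $Z_{\mathrm{inc}}$ are notation assumed here, and the precise definition of the incoherent hypothesis should be taken from the reference, not from this sketch.

```math
\ln B_{\mathrm{C,I}} = \ln \frac{Z_{\mathrm{coh}}}{Z_{\mathrm{inc}}}
```

Under this reading, positive values favor the coherent hypothesis and negative values favor the incoherent one.
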
From the perspective of a new user, there seems to be no way to find out what "coherence" means in this context besides asking someone.
Therefore, I think it would be useful to have somewhere a reference to section V.C in [Veitch and Vecchio 2010](http://arxiv.org/abs/0911.3820) or a page with a summarized discussion of its contents.
I'm opening this issue here since I think that the GraceDB page is the obvious candidate for where to include this information, but feel free to redirect me if this is not the case.

https://git.ligo.org/computing/gracedb/server/-/issues/316
actually disable error emails for 429 rate limits
2023-06-28T19:46:56Z
Alexander Pace

https://git.ligo.org/computing/gracedb/server/-/issues/315
Proposal to enable SNR threshold setting for phone and email alerts.
2023-08-30T07:06:12Z
Takahiro Sawada

The GraceDB web page has a function to set a FAR threshold for phone and email alerts. It would be helpful to be able to set an SNR threshold as well.

https://git.ligo.org/computing/gracedb/server/-/issues/314
Typo in "preferred event information" panel
2023-06-21T14:35:04Z
Jacopo Tissino

There is a typo in the "Preferred Event Information" panel: "Chirp" is misspelled as "Chrip".
![Screenshot_from_2023-06-21_16-11-02](/uploads/e68aceb59fe21daae4cc4546be9cf3ac/Screenshot_from_2023-06-21_16-11-02.png)
I hope this is the right repository to make this issue, apologies otherwise.

https://git.ligo.org/computing/gracedb/server/-/issues/313
Include p-astro in super event and/or g-event tables
2023-06-13T15:35:56Z
Ryan Magee

## Description of feature request
Change the web interface to include p-astro values for the super-event and g-event tables. This would make it easier to find those values rather than clicking through to the json / scrolling for the image. This is also already implemented in the upcoming per-pipeline table, so hopefully this is a relatively small change.
## Use cases
Facilitates quick checking of p-astro when events come in.
## Benefits
## Drawbacks
None
## Suggested solutions

https://git.ligo.org/computing/gracedb/server/-/issues/312
Add time format popup menu for t_start, t_0 and t_end in superevent page
2023-06-09T11:02:01Z
Tito Dal Canton

The "Superevent Information" table on superevent pages lists t_start, t_0 and t_end as GPS times only, while the "Submitted" time has a nice popup menu where one can choose different time formats. It would be useful to have this menu for the other times as well (especially for t_0).

https://git.ligo.org/computing/gracedb/server/-/issues/311
There is no migration that create EARLYWARNING label
2023-06-23T17:43:50Z
Roberto DePietri

A restart of gracedb from an empty database does not create the EarlyWarning label.

https://git.ligo.org/computing/gracedb/server/-/issues/309
burst superevent skymaps on the public alerts page
2023-06-02T15:04:31Z
Alexander Pace

@roberto.depietri: for the case of burst events (olib, cwb), is the skymap image always going to be `{olib, cwb}.png` AND tagged as `sky_loc`?
And is that the nomenclature for O3, ER15, and O4?

publish the new public alerts page
Alexander Pace

https://git.ligo.org/computing/gracedb/server/-/issues/308
Restrict source probability column to pSource on run summary page
2023-06-02T18:07:27Z
Thomas Dent

Currently the O4 public event display https://gracedb.ligo.org/superevents/public/#O4 includes 'HasMassGap' as a source probability if nonzero. However, there is no longer a 'MassGap' source class or category, and 'HasMassGap' is a possible _property_ of a source if the source has one or more BH components. The calculation for HasMassGap is quite different from the previous O3 p(MG) calculation, and if HasMassGap is >0 then the total probability will add up to more than 100% including the current source types/classes (BNS, NSBH, BBH, Terr), since events in the NSBH and BBH classes can have MassGap components.
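
The arithmetic behind this can be shown with a minimal Python sketch. The key names (`BNS`, `NSBH`, `BBH`, `Terrestrial`, `HasMassGap`) and the numbers are made up for illustration; they are assumptions, not the actual field names or values used by GraceDB.

```python
import math

# Hypothetical key names for the mutually exclusive source classes.
SOURCE_CLASSES = ("BNS", "NSBH", "BBH", "Terrestrial")


def source_probability_column(probabilities):
    """Return only the source-class probabilities and check they sum to 1.

    Properties such as HasMassGap are left out because they are not
    mutually exclusive with the source classes.
    """
    column = {name: probabilities[name] for name in SOURCE_CLASSES}
    total = sum(column.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"source classes sum to {total}, expected 1")
    return column


# Made-up values for illustration only.
example = {"BNS": 0.0, "NSBH": 0.1, "BBH": 0.85, "Terrestrial": 0.05, "HasMassGap": 0.3}
column = source_probability_column(example)
naive_total = sum(example.values())  # includes HasMassGap, so it exceeds 1
print(column)
print(naive_total > 1.0)
```
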
Request is to show only the pBNS, pNSBH, pBBH and pTerr probabilities, which together now add up to 100%, in the 'possible source (probability)' column.

publish the new public alerts page
Alexander Pace

https://git.ligo.org/computing/gracedb/server/-/issues/307
Add ability to disable alerts from (pipeline, searches) combinations
2023-05-23T16:20:45Z
Tito Dal Canton

Followup from the semi-regular RRT call of Tuesday May 23.
It seems we currently have the ability to disable alerts from individual pipelines, but not from (pipeline, search) combinations. I would like to request the latter ability as well. The use case is that we could in principle have problems from e.g. PyCBC Live early-warning, but not from PyCBC Live full-bandwidth.

O4

https://git.ligo.org/computing/gracedb/server/-/issues/306
Adding robot SciToken support
2023-09-06T05:38:29Z
Duncan Meacher

I will now be working on adding support for robot SciTokens within GraceDB. My understanding of how to do this is to modify the [update_user_accounts_from_ligo_ldap.py](https://git.ligo.org/computing/gracedb/server/-/blob/master/gracedb/ligoauth/management/commands/update_user_accounts_from_ligo_ldap.py) management tool to scan the ldap for new robot scitoken accounts, create/modify accounts as needed, and then apply the per-pipeline permissions to those accounts as needed.
@satyanarayan.raypitambarmohapatra, @warren-anderson, what is the status of robot accounts within the ldap? It's been a while since I've looked at them, but my understanding is that they each have an eppn that will link a robot scitoken to an ldap account?
Including @duncanmmacleod in this issue.

Duncan Meacher

https://git.ligo.org/computing/gracedb/server/-/issues/305
Detector State for active instruments
2023-05-22T18:18:50Z
Brian O'Reilly

![Screenshot_2023-05-22_at_12.30.37_PM](/uploads/081e7473df43e2433e3ab08e6dc937fc/Screenshot_2023-05-22_at_12.30.37_PM.png)
The "Overall state of all detectors" being shown as "bad" has already led to some questions. This line should be removed or at least the wording should be changed, especially given that it will just be L1 and H1 for a few months.

https://git.ligo.org/computing/gracedb/server/-/issues/304
(minor) Pipelines tab in gracedb still mentions O3
2023-05-23T14:53:53Z
Viola Sordini

The Pipelines tab of the gracedb page for logged-in users still says O3 in "The EM advocate schedule for O3 is here", although the link to the roster correctly points to the RRT google sheet (modulo the discussions about using google docs). I realise this is really minor, but I noticed it so I thought I would report it.

https://git.ligo.org/computing/gracedb/server/-/issues/303
Make it easier to distinguish significant from low-significance alerts on the gracedb public page
2023-06-02T15:04:30Z
Viola Sordini

## Description of feature request
I was wondering if it would be possible and interesting to make it easier visually to distinguish significant from low-significance alerts on the gracedb public alerts page.
## Use cases
It can be useful to easily determine which alerts are significant, especially given that the page will be crowded.
## Benefits
<!-- Describe the benefits of adding this feature -->
## Drawbacks
## Suggested solutions
We could have the category appear somehow, or have two separate lists, or different colors.

publish the new public alerts page
Alexander Pace