GraceDB Server issues (https://git.ligo.org/computing/gracedb/server/-/issues)

# Issue #240: Generate railroad diagrams for query parsing language
(Daniel Wysocki, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/240)

`pyparsing>=3.0.0` introduces the ability to generate ["railroad diagrams"](https://pyparsing-docs.readthedocs.io/en/latest/whats_new_in_3_0_0.html#id4), which are a concise way of visualizing a language. These would be very nice to have for our documentation, but more importantly they would be helpful for making improvements to the query language without breaking anything.
(Milestone: O4 Debugging and Improvements; assignee: Daniel Wysocki)

# Issue #239: Remove query parsers' dependence on database state
(Daniel Wysocki, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/239)

There are several database-querying mini-languages written using the `pyparsing` module. The very bad decision was made to have the languages depend on the state of the database, by having things like labels and pipeline names be reserved words. This means any addition to the set of these values requires recompiling the parser, so as a result it is recompiled for _every query_. Speed considerations aside, this adds some serious complexity to the parsers, and means it's possible to break the parser by adding a badly named or non-unique value into one of the tables.
A much better approach would be to add a generic "identifier" token to the language. Then at code-generation time it would be resolved based on the database state.
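As a minimal sketch of that separation (a toy regex tokenizer standing in for the real pyparsing grammars; all names here are illustrative): parsing accepts any well-formed identifier, and only a later resolution step consults the set of known values.

```python
import re

# Toy sketch of parse/resolve separation -- NOT GraceDB's actual parser.
# The tokenizer accepts any identifier; database state is consulted only
# in the resolve() step, so adding a label never requires re-parsing.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<op>[&|=!<>]+))")

def parse(query):
    """Parse into (kind, text) tokens; any identifier is accepted here."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise SyntaxError(f"bad character at position {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

def resolve(tokens, known_labels):
    """Code-generation phase: unknown identifiers fail here, not in parse()."""
    for kind, text in tokens:
        if kind == "ident" and text not in known_labels:
            raise NameError(f"label {text!r} is not defined")
    return tokens
```

Here `parse` never needs recompiling when a label is added; only the `known_labels` set passed to `resolve` changes.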
To use Python as an analogy, consider what happens if one tries accessing an undefined variable:
```python
>>> foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
```
note that this isn't a `SyntaxError`, as Python knows `foo` is a valid identifier, but it is unbound. The parser read my statement without issue, and the code-generation phase correctly identified the missing name. This is how our query language should work as well.
(Milestone: O4 Debugging and Improvements; assignee: Daniel Wysocki)

# Issue #237: More flexible queries for text/email alerts
(Rebecca Ewing, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/237)

## Description of feature request
For text/email alerts there are only a few options available, mostly just to choose a FAR threshold and set of labels. It would be useful if we could filter by additional parameters.
In general, if it's possible to support arbitrary queries for the alert rules, that would be great.
## Use cases
Getting alerted for public events while ignoring events from injection channels, so we don't get flooded with unnecessary alerts.
A query for this would be like `si.channel != "GDS-CALIB_STRAIN_INJ1_O3Replay" & si.channel != "Hrec_hoft_16384Hz_INJ1_O3Replay"` (I'm not sure exactly what the right syntax is.)
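A sketch of how such a rule might be evaluated server-side, assuming the alert machinery exposes the event's channel and FAR to the rule; the channel names come from the example above, while the field names and threshold are illustrative:

```python
# Hypothetical per-notification filter: the channel names are the ones
# from the use case above; the field names and FAR threshold are
# illustrative, not GraceDB's actual alert schema.
INJECTION_CHANNELS = {
    "GDS-CALIB_STRAIN_INJ1_O3Replay",
    "Hrec_hoft_16384Hz_INJ1_O3Replay",
}

def should_alert(event, far_threshold=1e-7):
    """True if the event is below the FAR threshold and not from an
    injection channel."""
    return (
        event.get("far", float("inf")) < far_threshold
        and event.get("channel") not in INJECTION_CHANNELS
    )
```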
## Benefits
Adding this feature would make the alerts more general / flexible which should be a good thing.
## Drawbacks
As long as the old method stays in place and people can just optionally specify a more complicated/specific query I can't think of any drawbacks.
## Suggested solutions
(Milestone: O4 Debugging and Improvements)

# Issue #235: Occasional 500 error when reading files
(Alexander Pace, 2024-03-19; https://git.ligo.org/computing/gracedb/server/-/issues/235)

There is an occasional 500 error returned by the cloud instances when attempting to read files. It occurs infrequently and randomly enough that I'm not able to reproduce it, but it does interrupt gwcelery's workflow on occasion (~2 times per week). An example error traceback looks like:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/rest_framework/viewsets.py", line 125, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 509, in dispatch
response = self.handle_exception(exc)
File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 469, in handle_exception
self.raise_uncaught_exception(exc)
File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
raise exc
File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python3.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/app/gracedb_project/gracedb/api/v1/superevents/views.py", line 321, in list
file_list = get_file_list(viewable_logs, parent_superevent.datadir)
File "/app/gracedb_project/gracedb/core/file_utils.py", line 32, in get_file_list
pointed_to = os.path.basename(os.path.realpath(full_path))
File "/usr/lib/python3.7/posixpath.py", line 395, in realpath
path, ok = _joinrealpath(filename[:0], filename, {})
File "/usr/lib/python3.7/posixpath.py", line 443, in _joinrealpath
path, ok = _joinrealpath(path, os.readlink(newpath), seen)
Exception Type: OSError at /api/superevents/MS220919n/files/
Exception Value: [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png'
```
It appears to be triggering the [retrying](https://git.ligo.org/computing/gracedb/server/-/commit/71daf97148ef21e858039343ba4dc6c60eb6f208) hook that I put in, but that doesn't resolve the problem: it retries four times to get the file, sleeping one second between each attempt:
```
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:13 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:13.591 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:14 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:14.608 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:15 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:15.622 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:16 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:16.636 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
```
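For reference, a minimal sketch of this retry pattern (the deployed code uses the `retry` package, not this decorator): after the final failed attempt the `OSError` propagates, which is consistent with the request still ending in a 500 after roughly four seconds of retries.

```python
import functools
import time

# Illustrative retry decorator mirroring the behavior in the logs above:
# retry on OSError with a fixed delay, then re-raise after the last
# attempt, which is why the request still surfaces as a 500.
def retry_on_oserror(tries=4, delay=1.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except OSError:
                    if attempt == tries - 1:
                        raise  # exhausted: propagate to the view -> 500
                    time.sleep(delay)
        return wrapper
    return decorator
```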
`Traefik` is showing that the request is returning a 500 error and is taking almost five seconds because of the retries:
```
Sep 19 13:38:18 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_webgateway_webgateway.1.l4j2u8hibrrtgelvsfhiubxfh: 131.215.113.198 - - [19/Sep/2022:13:38:13 +0000] "GET /api/superevents/MS220919n/files/ HTTP/1.1" 500 10472 "-" "-" 174967 "gracedb@docker" "http://10.0.2.51:80" 4815ms
```
For reference, the NFS mounts use: `nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev`
(Milestone: O4 Debugging and Improvements)

# Issue #233: AWS resources for non-production GraceDB
(Erik Katsavounidis, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/233)

Given the heavy development currently in progress for the low-latency alerts pipeline and the use of non-production GraceDB tiers, we will need to bring such tiers up to the same level of hardware resources under AWS as the production system.
(Milestone: O4 Debugging and Improvements)

# Issue #232: Request to add external event info to igwn-alert
(Cody Messick, 2023-02-13; https://git.ligo.org/computing/gracedb/server/-/issues/232)

Currently both emfollow/gwcelery!857 and emfollow/gwcelery!852 download external events from GraceDB to populate public alerts. Could the external event info just be included in the IGWN-Alert?
The only catch that I see is that we need to be able to tell which event to use. @brandon.piotrzkowski said this information should be in the `em_type` field, so all we'd need is some way to identify the event mentioned in that field.
(Milestone: Critical Path O4 Development)

# Issue #229: Migrate from ConcurrentLogHandler to concurrent-log-handler
(Daniel Wysocki, 2022-08-11; https://git.ligo.org/computing/gracedb/server/-/issues/229)

`requirements.txt` lists `ConcurrentLogHandler==0.9.1`, a package which was [last updated in 2013](https://pypi.org/project/ConcurrentLogHandler/) and makes use of the `use_2to3` feature of `setuptools<58`. We will be stuck with older versions of `setuptools` until this dependency is replaced, which may eventually become a problem.
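If the dependency is swapped for the maintained fork `concurrent-log-handler` (discussed below), the Django `LOGGING` change should mostly be the handler's dotted path. A sketch with illustrative filenames and sizes, not GraceDB's actual settings:

```python
# Illustrative dictConfig-style handler entry; only the "class" string
# should need to change from the old cloghandler import path. Filename,
# sizes, and counts here are placeholders.
LOGGING_HANDLER = {
    # old: "class": "cloghandler.ConcurrentRotatingFileHandler",
    "class": "concurrent_log_handler.ConcurrentRotatingFileHandler",
    "filename": "/var/log/gracedb/gracedb.log",  # placeholder path
    "maxBytes": 10 * 1024 * 1024,
    "backupCount": 5,
}
```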
Fortunately, one of the two maintainers forked the project as [`concurrent-log-handler`](https://pypi.org/project/concurrent-log-handler/) and has updated it as recently as this year. Changing our requirement to `concurrent-log-handler==0.9.20` gets me past the build issue on newer `setuptools` versions. It is also necessary to change the import from `cloghandler` to `concurrent_log_handler`. Beyond that I have not done further testing, so it may not be a drop-in replacement.
(Milestone: Backlog; assignee: Daniel Wysocki)

# Issue #226: Request for VOEvent IVORNs in superevent dictionary
(Cody Messick, 2023-07-05; https://git.ligo.org/computing/gracedb/server/-/issues/226)

Would it be possible to include IVORNs for all VOEvents on a given superevent in the superevent dictionary? Doing so would allow the GWCelery team to populate two VOEvent fields without additional GraceDB queries, specifically the citations section and the `Pkg_Ser_Num` field.
Do non-LVK generated VOEvents ever end up in gracedb (e.g. from an external observation that is coincident with the GW)? I ask because my current mental model for determining `Pkg_Ser_Num` is just to count the IVORNs, i.e. if there are no IVORNs we assume the VOEvent we're generating is the first, if there's one IVORN we assume it's the second, etc. If VOEvents could show up from other events, that logic might need some additional checks.
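A sketch of that counting logic, assuming the superevent dictionary grew an `ivorns` list and that LVK-issued IVORNs can be recognized by a prefix (the prefix shown is illustrative, as is the external-event guard):

```python
# Hypothetical Pkg_Ser_Num logic: count only LVK-issued IVORNs, so that
# externally generated VOEvents (if they ever appear) don't shift the count.
LVK_IVORN_PREFIX = "ivo://gwnet/LVC#"  # illustrative prefix

def next_pkg_ser_num(superevent):
    ivorns = superevent.get("ivorns", [])
    lvk = [i for i in ivorns if i.startswith(LVK_IVORN_PREFIX)]
    return len(lvk) + 1  # no IVORNs -> the VOEvent being generated is the first
```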
Related to https://git.ligo.org/emfollow/gwcelery/-/merge_requests/857
(Milestone: Critical Path O4 Development)

# Issue #225: Uploading sky-maps with the MLy pipeline
(kyle willetts, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/225)

We (@mly) would like to be able to upload sky-maps to GraceDB in low latency, ideally when publishing an event. Would it be possible to modify the upload file format we are currently using to include a sky-map file?
(Milestone: O4 Debugging and Improvements)

# Issue #224: Study of missing notifications during O3 (not attempted by Twilio)
(Peter Shawhan, 2023-05-03; https://git.ligo.org/computing/gracedb/server/-/issues/224)

During O3, some people reported not receiving notifications according to how they had configured GraceDB to send them. A small fraction of people reported this, but it seemed to be consistent, i.e. not sporadic. I spent some time looking into it in the summer of 2019. For the record, here is a copy of some email messages I sent to a few people (principally Tanner) at that time.
## Email on July 25, 2019:
I have gotten input from a number of people and cross-checked with the Twilio logs. I have not figured out what is happening, but I have learned some things so I thought I would distill my notes and share them with you.
* The problems people are having are with Call and Text notifications, not Email notifications. Well, I haven't paid much attention to what people mentioned about email notifications, so there could be problems there too, but anyway the problems are not ALL with Email notifications. The people who have communicated with me are primarily relying on calls and/or texts.
* The Twilio logs corroborate what people have told me. e.g. if they said they haven't gotten text messages and phone calls recently, the Twilio logs agree: it really looks like Twilio was not asked to call/text them. (Well, occasionally a phone call will fail and that will be shown in the Twilio log, but that is not common. It's not the explanation for people's reports of missing notifications.)
* Lots of people ARE being notified of relevant events. For instance, when S190718y was marked by ADVREQ, the Twilio logs list 98 text messages and 41 voice calls to people to notify them. When S190720a was labeled with ADVREQ, I see 112 text messages; I didn't count the voice calls in that case. When S190724g was labeled with EM_COINC, I see 67 text messages delivered and about 42 voice calls, most of which went through and were answered.
* Some people are receiving notifications reliably, while others are not receiving any. Some people used to receive notifications but have not been receiving them recently. A few people have observed that it seems like people who set up notifications a long time ago are receiving them, while people who set up notifications recently tend not to be receiving them.
So I think there are two general types of possible reasons: either (1) some call/text requests passed to Twilio are getting lost before Twilio attempts them, or (2) there is something funny in the software that the gracedb server is using to construct the list of contacts to call or text, leading it to omit some. (e.g., before I started looking into this, I had a hypothesis that a database query was being used to get the list of contacts and there was a maximum number of records returned by the query. But having looked at the code, that doesn't fit.)
I know you mentioned that logging is not working reliably on AWS; that's too bad, because from gracedb/alerts/phone.py I can see that every call/text attempt passed to Twilio is being logged. If you have a log file that you believe to be complete for some time that includes an event, I could compare it against the Twilio logs (which I have now exported into spreadsheets, cumulative since January).
There is a note here that "You can send messages to Twilio at a rapid rate as long as the requests do not reach Twilio's API concurrency limit which is at 100", but I don't THINK we would be running into that since call/text requests are made serially and I'm positive that Twilio is designed to queue requests and feed them out at the appropriate rate.
In terms of the software in the gracedb server, I spent some time studying the code in the gracedb/alerts directory, but it is complex enough that I can't trace it by inspection to check how it filters to get a list of matching notifications and then looks up the contact information from the notifications.
If you want a case to try debugging, look at Giacomo Ciani. His notification settings are:
~~~
Notifications
Once per year | Superevent created or updated & FAR < 3e-08 -> Text +393476487948, Email giacomo.ciani@unipd.it
Advocate request | Superevent labeled with ADVREQ -> Email giacomo.ciani@unipd.it, Call and text +393476487948
~~~
For S190720a he received several "A superevent with GraceDB ID S190720a was updated" text messages and emails (nobody received a "superevent created" message for that because the initial preferred event had too high a FAR), but did not receive any ADVREQ label messages for either S190720a or S190718y, either by text or by voice call. So it seems that the first of his two notifications was acted on but the second was not.
## Email on July 26, 2019:
I've taken some more time to digest the input I've received (and collected notes in a Google doc: https://docs.google.com/document/d/1QzDS-JWxi2EAXgYP64sKJaNde29x7MN7v0JtA5IGxl8/edit). Here are my high-level findings:
* For some people, all of their notifications are working.
* Working or not working seems to be associated with specific "lines" in a user's notifications configuration. For some people, SOME of their notification lines are working while others are not. For instance, Jenne Driggers has four notification lines configured, but only the first of them is working (i.e. generating text messages logged by Twilio); the other three lines are having no effect. Looking back through the logs, it seems that has been the case since she created the first two lines in early April, and added two more lines around July 18 or 19: her first notification line (which is interesting because it includes the NS candidate condition) has been working reliably, while none of the other lines has produced any notification through Twilio. Similarly, for Giacomo Ciani and Andrea Miani, their first notification line has been working while their second has not. Marco Bazzan's SECOND line has been working while his first has not.
* At least one user -- Daniel Sigg -- has two notification lines and neither is working. Daniel has in the past received phone calls through Twilio (on July 1 and 6), but he updated his alerts configuration and has not gotten any notifications since July 6.
* Giacomo Ciani, whom I mentioned above, added another line to his configuration today with voice call notifications, and it worked.
So my best picture of this is that some notification lines work and others don't. I can't tell what determines which ones work and which ones don't. It does seem to be the case that notification lines established a long time ago, or first in a person's list, are more likely to work; but that does not seem to be universal. And anyway, it's pretty clear to me that this is a GraceDB bookkeeping issue of some sort, not a problem with Twilio or individual users' phones or cell providers.
Oh, and for people who are not getting notifications according to their configuration, when they use the Test buttons to send test notifications, those work. (In most cases... Deep does not seem to be able to receive calls or text messages from Twilio on his phone.)
(Milestone: O4 Debugging and Improvements)

# Issue #223: Twilio SMS improvements
(Alexander Pace, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/223)

This ticket is intended as an information dump to make an informed decision about Twilio messaging from GraceDB in O4. Unfortunately, the SMS logs in Twilio's console only go back one year, so the fine-grained messaging logs for O3 are gone, but the billing rates are available. GraceDB writes a log message with a "`Texting...`" keyword, and those logs are gzipped and archived from O3. So it would be possible, if need be, to scrape a month's worth of logs to get the absolute number of messages sent and then extrapolate that to O4. But, you know, effort.
## Number of users and types of alerts
This is the result of digging around in the database to get some idea of the number of users and alert types that are live in GraceDB. Note that in the following nomenclature, a "Notification" object contains a unique set of parameters that dictates when and how an alert goes out to a user. Using @chad-hanna as an example (phone number redacted):
```
> chad=User.objects.get(username='chad.hanna@LIGO.org')
> Notification.objects.filter(user=chad)
<QuerySet [<Notification: chad.hanna@LIGO.ORG: Superevent created or updated & FAR < 1.92901235e-07 -> Call and text +1814XXXXXXX>, <Notification: chad.hanna@LIGO.ORG: Event created or updated & group=CBC & pipeline=gstlal & search=AllSky & FAR < 1.92901235e-07 -> Call and text +1814XXXXXXX>, <Notification: chad.hanna@LIGO.ORG: Event labeled with EM_COINC & any group & any pipeline & any search -> Call and text +1814XXXXXXX>]>
```
So in this example, he signed up for calls/texts for 1) low-far superevent creation, 2) low-far gstlal event uploads, and 3) event uploads with an EM_COINC label applied. In this scenario, there is one user, but three distinct "Notifications". That being said:
* Number of distinct notifications: 649
* Superevent notifications: 586
* Event notifications: 63
* Number of distinct users: 364
* Distinct users signed up for Event notifications: 60 (includes emails)
* Distinct users with Event call and/or sms notifications: 11
* Number of distinct Event call and/or sms notifications: 16
* Event notifications/user: 16/11=**1.5**
* Distinct users signed up for Superevent notifications: 345 (includes emails)
* Distinct users with Superevent call and/or sms notifications: 241
* Number of distinct Superevent call and/or sms notifications: 413
* Superevent notifications/user: 413/241=**1.7**
* Number of unique SMS notifications: 304
* Number of unique phone call notifications: 30
* Number of call+text notifications: 95
I'll dump more stats in here later on, but that should be a start. I think a first cut should be to take the unique number of users for superevent/event notifications and scale it up by how much the collaboration has grown from O3 to O4. This assumes a constant percentage of collaboration members signed up for texts and calls. Then scale that by the expected increase in superevent/event rates in O4, and use the calls/SMS per superevent/event per user to find an expected messaging rate. For now that is left as an exercise for the reader.
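As a rough sketch of that exercise (every factor below is a placeholder, not a measured value):

```python
# Back-of-the-envelope extrapolation: O3 users scaled by collaboration
# growth, times notifications per user, times the expected O4 alert rate.
# All inputs are placeholders to be filled in with real numbers.
def expected_sms_per_week(users_o3, notifications_per_user,
                          collab_growth, alerts_per_week_o4):
    return (users_o3 * collab_growth
            * notifications_per_user * alerts_per_week_o4)
```

With the numbers above (241 superevent call/SMS users, 1.7 notifications per user) and placeholder factors of 1.2 for collaboration growth and 2 alerting superevents per week, this gives roughly 980 messages per week.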
## Batch processing of calls and SMS alerts
Getting this operation down to a single database query (which is totally doable), and then a single API call to twilio would save **loads** of time in generating alerts. Via the [documentation](https://support.twilio.com/hc/en-us/articles/223181548-Can-I-set-up-one-API-call-to-send-messages-to-a-list-of-people-):
> Each new SMS message from Twilio must be sent with a separate REST API request. To initiate messages to a list of recipients, you must make a request for each number to which you would like to send a message. The best way to do this is to build an array of the recipients and iterate through each phone number.
Weak.
## Prioritizing recipients
This is also a way to get messages out to the people who need them faster. At the end of O3 it was decided to pare down the number of G-Event recipients to a fixed list of pipeline and followup and control room recipients. I propose to formalize that list, and then have it as a community entity in LIGO's LDAP (similar to how `Communities:LVC:GraceDB:GraceDBAdvocates` exists for EM advocates). Then GraceDB will assign a priority to each group (so pipeline experts=2, em advocates=1, everyone else=0), then sort the list of SMS recipients via this priority and then start the messaging loop.
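A sketch of the proposed prioritization, with the LDAP group lookup stubbed out as a dict (the group names and priorities follow the suggestion above; everything else is illustrative):

```python
# Illustrative priorities: pipeline experts first, EM advocates next,
# everyone else last. Python's sort is stable, so equal-priority
# recipients keep their original order.
GROUP_PRIORITY = {"pipeline_experts": 2, "em_advocates": 1}

def sort_recipients(recipients):
    """Sort (name, group) pairs highest-priority first; unknown groups
    default to priority 0. The messaging loop would then walk this list."""
    return sorted(recipients,
                  key=lambda r: GROUP_PRIORITY.get(r[1], 0),
                  reverse=True)
```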
I'm open to other ideas, but this is a start of a strategy for defining Twilio account messaging rates and prices.
(Milestone: Critical Path O4 Development; assignee: Alexander Pace)

# Issue #222: Asimov, Lensing support during O4 MDC
(Alexander Pace, 2023-02-08; https://git.ligo.org/computing/gracedb/server/-/issues/222)

This ticket is to track changes and requests to support the Lensing Group during the ongoing O4 MDC.
@surabhi.sachdev @alvin.li
(Milestone: O4 CBC Improvements)

# Issue #220: "terminating connection due to administrator command"
(Alexander Pace, 2022-04-04; https://git.ligo.org/computing/gracedb/server/-/issues/220)

Over the weekend (April 3-4), I woke up to about ~30 emails from `gracedb-test`/`playground`, and then the following day from `gracedb` (production), with the following error message:
```
Internal Server Error: /some/api/path
OperationalError at /some/api/path
terminating connection due to administrator command
SSL connection has been closed unexpectedly
```
Okay? I had never seen that before. So it appears to be a thing with postgres/RDS. ex: https://old.reddit.com/r/aws/comments/b5l3ha/rds_giving_terminating_connection_due_to/
I went into the management console and saw messages like this for "recent events":
![Screen_Shot_2022-04-04_at_10.28.21_AM](/uploads/6eafa63e416711b680f8db37bf87f369/Screen_Shot_2022-04-04_at_10.28.21_AM.png)
So the best I can gather from that and from the maintenance settings is that RDS triggered a minor version update and shutdown and restarted the databases automatically. Client connections were closed, and that's what caused the errors. So as a first cut, I disabled automatic updates, so that's something to keep an eye on for maintenance windows.
I also ducked into sentry and saw that gwcelery recorded the 500 HTTP error messages, so the clients saw it as well. Hopefully this doesn't pop up again, but I'm recording it here just in case.
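On the `SSL connection has been closed unexpectedly` line in the error above: if SSL between the app nodes and RDS really is unnecessary, disabling it would just be a Django settings change. A sketch, with illustrative names rather than the actual GraceDB configuration:

```python
# Illustrative Django DATABASES entry (name and host are placeholders).
# libpq connection options such as sslmode pass through OPTIONS;
# "disable" skips SSL entirely, "prefer"/"require" are the safer modes.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "gracedb",
        "HOST": "gracedb-rds.internal.example",  # placeholder RDS endpoint
        "OPTIONS": {"sslmode": "disable"},
    }
}
```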
Also, note the line `SSL connection has been closed unexpectedly`: for some reason postgres asks for an SSL connection by default. All the communication between the database and the EC2 nodes is behind the cloud and constrained to security groups, so I think we could get away with disabling it and reducing the connection overhead: https://www.postgresql.org/docs/current/libpq-ssl.html

# Issue #219: Port labels from catalog-dev
(Alexander Pace, 2023-07-21; https://git.ligo.org/computing/gracedb/server/-/issues/219)

Please prepare a list of labels that are on `catalog-dev.ligo.org` that will need to be in place on GraceDB in order to transfer events and shut down `catalog-dev`. As a first cut, I came up with:
```
In [1]: from ligo.gracedb.rest import GraceDb
In [2]: cdev = GraceDb('https://catalog-dev.ligo.org/api/')
In [3]: gdb = GraceDb('https://gracedb.ligo.org/api')
In [4]: catalog_dev_labels = cdev.allowed_labels
In [5]: gracedb_labels = gdb.allowed_labels
In [6]: set(catalog_dev_labels) - set(gracedb_labels)
Out[6]:
{'CHUNK_1',
'CHUNK_10',
'CHUNK_11',
'CHUNK_12',
'CHUNK_13',
'CHUNK_14',
'CHUNK_15',
'CHUNK_16',
'CHUNK_17',
'CHUNK_18',
'CHUNK_19',
'CHUNK_2',
'CHUNK_20',
'CHUNK_21',
'CHUNK_22',
'CHUNK_23',
'CHUNK_24',
'CHUNK_25',
'CHUNK_26',
'CHUNK_27',
'CHUNK_28',
'CHUNK_29',
'CHUNK_3',
'CHUNK_30',
'CHUNK_31',
'CHUNK_32',
'CHUNK_33',
'CHUNK_34',
'CHUNK_35',
'CHUNK_36',
'CHUNK_37',
'CHUNK_38',
'CHUNK_39',
'CHUNK_4',
'CHUNK_40',
'CHUNK_5',
'CHUNK_6',
'CHUNK_7',
'CHUNK_8',
'CHUNK_9',
'CHUNK_UNKNOWN',
'DETCHAR_NO',
'DETCHAR_YES',
'DQ_NO',
'DQ_YES',
'FINAL',
'GDB_NO',
'GDB_YES',
'NONE',
'O3A_CAT_NO',
'O3A_CAT_YES',
'O3A_CBC_CATALOG',
'O3A_CBC_FINAL',
'O3A_CBC_SUBTHRESHOLD',
'O3A_CWB_FINAL',
'O3A_CWB_ONLY',
'O3A_EVENT_FOR_O3B',
'O3A_SSM',
'O3B_CBC_CATALOG',
'O3B_CBC_SUBTHRESHOLD',
'O3B_CWB_ONLY',
'O3B_SSM',
'PE_NO',
'PE_YES',
'PRELIM'}
```
I think some thought needs to be given in terms of which ones to retain and which ones aren't necessary. For instance at first glance `NONE`, `GDB_NO`, `GDB_YES` probably don't make sense in the context of using GraceDB as the final event repository.
Once I get the list, I can add them to gracedb's deployment for testing.
@gregory.ashton @surabhi.sachdev @rebecca.ewing
(Milestone: O4 CBC Improvements; assignee: Alexander Pace)

# Issue #218: Validate the CBC meta data
(Gregory Ashton, 2022-03-15; https://git.ligo.org/computing/gracedb/server/-/issues/218)

The uploaded meta data #217 can be validated by the [JSON schema](https://git.ligo.org/cbc/meta-data/-/blob/main/cbc-meta-data.schema). Note this is not yet complete; a v1 draft is in preparation.

# Issue #213: Superevent "flattening"
(Alexander Pace, 2022-03-18; https://git.ligo.org/computing/gracedb/server/-/issues/213)

I was really bad about documenting commits on this branch: https://git.ligo.org/computing/gracedb/server/-/tree/new_event_superevent_types
But basically it entailed "flattening" the table structure for superevents, so that the `superevent_id` is no longer a python property constructed from the date id and such. This will go a LONG way toward improving page load times and superevent queries. It also cuts down on a bunch of regexes throughout the code that decomposed the superevent_id back into date ids.
I used the [django-computedfields](https://pypi.org/project/django-computedfields/) package, which worked pretty well. But maybe there's a more postgres-y way to do this.
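The property-vs-stored-column tradeoff can be sketched in plain Python (this is illustrative only, not GraceDB's actual code; it assumes the familiar `S` + YYMMDD + suffix shape of superevent IDs such as `S190425z`):

```python
# Illustrative sketch of what "flattening" removes: composing superevent_id
# on the fly from its parts, and the regex decomposition needed to go back.
# With a stored column, both steps disappear from the hot path.
import datetime
import re

def build_superevent_id(date: datetime.date, suffix: str, prefix: str = "S") -> str:
    """Compose an ID like 'S190425z' from its parts (the old property-style approach)."""
    return f"{prefix}{date:%y%m%d}{suffix}"

# The kind of regex decomposition scattered through the code that a stored
# column makes unnecessary.
_SID_RE = re.compile(r"(?P<prefix>[A-Z]+)(?P<date>\d{6})(?P<suffix>[a-z]+)")

def decompose_superevent_id(superevent_id: str) -> dict:
    match = _SID_RE.fullmatch(superevent_id)
    if match is None:
        raise ValueError(f"not a superevent id: {superevent_id!r}")
    return match.groupdict()
```

Storing the composed string as a real column (whether maintained by django-computedfields or by a postgres generated column) lets queries filter on it directly instead of recomputing it per row.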
Also, since events' GIDs are constructed from one letter already in the database along with a row id, I think we basically get graceids in the database for free as well.

(O4 CBC Improvements)

---

https://git.ligo.org/computing/gracedb/server/-/issues/212
Metadata for triggers and candidate events (2022-03-22, Alexander Pace)

The [charge](https://dcc.ligo.org/LIGO-T2100502) for O4 data product management states:
> Metadata: We define metadata as a set of lightweight data products or links to data products
> associated with a given trigger. For example, lightweight data products such as the FAR and SNR or
> paths/links to parameter estimation posteriors.
I think this can be accomplished with a new table that's linked with a 1:1 foreign key to a g-event. Also from the charge, a proposed format for the metadata is below. It's inserted as a screenshot since copy/paste from the pdf totally gnarled up the formatting.
![Screen_Shot_2022-02-28_at_9.29.27_PM](/uploads/92a3e044fab88bcb533741c5f9604d15/Screen_Shot_2022-02-28_at_9.29.27_PM.png)
I think a first cut would involve taking advantage of postgres' [json datatype](https://www.postgresql.org/docs/9.4/datatype-json.html).
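As an illustration of the JSON-column approach, here is a minimal sketch using sqlite's `json_extract` to stand in for postgres's `jsonb` operators (`data->>'snr'` and friends); the table and field names are hypothetical, not GraceDB's schema:

```python
import json
import sqlite3

# Hypothetical one-row-per-event metadata table; in postgres the data
# column would be jsonb rather than TEXT.
con = sqlite3.connect(":memory:")
con.execute(
    # One metadata row per g-event (a 1:1 foreign key in the real schema).
    "CREATE TABLE event_metadata (event_id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
)
con.execute(
    "INSERT INTO event_metadata VALUES (?, ?)",
    (1, json.dumps({
        "far": 1e-8,
        "snr": 12.3,
        "pe_posteriors": "https://example.org/path/to/posteriors.h5",  # illustrative link
    })),
)

# Query inside the JSON document without a dedicated column per field;
# the postgres equivalent would be data->>'snr' or jsonb_path_query.
snr = con.execute(
    "SELECT json_extract(data, '$.snr') FROM event_metadata WHERE event_id = ?",
    (1,),
).fetchone()[0]
```

The appeal of this design is that new metadata fields need no schema migration; the cost is that querying them requires JSON operators (and, for performance, expression indexes) rather than ordinary column filters.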
Querying for metadata would have to be implemented too.

(O4 CBC Improvements)

---

https://git.ligo.org/computing/gracedb/server/-/issues/211
Add SciTokens support (2023-07-17, Duncan Macleod)

The GraceDB server needs to accept a SciToken as an authorisation option.

(O4 Advance, Duncan Meacher)

---

https://git.ligo.org/computing/gracedb/server/-/issues/210
Reducing queries by packing igwn-alert (2022-03-18, Alexander Pace)

As discussed on the low-latency call, December 8 2021. The purpose of this ticket is to solicit feedback and suggestions for what information to include in `igwn-alert` packets, with the goal of reducing costly queries to GraceDB.
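The idea can be sketched as embedding the commonly queried fields in the alert payload itself, so listeners read them locally instead of issuing a follow-up REST query per alert (field names here are illustrative, not the actual igwn-alert schema):

```python
import json

# Hypothetical packed alert packet; "data" carries an inline copy of the
# fields consumers would otherwise fetch from GraceDB after each alert.
packet = {
    "uid": "G123456",        # illustrative event id
    "alert_type": "new",
    "data": {"far": 1e-8, "instruments": "H1,L1", "labels": []},
}

payload = json.dumps(packet)   # serialized once, server-side
decoded = json.loads(payload)  # consumer reads fields without a REST call
far = decoded["data"]["far"]
```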
Relevant past commits:
* https://git.ligo.org/lscsoft/gracedb/-/commit/e52b0c2ea248efbbb221ed51c58d55a3e5c4a3de
* https://git.ligo.org/lscsoft/gracedb/-/commit/2402e914dd6afd28035f6d06086bd6519f8018a9
Current MRs:
* https://git.ligo.org/lscsoft/gracedb/-/merge_requests/52/

(O4 Infrastructure Improvements, Alexander Pace)

---

https://git.ligo.org/computing/gracedb/server/-/issues/209
Re-enable X-Pipeline (2023-02-22, Alexander Pace)

**Background:** `X-Pipeline` or just `X` has been floating around in GraceDB since way before I've been on this project, but has [never uploaded an event](https://gracedb.ligo.org/search/?query=X&query_type=E&results_format=S). I was attempting to clean up event logic back in mid 2020, and so I added `X`, `Q`, and `Omega` pipelines to a list of pipelines that were being phased out, and [returned a warning message](https://git.ligo.org/lscsoft/gracedb/-/blob/ca14eb8e8eb0c4111ecac38d6e879472fea1b111/gracedb/api/v1/events/views.py#L524-L525) if a user attempted to upload an event to that pipeline. Not that it would have worked anyway, because the [logic](https://git.ligo.org/lscsoft/gracedb/-/blob/master/gracedb/events/view_logic.py#L66-L67) to ingest X-pipeline event files had never actually been implemented and likely would have returned an error.
I received a request over [mattermost](https://chat.ligo.org/ligo/pl/t1zgpxrm8fdz8qk4xrk4jiueby) to revive the pipeline.
Before proceeding with this, I need from @amber-stuver:
1) An example event upload. I don't know the output file format (xml? json?) or which fields in the file should be stored in the database. I can look it over as a first step to compare to other event types that are in GraceDB, but I need the file first and foremost. It can be attached to this ticket.
2) What kind of search type is it? Right now, GraceDB ingests events from `CoincInspiral` searches, `GRB` searches, `Multiburst` searches, etc. If `X` fits into one of those categories, storing event data and constructing the view and `REST` response is simpler. But this will make more sense when I get the example upload.
3) Who is going to be uploading and populating the pipeline? If it's individual users, I need just your `@LIGO.org` email address. Or, if there is a robot account that is uploading, please apply for a cert from https://robots.ligo.org/ and then I'll add it as an uploader.
Once I get the example upload and the other information I need, the steps are:
1) Remove `X` from the deprecated pipelines list.
2) Add logic to `view_logic.py` to read in `X` events.
3) Determine views for `X` events, add it to settings.
4) Add uploader permissions for the pipeline
5) Test event uploads, ingestion into the database, and webpage views.
6) Make appropriate LVAlert topics for `X-pipeline`
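The first two steps might look roughly like this (a sketch only; the names `DEPRECATED_PIPELINES`, `PIPELINE_INGESTERS`, and `ingest_x_event` are hypothetical, not GraceDB's actual settings or view logic):

```python
# Hypothetical sketch of steps 1-2: stop treating 'X' as deprecated and
# register a file-ingestion handler for it in the upload dispatch.
DEPRECATED_PIPELINES = {"Q", "Omega"}  # 'X' removed (step 1)

def ingest_x_event(event, datafile):
    """Parse an uploaded X-pipeline file and populate event fields (step 2).

    The file format is still unknown (see the request for an example upload
    above), so this only shows where the parsing would hook in.
    """
    raise NotImplementedError("waiting on an example X-pipeline upload")

PIPELINE_INGESTERS = {"X": ingest_x_event}

def handle_upload(pipeline, event, datafile):
    # Deprecated pipelines keep returning a warning/error to the uploader.
    if pipeline in DEPRECATED_PIPELINES:
        raise ValueError(f"pipeline {pipeline!r} has been deprecated")
    return PIPELINE_INGESTERS[pipeline](event, datafile)
```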
Then I'll have to push a server code change and deploy it.
Drop any questions or sample files you have here and I'll get back to you.

(Alexander Pace)