GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues
2023-02-08T19:49:48Z

https://git.ligo.org/computing/gracedb/server/-/issues/242
Revamp HardwareInjection event uploads.
2023-02-08T19:49:48Z
Alexander Pace

This is to track work to bring back HardwareInjection events.
TODO:
- [x] provide sample json (?) upload
- [x] make data model
- [x] validate uploads
- [x] create page view
- [ ] determine what scenarios and alert contents should be
- [ ] ????

Critical Path O4 Development

https://git.ligo.org/computing/gracedb/server/-/issues/241
Upgrade pyparsing to 3.0.0 or above
2023-02-08T15:59:30Z
Daniel Wysocki

GraceDB is stuck on `pyparsing==2.3.0` due to an API change. There are some new features in `3.0.0` which would be useful, especially for visualization purposes, so we should upgrade.

Daniel Wysocki

https://git.ligo.org/computing/gracedb/server/-/issues/240
Generate railroad diagrams for query parsing language
2023-02-08T16:51:58Z
Daniel Wysocki

`pyparsing>=3.0.0` introduces the ability to generate ["railroad diagrams"](https://pyparsing-docs.readthedocs.io/en/latest/whats_new_in_3_0_0.html#id4), which are a concise way of visualizing a language. These would be very nice to have for our documentation, but more importantly would be helpful for making improvements to the query language without breaking anything.

O4 Debugging and Improvements
Daniel Wysocki

https://git.ligo.org/computing/gracedb/server/-/issues/239
Remove query parsers' dependence on database state
2023-02-08T16:08:49Z
Daniel Wysocki

There are several database querying mini-languages written using the `pyparsing` module.
The very bad decision was made to have the languages depend on the state of the database, by having things like labels and pipeline names be reserved words. This means any addition to the set of these values will require recompiling the parser, so as a result it's recompiled for _every query_. Speed considerations aside, this adds some serious complexity to the parsers, and means it's possible to break the parser by adding a badly named or non-unique value into one of the tables.
A much better approach would be to add a generic "identifier" token to the language. Then at code-generation time it would be resolved based on the database state.
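A minimal sketch of that two-phase idea, using only the standard library rather than `pyparsing`. The toy grammar (identifiers plus `&`, `|`, `~`), the function names, and the `UnknownIdentifier` exception are all assumptions made for illustration; none of this is the existing GraceDB parser.

```python
import re

# Phase 1: the grammar knows only generic token shapes, never database values.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<op>[&|~]))")


class UnknownIdentifier(LookupError):
    """Raised at resolution time, analogous to Python's NameError."""


def tokenize(query):
    """Split a query into ('ident', name) and ('op', symbol) tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


# Phase 2: identifiers are checked against the current database state,
# passed in here as a plain set of known label names.
def resolve(tokens, known_labels):
    """Return the tokens unchanged, or raise if an identifier is unknown."""
    for kind, value in tokens:
        if kind == "ident" and value not in known_labels:
            raise UnknownIdentifier(f"name {value!r} is not defined")
    return tokens
```

With this split, adding a new label only changes the set passed to `resolve`; the tokenizer never needs to be rebuilt.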
To use Python as an analogy, consider what happens if one tries accessing an undefined variable:
```python
>>> foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
```
Note that this isn't a `SyntaxError`: Python knows `foo` is a valid identifier, but it is unbound. The parser read my statement without issue, but the code generation phase correctly identified the missing name. This should be how our query language works as well.

O4 Debugging and Improvements
Daniel Wysocki

https://git.ligo.org/computing/gracedb/server/-/issues/238
race condition when creating new symlinks
2022-10-06T13:51:14Z
Alexander Pace

There have been a couple of scenarios when RAVEN (@brandon.piotrzkowski) has tried to upload multiple files with the same filename (e.g., `coincidence_far.json`) at the same time. The code made it past the [part](https://git.ligo.org/computing/gracedb/server/-/blob/master/gracedb/core/vfile.py#L185) where it removes the old symlink, but raised an `OSError` (traceback at the end of this issue) when trying to [create](https://git.ligo.org/computing/gracedb/server/-/blob/master/gracedb/core/vfile.py#L191) a new one, because another process beat it to it.
This commit: https://git.ligo.org/computing/gracedb/server/-/commit/3daa20fd8e5736d7176cffee39235a507fd88423 fixes it by just catching the exception and moving on. The versioned file still gets created in the logs and is still available, but there's no real scenario in which we should implement the logic to prioritize one symlink over the other in this case.
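A minimal sketch of the "catch the exception and move on" pattern described above. The function name and signature are hypothetical; this is not the code in `vfile.py` or in the linked commit, only an illustration of the idea.

```python
import os


def repoint_symlink(target, link_path):
    """Point link_path at target, tolerating a concurrent writer."""
    try:
        os.unlink(link_path)
    except FileNotFoundError:
        # No old link, or another process already removed it.
        pass
    try:
        os.symlink(target, link_path)
    except FileExistsError:
        # Another process recreated the link between our unlink and symlink.
        # Its link is left in place; the versioned file itself is unaffected.
        pass
```

Whichever process wins the race determines where the symlink points, which matches the decision above not to prioritize one symlink over the other.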
```
...
...
  File "/app/gracedb_project/gracedb/superevents/utils.py", line 248, in create_log
    raise e
  File "/app/gracedb_project/gracedb/superevents/utils.py", line 244, in create_log
    event_or_superevent.datadir, data_file)
  File "/app/gracedb_project/gracedb/core/vfile.py", line 271, in create_versioned_file
    fdest.close()
  File "/app/gracedb_project/gracedb/core/vfile.py", line 224, in close
    self._repoint_symlink()
  File "/app/gracedb_project/gracedb/core/vfile.py", line 191, in _repoint_symlink
    os.symlink(name, self.fullname)
Exception Type: FileExistsError at /api/superevents/MS221004az/logs/
Exception Value: [Errno 17] File exists: 'coincidence_far.json,7' -> '/app/gracedb_project/../db_data/5f/8/0bea0146287984087bb980b9a0d5840958463/coincidence_far.json'
Request information:
USER: emfollow
GET: No GET data
POST:
tagname = 'ext_coinc'
comment = "RAVEN: Computed coincident FAR(s) in Hz with external trigger <a href='https://gracedb-test.ligo.org/events/M315290'>M315290</a>"
```

https://git.ligo.org/computing/gracedb/server/-/issues/237
More flexible queries for text/email alerts
2023-02-08T16:56:01Z
Rebecca Ewing

## Description of feature request
For text/email alerts there are only a few options available, mostly just to choose a FAR threshold and set of labels. It would be useful if we could filter by additional parameters.
In general, if it's possible to support arbitrary queries for the alert rules that would be great.
## Use cases
Getting alerted for public events while ignoring events from injection channels, so we don't get flooded with unnecessary alerts.
A query for this would be like `si.channel != "GDS-CALIB_STRAIN_INJ1_O3Replay" & si.channel != "Hrec_hoft_16384Hz_INJ1_O3Replay"` (I'm not sure exactly what the right syntax is.)
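As a rough illustration of the rule being asked for, here is the same filter written as plain Python rather than GraceDB query syntax. The function name, the list-of-channels input, and the reading that an event is dropped when any of its channels is an injection channel are all assumptions.

```python
# Channel names taken from the example query above.
INJECTION_CHANNELS = frozenset({
    "GDS-CALIB_STRAIN_INJ1_O3Replay",
    "Hrec_hoft_16384Hz_INJ1_O3Replay",
})


def should_alert(event_channels, excluded=INJECTION_CHANNELS):
    """Return True when none of the event's channels is an excluded one."""
    return excluded.isdisjoint(event_channels)
```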
## Benefits
Adding this feature would make the alerts more general / flexible which should be a good thing.
## Drawbacks
As long as the old method stays in place and people can just optionally specify a more complicated/specific query I can't think of any drawbacks.
## Suggested solutions
O4 Debugging and Improvements

https://git.ligo.org/computing/gracedb/server/-/issues/236
External events appearing in superevent neighbors `gw_events`
2022-09-22T15:23:29Z
Brandon Piotrzkowski

First noted by Deep Chatterjee, within the `superevent_neighbours` field external events can appear as gw events. This can potentially disrupt the logic used in the superevent manager.
Summary of https://gracedb-test.ligo.org/api/superevents/MS220920p/, where `M304272` appears as an external event in the superevent (and is not included in `gw_events`) but appears as a `gw_event` under `superevent_neighbours`:
```
{
"superevent_id": "MS220920p",
"gw_events": [
"M304283",
"M304282",
"M304281",
"M304280",
"M304279",
"M304278",
"M304277",
"M304276",
"M304275",
"M304274",
"M304273",
"M304271",
"M304270",
"M304269",
"M304268",
"M304267"
],
"em_events": [
"M304272"
],
"preferred_event_data": {
"superevent": "MS220920p",
"superevent_neighbours": {
"MS220920p": {
"superevent_id": "MS220920p",
"gw_events": [
"M304283",
"M304282",
"M304281",
"M304280",
"M304279",
"M304278",
"M304277",
"M304276",
"M304275",
"M304274",
"M304273",
"M304272",
"M304271",
"M304270",
"M304269",
"M304268",
"M304267"
]
}
}
}
```

https://git.ligo.org/computing/gracedb/server/-/issues/235
Occasional 500 error when reading files
2024-03-19T00:41:56Z
Alexander Pace

There is an occasional 500 error returned by the cloud instances when attempting to read files. It occurs infrequently and randomly enough that I'm not able to reproduce it, but it does hit gwcelery's workflow on occasion (~2 times per week). An example error traceback looks like:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.7/dist-packages/retry/api.py", line 74, in retry_decorator
    logger)
  File "/usr/local/lib/python3.7/dist-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/app/gracedb_project/gracedb/api/v1/superevents/views.py", line 321, in list
    file_list = get_file_list(viewable_logs, parent_superevent.datadir)
  File "/app/gracedb_project/gracedb/core/file_utils.py", line 32, in get_file_list
    pointed_to = os.path.basename(os.path.realpath(full_path))
  File "/usr/lib/python3.7/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/usr/lib/python3.7/posixpath.py", line 443, in _joinrealpath
    path, ok = _joinrealpath(path, os.readlink(newpath), seen)
Exception Type: OSError at /api/superevents/MS220919n/files/
Exception Value: [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png'
```
It appears to be triggering the [retrying](https://git.ligo.org/computing/gracedb/server/-/commit/71daf97148ef21e858039343ba4dc6c60eb6f208) hook that I put in, but the hook doesn't seem to help: it retries four times to get the file, sleeping one second between each attempt, and every attempt fails:
```
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:13 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:13.591 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:14 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:14.608 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:15 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:15.622 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
gracedb-swarm-test-us-west-2a-docker-mgr-01.log:Sep 19 13:38:16 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.2.o400wqmzk6yutaoaz1cd8mjyt: DJANGO | 2022-09-19 13:38:16.636 | e459e5951d2a | 10.0.2.51 | api.v1.superevents.views | WARNING | api.py, line 40 | [Errno 5] Input/output error: '/app/db_data/9a/6/8ac9f1720d59940bed2d8e384d57c98049c82/bayestar.multiorder.coherence.png', retrying in 1.0 seconds...
```
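For reference, the behaviour visible in these log lines (an initial attempt, then four retries one second apart, then the error propagates) can be sketched with the standard library alone. This is not the `retry` package code the server actually uses; the function name and parameters are assumptions for illustration.

```python
import time


def call_with_retries(func, retries=4, delay=1.0, sleep=time.sleep):
    """Call func(); on OSError retry up to `retries` more times, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return func()
        except OSError:
            if attempt == retries:
                # Out of retries: let the caller see the original error.
                raise
            sleep(delay)


# Hypothetical usage around the failing call from the traceback:
# pointed_to = call_with_retries(lambda: os.path.realpath(full_path))
```

If the underlying I/O error persists for longer than `retries * delay` seconds, as it does in the logs above, the request still fails; the retries only add latency.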
`Traefik` is showing that the request is returning a 500 error and is taking almost five seconds because of the retries:
```
Sep 19 13:38:18 gracedb-swarm-test-us-west-2a-docker-mgr-01 gracedb_docker_webgateway_webgateway.1.l4j2u8hibrrtgelvsfhiubxfh: 131.215.113.198 - - [19/Sep/2022:13:38:13 +0000] "GET /api/superevents/MS220919n/files/ HTTP/1.1" 500 10472 "-" "-" 174967 "gracedb@docker" "http://10.0.2.51:80" 4815ms
```
For reference, the NFS mounts are mounted with: `nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev`

O4 Debugging and Improvements

https://git.ligo.org/computing/gracedb/server/-/issues/234
Add additional RAVEN-related labels
2022-10-06T14:08:55Z
Brandon Piotrzkowski

There is a need for additional labels to indicate states in GraceDB and resolve race conditions:
- `COMBINED_SKYMAP_READY`: Combined skymap is available.
- `SOG_READY`: A coincidence should trigger a speed of gravity measurement.
The code changes that require these labels are https://git.ligo.org/emfollow/gwcelery/-/merge_requests/864 and https://git.ligo.org/emfollow/gwcelery/-/merge_requests/890 respectively.

https://git.ligo.org/computing/gracedb/server/-/issues/233
AWS resources for non-production GraceDB
2023-02-08T19:04:20Z
Erik Katsavounidis

Given the heavy development currently in progress for the low latency alerts pipeline and the use of non-production GraceDB tiers, we will need to bring such tiers up to the same level of hardware resources under AWS as the production system.

O4 Debugging and Improvements

https://git.ligo.org/computing/gracedb/server/-/issues/232
Request to add external event info to igwn-alert
2023-02-13T16:25:45Z
Cody Messick

Currently both emfollow/gwcelery!857 and emfollow/gwcelery!852 download external events from gracedb to populate public alerts. Could the external event info just be included in the IGWN-Alert?
The only catch that I see is that we need to be able to tell which event to use. @brandon.piotrzkowski said this information should be in the `em_type` field, so all we'd need is some way to identify the event that would be mentioned in that field.

Critical Path O4 Development

https://git.ligo.org/computing/gracedb/server/-/issues/231
Request for information needed to compute fluence in IGWN-Alerts
2023-06-13T16:36:33Z
Cody Messick

The fluence for burst alerts is currently computed using information in the event file here: https://git.ligo.org/computing/gracedb/server/-/blob/master/gracedb/annotations/voevent_utils.py#L574-643
Would it be possible to either compute the fluence on the server side and include it in burst igwn-alerts or to include the information needed to compute fluence? I'm not sure if there's a reason to go with one over the other.
This is needed for emfollow/gwcelery!857.

Critical Path O4 Development

https://git.ligo.org/computing/gracedb/server/-/issues/230
Add missing lines to coverage report
2022-08-12T00:31:07Z
Daniel Wysocki

## Description of feature request
`pytest-cov` can output a column listing the specific lines which were not covered. We're currently (implicitly) using the default report, `--cov-report term`, but we simply have to swap this for `--cov-report term-missing` to add the extra info.
https://pytest-cov.readthedocs.io/en/latest/reporting.html
## Use cases
This helps target testing efforts on lines that have not been covered.
## Benefits
Better targeting of tests.
## Drawbacks
The CI/CD job logs will be a bit wider. If this is a problem we could instead keep the current simplified terminal output, and add `--cov-report xml` or `--cov-report html` to add the more detailed info in a job artifact.
## Suggested solutions
Add `--cov-report term-missing` to `pytest`'s arguments in `.gitlab-ci.yml`. I'll have a merge request momentarily.

Daniel Wysocki

https://git.ligo.org/computing/gracedb/server/-/issues/229
Migrate from ConcurrentLogHandler to concurrent-log-handler
2022-08-11T23:24:35Z
Daniel Wysocki

`requirements.txt` lists `ConcurrentLogHandler==0.9.1`, a package which was [last updated in 2013](https://pypi.org/project/ConcurrentLogHandler/) and makes use of the `use_2to3` feature of `setuptools<58`. We will be stuck with older versions of `setuptools` until this dependency is replaced, which may eventually become a problem.
Fortunately, one of the two maintainers forked the project as [`concurrent-log-handler`](https://pypi.org/project/concurrent-log-handler/), and has updated it as recently as this year. Changing our requirement to `concurrent-log-handler==0.9.20` gets me past the build issue on newer `setuptools` versions. It's also necessary to change the import from `cloghandler` to `concurrent_log_handler`. Beyond that I have not done further testing, so it may not be a drop-in replacement.

Backlog
Daniel Wysocki

https://git.ligo.org/computing/gracedb/server/-/issues/228
Set up a instance of gracedb for development and testing for Daniel Wysocki
2022-10-12T21:56:16Z
Patrick Brady

Set up an instance of gracedb for development and testing for Daniel Wysocki. Work with Duncan Meacher to do this. Instructions are available at https://git.ligo.org/computing/gracedb/server/-/wikis/New-instance

Duncan Meacher

https://git.ligo.org/computing/gracedb/server/-/issues/227
Add superevent labels to under the superevent_neighbours data model
2023-02-08T20:03:52Z
Deep Chatterjee
deep.chatterjee@ligo.org

## Description of feature request
The `superevent_neighbours` field in the event data model contains the superevent information keyed on `superevent_id`. However, that information contains only the following keys:
```
[
"far",
"gw_events",
"preferred_event",
"preferred_event_data",
"superevent_id",
"t_0",
"t_end",
"t_start"
]
```
It would be helpful to have the superevent labels since they convey the state of the superevent.
## Use cases
This helps the superevent manager know the state of the superevent without making a GET call. For example, in the MR for the superevent manager in gwcelery: https://git.ligo.org/emfollow/gwcelery/-/merge_requests/873/diffs#5f270f21c6a12c7b5c86ebe43ad932e527e39bd0_504_508
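A sketch of how a consumer could use the requested field. The `labels` key does not exist in `superevent_neighbours` today; its name and list-of-strings shape are assumptions about what the feature might look like, and the helper name is hypothetical.

```python
def neighbour_has_label(superevent_neighbours, superevent_id, label):
    """Check a neighbour's labels from the alert payload, without a GET call."""
    neighbour = superevent_neighbours.get(superevent_id, {})
    # "labels" is the proposed new key; treat it as empty when absent.
    return label in neighbour.get("labels", [])
```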
## Benefits
This would help remove any GET calls to gracedb in the superevent manager.
## Drawbacks
This change is backward compatible, so no breaking change.
## Suggested solutions
Change in the data model for the `superevent_neighbours`

Critical Path O4 Development

https://git.ligo.org/computing/gracedb/server/-/issues/226
Request for voevent IVORNs in superevent dictionary
2023-07-05T19:49:00Z
Cody Messick

Would it be possible to include IVORNs for all VOEvents on a given superevent in the superevent dictionary? Doing so would allow the GWCelery team to populate two VOEvent fields without additional gracedb queries, specifically the citations sections and the `Pkg_Ser_Num` field.
Do non-LVK generated VOEvents ever end up in gracedb (e.g. from an external observation that is coincident with the GW)? I ask because my current mental model for determining `Pkg_Ser_Num` is just to count the IVORNs, i.e. if there are no IVORNs we assume the VOEvent we're generating is the first, if there's one IVORN we assume it's the second, etc. If VOEvents could show up from other events, that logic might need some additional checks.
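The counting logic described above amounts to the following sketch. The function name is hypothetical, and it assumes every IVORN in the list belongs to a VOEvent generated for this superevent, which is exactly the assumption being questioned.

```python
def next_pkg_ser_num(ivorns):
    """Serial number for the next VOEvent: one more than those already present."""
    return len(ivorns) + 1
```

If VOEvents from other sources can appear on a superevent, the list would need to be filtered before counting.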
Related to https://git.ligo.org/emfollow/gwcelery/-/merge_requests/857

Critical Path O4 Development

https://git.ligo.org/computing/gracedb/server/-/issues/225
Uploading sky-maps with the MLy pipeline
2023-02-08T19:49:34Z
kyle willetts

We (@mly) would like to be able to upload sky-maps to GraceDB in low-latency, ideally when publishing an event. Would it be possible to modify the upload file format we are currently using, to include a sky-map file?

O4 Debugging and Improvements

https://git.ligo.org/computing/gracedb/server/-/issues/224
Study of missing notifications during O3 (not attempted by Twilio)
2023-05-03T14:48:43Z
Peter Shawhan

During O3, some people reported not receiving notifications according to how they had configured GraceDB to send them notifications. A small fraction of people reported this, but it seemed to be consistent, i.e. not sporadic. I spent some time looking into it in the summer of 2019. For the record, here is a copy of some email messages I sent to a few people (principally Tanner) at that time.
## Email on July 25, 2019:
I have gotten input from a number of people and cross-checked with the Twilio logs. I have not figured out what is happening, but I have learned some things so I thought I would distill my notes and share them with you.
* The problems people are having are with Call and Text notifications, not Email notifications. Well, I haven't paid much attention to what people mentioned about email notifications, so there could be problems there too, but anyway the problems are not ALL with Email notifications. The people who have communicated with me are primarily relying on calls and/or texts.
* The Twilio logs corroborate what people have told me. e.g. if they said they haven't gotten text messages and phone calls recently, the Twilio logs agree: it really looks like Twilio was not asked to call/text them. (Well, occasionally a phone call will fail and that will be shown in the Twilio log, but that is not common. It's not the explanation for people's reports of missing notifications.)
* Lots of people ARE being notified of relevant events. For instance, when S190718y was marked by ADVREQ, the Twilio logs list 98 text messages and 41 voice calls to people to notify them. When S190720a was labeled with ADVREQ, I see 112 text messages; I didn't count the voice calls in that case. When S190724g was labeled with EM_COINC, I see 67 text messages delivered and about 42 voice calls, most of which went through and were answered.
* Some people are receiving notifications reliably, while others are not receiving any. Some people used to receive notifications but have not been receiving them recently. A few people have observed that it seems like people who set up notifications a long time ago are receiving them, while people who set up notifications recently tend not to be receiving them.
So I think there are two general types of possible reasons: either (1) some call/text requests passed to Twilio are getting lost before Twilio attempts them, or (2) there is something funny in the software that the gracedb server is using to construct the list of contacts to call or text, leading it to omit some. (e.g., before I started looking into this, I had a hypothesis that a database query was being used to get the list of contacts and there was a maximum number of records returned by the query. But having looked at the code, that doesn't fit.)
I know you mentioned that logging is not working reliably on AWS; that's too bad, because from gracedb/alerts/phone.py I can see that every call/text attempt passed to Twilio is being logged. If you have a log file that you believe to be complete for some time that includes an event, I could compare it against the Twilio logs (which I have now exported into spreadsheets, cumulative since January).
There is a note here that "You can send messages to Twilio at a rapid rate as long as the requests do not reach Twilio's API concurrency limit which is at 100", but I don't THINK we would be running into that since call/text requests are made serially and I'm positive that Twilio is designed to queue requests and feed them out at the appropriate rate.
In terms of the software in the gracedb server, I spent some time studying the code in the gracedb/alerts directory, but it is complex enough that I can't trace it by inspection to check how it filters to get a list of matching notifications and then looks up the contact information from the notifications.
If you want a case to try debugging, look at Giacomo Ciani. His notification settings are:
~~~
Notifications
Once per year | Superevent created or updated & FAR < 3e-08 -> Text +393476487948, Email giacomo.ciani@unipd.it
Advocate request | Superevent labeled with ADVREQ -> Email giacomo.ciani@unipd.it, Call and text +393476487948
~~~
For S190720a he received several "A superevent with GraceDB ID S190720a was updated" text messages and emails (nobody received a "superevent created" message for that because the initial preferred event had too high a FAR), but did not receive any ADVREQ label messages for either S190720a or S190718y, either by text or by voice call. So it seems that the first of his two notifications was acted on but the second was not.
## Email on July 26, 2019:
I've taken some more time to digest the input I've received (and collected notes in a Google doc: https://docs.google.com/document/d/1QzDS-JWxi2EAXgYP64sKJaNde29x7MN7v0JtA5IGxl8/edit). Here are my high-level findings:
* For some people, all of their notifications are working.
* Working or not working seems to be associated with specific "lines" in a user's notifications configuration. For some people, SOME of their notification lines are working while others are not. For instance, Jenne Driggers has four notification lines configured, but only the first of them is working (i.e. generating text messages logged by Twilio); the other three lines are having no effect. Looking back through the logs, it seems that has been the case since she created the first two lines in early April, and added two more lines around July 18 or 19: her first notification line (which is interesting because it includes the NS candidate condition) has been working reliably, while none of the other lines has produced any notification through Twilio. Similarly, for Giacomo Ciani and Andrea Miani, their first notification line has been working while their second has not. Marco Bazzan's SECOND line has been working while his first has not.
* At least one user -- Daniel Sigg -- has two notification lines and neither is working. Daniel has in the past received phone calls through Twilio (on July 1 and 6), but he updated his alerts configuration and has not gotten any notifications since July 6.
* Giacomo Ciani, whom I mentioned above, added another line to his configuration today with voice call notifications, and it worked.
So my best picture of this is that some notification lines work and others don't. I can't tell what determines which ones work and which ones don't. It does seem to be the case that notification lines established a long time ago, or first in a person's list, are more likely to work; but that does not seem to be universal. And anyway, it's pretty clear to me that this is a GraceDB bookkeeping issue of some sort, not a problem with Twilio or individual users' phones or cell providers.
Oh, and for people who are not getting notifications according to their configuration, when they use the Test buttons to send test notifications, those work. (In most cases... Deep does not seem to be able to receive calls or text messages from Twilio on his phone.)
O4 Debugging and Improvements
https://git.ligo.org/computing/gracedb/server/-/issues/223
Twilio SMS improvements
2023-02-08T19:44:38Z
Alexander Pace
This ticket is intended as an information dump to make an informed decision about Twilio messaging from GraceDB in O4. Unfortunately, the SMS logs in Twilio's console only go back one year. So the fine-grained messaging logs for O3 are gone, but the billing rates are available. GraceDB writes a log message with a "`Texting...`" keyword, and those logs are gzipped and archived from O3. So it would be possible, if need be, to scrape a month's worth of logs to get the absolute number of messages sent and then extrapolate that to O4. But, you know, effort.
## Number of users and types of alerts
This is the result of digging around in the database to get some idea of the number of users and alert types that are live in GraceDB. Note that in the following nomenclature, a "Notification" object contains a unique set of parameters that dictates when and how an alert goes out to a user. Using @chad-hanna as an example (phone number redacted):
```
> chad=User.objects.get(username='chad.hanna@LIGO.org')
> Notification.objects.filter(user=chad)
<QuerySet [<Notification: chad.hanna@LIGO.ORG: Superevent created or updated & FAR < 1.92901235e-07 -> Call and text +1814XXXXXXX>, <Notification: chad.hanna@LIGO.ORG: Event created or updated & group=CBC & pipeline=gstlal & search=AllSky & FAR < 1.92901235e-07 -> Call and text +1814XXXXXXX>, <Notification: chad.hanna@LIGO.ORG: Event labeled with EM_COINC & any group & any pipeline & any search -> Call and text +1814XXXXXXX>]>
```
So in this example, he signed up for calls/texts for 1) low-FAR superevent creation or updates, 2) low-FAR gstlal event uploads or updates, and 3) events labeled with EM_COINC. In this scenario, there is one user, but three distinct "Notifications". That being said:
* Number of distinct notifications: 649
  * Superevent notifications: 586
  * Event notifications: 63
* Number of distinct users: 364
* Distinct users signed up for Event notifications: 60 (includes emails)
* Distinct users with Event call and/or sms notifications: 11
* Number of distinct Event call and/or sms notifications: 16
* Event notifications/user: 16/11=**1.5**
* Distinct users signed up for Superevent notifications: 345 (includes emails)
* Distinct users with Superevent call and/or sms notifications: 241
* Number of distinct Superevent call and/or sms notifications: 413
* Superevent notifications/user: 413/241=**1.7**
* Number of unique SMS notifications: 304
* Number of unique phone call notifications: 30
* Number of call+text notifications: 95
I'll dump more stats in here later on, but that should be a start. I think a first cut should be to take the unique number of users for superevent/event notifications and scale that up by how much the collaboration has grown from O3 to O4. This assumes a constant percentage of collaboration members signed up for texts and calls. Then scale that by the expected increase in superevent/event rates in O4, and then use the call/sms per superevent/event per user to find an expected messaging rate. For now that is left as an exercise for the reader.
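One way to set up that exercise is sketched below. The user and notification counts are the ones listed above; the growth factors and the trigger rate are placeholders (assumptions, not measured values) and need to be replaced with real numbers before the result means anything. The function name is made up for illustration.

```python
# First-cut estimate of O4 call/SMS volume, following the scaling argument above.
# Counts come from the database stats in this ticket; every growth factor and
# rate below is a PLACEHOLDER assumption, not a measured value.

def expected_messages(users, notifications_per_user, trigger_rate,
                      collab_growth_factor, rate_growth_factor):
    """Expected call/SMS messages per unit time.

    users                   -- distinct users with call/SMS notifications (O3)
    notifications_per_user  -- call/SMS notifications per such user
    trigger_rate            -- O3 superevents (or events) per unit time
    collab_growth_factor    -- assumed O4/O3 ratio of collaboration size
    rate_growth_factor      -- assumed O4/O3 ratio of superevent/event rate
    """
    o4_users = users * collab_growth_factor
    o4_rate = trigger_rate * rate_growth_factor
    # Upper bound: assumes every notification fires on every trigger.
    return o4_users * notifications_per_user * o4_rate


superevent_per_user = 413 / 241  # ~1.7, from the stats above
event_per_user = 16 / 11         # ~1.5, from the stats above

estimate = expected_messages(
    users=241,
    notifications_per_user=superevent_per_user,
    trigger_rate=1.0,          # placeholder
    collab_growth_factor=1.0,  # placeholder
    rate_growth_factor=1.0,    # placeholder
)
print(round(estimate))
```

This is an upper bound per trigger, since FAR thresholds and label conditions mean that not every notification fires for every superevent or event.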
## Batch processing of calls and SMS alerts
Getting this operation down to a single database query (which is totally doable) and then a single API call to Twilio would save **loads** of time in generating alerts. But according to the [documentation](https://support.twilio.com/hc/en-us/articles/223181548-Can-I-set-up-one-API-call-to-send-messages-to-a-list-of-people-):
> Each new SMS message from Twilio must be sent with a separate REST API request. To initiate messages to a list of recipients, you must make a request for each number to which you would like to send a message. The best way to do this is to build an array of the recipients and iterate through each phone number.
Weak.
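Since each message needs its own request, one remaining lever is to issue those requests concurrently instead of one after another. A minimal sketch, assuming a hypothetical `send_sms(number, body)` callable that wraps the actual Twilio request (with the Twilio Python helper library that would be a `client.messages.create(to=..., from_=..., body=...)` call). The function name `send_to_all`, the worker count, and the error handling are illustrative and not GraceDB's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_all(numbers, body, send_sms, max_workers=8):
    """Send `body` to every number, one request per recipient, in parallel.

    `send_sms` is a hypothetical callable (number, body) -> None that performs
    the actual Twilio REST request. Returns a dict mapping each number to
    None on success, or to the exception that was raised for it.
    """
    def _attempt(number):
        try:
            send_sms(number, body)
            return number, None
        except Exception as exc:  # one failed recipient should not stop the rest
            return number, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() yields results in the same order as the input numbers
        return dict(pool.map(_attempt, numbers))


# Usage with a stand-in sender that only records the numbers (fictitious 555 numbers).
sent = []
results = send_to_all(["+15550000001", "+15550000002"], "test alert",
                      lambda number, body: sent.append(number))
print(results)
```

This only overlaps the network round trips on the GraceDB side; whatever sending rate limits apply to the Twilio account would still apply.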
## Prioritizing recipients
This is also a way to get messages out faster to the people who need them. At the end of O3 it was decided to pare down the number of G-Event recipients to a fixed list of pipeline, follow-up, and control room recipients. I propose to formalize that list, and then have it as a community entity in LIGO's LDAP (similar to how `Communities:LVC:GraceDB:GraceDBAdvocates` exists for EM advocates). Then GraceDB will assign a priority to each group (so pipeline experts=2, EM advocates=1, everyone else=0), then sort the list of SMS recipients by this priority and then start the messaging loop.
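The sorting step could look roughly like the sketch below. The group names, the `(number, groups)` recipient shape, and the choice to use a recipient's highest-priority group are all assumptions for illustration; only the priority values (2, 1, 0) come from the proposal, and the sketch assumes that a higher number means "message first".

```python
# Hypothetical group names; priorities follow the proposal above
# (pipeline experts=2, EM advocates=1, everyone else=0).
GROUP_PRIORITY = {"pipeline_experts": 2, "em_advocates": 1}


def recipient_priority(groups):
    """Highest priority among a recipient's groups; 0 if none match."""
    return max((GROUP_PRIORITY.get(g, 0) for g in groups), default=0)


def order_recipients(recipients):
    """Sort (number, groups) pairs so higher-priority recipients come first.

    sorted() is stable, so recipients with equal priority keep their
    original relative order.
    """
    return sorted(recipients, key=lambda r: recipient_priority(r[1]), reverse=True)


# Fictitious 555 numbers for illustration.
recipients = [
    ("+15550000001", []),
    ("+15550000002", ["em_advocates"]),
    ("+15550000003", ["pipeline_experts", "em_advocates"]),
    ("+15550000004", []),
]
for number, groups in order_recipients(recipients):
    print(number, groups)
```

The messaging loop would then iterate over the sorted list, so the highest-priority recipients are contacted first.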
I'm open to other ideas, but this is a start of a strategy for defining Twilio account messaging rates and prices.
Critical Path O4 Development
Alexander Pace