# GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues

## Add SciTokens support (#211)
https://git.ligo.org/computing/gracedb/server/-/issues/211 | 2023-07-17 | Duncan Macleod
Milestone: O4 Advance. Assignee: Duncan Meacher.

The GraceDB server needs to accept a SciToken as an authorisation option.

## Reducing queries by packing igwn-alert (#210)
https://git.ligo.org/computing/gracedb/server/-/issues/210 | 2022-03-18 | Alexander Pace

As discussed on the low-latency call, December 8 2021. The purpose of this ticket is to solicit feedback and suggestions for what information to include in `igwn-alert` packets, with the goal of reducing costly queries to GraceDB.
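As a concrete sketch of the kind of packing under discussion (field names here are illustrative assumptions, not the current alert schema), the packet could carry the values consumers most often query back for, letting a listener act without a follow-up REST call:

```python
import json

# Hypothetical "packed" igwn-alert payload: embed the values that
# listeners would otherwise fetch from GraceDB with extra REST queries.
packet = {
    "alert_type": "new",
    "uid": "G123456",  # illustrative event id
    "object": {
        "graceid": "G123456",
        "pipeline": "gstlal",
        "far": 1.2e-9,
        "instruments": "H1,L1,V1",
        # Packing these avoids follow-up queries for common consumers:
        "urls": {"files": "https://gracedb.ligo.org/api/events/G123456/files/"},
        "extra_attributes": {"CoincInspiral": {"mchirp": 1.19, "snr": 12.3}},
    },
}

encoded = json.dumps(packet)  # what would go over the wire
decoded = json.loads(encoded)
# A consumer can now threshold on FAR without touching GraceDB:
passes = decoded["object"]["far"] < 1e-7
print(passes)
```

The trade-off is packet size versus query load; any fields chosen would need to be stable across pipelines.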
Relevant past commits:
* https://git.ligo.org/lscsoft/gracedb/-/commit/e52b0c2ea248efbbb221ed51c58d55a3e5c4a3de
* https://git.ligo.org/lscsoft/gracedb/-/commit/2402e914dd6afd28035f6d06086bd6519f8018a9
Current MRs:
* https://git.ligo.org/lscsoft/gracedb/-/merge_requests/52

Milestone: O4 Infrastructure Improvements. Assignee: Alexander Pace.

## Adding KAGRA events to GraceDB (#47)
https://git.ligo.org/computing/gracedb/server/-/issues/47 | 2022-03-31 | Tanner Prestegard

Created by Alex on January 28, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5071)
The purpose of this ticket is to track the future development of the uploading and sharing protocol for events from KAGRA. The below points came as a result of a face-to-face conversation at the University of Tokyo on January 28, 2017. Open issues include:
1. ~~How to compartmentalize KAGRA events from LIGO events? Options include (but are not limited to):~~
   * ~~Standing up a separate KAGRA GraceDB instance. KAGRA would use separate authentication to restrict access, but access to coincident LIGO events would be unavailable.~~
   * ~~Using the existing GraceDB infrastructure, but restricting access to events to users with a separate KAGRA authentication, e.g., only KAGRA users can have access to events uploaded by KAGRA.~~
2. ~~Event data-exchange between LIGO and KAGRA users. Scenarios include events that are determined to be significant due to LIGO-KAGRA coincidence; what data about the event would be available to LIGO/KAGRA members?~~
3. ~~If and how to restrict KAGRA uploads in the age of public LIGO data? This is an open issue for Virgo events as well.~~
4. Event data format. What will be the format of events uploaded to GraceDB? Would there need to be modification to GraceDB's data parser to accept KAGRA events?
Please add any relevant parties to this conversation, as needed. Relevant watchers for the ticket should be:
* Nobuyuki Kanda <kanda@sci.osaka-cu.ac.jp>
* Hideyuki Tagoshi <tagoshi@sci.osaka-cu.ac.jp>
---
**Updating this ticket, September 15 2021:**
As of today, KAGRA members have:
1. Access to GraceDB via X509 certificates (https://git.ligo.org/computing/helpdesk/-/issues/506)
2. Access to GraceDB via Shibboleth (https://git.ligo.org/lscsoft/gracedb/-/issues/186)
To the best of my understanding of the MOU and how it's implemented now, KAGRA members have the same upload and access privileges as LSC members. So the previous discussion about separating GraceDB instances and restricting data access seems to be moot. That leaves the event data format.
This is just a matter of getting an example event upload that has entries for KAGRA's contribution: a `K1` entry in the `instruments` column, an additional `sngl_inspiral` table, etc. Once pipelines have an example event upload ("Sample Event?" in the table below), I can upload it and fix GraceDB's upload parser. The things I need to test are:
- Does the event file get ingested into GraceDB without error?
- Are the various event properties and table entries parsed and input into the database? Are the KAGRA-relevant columns ingested? For example, does it recognize a `K1` instrument column; is KAGRA's `sngl_inspiral` table in the db?
- Are the tables legible and formatted correctly on the event's landing page? Is the data visible?
- Is the KAGRA data returned as part of the LVAlert and event `HTTPResponse` packet?
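These checks can be scripted against an event's REST JSON. The dict below is a hand-written stand-in for a real GraceDB response; the key names (`instruments`, `extra_attributes.SingleInspiral`) follow the event API only loosely and should be treated as assumptions here:

```python
# Sketch of the checks above, run against an event's API JSON.
def check_kagra_ingestion(event: dict) -> list:
    """Return a list of failed checks for KAGRA (K1) content."""
    failures = []
    # Check 1: K1 must appear in the instruments column.
    if "K1" not in event.get("instruments", "").split(","):
        failures.append("K1 missing from instruments")
    # Check 2: there must be a K1 row in the sngl_inspiral table.
    sngl = event.get("extra_attributes", {}).get("SingleInspiral", [])
    if not any(row.get("ifo") == "K1" for row in sngl):
        failures.append("no K1 row in sngl_inspiral")
    return failures

sample_event = {
    "graceid": "G153205",
    "instruments": "H1,K1,L1",
    "extra_attributes": {
        "SingleInspiral": [{"ifo": "H1"}, {"ifo": "K1"}, {"ifo": "L1"}],
    },
}
print(check_kagra_ingestion(sample_event))  # expect: []
```

A check like this could run against the `HTTPResponse` packet as well as the igwn-alert payload, since both carry the same serialized event.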
I started the table below to track the progress.
| Pipeline | Sample Event? | Upload Correctly? | Parse Correctly? | View Correctly? | LVAlert Contents | Link |
| --- | --- | --- | --- | --- | --- | --- |
|`CWB` | :x: | :x: | :x: | :x: | :x: | |
|`gstlal` | :white_check_mark: | :white_check_mark: | :white_check_mark:| :white_check_mark: | :white_check_mark: | [G153205](https://gracedb-test.ligo.org/events/G153205/view/) |
|`MBTAOnline` | :x: | :x: | :x: | :x: | :x: | |
|`oLIB` | :x: | :x: | :x: | :x: | :x: | |
|`pycbc` | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | [G153215](https://gracedb-test.ligo.org/events/G153215/view/) |
|`spiir` | :x: | :x: | :x: | :x: | :x: | |
Ahead of O4, I will also need a list of KAGRA members who will need to upload new events, and to which pipelines.

Milestone: O4 Infrastructure Improvements. Assignee: Alexander Pace.

## Plan for archiving MDC data at CIT (#256)
https://git.ligo.org/computing/gracedb/server/-/issues/256 | 2023-04-13 | Alexander Pace

**Context:**
Before the MDCs started, it was policy on `gracedb-playground` to remove events and their associated data after 21 days. After some pushback from the low-latency chairs, that operation was suspended, and the whole of the MDC remains archived on `gracedb-playground`, in the cloud. The costs of storage in Amazon EFS notwithstanding, this has been a useful exercise from a GraceDB development and optimization standpoint: having multiple users and pipelines interact with a database that is stuffed with test events (there are approximately 3x more events and superevents in `gracedb-playground` than in the production system) has been invaluable for identifying and fixing some fundamental low-level performance bottlenecks (see: https://git.ligo.org/computing/gracedb/server/-/issues/249, https://git.ligo.org/computing/gracedb/server/-/merge_requests/95, https://git.ligo.org/computing/gracedb/server/-/merge_requests/96).
That being said, in the past two weeks, I have received three private communications over email and mattermost (@roberto.depietri, @shaon.ghosh, @geoffrey.mo, @gaurav.waratkar) regarding bulk-data transfers of MDC data from AWS to CIT. In debugging and optimizing low-latency operations over the past months, I have observed other periods of increased download and query activity as well, where users are moving large numbers of files (O(1,000)-O(10,000)) from AWS to various user accounts and headnodes at CIT. These periods of activity correlate with the beginning of new rounds of MDC, as I suspect users are analyzing data from the previous round.
There hasn't been a clear definition of what constitutes "fair use" of resources; GraceDB is sort of just there for the collaboration to use, so no individual user is at "fault" in this situation. That being said, these ad hoc data transfers do affect the performance of low-latency operations, and result in redundant storage and network traffic at CIT.
**Action Required:**
I am requesting that the low-latency chairs who initially requested that MDC data be retained (again, a worthwhile effort) coordinate with the admins at CIT for a permanent and organized transfer and archive of MDC data from AWS to CIT. This would involve (and I'm thinking off the top of my head):
1) deciding on a namespace on where to store the data (other than random users' home directories)
2) deciding on a system and folder hierarchy (GraceDB uses its own system which is obtuse to someone not using the database)
3) communicating to users in the various working groups that the MDC data is locally available on the LDG, to be used instead of making tens of thousands of requests over the internet
When it comes time to do the actual transfer, I can coordinate with the CIT admins to open up a security group to directly mount the EFS partition at CIT for a bulk rsync, if need be. There might be a better idea, I dunno.
@roberto.depietri, @shaon.ghosh: as we move into O4 low-latency operations, please coordinate with @stuart.anderson and @philippe.grassia to get MDC data out of the cloud and onto an LDG resource. If anyone tagged on this ticket has other proposals, please chime in.

Milestone: O4 Prep

## Load testing (#66)
https://git.ligo.org/computing/gracedb/server/-/issues/66 | 2022-08-03 | Tanner Prestegard

Created on February 5, 2018. Copied from redmine (https://bugs.ligo.org/redmine/issues/6087)

Starting a ticket to define a procedure for load testing. General overview:
* Collect/design some utilities for load testing and monitoring the server
* Define how we will evaluate the performance of the server
* Discuss procedure with CGCA admins and EM follow-up group and iterate

Milestone: O4 Prep

## Improve server stability and performance (#19)
https://git.ligo.org/computing/gracedb/server/-/issues/19 | 2022-08-03 | Tanner Prestegard

Started on October 12, 2017, copied from redmine (https://bugs.ligo.org/redmine/issues/5946)
We have a generic goal of attempting to improve GraceDB's stability and performance. Ideally, it should be able to handle a significant load (lots of automated processes triggering and querying after an event is identified) and provide reasonably fast performance (page loading, API queries, etc.). But we really need a more precise definition of what we want out of the server. A specific issue that we would like to rectify is the gateway timeout issue - it's been reduced, but not removed.
Some ideas of things we can do:
* Significant profiling and rewriting of code to reduce the memory footprint and the number of database queries. We should use `select_related` and `prefetch_related` wherever we can.
* Improve web UI performance: web pages shouldn't take as long to load, files should be cached, etc.
* Switch to PostgreSQL.
* Use gunicorn with Apache as a reverse proxy, which lets us eliminate the mod_wsgi plugin and hopefully boost performance.

Possible issues:
* We don't have a standard way of measuring performance. Note: unit tests might help with that.
* We don't have a good way to imitate the production environment for load testing.

Milestone: O4 Prep

## VersionedFile symlink inconsistency (#291)
https://git.ligo.org/computing/gracedb/server/-/issues/291 | 2023-05-04 | Alexander Pace

When a user uploads multiple files that have the same filename within an exceedingly small time window, there's a chance that the [block of code](https://git.ligo.org/computing/gracedb/server/-/blob/d32071c941c905a13f043dbec16fa41d0fd9bfb4/gracedb/core/vfile.py#L102-110) that creates a symlinked version file can hit a race condition.
This happens pretty rarely, but whenever it does, it's always from gwcelery uploading multiple circular templates, which is a [known](https://git.ligo.org/emfollow/gwcelery/-/issues/480) [bug](https://git.ligo.org/emfollow/gwcelery/-/issues/616) that's being addressed.
That being said, examining the files in question in this superevent [S230504an](https://gracedb-playground.ligo.org/superevents/S230504an/view/):
![Screen_Shot_2023-05-04_at_3.31.12_PM](/uploads/50bfe26f2ba327d66f5969e70f0b4d38/Screen_Shot_2023-05-04_at_3.31.12_PM.png)
The file versioning seems to have worked like it should have, and the symlink seems to be pointing at the right file. But honestly it's difficult to tell when there are so many duplicates of the same file. So I don't know whether the Error that Brian Moe raised in that routine is correct, whether there was a brief moment in that superevent's timeline when the symlink was inconsistent with the intended file, whether the broken symlink was fixed the next time a new file came in, or whether it's still broken and just pointing to the wrong file (which happens to have the same content).
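One standard POSIX pattern that would make this step immune to the race (a sketch, not the current `vfile.py` logic) is to create the link under a temporary name and atomically rename it into place, since `rename(2)` replaces an existing link atomically:

```python
import os
import tempfile
import uuid

def atomic_symlink(target: str, link_name: str) -> None:
    """Point link_name at target, replacing any existing link atomically."""
    # Build the new link under a unique temporary name in the same
    # directory, then rename it over link_name: rename(2) is atomic on
    # POSIX, so readers never observe a missing or half-written link.
    tmp = "%s.%s.tmp" % (link_name, uuid.uuid4().hex)
    os.symlink(target, tmp)
    os.replace(tmp, link_name)

# Demo in a scratch directory, mimicking two versioned uploads:
with tempfile.TemporaryDirectory() as d:
    for version in ("file.txt,0", "file.txt,1"):
        open(os.path.join(d, version), "w").close()
        atomic_symlink(version, os.path.join(d, "file.txt"))
    print(os.readlink(os.path.join(d, "file.txt")))  # file.txt,1
```

Two concurrent writers would still need the version-number assignment itself to be serialized (e.g. at the database level); this only guarantees the link is never observed in a broken state.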
Given that, and given that it only occurs during the gwcelery bug that's going to get fixed, I'm kind of afraid to touch it without knowing what's really going on and having a good way to test it.

Milestone: O4 Debugging and Improvements

## Proposal to hide exposed hourly MDC superevents on production (#289)
https://git.ligo.org/computing/gracedb/server/-/issues/289 | 2023-05-23 | Alexander Pace

**Description:** Moving into O4, I've been monitoring the load on the production database, and I noticed that the highest load on the database (over two orders of magnitude more CPU usage than other requests) occurs under a very specific circumstance: when an _unauthenticated_ user makes a request to view _public_ data products. An example would be when a member of the public views a public superevent page, or a script scrapes for public skymaps, etc.
I traced this down to the SQL that's generated by a `django-guardian` function called `get_objects_for_user`. There has to be an underlying bug with GraceDB's public `viewexposed` permission, but I haven't been able to find it yet.
That being said, there are a couple of [stackoverflow](https://stackoverflow.com/a/19444128) posts and github issues about this function, and this statement rings true to me:
> Also, if possible, i suggest you don't use get_objects_for_user shortcut when project gets bigger. Its VERY slow query once you get more objects/permissions in the database.
:arrow_up: that seems consistent with some [testing](https://git.ligo.org/computing/gracedb/server/-/issues/249#note_689232) that I've seen this week.
So why wasn't this an issue before? At the end of O3, there were 80 exposed (public) superevents. That's a trivial number of items from a database standpoint. But in the three years since O3 ended, the hourly first-two-years MDC uploads have been exposed to the public. Multiply 24 daily superevents by three years and all of a sudden....
```
In [11]: Superevent.objects.filter(is_exposed=True).filter(category='M').count()
Out[11]: 35354
```
There's over 35,000 exposed superevents and growing by the hour.
A quick test can be to open this file list: https://gracedb.ligo.org/superevents/S200316bj/files/
as an authenticated user (243ms):
![Screen_Shot_2023-05-03_at_11.37.54_AM](/uploads/538d6706b93b2e0b10593b03e72b5c0d/Screen_Shot_2023-05-03_at_11.37.54_AM.png)
and in an incognito window (13.5s :sob:):
![Screen_Shot_2023-05-03_at_11.39.53_AM](/uploads/9d0d798392f395b6467e2a88670b50a9/Screen_Shot_2023-05-03_at_11.39.53_AM.png)
**Proposal:**
1) Unless there are objections, I'm going to hide exposed MDC uploads and see the performance impact.
2) If it works, then I'm going to set up a tool to hide all (or a subset..?) of MDC superevents (which is a bandaid)
3) Figure out what's wrong with the permissions, because finding the bug might have other wider-ranging performance implications
4) Unless there is a desire to have the test uploads public, modify GWCelery not to expose the test uploads. We can revisit this request based on the results of 1-3.

Milestone: Critical Path O4 Development. Assignee: Alexander Pace.

## Ability to upload multiple files when creating an event (#278)
https://git.ligo.org/computing/gracedb/server/-/issues/278 | 2023-04-12 | Tito Dal Canton

## Description of feature request
It would be useful to have the ability to upload multiple files when creating a G event.
## Use cases
CBC searches currently upload a LIGOLW XML file at event creation, followed by two JSON files containing the EM-bright and p_astro information. At least some of the searches also upload a few other files, for example diagnostic plots.
## Benefits
The first benefit is convenience: GraceDB products are uploaded with a smaller number of REST API calls, ideally just one, making the code simpler.
I suspect there would be two other benefits, though I do not know enough about the server code to judge if they are realistic or not:
* Robustness: reducing the number of HTTP requests might reduce the probability of a failure (e.g. due to a network glitch) and make the event creation more "atomic", in the sense of guaranteeing that if an event is created, it will have all the necessary files.
* Latency: currently each file upload adds order 1 s of latency, and occasionally much more. Transferring everything in a single request might help with that.
## Drawbacks
Apart from the obvious implementation burden, I cannot see any at the moment.
I suppose an alternative to an API change would be to design a file format which could actually communicate all the search information in a single file. There has been discussion in the past about storing the p_astro information in the LIGOLW file, for example, though that idea appears to have been shelved. Given how complicated it is to change established file formats, though, I think this feature request is still reasonable.
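To make a multi-file request concrete without changing established formats: a bundle could be a JSON object of (file name, base64-encoded content) pairs that the server expands back into individual files, along the lines of the suggested solutions below. The helper names here are hypothetical:

```python
import base64
import json

def pack_files(files: dict) -> str:
    """Encode {name: bytes} as a JSON bundle (base64 keeps binary safe)."""
    return json.dumps(
        {name: base64.b64encode(data).decode("ascii")
         for name, data in files.items()}
    )

def expand_files(bundle: str) -> dict:
    """Server-side: recover the individual files from the bundle."""
    return {name: base64.b64decode(b64)
            for name, b64 in json.loads(bundle).items()}

# One request carries everything the event needs:
uploads = {
    "coinc.xml": b"<LIGO_LW/>",
    "p_astro.json": b'{"BNS": 0.97}',
}
bundle = pack_files(uploads)
assert expand_files(bundle) == uploads
```

Base64 inflates the payload by about a third; a multipart upload would avoid that overhead if the API route is taken instead.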
## Suggested solutions
I forget at the moment whether HTTP requests support multiple files. If so, the feature seems easy to implement. Otherwise, one could come up with a simple data structure (e.g. JSON) that encodes the list of (file name, file content) pairs, and upload that as a file, though that may require some post-processing to "expand" the JSON back into the list of individual files on the server side.

Milestone: O4 Debugging and Improvements

## Out of range float values are not JSON compliant (#274)
https://git.ligo.org/computing/gracedb/server/-/issues/274 | 2023-04-27 | Alexander Pace

Yesterday (April 5) and today (April 6) I got notified about json encoding errors for a couple of pycbc test events:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 204, in _get_response
    response = response.render()
  File "/usr/local/lib/python3.7/dist-packages/django/template/response.py", line 105, in render
    self.content = self.rendered_content
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/response.py", line 70, in rendered_content
    ret = renderer.render(self.data, accepted_media_type, context)
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/renderers.py", line 103, in render
    allow_nan=not self.strict, separators=separators
  File "/usr/local/lib/python3.7/dist-packages/rest_framework/utils/json.py", line 25, in dumps
    return json.dumps(*args, **kwargs)
  File "/usr/lib/python3.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)

Exception Type: ValueError at /api/events/
Exception Value: Out of range float values are not JSON compliant
Request information:
USER: prasia.p@ligo.org
GET: No GET data
POST:
search = 'AllSky'
pipeline = 'pycbc'
eventFile = 'text/xml'
group = 'Test'
offline = 'True'
```
An example of the event is here: https://gracedb-playground.ligo.org/events/T979324/view/
There are a couple of `nan`s in the coinc upload that are causing problems with json serialization.
I think gracedb is robust enough to catch the `nan` and then store it in the database, but when the event gets serialized to json for alerts and http responses, the error pops up. For example, if one were to look at the `api` view for that event, the user would get the error instead of a json serialization (plz don't).
Also it would cause errors in nagios/dashboard, because it tries to pull the json packet of the latest event, but if one of these is the latest event, then it gets an error instead.
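The failure is reproducible with the standard library alone: `json.dumps(..., allow_nan=False)` (which is what DRF's strict renderer ends up calling, per the traceback above) raises exactly this ValueError on a `nan`. One possible mitigation, sketched here as an assumption rather than current server behavior, is to map non-finite floats to `null` before rendering:

```python
import json
import math

def sanitize(obj):
    """Recursively replace non-finite floats with None so strict JSON works."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(v) for v in obj]
    return obj

event = {"graceid": "T979324", "snr": float("nan"), "far": 1e-8}
try:
    json.dumps(event, allow_nan=False)  # what the strict renderer does
except ValueError as exc:
    print(exc)  # Out of range float values are not JSON compliant
print(json.dumps(sanitize(event), allow_nan=False))
```

The alternative is rejecting `nan`-bearing uploads at ingest time; sanitizing at serialization keeps already-stored events viewable.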
@prasia.p @tito-canton

Milestone: O4 Debugging and Improvements

## Requests for reports page (#266)
https://git.ligo.org/computing/gracedb/server/-/issues/266 | 2023-04-04 | Alexander Pace

A while back I set up the [beta reports page](https://gracedb-playground.ligo.org/reports/) on gracedb-playground that just showed the upload latency for all g-events uploaded to GraceDB over the last seven days. I had it up and running and had mentioned it in various telecons, but never received feedback on it. I saw @rebecca.ewing using it during this morning's gstlal review call! I'm going to use this ticket to solicit requests for new features. I'm thinking:
- [ ] Variable date range (specify start and end times to make the query)
- [ ] Filter Early Warning events on/off or on a separate plot
- [ ] For superevents, plot the time when the `GCN_PRELIM_SENT` label was applied, minus the `t_0` of the superevent. Come up with a name for this parameter? `time_to_alert`?
- [ ] Plot `time_to_alert` vs number of g-events for a superevent? Something like mean and stddev.

Milestone: O4 Debugging and Improvements. Assignee: Alexander Pace.

## query results depend on order of inputs (#265)
https://git.ligo.org/computing/gracedb/server/-/issues/265 | 2023-03-15 | Rebecca Ewing

## Description of problem
I want to make fairly complex queries to gracedb, for example searching for events with certain single-inspiral attributes while specifying the pipeline, search, creation time, and FAR. For example, the following query should return all gstlal AllSky injection uploads from MDC11 below a FAR threshold:
```
si.channel = "GDS-CALIB_STRAIN_O3Replay" | si.channel = "Hrec_hoft_16384Hz_O3Replay" pipeline: gstlal created: 2023-02-17 00:00:00 .. 2023-03-28 00:00:00 far < 1e-8 search: AllSky
```
But this query returns seemingly any gstlal upload regardless of FAR and single inspiral attributes. If I change the order, then it works as expected:
```
far <= 1e-8 & (si.channel = "GDS-CALIB_STRAIN_O3Replay" | si.channel = "Hrec_hoft_16384Hz_O3Replay") pipeline: gstlal search: AllSky created: 2023-02-17 00:00:00 .. 2023-03-28 00:00:00
```
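One plausible explanation (an assumption about the query parser, not something confirmed from the server code) is operator precedence: if the bare `|` binds differently than the implicit AND between clauses, one channel alternative can escape the FAR/pipeline constraints. A toy Python filter shows how the grouping alone changes the result set:

```python
# Toy events: (channel, pipeline, far)
events = [
    ("GDS-CALIB_STRAIN_O3Replay", "gstlal", 1e-9),
    ("Hrec_hoft_16384Hz_O3Replay", "gstlal", 1e-4),  # FAR too high
    ("GDS-CALIB_STRAIN_O3Replay", "pycbc", 1e-9),    # wrong pipeline
]

CH = ("GDS-CALIB_STRAIN_O3Replay", "Hrec_hoft_16384Hz_O3Replay")

# Intended grouping: (chan A OR chan B) AND pipeline AND far.
right = [e for e in events
         if (e[0] in CH) and e[1] == "gstlal" and e[2] <= 1e-8]

# If the parser instead attaches the other filters to only one side
# of the OR, the first alternative escapes the FAR/pipeline cuts:
wrong = [e for e in events
         if e[0] == CH[0]
         or (e[0] == CH[1] and e[1] == "gstlal" and e[2] <= 1e-8)]

print(len(right), len(wrong))  # 1 2
```

That would be consistent with the un-parenthesized query returning "seemingly any gstlal upload" while the explicitly grouped one behaves.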
## Expected behavior
I would expect these queries to be order independent. And when there are multiple inputs given (ie FAR, pipeline, single inspiral attributes) I would expect them to all implicitly be joined by "AND" instead of "OR".
## Steps to reproduce
Query with "wrong order": [here](https://gracedb-playground.ligo.org/search/?query=si.channel+%3D+%22GDS-CALIB_STRAIN_O3Replay%22+%7C+si.channel+%3D+%22Hrec_hoft_16384Hz_O3Replay%22+pipeline%3A+gstlal+created%3A+2023-02-17+00%3A00%3A00+..+2023-03-28+00%3A00%3A00+far+%3C+1e-8+search%3A+AllSky&query_type=E&results_format=S)
Query with "right order": [here](https://gracedb-playground.ligo.org/search/?query=far+%3C%3D+1e-8+%26+%28si.channel+%3D+%22GDS-CALIB_STRAIN_O3Replay%22+%7C+si.channel+%3D+%22Hrec_hoft_16384Hz_O3Replay%22%29++pipeline%3A+gstlal+search%3A+AllSky+created%3A+2023-02-17+00%3A00%3A00+..+2023-03-28+00%3A00%3A00+&query_type=E&results_format=S)
## Context/environment
## Suggested solutions
Even if it's not possible/easy to make the queries more flexible in terms of order of options, it would be nice if the "rules" were documented so that users can look up how to write queries to get the expected results.

Milestone: O4 Debugging and Improvements. Assignee: Daniel Wysocki.

## Intermittent connection issues on gracedb-playground (#262)
https://git.ligo.org/computing/gracedb/server/-/issues/262 | 2023-05-09 | Alexander Pace

@rebecca.ewing reported some connection issues for the `gstlalcbc` user. The errors and timestamps are below:
```
“[Errno 111] Connection refused”
Feb 28 22:04 EST
“[Errno 110]”
March 4 20:31 PST
March 2 21:13 PST
March 2 20:53 PST
Mar 2 20:04 PST
Mar 2 19:49 PST
Mar 2 18:42 PST
Mar 2 18:02 PST
Mar 2 16:42 PST
Mar 2 14:44 PST
Mar 2 13:56 PST
Mar 2 12:43 PST
Mar 2 13:56 PST
“HTTPSConnectionPool(host='gracedb-playground.ligo.org', port=443): Read timed out”
Mar 4 20:31 PST
Mar 2 13:56 PST
```

Milestone: O4 Debugging and Improvements

## Addition of Search tag for event uploads from low-latency sub-solar mass searches (#261)
https://git.ligo.org/computing/gracedb/server/-/issues/261 | 2023-03-15 | Divya Singh

## Description of feature request
GstLAL and MBTA will run low-latency sub-solar mass searches in O4, which require a new search tag to differentiate events uploaded by these searches from the full-bandwidth (`AllSky`) events. We propose using a new tag `Search: SSM`, which hasn't been used previously by any pipelines for past searches. Currently, both pipelines are using `Search: LowMass`, e.g. [GstLAL uploads here](https://gracedb-test.ligo.org/search/?query=gstlal+far+%3C+1+created%3A+2023-03-06+12%3A30%3A00+..+2023-03-08+20%3A40%3A00&query_type=E&results_format=S).
## Use cases
- Differentiate between events uploaded from AllSky searches and SSM searches.
- Apply different thresholds on the GWCelery/LL pipelines side to send out alerts based on the search tag alone.
## Benefits
- This will allow specifying different alert thresholds in the simplest way.
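On the consumer side, the benefit boils down to keying alert thresholds off the `search` field alone. A minimal sketch, with placeholder threshold values that are not proposed policy:

```python
# Hypothetical per-search FAR thresholds for sending alerts; the
# values below are placeholders, not proposed policy.
ALERT_FAR_THRESHOLDS = {
    "AllSky": 1 / (30 * 86400),  # one per 30 days
    "SSM": 1 / (365 * 86400),    # stricter: one per year
}

def should_alert(event: dict) -> bool:
    """Decide on an alert using only the search tag and FAR."""
    threshold = ALERT_FAR_THRESHOLDS.get(event["search"])
    return threshold is not None and event["far"] < threshold

print(should_alert({"search": "SSM", "far": 1e-10}))  # True
print(should_alert({"search": "SSM", "far": 1e-7}))   # False
```

With a dedicated `SSM` tag, no other event metadata is needed to route the threshold decision.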
## Drawbacks
## Suggested solutions
We propose adding a new tag `Search: SSM` for uploads from the low-latency sub-solar mass searches on GraceDB.

Milestone: O4 Debugging and Improvements

## Unique tag sets for inherited logs (#253)
https://git.ligo.org/computing/gracedb/server/-/issues/253 | 2023-02-14 | Alexander Pace

My initial idea for tags for InheritedLogs was to have two sets of tags (the original `event.tags` set from the EventLog, and an additional `inheritedlog.superevent.tags` set) that are combined into one set that is queryable and filterable. Which is [easy enough](https://git.ligo.org/computing/gracedb/server/-/blob/fbc5e15e5564af51aad9506ca56b39d98a688129/gracedb/superevents/models.py#L587-589).
The problem arises from combining [`ManyToMany` sets](https://docs.djangoproject.com/en/3.2/ref/models/querysets/#values):
> Because ManyToManyField attributes and reverse relations can have multiple related rows, including these can have a multiplier effect on the size of your result set. This will be especially pronounced if you include multiple such fields in your values() query, in which case all possible combinations will be returned.
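The multiplier effect described in that quote is plain SQL behavior, and can be reproduced with sqlite3 from the standard library: selecting across two independent many-to-many joins returns the cross product of their matches.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# One log row, with 3 event-log tags and 4 superevent tags attached
# through two separate many-to-many tables.
cur.executescript("""
    CREATE TABLE log(id INTEGER);
    CREATE TABLE event_tags(log_id INTEGER, tag TEXT);
    CREATE TABLE superevent_tags(log_id INTEGER, tag TEXT);
    INSERT INTO log VALUES (1);
""")
cur.executemany("INSERT INTO event_tags VALUES (1, ?)",
                [("a",), ("b",), ("c",)])
cur.executemany("INSERT INTO superevent_tags VALUES (1, ?)",
                [("w",), ("x",), ("y",), ("z",)])

# Selecting both relations in one query joins them, so the row count
# is 3 * 4 = 12 combinations: the "multiplier effect" quoted above.
cur.execute("""
    SELECT e.tag, s.tag FROM log
    JOIN event_tags e ON e.log_id = log.id
    JOIN superevent_tags s ON s.log_id = log.id
""")
print(len(cur.fetchall()))  # 12
```

Across thousands of logs and tags, those cross products are what blow the combined queryset up to the 600,000+ rows seen below.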
So in practice, on `gracedb-dev1` for a test InheritedLog:
```
In [23]: il
Out[23]: <InheritedLog: G414269 -> S230213b>
In [24]: il.source_event_log.tags.all()
Out[24]: <QuerySet [<Tag: Sky Localization>]>
In [25]: il.superevent_tags.all()
Out[25]: <QuerySet [<Tag: Public>]>
In [26]: combined_set = il.source_event_log.tags.all() | il.superevent_tags.all()
In [27]: combined_set.count()
Out[27]: 616735
In [28]: %time combined_set.count()
CPU times: user 0 ns, sys: 2.11 ms, total: 2.11 ms
Wall time: 1.35 s
Out[28]: 616735
```
By combining the querysets, the database is constructing a set of all the possible combinations of those tags, which is 600,000+ on dev1's small set of events and superevents, and it still takes over a second of wall time to count or construct a `.distinct()` set. I suspect on playground's massive database, it would absolutely destroy querying and rendering the superevent page.
I also tried the `*.union()` method to combine the tag sets, which is nearly instantaneous, but it [kills](https://stackoverflow.com/questions/50638442/django-queryset-union-return-broken-queryset-filter-and-get-return-every) the ability to `*.filter()`, or `*.get()` tags in the set... so that's a dealbreaker for querying and rendering the view.
So, I'm going to give up on adding new tags to InheritedLogs from the superevent (`InheritedLog.superevent_tags`) and ONLY have it inherit EventLog tags. We can revisit this if it becomes a dealbreaker, but it at least [looks like](https://gracedb-playground.ligo.org/events/G890530/view/) GWCelery is adding the `public` tag to the EventLog anyway, so it might all just work out.
![Screen_Shot_2023-02-14_at_10.58.03_AM](/uploads/42beee3be227fc8262ab5db942004265/Screen_Shot_2023-02-14_at_10.58.03_AM.png)
The `public` tag doesn't do anything on a g-event page, but I... think.... it might just work for exposing a superevent inherited log to the public.Critical Path O4 Developmenthttps://git.ligo.org/computing/gracedb/server/-/issues/250add ability to add multiple events to superevent in one call2023-02-08T21:36:55ZAlexander Paceadd ability to add multiple events to superevent in one callThe serializer will have to be modified around here https://git.ligo.org/computing/gracedb/server/-/blob/c6575e59e650449422dd65c87f4ffdcaa7bb4adb/gracedb/api/v1/superevents/serializers.py#L330-335 to accept `event` as a string (for backwards compatibility), or a list and then loop over `add_event_to_superevent`, or alternatively modify `add_event_to_superevent` (see below).
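The backwards-compatible `event` handling could look something like this sketch (a hypothetical helper, not the actual serializer code):

```python
def normalize_events(event):
    """Accept a single graceid string (current clients) or a list of
    graceids (new behavior), always returning a list to loop over."""
    if isinstance(event, str):
        return [event]
    if isinstance(event, (list, tuple)):
        return list(event)
    raise ValueError("event must be a graceid or a list of graceids")

# Old-style and new-style payload values both normalize cleanly:
print(normalize_events("G123456"))           # ['G123456']
print(normalize_events(["G1", "G2", "G3"]))  # ['G1', 'G2', 'G3']
```

The serializer could then loop `add_event_to_superevent` over the normalized list.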
Some considerations or questions that I don't have a good feel for yet:
1) Logging. If we were to just loop over `add_event_to_superevent`, then there would be a log message on the superevent for every event that gets added. There should still be a log message on each individual event, but for superevents there could be a single combined message like "Added GXXX GYYY GZZZ".
2) Alerts. There are alerts that get sent out to event and superevent topics (https://gracedb-playground.ligo.org/documentation/igwn_alert.html#event-alerts) when events are added. The superevent alert (that contains the superevent packet) should be modified to show the events that were added. The open question is the event alerts: to remain consistent with the existing setup, there should be an alert for every event, but if we took out the event alerts, it would make the response a lot faster and would make coding this up a lot easier. I wonder if any groups actually use them?
We should think about modifying `remove_event_from_superevent` as well (https://git.ligo.org/computing/gracedb/server/-/blob/c6575e59e650449422dd65c87f4ffdcaa7bb4adb/gracedb/api/v1/superevents/views.py#L171-174)Critical Path O4 DevelopmentDuncan MeacherDuncan Meacherhttps://git.ligo.org/computing/gracedb/server/-/issues/249figure out why event queries are so convoluted2023-07-17T20:02:33ZAlexander Pacefigure out why event queries are so convolutedThere's something going on with how GraceDB handles event searches, in particular bulk searches with lots of results: for example, a user searching for all events from a given pipeline during an MDC period.
Example: There's [this line](https://git.ligo.org/computing/gracedb/server/-/blob/master/gracedb/api/v1/events/views.py#L404) that gets called when a user does an event query. GraceDB by default returns event results in batches of 10, and so in addition to pulling results from the database, it does that `count()` every time it collects a batch of 10 events.
That `count()` for a sample query gets translated into the following SQL:
```
SELECT COUNT(*) FROM (SELECT DISTINCT "events_event"."id" AS Col1, "events_event"."submitter_id" AS Col2, "events_event"."created" AS Col3, "events_event"."group_id" AS Col4, "events_event"."superevent_id" AS Col5, "events_event"."pipeline_preferred_id" AS Col6, "events_event"."pipeline_id" AS Col7, "events_event"."search_id" AS Col8, "events_event"."instruments" AS Col9, "events_event"."nevents" AS Col10, "events_event"."far" AS Col11, "events_event"."likelihood" AS Col12, "events_event"."gpstime" AS Col13, "events_event"."perms" AS Col14, "events_event"."offline" AS Col15, "events_event"."graceid" AS Col16, "events_event"."reporting_latency" AS Col17 FROM "events_event" INNER JOIN "events_group" ON ("events_event"."group_id" = "events_group"."id") INNER JOIN "events_pipeline" ON ("events_event"."pipeline_id" = "events_pipeline"."id") LEFT OUTER JOIN "events_search" ON ("events_event"."search_id" = "events_search"."id") WHERE ("events_group"."name" IN ('CBC') AND NOT ("events_group"."name" = 'Test') AND "events_pipeline"."name" IN ('pycbc') AND NOT ("events_search"."name" = 'MDC' AND "events_search"."name" IS NOT NULL) AND ("events_event"."id" IN (SELECT CAST(U0."object_pk" AS bigint) AS "obj_pk" FROM "guardian_userobjectpermission" U0 INNER JOIN "auth_permission" U2 ON (U0."permission_id" = U2."id") WHERE (U0."user_id" = 3901 AND U2."content_type_id" = 3 AND U2."codename" IN ('view_event'))) OR "events_event"."id" IN (SELECT CAST(U0."object_pk" AS bigint) AS "obj_pk" FROM "guardian_groupobjectpermission" U0 INNER JOIN "auth_group" U1 ON (U0."group_id" = U1."id") INNER JOIN "auth_user_groups" U2 ON (U1."id" = U2."group_id") INNER JOIN "auth_permission" U4 ON (U0."permission_id" = U4."id") WHERE (U2."user_id" = 3901 AND U4."codename" IN ('view_event') AND U4."content_type_id" = 3))))) subquery
```
Which on gracedb-playground takes 1682.343 ms, which is way too long to begin with. Further, since it does that `count()` once for every batch of 10 events, in this scenario where there were 80,000 events in the query, that's 80,000/10 = 8,000 counts, and at ~1.7 seconds each, that's roughly 13,600 seconds during which the database is doing needless work and the user is just sitting there. Crazy.
So, I would start by:
1) Figure out why the ORM is turning a simple query (ref https://git.ligo.org/computing/gracedb/server/-/blob/8dcbbbfeff28ad195b8bf6128aec726d971ef227/gracedb/api/v1/events/views.py#L404) into that monstrosity. I've attached a file with an example of what it looks like. [D26C0A10006C1BF220AA6B90D05B0611391D9431.txt](/uploads/63c4c00ad03ede10d07aaf4246b770c4/D26C0A10006C1BF220AA6B90D05B0611391D9431.txt)
2) Figure out why that `count()` takes so long
3) Reverse engineer the query response to see if we can move that `count()` outside of the iteration loop so it only does it once, stores the value, and then loops over the batches of 10 events.
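Point 3 could look something like this sketch (a toy paginator and fake queryset to show the idea, not the actual DRF pagination code):

```python
class CachedCountPaginator:
    """Compute the expensive count() once, then reuse it for every page."""

    def __init__(self, queryset, per_page=10):
        self.queryset = queryset
        self.per_page = per_page
        self._count = None

    @property
    def count(self):
        if self._count is None:  # only hit the database the first time
            self._count = self.queryset.count()
        return self._count

    def page(self, number):
        start = (number - 1) * self.per_page
        return self.queryset[start:start + self.per_page]


class FakeQuerySet(list):
    """Stand-in queryset that records how many times count() runs."""
    calls = 0

    def count(self):
        FakeQuerySet.calls += 1
        return len(self)


qs = FakeQuerySet(range(80000))
paginator = CachedCountPaginator(qs)
for n in range(1, 8001):          # walk all 8,000 batches of 10
    _ = paginator.page(n), paginator.count
print(FakeQuerySet.calls)         # one count() instead of 8,000
```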
I'm hoping that reverse engineering the `count()` will elucidate why the event query ends up being so taxing to the database.O4 Debugging and Improvementshttps://git.ligo.org/computing/gracedb/server/-/issues/244drop the banhammer on rogue processes2023-06-07T14:18:09ZAlexander Pacedrop the banhammer on rogue processesI (@alexander.pace) was trawling through production GraceDB's logs today (22-11-02) to sanity check that nothing was up with yesterday's deployment of the latest server code (https://git.ligo.org/computing/sccb/-/issues/1005), when I noticed a lot of traffic mostly performing `GET`s on (seemingly?) random `api/superevent/` paths. Okay? For example:
```
gracedb-swarm-production-us-west-2a-docker-mgr-01.log:Nov 2 00:00:04 gracedb-swarm-production-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.3.0wxlfddskqz0sxdcvafywkpv7: GUNICORN | 134.79.120.214 - - [02/Nov/2022:00:00:04 +0000] "GET /superevents/SIMS190408an_0p4_128/view/ HTTP/1.1" 404 5775 "-" "Python-urllib/2.7"
gracedb-swarm-production-us-west-2a-docker-mgr-01.log:Nov 2 00:00:05 gracedb-swarm-production-us-west-2a-docker-mgr-01 gracedb_docker_gracedb_gracedb.3.0wxlfddskqz0sxdcvafywkpv7: GUNICORN | 134.79.120.214 - - [02/Nov/2022:00:00:05 +0000] "GET /superevents/SIMS190408anC0p9N128/view/ HTTP/1.1" 404 5775 "-" "Python-urllib/2.7"
...
...
```
They were all `404`ing like they should, but it was a LOT of requests. For example, today there were **15078** requests coming from the `134.79.120.*` subnet alone before I put the kibosh on that (more on that below). Yesterday there were 18594 `GET`s. I say "from that subnet" because I saw requests coming from `134.79.120.214`, `134.79.120.195`, `134.79.120.165`...
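Those per-subnet tallies can be pulled straight out of the gunicorn access logs; a quick sketch (the regex assumes the log format quoted above, and the sample lines are abbreviated):

```python
import re
from collections import Counter

# Client IP as it appears in the gunicorn access-log lines above.
IP_RE = re.compile(r"GUNICORN \| (\d+\.\d+\.\d+\.\d+) ")

def requests_per_subnet(lines, octets=3):
    """Tally requests per /24-style prefix (first three octets)."""
    counts = Counter()
    for line in lines:
        match = IP_RE.search(line)
        if match:
            prefix = ".".join(match.group(1).split(".")[:octets]) + ".*"
            counts[prefix] += 1
    return counts

sample = [
    '... GUNICORN | 134.79.120.214 - - [02/Nov/2022:00:00:04 +0000] "GET /superevents/a/view/ HTTP/1.1" 404 5775',
    '... GUNICORN | 134.79.120.195 - - [02/Nov/2022:00:00:05 +0000] "GET /superevents/b/view/ HTTP/1.1" 404 5775',
    '... GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:25 +0000] "GET /api/x HTTP/1.1" 404 23',
]
print(requests_per_subnet(sample))  # Counter({'134.79.120.*': 2, '133.40.62.*': 1})
```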
I `traceroute`'ed the IPs back to this group at Stanford (https://www6.slac.stanford.edu/).
I saw similar 404'ed `GET`s from a computer in Tokyo (`133.40.62.22`) that was trying to get files with wget?
```
gracedb-swarm-production-us-west-2c-docker-mgr-01.log:Nov 2 19:10:25 gracedb-swarm-production-us-west-2c-docker-mgr-01 gracedb_docker_gracedb_gracedb.1.j9sj8bcpdvddfn4g0ss05kq6e: GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:25 +0000] "GET /apiweb/superevents/IC136985_60401984/files/bayestar.fits.gz HTTP/1.1" 404 23 "-" "Wget/1.13.4 (linux-gnu)"
gracedb-swarm-production-us-west-2c-docker-mgr-01.log:Nov 2 19:10:28 gracedb-swarm-production-us-west-2c-docker-mgr-01 gracedb_docker_gracedb_gracedb.1.j9sj8bcpdvddfn4g0ss05kq6e: GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:28 +0000] "GET /api/superevents/IC137019_70165712/files/p_astro.json HTTP/1.1" 404 23 "-" "Wget/1.13.4 (linux-gnu)"
gracedb-swarm-production-us-west-2c-docker-mgr-01.log:Nov 2 19:10:29 gracedb-swarm-production-us-west-2c-docker-mgr-01 gracedb_docker_gracedb_gracedb.1.j9sj8bcpdvddfn4g0ss05kq6e: GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:29 +0000] "GET /apiweb/superevents/IC137019_70165712/files/bayestar.fits.gz HTTP/1.1" 404 23 "-" "Wget/1.13.4 (linux-gnu)"
gracedb-swarm-production-us-west-2c-docker-mgr-01.log:Nov 2 19:10:30 gracedb-swarm-production-us-west-2c-docker-mgr-01 gracedb_docker_gracedb_gracedb.1.j9sj8bcpdvddfn4g0ss05kq6e: GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:30 +0000] "GET /api/superevents/IC137065_22012496/files/p_astro.json HTTP/1.1" 404 23 "-" "Wget/1.13.4 (linux-gnu)"
gracedb-swarm-production-us-west-2c-docker-mgr-01.log:Nov 2 19:10:31 gracedb-swarm-production-us-west-2c-docker-mgr-01 gracedb_docker_gracedb_gracedb.1.j9sj8bcpdvddfn4g0ss05kq6e: GUNICORN | 133.40.62.22 - - [02/Nov/2022:19:10:31 +0000] "GET /apiweb/superevents/IC137065_22012496/files/bayestar.fits.gz HTTP/1.1" 404 23 "-" "Wget/1.13.4 (linux-gnu)"
```
They were all `404`'ed, but I'm concerned about the increased traffic, especially when we go into observation. So, I made the executive decision to block traffic from the offending IPs/ranges. And if and when people start to complain, then we can press them for a technical justification of what they were doing. And this doesn't apply to all robot processes, of course. There are plenty of queries from IPs originating from Caltech that are using the real client code, so those are obviously legit. But this ticket will be used to track which sources have been blocked from inbound traffic into gracedb's VPC.
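Checking whether an inbound address falls in one of the blocked ranges is straightforward with the stdlib; a sketch against the ranges in the table below (the "Caltech" address is just an illustrative example):

```python
import ipaddress

# Ranges from the table of blocked sources below.
BLOCKED = [ipaddress.ip_network(n)
           for n in ("134.79.120.0/24", "133.40.62.22/32")]

def is_blocked(ip):
    """True if `ip` falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(is_blocked("134.79.120.214"))  # True  (the SLAC subnet)
print(is_blocked("131.215.10.5"))    # False (other traffic stays open)
```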
| Date Blocked | IP Ranges | Reason | Status |
| ------ | ------ | ------ | ------ |
| 2022-11-02 | 134.79.120.0/24 | Excessive (15,000+/day) `GET`s | |
| 2022-11-02 | 133.40.62.22/32 | Excessive (10,000+/day) `GET`s | [Lifted](https://git.ligo.org/computing/helpdesk/-/issues/3943) 23/05/12|O4 Debugging and ImprovementsAlexander PaceAlexander Pacehttps://git.ligo.org/computing/gracedb/server/-/issues/242Revamp HardwareInjection event uploads.2023-02-08T19:49:48ZAlexander PaceRevamp HardwareInjection event uploads.This is to track work to bring back HardwareInjection events.
TODO:
- [x] provide sample json (?) upload
- [x] make data model
- [x] validate uploads
- [x] create page view
- [ ] determine what scenarios and alert contents should be
- [ ] ????Critical Path O4 Developmenthttps://git.ligo.org/computing/gracedb/server/-/issues/240Generate railroad diagrams for query parsing language2023-02-08T16:51:58ZDaniel WysockiGenerate railroad diagrams for query parsing language`pyparsing>=3.0.0` introduces the ability to generate ["railroad diagrams"](https://pyparsing-docs.readthedocs.io/en/latest/whats_new_in_3_0_0.html#id4), which are a concise way of visualizing a language. These would be very nice to have for our documentation, but more importantly would be helpful for making improvements to the query language without breaking anything.O4 Debugging and ImprovementsDaniel WysockiDaniel Wysocki
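A sketch of how this could be wired up, using a toy attribute-comparison grammar rather than GraceDB's actual query language (`create_diagram` additionally needs the `pyparsing[diagrams]` extra installed, so it is left commented out):

```python
import pyparsing as pp

# Toy stand-in for an attribute-comparison query grammar.
ident = pp.Word(pp.alphas, pp.alphanums + "_")
number = pp.pyparsing_common.number
comparison = pp.Group(ident + pp.one_of("< > = <= >=") + number)
query = pp.infix_notation(comparison,
                          [(pp.one_of("& |"), 2, pp.OpAssoc.LEFT)])

print(query.parse_string("far < 1e-7 & gpstime > 1187008882").as_list())

# Writing the railroad diagram for the docs would then be one call:
# query.create_diagram("query_grammar.html")
```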