GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues

# Missing `else` clause in `check_and_serve_file` (#343)
https://git.ligo.org/computing/gracedb/server/-/issues/343 · Daniel Wysocki · 2024-03-21

[Sentry reported an error](https://ligo-caltech.sentry.io/issues/5081076463/?alert_rule_id=710526&alert_timestamp=1710797810690&alert_type=email&environment=test&notification_uuid=b5bfedba-abe7-4d31-b25f-280fa8935ba7&project=1456379&referrer=alert_email) in [`core.http.check_and_serve_file`](https://git.ligo.org/computing/gracedb/server/-/blob/bebc24500045d00fc74ba56818bd9b34e184c310/gracedb/core/http.py#L62). The issue is that there's no `else` clause, so `response` is undefined. I don't know of a way to get more details behind this _instance_ of the error; however, there's only one possibility I see triggering this: `file_path` refers to a directory.
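For illustration, a minimal sketch of the failure mode, assuming the handler branches on `os.path` checks (hypothetical code, not the actual `check_and_serve_file` body):

```python
# Hypothetical reconstruction of the bug -- not the real GraceDB code.
import os

def check_and_serve_file(file_path):
    if not os.path.exists(file_path):
        response = "404 Not Found"
    elif os.path.isfile(file_path):
        response = "200 OK"
    # No else branch: a path that exists but is not a regular file
    # (i.e. a directory) assigns nothing, so the return below raises
    # UnboundLocalError ("response" is undefined).
    return response

check_and_serve_file("/tmp")  # any directory reproduces the error
```

A catch-all `else` that assigns an error response would guarantee `response` is always bound.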
This would be solved by adding a simple `else` clause to catch all possible remaining errors, though we should probably identify exactly what happened here and see if it needs special treatment. Why did a user try accessing a file that was actually a directory, assuming my assessment is correct?

# All links/versions of `initial.data` external VOEvent file point to original file (#296)
https://git.ligo.org/computing/gracedb/server/-/issues/296 · Brandon Piotrzkowski · 2023-05-17
## Description of problem
Currently all links to the VOEvent file `initial.data` point to the original file, regardless of which file or log is being referenced.
![Screenshot_2023-05-10_at_2.53.08_PM](/uploads/43c17fc38586596217bef9c956c7809a/Screenshot_2023-05-10_at_2.53.08_PM.png)
From: https://gracedb.ligo.org/events/E406825/files/
This prevents us from checking and downloading any of the files, except for the first one.
## Expected behavior
We expect these files to be versioned properly, as all other files are.
## Steps to reproduce
Upload external event via `create_event` and then append using `replace_event`.
## Context/environment
All instances of gracedb/gwcelery
## Suggested solutions
Keep track of versions and fix links to use the correct version.

Milestone: O4 Debugging and Improvements

# External events appearing in superevent neighbors `gw_events` (#236)
https://git.ligo.org/computing/gracedb/server/-/issues/236 · Brandon Piotrzkowski · 2022-09-22

First noted by Deep Chatterjee: within the `superevent_neighbours` field, external events can appear as GW events. This can potentially disrupt the logic used in the superevent manager.
Summary of https://gracedb-test.ligo.org/api/superevents/MS220920p/, where `M304272` is listed as an external event in the superevent (under `em_events`, not `gw_events`) but appears as a GW event under `superevent_neighbours`:
```
{
  "superevent_id": "MS220920p",
  "gw_events": [
    "M304283",
    "M304282",
    "M304281",
    "M304280",
    "M304279",
    "M304278",
    "M304277",
    "M304276",
    "M304275",
    "M304274",
    "M304273",
    "M304271",
    "M304270",
    "M304269",
    "M304268",
    "M304267"
  ],
  "em_events": [
    "M304272"
  ],
  "preferred_event_data": {
    "superevent": "MS220920p",
    "superevent_neighbours": {
      "MS220920p": {
        "superevent_id": "MS220920p",
        "gw_events": [
          "M304283",
          "M304282",
          "M304281",
          "M304280",
          "M304279",
          "M304278",
          "M304277",
          "M304276",
          "M304275",
          "M304274",
          "M304273",
          "M304272",
          "M304271",
          "M304270",
          "M304269",
          "M304268",
          "M304267"
        ]
      }
    }
  }
}
```
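If the diagnosis above is right, the fix is to exclude external events when building the neighbours payload. A hypothetical sketch (field and group names are assumptions, based on the JSON above and the standard GraceDB event groups):

```python
# Hypothetical sketch -- not the actual GraceDB superevent code.
def neighbour_summary(superevent):
    # Build the gw_events list for one superevent_neighbours entry,
    # mirroring the top-level split between gw_events and em_events.
    events = superevent["events"]  # all events attached to the superevent
    return {
        "superevent_id": superevent["superevent_id"],
        "gw_events": [
            e["graceid"]
            for e in events
            if e["group"] != "External"  # assumed label for EM/external events
        ],
    }
```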
# Intermittent 502 'Bad Gateway' error (#147)
https://git.ligo.org/computing/gracedb/server/-/issues/147 · Tanner Prestegard · 2022-03-22

From @patrick-brady:
> Hi Tanner:
>
> I've started going through the sentry logs for gwcelery. I did find a number of cases of the 502 bad gateway error. This does not appear to be coming from any single API request. Here are links to 4 different sentry messages:
>
> https://sentry.io/organizations/ligo-caltech/issues/994229585/events/ac120124b2aa41e6aa5280fceafb403b/
>
> https://sentry.io/organizations/ligo-caltech/issues/993796824/events/c337068a03f246f1bbbd8bc4149a6256/
>
> https://sentry.io/organizations/ligo-caltech/issues/998916934/events/567af335c4f248d7bb1cfd78a33017e8/
>
> https://sentry.io/organizations/ligo-caltech/issues/992306270/events/bb4f0fd48cde40719383c24174fd1189/
>
> I spent about half an hour looking around for information related to 502 errors on AWS and found the following: https://blog.transposit.com/the-mysterious-case-of-the-bad-gateway-502-b9f370207d87
> This person tracked their errors to their use of dropwizard (https://www.dropwizard.io/1.3.5/docs/). I have no idea if that is part of the stack used on ALB for gracedb, but what they did may give hints on addressing the 502 errors.
> I know this is not your expertise and that Tom has extremely limited time left, but I thought I'd put this to you and see if you can make anything of it.
>
> Cheers,
> Patrick

# File upload issues with 'gevent' worker class and more than two workers (#134)
https://git.ligo.org/computing/gracedb/server/-/issues/134 · Tanner Prestegard · 2022-08-04

Upgrading to gracedb-2.4.1 today failed on gracedb-playground and gracedb due to issues with the asynchronous `gevent` worker class. It seemed to fail on attempts to issue alerts for file uploads. This problem was not detected in development or testing, so it seems to only happen when more than two workers are present. For now, we will stick with the `sync` worker class.
Some StackOverflow posts indicated we may need to turn off the `sendfile` setting, although it's not clear why.
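For reference, the fallback described above might look like this in a gunicorn config file; the values are illustrative, not the actual GraceDB deployment settings:

```python
# gunicorn.conf.py -- hypothetical example
worker_class = "sync"  # fall back from "gevent", which broke file-upload alerts
workers = 2            # failures were only seen with more than two workers
sendfile = False       # per the StackOverflow suggestion to disable sendfile()
```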
See attached error email for more details.

[_Django__ERROR__EXTERNAL_IP___Internal_Server_Error___api_events_G15070_log_.eml](/uploads/b8cac15345ed110e0e426c611a5b9c9e/_Django__ERROR__EXTERNAL_IP___Internal_Server_Error___api_events_G15070_log_.eml)

# 502 error for PUT requests with "large" message body (#126)
https://git.ligo.org/computing/gracedb/server/-/issues/126 · Tanner Prestegard · 2022-08-04

I can consistently produce a 502 Bad Gateway error by trying to update an event with a "large" data file. If I use a ~2 MB file it has a 100% failure rate; if I use something like 100 KB it will fail maybe 10% of the time. It seems to occur both on the production server and on a dev server, so it is not related to the difference in deployments.
I've monitored the gunicorn logs and the request never makes it to gunicorn when this happens. Looking in the Apache logs, I see the following error being produced by these requests:
```
[Wed Mar 27 09:56:54.238025 2019] [proxy:error] [pid 18753:tid 140701048239872] (104)Connection reset by peer: [client 75.86.138.174:37010] AH01084: pass request body failed to 127.0.0.1:8080 (localhost)
[Wed Mar 27 09:56:54.238106 2019] [proxy_http:error] [pid 18753:tid 140701048239872] [client 75.86.138.174:37010] AH01097: pass request body failed to 127.0.0.1:8080 (localhost) from 75.86.138.174 ()
```
To reproduce:
```
from ligo.gracedb.rest import GraceDb
g = GraceDb('https://gracedb-dev2.ligo.org/api/')
g.replaceEvent('T0497', './ligo/gracedb/test/integration/data/big.data')
```
I tried uploading the same file attached to a log message and it worked fine. The difference between the "replace event" request and the log upload is PUT vs. POST.

# SSLError for some clients (#119)
https://git.ligo.org/computing/gracedb/server/-/issues/119 · Tanner Prestegard · 2019-03-27

GWCelery is reporting periodic `SSLError`s that have occurred when trying to interact with the GraceDB API using gracedb-client. A Sentry page which is tracking these issues is [here](https://emfollow.ligo.caltech.edu/sentry/gwcelery/issues/305/).
The error looks like
```
SSLError
[SSL: SSL_HANDSHAKE_FAILURE] ssl handshake failure (_ssl.c:2217)
```
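Since these failures are intermittent, one client-side mitigation is to retry transient SSL errors. A sketch (the attempt count and backoff are arbitrary choices, and the exact exception type depends on the HTTP stack in use):

```python
# Hypothetical retry wrapper for flaky SSL connections -- not part of
# gracedb-client.
import time
from ssl import SSLError

def with_retries(func, *args, attempts=3, delay=2.0, **kwargs):
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except SSLError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # simple linear backoff

# usage: with_retries(client.writeLog, graceid, "message text")
```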
# SSLEOFError for some clients (#118)
https://git.ligo.org/computing/gracedb/server/-/issues/118 · Tanner Prestegard · 2019-04-22

GWCelery is reporting periodic `SSLEOFError`s that have occurred when trying to interact with the GraceDB API using gracedb-client. A Sentry page which is tracking these issues is [here](https://emfollow.ligo.caltech.edu/sentry/gwcelery/issues/299/events/latest/).

The error looks like
```
SSLEOFError
EOF occurred in violation of protocol (_ssl.c:777)
```

# Control room group issues (#103)
https://git.ligo.org/computing/gracedb/server/-/issues/103 · Tanner Prestegard · 2019-04-22

Users seem to be getting left in the control room groups when the response cycle through the middleware should be removing them.

# Browser not rendering .xml.gz files correctly (#101)
https://git.ligo.org/computing/gracedb/server/-/issues/101 · Tanner Prestegard · 2022-08-03

The content-type and encoding are set correctly in Django (application/xml and gzip), but those response headers do not seem to be making it back to the browser. Maybe there is some issue with them being passed through the Apache reverse proxy?
We can debug this somewhat with the Chrome developer tools and with the Apache logs.
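For example, a quick way to see which headers actually come back through the proxy (the URL is a placeholder, not a real file):

```
# Replace <graceid> and <filename> with a real event and file:
curl -sI "https://gracedb.ligo.org/api/events/<graceid>/files/<filename>.xml.gz" \
  | grep -iE '^content-(type|encoding)'
```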
See an example here: ![encoding_error](/uploads/7d88026a3ecd9db97786b9c6d014fe09/encoding_error.png)

# Bad encoding of file download URLs (#82)
https://git.ligo.org/computing/gracedb/server/-/issues/82 · Tanner Prestegard · 2022-08-03

Links on event detail pages (in the log messages, and probably for superevents too) with special characters in them like '#' don't work properly. They are OK on the event file list page (probably because the `url` template tag encodes them properly).
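A minimal sketch of the JavaScript-side fix proposed in the next paragraph (the helper and URL layout are hypothetical):

```javascript
// Hypothetical helper for building file-download links in log messages.
// encodeURIComponent escapes characters like '#', which would otherwise
// be parsed as a fragment marker and truncate the path.
function fileDownloadUrl(graceid, filename) {
  return '/api/events/' + encodeURIComponent(graceid) +
         '/files/' + encodeURIComponent(filename);
}

console.log(fileDownloadUrl('G275744', 'skymap#1.png'));
// -> /api/events/G275744/files/skymap%231.png
```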
We can probably use `encodeURIComponent()` in JavaScript to fix the issue in the log messages. We should check on the file lists for superevents too. And we will have to patch the client to encode the URL properly in the `files()` method.

# IntegrityError for control room middleware (#69)
https://git.ligo.org/computing/gracedb/server/-/issues/69 · Tanner Prestegard · 2019-04-22

The middleware that adds/removes users from the control room group is suddenly throwing IntegrityErrors. I just noticed it today when trying to test web signoffs on gracedb-dev2, and later I got notifications when a user was trying to GET a file on gracedb-playground. I noted that the user's REMOTE_ADDR corresponded to the H1 control room.
Potentially, this could be fixed by the new auth system which is in development on the auth_update branch, but we would have to test it extensively to be sure. I don't think there is a strong need to fix it before we merge that branch into master, since it seems to occur so rarely and moving to the new branch should happen in the near future.

[error_email.log](/uploads/b8b2ceb7c413139a272ac24828cd410b/error_email.log)

# Intermittent server gateway timeouts (#31)
https://git.ligo.org/computing/gracedb/server/-/issues/31 · Tanner Prestegard · 2018-08-20
Several follow-up processes (approval processor, event supervisor, probably others) and one search pipeline (cWB) have reported receiving 504 gat...Created on April 5, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5418)
Several follow-up processes (approval processor, event supervisor, probably others) and one search pipeline (cWB) have reported receiving 504 gateway timeout errors when attempting to write log messages to GraceDB. Peter S reports that it happens for approval processor about once per day. It seems as though the log is still written, but the correct response is not sent, as the connection hangs for 2 minutes, then terminates.
The server also accumulates lingering threads owned by the wsgi_daemon user over time. It's not clear if these two issues are related.
I upgraded the production server to mod_wsgi 4.5.11 on 28 Mar 2017 in the hopes that it would take care of the lingering threads (tests on the development servers indicated that it cleared up lingering threads caused by overloading the server with log write processes), but it hasn't.

# Filenames truncated on upload (#30)
https://git.ligo.org/computing/gracedb/server/-/issues/30 · Tanner Prestegard · 2022-08-03

Added by Chad Hanna on February 28, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5236).
Hi,
I am not exactly sure where things are going wrong, but I have found that I cannot upload a long filename without it being truncated. See e.g.,
https://gracedb.ligo.org/events/view/G275744
in the log entry where it tries to point to:
https://gracedb.ligo.org/apiweb/events/G275744/files/H1L1V1-GSTLAL_INSPIRAL_PLOTSUMMARY_ALL_LLOID_COMBINED_02_mchirp_acc_frac_scatter_SpinTaylorT4threePo,1
which doesn't exist.
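The ticket doesn't identify the mechanism; one plausible cause (an assumption on my part, not confirmed here) is a fixed-width filename column or slice that silently truncates long names:

```python
# Hypothetical illustration only -- not the actual GraceDB storage code.
MAX_FILENAME_LENGTH = 150  # assumed column/field width

def stored_name(filename):
    # A non-strict database column (or an explicit slice like this one)
    # keeps only the first MAX_FILENAME_LENGTH characters, so the log
    # entry links to a name that matches no stored file.
    return filename[:MAX_FILENAME_LENGTH]

long_name = "H1L1V1-GSTLAL_INSPIRAL_PLOTSUMMARY_" + "x" * 140 + ".png"
print(stored_name(long_name))  # truncated, as in the dead link above
```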
# GraceDb.writeLog raises BadStatusLine error (when GraceDb is overwhelmed?) (#29)
https://git.ligo.org/computing/gracedb/server/-/issues/29 · Tanner Prestegard · 2019-04-16

Created by Leo Singer on January 17, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5018).
I can very reliably get ligo.gracedb.rest.GraceDb.writeLog() to raise a BadStatusLine error by submitting several test events and running BAYESTAR on all of them simultaneously. At least one of the jobs will usually die with the traceback below.
This may be a random error, or it may be related to the high rate of GraceDb log message uploads that this entails. Note that in order to get a useful traceback you will first have to apply Merge Request !6. I suggest pointing the REST and CLI interfaces to the gracedb-test server so as not to overwhelm the production server when running this code.
```
$ for x in {1..5}; do ((graceid=$(gracedb Test MBTAOnline AllSky coinc.xml); gracedb upload -t psd $graceid psd.xml.gz && bayestar_localize_lvalert $graceid)&); done
...
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/logging.py", line 70, in _run
self._gracedb.writeLog(self._graceid, text)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 759, in writeLog
'displayName': displayName}, files=files)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 347, in post
return self.post_or_put("POST", *args, **kwargs)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 372, in post_or_put
return self.request(method, url, body, headers)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 480, in request
return GsiRest.request(self, method, *args, **kwargs)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 316, in request
response = self.get_response(conn)
File "/mnt/qfs3/lsinger/local/lib/python2.7/site-packages/ligo/gracedb/rest.py", line 265, in get_response
return conn.getresponse()
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
```
I'm guessing that this is a server-side issue and the Django log will be more informative.

# Visualization issues with EMBB (#26)
https://git.ligo.org/computing/gracedb/server/-/issues/26 · Tanner Prestegard · 2019-04-01

Started on April 21, 2016. Copied from redmine (https://bugs.ligo.org/redmine/issues/4053).
I (Alex) received a report from Peter Shawhan (pshawhan@umd.edu) regarding visualization issues with the EM Bulletin Board:
> I see what you mean about Skymap Viewer: it seems to display a different set of tiles when I select the GOTO box; whatever the previously displayed set was, I think. For instance, if I view the most recent (top) skymap from https://gracedb.ligo.org/events/view/G211117 and Show Bulletin Board, then de-select GOTO, de-select the last LOFAR-TKSP, and then select GOTO, it displays the tiles from that LOFAR-TKSP set. That's not the only problem, though. The selection box for the last EWE set actually displays the tiles from the middle LOFAR-TKSP set. My hunch is that this is a bug activated by the EWE entry with N_regions=0, but that is just a hunch.
I was able to reproduce the problem by following the steps above. Some observations:
* Alternating the last two checkboxes in the bulletin board appears to turn on/off the same visualization layer. The color-coding for the layers is definitely incorrect, though I don't know (at first glance) which dataset is being rendered incorrectly.
* As of this posting, I have not observed the bug in other events so I need to determine the commonality in other events.
* Also, users have reported in the same email thread that visualization is slow to render.
I have uploaded the email thread to this ticket and will update it as new reports come in.

[Re__GOTO_team_observations_with_SWASP_in_GraceDB.rtf](/uploads/9a517fab2354420f746f362a694883dc/Re__GOTO_team_observations_with_SWASP_in_GraceDB.rtf)

# Some floats cast as strings in LVAlerts and response JSONs (#10)
https://git.ligo.org/computing/gracedb/server/-/issues/10 · Tanner Prestegard · 2018-06-26
As reported in https://git.ligo.org/emfollow/gwcelery/merge_requests/94, there are situations in which certain float variables are being sent as strings in LVAlert JSONs and even in some response JSONs from gracedb-client.
I believe this is happening because in some of the old code in the events API, the event is converted to a dictionary BEFORE it is saved in the database. Thus, some variables which are extracted from the event file (as strings) are still in memory as strings. If we did the save first, that would cast those variables to the correct type.
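A minimal runnable illustration of the suspected ordering bug (hypothetical names; not the actual events API code):

```python
import json

def parse_event_file():
    # Values extracted from an uploaded event file arrive as strings.
    return {"far": "1.2e-8", "snr": "7.5"}

def save(event):
    # Stand-in for the model save, which casts each field to its column
    # type (e.g. FloatField -> float).
    event["far"] = float(event["far"])
    event["snr"] = float(event["snr"])

event = parse_event_file()
print(json.dumps(event))  # serialized BEFORE save: {"far": "1.2e-8", ...}
save(event)
print(json.dumps(event))  # serialized AFTER save:  {"far": 1.2e-08, ...}
```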
I will submit some test events for the active pipelines (gstlal, gstlal-spiir, pycbc, mbtaonline, cwb, lib, swift, fermi, and snews) and check the LVAlert content, as well as the response in gracedb-client.

# Sentry instance for AWS (#141)
https://git.ligo.org/computing/gracedb/server/-/issues/141 · Tanner Prestegard · 2019-05-16

We need a better way of handling errors in the AWS gracedb instance. The issues with superevent directory creation for S190412m generated about 4000 error emails to each of the admins and disrupted my email traffic for days.

# GWCelery issues with AWS implementation of GraceDB (#117)
https://git.ligo.org/computing/gracedb/server/-/issues/117 · Tanner Prestegard · 2019-03-08
GWCelery has been seeing a few new errors since we moved the service to the AWS cloud:
* `SSLEOFError: Problem establishing secure connection: EOF occurred in violation of protocol (_ssl.c:777)`: [link](https://emfollow.ligo.caltech.edu/sentry/gwcelery/issues/299/?query=is:unresolved)
* `SSLError [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake failure (_ssl.c:2217)`: [link](https://emfollow.ligo.caltech.edu/sentry/gwcelery/issues/305/)
* `ConnectionResetError [Errno 104] Connection reset by peer`: [link](https://emfollow.ligo.caltech.edu/sentry/gwcelery/issues/306/)

# Only send igwn-alerts to the {group}_{pipeline} topic (#341)
https://git.ligo.org/computing/gracedb/server/-/issues/341 · Alexander Pace · 2024-03-13

Historically, `igwn-alert` and `LVAlert` have sent out g-event and e-event alerts to topics with the `{group}_{pipeline}` and `{group}_{pipeline}_{search}` schema. As more pipelines and searches are added, topic management is becoming a pain across all the GraceDB tiers, especially with the lack of a scriptable API to interact with SCIMMA.
I'm proposing to change the way GraceDB issues alerts to only send to `{group}_{pipeline}` topics and have users filter on search, if need be. It would have the benefit of simplifying topic management, and also save some milliseconds in dispatching alerts. The alert contents would remain the same. Superevent topics would be unaffected by this change.
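On the listener side, the extra filter mentioned below could be as simple as checking the search in the alert content (a hypothetical sketch, not gwcelery code; the `object`/`search` field path is an assumption about the g-event alert payload):

```python
import json

WANTED_SEARCHES = {"AllSky", "MDC"}  # example searches

def on_alert(topic, payload):
    alert = json.loads(payload)
    search = alert.get("object", {}).get("search")
    if search not in WANTED_SEARCHES:
        return  # drop alerts for searches this process doesn't handle
    handle(alert)

def handle(alert):
    ...  # existing processing logic
```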
Putting out feelers to `igwn-alert` stakeholders... @deep.chatterjee @cody.messick @nicolas.arnaud @rebecca.ewing: would that break your listening processes? If so, would adding an extra filter based on the search in the alert content be too much of a technical burden?

Milestone: O4b · Assignee: Alexander Pace