# GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues

## 502 error for PUT requests with "large" message body
https://git.ligo.org/computing/gracedb/server/-/issues/126 · Tanner Prestegard · 2022-08-04

I can consistently produce a 502 Bad Gateway error by trying to update an event with a "large" data file. A ~2 MB file fails 100% of the time; a file of around 100 KB fails maybe 10% of the time. The error occurs both on the production server and on a dev server, so it is not related to differences between the deployments.
I've monitored the gunicorn logs and the request never makes it to gunicorn when this happens. Looking in the Apache logs, I see the following error being produced by these requests:
```
[Wed Mar 27 09:56:54.238025 2019] [proxy:error] [pid 18753:tid 140701048239872] (104)Connection reset by peer: [client 75.86.138.174:37010] AH01084: pass request body failed to 127.0.0.1:8080 (localhost)
[Wed Mar 27 09:56:54.238106 2019] [proxy_http:error] [pid 18753:tid 140701048239872] [client 75.86.138.174:37010] AH01097: pass request body failed to 127.0.0.1:8080 (localhost) from 75.86.138.174 ()
```
To reproduce:
```
from ligo.gracedb.rest import GraceDb
g = GraceDb('https://gracedb-dev2.ligo.org/api/')
g.replaceEvent('T0497', './ligo/gracedb/test/integration/data/big.data')
```
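The AH01084/AH01097 "pass request body failed" errors above come from mod_proxy when the backend (gunicorn on 127.0.0.1:8080) resets the connection while Apache is still streaming the request body. One knob worth experimenting with, as an assumption rather than a confirmed fix, is the `proxy-sendcl` environment variable, which makes `mod_proxy_http` spool the body and send a `Content-Length` header instead of chunked transfer encoding; gunicorn's worker timeout settings are worth checking as well. A hedged sketch of the Apache config (paths are illustrative):

```apache
# Hedged diagnostic tweak, not a confirmed fix: force mod_proxy_http to
# spool the request body and send Content-Length rather than chunked
# transfer encoding, which some backends handle poorly.
<Location "/api/">
    SetEnv proxy-sendcl 1
    ProxyPass "http://127.0.0.1:8080/api/"
    ProxyPassReverse "http://127.0.0.1:8080/api/"
</Location>
```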
I tried uploading the same file attached to a log message and it worked fine. The difference between the "replace event" request and the log upload is PUT vs. POST.

## Bad encoding of file download URLs
https://git.ligo.org/computing/gracedb/server/-/issues/82 · Tanner Prestegard · 2022-08-03

Links in the log messages on event detail pages (and probably on superevent pages, too) don't work properly when the file names contain special characters like '#'. They are OK on the event file list page, probably because the `url` template tag encodes them properly.
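A raw '#' in a URL starts the fragment, so everything after it never reaches the server. As a minimal sketch of the client-side fix (the helper name and URL layout here are hypothetical, not the actual client code), the file name can be percent-encoded with Python's `urllib.parse.quote`:

```python
from urllib.parse import quote

def file_download_url(base, filename):
    """Build a download URL, percent-encoding special characters
    ('#', '?', spaces, ...) in the file name so they are not
    misinterpreted as URL syntax.  Hypothetical helper, not the
    actual GraceDB client code."""
    return base.rstrip('/') + '/files/' + quote(filename, safe='')

print(file_download_url('https://gracedb-dev2.ligo.org/api/events/T0497',
                        'strain #1.png'))
# ...events/T0497/files/strain%20%231.png
```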
We can probably use `encodeURIComponent()` in JavaScript to fix the issue in the log messages. We should check the file lists for superevents too. And we will have to patch the client to encode the URL properly in the `files()` method.

## IntegrityError for control room middleware
https://git.ligo.org/computing/gracedb/server/-/issues/69 · Tanner Prestegard · 2019-04-22

The middleware that adds/removes users from the control room group has suddenly started throwing IntegrityErrors. I first noticed it today when trying to test web signoffs on gracedb-dev2, and later I got notifications when a user was trying to GET a file on gracedb-playground. I noted that the user's REMOTE_ADDR corresponded to the H1 control room.
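One plausible source of these IntegrityErrors, sketched below as an assumption rather than a diagnosis of the actual middleware, is a race where two concurrent requests both try to add the same user to the group and the second insert violates a uniqueness constraint. The sketch uses sqlite3 in place of Django's ORM; catching the error and treating the duplicate as success makes the operation idempotent:

```python
# Hedged sketch (not the actual GraceDB middleware) of the suspected race:
# two requests insert the same (user, group) row, and the second INSERT
# violates a uniqueness constraint, raising IntegrityError.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership (username TEXT, grp TEXT, UNIQUE(username, grp))"
)

def add_to_group(username, grp):
    try:
        conn.execute("INSERT INTO membership VALUES (?, ?)", (username, grp))
    except sqlite3.IntegrityError:
        # Row already exists (e.g. inserted by a concurrent request):
        # treat as success instead of letting the request fail.
        pass

add_to_group("albert.einstein", "H1")
add_to_group("albert.einstein", "H1")  # duplicate: swallowed, no error
count = conn.execute("SELECT COUNT(*) FROM membership").fetchone()[0]
print(count)  # 1
```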
Potentially, this could be fixed by the new auth system in development on the auth_update branch, but we would have to test it extensively to be sure. I don't think there is a strong need to fix it before we merge that branch into master, since it occurs so rarely and the move to the new branch should happen in the near future.

[error_email.log](/uploads/b8b2ceb7c413139a272ac24828cd410b/error_email.log)

## Intermittent server gateway timeouts
https://git.ligo.org/computing/gracedb/server/-/issues/31 · Tanner Prestegard · 2018-08-20

Created on April 5, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5418).
Several follow-up processes (approval processor, event supervisor, probably others) and one search pipeline (cWB) have reported receiving 504 gateway timeout errors when attempting to write log messages to GraceDB. Peter S reports that it happens for approval processor about once per day. It seems as though the log is still written, but the correct response is not sent, as the connection hangs for 2 minutes, then terminates.
The server also accumulates lingering threads owned by the wsgi_daemon user over time. It's not clear if these two issues are related.
I upgraded the production server to mod_wsgi 4.5.11 on 28 Mar 2017 in the hopes that it would take care of the lingering threads (tests on the development servers indicated that it cleared up lingering threads caused by overloading the server with log write processes), but it hasn't.
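Pending a server-side fix, callers can mitigate the hangs on the client side with a timeout and a bounded retry. The sketch below is a generic helper under stated assumptions, not part of the GraceDB client; note the caveat from the report above: since the log apparently *is* written before the connection hangs, a blind retry can duplicate the log message, so real code should make the write idempotent or check for the entry first.

```python
# Hedged client-side mitigation sketch: bound the wait and retry a
# failed call a limited number of times instead of hanging for the
# full two-minute gateway timeout on every attempt.
import time

def with_retries(func, retries=2, delay=0.0, retry_on=(TimeoutError,)):
    """Call func(), retrying up to `retries` more times on the given errors."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            time.sleep(delay)

# Stub standing in for a log-write that times out twice, then succeeds.
calls = {"n": 0}
def flaky_write_log():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("504 Gateway Timeout")  # simulated hang/timeout
    return "log written"

print(with_retries(flaky_write_log))  # log written
```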