Proposal to hide exposed hourly MDC superevents on production
Description: Moving into O4, I've been monitoring the load on the production database, and I noticed that the highest load (over two orders of magnitude more CPU usage than other requests) occurs under a very specific circumstance: when an unauthenticated user makes a request to view public data products. For example, when a member of the public views a public superevent page, or a script scrapes for public skymaps.
I traced this down to the SQL that's generated by a django-guardian function called get_objects_for_user. There has to be an underlying bug with GraceDB's public viewexposed permission, but I haven't been able to find it yet.
That being said, there are a couple of Stack Overflow posts and GitHub issues about this function, and this statement rings true to me:
Also, if possible, i suggest you don't use get_objects_for_user shortcut when project gets bigger. Its VERY slow query once you get more objects/permissions in the database.
So why wasn't this an issue before? At the end of O3, there were 80 exposed (public) superevents. That's a trivial number of items from a database standpoint. But in the three years since O3 ended, the hourly first-two-years MDC uploads have been exposed to the public. Multiply 24 daily superevents by three years and all of a sudden....
In [11]: Superevent.objects.filter(is_exposed=True).filter(category='M').count()
Out[11]: 35354
There are over 35,000 exposed MDC superevents, and the count grows by the hour.
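As a rough sanity check on that count, here is a back-of-the-envelope estimate. It assumes exactly one exposed MDC superevent per hour and 365-day years, which are simplifications rather than measured values; the observed count is higher than this estimate, so the real upload rate or period evidently exceeds those assumptions.

```python
# Back-of-the-envelope estimate. The one-upload-per-hour rate and the
# 365-day year are assumptions, not measured values.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
YEARS = 3

estimated = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS

observed = 35354   # count returned by the query above
o3_exposed = 80    # exposed superevents at the end of O3

print("estimated hourly uploads:", estimated)
print("exposed MDC superevents per O3 public superevent:", observed // o3_exposed)
```

Either way, the exposed set is now hundreds of times larger than it was at the end of O3.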
A quick test is to open this file list: https://gracedb.ligo.org/superevents/S200316bj/files/
as an authenticated user (243 ms) and in an incognito window (13.5 s).
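To repeat this comparison after any change, a small timing helper is enough. This is a minimal sketch: `time_call` and `slowdown` are hypothetical helper names, not part of GraceDB, and the fetch callable is left to the caller because the authenticated request needs credentials that are not shown here.

```python
import time
from typing import Callable


def time_call(fetch: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by a single call to fetch()."""
    start = time.perf_counter()
    fetch()
    return time.perf_counter() - start


def slowdown(anonymous_s: float, authenticated_s: float) -> float:
    """Return how many times slower the anonymous request is."""
    return anonymous_s / authenticated_s


# Using the two timings quoted above (13.5 s anonymous, 243 ms authenticated).
ratio = slowdown(13.5, 0.243)
print(f"anonymous request is about {ratio:.0f}x slower")
```

For a live measurement of the anonymous case, something like `time_call(lambda: urllib.request.urlopen(url).read())` could be passed in, after importing `urllib.request`.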
Proposal:
1. Unless there are objections, I'm going to hide the exposed MDC uploads and measure the performance impact.
2. If it works, I'm going to set up a tool to hide all (or a subset..?) of the MDC superevents (which is a band-aid).
3. Figure out what's wrong with the permissions, because finding the bug might have wider-ranging performance implications.
4. Unless there is a desire to keep the test uploads public, modify GWCelery not to expose them. We can revisit this request based on the results of 1-3.
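The hiding tool from the proposal could be structured roughly like this. It is only a sketch under stated assumptions: `hide_in_batches` and `batched` are hypothetical names, and the `hide` callable stands in for whatever GraceDB actually uses to un-expose a superevent, which is not shown here. The `limit` argument covers the "subset" option, so the first run can touch a small number of superevents before committing to all of them.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from `ids`."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def hide_in_batches(
    ids: Iterable[str],
    hide: Callable[[str], None],   # hypothetical: un-exposes one superevent
    batch_size: int = 500,
    limit: Optional[int] = None,   # hide only the first `limit` ids if set
) -> int:
    """Call hide() on each id, batch by batch; return how many were hidden."""
    selected = ids if limit is None else islice(ids, limit)
    hidden = 0
    for chunk in batched(selected, batch_size):
        # A per-batch transaction or pause could go here to limit DB load.
        for superevent_id in chunk:
            hide(superevent_id)
            hidden += 1
    return hidden


# Dry run with a stand-in hide function that only records the ids.
seen: List[str] = []
count = hide_in_batches((f"MS{i}" for i in range(7)), seen.append, batch_size=3, limit=5)
print(count, seen)
```

Passing a recording function as `hide`, as in the dry run, makes it possible to check the selection before anything is changed on production.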