It's been established here: #249 (comment 689232) that unauthorized queries. Context: there's one call coming from django-guardian called get_objects_for_user that takes in a user, a permission (like "view log"), and a list of objects, and it returns a subset of those objects that a user can actually see. Please see this ticket: #289
I'm going to document the process for making this call faster. I think it's going to be two steps:
Mitigation: reducing the number of objects that this function has to filter. Also see the above ticket.
Optimization: we very well might be calling this function sub-optimally. So after the first step, see what we might be doing wrong.
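For anyone who hasn't used it, here is a toy model of what get_objects_for_user does conceptually. This is only a sketch: ToyUser, ToyLog and toy_get_objects_for_user are made-up names, it ignores global (model-level) permissions entirely, and it is not how django-guardian implements the lookup.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyUser:
    username: str
    groups: frozenset = frozenset()


@dataclass
class ToyLog:
    pk: int
    # Groups holding an object-level "view_log" grant on this log (hypothetical field).
    view_groups: set = field(default_factory=set)


def toy_get_objects_for_user(user, perm, objects):
    """Return the subset of `objects` the user can see through a group grant."""
    if perm != "view_log":
        raise ValueError("this toy model only knows about 'view_log'")
    # An object is visible if at least one of the user's groups has a grant on it.
    return [obj for obj in objects if obj.view_groups & user.groups]


anonymous = ToyUser("AnonymousUser", frozenset({"public_users"}))
internal = ToyUser("some.lvk.user", frozenset({"internal_users"}))

logs = [
    ToyLog(1, {"internal_users"}),
    ToyLog(2, {"internal_users", "public_users"}),
    ToyLog(3, {"internal_users"}),
]

visible_to_anonymous = toy_get_objects_for_user(anonymous, "view_log", logs)
visible_to_internal = toy_get_objects_for_user(internal, "view_log", logs)
print([log.pk for log in visible_to_anonymous])
print([log.pk for log in visible_to_internal])
```

The point is just the shape of the call: a user, a permission, and a set of candidate objects go in, and the visible subset comes out.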
Okay, first: establish a baseline. I'm doing this on the production box because there are lots of exposed objects. Let's look at two users: the AnonymousUser and a normie LVK user (I can't use my own account, since is_superuser bypasses much of Django's permission structure). I'm going to volunteer @patrick.godwin's account as a guinea pig. I'm also going to restrict my permission changes to MDC superevents so I don't interfere with any production data.
As such, I'm going to filter the log objects (and by proxy, files, comments, etc.) for MS230519q, and time it for AnonymousUser and patrick:
    In [1]: from aws_xray_sdk.core import xray_recorder
       ...: xray_recorder.begin_segment("Migration Segment")
    Out[1]: <aws_xray_sdk.core.models.segment.Segment at 0x7fd0648075f8>
    In [2]: from superevents.models import Superevent, Log
    In [3]: from django.contrib.auth.models import User, Group, Permission
    In [4]: an = User.objects.get(username='AnonymousUser')
    In [5]: pg = User.objects.get(username='patrick.godwin@ligo.org')
    In [6]: from guardian.shortcuts import get_objects_for_user
    In [7]: ms = Superevent.get_by_date_id('MS230519q')
    In [8]: ms.log_set.count()
    Out[8]: 183
So that superevent has 183 log objects. Let's confirm that patrick can see all of them, and how long it takes. Note that viewing a log carries with it the superevents.view_log permission:
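For reference, a rough sketch of one way to time a call like this by hand. time_call and summarize are hypothetical helpers, and the commented-out lambda is my assumption about what the call looks like in the Django shell; the stand-in workload below is only there so the snippet runs on its own.

```python
import statistics
import time


def time_call(fn, repeat=20):
    """Call fn() `repeat` times; return (last result, list of timings in ms)."""
    timings_ms = []
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return result, timings_ms


def summarize(timings_ms):
    """Reduce raw timings to min / median / max, all in milliseconds."""
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


# In the Django shell the callable would be something like (assumption):
#   lambda: list(get_objects_for_user(pg, 'superevents.view_log', ms.log_set.all()))
# Wrapping in list() matters: querysets are lazy, so without it no SQL runs.
# Stand-in workload so this sketch is self-contained:
result, timings = time_call(lambda: [n for n in range(183)], repeat=10)
stats = summarize(timings)
print(len(result), stats)
```

Repeating the call and looking at the spread is what gives a range rather than a single number.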
So we're confirming that it takes patrick around 6-9ms to view all 183 logs. Now let's try for the AnonymousUser. First, how many objects have the "public" tag? I ask because the public_users view permission gets added when the tag is added:
    In [14]: ms.log_set.filter(tags__name='public').count()
    Out[14]: 27
Let's confirm that the anonymous user can fetch 27 objects, and how long it takes:
So LVK users should be part of the internal_users group, and AnonymousUser should be part of the public_users group. Let's confirm, then look at their permissions:
    In [18]: an.groups.all()
    Out[18]: <QuerySet [<Group: public_users>]>
    In [19]: pg.groups.all()
    Out[19]: <QuerySet [<Group: internal_users>]>
    In [20]: pu=an.groups.first()
    In [22]: iu=pg.groups.first()
    In [23]: iu.permissions.all()
    Out[23]: <QuerySet [<Permission: superevents | labelling | Can add labelling>, <Permission: superevents | labelling | Can delete labelling>, <Permission: superevents | log | Add tag to log>, <Permission: superevents | log | Remove tag from log>, <Permission: superevents | log | Can view log>, <Permission: superevents | signoff | Can view signoff>, <Permission: superevents | superevent | Can add test superevent>, <Permission: superevents | superevent | Can add log messages and EM observation data to uperevent>, <Permission: superevents | superevent | Can change test superevent>, <Permission: superevents | superevent | Can confirm test superevent as GW>, <Permission: superevents | superevent | Can view superevent>, <Permission: superevents | superevent group object permission | Can view superevent groupobjectpermission>, <Permission: superevents | vo event | Can add vo event>]>
    In [24]: pu.permissions.all()
    Out[24]: <QuerySet []>
Huh, so the public_users group doesn't have any global permissions associated with it. Let's take that as given for now and see what difference it makes. What about the GroupObjectPermissions for a private vs. an exposed log message?
    In [27]: private_log = ms.log_set.last()
    In [28]: private_log.comment
    Out[28]: 'Superevent created with t_start=1368550707.765985, t_0=1368550708.765985, t_end=1368550709.765985, preferred_event=M407459'
    In [29]: private_log.tags.all()
    Out[29]: <QuerySet []>
    In [30]: private_log.loggroupobjectpermission_set.all()
    Out[30]: <QuerySet []>
okay, so the public-facing log DOES have the permission required for the public (and lvem, but that's redundant) to view it. And note that the GroupObjectPermission is (obviously) tied to the public_users group.
I think that, for some reason, instead of starting from the small set of logs for this superevent and checking which of them have the permission, get_objects_for_user might be fetching ALL log objects that have the permission and then filtering from there? That seems backward. But then again, the EXPLAIN ANALYZE output (#249 (comment 689232)) showed it filtering O(100,000) rows, so I'm not sure.
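To make the hypothesis concrete, here is a toy comparison of the two filtering orders. It says nothing about what django-guardian or Postgres actually do; the function names and the fake grant table are made up, and the only point is how many rows each order has to touch.

```python
def filter_candidates_first(candidate_pks, grants, group):
    """Check each candidate against the grant index: touches len(candidates) entries."""
    examined = 0
    visible = []
    for pk in candidate_pks:
        examined += 1
        if (group, pk) in grants:  # set lookup, stands in for an indexed query
            visible.append(pk)
    return visible, examined


def filter_grants_first(candidate_pks, grants, group):
    """Collect every object the group can see, then intersect with the candidates."""
    examined = 0
    granted_pks = set()
    for grant_group, pk in grants:  # walks the whole grant table
        examined += 1
        if grant_group == group:
            granted_pks.add(pk)
    visible = [pk for pk in candidate_pks if pk in granted_pks]
    return visible, examined


# Fake data: 100,000 logs overall, every 7th one exposed to public_users.
grants = {("public_users", pk) for pk in range(100_000) if pk % 7 == 0}
# The 183 logs belonging to one superevent (pks are arbitrary).
candidates = list(range(50_000, 50_183))

visible_a, examined_a = filter_candidates_first(candidates, grants, "public_users")
visible_b, examined_b = filter_grants_first(candidates, grants, "public_users")
print(len(visible_a), examined_a, examined_b)
```

Both orders return the same logs, but the second one scales with the total number of exposed objects rather than with the size of the superevent, which is why trimming old grants should help if that is what's happening.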
Continuing step 1: mitigation. What happens if I remove the public permissions from old log messages? So basically, hiding old (older than one month?) annotations.
yeesh, okay. so i'm going to loop through and remove the permissions from all of those objects. might take a while so i'm running in a screen. hold on....
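A rough sketch of the selection logic for that loop, using plain Python objects so it runs anywhere. ToyLog, stale_public_logs and hide_logs are hypothetical names, the 30-day cutoff is just the "one month?" guess from above, and the revoke callback stands in for whatever actually removes the permission.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ToyLog:
    pk: int
    created: datetime
    public: bool  # stands in for "public_users has a view grant on this log"


def stale_public_logs(logs, now, max_age=timedelta(days=30)):
    """Return public logs created before now - max_age."""
    cutoff = now - max_age
    return [log for log in logs if log.public and log.created < cutoff]


def hide_logs(logs, revoke):
    """Call revoke(log) on each log and return how many were processed."""
    count = 0
    for log in logs:
        revoke(log)
        count += 1
    return count


def toy_revoke(log):
    # In the real shell this would presumably be guardian's remove_perm, e.g.
    #   remove_perm('view_log', public_group, log)   # assumption, not verified here
    log.public = False


now = datetime(2023, 7, 1, tzinfo=timezone.utc)
logs = [
    ToyLog(1, datetime(2023, 5, 19, tzinfo=timezone.utc), public=True),
    ToyLog(2, datetime(2023, 6, 25, tzinfo=timezone.utc), public=True),
    ToyLog(3, datetime(2023, 5, 19, tzinfo=timezone.utc), public=False),
]

stale = stale_public_logs(logs, now)
revoked_count = hide_logs(stale, toy_revoke)
print([log.pk for log in stale], revoked_count)
```

Keeping the selection separate from the revoke step makes it easy to count the affected objects first before changing anything.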