Indexing guardian_groupobjectpermission table
Documenting some stuff here so I don't forget it.
The guardian_groupobjectpermission
table is a huge mess. Basically it controls access to-and-by-whom for nearly every transaction on GraceDB. Not surprisingly, this is a big table compared to the rest of the db and apparently it's extremely inefficient at doing lookups.
For instance, here's a line showing that there's a primary key lookup that compares a string (object_pk
) to an integer value (model.id
) to grab a GroupObjectPermission
. So instead of doing any sort efficient indexing, it's looping through the whole permissions table. So that's got to be improved.
There's other room for improvement too. After years of uploading and deleting stuff, there are countless "orphaned" object permissions littering the db (https://django-guardian.readthedocs.io/en/stable/userguide/caveats.html#orphaned-object-permissions). I went through and I deleted orphaned permissions on the development and playground boxes (didn't touch production yet), and this is what I found:
Instance | Number of Orphaned Objects | Time to remove |
---|---|---|
gracedb-dev2 | 4 | O(instantly) |
gracedb-dev | 371446 | 2400s |
gracedb-test | 308 | 665s |
gracedb-playground | O(300,000) | 4400s |
gracedb-production | 479899 | 7440s |
Definitely slow. I noticed some strange IOPs limits too, but I'll make a new ticket with that.
I went into the dev and test machines and created an index on object_pk
to see if that would make a difference, and there were definite performance improvements so far. For my own reference, this was just a matter of running:
MariaDB [gracedb]> create index object_pk_idx on guardian_groupobjectpermission (object_pk);
I timed how long it took to perform that operation along with the before/after times of the client integration test (to make sure nothing broke) and the results are:
Instance | Size of guardian_groupobjectpermission Table |
Test Time Before Indexing | Test Time After Indexing | With python-requests |
---|---|---|---|---|
gracedb-dev2 | 3103 | 264.69 | 256.20 | 176.17 |
gracedb-test | 472005 | 309.72 | 245.93 | 175.18 |
gracedb-playground | 1249035 | 408.66 | 261.96s | 168.57 |
gracedb-production | 1240519 | 385.85 | 297.86s | 211.11s |
So there's a definite improvement. I'm going to do same thing to playground during maintenance tomorrow, and if everyone's happy then I'll push it to production the week following.
Edit: Added gracedb-playground before/after testing data after indexing the permissions table.