Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in
G
gracedb
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 100
    • Issues 100
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 1
    • Merge Requests 1
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Operations
    • Operations
    • Incidents
    • Environments
  • Packages & Registries
    • Packages & Registries
    • Container Registry
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Members
    • Members
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • lscsoft
  • gracedb
  • Issues
  • #190

Closed
Open
Opened Jan 27, 2020 by Alexander Pace@alexander.paceMaintainer

Indexing guardian_groupobjectpermission table

Documenting some stuff here so I don't forget it.

The guardian_groupobjectpermission table is a huge mess. Basically it controls access to-and-by-whom for nearly every transaction on GraceDB. Not surprisingly, this is a big table compared to the rest of the db and apparently it's extremely inefficient at doing lookups.

For instance, here's a line showing that there's a primary key lookup that compares a string (object_pk) to an integer value (model.id) to grab a GroupObjectPermission. So instead of doing any sort efficient indexing, it's looping through the whole permissions table. So that's got to be improved.

There's other room for improvement too. After years of uploading and deleting stuff, there are countless "orphaned" object permissions littering the db (https://django-guardian.readthedocs.io/en/stable/userguide/caveats.html#orphaned-object-permissions). I went through and I deleted orphaned permissions on the development and playground boxes (didn't touch production yet), and this is what I found:

Instance Number of Orphaned Objects Time to remove
gracedb-dev2 4 O(instantly)
gracedb-dev 371446 2400s
gracedb-test 308 665s
gracedb-playground O(300,000) 4400s
gracedb-production 479899 7440s

Definitely slow. I noticed some strange IOPs limits too, but I'll make a new ticket with that.

I went into the dev and test machines and created an index on object_pk to see if that would make a difference, and there were definite performance improvements so far. For my own reference, this was just a matter of running:

MariaDB [gracedb]> create index object_pk_idx on guardian_groupobjectpermission (object_pk);

I timed how long it took to perform that operation along with the before/after times of the client integration test (to make sure nothing broke) and the results are:

Instance Size of guardian_groupobjectpermission Table Test Time Before Indexing Test Time After Indexing With python-requests
gracedb-dev2 3103 264.69 256.20 176.17
gracedb-test 472005 309.72 245.93 175.18
gracedb-playground 1249035 408.66 261.96s 168.57
gracedb-production 1240519 385.85 297.86s 211.11s

So there's a definite improvement. I'm going to do same thing to playground during maintenance tomorrow, and if everyone's happy then I'll push it to production the week following.

Edit: Added gracedb-playground before/after testing data after indexing the permissions table.

Edited Feb 07, 2020 by Alexander Pace
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
None
Due date
None
Reference: lscsoft/gracedb#190