# GraceDB Server issues
https://git.ligo.org/computing/gracedb/server/-/issues

## Consider increasing the configuration parameter "max_wal_size"
Alexander Pace (2023-07-28)
https://git.ligo.org/computing/gracedb/server/-/issues/323

There were some timeouts on `gracedb-playground` this afternoon (2023-07-28), from around 18:40-18:43 UTC, that I think were triggered at least in part by a `VACUUM FULL` I ran while doing some exploratory maintenance on playground's db. During the period in question, `gracedb-playground`'s RDS logs contained the following lines:
```
2023-07-28 18:35:50 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:12 UTC::@:[393]:LOG: checkpoint complete: wrote 39902 buffers (16.5%); 0 WAL file(s) added, 0 removed, 16 recycled; write=20.183 s, sync=1.326 s, total=21.691 s; sync files=211, longest=1.323 s, average=0.007 s; distance=1048579 kB, estimate=1048579 kB
2023-07-28 18:36:13 UTC::@:[393]:LOG: checkpoints are occurring too frequently (23 seconds apart)
2023-07-28 18:36:13 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:36:13 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:36:39 UTC::@:[393]:LOG: checkpoint complete: wrote 231 buffers (0.1%); 0 WAL file(s) added, 0 removed, 13 recycled; write=25.661 s, sync=0.420 s, total=26.123 s; sync files=112, longest=0.399 s, average=0.004 s; distance=1048586 kB, estimate=1048586 kB
2023-07-28 18:36:49 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:14 UTC::@:[393]:LOG: checkpoint complete: wrote 2019 buffers (0.8%); 0 WAL file(s) added, 2 removed, 17 recycled; write=24.321 s, sync=0.191 s, total=25.505 s; sync files=138, longest=0.190 s, average=0.002 s; distance=1049475 kB, estimate=1049475 kB
2023-07-28 18:37:17 UTC::@:[393]:LOG: checkpoints are occurring too frequently (28 seconds apart)
2023-07-28 18:37:17 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:37:17 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:24 UTC::@:[393]:LOG: checkpoint complete: wrote 69 buffers (0.0%); 0 WAL file(s) added, 0 removed, 10 recycled; write=6.996 s, sync=0.342 s, total=7.539 s; sync files=34, longest=0.342 s, average=0.011 s; distance=1065103 kB, estimate=1065103 kB
2023-07-28 18:37:30 UTC::@:[393]:LOG: checkpoints are occurring too frequently (13 seconds apart)
2023-07-28 18:37:30 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:37:30 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:37:33 UTC::@:[393]:LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 9 recycled; write=0.480 s, sync=0.190 s, total=2.933 s; sync files=4, longest=0.190 s, average=0.048 s; distance=1056458 kB, estimate=1064239 kB
2023-07-28 18:38:33 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:38:49 UTC::@:[393]:LOG: checkpoint complete: wrote 171 buffers (0.1%); 0 WAL file(s) added, 0 removed, 19 recycled; write=15.533 s, sync=0.120 s, total=16.420 s; sync files=89, longest=0.120 s, average=0.002 s; distance=1034294 kB, estimate=1061244 kB
2023-07-28 18:39:19 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:39:36 UTC::@:[393]:LOG: checkpoint complete: wrote 171 buffers (0.1%); 0 WAL file(s) added, 0 removed, 14 recycled; write=17.051 s, sync=0.006 s, total=17.104 s; sync files=94, longest=0.006 s, average=0.001 s; distance=1063328 kB, estimate=1063328 kB
2023-07-28 18:40:59 UTC::@:[393]:LOG: checkpoint complete: wrote 517 buffers (0.2%); 0 WAL file(s) added, 11 removed, 17 recycled; write=28.949 s, sync=0.112 s, total=29.842 s; sync files=181, longest=0.111 s, average=0.001 s; distance=1040638 kB, estimate=1061059 kB
2023-07-28 18:41:00 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:41:11 UTC::@:[393]:LOG: checkpoint complete: wrote 118 buffers (0.0%); 0 WAL file(s) added, 0 removed, 14 recycled; write=10.732 s, sync=0.280 s, total=11.601 s; sync files=47, longest=0.280 s, average=0.006 s; distance=1084223 kB, estimate=1084223 kB
2023-07-28 18:41:14 UTC::@:[393]:LOG: checkpoints are occurring too frequently (14 seconds apart)
2023-07-28 18:41:14 UTC::@:[393]:HINT: Consider increasing the configuration parameter "max_wal_size".
2023-07-28 18:41:14 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:41:16 UTC::@:[393]:LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 5 recycled; write=1.227 s, sync=0.054 s, total=2.786 s; sync files=2, longest=0.054 s, average=0.027 s; distance=1037553 kB, estimate=1079556 kB
2023-07-28 18:42:12 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:42:16 UTC::@:[393]:LOG: checkpoint complete: wrote 34 buffers (0.0%); 0 WAL file(s) added, 0 removed, 18 recycled; write=3.448 s, sync=0.090 s, total=3.948 s; sync files=22, longest=0.090 s, average=0.005 s; distance=1012093 kB, estimate=1072810 kB
2023-07-28 18:43:39 UTC::@:[393]:LOG: checkpoint starting: wal
2023-07-28 18:43:41 UTC::@:[393]:LOG: checkpoint complete: wrote 11 buffers (0.0%); 0 WAL file(s) added, 0 removed, 16 recycled; write=1.116 s, sync=0.181 s, total=2.198 s; sync files=8, longest=0.181 s, average=0.023 s; distance=1103069 kB, estimate=1103069 kB
```
This also occurred during a period of high load on the relational database:
![Screen_Shot_2023-07-28_at_3.13.56_PM](/uploads/95a62730a64f5a8d0c75d39d8c809705/Screen_Shot_2023-07-28_at_3.13.56_PM.png)
I haven't seen these hints and warnings on production, even when the database gets `VACUUM`'ed, so hopefully this can be chalked up to another example of playground's growing pains. Either way, it's worth considering some of the recommendations the internet has to offer:
* https://www.crunchydata.com/blog/tuning-your-postgres-database-for-high-write-loads
* https://www.enterprisedb.com/blog/tuning-maxwalsize-postgresql
* https://stackoverflow.com/questions/75134262/why-do-i-have-the-message-max-wal-size-suddenly-appearing-in-my-postgres-logs
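For concreteness, the articles above all center on one knob. A sketch of what inspecting and raising it looks like in plain SQL (the 4GB value is illustrative and needs benchmarking; on RDS the change would actually go through the DB parameter group rather than `ALTER SYSTEM`):
```
-- Current value; the default of 1GB matches the ~1048576 kB checkpoint
-- distances in the log excerpt above.
SHOW max_wal_size;

-- How often checkpoints were forced by WAL volume vs. the timeout:
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

-- Raise the ceiling so bulk operations like VACUUM FULL don't force
-- back-to-back checkpoints (value illustrative, not validated):
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();
```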
Once those parameters are tuned and validated in the `gracedb-postgresql-dev` parameter group, apply them to production.

## Potential character set issue
Tanner Prestegard (2019-09-20)
https://git.ligo.org/computing/gracedb/server/-/issues/174

The development and playground databases should have the correct character sets and collations due to how they were created by Puppet. But the production database was created so long ago that it looks like it has the `latin1` character set by default.
It's not posing a problem at present since we have a migration which manually sets the `auth_user` table to use utf8, but I think it would be a good idea to set the database default character set and collation when an opportunity arises.
We'll have to get MySQL command-line access to the production database, then run:
```
ALTER DATABASE <dbname> CHARACTER SET utf8 COLLATE utf8_general_ci;
```
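Before (and after) running that, it's worth confirming the current defaults; a sketch, keeping the `<dbname>` placeholder from above:
```
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = '<dbname>';
```
Note that `ALTER DATABASE ... CHARACTER SET` only changes the default for tables created afterwards; existing tables keep their character set unless altered individually.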
Might be worth testing this (I haven't) and taking a snapshot before doing so.

## Old database tables
Tanner Prestegard (2019-09-11)
https://git.ligo.org/computing/gracedb/server/-/issues/159

There are a number of tables in the production database which appear to be completely unused by the Django application:
* coinc_definer
* coinc_event
* coinc_event_map
* coinc_inspiral
* events_approval
* experiment
* experiment_map
* experiment_summary
* ligolwids
* multi_burst
* process
* process_params
* search_summary
* search_summvars
* sngl_inspiral
* south_migrationhistory
* time_slide
Unsure about the `auth_message` table: maybe used for messages (?). Doubtful.
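To see at a glance whether any of these actually hold rows, something like the following should work (a sketch; the schema name `gracedb` is assumed, and the `IN` list is abbreviated to a few of the tables above):
```
-- TABLE_ROWS is exact for MyISAM tables, which is what we use.
SELECT TABLE_NAME, TABLE_ROWS, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'gracedb'
  AND TABLE_NAME IN ('coinc_definer', 'coinc_event', 'process', 'time_slide');
```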
These tables should be briefly checked for their contents and then likely just deleted from the MariaDB interface with `DROP TABLE tbl_name;`.

## Add VOEvent columns
Tanner Prestegard (2019-07-11)
https://git.ligo.org/computing/gracedb/server/-/issues/140

There are a variety of VOEvent attributes which are required from the user to create a VOEvent and are included in the resulting files, but are **not** saved in the VOEvent database table.
We should
a) Fix this by adding the columns
b) Try to retroactively update VOEvents from their corresponding files

## Change database backend from MySQL (MariaDB) to PostgreSQL
Tanner Prestegard (2022-02-01)
https://git.ligo.org/computing/gracedb/server/-/issues/52

Created on November 16, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5986).
PostgreSQL seems to be the preferred RDBMS for much of the Django community. A comparison is here: https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems.
Making the switch would likely be complicated. But if we can make the argument that it would significantly improve performance (speed and/or reliability), it may be worth it.
One point of note: the abstract base `AutoIncrementModel` would need to be reworked, since it executes raw MySQL queries which likely wouldn't be compatible.

(Milestone: O4 Infrastructure Improvements; assignee: Duncan Meacher)

## Change table engine for database
Tanner Prestegard (2022-08-03)
https://git.ligo.org/computing/gracedb/server/-/issues/51

Created on November 16, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5984).
Starting this issue just to have a place to record some research, notes, etc. on our database configuration. Currently, we use MySQL (effectively; moving to MariaDB shortly, in Debian 9) and we use MyISAM as a table engine. My understanding is that MyISAM is antiquated and really only around for legacy support. Note that MariaDB offers an updated alternative to MyISAM called Aria. But most things I've read seem to indicate that InnoDB is highly preferred and is a good general-purpose engine. The main issues I see with MyISAM are that it doesn't have transactions (which I would think are very important) and that it technically doesn't have foreign keys, though Django seems to handle that sufficiently well on its own.
However, I note that Branson specifically switched from InnoDB to MyISAM in the following commit from 2014:
```
commit 2d4b3c483a4e40af8ff3ef3336bd9272b28ef01a
Author: Branson Stephens <branson.stephens@ligo.org>
Date: Sat Sep 27 16:10:37 2014 -0500
Committed changes to avoid new tables using INNODB and old ones using MYISAM. This leads to fk constraint errors.
```
So do we really want to go to InnoDB? I ran a few tests and noted that just running the initial migrations for a new 'gracedb' database took ~ 2 mins with InnoDB, compared to 10 secs with MyISAM. So speed might be an issue (or maybe there are just some knobs to tweak which would help). I don't think this is a top priority so I'm just jotting down some notes here for future reference. Note that these are based on brief research and are definitely not exhaustive.
InnoDB benefits:
* ACID transactions
* Row-level locking (only table-level with MyISAM)
* Foreign key constraints (technically don't exist in MyISAM)
* Automatic crash recovery
* Table compression (read/write)
* Spatial data types and indexes
MyISAM benefits:
* Fast COUNT(*) (when WHERE, GROUP BY, or JOIN is not used)
* Smaller disk footprint
* Very high table compression (read only)
If we end up wanting to switch to InnoDB, we need to do two things:
* Remove the line in settings/base.py which forces new tables to use MyISAM ('init_command': "SET storage_engine=MyISAM")
* Convert all tables from MyISAM to InnoDB:
```
SELECT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME,' ENGINE=InnoDB;') FROM Information_schema.TABLES WHERE TABLE_SCHEMA = 'gracedb' AND ENGINE = 'MyISAM' AND TABLE_TYPE = 'BASE TABLE';
```
Note that this SQL command hasn't been tested and definitely should be!

## MySQL strict mode
Tanner Prestegard (2019-04-22)
https://git.ligo.org/computing/gracedb/server/-/issues/50

Created November 16, 2017. Copied from redmine (https://bugs.ligo.org/redmine/issues/5985).
After upgrading to Django 1.11, you get the following warning when doing migrations:
```
?: (mysql.W002) MySQL Strict Mode is not set for database connection 'default'
HINT: MySQL's Strict Mode fixes many data integrity problems in MySQL, such as data truncation upon insertion, by escalating warnings into errors. It is strongly recommended you activate it. See: https://docs.djangoproject.com/en/1.11/ref/databases/#mysql-sql-mode
```
The link suggests adding one of the following to your DATABASES OPTIONS:
```
'init_command': "SET sql_mode='STRICT_TRANS_TABLES'"
'init_command': "SET sql_mode='STRICT_ALL_TABLES'"
```
Both of these can lead to issues with the MyISAM engine (see https://www.noelherrick.com/blog/mysql-strict_all_tables-vs-strict_trans_tables). I think we're going to ignore this warning until we potentially update to InnoDB.
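For future reference, if we do enable it after an InnoDB switch, the change is a one-liner inside the `DATABASES` block of the Django settings. A minimal sketch (the `settings/base.py` location follows the file mentioned in the table-engine issue above; `NAME` is illustrative):
```
# Sketch of the settings fragment; STRICT_TRANS_TABLES is the milder of
# the two modes, erroring only for transactional (InnoDB) tables.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'gracedb',  # illustrative
        'OPTIONS': {
            'init_command': "SET sql_mode='STRICT_TRANS_TABLES'",
        },
    },
}
```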