Bilby storage use
It looks like Bilby is very storage hungry, especially on playground due to the larger volume of high-significance triggers. Hopefully, this is a good place to discuss this.
I think a good solution would be to have gwcelery automatically clean up many of the files when the task completes, plus a cron task that clears out data for failed jobs and for files we want to retain slightly longer.
For demonstration, here is a listing for a random event that ran to completion.
$ ls -lah /home/emfollow-playground/.cache/bilby/S230508am/production/*
-rw-r--r-- 1 emfollow-playground emfollow-playground 4.2K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/bilby_config.ini
-rw-r--r-- 1 emfollow-playground emfollow-playground 480K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/coinc.xml
-rw-r--r-- 1 emfollow-playground emfollow-playground 16K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/event.json
-rw-r--r-- 1 emfollow-playground emfollow-playground 6.7K May 7 21:56 /home/emfollow-playground/.cache/bilby/S230508am/production/G1044531_config_complete.ini
-rw-r--r-- 1 emfollow-playground emfollow-playground 201K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/H1_psd.txt
-rw-r--r-- 1 emfollow-playground emfollow-playground 201K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/L1_psd.txt
-rw-r--r-- 1 emfollow-playground emfollow-playground 1.9K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/likelihood_mode.json
-rw-r--r-- 1 emfollow-playground emfollow-playground 962 May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/online.prior
-rw-r--r-- 1 emfollow-playground emfollow-playground 386 May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/settings.json
-rw-r--r-- 1 emfollow-playground emfollow-playground 201K May 7 21:55 /home/emfollow-playground/.cache/bilby/S230508am/production/V1_psd.txt
/home/emfollow-playground/.cache/bilby/S230508am/production/data:
total 3.5G
drwxr-xr-x 2 emfollow-playground emfollow-playground 9 May 8 06:14 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 3.5G May 7 22:09 G1044531_data0_1367556606-3536878_generation_data_dump.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 52K May 7 21:57 H1_G1044531_data0_1367556606-3536878_generation_frequency_domain_data.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 114K May 7 21:57 H1_G1044531_data0_1367556606-3536878_generation_whitened_data.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 46K May 7 21:57 L1_G1044531_data0_1367556606-3536878_generation_frequency_domain_data.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 95K May 7 21:57 L1_G1044531_data0_1367556606-3536878_generation_whitened_data.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 53K May 7 21:57 V1_G1044531_data0_1367556606-3536878_generation_frequency_domain_data.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 124K May 7 21:57 V1_G1044531_data0_1367556606-3536878_generation_whitened_data.png
/home/emfollow-playground/.cache/bilby/S230508am/production/final_result:
total 7.7M
drwxr-xr-x 2 emfollow-playground emfollow-playground 4 May 7 23:57 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 387K May 7 23:57 Bilby.posterior_samples.hdf5
-rw-r--r-- 1 emfollow-playground emfollow-playground 9.6M May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_result.hdf5
/home/emfollow-playground/.cache/bilby/S230508am/production/log_data_analysis:
total 89K
drwxr-xr-x 2 emfollow-playground emfollow-playground 14 May 7 23:48 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 57 May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge.err
-rw-r--r-- 1 emfollow-playground emfollow-playground 0 May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_final_result.err
-rw-r--r-- 1 emfollow-playground emfollow-playground 1.4K May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_final_result.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 0 May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_final_result.out
-rw-r--r-- 1 emfollow-playground emfollow-playground 1.4K May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 0 May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge.out
-rw-r--r-- 1 emfollow-playground emfollow-playground 26K May 7 23:47 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0.err
-rw-r--r-- 1 emfollow-playground emfollow-playground 3.7K May 7 23:47 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 21K May 7 23:47 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0.out
-rw-r--r-- 1 emfollow-playground emfollow-playground 26K May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1.err
-rw-r--r-- 1 emfollow-playground emfollow-playground 3.9K May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 19K May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1.out
/home/emfollow-playground/.cache/bilby/S230508am/production/log_data_generation:
total 17K
drwxr-xr-x 2 emfollow-playground emfollow-playground 5 May 7 21:57 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 12K May 7 22:09 G1044531_data0_1367556606-3536878_generation.err
-rw-r--r-- 1 emfollow-playground emfollow-playground 1.8K May 7 22:09 G1044531_data0_1367556606-3536878_generation.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 0 May 7 21:57 G1044531_data0_1367556606-3536878_generation.out
/home/emfollow-playground/.cache/bilby/S230508am/production/log_results_page:
total 2.0K
drwxr-xr-x 2 emfollow-playground emfollow-playground 2 May 7 21:57 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
/home/emfollow-playground/.cache/bilby/S230508am/production/result:
total 169M
drwxr-xr-x 2 emfollow-playground emfollow-playground 19 May 7 23:48 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 9.6M May 7 23:48 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_result.hdf5
-rw-r--r-- 1 emfollow-playground emfollow-playground 101K May 7 23:46 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_checkpoint_run.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 72K May 7 23:46 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_checkpoint_stats.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 11M May 7 23:45 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_checkpoint_trace.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 18M May 7 23:46 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_checkpoint_trace_unit.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 18M May 7 23:46 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_dynesty.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 2.8M May 7 23:47 .G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_generate_posterior_cache.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 16M May 7 23:47 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_result.hdf5
-rw-r--r-- 1 emfollow-playground emfollow-playground 20M May 7 23:45 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0_resume.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 100K May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_checkpoint_run.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 73K May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_checkpoint_stats.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 11M May 7 23:39 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_checkpoint_trace.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 18M May 7 23:39 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_checkpoint_trace_unit.png
-rw-r--r-- 1 emfollow-playground emfollow-playground 19M May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_dynesty.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 2.8M May 7 23:40 .G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_generate_posterior_cache.pickle
-rw-r--r-- 1 emfollow-playground emfollow-playground 16M May 7 23:40 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_result.hdf5
-rw-r--r-- 1 emfollow-playground emfollow-playground 21M May 7 23:38 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1_resume.pickle
/home/emfollow-playground/.cache/bilby/S230508am/production/submit:
total 91K
drwxr-xr-x 2 emfollow-playground emfollow-playground 15 May 7 23:48 .
drwxr-xr-x 9 emfollow-playground emfollow-playground 19 May 7 21:57 ..
-rw-r--r-- 1 emfollow-playground emfollow-playground 2.6K May 7 21:56 bash_G1044531.sh
-rw-r--r-- 1 emfollow-playground emfollow-playground 3.3K May 7 21:56 dag_G1044531.submit
-rw-r--r-- 1 emfollow-playground emfollow-playground 1.9K May 7 21:57 dag_G1044531.submit.condor.sub
-rw-r--r-- 1 emfollow-playground emfollow-playground 40K May 7 23:48 dag_G1044531.submit.dagman.out
-rw------- 1 emfollow-playground emfollow-playground 0 May 7 21:57 dag_G1044531.submit.lib.err
-rw------- 1 emfollow-playground emfollow-playground 29 May 7 23:48 dag_G1044531.submit.lib.out
-rw-r--r-- 1 emfollow-playground emfollow-playground 544 May 7 23:48 dag_G1044531.submit.metrics
-rw-r--r-- 1 emfollow-playground emfollow-playground 6.1K May 7 23:48 dag_G1044531.submit.nodes.log
-rw-r--r-- 1 emfollow-playground emfollow-playground 851 May 7 21:56 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge_final_result.submit
-rw-r--r-- 1 emfollow-playground emfollow-playground 813 May 7 21:56 G1044531_data0_1367556606-3536878_analysis_H1L1V1_merge.submit
-rw-r--r-- 1 emfollow-playground emfollow-playground 862 May 7 21:56 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par0.submit
-rw-r--r-- 1 emfollow-playground emfollow-playground 862 May 7 21:56 G1044531_data0_1367556606-3536878_analysis_H1L1V1_par1.submit
-rw-r--r-- 1 emfollow-playground emfollow-playground 796 May 7 21:56 G1044531_data0_1367556606-3536878_generation.submit
- The files in the base run directory are useful for setting up follow-up jobs, but can be trivially recovered using the commands that gwcelery called, so I don't think we need to keep these for long.
- The most important files are in the `final_result` directory and I'd like for these to be retained long enough for them to be copied to a PE namespace.
- The log files are useful to retain for a while after the job is complete and they are also fairly lightweight. Can we keep those for successful jobs for ~1 week to make sure we have time to parse them?
- For playground jobs, I think we can remove everything in the `result` directory after the job is complete. I'm not sure if we want to retain more for real triggers.
- The data file uses the vast majority of the space. I think that this file can safely be removed after the job has been completed. I should double-check that this file is not used at all by pesummary. Maybe the `gwcelery` task can remove this file after the Bilby task successfully completes?
- I note that an additional ~3.5GB file, `/home/emfollow-playground/.cache/bilby/S230508am/production/data/G1044531_data0_1367556606-3536878_roq_weight_file.hdf5`, was removed, as that file is no longer needed and should be removed from the written files in a subsequent release of `bilby_pipe`.
- We use the data plots to do a spot check for any obvious data quality issues, so I'd like them to be retained at least until the Bilby task has been
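To make the proposal concrete, here is a minimal sketch of what a post-completion cleanup could look like. It is not existing gwcelery code: the function name `clean_completed_run` and the `playground` flag are hypothetical, and the glob pattern is only inferred from the file names in the listing above. It deletes the generation data dump and, for playground runs, the whole `result` directory, while leaving the data plots, the logs and `final_result` untouched. It also assumes pesummary does not need the data dump, which still has to be checked.

```python
import shutil
from pathlib import Path


def clean_completed_run(rundir, playground=True):
    """Remove bulky intermediate products from a completed Bilby run directory.

    Hypothetical helper, not part of gwcelery. Returns the paths it removed.
    """
    rundir = Path(rundir)
    removed = []

    # The generation data dump dominates disk use (3.5G in the listing above).
    # Only the pickle is matched, so the data plots in data/ are kept.
    for dump in sorted((rundir / "data").glob("*_data_dump.pickle")):
        dump.unlink()
        removed.append(dump)

    # For playground runs, drop the per-chain sampler outputs as well.
    # Whether to keep more for real triggers is left open, hence the flag.
    result_dir = rundir / "result"
    if playground and result_dir.is_dir():
        shutil.rmtree(result_dir)
        removed.append(result_dir)

    return removed
```

For the example event this would be called as `clean_completed_run("/home/emfollow-playground/.cache/bilby/S230508am/production")` once the Bilby task has succeeded.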
Another issue with playground jobs is that many of the jobs with the largest data files are also long-running, so many of them are aborted when they hit the timeout setting for the runs. This is another thing to be aware of.
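Aborted runs presumably never reach a completion hook, so their data would have to be picked up by the cron task suggested at the top. Below is a rough sketch of such a sweep. The function name `purge_old_runs` is hypothetical, the one-week default just mirrors the ~1 week suggested for logs, and the `<cache>/<superevent>/<mode>` layout is taken from the listing above. It removes the whole run directory, including `final_result`, so it assumes anything worth keeping has already been copied elsewhere. It defaults to a dry run for that reason.

```python
import shutil
import time
from pathlib import Path


def purge_old_runs(cache_root, max_age_days=7.0, now=None, dry_run=True):
    """Find (and optionally delete) run directories untouched for max_age_days.

    Hypothetical helper for a cron job. Assumes the layout
    <cache_root>/<superevent>/<mode>/..., as in the listing above.
    Returns the list of stale run directories.
    """
    cache_root = Path(cache_root)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400

    stale = []
    for rundir in sorted(cache_root.glob("*/*")):
        if not rundir.is_dir():
            continue
        # Use the newest modification time of anything inside the run
        # directory, so a run that is still writing files is never removed.
        newest = max(
            (p.stat().st_mtime for p in rundir.rglob("*")),
            default=rundir.stat().st_mtime,
        )
        if newest < cutoff:
            stale.append(rundir)
            if not dry_run:
                shutil.rmtree(rundir)
    return stale
```

Running it first with `dry_run=True` against `/home/emfollow-playground/.cache/bilby` would show what would be deleted before anything is actually removed.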
@soichiro.morisaki @deep.chatterjee
Also cc the other PE chairs @aaron.zimmerman and @patricia.schmidt