Storage of PE data products
@soichiro.morisaki, @cody.messick and I briefly discussed storage options for PE data products created by emfollow/emfollow-playground.
The current behaviour, as I understand it, is:
- jobs run in ~/.cache/...
- files from successful jobs are removed from the cache after the posterior samples are uploaded to gracedb
- some information is preserved in ~/public_html/online_pe
There are a few things that it would be good to maintain from the PE side.
- the uploaded posterior samples file doesn't contain all of the parameters, just the parameters needed for skymap generation/em bright
- the cache that is deleted contains a lot of useful information; ideally, we would want that stored permanently
- currently, the public HTML pages don't include useful information, as the PESummary task has been removed due to time considerations
Questions:
- can the files written in the cache be stored somewhere more permanent after successful completion?
- If storage space is a concern, it would be good to coordinate moving these files to a different account; some kind of shared PE account should be possible. Could gwcelery handle that?
- If this is too complicated, is there any reason why the files couldn't be directly written to the shared PE account?
- the reason the PESummary task has been disabled is so that the posteriors can be uploaded to gracedb without waiting for the summary pages to be generated. Is it possible to have the pesummary task launch after the posterior samples are uploaded to gracedb?
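To make the first and last questions concrete, here is a rough sketch of the ordering I have in mind: upload first, then move the run directory to permanent storage, then build the summary pages from the archived copy. This is not gwcelery code. `archive_run_dir`, `finish_job`, and the `upload` and `summarize` callables are hypothetical names used only for illustration, and the paths are placeholders.

```python
import shutil
from pathlib import Path
from typing import Callable


def archive_run_dir(run_dir: Path, archive_root: Path) -> Path:
    """Copy a finished run directory to permanent storage, then drop the cache copy."""
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / run_dir.name
    # copytree raises if dest already exists, so an earlier archive is never overwritten.
    shutil.copytree(run_dir, dest)
    # The cache copy is removed only after the archive copy has been written.
    shutil.rmtree(run_dir)
    return dest


def finish_job(
    run_dir: Path,
    archive_root: Path,
    upload: Callable[[Path], None],     # hypothetical: uploads posterior samples
    summarize: Callable[[Path], None],  # hypothetical: builds the summary pages
) -> Path:
    """Run the post-processing steps in the proposed order."""
    upload(run_dir)                     # samples are uploaded first
    archived = archive_run_dir(run_dir, archive_root)
    summarize(archived)                 # summary pages no longer delay the upload
    return archived
```

If the files were written directly to the shared PE account instead, the copy step would disappear and only the ordering of `upload` and `summarize` would matter.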
A final side comment: currently, the information in the cache is retained for jobs that fail. This is a very useful debugging tool, so it would be nice if these files continued to be retained for some amount of time.
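If retaining failed runs indefinitely is a concern, an age-based cleanup might be enough. The sketch below is only an illustration under my own assumptions: `prune_old_run_dirs` is a hypothetical helper, the retention period is a placeholder, and it uses each directory's modification time as a stand-in for when the job failed.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def prune_old_run_dirs(
    cache_root: Path, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Remove run directories under cache_root older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for run_dir in sorted(cache_root.iterdir()):
        if not run_dir.is_dir():
            continue
        # Assumption: the directory mtime approximates when the job last wrote output.
        if run_dir.stat().st_mtime < cutoff:
            shutil.rmtree(run_dir)
            removed.append(run_dir)
    return removed
```

Something like this could run periodically, so that failed runs stay available for debugging for a fixed window before being cleared.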