bilby_pipe issues
https://git.ligo.org/lscsoft/bilby_pipe/-/issues
Updated: 2024-03-27T15:36:53Z

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/294
Sampling seed interaction with parallel jobs
Updated: 2024-03-27T15:36:53Z
Author: Michael Williams (michael.williams@ligo.org)

Setting the sampling seed and n-parallel>1 results in all the parallel jobs using the same sampling seed. If the sampler being used is deterministic, then this will result in the parallel jobs producing identical results.
I don't think this impacts any existing analyses, as they do not set the sampling seed and it is instead set at run time.
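One possible way to avoid identical seeds is to offset the configured seed by the index of each parallel job. The sketch below only illustrates that idea; `per_job_seeds` is a hypothetical helper, not a function in bilby_pipe, and the offset scheme is an assumption rather than the project's chosen fix.

```python
def per_job_seeds(sampling_seed, n_parallel):
    """Return one distinct seed per parallel job.

    Hypothetical helper (not part of bilby_pipe): job i receives
    sampling_seed + i, so runs stay reproducible but differ per job.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    return [sampling_seed + index for index in range(n_parallel)]


seeds = per_job_seeds(1234, 2)
print(seeds)  # [1234, 1235]
```

With this scheme the two jobs in the example would log seeds 1234 and 1235 instead of both logging 1234.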
**Example**
See the log files in this directory: https://ldas-jobs.ligo.caltech.edu/~michael.williams/bilby-dev/bilby-pipe/sampling-seed-parallel-jobs/outdir_dynesty_v1.3.0/
Both report:
```plaintext
02:28 bilby_pipe INFO : Sampling seed set to 1234
```
Assignee: Michael Williams (michael.williams@ligo.org)

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/292
Adaptively set distance prior boundary for online PE
Updated: 2024-03-04T15:33:33Z
Author: Colm Talbot (colm.talbot@ligo.org)

One of the main issues we see with online PE is railing in distance for high-mass BBH. We could just extend the prior further, but it would be good to choose the distance based on either:
- the Bayestar skymap
- the PSDs/trigger masses
@soichiro.morisaki
Assignee: Soichiro Morisaki

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/291
Use information from all G events to setup online analyses
Updated: 2024-03-04T15:33:09Z
Author: Colm Talbot (colm.talbot@ligo.org)

Currently, we just use the preferred event in the `bilby_pipe_gracedb` executable to set the prior/duration/etc. There are cases (especially for BBH) where the masses from different search pipelines can differ significantly and also differ significantly from PE results. We may be able to make this process more robust by ingesting masses from all pipelines.
@soichiro.morisaki do you have any additional thoughts?
(This is currently a public-facing issue, so please avoid posting any proprietary information.)
Assignee: Soichiro Morisaki

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/144
Add a gracedb CI test job
Updated: 2023-11-20T16:13:59Z
Author: Gregory Ashton (gregory.ashton@ligo.org)
Assignee: Gregory Ashton (gregory.ashton@ligo.org)

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/286
Generation Node creation breaks some injection jobs
Updated: 2023-11-13T15:37:24Z
Author: Jacob Golomb

[The line to resolve frame files](https://git.ligo.org/lscsoft/bilby_pipe/-/blob/master/bilby_pipe/job_creation/nodes/generation_node.py#L46) gets called even if the job is an injection job, which will not have a data dict, so the job will enter [this loop in resolve_frame_files](https://git.ligo.org/lscsoft/bilby_pipe/-/blob/master/bilby_pipe/job_creation/nodes/generation_node.py#L103). If the injection time does not correspond to a time that is actually during an observing run, the job will fail (because it will try to [grab the frame type for a time during the run that does not exist](https://git.ligo.org/lscsoft/bilby_pipe/-/blob/master/bilby_pipe/utils.py#L945)). This prevents users from running jobs that inject at times outside the valid observing time (e.g., to make projections).
If data is None and the job is an injection job, it can probably just bypass this function anyway.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/283
Can't inject a normal waveform when recovering with an ROQ
Updated: 2023-11-13T15:36:41Z
Author: Jacob Golomb

When injecting a signal and recovering with an ROQ, the injection fails with `KeyError: 'frequency_nodes_linear'`, because the injection `frequency_domain_source_model` defaults to the `frequency_domain_source_model` used for sampling (see [here](https://git.ligo.org/lscsoft/bilby_pipe/-/blob/master/bilby_pipe/data_generation.py#L475)). This can probably be circumvented by passing the ROQ args in `injection-waveform-arguments`, but this still means there is no way to inject with the normal waveform (i.e. `lal_binary_black_hole`) and recover with the ROQ.
A simple fix would be to allow an `injection-frequency-domain-source-model` to be passed as an input. The line after [here](https://git.ligo.org/lscsoft/bilby_pipe/-/blob/master/bilby_pipe/data_generation.py#L469) should then check for the existence of this argument (defaulting to `frequency-domain-source-model` if `injection-frequency-domain-source-model` is not specified) and pass it to the waveform generator that is created.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/285
Test set for making sure essential analyses still work
Updated: 2023-11-13T15:36:19Z
Author: Colm Talbot (colm.talbot@ligo.org)

I want to make a checklist of runs that should be done before making new releases to make sure we don't accidentally break some backwards compatibility.
@daniel-williams do you have any asimov ini files that you can share for this?
- an event from gracedb using `bilby_pipe_gracedb` without any authentication available
- a public event from each observing run with/without authentication
- an event from O4a following up a real trigger, this can be one where we add Virgo data to make sure nothing breaks when adding an event with no PSD/premade frame file.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/274
Merge/final result jobs fail due to pickle dumps
Updated: 2023-09-11T15:30:02Z
Author: Colm Talbot (colm.talbot@ligo.org)

Even with the disabling of HDF locking, we're seeing some lock-related failures.
One dangerous way this can manifest is:
- writing the result file fails and Bilby automatically falls back to a pickle dump to preserve progress.
- the merge and final_result jobs then try to read the HDF5 stub files, which fails.
- this error is caught and replaced with a warning by `bilby_result`.
I'm not sure what the best solution is, but we could:
- raise an error in the merge/final_result to more clearly advertise the issue
- update `bilby_result` to look for a pickle file if the specified file behaves strangely
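The second option could look roughly like the sketch below. It is only an illustration: `load_result_with_fallback` and the injected `loader` callable are hypothetical, and the assumption that the pickle dump sits next to the HDF5 file with a `.pkl` suffix is not confirmed here as the behaviour of `bilby_result`.

```python
from pathlib import Path


def load_result_with_fallback(filename, loader):
    """Load a result, falling back to a sibling pickle file.

    `loader` is any callable that reads a result file and raises on
    failure. Both this function and the `.pkl` naming convention are
    assumptions for illustration, not the `bilby_result` implementation.
    """
    path = Path(filename)
    try:
        return loader(path)
    except Exception as primary_error:
        pickle_path = path.with_suffix(".pkl")
        if pickle_path.exists():
            # The requested file is unreadable but a pickle dump exists.
            return loader(pickle_path)
        # No usable fallback: fail loudly instead of only warning.
        raise RuntimeError(
            f"Could not read {path} and no pickle dump found at {pickle_path}"
        ) from primary_error
```

Raising when no fallback exists also covers the first option, since the merge and final_result jobs would then stop with a clear error rather than a warning.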
@gregory.ashton
Milestone: 1.1.0

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/281
Fix checkpointing exceptions
Updated: 2023-09-11T15:28:40Z
Author: Yannick Lecoeuche

My runs are throwing out the following exception every time they try to checkpoint:
```
Traceback (most recent call last):
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/dynesty/dynesty.py", line 910, in __call__
return self.func(np.asarray(x).copy(), *self.args, **self.kwargs)
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 53, in _log_likelihood_wrapper
return _sampling_convenience_dump.likelihood.log_likelihood_ratio()
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/gw/likelihood/base.py", line 412, in log_likelihood_ratio
log_l = self.compute_log_likelihood_from_snrs(total_snrs)
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/gw/likelihood/base.py", line 432, in compute_log_likelihood_from_snrs
log_l = self.time_marginalized_likelihood(
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/gw/likelihood/base.py", line 761, in time_marginalized_likelihood
log_l_tc_array = self.distance_marginalized_likelihood(
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/gw/likelihood/base.py", line 748, in distance_marginalized_likelihood
return self._interp_dist_margd_loglikelihood(
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/core/utils/calculus.py", line 231, in __call__
output[~bad], ier = bispeu(*self.tck, x[~bad], y[~bad])
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 731, in write_current_state_and_exit
super(Dynesty, self).write_current_state_and_exit(signum=signum, frame=frame)
File "/cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/envs/igwn-py39-20230425/lib/python3.9/site-packages/bilby/core/sampler/base_sampler.py", line 727, in write_current_state_and_exit
sys.exit(self.exit_code)
SystemExit: 77
```
This applies whether or not the runs are converging, so I'm not sure that it's causing any real problems (perhaps slowing things down?). Here is an example log file that shows this exception occurring:
https://ldas-jobs.ligo.caltech.edu/~yannick.lecoeuche/glitch-ylecoeuche/fastscatter2/GW190521/outdir_B/log_data_analysis/label_data5_1240770945-0_analysis_H1L1V1.err

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/282
Add support for requesting scitokens for data generation jobs
Updated: 2023-09-11T13:35:21Z
Author: Duncan Macleod (duncan.macleod@ligo.org)

In O4 the aggregated h(t) datasets for each interferometer will only be available from CVMFS (with some caveats) and will require a SciToken for authorised access.
I think this means that the `GenerationNode` for bilby workflows needs to support requesting the relevant token from HTCondor. As per https://computing.docs.ligo.org/guide/htcondor/credentials/#scitokens, this is done by adding the following lines to a condor submit file:
```
use_oauth_services = igwn
igwn_oauth_permissions = read:/kagra read:/ligo read:/virgo
```
where the `read:/<collaboration>` _scopes_ can be added/removed as necessary depending on the data required by the workflow.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/280
Missing keys in skymap from bilby samples
Updated: 2023-05-25T15:56:48Z
Author: Soichiro Morisaki

I got the following message from a follow-up observer. I am not sure if this is something we can fix from the bilby_pipe side though.
```
Various keywords are missing from the initial and update maps (which are present in the mock events).
Some of them are probably pipeline specific, but here's the diff between preliminary and initial:
INSTRUME
LOGBCI
LOGBSN
MOC
OBJECT
REFERENC
RUNTIME
```

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/276
Online PE jobs failing with duration issues
Updated: 2023-05-09T10:53:15Z
Author: Colm Talbot (colm.talbot@ligo.org)

MDC jobs are occasionally failing the prior duration test, e.g., https://git.ligo.org/colm.talbot/testing/-/issues/2901.
I suspect there's a mismatch somewhere between how the time to merger is calculated in different places. [This](https://git.ligo.org/lscsoft/bilby/-/blob/master/bilby/gw/utils.py#L1044-L1075) is the version in Bilby. @soichiro.morisaki how does this compare with how the prior bounds are chosen?

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/261
Follow-up from "Fix ci"
Updated: 2023-04-12T13:45:03Z
Author: Colm Talbot (colm.talbot@ligo.org)

Remove the pin on the framecpp version from !511

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/265
Remove temporary pin of matplotlib
Updated: 2023-04-12T13:45:03Z
Author: Gregory Ashton (gregory.ashton@ligo.org)

If !515 is merged, we have a temporary pin of matplotlib to resolve a gwpy issue: https://github.com/gwpy/gwpy/pull/1586. This issue is to remind us to remove that pin once there is a gwpy release.
Next gwpy release: https://github.com/gwpy/gwpy/milestone/63

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/272
Don't optimize calibration parameters for relative binning likelihood
Updated: 2023-04-11T10:41:12Z
Author: Colm Talbot (colm.talbot@ligo.org)
Assignee: Colm Talbot (colm.talbot@ligo.org)

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/206
Abuse of CPU resources
Updated: 2023-04-05T14:18:22Z
Author: James Alexander Clark PhD

The `request-cpus` parameter in the ini file can result in `request-cpus**2` cpus being used in condor jobs, leading to accidental abuse of CPU resources.
Observed using the IGWN conda environments with dynesty for local CIT and non-local OSG jobs:
* Condor default behavior sets the environment variable `OMP_NUM_THREADS` to the value of the `request_cpus` directive in the submit file [[1]](https://htcondor.readthedocs.io/en/latest/users-manual/services-for-jobs.html?highlight=NUM_THREADS#extra-environment-variables-htcondor-sets-for-jobs.).
* Bilby/dynesty starts `request_cpus` number of *processes* in a multiprocessing pool.
* When `OMP_NUM_THREADS` is found in the environment, **each** of those processes spawns that many threads (This is a known feature of numpy but I'm yet to confirm which library is actually multithreading)
* Resulting CPU usage: `request_cpus x OMP_NUM_THREADS = request_cpus**2`, instead of the intended value.
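The arithmetic in the list above can be sketched as follows. This is a simplified model for illustration only: `total_threads` is a hypothetical helper, and it assumes every pool process honours `OMP_NUM_THREADS` exactly, which has not been confirmed for the libraries involved.

```python
def total_threads(n_processes, environ):
    """Threads in use if every pool process honours OMP_NUM_THREADS.

    Simplified model for illustration: an unset variable is treated as
    one thread per process, which real libraries may not do.
    """
    threads_per_process = int(environ.get("OMP_NUM_THREADS", "1"))
    return n_processes * threads_per_process


request_cpus = 4
condor_env = {"OMP_NUM_THREADS": str(request_cpus)}  # value HTCondor exports
pinned_env = {**condor_env, "OMP_NUM_THREADS": "1"}  # one thread per process

print(total_threads(request_cpus, condor_env))  # 16
print(total_threads(request_cpus, pinned_env))  # 4
```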
To fix: Bilby should set `OMP_NUM_THREADS` explicitly by including the following line in the submit files for any multi-process jobs:
```ini
environment = "OMP_NUM_THREADS=1"
```
Assignee: Gregory Ashton (gregory.ashton@ligo.org)

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/270
HDF5 saving error (and subsequent error while reading) -- maybe related to the metadata?
Updated: 2023-03-15T11:22:57Z
Author: Avi Vajpeyi (avi.vajpeyi@ligo.org)

The HDF5 encoder sometimes has trouble encoding things (I think specifically things in the metadata):
```
23:08 bilby ERROR :
Saving the data has failed with the following message:
No conversion path for dtype: dtype('<U63')
Data has been dumped to ../../../../../../../private/var/folders/qt/rxjvm_j566v9qn7g754s1v9hzb3p7f/T/pytest-of-avaj0001/pytest-69/test_end_to_end0/result/bbh_injection_result.pkl.
23:08 bilby INFO : Summary of results:
nsamples: 1
ln_noise_evidence: -16419.781
ln_evidence: -16324.042 +/- 1.764
ln_bayes_factor: 95.739 +/- 1.764
```
Example ini:
```ini
accounting = ligo.dev.o3.cbc.pe.lalinference
label = bbh_injection
outdir = outdir_bbh_injection
detectors = [H1, L1]
duration = 4
sampler = dynesty
sampler-kwargs = {'nlive': 5, 'nact': 1, 'dlogz':30}
trigger_time = 0
injection = True
injection-dict = {"mass_1": 50, 'mass_2': 45, 'a_1': 0, 'a_2': 0, 'tilt_1': 0, 'tilt_2': 0,
'phi_12': 0, 'phi_jl': 0, 'luminosity_distance': 1000, 'dec': -0.2, 'ra': 1.4,
'theta_jn': 0.2, 'psi': 0, 'phase': 0, 'geocent_time': 0}
gaussian-noise = True
n-simulation = 1
n-parallel = 1
request-cpus = 4
osg = True
prior-dict = {
mass_1 = 50.0
mass_2 = 45.0
a_1 = 0.0
a_2 = 0.0
tilt_1 = 0.0
tilt_2 = 0.0
phi_12 = 0.0
phi_jl = 0.0
luminosity_distance = bilby.gw.prior.UniformComovingVolume(name='luminosity_distance', minimum=1e2, maximum=5e3, unit='Mpc')
dec = -0.2
ra = 1.4
theta_jn = Sine(name='theta_jn')
psi = 0.0
phase = 0.0
geocent_time = 0.0
}
```
The HDF file is still saved, but when a user tries to read it in:
```
../../../bilby/bilby-master/bilby/core/result.py:95: in read_in_result
result = Result.from_hdf5(filename=filename)
../../../bilby/bilby-master/bilby/core/result.py:476: in from_hdf5
return cls(**data)
../../../bilby/bilby-master/bilby/core/result.py:399: in __init__
self.priors = priors
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <bilby.core.result.Result object at 0x12e0b0130>
priors = {'mass_1': DeltaFunction(peak=50.0, name=None, latex_label=None, unit=None), 'mass_2': DeltaFunction(peak=45.0, name=N...me=None, latex_label=None, unit=None), 'geocent_time': DeltaFunction(peak=0.0, name=None, latex_label=None, unit=None)}
    @priors.setter
    def priors(self, priors):
        if isinstance(priors, dict):
            if isinstance(priors, PriorDict):
                self._priors = priors
            else:
                self._priors = PriorDict(priors)
            if self.parameter_labels is None:
>               self.parameter_labels = [self.priors[k].latex_label for k in
                                         self.search_parameter_keys]
E           TypeError: 'NoneType' object is not iterable
../../../bilby/bilby-master/bilby/core/result.py:543: TypeError
```
Ran with
```
>>> bilby.__version__
'2.0.1.dev21+g507d93c8.d20230315'
>>> bilby_pipe.__version__
'1.0.9.dev0+g383f28b.d20230315'
```

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/266
sklearn required for GMM proposal
Updated: 2023-03-13T22:42:14Z
Author: Sylvia Biscoveanu

The default GW proposal cycle for bilby-mcmc uses the `GMMProposal`, which requires `sklearn`. This should be added as a requirement for `bilby_pipe`, since otherwise the package needs to be installed manually.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/267
Default dynesty sampling kwargs out of date
Updated: 2023-03-06T14:58:27Z
Author: Sylvia Biscoveanu

Following the changes to dynesty in https://git.ligo.org/lscsoft/bilby/-/merge_requests/1187, the default bilby_pipe sampling kwargs need to be updated to match the bilby defaults, as the old defaults cause the sampling to fail immediately at the moment.

https://git.ligo.org/lscsoft/bilby_pipe/-/issues/260
Add retries for analysis jobs
Updated: 2023-03-03T17:13:11Z
Author: Colm Talbot (colm.talbot@ligo.org)

I've noticed an increasing number of jobs failing at CIT due to file system issues, mostly due to "no space left on disk".
I think that one workaround for this is to [specify some number of automatic retries](https://htcondor.readthedocs.io/en/latest/users-manual/automatic-job-management.html#automatically-rerunning-a-failed-job).
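As a rough sketch of what this could mean for the generated submit files, the helper below inserts HTCondor's `max_retries` submit command ahead of the `queue` statement. The helper name, the default of 3 retries, and the example submit lines are assumptions for illustration, not bilby_pipe code.

```python
def add_retries(submit_lines, max_retries=3):
    """Return submit-file lines with a `max_retries` command added.

    `max_retries` is an HTCondor submit command; this helper and the
    default value are hypothetical and only illustrate the idea.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # Drop any existing setting so the command appears exactly once.
    kept = [line for line in submit_lines
            if line.split("=")[0].strip() != "max_retries"]
    body = [line for line in kept if not line.strip().startswith("queue")]
    queue = [line for line in kept if line.strip().startswith("queue")]
    return body + [f"max_retries = {max_retries}"] + queue


lines = ["executable = analysis.sh", "request_cpus = 4", "queue"]
print(add_retries(lines))
# ['executable = analysis.sh', 'request_cpus = 4', 'max_retries = 3', 'queue']
```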
@gregory.ashton are you in favour of having this?