Agenda
- 0.5.3 released, not in time for full reruns
- phi_JL issue & running L.I. on an O3 event (Colm)
- High-spin PP tests (Greg)
- 15D Gaussian (Moritz)
- Calibration (Sylvia)
Minutes: Lasky
Bilby review - 190724
Attendance: Hoy, Talbot, Romero-Shaw, Hübner, Sarin, Lasky, Galaudage, Stevenson, Biscoveanu, Payne
https://git.ligo.org/lscsoft/bilby_pipe/wikis/O3-review/minutes/190723
CT: Did 0.5.3 release yesterday.
GA: The conda and pip packages are updated as well.
CT: Fixes in 0.5.3. Improvements in time/phase/distance marginalisation that will help with high-SNR events. Added tests to the CI. Analytic Gaussian. A couple of minor bug fixes and improvements in efficiency.
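For reference, a minimal sketch of where these marginalisation switches live in bilby's likelihood API. The detector choice, durations, priors, and approximant below are illustrative, not settings from the meeting:

```python
import bilby

# Illustrative two-detector setup with simulated noise.
interferometers = bilby.gw.detector.InterferometerList(["H1", "L1"])
interferometers.set_strain_data_from_power_spectral_densities(
    sampling_frequency=2048, duration=4)

waveform_generator = bilby.gw.WaveformGenerator(
    duration=4, sampling_frequency=2048,
    frequency_domain_source_model=bilby.gw.source.lal_binary_black_hole,
    waveform_arguments=dict(waveform_approximant="IMRPhenomPv2",
                            reference_frequency=20.0),
)

priors = bilby.gw.prior.BBHPriorDict()
# Time marginalisation needs a time prior; range here is a placeholder.
priors["geocent_time"] = bilby.core.prior.Uniform(-0.1, 0.1, name="geocent_time")

likelihood = bilby.gw.likelihood.GravitationalWaveTransient(
    interferometers=interferometers,
    waveform_generator=waveform_generator,
    priors=priors,                  # used to build the marginalisation look-up tables
    distance_marginalization=True,  # marginalise over luminosity distance
    phase_marginalization=True,     # marginalise over coalescence phase
    time_marginalization=True,      # marginalise over merger time
)
```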
SS: Is 0.5.3 the version we intend to re-run everything with and get signed off, or are there still outstanding issues?
CT: There is nothing outstanding at the moment. So it depends on how review runs go.
GA: Different opinion. There will likely need to be code changes, albeit minor ones, so we will definitely need a 0.5.4. Think we should not re-run yet. Question for Simon: are you happy to see runs on 0.5.3, even if the final version is 0.5.4?
SS: Depends on timescales. e.g., how long it takes to rerun things. Generally happy with running some stuff on 0.5.3 and looking through preliminary results that way.
CT: Difference in spin conventions. phi_JL is different for all events. We ran Bilby on an O3 event and did the comparison that way. Ran on 190521r — https://ldas-jobs.ligo.caltech.edu/~colm.talbot/bilby_review/0.5.3/190521r/EXP0/comparison/html/Comparison_network_optimal_snr.html. Chirp mass is 39-ish. Network SNR is ~25. Bilby generated more samples than LI. Evident in, e.g., chirp mass. The phi_JL posteriors are now similar — the big difference is gone. There are still small differences, but we think this is consistent with the differences in other spin parameters, e.g., a1. This is likely because LI is undersampled.
SS: In general, this looks good. Shape of phi_JL is now the same; can be confident it’s just the change in the waveform interface that led to previous issues. Small but noticeable differences in spin parameters, but could be due to this event/undersampling with LI, etc. But generally looks good.
CT: Yes, for O1/O2 events we can get a far better posterior match with LI. But the point here is that the wild difference between phi_JLs has now gone away.
SS: Yes.
Consensus: move forward. phi_JL is not a problem.
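For reference, the convention check above can be done at the level of Cartesian spins rather than phi_JL itself. A sketch, assuming bilby's conversion helper (the parameter values and masses below are illustrative):

```python
from bilby.gw.conversion import bilby_to_lalsimulation_spins

solar_mass = 1.98892e30  # kg, for illustration only

# phi_jl is defined at the reference frequency, which is where convention
# mismatches between codes show up.
iota, s1x, s1y, s1z, s2x, s2y, s2z = bilby_to_lalsimulation_spins(
    theta_jn=0.4, phi_jl=1.0, tilt_1=0.5, tilt_2=1.0, phi_12=1.7,
    a_1=0.6, a_2=0.3,
    mass_1=36 * solar_mass, mass_2=29 * solar_mass,  # masses in kg
    reference_frequency=20.0, phase=0.3,
)
# Comparing the resulting Cartesian spins between two codes checks that they
# agree on the spin convention, independent of sampling differences.
```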
GA: High-spin PP tests. The 128s prior files use a low-spin prior from 0 to 0.5. Matt P suggested we agree this goes up to 0.8, in agreement with the other prior files. The original reason for doing low spin was that LI only tested up to 0.5 with 128s, but it's right to do the full thing. With standard dynesty settings, this failed (bilby#388 (closed)). Declination failed, maybe phi too. Which parameters failed isn't significant, but the failure itself is. This was solved by increasing the number of walks to 200 (still Nlive=1000). This passed — the p-value is consistent with a random number between 0 and 1. This increased sampling time by about a factor of 2. We now have two options: make walks=200 the default for all events, which would waste CPU time for short events, or have different defaults for long and short events.
CT: The reason increasing walks works is that it overcomes the long auto-correlation length. According to arXiv:1404.7070, pushing the reference frequency higher should shorten the auto-correlation length. This is another thing we could look at that would save costs.
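For reference, both options map onto simple settings in a bilby run. A sketch, assuming a `likelihood` and `priors` already built as in a standard CBC setup (the label, output directory, and reference-frequency value are illustrative):

```python
import bilby

# Option 1: keep nlive=1000 but raise walks to 200, the setting that made
# the 128 s high-spin PP test pass (at roughly 2x the sampling time).
result = bilby.run_sampler(
    likelihood=likelihood, priors=priors,
    sampler="dynesty", nlive=1000, walks=200,
    outdir="outdir", label="128s_high_spin_pp",
)

# Option 2 (CT's suggestion): raise the reference frequency at which the
# spins are defined, via the waveform arguments.
waveform_generator = bilby.gw.WaveformGenerator(
    duration=128, sampling_frequency=4096,
    frequency_domain_source_model=bilby.gw.source.lal_binary_black_hole,
    waveform_arguments=dict(waveform_approximant="IMRPhenomPv2",
                            reference_frequency=100.0),
)
```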
GA: Other option: CT has an MR for dynesty that should reduce our run times significantly. Decreasing the number of live points could also potentially speed us up. Main point: we can solve this issue; it's just an engineering decision as to the best path forward.
SS: pp plot looks fine. Agree it’s now a choice for settings, and whether this will be overkill.
PL: Lots of decisions get made based on the length of the signal anyway. Why not make this one of those decisions as well?
GA: No real reason not to. Depends on the question of review, and on setting up online PE.
MH: 15D Gaussian: uni- and bi-modal distributions — https://git.ligo.org/lscsoft/bilby_pipe/wikis/O3-review/15D_Gaussian. Unimodal looks good. For bi-modal, one mode seems slightly preferred over the other. More live points didn't improve things.
SS: Is it always the positive peak, or is it random?
MH: I think it’s random. If we combined many runs, it might work, but that’s unreasonable.
GA: Have discussed merging runs. Has this been done?
MH: Merged samples from four runs. One was strongly biased, and that dominated, so things still looked biased.
GA: Okay, maybe we should try more walks, etc.
NS: FWIW, for GRB stuff we always needed more than 100 walks.
MH: Doesn’t think this is to do with the number of live points, because the issue is that points aren’t walking between modes.
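For reference, the failure mode MH describes can be reproduced with a toy bimodal likelihood. A sketch (the dimension, mode separation, and prior range are illustrative, not the actual 15D review configuration):

```python
import numpy as np
import bilby

class BimodalGaussianLikelihood(bilby.Likelihood):
    """Equal-weight mixture of two unit Gaussians separated along x0."""

    def __init__(self, ndim=4, offset=8.0):
        super().__init__(parameters={f"x{i}": None for i in range(ndim)})
        self.ndim = ndim
        self.offset = offset

    def log_likelihood(self):
        x = np.array([self.parameters[f"x{i}"] for i in range(self.ndim)])
        mu = np.zeros(self.ndim)
        mu[0] = self.offset
        # If live points stop migrating between the two modes, one mode ends
        # up over-weighted in the recovered posterior.
        return np.logaddexp(-0.5 * np.sum((x - mu) ** 2),
                            -0.5 * np.sum((x + mu) ** 2)) - np.log(2)

priors = bilby.core.prior.PriorDict({
    f"x{i}": bilby.core.prior.Uniform(-20, 20, name=f"x{i}") for i in range(4)
})
# A correct run should recover equal posterior mass in both modes.
```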
CT: We should re-look at how combining runs is affecting things. It would also be good to have evidences on these wiki pages. Suspect we get the same evidence regardless of the branching fraction. Suggest we move on.
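For reference, a sketch of the run-merging step under discussion, assuming bilby's `ResultList` (the filenames are placeholders; if memory serves, `combine()` weights each run's posterior by its evidence, which would explain one strongly biased run dominating, but this is worth checking against the docs):

```python
from bilby.core.result import ResultList, read_in_result

# Placeholder filenames for the four runs MH merged.
results = ResultList([read_in_result(f"run_{i}_result.json") for i in range(4)])
combined = results.combine()

# Printing these alongside the wiki plots would give the evidence
# comparison CT asks for above.
print(combined.log_evidence, combined.log_evidence_err)
```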
SB: Calibration. Previously saw a difference in calibration; wondered if it was due to LI priors being different from Bilby priors. https://git.ligo.org/lscsoft/bilby_pipe/wikis/O3-review/calibration. Priors are now plotted. The LI posterior matches its prior, and the Bilby posterior matches its prior. So both are working, but they're loading the prior differently. Also re-ran calibration for S190521r, and the results look a little different.
CT: Calibration envelope files look slightly more like LI than Bilby. Is this an issue?
PL: Where are we at in terms of review? The initial plots SB showed were the action items. We have shown that both codes return the prior. So if the known events are the same, where are we at with review?
SS: It’s difficult to see where the impact is on the astrophysical priors. There must be some, but it’s hard to see. It’s important to understand what is being done, and that that is being done correctly. That doesn’t necessarily mean we get the same answer as LI. The recovery of the priors is very promising; each individual code is doing what we expect. The potential issue is that there is some difference in the implementation of the model. Need to make a decision about how much time we want to spend tracking that down.
SB: Has previously sat down with Salvo and looked at how the Bilby code was working, and pushed changes to Bilby, etc. At the time, they believed Bilby was doing the same as LI. Happy to go back through the code and check that things are working.
CT: Suspicion that something weird is happening with the interpolation.
CT: Bilby is returning the prior, which is good from a sampling perspective. We need to make sure we're loading the file correctly, but that's nothing astrophysical. This will show up in the distance, but that is hard to compare with LI because of their distance bias.
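For reference, a sketch of the two pieces under discussion in bilby, assuming the spline calibration model and the envelope-file prior loader (the envelope filename and node count are placeholders):

```python
import bilby

ifo = bilby.gw.detector.get_empty_interferometer("H1")

# The calibration model: a cubic spline in amplitude and phase. The
# interpolation between nodes is the step CT suspects above.
ifo.calibration_model = bilby.gw.calibration.CubicSpline(
    prefix="recalib_H1_", minimum_frequency=20, maximum_frequency=1024,
    n_points=10,
)

# Priors on the spline nodes, loaded from the calibration envelope file.
# Checking that both codes place the same node priors, and interpolate the
# same way between nodes, is the cross-check described above.
priors = bilby.gw.prior.CalibrationPriorDict.from_envelope_file(
    "H1_calibration_envelope.txt",  # placeholder path
    minimum_frequency=20, maximum_frequency=1024, n_nodes=10, label="H1",
)
```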
PL: Worried about chasing our tails looking for minor differences that don't affect astrophysical results. At what point will we be happy? If we re-run O1/O2 without the distance bias, and we see good agreement, will we be happy that calibration doesn't play a role?
SS: Need to touch base with Matt. But yes, at some point we should be happy with these things.
Meeting close.