... | ... | @@ -22,30 +22,124 @@ Char / Minute Taker / Focus Session [Rota](https://git.ligo.org/groups/gstlal/-/ |
|
|
* Announcements (5 minutes)
|
|
|
- Please check the rota for next week's call
|
|
|
- Confirmation of next week's focus session
|
|
|
- No focus session planned yet
|
|
|
|
|
|
* Last week's action items
|
|
|
- [ ] Ryan and Divya to prepare C00 & C01 comparison for an all sky call? [Slides](https://docs.google.com/presentation/d/1Bp_ibCwnWgip2Ovk_QJ7J6Oaksruzc5dx6CvnYL4FQs/edit?usp=sharing)
|
|
|
- Becca: I *think* there was general agreement on the all sky call that C00 data is good enough for searches
|
|
|
- [ ] Chad: put together notes on table definitions to prepare for incorporating new injection file format.
|
|
|
- no update, roll over to next week
|
|
|
|
|
|
* Last week's [East call](https://wiki.ligo.org/CBC/Searches/GstLALEastAgenda20220630)
|
|
|
- Lots of MRs went through, all approved but one
|
|
|
- Feedback on the bank splitter bug fix: it changes how templates are allocated to sub banks, so there can now be more variation in the number of templates per sub bank. Leo will look into it further
|
|
|
* Quick updates (45 minutes)
|
|
|
- Operations (5 minutes)
|
|
|
- LL CBC operations
|
|
|
- Edward - Found last expected event on 6/30 - Prathamesh to remove counts
|
|
|
- Edward / EW both taken down for CIT maintenance (no one should log into the shared account on July 5)
|
|
|
- EW - working on profiling
|
|
|
- Offline CBC operations
|
|
|
- Save update for focus session
|
|
|
- LL IDQ operations
|
|
|
- Fixed two small issues - adding LR to likelihood ratio, compression of likelihood files
|
|
|
- The Jacob dist stats files are consistently larger than Edward's and we don't know why
|
|
|
- O4 Dev (30 minutes)
|
|
|
- Low latency integrated testing and Monitoring
|
|
|
- Becca: adding new pastro plots to Test Suite to compare FGMC and mchirp methods
|
|
|
- Jolien: are these expected to lie on the diagonal, what rates are being used?
|
|
|
- Anarya: the rates are scaled from O1/O2; the injection rates are higher than astrophysical, so we don't expect these to lie on the diagonal
|
|
|
- Jolien: to lie on the diagonal this would have to include all triggers, not just injections, and we would need to scale the rates to the injection rate
|
|
|
- Ron: [frame status dashboard](https://icinga.infra.gwave.psu.edu/icingaweb2/monitoring/list/servicegroups), [frame lag dashboard](https://icinga.infra.gwave.psu.edu/icingaweb2/monitoring/list/servicegroups) (only accessible to PSU for now)
|
|
|
- monitoring runs every 2 minutes and looks for the last frame; if it's older than 1 minute, it alerts
|
|
|
- Chad: can we integrate these into our existing dashboards?
|
|
|
- Ron: yes, Grafana can read from the separate influx databases or we can put all the info into one database
|
|
|
- Chad: separate influx database is fine, but it would be nice to incorporate that in one dashboard. Eventually we should have a curated list of data sources that people can choose from to populate dashboards.
|
|
|
- Template bank
|
|
|
- no update
|
|
|
- Likelihood ratio, background and foreground sampling
|
|
|
- Anarya: Background sampling update: testing the effects of restricted sampling on the FARs and FAPs of triggers, comparing lnpdf_noise vs lnL with observed zerolag counts. Should have results by this week's LR call. (Previously this test covered only bin 0321; it now covers more bins, 347 in total, and hence more triggers.)
|
|
|
- Optimization and throughput benchmarking
|
|
|
- Chad: it would be nice to have a gstlal container with perf installed, otherwise perf can't resolve symbols in the container (action item?)
|
|
|
- Chad: trying to find slow spots in gstlal inspiral; most of the CPU time is not spent on signal processing. Looking for help to make speed improvements
|
|
|
- Data format wrangling
|
|
|
- no update
|
|
|
- Enabling running on OSG/IGWN grid
|
|
|
- Cort will update in focus session
|
|
|
- DQ dev
|
|
|
- no update
|
|
|
- HM search
|
|
|
- no update
|
|
|
- Exploratory development (5 minutes)
|
|
|
- no updates
|
|
|
- Misc projects
|
|
|
- no update
|
|
|
* Focus session
|
|
|
- [Status of the offline analyses](https://docs.google.com/presentation/d/1C-76qH_vtvSUDLrmh-l12pDUeskMfDUintrr8-eMmiE/edit?usp=sharing) - Cort
|
|
|
- Using a new branch for offline development: o4-offline-dev
|
|
|
- One analysis running on PSU (manifold bank), getting low throughput
|
|
|
- removing request disk did not increase number of running jobs
|
|
|
- personal account has slower disk and possibly less priority than shared account
|
|
|
- Chad: if this disrupts Jacob for a few days is that okay?
|
|
|
- Rachael: that should be fine
|
|
|
- Action item: Ron to give us dedicated slots for online (need all queued, 0 idle), but we also need other cores to be able to prioritize work like this
|
|
|
- Action item: Crank Cort's priority up as high as possible for the time being - done during the call
|
|
|
- Running on IGWN grid: need a CVMFS container; were running into issues with file creation and transfer
|
|
|
- some inspiral jobs were not creating (or transferring) all the output files they needed
|
|
|
* AOB
|
|
|
- none
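
Note on the frame monitoring check (Low latency integrated testing and Monitoring): the rule Ron described, alert when the newest frame is older than 1 minute, can be sketched as below. This is only an illustration and not the actual Icinga check. It assumes that "runs 2 minutes" means the check repeats every 2 minutes, that frames are `.gwf` files in a single directory, and that file modification time is a reasonable stand-in for frame arrival time. The function names, the `alert` callback and the directory layout are all hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Optional

# Values taken from the minutes; the interval is an assumed reading of
# "runs 2 minutes" and would be used by whatever scheduler calls check_once.
CHECK_INTERVAL_S = 120
MAX_FRAME_AGE_S = 60


def newest_frame_age(frame_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the newest .gwf file, or None if there is none."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in frame_dir.glob("*.gwf")]
    if not mtimes:
        return None
    return now - max(mtimes)


def frame_is_stale(age: Optional[float], max_age: float = MAX_FRAME_AGE_S) -> bool:
    """A missing frame counts as stale; otherwise compare the age to the threshold."""
    return age is None or age > max_age


def check_once(frame_dir: Path, alert: Callable[[str], None],
               now: Optional[float] = None) -> bool:
    """Run one check, call alert(message) if the newest frame is stale, return staleness."""
    age = newest_frame_age(frame_dir, now)
    stale = frame_is_stale(age)
    if stale:
        detail = "no frames found" if age is None else f"newest frame is {age:.0f} s old"
        alert(f"{frame_dir}: {detail} (limit {MAX_FRAME_AGE_S} s)")
    return stale
```

A scheduler (for example cron or a monitoring agent) would call `check_once(Path(...), print)` every `CHECK_INTERVAL_S` seconds. Since the check interval is longer than the age limit, a frame outage shorter than the interval can go unnoticed between checks.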
|
|
|
|
|
|
## Chat log
|
|
|
- <09:35:15> "Rachael Huxford": https://git.ligo.org/groups/gstlal/-/wikis/West-call/070522
|
|
|
- <09:37:56> "Becca Ewing": lots of MRs
|
|
|
- <09:42:42> "Becca Ewing": edward 
|
|
|
- <09:43:12> "Becca Ewing": but both have idq right?
|
|
|
- <09:44:14> "ron.tapia": hand up
|
|
|
- <09:44:40> "Becca Ewing": https://gstlal.ligo.caltech.edu/grafana/d/P84KbX97z/test-suite-template-dashboard?orgId=1&refresh=1m&from=now-4d&to=now-2d&var-DashDatasource=rebecca.ewing&viewPanel=100
|
|
|
- <09:45:16> "jolien.creighton": hand up
|
|
|
- <09:45:36> "ron.tapia": Frmae status: https://gstlal.ligo.caltech.edu/grafana/?orgId=1&search=open
|
|
|
- <09:45:52> "ron.tapia": Frame lag graph: https://gstlal.ligo.caltech.edu/grafana/?orgId=1&search=open
|
|
|
- <09:46:41> "Cort Posnansky": Is it the one you sent before?
|
|
|
- <09:46:47> "Cort Posnansky": https://grafana.infra.gwave.psu.edu/d/Dt7JaZenk/frame-lag?orgId=1&refresh=1m
|
|
|
- <09:47:03> "ron.tapia": https://icinga.infra.gwave.psu.edu/icingaweb2/monitoring/list/servicegroups
|
|
|
- <09:47:58> "Rachael Huxford": the one Cort sent looks good?
|
|
|
- <09:48:01> "ron.tapia": https://grafana.infra.gwave.psu.edu
|
|
|
- <09:48:22> "ron.tapia": https://grafana.infra.gwave.psu.edu/d/Dt7JaZenk/frame-lag?orgId=1&refresh=1m
|
|
|
- <09:49:18> "Becca Ewing": this is great thanks ron!
|
|
|
- <09:49:53> "chad.hanna": hand up
|
|
|
- <09:50:09> "Becca Ewing": i think jolien had a hand up too btw
|
|
|
- <09:51:02> "Becca Ewing": i think it would be nice to see next too the latency diagrams
|
|
|
- <09:51:32> "ron.tapia": Oh, cool.
|
|
|
- <09:53:21> "Becca Ewing": that shouldn't be necessary though right?
|
|
|
- <09:53:22> "chad.hanna": ok
|
|
|
- <09:53:44> "ron.tapia": True, I think it's reasonable to keep cit and psu grafana/dashboards separate
|
|
|
- <09:54:27> "Rachael Huxford": This one I believe: https://gstlal.ligo.caltech.edu/grafana/d/P84KbX97z/test-suite-template-dashboard?orgId=1&refresh=1m&from=now-4d&to=now-2d&var-DashDatasource=rebecca.ewing&viewPanel=100
|
|
|
- <09:56:42> "chad.hanna": agree on both of those
|
|
|
- <09:58:25> "Anarya": Background sampling update: Testing the effects of restricted sampling on FAR's and FAP's of triggers. comparing lnpdf_noise vs lnL with observed zerolag counts .. should have results by this week's LR call.. (previously was doing this test for only bin 0321 , now doing for more (a total of 347) bins and hence more triggers)
|
|
|
- <09:58:54> "alexander.pace": nothing from me
|
|
|
- <10:01:24> "ron.tapia": On https://ldas-jobs.gwave.ics.psu.edu/grafana there is now a datasource "gwave" that has the fileLag measurement.
|
|
|
- <10:03:42> "Cort Posnansky": https://docs.google.com/presentation/d/1C-76qH_vtvSUDLrmh-l12pDUeskMfDUintrr8-eMmiE/edit?usp=sharing
|
|
|
- <10:06:21> "ron.tapia": At them moment, there are 6338 jobs running on the cluster.
|
|
|
- <10:06:35> "ron.tapia": True
|
|
|
- <10:06:47> "Becca Ewing": that's pretty bad throughput
|
|
|
- <10:07:14> "Anarya": 350 jobs are mine probably  (on icds)
|
|
|
- <10:09:05> "ron.tapia": We could try adjusting priority. Also on my plate is to add new "slots" which are only accessible by some jobs (like mario/ew at CIT)
|
|
|
- <10:09:26> "chad.hanna": hand up
|
|
|
- <10:09:47> "Becca Ewing": this is also running on cort's personal account?
|
|
|
- <10:09:54> "Cort Posnansky": That's correct, Becca
|
|
|
- <10:09:55> "Rachael Huxford": around 1900 jobs for Jacob
|
|
|
- <10:10:01> "Cort Posnansky": Oh...
|
|
|
- <10:10:04> "Becca Ewing": do personal accounts have less prio than the shared account?
|
|
|
- <10:11:21> "Rachael Huxford": Yeah they're still loading in from when I took it down this morning.
|
|
|
- <10:11:47> "ron.tapia": 
|
|
|
- <10:12:40> "Becca Ewing": +1 on that
|
|
|
- <10:12:51> "Divya Singh": would it help to have a Jacob allocation on ICDS like we do on CIT especially if we are planning to run one of the analyses on ICDS for production? Would that help with offline in any way?
|
|
|
- <10:13:09> "Rachael Huxford": thumbs up from me
|
|
|
- <10:13:23> "chad.hanna": @divya - yes , but it needs condor work
|
|
|
- <10:13:37> "chad.hanna": and the only person who can do it is Ron and he is also doing other things 
|
|
|
- <10:13:48> "Divya Singh": okay, that makes sense!
|
|
|
- <10:14:57> "Rachael Huxford": It may delay my analysis of why the Jacob dist_stat files are a little larger than Edward - but that's totally fine. Its not super time sensitive and i can punt to next week.
|
|
|
- <10:16:12> "patrick.godwin": what issue?
|
|
|
- <10:17:13> "ron.tapia": You can see current priorities with `condor_userprio` (lower is better). I reset the usage for Cort's user and lowered the `priority factor` and `priority` for Cort's user. Cort, there is nothing else to be done with priority, just let me know if you start running using a different user.
|
|
|
- <10:17:28> "patrick.godwin": oh yeah, I remember now. I thought that issue is out of your control
|
|
|
- <10:18:39> "Cort Posnansky": @ron okay cool. That was easy enough haha |
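
Note on the priority change discussed above: in HTCondor, the effective user priority reported by `condor_userprio` is the real user priority (driven by recent usage) multiplied by the priority factor, and lower is better. That is why resetting usage and lowering the priority factor both help Cort's jobs. The sketch below illustrates this relationship under stated assumptions: the user names and numbers are made up, and the "share inversely proportional to effective priority" step is a simplification of the negotiator's behaviour, not a reproduction of it.

```python
from typing import Dict


def effective_priority(real_priority: float, priority_factor: float) -> float:
    """Effective user priority = real user priority * priority factor (lower is better)."""
    return real_priority * priority_factor


def expected_shares(effective: Dict[str, float]) -> Dict[str, float]:
    """Simplified fair share: each user's share is proportional to 1 / effective priority."""
    weights = {user: 1.0 / eup for user, eup in effective.items()}
    total = sum(weights.values())
    return {user: weight / total for user, weight in weights.items()}


# Hypothetical users: (real priority, priority factor). Not taken from the cluster.
users = {
    "user_a": (0.5, 1000.0),   # usage recently reset, so real priority is at its floor
    "user_b": (40.0, 1000.0),  # heavy recent usage
}

eup = {name: effective_priority(rp, pf) for name, (rp, pf) in users.items()}
shares = expected_shares(eup)

for name in users:
    print(f"{name}: effective priority {eup[name]:.1f}, approximate share {shares[name]:.3f}")
```

In this toy picture the user with the reset usage ends up with almost the whole share, which matches the intent of the action item to raise Cort's priority for the time being.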