
batch round robin and segment queries (attempt 2)

Reed Essick requested to merge clean-batch_round_robin into master

This is one big patch that squashes together all the little changes made to get batch round robin and segments working. It should replace !9 (closed). Hopefully the git log is somewhat cleaner. I've attempted to recreate the commit history below.

  • added in segment querying and the preliminary things needed to do round-robin training/evaluation based on time stretches. The actual division of labor within batch.batch is not yet implemented, though. Nor is any of this tested...

  • added in section to example config file for segments and split off the specification of how we define glitch/clean samples into a new section. This has not been tested.

  • fixed some typos

  • fixed a typo and ran some basic tests. It also seems that OVL and DOVL take an absurd amount of time to train but the sklearn classifiers take an absurd amount of time to evaluate. Not sure what's going on here, but it deserves closer attention.

  • cleaned up unnecessary calls in batch (defining segments which are immediately overwritten) and added a fix_segments call in the segdb wrapper functions to hopefully avoid issues with the segment lists returned by SegDb's API

  • work to actually get round-robin logic up and running. Implementation should be complete for batch_workflow=block, but has not been tested. batch_workflow=fork and batch_workflow=condor have not been implemented, although placeholders are present.

  • more work implementing round-robin logic within batch. batch_workflow=fork should also be functional now, although still not tested. Will implement condor next and then test everything.

  • added in some placeholders and thoughts about causal round-robin segmenting within idq-batch

  • attempt to implement condor workflow for idq-batch. Note that when workflow=condor is set for individual steps like train, evaluate, calibrate, and timeseries, this will only work if each compute node can submit condor jobs.

  • fixed two minor syntax errors

  • debugging to get batch workflows working smoothly. Everything appears to work as designed at this point. NOTE: we left CalibrationMap.optimize with the "pass" statement here, expecting it to be overwritten in short order by changes in origin/calibration_development
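
For readers unfamiliar with the round-robin scheme referred to in the commits above, here is a minimal sketch of the general idea: divide the analysis segments into bins of roughly equal livetime, then for each bin train on every other bin and evaluate on the one held out. This is not the code in idq.batch; the function names (split_into_bins, round_robin) and the representation of segments as plain (start, end) tuples are assumptions made for illustration.

```python
def split_into_bins(segments, num_bins):
    """Split time-ordered (start, end) segments into num_bins bins of
    roughly equal livetime. Hypothetical helper, not part of idq."""
    livetime = sum(end - start for start, end in segments)
    target = livetime / num_bins  # livetime each bin should hold
    bins = [[] for _ in range(num_bins)]
    index = 0     # bin currently being filled
    filled = 0.0  # livetime already placed in that bin
    for start, end in segments:
        while start < end:
            if index == num_bins - 1:
                # the last bin absorbs whatever is left (rounding included)
                stop = end
            else:
                stop = min(end, start + (target - filled))
            bins[index].append((start, stop))
            filled += stop - start
            start = stop
            if index < num_bins - 1 and filled >= target:
                index += 1
                filled = 0.0
    return bins


def round_robin(bins):
    """Yield (train_segments, evaluate_segments) pairs, holding out one
    bin at a time. Hypothetical helper, not part of idq."""
    for i, evaluate in enumerate(bins):
        train = [seg for j, other in enumerate(bins) if j != i for seg in other]
        yield train, evaluate
```

With two bins, for example, the classifier evaluated on the first half of the livetime is trained only on the second half, and vice versa, so no sample is ranked by a model that saw it during training.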

Specific changes, mostly copied from !9 (closed), are listed below.

This merge request contains a lot of stuff, which is all necessary to get the batch pipeline running with round-robin logic in place. Except for calibration (which will be updated in short order from the calibration_development branch), the batch pipeline should now work with all workflows.

I summarize the changes below:

  • bin/*

    • I've added --exclude options to most of the batch executables. This is needed to support the round-robin logic within batch and we've exposed the API to the command line in case users find it useful independent of that.
    • I've mucked around with the options within idq-batch to specify a few things independently of the config file. These are limited and only have to do with the workflow and logging for the batch job itself; i.e., the actual execution and results of train, evaluate, calibrate, and timeseries are independent of these command line options.
    • added idq-condor_batch, which is needed for batch_workflow=condor to function correctly.
  • etc/idq.ini

    • added a new section to specify how we select samples. This used to be part of [general], but now is included in a new section called [samples]
  • idq.utils

    • added segment query utility functions using SegDb's REST API
    • changed a bit of syntax (the name of check_segements)
    • guaranteed that ligolw.segmentlist is returned by segments_intersection
  • idq.batch

    • added segment queries to SegDb with the Python REST API
    • added a placeholder for causal_batch, which is suggested by #24 (closed)
    • implemented all batch workflows
    • added exclude kwarg to all batch functions, as needed.
    • modified how we select samples to reflect changes in INI format (see changes to etc/idq.ini)
  • idq.stream

    • changed how we read samples to reflect changes in INI format (see changes to etc/idq.ini)
  • idq.calibration

    • replaced the NotImplementedError with a pass statement within FixedBandwidth1DKDE.optimize. This should be quickly overwritten by changes in the calibration_development branch and allows the pipeline to run to completion.
  • idq.classifiers

    • a minor change in syntax regarding how ranks are assigned to vectors
  • idq.condor

    • fixed a few typos re: passing kwargs to delegation functions
    • added functionality needed for idq-batch with batch_workflow=condor
  • idq.logs

    • mucked around with how loggers are instantiated to prevent repeated print statements
    • exposed logger path API so things in idq.batch can reference where other jobs will write their logs easily
  • idq.names

    • added support needed for idq-batch
  • setup.py

    • a bit of cleanup (repeated executables)
    • added idq-condor_batch to install list
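
To illustrate what the --exclude options and exclude kwargs listed above are for, the following is a rough sketch of dropping samples that fall inside excluded stretches of time. It does not reproduce the signatures in idq.batch or the executables; drop_excluded, in_segments, and the half-open [start, end) convention are assumptions made for this example.

```python
def in_segments(t, segments):
    """Return True if time t lies in any half-open [start, end) segment.
    Hypothetical helper, not part of idq."""
    return any(start <= t < end for start, end in segments)


def drop_excluded(times, exclude=None):
    """Return the sample times that do not fall in any excluded segment.
    An empty or missing exclude list leaves the times untouched.
    Hypothetical helper, not part of idq."""
    if not exclude:
        return list(times)
    return [t for t in times if not in_segments(t, exclude)]
```

In a round-robin run, the segments of the bin being evaluated would be passed as exclude when selecting training samples, which is why the option is needed on the individual steps and not only in idq-batch.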
