Add kwargs to filter ClassifierData.triggers() by segments/columns

Patrick Godwin requested to merge classifier_data_trigger_queries into master

Implements reed.essick/iDQ#35.

In particular:

  • Makes ClassifierData.triggers() return only the data requested, rather than the entire data store.
  • Modified the is_cached method to use a private cache-metadata property, _cached_data, which is based on a segmentlistdict: a dictionary of segmentlists keyed by channel, with convenient set-logic operations that apply across all members of the dictionary. segmentlistdict is already part of the ligo-segments package, so there are no new dependencies.
  • In order to return subsets of the full data store, we actually need to return a copy of the data when triggers() is called. That addresses the FIXME you had before out of necessity.
  • I've added a new property, _time_column, to support filtering by segments; otherwise ClassifierData has no notion of which column holds times. To be honest, this is probably where we should be storing backend-specific details such as the time and significance columns anyway, since ClassifierData knows which columns it's supposed to contain, but I'll leave the full propagation of this for another time so as not to rock the boat.
  • Moved column information for gstlal-based features into utils.py, similar to what's done for KW-based features.
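
As a rough illustration of the points above, here is a minimal, dependency-free sketch of segment/column filtering that returns a copy, plus a per-channel cache check. It is not the iDQ implementation: ToyClassifierData, mark_cached, and the kwarg names channels, segs, and columns are assumptions made for this example, and plain (start, end) tuples stand in for the segmentlistdict from ligo-segments.

```python
import copy


class ToyClassifierData:
    """Hypothetical stand-in for ClassifierData; not the iDQ implementation."""

    def __init__(self, store, time_column="time"):
        # store: {channel: [row dicts]}; each row maps column name -> value.
        self._store = store
        # Name of the column used when filtering rows by segments.
        self._time_column = time_column
        # Cache metadata: {channel: [(start, end), ...]} for spans already loaded.
        # The merge request uses a segmentlistdict here; tuples are used only
        # to keep this sketch free of external dependencies.
        self._cached_data = {}

    def mark_cached(self, channel, start, end):
        """Record that [start, end) has been loaded for this channel."""
        self._cached_data.setdefault(channel, []).append((start, end))

    def is_cached(self, channel, start, end):
        """Simplified check: some single cached span must contain [start, end)."""
        spans = self._cached_data.get(channel, [])
        return any(s <= start and end <= e for s, e in spans)

    def triggers(self, channels=None, segs=None, columns=None):
        """Return a filtered copy of the stored triggers."""
        if channels is None:
            channels = list(self._store)
        out = {}
        for channel in channels:
            rows = self._store[channel]
            if segs is not None:
                # Keep rows whose time falls inside any half-open segment.
                rows = [
                    r for r in rows
                    if any(s <= r[self._time_column] < e for s, e in segs)
                ]
            if columns is not None:
                # Keep only the requested columns.
                rows = [{c: r[c] for c in columns} for r in rows]
            # Deep copy so callers cannot mutate the underlying store.
            out[channel] = copy.deepcopy(rows)
        return out
```

Because triggers() hands back a copy, a caller can modify the filtered result without touching the store. The real segmentlistdict also handles unions and intersections across channels, which this simplified is_cached does not attempt.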

@reed.essick, I've assigned you to this since I am modifying some really base-level stuff in iDQ. Feel free to unassign/reassign yourself from some of the other merge requests if you feel it's getting a bit much.

Update:

  • Fixed KafkaClassifierData to raise a NoDataError if querying the Kafka topic returns no data. StreamProcessor can catch these errors, so the missing-data case is now handled downstream. The behavior is essentially the same as before, but this is a cleaner way of handling missing data than the hack I had before.
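
The raise-then-catch pattern described here might look roughly like the sketch below. Only the NoDataError name comes from this merge request; its definition, query_topic, and process_strides are hypothetical stand-ins for the Kafka query and the StreamProcessor loop, not the actual iDQ code.

```python
class NoDataError(Exception):
    """Raised when a query returns no data (definition assumed for this sketch)."""


def query_topic(messages):
    # Hypothetical stand-in for polling a Kafka topic.
    if not messages:
        raise NoDataError("no data returned from topic")
    return list(messages)


def process_strides(strides):
    """Hypothetical downstream loop: skip strides that have no data."""
    processed, skipped = [], 0
    for messages in strides:
        try:
            processed.append(query_topic(messages))
        except NoDataError:
            # Missing data is expected occasionally; note it and move on.
            skipped += 1
    return processed, skipped
```

Raising a dedicated exception keeps the "no data" case explicit at the query site and leaves the decision about how to react to the caller.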
Edited by Patrick Godwin
