segment logic bug fixes for ClassifierData
I caught a few bugs while trying to revamp `batch.timeseries`, but since there hasn't been a merge request for that work yet (NFS issues are making testing slow), I cherry-picked a few commits from there and added them to this merge request instead.
Specifically, there are two fixes:
- `ClassifierData.random_times()`: When sampling random times, this ignored the segments defined within `ClassifierData`, so it returned random times outside of the requested segments. These times were added to the random times for non-OVL classifiers, and all the random times generated this way produced feature vectors with all default values. This could cause the classifiers to perform worse.
- Passing `segs` into `UmbrellaClassifierData.triggers()`: The previous implementation passed the same segments to each child, and since the children require the segments to be a strict subset of the segments defined, this caused assertion errors. Instead, I now trim the segments passed in to the child's start and end times (note: not the child's segments). Each child can still raise an assertion error if the segments passed in aren't a strict subset, which is consistent with what we expect when we don't use an umbrella.
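
To make the two fixes concrete, here is a minimal standalone sketch of the ideas, not the code in this merge request. It assumes segments are plain lists of disjoint `(start, end)` tuples; the helper names `random_times_in_segments` and `trim_segments` are hypothetical and do not exist in `ClassifierData` or `UmbrellaClassifierData`.

```python
import random


def random_times_in_segments(segs, n, rng=random):
    """Draw n times uniformly from the union of disjoint (start, end) segments.

    Hypothetical helper: illustrates sampling only inside the requested
    segments rather than across the whole span.
    """
    durations = [end - start for start, end in segs]
    total = sum(durations)
    if total <= 0:
        raise ValueError("segments contain no livetime")
    times = []
    for _ in range(n):
        # Pick a position along the concatenated livetime, then map it
        # back to the segment that contains it.
        offset = rng.random() * total
        for (start, _end), dur in zip(segs, durations):
            if offset < dur:
                times.append(start + offset)
                break
            offset -= dur
        else:
            # Floating-point edge case: fall back to the end of the last segment.
            times.append(segs[-1][1])
    return sorted(times)


def trim_segments(segs, start, end):
    """Clip segments to [start, end], dropping any with no remaining overlap.

    Hypothetical helper: mirrors trimming by a child's start and end times
    (not by the child's own segments) before passing segments to that child.
    """
    trimmed = []
    for seg_start, seg_end in segs:
        lo, hi = max(seg_start, start), min(seg_end, end)
        if lo < hi:
            trimmed.append((lo, hi))
    return trimmed
```

With `segs = [(0, 10), (20, 30)]`, a child spanning 15 to 30 would receive `trim_segments(segs, 15, 30)`, which is `[(20, 30)]`. Whether that result is a valid subset of the child's own segments is still left for the child to check.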