filter umbrella by max_samples within stream.train

Reed Essick requested to merge filter-umbrella-by-max-samples into master

Added a little bit of logic to filter the big umbrella by max_samples. This is done by removing children, one at a time, until the number of remaining target_times falls below "max_samples". As long as each child is relatively small (expected for small training strides), the filtered umbrella should end up close to the limit rather than far under it.
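A minimal sketch of the filtering idea, not the actual code in this merge request. `Child`, `Umbrella`, and `filter_by_max_samples` are hypothetical stand-ins, and two details are assumptions the description does not state: children are dropped oldest-first, and the loop stops as soon as the count is at or under max_samples.

```python
class Child:
    """Hypothetical stand-in for one child of the umbrella."""

    def __init__(self, target_times):
        self.target_times = list(target_times)


class Umbrella:
    """Hypothetical stand-in for the big umbrella that wraps many children."""

    def __init__(self, children):
        self.children = list(children)

    @property
    def target_times(self):
        # Recomputed from the current children on every access.
        return [t for child in self.children for t in child.target_times]


def filter_by_max_samples(umbrella, max_samples):
    """Drop whole children until at most max_samples target times remain.

    Assumption: children are ordered oldest-first and the oldest is dropped first.
    """
    while umbrella.children and len(umbrella.target_times) > max_samples:
        umbrella.children.pop(0)
    return umbrella


# Three children with three target times each; a limit of 7 drops one child.
umbrella = Umbrella([Child([0, 1, 2]), Child([3, 4, 5]), Child([6, 7, 8])])
filter_by_max_samples(umbrella, max_samples=7)
```

Because whole children are removed, the final count can undershoot max_samples by up to one child's worth of samples, which is why small children matter.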

The actual implementation repeatedly calls umbrella.target_times, which in turn delegates to umbrella.triggers. This means we should automatically cache data in a not-too-stupid way within each child, so the I/O cost will remain small.
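To illustrate why repeated calls should stay cheap, here is a hedged sketch of per-child caching. `CachingChild`, `CachingUmbrella`, the `loader` callable, and the `"time"` key are all hypothetical names chosen for illustration; the real classes may cache differently.

```python
class CachingChild:
    """Hypothetical child that reads its triggers once and then reuses them."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable doing the expensive read
        self._triggers = None
        self.load_count = 0    # counts how often the loader actually ran

    @property
    def triggers(self):
        if self._triggers is None:
            self._triggers = self._loader()
            self.load_count += 1
        return self._triggers


class CachingUmbrella:
    """Hypothetical umbrella whose target_times delegates to triggers."""

    def __init__(self, children):
        self.children = list(children)

    @property
    def triggers(self):
        return [trg for child in self.children for trg in child.triggers]

    @property
    def target_times(self):
        return [trg["time"] for trg in self.triggers]


children = [
    CachingChild(lambda: [{"time": 1.0}, {"time": 2.0}]),
    CachingChild(lambda: [{"time": 3.0}]),
]
cached_umbrella = CachingUmbrella(children)

# Repeated access, as in the filtering loop, triggers only one read per child.
for _ in range(3):
    times = cached_umbrella.target_times
```

In this sketch the expensive read happens once per child no matter how many times target_times is evaluated while children are being removed.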