
WIP: kw io optimization

Reed Essick requested to merge kw-io-optimization into master

After implementing the KW predictive ClassifierData objects, I ran several timing tests. The results are presented below, but it does not look like this will improve our I/O speed by even a factor of 2. Note that these timings are representative of the case where the filesystem has already cached the files (they were read recently). I have not checked how things scale when the files are not cached, but everything will almost certainly be slower across the board (though the new objects may be less slow).

While reading 1804 channels (all available channels) from 1000 seconds of data: (1186963840, 1186964840)

  • PredictiveKWMClassifierData: 24.850 +/- 0.719 sec
  • KWMClassifierData: 26.064 +/- 1.693 sec
  • PredictiveKWSClassifierData: 25.514 +/- 1.674 sec
  • KWSClassifierData: 24.610 +/- 0.721 sec

While reading 1 channel from 1000 seconds of data: (1186963840, 1186964840)

  • PredictiveKWMClassifierData: 11.256 +/- 0.477 sec
  • KWMClassifierData: 16.618 +/- 5.055 sec
  • PredictiveKWSClassifierData: 0.001 +/- (less than 0.001) sec
  • KWSClassifierData: 0.005 +/- 0.001 sec
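For reference, a minimal sketch of the kind of timing harness behind these numbers (the mean +/- stdev over repeated instantiate-and-read cycles). The `make_data` callable is a hypothetical stand-in for constructing one of the ClassifierData objects and reading its triggers; the real iDQ calls may differ.

```python
import time
import statistics

def time_reads(make_data, n_trials=10):
    """Time repeated calls to `make_data`; return (mean, stdev) in seconds.

    `make_data` is an illustrative placeholder for one construct-and-read
    cycle of a ClassifierData object; it is not part of the iDQ API.
    """
    durations = []
    for _ in range(n_trials):
        start = time.perf_counter()
        make_data()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)

# stand-in workload, just to show the output format
mean, stdev = time_reads(lambda: sum(range(10**5)), n_trials=5)
print('%.3f +/- %.3f sec' % (mean, stdev))
```

Note that because the filesystem cache warms up on the first read, the first trial can dominate the stdev; dropping it (or flushing caches between trials) would give a cleaner cold-cache comparison.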

It looks like this will not be a large gain, and we should look elsewhere. In particular, we should test

  • running the full idq-train pipeline with these ClassifierData objects and timing that
  • changing how FeatureVectors are instantiated so that each gets a much smaller ClassifierData object instead of one big shared one
  • looking at optimizations within FeatureVector.vectorize, such as relying on sorted triggers and memoization
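To make the last point concrete, here is a minimal sketch of what "sorted triggers and memoization" could look like: if each channel's trigger times are kept sorted, selecting the triggers in a window becomes two binary searches instead of a full scan, and repeated requests for the same window can hit a cache. The names (`select_window`, `times`) are illustrative, not FeatureVector.vectorize's actual internals.

```python
import bisect
from functools import lru_cache

def select_window(times, start, end):
    """Indices (lo, hi) such that times[lo:hi] are the triggers in [start, end).

    Assumes `times` is sorted; two binary searches replace a linear scan.
    """
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_left(times, end)
    return lo, hi

# Memoization: identical (times, start, end) queries are answered from cache.
# `times` must be hashable (here a tuple) for lru_cache to work.
@lru_cache(maxsize=None)
def select_window_cached(times, start, end):
    return select_window(times, start, end)

times = (1.0, 2.5, 3.0, 4.2, 7.9)
lo, hi = select_window_cached(times, 2.0, 5.0)
print(list(times[lo:hi]))  # -> [2.5, 3.0, 4.2]
```

Whether the cache pays off depends on how often vectorize revisits overlapping windows; that is exactly what timing the full idq-train run would tell us.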
