RCG: Optimize model calculation
Daniel has had several ideas for improving calculation speed to support fast ADC models, including using AVX512, raising the compiler optimization level, and combining multiple filters in one function.
I don't know how gcc treats object files compiled with different optimization levels, or how compatible they are. If they are compatible, one quick fix could be to move the IIR filter module into a proper .c file (without inlining it) and compile it with -O3 -mavx2 -mfma.
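As an illustration of the separate-file idea, here is a minimal sketch of a biquad cascade living in its own translation unit. It is not the RCG filter code: the file name iir_biquad.c, the struct layout, and the function name iir_cascade are all assumptions, and the filter is written in the generic transposed direct form II. The optimization level does not change the calling convention for plain double and pointer arguments, so an object built this way should link against code built with -O, but that is worth verifying on the real build.

```c
/* iir_biquad.c: hypothetical standalone file, not the RCG source.
 * Assumed build line, using the flags from the note above:
 *   gcc -O3 -mavx2 -mfma -c iir_biquad.c
 * Keeping the function in its own .c file (and building without
 * link-time optimization) also keeps it from being inlined into callers.
 */
#include <stddef.h>

typedef struct {
    double b0, b1, b2;   /* feed-forward coefficients */
    double a1, a2;       /* feedback coefficients, a0 normalized to 1 */
    double z1, z2;       /* per-section state, start at 0 */
} biquad_section;

/* Push one sample through nsec cascaded biquad sections
 * (transposed direct form II) and return the output sample. */
double iir_cascade(biquad_section *s, size_t nsec, double x)
{
    for (size_t i = 0; i < nsec; ++i) {
        double y = s[i].b0 * x + s[i].z1;
        s[i].z1 = s[i].b1 * x - s[i].a1 * y + s[i].z2;
        s[i].z2 = s[i].b2 * x - s[i].a2 * y;
        x = y;           /* output of this section feeds the next */
    }
    return x;
}
```

The caller would only need a header declaring biquad_section and iir_cascade, so the rest of the model code could keep its current flags.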
Looking at just the biquad, I get:
13.8 sec for our current optimization flags (-O -ffast-math -m80387 -msse2)
10.5 sec by stepping up to -O2
8.84 sec by stepping up to -O3
8.42 sec with -O3 -mavx2 -mfma (adding fast-math actually makes it slower!)
-O3 -march=haswell (or cannonlake) seems to produce the same result.
Not sure what the -m80387 option does in this case. Probably nothing.
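The note does not say how these times were measured. One simple way to get comparable numbers is to run the filter in a loop and read the CPU time before and after, sketched below with clock() from standard C. The names bench_fn, bench_seconds, and count_step are hypothetical, and count_step is only a placeholder workload standing in for one filter cycle.

```c
#include <time.h>

/* Hypothetical callback type: one unit of work, e.g. one filter cycle. */
typedef void (*bench_fn)(void *ctx);

/* Run fn(ctx) reps times and return the CPU time used, in seconds. */
double bench_seconds(bench_fn fn, void *ctx, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; ++i)
        fn(ctx);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Placeholder workload: adds 1.0 to a counter. The volatile access
 * keeps the compiler from optimizing the work away at -O2 or -O3. */
void count_step(void *ctx)
{
    volatile double *acc = (volatile double *)ctx;
    *acc += 1.0;
}
```

Building the same harness once per flag set and comparing the returned times would give a table like the one above, provided the repetition count is large enough to swamp the timer resolution.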
Depending on how much of our CPU budget is spent calculating IIR filters, this would give us an overall speedup of at most about 1.6 (13.8 sec / 8.42 sec).
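The dependence on the CPU budget can be made concrete with Amdahl's law: if a fraction f of the cycle time goes into IIR filters and that part runs s times faster, the whole cycle speeds up by 1 / ((1 - f) + f / s). A small sketch follows; the function name overall_speedup is an assumption, and the fraction f is not known from the measurements above.

```c
/* Amdahl's law: overall speedup when a fraction f (0..1) of the
 * run time is made s times faster and the rest is unchanged. */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

With s = 13.8 / 8.42, or about 1.64, a model spending all of its time in filters would gain the full factor, while one spending half of its time there (f = 0.5) would gain only about 1.24.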
Daniel
PS. Chasing down an inconsistency in my performance measurements, I found that the AVX2 code generated with -march=haswell is not the same as the code generated with -march=cannonlake. In the latter, the compiled code uses 32 ymm registers, but only 16 for haswell. This is because AVX512 doubled the size of the vector register set.
Also see the svn repo
https://redoubt.ligo-wa.caltech.edu/svn/slowcontrols/trunk/EPICS/Utilities/BiquadAVX
for Daniel's AVX implementation.