Performance problems when using warp reduce in multiratespiir
As part of the determinism fixes, an atomicAdd in multiratespiir was replaced by a warp reduction. The result is now deterministic, but the change slowed the pipeline enough that it no longer runs in real time.
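For context, here is a minimal sketch of the two approaches on a toy sum. It is not the multiratespiir kernel: `sum_atomic`, `sum_warp`, `warp_reduce_sum` and the input data are all hypothetical names and values chosen for illustration. Floating-point addition is not associative, so the order in which atomics land can change the rounding of the result, whereas a shuffle-based reduction always adds in the same tree order.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: sums `val` across the 32 lanes of a warp in a fixed
// tree order, so the rounding is the same on every run.
__device__ float warp_reduce_sum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;  // only lane 0 holds the full warp sum
}

// Atomic variant: every thread adds into one address. The order in which
// the additions are applied is not fixed, so the rounding may differ
// between runs.
__global__ void sum_atomic(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Warp-reduction variant (assumes one warp per block): each warp writes one
// partial sum, and the partials are then combined in a fixed order.
__global__ void sum_warp(const float *in, float *partial, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;  // no early return: all lanes shuffle
    v = warp_reduce_sum(v);
    if (threadIdx.x == 0) partial[blockIdx.x] = v;
}

int main() {
    const int n = 1 << 16, warp = 32, blocks = n / warp;
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = 1.0f / (i + 1);  // arbitrary test data

    float *d_in, *d_out, *d_partial;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaMemset(d_out, 0, sizeof(float));
    sum_atomic<<<blocks, warp>>>(d_in, d_out, n);
    float atomic_sum = 0.0f;
    cudaMemcpy(&atomic_sum, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    sum_warp<<<blocks, warp>>>(d_in, d_partial, n);
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    float warp_sum = 0.0f;
    for (float p : partial) warp_sum += p;  // fixed order on the host

    printf("atomicAdd sum:      %.9g\n", atomic_sum);
    printf("warp-reduction sum: %.9g\n", warp_sum);

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_partial);
    return 0;
}
```

The two printed values may differ in the last digits, and the atomic one may vary from run to run. Combining the partials on the host is only one way to keep the order fixed; a second deterministic kernel would serve the same purpose.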
For O4 we believe it’s acceptable to use the old, non-deterministic approach approved as part of the O3 review. The non-determinism has negligible impact on science output.
However, it should be possible to get both determinism and real-time performance, and in fact warp reduction should probably be faster, so this issue entails:

- [ ] Reproducing the problem
- [ ] Investigating why the slowdown happens
- [ ] If it’s an interesting reason, a short writeup
- [ ] Implementing some kind of fix
- [ ] Showing that fix is deterministic
- [ ] Showing that fix is faster
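The last two items could be checked with a small standalone harness along these lines. This is only a sketch under assumptions: `candidate_kernel` is a placeholder standing in for whatever the fixed kernel turns out to be, and the problem size, launch configuration and run count are arbitrary. It compares the output of repeated runs byte for byte and times each launch with CUDA events.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the kernel under test (NOT the real multiratespiir code):
// each warp reduces 32 inputs and lane 0 writes one output value.
__global__ void candidate_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    if (threadIdx.x % warpSize == 0) out[i / warpSize] = v;
}

// Launches the kernel once, copies the result into host_out, and returns
// the kernel time in milliseconds as measured by CUDA events.
static float run_once(const float *d_in, float *d_out, int n,
                      std::vector<float> &host_out) {
    const int threads = 256, blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    candidate_kernel<<<blocks, threads>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaMemcpy(host_out.data(), d_out, host_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;       // arbitrary size, a multiple of 256
    const int n_out = n / 32;    // one output per warp
    const int runs = 10;         // arbitrary number of repetitions
    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f / (i + 1);

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n_out * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    std::vector<float> reference(n_out), current(n_out);
    run_once(d_in, d_out, n, reference);  // warm-up run, also the reference

    bool deterministic = true;
    float total_ms = 0.0f;
    for (int r = 0; r < runs; ++r) {
        total_ms += run_once(d_in, d_out, n, current);
        // Bitwise comparison: any difference in rounding counts as a failure.
        if (std::memcmp(reference.data(), current.data(),
                        n_out * sizeof(float)) != 0)
            deterministic = false;
    }

    printf("bitwise identical across %d runs: %s\n", runs,
           deterministic ? "yes" : "no");
    printf("mean kernel time: %.4f ms\n", total_ms / runs);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Running the same harness against the atomicAdd version would give the baseline time to compare against. Identical output over a handful of runs is evidence of determinism rather than proof, so the run count would need to be chosen with that in mind.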