VectorMath: add ScalarMax and ScaleAdd functions
Description
This MR adds two sets of functions to the VectorMath module in LAL:
- `ScalarMax`: finds the (scalar) maximum $M$ of a vector $x_i$ over all array elements: $M = \max_i x_i$
- `ScaleAdd`: performs a fused-multiply-add over vectors $x_i$, $y_i$, $z_i$: $z_i = a x_i + y_i$
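For reference, the two operations can be sketched as plain scalar loops. This is only an illustration of the definitions above, not the SIMD code in this MR: the names `scalar_max` and `scale_add`, the `float` element type, and the argument order are all assumptions and do not correspond to the LAL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helpers; names and signatures are NOT the LAL API. */

/* Return M = max_i x[i]. Requires at least one element. */
static float scalar_max(const float *x, size_t n)
{
    assert(n > 0);
    float m = x[0];
    for (size_t i = 1; i < n; ++i) {
        if (x[i] > m) {
            m = x[i];
        }
    }
    return m;
}

/* Compute z[i] = a * x[i] + y[i] for every i in [0, n). */
static void scale_add(float *z, float a, const float *x, const float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[i] = a * x[i] + y[i];
    }
}
```

A scalar version like this is mainly useful as a baseline to compare vectorised results against.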
Note: the `ScaleAdd` functions, despite the name, do not use the AVX FMA instruction set. FMA instructions round only once, at the end of the operation, rather than also rounding the intermediate product, so the values they compute can differ significantly from those computed using the usual IEEE arithmetic with a separate multiply and add (I found that ~0.1% level differences were easily achieved). There also wasn't much of a performance improvement over using the equivalent SSE/AVX intrinsics for multiplication/addition.
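The rounding difference is easiest to see when the product and the addend nearly cancel. The sketch below is an illustration only, not code from this MR: the function names are hypothetical, and the single-rounding path is approximated by carrying the exact `float` product in `double` rather than by issuing a real FMA instruction.

```c
/* Hypothetical helpers for illustration; not part of LAL. */

/* Multiply, round the product to float, then add: two roundings.
 * The volatile store forces the product to be rounded to float and
 * stops the compiler from contracting the expression into an FMA. */
static float mul_add_two_roundings(float a, float x, float y)
{
    volatile float p = a * x;
    return p + y;
}

/* Approximate FMA-style behaviour: the product of two floats is exact
 * in double, so the intermediate product is not rounded to float and
 * only the final result is. */
static float mul_add_one_rounding(float a, float x, float y)
{
    double p = (double)a * (double)x;
    return (float)(p + (double)y);
}
```

With `a = 1 + 2^-13`, `x = 1 - 2^-13` and `y = -1`, the exact product is `1 - 2^-26`, which rounds to `1.0f` in single precision. The two-rounding version therefore returns `0`, while the single-rounding version returns `-2^-26`. This is an extreme case chosen to make the effect visible; it is not a reproduction of the measurement quoted above.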
Also includes miscellaneous other fixes.
API Changes and Justification
Backwards Compatible Changes
- This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions
- This change adds new classes/functions/structs/types to a public C header file or Python module
Backwards Incompatible Changes
- This change modifies an existing class/function/struct/type definition in a public C header file or Python module
- This change removes an existing class/function/struct/type from a public C header file or Python module
Review Status
n/a