VectorMath: add ScalarMax and ScaleAdd functions (!2297) · Merge requests · lscsoft / lalsuite

Karl Wette requested to merge ANU-CGA/lalsuite:vectorops-funcs into master Jun 11, 2024

Description

This MR adds 2 sets of functions to the VectorMath module in LAL:

ScalarMax: finds the (scalar) maximum M of a vector x_i over all array elements: M = \max_i x_i
ScaleAdd: performs a multiply-add over vectors x_i, y_i with a scalar a, storing the result in z_i: z_i = a x_i + y_i
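As a rough illustration of the two operations, here is a minimal scalar reference sketch in C. The names `ref_scalar_max` and `ref_scale_add`, the argument order, and the choice of single precision are assumptions made for this example; they are not the signatures added to the VectorMath header, and the code makes no attempt at SIMD vectorisation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: M = max_i x[i] over n elements (n must be > 0). */
float ref_scalar_max(const float *x, size_t n)
{
    assert(x != NULL && n > 0);
    float m = x[0];
    for (size_t i = 1; i < n; ++i) {
        if (x[i] > m) {
            m = x[i];
        }
    }
    return m;
}

/* Hypothetical reference: z[i] = a * x[i] + y[i] for i = 0..n-1. */
void ref_scale_add(float *z, float a, const float *x, const float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        z[i] = a * x[i] + y[i];
    }
}
```

Calling `ref_scalar_max` on `{1, -2, 5, 3}` should give 5, and `ref_scale_add` with `a = 2` and `y_i = 0.5` on the same input should give `{2.5, -3.5, 10.5, 6.5}`.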

Note: despite the name, the ScaleAdd functions do not use the AVX FMA instruction set. FMA instructions round only once, at the end of the operation, rather than also rounding the intermediate product, so values computed with them can differ significantly from those computed using the usual separate IEEE multiply and add (I found that ~0.1% level differences were easily achieved). There also wasn't much of a performance improvement over using the equivalent SSE/AVX intrinsics for multiplication/addition.

Also includes miscellaneous other fixes.

API Changes and Justification

Backwards Compatible Changes

This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions
This change adds new classes/functions/structs/types to a public C header file or Python module

Backwards Incompatible Changes

This change modifies an existing class/function/struct/type definition in a public C header file or Python module
This change removes an existing class/function/struct/type from a public C header file or Python module

Review Status

n/a
