CIT deployment fails on node149
Running an HTCondor CIT deployment produces an error in logs/postcohspiir-<condor_job_id>-job007
(sometimes job001, but always on node149):
Error unknown error at line 1421 in file multiratespiir/multiratespiir.c
This may be due to a faulty GPU on that specific node, or to its driver version, which differs from the other nodes:
NVIDIA 465.19.01, CUDA 11.3 on node149
NVIDIA 460.32.83, CUDA 11.2 on the others
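
To check the driver mismatch across the pool, something like the following Python sketch might help. It is not part of the pipeline. The helper names (local_driver_version, odd_nodes) are made up for illustration, and node150 and node151 are hypothetical stand-ins for "the others". It assumes nvidia-smi is on the PATH of each node.

```python
import subprocess
from collections import Counter

# Standard nvidia-smi query; prints one driver version line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]


def local_driver_version():
    """Return the NVIDIA driver version reported on the current host."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    # All GPUs on one host share a driver, so the first line is enough.
    return out.strip().splitlines()[0].strip()


def odd_nodes(versions):
    """Given {node: driver_version}, return the nodes not on the most common version."""
    if not versions:
        return {}
    common, _ = Counter(versions.values()).most_common(1)[0]
    return {node: v for node, v in versions.items() if v != common}


if __name__ == "__main__":
    # node150 and node151 are hypothetical names, not real hosts from this report.
    reported = {
        "node149": "465.19.01",
        "node150": "460.32.83",
        "node151": "460.32.83",
    }
    print(odd_nodes(reported))
```

Running local_driver_version() on each node (for example from a small test job) and feeding the results to odd_nodes would show whether node149 is the only host on 465.19.01. This alone would not distinguish a driver problem from a faulty GPU.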
Edited by Luke Davis