GStreamer 1.0: Performance issues
I ran a bg run on !129 (merged), which failed due to a timeout (output in /fred/oz016/tdavies/projects/gstreamer_python_upgrade/testing/bg/run1/019/). The run has 1/10th as many zerolags and 1/10th the livetime of an old bg run, so it's unlikely that it simply failed to exit cleanly.
Each run also takes longer than I believe is typical to start producing files: bank stats and marginalized stats take around 2 hours 40 minutes to appear for a livetime of 14400 s (4 hours), so processing is only a little faster than realtime (roughly 1.5x).
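For reference, the realtime factor implied by those numbers can be checked with a few lines of Python. This is just arithmetic on the figures quoted above; the variable names are mine and the 2 h 40 min figure is approximate.

```python
# Figures quoted in this issue (the wall-clock time is approximate).
livetime_s = 14400                   # livetime covered by the stats files (4 hours)
wall_clock_s = 2 * 3600 + 40 * 60    # ~2 h 40 min until the files appear

# Seconds of data processed per second of wall-clock time.
realtime_factor = livetime_s / wall_clock_s

print(f"wall clock: {wall_clock_s} s, realtime factor: {realtime_factor:.2f}x")
```

A factor this close to 1 leaves little headroom, so any additional slowdown would be enough to hit the job time limit.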
This could be multiple issues, e.g. maybe there's a performance issue and a freeze or hang, or I'm misunderstanding the error message:
slurmstepd: error: *** STEP 31797249.0 ON john74 CANCELLED AT 2022-12-27T17:49:49 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 31797249 ON john74 CANCELLED AT 2022-12-27T17:49:49 DUE TO TIME LIMIT ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
(It doesn't say DUE TO TIME LIMIT if I kill it manually, right?)
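As far as I know, a job cancelled manually with scancel is reported as CANCELLED without the DUE TO TIME LIMIT suffix, but I haven't verified that on this cluster. Below is a minimal sketch for telling the two cases apart in a log; the function name and regex are my own, written against the message format shown above rather than any documented Slurm format.

```python
import re

# Pattern written against the slurmstepd lines quoted in this issue (an assumption,
# not an official format specification).
CANCEL_RE = re.compile(
    r"\*\*\* (?P<kind>STEP|JOB) (?P<id>\S+) ON (?P<node>\S+) "
    r"CANCELLED AT (?P<when>\S+)(?P<timeout> DUE TO TIME LIMIT)? \*\*\*"
)

def classify_cancellation(line):
    """Return 'timeout', 'cancelled', or None if the line is not a cancel message."""
    match = CANCEL_RE.search(line)
    if match is None:
        return None
    return "timeout" if match.group("timeout") else "cancelled"

log_lines = [
    "slurmstepd: error: *** STEP 31797249.0 ON john74 CANCELLED AT 2022-12-27T17:49:49 DUE TO TIME LIMIT ***",
    "slurmstepd: error: *** JOB 31797249 ON john74 CANCELLED AT 2022-12-27T17:49:49 DUE TO TIME LIMIT ***",
    "srun: Job step aborted: Waiting up to 32 seconds for job step to finish.",
]
results = [classify_cancellation(line) for line in log_lines]
print(results)
```

The job state reported by `sacct -j 31797249 --format=JobID,State,Elapsed,Timelimit` should give an independent check, since Slurm records a time-limit kill with a different state from a manual cancel.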
The likely culprits are:
- The CUDA memcpy changes (I believe I'm reallocating the memory for every buffer instead of allocating once and reusing it).
- The removal of async from various CUDA functions (done to simplify debugging).
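To illustrate the first culprit, here is a minimal sketch of the allocate-once-and-reuse pattern. It is not the pipeline code: `ReusableBuffer` is a hypothetical helper, and `bytearray` stands in for whatever device allocation the real code performs.

```python
class ReusableBuffer:
    """Hypothetical helper: allocate once, reallocate only when a larger size is needed."""

    def __init__(self, alloc):
        self._alloc = alloc      # stand-in for the real (expensive) allocator
        self._buf = None
        self.alloc_count = 0     # how many times the allocator was actually called

    def get(self, nbytes):
        # Reuse the existing allocation whenever it is already large enough.
        if self._buf is None or len(self._buf) < nbytes:
            self._buf = self._alloc(nbytes)
            self.alloc_count += 1
        return memoryview(self._buf)[:nbytes]


# bytearray is only a placeholder for a device allocation.
pool = ReusableBuffer(bytearray)
for _ in range(1000):            # one request per incoming buffer
    view = pool.get(4096)

print(pool.alloc_count)          # allocator calls made for 1000 equally sized buffers
```

With a per-buffer allocation the counter would equal the number of buffers instead, which is the overhead I suspect the memcpy changes introduced.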
Edited by Timothy Davies