CI: Launch jobs on OzSTAR via SSH (minimal example)
In response to https://support.hpc.swin.edu.au/browse/HPCDESK-3381
This is a simple example of how to submit a job on OzSTAR via the CI.
Some important things to keep in mind:
- The SSH key needs to be protected with a passphrase for security reasons.
- The SSH key should only be able to run the commands necessary to submit the job, nothing else.
- Any branch that can run this CI should be set to "protected". OzSTAR users must not share their logins with anyone else, so the only person who should be able to trigger this CI (i.e. push or merge to this branch) is the person who owns the OzSTAR account.
As an example, I have created the directory ~/ci_test on OzSTAR (ci_test.zip). In this example, start.sh runs the Slurm job hello.slurm, which just prints "hello world".
Add the following line
no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="~/ci_test/start.sh" ssh-rsa YOUR_PUBLIC_KEY...
to ~/.ssh/authorized_keys on OzSTAR. You should make a new SSH keypair for this, because the options preceding the key restrict what the key can do. In this case, the command ~/ci_test/start.sh is always run, and nothing else. Essentially, doing
ssh user@ozstar.swin.edu.au
will always do
ssh user@ozstar.swin.edu.au "~/ci_test/start.sh"
no matter what arguments you pass. Since we are using srun, the script waits for the job to start and finish before returning. The output is passed back to the CI, where it can be checked.
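To make the authorized_keys step less error-prone, the restricted entry can be assembled from the new public key with a small helper. This is only a sketch: the function name restricted_entry and the key file name ozstar_ci_key are hypothetical, and the option list is simply the one shown above.

```shell
#!/bin/sh
# Sketch: build a restricted authorized_keys entry for a dedicated CI key.
# A new keypair could be created first with, for example:
#   ssh-keygen -t rsa -b 4096 -f ./ozstar_ci_key   (prompts for a passphrase)
# "ozstar_ci_key" is a hypothetical file name.

restricted_entry() {
    # $1: the forced command, $2: the public key text (contents of the .pub file)
    printf 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="%s" %s\n' "$1" "$2"
}

# Example use on OzSTAR (commented out so that nothing is modified by accident):
# restricted_entry '~/ci_test/start.sh' "$(cat ./ozstar_ci_key.pub)" >> ~/.ssh/authorized_keys
```

Keeping the forced command as a parameter makes it easy to point the same key at a different script later, but each key should still map to exactly one command.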
In practice, you'll want to do a git clone in start.sh (which runs on farnarkle and has internet access), and then srun the job. Although pbilby creates a bash script that runs sbatch for you (it's not clear to me what utility this adds), you'll want to bypass this and just call srun directly.
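A start.sh along these lines might look like the sketch below. It is not the script from ci_test.zip: REPO_URL, WORK_DIR, JOB_SCRIPT and the srun resource options are placeholders chosen for illustration and would need to be replaced with real values.

```shell
#!/bin/bash
# Hypothetical start.sh: fetch a fresh copy of the code, then run the job with srun.
# REPO_URL, WORK_DIR and JOB_SCRIPT are placeholders, not values from the example.

REPO_URL="${REPO_URL:-https://example.org/your/repo.git}"
WORK_DIR="${WORK_DIR:-$HOME/ci_test/checkout}"
JOB_SCRIPT="${JOB_SCRIPT:-./run_job.sh}"

run_ci_job() {
    # Start from a clean clone so the job runs the current state of the repo.
    rm -rf "${WORK_DIR:?}" &&
    git clone --depth 1 "$REPO_URL" "$WORK_DIR" &&
    cd "$WORK_DIR" &&
    # srun blocks until the job has finished; its exit status becomes ours.
    srun --ntasks=1 --time=00:10:00 "$JOB_SCRIPT"
}

# Only attempt the job where Slurm is available (i.e. on a login node).
if command -v srun > /dev/null 2>&1; then
    run_ci_job
else
    echo "srun not found: this sketch is meant to run on an OzSTAR login node" >&2
fi
```

Because srun is the last command in the chain, a failing job gives start.sh a non-zero exit status, which ssh then hands back to the CI.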
You will need to set the following variables on the repo:
- OZSTAR_ADDRESS: ozstar.swin.edu.au
- OZSTAR_USER: [your username]
- OZSTAR_PRIVATE_KEY: [your NEW private key]
- OZSTAR_PASSPHRASE: [your passphrase]
Enable "Protect variable" on each of them so that they cannot be accessed from non-protected branches.
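On the CI side, the job script has to unlock the key with the passphrase without a terminal. One way to do this, sketched below, is to load the key into ssh-agent through an askpass helper. Only the four OZSTAR_* variable names come from the list above; the function names, the temporary files and the host key handling are assumptions, and the actual job linked below may do this differently.

```shell
#!/bin/sh
# Hypothetical CI job script: load the passphrase-protected key into ssh-agent,
# then connect to OzSTAR. Function names and temp files are illustrative only.

write_askpass() {
    # Create a helper that prints the passphrase, so ssh-add needs no terminal.
    helper_path=$(mktemp) || return 1
    cat > "$helper_path" <<'EOF'
#!/bin/sh
printf '%s\n' "$OZSTAR_PASSPHRASE"
EOF
    chmod 700 "$helper_path"
    echo "$helper_path"
}

submit_to_ozstar() {
    export OZSTAR_PASSPHRASE
    eval "$(ssh-agent -s)" > /dev/null
    askpass=$(write_askpass) || return 1
    keyfile=$(mktemp) || return 1
    chmod 600 "$keyfile"
    printf '%s\n' "$OZSTAR_PRIVATE_KEY" > "$keyfile"
    # DISPLAY covers older OpenSSH releases, SSH_ASKPASS_REQUIRE newer ones.
    DISPLAY=:0 SSH_ASKPASS="$askpass" SSH_ASKPASS_REQUIRE=force \
        ssh-add "$keyfile" < /dev/null
    rm -f "$keyfile" "$askpass"
    # The forced command in authorized_keys decides what runs on OzSTAR.
    # accept-new trusts the host key on first contact (an assumption made here);
    # pinning the host key in known_hosts would be stricter.
    ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new \
        "$OZSTAR_USER@$OZSTAR_ADDRESS"
    status=$?
    ssh-agent -k > /dev/null
    return "$status"
}

# Run only when the CI variables are present (i.e. on a protected branch).
if [ -n "${OZSTAR_PRIVATE_KEY:-}" ]; then
    submit_to_ozstar
fi
```

With this approach the decrypted key lives only in the agent's memory, and the exit status of ssh (and therefore of the remote script) decides whether the CI job passes.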
This example runs successfully here: https://git.ligo.org/conrad.chan/parallel_bilby/-/jobs/1020933