guardctrl is a tool for managing guardian node processes. It is essentially a convenience wrapper around systemd, the init and service supervision system that is standard on all major Linux distributions. guardctrl uses systemd and journald to start, stop, and track guardian daemons, and to capture and view their log messages.
Under the hood each guardian process is handled by a systemd templated service unit, guardian@.service, which describes how the processes should be supervised by systemd.
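Because guardian@.service is a template, each node corresponds to one instance unit. The following is a minimal sketch of that mapping, assuming the node name is used unchanged as the instance name; the node name SUS_ETMX is purely illustrative, and guardctrl normally issues the underlying systemctl calls for you.

```shell
# Map a guardian node name to its systemd template instance.
# Assumption: the node name is used verbatim as the instance name.
node_unit() {
    printf 'guardian@%s.service\n' "$1"
}

# Example use, run as the guardctrl user:
#   systemctl --user status "$(node_unit SUS_ETMX)"
#   journalctl --user-unit "$(node_unit SUS_ETMX)"
node_unit SUS_ETMX
```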
guardctrl host setup
This section describes how to set up a computer as a guardctrl host.
Once that archive is enabled the package can be installed directly:
$ sudo apt install guardctrl
The guardctrl package depends on the guardian package, so you'll automatically get them both. The guardctrl package installs the command line interface, as well as all the needed systemd service unit files.
creating and configuring the guardctrl user
guardctrl uses the systemd --user instance of the invoking user. This means that guardctrl should always be invoked as the same user so that processes are managed in a unified way. The guardctrl interface knows it's running as the correct user by the presence of the ~/.guardctrl-home file. If this file is not present, guardctrl will assume it's running remotely and will try to ssh to GUARDCTRL_USER@GUARDCTRL_HOST to issue the command.
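The local-versus-remote decision described above can be sketched in a few lines of shell. This is only an illustration of the rule, not guardctrl's actual code; the function name guardctrl_dispatch and the user and host values are assumptions made for the example.

```shell
# Sketch of the dispatch rule: NOT guardctrl's real implementation.
# Prints "local" if the marker file exists in the given home directory,
# otherwise prints the ssh destination that would be used.
guardctrl_dispatch() {
    home_dir=$1
    if [ -e "$home_dir/.guardctrl-home" ]; then
        echo "local"
    else
        echo "ssh ${GUARDCTRL_USER}@${GUARDCTRL_HOST}"
    fi
}

# Illustrative values only.
GUARDCTRL_USER=guardian
GUARDCTRL_HOST=h1guardian1
guardctrl_dispatch "$HOME"
```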
For the LIGO site installations we run everything under the guardian user. We therefore start by creating the guardian user account on the machine:
LIGO uses uid=1010, so as not to collide with any of the other standard system users, and the controls group, but neither setting is required. (NOTE: For a site setup, where guardctrl will be accessed through a passwordless SSH interface, the guardian user should not have a password. Otherwise the guardian user can have a password as usual.)
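A hedged sketch of the account creation with those settings follows. It assumes useradd is available and that the controls group already exists; the helper name guardian_useradd_cmd is hypothetical, and it only prints the command so it can be reviewed before being run as root.

```shell
# Build (but do not run) the account-creation command.
# Assumptions: useradd is the account tool, the group already exists.
guardian_useradd_cmd() {
    uid=${1:-1010}
    group=${2:-controls}
    user=${3:-guardian}
    printf 'useradd --uid %s --gid %s --create-home --shell /bin/bash %s\n' \
        "$uid" "$group" "$user"
}

guardian_useradd_cmd

# To apply:
#   sudo $(guardian_useradd_cmd)
# For the passwordless site setup, an empty password can be set with:
#   sudo passwd --delete guardian
```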
The rest of the non-root commands specified below are assumed to be run as the guardctrl user created above (in this case the guardian user).
Create the ~/.guardctrl-home file in the guardctrl (guardian) user's home directory, to indicate that this is the user handling systemd supervision:
$ touch ~guardian/.guardctrl-home
Finally, enable the guardian.target unit for auto-starting nodes on startup:
$ systemctl --user enable guardian.target
user systemd persistence
Once the desired guardian user account is ready, we need to inform the system systemd instance that the user is "persistent". This prevents systemd from shutting down the systemd --user process when the user is not logged in. We do this with loginctl enable-linger, with the guardian user as argument:
$ sudo loginctl enable-linger guardian
You might also need to extend the startup timeout for this user, as starting all the guardian processes at boot can take a while if there are a lot of processes. 10 minutes should be enough, but this can be adjusted. We handle this with a system-level "drop-in" for the relevant user's user service (NOTE: the number after the escaped \@ is the relevant user's uid):
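A minimal sketch of such a drop-in, assuming uid 1010 and a file name of timeout.conf (the file name is arbitrary). The script writes into a local staging directory so the result can be inspected before it is copied into /etc as root.

```shell
# Stage a drop-in that extends the start timeout of user@UID.service.
# Assumptions: uid 1010, 10 minute timeout, drop-in file name timeout.conf.
uid=1010
timeout_dir="./staging/etc/systemd/system/user@${uid}.service.d"
mkdir -p "$timeout_dir"
cat > "$timeout_dir/timeout.conf" <<'EOF'
[Service]
TimeoutStartSec=10min
EOF

# After review, as root:
#   cp -r "$timeout_dir" /etc/systemd/system/
#   systemctl daemon-reload
```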
The EPICS caRepeater should be running system-wide before any of the guardian nodes are started, so it is useful to declare a dependency of the guardian user's service on the caRepeater service. This can also be done with a drop-in:
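A sketch of the dependency drop-in, under the assumption that the repeater is installed as a system unit named caRepeater.service (check the actual unit name on your host). As before, the file is staged locally for review; uid 1010 and the file name carepeater.conf are assumptions.

```shell
# Stage a drop-in ordering user@UID.service after the caRepeater unit.
# Assumptions: uid 1010, system unit name caRepeater.service.
uid=1010
dep_dir="./staging/etc/systemd/system/user@${uid}.service.d"
mkdir -p "$dep_dir"
cat > "$dep_dir/carepeater.conf" <<'EOF'
[Unit]
Wants=caRepeater.service
After=caRepeater.service
EOF

# After review, as root:
#   cp -r "$dep_dir" /etc/systemd/system/
#   systemctl daemon-reload
```

Wants= pulls the repeater in when the user service starts, and After= delays the user service until the repeater has been started.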
The LIGO setups store logs from all guardian processes in perpetuity. To this end, the journald system logger is configured for "persistent" storage. This is done by setting Storage=persistent in /etc/systemd/journald.conf (included below are some other variables for increasing the log rate limit, and for increasing the disk storage limits for the logs):
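A sketch of what such a [Journal] section could look like. Only Storage=persistent comes from the text above; the rate-limit and disk-usage values are illustrative assumptions, not the LIGO settings, and should be sized for your host. The fragment is written to a staging file to be merged into /etc/systemd/journald.conf by hand.

```shell
# Stage a journald.conf fragment for persistent, high-volume logging.
# All numeric values below are illustrative assumptions.
journald_frag=./staging/etc/systemd/journald.conf.fragment
mkdir -p "$(dirname "$journald_frag")"
cat > "$journald_frag" <<'EOF'
[Journal]
Storage=persistent
RateLimitIntervalSec=30s
RateLimitBurst=10000
SystemMaxUse=100G
SystemKeepFree=10G
EOF

cat "$journald_frag"
```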
Reload the journald config after these changes are made:
$ sudo systemctl force-reload systemd-journald
specifying local environment
The guardian@.service expects an /etc/guardian/local-env environment file to exist, for providing any needed environment variables to the supervised guardian processes. Here's an example of the file for H1 at LHO:
The IFO and SITE variables should be set as expected.
The GUARD_CHANFILE variable points to the location where the guardian channel list ini file, used by the CDS DAQ, will be written. This file location must be writable by the guardctrl user. (For the LIGO case above we touch the file and make it writable by the controls group, of which the guardian user is a member.)
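A hedged sketch of such an environment file for H1 at LHO. The GUARD_CHANFILE path is a placeholder, not the real site path. Note that systemd environment files take plain VAR=value lines, with comments only on their own lines.

```shell
# Stage an /etc/guardian/local-env file.
# Assumption: the GUARD_CHANFILE path below is a hypothetical placeholder.
env_file=./staging/etc/guardian/local-env
mkdir -p "$(dirname "$env_file")"
cat > "$env_file" <<'EOF'
# interferometer and site
IFO=H1
SITE=LHO
# channel list written by guardian (placeholder path)
GUARD_CHANFILE=/path/to/guardian_channels.ini
EOF

# After installing the file, make the channel file group-writable, as root:
#   touch /path/to/guardian_channels.ini
#   chgrp controls /path/to/guardian_channels.ini
#   chmod g+w /path/to/guardian_channels.ini
```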
The best way to allow remote control of guardctrl is via ssh. For a site install on a protected network, where you want to allow "remote" users (i.e. users on the same network but on hosts other than the guardctrl host) to control the nodes without entering a password, you can set up a passwordless ssh "ForceCommand" for the guardctrl user.
First, modify the system PAM stack to allow passwordless login via ssh. Usually PAM is configured to not allow passwordless login on anything except for special TTYs. To loosen that restriction, on Debian systems, we modify /etc/pam.d/common-auth to change the following line:
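One way to make this change is sketched below, assuming the stock Debian pam_unix.so line carries the nullok_secure option (which limits empty-password logins to secure TTYs) and that relaxing it to nullok is the intended edit. The helper name relax_nullok is hypothetical, and the example runs against a sample file rather than the live PAM configuration; if the option is absent, the input is printed unchanged.

```shell
# Print a copy of a PAM file with nullok_secure relaxed to nullok
# on the pam_unix.so line. The input file itself is not modified.
relax_nullok() {
    sed 's/\(pam_unix\.so.*\)nullok_secure/\1nullok/' "$1"
}

# Demonstrate on a sample line instead of /etc/pam.d/common-auth.
sample=$(mktemp)
printf 'auth\t[success=1 default=ignore]\tpam_unix.so nullok_secure\n' > "$sample"
relax_nullok "$sample"
rm -f "$sample"

# To apply for real, review the output first, then as root:
#   cp /etc/pam.d/common-auth /etc/pam.d/common-auth.bak
#   relax_nullok /etc/pam.d/common-auth.bak > /etc/pam.d/common-auth
```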
Then add to the sshd_config a special "Match" stanza for the guardctrl user which specifies that it may login without a password, but is forced to execute only a single command (guardctrl). On most systems this would go in /etc/ssh/sshd_config:
Match User guardian
    PermitEmptyPasswords yes
    PermitTTY yes
    X11Forwarding no
    AllowTcpForwarding no
    ForceCommand /usr/bin/guardctrl
After adding the Match stanza, reload sshd:
$ sudo systemctl force-reload sshd
If for some reason you need to pass special environment variables to guardctrl, you can point the ForceCommand to something like /etc/guardian/guardctrl-ssh-bridge which can be a shell script that sets the needed environment and then execs /usr/bin/guardctrl (without arguments). Make sure the wrapper script is executable.
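A minimal sketch of such a wrapper. EXAMPLE_VAR is a placeholder for whatever variables your installation needs. With ForceCommand, sshd places the command requested by the client in SSH_ORIGINAL_COMMAND, which is presumably how guardctrl picks it up when exec'd without arguments. The script is staged locally and marked executable.

```shell
# Stage a ForceCommand wrapper script for guardctrl.
# Assumption: EXAMPLE_VAR stands in for the variables you actually need.
bridge=./staging/etc/guardian/guardctrl-ssh-bridge
mkdir -p "$(dirname "$bridge")"
cat > "$bridge" <<'EOF'
#!/bin/sh
# Set the needed environment, then replace this shell with guardctrl.
export EXAMPLE_VAR=value
exec /usr/bin/guardctrl
EOF
chmod +x "$bridge"

# After review, as root:
#   install -m 0755 "$bridge" /etc/guardian/guardctrl-ssh-bridge
# and point ForceCommand at /etc/guardian/guardctrl-ssh-bridge.
```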
local guardian user access
Occasionally it might be necessary to access the guardian user directly, e.g. via a terminal. If passwordless SSH access has been enabled as described above, then it won't be possible to open a guardian user terminal via ssh directly, and you'll need to change user from root. However, su and sudo do not provide access to the user dbus session needed to interact with systemctl --user. The systemd-container package includes the machinectl interface, whose shell command provides a clean user environment with all dbus interfaces available:
root@h1guardian1:~# machinectl shell guardian@ /bin/bash
guardian@h1guardian1:~$ systemctl --user status
* h1guardian1
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2018-02-27 15:44:08 PST; 11s ago
   CGroup: /firstname.lastname@example.org
           `-init.scope
             |-11818 /lib/systemd/systemd --user
             `-11820 (sd-pam)
guardian@h1guardian1:~$ exit
logout
Connection to the local host terminated.
root@h1guardian1:~#
If you happen to be cursed with segfaulting processes, here are some things that might help.