# GUARDCTRL systemd process supervision

`guardctrl` is a tool for managing guardian node processes. It is essentially a convenient wrapper around [systemd](https://www.freedesktop.org/wiki/Software/systemd/), the built-in init and service supervision system that is standard on most major Linux distributions. `guardctrl` uses systemd and journald to take care of starting, stopping, and tracking guardian daemons, and of capturing and viewing their log messages.

Each guardian process is handled by a systemd templated service unit, `guardian@.service`, which describes how the processes should be supervised by systemd.

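Because each node is an instance of the `guardian@.service` template, the underlying units can also be inspected with the plain systemd tools. The following is a minimal sketch, assuming that the template instance name is the guardian node name (this is an assumption, not something stated here). The helper `guardian_unit` and the node name `SUS_ETMX` are hypothetical.

```shell
# Build the systemd unit name for a guardian node.
# ASSUMPTION: the template instance name is the node name.
guardian_unit() {
    printf 'guardian@%s.service\n' "$1"
}

unit="$(guardian_unit SUS_ETMX)"   # SUS_ETMX is a hypothetical node name
echo "$unit"

# With the unit name in hand, the standard tools apply, for example:
#   systemctl --user status "$unit"
#   journalctl --user-unit "$unit" --follow
```

In normal use `guardctrl` should hide these details; the sketch is only meant to show what the wrapper sits on top of.
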
... | ... | @@ -23,42 +23,33 @@ The `guardctrl` package depends on `guardian` package, so you'll automatically g |

### creating and configuring the guardctrl user

`guardctrl` uses the `systemd --user` instance of the invoking user. This means that `guardctrl` should always be invoked as the same user so that processes are managed in a unified way. `guardctrl` knows it is running as the correct user by the presence of the `~/.guardctrl-home` file. If this file is not present, `guardctrl` assumes it is running remotely and tries to ssh to `GUARDCTRL_USER@GUARDCTRL_HOST` to issue the command.

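The local-versus-remote decision described above can be sketched in a few lines of shell. This is illustrative logic only, not the actual `guardctrl` source; the function name `guardctrl_target` is hypothetical, and the sketch only reports where a command would run rather than running anything.

```shell
# Decide where a guardctrl command would run.
# ASSUMPTION: illustrative logic only, not the actual guardctrl source.
guardctrl_target() {
    if [ -e "${HOME}/.guardctrl-home" ]; then
        # Marker present: this is the supervising user, so act locally.
        echo "local"
    else
        # Marker absent: the command would be forwarded over ssh.
        echo "ssh ${GUARDCTRL_USER}@${GUARDCTRL_HOST}"
    fi
}

# Example:
#   GUARDCTRL_USER and GUARDCTRL_HOST must be set in the environment
#   for the remote case to produce a usable ssh destination.
```
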
For the LIGO site installations we run everything under the `guardian` user. We therefore start by creating the `guardian` user account on the machine:

```shell
$ sudo adduser --gecos '' --uid 1010 --ingroup controls --disabled-password guardian
```

LIGO uses `uid=1010` so as not to collide with any of the other standard system users, and adds the account to the `controls` group, but neither setting is required. (NOTE: For a site setup, where guardctrl will be accessed through a passwordless SSH interface, the guardian user should not have a password. Otherwise the guardian user can have a password as usual.)

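Before moving on, it can be worth confirming that the account exists with the intended uid. The following is a small sketch using the POSIX `id` utility; the helper name `check_user_uid` is hypothetical.

```shell
# Return 0 if USER exists and has EXPECTED_UID, 1 on a uid mismatch,
# and 2 if the user does not exist.
check_user_uid() {
    actual="$(id -u "$1" 2>/dev/null)" || return 2
    [ "$actual" = "$2" ]
}

# Example (values from the adduser command above):
#   check_user_uid guardian 1010 && echo "guardian account looks right"
```
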
Once the account exists, create the `~/.guardctrl-home` file in that user's home directory to mark it as the user handling systemd supervision:

```shell
$ sudo -u guardian touch ~guardian/.guardctrl-home
```

*The rest of the non-root commands specified below are assumed to be run as the guardctrl user created above (in this case the `guardian` user).*

Finally, create a `guardian.target` unit in the user's config for auto-starting nodes on startup:

```
# ~guardian/.config/systemd/user/guardian.target
[Unit]
Description=Advanced LIGO Guardian target

[Install]
WantedBy=default.target
```

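For scripted installs, the same unit file can be written with a small helper. This is a sketch only: the function name `write_guardian_target` and its optional directory argument are hypothetical, and the default directory is the standard per-user systemd unit path.

```shell
# Write guardian.target into a systemd user unit directory.
# The optional first argument overrides the directory (useful for dry runs).
write_guardian_target() {
    unit_dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/guardian.target" <<'EOF'
[Unit]
Description=Advanced LIGO Guardian target

[Install]
WantedBy=default.target
EOF
}

# Example, run as the guardian user:
#   write_guardian_target
```
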
Inform the user's systemd session about the changes, and enable the `guardian.target` unit so that nodes are auto-started on boot:

```shell
$ systemctl --user daemon-reload
$ systemctl --user enable guardian.target
```

(The `guardian.target` unit can probably, and should, be provided as part of the `guardctrl` package.)

### user systemd persistence

Once the desired guardian user account is ready, we need to inform the system systemd instance that the user is "persistent". This prevents systemd from shutting down the user's `systemd --user` process when the user is not logged in. We do this with `loginctl enable-linger`. If we intend to run under the `guardian` user, the correct command is:

```shell
$ sudo loginctl enable-linger guardian
```

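To confirm that lingering took effect, a check along the following lines may help. It is a sketch under one stated assumption: that systemd-logind records lingering users as marker files under `/var/lib/systemd/linger`. The helper name `linger_enabled` and its optional directory argument are hypothetical.

```shell
# Report whether lingering is enabled for a user.
# ASSUMPTION: systemd-logind records lingering users as marker files under
# /var/lib/systemd/linger; the second argument overrides that directory.
linger_enabled() {
    [ -e "${2:-/var/lib/systemd/linger}/$1" ]
}

# Example:
#   linger_enabled guardian && echo "guardian will persist across logouts"
# The same state is reported by:
#   loginctl show-user guardian --property=Linger
```
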
You might also need to extend the startup timeout for this user, as starting all the guardian processes at boot can take a while if there are a lot of processes. 10 minutes should be enough, but this can be adjusted. We handle this with a system-level "drop-in" for the relevant user's service (NOTE: the number after the escaped `\@` is the relevant user's uid):

```
# /etc/systemd/system/user\@1010.service.d/timeout.conf
[Service]
TimeoutStartSec=10min
```

### caRepeater service

The EPICS caRepeater should be running system-wide before any of the guardian nodes are started. We therefore declare a dependency of the guardian user on the caRepeater service. This can also be done with a drop-in:

```
# /etc/systemd/system/user\@1010.service.d/ca.conf
[Unit]
```
... | ... | @@ -92,7 +83,7 @@ WantedBy=multi-user.target |

### configuring journald for persistent logs

The LIGO setups store logs from all guardian processes in perpetuity. To this end, the journald system logger is configured for "persistent" storage. This is done by setting `Storage=persistent` in `/etc/systemd/journald.conf` (included below are some other variables for increasing the log rate limit, and for increasing the disk storage limits for the logs):

```
# /etc/systemd/journald.conf
[Journal]
```
... | ... | |