... | ... | @@ -12,12 +12,12 @@ The `guardctrl` package is available through the [LIGO Debian apt archives](http |
|
|
```shell
|
|
|
$ wget http://software.ligo.org/lscsoft/debian/pool/contrib/l/lscsoft-archive-keyring/lscsoft-archive-keyring_2016.06.20-2_all.deb
|
|
|
$ wget https://apt.ligo-wa.caltech.edu/debian/pool/stretch/cdssoft-release-stretch/cdssoft-release-stretch_1.3.0_all.deb
|
|
|
$ dpkg -i *.deb
|
|
|
$ apt-get update
|
|
|
$ sudo dpkg -i *.deb
|
|
|
$ sudo apt-get update
|
|
|
```
|
|
|
Once that archive is enabled the package can be installed directly:
|
|
|
```shell
|
|
|
# apt install guardctrl
|
|
|
$ sudo apt install guardctrl
|
|
|
```
|
|
|
The `guardctrl` package depends on the `guardian` package, so you'll automatically get both. `guardctrl` installs the command line interface, as well as all the needed systemd service unit files.
|
|
|
|
... | ... | @@ -27,18 +27,18 @@ The `guardctrl` package depends on `guardian` package, so you'll automatically g |
|
|
|
|
|
For the LIGO site installations we want to run everything under the `guardian` user. We therefore start by creating the `guardian` user account on the machine:
|
|
|
```shell
|
|
|
# adduser --gecos '' --uid 1010 --ingroup controls --disabled-password guardian
|
|
|
$ sudo adduser --gecos '' --uid 1010 --ingroup controls --disabled-password guardian
|
|
|
```
|
|
|
We use `uid=1010` to not collide with any of the other standard system users, and we add it to the `controls` group. (NOTE: For a site setup where guardctrl will be accessed through a ~passwordless-SSH-interface, the guardian user should not have a password. Otherwise the guardian user can have a password as usual.)
|
|
|
We use `uid=1010` so as not to collide with any of the other standard system users, and we add it to the `controls` group. (NOTE: For a site setup, where guardctrl will be accessed through a ~passwordless-SSH-interface, the guardian user should not have a password. Otherwise the guardian user can have a password as usual.)
|
|
|
|
|
|
Once we've got the user that will handle supervision, we touch the `~/.guardctrl-home` file (as the `guardian` user) in the user's home directory:
|
|
|
Once we've got the user that will handle supervision, we touch the `~/.guardctrl-home` file in that user's home directory:
|
|
|
```shell
|
|
|
$ -c "touch ~/.guardctrl-home"
|
|
|
$ sudo -u guardian touch ~guardian/.guardctrl-home
|
|
|
```
|
|
|
|
|
|
Finally, create and enable a `guardian.target` unit for auto-starting nodes on startup:
|
|
|
Finally, create and enable a `guardian.target` unit in the user's config for auto-starting nodes on startup:
|
|
|
```shell
|
|
|
$ cat > ~/.config/systemd/user/guardian.target
|
|
|
# ~guardian/.config/systemd/user/guardian.target
|
|
|
[Unit]
|
|
|
Description=Advanced LIGO Guardian target
|
|
|
|
... | ... | @@ -51,14 +51,13 @@ $ systemctl --user enable guardian.target |
|
|
|
|
|
### user systemd persistence
|
|
|
|
|
|
Once desired guardian user account is ready, we need to inform the system systemd instance that the user is "persistent", so that it's `systemd --user` process won't be shut down if the user is not logged in. We do this with `loginctl enable-linger`. So if we intend to run under the `guardian` user, the correct command is:
|
|
|
Once the desired guardian user account is ready, we need to inform the system systemd instance that the user is "persistent", so that its `systemd --user` process won't be shut down when the user is not logged in. We do this with `loginctl enable-linger`. So if we intend to run under the `guardian` user, the correct command is:
|
|
|
```shell
|
|
|
# loginctl enable-linger guardian
|
|
|
$ sudo loginctl enable-linger guardian
|
|
|
```
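
To confirm that lingering took effect, one option is to look for the marker file that systemd-logind keeps for lingering users. The sketch below assumes the usual location, `/var/lib/systemd/linger/<user>`; the `linger_enabled` helper and the `LINGER_DIR` override are hypothetical names introduced here for illustration only.

```shell
# Hypothetical helper: succeed if lingering is enabled for the given user.
# Assumes logind's usual marker directory; LINGER_DIR can override it.
linger_enabled() {
    user="$1"
    dir="${LINGER_DIR:-/var/lib/systemd/linger}"
    [ -e "$dir/$user" ]
}

if linger_enabled guardian; then
    echo "linger enabled for guardian"
else
    echo "linger NOT enabled for guardian"
fi
```

`loginctl show-user guardian --property=Linger` should report the same state without touching the filesystem directly.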
|
|
|
In addition, you'll need to extend the startup timeout for this user, since starting all the guardian processes at boot takes awhile. 10 minutes should be enough, but this can be adjusted. We handle this with a system-level "drop-in" for the relevant user's service:
|
|
|
In addition, you'll need to extend the startup timeout for this user, since starting all the guardian processes at boot takes a while. 10 minutes should be enough, but this can be adjusted. We handle this with a system-level "drop-in" for the relevant user's service (NOTE: the number after the \@ is the relevant user's uid):
|
|
|
```
|
|
|
# mkdir /etc/systemd/system/user\@1010.service.d
|
|
|
# cat > /etc/systemd/system/user\@1010.service.d/timeout.conf
|
|
|
# /etc/systemd/system/user\@1010.service.d/timeout.conf
|
|
|
[Service]
|
|
|
TimeoutStartSec=10min
|
|
|
```
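
Since the drop-in directory name embeds the uid, it can be derived from the account name rather than typed by hand. The following is a minimal sketch, not part of `guardctrl`: `user_dropin_dir` and `write_timeout_dropin` are hypothetical helper names, and the optional second argument is an assumed root prefix that allows a dry run in a scratch directory.

```shell
# Hypothetical helper: print the drop-in directory for a user's systemd
# service. The unit is named user@<uid>.service, so the uid is looked up
# from the account name with `id -u`.
user_dropin_dir() {
    uid="$(id -u "$1")" || return 1
    printf '/etc/systemd/system/user@%s.service.d\n' "$uid"
}

# Hypothetical helper: write the timeout drop-in under an optional root
# prefix (empty for the real system, a scratch directory for a dry run).
write_timeout_dropin() {
    dir="$2$(user_dropin_dir "$1")" || return 1
    mkdir -p "$dir" &&
    printf '[Service]\nTimeoutStartSec=10min\n' > "$dir/timeout.conf"
}
```

Run as root, `write_timeout_dropin guardian` would create the same `timeout.conf` shown above; a `systemctl daemon-reload` afterwards lets systemd pick up the new drop-in.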
|
... | ... | @@ -67,14 +66,13 @@ TimeoutStartSec=10min |
|
|
|
|
|
Because we want caRepeater to be running system-wide before starting any of the guardian nodes, we declare a dependency of the guardian user's service on the caRepeater service. This is also done with a drop-in:
|
|
|
```
|
|
|
# cat > /etc/systemd/system/user\@1010.service.d/ca.conf
|
|
|
# /etc/systemd/system/user\@1010.service.d/ca.conf
|
|
|
[Unit]
|
|
|
Wants=caRepeater.service
|
|
|
After=caRepeater.service
|
|
|
```
|
|
|
NOTE: the above assumes the existence of a `caRepeater.service`, so verify that it's there already, or create it if it doesn't exist:
|
|
|
```shell
|
|
|
# systemctl cat caRepeater.service
|
|
|
# /etc/systemd/system/caRepeater.service
|
|
|
[Unit]
|
|
|
Description=EPICS caRepeater
|
... | ... | @@ -93,8 +91,9 @@ WantedBy=multi-user.target |
|
|
|
|
|
### configure journald for persistent logs
|
|
|
|
|
|
For the LIGO sites, we want to store logs from all guardian processes in perpetuity. To do this, the journald system logger needs to be configured to store all logs indefinitely. This is done by setting `Storage=persistent` in `/etc/systemd/journald.conf` (included below are some other variables for increasing the log rate limit, and for increasing the disk storage limits for the logs):
|
|
|
For the LIGO sites, we want to store logs from all guardian processes in perpetuity. To this end, the journald system logger needs to be configured for "persistent" storage. This is done by setting `Storage=persistent` in `/etc/systemd/journald.conf` (included below are some other variables for increasing the log rate limit, and for increasing the disk storage limits for the logs):
|
|
|
```
|
|
|
# /etc/systemd/journald.conf
|
|
|
[Journal]
|
|
|
Storage=persistent
|
|
|
RateLimitBurst=100000
|
... | ... | @@ -103,14 +102,14 @@ SystemMaxFiles=100000 |
|
|
|
|
|
```
|
|
|
```shell
|
|
|
# systemctl force-reload systemd-journald
|
|
|
$ sudo systemctl force-reload systemd-journald
|
|
|
```
|
|
|
|
|
|
### specify local environment
|
|
|
|
|
|
The `guardian@.service` expects an `/etc/guardian/local-env` environment file to exist, for providing any needed environment variables to the supervised guardian processes. Here's an example of the file for H1 at LHO:
|
|
|
```shell
|
|
|
# cat /etc/guardian/local-env
|
|
|
# /etc/guardian/local-env
|
|
|
IFO=H1
|
|
|
SITE=LHO
|
|
|
GUARD_CHANFILE=/opt/rtcds/userapps/release/cds/h1/daqfiles/ini/H1EDCU_GRD.ini
|
... | ... | @@ -133,7 +132,7 @@ auth [success=1 default=ignore] pam_unix.so nullok |
|
|
```
|
|
|
|
|
|
We then add to the sshd_config a special "Match" stanza for the `guardian` user, which specifies that it may log in without a password but is forced to execute only a single command (`guardctrl`). On most systems this would go in `/etc/ssh/sshd_config`:
|
|
|
```/etc/ssh/sshd_config
|
|
|
```
|
|
|
Match User guardian
|
|
|
PermitEmptyPasswords yes
|
|
|
PermitTTY yes
|
... | ... | @@ -143,7 +142,7 @@ Match User guardian |
|
|
```
|
|
|
After adding the Match stanza, reload sshd:
|
|
|
```shell
|
|
|
# systemctl force-reload sshd
|
|
|
$ sudo systemctl force-reload sshd
|
|
|
```
|
|
|
If for some reason you need to pass special environment variables to `guardctrl`, you can point the ForceCommand to something like `/etc/guardian/guardctrl-ssh-bridge` which can be a shell script that sets the needed environment and then execs `/usr/bin/guardctrl` (without arguments). Be sure to set your wrapper script to be executable.
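
A wrapper of that kind might look like the sketch below. It is only an illustration: `write_guardctrl_bridge` is a hypothetical helper name, and the exported `IFO=H1` is a placeholder borrowed from the local-env example, to be replaced with whatever environment your site actually needs.

```shell
# Hypothetical helper: write a ForceCommand wrapper to the given path and
# make it executable. The exported variable is a placeholder only.
write_guardctrl_bridge() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Set the needed environment, then hand off to guardctrl (no arguments).
export IFO=H1
exec /usr/bin/guardctrl
EOF
    chmod 755 "$1"
}
```

Run as root, `write_guardctrl_bridge /etc/guardian/guardctrl-ssh-bridge` would create the script, after which the `ForceCommand` in the Match stanza can point at that path.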
|
|
|
|
... | ... | @@ -181,6 +180,6 @@ If you happen to be cursed with segfaulting processes, here are some things that |
|
|
|
|
|
NOTE: the coredump files will expire and be cleaned out after 3 days by default. To completely remove this expiration, create the following file to override the defaults:
|
|
|
```shell
|
|
|
# cat > /etc/tmpfiles.d/00_coredump.conf
|
|
|
# /etc/tmpfiles.d/00_coredump.conf
|
|
|
d /var/lib/systemd/coredump 0755 root root -
|
|
|
```
|
|
\ No newline at end of file |