Compare revisions

Commits on Source (239)
Showing with 1154 additions and 453 deletions
*.pyc
*~
locklost/version.py
# https://computing.docs.ligo.org/gitlab-ci-templates
include:
  - project: computing/gitlab-ci-templates
    file:
      - conda.yml
      - python.yml

stages:
  - lint

# https://computing.docs.ligo.org/gitlab-ci-templates/python/#.python:flake8
lint:
  extends:
    - .python:flake8
  stage: lint
  needs: []
......@@ -14,8 +14,8 @@ cluster.
make sure all code is python3 compatible.
The `locklost` project uses [git](https://git-scm.com/) and is hosted
at [git.ligo.org](https://git.ligo.org). Issues are tracked via the
[GitLab issue
at [git.ligo.org](https://git.ligo.org/jameson.rollins/locklost).
Issues are tracked via the [GitLab issue
tracker](https://git.ligo.org/jameson.rollins/locklost/issues).
......@@ -98,8 +98,11 @@ exec python3 -m locklost.web
Note the `LOCKLOST_WEB_ROOT` variable, which tells the cgi script the
base URL of the web pages.
Finally, make the script executable:
Finally, make sure the web script is executable and the web directory
tree is only owner writable:
```shell
$ chmod 755 ~/public_html
$ chmod 755 ~/public_html/lockloss
$ chmod 755 ~/public_html/lockloss/index.cgi
```
......
......@@ -12,77 +12,227 @@ losses". It consists of four main components:
follow-up analyses.
* `web` interface to view lock loss event pages.
A command line interface is provided to launch the `online` analysis
and condor jobs to `search` for and `analyze` events.
The `locklost` command line interface provides access to all of the
above functionality.
The LIGO sites have `locklost` deployments that automatically find and
analyze lock loss events. The site lock loss web pages are available
at the following URLs:
# Deployments
* [H1: https://ldas-jobs.ligo-wa.caltech.edu/~lockloss/](https://ldas-jobs.ligo-wa.caltech.edu/~lockloss/)
* [L1: https://ldas-jobs.ligo-la.caltech.edu/~lockloss/](https://ldas-jobs.ligo-la.caltech.edu/~lockloss/)
If you notice any issues please [file a bug
report](https://git.ligo.org/jameson.rollins/locklost/-/issues), or if
you have any questions please contact the ["lock-loss" group on
chat.ligo.org](https://chat.ligo.org/ligo/channels/lock-loss).
# Analysis plugins
Lock loss event analysis is handled by a set of [analysis
"plugins"](/locklost/plugins/). Each plugin is registered in
[`locklost/plugins/__init__.py`](/locklost/plugins/__init__.py). Some
of the currently enabled plugins are:
* `discover.discover_data` wait for data to be available
* `refine.refine_event` refine event time
* `saturations.find_saturations` find saturating channels before event
* `lpy.find_lpy` find length/pitch/yaw oscillations in suspensions
* `glitch.analyze_glitches` look for glitches around event
* `overflows.find_overflows` look for ADC overflows
* `state_start.find_lock_start` find the start of the lock leading to the
current lock loss
Each plugin does its own analysis, although some depend on the output
of other plugins. The output from any plugin (e.g. plots or data)
should be written into the event directory.
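For orientation, below is a minimal sketch of what a plugin can look like; the channel name and threshold are hypothetical, and the real plugins in [`locklost/plugins/`](/locklost/plugins/) are the authoritative reference:
```python
# illustrative plugin sketch only -- the channel name and threshold are made up
import numpy as np
from gwpy.segments import Segment

from .. import config
from .. import data
from .. import logger


def check_example(event):
    """Check a hypothetical channel for excursions around the lock loss time."""
    # look at a window around the lock loss time
    segment = Segment(-5, 5).shift(int(event.gps))
    channel = '{}:EXAMPLE-CHANNEL_NAME'.format(config.IFO)  # hypothetical channel
    buf = data.fetch([channel], segment)[0]
    # tag the event if the signal exceeds the (made up) threshold
    if np.max(np.abs(buf.data)) > 1000:
        event.add_tag('EXAMPLE_EXCURSION')
    else:
        logger.info('no example excursion found')
```
A new plugin like this would then be imported and passed to `register_plugin` in [`locklost/plugins/__init__.py`](/locklost/plugins/__init__.py), and any plots or data it produces should go into the event directory.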
# Site deployments
Each site (LHO and LLO) has a dedicated "lockloss" account on their
local LDAS cluster where `locklost` is running:
local LDAS cluster where `locklost` is deployed. The `locklost`
command line interface is available when logging into these accounts.
NOTE: these accounts are for the managed `locklost` deployments
**ONLY**. These are specially configured. Please don't change the
configuration or run other jobs in these accounts unless you know what
you're doing.
The online searches and other periodic maintenance tasks are handled
via `systemd --user` on the dedicated "detchar.ligo-?a.caltech.edu"
nodes. `systemd --user` is a user-specific instance of the
[systemd](https://systemd.io/) process supervision daemon. The [arch
linux wiki](https://wiki.archlinux.org/title/Systemd) has a nice intro
to systemd.
## online systemd service
The online search service is called `locklost-online.service`. To
control and view the status of the process use the `systemctl`
command, e.g.:
```shell
$ systemctl --user status locklost-online.service
$ systemctl --user start locklost-online.service
$ systemctl --user restart locklost-online.service
$ systemctl --user stop locklost-online.service
```
To view the logs from the online search use `journalctl`:
```shell
$ journalctl --user-unit locklost-online.service -f
```
The `-f` option tells journalctl to "follow" the logs and print new
log lines as they come in.
The "service unit" files for the online search service (and the timer
services described below) that describe how the units should be
managed by systemd are maintained in the [support](/support/systemd)
directory in the git source.
* https://ldas-jobs.ligo-wa.caltech.edu/~lockloss/
* https://ldas-jobs.ligo-la.caltech.edu/~lockloss/
These accounts have deployments of the `locklost` package and command
line interface, run the production online and followup condor jobs,
and host the web pages.
## systemd timers
Instead of cron, the site deployments use `systemd timers` to handle
periodic background jobs. The enabled timers and their schedules can
be viewed with:
```shell
$ systemctl --user list-timers
```
Timers are kind of like other services, which can be started/stopped,
enabled/disabled, etc. Unless there is some issue they should usually
just be left alone.
## deploying new versions
When a new version is ready for release, create an annotated tag for
the release and push it to the main repo (https://git.ligo.org/jameson.rollins/locklost):
the release and push it to the [main
repo](https://git.ligo.org/jameson.rollins/locklost):
```shell
$ git tag -m "release" 0.16
$ git tag -m release 0.22.1
$ git push --tags
```
In the LDAS lockloss account, pull the new tag and run
the test/deploy script:
In the "lockloss" account on the LDS "detchar" machines, pull the new
release and run the `deploy` script, which should automatically run the
tests before installing the new version:
```shell
$ ssh lockloss@detchar.ligo-la.caltech.edu
$ cd ~/src/locklost
$ cd src/locklost
$ git pull
$ locklost-deploy
```
If there are changes to the online search, restart the online condor
process, otherwise analysis changes should get picked up automatically
in new condor executions:
```shell
$ locklost online restart
$ ./deploy
```
The deployment should automatically install any updates to the online
search, and restart the search service.
# Usage
## common issues
To start/stop/restart the online analysis use the `online` command:
```shell
$ locklost online start
```
There are a couple of issues that crop up periodically, usually due to
problems with the site LDAS clusters where the jobs run.
### online analysis restarting due to NDS problems
One of the most common problems is that the online analysis is falling
over because it can't connect to the site NDS server, for example:
```
2021-05-12_17:18:31 2021-05-12 17:18:31,317 NDS connect: 10.21.2.4:31200
2021-05-12_17:18:31 Error in write(): Connection refused
2021-05-12_17:18:31 Error in write(): Connection refused
2021-05-12_17:18:31 Traceback (most recent call last):
2021-05-12_17:18:31 File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
2021-05-12_17:18:31 "__main__", mod_spec)
2021-05-12_17:18:31 File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
2021-05-12_17:18:31 exec(code, run_globals)
2021-05-12_17:18:31 File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.3-py3.6.egg/locklost/online.py", line 109, in <module>
2021-05-12_17:18:31 stat_file=stat_file,
2021-05-12_17:18:31 File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.3-py3.6.egg/locklost/search.py", line 72, in search_iterate
2021-05-12_17:18:31 for bufs in data.nds_iterate([channel], start_end=segment):
2021-05-12_17:18:31 File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.3-py3.6.egg/locklost/data.py", line 48, in nds_iterate
2021-05-12_17:18:31 with closing(nds_connection()) as conn:
2021-05-12_17:18:31 File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.3-py3.6.egg/locklost/data.py", line 28, in nds_connection
2021-05-12_17:18:31 conn = nds2.connection(HOST, PORT)
2021-05-12_17:18:31 File "/usr/lib64/python3.6/site-packages/nds2.py", line 3172, in __init__
2021-05-12_17:18:31 _nds2.connection_swiginit(self, _nds2.new_connection(*args))
2021-05-12_17:18:31 RuntimeError: Failed to establish a connection[INFO: Error occurred trying to write to socket]
```
This launches a condor job that runs the online analysis.
To launch a condor job to search for lock losses within some time
window:
```shell
$ locklost search --condor START END
```
This is usually because the NDS server itself has died and needs to be
restarted/reset (frequently due to Tuesday maintenance).
Unfortunately the site admins aren't necessarily aware of this issue
and need to be poked about it. Once the NDS server is back the job
should just pick up on its own.
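If you want to check the NDS server by hand before escalating, a minimal connectivity probe along the lines of `locklost.data.nds_connection` is sketched below (the fallback host and port here are only illustrative; the real value comes from the `NDSSERVER` environment variable):
```python
# minimal NDS connectivity probe, mirroring locklost.data.nds_connection
import os
import nds2

hostport = os.getenv('NDSSERVER', '10.21.2.4:31200').split(',')[0].split(':')
host = hostport[0]
port = int(hostport[1]) if len(hostport) > 1 else 31200
try:
    # nds2.connection raises RuntimeError if the server is unreachable
    nds2.connection(host, port)
    print("NDS server {}:{} is up".format(host, port))
except RuntimeError as e:
    print("NDS server {}:{} appears down: {}".format(host, port, e))
```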
### analyze jobs failing because of cluster data problems
Another common failure mode is failing follow-up analysis jobs due to
data access problems in the cluster. These often occur during Tuesday
maintenance, but sometimes mysteriously at other times as well. These
kinds of failures are indicated by the following exceptions in the
event analyze log:
This will find lock losses within the specified time range, but will not
run the follow-up analyses. This is primarily needed to backfill
times when the online analysis was not running (see below).
```
2021-05-07 16:02:51,722 [analyze.analyze_event] exception in discover_data:
Traceback (most recent call last):
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/analyze.py", line 56, in analyze_event
func(event)
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/plugins/discover.py", line 67, in discover_data
raise RuntimeError("data discovery timeout reached, data not found")
RuntimeError: data discovery timeout reached, data not found
```
or:
```
2021-05-07 16:02:51,750 [analyze.analyze_event] exception in find_previous_state:
Traceback (most recent call last):
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/analyze.py", line 56, in analyze_event
func(event)
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/plugins/history.py", line 26, in find_previous_state
gbuf = data.fetch(channels, segment)[0]
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/data.py", line 172, in fetch
bufs = func(channels, start, stop)
File "/home/lockloss/.local/lib/python3.6/site-packages/locklost-0.21.0-py3.6.egg/locklost/data.py", line 150, in frame_fetch_gwpy
data = gwpy.timeseries.TimeSeriesDict.find(channels, start, stop, frametype=config.IFO+'_R')
File "/usr/lib/python3.6/site-packages/gwpy/timeseries/core.py", line 1291, in find
on_gaps="error" if pad is None else "warn",
File "/usr/lib/python3.6/site-packages/gwpy/io/datafind.py", line 335, in wrapped
return func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/gwpy/io/datafind.py", line 642, in find_urls
on_gaps=on_gaps)
File "/usr/lib/python3.6/site-packages/gwdatafind/http.py", line 433, in find_urls
raise RuntimeError(msg)
RuntimeError: Missing segments:
[1304463181 ... 1304463182)
```
This problem sometimes just corrects itself, but often needs admin
poking as well.
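To check by hand whether the raw frames for the missing span are visible to datafind, you can query it directly, similar to what `locklost.data` does (the GPS times below are taken from the error above; the observatory and frame type are the LLO values and would be `H`/`H1_R` at LHO):
```python
# minimal frame-availability check using gwdatafind, as in locklost.data
import gwdatafind

ifo = 'L'           # observatory prefix ('H' at LHO)
frametype = 'L1_R'  # raw frame type, config.IFO + '_R' in locklost
start, end = 1304463181, 1304463182

# find_times returns the segments for which frames are actually available;
# a gap here corresponds to the "Missing segments" error above
print(gwdatafind.find_times(ifo, frametype, start, end))
```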
Any time argument ('START', 'END', 'TIME', etc.) can be either a GPS
time or a full (even relative) date/time string, e.g. '1 week ago'.
## back-filling events
To run a full analysis on a specific lock loss time found from the
search above:
After things have recovered from any of the issues mentioned above,
you'll probably want to back-fill any missed events. The best way to
do that is to run a condor `search` for missed events (e.g. from "4
weeks ago" until "now"):
```shell
$ locklost analyze TIME
$ locklost search --condor '4 weeks ago' now
```
To launch a condor job to analyze all un-analyzed events within a time
range:
The search will find lock loss events, but it will not run the
follow-up analyses. To run the full analysis on any un-analyzed
events, run the condor `analyze` command over the same time range,
e.g.:
```shell
$ locklost analyze --condor START END
$ locklost analyze --condor '4 weeks ago' now
```
To re-analyze events add the `--rerun` flag e.g.:
The `analyze` command will analyze not only missed/new events but also
previously "failed" events.
You can also re-analyze a specific event with the `--rerun` flag:
```shell
$ locklost analyze TIME --rerun
```
......@@ -91,38 +241,19 @@ or
```shell
$ locklost analyze --condor START END --rerun
```
It has happened that analysis jobs are improperly killed by condor,
not giving them a chance to clean up their run locks. The site
locklost deployments include a command line utility to find and remove
Occasionally analysis jobs are improperly killed by condor, not
giving them a chance to clean up their run locks. To find and remove
any old, stale analysis locks:
```shell
$ find-locks -r
$ locklost find-locks -r
```
# Analysis plugins
Lock loss event analysis consists of a set of follow-up "plugin"
analyses, located in the `locklost.plugins` sub-package:
* [`locklost/plugins/`](/locklost/plugins/)
Each follow-up module is registered in
[`locklost/plugins/__init__.py`](/locklost/plugins/__init__.py). Some
of the currently enabled follow-ups are:
* `discover.discover_data` wait for data to be available
* `refine.refine_event` refine event time
* `saturations.find_saturations` find saturating channels before event
* `lpy.find_lpy` find length/pitch/yaw oscillations in suspensions
* `glitch.analyze_glitches` look for glitches around event
* `overflows.find_overflows` look for ADC overflows
* `state_start.find_lock_start` find the start of the lock leading to the
current lock loss
Each plugin does its own analysis, although some depend on the output
of other analyses. The output from any analysis (e.g. plots or data)
should be written into the event directory.
We also have timer services available to handle the backfilling after
Tuesday maintenance if need be:
```shell
$ systemctl --user enable --now locklost-search-backfill.timer
$ systemctl --user enable --now locklost-analyze-backfill.timer
```
# Developing and contributing
......
#!/bin/bash -ex

if [[ $(hostname -f) =~ detchar.ligo-[lw]+a.caltech.edu ]] ; then
    hostname -f
else
    echo "This deploy script is intended to be run only on the detchar.ligo-{l,w}a.caltech.edu hosts."
    exit 1
fi

CONDA_ROOT=/cvmfs/software.igwn.org/conda
CONDA_ENV=igwn-py39
VENV=~/opt/locklost-${CONDA_ENV}

if [[ "$1" == '--setup' ]] ; then
    shift
    source ${CONDA_ROOT}/etc/profile.d/conda.sh
    conda activate ${CONDA_ENV}
    python3 -m venv --system-site-packages ${VENV}
    ln -sfn ${VENV} ~/opt/locklost
fi

source ${VENV}/bin/activate

git pull --tags

if [[ "$1" == '--skip-test' ]] ; then
    echo "WARNING: SKIPPING TESTS!"
else
    ./test/run
fi

pip install .

# install -t ~/bin/ support/bin/*
install -m 644 -t ~/.config/systemd/user support/systemd/*
systemctl --user daemon-reload
systemctl --user restart locklost-online.service
systemctl --user status locklost-online.service
from __future__ import print_function, division
import os
import signal
import logging
from . import config
import matplotlib
matplotlib.use('Agg')
......@@ -10,7 +12,9 @@ try:
except ImportError:
__version__ = '?.?.?'
from . import config
logger = logging.getLogger('LOCKLOST')
# signal handler kill/stop signals
def signal_handler(signum, frame):
......@@ -18,10 +22,25 @@ def signal_handler(signum, frame):
signame = signal.Signal(signum).name
except AttributeError:
signame = signum
logging.error("Signal received: {} ({})".format(signame, signum))
logger.error("Signal received: {} ({})".format(signame, signum))
raise SystemExit(1)
def set_signal_handlers():
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
signal.signal(eval('signal.'+config.CONDOR_KILL_SIGNAL), signal_handler)
def config_logger(level=None):
    if not level:
        level = os.getenv('LOG_LEVEL', 'INFO')
    handler = logging.StreamHandler()
    if os.getenv('LOG_FMT_NOTIME'):
        fmt = config.LOG_FMT_NOTIME
    else:
        fmt = config.LOG_FMT
    formatter = logging.Formatter(fmt)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(level)
import os
import subprocess
import argparse
import logging
import signal
from . import __version__, set_signal_handlers
from . import __version__
from . import set_signal_handlers
from . import config_logger
from . import config
from . import search
from . import analyze
from . import online
from . import event
from . import segments
from . import summary
from . import plots
##########
parser = argparse.ArgumentParser(
prog='locklost',
)
......@@ -22,10 +24,10 @@ subparser = parser.add_subparsers(
title='Commands',
metavar='<command>',
dest='cmd',
#help=argparse.SUPPRESS,
)
subparser.required = True
def gen_subparser(cmd, func):
assert func.__doc__, "empty docstring: {}".format(func)
help = func.__doc__.split('\n')[0].lower().strip('.')
......@@ -41,8 +43,11 @@ def gen_subparser(cmd, func):
##########
parser.add_argument('--version', action='version', version='%(prog)s {}'.format(__version__),
help="print version and exit")
parser.add_argument(
'--version', action='version', version='%(prog)s {}'.format(__version__),
help="print version and exit")
p = gen_subparser('search', search.main)
search._parser_add_arguments(p)
......@@ -56,6 +61,10 @@ p = gen_subparser('online', online.main)
online._parser_add_arguments(p)
p = gen_subparser('summary', summary.main)
summary._parser_add_arguments(p)
def list_events(args):
"""List all events"""
for e in event.find_events():
......@@ -71,6 +80,7 @@ def list_events(args):
e.path(),
))
p = gen_subparser('list', list_events)
......@@ -93,6 +103,7 @@ def show_event(args):
# print("files:")
# subprocess.call(['find', e.path()])
p = gen_subparser('show', show_event)
p.add_argument('--log', '-l', action='store_true',
help="show event log")
......@@ -120,6 +131,7 @@ def tag_events(args):
else:
e.add_tag(tag)
p = gen_subparser('tag', tag_events)
p.add_argument('event',
help="event ID")
......@@ -133,6 +145,7 @@ def compress_segments(args):
"""
segments.compress_segdir(config.SEG_DIR)
p = gen_subparser('compress', compress_segments)
......@@ -150,6 +163,7 @@ def mask_segment(args):
[Segment(args.start, args.end)],
)
p = gen_subparser('mask', mask_segment)
p.add_argument('start', type=int,
help="segment start")
......@@ -163,24 +177,55 @@ def plot_history(args):
"""
plots.plot_history(args.path, lookback=args.lookback)
p = gen_subparser('plot-history', plot_history)
p.add_argument('path',
help="output file")
p.add_argument('-l', '--lookback', default='7 days ago',
help="history look back ['7 days ago']")
def find_locks(args):
    """Find/clear event locks

    These are sometimes left by failed condor jobs. This should not
    normally need to be run, and caution should be taken to not remove
    active locks and events currently being analyzed.
    """
    root = config.EVENT_ROOT
    for d in os.listdir(root):
        try:
            int(d)
        except ValueError:
            continue
        for dd in os.listdir(os.path.join(root, d)):
            try:
                int(d+dd)
            except ValueError:
                continue
            lock_path = os.path.join(root, d, dd, 'lock')
            if os.path.exists(lock_path):
                print(lock_path)
                if args.remove:
                    os.remove(lock_path)
p = gen_subparser('find-locks', find_locks)
p.add_argument('-r', '--remove', action='store_true',
help="remove any found event locks")
##########
def main():
set_signal_handlers()
logging.basicConfig(
level=os.getenv('LOG_LEVEL', 'INFO'),
format=config.LOG_FMT,
)
config_logger()
args = parser.parse_args()
if not config.EVENT_ROOT:
raise SystemExit("Must specify LOCKLOST_EVENT_ROOT env var.")
args.func(args)
if __name__ == '__main__':
main()
import os
import sys
import argparse
import logging
import argparse
from matplotlib import pyplot as plt
from . import __version__, set_signal_handlers
from . import __version__
from . import set_signal_handlers
from . import config_logger
from . import logger
from . import config
from .plugins import PLUGINS
from .event import LocklossEvent, find_events
from . import condor
from . import search
##################################################
def analyze_event(event, plugins=None):
"""Execute all follow-up plugins for event
......@@ -21,19 +26,20 @@ def analyze_event(event, plugins=None):
# log analysis to event log
# always append log
log_mode = 'a'
logger = logging.getLogger()
handler = logging.FileHandler(event.path('log'), mode=log_mode)
handler.setFormatter(logging.Formatter(config.LOG_FMT))
logger.addHandler(handler)
# log exceptions
def exception_logger(exc_type, exc_value, traceback):
logging.error("Uncaught exception:", exc_info=(exc_type, exc_value, traceback))
logger.error("Uncaught exception:", exc_info=(exc_type, exc_value, traceback))
sys.excepthook = exception_logger
try:
event.lock()
except OSError as e:
logging.error(e)
logger.error(e)
raise
# clean up previous run
......@@ -41,37 +47,37 @@ def analyze_event(event, plugins=None):
event._scrub(archive=False)
event._set_version(__version__)
logging.info("analysis version: {}".format(event.analysis_version))
logging.info("event id: {}".format(event.id))
logging.info("event path: {}".format(event.path()))
logging.info("event url: {}".format(event.url()))
logger.info("analysis version: {}".format(event.analysis_version))
logger.info("event id: {}".format(event.id))
logger.info("event path: {}".format(event.path()))
logger.info("event url: {}".format(event.url()))
complete = True
for name, func in PLUGINS.items():
if plugins and name not in plugins:
continue
logging.info("executing plugin: {}({})".format(
logger.info("executing plugin: {}({})".format(
name, event.id))
try:
func(event)
except SystemExit:
complete = False
logging.error("EXIT signal, cleaning up...")
logger.error("EXIT signal, cleaning up...")
break
except AssertionError:
complete = False
logging.exception("FATAL exception in {}:".format(name))
logger.exception("FATAL exception in {}:".format(name))
break
except:
except Exception:
complete = False
logging.exception("exception in {}:".format(name))
logger.exception("exception in {}:".format(name))
plt.close('all')
if complete:
logging.info("analysis complete")
logger.info("analysis complete")
event._set_status(0)
else:
logging.warning("INCOMPLETE ANALYSIS")
logger.warning("INCOMPLETE ANALYSIS")
event._set_status(1)
logger.removeHandler(handler)
......@@ -82,38 +88,39 @@ def analyze_event(event, plugins=None):
def analyze_condor(event):
if event.analyzing:
logging.warning("event analysis already in progress, aborting")
logger.warning("event analysis already in progress, aborting")
return
condor_dir = event.path('condor')
try:
os.makedirs(condor_dir)
except:
except Exception:
pass
sub = condor.CondorSubmit(
condor_dir,
'analyze',
[str(event.id)],
local=False,
notify_user=os.getenv('CONDOR_NOTIFY_USER'),
)
sub.write()
sub.submit()
##################################################
def _parser_add_arguments(parser):
from .util import GPSTimeParseAction
egroup = parser.add_mutually_exclusive_group(required=True)
egroup.add_argument('event', action=GPSTimeParseAction, nargs='?',
egroup.add_argument('event', nargs='?', type=int,
help="event ID / GPS second")
egroup.add_argument('--condor', action=GPSTimeParseAction, nargs=2, metavar='GPS',
help="condor analyze all events within GPS range")
help="condor analyze all events within GPS range. If the second time is '0' (zero), the first time will be interpreted as an exact event time and just that specific event will be analyzed via condor submit.")
parser.add_argument('--rerun', action='store_true',
help="condor re-analyze events")
parser.add_argument('--plugin', '-p', action='append',
help="execute only specified plugin (multiple ok, not available for condor runs)")
def main(args=None):
"""Analyze event(s)
......@@ -128,20 +135,23 @@ def main(args=None):
args = parser.parse_args()
if args.condor:
logging.info("finding events...")
after, before = tuple(int(t) for t in args.condor)
events = [e for e in find_events(after=after, before=before)]
t0, t1 = tuple(int(t) for t in args.condor)
if t1 == 0:
events = [LocklossEvent(t0)]
else:
logger.info("finding events...")
events = [e for e in find_events(after=t0, before=t1)]
to_analyze = []
for e in events:
if e.analyzing:
logging.debug(" {} analyzing".format(e))
logger.debug(" {} analyzing".format(e))
continue
if e.analysis_succeeded and not args.rerun:
logging.debug(" {} succeeded".format(e))
logger.debug(" {} succeeded".format(e))
continue
logging.debug(" adding {}".format(e))
logger.debug(" adding {}".format(e))
to_analyze.append(e)
logging.info("found {} events, {} to analyze.".format(len(events), len(to_analyze)))
logger.info("found {} events, {} to analyze.".format(len(events), len(to_analyze)))
if not to_analyze:
return
......@@ -165,13 +175,13 @@ def main(args=None):
except OSError:
pass
if not event:
logging.info("no event matching GPS {}. searching...".format(int(args.event)))
search.search((int(args.event)-1, int(args.event)+1))
logger.info("no event matching GPS {}. searching...".format(int(args.event)))
search.search((args.event-1, args.event+1))
try:
event = LocklossEvent(args.event)
except OSError as e:
sys.exit(e)
logging.info("analyzing event {}...".format(event))
logger.info("analyzing event {}...".format(event))
if not analyze_event(event, args.plugin):
sys.exit(1)
......@@ -179,10 +189,7 @@ def main(args=None):
# direct execution of this module intended for condor jobs
if __name__ == '__main__':
set_signal_handlers()
logging.basicConfig(
level='DEBUG',
format=config.LOG_FMT,
)
config_logger(level='DEBUG')
parser = argparse.ArgumentParser()
parser.add_argument('event', type=int,
help="event ID / GPS second")
......
......@@ -5,14 +5,16 @@ import shutil
import collections
import subprocess
# import uuid
import logging
import datetime
from . import logger
from . import config
##################################################
def _write_executable(
module,
path,
......@@ -23,29 +25,19 @@ def _write_executable(
The first argument is the name of the locklost module to be executed.
"""
PYTHONPATH = os.getenv('PYTHONPATH', '')
NDSSERVER = os.getenv('NDSSERVER', '')
LIGO_DATAFIND_SERVER = os.getenv('LIGO_DATAFIND_SERVER')
with open(path, 'w') as f:
f.write("""#!/bin/sh
export IFO={}
export LOCKLOST_EVENT_ROOT={}
export PYTHONPATH={}
export DATA_ACCESS={}
export NDSSERVER={}
export LIGO_DATAFIND_SERVER={}
export CONDOR_ACCOUNTING_GROUP={}
export CONDOR_ACCOUNTING_GROUP_USER={}
exec {} -m locklost.{} "$@" 2>&1
""".format(
config.IFO,
config.EVENT_ROOT,
os.getenv('PYTHONPATH', ''),
data_access,
os.getenv('NDSSERVER', ''),
os.getenv('LIGO_DATAFIND_SERVER'),
config.CONDOR_ACCOUNTING_GROUP,
config.CONDOR_ACCOUNTING_GROUP_USER,
sys.executable,
module,
))
f.write(f"""#!/bin/sh
export IFO={config.IFO}
export LOCKLOST_EVENT_ROOT={config.EVENT_ROOT}
export PYTHONPATH={PYTHONPATH}
export DATA_ACCESS={data_access}
export NDSSERVER={NDSSERVER}
export LIGO_DATAFIND_SERVER={LIGO_DATAFIND_SERVER}
exec {sys.executable} -m locklost.{module} "$@" 2>&1
""")
os.chmod(path, 0o755)
......@@ -75,6 +67,7 @@ class CondorSubmit(object):
('executable', '{}'.format(self.exec_path)),
('arguments', ' '.join(args)),
('universe', universe),
('request_disk', '10 MB'),
('accounting_group', config.CONDOR_ACCOUNTING_GROUP),
('accounting_group_user', config.CONDOR_ACCOUNTING_GROUP_USER),
('getenv', 'True'),
......@@ -98,11 +91,12 @@ class CondorSubmit(object):
def submit(self):
assert os.path.exists(self.sub_path), "Must write() before submitting"
logging.info("condor submit: {}".format(self.sub_path))
logger.info("condor submit: {}".format(self.sub_path))
subprocess.call(['condor_submit', self.sub_path])
##################################################
class CondorDAG(object):
def __init__(self, condor_dir, module, submit_args_gen, log=None, use_test_job=False):
......@@ -132,7 +126,7 @@ class CondorDAG(object):
break
if not log:
log='logs/$(jobid)_$(cluster)_$(process).'
log = 'logs/$(jobid)_$(cluster)_$(process).'
self.sub = CondorSubmit(
self.condor_dir,
......@@ -153,28 +147,29 @@ class CondorDAG(object):
s = ''
for jid, sargs in enumerate(self.submit_args_gen()):
VARS = self.__gen_VARS(('jobid', jid), *sargs)
s += '''
JOB {jid} {sub}
VARS = ' '.join(VARS)
s += f'''
JOB {jid} {self.sub.sub_path}
VARS {jid} {VARS}
RETRY {jid} 1
'''.format(jid=jid,
sub=self.sub.sub_path,
VARS=' '.join(VARS),
)
'''
if self.use_test_job and jid == 0:
s += '''RETRY {jid} 1
'''.format(jid=jid)
logging.info("condor DAG {} jobs".format(jid+1))
logger.info("condor DAG {} jobs".format(jid+1))
return s
@property
def lock(self):
return os.path.join(self.condor_dir, 'dag.lock')
@property
def has_lock(self):
lock = os.path.join(self.condor_dir, 'dag.lock')
return os.path.exists(lock)
return os.path.exists(self.lock)
def write(self):
if self.has_lock:
raise CondorError("DAG already running: {}".format(lock))
raise RuntimeError("DAG already running: {}".format(self.lock))
shutil.rmtree(self.condor_dir)
try:
os.makedirs(os.path.join(self.condor_dir, 'logs'))
......@@ -187,8 +182,8 @@ RETRY {jid} 1
def submit(self):
assert os.path.exists(self.dag_path), "Must write() before submitting"
if self.has_lock:
raise CondorError("DAG already running: {}".format(lock))
logging.info("condor submit dag: {}".format(self.dag_path))
raise RuntimeError("DAG already running: {}".format(self.lock))
logger.info("condor submit dag: {}".format(self.dag_path))
subprocess.call(['condor_submit_dag', self.dag_path])
print("""
Run the following to monitor condor job:
......@@ -198,6 +193,7 @@ tail -F {0}.*
##################################################
def find_jobs():
"""find all running condor jobs
......@@ -239,7 +235,7 @@ def stop_jobs(job_list):
import htcondor
schedd = htcondor.Schedd()
for job in job_list:
logging.info("stopping job: {}".format(job_str(job)))
logger.info("stopping job: {}".format(job_str(job)))
schedd.act(
htcondor.JobAction.Remove,
'ClusterId=={} && ProcId=={}'.format(job['ClusterId'], job['ProcId']),
......
import os
import glob
import time
import numpy as np
from contextlib import closing
import logging
import numpy as np
import nds2
import gpstime
import gwdatafind
import gwpy.timeseries
from . import logger
from . import config
##################################################
def nds_connection():
try:
HOSTPORT = os.getenv('NDSSERVER').split(',')[0].split(':')
......@@ -24,7 +26,7 @@ def nds_connection():
PORT = int(HOSTPORT[1])
except IndexError:
PORT = 31200
logging.debug("NDS connect: {}:{}".format(HOST, PORT))
logger.debug("NDS connect: {}:{}".format(HOST, PORT))
conn = nds2.connection(HOST, PORT)
conn.set_parameter('GAP_HANDLER', 'STATIC_HANDLER_NAN')
# conn.set_parameter('ITERATE_USE_GAP_HANDLERS', 'false')
......@@ -33,7 +35,11 @@ def nds_connection():
def nds_fetch(channels, start, stop):
with closing(nds_connection()) as conn:
bufs = conn.fetch(int(start), int(stop), channels)
bufs = conn.fetch(
int(np.round(start)),
int(np.round(stop)),
channels,
)
return [ChannelBuf.from_nds(buf) for buf in bufs]
......@@ -57,6 +63,7 @@ def nds_iterate(channels, start_end=None):
##################################################
class ChannelBuf(object):
def __init__(self, channel, data, gps_start, gps_nanoseconds, sample_rate):
self.channel = channel
......@@ -71,7 +78,7 @@ class ChannelBuf(object):
gps=self.gps_start,
sec=self.duration,
nsamples=len(self),
)
)
def __len__(self):
return len(self.data)
......@@ -132,22 +139,27 @@ class ChannelBuf(object):
##################################################
def frame_fetch(channels, start, stop):
conn = glue.datafind.GWDataFindHTTPConnection()
cache = conn.find_frame_urls(config.IFO[0], IFO+'_R', start, stop, urltype='file')
fc = frutils.FrameCache(cache, verbose=True)
return [ChannelBuf.from_frdata(fc.fetch(channel, start, stop)) for channel in channels]
# def frame_fetch(channels, start, stop):
# conn = glue.datafind.GWDataFindHTTPConnection()
# cache = conn.find_frame_urls(config.IFO[0], IFO+'_R', start, stop, urltype='file')
# fc = frutils.FrameCache(cache, verbose=True)
# return [ChannelBuf.from_frdata(fc.fetch(channel, start, stop)) for channel in channels]
def frame_fetch_gwpy(channels, start, stop):
gps_now = int(gpstime.tconvert('now'))
### grab from shared memory if data is still available, otherwise use datafind
if gps_now - start < config.MAX_QUERY_LATENCY:
frames = glob.glob("/dev/shm/lldetchar/{}/*".format(config.IFO))
data = gwpy.timeseries.TimeSeriesDict.read(frames, channels, start=start, end=stop)
# grab from shared memory if data is still available, otherwise use datafind
if gps_now - start < config.DATA_DEVSHM_TIMEOUT:
frames = glob.glob(config.DATA_DEVSHM_ROOT + "/*")
try:
data = gwpy.timeseries.TimeSeriesDict.read(frames, channels, start=start, end=stop)
except IndexError:
raise RuntimeError(f"data not found in {config.DATA_DEVSHM_ROOT}")
else:
data = gwpy.timeseries.TimeSeriesDict.find(channels, start, stop, frametype=config.IFO+'_R')
frametype = f'{config.IFO}_R'
data = gwpy.timeseries.TimeSeriesDict.find(channels, start, stop, frametype=frametype)
return [ChannelBuf.from_gwTS(data[channel]) for channel in channels]
......@@ -162,71 +174,42 @@ def fetch(channels, segment, as_dict=False):
method = config.DATA_ACCESS.lower()
if method == 'nds':
func = nds_fetch
elif method == 'fr':
func = frame_fetch
# elif method == 'fr':
# func = frame_fetch
elif method == 'gwpy':
func = frame_fetch_gwpy
else:
raise ValueError("unknown data access method: {}".format(DATA_ACCESS))
logging.debug("{}({}, {}, {})".format(func.__name__, channels, start, stop))
bufs = func(channels, start, stop)
if as_dict:
return {buf.channel:buf for buf in bufs}
else:
return bufs
def data_available(segment):
"""Return True if the specified data segment is available.
Based on purely on end time of segment.
"""
end = int(segment[-1])
start = end - 1
ifo = config.IFO[0]
method = config.DATA_ACCESS.lower()
if method == 'nds':
buf = nds2.fetch([config.GRD_STATE_N_CHANNEL], start, end)
return buf is not None
elif method == 'fr' or method == 'gwpy':
ftype = '{}_R'.format(config.IFO)
logging.debug('gwdatafind.find_times({}, {}, {}, {})'.format(
ifo, ftype, start, end,
))
segs = gwdatafind.find_times(
ifo, ftype, start, end,
)
raise ValueError(f"unknown data access method: {method}")
logger.debug("{}({}, {}, {})".format(func.__name__, channels, start, stop))
# keep attempting to pull data in a loop, until we get the data or
# the timeout is reached
bufs = None
tstart = time.monotonic()
while time.monotonic() <= tstart + config.DATA_DISCOVERY_TIMEOUT:
try:
seg_available = segs.extent()
except ValueError:
return False
return segment[1] >= seg_available[1]
bufs = func(channels, start, stop)
break
except RuntimeError as e:
logger.info(
"data not available, sleeping for {} seconds ({})".format(
config.DATA_DISCOVERY_SLEEP,
e,
)
)
time.sleep(config.DATA_DISCOVERY_SLEEP)
else:
raise ValueError("unknown data access method: {}".format(DATA_ACCESS))
raise RuntimeError(f"data discovery timeout reached ({config.DATA_DISCOVERY_TIMEOUT}s)")
if as_dict:
return {buf.channel: buf for buf in bufs}
else:
return bufs
def data_wait(segment):
"""Wait until data segment is available.
Returns True if data is available, or False if the
DATA_DISCOVERY_TIMEOUT was reached.
"""
start = time.monotonic()
while time.monotonic() <= start + config.DATA_DISCOVERY_TIMEOUT:
if data_available(segment):
return True
logging.info(
"no data available in LDR, sleeping for {} seconds".format(
config.DATA_DISCOVERY_SLEEP,
)
)
time.sleep(config.DATA_DISCOVERY_SLEEP)
return False
##################################################
def gen_transitions(buf, previous=None):
"""Generator of transitions in ChannelBuf
......
import os
import shutil
import logging
import numpy as np
from gpstime import tconvert
from . import logger
from . import config
##################################################
def _trans_int(trans):
return tuple(int(i) for i in trans)
......@@ -16,7 +19,7 @@ def _trans_int(trans):
class LocklossEvent(object):
__slots__ = [
'__id', '__epath',
'__transition_index', '__transition_gps',
'_transition_index', '__transition_gps',
'__refined_gps', '__previous_state',
]
......@@ -34,7 +37,7 @@ class LocklossEvent(object):
self.__epath = self._gen_epath(self.__id)
if not os.path.exists(self.path()):
raise OSError("Unknown event: {}".format(self.path()))
self.__transition_index = None
self._transition_index = None
self.__transition_gps = None
self.__refined_gps = None
self.__previous_state = None
......@@ -90,11 +93,11 @@ class LocklossEvent(object):
@property
def transition_index(self):
"""tuple of indices of Guardian state transition"""
if not self.__transition_index:
if not self._transition_index:
with open(self.path('transition_index')) as f:
tis = f.read().split()
self.__transition_index = _trans_int(tis)
return self.__transition_index
self._transition_index = _trans_int(tis)
return self._transition_index
@property
def transition_gps(self):
......@@ -103,7 +106,7 @@ class LocklossEvent(object):
try:
with open(self.path('guard_state_end_gps')) as f:
self.__transition_gps = float(f.read().strip())
except:
except Exception:
# if this is an old event and we don't have a
# guard_state_end_gps file, just return the id (int of
# transition time) as the transition GPS
......@@ -142,7 +145,7 @@ class LocklossEvent(object):
if transition_index:
with open(event.path('transition_index'), 'w') as f:
f.write('{:.0f} {:.0f}\n'.format(*transition_index))
logging.info("event created: {}".format(event.path()))
logger.info("event created: {}".format(event.path()))
return event
def _scrub(self, archive=True):
......@@ -166,7 +169,7 @@ class LocklossEvent(object):
except OSError:
return
bak_dir = self.path('bak', str(last_time))
logging.info("archiving old artifacts: {}".format(bak_dir))
logger.info("archiving old artifacts: {}".format(bak_dir))
try:
os.makedirs(bak_dir)
except OSError:
......@@ -176,7 +179,7 @@ class LocklossEvent(object):
continue
shutil.move(self.path(f), bak_dir)
else:
logging.info("purging old artifacts...")
logger.info("purging old artifacts...")
for f in os.listdir(self.path()):
if f in preserve_files:
continue
......@@ -251,8 +254,13 @@ class LocklossEvent(object):
path = self.path('status')
if not os.path.exists(path):
return None
with open(path, 'r') as f:
return int(f.readline().strip())
# FIXME: why do we get FileNotFoundError on the web server
# even after the above test passed?
try:
with open(path, 'r') as f:
return int(f.readline().strip())
except FileNotFoundError:
return None
@property
def analysis_succeeded(self):
......@@ -307,7 +315,7 @@ class LocklossEvent(object):
for tag in tags:
assert ' ' not in tag, "Spaces are not permitted in tags."
open(self._tag_path(tag), 'a').close()
logging.info("added tag: {}".format(tag))
logger.info("added tag: {}".format(tag))
def rm_tag(self, *tags):
"""remove tags"""
......@@ -383,6 +391,7 @@ class LocklossEvent(object):
##################################################
def _generate_all_events():
"""generate all events in reverse chronological order"""
event_root = config.EVENT_ROOT
......@@ -436,7 +445,8 @@ def find_events(**kwargs):
if states and trans_idx not in states:
continue
if 'tag' in kwargs and kwargs['tag'] not in event.list_tags():
continue
if 'tag' in kwargs:
if not set(kwargs['tag']) <= set(event.list_tags()):
continue
yield event
import os
import sys
import argparse
import logging
from . import set_signal_handlers
from . import config_logger
from . import config
from . import search
from . import analyze
from . import condor
##################################################
def start_job():
try:
os.makedirs(config.CONDOR_ONLINE_DIR)
......@@ -36,28 +39,31 @@ def _parser_add_arguments(parser):
title='Commands',
metavar='<command>',
dest='cmd',
#help=argparse.SUPPRESS,
)
parser.required = True
ep = parser.add_parser(
sp = parser.add_parser(
'exec',
help="direct execute online analysis",
help="directly execute online search",
)
sp.add_argument(
'--analyze', action='store_true',
help="execute the condor event analysis callback when events are found",
)
parser.add_parser(
'start',
help="start condor online analysis",
help="start condor online search",
)
parser.add_parser(
'stop',
help="stop condor online analysis",
help="stop condor online search",
)
parser.add_parser(
'restart',
help="restart condor online analysis",
help="restart condor online search",
)
parser.add_parser(
'status',
help="print condor job status",
help="print online condor job status",
)
return parser
......@@ -72,20 +78,32 @@ def main(args=None):
args = parser.parse_args()
if args.cmd == 'exec':
if args.analyze:
event_callback = analyze.analyze_condor
stat_file = config.ONLINE_STAT_FILE
if not os.path.exists(stat_file):
open(stat_file, 'w').close()
else:
event_callback = None
stat_file = None
search.search_iterate(
event_callback=analyze.analyze_condor,
event_callback=event_callback,
stat_file=stat_file,
)
elif args.cmd == 'start':
elif args.cmd in ['start', 'stop', 'restart']:
print("The LIGO site deployments of the online search are now handled via systemd --user on the 'detechar.ligo-?a.caltech.edu' hosts.", file=sys.stderr)
print("See \"systemctl --user status\" for more info.", file=sys.stderr)
if input(f"Type 'yes' if you're really sure you want to execute the online condor '{args.cmd}' command: ") != 'yes':
print("aborting.", file=sys.stderr)
exit(1)
ojobs, ajobs = condor.find_jobs()
assert not ojobs, "There are already currently active jobs:\n{}".format(
'\n'.join([condor.job_str(job) for job in ojobs]))
start_job()
elif args.cmd in ['stop', 'restart']:
ojobs, ajobs = condor.find_jobs()
condor.stop_jobs(ojobs)
if args.cmd == 'restart':
if args.cmd == 'start':
assert not ojobs, "There are already currently active jobs:\n{}".format(
'\n'.join([condor.job_str(job) for job in ojobs]))
else:
condor.stop_jobs(ojobs)
if args.cmd in ['start', 'restart']:
start_job()
elif args.cmd == 'status':
......@@ -97,12 +115,10 @@ def main(args=None):
# direct execution of this module intended for condor jobs
if __name__ == '__main__':
set_signal_handlers()
logging.basicConfig(
level='DEBUG',
format='%(asctime)s %(message)s'
)
stat_file = os.path.join(config.CONDOR_ONLINE_DIR, 'stat')
open(stat_file, 'w').close()
config_logger(level='DEBUG')
stat_file = config.ONLINE_STAT_FILE
if not os.path.exists(stat_file):
open(stat_file, 'w').close()
search.search_iterate(
event_callback=analyze.analyze_condor,
stat_file=stat_file,
......
import io
import sys
import logging
import xml.etree.ElementTree as ET
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import xml.etree.ElementTree as ET
import gpstime
from gwpy.segments import DataQualityFlag
from . import logger
from . import config
from . import data
from .event import find_events
# plt.rc('text', usetex=True)
##################################################
def gtmin(gt):
"""GPS time rounded to minute from gpstime object"""
return int(gt.gps() / 60) * 60
......@@ -74,12 +78,12 @@ def plot_history(path, lookback='7 days ago', draw_segs=False):
bufs = data.nds_fetch([channel], gtmin(start), gtmin(end))
state_index, state_time = bufs[0].yt()
if np.all(np.isnan(state_index)):
logging.warning("state data [{}] is all nan??".format(channel))
logger.warning("state data [{}] is all nan??".format(channel))
# find events
events = list(find_events(after=start.gps()))
fig = plt.figure(figsize=(16, 4)) #, dpi=80)
fig = plt.figure(figsize=(16, 4))
ax = fig.add_subplot(111)
# plot analyzing
......@@ -180,6 +184,7 @@ def plot_history(path, lookback='7 days ago', draw_segs=False):
##################################################
def main():
path = sys.argv[1]
plot_history(path)
......
......@@ -12,6 +12,10 @@ def set_thresh_crossing(ax, thresh_crossing, gps, segment):
lw=5,
)
x_fraction = (thresh_crossing-segment[0])/(segment[1]-segment[0])
if x_fraction < 0.05:
align = 'left'
else:
align = 'center'
ax.annotate(
'First threshold crossing',
xy=(x_fraction, 1),
......@@ -19,6 +23,7 @@ def set_thresh_crossing(ax, thresh_crossing, gps, segment):
horizontalalignment='center',
verticalalignment='bottom',
bbox=dict(boxstyle="round", fc="w", ec="red", alpha=0.95),
ha=align,
)
......
......@@ -18,21 +18,23 @@ from matplotlib import rcParamsDefault
plt.rcParams.update(rcParamsDefault)
PLUGINS = collections.OrderedDict()
def register_plugin(func):
PLUGINS.update([(func.__name__, func)])
PLUGINS = collections.OrderedDict()
from .discover import discover_data
register_plugin(discover_data)
from .refine import refine_time
register_plugin(refine_time)
from .observe import check_observe
register_plugin(check_observe)
from .ifo_mode import check_ifo_mode
register_plugin(check_ifo_mode)
from .history import find_previous_state
register_plugin(find_previous_state)
from .initial_alignment import initial_alignment_check
register_plugin(initial_alignment_check)
from .saturations import find_saturations
register_plugin(find_saturations)
......@@ -40,14 +42,23 @@ register_plugin(find_saturations)
from .lpy import find_lpy
register_plugin(find_lpy)
from .lsc_asc import plot_lsc_asc
register_plugin(plot_lsc_asc)
from .pi import check_pi
register_plugin(check_pi)
from .glitch import analyze_glitches
register_plugin(analyze_glitches)
from .violin import check_violin
register_plugin(check_violin)
from .overflows import find_overflows
register_plugin(find_overflows)
from .fss_oscillation import check_fss
register_plugin(check_fss)
from .iss import check_iss
register_plugin(check_iss)
from .etm_glitch import check_glitch
register_plugin(check_glitch)
from .ham6_power import power_in_ham6
register_plugin(power_in_ham6)
from .brs import check_brs
register_plugin(check_brs)
......@@ -58,9 +69,34 @@ register_plugin(check_boards)
from .wind import check_wind
register_plugin(check_wind)
from .overflows import find_overflows
register_plugin(find_overflows)
from .lsc_asc import plot_lsc_asc
register_plugin(plot_lsc_asc)
from .glitch import analyze_glitches
register_plugin(analyze_glitches)
from .darm import plot_darm
register_plugin(plot_darm)
# Check other possible tags to add
from .ads_excursion import check_ads
register_plugin(check_ads)
# add last since this needs to wait for additional data
from .sei_bs_trans import check_sei_bs
register_plugin(check_sei_bs)
from .soft_limiters import check_soft_limiters
register_plugin(check_soft_limiters)
from .omc_dcpd import check_omc_dcpd
register_plugin(check_omc_dcpd)
# Add the following at the end because they need to wait for additional data
from .seismic import check_seismic
register_plugin(check_seismic)
from .history import find_previous_state
register_plugin(find_previous_state)
\ No newline at end of file
import sys
import logging
import importlib
import argparse
from .. import set_signal_handlers
from .. import config
from .. import config_logger
from ..event import LocklossEvent
from . import PLUGINS
parser = argparse.ArgumentParser()
parser.add_argument(
'plugin', nargs='?',
)
parser.add_argument(
'event', nargs='?',
)
def main():
set_signal_handlers()
logging.basicConfig(
level='DEBUG',
format=config.LOG_FMT,
)
mpath = sys.argv[1].split('.')
event = LocklossEvent(sys.argv[2])
mod = importlib.import_module('.'+'.'.join(mpath[:-1]), __package__)
func = mod.__dict__[mpath[-1]]
config_logger()
args = parser.parse_args()
if not args.plugin:
for name, func in PLUGINS.items():
print(name)
return
if not args.event:
parser.error("must specify event")
try:
func = PLUGINS[args.plugin]
except KeyError:
parser.error(f"unknown plugin: {args.plugin}")
event = LocklossEvent(args.event)
func(event)
......
import logging
import numpy as np
import matplotlib.pyplot as plt
from gwpy.segments import Segment
from .. import logger
from .. import config
from .. import data
from .. import plotutils
#################################################
def check_ads(event):
"""Checks for ADS channels above threshold.
......@@ -18,8 +20,8 @@ def check_ads(event):
"""
if event.transition_index[0] != config.GRD_NOMINAL_STATE[0]:
logging.info('lockloss not from nominal low noise')
if event.transition_index[0] < config.ADS_GRD_STATE[0]:
logger.info('lockloss not from state using ADS injections')
return
plotutils.set_rcparams()
......@@ -45,9 +47,9 @@ def check_ads(event):
if saturating:
event.add_tag('ADS_EXCURSION')
else:
logging.info('no ADS excursion detected')
logger.info('no ADS excursion detected')
fig, ax = plt.subplots(1, figsize=(22,16))
fig, ax = plt.subplots(1, figsize=(22, 16))
for idx, buf in enumerate(ads_channels):
srate = buf.sample_rate
t = np.arange(segment[0], segment[1], 1/srate)
......
import logging
import numpy as np
import matplotlib.pyplot as plt
from gwpy.segments import Segment
from .. import logger
from .. import config
from .. import data
from .. import plotutils
##############################################
def check_boards(event):
"""Checks for analog board saturations.
......@@ -28,7 +30,7 @@ def check_boards(event):
for buf in board_channels:
srate = buf.sample_rate
t = np.arange(segment[0], segment[1], 1/srate)
before_loss = buf.data[np.where(t<event.gps-config.BOARD_SAT_BUFFER)]
before_loss = buf.data[np.where(t < event.gps-config.BOARD_SAT_BUFFER)]
if any(abs(before_loss) >= config.BOARD_SAT_THRESH):
saturating = True
glitch_idx = np.where(abs(buf.data) > config.BOARD_SAT_THRESH)[0][0]
......@@ -38,9 +40,9 @@ def check_boards(event):
if saturating:
event.add_tag('BOARD_SAT')
else:
logging.info('no saturating analog boards')
logger.info('no saturating analog boards')
fig, ax = plt.subplots(1, figsize=(22,16))
fig, ax = plt.subplots(1, figsize=(22, 16))
for idx, buf in enumerate(board_channels):
srate = buf.sample_rate
t = np.arange(segment[0], segment[1], 1/srate)
......
import logging
import numpy as np
import matplotlib.pyplot as plt
from gwpy.segments import Segment
from .. import logger
from .. import config
from .. import data
from .. import plotutils
#################################################
CHANNEL_ENDINGS = [
'100M_300M',
'300M_1',
'1_3',
]
# H1 checks single status channel that says what combination of BRSX/BRSY is
# being used
if config.IFO == 'H1':
CONFIG = {
'ETMX': {
'skip_states': {65},
'axis': 'RY',
},
'ETMY': {
'skip_states': {60},
'axis': 'RX',
},
}
for station in CONFIG:
CONFIG[station]['state_chan'] = 'H1:GRD-SEI_CONF_REQUEST_N'
CONFIG[station]['skip_states'] |= {10, 17, 45}
# L1 checks status channels for each individual BRS (ETMs and ITMs) that says
# whether it's being used
elif config.IFO == 'L1':
CONFIG = {
'ITMX': {
'state_chan': 'L1:ISI-ITMX_ST1_SENSCOR_X_FADE_CUR_CHAN_MON',
'axis': 'RY',
},
'ETMX': {
'state_chan': 'L1:ISI-ETMX_ST1_SENSCOR_Y_FADE_CUR_CHAN_MON',
'axis': 'RY',
},
'ITMY': {
'state_chan': 'L1:ISI-ITMY_ST1_SENSCOR_Y_FADE_CUR_CHAN_MON',
'axis': 'RX',
},
'ETMY': {
'state_chan': 'L1:ISI-ETMY_ST1_SENSCOR_Y_FADE_CUR_CHAN_MON',
'axis': 'RX',
},
}
for station in CONFIG:
CONFIG[station]['skip_states'] = {8, 9}
for station in CONFIG:
channels = []
for ending in CHANNEL_ENDINGS:
channels.append('{}:ISI-GND_BRS_{}_{}_BLRMS_{}'.format(config.IFO, station, CONFIG[station]['axis'], ending))
CONFIG[station]['channels'] = channels
THRESHOLD = 15
SEARCH_WINDOW = [-30, 5]
#################################################
def check_brs(event):
"""Checks for BRS glitches at both end stations.
......@@ -19,15 +83,15 @@ def check_brs(event):
plotutils.set_rcparams()
segment = Segment(config.BRS_SEARCH_WINDOW).shift(int(event.gps))
segment = Segment(SEARCH_WINDOW).shift(int(event.gps))
for station, params in config.BRS_CONFIG.items():
for station, params in CONFIG.items():
# fetch all data (state and data)
channels = [params['state_chan']] + params['channels']
try:
buf_dict = data.fetch(channels, segment, as_dict=True)
except ValueError:
logging.warning('BRS info not available for {}'.format(station))
logger.warning('BRS info not available for {}'.format(station))
continue
# check for proper state
......@@ -35,7 +99,7 @@ def check_brs(event):
t = state_buf.tarray
state = state_buf.data[np.argmin(np.absolute(t-event.gps))]
if state in params['skip_states']:
logging.info('{} not using sensor correction during lockloss'.format(station))
logger.info('{} not using sensor correction during lockloss'.format(station))
continue
del buf_dict[params['state_chan']]
......@@ -44,14 +108,14 @@ def check_brs(event):
thresh_crossing = segment[1]
for channel, buf in buf_dict.items():
max_brs = max([max_brs, max(buf.data)])
if any(buf.data > config.BRS_THRESHOLD):
logging.info('BRS GLITCH DETECTED in {}'.format(channel))
if any(buf.data > THRESHOLD):
logger.info('BRS GLITCH DETECTED in {}'.format(channel))
event.add_tag('BRS_GLITCH')
glitch_idx = np.where(buf.data > config.BRS_THRESHOLD)[0][0]
glitch_idx = np.where(buf.data > THRESHOLD)[0][0]
glitch_time = buf.tarray[glitch_idx]
thresh_crossing = min(glitch_time, thresh_crossing)
fig, ax = plt.subplots(1, figsize=(22,16))
fig, ax = plt.subplots(1, figsize=(22, 16))
for channel, buf in buf_dict.items():
t = buf.tarray
ax.plot(
......@@ -62,7 +126,7 @@ def check_brs(event):
lw=2,
)
ax.axhline(
config.BRS_THRESHOLD,
THRESHOLD,
linestyle='--',
color='black',
label='BRS glitch threshold',
......@@ -74,7 +138,7 @@ def check_brs(event):
ax.grid()
ax.set_xlabel('Time [s] since lock loss at {}'.format(event.gps), labelpad=10)
ax.set_ylabel('RMS Velocity [nrad/s]')
ax.set_ylim(0, max_brs+1)
# ax.set_ylim(0, max_brs+1)
ax.set_xlim(t[0]-event.gps, t[-1]-event.gps)
ax.legend(loc='best')
ax.set_title('{} BRS BLRMS'.format(station), y=1.04)
......