Documentation for creating fake data
====================================
.. _simulated-data-tutorial:
Tutorial: Generation of simulated data
=======================================
Introduction
------------
@@ -17,13 +19,13 @@ Consult :any:`gstlal_fake_frames` for more details
The basic steps to generate and validate LIGO colored noise are:
1. use gstlal_fake_frames to make the data (examples in the documentation include this)
2. verify that the PSD is as you would expect with gstlal_reference_psd (examples in the documentation include this)
3. plot the resulting PSD with gstlal_plot_psd (examples in the documentation include this)
1. Use ``gstlal_fake_frames`` to make the data
2. Verify that the PSD is as you would expect with ``gstlal_reference_psd``
3. Plot the resulting PSD with ``gstlal_plot_psd``
An example PSD plot:
.. image:: ../images/H1L1fakedataexamplepsd.png
.. image:: ../gstlal/images/H1L1fakedataexamplepsd.png
:width: 400px
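
To sanity-check step 2 programmatically, the PSD file written by
``gstlal_reference_psd`` can be loaded with the same LAL routines that gstlal
uses internally. The following is a minimal sketch; the file name
``measured_psd.xml.gz`` is only an assumed example.

.. code-block:: python

    # Sketch: inspect a PSD file produced by gstlal_reference_psd.
    # "measured_psd.xml.gz" is an assumed file name.
    from ligo.lw import utils as ligolw_utils
    from lal import series as lalseries

    xmldoc = ligolw_utils.load_filename("measured_psd.xml.gz", contenthandler = lalseries.PSDContentHandler)
    psds = lalseries.read_psd_xmldoc(xmldoc)
    for instrument, psd in psds.items():
        # print the frequency resolution and number of bins for each detector
        print(instrument, psd.deltaF, psd.data.length)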
Custom colored noise, i.e. simulate your own detector
@@ -31,16 +33,11 @@ Custom colored noise, i.e. simulate your own detector
Consult :any:`gstlal_fake_frames` for more details
1. Start by obtaining a reference PSD that you wish to have as the target for
recoloring. If you actually have a text file ASD such as this one: e.g. <a
href=http://www.lsc-group.phys.uwm.edu/cgit/gstlal/plain/gstlal/share/v1_early_asd.txt>here</a>,
then you will need to first use gstlal_psd_xml_from_asd_txt to convert it
(examples in the documentation include this)
1. Next use gstlal_fake_frames to make the data with the desired PSD (examples
in the documentation include this)
1. Repeat the same validation steps as above to obtain, e.g.:
1. Start by obtaining a reference PSD that you wish to use as the target for recoloring. If you have the ASD as a text file (e.g. `here <https://git.ligo.org/lscsoft/gstlal/raw/master/gstlal/share/v1_early_asd.txt>`_), you will first need to use ``gstlal_psd_xml_from_asd_txt`` to convert it
2. Next use ``gstlal_fake_frames`` to make the data with the desired PSD
3. Repeat the same validation steps as above to obtain, e.g.:
.. image:: ../images/V1fakedataexamplepsd.png
.. image:: ../gstlal/images/V1fakedataexamplepsd.png
:width: 400px
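
For reference, the conversion in step 1 amounts to reading the frequency/ASD
text file and squaring the amplitude column to obtain a PSD;
``gstlal_psd_xml_from_asd_txt`` performs this conversion and writes the result
as LIGO Light-Weight XML. A rough sketch of the idea only (the two-column
layout of the input file is an assumption):

.. code-block:: python

    # Conceptual sketch of the ASD -> PSD conversion; use
    # gstlal_psd_xml_from_asd_txt for the real conversion and XML output.
    import numpy

    freq, asd = numpy.loadtxt("v1_early_asd.txt", unpack = True)
    psd = asd**2  # the PSD is the square of the ASD
    print(freq[0], freq[-1], psd.min())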
@@ -53,36 +50,31 @@ This procedure assumes you are on an LDG cluster which has the data you wish to
recolor. Note that some of the required tools are not gstlal-based. Please
consult the documentation for the external tools should you have questions.
1. First obtain segments for the data using ligolw_segment_query
2. Next obtain the frame file cache from ligo_data_find
3. Then create the PSD you wish to recolor to (perhaps using gstlal_psd_xml_from_asd_txt)
4. compute a reference spectrum from the frame data that you wish to recolor using gstlal_reference_psd
5. You might choose to optionally "smooth" the reference spectrum in order to leave lines in the underlying data. You can try using gstlal_psd_polyfit
6. Now with segments, a frame cache, a PSD (possibly smoothed) measured from the frame cache, and a PSD that is the target for the recolored spectrum, you are free to use gstlal_fake_frames according to the documentation.
1. First obtain segments for the data using ``ligolw_query_gwosc_segments``
2. Then create the PSD you wish to recolor to (perhaps using ``gstlal_psd_xml_from_asd_txt``)
3. Compute a reference spectrum from the strain data that you wish to recolor using ``gstlal_reference_psd``
4. You might choose to optionally "smooth" the reference spectrum in order to leave lines in the underlying data. You can try using ``gstlal_psd_polyfit``
5. Now with segments, a PSD (possibly smoothed) measured from the strain data, and a PSD that is the target for the recolored spectrum, you are free to use ``gstlal_fake_frames`` according to the documentation.
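
The "smoothing" in step 4 is, conceptually, a low-order fit to the measured
spectrum so that narrow spectral lines are not tracked; ``gstlal_psd_polyfit``
is the supported tool for this. As an illustration only (this is not the
``gstlal_psd_polyfit`` implementation), a polynomial fit in log-log space might
look like:

.. code-block:: python

    # Illustration: fit a low-order polynomial to a PSD in log-log space.
    # freqs and psd_data are assumed to be 1-D numpy arrays holding the
    # measured PSD and its frequency bins.
    import numpy

    def smooth_psd(freqs, psd_data, order = 10):
        # fit log10(PSD) as a polynomial in log10(f), skipping the DC bin
        keep = freqs > 0
        coeffs = numpy.polyfit(numpy.log10(freqs[keep]), numpy.log10(psd_data[keep]), order)
        smoothed = psd_data.copy()
        smoothed[keep] = 10.0**numpy.polyval(coeffs, numpy.log10(freqs[keep]))
        return smoothed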
Recoloring existing data with a HTCondor dag
--------------------------------------------
Some of the steps required to automate the batch processing of recoloring a
large data set have been automated in a script that generates a condor DAG. The
input to the condor dag script has itself been automated in makefiles such as:
<a
href=http://www.lsc-group.phys.uwm.edu/cgit/gstlal/plain/gstlal/share/Makefile.2015recolored>this</a>.
input to the condor dag script has itself been automated in makefiles such as
`this <https://git.ligo.org/lscsoft/gstlal/raw/master/gstlal/share/Makefile.2015recolored>`_.
As an example try this::
$ wget http://www.lsc-group.phys.uwm.edu/cgit/gstlal/plain/gstlal/share/Makefile.2015recolored
$ wget https://git.ligo.org/lscsoft/gstlal/raw/master/gstlal/share/Makefile.2015recolored
$ wget https://git.ligo.org/lscsoft/gstlal/raw/master/gstlal/share/recolored_config.yml
$ make -f Makefile.2015recolored
$ condor_submit_dag gstlal_fake_frames_pipe.dag
$ condor_submit_dag fake_frames_dag.dag
You can monitor the dag progress with::
$ tail -f gstlal_fake_frames_pipe.dag.dagman.out
You should have directories called LIGO and Virgo that contain the recolored frame data. Try changing values in the Makefile to match what you need
TODO
----
$ tail -f fake_frames_dag.dag.dagman.out
1. Add support for making colored noise in the gstlal_fake_frames_pipe
You should have a ``/frames`` directory that contains the recolored frame data. Experiment
with changing parameters in the Makefile to generate different PSDs, create frames over different stretches
of data, etc.
Documentation for running an offline compact binary coalescence analysis
Running an offline compact binary coalescence analysis
========================================================================
Prerequisites
......
Documentation for starting an online compact binary coalescence analysis
Running an online compact binary coalescence analysis
========================================================================
Prerequisites
......
@@ -4,5 +4,6 @@ Tutorials
.. toctree::
:maxdepth: 1
gstlal_fake_data_overview
online_analysis
offline_analysis
.. _workflow-config:
Workflow Configuration
=======================
WRITEME
#!/usr/bin/python
#!/usr/bin/env python3
# Copyright 2018 Chad Hanna
#
import sys
@@ -6,14 +6,15 @@ import os
import subprocess
def process_source(prog, outfile):
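# copy documentation lines (those beginning with "###") from the program's source into the output .rst file, stripping the "### " prefix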
for line in open(prog):
if not line.startswith("###"):
continue
outfile.write(line.replace("### ", "").replace("###",""))
with open(prog, 'r') as fid:
for line in fid.readlines():
if not line.startswith("###"):
continue
outfile.write(line.replace("### ", "").replace("###",""))
if len(sys.argv) == 1:
print "USAGE: sphinx-bindoc <output directory> <input directory> [patterns to exclude]"
print("USAGE: sphinx-bindoc <output directory> <input directory> [patterns to exclude]")
sys.exit()
assert(len(sys.argv) >= 3)
@@ -23,7 +24,7 @@ outdir = sys.argv[1]
tocf = open(os.path.join(outdir, "bin.rst"), "w")
tocf.write("""bin
===
=====================
.. toctree::
:maxdepth: 1
@@ -45,25 +46,27 @@ for prog in sorted(os.listdir(indir)):
tocf.write("\n %s" % os.path.split(fname)[-1].replace(".rst",""))
if os.path.exists(fname):
print >> sys.stderr, "File %s already exists, skipping." % fname
print("File %s already exists, skipping." % fname)
continue
else:
print >> sys.stderr, "Creating file ", fname
f = open(fname, "w", 0)
# parse the bin program itself for additional documentation
f.write("%s\n%s\n\n" % (prog, "".join(["="] * len(prog))))
process_source(path_to_prog, f)
print("Creating file ", fname)
# write the output of --help
f.write("%s\n%s\n\n" % ("Command line options", "".join(["-"] * len("Command line options"))))
f.write("\n\n.. code-block:: none\n\n")
proc = subprocess.Popen([path_to_prog, "--help"], stdout = subprocess.PIPE)
helpmessage = proc.communicate()[0]
helpmessage = "\n".join([" %s" % l for l in helpmessage.split("\n")])
f.write(helpmessage)
with open(fname, "w") as f:
# parse the bin program itself for additional documentation
f.write("%s\n%s\n\n" % (prog, "".join(["="] * len(prog))))
process_source(path_to_prog, f)
# close the file
f.close()
# write the output of --help
f.write("%s\n%s\n\n" % ("Command line options", "".join(["-"] * len("Command line options"))))
f.write("\n\n.. code-block:: none\n\n")
try:
proc = subprocess.Popen([path_to_prog, "--help"], stdout = subprocess.PIPE)
helpmessage = proc.stdout.read()
if isinstance(helpmessage, bytes):
helpmessage = helpmessage.decode('utf-8')
helpmessage = "\n".join([" %s" % l for l in helpmessage.split('\n')])
f.write(helpmessage)
except OSError:
pass
tocf.close()
dist_bin_SCRIPTS = \
gstlal_cherenkov_burst \
gstlal_cherenkov_calc_likelihood \
gstlal_cherenkov_calc_rank_pdfs \
gstlal_cherenkov_inj \
gstlal_cherenkov_plot_rankingstat \
gstlal_cherenkov_plot_summary \
gstlal_cherenkov_zl_rank_pdfs \
gstlal_cs_triggergen \
gstlal_excesspower \
gstlal_excesspower_trigvis \
gstlal_feature_aggregator \
gstlal_feature_extractor \
gstlal_feature_extractor_pipe \
gstlal_ll_feature_extractor_pipe \
gstlal_feature_extractor_whitener_check \
gstlal_feature_extractor_template_overlap \
gstlal_feature_hdf5_sink \
gstlal_feature_monitor \
gstlal_feature_synchronizer
gstlal_impulse_inj
#!/usr/bin/env python3
#
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""Cherenkov burst searching tool"""
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
import numpy
from optparse import OptionParser
import sys
import tempfile
import threading
import gi
gi.require_version('Gst','1.0')
from gi.repository import GObject
GObject.threads_init()
from gi.repository import Gst
Gst.init(None)
import lal
from lal import series as lalseries
from lalburst import snglcoinc
import lalsimulation
from gstlal import datasource
from gstlal import pipeio
from gstlal import pipeparts
from gstlal.pipeparts import condition
from gstlal import simplehandler
from gstlal import snglbursttable
from gstlal import streamburca
from gstlal import cherenkov
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
from ligo import segments
from ligo.lw import ligolw
from ligo.lw import lsctables
from ligo.lw import utils as ligolw_utils
from ligo.lw.utils import ligolw_add
from ligo.lw.utils import process as ligolw_process
from ligo.lw.utils import segments as ligolw_segments
from ligo.lw.utils import time_slide as ligolw_time_slide
pipeparts.mkchecktimestamps = lambda pipeline, src, *args: src
@lsctables.use_in
class LIGOLWContentHandler(ligolw.LIGOLWContentHandler):
pass
program_name = "gstlal_cherenkov_burst"
#
# ================================================================================
#
# Custom SnglBurst Type
#
# ================================================================================
#
#
# override .template_id to induce the desired exact-match template
# comparison in the coincidence code
#
class SnglBurst(snglbursttable.GSTLALSnglBurst):
@property
def template_id(self):
return (self.central_freq, self.bandwidth)
#
# ================================================================================
#
# Command Line
#
# ================================================================================
#
def parse_command_line():
# prepare parser
parser = OptionParser(
description = "GstLAL-based Cherenkov burst search pipeline."
)
parser.add_option("--verbose", action = "store_true", help = "Be verbose.")
parser.add_option("--template-bank", metavar = "filename", help = "Name of template file. Template bank for all detectors involved must be given.")
parser.add_option("--time-slide-file", metavar = "filename", help = "Name of time slide file. Required.")
parser.add_option("--reference-psd", metavar = "filename", help = "Use fixed PSDs from this file (one for each detector) instead of measuring them from the data. Optional.")
parser.add_option("--output", metavar = "filename", help = "Name of output xml file.")
parser.add_option("--rankingstat-output", metavar = "filename", help = "Name of output xml file to record rankingstat object.")
parser.add_option("--threshold", metavar = "snr_threshold", type = "float", default = 3.0, help = "SNR threshold (default = 3.0).")
parser.add_option("--cluster-events", metavar = "cluster_events", type = "float", default = 0.1, help = "Cluster events with input timescale in seconds (default = 0.1).")
parser.add_option("--min-instruments", metavar = "min_instruments", type = "int", default = 2, help = "Minimum number of instruments (default is 2).")
parser.add_option("--snr-dump", metavar = "start,stop", help = "Turn on SNR time series dumping to a text file within the GPS time range [start, stop).")
parser.add_option("--hoft-dump", metavar = "start,stop", help = "Turn on hoft dumping to a text file within the GPS time range [start, stop).")
datasource.append_options(parser)
# parse command line
options, filenames = parser.parse_args()
# do this before modifying the options object
process_params = options.__dict__.copy()
# check for missing options
required_options = set(("template_bank", "time_slide_file", "output", "rankingstat_output"))
missing_options = set(option for option in required_options if getattr(options, option) is None)
if missing_options:
raise ValueError("missing required option(s) %s" % ", ".join("--%s" % option.replace("_", "-") for option in missing_options))
options.gps_start_time = lal.LIGOTimeGPS(options.gps_start_time)
options.gps_end_time = lal.LIGOTimeGPS(options.gps_end_time)
datasourceinfo = datasource.DataSourceInfo.from_optparse(options)
if options.snr_dump is not None:
options.snr_dump = segments.segment(*map(lal.LIGOTimeGPS, options.snr_dump.split(",")))
if options.hoft_dump is not None:
options.hoft_dump = segments.segment(*map(lal.LIGOTimeGPS, options.hoft_dump.split(",")))
return options, process_params, datasourceinfo, filenames
#
# ================================================================================
#
# Pipeline Handler
#
# ================================================================================
#
class PipelineHandler(simplehandler.Handler):
def __init__(self, mainloop, pipeline, firbank, xmldoc_output, process, template_bank, triggergen, rankingstat, is_noninjections):
super(PipelineHandler, self).__init__(mainloop, pipeline)
self.lock = threading.Lock()
self.firbank = firbank
self.sngl_burst = lsctables.SnglBurstTable.get_table(xmldoc_output)
self.process = process
self.template_bank = template_bank
self.triggergen = triggergen
self.update_psd = dict.fromkeys(firbank, 0)
self.rankingstat = rankingstat
self.is_noninjections = is_noninjections
self.streamburca = streamburca.StreamBurca(xmldoc_output, process.process_id, rankingstat.delta_t, cherenkov.CherenkovBBCoincDef, min_instruments = rankingstat.min_instruments)
def appsink_new_buffer(self, elem):
with self.lock:
buf = elem.emit("pull-sample").get_buffer()
events = []
for i in range(buf.n_memory()):
memory = buf.peek_memory(i)
result, mapinfo = memory.map(Gst.MapFlags.READ)
assert result
if mapinfo.data:
events.extend(SnglBurst.from_buffer(mapinfo.data))
memory.unmap(mapinfo)
# get ifo from the appsink name property
instrument = elem.get_property("name")
# set event process id and event id
for event in events:
event.process_id = self.process.process_id
event.event_id = self.sngl_burst.get_next_id()
assert event.ifo == instrument
# extract segment. move the segment's upper
# boundary to include all triggers.
buf_timestamp = lal.LIGOTimeGPS(0, buf.pts)
buf_seg = segments.segment(buf_timestamp, max(buf_timestamp + lal.LIGOTimeGPS(0, buf.duration), max(event.peak for event in events) if events else 0.0))
if buf.mini_object.flags & Gst.BufferFlags.GAP:
# sanity check that gap buffers are empty
assert not events
else:
# update trigger rate and livetime
self.rankingstat.denominator.triggerrates[instrument].add_ratebin(tuple(map(float, buf_seg)), len(events))
# push the single detector trigger into the
# StreamBurca instance. the push method returns
# True if the coincidence engine has new results.
# In that case, call the pull() method to run the
# coincidence engine.
if self.streamburca.push(instrument, events, buf_timestamp):
self.streamburca.pull(self.rankingstat, self.rankingstat.segmentlists, noninjections = self.is_noninjections)
def flush(self):
self.streamburca.pull(self.rankingstat, self.rankingstat.segmentlists, noninjections = self.is_noninjections, flush = True)
def down_sample(self, psd, deltaf):
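# Coarse-grain the PSD to a frequency resolution of deltaf by averaging consecutive blocks of bins.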
width = deltaf / psd.deltaF
# width must be an integer, otherwise we can't downsample the psd.
assert width == int(width)
psd_data = psd.data.data
psd_data = [numpy.mean(psd_data[int(width * i):int(width * (i+1))], dtype = numpy.float64) for i in range(int((len(psd_data) - 1) / width))]
psd_data = numpy.append(psd_data, 0.0)
new_psd = lal.CreateREAL8FrequencySeries("new_psd", psd.epoch, psd.f0, deltaf, psd.sampleUnits, len(psd_data))
new_psd.data.data = psd_data
return new_psd
def make_templates(self, instrument, psd):
# Create matrices to add firbank.
template_t = [None] * len(self.template_bank)
autocorr = [None] * len(self.template_bank)
# FIXME: down-sample PSD to, say, \Delta f = 0.5 Hz
psd = self.down_sample(psd, 0.5)
# obtain the value of sample rate and PSD kernel duration
sample_rate = 2 * (psd.f0 + psd.deltaF * (psd.data.length - 1))
assert sample_rate == int(sample_rate)
sample_rate = int(sample_rate)
fir_duration = 1. / psd.deltaF
assert fir_duration == int(fir_duration)
fir_length = int(fir_duration) * sample_rate
# Set up fplan and rplan here to avoid recomputing them inside the loop.
# NOTE: This can be done because the length of the templates is fixed to fir_length.
fplan = lal.CreateForwardREAL8FFTPlan(fir_length, 0)
rplan = lal.CreateReverseREAL8FFTPlan(fir_length, 0)
template_f = lal.CreateCOMPLEX16FrequencySeries("template_freq", psd.epoch, 0.0, psd.deltaF, lal.Unit("strain s"), fir_length // 2 + 1)
template_f_squared = lal.CreateCOMPLEX16FrequencySeries("template_freq_squared", psd.epoch, 0.0, psd.deltaF, lal.Unit("strain s"), fir_length // 2 + 1)
autocorr_t = lal.CreateREAL8TimeSeries("autocorr", psd.epoch, 0.0, 1.0 / sample_rate, lal.Unit("strain"), fir_length)
# FIXME: adjust beta = what fraction of the window is in the tapers
window = lal.CreateTukeyREAL8Window(fir_length, 0.5).data.data
# make templates, whiten, put into firbank
for i, row in enumerate(self.template_bank):
# obtain the Cherenkov radiation-like waveform. Since this waveform is linearly polarized, only the plus mode is considered.
template_t[i], _ = lalsimulation.SimBurstCherenkovRadiation(row.bandwidth, 1.0, 1.0 / sample_rate)
# Resize the template so that its deltaF matches that of the PSD.
assert template_t[i].data.length <= fir_length, "template length is too large compared to the zero-padded FIR length."
lal.ResizeREAL8TimeSeries(template_t[i], -(fir_length - template_t[i].data.length) // 2, fir_length)
# Save the template before whitening. NOTE: disabled; enable for debugging while checking the waveform.
if False:
taxis = numpy.arange(template_t[i].data.length, dtype = "double") * template_t[i].deltaT + float(template_t[i].epoch)
template = template_t[i].data.data
template = numpy.array(list(map(list, zip(taxis,template))))
# FFT to frequency domain
lal.REAL8TimeFreqFFT(template_f, template_t[i], fplan)
# set DC and Nyquist to zero
template_f.data.data[0] = 0.0
template_f.data.data[template_f.data.length-1] = 0.0
# Save the frequency-domain template to check that it is sensible. NOTE: disabled; enable for debugging.
if False:
spectrum = numpy.zeros(template_f.data.length)
data_spectrum = numpy.zeros(psd.data.length)
for j in range(template_f.data.length):
f = template_f.f0 + template_f.deltaF * j
spectrum[j] = f * f * (template_f.data.data[j].real * template_f.data.data[j].real + template_f.data.data[j].imag * template_f.data.data[j].imag)
numpy.savetxt("spectrum_beforewhiten.txt", spectrum, delimiter = ",")
numpy.savetxt("sample_template_faxis.txt", numpy.arange(template_f.data.length, dtype = "double") * template_f.deltaF + template_f.f0, delimiter = ",")
# whiten
lal.WhitenCOMPLEX16FrequencySeries(template_f, psd)
# Save the whitened frequency-domain template to check. NOTE: disabled; enable for debugging.
if False:
for j in range(template_f.data.length):
f = template_f.f0 + template_f.deltaF * j
spectrum[j] = f * f * (template_f.data.data[j].real * template_f.data.data[j].real + template_f.data.data[j].imag * template_f.data.data[j].imag)
data_spectrum[j] = numpy.sqrt(psd.data.data[j])
numpy.savetxt("spectrum_afterwhiten.txt", spectrum, delimiter = ",")
numpy.savetxt("data_spectrum.txt", data_spectrum, delimiter = ",")
# Perform the inverse FFT to obtain the whitened time-domain template
lal.REAL8FreqTimeFFT(template_t[i], template_f, rplan)
# Obtain autocorrelation time series and add a sample of 0 in the tail to make the number of samples odd
template_f_squared.data.data = abs(template_f.data.data)**2
lal.REAL8FreqTimeFFT(autocorr_t, template_f_squared, rplan)
# Set the length of the autocorrelation matrix to 2 seconds because we know this is long enough.
autocorr[i] = autocorr_t.data.data
autocorr[i] = numpy.concatenate((autocorr[i][1:(int(1 * sample_rate) + 1)][::-1], autocorr[i][:(int(1 * sample_rate) + 1)]))
assert len(autocorr[i]) % 2 == 1, "autocorr must have odd number of samples."
# normalize autocorrelation by central value so that central value becomes 1
autocorr[i] /= autocorr[i].max()
# apply Tukey window and normalize
template_t[i] = template_t[i].data.data * window
template_t[i] /= numpy.sqrt(numpy.dot(template_t[i], template_t[i]))
if False:
template = numpy.insert(template, 2, template_t[i], axis =1)
numpy.savetxt("sample_template.txt", template, delimiter= " ")
# make the number of samples odd by adding a sample of 0 in the tail
template_t[i] = numpy.append(template_t[i], 0.0)
assert len(template_t[i]) % 2 == 1, 'template must have odd number of samples.'
if False:
numpy.savetxt("autocornew.txt", autocorr[i], delimiter = ",")
#numpy.savetxt("sample_template_afterwhiten.txt", template_t[i], delimiter = ",")
# set them to the filter bank
self.firbank[instrument].set_property("fir_matrix", template_t)
self.triggergen[instrument].set_property("autocorrelation_matrix", autocorr)
def do_on_message(self, bus, message):
if message.type == Gst.MessageType.ELEMENT and message.get_structure().get_name() == "spectrum":
instrument = message.src.get_name().split("_")[-1]
psd = pipeio.parse_spectrum_message(message)
# To get a proper PSD, require the estimate to be sufficiently stable before using it.
stability = float(message.src.get_property("n-samples")) / message.src.get_property("average-samples")
if stability > 0.3:
if self.update_psd[instrument] != 0:
self.update_psd[instrument] -=1
else:
self.update_psd[instrument] = 20
# FIXME: do this only once if using a reference psd
print("At GPS time", psd.epoch, "setting whitened templates for", instrument, file=sys.stderr)
self.make_templates(instrument, psd)
else:
# Use templates with all zeros so that we won't get any triggers.
if options.verbose:
print("At GPS time", psd.epoch, "burn in period", file = sys.stderr)
sample_rate = 2 * (psd.f0 + psd.deltaF * (psd.data.length - 1))
self.firbank[instrument].set_property("fir_matrix", [numpy.zeros(int(2 * sample_rate + 1))] * len(self.template_bank))
self.triggergen[instrument].set_property("autocorrelation_matrix", [numpy.zeros(int(2 * sample_rate + 1))] * len(self.template_bank))
return True
return False
#
# ================================================================================
#
# Main
#
# ================================================================================
#
#
# parse command line
#
options, process_params, datasourceinfo, filenames = parse_command_line()
# Obtain the instruments
all_inst = tuple(datasourceinfo.channel_dict.keys())
#
# load template bank file
#
template_bank_xml = ligolw_utils.load_filename(options.template_bank, contenthandler = LIGOLWContentHandler, verbose = options.verbose)
template_bank = lsctables.SnglBurstTable.get_table(template_bank_xml)
#
# load optional reference PSD
#
if options.reference_psd:
reference_psds = lalseries.read_psd_xmldoc(ligolw_utils.load_filename(options.reference_psd, contenthandler = lalseries.PSDContentHandler, verbose = options.verbose))
else:
reference_psds = None
#
# load time slide file and optionally injection file. this will be used as
# the starting point for our output file (lazy). check that the offset
# vectors match our instrument list
#
# FIXME: de-duplicate time_slide table here
xmldoc_output = ligolw_add.ligolw_add(ligolw.Document(), [options.time_slide_file] + ([options.injection_file] if options.injection_file else []), verbose = options.verbose)
for time_slide_id, offset_vector in lsctables.TimeSlideTable.get_table(xmldoc_output).as_dict().items():
if set(offset_vector) != set(all_inst):
raise ValueError("time slide ID %d has wrong instruments: %s, need %s" % (time_slide_id, ", ".join("%s" % instrument for instrument in sorted(offset_vector)), ", ".join("%s" % instrument for instrument in sorted(all_inst))))
#
# initialize output
#
process = ligolw_process.register_to_xmldoc(xmldoc_output, program_name, process_params)
sngl_burst_table = xmldoc_output.childNodes[-1].appendChild(lsctables.New(lsctables.SnglBurstTable, ("process:process_id", "ifo", "central_freq", "bandwidth", "snr", "chisq", "chisq_dof", "peak_time", "peak_time_ns", "event_id")))
sngl_burst_table.sync_next_id()
#
# initialize ranking statistic
#
rankingstat = cherenkov_rankingstat.RankingStat(all_inst, delta_t = 0.010, min_instruments = options.min_instruments)
#
# built graph
#
mainloop = GObject.MainLoop()
pipeline = Gst.Pipeline(name = "pipeline")
firbank = dict.fromkeys(all_inst, None)
triggergen = dict.fromkeys(all_inst, None)
for instrument in all_inst:
# for frame file, statevector and dqvector are None
head, statevector, dqvector, idq_series = datasource.mkbasicsrc(pipeline, datasourceinfo, instrument, options.verbose)
#
# dump hoft series.
#
if options.hoft_dump is not None:
head = pipeparts.mktee(pipeline, head)
pipeparts.mknxydumpsink(pipeline, head, "%s_hoft_dump_%s.txt" % (program_name, instrument), segment = options.hoft_dump)
rate = 8192
head = condition.mkcondition(
pipeline,
src = head,
target_rate = rate,
instrument = instrument,
width = 32,
statevector = statevector,
dqvector = dqvector,
psd = reference_psds[instrument] if reference_psds is not None else None,
psd_fft_length = 32,
ht_gate_threshold = float("inf"),
track_psd = reference_psds is None
)
#
# dump whitened hoft series.
#
if options.hoft_dump is not None:
head = pipeparts.mktee(pipeline, head)
pipeparts.mknxydumpsink(pipeline, head, "%s_hoft_whiten_dump_%s_%d.txt" % (program_name, instrument, rate), segment = options.hoft_dump)
#
# filter bank
#
head = firbank[instrument] = pipeparts.mkfirbank(pipeline, head, fir_matrix = numpy.zeros((len(template_bank), int(2 * rate) + 1), dtype = numpy.float64), block_stride = 4 * rate, latency = int(rate))
#
# dump snr time series. Used for impulse response checking.
#
if options.snr_dump is not None:
head = pipeparts.mktee(pipeline, head)
pipeparts.mknxydumpsink(pipeline, head, "%s_snr_dump_%s_%d.txt" % (program_name, instrument, rate), segment = options.snr_dump)
#
# trigger generator
# Use a temporary file, adjusting the "ifo" column of template_bank to the current instrument inside the loop.
#
with tempfile.NamedTemporaryFile(suffix = ".xml.gz") as fp:
template_bank.getColumnByName("ifo")[:] = [instrument] * len(template_bank)
xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
xmldoc.childNodes[-1].appendChild(template_bank)
ligolw_utils.write_filename(xmldoc, fp.name, verbose = options.verbose)
head = triggergen[instrument] = pipeparts.mkgeneric(pipeline, head, "lal_string_triggergen", threshold = options.threshold, cluster = options.cluster_events, bank_filename = fp.name, autocorrelation_matrix = numpy.zeros((len(template_bank), int(2 * rate + 1)), dtype = numpy.float64))
#
# handler
#
handler = PipelineHandler(mainloop, pipeline, firbank, xmldoc_output, process, template_bank, triggergen, rankingstat, is_noninjections = options.injection_file is None)
#
# appsync
#
appsync = pipeparts.AppSync(appsink_new_buffer = handler.appsink_new_buffer)
appsinks = set(appsync.add_sink(pipeline, triggergen[inst], caps = Gst.Caps.from_string("application/x-lal-snglburst"), name = inst) for inst in all_inst)
#
# seek
#
if pipeline.set_state(Gst.State.READY) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline did not enter ready state")
datasource.pipeline_seek_for_gps(pipeline, options.gps_start_time, options.gps_end_time)
#
# run
#
if pipeline.set_state(Gst.State.PLAYING) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline did not enter playing state")
try:
pipeparts.write_dump_dot(pipeline, "%s.%s" % ("cherenkov-burst", "NULL"), verbose = options.verbose)
except ValueError:
pass
if options.verbose:
print("running pipeline ...", file = sys.stderr)
mainloop.run()
handler.flush()
#
# write output
#
process.set_end_time_now()
ligolw_utils.write_filename(xmldoc_output, options.output, verbose = options.verbose)
# write rankingstat object to an XML file
xmldoc_rankingstat = ligolw.Document()
xmldoc_rankingstat.appendChild(ligolw.LIGO_LW())
process_rankingstat = ligolw_process.register_to_xmldoc(xmldoc_rankingstat, program_name, process_params)
xmldoc_rankingstat.childNodes[-1].appendChild(rankingstat.to_xml("rankingstat"))
ligolw_utils.write_filename(xmldoc_rankingstat, options.rankingstat_output, verbose = options.verbose)
#!/usr/bin/env python3
#
# Copyright (C) 2010--2015 Kipp Cannon, Chad Hanna
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
### A program to compute the likelihood ratios of Cherenkov burst coincidence candidates
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
from optparse import OptionParser
import sys
from tqdm import tqdm
from ligo.lw import ligolw
from ligo.lw import lsctables
from ligo.lw import utils as ligolw_utils
from ligo.lw.utils import process as ligolw_process
from lal.utils import CacheEntry
from gstlal import cherenkov
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
__author__ = "Soichiro Kuwahara <soichiro.kuwahara@ligo.org>"
#
# =============================================================================
#
# Command Line
#
# =============================================================================
#
def parse_command_line():
parser = OptionParser(
description = "Rankingstat calculation program for Cherenkov burst search.",
usage = "%prog [options] [candidatesxml ...]"
)
parser.add_option("--candidates-cache", metavar = "filename", help = "Also load the candidates from files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("--ranking-stat-cache", metavar = "filename", help = "Also load the ranking statistic likelihood ratio data files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("--ranking-stat", metavar = "filename", action = "append", help = "Load ranking statistic data from this file. Can be given multiple times.")
parser.add_option("-v", "--verbose", action = "store_true", help = "Be verbose.")
options, urls = parser.parse_args()
paramdict = options.__dict__.copy()
if options.candidates_cache is not None:
urls += [CacheEntry(line).url for line in open(options.candidates_cache)]
if not urls:
raise ValueError("must provide at least one candidate file")
if options.ranking_stat is None:
options.ranking_stat = []
if options.ranking_stat_cache is not None:
options.ranking_stat += [CacheEntry(line).url for line in open(options.ranking_stat_cache)]
if not options.ranking_stat:
raise ValueError("must provide at least one ranking statistic file")
return options, urls, paramdict
#
# =============================================================================
#
# Main
#
# =============================================================================
#
#
# command line
#
options, urls, paramdict = parse_command_line()
#
# load parameter distribution data
#
rankingstat = cherenkov_rankingstat.marginalize_rankingstat_urls(options.ranking_stat, verbose = options.verbose)
#
# invoke .finish() to apply density estimation kernels and correct the
# normalization.
#
rankingstat.finish()
#
# load zero-lag candidates and assign ranking statistic (ln likelihood ratio) values to them
#
for n, url in enumerate(urls, 1):
if options.verbose:
print("%d/%d: " % (n, len(urls)), end = "", file = sys.stderr)
xmldoc = ligolw_utils.load_url(url, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler, verbose = options.verbose)
process = ligolw_process.register_to_xmldoc(xmldoc, "gstlal_cherenkov_calc_likelihood", paramdict = paramdict)
time_slide_index = lsctables.TimeSlideTable.get_table(xmldoc).as_dict()
sngl_burst_index = dict((row.event_id, row) for row in lsctables.SnglBurstTable.get_table(xmldoc))
coinc_index = {}
for row in lsctables.CoincMapTable.get_table(xmldoc):
if row.table_name == "sngl_burst":
if row.coinc_event_id not in coinc_index:
coinc_index[row.coinc_event_id] = []
coinc_index[row.coinc_event_id].append(sngl_burst_index[row.event_id])
coinc_def_id = lsctables.CoincDefTable.get_table(xmldoc).get_coinc_def_id(search = cherenkov.CherenkovBBCoincDef.search, search_coinc_type = cherenkov.CherenkovBBCoincDef.search_coinc_type, create_new = False)
for coinc in tqdm(lsctables.CoincTable.get_table(xmldoc), desc = "calculating LR", disable = not options.verbose):
if coinc.coinc_def_id == coinc_def_id:
coinc.likelihood = rankingstat.ln_lr_from_triggers(coinc_index[coinc.coinc_event_id], time_slide_index[coinc.time_slide_id])
process.set_end_time_now()
ligolw_utils.write_url(xmldoc, url, verbose = options.verbose)
xmldoc.unlink()
#!/usr/bin/env python
# Copyright (C) 2013 Madeline Wade
#!/usr/bin/env python3
#
# Copyright (C) 2010--2015 Kipp Cannon, Chad Hanna
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
@@ -15,6 +17,8 @@
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
### A program to compute the noise probability distributions of likelihood ratios for Cherenkov burst triggers
#
# =============================================================================
#
@@ -24,50 +28,50 @@
#
import numpy
import sys
from gstlal import pipeparts
from gstlal import calibration_parts
import test_common_old
from optparse import OptionParser
from ligo.lw import ligolw
from ligo.lw import utils as ligolw_utils
from ligo.lw.utils import process as ligolw_process
from lal.utils import CacheEntry
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
__author__ = "Soichiro Kuwahara <soichiro.kuwahara@ligo.org>"
#
# =============================================================================
#
# Pipelines
# Command Line
#
# =============================================================================
#
def lal_interleave_01(pipeline, name):
caps = "audio/x-raw-float, width=64, rate=2048, channels=1"
def parse_command_line():
parser = OptionParser(
description = "Rankingstat calculation program for Cherenkov burst search.",
usage = "%prog [options] [rankingstatxml ...]"
)
parser.add_option("--output", metavar = "filename", help = "Write ranking statistic PDFs to this LIGO Light-Weight XML file.")
parser.add_option("--ranking-stat-cache", metavar = "filename", help = "Also load the ranking statistic likelihood ratio data files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("--ranking-stat-samples", metavar = "N", default = 2**24, type = "int", help = "Construct ranking statistic histograms by drawing this many samples from the ranking statistic generator (default = 2^24).")
parser.add_option("-v", "--verbose", action = "store_true", help = "Be verbose.")
options, urls = parser.parse_args()
#
# build pipeline
#
paramdict = options.__dict__.copy()
head = test_common_old.test_src(pipeline)
headtee = pipeparts.mktee(pipeline, head)
if options.ranking_stat_cache is not None:
urls += [CacheEntry(line).url for line in open(options.ranking_stat_cache)]
if not urls:
raise ValueError("must provide some ranking statistic files")
head1 = pipeparts.mkfirbank(pipeline, headtee, latency=-10, fir_matrix = [[0,1]])
head1tee = pipeparts.mktee(pipeline, head1)
pipeparts.mknxydumpsink(pipeline, pipeparts.mkqueue(pipeline, head1tee), "%s_in_shifted.dump" % name)
if options.output is None:
raise ValueError("must set --output")
out = pipeparts.mkgeneric(pipeline, None, "lal_interleave", sync = True)
#out = pipeparts.mkgeneric(pipeline, None, "lal_adder", sync = True)
pipeparts.mkqueue(pipeline, headtee).link(out)
pipeparts.mkqueue(pipeline, head1tee).link(out)
#out = calibration_parts.mkinterleave(pipeline, calibration_parts.list_srcs(pipeline, headtee, head1tee), caps)
pipeparts.mknxydumpsink(pipeline, out, "%s_out.dump" % name)
pipeparts.mknxydumpsink(pipeline, pipeparts.mkqueue(pipeline, headtee), "%s_in_notshifted.dump" % name)
return options, urls, paramdict
#
# done
#
return pipeline
#
# =============================================================================
@@ -78,5 +82,54 @@ def lal_interleave_01(pipeline, name):
#
test_common_old.build_and_run(lal_interleave_01, "lal_interleave_01")
#
# command line
#
options, urls, paramdict = parse_command_line()
#
# start the output document and record our start time
#
xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
#
# load parameter distribution data
#
rankingstat = cherenkov_rankingstat.marginalize_rankingstat_urls(urls, verbose = options.verbose)
process = ligolw_process.register_to_xmldoc(xmldoc, "gstlal_cherenkov_calc_rank_pdfs", paramdict = paramdict, instruments = rankingstat.instruments)
#
# invoke .finish() to apply density estimation kernels and correct the
# normalization.
#
rankingstat.finish()
#
# generate likelihood ratio histograms
#
rankingstatpdf = cherenkov_rankingstat.RankingStatPDF(rankingstat, nsamples = options.ranking_stat_samples, verbose = options.verbose)
#
# Write the ranking statistic distribution data to a file
#
xmldoc.childNodes[-1].appendChild(rankingstatpdf.to_xml("gstlal_cherenkov_rankingstat_pdf"))
process.set_end_time_now()
ligolw_utils.write_filename(xmldoc, options.output, verbose = options.verbose)
#!/usr/bin/env python3
#
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""Cherenkov burst injection tool"""
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
import math
from optparse import OptionParser
import random
import sys
import lal
from ligo.lw import ligolw
from ligo.lw import lsctables
from ligo.lw import utils as ligolw_utils
from ligo.lw.utils import process as ligolw_process
from ligo.lw.utils import segments as ligolw_segments
from ligo.segments import utils as segments_utils
@lsctables.use_in
class LIGOLWContentHandler(ligolw.LIGOLWContentHandler):
pass
__author__ = "Soichiro Kuwahara <soichiro.kuwahara@ligo.org>"
program_name = "gstlal_cherenkov_inj"
#
# =============================================================================
#
# Command Line
#
# =============================================================================
#
def parse_command_line():
parser = OptionParser(
description = "GstLAL-based Cherenkov burst injection pipeline."
)
parser.add_option("--dEoverdA", type = "float", help = "Set dEoverdA. If not given, a uniform random amplitude is generated.")
parser.add_option("--deltaT", metavar = "s", default = 0, type = "float", help = "Set the time interval of injections. Default value is 0 for injecting only one.")
parser.add_option("--start", type = "float", help = "Start GPS time of the injections. If not given, injections are generated over the whole segment list.")
parser.add_option("--stop", type = "float", help = "End GPS time of the injections. If not given, injections are generated over the whole segment list.")
parser.add_option("--min-instruments", metavar = "num", default = 1, type = "int", help = "When constructing the segment list, require at least this many instruments to be on (default = 1). See also --segments-file and --segments-name.")
parser.add_option("--segments-file", metavar = "filename", help = "Load the segment list for the injections from this file (required). See also --segments-name and --min-instruments.")
parser.add_option("--segments-name", metavar = "name", default = "datasegments", help = "Use the segments with this name in the segments file (default = \"datasegments\"). See also --segments-file and --min-instruments.")
parser.add_option("--time-slide-file", metavar = "filename", help = "Associate injections with the first time slide ID in this XML file (required).")
parser.add_option("--output", metavar = "filename", help = "Set the name of the output file (default = stdout).")
parser.add_option("--verbose", action = "store_true", help = "Be verbose.")
options, filenames = parser.parse_args()
# save for the process_params table
process_params = dict(options.__dict__)
# check for params
required_options = set(("min_instruments", "segments_file", "segments_name", "output", "time_slide_file"))
missing_options = set(option for option in required_options if getattr(options, option) is None)
if missing_options:
raise ValueError("missing required option(s) %s" % ", ".join("--%s" % option.replace("_", "-") for option in missing_options))
return options, process_params, filenames
#
# =============================================================================
#
# Main
#
# =============================================================================
#
options, process_params, filenames = parse_command_line()
#
# use the time-slide file to start the output document
#
xmldoc = ligolw_utils.load_filename(options.time_slide_file, verbose = options.verbose, contenthandler = LIGOLWContentHandler)
#
# add our metadata
#
process = ligolw_process.register_to_xmldoc(xmldoc, "cherenkov_burst_inj", process_params)
#
# use whatever time slide vector comes first in the table (lazy)
#
time_slide_table = lsctables.TimeSlideTable.get_table(xmldoc)
time_slide_id = time_slide_table[0].time_slide_id
offset_vector = time_slide_table.as_dict()[time_slide_id]
if options.verbose:
print("associating injections with time slide (%d) %s" % (time_slide_id, offset_vector), file = sys.stderr)
#
# construct the segment list for injection times
#
segmentlists = ligolw_segments.segmenttable_get_by_name(ligolw_utils.load_filename(options.segments_file, contenthandler = LIGOLWContentHandler, verbose = options.verbose), options.segments_name)
segmentlists.coalesce()
if set(segmentlists) != set(offset_vector):
raise ValueError("time slide had instruments %s but segment list has instruments %s. must be the same." % (", ".join(sorted(offset_vector)), ", ".join(sorted(segmentlists))))
# injections are done so that they are coincident when the data has the
# given offset vector applied. so we apply the offset vector to the
# segment lists, determine when the number of on instruments meets the
# --min-instruments criterion, and then *without restoring the offsets* use
# that segment list for the injections. when, later, the injection code
# performs these injections the injection times will have the opposite time
# shift applied which will place them back into the available segments in
# the respective detectors.
segmentlists.offsets.update(offset_vector)
segmentlist = segments_utils.vote(segmentlists.values(), options.min_instruments)
if not abs(segmentlist):
raise ValueError("the --min-instruments criterion cannot be satisfied")
#
# find or add a sim_burst table
#
try:
lsctables.SimBurstTable.get_table(xmldoc)
except ValueError:
# no sim_burst table in document
pass
else:
raise ValueError("%s contains a sim_burst table. this program isn't smart enough to deal with that." % options.time_slide_file)
sim_burst_tbl = xmldoc.childNodes[-1].appendChild(lsctables.New(lsctables.SimBurstTable, ["process:process_id", "simulation_id", "time_slide:time_slide_id", "waveform", "waveform_number", "ra", "dec", "psi", "q", "hrss", "time_geocent_gps", "time_geocent_gps_ns", "time_geocent_gmst", "duration", "frequency", "bandwidth", "egw_over_rsquared", "amplitude", "pol_ellipse_angle", "pol_ellipse_e"]))
#
# populate the sim_burst table with injections
#
if options.start is None and options.stop is None:
start, stop = segmentlist.extent()
else:
start = options.start
stop = options.stop
for i in range(math.ceil(float(start) / options.deltaT), math.ceil(float(stop) / options.deltaT)):
# we use floating point arithmetic to compute the times because we
# don't really care when they occur, not to the nanosecond.
time_geocent = lal.LIGOTimeGPS(i * options.deltaT)
# skip injections outside the desired segments
if time_geocent not in segmentlist:
continue
# add an injection at the desired time
sim_burst_tbl.append(sim_burst_tbl.RowType(
# metadata
process_id = process.process_id,
simulation_id = sim_burst_tbl.get_next_id(),
time_slide_id = time_slide_id,
waveform = "Cherenkov",
# waveform parameters
time_geocent = time_geocent,
frequency = math.nan,
# The bandwidth parameter is not actually used, so the length of the Enterprise in Star Trek is entered as a fixed placeholder value.
bandwidth = 641.,
egw_over_rsquared = math.nan,
# sky location and polarization axis orientation
ra = random.uniform(0., 2. * math.pi),
dec = math.asin(random.uniform(-1., 1.)),
psi = random.uniform(0., 2. * math.pi),
# unnecessary columns
waveform_number = 0.,
# Create the uniform and random amplitude in log10.
amplitude = pow(10, random.uniform(-1., 5.)) if options.dEoverdA is None else options.dEoverdA,
duration = math.nan,
q = math.nan,
hrss = math.nan,
pol_ellipse_angle = math.nan,
pol_ellipse_e = math.nan
))
#
# write output
#
ligolw_utils.write_filename(xmldoc, options.output, verbose = options.verbose)
#!/usr/bin/env python3
#
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""Cherenkov burst rankingstat plotting tool"""
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
import math
from gstlal.plots import set_matplotlib_cache_directory
set_matplotlib_cache_directory()
import matplotlib
from matplotlib import figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
matplotlib.rcParams.update({
"font.size": 10.0,
"axes.titlesize": 10.0,
"axes.labelsize": 10.0,
"xtick.labelsize": 8.0,
"ytick.labelsize": 8.0,
"legend.fontsize": 8.0,
"figure.dpi": 300,
"savefig.dpi": 300,
"text.usetex": True
})
import numpy
from optparse import OptionParser
import sys
from ligo.lw import utils as ligolw_utils
from lal.utils import CacheEntry
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
__author__ = "Soichiro Kuwahara <soichiro.kuwahara@ligo.org>"
program_name = "gstlal_cherenkov_plot_rankingstat"
#
# =============================================================================
#
# Command Line
#
# =============================================================================
#
def parse_command_line():
parser = OptionParser(
description = "GstLAL-based Cherenkov burst ranking statistic plotting tool."
)
parser.add_option("--ranking-stat-cache", metavar = "filename", help = "Also load the ranking statistic likelihood ratio data files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("--instrument", type = "string", help = "choose the instrument for the pdf calculation.")
parser.add_option("--width", type = "float", help = "width of the gaussian distribution in rankingstat in log scale.")
parser.add_option("--slope", type = "float", help = "slope of the pdf in the region where gaussian noise is dominant.")
parser.add_option("--threshold", type = "float", help = "threshold of the SNR where the dominance changes.")
parser.add_option("--matching", type = "float", help = "The chi^2 / rho value at which the mismatch between the template and injections becomes dominant.")
parser.add_option("--output", metavar = "string", help = "Name of output png.")
parser.add_option("-v", "--verbose", action = "store_true", help = "Be verbose.")
parser.add_option("--is-calculate", action = "store_true", help = "Whether to plot the calculated PDF for adjustment.")
options, urls = parser.parse_args()
if options.ranking_stat_cache is not None:
urls += [CacheEntry(line).url for line in open(options.ranking_stat_cache)]
# check for input parameters
required_options = set(("instrument", "width", "slope", "threshold", "matching"))
if options.is_calculate:
missing_options = set(option for option in required_options if getattr(options, option) is None)
if missing_options:
raise ValueError("missing required option(s) %s" % ", ".join("--%s" % option.replace("_", "-") for option in missing_options))
# save for the process_params table
options.options_dict = dict(options.__dict__)
return options, urls
#
# =============================================================================
#
# Main
#
# =============================================================================
#
options, urls = parse_command_line()
#
# load the ranking stat file
#
rankingstat = cherenkov_rankingstat.marginalize_rankingstat_urls(urls, verbose = options.verbose)
#
# create artificial PDF
#
def gaussian(x, mu, sig):
return numpy.exp(-(x - mu)**2. / (2. * sig**2.))
def calculated_pdf(x, y, width, slope, snr, matching):
pdf = numpy.zeros((len(x), len(y)))
for i in range(len(x)):
if x[i] >= snr:
# Gaussian in log scale
pdf[i][:] = gaussian(numpy.log10(y), matching, width) + (-4. * math.log10(x[i]))
else:
# Gaussian with decreasing width
pdf[i][:] = gaussian(numpy.log10(y), matching + slope * math.log10(x[i] / snr), 0.05 + (width - 0.05) * math.log10(x[i]) / math.log10(snr)) + (-4. * math.log10(x[i]))
return pdf
#
# plot denominator snr, chisq PDFs
#
for instrument, pdf in rankingstat.denominator.lnpdfs.items():
x,y = pdf.bins.centres()
lnpdf = pdf.at_centres()
if options.is_calculate:
calpdf = calculated_pdf(x, y, options.width, options.slope, options.threshold, options.matching)
fig = figure.Figure()
FigureCanvas(fig)
axes = fig.gca()
norm = matplotlib.colors.Normalize(vmin = -30., vmax = lnpdf.max())
mesh = axes.pcolormesh(x, y, lnpdf.T, norm = norm, cmap = "afmhot", shading = "gouraud")
if options.is_calculate:
axes.pcolormesh(x, y, calpdf.T, norm = norm, cmap = "winter", shading = "gouraud")
axes.contour(x, y, lnpdf.T, colors = "k", linestyles = "-", linewidths = .5, alpha = .3)
axes.loglog()
axes.grid(which = "both")
axes.set_xlim((3., 500.))
axes.set_ylim((.000001, 5.))
fig.colorbar(mesh, ax = axes)
axes.set_xlabel("$\mathrm{SNR}$")
axes.set_ylabel("$\chi^{2}/ \mathrm{SNR}^{2}$")
fig.savefig("%s_%s_pdf.png" % (options.output, instrument))
rankingstat.finish()
for instrument, pdf in rankingstat.denominator.lnpdfs.items():
x,y = pdf.bins.centres()
lnpdf = pdf.at_centres()
fig = figure.Figure()
FigureCanvas(fig)
axes = fig.gca()
norm = matplotlib.colors.Normalize(vmin = -30, vmax = lnpdf.max())
cmap = matplotlib.cm.afmhot
cmap.set_bad("k")
mesh = axes.pcolormesh(x, y, lnpdf.T, norm = norm, cmap = cmap, shading = "gouraud")
axes.contour(x, y, lnpdf.T, norm = norm, colors = "k", linestyles = "-", linewidths = .5, alpha = .3)
axes.loglog()
axes.grid(which = "both")
axes.set_xlim((3., 500.))
axes.set_ylim((.000001, 5))
colorbar = fig.colorbar(mesh, ax = axes)
colorbar.set_clim(-30, lnpdf.max())
axes.set_xlabel("$\mathrm{SNR}$")
axes.set_ylabel("$\chi^{2}/ \mathrm{SNR}^{2}$")
fig.savefig("%s_%s_kdepdf.png" % (options.output, instrument))
#!/usr/bin/env python3
#
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""Cherenkov burst plotting summary tool"""
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
import itertools
import math
from gstlal.plots import set_matplotlib_cache_directory
set_matplotlib_cache_directory()
import matplotlib
from matplotlib import figure
from matplotlib import patches
matplotlib.rcParams.update({
"font.size": 12.0,
"axes.titlesize": 12.0,
"axes.labelsize": 12.0,
"xtick.labelsize": 12.0,
"ytick.labelsize": 12.0,
"legend.fontsize": 10.0,
"figure.dpi": 300,
"savefig.dpi": 300,
"text.usetex": True,
"path.simplify": True,
"font.family": "serif"
})
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
import numpy
from scipy import interpolate
from scipy import optimize
import sys
from optparse import OptionParser
from lal import rate
from lalburst import binjfind
import lalsimulation
from lal.utils import CacheEntry
from gstlal import cherenkov
from gstlal import stats as gstlalstats
from gstlal.plots import util as gstlalplotutil
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
from ligo.lw.utils import process as ligolw_process
from ligo.lw import utils as ligolw_utils
from ligo.lw import lsctables
#
# =============================================================================
#
# Command Line
#
# =============================================================================
#
def parse_command_line():
parser = OptionParser()
parser.add_option("--candidates-cache", metavar = "filename", help = "Also load the candidates from files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("--power", type = "float", help = "Power of the source. It is used for making the contour graph of true rate.")
parser.add_option("--ranking-stat-pdf", metavar = "filename", help = "Set the name of the xml file rankingstatpdf.")
parser.add_option("--verbose", "-v", action = "store_true", help = "Be verbose.")
options, urls = parser.parse_args()
	if options.ranking_stat_pdf is None:
		raise ValueError("must set --ranking-stat-pdf")
	if options.candidates_cache is not None:
		urls += [CacheEntry(line).url for line in open(options.candidates_cache)]
	if not options.candidates_cache:
		raise ValueError("must provide a --candidates-cache file")
return options, urls
#
# =============================================================================
#
# Misc Utilities
#
# =============================================================================
#
def create_plot(x_label = None, y_label = None, width = 165.0, aspect = gstlalplotutil.golden_ratio):
fig = figure.Figure()
FigureCanvas(fig)
fig.set_size_inches(width / 25.4, width / 25.4 / aspect)
axes = fig.add_axes((0.1, 0.12, .875, .80))
axes.grid(True)
if x_label is not None:
axes.set_xlabel(x_label)
if y_label is not None:
axes.set_ylabel(y_label)
return fig, axes
def is_injection_document(xmldoc):
try:
lsctables.SimBurstTable.get_table(xmldoc)
except ValueError:
return False
return True
#
# =============================================================================
#
# False alarm rates/probabilities
#
# =============================================================================
#
#
# Class to compute false-alarm probabilities and false-alarm rates from
# ranking statistic PDFs
#
class FAPFAR(object):
def __init__(self, rankingstatpdf):
# obtain livetime from rankingstatpdf
		# FIXME: temporarily unavailable
#self.livetime = float(abs(rankingstatpdf.segments))
# set the rate normalization LR threshold to the mode of
# the zero-lag LR distribution
rate_normalization_lr_threshold, = rankingstatpdf.zl_lr_lnpdf.argmax()
# FIXME: override with a manual cut until an automatic algorithm can be found
rate_normalization_lr_threshold = -19.
# record trials factor, with safety checks
counts = rankingstatpdf.zl_lr_lnpdf.count
self.count_above_threshold = counts[rate_normalization_lr_threshold:,].sum()
# get noise model ranking stat value and event count from
# bins
threshold_index = rankingstatpdf.noise_lr_lnpdf.bins[0][rate_normalization_lr_threshold]
ranks = rankingstatpdf.noise_lr_lnpdf.bins[0].lower()[threshold_index:]
counts = rankingstatpdf.noise_lr_lnpdf.array[threshold_index:]
# complementary cumulative distribution function
ccdf = counts[::-1].cumsum()[::-1]
ccdf /= ccdf[0]
# ccdf is P(ranking statistic > threshold | a candidate).
# we need P(ranking statistic > threshold), i.e. need to
# correct for the possibility that no candidate is present.
# specifically, the ccdf needs to =1-1/e at the candidate
# identification threshold, and cdf=1/e at the candidate
# threshold, in order for FAP(threshold) * live time to
# equal the actual observed number of candidates.
ccdf = gstlalstats.poisson_p_not_0(ccdf)
# build interpolator
self.ccdf_interpolator = interpolate.interp1d(ranks, ccdf)
# record min and max ranks so we know which end of the ccdf
# to use when we're out of bounds
self.minrank = ranks[0]
self.maxrank = ranks[-1]
@property
def threshold(self):
"""
Noise model event rate prediction only valid at or above
this value of ranking statistic.
"""
return self.minrank
@gstlalstats.assert_probability
def ccdf_from_rank(self, rank):
return self.ccdf_interpolator(numpy.clip(rank, self.minrank, self.maxrank))
def fap_from_rank(self, rank):
# implements equation (8) from Phys. Rev. D 88, 024025.
# arXiv: 1209.0718
return gstlalstats.fap_after_trials(self.ccdf_from_rank(rank), self.count_above_threshold)
def far_from_rank(self, rank):
# implements equation (B4) of Phys. Rev. D 88, 024025.
# arXiv: 1209.0718. the return value is divided by T to
# convert events/experiment to events/second. "tdp" =
# true-dismissal probability = 1 - single-event FAP, the
# integral in equation (B4)
log_tdp = numpy.log1p(-self.ccdf_from_rank(rank))
return self.count_above_threshold * -log_tdp# / self.livetime
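# Editorial note: writing P(x) for ccdf_from_rank(x) and N for count_above_threshold,
# fap_from_rank() returns the trials-corrected false-alarm probability,
# FAP(x) = 1 - (1 - P(x))**N (eqn. (8) of Phys. Rev. D 88, 024025; this assumes
# gstlalstats.fap_after_trials() implements the standard trials formula), and
# far_from_rank() returns -N * log(1 - P(x)), which becomes the false-alarm rate of
# eqn. (B4) once divided by the live time (currently commented out above).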
#
# =============================================================================
#
# Ranking Stat PDF Plots
#
# =============================================================================
#
def rankingstat_pdfs_plot(rankingstatpdf):
fig = figure.Figure()
FigureCanvas(fig)
axes = fig.gca()
line1, = axes.semilogy(rankingstatpdf.noise_lr_lnpdf.bins[0].centres(), numpy.exp(rankingstatpdf.noise_lr_lnpdf.at_centres()), linewidth = 1)
rankingstatpdf.zl_lr_lnpdf.normalize()
axes.semilogy(rankingstatpdf.zl_lr_lnpdf.bins[0].centres(), numpy.exp(rankingstatpdf.zl_lr_lnpdf.at_centres()), color = 'red')
axes.set_xlim(-27., 5)
axes.set_ylim(1e-4, 1e1)
axes.set_xlabel("$\ln\mathcal{L}$")
axes.set_ylabel("$P$")
return fig
#
# =============================================================================
#
# Rate vs. Threshold Plots
#
# =============================================================================
#
def sigma_region(mean, nsigma):
return numpy.concatenate((mean - nsigma * numpy.sqrt(mean), (mean + nsigma * numpy.sqrt(mean))[::-1]))
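# sigma_region() traces the boundary of the mean +/- nsigma*sqrt(mean) Poisson band:
# the lower edge forwards followed by the upper edge reversed, so that, paired with
# the x array concatenated with its own reverse, it can be filled as a closed polygon
# with axes.fill() in create_rate_vs_lnL_plot().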
def create_rate_vs_lnL_plot(axes, stats, stats_label, fapfar):
axes.semilogy()
#
# plot limits and expected counts
#
#xlim = fapfar.threshold, max(2. * math.ceil(stats.max() / 2.), 10.)
xlim = fapfar.threshold, 20.
#ylim = 5e-7, 10.**math.ceil(math.log10(expected_count(xlim[0])))
#xlim = -0.1, max(2. * math.ceil(stats.max() / 2.), 10.)
ylim = 1e-6, 1e5
#
# expected count curve
#
expected_count_x = numpy.linspace(xlim[0], xlim[1], 10000)
expected_count_y = fapfar.far_from_rank(expected_count_x)
line1, = axes.plot(expected_count_x, expected_count_y, 'k--', linewidth = 1)
#
# error bands
#
expected_count_x = numpy.concatenate((expected_count_x, expected_count_x[::-1]))
line2, = axes.fill(expected_count_x, sigma_region(expected_count_y, 3.0).clip(*ylim), alpha = 0.25, facecolor = [0.75, 0.75, 0.75])
line3, = axes.fill(expected_count_x, sigma_region(expected_count_y, 2.0).clip(*ylim), alpha = 0.25, facecolor = [0.5, 0.5, 0.5])
line4, = axes.fill(expected_count_x, sigma_region(expected_count_y, 1.0).clip(*ylim), alpha = 0.25, facecolor = [0.25, 0.25, 0.25])
#
# observed
#
if stats is not None:
N = numpy.arange(1., len(stats) + 1., dtype = "double")
line5, = axes.plot(stats.repeat(2)[1:], N.repeat(2)[:-1], 'k', linewidth = 2)
#
# legend
#
axes.legend((line5, line1, line4, line3, line2), (stats_label, r"Noise Model, $\langle N \rangle$", r"$\pm\sqrt{\langle N \rangle}$", r"$\pm 2\sqrt{\langle N \rangle}$", r"$\pm 3\sqrt{\langle N \rangle}$"), loc = "upper right")
#
# adjust bounds of plot
#
axes.set_xlim(xlim)
axes.set_ylim(ylim)
def RateVsThreshold(fapfar, zerolag_stats, background_stats):
zerolag_fig, axes = create_plot(r"$\ln \mathcal{L}$ Threshold", "Number of Events $\geq \ln \mathcal{L}$")
zerolag_stats = numpy.array(sorted(zerolag_stats, reverse = True))
create_rate_vs_lnL_plot(axes, zerolag_stats, r"Observed", fapfar)
axes.set_title(r"Event Count vs.\ Ranking Statistic Threshold")
background_fig, axes = create_plot(r"$\ln \mathcal{L}$ Threshold", "Number of Events $\geq \ln \mathcal{L}$")
background_stats = numpy.array(sorted(background_stats, reverse = True))
create_rate_vs_lnL_plot(axes, background_stats, r"Observed (time shifted)", fapfar)
axes.set_title(r"Event Count vs.\ Ranking Statistic Threshold (Closed Box)")
detection_threshold = zerolag_stats[0]
return zerolag_fig, background_fig, detection_threshold
#
# =============================================================================
#
# Efficiency curve
#
# =============================================================================
#
def slope(x, y):
"""
From the x and y arrays, compute the slope at the x co-ordinates
using a first-order finite difference approximation.
"""
slope = numpy.zeros((len(x),), dtype = "double")
slope[0] = (y[1] - y[0]) / (x[1] - x[0])
for i in range(1, len(x) - 1):
slope[i] = (y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1])
slope[-1] = (y[-1] - y[-2]) / (x[-1] - x[-2])
return slope
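# For example (illustrative values only): x = [0., 1., 2.], y = [0., 1., 4.] gives
# slope(x, y) = [1., 2., 3.]: one-sided differences at the end points and centred
# differences in the interior.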
def upper_err(y, yerr, deltax):
z = y + yerr
deltax = int(deltax)
upper = numpy.zeros((len(yerr),), dtype = "double")
for i in range(len(yerr)):
upper[i] = max(z[max(i - deltax, 0) : min(i + deltax, len(z))])
return upper - y
def lower_err(y, yerr, deltax):
z = y - yerr
deltax = int(deltax)
lower = numpy.zeros((len(yerr),), dtype = "double")
for i in range(len(yerr)):
lower[i] = min(z[max(i - deltax, 0) : min(i + deltax, len(z))])
return y - lower
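# upper_err() and lower_err() return, for each sample, the largest upward (downward)
# excursion of y + yerr (y - yerr) within a window of half-width deltax samples; they
# are meant for drawing a smoothed uncertainty band (see the commented-out Polygon in
# render_from_bins() below).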
def render_from_bins(axes, efficiency_numerator, efficiency_denominator, cal_uncertainty, filter_width, colour = "k", erroralpha = 0.3, linestyle = "-"):
# extract array of x co-ordinates, and the factor by which x
# increases from one sample to the next.
(x,) = efficiency_denominator.centres()
x_factor_per_sample = efficiency_denominator.bins[0].delta
# compute the efficiency, the slope (units = efficiency per
# sample), the y uncertainty (units = efficiency) due to binomial
# counting fluctuations, and the x uncertainty (units = samples)
# due to the width of the smoothing filter.
eff = efficiency_numerator.array / efficiency_denominator.array
dydx = slope(numpy.arange(len(x), dtype = "double"), eff)
yerr = numpy.sqrt(eff * (1. - eff) / efficiency_denominator.array)
xerr = numpy.array([filter_width / 2.] * len(yerr))
# compute the net y err (units = efficiency) by (i) multiplying the
# x err by the slope, (ii) dividing the calibration uncertainty
# (units = percent) by the fractional change in x per sample and
	# multiplying by the slope, (iii) adding the two in quadrature with
# the y err.
net_yerr = numpy.sqrt((xerr * dydx)**2. + yerr**2. + (cal_uncertainty / x_factor_per_sample * dydx)**2.)
# compute net xerr (units = percent) by dividing yerr by slope and
# then multiplying by the fractional change in x per sample.
net_xerr = net_yerr / dydx * x_factor_per_sample
# plot the efficiency curve and uncertainty region
#patch = patches.Polygon(zip(numpy.concatenate((x, x[::-1])), numpy.concatenate((eff + upper_err(eff, yerr, filter_width / 2.), (eff - lower_err(eff, yerr, filter_width / 2.))[::-1]))), edgecolor = colour, facecolor = colour, alpha = erroralpha)
#axes.add_patch(patch)
line, = axes.plot(x, eff, colour + linestyle)
# compute 50% point and its uncertainty
A50 = None #optimize.bisect(interpolate.interp1d(x, eff - 0.5), x[0], x[-1], xtol = 1e-40)
A50_err = None #interpolate.interp1d(x, net_xerr)(A50)
# print some analysis FIXME: this calculation needs attention
num_injections = efficiency_denominator.array.sum()
num_samples = len(efficiency_denominator.array)
#print("Bins were %g samples wide, ideal would have been %g" % (filter_width, (num_samples / num_injections / interpolate.interp1d(x, dydx)(A50)**2.0)**(1.0/3.0)), file=sys.stderr)
#print("Average number of injections in each bin = %g" % efficiency_denominator.array.mean(), file=sys.stderr)
return line, A50, A50_err
def create_efficiency_plot(axes, all_injections, found_injections, detection_threshold):
# formats
axes.semilogx()
axes.set_position([0.10, 0.150, 0.86, 0.77])
# set desired yticks
axes.set_yticks((0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
axes.set_yticklabels((r"\(0\)", r"\(0.1\)", r"\(0.2\)", r"\(0.3\)", r"\(0.4\)", r"\(0.5\)", r"\(0.6\)", r"\(0.7\)", r"\(0.8\)", r"\(0.9\)", r"\(1.0\)"))
	axes.xaxis.grid(True, which = "both")
	axes.yaxis.grid(True, which = "both")
# the denominator and numerator binning for the plot
efficiency_numerator = rate.BinnedArray(rate.NDBins((rate.LogarithmicBins(min(sim.amplitude for sim in all_injections), max(sim.amplitude for sim in all_injections), 400),)))
efficiency_denominator = rate.BinnedArray(efficiency_numerator.bins)
	# add the injections into the bins
for sim in found_injections:
efficiency_numerator[sim.amplitude,] += 1
for sim in all_injections:
efficiency_denominator[sim.amplitude,] += 1
# adjust window function normalization
filter_width = 10
windowfunc = rate.gaussian_window(filter_width)
windowfunc /= windowfunc[int((len(windowfunc) + 1) / 2)]
rate.filter_array(efficiency_numerator.array, windowfunc)
rate.filter_array(efficiency_denominator.array, windowfunc)
	# regularize: adjust unused bins so that the efficiency is 0, to avoid NaNs
	assert (efficiency_numerator.array <= efficiency_denominator.array).all()
efficiency_denominator.array[(efficiency_numerator.array == 0) & (efficiency_denominator.array == 0)] = 1
# FIXME: for output checking only
eff = rate.BinnedArray(efficiency_numerator.bins, array = efficiency_numerator.array / efficiency_denominator.array)
#print(eff, file = sys.stderr)
line1, A50, A50_error = render_from_bins(axes, efficiency_numerator, efficiency_denominator, 0.2, filter_width)
# add a legend to the axis
axes.legend((line1,), (r"\noindent Injections with $\log \Lambda > %.2f$" % detection_threshold,), loc = "lower right")
# adjust limits
axes.set_ylim([0.0, 1.0])
# return eff for true rate contour
return eff
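# The smoothed efficiency returned above is a BinnedArray indexed by injection
# amplitude (dE/dA); create_truerate_contour() later looks this array up at the
# fluence predicted for each (beta, r) grid point, clipped to the range it covers.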
def Efficiency(all_injections, found_injections, detection_threshold):
fig, axes = create_plot(r"Injection Amplitude ($\mathrm{J/m^2}$)", "Detection Efficiency")
eff = create_efficiency_plot(axes, all_injections, found_injections, detection_threshold)
axes.set_title(r"Detection Efficiency vs.\ Amplitude")
return fig, eff
#
# =============================================================================
#
# Constraint Contour
#
# =============================================================================
#
def create_truerate_contour(axes, power, eff):
# set beta and impact parameter axis
rate = numpy.zeros((300,300))
beta = 10.**numpy.linspace(0.01, 2., 300)
r = 10.**numpy.linspace(5., 9., 300)
print(eff, file = sys.stderr)
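	# Editorial note: each grid point maps the source parameters (power, beta, r) to a
	# predicted fluence dE/dA at the detector via lalsimulation.SimBurstCherenkov_dE_dA(),
	# clips it to the range covered by the efficiency curve, and divides the hard-coded
	# 3.890 (presumably a 90%-confidence Poisson counting limit, cf. the plot title) by
	# the efficiency at that fluence to obtain the rate upper limit contoured below.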
# calculate
for i in range(len(beta)):
for j in range(len(r)):
dE_dA = lalsimulation.SimBurstCherenkov_dE_dA(power, beta[i], r[j])
dE_dA = max(dE_dA, eff.bins[0].min)
dE_dA = min(dE_dA, eff.bins[0].max)
rate[i][j] = 3.890 / eff[dE_dA,]
# set color maps
norm = matplotlib.colors.LogNorm(vmin = 1., vmax = 10000.)
cmap = matplotlib.cm.viridis
cmap.set_bad("k")
mesh = axes.pcolormesh(beta, r, rate.T, norm = norm, cmap = cmap, shading = "gouraud")
	contour = axes.contour(beta, r, rate.T, norm = norm, colors = "black", linestyles = "-", linewidths = .5, alpha = 1., levels = numpy.logspace(0., 3., 7, base = 10.))
fmt = matplotlib.ticker.LogFormatterMathtext()
axes.clabel(contour, contour.levels, colors = "black", fmt = fmt)
axes.loglog()
axes.grid(which = "both")
def Truerate(power, eff):
fig, axes = create_plot(r"$\beta$", "Impact parameter(metres)")
create_truerate_contour(axes, power, eff)
axes.set_title(r"90\%% Confidence Rate Upper Limit for NCC-1701-D Enterprise ($P=%s$ W)" % gstlalplotutil.latexnumber("%.4g" % power))
return fig
#
# =============================================================================
#
# Main
#
# =============================================================================
#
#
# Parse command line
#
options, urls = parse_command_line()
#
# load ranking statistic PDFs
#
rankingstatpdf = cherenkov_rankingstat.RankingStatPDF.from_xml(ligolw_utils.load_filename(options.ranking_stat_pdf, verbose = options.verbose, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler), "gstlal_cherenkov_rankingstat_pdf")
fapfar = FAPFAR(rankingstatpdf)
#
# plot PDFs
#
fig = rankingstat_pdfs_plot(rankingstatpdf)
for ext in (".png", ".pdf"):
filename = "rankingstat_pdfs"
if options.verbose:
print("writing %s%s ..." % (filename, ext), file = sys.stderr)
fig.savefig("%s%s" % (filename, ext))
#
# load zero-lag candidates.
#
zerolag_stats = []
background_stats = []
for n, url in enumerate(urls, 1):
if options.verbose:
print("%d/%d: " % (n, len(urls)), end = "", file = sys.stderr)
xmldoc = ligolw_utils.load_url(url, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler, verbose = options.verbose)
if is_injection_document(xmldoc):
continue
coinc_def_id = lsctables.CoincDefTable.get_table(xmldoc).get_coinc_def_id(search = cherenkov.CherenkovBBCoincDef.search, search_coinc_type = cherenkov.CherenkovBBCoincDef.search_coinc_type, create_new = False)
zl_time_slide_ids = frozenset(time_slide_id for time_slide_id, offsetvector in lsctables.TimeSlideTable.get_table(xmldoc).as_dict().items() if not any(offsetvector.values()))
for coinc in lsctables.CoincTable.get_table(xmldoc):
if coinc.coinc_def_id != coinc_def_id:
continue
if coinc.time_slide_id in zl_time_slide_ids:
zerolag_stats.append(coinc.likelihood)
else:
background_stats.append(coinc.likelihood)
if options.verbose:
print("now %d background events" % len(background_stats), file = sys.stderr)
#
# rate_vs_thresh "expected from noise model"
#
rate_vs_thresh_zl, rate_vs_thresh_bg, threshold = RateVsThreshold(fapfar, zerolag_stats, background_stats)
for filename, fig in [("rate_vs_threshold_open", rate_vs_thresh_zl), ("rate_vs_threshold", rate_vs_thresh_bg)]:
for ext in (".png", ".pdf"):
if options.verbose:
print("writing %s%s ..." % (filename, ext), file = sys.stderr)
fig.savefig("%s%s" % (filename, ext))
#
# load injection files
#
all_injections = []
found_injections = []
is_injection = False
for n, url in enumerate(urls, 1):
if options.verbose:
print("%d/%d: " % (n, len(urls)), end = "", file = sys.stderr)
xmldoc = ligolw_utils.load_url(url, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler, verbose = options.verbose)
if not is_injection_document(xmldoc):
continue
is_injection = True
coinc_def_id = lsctables.CoincDefTable.get_table(xmldoc).get_coinc_def_id(search = cherenkov.CherenkovBBCoincDef.search, search_coinc_type = cherenkov.CherenkovBBCoincDef.search_coinc_type, create_new = False)
lnL_index = dict((coinc.coinc_event_id, coinc.likelihood) for coinc in lsctables.CoincTable.get_table(xmldoc) if coinc.coinc_def_id == coinc_def_id)
coinc_def_id = lsctables.CoincDefTable.get_table(xmldoc).get_coinc_def_id(search = binjfind.CherenkovSBCNearCoincDef.search, search_coinc_type = binjfind.CherenkovSBCNearCoincDef.search_coinc_type, create_new = False)
coinc_event_ids = frozenset(coinc.coinc_event_id for coinc in lsctables.CoincTable.get_table(xmldoc) if coinc.coinc_def_id == coinc_def_id)
sim_lnL_index = {}
for coinc_event_id, rows in itertools.groupby(sorted(lsctables.CoincMapTable.get_table(xmldoc), key = lambda row: row.coinc_event_id), lambda row: row.coinc_event_id):
if coinc_event_id not in coinc_event_ids:
continue
rows = tuple(rows)
simulation_id, = (row.event_id for row in rows if row.table_name == "sim_burst")
sim_lnL_index[simulation_id] = max(lnL_index[row.event_id] for row in rows if row.table_name == "coinc_event")
for sim in lsctables.SimBurstTable.get_table(xmldoc):
		#FIXME: the injection generation process only kept times when two detectors were on; it would be better to check that again here.
all_injections.append(sim)
if sim_lnL_index.get(sim.simulation_id, float("-inf")) >= threshold:
found_injections.append(sim)
if is_injection:
efficiencyfig, eff = Efficiency(all_injections, found_injections, threshold)
for ext in (".png", ".pdf"):
if options.verbose:
print("writing Efficiency%s with threshold of %s" % (ext,threshold), file = sys.stderr)
efficiencyfig.savefig("Efficiency%s" % ext)
if options.power is not None:
		rate_fig = Truerate(options.power, eff)
		for ext in (".png", ".pdf"):
			if options.verbose:
				print("writing true rate contour graph", file = sys.stderr)
			rate_fig.savefig("Truerate%s" % ext)
#!/usr/bin/env python3
#
# Copyright (C) 2010--2015 Kipp Cannon, Chad Hanna
# Copyright (C) 2021 Soichiro Kuwahara
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
### A program to compute the zero-lag (or time-shifted) distribution of likelihood ratios for the Cherenkov burst search
#
# =============================================================================
#
# Preamble
#
# =============================================================================
#
from optparse import OptionParser
import sys
from ligo.lw import ligolw
from ligo.lw import lsctables
from ligo.lw import utils as ligolw_utils
from ligo.lw.utils import process as ligolw_process
from lal.utils import CacheEntry
from gstlal import cherenkov
from gstlal.cherenkov import rankingstat as cherenkov_rankingstat
__author__ = "Soichiro Kuwara <soichiro.kuwahara@ligo.org>"
#
# =============================================================================
#
# Command Line
#
# =============================================================================
#
def parse_command_line():
parser = OptionParser(
description = "Rankngstat calculation program for Cherenkov burst search.",
usage = "%prog [options] [candidatesxml ...]"
)
parser.add_option("--candidates-cache", metavar = "filename", help = "Also load the candidates from files listed in this LAL cache. See lalapps_path2cache for information on how to produce a LAL cache file.")
parser.add_option("-v", "--verbose", action = "store_true", help = "Be verbose.")
parser.add_option("--ranking-stat-pdf", metavar = "filename", help = "Write zero-lag ranking statistic PDF to this file. Must contain exactly one Cherenkov burst ranking statistic PDF object.")
parser.add_option("--is-timeshifted", action = "store_true", help = "Whether the open(zerolag) or closed(time shifted) box.")
options, urls = parser.parse_args()
paramdict = options.__dict__.copy()
if options.candidates_cache is not None:
urls += [CacheEntry(line).url for line in open(options.candidates_cache)]
if not urls:
raise ValueError("must provide some candidate files")
if options.ranking_stat_pdf is None:
raise ValueError("must set --ranking-stat-pdf")
return options, urls, paramdict
#
# =============================================================================
#
# Main
#
# =============================================================================
#
#
# command line
#
options, urls, paramdict = parse_command_line()
#
# load ranking statistic PDF
#
# FIXME: it would be better to preserve all the contents of the original file instead of just the PDF object
rankingstatpdf = cherenkov_rankingstat.RankingStatPDF.from_xml(ligolw_utils.load_filename(options.ranking_stat_pdf, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler, verbose = options.verbose), "gstlal_cherenkov_rankingstat_pdf")
#
# zero the zero-lag ranking statistic PDF
#
rankingstatpdf.zl_lr_lnpdf.array[:] = 0.
#
# load zero-lag candidates and histogram their ranking statistics
#
for n, url in enumerate(urls, 1):
if options.verbose:
print("%d/%d: " % (n, len(urls)), end = "", file = sys.stderr)
xmldoc = ligolw_utils.load_url(url, contenthandler = cherenkov_rankingstat.LIGOLWContentHandler, verbose = options.verbose)
coinc_def_id = lsctables.CoincDefTable.get_table(xmldoc).get_coinc_def_id(search = cherenkov.CherenkovBBCoincDef.search, search_coinc_type = cherenkov.CherenkovBBCoincDef.search_coinc_type, create_new = False)
if options.is_timeshifted:
zl_time_slide_ids = frozenset(time_slide_id for time_slide_id, offsetvector in lsctables.TimeSlideTable.get_table(xmldoc).as_dict().items() if any(offsetvector.values()))
else:
zl_time_slide_ids = frozenset(time_slide_id for time_slide_id, offsetvector in lsctables.TimeSlideTable.get_table(xmldoc).as_dict().items() if not any(offsetvector.values()))
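	# a candidate is treated as zero-lag when every offset in its time-slide vector is
	# zero; with --is-timeshifted the selection is inverted so that time-shifted
	# (closed box) candidates are histogrammed instead.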
for coinc in lsctables.CoincTable.get_table(xmldoc):
if coinc.coinc_def_id == coinc_def_id and coinc.time_slide_id in zl_time_slide_ids:
rankingstatpdf.zl_lr_lnpdf.count[coinc.likelihood,] += 1
#
# apply density estimation kernel to zero-lag counts
#
rankingstatpdf.density_estimate(rankingstatpdf.zl_lr_lnpdf, "zero-lag")
rankingstatpdf.zl_lr_lnpdf.normalize()
#
# Write the parameter and ranking statistic distribution data to a file
#
xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
process = ligolw_process.register_to_xmldoc(xmldoc, "gstlal_cherenkov_calc_rank_pdfs", paramdict = paramdict)
xmldoc.childNodes[-1].appendChild(rankingstatpdf.to_xml("gstlal_cherenkov_rankingstat_pdf"))
process.set_end_time_now()
ligolw_utils.write_filename(xmldoc, options.ranking_stat_pdf, verbose = options.verbose)
#!/usr/bin/env python3
import sys
import math
import numpy
import threading
import gi
gi.require_version('Gst','1.0')
from gi.repository import GObject
from gstlal import pipeio
from gstlal import pipeparts
from gstlal import simplehandler
from gstlal import snglbursttable
from gstlal import streamburca
from gstlal.psd import interpolate_psd, read_psd
from optparse import OptionParser
from ligo import segments
from ligo.lw.utils import segments as ligolw_segments
from ligo.lw.utils import ligolw_add
from ligo.lw.utils import process as ligolw_process
from ligo.lw import ligolw
from ligo.lw import lsctables
from ligo.lw import utils as ligolw_utils
import lal
from lal import series
from lal import LIGOTimeGPS
from lalburst import stringutils
import lalsimulation
)
parser.add_option("--sample-rate", metavar = "rate", type = "float", help = "Desired sample rate (Hz).")
parser.add_option("--frame-cache", metavar = "filename", help = "The frame cache file to load as input data.")
parser.add_option("--output", metavar = "filename", help = "Name of output xml file.")
parser.add_option("--injection-file", metavar = "filename", help = "Name of xml injection file.")
parser.add_option("--channel", metavar = "channel", type = "string",help = "Name of channel.")
parser.add_option("--template-bank", metavar = "filename", help = "Name of template file.")
parser.add_option("--frame-cache", metavar = "filename", help = "Frame cache file to load as input data.")
parser.add_option("--reference-psd", metavar = "filename", help = "Reference psd files as input to obtain the template and SNR. Can be given for multiple detectors, but must be in one file. If not given, the PSD will be measured on the fly, which can prevent loss of sensitivity for not using the most recent PSD. However, there will be some burn-in time where the data will not be analyzed until the PSD converges.")
parser.add_option("--output", metavar = "filename", help = "Name of output xml file to record candidate events.")
parser.add_option("--rankingstat-output", metavar = "filename", help = "Name of output xml file to record rankingstat objects.")
parser.add_option("--segments-file", metavar = "filename", help = "Set the name of the LIGO Light-Weight XML file with segment lists that are science mode, for the trigger generator to enable gating. See also --segments-name.")
parser.add_option("--segments-name", metavar = "name", help = "Set the name of the segment lists to retrieve from the segments file. See also --segments-file.")
parser.add_option("--vetoes-file", metavar = "filename", help = "Set the name of the LIGO Light-Weight XML file with segment lists that are vetoed, for the trigger generator to enable gating. See also --vetoes-name.")
parser.add_option("--vetoes-name", metavar = "name", help = "Set the name of the veto segment lists to retrieve from the veto segments file. See also --vetoes-file.")
parser.add_option("--injection-file", metavar = "filename", help = "Name of xml file with injections.")
parser.add_option("--time-slide-file", metavar = "filename", help = "Name of xml file with time slides for each detector.")
parser.add_option("--channel", metavar = "channel", action = "append", type = "string", help = "Name of channel. Can be given multiple inputs, but must be one for each detector.")
parser.add_option("--template-bank", metavar = "filename", action = "append", help = "Name of template file. Template bank for all the detectors involved should be given.")
parser.add_option("--gps-start-time", metavar = "start_time", type = "int", help = "GPS start time.")
parser.add_option("--gps-end-time", metavar = "end_time", type = "int", help = "GPS end time.")
parser.add_option("--threshold", metavar = "snr_threshold", type = "float", help = "SNR threshold.")
parser.add_option("--cluster-events", metavar = "cluster_events", type = "float", help = "Cluster events with input timescale.")
parser.add_option("--cluster-events", metavar = "cluster_events", type = "float", help = "Cluster events with input timescale (in seconds).")
parser.add_option("--user-tag", metavar = "user_tag", type = "string", help = "User tag set in the search summary and process tables")
parser.add_option("--delta-t", metavar = "delta_t", type = "float", default = 0.008, help = "Maximum time difference in seconds for coincidence, excluding the light-travel time between the detectors. Default: 0.008")
parser.add_option("--verbose", action = "store_true", help = "Be verbose.")
options, filenames = parser.parse_args()
required_options = ["sample_rate", "frame_cache", "output", "channel", "template_bank", "gps_start_time", "gps_end_time", "threshold", "cluster_events"]
required_options = ["sample_rate", "frame_cache", "output", "rankingstat_output", "time_slide_file", "channel", "template_bank", "gps_start_time", "gps_end_time", "threshold", "cluster_events"]
missing_options = [option for option in required_options if getattr(options, option) is None]
if missing_options:
raise ValueError("missing required options %s" % ", ".join(sorted("--%s" % option.replace("_", "-") for option in missing_options)))
if len(options.template_bank) != len(options.channel):
raise ValueError("number of --template-bank options must equal number of --channel options")
if options.segments_file is not None and options.segments_name is None:
raise ValueError("segments name should be specified for the input segments file")
if options.vetoes_file is not None and options.vetoes_name is None:
raise ValueError("vetoes name should be specified for the input vetoes file")
return options, filenames
#
# ================================================================================
#
# Main
#
# ================================================================================
#
#
# parse command line
options, filenames = parse_command_line()
#
# handler for obtaining the PSD, updating the templates from it, and pushing triggers to the coincidence engine
#
class PipelineHandler(simplehandler.Handler):
def __init__(self, mainloop, pipeline, rankingstat, xmldoc, template_banks, sngl_burst, reference_psds, whitens, firbanks, triggergens, is_noninjections):
simplehandler.Handler.__init__(self, mainloop, pipeline)
self.lock = threading.Lock()
self.rankingstat = rankingstat
self.template_bank = template_banks
self.sngl_burst = sngl_burst
self.whitens = whitens
self.firbank = firbanks
self.triggergen = triggergens
self.is_noninjections = is_noninjections
# template normalization. use central_freq to uniquely identify templates
self.sigma = {ifo: dict((row.central_freq, 0.0) for row in template_banks[ifo]) for ifo in template_banks.keys()}
# horizon distance
self.horizon_distance = None
# for PSD
self.update_psd = dict.fromkeys(triggergens, 0)
self.reference_psd = reference_psds
# create a StreamBurca instance, initialized with the XML document and the coincidence parameters
self.streamburca = streamburca.StreamBurca(xmldoc, process.process_id, options.delta_t, BBCoincDef = streamburca.burca.StringCuspBBCoincDef, min_instruments = 2, verbose = options.verbose)
def appsink_new_buffer(self, elem):
with self.lock:
buf = elem.emit("pull-sample").get_buffer()
events = []
for i in range(buf.n_memory()):
memory = buf.peek_memory(i)
result, mapinfo = memory.map(Gst.MapFlags.READ)
assert result
if mapinfo.data:
# FIXME: gst-python 1.18 returns a memoryview
# instead of a read-only bytes-like object, so
# cast to bytes. this is likely inefficient but
# a proper solution will require .from_buffer()
# to leverage the buffer protocol instead
events.extend(snglbursttable.GSTLALSnglBurst.from_buffer(bytes(mapinfo.data)))
memory.unmap(mapinfo)
# get ifo from the appsink name property
instrument = elem.get_property("name")
			# calculate event amplitude using sigma
for event in events:
event.process_id = process.process_id
event.event_id = self.sngl_burst.get_next_id()
event.amplitude = event.snr / self.sigma[instrument][event.central_freq]
# extract segment. move the segment's upper
# boundary to include all triggers.
buf_timestamp = LIGOTimeGPS(0, buf.pts)
if buf.mini_object.flags & Gst.BufferFlags.GAP:
# sanity check that gap buffers are empty
assert not events
# the horizon distance is zero at this timestamp
#self.record_horizon_distance(instrument, float(buf_timestamp), 0.0)
else:
# the horizon distance is non-zero at this timestamp
#self.record_horizon_distance(instrument, float(timestamp), self.horizon_distance)
# update trigger rate and livetime
buf_seg = segments.segment(buf_timestamp, max(buf_timestamp + LIGOTimeGPS(0, buf.duration), max(event.peak for event in events if event.ifo == instrument) if events else 0.0))
self.rankingstat.denominator.triggerrates[instrument].add_ratebin(buf_seg, len(events))
# put info of each event in the sngl burst table
if options.verbose:
print("at", buf_timestamp, "got", len(events), "triggers in", instrument, file=sys.stderr)
# push the single detector triggers into the StreamBurca instance
# the push method returns True if the coincidence engine has new results. in that case, call the pull() method to run the coincidence engine.
if events:
if self.streamburca.push(instrument, events, buf_timestamp):
self.streamburca.pull(self.rankingstat, self.rankingstat.denominator.triggerrates.segmentlistdict(), noninjections = self.is_noninjections)
def flush(self):
with self.lock:
# leftover triggers
self.streamburca.pull(self.rankingstat, self.rankingstat.denominator.triggerrates.segmentlistdict(), noninjections = self.is_noninjections, flush = True)
def update_templates(self, instrument, psd):
template_t = [None] * len(self.template_bank[instrument])
autocorr = [None] * len(self.template_bank[instrument])
# make templates, whiten, put into firbank
# NOTE Currently works only for cusps. this for-loop needs to be revisited when searching for other sources (kinks, ...)
for i, row in enumerate(self.template_bank[instrument]):
# Obtain cusp waveform. A cusp signal is linearly polarized, so just use plus mode time series
template_t[i], _ = lalsimulation.GenerateStringCusp(1.0,row.central_freq,1.0/options.sample_rate)
# zero-pad it to 32 seconds to obtain same deltaF as the PSD
# we have to make the number of samples in the template odd, but if we do that here deltaF of freq domain template will be different from psd's deltaF, and whitening cannot be done. So we keep it exactly 32 seconds, and after getting a whitened template we add a sample of 0 in the tail.
template_t[i] = lal.ResizeREAL8TimeSeries(template_t[i], -int(32*options.sample_rate - template_t[i].data.length) // 2, int(32*options.sample_rate))
# setup of frequency domain
length = template_t[i].data.length
duration = float(length) / options.sample_rate
epoch = - float(length // 2) / options.sample_rate
template_f = lal.CreateCOMPLEX16FrequencySeries("template_freq", LIGOTimeGPS(epoch), psd.f0, 1.0/duration, lal.Unit("strain s"), length // 2 + 1)
fplan = lal.CreateForwardREAL8FFTPlan(length,0)
# FFT to frequency domain
lal.REAL8TimeFreqFFT(template_f,template_t[i],fplan)
# set DC and Nyquist to zero
template_f.data.data[0] = 0.0
template_f.data.data[template_f.data.length-1] = 0.0
# whiten
if template_f.deltaF != psd.deltaF:
if options.verbose:
print("interpolating psd...", file=sys.stderr)
psd_interp = interpolate_psd(psd, template_f.deltaF)
template_f = lal.WhitenCOMPLEX16FrequencySeries(template_f,psd_interp)
else:
template_f = lal.WhitenCOMPLEX16FrequencySeries(template_f,psd)
# Obtain the normalization for getting the amplitude of signal from SNR
# Integrate over frequency range covered by template. Note that template_f is already whitened.
sigmasq = 0.0
sigmasq = numpy.trapz(4.0 * template_f.data.data**2, dx = psd.deltaF)
self.sigma[instrument][row.central_freq] = numpy.sqrt(sigmasq.real)
# obtain autocorr time series by squaring template and inverse FFT it
template_f_squared = lal.CreateCOMPLEX16FrequencySeries("whitened template_freq squared", LIGOTimeGPS(epoch), psd.f0, 1.0/duration, lal.Unit("strain s"), length // 2 + 1)
autocorr_t = lal.CreateREAL8TimeSeries("autocorr_time", LIGOTimeGPS(epoch), psd.f0, 1.0 / options.sample_rate, lal.Unit("strain"), length)
rplan = lal.CreateReverseREAL8FFTPlan(length, 0)
template_f_squared.data.data = abs(template_f.data.data)**2
lal.REAL8FreqTimeFFT(autocorr_t,template_f_squared,rplan)
# normalize autocorrelation by central (maximum) value
autocorr_t.data.data /= numpy.max(autocorr_t.data.data)
autocorr_t = autocorr_t.data.data
max_index = numpy.argmax(autocorr_t)
# find the index of the third extremum for the template with lowest high-f cutoff.
# we don't want to do this for all templates, because we know that
# the template with the lowest high-f cutoff will have the largest chi2_index.
if i == 0:
extr_ctr = 0
chi2_index = 0
for j in range(max_index+1, len(autocorr_t)):
slope1 = autocorr_t[j+1] - autocorr_t[j]
slope0 = autocorr_t[j] - autocorr_t[j-1]
chi2_index += 1
if(slope1 * slope0 < 0):
extr_ctr += 1
if(extr_ctr == 2):
break
assert extr_ctr == 2, 'could not find 3rd extremum'
# extract the part within the third extremum, setting the peak to be the center.
autocorr[i] = numpy.concatenate((autocorr_t[1:(chi2_index+1)][::-1], autocorr_t[:(chi2_index+1)]))
assert len(autocorr[i])%2==1, 'autocorr must have odd number of samples'
# Inverse FFT template bank back to time domain
template_t[i] = lal.CreateREAL8TimeSeries("whitened template_time", LIGOTimeGPS(epoch), psd.f0, 1.0 / options.sample_rate, lal.Unit("strain"), length)
lal.REAL8FreqTimeFFT(template_t[i],template_f,rplan)
# normalize
template_t[i] = template_t[i].data.data
template_t[i] /= numpy.sqrt(numpy.dot(template_t[i], template_t[i]))
# to make the sample number odd we add 1 sample in the end here
template_t[i] = numpy.append(template_t[i], 0.0)
assert len(template_t[i])%2==1, 'template must have odd number of samples'
self.firbank[instrument].set_property("latency", (len(template_t[0]) - 1) // 2)
self.firbank[instrument].set_property("fir_matrix", template_t)
self.triggergen[instrument].set_property("autocorrelation_matrix", autocorr)
def do_on_message(self, bus, message):
if message.type == Gst.MessageType.ELEMENT and message.get_structure().get_name() == "spectrum":
instrument = message.src.get_name().split("_")[-1]
if self.reference_psd is None:
psd = pipeio.parse_spectrum_message(message)
timestamp = psd.epoch
else:
psd = self.reference_psd[instrument]
timestamp = psd.epoch
deltaf = self.whitens[instrument].get_property("delta-f")
psd_interp = interpolate_psd(psd, deltaf)
self.whitens[instrument].set_property("psd-mode", 1)
self.whitens[instrument].set_property("mean-psd", psd_interp.data.data)
stability = float(message.src.get_property("n-samples")) / message.src.get_property("average-samples")
# the logic should be neater here, but this is hoped to be temporary
# until we wipe out everything when finishing transition to offline way
if stability > 0.3 or self.reference_psd is not None:
if self.update_psd[instrument] != 0:
# don't update the PSD, just decrease the counter
self.update_psd[instrument] -= 1
else:
# PSD counter reached zero
if options.verbose:
print("setting whitened templates for", instrument, file=sys.stderr)
# if you don't give the reference psd, how often psd is updated is decided by the integer given here. Larger number, less often.
# if you give the reference psd, you need to make the template banks only once, so make the counter negative
if self.reference_psd is None:
self.update_psd[instrument] = 10
else:
self.update_psd[instrument] = -1
self.update_templates(instrument, psd)
# record horizon distance for this timestamp.
# If the signal is h(f) = A f^(-4/3) (f_l < f < f_h),
# SNR is proportional to A and sigma (template norm. factor).
# (See Siemens+ 06, PRD, 73, 105001 for details.)
# Horizon distance is defined as mean distance where a signal
# of fixed amplitude (e.g. 1.4-1.4 BNS) can be seen with SNR 8.
# Thus the horizon distance is inversely proportional to the
# template normalization parameter sigma.
# The normalization of the horizon distance can be arbitrary
# since only the relative difference of sensitivity (PSD) at
# different times/detectors is important.
# Since sigma is order 10^(21), we use sigma for the template
# with the highest high-f cutoff, and define the horizon distance
# to be sigma/10^(19) Mpc, a familiar number. This definition
# should be consistent in other places where this quantity is used.
self.horizon_distance = self.sigma[instrument][max(self.sigma[instrument].keys())]/1.e19
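				# e.g. (illustrative numbers only) sigma = 2.0e19 for that template
				# corresponds to a horizon distance of 2.0 in these arbitrary units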
assert not (math.isnan(self.horizon_distance) or math.isinf(self.horizon_distance))
if options.verbose:
print("horizon distance for", instrument, ":", self.horizon_distance, file=sys.stderr)
else:
# Burn-in period. Use templates with all zeros so that we won't get any triggers.
if options.verbose:
print( "At GPS time", timestamp, "burn in period", file=sys.stderr)
template = [None] * len(self.template_bank[instrument])
autocorr = [None] * len(self.template_bank[instrument])
for i, row in enumerate(self.template_bank[instrument]):
template[i] = numpy.zeros(int(32*options.sample_rate+1))
# The length of autocorr is set to be similar to that for non-zero templates, but probably the length doesn't matter
autocorr[i] = numpy.zeros(403)
firbank.set_property("latency",-(len(template[0]) - 1) // 2)
firbank.set_property("fir_matrix", template)
triggergen.set_property("autocorrelation_matrix", autocorr)
self.firbank = firbank
self.triggergen = triggergen
self.firbank[instrument].set_property("latency", (len(template[0]) - 1) // 2)
self.firbank[instrument].set_property("fir_matrix", template)
self.triggergen[instrument].set_property("autocorrelation_matrix", autocorr)
# Since the whitener's average is not satisfactorily converged,
# claim horizon distance to be zero.
self.horizon_distance = 0.
return True
return False
#
# =============================================================================
#
# Input and output files
#
# =============================================================================
#
#
# from the given channels make a dict like {"H1":"H1:channelname", "L1":"L1:channelname", ...}
# so that we can easily obtain channel names valid for demuxer etc., and there is easy mapping with the psd for each IFO
#
channel_dict = dict((channel.split(':')[0], channel) for channel in options.channel)
all_ifos = channel_dict.keys()
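# e.g. (hypothetical channel names) --channel H1:FAKE-STRAIN --channel L1:FAKE-STRAIN
# gives channel_dict = {"H1": "H1:FAKE-STRAIN", "L1": "L1:FAKE-STRAIN"}, so all_ifos
# contains "H1" and "L1"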
#
# load reference psds (if there are files given), and sort by instruments
# this gives a dictionary similar to one above like {"H1":"freq series", "L1":"freq series", ...}
#
if options.reference_psd is not None:
psd = read_psd(options.reference_psd, verbose=options.verbose)
# if one is giving reference psds, make sure it covers all detectors
# that participate in this job
assert set(all_ifos).issubset(set(psd.keys())), 'missing detector in PSD'
else:
psd = None
# delete the reference_psd to save memory
del options.reference_psd
@lsctables.use_in
class LIGOLWContentHandler(ligolw.LIGOLWContentHandler):
pass
#
# load the segment file with specific segment name (if there is one) for gating
#
if options.segments_file is not None:
seglists = ligolw_segments.segmenttable_get_by_name(ligolw_utils.load_filename(options.segments_file, contenthandler = ligolw_segments.LIGOLWContentHandler, verbose = options.verbose), options.segments_name).coalesce()
# seglist contains all the ifos that participate in the whole search.
assert set(all_ifos).issubset(set(seglists.keys())), 'unknown ifo given'
for ifo in all_ifos:
seglists[ifo] &= segments.segmentlist([segments.segment(LIGOTimeGPS(options.gps_start_time), LIGOTimeGPS(options.gps_end_time))])
#
# load the vetoes file too (if there is one)
#
if options.vetoes_file is not None:
vetolists = ligolw_segments.segmenttable_get_by_name(ligolw_utils.load_filename(options.vetoes_file, contenthandler = ligolw_segments.LIGOLWContentHandler, verbose = options.verbose), options.vetoes_name).coalesce()
assert set(all_ifos).issubset(set(vetolists.keys())), 'unknown ifo given'
for ifo in all_ifos:
vetolists[ifo] &= segments.segmentlist([segments.segment(LIGOTimeGPS(options.gps_start_time), LIGOTimeGPS(options.gps_end_time))])
#
# load template bank file and find the template bank table
# Mapping is done from instrument to sngl_burst table & xml file
#
template_file = dict.fromkeys(all_ifos, None)
template_bank_table = dict.fromkeys(all_ifos, None)
template_ids = []
for filename in options.template_bank:
xmldoc = ligolw_utils.load_filename(filename, contenthandler = LIGOLWContentHandler, verbose = options.verbose)
table = lsctables.SnglBurstTable.get_table(xmldoc)
template_bank_table[table[0].ifo] = table
template_file[table[0].ifo] = filename
# Obtain template_ids. We use the cutoff frequency as an ID.
# FIXME we assume that the same template bank is used for all detectors,
# and we check that here. This was true in past searches, but might not be
# if we e.g. rigorously threshold on overlaps between "whitened" templates.
these_template_ids = [row.central_freq for row in table]
# this always passes for the first file, and only passes for the
# subsequent files if the template banks exactly match.
assert len(template_ids)==0 or template_ids == these_template_ids, "mismatch in template bank between instruments"
template_ids = these_template_ids
num_templates = len(template_ids)
#
xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
process = ligolw_process.register_to_xmldoc(xmldoc, "StringSearch", options.__dict__)
#
# also for putting rankingstat objects
#
xmldoc_rankingstat = ligolw.Document()
xmldoc_rankingstat.appendChild(ligolw.LIGO_LW())
process_rankingstat = ligolw_process.register_to_xmldoc(xmldoc_rankingstat, "lalapps_string_meas_likelihood", options.__dict__)
#
# append the injection file and time slide file (ligolw_add job in previous pipeline)
# the injection file already has a time slide table in it.
#
if options.injection_file is not None:
# FIXME as a sanity check we can require in the dag generator script to NOT
# have time-slide file as argument when injection-file is given.
del options.time_slide_file
xmldoc = ligolw_add.ligolw_add(xmldoc, [options.injection_file], contenthandler = LIGOLWContentHandler, verbose = options.verbose)
is_noninjections = False
else:
xmldoc = ligolw_add.ligolw_add(xmldoc, [options.time_slide_file], contenthandler = LIGOLWContentHandler, verbose = options.verbose)
is_noninjections = True
time_slide_table = lsctables.TimeSlideTable.get_table(xmldoc)
#
# table for single-detector triggers
#
sngl_burst_table = lsctables.New(lsctables.SnglBurstTable, ["process:process_id", "event_id","ifo","search","channel","start_time","start_time_ns","peak_time","peak_time_ns","duration","central_freq","bandwidth","amplitude","snr","confidence","chisq","chisq_dof"])
xmldoc.childNodes[-1].appendChild(sngl_burst_table)
#
# =============================================================================
#
# Main
#
# =============================================================================
#
mainloop = GObject.MainLoop()
pipeline = Gst.Pipeline(name="pipeline")
whiten = dict.fromkeys(all_ifos, None)
firbank = dict.fromkeys(all_ifos, None)
triggergen = dict.fromkeys(all_ifos, None)
for ifo in all_ifos:
if ifo == "V1":
head = pipeparts.mklalcachesrc(pipeline, options.frame_cache, cache_src_regex = "V")
else:
head = pipeparts.mklalcachesrc(pipeline, options.frame_cache, cache_src_regex = ifo[0], cache_dsc_regex = ifo)
head = pipeparts.mkframecppchanneldemux(pipeline, head, channel_list = [channel_dict[ifo]])
pipeparts.framecpp_channeldemux_set_units(head, {channel_dict[ifo]:"strain"})
elem = pipeparts.mkaudioconvert(pipeline, None)
pipeparts.src_deferred_link(head, channel_dict[ifo], elem.get_static_pad("sink"))
head = elem
# put gate for the segments and vetoes
if options.segments_file is not None:
head = datasource.mksegmentsrcgate(pipeline, head, seglists[ifo], invert_output = False)
if options.vetoes_file is not None:
head = datasource.mksegmentsrcgate(pipeline, head, vetolists[ifo], invert_output = True)
# limit the maximum buffer duration. keeps RAM use under control
# in the event that we are loading gigantic frame files
head = pipeparts.mkreblock(pipeline, head, block_duration = 8 * 1000000000)
#
# injections
#
if options.injection_file is not None:
head = pipeparts.mkinjections(pipeline, head, options.injection_file)
#
# whitener, resampler and caps filter
#
# The buffer for O3 Virgo data is for some reason forced to be 4 seconds long, so zero-padding
# 2*2 seconds would zero out all of the data in the buffer and no triggers would be found.
# As a hack, a smaller amount of zero-padding is used for Virgo.
# FIXME This is an O3-only problem, and didn't happen when using O2 V1 data. But why?
if ifo == "V1":
head = whiten[ifo] = pipeparts.mkwhiten(pipeline, head, fft_length = 4, zero_pad = 1, name = "lal_whiten_%s" % ifo)
else:
head = whiten[ifo] = pipeparts.mkwhiten(pipeline, head, fft_length = 8, zero_pad = 2, name = "lal_whiten_%s" % ifo)
head = pipeparts.mkaudioconvert(pipeline, head)
head = pipeparts.mkresample(pipeline, head)
# FIXME: do not hardcode the original sample rate!
head = pipeparts.mkaudioamplify(pipeline, head, math.sqrt(16384./options.sample_rate))
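# Illustrative note (added explanation, not from the original code): the whitened
# stream has roughly unit-variance samples at the native 16384 Hz rate; low-pass
# filtering and downsampling to options.sample_rate keeps only a fraction
# sample_rate/16384 of that (white) power, so scaling the amplitude by
# sqrt(16384./options.sample_rate) presumably restores unit-variance samples.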
head = pipeparts.mkcapsfilter(pipeline, head, "audio/x-raw, format=F32LE, rate=%d" % options.sample_rate)
head = pipeparts.mkqueue(pipeline, head, max_size_buffers = 8)
search_summary_table = lsctables.New(lsctables.SearchSummaryTable, ["process:process_id", "comment", "ifos", "in_start_time", "in_start_time_ns", "in_end_time", "in_end_time_ns", "out_start_time", "out_start_time_ns", "out_end_time", "out_end_time_ns", "nevents", "nnodes"])
xmldoc.childNodes[-1].appendChild(search_summary_table)
search_summary = lsctables.SearchSummary()
search_summary.process_id = process.process_id
if options.user_tag:
search_summary.comment = options.user_tag
search_summary.ifos = template_bank_table[0].ifo
search_summary.out_start = search_summary.in_start = LIGOTimeGPS(options.gps_start_time)
search_summary.out_end = search_summary.in_end = LIGOTimeGPS(options.gps_end_time)
search_summary.nnodes = 1
search_summary.nevents = 0
#
# filter bank
#
head = firbank[ifo] = pipeparts.mkfirbank(pipeline, head, fir_matrix = numpy.zeros((len(template_bank_table[ifo]),int(32*options.sample_rate)+1),dtype=numpy.float64), block_stride = 4 * options.sample_rate, latency = int(16*options.sample_rate))
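# Note (assumption based on the handler's constructor arguments below): the FIR
# matrix is initialized to zeros here as a placeholder; the PipelineHandler,
# which is handed the psd, whiten and firbank elements, presumably recomputes
# the whitened templates and updates each firbank once a PSD measurement is
# available.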
#
# trigger generator
#
triggergen[ifo] = pipeparts.mkgeneric(pipeline, head, "lal_string_triggergen", threshold = options.threshold, cluster = options.cluster_events, bank_filename = template_file[ifo], autocorrelation_matrix = numpy.zeros((len(template_bank_table[ifo]), 403),dtype=numpy.float64))
#
# trigger generator
# Load/Initialize ranking statistic data.
# instruments must be ones that appear in the whole search,
# so that we can combine the rankingstat objects later on.
#
head = triggergen = pipeparts.mkgeneric(pipeline, head, "lal_string_triggergen", threshold = options.threshold, cluster = options.cluster_events, bank_filename = options.template_bank, autocorrelation_matrix = numpy.zeros((len(template_bank_table), 403),dtype=numpy.float64))
rankingstat = stringutils.RankingStat(instruments = seglists.keys(), delta_t = options.delta_t, snr_threshold = options.threshold, num_templates = num_templates, min_instruments = 2)
mainloop = GObject.MainLoop()
handler = PipelineHandler(mainloop, pipeline, template_bank_table, firbank)
#
# handler
#
handler = PipelineHandler(mainloop, pipeline, rankingstat, xmldoc, template_bank_table, sngl_burst_table, psd, whiten, firbank, triggergen, is_noninjections)
#
@@ -324,30 +551,36 @@
handler = PipelineHandler(mainloop, pipeline, template_bank_table, firbank)
#
appsync = pipeparts.AppSync(appsink_new_buffer = handler.appsink_new_buffer)
appsync.add_sink(pipeline, head, caps = Gst.Caps.from_string("application/x-lal-snglburst"))
if pipeline.set_state(Gst.State.READY) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline did not enter ready state")
appsinks = set(appsync.add_sink(pipeline, triggergen[ifo], caps = Gst.Caps.from_string("application/x-lal-snglburst"), name = ifo) for ifo in all_ifos)
#
# seek
#
if pipeline.set_state(Gst.State.READY) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline did not enter ready state")
options.gps_start_time = LIGOTimeGPS(options.gps_start_time)
options.gps_end_time = LIGOTimeGPS(options.gps_end_time)
datasource.pipeline_seek_for_gps(pipeline, options.gps_start_time, options.gps_end_time);
#
# run
#
if pipeline.set_state(Gst.State.PLAYING) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline did not enter playing state")
if options.verbose:
print("running pipeline ...", file=sys.stderr)
mainloop.run()
handler.flush()
mainloop.run()
#
# write output to disk
#
# write triggers & coinc events to XML file
ligolw_utils.write_filename(xmldoc, options.output, verbose = options.verbose)
search_summary_table.append(search_summary)
ligolw_utils.write_filename(xmldoc, options.output, gz = (options.output or "stdout").endswith(".gz"), verbose = options.verbose)
# also write rankingstat object to an XML file
xmldoc_rankingstat.childNodes[-1].appendChild(rankingstat.to_xml())
ligolw_utils.write_filename(xmldoc_rankingstat, options.rankingstat_output, verbose = options.verbose)
@@ -84,7 +84,7 @@
ep.append_options(parser)
datasource.append_options(parser)
(options, args) = parser.parse_args()
gw_data_source_opts = datasource.GWDataSourceInfo(options)
gw_data_source_opts = datasource.DataSourceInfo.from_optparse(options)
# Verbosity and diagnostics
verbose = options.verbose
@@ -119,7 +119,7 @@
if verbose:
print "Assembling pipeline... this is gstlal_excesspower, version code name %s\n" % __version__,
# Construct the data acquisition and conditioning part of the pipeline
head, _, _ = datasource.mkbasicsrc(pipeline, gw_data_source_opts, handler.inst, verbose)
head, _, _, _ = datasource.mkbasicsrc(pipeline, gw_data_source_opts, handler.inst, verbose)
# If we're running online, we need to set up a few things
if gw_data_source_opts.data_source in ("lvshm", "framexmit"):
@@ -43,7 +43,6 @@
import itertools
from ligo.lw import utils as ligolw_utils
from ligo.lw import lsctables
from ligo.lw import table
from ligo.lw import ligolw
lsctables.use_in(ligolw.LIGOLWContentHandler)
from ligo import segments
@@ -84,14 +83,14 @@
def get_sim_inspirals_from_file(filename):
Get a sim inspiral table from the file name.
"""
xmldoc = ligolw_utils.load_filename(filename)
return table.get_table( xmldoc, lsctables.SimInspiralTable.tableName )
return lsctables.SimInspiralTable.get_table( xmldoc )
def get_sim_burst_from_file( filename ):
"""
Get a sim burst table from the file name.
"""
xmldoc = ligolw_utils.load_filename(filename)
return table.get_table( xmldoc, lsctables.SimBurstTable.tableName )
return lsctables.SimBurstTable.get_table( xmldoc )
def plot_wnb( sim_burst ):
"""
#!/usr/bin/env python
# Copyright (C) 2018 Patrick Godwin
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
__usage__ = "gstlal_feature_aggregator [--options]"
__description__ = "an executable to aggregate and generate job metrics for streaming features"
__author__ = "Patrick Godwin (patrick.godwin@ligo.org)"
#-------------------------------------------------
# Preamble
#-------------------------------------------------
from collections import defaultdict, deque
import json
from multiprocessing.dummy import Pool as ThreadPool
import optparse
import os
import signal
import sys
import time
import numpy
from confluent_kafka import Consumer, KafkaError
from ligo.scald import aggregator
from ligo.scald import io
from gstlal.fxtools import utils
#-------------------------------------------------
# Functions
#-------------------------------------------------
def parse_command_line():
parser = optparse.OptionParser(usage=__usage__, description=__description__)
group = optparse.OptionGroup(parser, "Aggregator Options", "General settings for configuring the online aggregator.")
group.add_option("-v","--verbose", default=False, action="store_true", help = "Print to stdout in addition to writing to automatically generated log.")
group.add_option("--log-level", type = "int", default = 10, help = "Sets the verbosity of logging. Default = 10.")
group.add_option("--rootdir", metavar = "path", default = ".", help = "Location where log messages and sqlite database lives")
group.add_option("--tag", metavar = "string", default = "test", help = "Sets the name of the tag used. Default = 'test'")
group.add_option("--sample-rate", type = "int", metavar = "Hz", default = 1, help = "Set the sample rate for feature timeseries output, must be a power of 2. Default = 1 Hz.")
group.add_option("--processing-cadence", type = "float", default = 0.1, help = "Rate at which the aggregator acquires and processes data. Default = 0.1 seconds.")
group.add_option("--request-timeout", type = "float", default = 0.2, help = "Timeout for requesting messages from a topic. Default = 0.2 seconds.")
group.add_option("--kafka-server", metavar = "string", help = "Sets the server url that the kafka topic is hosted on. Required.")
group.add_option("--input-topic-basename", metavar = "string", help = "Sets the input kafka topic basename. Required.")
group.add_option("--jobs", action="append", help="Specify jobs to process. Can be given multiple times.")
group.add_option("--data-backend", default="hdf5", help = "Choose the backend for data to be stored into, options: [hdf5|influx]. default = hdf5.")
group.add_option("--influx-hostname", help = "Specify the hostname for the influxDB database. Required if --data-backend = influx.")
group.add_option("--influx-port", help = "Specify the port for the influxDB database. Required if --data-backend = influx.")
group.add_option("--influx-database-name", help = "Specify the database name for the influxDB database. Required if --data-backend = influx.")
group.add_option("--data-type", metavar = "string", help="Specify datatypes to aggregate from 'min', 'max', 'median'. Default: max")
group.add_option("--num-processes", type = "int", default = 2, help = "Number of processes to use concurrently, default 2.")
parser.add_option_group(group)
options, args = parser.parse_args()
return options, args
#-------------------------------------------------
# Classes
#-------------------------------------------------
class StreamAggregator(object):
"""
Ingests and aggregates incoming streaming features, collects job metrics.
"""
def __init__(self, logger, options):
logger.info('setting up feature aggregator...')
### initialize timing options
self.request_timeout = options.request_timeout
self.processing_cadence = options.processing_cadence
self.sample_rate = options.sample_rate
self.is_running = False
### kafka settings
self.kafka_settings = {
'bootstrap.servers': options.kafka_server,
'group.id': 'aggregator_%s_%s'%(options.tag, options.jobs[0])
}
### other aggregator options
self.data_type = options.data_type
self.last_save = aggregator.now()
### initialize consumers
self.jobs = options.jobs
self.consumer_names = ['%s_%s' % (options.input_topic_basename, job) for job in self.jobs]
self.job_consumers = [(job, Consumer(self.kafka_settings)) for job, topic in zip(self.jobs, self.consumer_names)]
for topic, job_consumer in zip(self.consumer_names, self.job_consumers):
job_consumer[1].subscribe([topic])
### initialize 30 second queue for incoming buffers
self.feature_queue = {job: deque(maxlen = 30 * self.sample_rate) for job in self.jobs}
### set up aggregator
logger.info("setting up aggregator with backend: %s"%options.data_backend)
if options.data_backend == 'influx':
self.agg_sink = io.influx.Aggregator(
hostname=options.influx_hostname,
port=options.influx_port,
db=options.influx_database_name,
reduce_across_tags=False,
)
else: ### hdf5 data backend
self.agg_sink = io.hdf5.Aggregator(
rootdir=options.rootdir,
num_processes=options.num_processes,
reduce_across_tags=False,
)
### define measurements to be stored from aggregators
self.agg_sink.register_schema('latency', columns='data', column_key='data', tags='job', tag_key='job')
self.agg_sink.register_schema('snr', columns='data', column_key='data', tags=('channel', 'subsystem'), tag_key='channel')
def fetch_data(self, job_consumer):
"""
request a new message from an individual topic
and add it to the feature queue
"""
job, consumer = job_consumer
message = consumer.poll(timeout=self.request_timeout)
### only add to queue if no errors in receiving data
if message and not message.error():
### decode json and parse data
feature_subset = json.loads(message.value())
self.feature_queue[job].appendleft((feature_subset['timestamp'], feature_subset['features']))
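# Illustrative sketch of the assumed message format (inferred from how packets
# are unpacked in packets_to_timeseries below), e.g.:
# {"timestamp": 1187008882.0,
#  "features": {"H1:SUS-ETMY_L2_WIT_L_DQ": [
#      {"trigger_time": 1187008882.43, "snr": 7.2, "frequency": 110.0}]}}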
def fetch_all_data(self):
"""
request a new message from all topics and add
them to the feature queue
"""
pool = ThreadPool(len(self.job_consumers))
result = pool.map_async(self.fetch_data, self.job_consumers)
result.wait()
pool.close()
def process_queue(self):
"""
process and aggregate features from feature extraction jobs on a regular cadence
"""
if utils.in_new_epoch(aggregator.now(), self.last_save, 1):
self.last_save = aggregator.now()
### format incoming packets into metrics and timeseries
feature_packets = [(job, self.feature_queue[job].pop()) for job in self.jobs for i in range(len(self.feature_queue[job]))]
all_timeseries, all_metrics = self.packets_to_timeseries(feature_packets)
### store and aggregate metrics
metric_data = {job: {'time': metrics['time'], 'fields': {'data': metrics['latency']}} for job, metrics in all_metrics.items()}
self.agg_sink.store_columns('latency', metric_data, aggregate=self.data_type)
### store and aggregate features
timeseries_data = {(channel, self._channel_to_subsystem(channel)): {'time': timeseries['trigger_time'], 'fields': {'data': timeseries['snr']}} for channel, timeseries in all_timeseries.items()}
self.agg_sink.store_columns('snr', timeseries_data, aggregate=self.data_type)
try:
max_latency = max(max(metrics['latency']) for metrics in all_metrics.values())
logger.info('processed features at time %d, highest latency is %.3f' % (self.last_save, max_latency))
except ValueError:  # max() raises ValueError when no metrics were collected
logger.info('no features to process at time %d' % self.last_save)
def packets_to_timeseries(self, packets):
"""
splits up a series of packets into ordered timeseries, keyed by channel
"""
metrics = defaultdict(lambda: {'time': [], 'latency': []})
### process each packet sequentially and split rows by channel
channel_rows = defaultdict(list)
for job, packet in packets:
timestamp, features = packet
metrics[job]['time'].append(timestamp)
metrics[job]['latency'].append(utils.gps2latency(timestamp))
for channel, row in features.items():
channel_rows[channel].extend(row)
### break up rows into timeseries
timeseries = {}
for channel, rows in channel_rows.items():
timeseries[channel] = {column: [row[column] for row in rows] for column in rows[0].keys()}
return timeseries, metrics
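# Worked example (illustrative, row fields assumed): given
#   channel_rows = {"H1:CHAN": [{"trigger_time": 1.0, "snr": 5.0},
#                               {"trigger_time": 2.0, "snr": 6.0}]}
# the loop above yields
#   timeseries = {"H1:CHAN": {"trigger_time": [1.0, 2.0], "snr": [5.0, 6.0]}}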
def start(self):
"""
starts ingestion and aggregation of features
"""
logger.info('starting feature aggregator...')
self.is_running = True
while self.is_running:
### ingest incoming features
self.fetch_all_data()
### push combined features downstream
self.process_queue()
### repeat with processing cadence
time.sleep(self.processing_cadence)
def stop(self):
"""
shut down gracefully
"""
logger.info('shutting down feature aggregator...')
self.is_running = False
@staticmethod
def _channel_to_subsystem(channel):
"""
given a channel, returns the subsystem the channel lives in
"""
return channel.split(':')[1].split('-')[0]
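# For example (illustrative channel name, doctest-style):
#   >>> StreamAggregator._channel_to_subsystem("H1:SUS-ETMY_L2_WIT_L_DQ")
#   'SUS'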
class SignalHandler(object):
"""
helper class to shut down the stream aggregator gracefully before exiting
"""
def __init__(self, aggregator_sink, signals = [signal.SIGINT, signal.SIGTERM]):
self.aggregator_sink = aggregator_sink
for sig in signals:
signal.signal(sig, self)
def __call__(self, signum, frame):
self.aggregator_sink.stop()
sys.exit(0)
#-------------------------------------------------
# Main
#-------------------------------------------------
if __name__ == '__main__':
# parse arguments
options, args = parse_command_line()
### set up logging
logger = utils.get_logger(
'-'.join([options.tag, 'feature_aggregator', options.jobs[0]]),
log_level=options.log_level,
rootdir=options.rootdir,
verbose=options.verbose
)
# create summary instance
aggregator_sink = StreamAggregator(logger, options=options)
# install signal handler
SignalHandler(aggregator_sink)
# start up listener
aggregator_sink.start()
#!/usr/bin/env python
# Copyright (C) 2017-2018 Sydney J. Chamberlin, Patrick Godwin, Chad Hanna, Duncan Meacher
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""
A program to extract features from auxiliary channel data in real time or in offline mode
"""
# =============================
#
# preamble
#
# =============================
### .. graphviz::
###
### digraph llpipe {
### labeljust = "r";
### label="gstlal_feature_extractor"
### rankdir=LR;
### graph [fontname="Roman", fontsize=24];
### edge [ fontname="Roman", fontsize=10 ];
### node [fontname="Roman", shape=box, fontsize=11];
###
###
### subgraph clusterNodeN {
###
### style=rounded;
### label="gstreamer pipeline";
### labeljust = "r";
### fontsize = 14;
###
### H1L1src [label="H1(L1) data source:\n mkbasicmultisrc()", color=red4];
###
### Aux1 [label="Auxiliary channel 1", color=red4];
### Aux2 [label="Auxiliary channel 2", color=green4];
### AuxN [label="Auxiliary channel N", color=magenta4];
###
### Multirate1 [label="Auxiliary channel 1 whitening and downsampling:\nmkwhitened_multirate_src()", color=red4];
### Multirate2 [label="Auxiliary channel 2 whitening and downsampling:\nmkwhitened_multirate_src()", color=green4];
### MultirateN [label="Auxiliary channel N whitening and downsampling:\nmkwhitened_multirate_src()", color=magenta4];
###
### FilterBankAux1Rate1 [label="Auxiliary Channel 1:\nGlitch Filter Bank", color=red4];
### FilterBankAux1Rate2 [label="Auxiliary Channel 1:\nGlitch Filter Bank", color=red4];
### FilterBankAux1RateN [label="Auxiliary Channel 1:\nGlitch Filter Bank", color=red4];
### FilterBankAux2Rate1 [label="Auxiliary Channel 2:\nGlitch Filter Bank", color=green4];
### FilterBankAux2Rate2 [label="Auxiliary Channel 2:\nGlitch Filter Bank", color=green4];
### FilterBankAux2RateN [label="Auxiliary Channel 2:\nGlitch Filter Bank", color=green4];
### FilterBankAuxNRate1 [label="Auxiliary Channel N:\nGlitch Filter Bank", color=magenta4];
### FilterBankAuxNRate2 [label="Auxiliary Channel N:\nGlitch Filter Bank", color=magenta4];
### FilterBankAuxNRateN [label="Auxiliary Channel N:\nGlitch Filter Bank", color=magenta4];
###
### TriggerAux1Rate1 [label="Auxiliary Channel 1:\nTrigger Max (1 sec)", color=red4];
### TriggerAux1Rate2 [label="Auxiliary Channel 1:\nTrigger Max (1 sec)", color=red4];
### TriggerAux1RateN [label="Auxiliary Channel 1:\nTrigger Max (1 sec)", color=red4];
### TriggerAux2Rate1 [label="Auxiliary Channel 2:\nTrigger Max (1 sec)", color=green4];
### TriggerAux2Rate2 [label="Auxiliary Channel 2:\nTrigger Max (1 sec)", color=green4];
### TriggerAux2RateN [label="Auxiliary Channel 2:\nTrigger Max (1 sec)", color=green4];
### TriggerAuxNRate1 [label="Auxiliary Channel N:\nTrigger Max (1 sec)", color=magenta4];
### TriggerAuxNRate2 [label="Auxiliary Channel N:\nTrigger Max (1 sec)", color=magenta4];
### TriggerAuxNRateN [label="Auxiliary Channel N:\nTrigger Max (1 sec)", color=magenta4];
###
### H1L1src -> Aux1;
### H1L1src -> Aux2;
### H1L1src -> AuxN;
###
### Aux1 -> Multirate1;
### Aux2 -> Multirate2;
### AuxN -> MultirateN;
###
### Multirate1 -> FilterBankAux1Rate1 [label="Aux 1 4096Hz"];
### Multirate2 -> FilterBankAux2Rate1 [label="Aux 2 4096Hz"];
### MultirateN -> FilterBankAuxNRate1 [label="Aux N 4096Hz"];
### Multirate1 -> FilterBankAux1Rate2 [label="Aux 1 2048Hz"];
### Multirate2 -> FilterBankAux2Rate2 [label="Aux 2 2048Hz"];
### MultirateN -> FilterBankAuxNRate2 [label="Aux N 2048Hz"];
### Multirate1 -> FilterBankAux1RateN [label="Aux 1 Nth-pow-of-2 Hz"];
### Multirate2 -> FilterBankAux2RateN [label="Aux 2 Nth-pow-of-2 Hz"];
### MultirateN -> FilterBankAuxNRateN [label="Aux N Nth-pow-of-2 Hz"];
###
### FilterBankAux1Rate1 -> TriggerAux1Rate1;
### FilterBankAux1Rate2 -> TriggerAux1Rate2;
### FilterBankAux1RateN -> TriggerAux1RateN;
### FilterBankAux2Rate1 -> TriggerAux2Rate1;
### FilterBankAux2Rate2 -> TriggerAux2Rate2;
### FilterBankAux2RateN -> TriggerAux2RateN;
### FilterBankAuxNRate1 -> TriggerAuxNRate1;
### FilterBankAuxNRate2 -> TriggerAuxNRate2;
### FilterBankAuxNRateN -> TriggerAuxNRateN;
### }
###
###
### Synchronize [label="Synchronize buffers by timestamp"];
### Extract [label="Extract features from buffer"];
### Save [label="Save triggers to disk"];
### Kafka [label="Push features to queue"];
###
### TriggerAux1Rate1 -> Synchronize;
### TriggerAux1Rate2 -> Synchronize;
### TriggerAux1RateN -> Synchronize;
### TriggerAux2Rate1 -> Synchronize;
### TriggerAux2Rate2 -> Synchronize;
### TriggerAux2RateN -> Synchronize;
### TriggerAuxNRate1 -> Synchronize;
### TriggerAuxNRate2 -> Synchronize;
### TriggerAuxNRateN -> Synchronize;
###
### Synchronize -> Extract;
###
### Extract -> Save [label="Option 1"];
### Extract -> Kafka [label="Option 2"];
###
### }
###
import math
import optparse
import os
import resource
import socket
import sys
import tempfile
import h5py
import numpy
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst
GObject.threads_init()
Gst.init(None)
from lal import LIGOTimeGPS
from ligo import segments
from ligo.segments import utils as segmentsUtils
from gstlal import aggregator
from gstlal import bottle
from gstlal import datasource
from gstlal import httpinterface
from gstlal import pipeparts
from gstlal import simplehandler
from gstlal.fxtools import auxcache
from gstlal.fxtools import feature_extractor
from gstlal.fxtools import multichannel_datasource
from gstlal.fxtools import multirate_datasource
from gstlal.fxtools import sngltriggertable
from gstlal.fxtools import utils
from gstlal.fxtools import waveforms as fxwaveforms
#
# Make sure we have sufficient resources
# We allocate far more memory than we need, so this is okay
#
def setrlimit(res, lim):
hard_lim = resource.getrlimit(res)[1]
resource.setrlimit(res, (lim if lim is not None else hard_lim, hard_lim))
# raise the limits on the number of processes, total address space, and
# resident set size to their hard limits, and shrink the per-thread stack
# size (default is 10 MiB)
setrlimit(resource.RLIMIT_NPROC, None)
setrlimit(resource.RLIMIT_AS, None)
setrlimit(resource.RLIMIT_RSS, None)
# FIXME: tests at CIT show that this next tweak has no effect. it's
# possible that SL7 has lowered the default stack size from SL6 and we
# don't need to do this anymore. remove?
setrlimit(resource.RLIMIT_STACK, 1024 * 1024) # 1 MiB per thread
# =============================
#
# command line parser
#
# =============================
def parse_command_line():
parser = optparse.OptionParser(usage = '%prog [options]', description = __doc__)
# First append datasource and feature extraction common options
multichannel_datasource.append_options(parser)
feature_extractor.append_options(parser)
# parse the arguments
options, filenames = parser.parse_args()
# Sanity check the options
# set gps ranges for live and offline sources
if options.data_source in ("framexmit", "lvshm", "white_live"):
if options.data_source in ("framexmit", "lvshm"):
options.gps_start_time = int(aggregator.now())
else:
options.gps_start_time = 0 # NOTE: set start time for 'fake' live sources to zero, since seeking doesn't work with 'is_live' option
options.gps_end_time = 2000000000 # NOTE: set the gps end time to be "infinite"
if options.feature_start_time is None:
options.feature_start_time = int(options.gps_start_time)
if options.feature_end_time is None:
options.feature_end_time = int(options.gps_end_time)
# check if input sample rate is sensible
assert options.sample_rate == 1 or options.sample_rate % 2 == 0
# check if persist and save cadence times are sensible
assert options.persist_cadence >= options.cadence
assert (options.persist_cadence % options.cadence) == 0
# check if there are any segments to dump to disk
if options.nxydump_segment:
options.nxydump_segment, = segmentsUtils.from_range_strings([options.nxydump_segment], boundtype = LIGOTimeGPS)
return options, filenames
# =============================
#
# main
#
# =============================
#
# parsing and setting up some core structures
#
options, filenames = parse_command_line()
data_source_info = multichannel_datasource.DataSourceInfo(options)
instrument = data_source_info.instrument
basename = '%s-%s' % (instrument[:1], options.description)
waveforms = {}
#
# set up logging
#
duration = options.feature_end_time - options.feature_start_time
logdir = os.path.join(options.out_path, 'logs', options.job_id)
aggregator.makedir(logdir)
logger = utils.get_logger('gstlal-feature-extractor_%d-%d' % (options.feature_start_time, duration), rootdir=logdir, verbose=options.verbose)
logger.info("writing log to %s" % logdir)
#
# set up local frame caching, if specified
#
if options.local_frame_caching:
# get base temp directory
if '_CONDOR_SCRATCH_DIR' in os.environ:
tmp_dir = os.environ['_CONDOR_SCRATCH_DIR']
else:
tmp_dir = os.environ['TMPDIR']
# create local frame directory
local_path = os.path.join(tmp_dir, 'local_frames/')
aggregator.makedir(local_path)
# save local frame cache
logger.info("caching frame data locally to %s" % local_path)
fd, fname = tempfile.mkstemp(".cache")
os.close(fd)  # close the raw descriptor returned by mkstemp; reopen by name below
f = open(fname, "w")
data_source_info.local_cache_list = auxcache.cache_aux(data_source_info, logger, output_path = local_path, verbose = options.verbose)
for cacheentry in data_source_info.local_cache_list:
# guarantee a lal cache compliant file with only integer starts and durations
cacheentry.segment = segments.segment( int(cacheentry.segment[0]), int(math.ceil(cacheentry.segment[1])) )
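# Illustrative example (hypothetical numbers): an entry spanning
# segment(100.25, 105.5) becomes segment(100, 106); starts are truncated and
# ends rounded up so every entry has an integer start and duration.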
print >>f, str(cacheentry)
f.close()
data_source_info.frame_cache = fname
#
# process channel subsets in serial
#
for subset_id, channel_subset in enumerate(data_source_info.channel_subsets, 1):
#
# checkpointing for offline analysis for hdf5 output
#
if options.data_source not in data_source_info.live_sources and options.save_format == 'hdf5':
try:
# get path where triggers are located
duration = options.feature_end_time - options.feature_start_time
fname = utils.to_trigger_filename(basename, options.feature_start_time, duration, 'h5')
fpath = utils.to_trigger_path(os.path.abspath(options.out_path), basename, options.feature_start_time, options.job_id, str(subset_id).zfill(4))
trg_file = os.path.join(fpath, fname)
# visit groups within a given hdf5 file
with h5py.File(trg_file, 'r') as f:
f.visit(lambda item: f[item])
# file is OK and there is no need to process it,
# skip ahead in the loop
continue
except IOError:
# file does not exist or is corrupted, need to
# reprocess
logger.info("checkpoint: {0} of {1} files completed and continuing with channel subset {2}".format((subset_id - 1), len(data_source_info.channel_subsets), subset_id))
pass
logger.info("processing channel subset %d of %d" % (subset_id, len(data_source_info.channel_subsets)))
#
# if web services serving up bottle routes are enabled,
# create a new, empty, Bottle application and make it the
# current default, then start http server to serve it up
#
if not options.disable_web_service:
bottle.default_app.push()
# uncomment the next line to show tracebacks when something fails
# in the web server
#bottle.app().catchall = False
import base64, uuid # FIXME: don't import when the uniquification scheme is fixed
httpservers = httpinterface.HTTPServers(
# FIXME: either switch to using avahi's native name
# uniquification system or adopt a naturally unique naming
# scheme (e.g., include a search identifier and job
# number).
service_name = "gstlal_idq (%s)" % base64.urlsafe_b64encode(uuid.uuid4().bytes),
service_properties = {},
verbose = options.verbose
)
# Set up a registry of the resources that this job provides
@bottle.route("/")
@bottle.route("/index.html")
def index(channel_list = channel_subset):
# get the host and port to report in the links from the
# request we've received so that the URLs contain the IP
# address by which client has contacted us
netloc = bottle.request.urlparts[1]
server_address = "http://%s" % netloc
yield "<html><body>\n<h3>%s %s</h3>\n<p>\n" % (netloc, " ".join(sorted(channel_list)))
for route in sorted(bottle.default_app().routes, key = lambda route: route.rule):
# don't create links back to this page
if route.rule in ("/", "/index.html"):
continue
# only create links for GET methods
if route.method != "GET":
continue
yield "<a href=\"%s%s\">%s</a><br>\n" % (server_address, route.rule, route.rule)
yield "</p>\n</body></html>"
# FIXME: get service-discovery working, then don't do this
open("registry.txt", "w").write("http://%s:%s/\n" % (socket.gethostname(), httpservers[0][0].port))
#
# building the event loop and pipeline
#
logger.info("assembling pipeline...")
mainloop = GObject.MainLoop()
pipeline = Gst.Pipeline(sys.argv[0])
# generate multiple channel sources, and link up pipeline
head = multichannel_datasource.mkbasicmultisrc(pipeline, data_source_info, channel_subset, verbose = options.verbose)
src = {}
for channel in channel_subset:
# define sampling rates used
samp_rate = int(data_source_info.channel_dict[channel]['fsamp'])
max_rate = min(data_source_info.max_sample_rate, samp_rate)
min_rate = min(data_source_info.min_sample_rate, max_rate)
n_rates = int(numpy.log2(max_rate/min_rate) + 1)
rates = [min_rate*2**i for i in range(n_rates)]
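# Worked example (hypothetical rates): with min_rate = 32 and max_rate = 2048,
# n_rates = log2(2048/32) + 1 = 7 and rates = [32, 64, 128, 256, 512, 1024, 2048].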
# choose range of basis parameters
# NOTE: scale down frequency range by downsample_factor to deal with rolloff from downsampler
downsample_factor = 0.8
qlow = 3.3166
if data_source_info.extension == 'ini':
flow = max(data_source_info.channel_dict[channel]['flow'], min_rate/4.)
fhigh = min(data_source_info.channel_dict[channel]['fhigh'], max_rate/2.)
qhigh = min(data_source_info.channel_dict[channel]['qhigh'], options.qhigh)
else:
flow = min_rate/4.
fhigh = max_rate/2.
qhigh = options.qhigh
# generate templates
if 'sine_gaussian' in options.waveform:
parameter_range = {'frequency': (flow, fhigh), 'q': (qlow, qhigh)}
if options.waveform == 'half_sine_gaussian':
waveforms[channel] = fxwaveforms.HalfSineGaussianGenerator(parameter_range, rates, mismatch=options.mismatch, downsample_factor=downsample_factor)
elif options.waveform == 'sine_gaussian':
waveforms[channel] = fxwaveforms.SineGaussianGenerator(parameter_range, rates, mismatch=options.mismatch, downsample_factor=downsample_factor)
elif options.waveform == 'tapered_sine_gaussian':
waveforms[channel] = fxwaveforms.TaperedSineGaussianGenerator(parameter_range, rates, mismatch=options.mismatch, downsample_factor=downsample_factor, max_latency=options.max_latency)
else:
raise NotImplementedError
if options.latency_output:
head[channel] = pipeparts.mklatency(pipeline, head[channel], name=utils.latency_name('beforewhitening', 2, channel))
# whiten auxiliary channel data
for rate, thishead in multirate_datasource.mkwhitened_multirate_src(pipeline, head[channel], rates, samp_rate, instrument, channel_name = channel, width=32, nxydump_segment=options.nxydump_segment, psd_fft_length=options.psd_fft_length).items():
if options.latency_output:
thishead = pipeparts.mklatency(pipeline, thishead, name=utils.latency_name('afterwhitening', 3, channel, rate))
# determine whether to do time-domain or frequency-domain convolution
time_domain = (waveforms[channel].sample_pts(rate)*rate) < (5*waveforms[channel].sample_pts(rate)*numpy.log2(rate))
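# Added note (reasoning not in the original): the filter length
# waveforms[channel].sample_pts(rate) appears on both sides, so the comparison
# reduces to rate < 5*log2(rate), e.g. True at rate = 16 (16 < 20) but False at
# rate = 32 (32 > 25); time-domain convolution is therefore only selected at the
# very lowest sample rates.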
# create FIR bank of half sine-gaussian templates
fir_matrix = numpy.array([waveform for waveform in waveforms[channel].generate_templates(rate)])
thishead = pipeparts.mkqueue(pipeline, thishead, max_size_buffers = 0, max_size_bytes = 0, max_size_time = Gst.SECOND * 30)
thishead = pipeparts.mkfirbank(pipeline, thishead, fir_matrix = fir_matrix, time_domain = time_domain, block_stride = int(rate), latency = waveforms[channel].latency(rate))
# add queues, change stream format, add tags
if options.latency_output:
thishead = pipeparts.mklatency(pipeline, thishead, name=utils.latency_name('afterFIRbank', 4, channel, rate))
thishead = pipeparts.mkqueue(pipeline, thishead, max_size_buffers = 1, max_size_bytes = 0, max_size_time = 0)
thishead = pipeparts.mktogglecomplex(pipeline, thishead)
thishead = pipeparts.mkcapsfilter(pipeline, thishead, caps = "audio/x-raw, format=Z64LE, rate=%i" % rate)
thishead = pipeparts.mktaginject(pipeline, thishead, "instrument=%s,channel-name=%s" %( instrument, channel))
# dump segments to disk if specified
tee = pipeparts.mktee(pipeline, thishead)
if options.nxydump_segment:
pipeparts.mknxydumpsink(pipeline, pipeparts.mkqueue(pipeline, tee), "snrtimeseries_%s_%s.txt" % (channel, repr(rate)), segment = options.nxydump_segment)
# extract features from time series
if options.feature_mode == 'timeseries':
thishead = pipeparts.mktrigger(pipeline, tee, int(rate // options.sample_rate), max_snr = True)
elif options.feature_mode == 'etg':
thishead = pipeparts.mktrigger(pipeline, tee, rate, snr_thresh = options.snr_threshold)
if options.latency_output:
thishead = pipeparts.mklatency(pipeline, thishead, name=utils.latency_name('aftertrigger', 5, channel, rate))
# link to src for processing by appsync
src[(channel, rate)] = thishead
# define structures to synchronize output streams and extract triggers from buffer
logger.info("setting up pipeline handler...")
handler = feature_extractor.MultiChannelHandler(mainloop, pipeline, logger, data_source_info, options, keys = src.keys(), waveforms = waveforms, basename = basename, subset_id = subset_id)
logger.info("attaching appsinks to pipeline...")
appsync = feature_extractor.LinkedAppSync(appsink_new_buffer = handler.bufhandler)
appsinks = set(appsync.add_sink(pipeline, src[(channel, rate)], name = "sink_%s_%s" % (rate, channel)) for channel, rate in src.keys())
logger.info("attached %d appsinks to pipeline." % len(appsinks))
# Allow Ctrl+C or SIGTERM to gracefully shut down the program for online
# sources; otherwise the signal simply kills the process
if data_source_info.data_source in data_source_info.live_sources:# what about nds online?
simplehandler.OneTimeSignalHandler(pipeline)
# Seek
if pipeline.set_state(Gst.State.READY) == Gst.StateChangeReturn.FAILURE:
raise RuntimeError("pipeline failed to enter READY state")
if data_source_info.data_source not in data_source_info.live_sources:# what about nds online?
datasource.pipeline_seek_for_gps(pipeline, options.gps_start_time, options.gps_end_time)
#
# Run pipeline
#
if pipeline.set_state(Gst.State.PLAYING) == Gst.StateChangeReturn.FAILURE:
raise RuntimeError("pipeline failed to enter PLAYING state")
logger.info("running pipeline...")
mainloop.run()
# save remaining triggers
logger.info("persisting features to disk...")
handler.flush_and_save_features()
#
# Shut down pipeline
#
logger.info("shutting down pipeline...")
#
# Shutdown the web interface servers and garbage collect the Bottle
# app. This should release the references the Bottle app's routes
# hold to the pipeline's data (like template banks and so on).
#
if not options.disable_web_service:
del httpservers
bottle.default_app.pop()
#
# Set pipeline state to NULL and garbage collect the handler
#
if pipeline.set_state(Gst.State.NULL) != Gst.StateChangeReturn.SUCCESS:
raise RuntimeError("pipeline could not be set to NULL")
del handler.pipeline
del handler
#
# Cleanup local frame file cache and related frames
#
if options.local_frame_caching:
logger.info("deleting temporary cache file and frames...")
# remove frame cache
os.remove(data_source_info.frame_cache)
# remove local frames
for cacheentry in data_source_info.local_cache_list:
os.remove(cacheentry.path)
del data_source_info.local_cache_list
#
# close program manually if data source is live
#
if options.data_source in data_source_info.live_sources:
sys.exit(0)
#!/usr/bin/env python
#
# Copyright (C) 2011-2018 Chad Hanna, Duncan Meacher, Patrick Godwin
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
"""
This program makes a dag to run offline gstlal_feature_extractor batch jobs
"""
__author__ = 'Duncan Meacher <duncan.meacher@ligo.org>, Patrick Godwin <patrick.godwin@ligo.org>'
# =============================
#
# preamble
#
# =============================
import os
import optparse
import lal
from ligo import segments
from gstlal import aggregator
from gstlal import dagparts
from gstlal.fxtools import feature_extractor
from gstlal.fxtools import multichannel_datasource
from gstlal.fxtools import multirate_datasource
from gstlal.fxtools import utils
# =============================
#
# functions
#
# =============================
def analysis_segments(ifo, allsegs, boundary_seg, segment_length, max_template_length = 30):
"""
get a dictionary of all the analysis segments
"""
segsdict = segments.segmentlistdict()
# start pad to allow whitener to settle + the maximum template_length
start_pad = multirate_datasource.PSD_DROP_TIME + max_template_length
segsdict[ifo] = segments.segmentlist([boundary_seg])
segsdict[ifo] = segsdict[ifo].protract(start_pad)
segsdict[ifo] = dagparts.breakupsegs(segsdict[ifo], segment_length, start_pad)
if not segsdict[ifo]:
del segsdict[ifo]
return segsdict
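# Illustrative behaviour (assumption about dagparts.breakupsegs): the boundary
# segment is first protracted by start_pad on both ends and then chopped into
# chunks of at most segment_length seconds which overlap by start_pad, so each
# job carries enough leading data for the whitener to settle and for the longest
# template to be fully buffered.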
def feature_extractor_node_gen(feature_extractor_job, dag, parent_nodes, segsdict, ifo, options, data_source_info, max_template_length = 30):
"""
get a dictionary of all the channels per gstlal_feature_extractor job
"""
feature_extractor_nodes = {}
# parallelize jobs by channel subsets
for ii, channel_subset in enumerate(data_source_info.channel_subsets):
print "Creating feature extractor jobs for channel subset %d" % ii
# parallelize jobs by segments
for seg in segsdict[ifo]:
# only produce jobs where the analysis runtime after applying segments is nonzero
if not data_source_info.frame_segments[ifo].intersects_segment(seg):
if options.verbose:
print " Skipping segment (%d, %d) for channel subset %d since there is no analyzable data here" % (int(seg[0]), int(seg[1]), ii)
continue
# set maximum number of jobs reading concurrently from the same frame file to prevent I/O locks
if ii // options.concurrency == 0:
dep_nodes = parent_nodes
else:
dep_nodes = [feature_extractor_nodes[(ii - options.concurrency, seg)]]
# creates a list of channel names with entries of the form --channel-name=IFO:CHANNEL_NAME:RATE
channels = [''.join(["--channel-name=",':'.join([channel, str(int(data_source_info.channel_dict[channel]['fsamp']))])]) for channel in channel_subset]
# FIXME: hacky way of getting options to get passed correctly for channels
channels[0] = channels[0].split('=')[1]
outpath = os.path.join(options.out_path, "gstlal_feature_extractor")
# define analysis times
gps_start_time = int(seg[0])
feature_start_time = gps_start_time + multirate_datasource.PSD_DROP_TIME + max_template_length
feature_end_time = min(int(seg[1]), options.gps_end_time)
feature_extractor_nodes[(ii, seg)] = \
dagparts.DAGNode(feature_extractor_job, dag, parent_nodes = dep_nodes,
opts = {"gps-start-time":gps_start_time,
"gps-end-time":feature_end_time,
"feature-start-time":feature_start_time,
"feature-end-time":feature_end_time,
"data-source":"frames",
"sample-rate": options.sample_rate,
"mismatch":options.mismatch,
"waveform":options.waveform,
"qhigh":options.qhigh,
"channel-name":' '.join(channels),
"job-id":str(ii + 1).zfill(4),
"cadence":options.cadence,
"persist-cadence":options.persist_cadence,
"max-streams":options.max_serial_streams,
"disable-web-service":options.disable_web_service,
"local-frame-caching":options.local_frame_caching,
"frame-segments-name": options.frame_segments_name,
"save-format": options.save_format,
"verbose":options.verbose
},
input_files = {"frame-cache":options.frame_cache,
"frame-segments-file":options.frame_segments_file},
output_files = {"out-path":outpath}
)
if options.verbose:
print " Creating node for channel subset %d, gps range %d - %d" % (ii, feature_start_time, feature_end_time)
return feature_extractor_nodes
# =============================
#
# command line parser
#
# =============================
def parse_command_line():
parser = optparse.OptionParser(usage = '%prog [options]', description = __doc__)
# generic data source options
multichannel_datasource.append_options(parser)
feature_extractor.append_options(parser)
# DAG architecture options
parser.add_option("--max-parallel-streams", type = "int", default = 50, help = "Number of streams (sum(channel_i * num_rates_i)) to process in parallel. This gives the maximum number of channels to process for a given job. Default = 50.")
parser.add_option("--max-serial-streams", type = "int", default = 100, help = "Number of streams (sum(channel_i * num_rates_i)) to process serially within a given job. Default = 100.")
parser.add_option("--concurrency", type = "int", default = 4, help = "Maximum allowed number of parallel jobs reading from the same file, done to prevent I/O locks")
parser.add_option("--segment-length", type = "int", default = 6000, help = "Maximum segment length to process per job. Default = 6000 seconds.")
# Condor commands
parser.add_option("--request-cpu", default = "2", metavar = "integer", help = "set the requested node CPU count, default = 2")
parser.add_option("--request-memory", default = "8GB", metavar = "integer", help = "set the requested node memory, default = 8GB")
parser.add_option("--request-disk", default = "50GB", metavar = "integer", help = "set the requested node local scratch space size needed, default = 50GB")
parser.add_option("--condor-command", action = "append", default = [], metavar = "command=value", help = "set condor commands of the form command=value; can be given multiple times")
options, filenames = parser.parse_args()
# set max parallel streams to options.max_streams for use in data_source_info for splitting up channel lists to process in parallel
options.max_streams = options.max_parallel_streams
# FIXME: once we figure out what the maximum concurrency is for parallel reads, should set that as a sanity check
# sanity check to enforce a minimum segment length
# Minimum segment length chosen so that the overlap is a ~33% hit in run time
min_segment_length = int(4 * multirate_datasource.PSD_DROP_TIME)
assert options.segment_length >= min_segment_length
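# Added reasoning (assumption): with segment_length = 4 * PSD_DROP_TIME, the
# ~PSD_DROP_TIME of whitener settling time discarded at the start of each job is
# roughly one third of the ~3 * PSD_DROP_TIME of analyzable data that remains,
# which is where the "~33% hit" figure above comes from.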
return options, filenames
# =============================
#
# main
#
# =============================
#
# parsing and setting up core structures
#
options, filenames = parse_command_line()
data_source_info = multichannel_datasource.DataSourceInfo(options)
ifo = data_source_info.instrument
channels = data_source_info.channel_dict.keys()
# FIXME Work out better way to determine max template length
max_template_length = 30
#
# create directories if needed
#
listdir = os.path.join(options.out_path, "gstlal_feature_extractor/channel_lists")
aggregator.makedir(listdir)
aggregator.makedir("logs")
#
# set up dag and job classes
#
dag = dagparts.DAG("feature_extractor_pipe")
condor_options = {"request_memory":options.request_memory, "request_cpus":options.request_cpu, "want_graceful_removal":"True", "kill_sig":"15"}
condor_commands = dagparts.condor_command_dict_from_opts(options.condor_command, condor_options)
feature_extractor_job = dagparts.DAGJob("gstlal_feature_extractor", condor_commands = condor_commands)
segsdict = analysis_segments(ifo, data_source_info.frame_segments, data_source_info.seg, options.segment_length, max_template_length=max_template_length)
#
# set up jobs
#
feature_extractor_nodes = feature_extractor_node_gen(feature_extractor_job, dag, [], segsdict, ifo, options, data_source_info, max_template_length=max_template_length)
#
# write out dag and sub files
#
dag.write_sub_files()
dag.write_dag()
dag.write_script()