- A Condor-managed computing resource using the LIGO Data Grid (LDG) configuration
- Access to gravitational wave data, stored locally or via CVMFS
Introduction
------------
This tutorial will help you to set up and run an offline gravitational wave search for binary neutron stars. The information contained within this document can easily be modified to perform a wide range of searches.
The offline analysis has a somewhat involved setup procedure, all of which is covered here. The analysis itself is performed by a pipeline contained within a dag (Directed Acyclic Graph) of jobs managed by Condor. The dag and the job submit (.sub) files are produced by running gstlal_inspiral_pipe. This program requires several input files, produced in the steps detailed below. These input files are:
* segments.xml.gz
* vetoes.xml.gz
* frame.cache
* inj_tisi.xml
* tisi.xml
* injection file
* split bank cache files
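Because gstlal_inspiral_pipe needs all of these inputs before it can write the dag, a quick preflight check can save a failed run. The sketch below is a hypothetical helper, not part of gstlal: it assumes the fixed file names listed above, and the injection-file and split-bank cache names it uses in the example (`injections.xml`, `H1_split_bank.cache`, `L1_split_bank.cache`) are placeholders for whatever your Makefile produces.

```python
from pathlib import Path

# Inputs with fixed names, as listed above.
FIXED_INPUTS = [
    "segments.xml.gz",
    "vetoes.xml.gz",
    "frame.cache",
    "inj_tisi.xml",
    "tisi.xml",
]


def missing_inputs(workdir, injection_files=(), split_bank_caches=()):
    """Return the required input files that are not present in workdir.

    The injection file and split-bank cache names depend on your Makefile,
    so they are passed in explicitly rather than hard-coded.
    """
    required = FIXED_INPUTS + list(injection_files) + list(split_bank_caches)
    return [name for name in required if not (Path(workdir) / name).is_file()]


if __name__ == "__main__":
    # Hypothetical file names; substitute the ones your Makefile produces.
    missing = missing_inputs(
        ".",
        injection_files=["injections.xml"],
        split_bank_caches=["H1_split_bank.cache", "L1_split_bank.cache"],
    )
    print("missing:", missing or "none")
```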
The steps to produce the full analysis dag file are:
1. Analysis variables defined at the top of offline Makefile
2. Generate frame cache, segments, vetoes, and tisi files
3. Generate/copy template bank and then split this into sub-banks
4. Run gstlal_inspiral_pipe to produce offline analysis dag
The information contained within this page is based on the O2 BNS test dag, an offline analysis of 100,000 s of data containing GW170817. The dag used to perform the analysis can be produced using a [Makefile](https://git.ligo.org/lscsoft/gstlal/blob/master/gstlal-inspiral/share/O3/offline/O2/Makefile.BNS_HL_test_dag_O2) that generates most of the required files. This tutorial only covers the HL detector-pair configuration; an HLV Makefile exists [here](). Each stage of the Makefile needed to run an offline analysis is detailed below.

Analysis variables defined at the top of offline Makefile
---------------------------------------------------------
There are many variables set at the top of the offline Makefile. Some of these should not be changed unless you know what you are doing. The variables that should be changed/set are explained here:
```make
ACCOUNTING_TAG=ligo.dev.o3.cbc.uber.gstlaloffline
```
An accounting tag used to measure LDG computational use. See https://ldas-gridmon.ligo.caltech.edu/ldg_accounting/user .
```make
GROUP_USER=albert.einstein
```
This should be your albert.einstein-style user identification (your LIGO username). This is only needed if using a shared account.
```make
IFOS = H1 L1
MIN_IFOS = 2
```
Define which detectors to include in the analysis; H1, L1, and V1 are currently supported. MIN_IFOS sets the minimum number of operational detectors required for data to be analysed. Single-detector time can also be analysed.
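To make the effect of MIN_IFOS concrete, the sketch below enumerates which detector combinations a given MIN_IFOS admits for a given IFOS list. It is an illustration of the idea only, not gstlal's internal implementation; the function name `detector_combinations` is hypothetical.

```python
from itertools import combinations


def detector_combinations(ifos, min_ifos):
    """List every detector subset of size >= min_ifos, largest subsets first.

    Illustrative only: it enumerates the detector combinations that MIN_IFOS
    permits; it is not how gstlal decides internally what to analyse.
    """
    if not 1 <= min_ifos <= len(ifos):
        raise ValueError("min_ifos must be between 1 and the number of detectors")
    combos = []
    for size in range(len(ifos), min_ifos - 1, -1):
        combos.extend("".join(subset) for subset in combinations(ifos, size))
    return combos


if __name__ == "__main__":
    ifos = "H1 L1".split()  # mirrors IFOS = H1 L1
    print(detector_combinations(ifos, 2))  # ['H1L1']
```

With IFOS = H1 L1 and MIN_IFOS = 2, only the H1L1 combination remains, so time when just one of the two detectors is operating is not analysed.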
```make
START = 1187000000
STOP = 1187100000
```
Set the start and stop times of the analysis in GPS seconds. The times stated here span 100,000 s and contain GW170817 (GPS 1187008882). See https://www.gw-openscience.org/gps/ for conversions.
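For a quick offline check of the START and STOP values, GPS seconds can be converted to UTC by adding them to the GPS epoch (1980-01-06 00:00:00 UTC) and subtracting the accumulated leap seconds. This sketch assumes a fixed GPS-UTC offset of 18 s, which is correct only for times from 2017-01-01 onward; the converter linked above handles arbitrary dates.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from this epoch and does not include leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS - UTC = 18 s, the offset in force since 2017-01-01.
# Earlier times need a smaller offset.
LEAP_SECONDS = 18

START, STOP = 1187000000, 1187100000  # values from the Makefile
GW170817_GPS = 1187008882.4


def gps_to_utc(gps):
    """Convert a GPS time (s) to a UTC datetime using a fixed leap-second offset."""
    return GPS_EPOCH + timedelta(seconds=gps - LEAP_SECONDS)


if __name__ == "__main__":
    print("duration:", STOP - START, "s")
    print("start:", gps_to_utc(START))
    print("stop: ", gps_to_utc(STOP))
    print("GW170817 inside window:", START <= GW170817_GPS < STOP)
```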
Used to specify the injection file and the chirp-mass range over which to filter it. Multiple injection files can be given at once; these should be space-separated, with no whitespace at the end of the line.
The veto definer file, used to determine which data to veto. See https://git.ligo.org/detchar/veto-definitions/tree/master/cbc for all veto definer files.
```make
# GSTLAL_SEGMENTS Options
SEG_SERVER=https://segments.ligo.org
# C02
LIGO_SEGMENTS="$*:DCH-CLEAN_SCIENCE_C02:1"
# The LIGO frame types
# C02
HANFORD_FRAME_TYPE='H1_CLEANED_HOFT_C02'
LIVINGSTON_FRAME_TYPE='L1_CLEANED_HOFT_C02'
# The Channel names.
# C02 cleaned
H1_CHANNEL=DCH-CLEAN_STRAIN_C02
L1_CHANNEL=DCH-CLEAN_STRAIN_C02
```
Gravitational wave data segment, frame type, and channel name information. See https://wiki.ligo.org/LSC/JRPComm/ for full information about all observing runs.
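The `$*` in LIGO_SEGMENTS is make's pattern-rule stem. Assuming the rules that use it are pattern rules whose stem is the detector name (as `--instrument $*` in the bank-splitting rule suggests), the strings below show what the segment flag (IFO:FLAG:VERSION) and the fully qualified channel names (IFO:CHANNEL) expand to. The helper names are hypothetical and only mimic the string substitution.

```python
def expand_stem(template, ifo):
    """Substitute the detector prefix for make's pattern-rule stem `$*`."""
    return template.replace("$*", ifo)


def channel_name(ifo, channel):
    """Build a fully qualified channel name of the form IFO:CHANNEL."""
    return f"{ifo}:{channel}"


# Values copied from the Makefile above (shell quotes removed).
LIGO_SEGMENTS = "$*:DCH-CLEAN_SCIENCE_C02:1"
CHANNELS = {"H1": "DCH-CLEAN_STRAIN_C02", "L1": "DCH-CLEAN_STRAIN_C02"}

if __name__ == "__main__":
    for ifo, channel in CHANNELS.items():
        print(expand_stem(LIGO_SEGMENTS, ifo), channel_name(ifo, channel))
```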
```make
include /path/to/Makefile.offline_analysis_rules
```
Full path to [Makefile.offline_analysis_rules](https://git.ligo.org/lscsoft/gstlal/blob/master/gstlal-inspiral/share/Makefile.offline_analysis_rules). This file contains sets of rules for string parsing/manipulation used within the main Makefile; an up-to-date version must be included.

Generate segments, vetoes, frame cache, and tisi files
------------------------------------------------------
The frame.cache file contains the full paths to the gravitational wave data (.gwf) files, one file per line, using the following space-separated format:
detector site identifier, frame type, GPS start time, duration (s), full path (URL) to the file. For example:
```
H H1__H1_CLEANED_HOFT_C02 1186998263 4096 file://localhost/hdfs/frames/O2/hoft_C02_clean/H1/H-H1_CLEANED_HOFT_C02-11869/H-H1_CLEANED_HOFT_C02-1186998263-4096.gwf
```
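Each cache line splits cleanly on whitespace into those five fields. The sketch below parses one line into a named tuple and strips the `file://localhost` URL down to a filesystem path; `CacheEntry` and `parse_cache_line` are hypothetical names used for illustration.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class CacheEntry(NamedTuple):
    site: str        # detector site identifier, e.g. "H"
    frame_type: str  # frame type description
    gps_start: int   # GPS start time of the file (s)
    duration: int    # length of data in the file (s)
    path: str        # local filesystem path extracted from the URL

    @property
    def gps_end(self):
        return self.gps_start + self.duration


def parse_cache_line(line):
    """Split one five-column frame.cache line into its fields."""
    site, frame_type, start, duration, url = line.split()
    return CacheEntry(site, frame_type, int(start), int(duration), urlparse(url).path)


if __name__ == "__main__":
    example = ("H H1__H1_CLEANED_HOFT_C02 1186998263 4096 "
               "file://localhost/hdfs/frames/O2/hoft_C02_clean/H1/"
               "H-H1_CLEANED_HOFT_C02-11869/H-H1_CLEANED_HOFT_C02-1186998263-4096.gwf")
    entry = parse_cache_line(example)
    print(entry.site, entry.gps_start, entry.gps_end, entry.path)
```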
If the .gwf data files are stored locally, you can produce individual detector frame cache files with:
The awk command reformats the output into the required format.
If the data must be accessed via CVMFS, the following option needs to be added to the gw_data_find arguments:
```make
--server datafind.ligo.org:443
```
Then create a combined frame.cache file with some additional formatting:
```make
cat H1_frame.cache L1_frame.cache > frame.cache
sed -i s/H\ $(LIGO_FRAME_TYPE)/H\ H1_$(LIGO_FRAME_TYPE)/g frame.cache
sed -i s/L\ $(LIGO_FRAME_TYPE)/L\ L1_$(LIGO_FRAME_TYPE)/g frame.cache
```
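The `cat` and the two `sed` commands concatenate the per-detector caches and prefix each frame type with its interferometer name. The sketch below reproduces that transformation in Python for clarity. It assumes the interferometer name is the site letter followed by "1" (H to H1, L to L1), which holds for the detectors used here; `combine_caches` is a hypothetical name.

```python
def combine_caches(cache_texts, frame_type):
    """Concatenate per-detector caches and prefix each frame type with its IFO.

    Mirrors `cat ... > frame.cache` followed by the sed substitutions, assuming
    the IFO name is the site letter followed by "1" (H -> H1, L -> L1).
    """
    combined = []
    for text in cache_texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            site, rest = line.split(" ", 1)
            if rest.startswith(frame_type):
                rest = f"{site}1_{rest}"
            combined.append(f"{site} {rest}")
    return "".join(f"{line}\n" for line in combined)


if __name__ == "__main__":
    h1 = "H FT 100 4 file://localhost/a.gwf\n"
    l1 = "L FT 100 4 file://localhost/b.gwf\n"
    print(combine_caches([h1, l1], "FT"), end="")
```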
### Generating segments.xml.gz and vetoes.xml.gz files
The segments.xml.gz file contains a list of all data segments that should be analysed. The vetoes.xml.gz file contains a list of all data segments that should be ignored.
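Conceptually, the data left to analyse is the science segments minus the vetoed segments. The LIGO tools represent these as segment lists (for example with the ligo.segments package, whose segmentlist type supports set arithmetic). The stand-alone sketch below shows the same operation on half-open `[start, end)` intervals; it illustrates the idea and is not the pipeline's veto implementation.

```python
def subtract_vetoes(segments, vetoes):
    """Return the parts of half-open [start, end) segments not covered by any veto."""
    result = []
    for start, end in segments:
        pieces = [(start, end)]
        for vstart, vend in vetoes:
            remaining = []
            for s, e in pieces:
                if vend <= s or vstart >= e:
                    remaining.append((s, e))  # no overlap with this veto
                    continue
                if s < vstart:
                    remaining.append((s, vstart))  # part before the veto
                if vend < e:
                    remaining.append((vend, e))  # part after the veto
            pieces = remaining
        result.extend(pieces)
    return result


def livetime(segments):
    """Total duration covered by non-overlapping segments."""
    return sum(end - start for start, end in segments)


if __name__ == "__main__":
    science = [(0, 100), (200, 300)]
    vetoed = [(50, 60), (250, 400)]
    kept = subtract_vetoes(science, vetoed)
    print(kept, livetime(kept))
```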
This returns an initial segments list. The command makes use of some of the Makefile variables defined above and produces segmentspadded files for each detector specified by $IFOS. ligolw_no_ilwdchar is then run on the output files to convert some table column types from ilwd:char to int_4s. This command needs to be run on any XML file produced by a non-gstlal program.
The next step is to acquire a template bank that will be used to filter the data. The BNS Makefile produces its own BNS template bank containing ~13,500 templates (parameters are shown below), but there are also existing template banks that can be used. If you are using a pre-existing template bank, then much of the next two sections can be ignored/removed.
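The ilwd:char IDs being converted are strings of the form `table:column:N`, for example `process:process_id:10`; the integer form keeps only `N`. The sketch below shows that core idea on a single row represented as a dict. It is not ligolw_no_ilwdchar's implementation, which works on whole LIGO_LW documents and also updates the declared column types; the helper names are hypothetical.

```python
def ilwd_to_int(value):
    """Convert an ilwd:char ID such as 'process:process_id:10' to the integer 10."""
    _prefix, sep, number = value.rpartition(":")
    if not sep:
        raise ValueError(f"not an ilwd:char value: {value!r}")
    return int(number)


def convert_row(row, id_columns):
    """Return a copy of a row dict with the named ID columns converted to int."""
    return {key: ilwd_to_int(val) if key in id_columns else val
            for key, val in row.items()}


if __name__ == "__main__":
    row = {"process_id": "process:process_id:3", "ifos": "H1,L1"}
    print(convert_row(row, {"process_id"}))
```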
```make
############################
# Template bank parameters #
############################
# Note that these can change if you modify the template bank program.
# Waveform approximant
APPROXIMANT = TaylorF2
# Minimum component mass for the template bank
MIN_MASS = 0.99
# Maximum component mass for the template bank
MAX_MASS = 3.1
# Minimum total mass for the template bank
MIN_TOTAL_MASS = 1.98
# Maximum total mass for the template bank
MAX_TOTAL_MASS = 6.2
# Maximum symmetric mass ratio for the template bank
MAX_ETA = 0.25
# Minimum symmetric mass ratio for the template bank
MIN_ETA = 0.18
# Low frequency cut off for the template bank placement
LOW_FREQUENCY_CUTOFF = 15.0
# High pass frequency to condition the data before measuring the psd for template placement
HIGH_PASS_FREQ = 10.0
# Highest frequency at which to compute the metric
HIGH_FREQUENCY_CUTOFF = 1024.0
# The sample rate at which to compute the template bank
SAMPLE_RATE = 4096
# The minimal match of the template bank; determines how much SNR is retained for signals "in between the bank points"
MM = 0.975
# The start time for reading the data for the bank
BANKSTART = 1187000000
# The stop time for reading the data for the bank (Bank start + 2048s)
```
lalapps_tmpltbank is a rather old program, and newer ones exist, such as lalapps_cbc_sbank. Whichever program you use to generate the bank, gstlal_inspiral_add_template_ids needs to be run on it in order to work with the mass model used in the main analysis.
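The mass limits above can be checked for any component-mass pair using the standard definitions of the symmetric mass ratio, η = m1 m2 / (m1 + m2)², and the chirp mass, Mc = (m1 m2)^(3/5) / (m1 + m2)^(1/5). The sketch below applies the bank's MIN/MAX bounds to a pair of masses; it is an illustrative consistency check, not the template placement algorithm of lalapps_tmpltbank or lalapps_cbc_sbank.

```python
# Bank limits copied from the Makefile variables above (solar masses).
MIN_MASS, MAX_MASS = 0.99, 3.1
MIN_TOTAL_MASS, MAX_TOTAL_MASS = 1.98, 6.2
MIN_ETA, MAX_ETA = 0.18, 0.25


def symmetric_mass_ratio(m1, m2):
    total = m1 + m2
    return m1 * m2 / (total * total)


def chirp_mass(m1, m2):
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def in_bank_bounds(m1, m2):
    """True if the pair satisfies the component, total-mass and eta limits."""
    total = m1 + m2
    eta = symmetric_mass_ratio(m1, m2)
    return (MIN_MASS <= m1 <= MAX_MASS and MIN_MASS <= m2 <= MAX_MASS
            and MIN_TOTAL_MASS <= total <= MAX_TOTAL_MASS
            and MIN_ETA <= eta <= MAX_ETA)


if __name__ == "__main__":
    for pair in [(1.4, 1.3), (3.1, 0.99), (5.0, 1.0)]:
        print(pair, round(chirp_mass(*pair), 4), in_bank_bounds(*pair))
```

The extreme pair (3.1, 0.99) gives η ≈ 0.1835, just above MIN_ETA = 0.18, so the eta limits are consistent with the component-mass limits.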
```make
mkdir -p $*_split_bank
gstlal_bank_splitter \
--f-low $(LOW_FREQUENCY_CUTOFF) \
--group-by-chi $(NUM_CHI_BINS) \
--output-path $*_split_bank \
--approximant $(APPROXIMANT1) \
--approximant $(APPROXIMANT2) \
--output-cache $@ \
--overlap $(OVERLAP) \
--instrument $* \
--n $(NUM_SPLIT_TEMPLATES) \
--sort-by mchirp \
--max-f-final $(HIGH_FREQUENCY_CUTOFF) \
--write-svd-caches \
--num-banks $(NUMBANKS) \
H1-TMPLTBANK-$(START)-2048.xml
```
gstlal_bank_splitter needs to be run on the template bank in use to split it into sub-banks, which are passed to the singular value decomposition (SVD) code within the pipeline.

Run gstlal_inspiral_pipe to produce offline analysis dag
--------------------------------------------------------
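The core of the splitting step, sorting templates by chirp mass (`--sort-by mchirp`) and cutting them into groups of `--n` templates, can be sketched as below. gstlal_bank_splitter also groups by chi and approximant, which this sketch omits. Its treatment of `--overlap` here is an assumption made for illustration: each sub-bank is padded with `overlap // 2` neighbouring templates on either side. The function name `split_bank` is hypothetical.

```python
def split_bank(mchirps, n, overlap=0):
    """Group template indices into sub-banks of n templates sorted by chirp mass.

    Assumption for illustration: neighbouring sub-banks share templates, with
    overlap // 2 extra templates added on each side of every chunk.
    """
    if n < 1:
        raise ValueError("n must be positive")
    order = sorted(range(len(mchirps)), key=lambda i: mchirps[i])
    pad = overlap // 2
    sub_banks = []
    for start in range(0, len(order), n):
        lo = max(0, start - pad)
        hi = min(len(order), start + n + pad)
        sub_banks.append(order[lo:hi])
    return sub_banks


if __name__ == "__main__":
    mchirps = [5.0, 1.0, 3.0, 2.0, 4.0]  # toy chirp masses, one per template
    print(split_bank(mchirps, 2))             # [[1, 3], [2, 4], [0]]
    print(split_bank(mchirps, 2, overlap=2))  # [[1, 3, 2], [3, 2, 4, 0], [4, 0]]
```

Sorting by chirp mass before splitting keeps similar templates together, which is what makes each sub-bank a good candidate for a compact SVD.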
A set of sed commands makes the memory request of jobs dynamic. These commands shouldn't be needed in most standard cases, but if you notice that jobs are being placed on hold by Condor for exceeding their requested memory allocation, then these should allow the jobs to run.
```make
sed -i "/^environment/s?\$$?GSTLAL_FIR_WHITEN=0;?" *.sub
```
A sed command to set 'GSTLAL_FIR_WHITEN=0' for all jobs. This variable must be set in all cases. If you built the environment yourself, it may already be set within the env.sh file you source; if you are using the system build, include this sed command.
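After make turns `$$` into `$` and the shell removes the backslash, sed receives `/^environment/s?$?GSTLAL_FIR_WHITEN=0;?`, which appends the setting to the end of every line beginning with `environment`. The sketch below performs the same edit in Python, which can be handy for checking what the submit files will contain; the function names are hypothetical.

```python
import re
from pathlib import Path


def append_environment(sub_text, setting="GSTLAL_FIR_WHITEN=0;"):
    """Append `setting` to the end of every line that starts with 'environment'.

    Equivalent to sed's /^environment/s?$?GSTLAL_FIR_WHITEN=0;? on each line.
    """
    return re.sub(r"^(environment.*)$", lambda m: m.group(1) + setting,
                  sub_text, flags=re.MULTILINE)


def rewrite_sub_files(directory="."):
    """Apply append_environment in place to every *.sub file in directory."""
    for path in Path(directory).glob("*.sub"):
        path.write_text(append_environment(path.read_text()))
```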
```make
sed -i 's@environment = GST_REGISTRY_UPDATE=no;@environment = "GST_REGISTRY_UPDATE=no LD_PRELOAD=$(MKLROOT)/lib/intel64/libmkl_core.so"@g' gstlal_inspiral_injection_snr.sub
```
A sed command to force the use of MKL libraries for injection SNRs. Only needed if using an optimised build.
Commands for submitting the dag to Condor and then monitoring its status. The grep command formats the output by removing superfluous information.
Running the Makefile
--------------------
Assuming you have all the prerequisites, running the BNS Makefile as is only requires a few changes. These are:
* Line 3: Set the accounting tag
* Line 66: Set the analysis run tag. Use this to identify different runs, e.g. TAG = BNS_test_dag_190401
* Line 129: Set the path to the veto definer file
* Line 183: Set the path to Makefile.offline_analysis_rules
Then, ensuring you have the correct environment set, run it with: `make -f Makefile.BNS_HL_test_dag_O2`