Commit d7b027bd authored by Jonah Kanner: First import (parent 9d45277c)
## GW170814 auxiliary data release
Code associated with LIGO auxiliary data releases.
For context, see the [GWOSC Auxiliary Data Release page](https://gw-osc.org/auxiliary/GW170814)
## Documentation
**[Examples Directory](./examples)** contains example code to read and plot these data
**[Channel List: chanlist.csv](./dataprep/chanlist.csv)** contains a list of included channels (a quick way to inspect this file is sketched after this list). Not all channels are available during this time period,
and in some cases there may be differences in which channels are available at the two LIGO sites.
**[Channel Documentation: channel_info.md](./channel_info.md)** contains notes on how to understand the content in the various channels
`dataprep` contains code used to prepare the data set
`review` contains some scripts used to check the data set before release
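
As a minimal sketch (assuming, as in the `dataprep` scripts, that the first column of `chanlist.csv` holds the channel name), the channel list can be inspected with:

```python
import csv

# read channel names from the first column, skipping blank rows
with open('dataprep/chanlist.csv', newline='') as f:
    channels = [row[0] for row in csv.reader(f) if row and row[0]]

print(len(channels), 'channels, e.g.', channels[:3])
```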
## Information about the Channels
1. What are these channels?
1. Why were these channels picked?
1. Why have the data been low-pass filtered and downsampled?
## What Are These Channels?
The AUX channel list chosen for this release includes all of the sensors (real or virtual) which:
* are part of the LIGO Physical Environment Monitor (PEM) system (e.g. seismometers, microphones, magnetometers)
* are non-interferometric sensors of the suspended interferometer optics (e.g. local motion sensors, "shadow" sensors)
* are sensors and actuators of the seismic vibration isolation platforms (SEI)
* are interferometric signals which measure degrees of freedom not directly related to the gravitational-wave (differential-arm or DARM) degree of freedom
## Why were these channels picked?
This channel list contains all of the sensors (both real and virtual) which are known to:
1. couple strongly to the GW strain channel
2. measure parts of the environment which may modulate the coupling of other channels to the strain channel
3. be of auxiliary use in diagnosing other misbehavior in the interferometer or related systems
4. be of general interest in the monitoring of the environment for scientific purposes not directly related to GW detection
## Why have the data been low-pass filtered and downsampled?
The full (non-reduced) data are available as GWF files or through the LIGO NDS interface.
The HDF5 files have been reduced in size via decimation (i.e. low-pass filtering and downsampling) to the sample rate indicated for each channel in the HDF5 files.
The purpose of the processing was to reduce the data size, both to shorten file transfer times and to reduce the computational load of processing the data. We believe that the chosen sample
rates still capture the overwhelming majority of the interesting information from each sensor. For example, the seismometer channels need not be recorded at a rate much beyond the
analog bandwidth of the instrument.
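
As a sketch of the kind of decimation applied (this mirrors the `myDecimateFunc` routine in the `dataprep` code, which is the authoritative implementation):

```python
import scipy.signal as sig

def decimate_channel(x, fs0, fs):
    """Low-pass filter and downsample x from rate fs0 to rate fs (fs0 an integer multiple of fs)."""
    q = int(fs0 // fs)                              # integer downsampling factor
    # anti-aliasing FIR with cutoff at 80% of the new Nyquist frequency
    fir = sig.firwin(20 * q + 2, 0.8 / q, window='blackmanharris')
    return sig.decimate(x, q, ftype=sig.dlti(fir[1:-1], 1), zero_phase=True)
```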
## Auxiliary Data Preparation Scripts
## Creating data sets
dataLib.py: Main code to read the channel list from the Google sheet and then fetch the data using NDS.
The decimation & zero-checking functions are also provided here.
To run the code, one can do:
python dataLib.py --gps-start GPS_START --duration DURATION --ifo IFO
This will grab data for IFO (i.e. H1 or L1) and save them into multiple h5 files.
Each h5 file starts at an integer multiple of 64 sec and has a duration of 64 sec.
The routine makes sure that the h5 files together cover the range
[GPS_START, GPS_START+DURATION].
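
The 64 sec boundaries are simply the requested interval rounded outward to multiples of 64; a minimal sketch of the same logic used in dataLib.py:

```python
import numpy as np

gps_start, duration = 1186736418, 4096                     # example request
gps_start_64 = int(np.floor(gps_start / 64.) * 64)         # round start down
gps_end_64 = int(np.ceil((gps_start + duration) / 64.) * 64)  # round end up
file_starts = np.arange(gps_start_64, gps_end_64, 64)      # one 64 sec file per entry
```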
By default, each data file has a name '%s-%s_AUXR-%i-%i.h5'%(ifo[0], ifo, gps_start, duration)
(e.g. L-L1_AUXR-1186736384-64.h5)
Currently all the h5 files are stored in the CIT cluster at
/home/nlnoise/public_html/Data/aux_release/
If one instead wants to bypass this 64 sec requirement
and start exactly at GPS_START for a duration of exactly DURATION,
then one can do:
python dataLib.py --gps-start GPS_START --duration DURATION --ifo IFO --mode -1
The list of channels to grab, together with the desired sampling rates, is loaded from the Google sheet:
http://bit.ly/auxgwosc
Each channel is first downsampled to the rate specified in the sheet
before being saved to the h5 files.
Each h5 file is organized in two layers:
The top layer corresponds to the three-letter abbreviation of an aLIGO subsystem
(e.g. ASC, LSC, PEM, ...)
The second layer corresponds to a channel.
The key is the channel name (excluding the 'H1:' part).
Each channel group contains an attribute, 'valid'.
If valid=0, the channel does not exist or its output is identically 0 during the period.
If valid=1, we further store 4 attributes for each channel: gps_start, duration, fs0, fs,
where fs0 is the original sampling rate and fs is the rate stored in the h5 file,
and a dataset 'data' for the down-sampled time series.
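
A minimal sketch of reading one channel back out of such a file (the file name follows the default pattern above; the channel name is only a placeholder):

```python
import h5py

fname = 'L-L1_AUXR-1186736384-64.h5'          # default naming pattern, see above
chan = 'PEM-EY_SEIS_VEA_FLOOR_X_DQ'           # placeholder channel name (no 'L1:' prefix)

with h5py.File(fname, 'r') as fid:
    grp = fid[chan[:3]][chan]                 # subsystem group ('PEM'), then channel group
    if grp.attrs['valid']:
        fs = grp.attrs['fs']                  # sample rate stored in the file
        data = grp['data'][:]                 # down-sampled time series
```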
routinely_check.py: Code for performing some routine checks of the data validity in the hdf5 files.
It should be placed in the same location as the data.
make_AUXR.py: Creates the GWF frame files. Adapted from the RDS data set scripts.
make_nds_data.py: Script for copying LIGO frame files into the LOSC file system.
The script picks out channels set by "tags", and copies
only those into the new frame files. This is intended to
create data sets to be accessed by the LOSC NDS2 server.
#!/usr/bin/env python
import numpy as np
import scipy.interpolate as interp
import scipy.signal as sig
import h5py as h5
import datetime
import argparse
import os, sys
import pickle
import os.path
#########################################################
### default values that seldom change
#########################################################
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
SPREADSHEET_ID = '1eePtFtobKiKsTf26UZpt50KkfS8KNXRwTPnx9YxW9SM'
RANGE_NAME = "Longer list w/ ctrl sigs!A:E"
#########################################################
### main functions
#########################################################
def read_google_sheet():
    """
    reading google spreadsheet for the latest channel lists
    """
    # only locally import google related packages;
    # so that other functions can be called in an environment w/o google
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request

    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('sheets', 'v4', credentials=creds)
    # Call the Sheets API
    sheet = service.spreadsheets()
    result = sheet.values().get(spreadsheetId=SPREADSHEET_ID,
                                range=RANGE_NAME).execute()
    values = result.get('values', [])

    # return lists of channels & desired fs'
    nChan = len(values) - 1
    chan = []
    note = []
    cal = []
    unit = []
    fs = np.zeros(nChan, dtype=np.int)
    cnt = 0
    for i in range(nChan):
        if len(values[i+1]) == 5:
            chan.append(values[i+1][0])
            fs[cnt] = values[i+1][1]
            note.append(values[i+1][2])
            cal.append(values[i+1][3])
            unit.append(values[i+1][4])
            cnt += 1
        elif len(values[i+1]) == 4:
            chan.append(values[i+1][0])
            fs[cnt] = values[i+1][1]
            note.append(values[i+1][2])
            cal.append(values[i+1][3])
            unit.append('')
            cnt += 1
        elif len(values[i+1]) == 3:
            chan.append(values[i+1][0])
            fs[cnt] = values[i+1][1]
            note.append(values[i+1][2])
            cal.append('')
            unit.append('')
            cnt += 1
        elif len(values[i+1]) == 2:
            chan.append(values[i+1][0])
            fs[cnt] = values[i+1][1]
            note.append('')
            cal.append('')
            unit.append('')
            cnt += 1
        else:
            continue
    fs = fs[:cnt]
    return chan, fs, note, cal, unit
def fetch_data_nds(gps_start, duration, chan_list, fs_list, note_list, cal_list, unit_list,
                   ifo='L1', data_dir='data/', data_file='',
                   **kwargs):
    # only import nds2 locally
    # so that other functions can be called in an environment w/o nds2 client
    import nds2

    server = kwargs.pop('server', 'nds.ligo.caltech.edu')  # 40m: 131.215.115.200
    port = kwargs.pop('port', 31200)
    allow_data_on_tape = kwargs.pop('allow_data_on_tape', '1')
    n_chan_per_fetch = kwargs.pop('n_chan_per_fetch', 32)  # fetch this many channels per connection

    n_chan = len(fs_list)
    n_fetch = np.int(np.ceil(np.float(n_chan)/np.float(n_chan_per_fetch)))

    # chan_list does not contain the IFO: part
    chan_list_nds = []
    for i in range(len(chan_list)):
        chan_list_nds.append('%s:%s'%(ifo, chan_list[i]))

    if not os.path.exists(data_dir):
        os.makedirs(data_dir)
    if not data_file:
        data_file = '%s-%s_AUXR-%i-%i.h5'%(ifo[0], ifo, gps_start, duration)

    # use 'a' so that if we want to include new channels, we can just add them to the existing h5 file
    fid = h5.File(data_dir + data_file, 'a')
    fid.attrs['gps_start'] = gps_start
    fid.attrs['duration'] = duration
    fid.attrs['ifo'] = ifo

    for i in range(n_fetch):
        # reset connection to avoid timing-out
        # print('resetting connection!', i)
        conn = nds2.connection(server, port)
        conn.set_parameter('ALLOW_DATA_ON_TAPE', allow_data_on_tape)

        idx0 = np.int(i*n_chan_per_fetch)
        idx1 = np.int((i+1)*n_chan_per_fetch)
        chan_list_loop = chan_list_nds[idx0:idx1]
        fs_list_loop = fs_list[idx0:idx1]
        note_list_loop = note_list[idx0:idx1]
        cal_list_loop = cal_list[idx0:idx1]
        unit_list_loop = unit_list[idx0:idx1]
        n_loop = len(chan_list_loop)

        try:
            # first try to grab all the data at once
            data_buffer = conn.fetch(gps_start, gps_start+duration, chan_list_loop)
            for j in range(n_loop):
                key = chan_list_loop[j][3:]  # drop the IFO: part
                key0 = key[:3]
                # first group channels together according to their first three letters
                # e.g., ASC, LSC, PEM, etc.
                if key0 not in fid.keys():
                    grp0 = fid.create_group(key0)
                else:
                    grp0 = fid[key0]
                # then save each channel as a group under grp0;
                # if the channel has been fetched before, delete the old data
                # and replace it with the new ones
                if key in grp0.keys():
                    del grp0[key]
                grp = grp0.create_group(key)

                data = data_buffer[j].data
                fs0 = data_buffer[j].sample_rate
                fs = fs_list_loop[j]
                valid = checkAllZero(data)
                # # debug
                # print('fetched!', key, fs0, fs, valid)
                if valid > 0:
                    if fs < fs0:
                        data = myDecimateFunc(data, fs0, fs)
                    else:
                        fs = int(fs0)
                    grp.create_dataset('data', shape=data.shape, dtype=data.dtype, data=data,
                                       compression="gzip", compression_opts=9)
                    grp.attrs['fs'] = fs
                    grp.attrs['fs0'] = fs0
                    grp.attrs['note'] = note_list_loop[j]
                    grp.attrs['cal'] = cal_list_loop[j]
                    grp.attrs['unit'] = unit_list_loop[j]
                    grp.attrs['valid'] = 1
                    grp.attrs['gps_start'] = gps_start
                    grp.attrs['duration'] = duration
                else:
                    grp.attrs['valid'] = 0

        except RuntimeError:
            # at least one channel is problematic;
            # fetch each channel individually
            print('invalid channel found when bulk fetching!')
            # first try resetting connection
            conn = nds2.connection(server, port)
            conn.set_parameter('ALLOW_DATA_ON_TAPE', allow_data_on_tape)
            for j in range(n_loop):
                # get each channel individually
                chan_sing = [chan_list_loop[j]]
                fs = fs_list_loop[j]
                key = chan_sing[0][3:]
                key0 = key[:3]
                if key0 not in fid.keys():
                    grp0 = fid.create_group(key0)
                else:
                    grp0 = fid[key0]
                if key in grp0.keys():
                    del grp0[key]
                grp = grp0.create_group(key)
                try:
                    data_buffer = conn.fetch(gps_start, gps_start+duration, chan_sing)
                    data = data_buffer[0].data
                    fs0 = data_buffer[0].sample_rate
                    valid = checkAllZero(data)
                    # # debug
                    # print('fetched!', key, fs0, fs, valid)
                    if valid > 0:
                        if fs < fs0:
                            data = myDecimateFunc(data, fs0, fs)
                        else:
                            fs = int(fs0)
                        grp.create_dataset('data', shape=data.shape, dtype=data.dtype, data=data,
                                           compression="gzip", compression_opts=9)
                        grp.attrs['fs'] = fs
                        grp.attrs['fs0'] = fs0
                        grp.attrs['note'] = note_list_loop[j]
                        grp.attrs['cal'] = cal_list_loop[j]
                        grp.attrs['unit'] = unit_list_loop[j]
                        grp.attrs['valid'] = 1
                        grp.attrs['gps_start'] = gps_start
                        grp.attrs['duration'] = duration
                    else:
                        grp.attrs['valid'] = 0
                except RuntimeError:
                    print('WARNING!!!', key, 'does not exist!')
                    grp.attrs['valid'] = 0

    fid.close()
def myDecimateFunc(xx, fs0, fs):
    down_factor = int(fs0/fs)
    fir_aa = sig.firwin(20*down_factor+2, 0.8/down_factor,
                        window='blackmanharris')
    yy = sig.decimate(xx, down_factor,
                      ftype=sig.dlti(fir_aa[1:-1], 1), zero_phase=True)
    return yy


def checkAllZero(xx, cut=1.e-25):
    if np.max(np.abs(xx)) <= cut:
        valid = 0
    else:
        valid = 1
    return valid
if __name__ == '__main__':
    # parse some args here
    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', type=int, default=1,
                        help='>0 for real data fetching; <=0 for debugging')
    # nds
    parser.add_argument('--gps-start', type=int, default=1186736418,
                        help='GPS starting time of the dataset; default to Aug 14, 2017, 09:00:00 UTC')
    parser.add_argument('--duration', type=int, default=1,
                        help='Total duration of the datasets in sec')
    parser.add_argument('--ifo', type=str, default='L1',
                        help='Which IFO?')
    parser.add_argument('--server', type=str, default='nds.ligo.caltech.edu',
                        help='NDS server')
    parser.add_argument('--port', type=int, default=31200,
                        help='NDS port')
    parser.add_argument('--allow-data-on-tape', type=str, default='1',
                        help='Allow data on tape?')
    # data output
    parser.add_argument('--data-dir', type=str, default='data/',
                        help='Directory to store the data')
    parser.add_argument('--data-file', type=str, default='',
                        help='File name (should include .h5)')
    kwargs = parser.parse_args()
    # convert it to a dict
    kwargs = vars(kwargs)

    # read data from google sheet
    chan_list, fs_list, note_list, cal_list, unit_list = read_google_sheet()
    # for i in range(len(chan_list)):
    #     print(chan_list[i], fs_list[i], note_list[i], cal_list[i])
    # # adding GW chan
    # chan_list = ['GDS-CALIB_STRAIN'] + chan_list
    # fs_list = np.hstack([512, fs_list])

    mode = kwargs['mode']
    if mode > 0:
        # do serious fetching;
        # making sure that the starting time is an integer multiple of 64 s
        # and the duration per file is 64 s
        # fetch data from NDS
        gps_start = kwargs.pop('gps_start')
        duration = kwargs.pop('duration')
        ifo = kwargs.pop('ifo')
        data_dir = kwargs.pop('data_dir')
        data_file = kwargs.pop('data_file')

        gps_start_64 = int(np.floor(gps_start/64.)*64)
        gps_end_64 = int(np.ceil((gps_start+duration)/64.)*64)
        gps_start_list = np.arange(gps_start_64, gps_end_64, 64)
        nFiles = len(gps_start_list)
        print(gps_start_64, gps_end_64, nFiles)

        for i in range(nFiles):
            fetch_data_nds(int(gps_start_list[i]), 64, chan_list, fs_list, note_list, cal_list, unit_list,
                           ifo=ifo, data_dir=data_dir, data_file=data_file,
                           **kwargs)
    else:
        # debugging
        # do not force multiples of 64 sec per file
        # fetch data from NDS
        gps_start = kwargs.pop('gps_start')
        duration = kwargs.pop('duration')
        ifo = kwargs.pop('ifo')
        data_dir = kwargs.pop('data_dir')
        data_file = kwargs.pop('data_file')
        fetch_data_nds(gps_start, duration, chan_list, fs_list, note_list, cal_list, unit_list,
                       ifo=ifo, data_dir=data_dir, data_file=data_file,
                       **kwargs)
"""
Script to generate AUXR data for three hour data release.
Adopted from RDS script by Greg Mendell
"""
import os
import glob
from subprocess import call
import csv
# -- New type
frtype = 'AUXR'
# -- Set top level directory where new files should live
outroot = './AUXR'
#-- Set the start and stop GPS times
dur = int(3*3600/2) # 1.5 hours
t0 = 1186741861 # centered on GW170814
start = t0 - dur
end = t0 + dur
# -- Read in channel list
chanlist = []
with open('chanlist.csv', newline='') as csvfile:
chanreader = csv.reader(csvfile)
for row in chanreader:
channame = row[0]
if channame is '':
continue
else:
channame = '*:' + channame
chanlist.append(channame)
chanstr = ' '.join(chanlist)
print(chanstr)
#-- Tags are used to select channels we want. See FrCopy documentation
#tags = '*PEM-*_SEIS_*_*_DQ'
ifoList = {'H1', 'L1'}
for ifo in ifoList:
rootdir = '/archive/frames/O2/raw/{ifo}'.format(ifo=ifo, obs=ifo[0])
print(rootdir)
# -- Get list of directories
dirList = glob.glob(rootdir+'/*')
print(dirList)
for directory in dirList:
fileList = glob.glob(directory + '/*.gwf')
head, tail = os.path.split(directory)
dirspl = tail.split('-')
newdir = '{zero}-{ifo}_{frtype}-{two}'.format(zero=dirspl[0], ifo=ifo, frtype=frtype, two=dirspl[2])
outdir = os.path.join(outroot, ifo, newdir)
print(outdir)
for filename in fileList:
spl = os.path.basename(filename).split('-')
gps = int(spl[2])
if gps>=start and gps<=end:
if not os.path.exists(outdir):
os.makedirs(outdir)
outname = os.path.join(outdir, '{obs}-{ifo}_{frtype}-{gps}-{size}'.format(obs=ifo[0], ifo=ifo,
frtype=frtype, gps=gps, size=spl[3]))
cmd = ['FrCopy', '-i', filename, '-o', outname, '-t', chanstr]
call(cmd)
"""
Script for copying LIGO frame files into the LOSC file system
The script picks out channels set by "tags", and copies
only those into the new frame files. This is intended to
create data sets to be used accessed by the LOSC NDS2 server
2017 by Jonah Kanner
"""
import os
import glob
from subprocess import call
# -- New type
frtype = 'LOSCSEIS'
# -- Set top level directory where new files should live
outroot = '/loscdata/archive/NDS2/o1seis'
#-- Set the start and stop GPS times
start = 1130132928
end = start + 3600*24*3
#-- Tags are used to select channels we want. See FrCopy documentation
tags = '*PEM-*_SEIS_*_*_DQ'
ifoList = {'L1'}
for ifo in ifoList:
rootdir = '/archive/frames/O1/raw/{ifo}'.format(ifo=ifo, obs=ifo[0])
print rootdir
# -- Get list of directories
dirList = glob.glob(rootdir+'/*')
print dirList
for directory in dirList:
fileList = glob.glob(directory + '/*.gwf')
head, tail = os.path.split(directory)
dirspl = tail.split('-')
newdir = '{zero}-{ifo}_{frtype}-{two}'.format(zero=dirspl[0], ifo=ifo, frtype=frtype, two=dirspl[2])
outdir = os.path.join(outroot, ifo, newdir)
print outdir
for filename in fileList:
spl = os.path.basename(filename).split('-')
gps = int(spl[2])
if gps>start and gps<end:
if not os.path.exists(outdir):
os.makedirs(outdir)
outname = os.path.join(outdir, '{obs}-{ifo}_{frtype}-{gps}-{size}'.format(obs=ifo[0], ifo=ifo,
frtype=frtype, gps=gps, size=spl[3]))