bulk transfer notes in place (authored by James Clark)
@@ -69,6 +69,11 @@ After a short time, there is an ascii-dump of the frame cache at:
```
## Offline registration
### ER13 Test
Let's run a small-scale test on a subset of ER13 data from a single instrument
to make sure everything works in the container.
First, create a **registration file** for a subset of ER13 for testing:
```
L-L1_HOFT_C00:
@@ -116,6 +121,88 @@ Check they show up in the database (again, use the singularity container):
+--------------------+--------------+
```
### O3: the real thing
Now repeat this exercise for the data we're really interested in: h(t) for
Hanford, Livingston and Virgo from the start of O3 until now. Once that initial
dataset is registered, we'll start the live registration in the next step.
Add a new scope:
```
rucio-admin scope add --account root --scope O3
Added new scope to account: O3-root
```
As before, set up a registration file. We have three datasets, one per frame
type. The `gwrucio_registrar` tool handles this with one section per dataset in
the registration file `O3-HOFT.yml`:
```
H-H1_HOFT_C00:
  scope: "O3"
  regexp: "H-H1_HOFT_C00"
  minimum-gps: 1238163456
  maximum-gps: 2000000000
  rse: LIGO-CIT-ARCHIVE
L-L1_HOFT_C00:
  scope: "O3"
  regexp: "L-L1_HOFT_C00"
  minimum-gps: 1238163456
  maximum-gps: 2000000000
  rse: LIGO-CIT-ARCHIVE
V-V1Online:
  scope: "O3"
  regexp: "V-V1Online"
  minimum-gps: 1238162000
  maximum-gps: 2000000000
  rse: LIGO-CIT-ARCHIVE
```
Where:
* `minimum-gps`: GPS start time of the first frame in each dataset already at LIGO-CIT
* `maximum-gps`: an arbitrarily large time to catch all data available up to
the time we execute the registration.
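To make the role of `regexp`, `minimum-gps` and `maximum-gps` concrete, here is a minimal Python sketch of how a frame file could be tested against one section of the registration file. It is not taken from `gwrucio_registrar`: the helper names `gps_to_utc` and `in_dataset`, the example file names, and the choice of an inclusive lower and exclusive upper bound are all assumptions made for illustration. It relies on the usual frame naming convention `<site>-<frametype>-<gps start>-<duration>.gwf`.

```python
import re
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS is ahead of UTC by 18 s, which holds for dates from 2017 onwards.
LEAP_SECONDS = 18

# Frame naming convention: <site>-<frametype>-<gps start>-<duration>.gwf
FRAME_NAME = re.compile(
    r"^(?P<site>[A-Z])-(?P<ftype>\w+)-(?P<start>\d+)-(?P<dur>\d+)\.gwf$"
)


def gps_to_utc(gps):
    """Convert a GPS time in seconds to a UTC datetime (hypothetical helper)."""
    return GPS_EPOCH + timedelta(seconds=gps - LEAP_SECONDS)


def in_dataset(filename, regexp, minimum_gps, maximum_gps):
    """Return True if the frame matches regexp and starts inside the GPS window.

    Hypothetical helper: whether the real tool treats the bounds as
    inclusive or exclusive is an assumption here.
    """
    match = FRAME_NAME.match(filename)
    if match is None or re.search(regexp, filename) is None:
        return False
    return minimum_gps <= int(match.group("start")) < maximum_gps


# Example file names are made up for illustration.
print(in_dataset("H-H1_HOFT_C00-1238163456-4096.gwf",
                 "H-H1_HOFT_C00", 1238163456, 2000000000))
print(in_dataset("H-H1_HOFT_C00-1238159360-4096.gwf",
                 "H-H1_HOFT_C00", 1238163456, 2000000000))
print(gps_to_utc(1238163456).isoformat())
```

Converting `minimum-gps` to UTC this way is a quick sanity check that the window starts where you expect before launching a long registration run.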
Then call `gwrucio_registrar` as before:
```
singularity exec --bind /archive ../gwrucio-latest.simg gwrucio_registrar -r O3-HOFT.yml daemon --run-once /home/jclark/Projects/rucio-O3/CNAF/diskcache/frame_cache_dump
2019-05-03 00:30:13,293 INFO Starting gwrucio_registrar as daemon
/usr/bin/gwrucio_registrar:188: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
rset = yaml.load(stream)
2019-05-03 00:30:13,296 INFO V-V1Online: reading diskcache [/home/jclark/Projects/rucio-O3/CNAF/diskcache/frame_cache_dump]
2019-05-03 00:30:13,299 INFO H-H1_HOFT_C00: reading diskcache [/home/jclark/Projects/rucio-O3/CNAF/diskcache/frame_cache_dump]
2019-05-03 00:30:13,301 INFO L-L1_HOFT_C00: reading diskcache [/home/jclark/Projects/rucio-O3/CNAF/diskcache/frame_cache_dump]
2019-05-03 00:30:13,303 INFO --------------------------------------------------
2019-05-03 00:30:13,303 INFO V-V1Online: looking for new data
2019-05-03 00:30:33,638 INFO 1370 new files to register
2019-05-03 00:30:33,639 INFO Computing file checksums
2019-05-03 00:32:18,326 INFO Time spent on checksums: 1.74 mins [104.69 s]
2019-05-03 00:32:20,465 INFO RSE write blacklisted, no replication rule
2019-05-03 00:32:20,525 INFO Registering files
2019-05-03 00:34:11,301 INFO Files registered
2019-05-03 00:34:11,305 INFO H-H1_HOFT_C00: looking for new data
2019-05-03 00:34:21,747 INFO 676 new files to register
2019-05-03 00:34:21,747 INFO Computing file checksums
2019-05-03 00:41:14,876 INFO Time spent on checksums: 6.89 mins [413.13 s]
2019-05-03 00:41:14,909 INFO RSE write blacklisted, no replication rule
2019-05-03 00:41:14,954 INFO Registering files
2019-05-03 00:42:01,274 INFO Files registered
2019-05-03 00:42:01,276 INFO L-L1_HOFT_C00: looking for new data
2019-05-03 00:42:11,161 INFO 674 new files to register
2019-05-03 00:42:11,162 INFO Computing file checksums
2019-05-03 00:55:20,747 INFO Time spent on checksums: 13.16 mins [789.59 s]
2019-05-03 00:55:20,846 INFO RSE write blacklisted, no replication rule
2019-05-03 00:55:20,893 INFO Registering files
2019-05-03 00:56:07,945 INFO Files registered
2019-05-03 00:56:07,947 INFO Total uptime: 1554.6533 sec.
```
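The log shows where the time goes. As a rough illustration, the following Python sketch totals the file counts and checksum times from the relevant lines of the output above. The `summarize` function and its regular expressions are written for this note and assume the log format stays exactly as shown; they are not part of `gwrucio_registrar`.

```python
import re

# Relevant lines copied from the run above.
LOG = """\
2019-05-03 00:30:33,638 INFO 1370 new files to register
2019-05-03 00:32:18,326 INFO Time spent on checksums: 1.74 mins [104.69 s]
2019-05-03 00:34:21,747 INFO 676 new files to register
2019-05-03 00:41:14,876 INFO Time spent on checksums: 6.89 mins [413.13 s]
2019-05-03 00:42:11,161 INFO 674 new files to register
2019-05-03 00:55:20,747 INFO Time spent on checksums: 13.16 mins [789.59 s]
2019-05-03 00:56:07,947 INFO Total uptime: 1554.6533 sec.
"""


def summarize(log_text):
    """Total the files found and the checksum time reported in a registrar log."""
    files = [int(n) for n in re.findall(r"INFO (\d+) new files to register", log_text)]
    checksums = [
        float(s)
        for s in re.findall(r"Time spent on checksums: .*\[([\d.]+) s\]", log_text)
    ]
    uptime = re.search(r"Total uptime: ([\d.]+) sec", log_text)
    return {
        "files": sum(files),
        "checksum_seconds": sum(checksums),
        "uptime_seconds": float(uptime.group(1)) if uptime else None,
    }


summary = summarize(LOG)
print(summary["files"], "files")
print(round(summary["checksum_seconds"], 2), "s spent on checksums")
print(round(summary["checksum_seconds"] / summary["uptime_seconds"], 2),
      "fraction of total uptime")
```

For this run that comes to 2720 files and about 1307 s of checksumming, roughly 84% of the 1554.65 s total uptime, so checksum computation dominates the run.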
### Batch Registration
In this case, the number of files to register is still manageable for a single
process. For larger datasets, one can instead split the registration process
into smaller sets of files and launch an HTCondor workflow using [`gwrucio_pipe`](https://git.ligo.org/james-clark/gwrucio/blob/master/bin/gwrucio_pipe).
An example will be provided elsewhere.
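In the meantime, the splitting step itself can be illustrated with a short Python sketch. This is not how `gwrucio_pipe` is implemented, and it does not generate any HTCondor submit files. The function `split_into_batches`, the batch size of 500, and the placeholder file names are hypothetical and only show the idea of dividing one long file list into independent chunks.

```python
def split_into_batches(paths, batch_size):
    """Split a list of file paths into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


# Placeholder file names; a real list would come from the diskcache dump.
all_files = [f"frame_{i:04d}.gwf" for i in range(2720)]
batches = split_into_batches(all_files, 500)

for index, batch in enumerate(batches):
    # Each batch could be handed to a separate registration job.
    print(f"batch {index}: {len(batch)} files")
```

Each batch is independent of the others, which is what makes it possible to hand them to separate jobs.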
## Online registration