KAGRA rsync finds extra flags and 2 files per day instead of 1
From a 2023.11.17 email from Robert Bruntz to Takahiro Yamamoto, titled "Two minor topics preventing publishing of KAGRA segments":
Publishing of KAGRA segments stopped automatically on Sept. 26, 2023, because of 2 minor issues with new KAGRA segment files that need some attention:
- Starting on Sept. 26, instead of there being exactly 6 new files to publish [1], there were 12 new files to publish [2] and a number of new files covering previous dates. The system that publishes KAGRA segments was originally expected to be replaced by a different publishing system using a different system of file generation, so it was intentionally made to be brittle and simply stop working if anything unusual happened, which is what happened when the unexpected segment files started showing up. Should all 12 of the flags listed in [1] and [2] be published every day? And if so, should all of the unpublished old segments for GRD_PEM_EARTHQUAKE_SEGMENT_UTC_[date].xml also be published? (Note that the issue of the new files for older dates (revised or corrected segments) is separate and is discussed in a separate email chain. Publishing of new files can continue independent of that topic.)
- Additionally, I noticed that the rsync system started pulling 2 xml files for each flag every day, with one being for the completed day and the other being for the new day, which had just started, with the one for the new day being replaced by rsync the following day. On investigating, I noticed that the second file is updated every 15 minutes, through the end of the day. Was the intention that the segments would be published every 15 minutes? If so, that will require some more significant changes to the publishing system, which we can implement, but it will take several steps and some time. In the meantime, would it be possible to either have the files that are updated every 15 minutes be created in a separate directory, outside /mnt/segments/, so that rsync doesn't see them, then copy them over to their directories in /mnt/segments/ after the files are completed for the day, or alternately go back to the system of generating a single file at the end of the day? The publishing system for KAGRA segments doesn't have checks to notice that it published a file that was later updated (e.g., if the filesystem got full, so some files only covered half of the day, and those were rsync'd at the end of the day, but they were later updated to cover the whole day), so partial-day files introduce the possibility of segments never being published, without anyone noticing. ...
[1] Those files are: K1-GRD_LOCKED_SEGMENT_UTC_[date].xml, K1-GRD_PEM_EARTHQUAKE_SEGMENT_UTC_[date].xml, K1-GRD_SCIENCE_MODE_SEGMENT_UTC_[date].xml, K1-GRD_UNLOCKED_SEGMENT_UTC_[date].xml, K1-OMC_OVERFLOW_OK_SEGMENT_UTC_[date].xml, and K1-OMC_OVERFLOW_VETO_SEGMENT_UTC_[date].xml.
Actually, all 6 files are expected, but only 5 of those files are published. GRD_PEM_EARTHQUAKE_SEGMENT_UTC_[date].xml is not published, because it was not part of the original list of segments requested to be published, and an email attempt to clarify this wasn't answered.
[2] The new files are: K1-DAQ-IPC_ERROR_SEGMENT_UTC_[date].xml, K1-DET_FRAME_AVAILABLE_SEGMENT_UTC_[date].xml, K1-ETMX_OVERFLOW_OK_SEGMENT_UTC_[date].xml, K1-ETMX_OVERFLOW_VETO_SEGMENT_UTC_[date].xml, K1-ETMY_OVERFLOW_OK_SEGMENT_UTC_[date].xml, and K1-ETMY_OVERFLOW_VETO_SEGMENT_UTC_[date].xml.
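The strict "stop if anything unusual shows up" behavior described in the email can be illustrated with a minimal sketch. This is not the actual publishing code: the function name `classify` is hypothetical, and the [date] part of the filename is assumed here to be YYYY-MM-DD, which the email does not specify. The flag names are taken from footnotes [1] and [2].

```python
import re

# Flags from footnote [1] that are currently published.
PUBLISHED_FLAGS = {
    "GRD_LOCKED", "GRD_SCIENCE_MODE", "GRD_UNLOCKED",
    "OMC_OVERFLOW_OK", "OMC_OVERFLOW_VETO",
}
# Expected by rsync but intentionally not published (see footnote [1]).
KNOWN_UNPUBLISHED_FLAGS = {"GRD_PEM_EARTHQUAKE"}

# ASSUMPTION: [date] is written as YYYY-MM-DD; adjust to the real format.
FILENAME_RE = re.compile(
    r"^K1-(?P<flag>.+)_SEGMENT_UTC_(?P<date>\d{4}-\d{2}-\d{2})\.xml$"
)

def classify(filenames):
    """Sort rsync'd XML filenames into publish / skip / unexpected."""
    result = {"publish": [], "skip": [], "unexpected": []}
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            result["unexpected"].append(name)
        elif match["flag"] in PUBLISHED_FLAGS:
            result["publish"].append(name)
        elif match["flag"] in KNOWN_UNPUBLISHED_FLAGS:
            result["skip"].append(name)
        else:
            # e.g. the new flags in footnote [2]
            result["unexpected"].append(name)
    return result
```

A brittle publisher would halt whenever `classify(...)["unexpected"]` is non-empty, which matches what happened when the files in [2] first appeared.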
Takahiro's reply:
- The six files in [1] have been provided since O3GK, so we would like to publish them. As for GRD_PEM_EARTHQUAKE_SEGMENT_UTC_[date].xml, we don't need to publish segments of this type that were created before the start of O4a (I'm not sure which is harder: publishing all existing XML files, including very old ones, or publishing only a limited period).
The reason I sent the e-mail this August is that we found a bug in the code that produces 5 of the 6 files in [1] (all except GRD_SCIENCE_MODE_SEGMENT_UTC_[date].xml). So we planned to replace these 5 XML files with corrected versions for the period of KAGRA’s O4a (1 month long). Also, I’m sorry for the confusion about the segments in [2]. They were defined after KAGRA’s O4a to evaluate O4a data, so eventually we would like to upload them, but they are still being confirmed. I thought they were still being written to another directory as a test; somehow we created them in the same directory as the production version of the segments.
- We can revert to the previous way of providing files, as you suggest (1-day cadence for the rsync process). Segment files are provided at a 15-minute cadence for KAGRA’s site commissioning. Of course, we thought they could also be used for future observing runs. But because KAGRA’s data haven’t been used for gravitational-wave searches yet, publishing KAGRA’s segments to DQSegDB at a 15-minute cadence may not be an urgent task. We should ask the LL working group or KAGRA DAC about the required cadence for KAGRA’s segments in O4b.
Note that as of 2023.12.26, text and XML files are still being created for the 6 additional flags in the same dirs as the original 6 flags, and KAGRA XML files are still being created at the start of the day and updated every 15 minutes, so instead of 12 new files when the daily rsync command runs, there are 48.
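Until the file generation changes, two safeguards on the publishing side could reduce the risk of partial-day files described in the email. The sketch below is only an illustration under stated assumptions, not part of the existing system: all function names are hypothetical, the file's date is assumed to have already been parsed from its name, and `published_digests` stands in for whatever record of published files the publisher would keep.

```python
import hashlib
from datetime import date

def is_completed_day(file_date: date, today_utc: date) -> bool:
    """A file is safe to publish only if its UTC day has already ended."""
    return file_date < today_utc

def content_digest(data: bytes) -> str:
    """SHA-256 of the file contents, recorded at publish time."""
    return hashlib.sha256(data).hexdigest()

def changed_after_publish(published_digests: dict, name: str, data: bytes) -> bool:
    """True if `name` was published earlier with different contents."""
    previous = published_digests.get(name)
    return previous is not None and previous != content_digest(data)
```

The first check skips the file for the day in progress (the one still being updated every 15 minutes); the second flags a file that was rewritten after it had been published, such as a half-day file later extended to cover the whole day.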