Investigate modifying ligolw_publish_threaded_dqxml_dqsegdb to accept https connections to ligolw_dtd.txt
Currently, if ligolw_publish_threaded_dqxml_dqsegdb tries to read http://ldas-sw.ligo.caltech.edu/doc/ligolwAPI/html/ligolw_dtd.txt and is redirected to an https connection, the read fails, like this:
Error: connection to host "ldas-sw.ligo.caltech.edu" failed for http URL "http://ldas-sw.ligo.caltech.edu/doc/ligolwAPI/html/ligolw_dtd.txt": Connection timed out
Traceback (most recent call last):
  File "/usr/bin/ligolw_publish_threaded_dqxml_dqsegdb", line 240, in <module>
    result=InsertMultipleDQXMLFileThreaded(infiles,logger,options.segment_url,hackDec11=False,debug=local_debug,threads=thread_count)
  File "/usr/lib/python3.6/site-packages/dqsegdb/apicalls.py", line 801, in InsertMultipleDQXMLFileThreaded
    segment_md = setupSegment_md(filename,xmlparser,lwtparser,debug)
  File "/usr/lib/python3.6/site-packages/dqsegdb/apicalls.py", line 738, in setupSegment_md
    segment_md.parse(xmltext)
  File "/usr/lib/python3.6/site-packages/dqsegdb/ldbd.py", line 286, in parse
    ligolwtup = self.xmlparser(xml.encode("utf-8"))
pyRXPU.error: Error: Couldn't open dtd entity http://ldas-sw.ligo.caltech.edu/doc/ligolwAPI/html/ligolw_dtd.txt
in unnamed entity at line 1 char 115 of [unknown]
Couldn't open dtd entity http://ldas-sw.ligo.caltech.edu/doc/ligolwAPI/html/ligolw_dtd.txt
Parse Failed!
Many existing files point specifically to the http URL, so that version needs to remain available. For resilience, though, it would be good if this tool could also fetch the file over https.
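
One possible direction, as a rough sketch only: fetch the DTD ourselves and fall back to the other scheme when the first attempt fails. The helper names candidate_urls and fetch_dtd are hypothetical and do not exist in dqsegdb. How the fetched text would be handed to the pyRXP parser is not shown here, since that depends on how dqsegdb sets the parser up.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def candidate_urls(url):
    """Return the URL as given, followed by its http/https counterpart."""
    parts = urlsplit(url)
    swap = {"http": "https", "https": "http"}
    urls = [url]
    if parts.scheme in swap:
        urls.append(urlunsplit(parts._replace(scheme=swap[parts.scheme])))
    return urls


def fetch_dtd(url, fetch=None, timeout=10):
    """Try each candidate URL in turn and return the first body that loads.

    `fetch` is a hypothetical hook (a callable taking a URL and returning
    bytes) so the fallback logic can be exercised without network access.
    """
    if fetch is None:
        def fetch(u):
            with urlopen(u, timeout=timeout) as resp:
                return resp.read()
    errors = []
    for candidate in candidate_urls(url):
        try:
            return fetch(candidate)
        except OSError as exc:
            # URLError and socket timeouts are both OSError subclasses.
            errors.append(f"{candidate}: {exc}")
    raise OSError("could not fetch DTD; tried " + "; ".join(errors))
```

With this approach the http URL embedded in existing files is still tried first, so nothing that works today would change; the https variant is only used when the http read fails.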