Encoding error with non-text data
Created by: SeanDS
When I try to download a PDF from the DCC, I get the following error (both Python 2 and 3):
>>> from ligo.org import request
>>> request("https://dcc.ligo.org/DocDB/0077/T1100616/002/T1100616-v2.pdf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ligo/org/ecp.py", line 330, in request
    return out.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 10: invalid continuation byte
It seems that request should not assume UTF-8 encoding for every response; at the very least, it should try to identify the encoding and handle it appropriately.
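One possible approach, as a rough sketch only: decode the body only when the response's Content-Type header says it is text or names a charset, and return the raw bytes otherwise. The helper name decode_body and its content_type argument are hypothetical and not part of ligo.org; I am also assuming the Content-Type header is available at the point where request currently calls out.decode('utf-8').

```python
from email.message import Message


def decode_body(raw, content_type=None):
    """Return text for textual responses and the raw bytes otherwise.

    Hypothetical helper, not part of ligo.org. `raw` is the response body
    as bytes and `content_type` is the value of the Content-Type header.
    """
    if not content_type:
        # No header at all: do not guess, hand back the bytes untouched.
        return raw

    # Let the standard library parse "type/subtype; charset=..." for us.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    is_text = msg.get_content_maintype() == "text"

    if not is_text and charset is None:
        # Binary payload such as application/pdf.
        return raw

    try:
        # Fall back to UTF-8 only when the server named no charset.
        return raw.decode(charset or "utf-8")
    except (UnicodeDecodeError, LookupError):
        # Mislabelled or unknown encoding: return bytes rather than raise.
        return raw
```

With something like this, a PDF served as application/pdf would come back as bytes, while a page served as "text/html; charset=ISO-8859-1" would be decoded with the charset the server declared. Whether request should return bytes or raise for undecodable text is a design choice for the maintainers.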