Jsonify results
Following the discussion in #301 (closed), I've implemented default saving of the results object as a JSON file. The only functions I've changed are `read_in_result`, `save_to_file`, and the `filename` function. I have also included a flag so that if `save='hdf5'` is passed in the arguments of `run_sampler`, the results will be saved in the standard HDF5 format instead.
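As a rough sketch of the intended interface (the likelihood, prior, and data below are just a self-contained stand-in; the only point here is the `save` keyword described above):

```python
import numpy as np
import bilby


def model(x, mu):
    # Trivial constant model; only here to make the example self-contained.
    return mu + 0 * x


# Stand-in data, likelihood, and prior so run_sampler has something to work on.
x = np.linspace(0, 1, 100)
y = np.random.normal(0, 1, 100)
likelihood = bilby.core.likelihood.GaussianLikelihood(x, y, model, sigma=1)
priors = dict(mu=bilby.core.prior.Uniform(-5, 5, 'mu'))

# By default the result is now written as JSON; passing save='hdf5'
# keeps the previous deepdish/HDF5 output.
result = bilby.run_sampler(
    likelihood=likelihood, priors=priors, sampler='dynesty', nlive=100,
    outdir='outdir', label='jsonify_example', save='hdf5')
```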
- Automatically resolved by Sylvia Biscoveanu
Yeah @colm.talbot I'm fine with that. Do you want to keep the `save='hdf5'` flag that @gregory.ashton suggested in the `run_sampler` argument, or add a `format` flag there too?

I think it makes sense for it to be `save` in that context.

Edited by Colm Talbot

I'm easy with exactly how it is implemented (i.e. save/format etc).
One suggestion: in the `dump()` method, we could add `indent=2` (or some other integer). This "pretty-prints" the file, meaning you can do

```
$ cat outdir/result.h5 | grep 'log_evidence'
    "log_evidence": -248.49284903824
```

to quickly see what is in the file (there may be a better way to parse JSON files from the command line?). This will however make the file a little larger (by default everything is written on one line).
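For concreteness, a minimal sketch of what that suggestion amounts to (the dictionary here is just a stand-in for the full result contents):

```python
import json

# Stand-in for the dictionary built up when saving the result.
dictionary = {"label": "example", "log_evidence": -248.49284903824}

# indent=2 writes one key per line, so grep-style inspection works;
# without it json.dump puts everything on a single line.
with open('result.json', 'w') as ff:
    json.dump(dictionary, ff, indent=2)
```

After which `grep 'log_evidence' result.json` prints the relevant line directly.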
- Resolved by Moritz Huebner
Another thing that needs consideration when saving floats to an ASCII file is truncation of the numbers at a lower precision than they have in binary. This can be problematic for parameters with a very large dynamic range or an intrinsically small dynamic range (e.g. if you have a parameter with a Gaussian prior with a mean of one and a sigma of 1e-9, and the numbers are truncated after 9 decimal places or fewer, then you lose all information). What is the default number of decimal places output into the JSON file for floats?
@matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and, as far as I can tell, there is no truncation with json.
For example

```python
In [8]: x = np.random.normal(0, 1e-15, 100)

In [9]: x[:3]
Out[9]: array([-8.72958245e-17, -4.21664705e-16,  1.43150680e-15])

In [10]: json.dump(dict(x=list(x)), open('test.json', 'w+'))

In [12]: data_load = json.load(open('test.json', 'r'))

In [16]: data_load['x'] == x

In [17]: np.all(data_load['x'] == x)
Out[17]: True
```
The only real danger points are in the conversion to a `list` and the `dump`, but since the data are stored in "scientific" notation, e.g.

```
$ cat test.json
{"x": [-8.729582451048275e-17, -4.216647050876875e-16, 1.4315068008292956e-15, .... ]}
```

you get as many digits in the mantissa as are stored in the `list` object itself:

```python
In [23]: x[0]
Out[23]: -8.729582451048275e-17

In [24]: list(x)[0]
Out[24]: -8.729582451048275e-17
```
So all in all I think the `json` use is quite safe with respect to this concern.

Actually, I'm finding that it is currently failing with the message

```
00:59 bilby ERROR : Saving the data has failed with the following message: Object of type 'complex' is not JSON serializable
```

which is caused by the key

```
"L1_matched_filter_snr": { "0":
```

So I think we either need to add a step to serialise the complex number, or decide whether we need to store the complex number at all. @colm.talbot any ideas?
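For reference, the failure is straightforward to reproduce outside bilby (a minimal sketch; the key name and value here are just illustrative, taken from the error above):

```python
import json

# The standard-library json module has no default encoding for complex
# numbers, so this raises the same TypeError reported in the log above.
try:
    json.dumps({"L1_matched_filter_snr": complex(3.2, 0.1)})
except TypeError as error:
    print(error)  # e.g. Object of type complex is not JSON serializable
```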
So I think there is actually an easier way to handle the numpy array and complex issue, by providing an encoder/decoder.

For example, something like this

```diff
diff --git a/bilby/core/result.py b/bilby/core/result.py
index 3813e66..b37b271 100644
--- a/bilby/core/result.py
+++ b/bilby/core/result.py
@@ -20,6 +20,15 @@ from .utils import (logger, infer_parameters_from_function,
 from .prior import Prior, PriorDict, DeltaFunction
 
 
+class NumpyAndComplexEncoder(json.JSONEncoder):
+    def default(self, obj):
+        if isinstance(obj, np.ndarray):
+            return obj.tolist()
+        if isinstance(obj, complex):
+            return (obj.real, obj.imag)
+        return json.JSONEncoder.default(self, obj)
+
+
 def result_file_name(outdir, label, extension='json'):
     """ Returns the standard filename used for a result file
 
@@ -410,7 +419,7 @@ class Result(object):
             if extension == 'hdf5':
                 deepdish.io.save(file_name, dictionary)
             else:
-                json.dump(dictionary, open(file_name, 'w'), indent=2)
+                json.dump(dictionary, open(file_name, 'w'), indent=2, cls=NumpyAndComplexEncoder)
         except Exception as e:
             logger.error("\n\n Saving the data has failed with the "
                          "following message:\n {} \n\n".format(e))
```

might work (I haven't tested reading it back in yet).
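For the read-back step, one possible counterpart (purely a sketch, not part of the diff above; `decode_complex` is a hypothetical helper) would be to undo the `(real, imag)` pairs on load:

```python
import json
import numpy as np


class NumpyAndComplexEncoder(json.JSONEncoder):
    """Encoder from the diff above: arrays -> lists, complex -> (real, imag)."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, complex):
            return (obj.real, obj.imag)
        return json.JSONEncoder.default(self, obj)


def decode_complex(value):
    """Hypothetical helper: turn a two-element [real, imag] list back into complex."""
    if isinstance(value, list) and len(value) == 2 and all(
            isinstance(v, float) for v in value):
        return complex(value[0], value[1])
    return value


# Round-trip check with illustrative values only.
data = dict(snr=3.2 + 0.1j, samples=np.random.normal(0, 1, 5))
dumped = json.dumps(data, cls=NumpyAndComplexEncoder, indent=2)
loaded = json.loads(dumped)
print(decode_complex(loaded['snr']))   # (3.2+0.1j)
print(np.array(loaded['samples']))     # back to an array if needed
```

One caveat with this encoding is that a genuine two-element list of floats is indistinguishable on disk from a complex number, so a tagged representation (e.g. a small dict with an explicit marker key) might be safer.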
Edited by Gregory Ashton

> @matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and, as far as I can tell, there is no truncation with json.
@gregory.ashton I was slightly more concerned about, e.g., `x = np.random.normal(1, 1e-15, 100)`, where the numbers can't be stored in scientific notation. But testing with this, as you have above, suggests that it's also not an issue: the output text is stored to 16 decimal places, which is the same precision as the binary floats anyway.

mentioned in merge request !382 (merged)
mentioned in commit c6c95161
mentioned in issue pesummary#91 (closed)