Jsonify results

Yeah @colm.talbot I'm fine with that. Do you want to keep the save='hdf5' flag that @gregory.ashton suggested as an argument to run_sampler, or add a format flag there too?

I think it makes sense for it to be save in that context.

I'm easy with exactly how it is implemented (i.e. save/format etc).

One suggestion: in the dump() method, we could add indent=2 (or some other integer). This "pretty-prints" the file, meaning you can do

$ cat outdir/result.h5 | grep 'log_evidence'
"log_evidence": -248.49284903824

to quickly see what is in the file (there may be a better way to parse JSON files from the command line?).

This will however cause the file to be a little larger (by default everything is written on one line).
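
To make the trade-off above concrete, here is a minimal sketch using only the standard library. The dictionary contents are made up for the example and are not a real bilby result.

```python
import json

# Made-up stand-in for a result dictionary (not real bilby output).
result = {
    "label": "example",
    "log_evidence": -248.49284903824,
    "posterior": {"mass": [1.0, 1.1, 1.2]},
}

compact = json.dumps(result)           # default: everything on one line
pretty = json.dumps(result, indent=2)  # one key or list item per line

# The indented form is larger, but each key sits on its own line,
# which is what makes a line-based tool such as grep useful.
print(len(compact), len(pretty))
matching = [line.strip() for line in pretty.splitlines() if "log_evidence" in line]
print(matching)

# Both forms load back to the same dictionary.
assert json.loads(compact) == json.loads(pretty) == result
```

On the command line, `python -m json.tool <file>` pretty-prints an existing JSON file, so a compact file can still be inspected after the fact.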

resolved all discussions

added 1 commit

7c0def71 - Change the hdf5 flag to extension flag

added 1 commit

c97507a6 - Fix the unit tests with the extension flag

added 1 commit

dd2fb0ba - Fix flake8 errors

added 1 commit

6a2814e8 - Add indent for nicer parsing

Is it worth also having the option to save the JSON file as a gzipped file if requested?

If having a zipped option is going to significantly delay this, I would recommend leaving that as a separate issue and getting this in sooner rather than later.
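
For reference, a gzipped option need not be much code. The following is only a sketch under assumptions: `save_json`, `load_json` and the `gzipped` argument are hypothetical names for illustration, not part of bilby, and the data are made up.

```python
import gzip
import json
import os
import tempfile


def save_json(dictionary, file_name, gzipped=False):
    """Write a dictionary as JSON, optionally gzip-compressed (hypothetical helper)."""
    opener = gzip.open if gzipped else open
    # "wt" selects text mode for both the built-in open and gzip.open.
    with opener(file_name, "wt", encoding="utf-8") as handle:
        json.dump(dictionary, handle, indent=2)


def load_json(file_name, gzipped=False):
    """Read back a file written by save_json (hypothetical helper)."""
    opener = gzip.open if gzipped else open
    with opener(file_name, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Made-up data standing in for a result dictionary.
data = {"log_evidence": -248.49284903824, "samples": list(range(1000))}

with tempfile.TemporaryDirectory() as outdir:
    plain = os.path.join(outdir, "result.json")
    zipped = os.path.join(outdir, "result.json.gz")
    save_json(data, plain)
    save_json(data, zipped, gzipped=True)
    sizes = (os.path.getsize(plain), os.path.getsize(zipped))
    restored = load_json(zipped, gzipped=True)

print(sizes)
```

The catch is that a gzipped file can no longer be grepped directly, so it partly undoes the benefit of the indent option unless the file is decompressed first.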

Another thing that needs consideration when saving floats to an ASCII file is truncation of the numbers at a lower precision than they have in binary. This can be problematic for parameters with a very large dynamic range or an intrinsically small dynamic range (e.g. if you have a parameter that has a Gaussian prior with a mean of one and a sigma of 1e-9, and the numbers are truncated after 9 dp or fewer, then you lose all information). What is the default number of dp output into the json file for floats?

@matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and, as far as I can tell, there is no truncation with json.

For example

In [8]: x=np.random.normal(0, 1e-15, 100)
In [9]: x[:3]
Out[9]: array([-8.72958245e-17, -4.21664705e-16,  1.43150680e-15])
In [10]: json.dump(dict(x=list(x)), open('test.json', 'w+'))
In [12]: data_load = json.load(open('test.json', 'r'))
In [16]: data_load['x'] == x
In [17]: np.all(data_load['x'] == x)
Out[17]: True

The only real danger points are in the conversion to a list and the dump, but since the data are stored in "scientific" notation, e.g.

$ cat test.json
{"x": [-8.729582451048275e-17, -4.216647050876875e-16, 1.4315068008292956e-15, .... ]}

you get as many digits in the mantissa as are stored in the list object itself

In [23]: x[0]
Out[23]: -8.729582451048275e-17

In [24]: list(x)[0]
Out[24]: -8.729582451048275e-17

So all in all I think the json use is quite safe with respect to this concern.

approved this merge request

Actually, I'm finding that it is currently failing with the message

00:59 bilby ERROR   : 

 Saving the data has failed with the following message:
 Object of type 'complex' is not JSON serializable

Which is being caused by the key

    "L1_matched_filter_snr": {
      "0":

so I think we either need to add a step to serialise a complex number, or decide whether we need to store the complex number at all. @colm.talbot any ideas?

So I think there is actually an easier way to handle the numpy array and complex issue, by giving an encoder/decoder.

For example, something like this

diff --git a/bilby/core/result.py b/bilby/core/result.py
index 3813e66..b37b271 100644
--- a/bilby/core/result.py
+++ b/bilby/core/result.py
@@ -20,6 +20,15 @@ from .utils import (logger, infer_parameters_from_function,
 from .prior import Prior, PriorDict, DeltaFunction
 
 
+class NumpyAndComplexEncoder(json.JSONEncoder):
+    def default(self, obj):
+        if isinstance(obj, np.ndarray):
+            return obj.tolist()
+        if isinstance(obj, complex):
+            return (obj.real, obj.imag)
+        return json.JSONEncoder.default(self, obj)
+
+
 def result_file_name(outdir, label, extension='json'):
     """ Returns the standard filename used for a result file
 
@@ -410,7 +419,7 @@ class Result(object):
             if extension == 'hdf5':
                 deepdish.io.save(file_name, dictionary)
             else:
-                json.dump(dictionary, open(file_name, 'w'), indent=2)
+                json.dump(dictionary, open(file_name, 'w'), indent=2, cls=NumpyAndComplexEncoder)
         except Exception as e:
             logger.error("\n\n Saving the data has failed with the "
                          "following message:\n {} \n\n".format(e))

might work (I haven't tested reading it back in yet)
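
On the reading-back side: JSON has no complex or tuple type, so the (real, imag) pair written by the encoder in the diff above would come back as a plain two-element list, indistinguishable from any other list of two floats. One possible way around this, sketched below with the standard library only, is to write a tagged dictionary and rebuild it with an `object_hook`. The names `ComplexEncoder` and `decode_complex` and the `"__complex__"` tag are assumptions made for this example, and the SNR values are made up; numpy arrays could be handled in the same `default` method via `tolist()` as in the diff.

```python
import json


class ComplexEncoder(json.JSONEncoder):
    """Hypothetical encoder: writes complex numbers as a tagged dictionary."""

    def default(self, obj):
        if isinstance(obj, complex):
            return {"__complex__": True, "real": obj.real, "imag": obj.imag}
        # Anything else falls through to the usual "not serializable" error.
        return super().default(obj)


def decode_complex(dct):
    """object_hook for json.load(s): rebuilds tagged complex numbers."""
    if dct.get("__complex__") is True:
        return complex(dct["real"], dct["imag"])
    return dct


# Made-up values; only the key name is taken from the error report above.
data = {"L1_matched_filter_snr": [3.5 - 1.25j, 0.5 + 2j], "pair": [1.0, 2.0]}

text = json.dumps(data, cls=ComplexEncoder, indent=2)
restored = json.loads(text, object_hook=decode_complex)
print(restored)
```

The tag costs a few extra bytes per number, but it keeps a genuine two-element list such as "pair" distinct from a complex value when loading.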

> @matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and, as far as I can tell, there is no truncation with json.

@gregory.ashton I was slightly more concerned about, e.g., x=np.random.normal(1, 1e-15, 100), where the small variations can't be captured by the exponent in scientific notation. But testing with this, as you have above, suggests that it's also not an issue: the output text is stored to 16 dp, which is the same precision as the binary floats anyway.
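
A quick way to check the mean-one case without numpy is sketched below. It relies on the fact that Python's json module writes floats using their repr, which is the shortest string that round-trips exactly. The 6 dp truncation is included only for contrast and is an arbitrary choice for this example.

```python
import json
import random

random.seed(0)

# Values clustered very tightly around 1, as in the Gaussian-prior example
# raised earlier in the thread (mean 1, sigma 1e-9).
x = [random.gauss(1.0, 1e-9) for _ in range(100)]

# Round trip through JSON text: the floats come back bit-for-bit identical.
restored = json.loads(json.dumps(x))
round_trip_exact = restored == x

# For contrast: explicitly truncating to 6 decimal places discards the spread,
# which is the failure mode the original concern was about.
truncated = [float("{:.6f}".format(value)) for value in x]
distinct_after_truncation = len(set(truncated))

print(round_trip_exact, len(set(x)), distinct_after_truncation)
```

So the information is only lost if something formats the numbers to a fixed number of decimal places before they reach json, which json.dump does not do by itself.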

mentioned in merge request !382 (merged)

approved this merge request

resolved all discussions

merged

mentioned in commit c6c95161

mentioned in issue pesummary#91 (closed)

Merged by Moritz Huebner 6 years ago (Feb 25, 2019 11:33pm UTC)
