
Jsonify results

Merged: Sylvia Biscoveanu requested to merge sylvia.biscoveanu/bilby:jsonify_results into master
All threads resolved!

Following the discussion in #301 (closed), I've implemented default saving of the results object as a JSON file. The only functions I've changed are read_in_result, save_to_file, and the filename function. I have also included a flag: if save='hdf5' is passed in the arguments of run_sampler, the results will be saved in the standard HDF5 format.
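A minimal sketch of the format switch described above, assuming the results are held in a plain dictionary. The helper names (result_file_name_sketch, save_result_dict, load_result_dict) and the {label}_result.{extension} naming pattern are hypothetical illustrations, not bilby's actual code; the HDF5 branch assumes deepdish is installed and only imports it when that format is requested.

```python
import json
import os
import tempfile


def result_file_name_sketch(outdir, label, extension="json"):
    # Hypothetical naming pattern, for illustration only
    return os.path.join(outdir, "{}_result.{}".format(label, extension))


def save_result_dict(dictionary, outdir, label, extension="json"):
    """Save a results dictionary as JSON (default) or HDF5 (hypothetical helper)."""
    file_name = result_file_name_sketch(outdir, label, extension)
    if extension == "hdf5":
        import deepdish  # optional dependency, only needed for HDF5 output
        deepdish.io.save(file_name, dictionary)
    else:
        with open(file_name, "w") as f:
            json.dump(dictionary, f, indent=2)
    return file_name


def load_result_dict(file_name):
    """Pick the reader from the file extension (hypothetical helper)."""
    if file_name.endswith(".hdf5"):
        import deepdish
        return deepdish.io.load(file_name)
    with open(file_name) as f:
        return json.load(f)


# Usage: write and read back a tiny result with the default JSON format
with tempfile.TemporaryDirectory() as outdir:
    path = save_result_dict({"log_evidence": -248.49284903824}, outdir, "example")
    loaded = load_result_dict(path)
print(path, loaded)
```

Keeping the deepdish import inside the HDF5 branch means the default JSON path needs only the standard library.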

Activity

  • Yeah @colm.talbot, I'm fine with that. Do you want to keep the save='hdf5' flag in the run_sampler arguments that @gregory.ashton suggested, or add a format flag there too?

  • I think it makes sense for it to be save in that context.

    Edited by Colm Talbot
  • I'm easy with exactly how it is implemented (i.e. save/format etc).

    One suggestion - in the dump() method, we could add indent=2 (or some other integer). This "pretty-prints" the file meaning you can do

    $ cat outdir/result.h5 | grep 'log_evidence'
    "log_evidence": -248.49284903824

    to quickly see what is in the file (there may be a better way to parse JSON files from the command line?).

    This will however cause the file to be a little larger (by default everything is written on one line).
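A small sketch of the indent=2 trade-off using only the standard library. The dictionary is illustrative (only the log_evidence value comes from the example above). From the command line, `python -m json.tool file.json` is another standard way to pretty-print a JSON file that was written without indentation.

```python
import json

# Illustrative result dictionary; only log_evidence is taken from the example above
result = {
    "label": "example",
    "log_evidence": -248.49284903824,
    "posterior": {"x": [0.1, 0.2, 0.3]},
}

compact = json.dumps(result)           # everything on a single line
pretty = json.dumps(result, indent=2)  # one key (or list element) per line

# With indent=2 a line-based search finds the entry on its own line,
# which is what makes `grep 'log_evidence'` useful on the file
matches = [line.strip() for line in pretty.splitlines() if "log_evidence" in line]
print(matches)
print(len(compact), len(pretty))
```

Most of the extra size comes from the newline and indentation written before every list element, so files dominated by long sample lists grow the most.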

  • Sylvia Biscoveanu resolved all discussions

  • added 1 commit

    • 7c0def71 - Change the hdf5 flag to extension flag

  • added 1 commit

    • c97507a6 - Fix the unit tests with the extension flag

  • added 1 commit

  • added 1 commit

    • 6a2814e8 - Add indent for nicer parsing

  • Colm Talbot
  • Is it worth also having the option to save the JSON file as a gzipped file if requested?

  • If having a zipped option is going to significantly delay this, I would recommend leaving that as a separate issue and getting this in sooner rather than later.
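If the gzipped option is picked up later, one possible shape uses the standard library's gzip module in text mode so json can read and write it directly. The helper names and the .json.gz suffix are hypothetical, and this is not part of this merge request.

```python
import gzip
import json
import os
import tempfile


def save_json_gz(dictionary, file_name):
    """Write a dict as gzip-compressed JSON (hypothetical helper, not bilby API)."""
    with gzip.open(file_name, "wt", encoding="utf-8") as f:
        json.dump(dictionary, f, indent=2)


def load_json_gz(file_name):
    """Read back a dict written by save_json_gz."""
    with gzip.open(file_name, "rt", encoding="utf-8") as f:
        return json.load(f)


# Illustrative data: a scalar plus a longer list of samples
data = {"log_evidence": -248.49284903824, "samples": [0.1 * i for i in range(1000)]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json.gz")
    save_json_gz(data, path)
    loaded = load_json_gz(path)
    with open(path, "rb") as f:
        magic = f.read(2)  # first two bytes of any gzip file

print(loaded == data, magic)
```

gzip.open in "wt"/"rt" mode wraps the compressed stream in a text layer, so json.dump and json.load need no changes; the first two bytes of the file are the gzip magic number 0x1f 0x8b.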

  • Another thing that needs consideration when saving floats to an ASCII file is truncation of the numbers to a lower precision than they have in binary. This can be problematic for parameters with a very large dynamic range or an intrinsically small dynamic range (e.g. if you have a parameter with a Gaussian prior with a mean of one and a sigma of 1e-9, and the numbers are truncated to 9 decimal places or fewer, then you lose all information). How many decimal places are written to the JSON file for floats by default?

  • @matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and, as far as I can tell, there is no truncation with json.

    For example

    In [8]: x=np.random.normal(0, 1e-15, 100)
    In [9]: x[:3]
    Out[9]: array([-8.72958245e-17, -4.21664705e-16,  1.43150680e-15])
    In [10]: json.dump(dict(x=list(x)), open('test.json', 'w+'))
    In [12]: data_load = json.load(open('test.json', 'r'))
    In [16]: data_load['x'] == x
    In [17]: np.all(data_load['x'] == x)
    Out[17]: True

    The only real danger points are in the conversion to a list and the dump, but since the data are stored in "scientific" notation, e.g.

    $ cat test.json
    {"x": [-8.729582451048275e-17, -4.216647050876875e-16, 1.4315068008292956e-15, .... ]}

    you get as many digits in the mantissa as are stored in the list object itself:

    In [23]: x[0]
    Out[23]: -8.729582451048275e-17
    
    In [24]: list(x)[0]
    Out[24]: -8.729582451048275e-17

    So all in all I think the json use is quite safe with respect to this concern.
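A self-contained version of the check above, as a sketch. It uses with blocks so the file is closed (and flushed) before it is read back, rather than relying on the unreferenced file object being garbage-collected, and a seeded generator so it is reproducible. The underlying reason there is no loss is that the json module writes each float with float.__repr__, which since Python 3.1 produces the shortest string that rounds back to exactly the same double.

```python
import json
import os
import tempfile

import numpy as np

rng = np.random.default_rng(seed=1)  # seeded so the check is reproducible
x = rng.normal(0, 1e-15, 100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    # tolist() gives plain Python floats; json writes each one using float.__repr__
    with open(path, "w") as f:
        json.dump({"x": x.tolist()}, f)
    with open(path) as f:
        data_load = json.load(f)

# Compare element by element as float64 arrays
exact = np.array_equal(np.asarray(data_load["x"]), x)
# Every value's repr converts back to the identical double
repr_round_trips = all(float(repr(v)) == v for v in x.tolist())
print(exact, repr_round_trips)
```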

  • Gregory Ashton approved this merge request

  • Actually, I'm finding that it is currently failing with the message

    00:59 bilby ERROR   : 
    
     Saving the data has failed with the following message:
     Object of type 'complex' is not JSON serializable 

    Which is being caused by the key

        "L1_matched_filter_snr": {
          "0": 

    so I think we need to add a step to serialise complex numbers, or do we need to store the complex number at all? @colm.talbot any ideas?

  • So I think there is actually an easier way to handle the numpy array and complex issue: passing a custom encoder/decoder to json.

    For example, something like this

    diff --git a/bilby/core/result.py b/bilby/core/result.py
    index 3813e66..b37b271 100644
    --- a/bilby/core/result.py
    +++ b/bilby/core/result.py
    @@ -20,6 +20,15 @@ from .utils import (logger, infer_parameters_from_function,
     from .prior import Prior, PriorDict, DeltaFunction
     
     
    +class NumpyAndComplexEncoder(json.JSONEncoder):
    +    def default(self, obj):
    +        if isinstance(obj, np.ndarray):
    +            return obj.tolist()
    +        if isinstance(obj, complex):
    +            return (obj.real, obj.imag)
    +        return json.JSONEncoder.default(self, obj)
    +
    +
     def result_file_name(outdir, label, extension='json'):
         """ Returns the standard filename used for a result file
     
    @@ -410,7 +419,7 @@ class Result(object):
                 if extension == 'hdf5':
                     deepdish.io.save(file_name, dictionary)
                 else:
    -                json.dump(dictionary, open(file_name, 'w'), indent=2)
    +                json.dump(dictionary, open(file_name, 'w'), indent=2, cls=NumpyAndComplexEncoder)
             except Exception as e:
                 logger.error("\n\n Saving the data has failed with the "
                              "following message:\n {} \n\n".format(e))

    might work (I haven't tested reading it back in yet).

    Edited by Gregory Ashton
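The (real, imag) tuple in the diff is written as a plain two-element JSON list, so on reading it back nothing distinguishes it from any other pair of numbers. A hedged sketch of one way to close the loop: tag complex values with a marker key and rebuild them through json's object_hook. The "__complex__" marker and the decode_complex helper are hypothetical choices, not necessarily what bilby adopted. np.complex128 subclasses Python's complex, so the isinstance check covers those numpy scalars too (np.complex64 does not, and would need its own branch).

```python
import json

import numpy as np


class NumpyAndComplexEncoder(json.JSONEncoder):
    """Sketch: numpy arrays become lists, complex numbers become tagged dicts."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            # Elements of the resulting list are passed back through default()
            # if they are not JSON-native (e.g. complex entries)
            return obj.tolist()
        if isinstance(obj, complex):  # also matches np.complex128
            return {"__complex__": True, "real": obj.real, "imag": obj.imag}
        return json.JSONEncoder.default(self, obj)


def decode_complex(dct):
    """object_hook: turn tagged dicts back into complex numbers."""
    if dct.get("__complex__"):
        return complex(dct["real"], dct["imag"])
    return dct


# Hypothetical data shaped like the failing key in the log above
data = {
    "L1_matched_filter_snr": {"0": np.complex128(3.0 + 4.0j)},
    "complex_array": np.array([1 + 2j, 3 - 1j]),
    "real_array": np.array([1.0, 2.0]),
}

text = json.dumps(data, cls=NumpyAndComplexEncoder)
loaded = json.loads(text, object_hook=decode_complex)
print(loaded["L1_matched_filter_snr"]["0"], loaded["complex_array"])
```

The marker key costs a few extra characters per complex value; storing real and imaginary parts under separate top-level keys would also work, but either way the reader has to know how to reassemble them.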
  • > @matthew-pitkin this is a good point, and one @joshua.willis raised the other day at lunch. I checked this out and AFAI can tell, there is not truncation with json.

    @gregory.ashton I was slightly more concerned about, e.g., x=np.random.normal(1, 1e-15, 100), where the numbers can't be stored in scientific notation. But testing with this, as you did above, suggests that it's also not an issue: the output text is stored to 16 decimal places, which is the same precision as the binary floats anyway.
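A quick sketch of that mean-of-one case; the seed and sample count are arbitrary. With sigma = 1e-15 the spread is only a few multiples of machine epsilon (about 2.2e-16 at 1.0), so the draws are already quantised by the double format itself before any JSON is written; the check only confirms that JSON adds no further loss.

```python
import json

import numpy as np

eps = np.finfo(np.float64).eps  # gap between 1.0 and the next larger double
rng = np.random.default_rng(seed=2)  # seeded so the check is reproducible
x = rng.normal(1, 1e-15, 100)  # spread of only a few eps around 1.0

values = x.tolist()
text = json.dumps(values)

# json.dumps writes every float with float.__repr__, i.e. the shortest
# decimal string that converts back to exactly the same double
same_as_repr = text == "[" + ", ".join(repr(v) for v in values) + "]"
round_trip = np.array_equal(np.asarray(json.loads(text)), x)
print(eps, same_as_repr, round_trip)
```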

  • Gregory Ashton mentioned in merge request !382 (merged)

  • Moritz Huebner approved this merge request

  • Moritz Huebner resolved all discussions

  • Moritz Huebner mentioned in commit c6c95161
