Saving data
A wish-list for things to change in how we save data. This should probably be at most v 0.2.
- Ensure that the h5 file does not just pickle the data (see #73 (closed), and maybe #72 (closed)). Currently we just dump the Result() object, which deepdish does not know how to save, so it just pickles it. We need to rewrite it as a dictionary.
- Reduce the saved data file size by thinking about what we want to save. For example, we currently save the samples twice: once in an array and once in a data frame.
- Maybe separate the output: save the samples in a separate data frame from the Result() (which contains the logz and details of the run). The upside is that this might reduce the file sizes and allow quick concatenation of samples. The downside is that samples can get separated from information about how they were produced.
- Add some option to save as a text file for when people inevitably can't handle h5 files.
- Add a help tutorial for the saved data, noting things like how we save the data and tools such as ddls (see discussion in here), which can be used to quickly check what data is saved.
- Add labels to the saved prior.txt file and implement loading such a file (not sure if this last part is already done?).
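To make the dictionary rewrite and the text-file option a bit more concrete, here is a minimal sketch using only the standard library. The Result class below is a hypothetical stand-in, and its attribute names (label, logz, parameter_keys, samples) are assumptions rather than the real object's layout. The idea is just that a dictionary of basic types can be stored without falling back to pickling; such a dictionary could presumably then be passed to something like deepdish.io.save(path, data), which is not called here.

```python
import json


class Result:
    """Hypothetical stand-in for the Result() object (attribute names are assumptions)."""

    def __init__(self, label, logz, parameter_keys, samples):
        self.label = label
        self.logz = logz
        self.parameter_keys = parameter_keys
        self.samples = samples  # list of rows, one value per parameter


def result_to_dict(result):
    """Return a plain dictionary holding only basic types (str, float, list)."""
    return {
        "label": str(result.label),
        "logz": float(result.logz),
        "parameter_keys": list(result.parameter_keys),
        "samples": [[float(x) for x in row] for row in result.samples],
    }


def format_samples_as_text(data):
    """Format the samples as space-delimited text with a header row of labels."""
    lines = [" ".join(data["parameter_keys"])]
    for row in data["samples"]:
        lines.append(" ".join(repr(x) for x in row))
    return "\n".join(lines) + "\n"


result = Result("example", -12.5, ["a", "b"], [[1.0, 2.0], [3.0, 4.0]])
data = result_to_dict(result)

# A dictionary of basic types serialises directly, with no pickling involved.
encoded = json.dumps(data)

# Text fallback for people who cannot read h5 files.
text = format_samples_as_text(data)
```

Keeping the samples as a separate entry in the dictionary would also make it easy to write them to their own file later, if we decide to split the output.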
Feel free to add other things as well. I'm not actively working on any of this as I don't think it is urgent (if you use the same version of Python it's fine), but it's good to gather everything together in one place.
Edited by Gregory Ashton