Set random seeds for reproducibility
This merge request adds optional random_seed and random_state kwargs to help reproduce results when running the various batch and streaming processes with the same configuration.
Adding the random_seed kwarg to the [samples] section of the .INI file sets a random seed via numpy.random.seed for batch and streaming processes. The seed is reported in the log, e.g.:
2019-05-07 09:12:10,701 | test-train : INFO : setting random seed: 30
For example, running again with the same configuration will then produce the same random times.
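As a rough illustration of the behaviour described above (not the code in this merge request), the sketch below reads random_seed from a [samples] section and passes it to numpy.random.seed. The helper names seed_from_config and draw_random_times, the logger setup, and the config text are hypothetical; only the section name, the kwarg name, and the log message come from this description.

```python
import configparser
import logging

import numpy as np

logger = logging.getLogger("test-train")

# Hypothetical config fragment: only the [samples] section and the
# random_seed key are taken from the merge request description.
CONFIG_TEXT = """
[samples]
random_seed = 30
"""


def seed_from_config(text):
    """Seed NumPy's global RNG if [samples] defines random_seed."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # random_seed is optional, so fall back to None when it is absent.
    seed = config.getint("samples", "random_seed", fallback=None)
    if seed is not None:
        logger.info("setting random seed: %d", seed)
        np.random.seed(seed)
    return seed


def draw_random_times(n=5, start=0.0, end=100.0):
    """Stand-in for whatever step draws random times (assumed uniform)."""
    return np.random.uniform(start, end, size=n)


# Two "runs" with the same configuration draw identical times.
seed_from_config(CONFIG_TEXT)
first = draw_random_times()
seed_from_config(CONFIG_TEXT)
second = draw_random_times()
print(np.array_equal(first, second))
```

Because numpy.random.seed sets global state, any other code that draws from NumPy's global generator between the seed call and the sampling step will shift the sequence, so the seed has to be set before the first draw in each process.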
The random_state kwarg has been added to the scikit-learn classifiers, following their API convention. If set, it is applied globally to all classifiers in the sklearn Pipeline object, as well as to RandomizedSearchCV for hyperparameter searches. To set it, pass random_state in the relevant classifier section.
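A minimal sketch of what "set globally" could look like with plain scikit-learn, assuming a two-step pipeline and a small search space. The build_search helper, the choice of RandomForestClassifier, and the toy data are illustrative assumptions, not the classifiers or code in this merge request.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(random_state=None):
    """Hypothetical helper: one random_state for the pipeline and the search."""
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(n_estimators=10)),
    ])
    # Apply random_state to every pipeline step that exposes the parameter.
    for name, step in pipeline.steps:
        if "random_state" in step.get_params(deep=False):
            pipeline.set_params(**{f"{name}__random_state": random_state})
    # The same value also controls how the search samples hyperparameters.
    return RandomizedSearchCV(
        pipeline,
        param_distributions={"clf__max_depth": randint(2, 8)},
        n_iter=3,
        cv=3,
        random_state=random_state,
    )


# Toy data, for illustration only.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

run_a = build_search(random_state=30).fit(X, y)
run_b = build_search(random_state=30).fit(X, y)
print(run_a.best_params_ == run_b.best_params_)
```

With random_state left as None, both the sampled hyperparameters and the fitted forests can differ between runs, which is the situation the kwarg is meant to avoid.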
If you'd rather have random_state renamed to random_seed, that's fine by me too. I just did it to match their API.