Set random seeds for reproducibility

Patrick Godwin requested to merge random_seeds into master

This merge request adds optional random_seed and random_state kwargs to help reproduce results when running the various batch and streaming processes with the same configuration.

Adding the random_seed kwarg to the [samples] section of the .INI file sets a random seed via numpy.random.seed for batch and streaming processes and logs it, e.g.:

2019-05-07 09:12:10,701 | test-train : INFO : setting random seed: 30

For example, this will produce the same random times when run again with the same configuration.
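
A minimal sketch of the seeding behaviour described above, not the actual code in this merge request. It assumes the option is read with configparser; the helper names seed_from_config and draw_random_times, the config text, and the time range are all hypothetical. Only the [samples] section, the random_seed option, and the numpy.random.seed call come from the description.

```python
import configparser
import logging

import numpy as np

logger = logging.getLogger("test-train")

# Hypothetical config text; only [samples] and random_seed come from the description.
CONFIG_TEXT = """
[samples]
random_seed = 30
"""


def seed_from_config(config_text):
    """Seed numpy's global RNG if random_seed is set in [samples].

    Returns the seed that was applied, or None if the option is absent.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    seed = config.getint("samples", "random_seed", fallback=None)
    if seed is not None:
        logger.info("setting random seed: %d", seed)
        np.random.seed(seed)
    return seed


def draw_random_times(n, start=0.0, stop=100.0):
    """Hypothetical stand-in for a process that picks random times."""
    return np.random.uniform(start, stop, size=n)


# Two runs with the same configuration draw the same random times.
seed_from_config(CONFIG_TEXT)
first = draw_random_times(5)
seed_from_config(CONFIG_TEXT)
second = draw_random_times(5)
same_times = np.array_equal(first, second)
```

Because the kwarg is optional, a configuration without random_seed leaves the global RNG untouched and runs stay non-deterministic.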

The random_state kwarg has been added to the scikit-learn classifiers, following their API convention. If set, it is applied globally to all classifiers in the sklearn Pipeline object as well as to RandomizedSearchCV for hyperparameter searches. To set it, pass random_state in the relevant classifier section.
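
As a rough illustration of the scikit-learn convention referred to above, the sketch below passes one random_state value both to a classifier inside a Pipeline and to RandomizedSearchCV. The build_search helper, the choice of RandomForestClassifier, the parameter grid, and the synthetic data are assumptions made for the example, not the classifiers or wiring used in this merge request.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 30  # hypothetical value, as it might appear in a classifier section


def build_search(random_state):
    """Build a hyperparameter search that uses one random_state throughout."""
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # The classifier's own randomness (bootstrap, feature sampling) is fixed here.
        ("clf", RandomForestClassifier(n_estimators=10, random_state=random_state)),
    ])
    param_distributions = {
        "clf__max_depth": [2, 4, 8, None],
        "clf__min_samples_leaf": [1, 2, 4],
    }
    # The same value fixes which parameter combinations the search samples.
    return RandomizedSearchCV(
        pipeline,
        param_distributions,
        n_iter=4,
        cv=3,
        random_state=random_state,
    )


# Synthetic data, generated with a fixed seed so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

first = build_search(RANDOM_STATE).fit(X, y)
second = build_search(RANDOM_STATE).fit(X, y)

# Both searches select the same parameters and yield the same predictions.
same_params = first.best_params_ == second.best_params_
same_probs = np.array_equal(first.predict_proba(X), second.predict_proba(X))
```

Leaving random_state as None in either place makes the sampled candidates or the fitted model vary between runs, which is why the value needs to reach both the classifier and the search.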

If you'd rather have random_state renamed to random_seed, that's fine by me too. I just did it to match their API.
