Sklearn discrete distributions + gradient boosted trees
This merge request serves two purposes:
- Allows hyperparameter tuning with discrete parameters under both types of cross-validation schemes. Previously this wasn't possible: randomized sampling didn't use discrete distributions, and grid-based searches built a regularly spaced grid that, unless one was very careful with the spacing, would not be integer spaced. The change checks whether the lower and upper bounds passed in are integers and, if so, samples discretely and builds an integer-spaced grid. To force a hyperparameter to sample from a continuous distribution even with integer-valued bounds, pass the bounds as floats, e.g. 1. or 1.0 (see the first sketch after this list).
- Adds a new sklearn-based classifier called `GradientBoostedTree()`. Its parameters are similar to a random forest's, but it uses gradient boosting over an ensemble of weak learners instead. It could provide better performance than a random forest; the idea came up on today's call, so I went ahead and wrote a wrapper for it (see the second sketch below). Tests pass on my end.
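
For the first point, here is a minimal sketch of the bound-checking logic. The helper name `make_distribution` and the specific distributions are assumptions for illustration, not the actual implementation; the idea is that integer bounds select discrete sampling while float bounds select continuous sampling for randomized search:

```python
from scipy.stats import randint, uniform

def make_distribution(low, high):
    """Hypothetical helper mirroring the bound check described above.

    Integer bounds yield a discrete (integer-valued) distribution;
    float bounds yield a continuous one.
    """
    if isinstance(low, int) and isinstance(high, int):
        # randint samples integers uniformly from [low, high),
        # so add 1 to make the upper bound inclusive
        return randint(low, high + 1)
    # uniform samples floats from [loc, loc + scale]
    return uniform(loc=low, scale=high - low)

# Integer bounds -> discrete sampling (e.g. for max_depth)
max_depth_dist = make_distribution(2, 10)

# Float bounds (note the trailing dot) -> continuous sampling
subsample_dist = make_distribution(0.5, 1.)
```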
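
And a minimal sketch of what the `GradientBoostedTree()` wrapper could look like. The constructor parameters shown here are assumed, not the actual signature; they simply forward the random-forest-style arguments to sklearn's `GradientBoostingClassifier`:

```python
from sklearn.ensemble import GradientBoostingClassifier

class GradientBoostedTree:
    """Hypothetical thin wrapper around sklearn's GradientBoostingClassifier."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        # Forward the random-forest-style parameters to the sklearn estimator
        self.model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_depth=max_depth,
        )

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)
```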