DiscreteCalibrationMap
OVL (and friends) will produce a finite set of ranks, so the assumptions behind CalibrationMap and FixedBandwidth1DKDE are not a perfect match. I propose we make OVL (and friends) use discrete versions instead, which should support the same functionality as the existing ones so that everything else keeps working.
The basic idea is to replace the FixedBandwidth1DKDE with something that wraps collections.defaultdict(int) for the pdf. We then keep track of an overall count of samples and compute the pdf and cdf on the fly by summing and dividing as needed. Variance estimates are easy (we just have binomial error estimates and slap those into the beta distributions). If we do this right, it should be pretty easy to slot this in place as needed.
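A rough sketch of what the wrapper could look like, just to make the idea concrete. The class name DiscreteKDE and all of its method names are hypothetical and do not mirror the actual FixedBandwidth1DKDE interface. The beta parameters are obtained by moment-matching to the binomial variance, which is one plausible reading of the approach described above rather than a settled choice.

```python
from collections import defaultdict


class DiscreteKDE:
    """Hypothetical discrete stand-in for FixedBandwidth1DKDE (names are assumptions)."""

    def __init__(self):
        self.counts = defaultdict(int)  # rank -> number of samples seen at that rank
        self.total = 0                  # overall number of samples

    def add(self, rank, n=1):
        """Record n samples observed at the given rank."""
        self.counts[rank] += n
        self.total += n

    def pdf(self, rank):
        """Fraction of samples at exactly this rank (a probability mass)."""
        if self.total == 0:
            return 0.0
        # .get avoids creating empty entries in the defaultdict on lookup
        return self.counts.get(rank, 0) / self.total

    def cdf(self, rank):
        """Fraction of samples at or below this rank, summed on the fly."""
        if self.total == 0:
            return 0.0
        below = sum(c for r, c in self.counts.items() if r <= rank)
        return below / self.total

    def cdf_variance(self, rank):
        """Binomial variance estimate of the cdf: p * (1 - p) / N."""
        if self.total == 0:
            return 0.0
        p = self.cdf(rank)
        return p * (1.0 - p) / self.total

    def beta_params(self, rank):
        """Beta (alpha, beta) moment-matched to the cdf mean and binomial variance.

        With var = p(1-p)/N this reduces to alpha = p(N-1), beta = (1-p)(N-1).
        Either parameter can be zero when p is 0 or 1 or when N <= 1, so
        callers would need to guard against those degenerate cases.
        """
        p = self.cdf(rank)
        nu = max(self.total - 1, 0)
        return p * nu, (1.0 - p) * nu
```

For example, after adding the ranks 0.25, 0.25, 0.5 and 1.0, pdf(0.25) is 2/4 and cdf(0.5) is 3/4. Summing on every cdf call is O(number of distinct ranks), which should be fine while the set of ranks stays small; a cached cumulative sum could be swapped in later if that becomes a bottleneck.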
We will need to either modify CalibrationMapReporter or add a DiscreteCalibrationMapReporter because of the different underlying data structures. We may just want to write something like the quiver helper functions, but for the KDE objects, that is smart enough to know which is which and slot things into place accordingly. However, it may be simpler to just write a separate Reporter and use a PickleReporter for runs that use Classifiers that use different types of CalibrationMaps.
We should not pursue this further until we can confirm that the mis-calibration issues seen in OVL batch jobs are indeed due to the finite minb used and that decreasing minb (and increasing the number of ranks used) helps to resolve the problem.