calibration development (attempt 2)
Note: this merge should be morally equivalent to !12 (closed), but with a cleaner git log.
This merge will bring in the recent developments within calibration. In particular, FixedBandwidth1DKDE has been shown to scale reasonably well to large numbers of samples and to provide faithful error estimates for the value of the KDE (the pdf, not the cdf).
I've attached 2 files showing the coverage behavior of the error estimates for 2 different "known distributions": a beta distribution and a normal distribution.
beta distribution (alpha=2, beta=5)
normal distribution (mean=0.5, stdv=0.1)
Such plots can be generated relatively quickly with the included sanitycheck_calibrate-coverage via
sanitycheck_calibrate-coverage -v --size 100 --Ntrial 50 --num_points 101
Importantly, they show that the KDE representation matches the expectation from the true pdf (left-most panels) and that the error estimates have diagonal coverage plots. In particular, we show coverage plots for pdf(rank) for each rank separately as red lines in the two right panels. The saturation of these curves is controlled by the value of the pdf at that rank, so curves for ranks that are "visited more often" are darker and curves for ranks that are "visited less often" are lighter. The blue line averages the coverage over all ranks, weighting the coverage for each rank by pdf(rank). We see that the blue and red coverage estimates typically stay within the expected 1-sigma error region for the coverage plot (grey shaded region), which is based on simplistic binomial error estimates for a cumulative histogram.
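For anyone who wants to poke at the idea without running the full script, here is a minimal sketch of what a coverage check for a single rank could look like. It is not the code in this merge: kde_with_error and coverage are hypothetical helpers, the bandwidth value is made up, and the error estimate (standard error of the mean of the kernel values) is an assumption that may differ from what FixedBandwidth1DKDE actually does. The reference value is the parent pdf convolved with the kernel, so smoothing bias is deliberately left out.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def kde_with_error(samples, x, bandwidth):
    """Fixed-bandwidth Gaussian KDE at x and an assumed standard-error estimate."""
    kernel = NormalDist(0.0, bandwidth)
    values = [kernel.pdf(x - s) for s in samples]
    # The KDE is the mean of the kernel values; treat its error as the
    # standard error of that mean (an assumption, not the library's method).
    return fmean(values), stdev(values) / math.sqrt(len(values))


def coverage(x=0.5, size=100, num_trials=50, bandwidth=0.03,
             levels=(0.5, 0.68, 0.9), mean=0.5, stdv=0.1, seed=0):
    """Return {nominal level: (observed coverage, binomial 1-sigma error)}."""
    rng = random.Random(seed)
    # Expected KDE value for a normal parent: the parent pdf convolved with
    # the Gaussian kernel, i.e. a normal with variance stdv**2 + bandwidth**2.
    target = NormalDist(mean, math.hypot(stdv, bandwidth)).pdf(x)
    pulls = []
    for _ in range(num_trials):
        samples = [rng.gauss(mean, stdv) for _ in range(size)]
        estimate, error = kde_with_error(samples, x, bandwidth)
        pulls.append(abs(estimate - target) / error)
    result = {}
    for level in levels:
        # Half-width (in units of the error) of a two-sided Gaussian interval.
        z = NormalDist().inv_cdf(0.5 + 0.5 * level)
        observed = sum(p <= z for p in pulls) / num_trials
        binomial_error = math.sqrt(level * (1.0 - level) / num_trials)
        result[level] = (observed, binomial_error)
    return result


if __name__ == "__main__":
    for level, (observed, err) in coverage().items():
        print(f"nominal={level:.2f} observed={observed:.2f} +/- {err:.2f}")
```

A diagonal coverage plot corresponds to the observed fraction tracking the nominal level to within the binomial error. The keyword names size and num_trials only loosely mirror the --size and --Ntrial options above.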
One can show that the errors all scale together as expected when we increase the number of observations fed into the KDE (--size).
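As a rough illustration of that scaling, the sketch below compares the trial-to-trial scatter of a KDE value with the mean reported error for a few sample sizes. Again, this is not the code in this merge: kde_point and error_scaling are hypothetical names, the bandwidth is held fixed at an arbitrary value, and the error estimate is assumed to be the standard error of the mean of the kernel values.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def kde_point(samples, x, bandwidth):
    """Fixed-bandwidth Gaussian KDE at x with an assumed standard error."""
    kernel = NormalDist(0.0, bandwidth)
    values = [kernel.pdf(x - s) for s in samples]
    return fmean(values), stdev(values) / math.sqrt(len(values))


def error_scaling(sizes=(100, 400, 1600), num_trials=50, x=0.5,
                  bandwidth=0.03, mean=0.5, stdv=0.1, seed=0):
    """Return {size: (scatter of estimates across trials, mean reported error)}."""
    rng = random.Random(seed)
    summary = {}
    for size in sizes:
        estimates, errors = [], []
        for _ in range(num_trials):
            samples = [rng.gauss(mean, stdv) for _ in range(size)]
            estimate, error = kde_point(samples, x, bandwidth)
            estimates.append(estimate)
            errors.append(error)
        summary[size] = (stdev(estimates), fmean(errors))
    return summary


if __name__ == "__main__":
    for size, (scatter, error) in error_scaling().items():
        print(f"size={size:5d} scatter={scatter:.4f} mean_error={error:.4f}")
```

If the error estimate is faithful, both columns should shrink by roughly a factor of 2 each time the sample size is quadrupled.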
As a last note, it behooves me to mention that the coverage plots measure the distribution of different KDE estimates from different realizations of the same number of samples, independently drawn from the same parent distribution and smoothed using the same bandwidth. They do not include the bias introduced by the smoothing (expected to be small for large numbers of samples: bias ~ 1/n whereas standard deviation ~ 1/sqrt{n}), nor do they reflect the fact that different realizations of samples might lead to different "optimal bandwidths" returned by the call to optimize(). These possible sources of error may need to be investigated in the future.