Re: why does leave one out cross validation have high variance



On Oct 27, 11:56 am, Shrihari <shrihari.vasude...@xxxxxxxxx> wrote:
Hi all,

I have a query regarding cross-validation methods for estimating the
generalization error of a classifier.

Most prior literature says that leave-one-out CV (LOOCV) has low bias
and high variance. Low bias I completely understand. The thing I do
not understand is why it should have high variance. The only
explanation I have been able to find so far is that, because the
models computed in each stage of the CV are very similar to each other
(obviously, since any two of them differ by only 2 training instances),
the variance is high. This is not obvious, and in fact it is
counterintuitive, as I would expect similar models to produce similar
results.

Could someone please explain this to me? Thanks for any help.

Regards
Shrihari

Think of it this way. If you use 10-fold cross-validation, you have 10
estimates of your statistic. We can call them X1, X2, . . ., X10. Each
of these is itself an average, so you would expect the 10 values to be
similar and approximately normally distributed around the expected
value <X>.
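
Written out, the "each X_k is an average" point looks like this. The loss L, the index set F_k of the points in fold k, and the model \hat{f}^{(-k)} trained with fold k held out are notation assumed here for illustration, not taken from the original post:

```latex
X_k = \frac{1}{|F_k|} \sum_{i \in F_k} L\bigl(y_i, \hat{f}^{(-k)}(x_i)\bigr),
\qquad k = 1, \dots, 10,
\qquad
\widehat{\langle X \rangle} = \frac{1}{10} \sum_{k=1}^{10} X_k
```

With N data points each X_k averages roughly N/10 losses, which is why the X_k can be expected to be roughly normal around <X>.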


But in LOOCV, you have X1, X2, . . ., X_N, where N is the number of
data points in your training sample. Each Xi is not an average but a
single measurement, so the variance among them is much greater. This
large variance is often not a problem, because what you really care
about is the estimate of <X>, and LOOCV usually gives the least biased
estimate of <X>.
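
A minimal simulation sketch of the contrast above, under loudly stated assumptions: the toy data (two 1-D Gaussian classes), the nearest-centroid threshold classifier, and the helpers `make_data`, `fit`, `predict` and `fold_estimates` are all hypothetical and not from the original post. Folds are assigned by taking every k-th point, so k = N gives LOOCV. The code computes X1..X10 for 10-fold CV and X1..X_N for LOOCV on the same data and prints how spread out each set is.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical toy data: 1-D points from two classes, N(-1, 1) and N(+1, 1)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are equally represented
        x = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append((x, label))
    rng.shuffle(data)
    return data


def fit(train):
    """Hypothetical classifier: threshold halfway between the two class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2.0, m1 > m0


def predict(model, x):
    """Predict class 1 on the side of the threshold where class 1's mean lies."""
    threshold, class1_above = model
    return int((x > threshold) == class1_above)


def fold_estimates(data, k):
    """Per-fold error rates X_1..X_k; k == len(data) gives leave-one-out CV."""
    n = len(data)
    estimates = []
    for j in range(k):
        test_idx = range(j, n, k)  # every k-th point forms fold j
        test_set = set(test_idx)
        train = [data[i] for i in range(n) if i not in test_set]
        model = fit(train)
        errors = sum(predict(model, data[i][0]) != data[i][1] for i in test_idx)
        estimates.append(errors / len(test_idx))
    return estimates


data = make_data(100)
ten_fold = fold_estimates(data, 10)    # each X_k averages 10 test points
loo = fold_estimates(data, len(data))  # each X_i is a single 0/1 outcome

for name, xs in (("10-fold", ten_fold), ("LOOCV", loo)):
    print(f"{name}: {len(xs)} estimates, mean {statistics.mean(xs):.3f}, "
          f"spread (pstdev) {statistics.pstdev(xs):.3f}")
```

Because the folds here are equal-sized, the mean of the ten X_k is the overall 10-fold error rate, and the mean of the N LOOCV values is the LOOCV estimate of <X>. The printed spread is the spread among the individual Xi within one dataset, which is the quantity this reply is describing.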
