Re: Comparing predictors

From: Ross Clement (clemenr_at_wmin.ac.uk)
Date: 06/24/04


Date: 24 Jun 2004 16:03:51 -0700

Lionel B <me@privacy.net> wrote in message news:<2k0ck4F16h2fdU1@uni-berlin.de>...
> Greetings,
>
> Suppose I have (jointly distributed, real-valued) random variables Y,
> X_1, ..., X_n and a real function f(x_1, ..., x_n) which is used to
> define the "predictor" Y' = f(X_1, ..., X_n) for Y. It is then standard
> to measure the quality of Y' by the mean square error E((Y'-Y)^2).
>
> Now I have the following situation: I have two such predictors; the
> first, derived from f(x_1, ..., x_n), say, is actually quadratic in the
> x_i (it is, in fact a simple least squares fit to a 2nd order
> polynomial). The other, derived from g(x_1, ..., x_n), say, is the
> output of a (trained) neural network.
>
> I now have the suspicion that the neural network predictor Y'' = g(X_1,
> ..., X_n) is actually doing "more-or-less the same thing" as the
> quadratic fit predictor Y' = f(X_1, ..., X_n). But how do I test this
> suspicion?
>
> So far the only sensible way I can think of for comparing predictors
> is to measure their correlation. Indeed, in my case the correlation
> corr(Y',Y'') comes out at a significantly high (approx.) 0.8. But
> correlation alone doesn't somehow quite seem to confirm that the neural
> network really is doing more-or-less the same as the quadratic fit - it
> misses out on the joint distribution of the respective predictors with
> the independents X_1, ..., X_n.
>
> If this sounds confused, it is ... my real question is probably: what
> question should I be asking here?

Provided that your predictors aren't completely useless, there should
always be a strong correlation between their outputs, as they are
trying to predict the same value. If you think that your predictors
are doing "more or less the same thing", then not only the predicted
values but also the errors should be correlated. So, you could
measure the correlation between the errors (Y-Y') and (Y-Y''). If the
errors are strongly correlated, then there is reason to presume that
the two approaches have basic similarities. If the correlation is low,
then you have failed to find evidence of similarity.
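
To make the error-correlation check concrete, here is a rough sketch in
Python. Everything in it is an assumption for illustration only: the toy
data (Y = X^2 plus noise), the three stand-in predictors, and the helper
names pearson() and compare_predictors(). None of it is your quadratic
fit or your trained network; you would substitute their actual outputs
on a held-out sample.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def compare_predictors(y, y1, y2):
    """Return (corr(Y', Y''), corr(Y - Y', Y - Y''))."""
    err1 = [t - p for t, p in zip(y, y1)]
    err2 = [t - p for t, p in zip(y, y2)]
    return pearson(y1, y2), pearson(err1, err2)


# Toy data (assumed, not from the original problem): Y = X^2 + noise.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]

# Hypothetical stand-ins for the predictors under comparison.
y_quad = [xi ** 2 for xi in x]              # plays the role of Y'
y_similar = [abs(xi) ** 1.8 for xi in x]    # plays the role of Y'', close to Y'
y_different = [0.3 + 0.1 * xi for xi in x]  # a deliberately dissimilar predictor

out_sim, err_sim = compare_predictors(y, y_quad, y_similar)
out_diff, err_diff = compare_predictors(y, y_quad, y_different)

print("similar pair:   corr(outputs) = %.3f, corr(errors) = %.3f" % (out_sim, err_sim))
print("different pair: corr(outputs) = %.3f, corr(errors) = %.3f" % (out_diff, err_diff))
```

Note that both error series contain the same noise in Y, so some
positive correlation between them is to be expected even for unrelated
predictors. The figure is probably best read against a baseline such as
the deliberately dissimilar predictor above, not as an absolute value.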

However, I don't think a high or low correlation between errors is
sufficient in itself to truly understand the relationships between
different algorithms for prediction/regression, though I don't yet
have a real solid basis for claiming this.

Cheers,

Ross-c