Re: predicting a difference of two variables
- From: "David Jones" <dajxxx@xxxxxxxxx>
- Date: Fri, 17 Nov 2006 15:30:21 -0000
Reef Fish wrote:
David Jones wrote:
Beliavsky wrote:
Suppose one wants to predict (y1-y2) using variables x1, x2, x3,
... . If one uses all the predictors x in the regression, it does
not matter if one
(1) fits separate regression models for y1 and y2 and differences
the predictions or
(2) models (y1-y2) directly
In fact it does matter: approach (1) cannot produce estimates of
variances (or covariances) of the differences of the regression
parameters unless some extra data analysis is done, or unless some
extra assumptions are made. Neither can it produce confidence
intervals for the predicted difference.
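
To make the point above concrete, here is a minimal sketch on simulated data (not from the thread; the coefficients, error covariance and variable names are all made up for illustration). It only shows that the point estimates from (1) and (2) coincide when the same X is used, because least squares is linear in the response. Under the usual homoscedastic model the sampling variance of the differenced coefficients is (s11 + s22 - 2*s12) * inv(X'X), and the cross term s12 is exactly what two separately reported fits do not supply.

```python
import numpy as np

# Simulated paired data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept, x1, x2, x3
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=n)
y1 = X @ np.array([1.0, 2.0, -1.0, 0.5]) + errs[:, 0]
y2 = X @ np.array([0.5, 1.0, 1.0, -0.5]) + errs[:, 1]

# Approach (1): separate least-squares fits, then difference the coefficients.
b1, *_ = np.linalg.lstsq(X, y1, rcond=None)
b2, *_ = np.linalg.lstsq(X, y2, rcond=None)

# Approach (2): regress the difference directly on the same X.
bd, *_ = np.linalg.lstsq(X, y1 - y2, rcond=None)

# Largest absolute discrepancy between the two sets of coefficients.
coef_gap = np.max(np.abs((b1 - b2) - bd))
print("max |(b1 - b2) - bd| =", coef_gap)
```

The gap is at rounding-error level, so the disagreement between the approaches is about inference, not about the predicted difference itself.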
But that's not the main difference in the two approaches.
(1) is the naive way of doing a multivariate regression when one has
two dependent variables on the same set of x's.
A MULTIVARIATE regression should be done, which not only
allows inferences on each y separately, but also provides the
covariance information between the ERRORS, which is absent
when treating (1) as two univariate regressions.
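
A rough sketch of what that covariance information buys, again on simulated data with made-up names and numbers: fitting both responses at once against the same X gives a 2x2 residual covariance matrix S, and s11 + s22 - 2*s12 reproduces the residual variance of the direct regression of (y1-y2). This assumes fully paired observations and a common divisor n - p.

```python
import numpy as np

# Simulated paired data; the true coefficients and covariance are assumptions.
rng = np.random.default_rng(1)
n, p = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B_true = np.array([[1.0, 0.5], [2.0, 1.0], [-1.0, 1.0], [0.5, -0.5]])
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=n)
Y = X @ B_true + E  # columns are y1 and y2

# Multivariate least squares: one call, one coefficient column per response.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B_hat
S = resid.T @ resid / (n - p)  # estimated error covariance matrix (2 x 2)

# Error variance of the difference implied by the joint fit.
var_diff_from_S = S[0, 0] + S[1, 1] - 2.0 * S[0, 1]

# The same quantity from the univariate regression of (y1 - y2) on X.
d = Y[:, 0] - Y[:, 1]
bd, *_ = np.linalg.lstsq(X, d, rcond=None)
var_diff_direct = np.sum((d - X @ bd) ** 2) / (n - p)

print(S)
print(var_diff_from_S, var_diff_direct)
```

The off-diagonal entry of S is the piece that cannot be recovered from the two univariate fits unless the paired residuals are kept.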
(2) is still a univariate regression of (Y1-Y2) on x1, x2, and x3.
The question here relates to the model for the dependence in the
regression errors for the two quantities. You can only model that
dependence if the observations effectively occur as pairs at least
sometimes.
You may want to think about the case where there is only partial
overlap in the two datasets. If you are prepared to assume
independence of the regression errors some calculations using the
separately fitted models might be done.
Why go through all of that when all the relevant hypotheses
(assumptions that can be tested) are already embedded in the
multivariate regression model?
It depends on both how much knowledge the OP has and on how much
effort the OP is prepared to put in. If the necessary assumptions for
a full multivariate analysis are demonstrably untrue, the necessary
assumptions for a regression analysis of (y1-y2) may still be valid
and hence can lead to a valid data analysis, even if it is potentially
inefficient compared to an analysis specially developed to accommodate
the problematic features of the full dataset. It would depend on the
actual dataset whether there are too many unpaired observations of y1
and y2 for that information to be thrown away.
However if you are prepared to make the assumption then
you can improve over the usual regression estimates by fitting the
separate models jointly (see "seemingly unrelated regressions").
That is a DIFFERENT multivariate procedure altogether. The
"seemingly unrelated regressions" are MULTIVARIATE regression
models that make use of DIFFERENT sets of X predictors (hence
the "seemingly unrelated" term).
They are logically the same when treated as being within a class of
MULTIVARIATE regression models that allow use of different sets of X
predictors, where the sets of predictors may completely overlap,
partly overlap or be entirely separate. The "seemingly unrelated"
term of course relates to one extreme of this. The OP's scenario
potentially involves fitting a model with overlapping predictors,
once a decision has been made to omit certain of the regressors, and
it might even lead to the "seemingly unrelated" case.
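
For readers who want to see the partly overlapping case worked through, below is a minimal two-equation feasible GLS sketch of the "seemingly unrelated regressions" idea. It is an illustration under stated assumptions, not anyone's method from this thread: the function name sur_fgls is hypothetical, the data are simulated, the observations are fully paired, and only a single FGLS step is taken (no iteration). When both equations use the same regressors the result collapses to equation-by-equation least squares, which is the standard result for that special case.

```python
import numpy as np

def sur_fgls(X1, y1, X2, y2):
    """One-step feasible GLS for a two-equation system (hypothetical helper)."""
    n = len(y1)
    Xs, ys = [X1, X2], [y1, y2]
    # Step 1: equation-by-equation least squares to get residuals.
    R = np.column_stack(
        [y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    )
    Sigma = R.T @ R / n          # estimated cross-equation error covariance
    W = np.linalg.inv(Sigma)
    # Step 2: GLS normal equations for the stacked system with
    # block-diagonal design and weight matrix kron(W, I_n).
    A = np.block([[W[i, j] * Xs[i].T @ Xs[j] for j in range(2)] for i in range(2)])
    c = np.concatenate(
        [sum(W[i, j] * Xs[i].T @ ys[j] for j in range(2)) for i in range(2)]
    )
    beta = np.linalg.solve(A, c)
    k1 = X1.shape[1]
    return beta[:k1], beta[k1:], Sigma

# Simulated example: y1 depends on (x1, x2), y2 on (x1, x3); errors correlated.
rng = np.random.default_rng(2)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=n)
X1 = np.column_stack([np.ones(n), x1, x2])
X2 = np.column_stack([np.ones(n), x1, x3])
y1 = X1 @ np.array([1.0, 2.0, -1.0]) + E[:, 0]
y2 = X2 @ np.array([0.5, 1.0, 0.5]) + E[:, 1]

b1_sur, b2_sur, Sigma_hat = sur_fgls(X1, y1, X2, y2)

# Predicted difference y1 - y2 at a new point (x1, x2, x3) = (0.2, -0.1, 0.4).
pred_diff = np.array([1.0, 0.2, -0.1]) @ b1_sur - np.array([1.0, 0.2, 0.4]) @ b2_sur
print(b1_sur, b2_sur, pred_diff)
```

With only partly paired data the covariance step would need something more careful than this; the sketch simply assumes every row has both y1 and y2.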
David Jones