Re: A methodology problem in Cox regressions
- From: David Winsemius <doe_snot@xxxxxxxxxxx>
- Date: Wed, 19 Apr 2006 20:20:11 -0500
"Alfa Beta" <alfa@xxxxxxx> wrote in
news:94T_f.52442$d5.207572@xxxxxxxxxxxxxxx:
Question: Is it ok or is it wrong in a Cox regression to add
observations that you have incomplete followup data on, i.e.
observations where data is missing some time before the terminal
event?
I haven't been able to find anyone who could give a straight answer
back on my home turf, so I turn to this list now.
I can illustrate the problem. You study the survival of X over 5 years
with regard to how various variables influence the hazard. Let's say X is
a rare bird or something. You can't get a large enough sample in one
year alone to follow for 5 years, so you have to add newly hatched
observations/birds each spring, and so you follow these new birds for
5 years too. Under the proportional hazards assumption it would be ok
to clump all the observations together in the end and make one big Cox
regression on it all - let us assume we can make that assumption for
the sake of argument.
However, here comes the thing. In year 5 we decide to include a final
generation of birds and we will have enough data. We just have to get
5 years of followup data on those birds too and we're ready to do the
analysis. But after year 7 it turns out we have to abandon the
project. Funds running out or something. Let's see where we stand:
generation 1 - 7 years of followup data
gen 2 - 6 yrs
gen 3 - 5 yrs
gen 4 - 4 yrs
gen 5 - 3 yrs
Assume that there are some specific scientific reasons for setting the
terminal event at year 5 in the followup timeline. We can then safely
use generations 1-3 together for the regression, since they all have
at least 5 years of followup data. But my question is, is it wrong to
also include generations 4-5? And if so, why exactly? And how serious
a "statistical crime" would it be to use incomplete data like this if
it isn't right?
This paragraph is the crux of the matter. If generations 4-5 are never at
risk for the outcome of interest, then including them adds little value
and only the potential for bias. They cannot contribute meaningful
information. The power of an analysis with a 0/1 outcome comes from the
number of events rather than from the number at risk. If they cannot reach
the "risk stage" for whatever birdy-thing you are counting, then you
ought to leave them out.
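To make the follow-up arithmetic from the question concrete, here is a minimal sketch in Python. It is only an illustration: the names are hypothetical, and it assumes that generation g is enrolled at the start of study year g, that the project stops after year 7, and that the terminal event is defined at follow-up year 5, as described in the question.

```python
# Illustrative sketch only; all names here are hypothetical.
STUDY_END_YEAR = 7   # project abandoned after year 7 (from the question)
TERMINAL_YEAR = 5    # follow-up year at which the terminal event is defined


def followup_years(generation: int) -> int:
    """Years of follow-up available, assuming generation g starts in year g."""
    return STUDY_END_YEAR - (generation - 1)


def reaches_terminal_year(generation: int) -> bool:
    """True if the generation is followed long enough to reach year 5."""
    return followup_years(generation) >= TERMINAL_YEAR


# Follow-up available for each of the five generations.
followup = {g: followup_years(g) for g in range(1, 6)}

# Generations that can actually reach the terminal event.
eligible = [g for g in followup if reaches_terminal_year(g)]

print(followup)  # {1: 7, 2: 6, 3: 5, 4: 4, 5: 3}
print(eligible)  # [1, 2, 3]
```

Under these assumptions only generations 1-3 are followed far enough to reach year 5, which matches the table in the question.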
That's a straight answer and an argument. Now let's see if anyone else
agrees.
--
David Winsemius