Re: Single-Factor-Cox-Regression: Citations



On Sun, 25 Feb 2007 16:23:49 -0600, David Winsemius
<doe_snot@xxxxxxxxxxx> wrote:


Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx> wrote in
news:grh1u25qmjbgaphi8d7ousfbrh6i8jitmf@xxxxxxx:


If anyone knows something more definite about that, I will
be happy to hear it.


References (extracted from PubMed) added in proof for earlier assertion
of non-inferiority of statistical power for Cox regression relative to
logistic regression:

Thanks much for the references and the detail.

By the way, I do not doubt that Cox regression has better power,
given adequate data that fit the model. What I doubted, and what
is not yet fully answered, is whether Cox regression is less robust
than ML logistic regression when the number of Events is small.

The abstracts below discuss both logistic and Cox, and do not
raise the problem, so maybe I overestimate it.

Still, for a small count of events, there is the additional
question of whether events are *similarly* distributed across
time in the two groups, as the model's assumption requires.

My own initial experience suggests some sensitivity.

Here are the references and abstracts again.

-
Rich Ulrich


1) J Clin Epidemiol. 1995 Dec;48(12):1503-10.
Importance of events per independent variable in proportional hazards
regression analysis. II. Accuracy and precision of regression estimates.

* Peduzzi P,
* Concato J,
* Feinstein AR,
* Holford TR.

Cooperative Studies Program Coordinating Center, West Haven Veterans
Affairs Medical Center, Connecticut, USA.

The analytical effect of the number of events per variable (EPV) in a
proportional hazards regression analysis was evaluated using Monte Carlo
simulation techniques for data from a randomized trial containing 673
patients and 252 deaths, in which seven predictor variables had an
original significance level of p < 0.10. The 252 deaths and 7 variables
correspond to 36 events per variable analyzed in the full data set. Five
hundred simulated analyses were conducted for these seven variables at
EPVs of 2, 5, 10, 15, 20, and 25. For each simulation, a random
exponential survival time was generated for each of the 673 patients, and
the simulated results were compared with their original counterparts. As
EPV decreased, the regression coefficients became more biased relative to
the true value; the 90% confidence limits about the simulated values did
not have a coverage of 90% for the original value; large sample
properties did not hold for variance estimates from the proportional
hazards model, and the Z statistics used to test the significance of the
regression coefficients lost validity under the null hypothesis. Although
a single boundary level for avoiding problems is not easy to choose, the
value of EPV = 10 seems most prudent. Below this value for EPV, the
results of proportional hazards regression analyses should be interpreted
with caution because the statistical model may not be valid.
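
For what it is worth, the EPV arithmetic in that abstract is easy to check. The sketch below is only an illustration, not anything from the paper: the function names are made up here, and the threshold of 10 is the abstract's rule of thumb rather than a hard limit.

```python
import math


def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of outcome events divided by number of predictors."""
    return n_events / n_predictors


def min_events_for_epv(n_predictors: int, target_epv: float = 10.0) -> int:
    """Smallest event count that reaches the target EPV (hypothetical helper)."""
    return math.ceil(n_predictors * target_epv)


# Figures quoted in the abstract: 252 deaths, 7 predictor variables.
full_epv = events_per_variable(252, 7)
# Events needed to reach the suggested EPV of 10 with the same 7 variables.
needed = min_events_for_epv(7)
print(full_epv, needed)
```

So with seven predictors, the suggested boundary corresponds to roughly 70 events, against the 252 in the full data set.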

2) J Clin Epidemiol. 1996 Dec;49(12):1373-9.
A simulation study of the number of events per variable in logistic
regression analysis.

* Peduzzi P,
* Concato J,
* Kemper E,
* Holford TR,
* Feinstein AR.

Cooperative Studies Program Coordinating Center, Veterans Affairs
Medical Center, West Haven, Connecticut 06516, USA.

We performed a Monte Carlo study to evaluate the effect of the number
of events per variable (EPV) analyzed in logistic regression analysis.
The simulations were based on data from a cardiac trial of 673 patients
in which 252 deaths occurred and seven variables were cogent predictors
of mortality; the number of events per predictive variable was (252/7 =)
36 for the full sample. For the simulations, at values of EPV = 2, 5, 10,
15, 20, and 25, we randomly generated 500 samples of the 673 patients,
chosen with replacement, according to a logistic model derived from the
full sample. Simulation results for the regression coefficients for each
variable in each group of 500 samples were compared for bias, precision,
and significance testing against the results of the model fitted to the
original sample. For EPV values of 10 or greater, no major problems
occurred. For EPV values less than 10, however, the regression
coefficients were biased in both positive and negative directions; the
large sample variance estimates from the logistic model both
overestimated and underestimated the sample variance of the regression
coefficients; the 90% confidence limits about the estimated values did
not have proper coverage; the Wald statistic was conservative under the
null hypothesis; and paradoxical associations (significance in the wrong
direction) were increased. Although other factors (such as the total
number of events, or sample size) may influence the validity of the
logistic model, our findings indicate that low EPV can lead to major
problems.

3) J Cardiovasc Risk. 1997 Apr;4(2):127-34.
An empirical comparison of multivariable methods for estimating risk
of death from coronary heart disease.

* Knuiman MW,
* Vu HT,
* Segal MR.

Department of Public Health, University of Western Australia,
Nedlands, Australia.

BACKGROUND: Logistic regression and, more recently, Cox regression
have been the predominant methods for identifying risk factors and
developing risk estimation equations for coronary heart disease (CHD).
Software for the regression tree method is now available for binary and
survival outcomes and thus offers an alternative methodology. This paper
compares these four methods for identifying significant risk factors from
among a set of candidate factors and for estimating the risk of death
from CHD using baseline and mortality follow-up data on 1,701 men
participating in the Busselton Health Study. The candidate risk factors
were age, body mass index, systolic and diastolic blood pressure,
treatment for hypertension, cholesterol and smoking. METHODS: Logistic
regression, Cox proportional hazards regression, binary regression tree,
and survival regression tree analyses have been applied to data obtained
from the same cohort of men for CHD death risk estimation and prediction.
The four methods are compared in terms of the variables selected,
goodness-of-fit of models, similarity of cross-validated estimated risks
for individuals, and ability to discriminate between those who died from
CHD and those who did not die from CHD during the follow-up period,
including the comparison of Receiver Operating Characteristic (ROC)
curves. RESULTS: Although age and a blood pressure variable were selected
by all four methods, body mass index was also selected by the regression
tree methods and smoking was also selected by Cox regression. There was
good, but not excellent, agreement between methods in estimates of risk
for individuals; the areas under the ROC curves were 0.66 for the binary
tree, 0.72 for logistic regression, 0.71 for the survival tree method and
0.78 for Cox regression. The average differences in estimated risk
between those who died from CHD and those who did not die from CHD during
the follow-up period were 0.051 for logistic regression, 0.070 for the
binary tree method, 0.073 for the survival tree method and 0.088 for Cox
regression. CONCLUSION: For a moderately sized cohort typical of many
applications of these methods in the literature, the two methods which
used the survival outcome performed better than the methods using a
binary outcome. Despite selecting some different variables and showing
moderate differences in risk estimates for individuals, the two binary
approaches were similar in performance. Cox regression appeared to be
superior to the survival tree method, but further larger studies of
completely separate samples for model development and evaluation of
prediction performance are required to confirm this finding.

4) Stat Med. 1989 Dec;8(12):1515-21.
Efficiency of the logistic regression and Cox proportional hazards
models in longitudinal studies.

* Annesi I,
* Moreau T,
* Lellouch J.

INSERM Research Unit on Statistical and Epidemiological Methods and
Applications to the Study of Chronic Diseases (Unit 169), Villejuif,
France.

Both logistic regression and Cox proportional hazards models are used
widely in longitudinal epidemiologic studies for analysing the
relationship between several risk factors and a time-related dichotomous
event. The two models yield similar estimates of regression coefficients
in studies with short follow-up and low incidence of event occurrence.
Further, with just one dichotomous covariate and identical censoring
times for all subjects, the asymptotic relative efficiency of the two
models is very close to 1 unless the duration of follow-up is extended.
We generalize this result to several qualitative or quantitative
covariates. This was motivated by the analysis of mortality data from a
study where all subjects are followed up during the same fixed period
without loss except by death. Logistic and Cox models were applied to
these data. Similar results were obtained for the two models in shorter
periods of follow-up of five years or less, but not in longer periods of
ten years or more, where the survival rate was lower.
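
A small numeric sketch may help show why the two models agree for short follow-up and drift apart later. It assumes something much simpler than the paper does: constant (exponential) hazards, one dichotomous covariate, and the same fixed follow-up for everyone with no censoring. The function name and the example hazard of 0.01 per year are invented for illustration.

```python
import math


def odds_ratio_from_hazards(baseline_hazard: float,
                            hazard_ratio: float,
                            follow_up: float) -> float:
    """Odds ratio of the event by time `follow_up`, under constant hazards.

    With risk p = 1 - exp(-h * t), the odds p / (1 - p) equal exp(h * t) - 1,
    so the odds ratio is a ratio of two expm1 terms.
    """
    odds_exposed = math.expm1(baseline_hazard * hazard_ratio * follow_up)
    odds_unexposed = math.expm1(baseline_hazard * follow_up)
    return odds_exposed / odds_unexposed


# Assumed values: baseline hazard 0.01 per year, true hazard ratio 2.
for years in (1, 5, 10, 30):
    print(years, round(odds_ratio_from_hazards(0.01, 2.0, years), 3))
```

Under these assumptions the odds ratio stays close to the hazard ratio of 2 while the event is rare, and moves away from it as follow-up lengthens and cumulative incidence grows.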

5) Am J Ind Med. 1998 Jan;33(1):33-47.
Empirical comparisons of proportional hazards, Poisson, and logistic
regression modeling of occupational cohort data.

* Callas PW,
* Pastides H,
* Hosmer DW.

Department of Biostatistics and Epidemiology, School of Public Health
and Health Sciences, University of Massachusetts, Amherst, USA.

This research was conducted to examine the effect of model choice on
the epidemiologic interpretation of occupational cohort data. Three
multiplicative models commonly employed in the analysis of occupational
cohort studies--proportional hazards, Poisson, and logistic regression--
were used to analyze data from an historical cohort study of workers
exposed to formaldehyde. Samples were taken from this dataset to create a
number of predetermined scenarios for comparing the models, varying study
size, outcome frequency, strength of risk factors, and follow-up length.
The Poisson and proportional hazards models yielded nearly identical
relative risk estimates and confidence intervals in all situations except
when confounding by age could not be closely controlled in the Poisson
analysis. Logistic regression findings were more variable, with risk
estimates differing most from the proportional hazards results when there
was a common outcome or strong relative risk. The logistic model also
provided less precise estimates than the other two. Thus, although
logistic was the easiest model to implement, it should be used only in
occupational cohort studies when the outcome is rare (5% or less), and
the relative risk is less than approximately 2. Even then, the
proportional hazards and Poisson models are better choices. Selecting
between these two can be based on convenience in most circumstances.

--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html