Re: Predicting using SAS-Survival analysis

From: Vadim Pliner (Vadim.Pliner_at_VerizonWireless.com)
Date: 08/24/04


Date: 24 Aug 2004 10:36:34 -0700


> Ques. In Lifereg-
> 1. In order to decide which distribution to use I ran my code above with
> Weibull, Exponential, Lnormal etc. and simply picked the one with
> highest 'LIKELIHOOD' value. Am I right so far?

The problem with simply picking the highest likelihood is that these
distributions have different numbers of parameters: the exponential has
one, the Weibull, logistic, log-normal, normal, and log-logistic have
two, and the gamma has three. A model with more parameters will usually
fit the observed data at least as well (a higher likelihood), but the
extra parameters can overfit and weaken prediction for new cases. I
can suggest two ways of taking both the likelihood and the number of
parameters into account.

First, you can use either AIC (Akaike Information Criterion) or SBC
(Schwarz's Bayesian Criterion, aka BIC), computed as
-2*log-likelihood + k*(# parameters), where k = 2 for AIC and
k = log(n) for SBC (n is the number of observations). Smaller values
are better.
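To make the comparison concrete, here is a minimal sketch of the AIC/SBC formula in Python (used only so it runs standalone; in SAS you would take the log-likelihoods from the LIFEREG output). The log-likelihood values are made up for illustration, and the parameter counts 1, 2, 3 assume an intercept-only model. With CLASS X2 every model gets the same 11 extra coefficients, which shifts all AICs (or all SBCs) by the same amount and so doesn't change the ranking.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, SBC) from -2*logL + k*(# parameters):
    k = 2 for AIC, k = log(n) for SBC. Smaller is better."""
    aic = -2.0 * log_lik + 2.0 * n_params
    sbc = -2.0 * log_lik + math.log(n_obs) * n_params
    return aic, sbc

# Hypothetical log-likelihoods for illustration only (not real LIFEREG output).
# Parameter counts assume an intercept-only model: exponential = intercept,
# Weibull = intercept + scale, gamma = intercept + scale + shape.
fits = {
    "exponential": (-1520.0, 1),
    "weibull": (-1500.0, 2),
    "gamma": (-1499.2, 3),
}
n_obs = 20000  # roughly the size of AJ's data set

for name, (log_lik, n_params) in fits.items():
    aic, sbc = information_criteria(log_lik, n_params, n_obs)
    print(f"{name:12s} AIC = {aic:9.2f}  SBC = {sbc:9.2f}")

# Pick the distribution with the smallest SBC (index 1 of the returned pair).
best_by_sbc = min(fits, key=lambda d: information_criteria(*fits[d], n_obs)[1])
print("smallest SBC:", best_by_sbc)
```

In this made-up example the gamma has the highest likelihood, but the Weibull wins on both criteria because the gamma's extra parameter adds only 0.8 to the log-likelihood.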

The second approach, a likelihood-ratio test, can be used for comparing
"nested" models: exponential vs Weibull (the exponential is a Weibull
with scale parameter = 1), Weibull vs gamma (the Weibull is a gamma with
shape parameter = 1), and log-normal vs gamma (the log-normal is a gamma
with shape = 0). Note that LIFEREG's gamma is the three-parameter
generalized gamma. For example, when comparing Weibull with gamma, you
can use the fact that, if the Weibull is adequate, 2(L3-L2) has
approximately a chi-square distribution with 1 degree of freedom, where
L3 is the log-likelihood for the gamma distribution (3 stands for 3
parameters) and L2 is the log-likelihood for the Weibull distribution.
Low (not significant) values of 2(L3-L2) mean the extra gamma parameter
doesn't buy much, suggesting the simpler Weibull as the better choice.
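A minimal Python sketch of this likelihood-ratio test. It uses the identity that, for 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x/2)), so no statistics library is needed. The two log-likelihoods are hypothetical values, not output from AJ's data.

```python
import math

def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood-ratio test for nested models differing by one parameter.

    Statistic: 2*(L_full - L_reduced), referred to a chi-square with 1 df.
    For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods (illustration only): L3 = gamma, L2 = Weibull.
L3, L2 = -1499.2, -1500.0
stat, p = lr_test_1df(L3, L2)
print(f"2(L3-L2) = {stat:.2f}, p = {p:.3f}")
```

Here 2(L3-L2) = 1.6 is well below the 5% critical value of about 3.84, so the test would favor keeping the Weibull.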

> 2. I compared the predicted values for the levels of the X2 class variable
> using the code above and also by running the above regression only on
> (e.g. for S1) the data set S1 as shown:
> (DATA S1;
<snip>
> The predicted values for level 'S1' from the two methods above were
> different, and I was expecting them to be the same, since the data used
> for level 'S1' is exactly the same in both methods. Does anyone know why
> they should be different?

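I can see at least one reason why the results could differ. PROC
LIFEREG maximizes the log-likelihood with an iterative algorithm that
starts from initial parameter values. Those initial values (and the
likelihoods themselves, BTW) are different in the two scenarios, and
the algorithm only reaches the maximum up to its convergence criterion,
so even identical likelihoods could give slightly different estimates.
The likelihoods differ because the model with CLASS X2 estimates a
single scale parameter shared by all 12 levels of X2, whereas the
S1-only run estimates the scale from the S1 data alone. Since the
predicted values depend on the scale as well as the intercept, the S1
predictions will generally not match.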
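In the model with CLASS X2 the Weibull scale is shared by all levels of X2, while the S1-only fit estimates its own. A minimal Python sketch of why that alone changes the prediction: it computes a Weibull quantile in LIFEREG's log-linear form, log(T) = x'beta + scale*W, where W is a standard extreme-value variable with P(W <= w) = 1 - exp(-exp(w)). It assumes the P= variable holds the median (the default quantile 0.5, as I understand LIFEREG's OUTPUT statement); the intercept and the two scale values are hypothetical.

```python
import math

def weibull_quantile(linear_predictor, scale, p=0.5):
    """p-th quantile of T when log(T) = x'beta + scale * W,
    with W standard (minimum) extreme value: P(W <= w) = 1 - exp(-exp(w))."""
    w_p = math.log(-math.log(1.0 - p))  # quantile of W; log(log 2) for the median
    return math.exp(linear_predictor + scale * w_p)

# Hypothetical estimates for level S1 (illustration only, not real output).
xb_s1 = 3.0
common_scale = 1.2   # one scale shared by all 12 levels of X2 (full model)
s1_only_scale = 0.9  # scale estimated from the S1 subset alone

print(weibull_quantile(xb_s1, common_scale))
print(weibull_quantile(xb_s1, s1_only_scale))
```

Even with the same x'beta for S1, the two scales give different predicted medians; in practice the intercepts differ too, because they are estimated jointly with different scales.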

> Ques. In Phreg-
> 1. How can I get the PREDICTED response values in phreg?

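Use the BASELINE statement. PHREG leaves the baseline hazard
unspecified, so it doesn't produce a predicted survival time directly;
BASELINE outputs the estimated survivor function for the covariate
values you supply, from which you can read off, e.g., a predicted
median survival time.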
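A minimal Python sketch of turning a survivor curve into a predicted median, assuming you already have a baseline survivor function S0(t) at a few time points (the numbers below are hypothetical, not PHREG output). Under proportional hazards S(t|x) = S0(t)^exp(x'beta); the median is the first time the step curve drops to 0.5 or below. With about 80% of the data censored, the curve may never get that low, in which case the median is undefined.

```python
import math

def cox_survival(baseline_surv, linear_predictor):
    """S(t | x) = S0(t) ** exp(x'beta) under proportional hazards."""
    factor = math.exp(linear_predictor)
    return [s0 ** factor for s0 in baseline_surv]

def median_time(times, surv):
    """First time at which the step survival curve drops to 0.5 or below;
    None if it never does (common with heavy censoring)."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None

times = [6, 12, 18, 24, 36]           # months (hypothetical grid)
s0 = [0.95, 0.88, 0.80, 0.71, 0.60]   # hypothetical baseline survivor function

print(median_time(times, cox_survival(s0, 0.0)))  # curve never reaches 0.5
print(median_time(times, cox_survival(s0, 1.0)))  # higher hazard, earlier median
```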

HTH,
Vadim Pliner

anujwork@yahoo.com (AJ) wrote...
> Hi,
>
> I have dataset (has ~20,000 obs.) defined as-
>
> Y: (Numeric) Response var - Number of months a person stayed on
> program
> X1:(Character) [01,02,03,.....,65] - A person belongs to any one of
> these 65 segments.
> X2:(Character) [S1,S2,....,S12] - A person belongs to any one of
> these 12 stages.
> X3:(Character) [G1,G2,....,G15] - A person belongs to any one of
> these 15 groups.
> C: (Numeric) - Censor Status [0=uncensored or
> 1=censor]
>
> Additional Info: Around 80% of my data is censored.
>
>
> /* My code using LIFEREG */
> PROC LIFEREG data=ONE;
> CLASS X2;
> MODEL Y*C(1) = X2 / DIST=WEIBULL;
> OUTPUT OUT=LS P=PREDICTED;
> RUN;
> QUIT;
> /************************************/
>
> Ques. In Lifereg-
> 1. In order to decide which distribution to use I ran my code above with
> Weibull, Exponential, Lnormal etc. and simply picked the one with
> highest 'LIKELIHOOD' value. Am I right so far? The hazard plots were not
> very informative, so I couldn't use them to decide on the distribution.
>
> 2. I compared the predicted values for the levels of the X2 class variable
> using the code above and also by running the above regression only on
> (e.g. for S1) the data set S1 as shown:
> DATA S1;
> SET one;
> IF X2 = 'S1';
> RUN;
>
> PROC LIFEREG data=S1;
> MODEL Y*C(1) = / DIST=WEIBULL;
> OUTPUT OUT=LS2 P=PREDICTED;
> RUN;
> QUIT;
>
> The predicted values for level 'S1' from the two methods above were
> different, and I was expecting them to be the same, since the data used
> for level 'S1' is exactly the same in both methods. Does anyone know why
> they should be different?
> ---------------------------------------------------
>
> /* My code using PHREG */
> Proc phreg data=one;
> model Y*C(1) = X2;
> output out=one_out;
> Run;
> /**********************************/
>
> Ques. In Phreg-
> 1. How can I get the PREDICTED response values in phreg?
>
>
> Thanks,
> AJ


