Re: Predicting using SAS-Survival analysis
From: Vadim Pliner (Vadim.Pliner_at_VerizonWireless.com)
Date: 08/24/04
- In reply to: AJ: "Predicting using SAS-Survival analysis"
Date: 24 Aug 2004 10:36:34 -0700
> Ques. In Lifereg-
> 1. In order to decide which distribution to use I ran my code above with
> Weibull, Exponential, Lnormal etc. and simply picked the one with
> highest 'LIKELIHOOD' value. Am I right so far?
The problem with simply picking the highest likelihood is that those
distributions have different numbers of parameters: exponential has
one; Weibull, logistic, log-normal, normal, and log-logistic have two;
and gamma has three. A higher likelihood means a better fit to the
observed data, but extra parameters can overfit that data and predict
new cases less well. I can suggest two ways of taking into account
both the likelihood and the number of parameters.
First, you can use either AIC (Akaike Information Criterion) or SBC
(Schwarz's Bayesian Criterion, aka BIC). Both are computed as
-2*(log-likelihood) + k*(# parameters), where k = 2 for AIC and k =
log(n) for SBC, with n the number of observations. The model with the
smallest value is preferred.
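To make the formula concrete, here is a minimal Python sketch of the
AIC/SBC comparison. The log-likelihood values are made up for
illustration (they are not from any real fit), and the function name
information_criteria is my own, not anything SAS provides. The
parameter counts cover only the distribution itself; in a regression
you would add the number of estimated coefficients to each count.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, SBC) for a fitted model; smaller values are better."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    sbc = -2.0 * log_likelihood + math.log(n_obs) * n_params
    return aic, sbc

# Hypothetical (log-likelihood, number of parameters) pairs, illustration only.
fits = {
    "exponential": (-5230.0, 1),
    "weibull": (-5200.0, 2),
    "gamma": (-5199.5, 3),
}
n_obs = 20000  # roughly the dataset size mentioned in the question

for name, (loglik, k) in fits.items():
    aic, sbc = information_criteria(loglik, k, n_obs)
    print(f"{name:12s} AIC={aic:10.2f} SBC={sbc:10.2f}")

# Pick the model with the smallest value of each criterion.
best_by_aic = min(fits, key=lambda m: information_criteria(*fits[m], n_obs)[0])
best_by_sbc = min(fits, key=lambda m: information_criteria(*fits[m], n_obs)[1])
print("Best by AIC:", best_by_aic)
print("Best by SBC:", best_by_sbc)
```

With these made-up numbers the gamma fit has the highest likelihood,
but the penalty for its third parameter outweighs the small gain, so
both criteria prefer Weibull.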
The second approach is a likelihood ratio test, which can be used for
comparing "nested" models. In proc lifereg (where Gamma means the
generalized gamma) the nested pairs are: exponential vs Weibull
(exponential is Weibull with the scale parameter = 1), Weibull vs
Gamma (Weibull is Gamma with the shape parameter = 1), and log-normal
vs Gamma (log-normal is Gamma with shape = 0). For example, comparing
Weibull with Gamma, you can use the fact that, if Weibull is adequate,
2(L3-L2) has approximately a chi-square distribution with 1 degree of
freedom, where L3 is the log-likelihood for the Gamma distribution (3
stands for 3 parameters) and L2 is the log-likelihood for the Weibull
distribution. Low (not significant) values of 2(L3-L2) would suggest
Weibull as a better choice over Gamma.
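The likelihood ratio comparison can be sketched in a few lines of
Python. The function name lrt_1df and the two log-likelihood values
are hypothetical, for illustration only. For 1 degree of freedom the
chi-square tail probability can be written with the complementary
error function, so no statistics package is needed.

```python
import math

def lrt_1df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for one extra parameter.

    loglik_full    -- log-likelihood of the larger model (e.g. Gamma, L3)
    loglik_reduced -- log-likelihood of the nested model (e.g. Weibull, L2)
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against a tiny negative value from rounding
    # Chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, illustration only.
L3 = -5199.5  # Gamma, 3 parameters
L2 = -5200.0  # Weibull, 2 parameters

stat, p_value = lrt_1df(L3, L2)
print(f"2(L3-L2) = {stat:.3f}, p-value = {p_value:.4f}")

if p_value < 0.05:
    print("Significant: the extra Gamma parameter improves the fit.")
else:
    print("Not significant: Weibull is adequate.")
```

Here the statistic is small and not significant at the 5% level, so
the simpler Weibull model would be kept.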
> 2. I compared the Predicted-values for the levels of X2 class-variable
> using code above and also by doing the above regression only on (for
> exg. S1) the dataset-S1 as shown:
> (DATA S1;
<snip>
> The predicted for level 'S1' by 2 methods above were different, and I
> was expecting it same as data used for 'S1' level is exactly same in
> both methods. Does anyone know why should they be different?
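I can see at least two reasons why the results could be different.
First, the likelihoods themselves are different in the two scenarios:
the model fitted to the full dataset estimates a single scale
parameter shared by all levels of X2, while the model fitted to S1
alone estimates its own scale, so the predicted values for S1 need not
agree. Second, proc lifereg maximizes the log-likelihood with an
iterative algorithm that starts from initial values of the parameters.
Those initial values differ between the two runs, and the algorithm
doesn't guarantee finding the same maximum even if the likelihoods
were identical.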
> Ques. In Phreg-
> 1. How can I get the PREDICTED response values in phreg?
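Use the BASELINE statement. It writes the estimated survivor function
for the covariate values you supply to an output dataset, so what you
get from phreg is survival probabilities over time rather than a
single predicted value of Y.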
HTH,
Vadim Pliner
anujwork@yahoo.com (AJ) wrote...
> Hi,
>
> I have dataset (has ~20,000 obs.) defined as-
>
> Y: (Numeric) Response var - Number of months a person stayed on
> program
> X1:(Character) [01,02,03,.....,65] - A person belongs to any one of
> these 65 segments.
> X2:(Character) [S1,S2,....,S12] - A person belongs to any one of
> these 12 stages.
> X3:(Character) [G1,G2,....,G15] - A person belongs to any one of
> these 15 groups.
> C: (Numeric) - Censor Status [0=uncensored or
> 1=censor]
>
> Additional Info:Around 80% of my data is censored.
>
>
> /* My code using LIFEREG */
> PROC LIFEREG data=ONE;
> CLASS X2;
> MODEL Y*C(1) = X2 / DIST=WEIBULL;
> OUTPUT OUT=LS P=PREDICTED;
> RUN;
> QUIT;
> /************************************/
>
> Ques. In Lifereg-
> 1. In order to decide which distribution to use I ran my code above with
> Weibull, Exponential, Lnormal etc. and simply picked the one with
> highest 'LIKELIHOOD' value. Am I right so far? The hazard plots were not
> very informative so couldn't use them to decide the distribution.
>
> 2. I compared the Predicted-values for the levels of X2 class-variable
> using code above and also by doing the above regression only on (for
> exg. S1) the dataset-S1 as shown:
> (DATA S1;
> SET one;
> IF X2 = 'S1';
> RUN;
>
> PROC LIFEREG data=S1 ;
> MODEL Y*C(1) = / DIST=WEIBULL;
> OUTPUT OUT=LS2 P=PREDICTED;
> RUN;
> QUIT;)
>
> The predicted for level 'S1' by 2 methods above were different, and I
> was expecting it same as data used for 'S1' level is exactly same in
> both methods. Does anyone know why should they be different?
> ---------------------------------------------------
>
> /* My code using PHREG */
> Proc phreg data=one;
> model Y*C(1) = X2;
> output out=one_out;
> Run;
> /**********************************/
>
> Ques. In Phreg-
> 1. How can I get the PREDICTED response values in phreg?
>
>
> Thanks,
> AJ