Re: Help with BUGS model



On Aug 23, 10:33 am, "sci.stat.math" <fra...@xxxxxxxx> wrote:
On 22 aug, 15:16, vontres...@xxxxxx wrote:





This seems to work as an extension of your code:

model
{
for (i in 1:K)
{
Signal[i] <- mu[i]/E[i]
N[i]~ dpois( mu[i] )
mu[i]<- lambda[i]*E[i]
lambda[i] <-P*lambda1[i]+(1-P)*lambda2[i] #result of equation 4.
lambda1[i] ~ dgamma( a1, b1 )
lambda2[i] ~ dgamma( a2, b2 )

}

a1 ~dunif(0.000001, 100)
b1 ~ dunif(0.000001, 100 )
a2 ~dunif(0.001, 100)
b2 ~ dunif(0.001, 100 )
P~dbeta(10,100)}

}

Dear Mark, I tested the model with a simulated dataset (simulate N and
E) it works. However the parameters of the mixture of gamma
distributions do not converge well (the samplesHistory shows a bad
interation pattern for the P and the as and bs.
With the model where the Lambdas are drawn from only one gamma prior I
have very good iteration pattern on the other hand. What do you think
is the explanation for that?- Hide quoted text -

- Show quoted text -

Dear sci.stat.math,

Do you have a first name that we can use in this thread? I hate to
feel like I'm writing to the newsgroup instead of a serious
statistician. It is like assigning anthromophologic properties to a
supreme being. I feel like it's a sign we must again don our foil
helmets.

But to be serios and answer your question:

I'm not sure what data you used to fit the MLE's to, so I can't be
sure why your MLE estimation procedure had problems. However, if it
was my data, then the data was completely made up. I made no effort to
compute the E's from an rxc table. Note since K is 5, then there is
only one pair of dimensions for the table, 1 and 5. You should get bad
answers if you used my data. You might do better if you use real data.
I can't give you any of my real data.

My experience with real data indicates that you need a constrained
minimization program for the MLE estimation. Your objective function
is negative of the log likelihood function. You must constrain P to
[1.0E-8, 1-1.E-8], and a1, b1, a2, b2 into the range of [1.0E-8, 100].
I use nlpcg in proc IML of SAS for this. The Solver tool in excel can
also be used, but that would be a real pain to set up the objective
function and data. You need a derivative free conjugate gradiant
function minimization routine with constraints to estimate the MLE's
of the 5 parameters. For really large data sets, you have to throw in
sparse matrix techniques for speed. That is why DuMouchell used C++
for his code. There can be a lot of cells to iterate over in equation
12 during a maximum likelihood estimation procedure.

Even with this powerful software, my estimates tend to wind up on at
least one of the boundaries. That usually means that just one of the
gamma distributions is contributing to the EBGM. For instance, P often
is zero or one.

You do need to get the the MLE's though as this produces Emperical
Bayes estimates of the gamma parameters ad P. This is a fairly common
approch to estimating ancillary hyperparameters. Using a vague prior
on the gamma parameters and P really defeats the purpose of using
DuMouchell's paper. The idea is that you are using all of the data to
define your background information. This lets you avoid needing a
denominator to estimate incidence rates. The background information is
the expected value of the cell under the assumption of independence of
the rows and columns of the rxc table. It is exactly the same equation
used for the expected cell frequency in a chi-square test of
independence. So, your signal detection data mining algorithm is
really just a residual analysis in rxc tables. Cells with large cell
frequencies relative to the expected frequency under the assumption of
independence are "suspicious". These are large "relative residuals"
and merit scientific investigation.

I'm not sure that I would want to pursue using WinBugs for this
problem since the paper has all of the quantities in it that you need.
The posterior distribution of lambda given N has a closed form. The
means are worked out for you. Although the "confidence intervals" are
not defined in the paper, the credible region on EBGM is the 2.5'th
and 97.5'th percentiles of the posterior distribution. This is easy to
compute since it is a convex combination of two gamma distribution
functions. You just have to use braketting and bisection to find them.
WinBugs won't compute the MLE's so you would have to compute them in
R, then send them to WinBugs using the BRUGS package.

By the way, guess what I've been working on for the last 3 months at
my company. I used SAS Proc IML to implement the paper. It will be
slow on FDA sized databases, but it works for a lot of our internal
analyses.

Mark

.