Re: How to justify normality test, or how to determine reference ranges
- From: davidjones@xxxxxxxxxx
- Date: Wed, 16 Apr 2008 07:51:09 -0700 (PDT)
I am still working on this. I have read a couple of papers; one in
particular was very similar to what I am doing [1] (I have ordered
more). It seems the standard method is pretty much that outlined in
my original post, with some outlier detection (which is less important
for me as I am confident of the health of my sample). If the initial
test for normality fails, transforms other than log are also
suggested: exponential, power, square root, and reciprocal.
To determine how well this works I managed to get hold of some old
data that is similar to what I am going to work on, with nearly 1000
measurements per parameter and 15 blood test parameters. I randomly
selected a small sample of each parameter and tested it for
normality or log normality using the Anderson-Darling test (I shall
have a go at the other transforms when I get time). If it passed the
test I calculated the reference range using that distribution and
looked at how many of the remaining measurements fell within that
range. If this were a valid method I would expect around 95% of them to.
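
The hold-out check described above can be sketched roughly as follows. This is only a minimal illustration, not the original analysis code: the function names, the synthetic Gaussian data, and the use of mean +/- 1.96 SD as the normal-theory reference range are assumptions, and the normality-test step is left out.

```python
import random
import statistics


def reference_range(sample, z=1.96):
    """Normal-theory 95% reference range: mean +/- z * SD of the sample."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return m - z * s, m + z * s


def coverage(values, low, high):
    """Fraction of values lying inside [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


def holdout_coverage(data, n_sample=100, seed=0):
    """Fit the range on a random subsample and score it on the rest."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    sample, rest = shuffled[:n_sample], shuffled[n_sample:]
    low, high = reference_range(sample)
    return coverage(rest, low, high)


if __name__ == "__main__":
    # Synthetic stand-in for one blood parameter (NOT real data).
    rng = random.Random(1)
    data = [rng.gauss(5.0, 1.0) for _ in range(1000)]
    print(f"held-out coverage: {holdout_coverage(data):.3f}")
```

For a parameter treated as log-normal, the same functions could be applied to the logged values, with the limits transformed back afterwards.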
My results were:
Given a sample size of 100 animals you would have:
Accepted normality on 5 parameters
Accepted log normality on 2
Used non-parametric methods on 8 (not implemented yet)
Of the 7 parameters that passed, 2 would have given you a reference
range that excluded between 4% and 6% of the unsampled population, and
4 would give you a reference range that excluded between 3% and 7% of
the unsampled population. Some of the parameters would have resulted in
reference ranges that excluded nearly 20% of the total population.
It seems to me that this method is not really fit for purpose. Any
comments, especially criticism of my method, would be greatly appreciated.
Another strategy I had thought of was to be more stringent with
accepting normality. This would not have helped. Normality is
accepted at the 5% significance level if the Anderson-Darling
statistic is below 0.752 [5]. Some of the lowest values gave the worst
coverage (glucose gave an Anderson-Darling statistic of 0.364, but ~13%
of the population was outside the reference range; ALP gave an
Anderson-Darling statistic of 0.454 and only 4.9% were outside the
reference range).
I have some more questions:
In [1] to fit to the various distributions they transform the data to
normal and then test the transformed data for normality using the
Anderson-Darling test. Is this better or equivalent to directly
testing against the distribution, or did they just do that because it
was easy?
The bootstrap method is mentioned. I have read [2] and [3] and
started [4] (I shall try and finish it but it is quite long and not
very readable) but it is not clear what sampling strategy one should
use if you are not assuming a distribution. If you just draw 10,000
values with replacement from your sample of 100, surely you will end up
with a distribution, and so a reference range, identical to that of
your sample. If anyone can recommend any reading on this method that
would be great.
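
For what it is worth, the usual non-parametric bootstrap does not pool one large resample. It draws many resamples, each the same size as the original sample, and recomputes the statistic of interest on each, so the output is the spread of the estimate rather than a copy of the sample. A minimal sketch under that assumption follows; the function names, the interpolation rule for the quantile, the number of replicates, and the synthetic data are all illustrative choices, not taken from [2] to [4].

```python
import random


def quantile(values, q):
    """Empirical quantile, linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_quantile(sample, q, n_boot=2000, seed=0):
    """Bootstrap replicates of the q-quantile.

    Each replicate is computed on a resample of len(sample) values
    drawn with replacement from the original sample.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [quantile(rng.choices(sample, k=n), q) for _ in range(n_boot)]


# Synthetic stand-in for 100 measurements (NOT real data).
rng = random.Random(1)
sample = [rng.gauss(5.0, 1.0) for _ in range(100)]

reps = bootstrap_quantile(sample, 0.025)
# Percentile interval for the lower reference limit.
ci = (quantile(reps, 0.025), quantile(reps, 0.975))
print("lower limit estimate:", quantile(sample, 0.025))
print("bootstrap 95% interval for it:", ci)
```

The point estimate of the limit still comes from the original sample; the replicates only indicate how much that estimate would move from one sample of 100 to another.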
From [1], the non-parametric method they use if they cannot fit the
data to any distribution is described thus: "For non-normally
distributed parameters, reference intervals were estimated by using a
non-parametric approach based on the direct estimation of the 0.025 and
0.975 quantiles (q)." (IC95% for these quantiles are calculated as in
[6].) Does this mean just taking the top and bottom 2.5%, so in their
case of 120 animals the bottom of the reference range would be the 3rd
smallest value?
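
The arithmetic behind that reading can be sketched as below. This is one possible interpretation, not necessarily what [1] did: taking the rank as n*q rounded up, and taking the upper limit symmetrically as the r-th largest value, are assumptions, as are the function names. The last function simply evaluates the formula quoted in [6] as written.

```python
import math


def lower_rank(n, q=0.025):
    """1-based rank of the lower limit, taking rank = n*q rounded up."""
    # round() guards against floating-point noise such as 3.0000000000000004
    return math.ceil(round(n * q, 9))


def nonparametric_range(values, q=0.025):
    """r-th smallest and r-th largest values, with r = lower_rank(n, q)."""
    xs = sorted(values)
    r = lower_rank(len(xs), q)
    return xs[r - 1], xs[len(xs) - r]


def rank_ci(n, q, q_low=0.025, q_up=0.975, z=1.96):
    """Rank limits j, k from the formula quoted in [6], rounded up:
    n*q -/+ z * sqrt(n * q_low * q_up).
    """
    half = z * math.sqrt(n * q_low * q_up)
    return math.ceil(n * q - half), math.ceil(n * q + half)


print(lower_rank(120))       # rank used for the lower limit
print(rank_ci(120, 0.025))   # rank limits j, k for the 0.025 quantile
```

With n = 120 and q = 0.025 the centre n*q is 3, which matches the "3rd smallest value" reading, but the lower rank limit from the [6] formula falls below 1 and so does not correspond to any observed value.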
Does anyone have a calculated example of the Anderson-Darling test? I
would like to be able to put its values into my function and check I
get the same value for the test statistic.
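
For comparison, a minimal sketch of the statistic for the normal case with mean and standard deviation both estimated from the data is given below. It is not a verified reference implementation and no reference value is quoted. The small-sample adjustment A2* = A2 * (1 + 0.75/n + 2.25/n^2) is the one I understand to go with the 0.752 critical value, but that pairing is an assumption that should be checked against [5]; the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Return (A2, A2*) for a normal fitted with the sample mean and SD.

    A2 = -n - (1/n) * sum_{i=1..n} (2i - 1) *
         [ln F(y_i) + ln(1 - F(y_{n+1-i}))],  y sorted ascending.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    # i is 0-based here, so (2i + 1) is the textbook (2i - 1).
    s = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    # Small-sample adjustment (assumed pairing with the 0.752 cut-off).
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_star


# Illustrative input only: evenly spaced normal quantiles, not real data.
demo = [NormalDist(5.0, 1.0).inv_cdf((i + 0.5) / 50) for i in range(50)]
print(anderson_darling_normal(demo))
```

Feeding the same values into another implementation and comparing A2 before the adjustment is probably the cleanest check, since packages differ in which adjustment they apply.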
It says in [5] that for the normal distribution, if A2* exceeds 0.752
then the hypothesis of normality is rejected at the 5% level. Any
other theoretical distribution can be assumed by using its CDF. Each
theoretical distribution has its own critical values, and some
examples are: lognormal, exponential, Weibull, extreme value type I
and logistic distribution. How can I find or calculate these critical
values?
Sorry about the essay, any help at all would be greatly appreciated.
[1] Dimauro, C et al., Estimating clinical chemistry reference values
based on an existing ..., The Veterinary Journal (2007),
doi:10.1016/j.tvjl.2007.08.002
[2] http://en.wikipedia.org/wiki/Bootstrapping_(statistics)
[3] http://people.revoledu.com/kardi/tutorial/Bootstrap/index.html
[4] http://www.nt.tu-darmstadt.de/nt/fileadmin/spg/research_projects/bootstrap/boottut.ps.gz
[5] http://en.wikipedia.org/wiki/Anderson-Darling_test
[6] The IC95% for the q quantiles were found by using the binomial
distribution method (Conover, 1999; Garcia-Perez, 2005). Afterwards,
the observations were ranked and numbered so that an observation value
lower than the q quantile could be considered extracted from a
binomial distribution with parameters n and q, mean nq and standard
deviation (nq(1-q))^0.5 and k limits of IC95% for the quantile q are
j,k=n * q +- 1.96*(n*qlow*qup)^0.5 where qlow and qup indicate 0.025
and 0.975 quantiles, respectively. Limits j and k were rounded up to
the next integer.