Re: Unbiased significance testing method; is there any?
- From: Rich Ulrich <rich.ulrich@xxxxxxxxxxx>
- Date: Fri, 28 Aug 2009 15:11:00 -0400
On Fri, 28 Aug 2009 12:11:54 +0200, AverageJoe <nospa@xxxxxxxxxxxxx>
wrote:
Paul wrote:
AverageJoe wrote:
Let's say in a hypothetical student assessment (multiple choice test)
there are 354 questions. The questions are independent of each other
and each has p=0.25 (ie. 4 choices, only 1 correct).
I assume you are talking about blind guessing here (if the
hypothetical assessment actually relates to a hypothetical course,
presumably a hypothetical student with a hypothetical clue would do
better than this), and you're also assuming that a "blind" guesser
guesses totally blind (no choices seem less likely to the clueless than
other choices). Both assumptions are ok, so long as you realize
you're making them. (If you look at questions from the dreaded
textbook-accompaniment question banks, it tends to be the case that if
one choice is longer than the rest, it's liable to be the correct
answer.)
Since also the overall probability is p=0.25 it follows
that the expected mean would be 88.5.
One of the test takers answers 122 questions correct.
Now, the question is: statistically seen, how good is the result of this candidate?
Which statistical method should one use to answer this question?
The probability of a truly blind guesser (coin-tosser) getting 122 or
more can be computed from a Binomial(354, .25) distribution.
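As a minimal sketch of that computation (Python, standard library only; the helper name `binom_upper_tail` is made up here for illustration and does not come from any package), the upper tail of Binomial(354, .25) can be summed with exact integer arithmetic:

```python
from math import comb

def binom_upper_tail(k, n, num, den):
    """P(X >= k) for X ~ Binomial(n, num/den), via exact integer sums."""
    # Each term C(n, i) * num^i * (den - num)^(n - i) is an exact integer;
    # dividing by den^n only at the end avoids floating-point underflow.
    total = sum(comb(n, i) * num**i * (den - num)**(n - i)
                for i in range(k, n + 1))
    return total / den**n

n, k = 354, 122
expected = n * 1 / 4                   # 88.5 correct answers by pure guessing
tail = binom_upper_tail(k, n, 1, 4)    # chance of 122 or more by pure guessing
print(expected, tail)
```

This only answers "how likely is a score this high under blind guessing"; it rests on the same independence and equal-attractiveness assumptions noted above.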
The "z score" method gives z=4.11 (1-sided), but what does this mean,
and is this method really appropriate in this case?
IIRC (and my memory on this is a tad fuzzy), the normal distribution
gives a decent approximation to the binomial for large n and p near
0.5, where the importance of p near .5 decreases (I think) as the n
increases. Your n of 354 is certainly large enough for the normal
approximation at p = 0.5, but I don't know off-hand how good the
normal approximation is for n=354 and p=0.25 (though I suspect it's
good enough). In any case, current software can quickly compute the
actual binomial probability for you.
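One way to check how good the normal approximation is for n=354 and p=0.25 is simply to compute both tails and compare. The sketch below uses only the Python standard library; the variable names are illustrative, and the continuity correction (using 121.5 rather than 122) is an addition not mentioned in the thread.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 354, 0.25, 122
mean = n * p                      # 88.5
sd = sqrt(n * p * (1 - p))        # about 8.15

z_plain = (k - mean) / sd         # the plain z score, about 4.11
z_cc = (k - 0.5 - mean) / sd      # with continuity correction for P(X >= k)

std_normal = NormalDist()
approx_plain = 1 - std_normal.cdf(z_plain)
approx_cc = 1 - std_normal.cdf(z_cc)

# Exact tail for comparison, summing the binomial probabilities directly.
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"z = {z_plain:.2f}, z with continuity correction = {z_cc:.2f}")
print(f"normal: {approx_plain:.3g}, corrected: {approx_cc:.3g}, exact: {exact:.3g}")
```

Printing the three tail probabilities side by side shows directly how far the approximation is from the exact value this far out in the tail.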
I get this result for 122 correct out of 354:
p(k=0..122) = 0.999972633032 (this corresponds to z=4.034451386)
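For reference, the z value quoted with that cumulative probability appears to be the standard normal quantile of the probability, not the standardized score (122 - 88.5)/sd, which is why it differs from 4.11. A small sketch of the conversion, assuming the quoted probability is taken as given:

```python
from statistics import NormalDist

cumulative = 0.999972633032             # p(k=0..122) as quoted above
z = NormalDist().inv_cdf(cumulative)    # standard normal quantile of that probability
upper_tail = 1 - cumulative             # chance of scoring above 122 by guessing
print(f"z = {z:.4f}, upper tail = {upper_tail:.3g}")
```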
My problem is how to formulate, mathematically correct, that this result
is very good.
Our problem, here, is that you seem to be not-listening in
the same fashion as a high-performing autistic. You have been
given some good comments, which you seem oblivious to. We are
concerned, for one thing, that you will abuse a narrow answer
by presenting it in an entirely unsuitable context.
For instance, "very good" is a VERY PRESUMPTIVE term for the
result, for most real-world problems (which you state this as)
that start with "test takers."
"Non-random" is what the evidence suggests. Yes, with
a pretty good p-value.
"Very good", in the context of tests, implies a different
criterion, one involving other subjects. If the test is so
hard and deceptive that even clever test-takers have
trouble scoring above chance, then this could, indeed,
be "very good." Or if the test is being taken by an
Artificial Intelligence ... answering 10% of the questions
- intentionally - "correct" could be a sign of progress,
and "good" in comparison (say) to previous programs.
Even though there were twice as many "right" answers
solely by chance.
But for a context of tests and test-takers, "very good"
sounds "very wrong."
What about this wording:
"122 correct out of 354 is a very good result. It is significant with
a confidence level more than 99.99%, ie. practically 100%; one hardly can do better".
What if the question was "How significant is the result at a 95% confidence level?"
--
Rich Ulrich