Re: Error Testing




"Old Mac User" <chendrixstats@xxxxxxxxx> wrote in message
news:1161274508.917519.322280@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Graham...

You wrote...

"Thanks for the answer. What value would be large enough for me to be
able to prove that it didn't make any difference?"

I know this is going to sound philosophical, not practical, but the
fact is that we cannot prove there is no difference.... we cannot prove
that two things (such as fraction defective or defect rate) are equal.
We can determine that they are different... always with a probability
attached.

So what do we do? If we are trying to "prove" that two things are
"equal", the best we can do is to calculate and report a confidence
interval on the difference. This means we will end up with a statement
of the form "we are (say) 95% confident that the true underlying
difference is not larger than XXX". The value of XXX gets smaller as we
get more data. In general, our best strategy is to get about the same
amount of data (equal sample sizes) for both categories.
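As a rough illustration of such an interval, here is a minimal Python sketch. It assumes the Newcombe "hybrid score" interval built from two Wilson intervals, which is only one of several published methods (the simple normal-approximation "Wald" interval is another, and it behaves badly when there are few defectives). The function names and the counts are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, z):
    """Wilson score interval for a single proportion x/n."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diff_interval(x1, n1, x2, n2, conf=0.95):
    """Newcombe hybrid score interval for p1 - p2 (hypothetical helper name)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# Made-up counts: 12 failures in 400 units vs 15 failures in 400 units.
lo, hi = diff_interval(12, 400, 15, 400)
print(f"difference in fraction defective: {lo:+.4f} to {hi:+.4f}")

# The same fractions with ten times the data give a narrower interval.
lo10, hi10 = diff_interval(120, 4000, 150, 4000)
print(f"with 10x the data: {lo10:+.4f} to {hi10:+.4f}")
```

The "XXX" in the statement above corresponds to whichever bound is larger in absolute value.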

So this is not really a philosophical thing. In particular, with some
preliminary data we can calculate (estimate is the better word here)
about how much data we will need to ensure that XXX is small enough to
satisfy our requirements. That is far better than saying "let's just
get some data and see what happens". We can actually do this
calculation with no preliminary data whatsoever, but then the
calculation will be based on hypotheticals... so it's best to have some
preliminary data.
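A back-of-the-envelope version of that planning calculation, as a minimal Python sketch. It assumes the normal approximation, equal sample sizes in the two groups, and a target half-width for a two-sided interval on the difference; the function name is hypothetical. With no preliminary data, plugging in 0.5 for both fractions gives the most conservative (largest) answer.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, half_width, conf=0.95):
    """Approximate number of units needed in EACH group so the interval on
    p1 - p2 has roughly the requested half-width (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z * z * variance_sum / half_width ** 2)


# No preliminary data: assume the worst case, p = 0.5 in both groups.
print(n_per_group(0.5, 0.5, 0.05))    # 769

# Hypothetical preliminary data suggesting roughly 3% and 4% defective.
print(n_per_group(0.03, 0.04, 0.02))
```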

You also wrote...

"I've been Googling for websites explaining this sort of stats, are
there any that come well recommended? I can see me doing a lot more of
this sort of stuff in the future, would be nice to understand it as
well as be able to regurgitate bits."

I'm not aware of any good websites that teach how to deal with this
class of data. My "everyday" expression for such data is "counting
data"... we count the number of "defectives" or "events" whether they
are favorable or unfavorable. The formal name for this is "attribute
data". In theory, this sort of data comes from any of several
statistical distributions. Yours should be coming from a binomial (pass
or fail, good or bad...) distribution. But other data can be coming
from a Poisson distribution or any of several others. To make it more
interesting, some data of this sort comes from unknown, arbitrary
distributions. Binomial and Poisson are the easiest to deal with. A
few facts about those are usually taught in courses in statistics. But
as I point out in other posts, there is a lot of misleading "stuff" in
textbooks and other publications... and websites as well.
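To make the binomial/Poisson distinction concrete, here is a small Python sketch using only the standard library. The binomial gives the probability of k defectives among n units that each fail with probability p; the Poisson, with mean n*p, is the familiar approximation to it when n is large and p is small. The numbers (1000 units, 0.3% defective) are purely illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(exactly k defectives among n units, each defective with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P(exactly k events) for a Poisson count with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


n, p = 1000, 0.003
for k in range(7):
    # Columns: k, binomial probability, Poisson approximation.
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```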

When it comes to simple comparative binomial data (like yours...
comparing two fractions or proportions or rates) I strongly suggest
using Fisher's Exact Test. Sadly, the analysis of simple comparative
data of this sort (binomial, comparing two fractions, etc.) has a long
and tortured history. Over the years many people have created
"recipes", some of which are truly homemade and whose properties have
never been documented. Some of those are really awful. Other such
methods have been "blessed", but have footnotes concerning their range
of validity and a lot of fine print. Even as I write this, more are
surely being created. Textbooks, journals, etc. abound in these.
Fisher's Exact Test is exact with no fine print. How did this come to
be? Well, medical technologists have their own "recipes"...
sociologists have theirs... engineers have some of their own... etc.
etc. Sadly, some people like to haggle over whether "mine is better
than yours". Fisher's Exact Test (created at least 80 years ago) faded
into the background. It is a "computer intensive method"... or at
least it can be computer intensive. That is, for some situations a
software version of it may run slowly on a computer. It was never
practical before digital computers appeared. But 80 years is a long,
long time to neglect such a valuable thing. So this is the first thing
you need to know.
I gave you a link to a site that will calculate Fisher's Exact Test for
you. If you want my software for doing this I can e-mail it to you as
an attachment. NOTE: Some firewalls don't like this because it is
"active code". But I assure you it is harmless.
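For readers who want to see what the test actually computes, here is a minimal Python sketch of a two-sided Fisher's Exact Test on a 2x2 table (it is not the software or the web calculator mentioned above). With the row and column totals held fixed, each possible table has a hypergeometric probability; the two-sided p-value here sums all tables no more probable than the observed one. That is one common convention for the two-sided version; others exist. Counts are kept as exact integers so that ties are compared exactly. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test for the 2x2 table

            fail  pass
    group1    a     b
    group2    c     d
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total failures
    total = comb(row1 + row2, col1)   # ways to place the failures, ignoring groups

    def ways(x):
        # Number of arrangements with x failures in group 1 (margins fixed).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = ways(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    as_extreme = sum(w for w in (ways(x) for x in range(lo, hi + 1)) if w <= observed)
    return as_extreme / total


# A small table with the layout of Fisher's "lady tasting tea" example.
print(fisher_exact_two_sided(3, 1, 1, 3))   # 34/70, about 0.4857
```

Enumerating every table with the same margins is exactly why the test can be computer intensive for large counts.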

The second thing you need to learn is Chi-sq. Pronounce the "Chi" like
the "ki.." in kite, then add "square": ki-square. Chi-sq allows us to
compare more than two fractions or proportions. Done properly, Chi-sq
with simple comparative data (just two fractions to be compared) will
produce probability values that are very close to those from Fisher's
Exact Test. Note the "done properly" part, because some textbooks
really don't do a good job of explaining how to do Chi-sq for comparing
just two fractions. Chi-sq has applications far beyond what I'm
describing here... very broad indeed.
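Here is a minimal Python sketch of the Pearson Chi-sq statistic for a 2x2 table, with its p-value from the Chi-sq distribution with one degree of freedom (for one degree of freedom the upper tail equals erfc(sqrt(x/2)), so no statistics library is needed). The optional Yates continuity correction is included only because many textbooks present it; this sketch does not settle which version is "done properly", and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson Chi-sq for the 2x2 table [[a, b], [c, d]].

    Assumes every row and column total is nonzero.
    Returns (statistic, p-value) using 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)   # continuity correction
            stat += diff * diff / expected
    # Upper tail of Chi-sq with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


print(chi_square_2x2(3, 1, 1, 3))              # statistic 2.0, p about 0.157
print(chi_square_2x2(3, 1, 1, 3, yates=True))  # statistic 0.5, p about 0.480
```

For this very small table, Fisher's Exact Test gives 34/70 (about 0.486), so the two Chi-sq versions differ noticeably here; with larger counts the versions generally draw closer together.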

There's one more thing. If you intend to "run experiments" and get
comparative data in order to learn which controllable variables affect
the outcome, then you should consider using "designed experiments", in
particular factorial designs. Used wisely, these can, for instance,
study the effect of seven variables in just eight experiments. NOTE:
When working with binomial data, each experiment will have to be
repeated. While it may not be obvious, compared with studying the
variables one at a time, this still reduces the amount of work to be
done by a factor of seven. There are other plans for as many as 15
variables in 16 trials, etc. If you have a lot of variables to
consider, then attacking them one at a time is a losing game. With
two-level factorial designs you can often (not always) study all of the
variables with a modest amount of effort. OMU
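A minimal Python sketch of one such plan: the standard 8-run, two-level layout for seven factors (a 2^(7-4) fractional factorial). The first three factors run through all eight combinations of low (-1) and high (+1); the other four are assigned to the product columns AB, AC, BC and ABC. The factor names A to G and the response values are hypothetical. Each main effect is estimated as the average response at the high level minus the average at the low level, so all eight runs contribute to every estimate; the estimates rest on the assumption that interactions are negligible.

```python
from itertools import product


def eight_run_design():
    """8 runs x 7 factors (A..G), levels coded -1 / +1."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        # D, E, F, G are set equal to the interaction columns of A, B, C.
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def main_effects(runs, responses):
    """Mean response at +1 minus mean response at -1, for each factor."""
    effects = []
    for j in range(len(runs[0])):
        high = [y for run, y in zip(runs, responses) if run[j] == 1]
        low = [y for run, y in zip(runs, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


design = eight_run_design()
for run in design:
    print(run)

# Hypothetical responses (e.g. percent defective in each run's batch).
responses = [5.1, 3.9, 5.3, 4.2, 7.0, 6.1, 7.2, 5.8]
for name, effect in zip("ABCDEFG", main_effects(design, responses)):
    print(name, round(effect, 3))
```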
============================================
Very, very good, OMU. Most statistics courses totally ignore the issues of
planning tests and experiments. Experimental design just about never comes
up on the sci.stat trilog as a solution to problems.

ED (experimental design) has its faults, because of the prior assumptions
that have to be made about the unknown interactions.
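A small illustration of the kind of assumption meant here, as a Python sketch with made-up numbers. In a 4-run half fraction of a three-factor, two-level experiment, factor C is set equal to the A x B product column, so C's main effect and the A x B interaction cannot be told apart (they are "aliased"). If the true response depends only on A x B, the analysis still reports an apparent C effect, unless we are willing to assume the interaction is zero.

```python
from itertools import product

# 4-run half fraction of a 2^3 design: C is deliberately set to the A*B column.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
A = [r[0] for r in runs]
B = [r[1] for r in runs]
C = [r[2] for r in runs]
AB = [a * b for a, b in zip(A, B)]


def effect(column, y):
    """Mean response where column is +1 minus mean response where it is -1."""
    high = [v for s, v in zip(column, y) if s == 1]
    low = [v for s, v in zip(column, y) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical truth: only the A x B interaction matters; C does nothing.
y = [10 + 3 * ab for ab in AB]
print(C == AB)        # True: the two columns are identical
print(effect(C, y))   # 6.0, wrongly credited to C
print(effect(AB, y))  # 6.0, the real source
```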

The sheer success of Japanese automobiles in achieving quality and
dependability after WWII came from ED methods. The really impressive
reliability of our "computer" devices in 2006, from cell phones to iPods to
laptops, etc., comes from very extensive ED approaches to the manufacturing
processes. Just think if we only had General Motors or the original AT&T
around to provide us with these devices!

David Heiser

