Re: why is probability and statistics a hard subject?
- From: "S. F. Thomas" <thomas7243@xxxxxxxxxxxxx>
- Date: Fri, 09 Nov 2007 14:41:20 -0500
bm459@xxxxxxx wrote:
(( cuts ))
Different texts use different words and sometimes simply reading
something presented two different ways allows it to sink in. Plus
part of learning is repetition. Read something in three books and you
have to read it and think about it three times. When I was a student
I noticed the more sources I read on a topic the more I learned about
and understood the topic. I have also noticed, with age, that when I
used multiple sources the stuff stuck with me whereas when I used only
one source because the concepts were so simple they did not stick
nearly as well. After all, if multiple sources were not important why
would we even need teachers? Just reading the book would be
adequate. It is clear teachers do more than simply set the pace. I
have also noticed you can skip the teacher nicely providing you use
multiple books. Some books and some teachers simply do not click with
some students. The only solution is to find other books. I will also
grant that some people have a wiring problem in their brain that makes
some subjects very, very hard. The solution to that problem is to
find a different major that does not need the impossible subject and
is compatible with the wiring.
These are very useful observations, with which I concur. I would add that it is very useful to go back to the original sources. I agree with the poster who says statistics IS hard. That is why statisticians disagree amongst themselves. Nobody disagrees at the level of the technical mathematics, mind you; rather, it is at the level of core, foundational concepts that the disagreements set in.
In my case, I learned so-called classical statistics (Neyman-Pearson) as an undergraduate engineer. Then, as a graduate student, I was introduced to Bayesian statistics. I was totally impressed, because it promised to make statistical life a lot simpler. It rather boldly postulated that the unknown parameters could be treated exactly as random variables, and that the frequency model could now be understood as a conditional pdf, given the parameter-considered-as-random-variable. But then at some point I read Rev. Thomas Bayes' original 18th-century article (published posthumously in 1763), only to find that he had expressed grave misgivings even back then.
Then I came across the book "Foundations of Statistical Inference: A Symposium" by Godambe and Sprott (eds.) (1971). It was then I realized that statistics as a discipline rested on rather shaky foundations, for the classicists and Bayesians could not agree with each other. If the respective exponents could not agree amongst themselves, it should be no wonder that students find the subject hard. By all means we can all follow cookbook recipes, but at a level of deep and true understanding, it is perhaps another matter.
Somewhere along the line I read R. A. Fisher (1956), "Statistical Methods and Scientific Inference", and A. W. F. Edwards, "Likelihood" (1972). It is from these that I fully came to appreciate that the _entire_ inferential import of any experiment, given any proposed model, is wholly contained within the likelihood function. That is the core kernel about which all agree. The only problem was that there was no likelihood calculus to evaluate composite hypotheses, and thus to perform marginalization, except by a simple rule of maximization. Fisher famously said that the likelihood of w1 or w2 is like the income of Peter or Paul: we don't know what it is until we know which is meant. He was then led to maximization rules, which threw up a stream of paradoxes, especially in the case of multi-parameter problems. It is that central difficulty that, in my opinion, lies at the root of the bifurcation of the discipline into the classical and Bayesian schools.
The Bayesians cut through the Gordian knot by, in effect, elevating the likelihood function to the status of a probability density function. From this position it becomes a simple matter to evaluate composite hypotheses and perform marginalization to eliminate nuisance parameters in multi-parameter problems -- the simple expedient is a rule of integration for the evaluation of composite hypotheses and of marginalization. The paradoxes engendered by likelihood marginalization go away.
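To make the contrast between the two rules concrete, here is a minimal sketch in Python of eliminating a nuisance parameter both ways. Everything specific in it is an assumption chosen for illustration: the five data points, the Gaussian model, the grids, and above all the flat weighting over sigma in the integration rule (a Bayesian would pick a prior and defend it). It is not taken from any of the books mentioned here.

```python
import math

def log_lik(mu, sigma, data):
    """Gaussian log-likelihood of (mu, sigma), dropping additive constants."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma) - ss / (2 * sigma ** 2)

def profile_lik(mu, data, sigmas):
    """Maximization rule: take the best-supported sigma for each mu."""
    return max(math.exp(log_lik(mu, s, data)) for s in sigmas)

def integrated_lik(mu, data, sigmas):
    """Integration rule: sum over sigma with a flat weight (an assumption)."""
    step = sigmas[1] - sigmas[0]
    return sum(math.exp(log_lik(mu, s, data)) for s in sigmas) * step

def normalize(values):
    """Rescale a curve so that its maximum is 1."""
    top = max(values)
    return [v / top for v in values]

data = [4.1, 5.3, 4.8, 6.0, 5.2]               # hypothetical sample
sigmas = [0.05 * k for k in range(1, 401)]      # grid 0.05 .. 20.0
mus = [3.0 + 0.05 * k for k in range(81)]       # grid 3.0 .. 7.0

profile_rel = normalize([profile_lik(m, data, sigmas) for m in mus])
integrated_rel = normalize([integrated_lik(m, data, sigmas) for m in mus])

for m, p, i in zip(mus[::10], profile_rel[::10], integrated_rel[::10]):
    print(f"mu={m:4.2f}  profile={p:.4f}  integrated={i:.4f}")
```

Both curves peak near the sample mean, but the integrated curve falls off more slowly, because it carries the uncertainty about sigma along instead of fixing sigma at its best-fitting value. In a one-nuisance-parameter toy like this the difference is mild; the paradoxes referred to above arise when the number of nuisance parameters grows.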
The classicists remained cautious, rejecting the Bayesian expedient of considering the unknown parameters of frequentist models to be random variables, certainly not of the frequentist sort, and even not of a subjective belief sort. Hence their indirect and long-winded "solutions" to the problem of inference, which in a familiar example may be stated thusly: IF one were to perform an experiment an infinite number of times, then a confidence interval based on the experimental data and constructed in a certain well-defined way, would contain the value of the true parameter, whatever it is, 95% of the time. Thus inferential statements require appeal to a long series of experiments that have not been, and will never be performed.
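The long-run reading of a confidence interval can be checked by simulation, which is about as close as one gets to that never-performed series of experiments. The sketch below assumes a Gaussian model with known sigma and the usual z-interval; the true mean, sigma, sample size and number of repeats are arbitrary illustrative choices.

```python
import math
import random

def z_interval(sample, sigma, z=1.96):
    """95% interval for the mean of a Gaussian with known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

def coverage(true_mu, sigma, n, repeats, seed=0):
    """Fraction of repeated experiments whose interval contains true_mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        if low <= true_mu <= high:
            hits += 1
    return hits / repeats

rate = coverage(true_mu=10.0, sigma=2.0, n=25, repeats=20000)
print(f"empirical coverage: {rate:.3f}")
```

The printed rate should come out close to 0.95. Note what the 95% attaches to: the procedure over many repetitions, not the one interval computed from the data actually in hand.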
Edwards would simply say: look at the likelihood! All would agree that this is good advice for a one- or two-parameter problem, but totally useless otherwise.
Anyway, then I encountered Zadeh's fuzzy set theory, which made a lot of sense up to a point. I realized that probabilities could be fuzzy. Consider this thought experiment: a friend fabricates an entirely new thumb-tack, never before seen or used anywhere, with an entirely new geometry. Contemplating this thumb-tack, one asks: what is the probability that it would land top down if tossed? One looks at its geometry, its weight distribution, etc., and one is prepared to assert, beforehand, that it would do so with a "high" probability. Now you toss it and observe a result. What should be your new estimate of the probability of it landing top down on another throw?
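For comparison, here is how a conventional Bayesian might answer the thumb-tack question. This is only a sketch of that one school's recipe, not a resolution of the puzzle: the vague prior judgment "high" has to be forced into a precise Beta(8, 2) prior, and those two numbers are purely hypothetical. The likelihood of a single toss is included because it is the part everyone agrees on.

```python
def toss_likelihood(p, landed_top_down):
    """Likelihood of p (probability of landing top down) after one toss."""
    return p if landed_top_down else 1.0 - p

def beta_update(a, b, landed_top_down):
    """Conjugate update of a Beta(a, b) prior after one toss."""
    return (a + 1.0, b) if landed_top_down else (a, b + 1.0)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior = (8.0, 2.0)  # hypothetical stand-in for "high" (prior mean 0.8)
after_top_down = beta_update(*prior, landed_top_down=True)
after_other = beta_update(*prior, landed_top_down=False)

print("prior mean:         ", round(beta_mean(*prior), 3))
print("after top-down toss:", round(beta_mean(*after_top_down), 3))
print("after other result: ", round(beta_mean(*after_other), 3))
```

The arithmetic is trivial. The hard part is the step the code hides: nothing in the statement "high" singles out Beta(8, 2) over, say, Beta(4, 1) or Beta(16, 4), and each choice gives a different answer after the toss.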
Or consider this example due to Zadeh. Given the statement "most Swedes are tall", and an assumed Gaussian model with mean mu and standard deviation sigma, what is the fuzzy range of values allowed for mu and sigma consistent with the statement? How would a classical exponent solve such a problem? How would a Bayesian exponent? How would a fuzzy exponent, given that the fuzzy exponents at the time were insisting rather dogmatically that fuzziness had nothing at all to do with probability?
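One way to make the Swedes problem computable is sketched below, loosely in the spirit of Zadeh's treatment of fuzzy quantifiers: compute the expected degree of tallness under the Gaussian model, then ask how well that proportion fits "most". The membership functions for "tall" and "most" are invented ramps with made-up breakpoints (170 to 185 cm, and 0.5 to 0.8), so the numbers mean nothing beyond illustration.

```python
import math

def tall(height_cm):
    """Hypothetical membership in 'tall': 0 below 170 cm, 1 above 185 cm."""
    return min(1.0, max(0.0, (height_cm - 170.0) / 15.0))

def most(proportion):
    """Hypothetical membership in 'most': 0 below 0.5, 1 above 0.8."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.3))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def proportion_tall(mu, sigma, low=100.0, high=260.0, steps=3200):
    """Expected degree of tallness under N(mu, sigma), by midpoint rule."""
    step = (high - low) / steps
    total = 0.0
    for k in range(steps):
        h = low + (k + 0.5) * step
        total += tall(h) * normal_pdf(h, mu, sigma) * step
    return total

def compatibility(mu, sigma):
    """Degree to which (mu, sigma) is consistent with 'most Swedes are tall'."""
    return most(proportion_tall(mu, sigma))

for mu in (165.0, 175.0, 180.0, 185.0, 190.0):
    print(f"mu={mu:5.1f}  sigma=5.0  compatibility={compatibility(mu, 5.0):.3f}")
```

The output is a membership grade for each (mu, sigma) pair, which is to say a fuzzy set over the parameter space rather than a point estimate, an interval, or a posterior density.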
Anyway, to cut a long story short, I found a way to resolve all these various views of the matter. The classicists are right that the unknown model parameters are in no sense random variables. The Bayesians are right that we should be able to characterize the uncertainty directly and marginalize, perform the changes of variable necessary to support decision action, etc. etc. The likelihood people are right that the likelihood function contains the entire inferential content of any statistical experiment. And the fuzzicists are right that our probabilities may be essentially fuzzy, and therefore that our model parameters may be essentially fuzzy also. In effect, the likelihood function and the likelihood semantics define a kind of fuzzy set, and certainly a kind of possibility distribution deriving from a fuzzy set. The fuzzy-set semantics allows for a way to manipulate likelihood that Fisher and Edwards could not have imagined, and that addresses the core Bayesian concern of direct manipulability. There are lots of other implications besides, new vistas that open up, and old vistas that need to be looked at again with fresh eyes. Anyway, it's all very exciting stuff; see S. F. Thomas, "Fuzziness and Probability" (1995).
If it answers anything, it is the OP's question, "is statistics hard?". The answer is yes, obviously, since the exponents, for good reason, have not been able to agree amongst themselves. However, that said, I now believe that if we go back to basics -- what is a phenomenon, what is a model of a phenomenon, what is probability, what is a probability model, what is measurement, is it better to have fuzzy measurement as the general case, from which point measurement is a special approximation, or to have point measurement as the general case, from which we may fuzzify or statistify as necessary, what is an instance of a phenomenon, how is a statistical distribution over the extension set of a phenomenon defined with reference to a probability model, does limited sample data lead to fuzzy uncertainty in model parameters, etc. etc. -- we may, perhaps paradoxically, render statistics less hard, perhaps even easy. In other words, I am saying that statistics is hard because not enough attention has been paid to the foundations, and huge superstructures have been built on what are in my opinion still shaky foundations. So often, at a deep level, the teachers will not know really what they're talking about, although at a cookbook level they may be entirely well enough trained (sic). In such a situation, the student who seeks a deep understanding, as Nasser obviously does, will find it hard. Possibly there is a problem with the brain-wiring, but that I rather doubt.
Be all that as it may, the classical paradigm is so entrenched, Bayesian protestations and assaults notwithstanding, that Nasser's grandchildren will likely be asking the same question that Nasser is asking now.
Regards,
S. F. Thomas