Re: why is probability and statistics a hard subject?
- From: strgh@xxxxxxxxxxxxxxxxxxxxxxxx ()
- Date: Fri, 9 Nov 2007 14:16:20 +0000 (UTC)
Hi Nasser! [old chess opponent]
In article <aMRYi.5101$4k.4798@xxxxxxxxxxxx> you write:
...
> Actually my professor is distinguished in the field of probability and
> statistics. He is a known expert in this field with many scientific
> publications. I myself just find the subject more slippery than other
> subjects. I think probability and statistics simply requires more time to
> sink in than any other subject. This subject simply requires more experience
> and practice to become good at it, or it may be simply that I was not born
> to be a statistician. I think people who are really good at this must have
> their brains wired differently than the rest of us :)
The logic & concepts underlying "statistics" ARE slippery,
and may take a while to sink in. As an undergraduate,
I understood statistics much better six months after the lectures,
even without studying the notes in the interim. You'll need to:
1) Understand the concepts of independence & conditional independence.
2) Distinguish carefully between what is known & what isn't
(it may help to write unknown quantities in capitals & the
corresponding known/observed/assumed values in lower case).
3) Understand "likelihood" as a measure of compatibility between
observed data & possible parameter values.
I'll try to help...
In mathematics, you start with some axioms (e.g. for Euclidean geometry),
create an ideal universe based on the axioms, and explore this universe
using deduction to prove things like Pythagoras' theorem. Results
from this ideal universe can often be usefully applied to the real world
- for example, to ensure that the corners of the Great Pyramid are
(to all intents & purposes) right angles.
With probability theory, real-world applications are less clear cut
- e.g. what exactly does it mean to say that a coin is "fair"?
Real-world probability is IMHO (and in e.g. Rényi's opinion)
a way to quantify my lack of knowledge about a situation,
rather than being an inherent property of nature.
Any fool can see whether the corners of a pyramid are right angles,
but different fools can quite reasonably have very different
probabilities for the same event :-) - e.g. the result of a tennis match
(witness recent press reports of "strange betting patterns").
With statistical inference, it's still less clear cut
- you observe data and use induction (not deduction)
to infer how the data might have arisen.
Even words like "independence" are misleading. All probabilities are
in practice conditional (on your assumptions, background knowledge etc.),
so everything related to probability is also conditional. The results
of two successive coin tosses are NOT independent: they are only
CONDITIONALLY independent - conditional on your knowing the "true"
probability p of a head (whatever "true" probability means!).
But if you knew p, you wouldn't be trying to estimate it experimentally.
In practice, if your first toss results in a head, this makes it more
plausible that the coin is biased towards heads than towards tails,
so your probability of a head on the second toss increases.
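To make the coin example concrete, here is a minimal Python sketch (not part of the original discussion). It assumes, purely for illustration, a uniform Beta(1,1) prior on p; the function name predictive_heads is made up for the example.

```python
from fractions import Fraction

def predictive_heads(heads, tails, a=1, b=1):
    """Pr(next toss is a head | tosses so far), ASSUMING a Beta(a, b) prior on p.

    The default a = b = 1 is the uniform prior, chosen only for illustration.
    """
    return Fraction(a + heads, a + b + heads + tails)

before = predictive_heads(0, 0)      # no tosses seen yet
after_head = predictive_heads(1, 0)  # one head observed

# Marginal probability of two heads in a row: Pr(H1) * Pr(H2 | H1)
two_heads = before * after_head

print("Pr(head) before any toss:", before)
print("Pr(head) after one head: ", after_head)
print("Pr(two heads in a row):  ", two_heads)
```

Under that assumed prior the probability of a head moves from 1/2 to 2/3 after one observed head, and the probability of two heads in a row is 1/3 rather than the 1/2 * 1/2 = 1/4 that unconditional independence would require.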
Google for "Sally Clark" to see the practical importance of all this.
If you want some mathematical background, Google for "exchangeability".
A natural way to bring mathematical rigour into statistics is
"likelihood" (it may help to draw pictures). Start with the
"sample space" of possible data (e.g. X = number of heads in 10 tosses).
Think of possible probability models that could give rise to the data.
You might then make reasonable simplifying assumptions such as
"identically distributed", "(conditional) independence" etc.
This typically leads you to consider a family of probability models
corresponding to different possible values of a "parameter"
lying in a "parameter space", e.g. Binomial(n,P) where n=10,
P is the parameter, and P lies in the parameter space [0,1].
If you fix the parameter value P=p this means you fix the probability
distribution for X. For example, if P=p=0.9 then X is Binomial(10,0.9),
with Pr(X=x|P=p) = C(10,x) 0.9^x 0.1^(10-x) where C(10,x) is the
binomial coefficient, so that observing X=x=9 is perfectly reasonable,
X=x=4 is surprising, and X=x=0 is astounding (but still possible).
So the points (p=0.9) and (x=9) are highly compatible,
(p=0.9) and (x=4) much less so, and (p=0.9) & (x=0) are almost incompatible.
Similarly any given point p in the parameter space
will typically be compatible with some points x in the sample space
but much less compatible with others.
The measure of compatibility is Pr(X=x|P=p).
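The three cases above can be checked numerically. This is a small Python sketch using only the standard library; binom_pmf is a helper name introduced here, not something from a statistics package.

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr(X = x | P = p) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.9
for x in (9, 4, 0):
    # Compatibility of the data point x with the parameter value p
    print(f"Pr(X={x} | P={p}) = {binom_pmf(x, n, p):.3g}")
```

The values are roughly 0.39 for x=9, 1.4e-4 for x=4 and 1e-10 for x=0, which matches "reasonable", "surprising" and "astounding".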
However, for statistical inference, probability models are
"the wrong way round". It's not the case that you know P=p
and are directly interested in possible values X=x; instead
you know the data X=x, and want to compare possible values p for P.
If we now think of Pr(X=x|P=p) as just a formula involving x & p,
and fix x, then it becomes a function just of p, called the likelihood
Lik(p;x), and still represents a measure of compatibility between
possible p and the (now fixed) x. If Lik(p;x) is low, this means that
if that were the "true" p, then the data x would have been surprising
- so that particular p has low compatibility with the observed x.
Note that likelihood is very different from probability - for example,
Lik(p;x) need not integrate/sum to 1 over the parameter space
(only Pr(X=x|P=p) summed over x, for fixed p, must equal 1).
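A rough Python sketch of the same idea, for the observed value x=9 out of n=10: the formula is unchanged, but it is now read as a function of p. The grid size and the midpoint-rule integration are arbitrary choices made for this illustration, and lik is a made-up helper name.

```python
from math import comb

def lik(p, x, n=10):
    """Lik(p; x): the Binomial(n, p) formula, read as a function of p for fixed x."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

x_obs = 9  # the (now fixed) data

# Scan a grid over the parameter space [0, 1] for the most compatible p
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=lambda p: lik(p, x_obs))

# Area under Lik(p; x_obs) over [0, 1], by a simple midpoint rule
m = 10000
area = sum(lik((i + 0.5) / m, x_obs) for i in range(m)) / m

# For comparison: probabilities over the sample space, for fixed p = 0.9
total_prob = sum(lik(0.9, x) for x in range(11))

print("most compatible p on the grid:", p_best)
print("area under the likelihood:    ", area)
print("sum of Pr(X=x | p=0.9) over x:", total_prob)
```

The most compatible p on the grid is 0.9, and the area under Lik(p;9) comes out close to 1/11 rather than 1, whereas for fixed p the probabilities over x do sum to 1.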
There are various approaches to statistical inference,
but if you can get an understanding of "likelihood"
then you have a good starting point. Formulae are secondary.
>> One of the basics is to learn the vocabulary. Are you paying
>> close attention to definitions?
> I try to, but more diagrams and pictures would help. Our textbook does not
> have too many of these.
What textbook is it?
There are several good articles on Wikipedia - start browsing from
http://en.wikipedia.org/wiki/Statistics
...
> Ok, I'll look it up. I just ordered a book called "Applied Statistics
> Algorithms" by P. Griffiths; here is the Amazon link
> http://www.amazon.com/gp/product/0130379875/002-5270856-0528819
That's a good book, but in statistics the algorithms will generally
not help anyone understand the concepts - e.g. the usual methods of
approximating the inverse Normal CDF (piecewise rational functions)
have nothing to do with the nature or use of the inverse Normal CDF!
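As an illustration of that gap between concept and algorithm, here is a Python sketch (assuming Python 3.8+ for statistics.NormalDist). The bisection routine is deliberately naive and is only meant to show what the inverse Normal CDF IS: the z at which the CDF reaches a given probability q. The name inv_cdf_by_bisection and the search bounds are choices made for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1

def inv_cdf_by_bisection(q, lo=-10.0, hi=10.0, tol=1e-10):
    """Find z with cdf(z) = q by bisection on the increasing CDF.

    Slow, but it follows the definition directly. The bounds [-10, 10]
    are an assumption that is adequate for q not extremely close to 0 or 1.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if std_normal.cdf(mid) < q:
            lo = mid   # the answer lies to the right of mid
        else:
            hi = mid   # the answer lies at or to the left of mid
    return (lo + hi) / 2

q = 0.975
z_slow = inv_cdf_by_bisection(q)   # straight from the definition
z_fast = std_normal.inv_cdf(q)     # the library's own approximation

print(z_slow, z_fast)
```

Both give about 1.96, the familiar 97.5% point. The fast method inside the library tells you nothing about why that number matters; the slow one at least shows what is being computed.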
> I like to see algorithms of things. It helps me to understand something when
> I see the steps needed to solve it. My brain is more mechanical in a way.
>> Another suggestion, for figuring out "what's it all about" - find
>> S. M. Stigler's books on the history of statistics, like "The History of
>> Statistics: The Measurement of Uncertainty before 1900".
> ok, thanks, I'll try to look it up also.
> Nasser
> who has too many books and too little time to read them all.
Good luck! -- Ewart Shaw
--
J.E.H.Shaw [Ewart Shaw] strgh@xxxxxxxxxxxxx TEL: +44 2476 523069
Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
http://www.warwick.ac.uk/statsdept http://www.ewartshaw.co.uk
3 ((4&({*.(=+/))++/=3:)@([:,/0&,^:(i.3)@|:"2^:2))&.>@]^:(i.@[) <#:3 6 2