Re: Number of bins in a histogram
- From: "Reef Fish" <Dr_Bob_Ling@xxxxxxxxx>
- Date: 18 May 2006 16:49:45 -0700
Herman Rubin wrote:
In article <1147973306.100992.295490@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Lou Thraki <louthraki@xxxxxxxxx> wrote:
Can someone give, or refer to, an explanation of where the
1/N^(1/3) in the Freedman-Diaconis rule for the number of
bins in a histogram comes from?
There are two sources of error in estimating a
density from a histogram. One is the coarseness
of the histogram, and the other is the inaccuracy
of the height of a bin. For the first, lots of
bins are better, and for the second, large bins
are better. These two errors balance at the
order of magnitude quoted.
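
To make the balance above concrete: for bin width h, the squared error from coarseness grows like h^2, while the variance of a bin height shrinks like 1/(N*h). The sum of the two is smallest when h is proportional to N^(-1/3). The Freedman-Diaconis rule takes h = 2 * IQR * N^(-1/3), so the number of bins (range / h) grows like N^(1/3). The following is only a minimal sketch of that rule using the Python standard library. The function names are hypothetical, and the "inclusive" quartile method is an assumption, since several quartile conventions are in use.

```python
import math
import statistics


def fd_bin_width(data):
    """Freedman-Diaconis bin width: 2 * IQR * N^(-1/3)."""
    n = len(data)
    # Quartile convention is an assumption; other methods give slightly
    # different IQR values on small samples.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2.0 * (q3 - q1) * n ** (-1.0 / 3.0)


def fd_bin_count(data):
    """Number of bins implied by the width: ceil(range / h), at least 1."""
    h = fd_bin_width(data)
    if h <= 0:
        # Degenerate case (IQR of zero): fall back to a single bin.
        return 1
    return max(1, math.ceil((max(data) - min(data)) / h))


if __name__ == "__main__":
    sample = list(range(1, 101))
    print(fd_bin_width(sample), fd_bin_count(sample))
```

Multiplying the sample size by 8 while keeping the IQR fixed halves the width, which is the N^(-1/3) scaling asked about.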
A histogram has been said to be the WORST possible tool
to use for the assessment of continuous distributions.
A histogram concentrates on the CENTER of a distribution,
while the TAIL of a distribution is the most distinctive part of it
in distinguishing it from a Normal Distribution, say.
In Chapter 1 of my "Data Analysis" book, I show, all on the
same page, the plots of four density functions (without showing
the scales or units). The "bell-shaped curves" are very
accurately plotted from the theoretical pdfs -- and "bell-shaped"
is ALSO the worst characterization of a normal distribution. :-)
At the bottom of the page, is the question "Which of these
is Normal?"
Of course they are virtually indistinguishable by eye.
The practical corollary of that is, "If you can't tell from perfectly
graphed pdfs which one is NORMAL, what chance do you
have when the empirical pdf, the histogram, has all kinds of
bumps and irregularities?"
Then we proceed to the discussion of PROBABILITY plots,
with my reminder of a seemingly X-rated mnemonic trick:
You tell two PEOPLE apart by looking at the BODY and FACE.
You tell two Probability Distributions apart by looking at the TAILS.
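
As a rough illustration of what a normal probability plot puts on its axes, here is a minimal sketch that pairs each ordered observation with a standard normal quantile. The function name is hypothetical, and the plotting position (i - 0.5)/n is just one common convention, assumed here for simplicity. If the sample is roughly normal the points fall near a straight line, and departures show up at the two ends, that is, in the tails.

```python
from statistics import NormalDist


def normal_probability_plot_points(sample):
    """Return (theoretical normal quantile, ordered observation) pairs."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i + 0.5) / n for 0-based i, i.e. (i - 0.5) / n
    # for 1-based i. Other conventions exist; this choice is an assumption.
    return [
        (std_normal.inv_cdf((i + 0.5) / n), x)
        for i, x in enumerate(ordered)
    ]


if __name__ == "__main__":
    for q, x in normal_probability_plot_points([2.1, -0.3, 0.8, 1.5, -1.2]):
        print(f"{q:7.3f}  {x:6.2f}")
```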
The evil of any histogram, no matter how it is binned, is
that it lumps the MOST USEFUL information in the tails into
the extreme bins, to the extent that one sometimes can't even
tell a Normal Distribution sample (T with infinite degrees of
freedom) from its most distant family member, the Cauchy
distribution (T with 1 d.f.).
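
To put numbers on the Normal-versus-Cauchy comparison, the sketch below compares theoretical quantiles of a standard normal with those of a Cauchy rescaled so that the two share the same quartiles. The rescaling is my own assumption, chosen so the centers line up, and the function names are hypothetical. Between the quartiles the two agree closely, but in the tails the Cauchy quantiles run away from the normal ones.

```python
import math
from statistics import NormalDist


def normal_quantile(p):
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def cauchy_quantile(p, scale=1.0):
    """Quantile of a Cauchy (t with 1 d.f.) centered at 0."""
    return scale * math.tan(math.pi * (p - 0.5))


# Assumed rescaling: choose the Cauchy scale so both distributions
# have identical quartiles (and hence the same IQR).
MATCH_SCALE = normal_quantile(0.75) / cauchy_quantile(0.75)


def quantile_pairs(probs):
    """Return (p, normal quantile, matched Cauchy quantile) triples."""
    return [
        (p, normal_quantile(p), cauchy_quantile(p, MATCH_SCALE))
        for p in probs
    ]


if __name__ == "__main__":
    for p, zn, zc in quantile_pairs([0.5, 0.6, 0.75, 0.9, 0.99, 0.999]):
        print(f"p={p:<6} normal={zn:8.3f} cauchy={zc:9.3f}")
```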
The histogram is not nearly as bad for discrete distributions with a
small number of possible values of the random variable.
The only histograms I use are of the A, B, C, D, and F grades in my
classes. :-)
-- Bob.