Re: Number of bins in a histogram




Herman Rubin wrote:
In article <1147973306.100992.295490@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Lou Thraki <louthraki@xxxxxxxxx> wrote:
Can someone give, or refer to, an explanation where the
1/N^(1/3) in the Freedman-Diaconis rule for the number of
bins in a histogram comes from?

There are two sources of error in estimating a
density from a histogram. One is the coarseness
of the histogram, and the other is the inaccuracy
of the height of a bin. For the first, lots of
bins are better, and for the second, large bins
are better. These two errors balance at the
order of magnitude quoted.
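
The balance described above can be written out as a standard asymptotic sketch. This is the usual textbook argument, not necessarily the derivation the poster had in mind. The symbols are assumptions introduced here: h is the bin width, n the sample size, f the true density (assumed smooth), and R(f') the integral of f'(x)^2.

```latex
% Integrated variance of the bin heights: shrinks as bins get wider.
% Integrated squared bias from coarseness: shrinks as bins get narrower.
\mathrm{AMISE}(h) \;=\; \underbrace{\frac{1}{nh}}_{\text{height inaccuracy}}
  \;+\; \underbrace{\frac{h^{2}}{12}\,R(f')}_{\text{coarseness}},
\qquad R(f') = \int f'(x)^{2}\,dx .

% Setting the derivative with respect to h to zero:
-\frac{1}{nh^{2}} + \frac{h}{6}\,R(f') = 0
\quad\Longrightarrow\quad
h^{*} = \left(\frac{6}{n\,R(f')}\right)^{1/3} \;\propto\; n^{-1/3}.

% For a Normal density with standard deviation \sigma,
% R(f') = 1/(4\sqrt{\pi}\,\sigma^{3}), so
h^{*} = (24\sqrt{\pi})^{1/3}\,\sigma\,n^{-1/3} \approx 3.49\,\sigma\,n^{-1/3}.

% The Freedman-Diaconis rule keeps the same rate but uses the
% interquartile range as the scale:
h_{\mathrm{FD}} = 2\,\mathrm{IQR}\;n^{-1/3}.
```

With the bin width proportional to n^(-1/3), the number of bins over a fixed range grows like n^(1/3).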

A histogram has been said to be the WORST possible tool
to use for the assessment of continuous distributions.

A histogram concentrates on the CENTER of a distribution,
while the TAIL of a distribution is the most distinctive part of it,
in distinguishing it from a Normal Distribution, say.

In Chapter 1 of my "Data Analysis" book, I show, on the
same page, the plots of four density functions (without showing
the scales or units), but the "bell-shaped curves" are very
accurately plotted from the theoretical pdfs -- which
is ALSO the worst characterization of a normal distribution. :-)

At the bottom of the page, is the question "Which of these
is Normal?"

Of course they are virtually indistinguishable by eye.

The practical corollary of that is, "If you can't tell from perfectly
graphed pdfs which one is NORMAL, what chance do you
have when the empirical pdf, the histogram, has all kinds of
bumps and irregularities?"

Then we proceed to the discussion of PROBABILITY plots,
with my reminder of a seemingly X-rated mnemonic trick:

You tell two PEOPLE apart by looking at the BODY and FACE.

You tell two Probability Distributions apart by looking at the TAILS.
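
As a rough illustration of what a Normal probability plot puts on its axes, here is a minimal sketch in Python. It is not taken from the book mentioned above. The function name normal_plot_points, the plotting position (i - 0.5)/n, the seed, and the sample size are all assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each sorted observation with the standard Normal quantile
    at plotting position (i - 0.5) / n (one common convention)."""
    n = len(data)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(sorted(data), start=1)
    ]


rng = random.Random(1)  # arbitrary seed, only for repeatability
n = 200                 # arbitrary sample size

normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Standard Cauchy (t with 1 d.f.) drawn by inverse-CDF sampling.
cauchy_sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

for name, sample in (("normal", normal_sample), ("cauchy", cauchy_sample)):
    points = normal_plot_points(sample)
    # The end points of the plot are where the tails show up.
    print(name, "lowest:", points[0], "highest:", points[-1])
```

Plotting the second coordinate against the first gives the probability plot. A Normal sample should fall close to a straight line, and heavier tails show up as the end points bending away from it.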

The evil of any histogram, no matter how it is binned, is
that it lumps the MOST USEFUL information in the tails into
the extreme bins, to the extent that one sometimes can't even
tell a Normal Distribution sample (T with infinite degrees of
freedom) from its most distant family member, the Cauchy
distribution (T with 1 d.f.).
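
A small sketch of the lumping effect, assuming a histogram whose two end bins are open-ended. The function name lumped_histogram, the bin edges, and both toy data sets are made up for illustration.

```python
from bisect import bisect_right


def lumped_histogram(data, edges):
    """Count values per bin. The first and last bins are open-ended,
    so everything beyond the outer edges is lumped together."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect_right(edges, x)] += 1
    return counts


edges = [-3, -2, -1, 0, 1, 2, 3]  # hypothetical bin edges

# Two toy data sets that differ ONLY in how extreme the tail values are.
mild = [-3.2, -1.5, -0.4, 0.3, 0.9, 1.7, 3.1]
wild = [-250.0, -1.5, -0.4, 0.3, 0.9, 1.7, 4000.0]

print(lumped_histogram(mild, edges))
print(lumped_histogram(wild, edges))
```

Both calls give the same counts: once a value lands in an extreme bin, the histogram records only that it is out there, not how far out it is.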

The histogram is not nearly as bad for discrete distributions with
a small number of possible values of the random variable.

The only histograms I use are the A, B, C, D, and F grades in my
classes. :-)

-- Bob.
