Statistical Graphical Displays of Data





A.G.McDowell wrote:
> In article <bV%me.126$W77.88@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, W.
> Watson <wolf_tracks@xxxxxxxxxxx> writes
> >Not many statisticians, if any, use pie charts for data presentations. Tufte in
> >his book "The Visual Presentation of ..." comes down pretty hard on pie charts,
> >"Given their low data-density and failure to order numbers along a visual
> >dimension, they should never be used." He quotes another author here, Bertin,
> >1981. Aside from this quote, the best rebuttal of pie charts I have ever seen
> >is in a cartoon. A character asks another, "Why does anyone use a pie chart?"
> >The other replies, "To make the data easier to swallow."
> >
> >That may be, but (or should I say however?) they are nevertheless used in the
> >media and politics on a widespread scale. This doesn't make it right, but what
> >in those settings makes it wrong? From a statistician's viewpoint, I can see they
> >are certainly at the bottom of the pile for presenting data visually, but what
> >about for non-statisticians? I guess this is like knocking out a virus. Tough to
> >do in practice.
> >
> It's worth looking at "The Elements of Graphing Data", by W.S.Cleveland.
> The whole book is an examination of what really does display data well
> and what doesn't, backed up by experimental studies of how accurately
> people can extract data by eye from various sorts of displays. In
> section 4.10 he quotes studies that show that people are usually quite
> bad at estimating sector sizes (with some exceptions, e.g. recognising
> 90 degrees) and displays a pie chart and a dot chart of the same data.
> The pie chart doesn't appear to have any particular structure, but on
> the dot chart it is immediately obvious that the data split into two
> clusters of values, one with a score of 12 +/- 1 or so and one with a
> score of 8 +/-1 or so. This gain in perceptive ability would be obvious
> to any viewer, no matter what their background.
>
> I have Tufte and two books by Cleveland: "The Elements of Graphing Data"
> and "Visualising Data". Tufte is arguably a more entertaining read, but
> the two Cleveland books provide more solid evidence for their views and
> are more relevant to a scientific setting.

Entertaining read aside, Tufte's book was reviewed as a "tour de force"
(I am going from old memory) on the subject, with which I agree. Your
characterization and comparison of Bill Cleveland's work on the subject
are also spot on.

Bill and I spent three years as classmates at Yale, and I am glad that
he abandoned his doctoral dissertation topic of time series and made
his valuable contributions in statistical graphics, just as I abandoned
my doctoral dissertation topic of cluster analysis and made (far fewer
than Bill's) my contributions to statistics (or statistical graphics). :-)


> However, since his emphasis
> is on harnessing our native ability to see patterns in pictures, in most
> cases the viewer of the graph does not require any particular background
> or training to gain from his work. The only catch is that he has been so
> influential, directly and via S-Plus, that the reader is in danger of
> saying "What's the big deal? don't all scientific graphs look like
> this?".
> --
> A.G.McDowell

While I do not disagree with you on any of the above, I think it's fair
to point out that your comparison of a pie chart vs a dot chart in terms
of spotting clusters is not a fair comparison, in the sense that no
single (valid) graphical representation is uniformly more effective
than another in all applications. The dot chart happens to be so in
revealing clusters while the pie chart would be more effective in
a different context where the dot chart may not even be applicable.
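
To make the dot chart side of that comparison concrete, here is a
minimal text-only sketch in Python. The item names and scores are
made up for illustration (they are NOT Cleveland's data), and the
function name dot_chart is hypothetical. Items are sorted by value
so that any gap between groups shows up as a jump in dot position.

```python
# Made-up scores for illustration only; not Cleveland's data.
scores = {"A": 12.5, "B": 7.6, "C": 11.8, "D": 8.3, "E": 12.1, "F": 8.0}


def dot_chart(data, width=40):
    """Return one text line per item, sorted by value.

    Each line is a leader of periods ending in an 'o' placed in
    proportion to the item's value (the largest value sits at `width`).
    """
    top = max(data.values())
    lines = []
    for name, value in sorted(data.items(), key=lambda kv: kv[1]):
        position = round(value / top * width)
        lines.append(f"{name} |" + "." * position + "o")
    return lines


for line in dot_chart(scores):
    print(line)
```

With these made-up values the three low scores and the three high
scores end up in two visibly separate groups of lines.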

"Cluster Analysis" was my dissertation topic. In terms of graphical
representation, I adopted an old idea (before the IBM computer days)
of shading the elements of a (pairwise) dissimilarity (or similarity)
matrix of objects (the closer the darker) by the use of overstrikes
in computer printout of the "shaded matrix".

When the objects are arranged in the order of their groups or "clusters",
each cluster appears as a dark "block", distinct from the other well
separated clusters. The "before" and "after" plots give an effective way of
visualizing whether the analytic method(s) of clustering did a good
job or not, or WHETHER there IS any clustering effect at all! Analytically,
it is often harder to show that a method of analysis accomplished
NOTHING (the Emperor is naked) than to show that the Emperor is clothed.
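
The shaded-matrix idea can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the 1973
program: the data values are made up, single characters of increasing
density stand in for overstrikes, the function names are hypothetical,
and sorting by value stands in for the ordering that a clustering
method would supply.

```python
# Illustrative sketch only: made-up data, not the 1973 program.
SHADES = " .:*#"  # lightest to darkest; stands in for overstruck characters


def dissimilarity(points):
    """Pairwise absolute differences between one-dimensional values."""
    return [[abs(a - b) for b in points] for a in points]


def shaded_matrix(d, order):
    """Return text rows of the matrix with objects in the given order.

    Smaller dissimilarity (closer objects) gets a darker character.
    """
    largest = max(max(row) for row in d) or 1
    rows = []
    for i in order:
        chars = []
        for j in order:
            closeness = 1 - d[i][j] / largest
            chars.append(SHADES[round(closeness * (len(SHADES) - 1))])
        rows.append(" ".join(chars))
    return rows


# Hypothetical objects: two tight groups (near 1 and near 9) in mixed order.
values = [1.0, 9.0, 1.2, 8.8, 0.9, 9.1]
d = dissimilarity(values)

before = shaded_matrix(d, range(len(values)))
# Reorder so that members of each group sit next to each other.
after = shaded_matrix(d, sorted(range(len(values)), key=lambda k: values[k]))

print("before:")
print("\n".join(before))
print("after:")
print("\n".join(after))
```

In the "before" printout the dark cells are scattered; in the "after"
printout they collect into two dark blocks on the diagonal. If the
data had no group structure, no such blocks would appear under any
ordering.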

I published the method in 1973: "A computer generated aid for
cluster analysis," Comm. of the ACM, 16, 355-6.

Everitt, B.S. (1978) "Graphical Techniques for Multivariate Data,"
New York: North Holland, pp. 51-8.

gave a more thorough description and example of my method than
my paper. :-) Sometime during the decade after the appearance of
my paper, slight variations of that graphical method found their
way into the major statistical software packages of those days,
including BMDP, SAS, and I think SPSS, as optional plots in
cluster analysis (and factor analysis).

I don't know the status of the "shaded plot" in those or other
statistical software packages now because I haven't used any of
them for at least a decade. But later enhancements were made
possible by advances in computer technology, such as the use of
color graphics and interactive computing. One of my doctoral
students (in 1988) adapted the method into an interactive
routine that was effectively used in conjunction with REGRESSION
DIAGNOSTICS <leverage and influence type> of "joint influence".

This is a long Rant? < :-)> to make the point that ONE of the basic
elements of an effective graphical representation of statistical
data is to use the MOST APPROPRIATE graphical method for displaying
what one WANTS the viewers/readers to SEE.

Such a method may be one-of-a-kind, such as the example in Tufte's
book, of the graph showing the size and trail of Napoleon's
army in its defeat (by the weather) during Napoleon's Russian
campaign.

There is more than meets the eye (pun intended) in statistical
graphical display. The effectiveness of ANY particular form of
display depends very much on WHAT is intended to be shown as
well as HOW it is best shown.

There ain't no single one-method-fits-all in graphical display
methods -- that would be my comment in general, as well as on
the particular "pie chart" vs "dot chart" example.

-- Bob.
