An interesting conversation with Stan Cohen.



I have known Stan (President of SPEAKEASY Computing Corp)
as I have known Jim Goodnight (President of SAS). As I
recalled in my reply to Gordon Sande in the thread "Time to
Teach Digital Etiquette", I invited both of them to a Session at
an International Symposium in 1978 because of the power,
versatility, and USER-FRIENDLINESS of their software in that
era -- qualities that hold up even by today's standards.

SAS had 4 employees then, but needs no introduction today --
except perhaps Jim Goodnight, whom very few knew to be a
superb statistician himself, one who made many valuable
contributions to the computing software in SAS, for GLM
pseudo-inverses and generalized inverses, and many other
applications of the matrix operator SWEEP.

Stan Cohen is the quiet genius in computing software who
developed SPEAKEASY in 1960!! while he was a physicist
at Argonne Labs. In the early 1960s he formed his own
Corporation, yet I wasn't aware of Speakeasy until after I
left the University of Chicago, even though he was living in
the same Hyde Park neighborhood of Chicago.

SPEAKEASY remains relatively unknown today, but it is, in
my estimation, still the BEST software product ever
created, continuously improved and adapted to
modern computing environments, from various mainframe
platforms to different PCs, to supercomputers and
parallel computers. The SAME Speakeasy language
applies on all of them!

Stan and I hadn't talked to each other since the last
millennium <g> until someone read something posted by
a Reef Fish about SPEAKEASY in the Math Forum (which
apparently shows all the posts in sci.stat.math) and
mentioned it to him. Stan wrote me an email, not knowing
who Reef Fish was, and told me about the PC version --
not knowing that I had written a 6-page GLOWING review
of the Micro-Speakeasy Delta version in The American
Statistician in 1987, and that he had known me, and
known me well, for years.

That was a BENEFIT of being in sci.stat.math that I'll
never forget! Getting re-acquainted with Stan. :-) Stan
immediately gave me the much, much improved THETA
version of 2002 when he learned that the ONLY reason
I still had a 1990 TI laptop was that it had Speakez
on it, and I had no access to Speakez elsewhere. :-)

Now I have Speakeasy on every one of my laptops, and
I use it to re-program and carry out the computations for
which I had previously relied on my own system, IDA.

Our conversation this afternoon was prompted by my
suggestion to Stan that we should try to CO-AUTHOR
my Data Analysis "textbook", which I had been using
and revising for 30 years but never published. I knew
Stan was very creative with graphics and other fancy
capabilities that were non-existent in IDA or in that era,
and I thought he could contribute heavily to the graphics
in the book while I revise and update the statistics --
I also had many Speakeasy programs already in use
for computations not found in other statistical software,
on Advanced topics in Data Analysis.

So far, it's just the intro of the BACKGROUND of
what was interesting. :-)

I knew that SPEAKEASY (Speakez for short, which
I'll further shorten to EZ) was never considered a
statistical package, but rather general computing
software that has many statistical capabilities.

However, because of the POWER of that language,
I was able to easily write my own software using
EZ as my base.

So, the first thing we talked about was what I told
Stan are the TWO most commonly used graphical
methods in statistics: the Normal probability or q-q
plot (the command NORM in IDA) and the PLTS
(PLoT Sequence command in IDA), for validating
the Normality and Independence assumptions in
a regression problem. The most glaring absence
in EZ is the capability to do a p-p or q-q plot.

Stan says, "But Bob, I don't know anything about
statistics!", and I assured him that I could tell him
what a q-q plot is in TWO minutes. :-)

That's where the interesting part begins. It took
over two minutes, but not by much, because Stan
didn't know what a normal quantile was, nor what
the EZ function GAUSSINV does. :-) I said,
"Does GAUSSINV(.975) = 1.96 ring any bell?"
He won the "no bell" prize.

So, I was teaching the President of EZ how to do
some things in a package whose building blocks had
been in place for nearly 50 years.

But the most interesting part was that he not only
grasped those simple ideas quickly, but we had
the QQ subroutine written in complete generality,
in TWO LINES of EZ code, within minutes, while
we were talking on the phone and both working
through the same lines of computing, line by line.

This was how it went:

I told Stan we first have to create the EMPIRICAL
cdf (he didn't know what that was) by taking
the integers 1 to n (the sample size), subtracting 1/2
as a correction, and dividing by n to create the n fractiles:

F = (INTS(n) - .5)/n

We then convert those to the Standard Normal
quantiles by

Q = GAUSSINV(F).

Then we take any set of data X and standardize it to

Z = (X - MEAN(X))/STANDDEV(X), then order the values to form

Q1 = ORDERED(Z)

and finally do GRAPH(Q,Q1:Q) to get the Q-Q plot!

Voila! We did it one line at a time, of course, so that we
could both see what the result of each line was. But by
the end, what we had done were in fact the steps it takes
to write an EZ subroutine that looks like this:

SUBROUTINE QQ(X)
N=NOELS(X); Q=GAUSSINV((INTS(N)-.5)/N)
Q1=ORDERED((X-MEAN(X))/STANDDEV(X)); GRAPH(Q,Q1:Q)
END
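
For readers without Speakeasy, here is a rough Python sketch of
what that two-liner does, using numpy, scipy, and matplotlib as
stand-ins for INTS, GAUSSINV, ORDERED, and GRAPH (the function
name qq and the ddof choice are my own assumptions, not EZ's):

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def qq(x):
    x = np.asarray(x, dtype=float)
    n = len(x)                                     # NOELS(X)
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # GAUSSINV((INTS(N)-.5)/N)
    q1 = np.sort((x - x.mean()) / x.std(ddof=1))   # ORDERED of standardized X
    plt.plot(q, q1, "o")                           # the Q-Q points
    plt.plot(q, q, "-")                            # the diagonal reference line
    plt.xlabel("Standard Normal quantiles")
    plt.ylabel("Ordered standardized data")
    plt.show()                                     # GRAPH(Q,Q1:Q) analogue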

We generated some U(0,1) data to show what its QQ plot looks like by

X = RANDOM (INTS(100)); QQ(X)

We generated N(0,1) data to show what Normal data look like:

Y = NORMRAND(X); QQ(Y)
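
In the same spirit, the Python sketch above can be exercised like
this (again only an illustrative analogue; the variable names are mine):

rng = np.random.default_rng()     # Python stand-in for EZ's random generators
x = rng.uniform(size=100)         # like X = RANDOM(INTS(100)): U(0,1) data
qq(x)                             # bends away from the line at both ends
y = rng.standard_normal(100)      # like Y = NORMRAND(X): N(0,1) data
qq(y)                             # hugs the diagonal, as Normal data should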

I recalled my comment to Jack Tomsky in the Afonso thread
that his probability result of -2 ln(1-p) for a chi-square with 2 d.f.
reminded me of the method of simulating chi-square r.v.'s
with 2 d.f. by using -2 ln(1 - U), where U is from U(0,1).

I had forgotten that I used to deduct points from my students
for wasting time computing "1 - U", which has exactly the same
distribution as "U". :-)

So, in EZ,

W = -2*ln(X)

would have yielded an array of Chi-square(2) r.v.'s, which
is also an exponential distribution (with mean 2).

The QQ(W) plot shows a really severe departure from the
diagonal straight line that a normal sample should follow.
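
The same check can be run with the Python sketch above: since
-2 ln(U) for U from U(0,1) is exponential with mean 2, i.e.
chi-square with 2 d.f., its normal Q-Q plot bends sharply away
from the line (w is just my illustrative name):

w = -2 * np.log(x)   # reuses the uniform sample x from the sketch above
qq(w)                # strongly curved: nothing like a Normal sample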

I am showing what I discussed with Stan because he was
really starting from "ground zero" on the subject of Q-Q plots,
and yet in a matter of a few minutes he not only understood
every step of it, but could write his own subroutine, similar
to my two-liner. I mentioned to him that with the added
capabilities of color graphics (in EZ), he could easily soup up
the QQ plot to highlight unusual behavior or outliers.

Then our conversation drifted to Stan telling me about some
existing capabilities in EZ he created that nobody ever used. :-)
Those were nifty things he did in high dimensions, and we
immediately struck a chord in our mutual understanding
of the difficulty of representing points (or functions) in
anything over FOUR dimensions, graphically.

Without going into any details about those topics, just from
a few minutes of that conversation I could relate some of
what he did to what some of my doctoral students did in
the graphical representation of high-dimensional data -- and
I told him about the Chernoff Faces, which was ONE of
a dozen or so methods that I knew for representing
data in more than 4 dimensions. Moreover, I could also
see that, if I were still in the days of wanting to publish
papers, I had enough new ideas from that one afternoon's
conversation with Stan to write three or four different
papers publishable in major statistical journals.

So, I am excited that I'll have the opportunity to work
with the man who created my favorite software package,
SPEAKEASY, using EZ and its powerful capabilities to
write routines and do computations on Applied Statistics
the way a Statistician wants, rather than trying, as most
folks do, to fit everything into the mold of SAS or SPSS,
whether it's the right thing to do or not -- and far too often,
those are the WRONG things to do.

-- Reef Fish Bob.
