# An interesting conversation with Stan Cohen.

*From*: "Reef Fish" <large_nassua_grouper@xxxxxxxxx>

*Date*: 1 Nov 2006 20:38:01 -0800

I have known Stan (President of SPEAKEASY Computing Corp),

as I have known Jim Goodnight (President of SAS), and as I

recalled in my reply to Gordon Sande in the thread "Time to

Teach Digital Etiquette", I invited both of them to a Session in

an International Symposium in 1978 because of the power,

versatility and USER-FRIENDLINESS of their software in that

era, and even by today's standards.

SAS had 4 employees then, but needs no introduction today,

except perhaps for Jim Goodnight; very few know that he

is a superb statistician himself, and made many valuable

contributions to the computing software in SAS, for GLM

pseudo-inverses and generalized inverses, and many other

applications of the matrix operator SWEEP.

Stan Cohen is the quiet genius in computing software who

developed SPEAKEASY in 1960!! while he was a physicist

at Argonne Labs. Since the early 1960s, he formed his own

Corporation, and I wasn't aware of Speakeasy until after I

left the University of Chicago, while he was living in the

same Hyde Park neighborhood in Chicago.

SPEAKEASY remains today relatively unknown, but is in

my estimation, still the BEST software product ever

created and continuously improved and adapted to

modern computing environments, from platforms on

various mainframe computers to different PCs, to super-

computers and parallel computers. The SAME Speakeasy

software language applies!

Stan and I hadn't talked to each other since the last

millennium <g> until someone read something posted by

a Reef Fish about SPEAKEASY in the Math Forum (which

apparently shows all the posts in sci.stat.math) and

mentioned it to him. Stan wrote me an email, not knowing

who Reef Fish was and told me about the PC version,

not knowing that I had written a 6-page GLOWING review

of Micro-Speakeasy Delta Version in the American

Statistician in 1987, and that he had known me, and

knew me well, for years.

That was a BENEFIT of being in sci.stat.math that I'll

never forget! Getting re-acquainted with Stan. :-) Stan

immediately gave me the much, much improved THETA

version of 2002, when he learned that the ONLY reason

why I still had a 1990 TI lap top was that it had Speakez

in it, and I had no access to Speakez elsewhere. :-)

Now I have Speakeasy on every one of my laptops, and

I have used it to re-program and do the computations for which

I had previously relied on my own system, IDA.

Our conversation this afternoon was prompted by my

suggestion to Stan that we should try to CO-AUTHOR

my Data Analysis "textbook" which I had been using

and revising for 30 years but never published. I knew

Stan is very creative with graphics and other fancy

stuff that was non-existent in IDA or in that era, and

I thought he could contribute heavily to the graphics

in the book while I revise and update the statistics --

I also had many Speakeasy programs already in

use for those computations that are not found in other

statistical software, for Advanced topics in Data

Analysis.

So far, it's just the intro of the BACKGROUND of

what was interesting. :-)

I knew that SPEAKEASY (Speakez for short, which

I'll further shorten to EZ) was never considered a

statistical package, but rather a general computing

software that has many statistical capabilities.

However, because of the POWER of that language,

I was able to easily write my own software using the

EZ as my base software.

So, the first thing we talked about was what I told

Stan are the TWO most commonly used graphical

methods in statistics, the Normal probability or q-q

plot (the command NORM in IDA) and the PLTS

(PLoT Sequence command in IDA) for validating

the Normality and Independence assumptions in

a regression problem. The most glaring absence

in EZ is the capability to do a p-p or q-q plot.

Stan says, "But Bob, I don't know anything about

statistics!", and I assured him that I could tell him

what a q-q plot is in TWO minutes. :-)

That's where the interesting part begins. It took

over two minutes, but not by much, because Stan

didn't know what a normal quantile is nor what

the EZ function GAUSSINV does. :-) I said,

"Does GAUSSINV(.975) = 1.96 ring any bell?".

He won the "no bell" prize.

So, I was teaching the President of EZ how to do

some things in a package in which those had been

in place for nearly 50 years.

But the most interesting part was that he not only

grasped those simple ideas quickly, but we had

the QQ subroutine written in complete generality,

in TWO LINES of EZ code, within minutes while

we were talking on the phone, and both were

doing the same lines of computing, line by line.

This was how it went:

I told Stan we first have to create the EMPIRICAL

cdf (which he didn't know what it was) by taking

the integers 1 to n (sample size), subtract 1/2 for

correction and divide by n to create the n fractiles:

F = (INTS(n) - .5)/n

We then convert those to the Standard Normal

quantiles by

Q = GAUSSINV (F).

Then we take any set of data X and standardize it to

Z = (X - mean(X))/standdev(X), then order them to form

Q1 = ordered(Z)

finally do GRAPH(q,q1:q) to get the Q-Q plot!

Voila, we did it one line at a time of course, so that we

could both see what the result of each line was. But at

the end, what we had done was in fact the steps it takes

to write an EZ subroutine that looks like this:

SUBROUTINE QQ(X)

N=NOELS(X);Q=GAUSSINV((INTS(N)-.5)/N)

Q1=ORDERED((X-MEAN(X))/STANDDEV(X)); GRAPH(Q,Q1:Q)

END
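
For readers without EZ at hand, the same steps can be sketched in Python. This is only a rough equivalent, not a translation of the EZ internals: the function name `qq_points` is made up for illustration, `NormalDist().inv_cdf` stands in for GAUSSINV, and `stdev` uses the n-1 divisor, which may or may not match what STANDDEV does. Plotting is left out; the two returned lists are what GRAPH would be given.

```python
from statistics import NormalDist, mean, stdev

def qq_points(x):
    """Return (normal quantiles, sorted standardized data) for a Q-Q plot.

    Hypothetical helper mirroring the steps in the post, not EZ's own code.
    """
    n = len(x)
    # Fractiles (i - 0.5)/n for i = 1..n, as in F = (INTS(n) - .5)/n
    f = [(i - 0.5) / n for i in range(1, n + 1)]
    # Standard normal quantiles; inv_cdf plays the role of GAUSSINV
    q = [NormalDist().inv_cdf(p) for p in f]
    # Standardize the data (assumption: sample standard deviation, n-1 divisor)
    m, s = mean(x), stdev(x)
    # Order the standardized values, as ORDERED does
    q1 = sorted((v - m) / s for v in x)
    return q, q1

# The familiar value behind "GAUSSINV(.975) = 1.96"
z975 = NormalDist().inv_cdf(0.975)

q, q1 = qq_points([1.0, 2.0, 3.0, 4.0, 5.0])
```

Plotting `q1` against `q`, together with the line `q` against `q`, gives the same picture as GRAPH(Q,Q1:Q).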

We generated some U(0,1) data to show what its QQ plot looks like by

X = RANDOM (INTS(100)); QQ(X)

We generated N(0,1) data to show what Normal data look like:

Y = NORMRAND(X); QQ(Y)

I recalled my comment to Jack Tomsky in the Afonso thread

that his probability result of -2ln(1-p) for chi-sq with 2 d.f.

reminded me of the method of simulating chi-square r.v.

with 2 d.f. by using -2 ln (1 - U), where U is from U(0,1).

I had forgotten that I had deducted points from my students

for wasting the time in "1 - U" which has exactly the same

distribution as "U". :-)
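
The equivalence can be checked in one line, assuming U is uniform on (0,1):

$$
P(-2\ln U > w) \;=\; P\!\left(U < e^{-w/2}\right) \;=\; e^{-w/2}, \qquad w \ge 0,
$$

which is the upper-tail probability of a chi-square with 2 d.f. Setting the cdf $1 - e^{-w/2}$ equal to $p$ and solving gives $w = -2\ln(1-p)$, and since $1 - U$ is itself U(0,1), $-2\ln(1-U)$ and $-2\ln U$ have the same distribution.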

So, in EZ,

W = -2*ln(X)

would have yielded an array of Chi-square (2) r.v. which

is an exponential distribution (with mean 2) as well.

The QQ(W) plot shows a really severe departure from the

diagonal straight line that a normal sample should follow.
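
A rough way to see that departure without a graph is to compare how close to linear the Q-Q points are for the two kinds of data. The Python sketch below is an illustration under stated assumptions, not what was done in EZ: the helper names, the seed, the sample size of 1000, and the use of the Q-Q correlation as a summary are all choices made here. Python's `random()` can return exactly 0, so the code draws `1 - U` purely to avoid `log(0)`; as noted above, the distribution is the same.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def qq_points(x):
    """Normal quantiles and sorted standardized data (hypothetical helper)."""
    n = len(x)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    m, s = mean(x), stdev(x)
    return q, sorted((v - m) / s for v in x)

def qq_correlation(x):
    """Pearson correlation of the Q-Q points; near 1 means near-linear."""
    q, q1 = qq_points(x)
    mq, m1 = mean(q), mean(q1)
    num = sum((a - mq) * (b - m1) for a, b in zip(q, q1))
    den = math.sqrt(sum((a - mq) ** 2 for a in q) *
                    sum((b - m1) ** 2 for b in q1))
    return num / den

rng = random.Random(2006)   # arbitrary seed, for repeatability only
n = 1000                    # assumption: larger than the 100 used in the post

# 1 - U lies in (0, 1], so the logarithm is always defined
u = [1.0 - rng.random() for _ in range(n)]
w = [-2.0 * math.log(v) for v in u]          # chi-square(2) draws
y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # N(0,1) draws

r_w = qq_correlation(w)   # skewed data: visibly curved Q-Q plot
r_y = qq_correlation(y)   # normal data: close to the diagonal
print(f"chi-square(2): {r_w:.3f}   normal: {r_y:.3f}")
```

The correlation for `w` should come out noticeably below the one for `y`, reflecting the curvature that the skewed chi-square(2) sample produces in its Q-Q plot.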

I am showing what I had discussed with Stan because he was

really starting from "ground zero" on the subject of Q Q plot,

and yet in a matter of a few minutes, he not only understood

every step of it, but could write his own subroutine, similar

to my two-liner. I mentioned to him that with the added

capabilities of color graphics (in EZ), he could easily soup up

the QQ plot to highlight unusual behavior or outliers.

Then our conversation drifted to Stan telling me some

existing capabilities in EZ he created that nobody ever used. :-)

Those were nifty things he did in high dimensions, and we

immediately struck an accord in our mutual understanding

of the difficulty of representing points (or functions) in

anything over FOUR dimensions, graphically.

Without going into any details about those topics, just from

a few minutes of that conversation, I could relate some of

what he did with what some of my doctoral students did in

the graphical representation of high dimension data -- and

I told him about the Chernoff Faces, which was ONE of

a dozen or so methods that I knew, for representing

data in more than 4 dimensions. Moreover, I could also

see that if I were in the days of wanting to publish papers,

I had enough new ideas to write three or four different

papers that are publishable in major statistical journals

on the conversation I had with Stan in one afternoon.

So, I am excited that I'll have the opportunity to work

with the man who created my favorite software package,

SPEAKEASY, in using EZ and its powerful capabilities to

write routines and do computations on Applied Statistics

the way a Statistician wants, rather than trying, as most

folks do, to fit everything into the mode of SAS or SPSS,

whether it's the right thing to do or not -- and far too often,

they are the WRONG things to do.

-- Reef Fish Bob.
