Re: Distribution of a vowel on the page



Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx> wrote in
news:9ahph3t1qek63cvlokenoro8g9sfgagb5d@xxxxxxx:

On Sun, 21 Oct 2007 20:36:40 -0500, David Winsemius
<doe_snot@xxxxxxxxxxx> wrote:

It is true that the Poisson is a good approximation to the binomial
when the rate parameter is small, but the Poisson distribution may
also fit well even when the rate parameter is not small. The
question is really only answerable by reference to the data.
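
As a rough illustration of the approximation claim (not anything posted in the thread), the sketch below compares a Binomial(n, p) with a Poisson of the same mean, using total variation distance. The line length n = 72 and the particular values of p are assumptions chosen only for illustration; the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    # Exact binomial probability of k successes in n trials
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # Computed in log space so large k does not overflow
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def total_variation(n, p):
    # Distance between Binomial(n, p) and Poisson(n * p)
    lam = n * p
    diff = sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))
    # Poisson mass above n, where the binomial has none
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))
    return 0.5 * (diff + max(tail, 0.0))

n = 72  # assumed number of letters per line
for p in (0.01, 0.1, 0.3, 0.5):
    # The binomial variance/mean ratio is 1 - p; the Poisson ratio is 1
    print(f"p={p:<5} TV distance={total_variation(n, p):.4f} "
          f"binomial var/mean={1 - p:.2f}")
```

The distance grows with p because the binomial variance n p (1 - p) falls further below the Poisson variance n p. Whether that gap matters for a given data set is still an empirical question.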

I get it, if you are talking about the Poisson rate parameter
by itself, since that can be an arbitrary counter.
- Can you show me where a *binomial* rate parameter
is near or above 50% and results in Poisson appearance?


The question is not whether the binomial distribution is the same as the
Poisson. It's not. The question is which one offers a better fit to the
data ... data we have not yet been shown. I can make arguments that it
"should" be a mixture distribution: a multinomial or binomial whose
number of trials is drawn from a distribution of letter counts per line.
That distribution would have a broad "integer ramp" describing the last
lines of paragraphs, along with a higher, narrow peak around 72
describing the number of letters in the non-last lines of the paragraphs.
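
A minimal simulation sketch of that mixture idea follows. Only the peak near 72 comes from the post; the share of last lines (P_LAST), the per-letter vowel probability (P_VOWEL), the small jitter on full lines, and the independence of letters are all assumptions made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

FULL_LINE = 72   # peak line length mentioned in the post
P_LAST = 0.15    # assumed share of lines that end a paragraph
P_VOWEL = 0.33   # assumed per-letter vowel probability

def line_length():
    # Last lines: broad "integer ramp"; other lines: narrow peak near 72
    if random.random() < P_LAST:
        return random.randint(1, FULL_LINE)
    return FULL_LINE + random.randint(-3, 3)

def vowel_count(n_letters):
    # Binomial draw, treating letters as independent (an assumption)
    return sum(random.random() < P_VOWEL for _ in range(n_letters))

def skewness(xs):
    # Population skewness: third central moment over sd cubed
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

counts = [vowel_count(line_length()) for _ in range(5000)]
mean = statistics.fmean(counts)
var = statistics.pvariance(counts)
print(f"mean={mean:.2f} variance={var:.2f} skewness={skewness(counts):.2f}")
```

Under these assumptions the short last lines add a left-hand shoulder, so the simulated skewness comes out negative, whereas a Poisson distribution always has positive skewness.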


But consonants and vowels are structured by words, and
words seldom start or end with more than 3 (say) consonants
or vowels. What you observe will have almost no instances
of more than 6 -- of *either* consonants or vowels. And
one of those has p > 0.5, so the event should not be enormously
rare.

The average number of vowels per line must be around 20-25, so your
statement about "almost no instances of more than six" does not make
much sense to me. The next sentence makes even less sense. I am
guessing that

Ah. I needed to be more clear. I was going on about the
*dependency*. If the occurrences are independent,
then there will be no "correlation" between *consecutive*
occurrences.

Clearly, vowels and consonants in words violate that assumption.

What assumption? We are not counting the number of vowels per word, or
the sequences of letters, or repetitions.

If there were a positive correlation (consecutive repetitions),
the distribution, if otherwise Poisson, would be over-dispersed
(variance too large for the mean). Since there is a negative
correlation, it will be under-dispersed.
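
The dispersion argument can be checked with a small simulation. This is a sketch under stated assumptions, not a model of English: letters on a line follow a two-state Markov chain with vowel probability p and lag-1 correlation r, and the values n = 72, p = 0.4 and r = -0.4, 0, 0.4 are arbitrary. The function names are hypothetical.

```python
import random
import statistics

def count_per_line(n, p, r, rng):
    # Two-state Markov chain with stationary vowel probability p
    # and lag-1 correlation r between consecutive letters
    p_after_vowel = p + r * (1 - p)
    p_after_consonant = p * (1 - r)
    is_vowel = rng.random() < p
    total = int(is_vowel)
    for _ in range(n - 1):
        threshold = p_after_vowel if is_vowel else p_after_consonant
        is_vowel = rng.random() < threshold
        total += is_vowel
    return total

def dispersion(r, n=72, p=0.4, lines=4000, seed=1):
    # Variance-to-mean ratio of the per-line counts (Poisson gives 1)
    rng = random.Random(seed)
    counts = [count_per_line(n, p, r, rng) for _ in range(lines)]
    return statistics.pvariance(counts) / statistics.fmean(counts)

for r in (-0.4, 0.0, 0.4):
    print(f"r={r:+.1f} variance/mean={dispersion(r):.2f}")
```

With r = 0 the counts are binomial, so the ratio is already about 1 - p, below the Poisson value of 1. Negative r lowers it further and positive r raises it.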

Why should the vowel count of one line be dependent on the vowel count in
a prior line?

It occurs to me that if there is a positive correlation between
words (words with many vowels versus words with few), then the
between-word r might tend to offset the negative correlation
within words. I don't know how that would work out.
But the OP already stated that the empirical distribution
he obtained did not look Poisson.

But he gave no data or summary statistics, nor did he say in what way it
departed from what he thought was "Poissonian". If it had a negative
skewness (say the left-sided shoulder due to the last line of
paragraphs), that would be decidedly non-Poissonian.

--
David Winsemius