Re: Question about use of Poisson probabilities




Dora Smith wrote:

Not planning to call my boss an idiot, though I am questioning
how well he knows what he is talking about - but most people
are telling me for one reason and another to use the normal
distribution.

Probability theory was never my forte. By "my model is wrong",
do you mean that my model does not consist of small numbers of
discrete events that are skewed toward the left?

I'm glad you're not going to call your boss an idiot. He
probably really isn't one, anyway. I've long noticed that humor
does not travel well over the Internet, but occasionally I
have to try anyway. Maybe that means I'm an idiot.

When I say your model is wrong, I mean that your assumptions
imply things that turn out not to be true about your data.

I'm not entirely sure that that is true in your case. I'm
guessing that 700,000 is not entirely impossible as a goal,
even if it is a stretch, since you mentioned the number.
If your records arriving or being processed or whatever
are a Poisson process, then 700,000 shows much, much, much too
much variability. In comparison, Big Foot riding by on
a winged unicorn to deliver your morning paper would be
boringly normal.

(If you have a Poisson with a rate of 617,000 reports per
month, then you *must have* a standard deviation of
785 reports per month. Compare that to your sample
standard deviation. Is it wildly different? Then
it's not a Poisson process.)
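
A rough sketch of that comparison, for anyone who wants to try it. For a Poisson count the variance equals the mean, so sample variance divided by sample mean should come out near 1. The monthly totals below are invented placeholders (not Dora's data), and `dispersion_ratio` is just a helper name made up for this example.

```python
import math
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    return statistics.variance(counts) / statistics.mean(counts)

# Invented monthly totals, for illustration only (assumption, not real data).
monthly_counts = [590_000, 655_000, 601_000, 640_000, 612_000, 604_000]

mean = statistics.mean(monthly_counts)
poisson_sd = math.sqrt(mean)                   # what a Poisson model requires
sample_sd = statistics.stdev(monthly_counts)   # what the data actually show

print(f"mean              = {mean:,.0f}")
print(f"Poisson sd        = {poisson_sd:,.1f}")
print(f"sample sd         = {sample_sd:,.1f}")
print(f"variance / mean   = {dispersion_ratio(monthly_counts):,.1f}")
```

If the last number is in the hundreds or thousands rather than near 1, a plain Poisson model does not describe the data.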

I don't remember all of my statistics, but there are other
assumptions about your distribution that you need in order
to use the Poisson distribution.

One assumption is that each
event is independent of the others. That seems like a very
reasonable assumption to me, but it may not be true. There
may be background events that cause multiple reports to be
generated. I'm imagining insurance claims which would be
to some extent independent of each other, but also could be
jointly linked to bad or good weather. Maybe something similar
applies in your case.

I would strongly expect that some of the variability
is related to there being more working days in some
months than in others. There is also very likely a
seasonal component to it, too. There's a seasonal
component to practically everything.

It would be interesting if, after you've accounted for
things like number of working days and season, what
variability you have left looks like a Poisson. That might
mean you fully understand the processes that generate
reports for you. (Or maybe it doesn't; remember: I Am Not A
Mathematician.)
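
One possible way to check this, as a sketch only: assume a constant rate of reports per working day, work out the expected count for each month, and look at the Pearson dispersion of the actual counts around those expectations. A value near 1 would be consistent with Poisson leftovers; a much larger value means there is variability still unexplained. The numbers are invented, `poisson_dispersion` is a hypothetical helper, and seasonality is not handled here at all.

```python
def poisson_dispersion(counts, working_days):
    """Pearson dispersion of counts around a constant per-working-day rate.

    Near 1: consistent with Poisson. Much larger than 1: extra variability
    that the working-day adjustment does not explain.
    """
    if len(counts) != len(working_days) or len(counts) < 2:
        raise ValueError("need two or more months, with matching lengths")
    rate = sum(counts) / sum(working_days)        # pooled reports per working day
    expected = [rate * d for d in working_days]   # expected count for each month
    chi_sq = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
    return chi_sq / (len(counts) - 1)

# Invented figures, for illustration only (assumption, not real data).
working_days = [22, 20, 23, 21, 22, 21]
monthly_counts = [640_000, 575_000, 668_000, 598_000, 625_000, 596_000]

print(f"dispersion after working-day adjustment: "
      f"{poisson_dispersion(monthly_counts, working_days):.1f}")
```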

Whether or not you do all that, you should still be able to
characterize the monthly number of reports using a Gaussian
with some mean and some standard deviation. That's a
consequence of the Central Limit Theorem. (Theorem, roughly:
if you add up a lot of independent random quantities of
whatever shape, the total starts to look like a Gaussian.)
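
A small simulation can make the theorem visible. This is only an illustration: it draws from an exponential distribution (chosen here because it is strongly skewed; nothing says the real data look like that) and compares the skewness of single draws with the skewness of sums of 50 draws. A Gaussian has skewness 0, so the sums should come out much closer to 0.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed z-score. Zero for a symmetric shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(0)   # fixed seed so the run is repeatable
n_terms = 50             # how many draws are added into each total
n_samples = 10_000       # how many totals to simulate

single = [rng.expovariate(1.0) for _ in range(n_samples)]
sums = [sum(rng.expovariate(1.0) for _ in range(n_terms))
        for _ in range(n_samples)]

print(f"skewness of single draws: {skewness(single):.2f}")
print(f"skewness of sums of {n_terms}:   {skewness(sums):.2f}")
```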

Jim Burns



"Jim Burns" <burns.87@xxxxxxx> wrote in message
news:45C63151.9425B9E5@xxxxxxxxxx

Dora Smith wrote:

My boss wants me to use poisson probabilities to compute the
likelihood of meeting various goals in relation to our project,
where the average number of records per month is 617,000.

I am having trouble getting Excel to compute probabilities as
other than 0 or 1, though when I do examples on the web that
involve very small numbers I get the correct answers with no
trouble.

Are poisson probabilities intended for this use? Or are they
only applicable for small numbers of discrete events, like the
likelihood that 4 cars will run a traffic light in a day?

What would be the appropriate statistic for the probability of
meeting a goal of 700,000 discrete events in a month, or a few
million in a year?

The idea of changing your units to events per micro-month (or
something like that) doesn't seem very useful to me. Since the
Poisson distribution is discrete, this model would only include
the events {0 records in Feb; 1,000,000 records in Feb;
2,000,000 records in Feb; ...}. This is probably not what you want.

My thought was to approximate the Poisson distribution by
a Gaussian with the same mean and standard deviation. This works
well for large enough numbers of events (including numbers
much smaller than 617,000). However, when I did this I realized
that the source of your problems is your model. It just has
to be incorrect.

For the Poisson distribution you describe,
    \mu = 617,000
    \sigma = sqrt(\mu) = 785.5
For the question "What is the probability of at least 700,000
events in a month?", one can use the standard normal cdf with
    z = (700,000 - 617,000)/785.5 = 105.7
(For a little perspective, the probability that z > 6 is
about one in 10^9.) Maple will give me that number, but I'm
not surprised Excel won't:
    prob(z > 105.7) = 1.146471703e-2427
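
For anyone without Maple, here is a sketch of the same calculation. Ordinary double-precision arithmetic cannot hold a number that small, so the direct formula underflows to zero (presumably the same reason a spreadsheet shows only 0 or 1). The workaround used here is to compute the logarithm of the tail probability from the standard large-z asymptotic series, P(Z > z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4). The helper name `log10_normal_tail` is made up for this example.

```python
import math

def log10_normal_tail(z):
    """log10 of P(Z > z) for large z, from the asymptotic series
    P(Z > z) ~ phi(z)/z * (1 - 1/z**2 + 3/z**4)."""
    log_phi = -0.5 * z * z - 0.5 * math.log(2 * math.pi)   # ln of the normal pdf
    correction = 1 - 1 / z**2 + 3 / z**4
    return (log_phi - math.log(z) + math.log(correction)) / math.log(10)

mu = 617_000
sigma = math.sqrt(mu)              # Poisson: standard deviation = sqrt(mean)
z = (700_000 - mu) / sigma

p6 = 0.5 * math.erfc(6 / math.sqrt(2))       # P(Z > 6), still representable
p_direct = 0.5 * math.erfc(z / math.sqrt(2)) # too small for a double

log10_p = log10_normal_tail(z)
exponent = math.floor(log10_p)
mantissa = 10 ** (log10_p - exponent)

print(f"sigma = {sigma:.1f}, z = {z:.1f}")
print(f"P(Z > 6)  = {p6:.3e}")
print(f"direct    = {p_direct}")
print(f"via logs  = {mantissa:.3f}e{exponent}")
```

The "via logs" line should land close to the Maple figure quoted above, using the unrounded z rather than 105.7.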

If there is enough chance that you'll process 700,000 records
in a month that you're even asking this question, then
your model is wrong.


There are a lot of other well-studied distributions out
there, maybe even supported by Excel, and maybe even some
literature about which would be most appropriate for
your situation. I can't say anything about that deeper
than "Look on Google", or maybe "Look on Scholar.google".

If I were doing this, I would use the normal distribution
(which is nearly always not too wrong) with \mu and
\sigma gotten from the sample mean and sample standard
deviation of whatever data you got 617,000 from.
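
A minimal sketch of that suggestion, using only Python's standard library. The monthly totals are invented placeholders standing in for "whatever data you got 617,000 from", and `prob_at_least` is a hypothetical helper name.

```python
import statistics

def prob_at_least(goal, counts):
    """P(monthly count >= goal) under a normal fit to the sample."""
    mu = statistics.mean(counts)
    sigma = statistics.stdev(counts)   # sample standard deviation
    return 1 - statistics.NormalDist(mu, sigma).cdf(goal)

# Invented monthly totals, for illustration only (assumption, not real data).
monthly_counts = [590_000, 655_000, 601_000, 640_000, 612_000, 604_000]

p = prob_at_least(700_000, monthly_counts)
print(f"P(at least 700,000 in a month) is about {p:.2e}")
```

With only a handful of months the estimate of \sigma is itself shaky, so the answer is a ballpark figure at best.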


I notice, though, that you're looking for the probability
of reaching a goal. That makes me wonder whether these
statistics are being used incorrectly from the very
beginning.

The assumption behind using descriptive statistics as you
are doing is that the number of records processed per month
is *not* manipulable, that they just happen to have a certain
distribution. We can ask some questions, but the sort of
question you cannot have is "How well did we do last
month?" The answer will always be "We did the same as we
always do; it's just that last month we were lucky
(or, we were unlucky; it doesn't matter)" The assumption
that you did what you always do is built into the way
the question was asked.

Some more valuable statistics might be the correlations
between whatever you think *might* cause you to have
more records or fewer records processed and the numbers
actually processed.
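
As a sketch of what that might look like: a plain Pearson correlation between one candidate cause and the monthly counts. Working days per month is used as the candidate here purely as an example, and all the numbers are invented. A correlation says nothing about cause on its own, of course.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented figures, for illustration only (assumption, not real data).
working_days = [22, 20, 23, 21, 22, 21]
monthly_counts = [640_000, 575_000, 668_000, 598_000, 625_000, 596_000]

r = pearson(working_days, monthly_counts)
print(f"correlation between working days and records: {r:.2f}")
```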

Jim Burns
(Disclaimer: I am not a mathematician. If you're planning
to tell your boss he's an idiot for asking what he asked,
you need some heavier-weight support than me.)