Re: Modeling Random Experiments with Probability Theory



On Aug 5, 2:10 pm, Marvin <marvin....@xxxxxxxxxxxxxx> wrote:
Modeling Random Experiments with Probability Theory

Probability theory is the (mathematical) theory of probability spaces.
Probability spaces of course are rigorously defined, just as groups are
defined for group theory, or meromorphic functions are for complex
analysis.  Probability theory is often used to model processes that
contain randomness (the word being used more in a colloquial sense
here), either by nature, e.g., physical experiments, or artificially,
e.g., randomized algorithms.  We also speak of 'random experiments', or
'experiments' for short.  On the one hand, this article is concerned with
arguing why certain probability spaces are appropriate models of
certain random experiments.  On the other hand, it discusses how
conclusions reached through probability theory could be interpreted in
order to be of use for applications and understanding.  It appears that
most textbooks and other texts on probability theory omit this topic
although they often describe random experiments, e.g., flipping a coin,
for motivation of the mathematical concepts.  Of course, the reasoning
in question cannot be completely mathematical in nature.  But the goal
of this work is to provide reasoning that relies on as few "obvious"
facts as possible.  This work is also intended to initiate a discussion
on how the reasoning can be improved or extended.  Occasionally, formal
questions are posed (numbered as 'Question 1', 'Question 2', etc.)  that
could serve as starting points for discussion.  Any references to texts
covering or touching this topic are also welcome.

We start with the example of flipping a coin.  This is always modeled
using {H,T} as sample space and Pr({H})=Pr({T})=1/2 as probability
measure.  Here 'H' denotes the outcome 'heads', meaning that this
particular side of the coin faces upwards, and 'T' analogously denotes
'tails'.  Why is that an appropriate model of the experiment of
flipping a coin?  Any reasoning beyond claiming this to be "obvious"
turns out not to be particularly easy.  Any reasoning should also avoid
referring to the concept of "probability", since this would require
further clarification, and it would be circular reasoning to use
probability theory for it!  The same goes for all other synonymous or
related terms, like saying that one outcome was "more likely" than
another.

What reasoning will we use?  For the choice of the sample space, it is
enough to note that 'heads' and 'tails' are the only possible outcomes.
One might at first find that it would also be necessary to note that
both outcomes are in fact possible.  However, an impossible outcome w
could always be modeled by setting its probability to zero, i.e.,
Pr({w})=0.
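
The remark about impossible outcomes can be made concrete with a small
sketch.  The names 'coin', 'coin_with_edge', 'pr' and the extra outcome
'E' (the coin landing on its edge) are illustrative assumptions, not
part of the article; the sketch only checks that adding a
zero-probability outcome leaves all other probabilities unchanged.

```python
from fractions import Fraction

# Coin model from the text: sample space {H, T} with equal probabilities.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Hypothetical extension: an outcome 'E' (coin lands on its edge) that we
# regard as impossible is kept in the sample space with probability zero.
coin_with_edge = {"H": Fraction(1, 2), "T": Fraction(1, 2), "E": Fraction(0)}


def pr(measure, event):
    """Probability of an event (a set of outcomes) under a discrete measure."""
    return sum((measure[w] for w in event), Fraction(0))


def is_probability_measure(measure):
    """Check non-negativity and total mass 1 on a finite sample space."""
    return all(p >= 0 for p in measure.values()) and sum(measure.values()) == 1


print(is_probability_measure(coin))
print(is_probability_measure(coin_with_edge))
print(pr(coin, {"H"}), pr(coin_with_edge, {"H", "E"}))
```

Exact fractions are used instead of floats so that the equalities hold
without rounding.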

For the choice of the probability measure, we note the following.  If we
had something to gain by correctly guessing in advance the outcome of
flipping the coin (either 'heads' or 'tails'), there would be *no way*
to argue that it would be better for us to guess 'heads' or to guess
'tails'.  The reason for this is that the experiment does not offer
anything to argue with:  the coin is symmetric (we assume it to be) and
although the exact way in which the coin is flipped may influence the
outcome, we have no idea in what way this would be, or we do not have
enough control over that physical process to apply any kind of "trick"
to induce a certain outcome.  (Maybe a specially skilled person would
have such an ability, but we assume that this is not the case, just as
we assume the coin to be symmetric.)

Two things are important to note.  First, we avoided the term
"probability" by instead considering the question of which outcome we
would guess if we had something to gain by a correct guess.  After all, such
questions are the original motivation for applying probability theory in
many cases.  Second, we argued for using equal probabilities by *(I)
symmetry* and *(II) an inability to assess or to control any influential
parameters*.  Symmetry was restricted to the coin itself -- certainly
the experiment introduces an asymmetry, since the two sides of the coin
are in different positions at the start, when we hold the coin in our
hand.  This however falls under the category of
non-assessable parameters (II).

What if we remove symmetry?  Assume the coin has been deformed.  Some
ways in which the coin could be deformed will make many people believe
that it would be wise to favor one guess (either 'heads' or 'tails')
over the other.  But would they be able to quantify this preference, so
that we can define Pr({H}) and Pr({T})?  It seems that without symmetry, it is
hard to model the experiment as a probability space and provide a
conclusive reasoning for it.  One way to go about it would be to do a
number of experiments and then set probabilities to resemble the ratio
of observed 'heads' and 'tails'.  Such a model would always be dependent
on the outcomes of these particular experiments.  So it appears that
symmetry is an essential ingredient for a good reasoning.

Notably, if we choose to reason by experiment in the case of a *symmetric*
coin, we could well arrive at a model that does not assign equal
probabilities to the two outcomes, and hence deviates from the
model obtained by theoretical reasoning.  One could be tempted to calculate the
"probability" that a series of n coinflips would lead to a model with
probabilities deviating by at most epsilon from 1/2.  But that would
require a model in the first place.
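
A minimal sketch of "reasoning by experiment" follows.  The function
name 'empirical_model' and the simulated flips are assumptions made for
illustration.  Note that the simulation itself already presupposes a
uniform model inside the random number generator, so it only
illustrates the bookkeeping, not a justification.

```python
import random
from fractions import Fraction


def empirical_model(flips):
    """Set Pr({H}) and Pr({T}) to the observed relative frequencies."""
    n = len(flips)
    heads = flips.count("H")
    return {"H": Fraction(heads, n), "T": Fraction(n - heads, n)}


# Simulated flips stand in for real ones (an assumption of this sketch).
rng = random.Random(1)
for run in range(3):
    flips = [rng.choice("HT") for _ in range(25)]
    model = empirical_model(flips)
    print(run, model["H"], model["T"])
```

With an odd number of flips, such as 25 here, the resulting model can
never assign exactly 1/2 to both outcomes, and different runs will in
general give different models.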

Evaluating Conclusions Drawn from the Model

There is a second aspect under which a model can be evaluated, namely
whether conclusions drawn inside of the model (in case of probability
theory, we could also speak of predictions) are in agreement with
experience and intuition.  A difficulty with our current understanding
is that we only have very limited ways of mapping probabilities (and
most conclusions of the model will be statements on probabilities of
certain events) to terms of the experiment or other relevant terms
outside of the model.  So, we automatically arrive at the second topic
of this article:  how to interpret results from probability theory.

Let us first generalize our original reasoning to experiments of rolling
a symmetric k-sided dice.  This can be done straightforwardly.  The
sample space is {1,...,k}, and the probability measure is defined by
Pr({i})=1/k for each i in {1,...,k}.  Now, fix k=3 and let A={1,2} and
B={3}, i.e., A is the event that the dice shows either '1' or '2', and B
is the event that it shows '3'.  The model clearly states Pr(A)=2/3 and
Pr(B)=1/3.  In what respect could that be in agreement or disagreement
with experience and intuition?  Recall that we chose equal probabilities
in the model to express that any guess is as good as the other.  If we
turn that around, we would now conclude that guessing A is better than
guessing B (if we were given the task to guess either A or B).
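
The k=3 example can be written down directly.  This is only a sketch of
the model as stated above; the helper names 'uniform_die' and 'pr' are
chosen here for illustration.

```python
from fractions import Fraction


def uniform_die(k):
    """Model of a symmetric k-sided dice: Pr({i}) = 1/k for i in 1..k."""
    return {i: Fraction(1, k) for i in range(1, k + 1)}


def pr(measure, event):
    """Probability of an event (a set of outcomes) under a discrete measure."""
    return sum((measure[w] for w in event), Fraction(0))


die = uniform_die(3)
A = {1, 2}
B = {3}
print(pr(die, A), pr(die, B))  # 2/3 1/3
```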

Question 1: Is the conclusion that guessing A is better than guessing B
really in agreement with experience and intuition?  Most readers will
say "Yes", without hesitation, but can they provide a detailed
explanation?  Some will argue that actually performing the experiment a
"large" number of times will teach us that the conclusion is right.  But
how can they be sure when they do not know the outcomes of the
experiments?  Again, there is no way to calculate the "probability" that
a series of experiments will lead to event A in a fraction of
2/3+epsilon or 2/3-epsilon of the cases, since that would require a
model in the first place.

From now on, we will use the term 'probability' always referring to a
model, i.e., the probability of an event, which is a number between 0
and 1 given by a probability measure.  The main question is how to map
probabilities to certain kinds of statements, e.g., whether we should
prefer guessing one event or the other.

Arguing that a higher probability is preferable is easy when one
of the events includes the other.  Let A and B be events such that B is
strictly included in A.  Then we have Pr(A)>=Pr(B), and even Pr(A)>Pr(B)
provided that Pr({w})>0 for each outcome w.  It is clear that we have
nothing to lose by guessing A instead of B.  In the following example,
it is even easy to see that it would be a *mistake* not to favor A over
B.  Consider n repeated coinflips.  Let A be the event that we have at
least k times 'heads', and B the event that we have at least j times
'heads', with k<j<=n.  Then B is strictly included in A.  If we had the
choice between guessing A or guessing B, we should guess A.  Here is an
explanation why.  We consider the coinflips one after the other.  If any
of the two events A or B occurs, then there is a point in time where it
is clear that A occurs, but it is still unclear whether B will occur as
well.  That is, of course, the point where we have seen 'heads' come up
at least k times, but fewer than j times.  So at this point, things may
still go wrong for someone who guessed B, but never for someone who
guessed A.  There is no way that something similar can occur with A and
B exchanged.
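
For small n the inclusion argument can be checked by enumerating all
sequences of coinflips.  The values n=5, k=2, j=4 and the helper
'decision_times' are assumptions chosen for this sketch; no
probabilities are needed for it.

```python
from itertools import product

n, k, j = 5, 2, 4  # example values with k < j <= n

sequences = list(product("HT", repeat=n))
A = {s for s in sequences if s.count("H") >= k}  # at least k heads
B = {s for s in sequences if s.count("H") >= j}  # at least j heads

print(B < A)  # True: B is a proper subset of A


def decision_times(seq):
    """First flip (1-based) after which A, resp. B, is certain; None if never."""
    heads, t_a, t_b = 0, None, None
    for t, flip in enumerate(seq, start=1):
        heads += flip == "H"
        if t_a is None and heads >= k:
            t_a = t
        if t_b is None and heads >= j:
            t_b = t
    return t_a, t_b


# Whenever B is settled, A was already settled strictly earlier.
print(all(t_a < t_b for t_a, t_b in map(decision_times, B)))  # True
```

The second check reflects the "point in time" argument:  the count of
'heads' must pass k before it can reach j.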

Ordering or Measure?

Our interpretation of probability so far neglects the actual numbers:  it
just observes that Pr(A)>Pr(B), and that is enough to draw the
conclusion that guessing A is better than guessing B.  Would it be
enough then to provide an *ordering* on all the events in the
probability space, instead of a probability measure (which implies an
ordering)?  It will not be enough if we extend our scenario in the
following way.  We are still interested in making good guesses about the
outcome of some experiment.  As an extension, we are now given the
choice between two experiments.  The first experiment consists of
rolling a k-sided symmetric dice, k>=3.  The second experiment consists
of rolling a j-sided symmetric dice.  Let k<j.  We consider the event A
that the respective dice shows '1', i.e., A={1}, which is part of the
probability space in either experiment.  Assume that we may choose one
of these experiments, then make a guess whether A will occur or not,
then the chosen experiment is conducted, and we gain something in case
that we guessed correctly.  Would we rather choose the first or the
second experiment?  Probabilities for the first experiment are Pr(A)=1/k
and Pr(A^c)=1-1/k, with A^c denoting complement (which is different for
experiment one and two).  Probabilities for the second experiment are
Pr(A)=1/j and Pr(A^c)=1-1/j.  It is easy to see that 1-1/j is the
largest probability of all.  So, if we were to follow the interpretation
of the model in the way that we favor higher probabilities, we would
choose the second experiment and guess that A does not occur.
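
The comparison across the two experiments can be tabulated as follows.
The values k=3 and j=6 are example choices, and the dictionary layout
is just one way to write it down.

```python
from fractions import Fraction

k, j = 3, 6  # example values with 3 <= k < j

# Probability of a correct guess for each (experiment, guess) pair.
options = {
    ("first experiment", "A occurs"): Fraction(1, k),
    ("first experiment", "A does not occur"): 1 - Fraction(1, k),
    ("second experiment", "A occurs"): Fraction(1, j),
    ("second experiment", "A does not occur"): 1 - Fraction(1, j),
}

best = max(options, key=options.get)
print(best, options[best])
```

An ordering of the events within each experiment alone would not allow
this comparison, since the two largest values, 1-1/k and 1-1/j, belong
to different probability spaces.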

Question 2:  Repeat Question 1 for the new scenario.  Is the conclusion
of choosing the second experiment and guessing that A does not occur
really in agreement with experience and intuition?

Reference Experiments and Models

We are still concerned with interpretations of probabilities.  So far we
have posed the question (in several variants) whether or to what extent
it can be argued that putting a guess on an event with higher
probability is better.  We will not attempt any more explanations for
this in this section.  Instead we look at one more scenario, which is
especially relevant to the analysis of randomized algorithms.  It is
also one more example that it is useful to consider probability measures
and not just orderings on the events.

Suppose we have the opportunity of conducting an experiment where the
sample space is divided into two events, say 'success' and 'failure'.
Suppose also that we have something to gain if 'success' occurs, but we
have something to *lose* when 'failure' occurs.  Now we are given the
choice of either running the experiment (which means taking the chance
to gain something and the risk to lose something) or of not running the
experiment, in which case we neither gain nor lose something.  The
connection to randomized algorithms is the following.  A randomized
algorithm may provide a correct result.  It may also provide an
erroneous result, but without indicating that.  So we do not know
whether the result is correct or not, we only have a probability that it
is correct.  We assume that using a correct result gives us some gain
and that using an erroneous result as if it was correct leads to loss.

It appears reasonable that a decision whether to run the experiment or
not (or whether to trust the result provided by a randomized algorithm)
should not only be dependent on whether 'success' has a larger or
smaller probability than 'failure', but also on the actual
probabilities.  How can we relate a probability, e.g., 0.2 or 0.5^10, to
the world outside of the model which we used to model the experiment
(e.g., the randomized algorithm)?  We propose here to use a *reference
experiment* along with a corresponding model.  Recall that we already
convinced ourselves that rolling a symmetric k-sided dice is
well-modeled using sample space {1,...,k} and Pr({i})=1/k for each outcome i.
Let 'success' have probability p>1/2, then the probability for 'failure'
is q=1-p<1/2, and 1/q>2.  We assume that 1/q is an integer and set
k=1/q.  Then experiencing 'failure' is like rolling a '1' with a k-sided
dice, and experiencing 'success' is like rolling a number from {2,...,k}
with a k-sided dice.  This way of thinking relies on two prerequisites.
First, we assume that we have a "feeling" for what it means to throw a '1',
or anything other than a '1', with a k-sided dice.  We can then say:  "I
will run the experiment (e.g., trust the output of the randomized
algorithm) if I would just as well roll a 1/q-sided dice and hope for the
dice showing anything other than a '1'."  Second, we rely on having used
the same modeling principles to model the rolling of the dice as we used
to model the experiment (e.g., the randomized algorithm).
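
The translation from a failure probability q to a reference dice can
be sketched as below.  The function name 'reference_die_sides' is
hypothetical, and returning None when 1/q is not an integer is a choice
made for this sketch; the article simply assumes that case away.

```python
from fractions import Fraction


def reference_die_sides(q):
    """Return k = 1/q if q = 1/k for an integer k > 2, else None."""
    q = Fraction(q)
    if 0 < q < Fraction(1, 2) and q.numerator == 1:
        return q.denominator
    return None


print(reference_die_sides(Fraction(1, 5)))     # 5
print(reference_die_sides(Fraction(1, 1024)))  # 1024
print(reference_die_sides(Fraction(3, 10)))    # None
```

So a failure probability of 0.2 corresponds to rolling a '1' with a
5-sided dice, and 0.5^10 to rolling a '1' with a 1024-sided dice.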

Combination of Experiments -- Product Probability Spaces

In the previous section, we could have chosen any experiment as
reference, provided that we can give a good model for it.  So we may
look for reference experiments that provide a good "feeling" for
probability.  For instance, we could roll a k-sided dice until a '1'
shows up the first time and count the number of rolls.  Or we could roll
a k-sided dice n times and consider the probability that each time it
shows a number from a certain set.  We will focus on the latter.

Suppose that we conduct two experiments E_1 and E_2 that we have modeled
to our satisfaction with discrete probability spaces (Om_1,Pr_1) and
(Om_2,Pr_2); we omit the sigma algebra and assume as usual that it is
the power set of the corresponding sample space.  We wish to model the
experiment of conducting E_1 and E_2 with all possible outcomes denoted
as elements of the Cartesian product Om=Om_1 x Om_2.  We assign
probabilities Pr({(w_1,w_2)})=Pr_1({w_1})*Pr_2({w_2}) for each (w_1,w_2)
in Om.  It is a basic exercise to show that this in fact yields a
probability space (Om,Pr).  But is it also a good model for the
combination of the two experiments?  This can be argued for in the
following way.  We can "mask out" one of the two experiments, say E_2,
by considering events of the form {w_1} x Om_2 for w_1 in Om_1.  Such an
event expresses that we do not care about the outcome of E_2, which
should be equivalent to just conducting E_1.  And indeed, we have
Pr({w_1} x Om_2)=Pr_1({w_1}), as a simple calculation shows.  We can
apply the same principle to "mask out" E_1.
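
The "masking out" calculation can be verified on a small example.
Combining a coin with a 3-sided dice is an arbitrary choice for this
sketch, and 'product_space' is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product


def product_space(pr1, pr2):
    """Pr({(w1, w2)}) = Pr_1({w1}) * Pr_2({w2}) on Om_1 x Om_2."""
    return {(w1, w2): pr1[w1] * pr2[w2] for w1, w2 in product(pr1, pr2)}


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {i: Fraction(1, 3) for i in (1, 2, 3)}
pr = product_space(coin, die)

# The product measure has total mass 1.
print(sum(pr.values()) == 1)

# Masking out the dice: Pr({w1} x Om_2) equals Pr_1({w1}).
for w1 in coin:
    masked = sum(p for (a, _), p in pr.items() if a == w1)
    print(w1, masked == coin[w1])
```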

The above arguments extend straightforwardly to any number of
probability spaces (Om_1,Pr_1),...,(Om_n,Pr_n) and their product (Om_1
x...x Om_n,Pr), where Pr({(w_1,...,w_n)})=Pr_1({w_1})*...*Pr_n({w_n})
for each (w_1,...,w_n) in Om_1 x...x Om_n.  We call this the 'standard
product space'.  So, we are now able to model combinations of
experiments.

Question 3:  Are there further arguments that the standard product
space is a good model for combinations of experiments?  Some might say that
it was a well-known rule that the probability of a combination of
"independent events" was the product of their respective probabilities.
However, is that not a conclusion drawn from an already assumed model?

We can now express probabilities in numbers of dice rolls or coinflips
(a coin being a 2-sided dice).  When rolling a k-sided dice n times, the
probability that all outcomes are '1' is 1/k^n.  If we are given an
event that has probability q, we can "feel" its probability by thinking
of rolling a k-sided dice roughly n=-ln(q)/ln(k) times and hoping that
it shows '1' each time.  For flipping a coin, we have k=2.  For
instance, if a randomized algorithm has a failure probability of
q=0.001, then we can say:  "The algorithm failing is equivalent to
flipping a coin about -ln(q)/ln(2) times, that is about 10 times, and it
always showing 'heads'."
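
The conversion n=-ln(q)/ln(k) is easy to compute.  The function name
'equivalent_rolls' is an assumption of this sketch.

```python
import math


def equivalent_rolls(q, k=2):
    """Number n of rolls of a k-sided dice with 1/k^n = q (n may be fractional)."""
    return -math.log(q) / math.log(k)


print(round(equivalent_rolls(0.001), 2))  # 9.97
print(0.5 ** 10)                          # 0.0009765625
```

Ten coinflips all showing 'heads' have probability 0.5^10, which is
slightly below 0.001, consistent with "about 10 times".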

Random Bit Generators

One important achievement so far is that we can get a "feeling" for the
performance of a randomized algorithm by stating a reference experiment
(and its model), e.g., flipping a coin a number of times.  If we have a
"feeling" for the reference experiment, we can transform that to the
randomized algorithm, provided that we used the same modeling principles
to model the reference experiment as we used to model the randomized
algorithm.  How to ensure the latter remains to be discussed.  We do
not flip coins to run a randomized algorithm, but instead we use random
bit generators.  So we have the same challenge as before, namely to
reason why a particular model is appropriate to describe an experiment,
the experiment now being to read a bit from the random bit generator.
The model of choice would most likely be the same we used to model
coinflips, i.e., we have a two-element sample space and uniform
probabilities.

Question 4:  How can we establish a reasoning that this model is
appropriate for the various (true or pseudo) random bit generators being
used in applications?  Recall that for coinflips and dice rolls, we
argued by (I) symmetry and (II) an inability to assess or to control any
influential parameters.  Can something similar be applied to random bit
generators?  Is there some standard argument that helps establish a
reasoning for whole classes of generators at once?

I think you might benefit by looking at the book "Probability Theory:
the Logic of Science", by the late E.T. Jaynes; much of it is
available free in the on-line source
http://omega.math.albany.edu:8008/JaynesBook.html, although some (less
important) sections may be missing and it lacks
some editing, etc. Jaynes deals with many of the issues you mention
and many more besides. He takes a rather unconventional route: he
seems to not believe at all in the concept of "randomness", and he
argues against it forcefully (but to me, not wholly convincingly) in
many parts of the book. So, the book is enlightening but somewhat
unorthodox; it is a 600 page book on probability and its applications
but in which "randomness" plays no real role! Personally, I think that
Jaynes carries his eschewing of randomness too far---even dismissing
the possibility of randomness in the quantum world---but it is still
worth reading his views and thinking about them. Jaynes is a confirmed
Bayesian, and he writes accordingly. His starting point is that of
"plausible reasoning", and for him, a probability value is a measure
of 'plausibility' attached to a claim or statement. He derives such
matters as the addition and multiplication rules for probabilities by
careful analysis of desirable properties of 'plausibilities',
essentially adopting a fairly convincing set of 'axioms' that he wants
plausibilities to obey. He builds on the work of the physicist Cox and
others.

R.G. Vickson