Re: probability of sequences and seating



(1) suppose I have a sequence of letters: A,B,C,D, and I know that the
sequence starts with 'ABC'. I want to compute the probability that the
sequence continues for n more letters after the initial 'ABC' before
reaching one of the terminal sequences 'BAC', 'BCD', or 'DCB'.

You may want to look up Markov chains if you want references on how to deal with things like this.

One way to handle it is to describe it through the states made up of all 2-letter words plus an additional state, let's call it End, which signifies the end. Define a vector P[k] so that P[k](End)=probability of reaching a terminal sequence after exactly k letter, and P[k](XY) for any 2-letter word XY denotes the probability that after k letters you have no terminal sequence and the last two letters were XY.

(In order to actually make this into a Markov chain, you'd have to add another state, Term, so that you pass from End to Term, and then stay in Term.)

Define the matrix Q as the transition probabilities: i.e. the probability of the transition XY->YZ is 1/4 if XYZ is not a terminal sequence, XY->End has probability 1/4 if XYZ is terminal for some Z, otherwise all the elements are zero.

If we start with P[2]=(0,1/16,...,1/16), we get P[k+1]=P[k]*Q. If e=[1,0,...,0]^T, the probability p[k] of reaching End exactly after k letters is p[k]=P[k]*e, i.e. the first component of P[k]. We may then make the generating function
p(x) = p[2]*x^2+p[3]*x^3+...
= P[2]*x^2*(I+x*Q+x^2*Q^2+...)*e
= P[2]*x^2*inverse(I-x*Q)*e.
If I'm not mistaken, p(x)=f(x)/g(x) where f and g are polynomials of degree 16 and 17 (or will it be 15 and 16?), which will result in p[k] becoming an expression on the form
p[k] = sum_{i=1..16} a_i*(b_i)^k
for suitable a_i and b_i (may get a little more complicated in general if Q's eigenvalues are not distinct).

(2) a second, related question. suppose I have another sequence of
letters composed of A, B, C, and D, each equally probable and
independent as before. suppose I now just want to calculate the
probability of seeing *exactly n* 'ABC' triplets anywhere in any
sequence of letters. this problem strikes me as a binomial problem but
it doesn't quite seem to work. the probability of seeing 'ABC' is
(1/4)^3.

One way to do this is to make the generating function p(x) as above for the similar case where 'ABC' is the only terminal word. The x^n-coefficient of p(x) then gives the probability of getting the first 'ABC' after exactly n letters.

You can then find the probability of finding the m'th occurence of 'ABC' after exactly n letters as the x^n-coefficient of p(x)^m. (If you had a set of terminal words which could potentially overlap, you'd have to do this slightly differently.)

However, I suspect there's a simpler way: will thing about it over the weekend.

Einar

.



Relevant Pages

  • Re: probability of sequences and seating
    ... I am trying to solve two simple probability problems. ... sequence starts with 'ABC'. ... is terminated by one of the three terminating triplets. ...
    (sci.stat.math)
  • Re: help with simple probabilities
    ... Let's say that we are going to choose K letters. ... Hence the probability of choosing a sequence of K letters ... probability of a sequence with at most n distinct letters. ...
    (sci.stat.math)
  • probability of sequences and seating
    ... I am trying to solve two simple probability problems. ... sequence starts with 'ABC'. ... is terminated by one of the three terminating triplets. ...
    (sci.stat.math)
  • Re: help with simple probabilities
    ... Hence the probability of choosing a sequence of K letters ...
    (sci.stat.math)
  • Re: probability of sequences and seating
    ... sequence starts with 'ABC'. ... I want to compute the probability that the ... sequence continues for n more letters after the initial 'ABC' before ...
    (sci.stat.math)