Re: Efficient unbiased prefix-free encodings



On Dec 12, 6:49 am, Ian Parker <ianpark...@xxxxxxxxx> wrote:
On 11 Dec, 23:46, "joeship...@xxxxxxx" <JoeShip...@xxxxxxx> wrote:





I am looking for an efficient way to encode an n-letter alphabet so
that a stream of random bits will produce an unbiased stream of random
elements of {0,...,n-1}. An easy way to do this is to encode modulo
the next power of 2 greater than (a multiple of) n, then reduce modulo
n, but in the event of an "overflow" (>= the last multiple of n
available) shift left, add the next bit, and take modulo the next
power of 2 and iterate.

For example, suppose n=23 and the bit stream goes 1 1 1 0 0 1 0 ....
Proceed as follows: The successively longer binary numbers this stream
gives, in decimal, are 1, 3, 7, 14, 28, 57, 114. The first of these
which has a multiple of 23 between it and the next higher power of 2
is 114, so take 114 mod 23 and return 22. Start over with the next bit
to get the next letter. It's easy to show that the resulting stream of
letters is random and unbiased.

This requires, on average, 5.702 bits per letter in a 23-letter
alphabet, and 5.711 bits per letter in a 17-letter alphabet, etc. Can
this be done more efficiently while preserving both the "unbiased" and
"prefix-free decodable" properties? (For example, 23^21 ~
0.996*(2^95), so using 95 bits at a time to get 21 successive letters
would be close to the information-theoretic bound of log(23)/
log(2)=4.52 bits, but it doesn't work "online" and turn a stream of
bits into a stream of letters by using a prefix-free encoding).

Best way is the Huffmann codes. The general principles of which are
given in.

http://www.data-compression.com/theory.html

You take sequences of letters and encode wrt. frequency.

- Ian Parker- Hide quoted text -

- Show quoted text -

That source tells me that there are codes asymptotically approaching
the information-theoretic bound, but I am asking a more specialized
question. Assuming the letters are equiprobable, what I want is an
explicit code more efficient than the one I described, which has the
key additional properties of being unbiased (so that random bit-stream
input leads to random letter-stream output) and expressible as a
prefix-free encoding (so I'm not allowed to save up input and then
emit multiple output letters at a time).

The way I outlined is not the only way. Another method, less efficient
but conceptually even simpler, is interpret the incoming bits as the
binary expansion of a real number between 0 and 1, and stopping after
k bits if the current interval of width 1/2^k lies entirely within one
of the intervals [0,1/n], [1/n,2/n,],...,[(n-1)/n,1]. That is, you
divide the interval into n equal bins and stop when you have enough
information to determine which bin you are in.

This second method is less efficient -- for example, when n=23 you
need 6.375 bits to make a decision, on average, compared with 5.702
bits in my original scheme. I am asking for an explicit construction
which works for all n which is better than my first method, not a
nonconstructive argument that for each n better codes exist.

-- Joe Shipman

.



Relevant Pages

  • Re: Efficient unbiased prefix-free encodings
    ... that a stream of random bits will produce an unbiased stream of random ... An easy way to do this is to encode modulo ... letters is random and unbiased. ...
    (sci.math.research)
  • Re: Post on Hols
    ... Are they on holiday this week ?? ... No but your letters are floating down some stream somewhere or at the bottom of some ditch. ...
    (uk.media.tv.misc)
  • Re: Can A Parking Fine Ever Lead to a CCJ?
    ... found a stream of letters regarding about unpaid parking fines going ... find warrants amongst all the love letters. ...
    (uk.legal)
  • Re: Can A Parking Fine Ever Lead to a CCJ?
    ... found a stream of letters regarding about unpaid parking fines going ... letters go to the point of Warrants for seizure of goods etc. ... is there a risk of getting a CCJ for these? ...
    (uk.legal)
  • Re: How do I handle ^M and ^H characters output from external programs?
    ... This program that you're trying to run, "cdrecord", apparently ... *STANDARD-OUTPUT* is bound to some Lisp stream maintained by SLIME, ... program which emulates a terminal by understandin the control codes ...
    (comp.lang.lisp)

Loading