Re: Word count of minimum vocabulary
- From: Mok-Kong Shen <mok-kong.shen@xxxxxxxxxxx>
- Date: Sun, 02 Jul 2006 18:18:27 +0200
Richard Wordingham wrote:
> Mok-Kong Shen wrote:
>> Compression schemes on ASCII texts could indeed achieve such
>> efficiencies. However, how many ASCII characters does an
>> average word in a common text have?
>
> I was drawing an analogy. The ASCII character set uses 7 bits per
> character. A Huffman encoding uses an average of 5 bits per character.
> Your 1024 word set needs 10 bits per word - a Huffman encoding would
> need less on average. Incidentally, have you worked out how to handle
> punctuation?
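A Huffman code gives frequent symbols short codes and rare symbols long ones, so its average length can fall below that of a fixed-width code. The sketch below applies it to whole words rather than characters. It assumes, purely for illustration, that the frequencies of a 1024-word vocabulary follow Zipf's law (frequency proportional to 1/rank); a real scheme would take the frequencies from a corpus.

```python
import heapq
import math

def huffman_code_lengths(weights):
    """Return the Huffman code length, in bits, for each symbol index."""
    # Heap entries: (subtree weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    next_id = len(weights)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees puts every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, syms1 + syms2))
        next_id += 1
    return lengths

VOCABULARY_SIZE = 1024
# ASSUMPTION: word frequencies follow Zipf's law (frequency ~ 1/rank).
harmonic = sum(1 / r for r in range(1, VOCABULARY_SIZE + 1))
probs = [1 / (r * harmonic) for r in range(1, VOCABULARY_SIZE + 1)]

lengths = huffman_code_lengths(probs)
average_bits = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)
fixed_bits = math.log2(VOCABULARY_SIZE)  # a fixed-width code for 1024 words

print(f"fixed-width: {fixed_bits:.0f} bits/word")
print(f"Huffman:     {average_bits:.2f} bits/word (entropy {entropy:.2f})")
```

A Huffman code's average always lies between the entropy and the entropy plus one bit. If all 1024 words were equally frequent, every code would be 10 bits long and Huffman would gain nothing; the saving comes entirely from the skewed word frequencies.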
Maybe I have badly misunderstood you. But according to a post
by Lee, an "average" word has 5 characters, so it would
follow that Huffman coding needs 5*5 = 25 bits per word on
average, which is far more than 10 bits, right?
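The per-word figures being compared can be laid out explicitly. The constants are the rough averages quoted in this thread, not measurements, and the space separating words is not counted in any of them.

```python
import math

# Figures quoted in this thread (rough averages, not measurements):
AVG_CHARS_PER_WORD = 5       # from Lee's post
ASCII_BITS_PER_CHAR = 7      # plain 7-bit ASCII
HUFFMAN_BITS_PER_CHAR = 5    # Richard's figure for character-level Huffman
VOCABULARY_SIZE = 1024       # the proposed minimum vocabulary

ascii_bits_per_word = AVG_CHARS_PER_WORD * ASCII_BITS_PER_CHAR      # 35
huffman_bits_per_word = AVG_CHARS_PER_WORD * HUFFMAN_BITS_PER_CHAR  # 25
fixed_word_bits = math.ceil(math.log2(VOCABULARY_SIZE))             # 10

# Note: the space separating words is not counted in any of these figures.
for name, bits in [("7-bit ASCII", ascii_bits_per_word),
                   ("character Huffman", huffman_bits_per_word),
                   ("fixed-width word code", fixed_word_bits)]:
    print(f"{name:22s} {bits:3d} bits/word")
```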
I suppose punctuation signs must be treated as independent
units in a system that treats words as units. (Are there
better ways?) On the other hand, most spaces in a text
need not be represented in such a system: one can
adopt the convention that a word is always followed
by a space, unless the next unit is a punctuation sign.
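The spacing convention just described can be sketched as a tokenizer and its inverse. The punctuation set, the word pattern, and the choice to apply the same "space unless punctuation follows" rule after punctuation signs as well are all assumptions made for illustration.

```python
import re

# ASSUMPTION: hypothetical set of punctuation signs treated as separate units.
PUNCT = set(",.;:!?")

def tokenize(text):
    """Split text into word and punctuation units, discarding spaces."""
    # Words are runs of letters, digits and apostrophes; each sign is its own unit.
    return re.findall(r"[A-Za-z0-9']+|[,.;:!?]", text)

def detokenize(tokens):
    """Rebuild text: every unit is followed by a space unless punctuation follows."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # ASSUMPTION: the same rule is applied after punctuation signs too.
        if nxt is not None and nxt not in PUNCT:
            out.append(" ")
    return "".join(out)

units = tokenize("Hello, world.")
print(units)              # ['Hello', ',', 'world', '.']
print(detokenize(units))  # Hello, world.
```

The simple rule breaks down for opening brackets and quotation marks, which attach to the following word rather than the preceding one, so a fuller scheme would need at least two classes of punctuation.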
> There's an interesting discussion at
> http://www.stanford.edu/class/cs276a/projects/reports/dalmassi-sammysy.html
Many thanks for the valuable link. I'll look at it sometime later.
M. K. Shen