Re: Word count of minimum vocabulary
- From: "Richard Wordingham" <jrw0602@xxxxxxxxxxx>
- Date: 1 Jul 2006 12:22:42 -0700
Mok-Kong Shen wrote:
> I believe that at least for the special case of Chinese,
> if 10 bits suffice, then the simplicity for processing
> purposes of a fixed coding with the same number of bits
> outweighs the gain of employing schemes like Huffman,
> which would probably only be of marginal profit for a limited
> vocabulary.
Huffman saves 40% when you do a character-by-character compression of
English texts written in ASCII. Unfortunately, you do need to consider
error recovery schemes (possibly just resynching), and there you pays
your money and takes your choice.
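To make the character-by-character Huffman point concrete, here is a minimal Python sketch (the function names are hypothetical) that derives Huffman code lengths from character frequencies and reports the fraction of bits saved against a fixed-width code of `fixed_bits` per character (8 for ASCII stored in bytes, 7 for packed ASCII). It ignores the cost of transmitting the code table and any resynchronisation overhead, and the saving it reports depends entirely on the sample text, so it illustrates the technique rather than reproducing the 40% figure.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length in bits for each distinct character."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A one-symbol alphabet still needs one bit per symbol.
        return {ch: 1 for ch in freq}
    # Heap entries: (weight, tie_breaker, {char: depth_so_far}).
    # The unique tie_breaker stops heapq from ever comparing the dicts.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_saving(text: str, fixed_bits: int = 8) -> float:
    """Fraction of bits saved versus fixed_bits per character (code table not counted)."""
    if not text:
        raise ValueError("empty text")
    freq = Counter(text)
    lengths = huffman_code_lengths(text)
    huffman_bits = sum(freq[ch] * lengths[ch] for ch in freq)
    return 1 - huffman_bits / (fixed_bits * len(text))
```

For example, `huffman_code_lengths("aaaabbc")` gives 'a' a 1-bit code and 'b' and 'c' 2-bit codes, so the seven characters take 10 bits instead of 56. The variable-length output is also why error recovery matters: one corrupted bit can shift every later code boundary until the decoder resynchronises.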
> But of course one could additionally apply
> normal compression techniques to texts that only use
> a limited vocabulary. Note also that only a font of size
> 1024 need be stored/addressed in the present case.
Does any *single*, modern plain text writing system need that many
besides Chinese and its imitators?
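For comparison, the fixed-width scheme Shen describes can be sketched as follows, assuming each word of the limited vocabulary has already been mapped to an integer index 0..N-1 (that mapping, and the names `bits_needed`, `pack` and `unpack`, are hypothetical). A vocabulary of 1024 entries needs exactly 10 bits per word; the indices are concatenated into a bit string and padded to whole bytes.

```python
def bits_needed(vocab_size: int) -> int:
    """Smallest fixed width in bits that can index vocab_size distinct words."""
    return max(1, (vocab_size - 1).bit_length())


def pack(indices: list[int], width: int) -> bytes:
    """Concatenate fixed-width indices, most significant bit first, padded to whole bytes."""
    acc, nbits, out = 0, 0, bytearray()
    for i in indices:
        if not 0 <= i < (1 << width):
            raise ValueError(f"index {i} does not fit in {width} bits")
        acc = (acc << width) | i
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # pad the last byte with zero bits
    return bytes(out)


def unpack(data: bytes, width: int, count: int) -> list[int]:
    """Inverse of pack: read back count fixed-width indices."""
    acc, nbits, result = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width and len(result) < count:
            nbits -= width
            result.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1
    return result
```

Here `pack([0, 1023, 5], 10)` uses 30 bits and so returns 4 bytes. Because word k always starts at bit offset 10k, a reader can seek straight to any word, which is the processing simplicity in question; a Huffman-coded stream offers no such random access without a separate index.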
> I think that there may be certain advantages in rather commonly
> occurring communication situations. I conjecture that
> in such cases, i.e. given the favourable conditions, one might
> in general save both space (of storage and transmission) and
> processing time (consider e.g. the task of spelling checking).
SMS is the only case I can think of where that's really an issue, and
it has an apparently unused compression scheme. By 'spelling
checking', do you mean grammar checking?
Richard.