Re: Word count of minimum vocabulary



Mok-Kong Shen wrote:

I believe that, at least for the special case of Chinese,
if 10 bits suffice, then the simplicity for processing
purposes of a fixed-width coding with that number of bits
outweighs the gain from employing schemes like Huffman,
which would probably be of only marginal benefit for a limited
vocabulary.

Huffman saves about 40% when you do a character-by-character compression
of English texts written in ASCII. Unfortunately, with a variable-length
code you do need to consider error recovery schemes (possibly just
resynching), and there you pays your money and takes your choice.
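To put a number on the character-by-character case for a given text, here is a minimal Python sketch. It builds a Huffman code from the text's own character frequencies and reports the fraction of bits saved against a fixed 8 bits per character. The function names are invented for this illustration, and the cost of storing the code table is ignored, so the figure is an upper bound on the real saving.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique
    # tiebreak keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_saving(text, fixed_bits=8):
    """Fraction of bits saved versus fixed_bits per character (table cost ignored)."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    coded_bits = sum(freq[s] * lengths[s] for s in freq)
    return 1 - coded_bits / (fixed_bits * len(text))
```

Calling huffman_saving() on a reasonably long English text gives the saving for that text; the result depends on the sample, so no particular figure is claimed here.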

But of course one could additionally apply
normal compression techniques to texts that use only
a limited vocabulary. Note also that only a font of
1024 glyphs needs to be stored/addressed in the present case.
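As an illustration of the fixed 10-bit coding under discussion, here is a minimal Python sketch that packs vocabulary indices in the range 0..1023 into a byte string and unpacks them again. The names pack10 and unpack10 and the big-endian bit order are assumptions made for this example; neither poster specifies a layout.

```python
def pack10(indices):
    """Pack integers in range(1024) into bytes, 10 bits each, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for idx in indices:
        if not 0 <= idx < 1024:
            raise ValueError("index does not fit in 10 bits: %r" % (idx,))
        acc = (acc << 10) | idx
        nbits += 10
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet written
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack10(data, count):
    """Recover count 10-bit integers from bytes produced by pack10."""
    acc = 0
    nbits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 10 and len(out) < count:
            nbits -= 10
            out.append((acc >> nbits) & 0x3FF)
            acc &= (1 << nbits) - 1
    return out
```

Because every code is exactly 10 bits, the n-th symbol always starts at bit 10*n, so a decoder can seek to any position directly, which is the processing simplicity being argued for.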

Does any *single*, modern plain text writing system need that many
besides Chinese and its imitators?

I think that there may be certain advantages in fairly commonly
occurring communication situations. I conjecture that
in such cases, i.e. given the favourable conditions, one might
in general save both space (for storage and transmission) and
processing time (consider e.g. the task of spelling checking).

SMS is the only case I can think of where that's really an issue, and
it has an apparently unused compression scheme. By 'spelling
checking', do you mean grammar checking?

Richard.
