Re: Word count of minimum vocabulary
- From: Mok-Kong Shen <mok-kong.shen@xxxxxxxxxxx>
- Date: Mon, 03 Jul 2006 01:12:47 +0200
A recent post by Wordingham states that Word-Huffman
achieves 6 bits per word on average.
From this it is clear to me that my proposed scheme
for encoding words has barely any chance of competing
with that, and so I will stop discussing
its use in transforming an arbitrary "given" text
into a bit string of minimum length, i.e. compression.
I apologize for having wasted the general reader's
time in reading the many posts about that topic
and my discussion partners' time in responding to me.
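
To illustrate why a frequency-aware code such as Word-Huffman can beat a code that gives every word the same number of bits, here is a minimal sketch. It assumes a Zipf-like word distribution (the probability of the word at rank r is proportional to 1/r) over 1024 words; that distribution, and the function names, are assumptions made for illustration, not data or code from this thread, and the sketch is not meant to reproduce the 6-bit figure.

```python
import heapq


def huffman_code_lengths(weights):
    """Return the Huffman code length in bits for each symbol weight."""
    if len(weights) == 1:
        return [1]
    # Heap entries: (subtree weight, unique tie-breaker, symbol indices in subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tie_breaker = len(weights)
    while len(heap) > 1:
        w1, _, symbols1 = heapq.heappop(heap)
        w2, _, symbols2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below the new node.
        for i in symbols1 + symbols2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths


def compare(vocab_size):
    """Return (fixed-width bits per word, average Huffman bits per word)."""
    # Assumed Zipf-like distribution: weight of rank r is 1/r.
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    lengths = huffman_code_lengths(probs)
    huffman_avg = sum(p * n for p, n in zip(probs, lengths))
    # Bits needed to give each of vocab_size words a distinct fixed-width code.
    fixed = (vocab_size - 1).bit_length()
    return fixed, huffman_avg


fixed_bits, huffman_bits = compare(1024)
print(f"fixed-width code: {fixed_bits} bits per word")
print(f"Huffman code:     {huffman_bits:.2f} bits per word on average")
```

The fixed-width figure depends only on the vocabulary size, whereas the Huffman average depends on how skewed the word frequencies are, so real text may give a different average than this toy distribution.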
On the other hand, I would like to remark that in my original
post in this thread my intention was rather to inquire
whether it is possible to "compose" a text expressing
any given thought (this may have to be
suitably restricted to certain domains of discourse,
e.g. private and business correspondence) in terms of
a common (preferably standardized) vocabulary of minimum
size or, equivalently, to "paraphrase" an arbitrary
"given" text (with possible restrictions as mentioned
above) in such a way that all its words are to be found
in such a common minimum-sized vocabulary. In my view,
comparatively little has been said on that topic to date
in this thread. However, for Chinese an estimate
of the vocabulary size of 1000 words has been given
in Lee's post. Since Chinese ideographs don't have
"derivatives", a vocabulary of size 2^10=1024 would
thus serve the purpose. I conjecture that, for some
special domains of discourse, it may be conceivable
and practicable to employ even somewhat smaller
vocabularies. Chinese is, however, a rather
special language (in particular, it doesn't have an
alphabet and is deemed by quite a few people to be
rather hard to learn), even though it is one of
the major natural languages of the world. English,
by contrast, must without doubt be considered
the most important natural language for a large
number of application fields today and in the future.
What is the size of a vocabulary (with and without
counting the "derivatives") for English that
corresponds in functionality to the one for Chinese
of size 2^10? And how should one proceed to determine
its content? I would appreciate it very much if
there could be some further discussion of such
questions in this thread.
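
The requirement that every word of a paraphrased text be found in the common vocabulary can be checked mechanically. The following is a rough sketch only: the function name, the tokenization rule, and the tiny vocabulary are hypothetical, and "derivatives" are not handled, so an inflected form counts as a separate word unless it is listed itself.

```python
import re


def out_of_vocabulary(text, vocabulary):
    """Return the distinct words of text that are not in vocabulary, in order of first use."""
    allowed = {word.lower() for word in vocabulary}
    # Simplistic tokenization (an assumption): runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    missing = []
    seen = set()
    for word in words:
        if word not in allowed and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing


# Hypothetical toy vocabulary, for illustration only.
toy_vocabulary = ["the", "cat", "sat", "on"]
print(out_of_vocabulary("The cat sat on the mat.", toy_vocabulary))
```

A text counts as expressed in the vocabulary when the returned list is empty; any words that remain are the ones that would still have to be paraphrased.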
Thanks.
M. K. Shen