Re: Word count of minimum vocabulary




A recent post by Wordingham states that Word-Huffman
achieves 6 bits per word on average.
From this it is clear to me that my proposed scheme
of encoding words has barely any chance of competing
with that, and thus I'll stop discussing
its use in transforming an arbitrary "given" text
into a bit string of minimum length, i.e. compression.
I apologize for having wasted the general reader's
time in reading the many posts about that topic
and my discussion partners' time in responding to me.
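
For a rough sense of the comparison, here is a minimal sketch
that contrasts a word-level Huffman code with a plain
fixed-length index into a vocabulary. The toy text and the
function names are assumptions made for illustration only;
the sketch does not reproduce the 6-bit figure, which depends
on the word statistics of a real corpus.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_bits_per_word(words):
    """Average Huffman code length per word occurrence in `words`."""
    freqs = Counter(words)
    lengths = huffman_code_lengths(freqs)
    total = sum(freqs.values())
    return sum(freqs[w] * lengths[w] for w in freqs) / total


def fixed_bits_per_word(vocabulary_size):
    """Bits per word when every word gets an equal-length index."""
    return max(1, (vocabulary_size - 1).bit_length())


# Hypothetical toy text: 12 word occurrences, 6 distinct words.
toy_text = "the cat saw the dog and the dog saw the cat run".split()

print(average_bits_per_word(toy_text))         # 2.5
print(fixed_bits_per_word(len(set(toy_text)))) # 3
print(fixed_bits_per_word(1024))               # 10
```

The last line shows that a fixed-length index into a
vocabulary of 2^10 words costs 10 bits per word, whereas a
Huffman code spends fewer bits on frequent words and so can
fall below the fixed-length cost when the word frequencies
are skewed.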

On the other hand, I would like to remark that in my original
post in this thread my intention was rather to inquire
whether it is possible to "compose" a text to express
any given thought (this may possibly have to be
suitably restricted to certain domains of discourse,
e.g. private and business correspondence) in terms of
a common (preferably standardized) vocabulary of minimum
size or, equivalently, to "paraphrase" an arbitrary
"given" text (with possible restrictions as mentioned
above) in such a way that all the words are to be found
in such a common minimum-sized vocabulary. On that topic
comparatively little has been discussed to date in this
thread, in my view. However, for Chinese an estimate
of the vocabulary size of 1000 words has been given
in Lee's post. Since Chinese ideographs don't have
"derivatives", a vocabulary of size 2^10=1024 would
thus serve the purpose. I conjecture that, for some
special domains of discourse, it may be conceivable
and practicable to employ even somewhat smaller
vocabularies. Chinese, however, is a rather
special language (in particular, it doesn't have an
alphabet and is deemed by quite a few people to be
rather hard to learn), even though it is one of
the major natural languages of the world. English,
on the other hand, is without doubt to be considered
the most important natural language for a large
number of application fields today and in the future.
What is the size of a vocabulary (with and without
counting the "derivatives") for English that
corresponds in functionality to the one for Chinese
of size 2^10? And how should one proceed to determine
its content? I would appreciate it very much if
there could be some further discussion of such
questions in this thread.

Thanks.

M. K. Shen
