Re: where do so many tenses come from?



6 Apr 2006 12:31:48 -0400: hrubin@xxxxxxxxxxxxxxxxxxxx (Herman Rubin):
in sci.lang:

Now, consider language A with a phoneme set of size 65, where the
first phoneme occurs with a probability of 7/8, and each of the other
64 occurs with a probability of 1/512. Then, on average, each phoneme
contains:

{-log_2(7/8) - 64 * log_2(1/512)} / 65
= {3 - log_2 7 + 9*64} / 65
= 8.8645...

bits of information.

This is wrong. If the probabilities are p_i, the
information in the language is the sum of -p_i * log_2(p_i).
The unweighted average of the information per character is
meaningless; it is the probability-weighted total which matters.

So the amount of information is -7/8*log_2(7/8) + 9*64/512,
which is approximately 1.2936 bits per phoneme.
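To make the difference concrete, here is a minimal Python sketch (the function names are illustrative, not taken from any post in this thread) that evaluates both quantities for language A: the plain average of -log_2 p over the 65 phonemes, as in the first calculation, and the probability-weighted sum -sum p_i * log_2(p_i).

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2 p), i.e. each symbol's
    self-information weighted by its probability of occurrence."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def unweighted_mean_information(probs):
    """Plain average of -log2 p over the symbols, ignoring how often each occurs."""
    return sum(-math.log2(p) for p in probs) / len(probs)

# Language A: one phoneme with probability 7/8, 64 phonemes with 1/512 each.
lang_a = [7 / 8] + [1 / 512] * 64
assert math.isclose(sum(lang_a), 1.0)  # the probabilities form a distribution

print(f"unweighted mean: {unweighted_mean_information(lang_a):.4f} bits")  # 8.8645
print(f"entropy:         {entropy(lang_a):.4f} bits")                      # 1.2936
```

The plain average comes out near 8.8645 bits because 64 of the 65 phonemes carry 9 bits each, even though together they occur only 1/8 of the time; weighting by probability brings the figure down to about 1.2936 bits per phoneme.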

I'm currently writing a program to make such calculations
automatically, so that eventually I'll be able to tell whether a
spoken news item in Portuguese contains more or fewer bits of
information than one in Hungarian.
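A sketch of how such a program might estimate these figures from a phonemic transcription, assuming the text is already available as a list of phoneme symbols. The function names and the toy data are hypothetical, and this is not the program described here; it uses only single-phoneme frequencies estimated from the text itself.

```python
import math
from collections import Counter

def phoneme_entropy(phonemes):
    """Entropy in bits per phoneme, estimated from the relative
    frequencies of the phonemes in one transcription."""
    counts = Counter(phonemes)
    n = len(phonemes)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def total_bits(phonemes):
    """Entropy per phoneme times the number of phonemes in the text."""
    return phoneme_entropy(phonemes) * len(phonemes)

# Toy transcription (made-up data, not real Portuguese or Hungarian).
sample = ["a", "b", "a", "c", "a", "b", "a", "d"]
print(f"{phoneme_entropy(sample):.3f} bits/phoneme, {total_bits(sample):.1f} bits total")
# prints "1.750 bits/phoneme, 14.0 bits total"
```

Running it on a Portuguese and a Hungarian transcription of comparable news items would give the kind of comparison described above, though frequency estimates from short texts will be rough.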

Next, consider language B with a phoneme set of size 128, where each
phoneme occurs with equal probability (i.e. 1/128). How much
information does each phoneme convey? That's

{-log_2(1/128) * 128} / 128
= 7

Right.

So? Language A has a smaller stock of phonemes, but it has higher
information content per phoneme.

As you see, no. The information carried by each phoneme has to be
weighted by its probability of occurrence.
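Comparing the two example languages with the probability-weighted measure, as a minimal Python sketch. For language B, where all phonemes are equally likely, the weighted and unweighted averages coincide at log_2 128 = 7 bits, which is why that calculation was right.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: each phoneme's information -log2 p weighted by p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

lang_a = [7 / 8] + [1 / 512] * 64   # 65 phonemes, one very frequent
lang_b = [1 / 128] * 128            # 128 equiprobable phonemes

h_a = entropy(lang_a)
h_b = entropy(lang_b)
print(f"A: {h_a:.4f} bits/phoneme, B: {h_b:.4f} bits/phoneme")
# prints "A: 1.2936 bits/phoneme, B: 7.0000 bits/phoneme"

# For a uniform distribution the entropy equals log2 of the inventory size.
assert math.isclose(h_b, math.log2(128))
```

Language A, despite its rare and individually more informative phonemes, ends up at about 1.29 bits per phoneme, far below B's 7 bits.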

Right. And that's what my program does (only Huffman-like, though:
it does not take distribution rules into account).
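For comparison with a Huffman-like approach, here is the textbook binary Huffman construction applied to language A. It is a sketch, not the program mentioned here; like that program, it looks only at single-phoneme frequencies and ignores distribution rules.

```python
import heapq
import itertools
import math

def huffman_code_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    counter = itertools.count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two least probable subtrees.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Every symbol in the merged subtrees moves one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return lengths

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

lang_a = [7 / 8] + [1 / 512] * 64
lengths = huffman_code_lengths(lang_a)
avg_len = sum(p * l for p, l in zip(lang_a, lengths))
print(f"Huffman: {avg_len:.4f} bits/phoneme, entropy: {entropy(lang_a):.4f}")
# prints "Huffman: 1.7500 bits/phoneme, entropy: 1.2936"
```

The frequent phoneme gets a 1-bit code and the 64 rare ones get 7-bit codes, for an average of 1.75 bits per phoneme. That is above the entropy of about 1.2936 bits but, as Huffman codes guarantee, less than one bit above it.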

More once it is in a presentable state; it isn't yet.

--
Ruud Harmsen - http://rudhar.com