Re: where do so many tenses come from?



In article <b9ta32dlgi0kl0o1ou21kktcqb49ipmj19@xxxxxxx>, Ruud Harmsen wrote:

I'm currently writing a program to make such calculations
automatically, so that eventually I'll be able to tell if a Portuguese
and a Hungarian spoken news item contain more or less bits of
information.

Not only the frequency of individual phonemes plays a rôle; the frequency
of clusters matters much more. I once computed Huffman codes not only for
single letters (not the same as phonemes, but the effect will be the same
for both), but also for pairs, triples, and so on. In the text there were
102 distinct letters, but given one letter there were on average only
27 pairs beginning with that letter, and for any given quadruple of
letters there were on average fewer than 2 quintuples beginning with that
quadruple. So the most recent history largely determines what follows, and
this must be taken into account when calculating information content.
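
A minimal sketch of this kind of count, in Python. It is not the program
used for the lecture: the function name successor_stats and its parameters
are made up for illustration, and it measures branching and conditional
entropy directly instead of building Huffman codes. For a context of k
letters it reports how many distinct letters follow that context on
average, and how many bits per letter remain once the k preceding letters
are known.

```python
from collections import defaultdict
from math import log2


def successor_stats(text, k):
    """Return (average distinct successors, conditional entropy in bits)
    for contexts of k letters. Hypothetical helper, for illustration only.
    """
    if k < 0 or len(text) <= k:
        raise ValueError("text must be longer than the context length k")

    # counts[context][next_letter] = how often next_letter follows context
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - k):
        context = text[i:i + k]
        counts[context][text[i + k]] += 1

    total = sum(sum(followers.values()) for followers in counts.values())

    # Average number of distinct letters seen after each observed context.
    avg_branching = sum(len(f) for f in counts.values()) / len(counts)

    # H(next | context) = - sum p(context, next) * log2 p(next | context)
    entropy = 0.0
    for followers in counts.values():
        n = sum(followers.values())
        for c in followers.values():
            entropy -= (c / total) * log2(c / n)

    return avg_branching, entropy


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 3
    for k in range(5):
        branching, bits = successor_stats(sample, k)
        print(k, round(branching, 2), round(bits, 3))
```

With k = 0 the first number is simply the count of distinct letters in the
text; with growing k both numbers drop, which is the effect described
above. On a short sample the drop is exaggerated, because most long
contexts occur only once.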

The numbers are in
http://www.lrz-muenchen.de/services/schulung/unterlagen/compress/, but be
warned that this is in no way a reliable source: it is a collection of
arbitrary examples I set up for a single lecture, and published only to
save giving out handouts.

Helmut Richter