Re: Question about the Shannon "entropy" of genomes




"Doug Wedel" <dougwedel@xxxxxxxxxxxxx> wrote in message
news:g5eqau$1oak$1@xxxxxxxxxxxxxxxxxxxxxx
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text from the statistical
analysis of token use alone, since each language has a characteristic
"signature" of redundancy in symbol token use. It strikes me as possible
that different organisms (or species or genera) may also have
characteristic redundancy levels in their genomes, and I was wondering if
anyone knows of statistical studies of this kind.
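The measurement described in the question can be sketched roughly as follows. This is a minimal illustration, assuming the genome is already available as a plain string of A/C/G/T (no FASTA parsing), counting overlapping k-mers as the "symbol tokens", and using Shannon's redundancy R = 1 - H / H_max, where H_max = 2k bits for blocks of k symbols drawn from a 4-letter alphabet. The function names `kmer_entropy` and `redundancy` are hypothetical, chosen for this example.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the overlapping k-mers in seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumption: skip any k-mer containing N or other ambiguity codes.
    kmers = [w for w in kmers if set(w) <= set("ACGT")]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq: str, k: int = 1) -> float:
    """Shannon redundancy 1 - H/H_max, with H_max = 2k bits for DNA k-mers."""
    return 1.0 - kmer_entropy(seq, k) / (2 * k)


if __name__ == "__main__":
    # A sequence using all four bases equally has maximal single-base entropy.
    print(redundancy("ACGT" * 100, k=1))
    # Larger k looks at the short-range context that a per-base count misses.
    print(redundancy("ACGT" * 100, k=2))
```

Comparing these numbers across organisms (for several values of k) is loosely analogous to the letter and n-gram frequency profiles used for language identification; for short or repetitive sequences the estimates are biased, so they are only meaningful on long stretches of sequence.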


Three search terms you may find useful:

Codon usage bias
GC-content
pufferfish junk DNA
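The first two search terms correspond to simple statistics that can be computed directly. The sketch below assumes a plain-string sequence and, for codon usage, an in-frame coding sequence (CDS) with no introns; `gc_content` and `codon_usage` are hypothetical helper names for illustration, not a library API.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A/C/G/T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Assumption: cds starts at the first base of a codon; a trailing
    partial codon and codons containing N are ignored.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    total = len(codons)
    if total == 0:
        return {}
    counts = Counter(codons)
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(codon_usage("ATGGCCATGTAA"))
```

Raw codon frequencies are only a starting point; published codon usage bias measures such as relative synonymous codon usage (RSCU) or the codon adaptation index additionally normalise by the number of synonymous codons for each amino acid.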


Graham

