Re: Question about the Shannon "entropy" of genomes



Doug Wedel <dougwedel@xxxxxxxxxxxxx> wrote:
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text simply from the statistical
analysis of token use alone, since all languages have unique "signatures" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genera) may also have characteristic redundancy
levels in their genomes, and I was wondering if anyone knows of statistical
studies of this kind.

Look up 'codon bias' for one level of redundancy: synonymous codons are not
used equally often, and the preferences differ between organisms.
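
To make that concrete, here is a rough Python sketch of how one might put a number on codon usage with Shannon's formula. The function names and the toy sequence are my own inventions for illustration, not taken from any particular study, and real codon-bias measures are normally computed per amino acid over many genes.

```python
from collections import Counter
from math import log2

def codon_counts(cds):
    """Count non-overlapping codons in a coding sequence, reading frame 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution described by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Made-up example: six alanine codons, with GCT used half the time.
toy = "GCTGCTGCTGCCGCAGCG"
h = shannon_entropy(codon_counts(toy))
print(f"codon entropy: {h:.3f} bits")
```

If the four alanine codons were used equally often the figure would be 2 bits, so anything lower indicates a preference for some codons over others.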

Also look up 'sequence logos', primarily Tom Schneider's work, which have been used for years
to represent DNA and protein sequences in terms of Shannon entropy.

http://www-lmmb.ncifcrf.gov/~toms/
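
For a feel of what a logo plots, the sketch below computes the information content of each column in a set of aligned DNA sites as 2 bits minus the column's Shannon entropy. This is a simplification: it assumes equiprobable background bases and leaves out the small-sample correction used in the published method. The function name and the example sites are made up.

```python
from collections import Counter
from math import log2

def column_information(aligned_sites):
    """Return the information content, in bits, of each alignment column.

    Uses 2 - H per column for a four-letter DNA alphabet, with no
    small-sample correction.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    info = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * log2(n / total) for n in counts.values())
        info.append(2.0 - entropy)
    return info

# Made-up aligned sites, loosely resembling a promoter element.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
info = column_information(sites)
print([round(bits, 2) for bits in info])
```

Fully conserved columns come out at 2 bits, and columns where the base varies come out lower. In a logo, the height of each letter stack is this per-column value.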




--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "Of Miracles" (1748)
