Inferring phylogenies + duplication and divergence



A paper on the evolution of the bacterial flagellum (here and here)
http://www.pnas.org/cgi/content/abstract/0700266104v1
http://sciencenow.sciencemag.org/cgi/content/full/2007/417/3

has triggered some critical comments from bloggers (here and here).
http://www.pandasthumb.org/archives/2007/04/flagellum_evolu_1.html#more
http://genomicron.blogspot.com/2007/04/genome-sequences-reduce-complexity-of.html

And here is a blogger who has bad feelings about all this instant blogging.
http://scienceblogs.com/loom/2007/04/17/when_scientists_go_all_bloggy.php

Interesting stuff. I suppose discussion of the relationship of all this
to Behe/Miller/Matzke probably belongs over on talk.origins. But I am
curious about some methodological questions related to inference about
gene duplication and divergence and its relationship to phylogenetic inference.

To oversimplify a bit, the basic claim of the flagellum paper is that flagella
in some 38 bacterial species all contain "the same" 24 core proteins, that
those 24 proteins all arose from a single ancestral protein which duplicated
and diverged in a pre-LUCA organism, and that the 24 genes then underwent
further evolution as the descendants of the LUCA branched over time into
the 38 species.

This kind of thing has been done before, of course, with tRNAs by Eigen's group
many decades ago, and many times since. But my first question is whether
there is a good review paper saying how one ought to go about it, and what
the pitfalls are.

One way of thinking about the problem is to build a matrix with (say) 38 rows
(species) and 24 columns (genes). E. coli, for example, gets row #3. And one
of the 24 genes, FlgA say, gets column #5. We have 24 x 38 = 912 gene
sequences in our database.

Now, one way to proceed is to concatenate all of the sequences in each row,
and then build a tree of rows for the phylogeny. Then, concatenate all
of the sequences in each column and build a separate tree of columns for the
gene duplication/divergence hypotheses. Is this valid? I realize that
you have to somehow make sure that the alignments in the rows and columns
match up, but is there anything else that needs to be done?
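
To make the bookkeeping concrete, here is a minimal sketch of the row and
column concatenation described above. It is only an illustration under
stated assumptions: the species names, gene names, and sequences are made up,
and it assumes that every cell of the matrix has already been placed in one
common alignment (so every cell has the same number of sites). It does no
tree building; it only produces the two sets of concatenated sequences that
a tree program would be given.

```python
# Toy stand-in for the 38 x 24 matrix: 3 species x 2 genes.
# All names and sequences are hypothetical, for illustration only.
species = ["sp_A", "sp_B", "sp_C"]
genes = ["gene_1", "gene_2"]

# matrix[s][g] is the aligned sequence of gene g in species s.
matrix = {
    "sp_A": {"gene_1": "MKTALL", "gene_2": "MRTA-L"},
    "sp_B": {"gene_1": "MKSALL", "gene_2": "MRSA-L"},
    "sp_C": {"gene_1": "MKTALV", "gene_2": "MRTA-V"},
}


def common_site_count(matrix, species, genes):
    """Return the shared cell length, or fail if the cells do not line up."""
    lengths = {len(matrix[s][g]) for s in species for g in genes}
    if len(lengths) != 1:
        raise ValueError(f"cells are not in one common alignment: {lengths}")
    return lengths.pop()


def concat_rows(matrix, species, genes):
    """One long sequence per species: input for the tree of rows."""
    return {s: "".join(matrix[s][g] for g in genes) for s in species}


def concat_columns(matrix, species, genes):
    """One long sequence per gene: input for the tree of columns."""
    return {g: "".join(matrix[s][g] for s in species) for g in genes}


site_count = common_site_count(matrix, species, genes)
row_seqs = concat_rows(matrix, species, genes)
col_seqs = concat_columns(matrix, species, genes)
```

The length check is the "alignments match up" condition in its crudest form:
site i of every cell has to mean the same thing in both trees. Note also that
the genes must be concatenated in the same order for every species, and the
species in the same order for every gene, or the long sequences are not
comparable.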

Of course, the problem becomes more complex if you don't know (or don't assume)
that all of the gene duplications and divergences took place before any of
the organism lineages branched. Or if some of the diverged genes have been
lost in some lineages. Felsenstein's book touches briefly on some of the issues
here, but I am wondering whether there is a more complete treatment anywhere.
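
For the gene loss case, one simple piece of bookkeeping is to fill an absent
cell with missing-data characters so that the concatenated rows stay the same
length. The sketch below assumes that convention; the names, sequences, the
"?" symbol, and the fixed site count are all assumptions for illustration. It
does nothing about the harder case where some duplications happened after
some of the lineages had already split.

```python
# Hypothetical toy matrix; None marks a gene that is absent in a lineage.
species = ["sp_A", "sp_B", "sp_C"]
genes = ["gene_1", "gene_2"]
matrix = {
    "sp_A": {"gene_1": "MKTALL", "gene_2": "MRTA-L"},
    "sp_B": {"gene_1": "MKSALL", "gene_2": None},
    "sp_C": {"gene_1": "MKTALV", "gene_2": "MRTA-V"},
}
SITE_COUNT = 6  # assumed common alignment length of every cell


def pad_losses(matrix, species, genes, site_count, missing="?"):
    """Replace each absent cell with a run of missing-data characters."""
    filler = missing * site_count
    return {
        s: {g: matrix[s][g] if matrix[s][g] is not None else filler
            for g in genes}
        for s in species
    }


def occupancy(matrix, species, genes):
    """Fraction of cells in the matrix that actually hold a sequence."""
    present = sum(matrix[s][g] is not None for s in species for g in genes)
    return present / (len(species) * len(genes))


padded = pad_losses(matrix, species, genes, SITE_COUNT)
row_seqs = {s: "".join(padded[s][g] for g in genes) for s in species}
```

The occupancy figure is just a quick way to see how much of the matrix is
real data before trusting a tree built from it.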

Also, I am curious whether phylogeny experts here agree with the bloggers and
with my own intuitions that this flagellum paper seriously overstates its case.


