Re: binomial 'association' measure?
From: Dan Bolser (dmb_at_mrc-dunn.cam.ac.uk)
Date: 10/14/04
- Next message: François Charton: "Re: significance in contingency table"
- Previous message: Dave miller: "Forward Forecast"
- In reply to: Graham Jones: "Re: binomial 'association' measure?"
- Next in thread: Graham Jones: "Re: binomial 'association' measure?"
- Reply: Graham Jones: "Re: binomial 'association' measure?"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 14 Oct 2004 21:13:05 +0100 To: Graham Jones <grahamj@visiv.co.uk>
On Thu, 14 Oct 2004, Graham Jones wrote:
>In article <Pine.LNX.4.21.0410111922380.384-100000@mail.mrc-
>dunn.cam.ac.uk>, Dan Bolser <dmb@mrc-dunn.cam.ac.uk> writes
>
>>Here is my problem (after much rethinking the data / what I want to
>>ask)...
>>
>[...]
>
>>Over the superfamilies we can impose the 'taxonomic tree of life', i.e. a
>>distinct and given (immutable) hierarchical set of groupings for the
>>genomes. The 'root' of the 'tree of life' encompasses all genomes, and low
>>level groupings go all the way down to the individual species.
>>
>
>I'm not clear about this. You say the hierarchical set of groupings
>(which I take to mean a hierarchical clustering) is 'over' the
>superfamilies but 'for' the genomes.
>
>I assume you mean the groupings (clusters) are of genomes, because you
>seem to think it makes sense to look at one superfamily at a time. In
Yes, sorry, I made this mistake a couple of times and managed to correct
it in a few places. The above slipped through.
The hierarchical clustering of *genomes* is what I mean. I didn't use the
term hierarchical clustering because I wasn't sure if this was precisely
correct given the data, so I called the clusters 'groupings'.
The data is a tree. Each genome is a leaf, and there is only one
root. There is only one path from a leaf to the root (i.e. it is a strict
tree, not a general DAG).
>other words, your data for one superfamily (let's say #42) looks like
>genome A has a assignments
>genome B has b assignments
>....
>
>plus
>
>some clusters like {A,D,H}, {B,C}, etc, at various levels in the
>hierarchy. Am I with you so far?
Yes, exactly correct.
>I am also unclear as to whether you want to look at a complete cut
>through the hierarchy, or just one cluster at a time.
The tree isn't very uniform. I was trying to think about cutting the tree
based on the 'information content' of each node, as defined by the
'assignment space covered' by that node. By this I mean the negative log
of the proportion of all the assignments which fall under that node.
NB by 'assignment' I mean specifically the superfamily assignments to the
genomes.
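To make the 'information content' idea concrete, here is a minimal sketch
in Python. The tree, the genome names and the counts are made up for
illustration, and the use of log base 2 is an assumption (any base would
do).

```python
import math

# Hypothetical toy tree: child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "bacteria": "root",
    "eukaryota": "root",
    "mammal": "eukaryota",
    "E_coli": "bacteria",
    "human": "mammal",
    "mouse": "mammal",
    "yeast": "eukaryota",
}

# Hypothetical assignment counts for one superfamily, per genome (leaf).
COUNTS = {"E_coli": 2, "human": 8, "mouse": 6, "yeast": 0}


def node_totals(parent, counts):
    """Add each genome's count to every node on its single path to the root."""
    totals = {node: 0 for node in parent}
    for leaf, n in counts.items():
        node = leaf
        while node is not None:
            totals[node] += n
            node = parent[node]
    return totals


def information_content(parent, counts):
    """Return -log2(p) per node, where p is the share of assignments under it."""
    grand_total = sum(counts.values())
    if grand_total == 0:
        raise ValueError("no assignments at all for this superfamily")
    info = {}
    for node, total in node_totals(parent, counts).items():
        # A node covering no assignments has infinite information content.
        info[node] = -math.log2(total / grand_total) if total else math.inf
    return info


info = information_content(PARENT, COUNTS)
for node in PARENT:
    print(f"{node:10s} {info[node]:.3f}")
```

The root always covers every assignment, so its information content is
zero, and the value can only grow as you walk down towards the leaves.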
Cutting has the problem that I don't 'see' universally distributed
superfamilies, that is superfamilies which are best 'described' by the
root node.
>Finally, I am unclear as to whether you want to 'prove' something (which
>looks tricky to me) or just sift through your data in the hope of
>finding something interesting.
I hope to assign each superfamily to a particular part of the taxonomic
tree, like 'universal', 'mammal', 'bacteria', etc... This data will be
very useful to me in order to frame many questions about both superfamily
and taxonomic evolution.
>[...]
>>Which higher level grouping (or groupings) *best* explains the observed
>>distribution of a superfamily over the genomes?
>>
>
>A catch here is that the lower the level of grouping, the better the
>explanation will be. The grouping {A}, {B}, {C},... will explain the
>data perfectly.
Yes, so somehow I need to minimize the size of the model, and offset that
minimization against the error.
That is what I mean by *best*.
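One way to picture this trade-off is a penalised cost: pay a fixed
penalty for every tree node used in the description, plus one unit for
every genome the description gets wrong. The sketch below is only an
illustration under strong assumptions: it reduces the data to
presence/absence (ignoring the number of assignments), the toy tree and
the penalty values are made up, and the cost function is just one of many
possible choices.

```python
# Hypothetical toy tree: node -> list of children (leaves are genomes).
CHILDREN = {
    "root": ["bacteria", "eukaryota"],
    "bacteria": ["E_coli", "B_subtilis"],
    "eukaryota": ["mammal", "yeast"],
    "mammal": ["human", "mouse"],
}


def best_cover(node, present, penalty):
    """Return (cost, chosen_nodes, n_absent) for the subtree rooted at node.

    cost = penalty * (number of chosen nodes) + (number of genomes wrongly
    described). 'present' is the set of genomes that have the superfamily;
    n_absent counts the leaves under node that lack it.
    """
    kids = CHILDREN.get(node, [])
    if not kids:  # a leaf, i.e. a single genome
        absent = 0 if node in present else 1
        take = penalty + absent  # choose this leaf as part of the model
        skip = 1 - absent        # leave it out: an error only if present
        return (take, [node], absent) if take < skip else (skip, [], absent)

    # Option 1: describe the subtree through the children's best answers.
    split_cost, split_nodes, absent = 0, [], 0
    for kid in kids:
        cost, nodes, kid_absent = best_cover(kid, present, penalty)
        split_cost += cost
        split_nodes += nodes
        absent += kid_absent

    # Option 2: choose this node alone; every absent genome under it is an error.
    take = penalty + absent
    if take < split_cost:
        return take, [node], absent
    return split_cost, split_nodes, absent


mammals_only = {"human", "mouse"}
all_but_yeast = {"E_coli", "B_subtilis", "human", "mouse"}

print(best_cover("root", mammals_only, 0.5)[:2])   # (0.5, ['mammal'])
print(best_cover("root", all_but_yeast, 0.5)[:2])  # (1.0, ['bacteria', 'mammal'])
print(best_cover("root", all_but_yeast, 1.5)[:2])  # (2.5, ['root'])
```

With a small penalty the description stays close to the data; with a
larger one it collapses to the root and accepts one wrongly described
genome, which is the kind of offset I have in mind.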
In some cases a list of genomes is acceptable, for example if a single
superfamily occurs in 5 very diverse genomes (diverse in terms of the
classification assigned to the genomes in the tree of life). One could
call this a 'universally distributed superfamily'. However, intuitively I
think that category belongs to superfamilies which have a high number of
assignments to (almost) every genome.
It suddenly strikes me that I have been trying to solve this question for
the past 3 years.
Does the above help clarify what I want to ask?
Cheers,
Dan.