Re: Estimate if two bivariate sets are statistically different
- From: Ryan <Ryan.Andrew.Black@xxxxxxxxx>
- Date: Tue, 15 Jul 2008 17:20:08 -0700 (PDT)
On Jul 15, 6:02 pm, Ray Koopman <koop...@xxxxxx> wrote:
On Jul 15, 12:14 pm, Ryan <Ryan.Andrew.Bl...@xxxxxxxxx> wrote:
On Jul 15, 2:14 pm, Ray Koopman <koop...@xxxxxx> wrote:
On Jul 15, 10:14 am, "m.s." <deviceran...@xxxxxxxxx> wrote:
Hi,
I have the following problem. I have two datasets that are
evaluated along two different variables. By a simple scatterplot,
it seems that dataset A follows a peculiar distribution
("horseshoe"-like, non-normal). Dataset B seems to be more
widespread and shifted towards higher Y values, but it's only a
few points.
What I would like to have is, for example, the probability that
dataset B comes from a distribution like A, or any other
meaningful measure of the difference between datasets A and B.
I've found some concepts like the Mahalanobis distance or the
Hotelling T-square distribution, that could be useful, but these
seem to require that the data be normally distributed in one sense
or another. However, I feel that there should be some kind of
general, obvious method.
Is there such a method and, if so:
- where can I find info on it?
- where can I look for a tutorial on how to
implement it in practice?
My statistics knowledge is poor, so bear with me.
Thanks a lot,
m.
Hey Ray,
If it's okay, I have a couple of questions/comments
interspersed below your comments.
Do a scatterplot of the merged datasets, with
nothing to show which point came from which set.
1. Merge both datasets...
First dataset:
subj  var1  var2
1     45    35
2     23    45
3     17    57
Second dataset:
subj  var1  var2
4     27    31
5     29    36
6     41    48
Merged dataset:
subj  var1  var2
1     45    35
2     23    45
3     17    57
4     27    31
5     29    36
6     41    48
2. Do a scatterplot
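Steps 1 and 2 might be sketched in Python as below. This is only an illustration: the names `dataset1`, `dataset2`, and `merge_blind` are made up here, and the data are the toy values from the example. The points are pooled and shuffled so that the plot carries no hint of origin, while the origin labels are kept in a separate list for the later test.

```python
import random

# Toy data from the example above: (var1, var2) pairs.
dataset1 = [(45, 35), (23, 45), (17, 57)]
dataset2 = [(27, 31), (29, 36), (41, 48)]

def merge_blind(a, b, seed=0):
    """Pool two datasets and shuffle; return (points, origins) as parallel lists."""
    labelled = [(p, 1) for p in a] + [(p, 2) for p in b]
    random.Random(seed).shuffle(labelled)  # fixed seed only for repeatability
    points = [p for p, _ in labelled]
    origins = [o for _, o in labelled]
    return points, origins

points, origins = merge_blind(dataset1, dataset2)
# Plot only `points` (for instance with a scatterplot routine of your choice)
# while drawing the regions; `origins` stays hidden until the regions are fixed.
```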
Partition the plane -- perhaps, but not necessarily, by
grid lines; if there appear to be natural breaks they may
be used -- into regions that are as compact as possible,
In other words, look at the scatterplot and see
if points tend to cluster together in regions?
When you create the regions, you want to "carve nature at its
joints" if there are any. You wouldn't want to put a boundary
through the middle of an obvious cluster if a minor shift could
avoid it. If I were doing it I'd probably start with some sort
of grid (not necessarily rectangular) and then adjust as needed.
Also, you want to attend to both variables approximately equally.
(E.g., you wouldn't want only thin vertical strips, because that
would ignore the y-variable.)
subject to the constraint that each region
have at least 5(n1+n2)/min(n1,n2) points,
where n1 and n2 are the two sample sizes.
I'm not calculating this formula correctly. Are you saying
that there should always be at least 10 points per region?
Each region will give rise to two cells in the 2 x R contingency
table. The expected frequency in each cell should be at least 5.
So yes, that means that each region should have at least 10 points
if n1 = n2, and more otherwise. (Remember, "expecteds should be at
least 5" is only a rule of thumb and should not be interpreted too
rigidly.)
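The arithmetic behind that bound can be checked with a short Python sketch. The function names are hypothetical; the only assumption is the one stated above, namely that a region with m points has expected counts m*n1/(n1+n2) and m*n2/(n1+n2) in its two cells.

```python
def min_points_per_region(n1, n2):
    """Smallest region size for which both expected cell counts reach 5."""
    return 5 * (n1 + n2) / min(n1, n2)

def expected_counts(m, n1, n2):
    """Expected counts for a region holding m of the n1 + n2 pooled points."""
    total = n1 + n2
    return m * n1 / total, m * n2 / total

print(min_points_per_region(50, 50))   # equal samples: 10 points per region
print(min_points_per_region(100, 20))  # unequal samples need more: 30
print(expected_counts(30, 100, 20))    # the smaller cell sits exactly at 5
```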
This should be done by someone who knows nothing about either
dataset or how they might differ. Then do a 2 x R chi-square test
comparing the distributions of the two datasets over the R regions.
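As a rough illustration of the counting step, here is a Python sketch that uses a plain rectangular grid as the partition. The grid, the cut points, and the function names are assumptions made for this example; in practice the regions would be adjusted by eye as described above, and the toy data are far too small to meet the rule of thumb.

```python
from bisect import bisect_right

def region_index(point, x_edges, y_edges):
    """Map an (x, y) point to a region number on a grid with sorted cut points."""
    x, y = point
    col = bisect_right(x_edges, x)
    row = bisect_right(y_edges, y)
    return col * (len(y_edges) + 1) + row

def two_by_r_table(a, b, x_edges, y_edges):
    """Count the points of each dataset in each grid region (2 rows, R columns)."""
    n_regions = (len(x_edges) + 1) * (len(y_edges) + 1)
    table = [[0] * n_regions, [0] * n_regions]
    for i, data in enumerate((a, b)):
        for p in data:
            table[i][region_index(p, x_edges, y_edges)] += 1
    return table

# Toy data from the thread, with arbitrary cuts at var1 = 30 and var2 = 40.
dataset1 = [(45, 35), (23, 45), (17, 57)]
dataset2 = [(27, 31), (29, 36), (41, 48)]
table = two_by_r_table(dataset1, dataset2, x_edges=[30], y_edges=[40])
print(table)
```

Regions that end up with too few points would be merged with neighbours before testing.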
I'm having a hard time visualizing this. Let's say from the example
above that there are two distinct regions in the merged dataset.
Region 1 Points:
45, 35 = Dataset1
23, 45 = Dataset1
27, 31 = Dataset2
Region 2 Points:
17, 57 = Dataset1
29, 36 = Dataset2
41, 48 = Dataset2
So the 2 x 2 table would look like this?
          Dataset1  Dataset2
Region 1  2         1
Region 2  1         2
If this results in a significant chi-square, then this provides
evidence that the two datasets come from different distributions.
Am I way off here???
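For what it's worth, the test on that table can be worked through with a small Python sketch using only the standard library. The function names are made up here, the statistic is the plain Pearson chi-square with no continuity correction (an assumption, since the thread does not say which variant to use), and every row and column must contain at least one point. With only six points the expected counts are all 1.5, well below the rule of thumb of 5, so the numbers are purely illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    y = x / 2
    if df % 2:
        q, a = math.erfc(math.sqrt(y)), 0.5   # closed form for df = 1
    else:
        q, a = math.exp(-y), 1.0              # closed form for df = 2
    # Recurrence Q(a+1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a+1)
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

def chi_square_test(table):
    """Pearson chi-square test of independence for a table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Rows are regions, columns are datasets, as in the table above.
stat, df, p = chi_square_test([[2, 1], [1, 2]])
print(stat, df, p)
```

Here the statistic is 2/3 on 1 degree of freedom, with a p-value of roughly 0.41, so this toy table gives no evidence of a difference.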
No, you've got the idea.
Thank you.