Re: Estimate if two bivariate sets are statistically different



On Jul 15, 10:14 am, "m.s." <deviceran...@xxxxxxxxx> wrote:
Hi,

I have the following problem. I have two datasets, each evaluated on
two variables. On a simple scatterplot, dataset A appears to follow a
peculiar distribution ("horseshoe"-like, non-normal). Dataset B seems
to be more spread out and shifted towards higher Y values, but it has
only a few points.

What I would like to have is, for example, the probability that
dataset B comes from a distribution like A, or any other meaningful
measure of the difference between datasets A and B.

I've found some concepts, like the Mahalanobis distance or the
Hotelling T-square distribution, that could be useful, but these seem
to require that the data be normally distributed in one sense or
another. However, I feel that there should be some kind of general,
obvious method.

Is there such a method and, if so:
- where can I find information on it?
- where, if possible, can I find a tutorial on how to implement it in
practice?

My statistics knowledge is poor, so bear with me.

Thanks a lot,
m.

Do a scatterplot of the merged datasets, with nothing to show which
point came from which set. Partition the plane -- perhaps, but not
necessarily, by grid lines; if there appear to be natural breaks they
may be used -- into regions that are as compact as possible, subject
to the constraint that each region have at least 5(n1+n2)/min(n1,n2)
points, where n1 and n2 are the two sample sizes, so that the expected
count from the smaller sample is at least 5 in every region. This
should be done
by someone who knows nothing about either dataset or how they might
differ. Then do a 2 x R chi-square test comparing the distributions
of the two datasets over the R regions.
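
The final step could be sketched in Python roughly as follows. This is
only an illustration, not a reference implementation: it assumes the
blind partition has already been made and that every point carries a
region label. The names chi2_sf and two_by_r_chi_square are made up
for this example, and the p-value is computed with the standard
library alone, using the recurrence for the chi-square upper tail.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd df) or df = 2 (even df).
    if df % 2:
        q, n = math.erfc(math.sqrt(half)), 1
    else:
        q, n = math.exp(-half), 2
    # Recurrence: Q(n + 2) = Q(n) + (x/2)^(n/2) * exp(-x/2) / Gamma(n/2 + 1).
    while n < df:
        q += math.exp((n / 2.0) * math.log(half) - half
                      - math.lgamma(n / 2.0 + 1.0))
        n += 2
    return min(q, 1.0)


def two_by_r_chi_square(regions_a, regions_b):
    """2 x R chi-square test of homogeneity.

    regions_a and regions_b hold the region label of every point in
    dataset A and dataset B (hypothetical input format).
    Returns (statistic, degrees of freedom, p-value).
    """
    n1, n2 = len(regions_a), len(regions_b)
    if min(n1, n2) == 0:
        raise ValueError("both datasets must contain points")
    count_a, count_b = Counter(regions_a), Counter(regions_b)
    labels = set(count_a) | set(count_b)
    if len(labels) < 2:
        raise ValueError("need at least two regions")

    # Minimum number of points per region, as stated in the reply above.
    min_points = 5.0 * (n1 + n2) / min(n1, n2)
    stat = 0.0
    for lab in labels:
        total = count_a[lab] + count_b[lab]
        if total < min_points:
            raise ValueError("region %r has only %d points" % (lab, total))
        for observed, n in ((count_a[lab], n1), (count_b[lab], n2)):
            expected = total * n / (n1 + n2)
            stat += (observed - expected) ** 2 / expected

    df = len(labels) - 1
    return stat, df, chi2_sf(stat, df)


# Made-up labels: 40 points in A and 20 in B, spread over 3 regions.
a = ["r1"] * 20 + ["r2"] * 15 + ["r3"] * 5
b = ["r2"] * 5 + ["r3"] * 15
stat, df, p = two_by_r_chi_square(a, b)
print(stat, df, p)
```

A small p-value suggests that the two datasets are distributed
differently over the regions. The test says nothing about where
within a region the points fall, so the result depends on the
partition, which is why the partition should be drawn blind.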