Testing correlation with permutation samples
- From: "Luis A. Afonso" <licas_@xxxxxxxxxxx>
- Date: Fri, 02 Dec 2005 20:12:23 EST
This is only a tentative, preliminary attempt to use permutation samples to evaluate confidence intervals for the Pearson CLC (coefficient of linear correlation).
Given a bivariate sample (x, y) of size n, denoted S0 = (xi, yi), i = 1, 2, ..., n, let us evaluate the CLC:
    r0 = a / b                          (1)
    a  = [xy] - (1/n)[x][y]
    b  = sqrt(u v)
    u  = [x^2] - (1/n)[x]^2
    v  = [y^2] - (1/n)[y]^2
where the square brackets denote the sum of the indexed quantities from 1 to n.
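As an illustration only, expression (1) can be transcribed directly into a few lines of Python. The function name pearson_r and the variable names are my own choices for this sketch; they mirror the symbols a, b, u, v above.

```python
import math


def pearson_r(x, y):
    """Pearson CLC from expression (1): r0 = a / b (illustrative helper name)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x = sum(x)                                   # [x]
    sum_y = sum(y)                                   # [y]
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # [xy]
    sum_xx = sum(xi * xi for xi in x)                # [x^2]
    sum_yy = sum(yi * yi for yi in y)                # [y^2]
    a = sum_xy - sum_x * sum_y / n
    u = sum_xx - sum_x ** 2 / n
    v = sum_yy - sum_y ** 2 / n
    # Raises ZeroDivisionError if either sample is constant (u or v is zero).
    return a / math.sqrt(u * v)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```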
Our goal is to test whether r0 is statistically significant, i.e., whether r0 (whatever its value) is unlikely to have been obtained by pure chance and therefore differs significantly from zero.
Our *rationale* is the following: if each X value is paired with a randomly chosen Y value to form a bivariate pair, we destroy any relationship between the two Populations, and consequently the sample belongs to the Universe of no correlation (set L), which is to say that H0 is true. Therefore we can find the position of r0 among L = r|H0. If H0 is true, the cumulative frequency of r0 should be greater than 0.025 but less than 0.975 (acceptance interval at the 5% level).
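The rationale above can be sketched as a plain random-pairing procedure: keep X fixed, shuffle Y, recompute r, and count how often the shuffled r falls below r0. This is a minimal sketch of the ordinary unweighted permutation test, not of the weighted variant I propose further on; the names permutation_position, n_perm and seed are assumptions made for the example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson CLC computed from raw sums (illustrative helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    a = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    u = sum(xi * xi for xi in x) - sum_x ** 2 / n
    v = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return a / math.sqrt(u * v)


def permutation_position(x, y, n_perm=10000, seed=0):
    """Return r0 and its cumulative frequency among r values under H0.

    Each iteration pairs the fixed X values with a random permutation
    of the Y values (no replacement), which breaks any X-Y relationship.
    """
    rng = random.Random(seed)
    r0 = pearson_r(x, y)
    y_perm = list(y)
    below = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)              # random re-pairing of Y with X
        if pearson_r(x, y_perm) < r0:
            below += 1
    return r0, below / n_perm


x = list(range(20))
y = [2 * xi + 1 for xi in x]             # perfectly correlated toy data
r0, freq = permutation_position(x, y, n_perm=2000)
# H0 is retained at the 5% level only if 0.025 < freq < 0.975.
print(r0, freq, 0.025 < freq < 0.975)
```

With an exactly linear toy sample such as this one, the cumulative frequency lands outside the acceptance interval, so H0 is rejected.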
In order to construct the Permutation samples, the sums in expression (1) should be changed to
    [xy]  = (1*x1'*y1' + 2*x2'*y2' + ... + n*xn'*yn')/c
where c = n(n+1)/2, and similarly:
    [x]   = (1*x1' + 2*x2' + ... + n*xn')/c
    [x^2] = (1*x1'^2 + 2*x2'^2 + ... + n*xn'^2)/c
    [y]   = (1*y1' + 2*y2' + ... + n*yn')/c
    [y^2] = (1*y1'^2 + 2*y2'^2 + ... + n*yn'^2)/c
By x1', x2', ..., xn' I mean ANY permutation of the X sample items (all of them, drawn without replacement). The same holds for the sample Y.
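To make the notation concrete, here is a literal transcription of the weighted sums for one permutation of each sample. It is only a sketch: the names weighted_sum and weighted_sums_for_permutation are hypothetical, and I assume that X and Y are permuted independently of each other.

```python
import random


def weighted_sum(values):
    """(1*v1 + 2*v2 + ... + n*vn) / c, with c = n(n+1)/2."""
    n = len(values)
    c = n * (n + 1) / 2
    return sum(i * v for i, v in enumerate(values, start=1)) / c


def weighted_sums_for_permutation(x, y, rng):
    """Weighted [x], [y], [xy], [x^2], [y^2] for one random permutation.

    Assumption: each sample is permuted independently, without replacement.
    """
    x_perm = rng.sample(x, len(x))       # a permutation of the X items
    y_perm = rng.sample(y, len(y))       # a permutation of the Y items
    return {
        "x": weighted_sum(x_perm),
        "y": weighted_sum(y_perm),
        "xy": weighted_sum([a * b for a, b in zip(x_perm, y_perm)]),
        "x2": weighted_sum([a * a for a in x_perm]),
        "y2": weighted_sum([b * b for b in y_perm]),
    }


rng = random.Random(0)
print(weighted_sums_for_permutation([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], rng))
```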
It is easy to verify that the factor c cancels out of the fraction r. (I have shown elsewhere in this newsgroup that the Mathematical Expectations do not change when these *weights* are used.)
Advantages:
1. The Populations from which the source samples were drawn seem to be irrelevant (non-parametric test); in particular, normality is not a concern.
2. The obnoxious *with replacement* sampling is avoided.
3. Contrary to the *complete shuffling* of all the items (both samples), each sample's population is preserved (I am thinking, for example, of normal populations with different variances).
Any feedback offering objective criticism of this idea would be welcome.
licas (Luis A. Afonso)