Re: Use of G Statistics equation for comparing sample distribution.



On Jun 20, 9:41 am, crd...@xxxxxxxxx wrote:
If I knew any of those things, I wouldn't be as lost as I am.  The
paper that I'm using for my project is here: http://www.mediateam.oulu.fi/publications/pdf/16.pdf
The equation and setup are on page 2.

The notation makes no sense to either me or my advisor, and the
original source is not readily available.  Without this equation,
I can't proceed much further in my code beyond debugging and
example checks.

Chris

On Jun 19, 4:04 pm, Aniko <aniko123...@xxxxxxxxx> wrote:



On Jun 18, 1:29 pm, crd...@xxxxxxxxx wrote:

I am working on a project which involves comparing two regions of an
image by means of a "G statistic".  The equation I am to use is as
follows:

E_(variable) denotes a Sigma (summation) taken over variable,

i.e.:
-------
\
/
-------
variable

s,m are two histograms (256x8 2D histograms)
i is a bin number
f_i is the frequency at bin i

The equation, as defined in the paper I'm using, is:

G = 2*( [E_s,m E_i f_i log f_i] - [E_s,m (E_i f_i) log (E_i f_i)] -
        [E_i (E_s,m f_i) log (E_s,m f_i)] +
        [(E_s,m E_i f_i) log (E_s,m E_i f_i)] )

I am trying to perform this comparison within a computer program, but I
am unsure of its logical flow, mostly since there are 3
variables (s,m,i) but only one is explicitly shown to be used.  I am
not familiar with a problem this complex, so if anyone could help me
try to decipher this equation, I would be grateful.

Chris

Chris,
You are not getting answers because we can't understand what's going
on. I think you'll need to clarify the setup and the notations (and
perhaps answer your question along the way). It is not clear what the
indices s and m represent. What is a 2D histogram? Does it mean that
you have two binned variables, and you have a count for each pair of
bins? Does s run along one dimension and m along the other, or perhaps
i runs along the entire 2D space? And you have two of those?

Aniko

The link was helpful. I agree with you that their notation is
atrocious. Here is a clearer version: let i run over all the bins (it
does not matter whether the histogram is 1D or 5D). Let f_ij
denote the count in the i-th bin of the j-th histogram (j=1,2). Then,
in the formula, replace every f_i by f_ij and every summation over "s,m"
by a summation over j. With that, it should hopefully make sense.
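
For the coding side, here is a minimal Python sketch of that reading of the formula. It is only my interpretation, not code from the paper. The names xlogx and g_statistic are made up for illustration, each histogram is assumed to be a flat list of non-negative counts (a 256x8 histogram would be flattened to 2048 bins first), and 0*log(0) is taken to be 0 so that empty bins do not break the logarithm.

```python
import math


def xlogx(x):
    """Return x * log(x), using the convention 0 * log(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def g_statistic(hist_1, hist_2):
    """G statistic for two histograms given as flat lists of bin counts.

    With f_ij = count in bin i of histogram j (j = 1, 2):
      G = 2 * (  sum_j sum_i f_ij log f_ij
               - sum_j (sum_i f_ij) log (sum_i f_ij)
               - sum_i (sum_j f_ij) log (sum_j f_ij)
               + (sum_j sum_i f_ij) log (sum_j sum_i f_ij) )
    """
    if len(hist_1) != len(hist_2):
        raise ValueError("histograms must have the same number of bins")

    hists = (hist_1, hist_2)
    # Term 1: every individual count, over both histograms.
    cell_term = sum(xlogx(f) for h in hists for f in h)
    # Term 2: the total count of each histogram (sum over i, then over j).
    hist_term = sum(xlogx(sum(h)) for h in hists)
    # Term 3: the combined count of each bin (sum over j, then over i).
    bin_term = sum(xlogx(a + b) for a, b in zip(hist_1, hist_2))
    # Term 4: the grand total over all bins of both histograms.
    total_term = xlogx(sum(hist_1) + sum(hist_2))

    return 2.0 * (cell_term - hist_term - bin_term + total_term)


if __name__ == "__main__":
    print(g_statistic([1, 2, 3], [1, 2, 3]))  # identical histograms
    print(g_statistic([4, 0], [0, 4]))        # no overlap at all
```

Identical histograms should give a value of (numerically) zero, and the value grows as the two distributions diverge, so a small G means the two regions look alike.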

Aniko