Re: Calculating joint entropy

From: Yaroslav Bulatov (yaroslavvb_at_gmail.com)
Date: 02/07/05


Date: 6 Feb 2005 17:46:37 -0800

Entropy is a property of a probability distribution.

So to calculate entropy you first fit a distribution to your data, and
then calculate entropy of that distribution.

What Matlab probably does for entropy of 2 discrete variables is to
take the "empirical distribution", and calculate the entropy of that.
So here, the process of "fitting probability distribution to data"
basically consists of filling in the entries of the table.

So the extend the analogy, you can treat each possible setting of your
18 variables, as a unique value, and then your estimation would be
equivalent to finding entropy of 1 variable with n^18 different values.

However, with 18 variables, even if they are binary valued, the
table-based approach will contain at least 2^18 entries, hence 2^18-1
independent parameters to estimate. That's a problem because:
1) it takes a lot of memory
2) unless you have lots of data, your estimates of 2^18 parameters will
be mostly noise, so your calculated entropy will be mostly noise

So in order to get something meaningful you need to fit a distribution
that's
1) Close enough to the "true" distribution
2) Has much fewer parameters than datapoints

Picking the right distribution is mostly domain dependent, but here are
some examples of what you can do.

1) Assume all your variables are independent.

Then joint entropy will be sums of individual entropies:
H(A,B,C,D,E,F,G) = H(A)+H(B)+....+H(G)

2) Assume some limited number of dependencies. For entropy calculations
to be easy, you can do it by specifying your dependencies as a DAG. For
instance you can do the following

A->B->C->D<-E<-F<-G

Then your entropy will be

H(A,B,C,D,E,F,G) = H(A)+H(B|A)+H(C|B)+H(D|C,E)+H(E|F)+H(F|G)+H(G)
where
H(F|G) = H(F,G)-H(G) and
H(D|C,E) = H(D,C,E)-H(C,E)

These joint entropies can be calculated using the table based approach.
(ie, treat D,C,E as a single variable with 2^3 values, and calculate
entropy of that)

You can see here, that the largest table we have to deal with has 8
cells, instead having to fill entries of full joint table with 32
cells.

The total number of parameters we have to estimate with this
parametrization is
1+3+3+7+3+3+1 = 21 as opposed to 31 for the joint table, but also here,
one datapoint can be used to fill in several different cell, whereas in
the full joint approach each datapoint only affects one cells, so that
results in low accuracy in estimation.

So to answer your question, you can:
1) Treat your 18 variables as a single variable with n^18 values, and
use formula for entropy of 1 random variable
2) Assume they are all independent, and add up their individual
entropies
3) Use your domain knowledge to come up with a good parametrized model
for the variables, and calculate entropy of that (ie, a DAG)

3) will give most accurate results, 1) will give accurate results if
you have much more than n^18 datapoints, otherwise use 2)

Hope that helps.



Relevant Pages

  • Re: behavior as mapping
    ... estimating a probability distribution, the distribution ... sequence with equal probability - since you have microsecond temporal ... reduction of the entropy Pto the entropy P ... If there were 4 genes we would need 2 bits of binding site info. ...
    (comp.ai.philosophy)
  • Computational secure entropy extraction
    ... distilling entropy from an unknown distribution. ... there existed some universal entropy distiller that could be used on all input ... D is -secure if given that k is drawn from any distribution ... Let's define that a "hit" is the case ...
    (sci.crypt)
  • Re: behavior as mapping
    ... estimating a probability distribution, the distribution ... sequence with equal probability - since you have microsecond temporal ... reduction of the entropy Pto the entropy P ... If there were 4 genes we would need 2 bits of binding site info. ...
    (comp.ai.philosophy)
  • Re: new /dev/random
    ... >A c1,c2 entropy generator takes any input k and produces a string of ... >fixing any probability distribution it likes over those strings, ... >long as the entropy exceeds k. ... our mixer to do a good job with, we can then ask whether applying SHA1 ...
    (sci.crypt)
  • Re: new /dev/random
    ... > optimistic dreams for what one would like from an entropy distiller. ... SHA1 is a secure hash, though, which would also be very exciting. ... > to any such distribution yields pseudorandom outputs. ...
    (sci.crypt)