Re: binomial 'association' measure?

From: Aleks Jakulin (a_jakulin_at_@hotmail.com)
Date: 10/03/04


Date: Sun, 3 Oct 2004 20:27:59 +0200

Dan Bolser wrote:
> >
>>We explored the latter two approaches in our paper at
>>http://www.ailab.si/aleks/Int/jakulin-bratko-ICML2004.pdf.
>
> "In problems with many attributes, the joint PDF may become sparse.
> The objective of learning is to construct a model of the joint PDF
> that will avoid this sparseness."
>
> Do you mean learning in this particular problem domain or learning
> in general?
> Is such a constructed model 'the best' in some way? Why is it the
> objective?

Actually, either way. In general, probability becomes meaningless with
sparseness. For a meaningful probability you need some overlap of
multiple instances in the same locale of the attribute space. Even
procedures that claim to be sparse, such as support vector machines,
are really just projecting the high-dimensional instances onto
one-dimensional distances from a certain hyperplane.
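To make the sparseness point concrete, here is a small illustrative sketch in Python (not taken from the paper). It assumes independent, uniformly random binary attributes and treats "the same locale" as "exactly the same joint cell". The function name shared_fraction is hypothetical. It reports what fraction of 1000 instances share their joint cell with at least one other instance as the number of attributes grows.

```python
import random
from collections import Counter


def shared_fraction(n_instances, n_attributes, seed=0):
    """Fraction of instances whose full attribute vector (their joint cell)
    is also observed for at least one other instance.

    Assumes independent, uniform binary attributes (illustration only)."""
    rng = random.Random(seed)
    rows = [tuple(rng.randint(0, 1) for _ in range(n_attributes))
            for _ in range(n_instances)]
    counts = Counter(rows)  # how many instances fall in each joint cell
    return sum(1 for r in rows if counts[r] > 1) / n_instances


for d in (2, 5, 10, 20, 40):
    print(d, shared_fraction(1000, d))
```

With 2 attributes every cell is crowded; by around 20 attributes almost every instance sits alone in its cell, so a frequency estimate of the joint PMF is essentially 0 or 1/n everywhere.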

> You say that 'high or low' interaction information among attributes
> is an indication that the attributes interact and should not be
> factorized. This makes sense because you say 'factorization takes
> advantage of independencies among attributes', however, you also
> say 'of course, the factors themselves need not be independent'.
> I am confused!
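The interaction information referred to above can be computed directly from a joint PMF. This is a hedged sketch: it assumes the sign convention I(A;B;C) = I(A;B|C) - I(A;B), under which positive values indicate synergy and negative values redundancy (some texts use the opposite sign), and all helper names are hypothetical. It contrasts three binary examples: XOR (C = A xor B), three copies of the same bit, and three independent bits.

```python
from itertools import product
from math import log2


def entropy(pmf, idx):
    """Entropy in bits of the marginal over the variable positions in idx.
    pmf maps outcome tuples to probabilities."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def mutual_information(pmf, a, b):
    # I(A;B) = H(A) + H(B) - H(A,B)
    return entropy(pmf, [a]) + entropy(pmf, [b]) - entropy(pmf, [a, b])


def conditional_mi(pmf, a, b, c):
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    return (entropy(pmf, [a, c]) + entropy(pmf, [b, c])
            - entropy(pmf, [a, b, c]) - entropy(pmf, [c]))


def interaction_information(pmf, a, b, c):
    # Assumed sign convention: I(A;B;C) = I(A;B|C) - I(A;B)
    return conditional_mi(pmf, a, b, c) - mutual_information(pmf, a, b)


# A, B independent fair bits, C = A xor B (pure synergy)
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
# A = B = C (pure redundancy)
same = {(x, x, x): 0.5 for x in (0, 1)}
# Three independent fair bits (no interaction)
indep = {(x, y, z): 0.125 for x, y, z in product((0, 1), repeat=3)}

print(interaction_information(xor, 0, 1, 2))    # 1.0
print(interaction_information(same, 0, 1, 2))   # -1.0
print(interaction_information(indep, 0, 1, 2))  # 0.0
```

A value far from zero in either direction is what signals that the attributes interact and should be kept together in one factor; the independent case gives zero.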

Imagine some joint PMF P(A,B,C,D,E) in which the interactions are
present only in P(A,B) and in P(B,C,D). These, along with P(E), can be
seen as "factors". But you cannot factorize P(A,B,C,D,E) into
P(A,B)P(B,C,D)P(E), because B appears in two of them: the "factors"
overlap, so B would be counted twice. (This is also why the factors
need not be independent of each other: they share B.) This can be
corrected with the chain rule by conditioning P(A,B) on B, i.e.
dividing it by P(B), as is common practice with Bayesian networks. The
factorization of P(A,B,C,D,E) is then
P(A|B)P(B,C,D)P(E) = P(A,B)P(B,C,D)P(E)/P(B).
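A quick numerical check of this factorization, as a hedged sketch: it builds a hypothetical joint over five binary attributes with exactly the structure described (A depends only on B; B, C and D interact jointly; E is independent), then compares the naive product of overlapping marginals with the chain-rule version. All table values are random, and the names are illustrative only.

```python
import random
from itertools import product

rng = random.Random(1)
vals = (0, 1)


def normalize(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}


# Hypothetical ground truth with the structure from the post.
p_bcd = normalize({k: rng.random() for k in product(vals, repeat=3)})
p_a_given_b = {b: normalize({a: rng.random() for a in vals}) for b in vals}
p_e = normalize({e: rng.random() for e in vals})

joint = {(a, b, c, d, e): p_a_given_b[b][a] * p_bcd[(b, c, d)] * p_e[e]
         for a, b, c, d, e in product(vals, repeat=5)}


def marginal(pmf, idx):
    """Sum the joint over all positions not listed in idx."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


# Marginals computed from the joint: P(A,B), P(B), P(B,C,D), P(E)
P_AB, P_B = marginal(joint, [0, 1]), marginal(joint, [1])
P_BCD, P_E = marginal(joint, [1, 2, 3]), marginal(joint, [4])


def naive(a, b, c, d, e):
    # Overlapping factors: B's marginal is effectively counted twice.
    return P_AB[(a, b)] * P_BCD[(b, c, d)] * P_E[(e,)]


def chain_rule(a, b, c, d, e):
    # P(A|B) P(B,C,D) P(E), with P(A|B) = P(A,B) / P(B)
    return P_AB[(a, b)] / P_B[(b,)] * P_BCD[(b, c, d)] * P_E[(e,)]


print(sum(naive(*k) for k in joint))  # below 1
print(max(abs(chain_rule(*k) - p) for k, p in joint.items()))  # ~0
```

The naive product sums to the sum over b of P(B=b)^2, which is below 1 whenever B is not deterministic; dividing by P(B) removes the double count and reproduces the joint up to rounding.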

Aleks

-- 
mag. Aleks Jakulin
http://www.ailab.si/aleks/
Artificial Intelligence Laboratory,
Faculty of Computer and Information Science,
University of Ljubljana, Slovenia.