Re: Functional approximation in higher dimensions

From: Greg Heath (heath_at_alumni.brown.edu)
Date: 02/24/05


Date: Thu, 24 Feb 2005 04:46:33 GMT

With that many inputs, the first order of business after scaling and
outlier removal is probably input dimensionality reduction.

In general, optimal results for classification and regression are
obtained when the I/O transformation is taken into consideration.
However, since there is never a free lunch, compromises must be made
with the prediction/causality dilemma.

For prediction, one looks for a method of dimensionality reduction that
will represent the n-dimensional input by as few as possible m (m < n)
independent features without significantly degrading the resulting
output. However, often this results in features whose correlation with
the output is very difficult for humans to understand.
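
As a rough illustration of the prediction-oriented route, here is a
minimal sketch using linear PCA, which is only one of several possible
reduction methods. The function name pca_reduce, the var_kept
threshold and the toy data are assumptions made up for this example.
Note that plain PCA looks at the inputs only, so the effect on the
output still has to be checked separately.

```python
import numpy as np

def pca_reduce(X, var_kept=0.95):
    """Project X (samples x n inputs) onto the fewest principal
    components that retain at least `var_kept` of the input variance."""
    Xc = X - X.mean(axis=0)                      # center each input
    # Rows of Vt are the principal directions of the centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per component
    m = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    m = min(m, len(s))                           # guard against round-off
    return Xc @ Vt[:m].T, Vt[:m], m

# Toy data (made up): n = 10 inputs driven by only 3 underlying factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(200, 10))

Z, components, m = pca_reduce(X, var_kept=0.99)
print("kept", m, "of", X.shape[1], "dimensions; feature matrix", Z.shape)
```

Each column of Z is a mixture of all n original inputs, which is
exactly why such features tend to be hard to interpret.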

For causality, one looks for a method of input variable subset
selection that will represent the n-dimensional input by as few as
possible p (p < n) original inputs without significantly degrading
the resulting output. However, often the number of inputs with an
understandable correlation with the output is far less than the number
needed to obtain a satisfactory output.
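
One simple way to do input subset selection is greedy forward
selection with a linear least-squares model. The sketch below is only
an illustration under that assumption; forward_select, max_inputs, tol
and the toy data are hypothetical names and values, and a nonlinear
model could be substituted for the linear fit.

```python
import numpy as np

def forward_select(X, y, max_inputs, tol=1e-3):
    """Greedy forward selection of original input columns.
    Each step adds the column that most reduces the mean-square error
    of a linear least-squares fit; stops when the gain is below `tol`."""
    n_samples, n_inputs = X.shape
    chosen = []
    best_mse = np.mean((y - y.mean()) ** 2)      # constant-only model
    while len(chosen) < max_inputs:
        trial = {}
        for j in range(n_inputs):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(n_samples), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            trial[j] = np.mean((y - A @ coef) ** 2)
        if not trial:
            break
        j_best = min(trial, key=trial.get)
        if best_mse - trial[j_best] < tol:       # no worthwhile improvement
            break
        chosen.append(j_best)
        best_mse = trial[j_best]
    return chosen, best_mse

# Toy data (made up): only inputs 3 and 7 actually drive the output
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.05 * rng.normal(size=300)

chosen, mse = forward_select(X, y, max_inputs=5)
print("selected inputs:", chosen, "mse:", mse)
```

The selected columns are original inputs, so they stay interpretable,
at the cost of possibly needing more of them than a feature-based
reduction would.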

Nonlinear Partial Least Squares (NPLS) is a technique that has been
developed to deal with this dilemma. I am not familiar with the method,
but I think it is equivalent to trying to minimize the weighted sum of
mean-square-errors in output and input space. There are free algorithms
available via an internet search. Most of them, however, are for Linear
PLS.
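
For reference, linear PLS with a single output can be sketched in a
few lines using the standard NIPALS-style PLS1 iteration. This is a
minimal illustration, not a substitute for a tested package; the names
pls1_fit and pls1_predict and the toy data are made up for the
example, and the nonlinear variant is not covered.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Linear PLS for one output (NIPALS-style PLS1)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                   # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                      # score (latent feature)
        tt = t @ t
        p = Xk.T @ t / tt               # input loading
        q.append(yc @ t / tt)           # output loading
        Xk = Xk - np.outer(t, p)        # deflate the inputs
        W.append(w)
        P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients in input space
    return beta, x_mean, y_mean

def pls1_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

# Toy data (made up): 8 correlated inputs and 1 output share 2 latent factors
rng = np.random.default_rng(2)
T = rng.normal(size=(200, 2))
X = T @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(200, 8))
y = T @ np.array([1.5, -0.7]) + 0.05 * rng.normal(size=200)

beta, x_mean, y_mean = pls1_fit(X, y, n_components=2)
pls_mse = np.mean((y - pls1_predict(X, beta, x_mean, y_mean)) ** 2)
print("training mse:", pls_mse, "output variance:", y.var())
```

Unlike PCA on the inputs alone, each latent direction here is chosen
using the output, which is the point of PLS.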

I've read that NPLS doesn't appear to have an advantage over NNs.
However, since many real-world I/O correlations have a strong linear
component, a quick and dirty fling with a free PLS (Linear) algorithm
might be fruitful. In the same vein, Linear Principal Component
Analysis in the combined I/O space is worth investigation.
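
One possible reading of "PCA in the combined I/O space" is to
standardize inputs and output, stack them into one matrix, and
eigen-decompose the joint correlation matrix. The sketch below assumes
that reading; joint_pca and the toy data are hypothetical. A component
with a near-zero eigenvalue then points to a nearly exact linear
relation among the variables that load on it.

```python
import numpy as np

def joint_pca(X, y):
    """PCA of the standardized, stacked [inputs, output] matrix.
    Returns eigenvalues (descending) and matching eigenvector columns."""
    Z = np.column_stack([X, y])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)     # z-score every variable
    corr = (Z.T @ Z) / len(Z)                    # joint correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Toy data (made up): the output depends linearly on inputs 0 and 1 only
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=500)

eigvals, eigvecs = joint_pca(X, y)
print("eigenvalues:", np.round(eigvals, 3))
# Loadings of the smallest-variance component: inputs 0..3, then the output
print("last component:", np.round(eigvecs[:, -1], 3))
```

In this toy case the last component loads on inputs 0 and 1 and on the
output, and hardly at all on inputs 2 and 3.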

Nonlinear NN techniques (Sensitivity coefficients, Optimal Brain
Damage, ...) that have been discussed in former postings and the FAQ
(See ftp://ftp.sas.com/pub/neural/importance.html#linmod_wgt) can be
searched using groups.google.com. These can lead to further searches
for free software.
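
As a rough idea of what a sensitivity coefficient looks like in
practice, the sketch below estimates the root-mean-square partial
derivative of a model's output with respect to each input by central
finite differences. It assumes the inputs have already been scaled.
The function sensitivity and the stand-in toy_model are made up for
the example; a trained network's forward pass would take the place of
toy_model.

```python
import numpy as np

def sensitivity(model, X, eps=1e-4):
    """RMS over the samples of d(output)/d(input_j), estimated by
    central finite differences, for each input j."""
    n_samples, n_inputs = X.shape
    sens = np.zeros(n_inputs)
    for j in range(n_inputs):
        step = np.zeros(n_inputs)
        step[j] = eps                            # perturb input j only
        grad_j = (model(X + step) - model(X - step)) / (2 * eps)
        sens[j] = np.sqrt(np.mean(grad_j ** 2))
    return sens

def toy_model(X):
    """Hypothetical stand-in for a trained network: ignores input 2."""
    return np.tanh(2.0 * X[:, 0]) + 0.1 * X[:, 1]

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 3))
sens = sensitivity(toy_model, X)
print("sensitivities:", sens)
```

Inputs whose sensitivity stays near zero over the data are candidates
for removal, subject to retraining and checking the output error.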

Hope this helps.

Greg
