Re: Nonlinear Least-Squares curve-fitting




Old Mac User wrote:
The root of many difficulties with nonlinear regression (nonlinear
coefficients in the model) was aptly described by Reef Fish. In many
instances... most instances in my experience with data from chemical
and physical systems... some of the model coefficients are strongly
correlated with each other. This makes estimating those coefficients
difficult, ambiguous, and sometimes impossible. Reef Fish mentioned the
"slippery banana" which is used to describe this situation. I'd like to
reduce his three-variable (three model coefficients) example down to
two model coefficients.
Sketch some contour lines that have the appearance of an elongated
valley. By elongated I mean REALLY elongated. These contour lines will
be ellipses. The axes of this graphic are the values of the estimated
model coefficients. Rotate this valley so that it's at about a 45
degree angle on the graph paper. Then, just for fun, put a bend in it
so that it resembles a banana. The lines of constant elevation
represent lines of constant residual sums of squares. That is, the
residual sum of squares is a constant at all points on a given contour
line. This is a good representation of the sum of squares valley for a
troublesome nonlinear model. The estimated model coefficients you are
seeking are at the bottom of this valley. The model coefficients are
correlated with each other. Pick a value of one of the coefficients...
hold it constant... run a traverse in the other coefficient... find the
minimum sum of squares and report the value of that second coefficient.
Do this again with the first coefficient set to a different value and
the value of the second coefficient will change accordingly. But in
both instances the residual sum of squares will be almost the same.
The fact that they are "almost the same" is a warning that there is no
unique least squares solution.
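
A minimal sketch of the traverse described above. The model
y = a*(1 - exp(-b*x)), the x range, and the "true" values a = 10, b = 0.2
are assumptions chosen only for illustration; none of them come from this
thread. The data are synthetic and noise-free so that the run is
repeatable.

```python
import math

# Hypothetical two-coefficient model (an assumption for illustration only).
def model(x, a, b):
    return a * (1.0 - math.exp(-b * x))

def sum_of_squares(a, b, xs, ys):
    """Residual sum of squares: Sum((observed - predicted)^2)."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def traverse_b(a_fixed, xs, ys, b_values):
    """Hold a fixed, scan b, and return (best_b, min_ss) over the scan."""
    scan = ((b, sum_of_squares(a_fixed, b, xs, ys)) for b in b_values)
    return min(scan, key=lambda pair: pair[1])

# Synthetic, noise-free observations over a narrow x range (0.1 .. 1.0).
xs = [0.1 * i for i in range(1, 11)]
ys = [model(x, 10.0, 0.2) for x in xs]

b_values = [0.10 + 0.005 * j for j in range(61)]   # 0.10 .. 0.40

# Fix a at several values; for each, report the best b and its SS.
for a_fixed in (8.0, 10.0, 12.0):
    best_b, min_ss = traverse_b(a_fixed, xs, ys, b_values)
    print(f"a = {a_fixed:5.1f}   best b = {best_b:.3f}   min SS = {min_ss:.2e}")
```

With real (noisy) data, compare the minimum SS across the rows: if they
differ by less than the noise would explain, the two coefficients cannot
be separated with that data set.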

In a real-life attempt at fitting data to such a model, you may end up
with a pair of model coefficients that happen to be just one of many
combinations, each of which has almost the same residual sum of
squares. If you run the function minimizer just once, you may think
"I've found the least squares values of those two parameters" when in
fact there are many combinations each of which is just as good as all
the others.
So there will be no unique least squares solution. This is why I asked
that you start the optimizer (minimizer) from different points and
record the model coefficients and the corresponding residual sum of
squares. You can learn a lot about "the valley" that way.
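
A hedged sketch of the "start from different points" advice. The model,
the noise level, and the starting points are all hypothetical. The
minimizer here is a crude compass (pattern) search written out in full
so the example needs no libraries; it stands in for whatever real
minimizer you use and is not a recommendation.

```python
import math
import random

# Hypothetical model, chosen only for illustration.
def model(x, a, b):
    return a * (1.0 - math.exp(-b * x))

def sum_of_squares(p, xs, ys):
    a, b = p
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def compass_search(f, start, steps, tol=1e-3, max_iter=10_000):
    """Try +/- moves along each coordinate; halve the moves when none helps.

    Stops when the step scale drops below tol (the "stopping rule") or
    after max_iter sweeps. Returns (point, value).
    """
    p, best = list(start), f(start)
    scale = 1.0
    for _ in range(max_iter):
        if scale < tol:
            break
        improved = False
        for i, step in enumerate(steps):
            for delta in (scale * step, -scale * step):
                q = list(p)
                q[i] += delta
                val = f(q)
                if val < best:
                    p, best, improved = q, val, True
        if not improved:
            scale /= 2.0
    return p, best

# Synthetic data with a little seeded noise (assumed sd = 0.02).
rng = random.Random(0)
xs = [0.1 * i for i in range(1, 11)]
ys = [model(x, 10.0, 0.2) + rng.gauss(0.0, 0.02) for x in xs]

def objective(p):
    return sum_of_squares(p, xs, ys)

starts = [(5.0, 0.5), (8.0, 0.1), (15.0, 0.3), (20.0, 0.05)]
results = []
for start in starts:
    p, ss = compass_search(objective, start, steps=(1.0, 0.02))
    results.append((start, p, ss))
    print(f"start {start} -> a = {p[0]:.3f}, b = {p[1]:.4f}, SS = {ss:.5f}")
```

Record the table it prints. Final coefficients that differ from start to
start while the SS column barely changes are the signature of the valley.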

Attempting to relate model coefficients to "known" or perceived values
attributed to some physical (or chemical) system can be extremely
difficult. After all, there will be many combinations of the
coefficients and each combination is just as good as many others.
Heaped on top of this, you may be dealing with three or more correlated
model coefficients. And there's the matter of calculating confidence
intervals on those coefficients to establish a range of uncertainty
or, really, to validate that the model has any merit whatsoever.

Having said this, now let's imagine what happens if you include all of
the available data... except for one data point... in the fitting process.
Suppose your estimated model coefficients turn out to be near one end
of the valley. Then add that one piece of data and fit again. That may
culminate in "finding" a pair of model coefficients at the other end of
the valley. The elongated valley makes the solution unstable, or
ill-conditioned.

Please believe me when I say that fitting data to nonlinear models is a
specialized undertaking. If you focus entirely on the wonders of
linking data + model + minimizer and watching it "run" and find a
combination of coefficients... ignoring the sum of squares valley,
confidence intervals of the estimated coefficients, etc. then your
"answer" will be at the whim of small changes (random variation) in the
raw data and roundoff error (and the "fitting policy", such as a
stopping rule) in the minimizer.

The reason why I was working on this problem (which was about 15 years
ago) was to help a company develop spectroscopy software that, given
data, estimates the parameters of a certain nonlinear least-squares
problem. The parameters were real, experimentally measurable physical
quantities. I had to test my algorithm on real data for which the
experimental values of the parameters were known. If I could get my
algorithm to spit out parameters that matched these experimental
values, then I knew that my algorithm would probably work well on the
data of the software's users. This was the purpose of my task.

Based on my experience with this, I agree with you that fitting data to
nonlinear models is a specialized undertaking. Automating such a
process is risky. But I found that I increased the odds of the
algorithm succeeding in matching the experimental parameters when I did
what the head of the company suggested, which was to gradually
introduce more and more data points to the NLS estimation, as I talked
about in my original post.


Nonlinear models abound in physics and chemistry. As a chemical
engineer-statistician I worked with reaction rate models most of which
are nonlinear in the coefficients. Some are best expressed as
first-order differential equations, which makes it even more
interesting because then we have to "pick a combination of
coefficients"... do a numerical integration... calculate the residuals
and the residual sum of squares... pick another combination... etc. so
as to resolve to a least squares solution set.
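
A sketch of that loop for a rate model given as a first-order
differential equation. The rate law dC/dt = -k*C^n, the coefficient
names k and n, the fixed-step RK4 integrator, and the synthetic
observations are all assumptions for illustration, not anything taken
from this thread.

```python
import math

def rate(c, k, n):
    """Hypothetical rate law dC/dt = -k * C**n (C clamped at zero)."""
    return -k * max(c, 0.0) ** n

def integrate(c0, k, n, t_end, dt=0.01):
    """Classical fixed-step RK4; returns C at t = 0, dt, 2*dt, ..., t_end."""
    c, path = c0, [c0]
    for _ in range(round(t_end / dt)):
        k1 = rate(c, k, n)
        k2 = rate(c + 0.5 * dt * k1, k, n)
        k3 = rate(c + 0.5 * dt * k2, k, n)
        k4 = rate(c + dt * k3, k, n)
        c += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        path.append(c)
    return path

def sum_of_squares(k, n, t_obs, c_obs, c0=1.0, dt=0.01):
    """Integrate for one (k, n) combination, then sum squared residuals."""
    path = integrate(c0, k, n, max(t_obs), dt)
    return sum((c - path[round(t / dt)]) ** 2 for t, c in zip(t_obs, c_obs))

# Synthetic, noise-free "observations" from k = 0.8, n = 1 (exact solution).
t_obs = [0.5 * i for i in range(1, 11)]
c_obs = [math.exp(-0.8 * t) for t in t_obs]

# "Pick a combination... integrate... sum of squares... pick another."
candidates = [(0.8, 1.0), (0.9, 1.2), (1.0, 1.5), (0.7, 0.8)]
for k, n in candidates:
    print(f"k = {k:.2f}  n = {n:.2f}  SS = {sum_of_squares(k, n, t_obs, c_obs):.3e}")
```

In practice a minimizer would propose the combinations instead of a
hand-written list, but every evaluation still costs one full numerical
integration.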

My parting advice is this. To learn about the sum of squares valley,
write some software to calculate the sum of squares (not the minimum
SS, just the sum of squares as Sum[(Observed - Predicted)^2]) for all
combinations of the pairs of your model coefficients. Do this in a
grid in terms of values of those pairs of coefficients. Look for the
patterns and trends in the sum of squares valley. If the valley is
"round" in terms of a particular pair of model coefficients, then
estimating those two independently of each other is easy. If the
valley is highly elongated in a particular pair of coefficients, that
pair will be a problem. If you see such a highly elongated sum of
squares valley, remove a few data points and calculate another grid. If
you make the grid fine enough, you'll likely see the location of the
minimum sum of squares shift around.
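
One way the grid could be coded, as a sketch only. The model, the grid
limits, the noise level, and the choice of which three points to drop
are hypothetical. The text "picture" is a rough substitute for a proper
contour plot.

```python
import math
import random

# Hypothetical model, chosen only for illustration.
def model(x, a, b):
    return a * (1.0 - math.exp(-b * x))

def ss_grid(xs, ys, a_values, b_values):
    """Sum((observed - predicted)^2) at every (a, b) grid node."""
    return [[sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))
             for b in b_values] for a in a_values]

def grid_minimum(grid):
    """Indices (i, j) of the smallest sum of squares on the grid."""
    cells = ((i, j) for i in range(len(grid)) for j in range(len(grid[0])))
    return min(cells, key=lambda ij: grid[ij[0]][ij[1]])

# Synthetic data with seeded noise (assumed sd = 0.02).
rng = random.Random(1)
xs = [0.1 * i for i in range(1, 11)]
ys = [model(x, 10.0, 0.2) + rng.gauss(0.0, 0.02) for x in xs]

a_values = [4.0 + 0.5 * i for i in range(33)]      # 4.0 .. 20.0
b_values = [0.08 + 0.005 * j for j in range(85)]   # 0.08 .. 0.50

full_grid = ss_grid(xs, ys, a_values, b_values)
i, j = grid_minimum(full_grid)
lowest = full_grid[i][j]
print(f"all data:   a = {a_values[i]:.1f}, b = {b_values[j]:.3f}, SS = {lowest:.5f}")

# Coarse text picture: '#' within 2x of the lowest SS, '+' within 10x.
for row in full_grid[::2]:
    print("".join("#" if v < 2 * lowest else "+" if v < 10 * lowest else "."
                  for v in row[::2]))

# Remove a few data points and grid again to see whether the minimum moves.
keep = [k for k in range(len(xs)) if k not in (2, 5, 8)]
sub_grid = ss_grid([xs[k] for k in keep], [ys[k] for k in keep],
                   a_values, b_values)
si, sj = grid_minimum(sub_grid)
print(f"3 removed:  a = {a_values[si]:.1f}, b = {b_values[sj]:.3f}, "
      f"SS = {sub_grid[si][sj]:.5f}")
```

A long diagonal band of '#' marks is the elongated valley; a compact
blob means the pair can be estimated independently.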

When fitting data to a nonlinear model, the first thing to do is to
create the suggested grids and learn about the shape of the sum of
squares valley. After that... learn "which experimental data is
needed" in order to reduce the correlations among pairs of model
coefficients so as to reduce the elongation. Finally... maybe... it
will be time to put it all together with a function minimizer.

When all is said and done, success/failure hinges on having data that
is dispersed in "experimental space" such that we can actually estimate
the model coefficients.

There are similar problems with Neural Nets, which are usually just a
form of least squares minimization. The major difference is that with
NNs you cannot tell how many "fitting constants" are being used. This
is truly a case of "torturing the data until it confesses". Which, as
Reef Fish rightly suggested, often leads to false confessions.

Take care. OMU



cafeinst@xxxxxxx wrote:
Old Mac User wrote:
Cafei...

If you are finding approximately the same residual sum of squares for
various combinations of the fitting parameters, then you have very poor
estimates of those parameters. In other words, the confidence intervals
on the parameters are very wide.
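
To put rough numbers on "very wide", here is a sketch using the usual
linearized approximation Cov ~ sigma^2 * inverse(J'J), where J holds the
derivatives of the model with respect to the coefficients. The model,
the assumed noise sd (sigma), and both x designs are hypothetical, and
the approximation itself can mislead when the valley is curved like a
banana, so treat the output as a diagnostic only.

```python
import math

def jacobian_row(x, a, b):
    """Derivatives of the hypothetical model a*(1 - exp(-b*x)) wrt a and b."""
    e = math.exp(-b * x)
    return 1.0 - e, a * x * e

def linearized_errors(xs, a, b, sigma):
    """Approximate standard errors and correlation from sigma^2 * inv(J'J)."""
    saa = sab = sbb = 0.0
    for x in xs:
        ja, jb = jacobian_row(x, a, b)
        saa += ja * ja
        sab += ja * jb
        sbb += jb * jb
    det = saa * sbb - sab * sab           # near zero when a and b are confounded
    se_a = sigma * math.sqrt(sbb / det)
    se_b = sigma * math.sqrt(saa / det)
    corr = -sab / math.sqrt(saa * sbb)
    return se_a, se_b, corr

sigma = 0.02                                   # assumed noise sd
narrow = [0.1 * i for i in range(1, 11)]       # x from 0.1 to 1.0
wide = [2.0 * i for i in range(1, 11)]         # x from 2 to 20

for label, xs in (("narrow x range", narrow), ("wide x range", wide)):
    se_a, se_b, corr = linearized_errors(xs, 10.0, 0.2, sigma)
    print(f"{label}: se(a) = {se_a:.3f}, se(b) = {se_b:.4f}, corr = {corr:.4f}")
```

The same model and the same number of points give very different
standard errors depending only on where the x values sit, which is the
"design of the experiments" point made in the next paragraph.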

It always goes back to the "design of the experiments" that spawned the
data. Attempting to fit a "complex" nonlinear model to just any old set
of accumulated data is a gross waste of time. Estimating parameters in
a linear model can be a loser game if the "independent variables" are
poorly arranged in their space. Attempting to do this with a "complex"
nonlinear model is many times worse. Worse in the sense of wasting time
and effort.

Multiple minima... or what seem to be multiple minima... many of which
have approx. the same residual sum of squares... suggests your data is
poorly conditioned for the model you are attempting to fit.

You may be right. Let me describe what I did in more detail, so that
perhaps you can explain why this happened:

The data that I got came from physics experiments. The functions to fit
the data came from standard physics equations. The parameters in these
equations were actually real physics quantities which could be
experimentally measured. When I applied the algorithm which I described
of gradually adding data points, the parameters that the algorithm
output were much closer to the experimentally observed values for the
parameters than when I did not do it gradually.

However, when I did the same thing except that I chose the parameters a
priori and artificially generated data from them, with normally
distributed error added, the algorithm which I described of gradually
adding data points did not give any improvement over fitting all of the
data at once.

Any possible explanations?


So go back and look at those residual sums of squares. At the end of
the "fitting" process, are many of them approx. the same? OMU


cafeinst@xxxxxxx wrote:
Scott Seidman wrote:
cafeinst@xxxxxxx wrote in news:1158947866.574595.138520@h48g2000cwc.googlegroups.com:

This technique was suggested to me in 1992 when I was having trouble
getting my computer program to give a nonlinear least-squares fit to a
certain complicated function. And it worked amazingly. I'd like to know
if anyone has heard of this?

Craig



Never heard of this, but I would think that whether it would work nicely or
not would depend upon how well-behaved the function and the data are--
sometimes it might work nicely, and sometimes not-so-nicely.

I have also tried it on other types of functions, but there didn't seem
to be any gain in doing it this way. The intuition for this type of
technique is that if you start with a small number of data points, the
least squares function is not as complicated as if you had started with
a large number of data points; therefore, if the least squares function
involves a small number of data points, there won't be as many local
minima to avoid as if the least squares function involved a large
number of data points.
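
A toy illustration of that intuition. The details of the original
scheme are not given in this part of the thread, so everything here is
an assumption: the one-parameter model y = sin(w*x), the noise-free
data, the point counts, and the simple downhill walk used as a local
minimizer. It counts local minima of the sum of squares for few versus
many points, then compares a direct fit with a gradual one.

```python
import math

def ss(w, xs, ys):
    """Sum of squares for the hypothetical model y = sin(w * x)."""
    return sum((y - math.sin(w * x)) ** 2 for x, y in zip(xs, ys))

def descend(w, xs, ys, h=0.005):
    """Walk downhill in steps of h until neither neighbour is lower."""
    best = ss(w, xs, ys)
    while True:
        for cand in (w - h, w + h):
            val = ss(cand, xs, ys)
            if val < best:
                w, best = cand, val
                break
        else:
            return w

def count_local_minima(xs, ys, w_lo=0.5, w_hi=3.5, h=0.005):
    """Count interior grid points lower than both of their neighbours."""
    n = round((w_hi - w_lo) / h)
    vals = [ss(w_lo + h * j, xs, ys) for j in range(n + 1)]
    return sum(1 for j in range(1, n)
               if vals[j] < vals[j - 1] and vals[j] < vals[j + 1])

# Noise-free synthetic data from w = 2.0.
xs = [0.5 * i for i in range(60)]
ys = [math.sin(2.0 * x) for x in xs]
w_start = 1.2

print("local minima, first 4 points:", count_local_minima(xs[:4], ys[:4]))
print("local minima, all 60 points: ", count_local_minima(xs, ys))

# Direct: fit all points at once from the starting guess.
w_direct = descend(w_start, xs, ys)

# Gradual: fit a few points, then reuse each estimate as the next start.
w_gradual = w_start
for n in (4, 8, 16, 32, 60):
    w_gradual = descend(w_gradual, xs[:n], ys[:n])

print(f"direct fit:  w = {w_direct:.3f}")
print(f"gradual fit: w = {w_gradual:.3f}")
```

This only shows that the effect can exist for an oscillatory model. It
says nothing about the spectroscopy problem discussed here, and with a
smooth, well-behaved model there may be no gain at all, which matches
the experience reported above.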

It's using *gradualness* to solve the problem. It's actually quite
natural and probably explains why it is not good to teach 1st graders
how to read by giving them "Huck Finn". It's better to first teach them
"Helicopters and Gingerbread" or "A Duck is a Duck" and let them read
"Huck Finn" in high school. This concept can be used for Machine
Learning too, I would think.

Craig


Regardless of the method you use to generate your initial guess, best
practice is to start from a variety of locations in n-space, and see which
initial guess gives you the smallest least squares error. You usually
don't want to depend on just one initial guess to give you the "right"
answer, even if you have a nifty algorithm like the one you described.
Choosing that handful of starting points can be something of an art, too.

There are some algorithms that actually provide the estimator with enough
"energy" to jump out of local minima. You see this type of thing often
with neural nets, where there can be hundreds of (essentially
meaningless) weights to estimate, and there are local minima all over the
place.

--
Scott
Reverse name to reply
