Re: Fitting reference curves to experimental data. Help developing an algorithm



On Jan 27, 2:19 pm, artzy <gareth.michael.vaug...@xxxxxxxxx> wrote:
Hello,

Can someone please help with the following.
I'm looking for an algorithm to implement in a small spectral analysis
program I'm writing.

I have some experimental data and several reference data sets. I want to
find out which combination of the reference data (spectra) fits the
experimental data best.

So far I am able to fit a polynomial to each data set, and I'm trying
to find the combination of the reference polynomials that best fits the
experimental polynomial. So what I have is:

F(x) = alpha*A(x) + beta*B(x) + gamma*C(x)

and I need to find alpha, beta and gamma which give the best fit over
a range of x.

One of the problems is that I'm working with spectral data, and thus
the area under the curves will depend on the amount of time for which
each spectrum was collected.

I don't know what you mean here, and am not sure how to include it in
the analysis.



If anyone can help I would be most grateful.

If I'm not clear, please ask me to clarify.

Thanks

Do I understand that you have some data set (x1,y1), (x2,y2), ...,
(xn,yn) and would like to find the coefficients a, b and c that give
the best fit between the values yj and the formulas a*A(xj) + b*B(xj)
+ c*C(xj) for j = 1, 2, ..., n? (Here I have used the symbols a, b and c
instead of alpha, beta and gamma.) Assuming so, the next issue is:
what do you mean by "best fit"?
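
Put another way, once A, B and C are evaluated at the sample points, the problem involves only an n-by-3 table of numbers plus the yj. The following is a minimal Python sketch. It assumes the reference curves are available as Python functions, and `design_matrix` and `residuals` are hypothetical helper names chosen for illustration:

```python
from typing import Callable, List, Sequence

# Hypothetical helpers, for illustration only.
def design_matrix(xs: Sequence[float],
                  refs: Sequence[Callable[[float], float]]) -> List[List[float]]:
    """Row j holds [A(xj), B(xj), C(xj), ...] for the reference curves in refs."""
    return [[f(x) for f in refs] for x in xs]

def residuals(xs: Sequence[float], ys: Sequence[float],
              refs: Sequence[Callable[[float], float]],
              coefs: Sequence[float]) -> List[float]:
    """Return yj - (a*A(xj) + b*B(xj) + c*C(xj)) for every sample point."""
    rows = design_matrix(xs, refs)
    return [y - sum(c * v for c, v in zip(coefs, row))
            for y, row in zip(ys, rows)]
```

Every "best fit" criterion below is some way of making this list of residuals small.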

The classical answer is 'least squares', where you want to minimize
sum{[yj-a*A(xj)-b*B(xj)-c*C(xj)]^2 : j=1, 2, ..., n}. This is a
standard problem that can be solved manually using regression theory
formulas or using readily-available computer tools such as a
spreadsheet. At the expense of a bit more manual work, you can even
introduce importance weights wj > 0 and minimize the weighted least
squares criterion sum{wj*[yj-a*A(xj)-b*B(xj)-c*C(xj)]^2 : j=1..n},
provided that you know the weights wj ahead of time. This allows you
to try harder to reduce the size of the errors in more important
regions.
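
For illustration, both least squares criteria can be solved in a few lines. This sketch assumes NumPy is available and that the n-by-3 matrix M with rows [A(xj), B(xj), C(xj)] has already been computed. `least_squares_fit` is a hypothetical name. The sketch calls `numpy.linalg.lstsq` instead of the explicit regression formulas; for a full-rank M it returns the same minimizer.

```python
import numpy as np

def least_squares_fit(M, y, w=None):
    """Minimize sum_j wj*(yj - M[j] @ p)**2 over the coefficient vector p.

    M is the n-by-k matrix with M[j] = [A(xj), B(xj), C(xj)]; w holds optional
    positive weights (None means every weight is 1, i.e. ordinary least squares).
    """
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    if w is not None:
        # Scaling row j and yj by sqrt(wj) turns the weighted problem
        # into an ordinary least squares problem.
        s = np.sqrt(np.asarray(w, dtype=float))
        M = M * s[:, None]
        y = y * s
    p, *_ = np.linalg.lstsq(M, y, rcond=None)
    return p  # p = [a, b, c]
```

The row scaling works because wj*[yj - a*A(xj) - ...]^2 equals [sqrt(wj)*yj - a*sqrt(wj)*A(xj) - ...]^2.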

On the other hand, there are some good reasons (robustness, less
sensitivity to outliers, etc.) for using a least total absolute error
criterion: minimize sum{|yj-a*A(xj)-b*B(xj)-c*C(xj)| : j=1..n}. This
cannot be solved using classical formulas, but is easily handled using
Linear Programming (LP). For example, it can be formulated as min
sum{zj : j=1..n}, subject to zj >= a*A(xj) + b*B(xj) + c*C(xj) - yj and
zj >= yj - a*A(xj) - b*B(xj) - c*C(xj) for j = 1, 2, ..., n. In this
problem the variables are a, b, c and z1, z2, ..., zn; the input
parameters are the yj and the computed numbers A(xj), B(xj) and C(xj).
You can solve this LP problem on a spreadsheet using the built-in
solver, or you can download a number of free LP solvers from the web.
Of course, you could equally well have a weighted least absolute
deviation problem in which you minimize sum{wj*zj : j=1..n} with known
importance weights wj.
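
The LP above can be handed directly to an off-the-shelf solver. This minimal sketch assumes SciPy's `scipy.optimize.linprog` (HiGHS backend) as the free solver, and `lad_fit` is a hypothetical name. The two families of inequalities are rewritten in the solver's A_ub @ v <= b_ub form, with v = [a, b, c, z1, ..., zn]:

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(M, y, w=None):
    """(Weighted) least absolute deviation fit, solved as the LP in the text.

    Minimize sum_j wj*zj subject to zj >= M[j] @ p - yj and zj >= yj - M[j] @ p,
    where M[j] = [A(xj), B(xj), C(xj)] and p = [a, b, c].
    """
    M = np.asarray(M, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = M.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    cost = np.concatenate([np.zeros(k), w])  # a, b, c carry no cost
    eye = np.eye(n)
    # M[j] @ p - zj <= yj   and   -M[j] @ p - zj <= -yj
    A_ub = np.vstack([np.hstack([M, -eye]), np.hstack([-M, -eye])])
    b_ub = np.concatenate([y, -y])
    # a, b, c are free; the zj are nonnegative (the constraints imply this anyway).
    bounds = [(None, None)] * k + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]  # [a, b, c]
```

With positive weights, each zj settles at |yj - a*A(xj) - b*B(xj) - c*C(xj)| at the optimum. The LP objective therefore equals the (weighted) total absolute error.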

R.G. Vickson
Adjunct Professor, University of Waterloo