Re: Software suggestions
- From: "Anon." <bob.ohara@xxxxxxxxxxxxxxxxx>
- Date: Sat, 27 Aug 2005 18:14:35 +0300
BernardZ wrote:
In article <1125140487.375978.136460@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, clemenr@xxxxxxxxxx says...
From what you write, I think that the following is the case:
You have a table of data.
You want a formula that relates some variable "Results" to some other variables "var1", "var2", "var3" ... "varn"
If you were interested in linear regression, then you'd want to find weights b0, b1, b2, b3, ..., bn such that:
Results = b0 + b1 * var1 + b2 * var2 + b3 * var3 + ... + bn * varn
fits the data as well as possible under some goodness-of-fit measure, e.g. least squares (the smallest sum of squared errors).
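To make the "structure given, weights found" case concrete, here is a minimal sketch in plain Python that solves the normal equations for b0 ... bn. The function names and the tiny made-up table are assumptions for illustration only; in practice you would use R's lm() or similar rather than hand-rolled elimination.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, results):
    """rows: list of [var1, ..., varn]; returns the weights [b0, b1, ..., bn]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, results)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free table: Results = 2 + 3 * var1 - 1 * var2
rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]]
results = [2.0 + 3.0 * v1 - 1.0 * v2 for v1, v2 in rows]
weights = fit_least_squares(rows, results)
print(weights)
```

Note that the code is told the structure (a weighted sum of var1 and var2) and only has to find the weights.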
However, you don't know the form the expression should take. Hence you want a program that finds the formula for you. You're disappointed by what you find in R because the various procedures in R can typically only be used if you give the structure of the formula, and R then finds a "good" set of weights.
Is this the case?
If this is the case, then you are in trouble. Finding such formulae in data, at least in the relatively unrestricted form in which the field of machine learning usually attempts it (where the structure of the formula, and not just the weights, is unknown), is far from a solved problem. Unless the relationship you're searching for in your data is quite simple, such a program is unlikely to be much use to you even if you get hold of one. And if the relationship is simple, you can probably find it by hand.
I think you need to be clearer about what it is that you want.
You are correct about what I want. I want the computer to find it in an unrestricted structure.
I am surprised that no one has written such a program.
Say I put 20 variables into a table with, say, 1000 entries. A computer program tries, say, 2,000,000 different equations and then returns the equation that gives the best fit.
Would it be useful? To me yes.
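A toy version of the "try many equations, return the best fit" idea can be sketched as below. It assumes one predictor and a short hypothetical list of candidate forms y = a + b * f(x), nowhere near 2,000,000 equations; the names CANDIDATES, fit_line and best_form are made up for this sketch.

```python
import math

# Hypothetical candidate structures; each is fitted as y = a + b * f(x).
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "1/x": lambda x: 1.0 / x,
}


def fit_line(us, ys):
    """Least-squares fit of y = a + b * u; returns (a, b, residual sum of squares)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = sxy / sxx
    a = mean_y - b * mean_u
    rss = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
    return a, b, rss


def best_form(xs, ys):
    """Fit every candidate form and return the name with the smallest RSS."""
    fits = {}
    for name, f in CANDIDATES.items():
        fits[name] = fit_line([f(x) for x in xs], ys)
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits


# Made-up, noise-free data generated from y = 1 + 2 * log(x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0 + 2.0 * math.log(x) for x in xs]
best, fits = best_form(xs, ys)
print(best, fits[best])
```

With noise-free data and the true form among the candidates the search is easy; the interesting question is what happens with noisy data and a true form that is not on the list.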
Yes, we'd all like something like that: it means we wouldn't have to think.
The problem is that you will be able to find a model that fits the data well, but you will have no idea how well that model describes the mechanisms that create the data (rather than this particular data set). If you do the analysis on two replicate data sets, will you get the same model? If not, then what do you do? The best model for one data set might be awful for the other.
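The replicate question can be tried out directly. The sketch below simulates two data sets from the same made-up mechanism (y = 1 + 2 * sqrt(x) plus noise), picks the best of a few hypothetical candidate forms on each, and then checks how the winner from the first set, with its fitted weights, does on the second. The mechanism, seed and candidate list are all assumptions for illustration, and whether the two winners agree will depend on the noise.

```python
import math
import random

# Hypothetical candidate structures; each is fitted as y = a + b * f(x).
CANDIDATES = {
    "x": lambda x: x,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "1/x": lambda x: 1.0 / x,
}


def fit_line(us, ys):
    """Least-squares fit of y = a + b * u; returns (a, b)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = sxy / sxx
    return mean_y - b * mean_u, b


def rss(a, b, us, ys):
    """Residual sum of squares of y = a + b * u."""
    return sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))


def simulate(rng, n=30):
    """One replicate from the assumed mechanism y = 1 + 2 * sqrt(x) + noise."""
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * math.sqrt(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_all(xs, ys):
    """Fit every candidate form; returns name -> (a, b, rss)."""
    fits = {}
    for name, f in CANDIDATES.items():
        us = [f(x) for x in xs]
        a, b = fit_line(us, ys)
        fits[name] = (a, b, rss(a, b, us, ys))
    return fits


def winner(fits):
    return min(fits, key=lambda name: fits[name][2])


rng = random.Random(2005)  # fixed seed so the run is repeatable
xa, ya = simulate(rng)
xb, yb = simulate(rng)
fits_a, fits_b = fit_all(xa, ya), fit_all(xb, yb)
best_a, best_b = winner(fits_a), winner(fits_b)

# Apply A's winning form, with A's weights, to replicate B.
a, b, _ = fits_a[best_a]
transfer_rss = rss(a, b, [CANDIDATES[best_a](x) for x in xb], yb)
print("best on A:", best_a, "| best on B:", best_b)
print("A's model on B:", transfer_rss, "| B's own best:", fits_b[best_b][2])
```

Changing the seed, the noise level or the sample size shows how easily the "best" form changes between replicates.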
And what criterion do you use to determine the best model? Different criteria will give you different answers.
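As a small illustration of criteria disagreeing, take AIC and BIC in their usual Gaussian least-squares form (up to a constant): AIC = n * ln(RSS / n) + 2k and BIC = n * ln(RSS / n) + k * ln(n), with k the number of fitted weights. The two models and their RSS values below are invented purely for the arithmetic; only n = 1000 comes from the example above.

```python
import math


def aic(n, rss, k):
    """Akaike information criterion for a Gaussian least-squares fit (up to a constant)."""
    return n * math.log(rss / n) + 2 * k


def bic(n, rss, k):
    """Bayesian information criterion for the same fit (up to a constant)."""
    return n * math.log(rss / n) + k * math.log(n)


n = 1000  # table entries, as in the example above
# Invented (RSS, number of weights) pairs for two hypothetical candidate models.
models = {
    "small": (1010.0, 2),
    "large": (1000.0, 5),
}

by_rss = min(models, key=lambda name: models[name][0])
by_aic = min(models, key=lambda name: aic(n, *models[name]))
by_bic = min(models, key=lambda name: bic(n, *models[name]))
print("RSS picks:", by_rss, "| AIC picks:", by_aic, "| BIC picks:", by_bic)
```

Here raw fit and AIC prefer the larger model, while BIC, which penalises extra weights more heavily at n = 1000, prefers the smaller one.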
This is the sort of idea that sounds good, but sends us professionals into spasms: there are so many problems with it.
Bob
-- Bob O'Hara Department of Mathematics and Statistics P.O. Box 68 (Gustaf Hällströmin katu 2b) FIN-00014 University of Helsinki Finland
Telephone: +358-9-191 51479 Mobile: +358 50 599 0540 Fax: +358-9-191 51400 WWW: http://www.RNI.Helsinki.FI/~boh/ Journal of Negative Results - EEB: www.jnr-eeb.org .