Re: Multilinear regression - techniques and performance
- From: spellucci@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (Peter Spellucci)
- Date: Fri, 18 Nov 2005 13:36:33 +0000 (UTC)
In article <pan.2005.11.18.12.46.23.160069@xxxxxxxxx>,
renderer <no@xxxxxxxxx> writes:
>On Fri, 18 Nov 2005 11:38:53 +0000, Peter Spellucci wrote:
>
>>
>> In article <pan.2005.11.17.21.36.31.924370@xxxxxxxxx>,
>> renderer <no@xxxxxxxxx> writes:
>> >I have about 5,000,000 scalar observations with 2000 independent
>> >variables (Xij, Yi) for i=1..n, j=1..m n=5000000, m=2000.
>> >The matrix Xij is rather sparse.
>> >
>> >At the moment, it takes several hours to do the regression
>> >on a mid-spec PC, using rather primitive ad-hoc methods.
>> >I suspect it could be done much faster, perhaps with
>> >Monte-Carlo methods and/or resampling.
>>
>> a sparse qr or sparse svd should work. you might try svdpack from
>> netlib, lsqr, or similar methods for approximating the best linear least squares
>> solution. coming down to a few seconds is hard to achieve here. already one
>> scalar product wil be in the milliseconds range if the vectors are full.
>> and sparse matrix techniques will involve a lot of index computing, hence
>> savings in arithmetic gained from sparsity may be lost there.
>> simply try this out.
>> look also here:
>> \begin{citation}
>> http://sun.stanford.edu/~rmunk/PROPACK/
>
>Thank you for these links. You have given me some more ideas for
>solving this. Unfortunately, I haven't taken any serious courses on
>Linear Algebra, and my limited knowledge is just what I have picked up
>over the years as an engineer and programmer.
>
>I could probably work through the SVD theory, and use it as
>a "black box" computation, but I'd like to understand a bit more
>about what it could achieve.
>
>If I define:
>
>A = U x S x Transpose(V) for my example problem size above.
>
>Would I be correct to say (for an "economy sized" decomposition with
>sparsity):
>
>U is 5000000 x 2000 matrix
>S is 2000 x 2000 diagonal matrix
>V is 2000 x 2000 general matrix?
>
>I am also assuming that U is very sparse. If U is not sparse, it
>would not be possible to store, and would be very slow to compute.
>Is this correct?
no unfortunately U and V will be full as a rule these are unitary matrices,
hence if htere is coupling through the rows and columns of your A (which certainly
is there) then these matrices will be full. Indeed svdpack does not compute
the full svd, it works iteratively and tries to compute the essential part of
the singular values only in order to make the residual small. I don't know
whta you mean by "Monte Carlo solution" of a least squares problem:
minimzation the sum of squares by a minimizer using random steps?
you could of course try to consider this as a minimzation problem
with only 2000 unknowns with an a bit expensive objective function
and since this is a convex problem you might have success using a
preconditioned conjugate gradient method , taking a partial svd as
a preconditioner but usually this is discouraged since the condition
number of the Hessian of this function (A'*A) is the square of that of
A, andit describes the propagation of erros in the computation.
Hnece my advice: try svdpack or PROPACK first. if this does not satisfy your
needs, then you might try the minimization approach (which is also behind LSQR)
hth
peter
>
>I'm surprised that there aren't any high capacity linear regression
>programs available for free. I haven't found any so far :(
>
the reason is that even if you want to use QR instead, then the "fill in" is
considerable larger than for gaussian lu (which can not be applied here)
and in the range of dimensions you are considering standard methods
come to the limits of current computers capacity
>I'm still curious about the possible Monte-Carlo methods for this,
>since I think they will be more scalable to larger problems, and
>easier for my (small) brain to understand!
>
>Thanks again for the ideas!
>--
>renderer
>
.
- Follow-Ups:
- Re: Multilinear regression - techniques and performance
- From: renderer
- Re: Multilinear regression - techniques and performance
- References:
- Multilinear regression - techniques and performance
- From: renderer
- Re: Multilinear regression - techniques and performance
- From: Peter Spellucci
- Re: Multilinear regression - techniques and performance
- From: renderer
- Multilinear regression - techniques and performance
- Prev by Date: Re: Double precision random number generator in Fortran 77
- Next by Date: numerical differentiation
- Previous by thread: Re: Multilinear regression - techniques and performance
- Next by thread: Re: Multilinear regression - techniques and performance
- Index(es):
Relevant Pages
|
Loading