Re: Multilinear regression - techniques and performance
- From: renderer <no@xxxxxxxxx>
- Date: Fri, 18 Nov 2005 12:43:37 GMT
On Fri, 18 Nov 2005 11:38:53 +0000, Peter Spellucci wrote:
>
> In article <pan.2005.11.17.21.36.31.924370@xxxxxxxxx>,
> renderer <no@xxxxxxxxx> writes:
> >I have about 5,000,000 scalar observations with 2000 independent
> >variables (Xij, Yi) for i=1..n, j=1..m n=5000000, m=2000.
> >The matrix Xij is rather sparse.
> >
> >At the moment, it takes several hours to do the regression
> >on a mid-spec PC, using rather primitive ad-hoc methods.
> >I suspect it could be done much faster, perhaps with
> >Monte-Carlo methods and/or resampling.
>
> a sparse qr or sparse svd should work. you might try svdpack from
> netlib, lsqr, or similar methods for approximating the best linear least squares
> solution. coming down to a few seconds is hard to achieve here. already one
> scalar product wil be in the milliseconds range if the vectors are full.
> and sparse matrix techniques will involve a lot of index computing, hence
> savings in arithmetic gained from sparsity may be lost there.
> simply try this out.
> look also here:
> \begin{citation}
> http://sun.stanford.edu/~rmunk/PROPACK/
Thank you for these links. You have given me some more ideas for
solving this. Unfortunately, I haven't taken any serious courses on
Linear Algebra, and my limited knowledge is just what I have picked up
over the years as an engineer and programmer.
I could probably work through the SVD theory, and use it as
a "black box" computation, but I'd like to understand a bit more
about what it could achieve.
If I define:
A = U x S x Transpose(V) for my example problem size above.
Would I be correct to say (for an "economy sized" decomposition with
sparsity):
U is 5000000 x 2000 matrix
S is 2000 x 2000 diagonal matrix
V is 2000 x 2000 general matrix?
I am also assuming that U is very sparse. If U is not sparse, it
would not be possible to store, and would be very slow to compute.
Is this correct?
I'm surprised that there aren't any high capacity linear regression
programs available for free. I haven't found any so far :(
I'm still curious about the possible Monte-Carlo methods for this,
since I think they will be more scalable to larger problems, and
easier for my (small) brain to understand!
Thanks again for the ideas!
--
renderer
.
- Follow-Ups:
- Re: Multilinear regression - techniques and performance
- From: Peter Spellucci
- Re: Multilinear regression - techniques and performance
- References:
- Multilinear regression - techniques and performance
- From: renderer
- Re: Multilinear regression - techniques and performance
- From: Peter Spellucci
- Multilinear regression - techniques and performance
- Prev by Date: Re: 2nd derivative FD discretisations
- Next by Date: Re: Double precision random number generator in Fortran 77
- Previous by thread: Re: Multilinear regression - techniques and performance
- Next by thread: Re: Multilinear regression - techniques and performance
- Index(es):
Relevant Pages
|