Re: Multilinear regression - techniques and performance



On Fri, 18 Nov 2005 11:38:53 +0000, Peter Spellucci wrote:

>
> In article <pan.2005.11.17.21.36.31.924370@xxxxxxxxx>,
> renderer <no@xxxxxxxxx> writes:
> >I have about 5,000,000 scalar observations with 2000 independent
> >variables (Xij, Yi) for i=1..n, j=1..m, with n=5000000, m=2000.
> >The matrix Xij is rather sparse.
> >
> >At the moment, it takes several hours to do the regression
> >on a mid-spec PC, using rather primitive ad-hoc methods.
> >I suspect it could be done much faster, perhaps with
> >Monte-Carlo methods and/or resampling.
>
> A sparse QR or sparse SVD should work. You might try SVDPACK from
> netlib, LSQR, or similar methods for approximating the best linear
> least-squares solution. Getting down to a few seconds is hard to achieve
> here: even a single scalar product will take milliseconds if the vectors
> are full, and sparse matrix techniques involve a lot of index
> computation, so the arithmetic savings gained from sparsity may be
> partly lost there. Simply try it out.
> Look also here:
> \begin{citation}
> http://sun.stanford.edu/~rmunk/PROPACK/
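The iterative methods mentioned above (LSQR and its relatives) only ever touch X through the products X*b and Transpose(X)*r, which is what makes them attractive for a sparse 5,000,000 x 2,000 matrix: the cost per iteration is proportional to the number of nonzeros. Below is a minimal sketch of that idea, assuming X is stored row by row as a list of {column: value} dicts (a hypothetical layout chosen for illustration). Note that it is CGLS (conjugate gradients applied to the normal equations), not LSQR itself: the two are closely related, but CGLS is less robust on ill-conditioned problems. For real work, SciPy's `scipy.sparse.linalg.lsqr` is an existing LSQR implementation.

```python
from typing import Dict, List

SparseRow = Dict[int, float]  # hypothetical layout: column index -> nonzero value


def matvec(rows: List[SparseRow], x: List[float]) -> List[float]:
    """Compute X @ x, touching only the stored nonzeros."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def rmatvec(rows: List[SparseRow], u: List[float], m: int) -> List[float]:
    """Compute Transpose(X) @ u by scattering each row's nonzeros."""
    out = [0.0] * m
    for ui, row in zip(u, rows):
        if ui != 0.0:
            for j, v in row.items():
                out[j] += v * ui
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def cgls(rows: List[SparseRow], y: List[float], m: int,
         tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Minimise ||X b - y||_2 by conjugate gradients on the normal equations."""
    b = [0.0] * m
    r = list(y)                 # residual y - X b (b starts at zero)
    s = rmatvec(rows, r, m)     # X^T r, the negative gradient of the objective
    p = list(s)                 # first search direction
    gamma = dot(s, s)
    for _ in range(max_iter):
        if gamma <= tol * tol:  # normal-equation residual is small enough
            break
        q = matvec(rows, p)
        alpha = gamma / dot(q, q)
        b = [bi + alpha * pi for bi, pi in zip(b, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(rows, r, m)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return b


if __name__ == "__main__":
    rows = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}]
    print([round(v, 6) for v in cgls(rows, [1.0, 2.0, 3.0], m=2)])
```

Only the nonzeros of each row are visited, so memory use is the sparse matrix plus a few vectors of length n and m; no factor of X is ever formed.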

Thank you for these links. You have given me some more ideas for
solving this. Unfortunately, I haven't taken any serious courses on
Linear Algebra, and my limited knowledge is just what I have picked up
over the years as an engineer and programmer.

I could probably work through the SVD theory, and use it as
a "black box" computation, but I'd like to understand a bit more
about what it could achieve.

If I define:

A = U x S x Transpose(V), where A is the matrix Xij from my example problem above (5000000 x 2000).

Would I be correct to say (for an "economy sized" decomposition with
sparsity):

U is 5000000 x 2000 matrix
S is 2000 x 2000 diagonal matrix
V is 2000 x 2000 general matrix?
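For reference, here is what the decomposition would achieve if it could be computed. Writing y for the vector of observations Yi and b for the unknown coefficients, and assuming A has full column rank (so every diagonal entry s_k of S is nonzero), the least-squares solution follows directly from the three factors. Here U has orthonormal columns u_k and V is orthogonal with columns v_k:

```latex
\min_{b} \lVert A b - y \rVert_2, \qquad
A = U S V^{\mathsf T}, \quad
U \in \mathbb{R}^{n \times m},\;
S = \operatorname{diag}(s_1, \dots, s_m),\;
V \in \mathbb{R}^{m \times m}

\hat b = V S^{-1} U^{\mathsf T} y
       = \sum_{k=1}^{m} \frac{u_k^{\mathsf T} y}{s_k}\, v_k
```

The sum form also shows why the SVD is popular beyond plain solving: terms with very small s_k amplify noise in y, and dropping them (a truncated SVD) is a standard way to stabilise a nearly collinear regression.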

I am also assuming that U is very sparse. If U is not sparse, it
would not be possible to store, and would be very slow to compute.
Is this correct?
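A quick back-of-the-envelope check of the storage question, assuming 8-byte double-precision entries and U stored as a dense array. In general nothing forces U to inherit the sparsity of A, so the dense case is the one to plan for. The constant and function names are illustrative only.

```python
BYTES_PER_DOUBLE = 8              # assumption: IEEE double precision
N_OBS, N_VARS = 5_000_000, 2_000  # n and m from the original question


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes needed to hold a dense rows x cols array of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


u_gib = dense_bytes(N_OBS, N_VARS) / 2**30   # economy-sized U, n x m
s_kib = dense_bytes(N_VARS, 1) / 2**10       # only the diagonal of S is stored
v_mib = dense_bytes(N_VARS, N_VARS) / 2**20  # V, m x m

print(f"U: {u_gib:.2f} GiB, S: {s_kib:.3f} KiB, V: {v_mib:.2f} MiB")
```

Under these assumptions a dense U needs about 74.5 GiB, while S and V together fit in roughly 31 MiB. This is why iterative solvers such as LSQR never form U explicitly.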

I'm surprised that there aren't any high-capacity linear regression
programs available for free. I haven't found any so far :(

I'm still curious about the possible Monte-Carlo methods for this,
since I think they will be more scalable to larger problems, and
easier for my (small) brain to understand!
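One simple Monte-Carlo-flavoured idea, offered as a sketch rather than a recommendation: fit the model on a random subsample of the observations, repeat a few times, and use the spread of the subsample estimates as a rough error bar. With only m = 2,000 unknowns, a subsample much smaller than n = 5,000,000 may already determine the coefficients reasonably well, although uniform sampling can miss rare but influential rows of a sparse X. Everything below is hypothetical illustration (the function names and the row-as-dict layout are my own). Each subsample is solved through the normal equations with plain Gaussian elimination, which is only sensible for small m and well-conditioned data.

```python
import random
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

SparseRow = Dict[int, float]  # hypothetical layout: column index -> nonzero value


def solve_dense(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting on a small dense system."""
    k = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, k):
            f = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


def fit_rows(rows: List[SparseRow], y: List[float],
             idx: Sequence[int], m: int) -> List[float]:
    """Least squares on the chosen rows via X^T X b = X^T y."""
    xtx = [[0.0] * m for _ in range(m)]
    xty = [0.0] * m
    for i in idx:
        items = list(rows[i].items())   # only the nonzeros of row i
        for j, v in items:
            xty[j] += v * y[i]
            for k, w in items:
                xtx[j][k] += v * w
    return solve_dense(xtx, xty)


def subsample_fits(rows: List[SparseRow], y: List[float], m: int,
                   sample_size: int, repeats: int,
                   seed: int = 0) -> List[List[float]]:
    """Refit on `repeats` uniform random subsamples of the observations."""
    rng = random.Random(seed)
    return [fit_rows(rows, y, rng.sample(range(len(rows)), sample_size), m)
            for _ in range(repeats)]


def summarize(fits: List[List[float]]) -> List[Tuple[float, float]]:
    """Mean and standard deviation of each coefficient across subsamples."""
    return [(mean(c), stdev(c)) for c in zip(*fits)]
```

The standard deviations from `summarize` give a feel for how stable each coefficient is; if they are small compared with the precision you need, fitting on the full 5,000,000 rows may not be necessary.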

Thanks again for the ideas!
--
renderer
