Re: sparse polynomial arithmetic



On Apr 3, 8:10 am, pari...@xxxxxxxxxxxxxx wrote:
The problems above use arbitrary precision arithmetic. The input
coefficients are either Maple immediate integers: ((x << 1) | 1) or
pointers to GMP mpz_structs.

Do you mean that the same memory location is used for both, if it's odd
it's an integer, if even a GMP pointer?

Yes, this is Maple's immediate integer format.

Good, so how about a big problem ? Try f := (x+y+z+t+1)^30 times g :=
f+1, and reduce the coefficients mod 32003 first.

You can check it on your PC by running
icas 'a:=symb2poly((x+y+z+t+1)^30,[x,y,z,t]):; a:=(a%32003)%0:;
time(b:=a*(a+1)); size(a);size(b);'
(replacing a:=(a%32003)%0:; by a:=evalf(a):; takes 73s).

On my machine I got:
giac 32-bit int : 346.25 sec
giac 52-bit float : 322.47 sec
sdmp multiprecision: 733.01 sec (previous time)

On the other hand, thanks to your post, I definitively believe I could
improve giac multiplication of polynomials with integer coefficients
using chinese remaindering. It is most probably the best way to multiply
(almost dense) polynomials with large coeffs since you insure that all
intermediate computations are done with small ints.

What about the Kronecker trick ? Or interpolation for that matter.

I don't think sparness is the main problem, but memory certainly is
(512M RAM is not enough).

I disagree. Many approaches work fine on dense problems but suffer or
die on sparse problems. The problem is sorting the terms. For
example, if we multiply two dense polynomials with n terms and add the
partial products one at a time into a sum, the algorithm is O(n^2).
But if the polynomials are sparse this is O(n^3). See Stephen C.
Johnson, "Sparse Polynomial Arithmetic" ACM SIGSAM Bulletin (1974).
Division is exactly the same way, and one can speculate on the effect
that naively coded division had on implementations of Buchberger's
algorithm for all those years. Also related, pseudo division does an
order of magnitude too many coefficient multiplications if implemented
naively. These issues do not appear with dense univariate
polynomials, but if you take those algorithms and run them on sparse
multivariate polynomials, you get a degradation of performance that is
an order of magnitude.

I estimate that your example 3 would require around 500M in giac just to hold
the answer. I could run it up to exponent 10 (2096600 terms, 22.5s) but
after that it swaps too much. I'll try with more memory, I estimate it
will be around 100s.

Example 3 has a nasty surprise:
f := (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^12:
g := (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^12:
multiply p := f*g;
divide q := p/f;

This is a 6188 x 6188 = 5821335 multiplication. The computation is
inherently O(6188^2) or O(6188^2 log 6188) if sorting is included. I
believe you use a hash function so for you the overhead of sorting is
O(1). So, it should be (46376/6188)^2 = 56 times faster than the
Fateman problem. I ran it with floats and giac took 31.01 sec (versus
322.47 above). You lost a factor of 5.4. How ?

Memory access. Your hash function randomly accesses at least
5821335*8/1024^2 = 44.4MB of memory. Unless you have a large fraction
of that in L2 cache, most of the accesses will go to DRAM. That costs
150-200 cycles versus 20 cycles for L2. To get that number, multiply
the latency of the ram in nanoseconds by the clock speed of the
processor in GHz, and add the costs of an L1 and L2 (and L3) cache
miss. As processor speeds go up and RAM speeds fall behind (DDR2 is
half the frequency of DDR, for example) this overhead will get
bigger. These are the issues people need to think about to do high
performance computation.

For this discussion, I should cite some earlier work by Fateman that
I'm sure many readers of sci.math.symbolic will remember:
www.cs.berkeley.edu/~fateman/papers/fastmult.pdf
.