Re: sparse polynomial arithmetic



Hi all,

I found this thread from the Sage mailing list. I'm developing a FLOSS
computer algebra system, called Piranha, geared towards Celestial
Mechanics (like TRIP, but Piranha is much more immature). Homepage is
here:

http://piranha.tuxfamily.org

Currently it deals with polynomials, Fourier series and Poisson
series, and it is undergoing heavy redesigning (which is almost
finished, though). Here are some preliminary benchmark results (64-bit
Core2 laptop, T7100 processor, 1.8GHz, 2MB cache, 2GB RAM, GCC 4.3.0
under GNU/Linux):

Example 1: Fateman's benchmark, double-precision coefficients (16-bit
integer exponents)
f := (1+x+y+z+t)^30;
g := f+1;
multiply p := f*g;

real 0m27.953s
user 0m27.419s
sys 0m0.518s


Example 2: Fateman's benchmark, GMP mpz coefficients (16-bit integer
exponents)
f := (1+x+y+z+t)^30;
g := f+1;
multiply p := f*g;

real 6m44.520s
user 6m39.926s
sys 0m4.401s


Example 3: Roman Pearce benchmark, double coefficients (16-bit integer
exponents)
f := (1+x+y+z+t)^20;
g := f+1;
multiply p := f*g

real 0m0.862s
user 0m0.784s
sys 0m0.058s


Example 4: Roman Pearce benchmark, GMP mpz coefficients (16-bit
integer exponents)
f := (1+x+y+z+t)^20;
g := f+1;
multiply p := f*g

real 0m20.067s
user 0m19.802s
sys 0m0.255s


Example 5: Roman Pearce sparse benchmark, double coefficients (16-bit
integer exponents)
f := (1 + x + y + 2*z^2 + 3*t^3 + 5*u^5)^12:
g := (1 + u + t + 2*z^2 + 3*y^3 + 5*x^5)^12:
multiply p := f*g;

real 0m17.582s
user 0m15.884s
sys 0m1.684s


Example 6: Roman Pearce sparse benchmark (1 billion terms), double
coefficients (16-bit integer exponents)

NOT completed, not enough memory :/ It stops at around 1/4 with ~50%
memory usage, then probably the hash table gets resized but the memory
allocator is unable to satisfy the request for more space. Estimated
running time in the range of 600 seconds.
--------

Regarding the poor performance of the last two tests, I should
probably mention that memory constraints don't permit to use the
fastest algorithm (I've calculated around 5GB of RAM should be needed
for that in the first case), and that the actual multiplication in the
completed test takes ~8 seconds. The remaining time is needed to
decode the result of multiplication back into the general purpose
polynomial structure. Memory usage, unscientifically measured with
top, peaks at around 50% of the available RAM.

Please note that Piranha is still in heavy development and no specific
optimizations on the code have been performed. In particular:

1) no parallelization
2) no SIMD instructions
3) no optimization of cache memory usage (I was thinking about cache
blocking, mainly)
4) standard out-of-the-box C++ data structures from the C++ STL and
the Boost libraries are used (hash_set, set, etc. - optimized data
structures can substantially increase performance for the slower
algorithms, hence especially in the case of sparse multiplication)

The optimizations above are planned for a later stage though.

Thanks for the benchmarks Roman and for the information available from
your web page, it gave me lots of food for thought :)

Best regards,

Francesco Biscani.
.



Relevant Pages

  • Re: Fastcode MM B&V 0.41
    ... > role than the execution order. ... If you leave the system alone, the memory usage scores stays ... Opening up other programs during a benchmark is not ...
    (borland.public.delphi.language.basm)
  • Re: jruby vs mri memory management
    ... Linux 32bit when we benchmarked the Ruby Quiz #157 submissions, ... was not impressed (I didn't check the memory usage though): ... you just had to launch the benchmark to get it. ... JVM). ...
    (comp.lang.ruby)
  • Re: Fastcode MM B&V 0.41
    ... > As the results grid grows, the memory usage scores of the MMs ... If you run a benchmark batch with MMs in the order BucketMM first ... ...
    (borland.public.delphi.language.basm)
  • Excessive Memory Usage with libxml
    ... I've got some basic functionality going, it's not finished, and when ... When I run the benchmark I get weird things happening. ... 10000 iterations were fine ... The memory usage though was stupid at 235Mb Real and 233Mb Virtual ...
    (comp.lang.ruby)
  • Re: sparse polynomial arithmetic
    ... straight out of the box. ... Example 1: Fateman's benchmark (dense) ... # coefficients of f and g are 61 bits ...
    (sci.math.symbolic)