Re: sparse polynomial arithmetic
- From: Bill Hart <goodwillhart@xxxxxxxxxxxxxx>
- Date: Fri, 4 Apr 2008 10:51:00 -0700 (PDT)
On Apr 4, 4:53 pm, rjf <fate...@xxxxxxxxx> wrote:
On Apr 3, 9:48 pm, Bill Hart <goodwillh...@xxxxxxxxxxxxxx> wrote:
On Apr 4, 3:36 am, rjf <fate...@xxxxxxxxx> wrote:
FLINT doesn't use either of the strategies you have in mind, for small
polynomials.
If FLINT neither uses machine integer/float arithmetic, nor calls to
libraries, what does it do?
It doesn't multiply individual coefficients at all for polynomials
that small. It currently uses Kronecker segmentation.
The "large"
integers that result are multiplied by GMP since that is faster than
implementing the multiplication in C. I call directly to GMP's
routines which are basically implemented in assembly for this. In the
range we are talking about, it uses the classical algorithm to
multiply the "large" integers at the cost of about 4.67 cycles per
limb x limb. There is something like 32 cycles of overhead for the
large integer multiplication.
If you are devoted to low-level hacking, it seems evident that GMP big
arithmetic which must deal with carrys
would be doing (potentially) more work than the same program that
knows that carrys never happen.
I have no idea how the program is supposed to know that carries are
never going to happen. If you were writing a function which multiplies
polynomials and the program operates under the *assumption* that
carries will never occur, then of course you can implement it more
efficiently.
FLINT certainly has such a module for polynomials over Z/pZ for a word
sized modulus p, though naturally that has many other optimisations
too. By the way, in that module, I *exclusively* use KS except for
polynomials of length <=8 (or <= 6 if the modulus is bigger than 32
bits).
So I'm not talking out my arse, I've done the work, done the
benchmarking and I know what I'm talking about. There are no calls to
any multiprecision library except to multiply the "large integers"
*if* they fall below our FFT bound (which is 1000 limbs).
But we aren't talking about a fixed precision polynomial module here,
but polynomials over multiprecision coefficients. You can't just
"know" in advance that coefficients aren't going to overflow, you have
to check. That check takes time. Yes it's linear time, if you do the
check properly, but as I pointed out, we don't use coefficient
arithmetic since we use KS. You claim without proof that doing the
check and then using fixed precision is going to be faster than that.
I can tell you it makes a difference only for polynomials of length <=
8 on my architecture. And that is without all the other overhead I
mentioned.
(That is, the limbs etc are polynomial coefficients). This assumes
similar code style, which you may be unwilling to do.
It also assumes that you will implement FFT when needed to be
comparable to GMP.
Our FFT is not just comparable to GMP's, it is up to 2.3 times faster
and typically 1.7 times faster. It can be used for integers quite a
bit smaller than GMP's FFT.
seehttp://www.cs.berkeley.edu/~fateman/papers/fftvsothers.pdf
But of course the polynomials in FLINT are stored in a format which
supports multiprecision coefficients, so any comparison with a format
which only supports up to 64 bit signed coefficients is meaningless.
Not exactly: if you can prove that the computation and the
coefficients in the answer fit in that form, you can use it.
No, it's a meaningless comparison. You are comparing apples and
oranges. Of course you can use a module with a single word allocated
for each coefficient if you want to. But it is meaningless to compare
that to a module which supports multiprecision coefficients. They do
different things.
Nonetheless, the raw time FLINT currently uses is 3 microseconds to
multiply two length ( = degree + 1) 16 polynomials with 29 bit signed
coefficients on a 1.8GHz Opteron machine. That is around 20 cycles per
coefficient by coefficient multiplication on this machine.
I'd actually be surprised if you could do that much more efficiently
in a relatively portable way using machine integer/floating point
arithmetic. You've got memory accesses, loop overhead, coefficient
multiplications, term additions.
In my testing, I think I found that packing lots of coefficients in
long GMP numbers
was better than classical multiplication, but not better than other
methods.
See the paper cited above.
That's complete and utter bollucks. I know because FLINT has the
fastest univariate polynomial multiplication of any package in the
world for dense polynomials with multiprecision coefficients. It is
*uniformly* faster than all other packages at all polynomial degrees
and all sizes of coefficient.
If you think you can multiply polynomials faster than FLINT, I'd be
interested in seeing what you have. But I bet you couldn't even get
within a factor of 5 for most polynomial sizes. You might even be
thousands of times slower for some sizes, just as some very expensive
packages are.
If your algorithm is going to be fairly general, it will need to then
multiply polynomials of different lengths and bitsizes, ensure
overflows are not going to occur, in which case it will have to either
implement logic to detect these as it goes and reallocate the output
accordingly or it will have to check the bit size of the coefficients
before starting the computation to ensure that they are not going to
overflow.
Oh yes. but a reasonably programming environment (like lisp) makes
the general version the simplest to write, and takes maybe 1/3 of a
page of code.
Hacked-up versions that might even make overflow possible, are harder
to write, and are longer.
Yeah, like 30,000 lines of code.
The completely general algorithm in FLINT for polynomials with
multiple precision coefficients, which does memory allocation, selects
the correct algorithm to use, etc, takes 5 microseconds for the
problem mentioned above.
On top of this, in a computer algebra system, you are going to have an
enormous amount of logic to convert from whatever the general format
is for storing polynomials, to this format. You will also have
interpreter overhead etc.
Conversions between formats like this will be linear-time, so might be
negligible. Depends on how much speed is provided compared to
alternatives.
Well, I note that you make this claim without proof. I stick by my
claim.
I really doubt the optimisation you are talking about makes all that
much difference for general polynomial multiplications in a CAS. It
might make a few percent difference, in which case, go and buy a
faster processor for your machine.
I think one resolves such questions by profiling, benchmarking etc.
Guesses don't usually work very well.
I've spent a year and a half doing that. I'm not making very many
guesses when it comes to polynomial multiplication.
How fast can you do the above computation in LISP, or any of the
packages you mention, assuming uniformly distributed random signed
coefficients, and an algorithm general enough to handle overflows,
should they occur, or select the correct algorithm after computing
that overflows won't occur?
Your assumption that a primary determinant of the speed is whether a
program is written
"in Lisp".
??? Is that a question or a statement.
There are piles of ideas in this directory..http://www.cs.berkeley.edu/~fateman/generic/
which expand upon some of these issues.
If you find that your polynomial multiplication routine has to deal
with coefficient overflow, that makes me suspicious of the design.
Who said that!? I already said I use KS. And you can be as suspicious
as you like.
The program that should deal with overflow is the program that does
coefficient arithmetic, usually.
Even if you use KS, if the coefficient isn't going to fit in a word,
you need a multiprecision format for your coefficients.
Bill.
.
- References:
- sparse polynomial arithmetic
- From: Roman Pearce
- Re: sparse polynomial arithmetic
- From: Roman Pearce
- Re: sparse polynomial arithmetic
- From: mabshoff
- Re: sparse polynomial arithmetic
- From: Bill Hart
- Re: sparse polynomial arithmetic
- From: Roman Pearce
- Re: sparse polynomial arithmetic
- From: Bill Hart
- Re: sparse polynomial arithmetic
- From: rjf
- Re: sparse polynomial arithmetic
- From: Bill Hart
- Re: sparse polynomial arithmetic
- From: rjf
- Re: sparse polynomial arithmetic
- From: Bill Hart
- Re: sparse polynomial arithmetic
- From: rjf
- sparse polynomial arithmetic
- Prev by Date: Re: sparse polynomial arithmetic
- Next by Date: Re: sparse polynomial arithmetic
- Previous by thread: Re: sparse polynomial arithmetic
- Next by thread: Re: sparse polynomial arithmetic
- Index(es):
Relevant Pages
|