Re: Square root algorithms and complexity
- From: Don Taylor <dont@xxxxxxxxxxxxxxx>
- Date: Sat, 26 Nov 2005 16:12:07 -0600
Phil Carmody <thefatphil_demunged@xxxxxxxxxxx> writes:
>Don Taylor <dont@xxxxxxxxxxxxxxx> writes:
>> >Don Taylor wrote:
>> >[ ... ]
>> >> Perhaps related question:
>> >>
>> >> Is there a fast method to determine whether a huge odd integer is a
>> >> perfect square or not, when the overwhelming majority of the numbers
>> >> tested will not in fact be squares. Finding the square root is not
>> >> required. I've searched and haven't found any mention of this.
>> ...
>> I found in Cohen, "A Course in Computational Algebraic Number Theory",
>> a mostly table driven method that needs only n mod 64, n mod 45045,
>> and 4 small integer sized table lookups to eliminate all but 6/715
>> non-squares. The rest require a square root calculation.
>>
>> So this will speed the process up more than 100 fold.
>> Finding a way to get another ten fold increase would be helpful.
>Don't always bow down to Cohen. His maths is impeccable, but his
>implementations are quite dated.
I try to conservatively estimate the ratio of an author's skill
and experience with my own.
>45045 smacks of 16-bit-ism, a disease that ought to have died in
>the 80s. Far more factors can be crammed into a 32/52/62/64-bit
>value, depending on your architectural preference.
45045=63*65*11 which were the three sequential moduli, after 64, that
he chose so he could use three small tables. He does say that cite
that other moduli can be used, and references a collection of papers
J.-L. Nicolas, Etre ou ne pas e^tre un carre', Dopo le Parole,
(a collection of not always serious papers for A. K. Lenstra's
doctorate), Amsterdam 1984
Cohen does say that if table size is of no concern that you can
have a single table of size 45045.
If I understand Cohen's description, each additional carefully chosen
moduli can perhaps contribute another factor of two or three in the
discrimination process. So if the moduli don't have to grow too
quickly then it might be possible to squeeze ten of them into your
64 bit word, have ten small tables, and gain an additional perhaps
tenfold discrimination over the description Cohen gave. That would
worthwhile. I will pursue that.
>Assuming you're using a common architecture, you'd probably be
>able to do between 2 and 6 such divisions in parallel at no loss
>of speed. (On the assumption that RAM is slower than multipliers,
>and latencies are >1.). That's up to several dozen small primes,
>and therefore a very respectable rejection ratio.
Is there an example of code or a description of how to write code
that would allow multiple biginteger mods to run in parallel with
no loss in speed?
>Some of the small primes' residues, however, can be extracted without
>doing any multiplication (or division, obviously), by dint of being
>factors of 2^B-1 where B is a small multiple of the word width.
>Residues modulo 3^2, 5, 7, 13, 17, 97, and others, are trivially found
>by reducing N modulo 2^96-1, which can be done using only 3 32-bit
>scratch registers and nothing more than add with carry operations.
>It therefore ought to be limitted only by the speed of reading RAM.
If I understand, this seems similar to what Cohen describes, first
finding r=N mod 45045 and then finding r mod 63, r mod 65, r mod 11,
avoiding the need of doing 3 big integer mods.
>N mod 64 is also unnecessarily limitting. N mod 2^64 (assuming a
>non-toy arch) can be used instead, without a table lookup, in only
>8 instructions with a typical latency of 3 to 5 (times the typical
>latency of simple integer operations). Google for "glorious
>annotated pseudo assembler" for details. Sure, the gains are small,
>but the information is there in that bottom limb, why not use it?
The mod 64 is used because we can create a table of 64 bits, each
indicating whether that can or cannot be a the residue of a square.
If I used 2^64 don't I then need a 2^64 bit table to check this?
And if I haven't made a mistake in my quick check here then it
appears that the ratio of squares to non-squares residues for odd
numbers in the table doesn't change as powers of 2 grow.
>> He does mention a method using Legendre symbol calculations with
>> multiple primes but apparently the table driven method is usually
>> faster.
>I'd do the Legendre calculations using a table-driven method,
>personally, so I'm not sure I see the distinction.
Thank you for your comments. I will study this carefully.
>Phil
>--
>If a religion is defined to be a system of ideas that contains unprovable
>statements, then Godel taught us that mathematics is not only a religion, it
>is the only religion that can prove itself to be one. -- John Barrow
.
- Follow-Ups:
- Re: Square root algorithms and complexity
- From: Phil Carmody
- Re: Square root algorithms and complexity
- References:
- Square root algorithms and complexity
- From: gpapak
- Re: Square root algorithms and complexity
- From: gpapak
- Re: Square root algorithms and complexity
- From: LC Killingbeck
- Re: Square root algorithms and complexity
- From: Don Taylor
- Re: Square root algorithms and complexity
- From: Peter Schorn
- Re: Square root algorithms and complexity
- From: Don Taylor
- Re: Square root algorithms and complexity
- From: Phil Carmody
- Square root algorithms and complexity
- Prev by Date: Re: Defining "<" for the rationals
- Next by Date: Re: Attiude with quaternions-
- Previous by thread: Re: Square root algorithms and complexity
- Next by thread: Re: Square root algorithms and complexity
- Index(es):