Re: SF: Back to theory

From: Nora Baron (norabaron_at_hotmail.com)
Date: 02/27/05


Date: 27 Feb 2005 06:21:53 -0800

Tim Peters wrote:
> [Nora Baron, to JSH]
> >>> You have made this so much more complicated than it needs to
> >>> be. Here is a simpler approach that may accomplish much the
> >>> same and it is also much more general.
> >>>
> >>> Assume M is the number to be factored. Pick a (small)
> >>> integer j. Let T = M - j. Thus T is a function of both
> >>> M and j.
> >>>
> >>> Factor T. Assume you have split it into two factors,
> >>> f and g. Thus T = f*g.
>
> [Tim Peters]
> >> As for James's methods, the test I'll talk about below tries all
> >> possible ways of splitting T=f*g s.t. f and g are integers >= 1
> >> and f >= g. I'm skipping f < g cases since gcd(f-g, whatever) =
> >> gcd(g-f, whatever) (someone may wish to sue me over this, but in
my
> >> universe gcd always returns a non-negative result).
>
> [Nora]
> > Rest assured that at least I will not be filing a lawsuit
> > over this!
>
> Wise choice: I have never been successfully sued for choosing a
convention
> by anyone with a palindromic pseudonym, and I intend to die with that

> flawless record intact.
>

  I shall have to discuss this with my friend, Walt Law.

> >>> Now let X be some rational function of f and g. One
> >>> possible choice might be,
> >>>
> >>> X = (f - g)/(f + g).
> >>>
> >>> Finally, let Y = M / X. Thus M = X * Y.
> >>>
> >>> Note that X and Y are both functions of the factors of
> >>> T. Also both are rational numbers. There is some chance
> >>> that the numerator of X has a factor in common with M.
>

  Of course as you note below, one may as well consider only
X = f - g in the first place. A probably-better variant is
discussed below ...

[snip]

> >> Of course I agree, but I'm almost as insignificant as you are
these
> >> days <wink>.
>
> > Hey, you're competing with a serial liar here!
>
> Sigh. Of all JSH's peculiar online behaviors, claiming that you lie
in your
> posts is the most incomprehensible to me. The "serial" part I can
> understand if I squint, but the "lying" part doesn't work for me even
after
> exhaustingly intense suspension of disbelief coupled with a heavy
dose of
> LSD.
>

  As in many other instances he has his own nonstandard definition
of 'liar'. A 'liar' is someone who disagrees with James Harris.

[snip]

>> > Yes, good points, and I agree that there are advantages to
> > using M^2 - j^2. I was trying to make things as simple as
> > possible.
>
> I hope you to stick to that, too, at least until it's better
undertood. As
> was showed later in this msg, it was tweaking to try a much larger
number of
> splittings that gave a dramatic boost to your idea in testing, not
really
> complicating the base idea.
>
> ...
>
> >> Here's "the standard test" output after picking 1000 pairs of
15-bit
> >> primes at random (I gave a precise account of what that means in
an
> >> earlier post), and using this algorithm to factor their product:
>
> I want to elaborate on that, because some results _may_ be artifacts
of the
> way primes are chosen. I pick an N-bit prime like so: pick a random

> integer i uniformly from the range 2**(N-1) <= i < 2**N, and then
find the
> smallest prime >= i. But not all primes in range are equally likely
to get
> picked -- due to the way I'm choosing them, the probability of
picking a
> particular prime p is proportional to 1 + the number of composites
> immediately preceding p. I can't imagine why that would bias the
results,
> but can't swear that it doesn't. Another thing to test, anyway.
>
> ...
>
> >> "expected random" is the expected (mean) number of gcds that would
> >> have been needed to factor the same products by a method that
> > repeatedly picks k purely at random from 1..M-1 until
> >> 1 < gcd(k, M) < M.
>
> I want to elaborate on that too, because some results may be due to
errors
> in my analysis.
>
> I view picking gcd candidates at random as independent Bernoulli
trials.
> When M=pq, there are M/p-1=q-1 numbers in 1..M-1 divisible by p,
M/q-1=p-1
> divisible by q, and none divisible by p*q. So the probability of
picking a
> k at random in 1..M-1 that reveals a factor is x=(p+q-2)/(M-1).
>
> The number of trials until the first success then follows the
geometric
> distribution, and has mean 1/x = (M-1)/(p+q-2). I round that
expression to
> the nearest integer in the code. One possible insecurity is that the

> variance is (1-x)/x^2, and that grows without bound as M increases
(so that
> x approaches 0). This makes it hard to conclude anything from a
single
> trial, which is why I try to use 1000 (when there's enough time).
>
> ...
>
> >> I can slash the number of gcds needed, and speed it by a factor >
3,
> >> just by iterating over all the ways to split T^6 (instead of plain
T)
> >> as f*g. But that leads to a much bigger surprise!
> >> Like so:
> >>
> >> nbits 15: 1000 of 1000 factored, 100%
> >> gcds: actual 7,765,039 expected random 12,139,989 no factor 0
>
> > Hmm. That is a little surprising.
>
> It still is to me, and I'll give more evidence of better-than-random
> behavior later. Something still needs an explanation here!
>
> > The 100% factored part is NOT surprising. Here's why. Say
> > M = p * q, where p and q are prime. Let j = p - 1. Thus
> > T = M - p + 1. Hence let f = M - p + 1 and g = 1. Then
> > f - g = M - p + 1 - 1 = M - p, which factors as p*(q - 1), hence
> > gcd(f - g, M) = p.
>
> Right, and I realized your *un*tweaked method would always succeed.
It
> wasn't (still isn't ...) obvious to me whether it would still always
succeed
> after I tweaked it to look at splittings of T^6 instead of T, and the

> argument you just gave is for the untweaked case. It doesn't apply
directly
> to the T^6 case, and in fact picking j=p-1 doesn't always give a
factor in
> the T^6 case. For example, at M = 15 = 5*3, pick j = 5-1 = 4. Then
T=11.
> As you proved, when splitting over T, f=11 and g=1 does the trick:
gcd(10,
> 15) = 5. But there's no way to split 11^6 "that works": gcd(11^i -
> 11^(6-i), 15) = 15 for all i in 0..6. That's easy to see from a
small
> table:
>
> i 11^i mod 15
> -- -----------
> 0 1
> 1 11
> 2 1
> 3 11
> 4 1
> 5 11
> 6 1
>
> Since 11^i = 11^(6-i) mod 15 for i in 0..6, 15 divides f-g for all
> splittings of 11^6.
>
> > So can one do better with T = M^2 - j^2? Certainly one can do
> > no worse since M - 1 divides M^2 - 1.
>

  I did some hand calculations which indicate that, with T = M - j
and T factored as f * g, letting X = f^2 - g^2 may be substantially
faster than X = f - g. Plus again it is easy to show that it
will always factor.

  Here is an example:

     M = 391

     j = 1 --> T = 390 = 30 * 13 --> f = 30, g = 13

     Thus f^2 - g^2 = 900 - 169 = = 731, and

     GCD(731, M) = 17.

  For small M, it seems to work (unreasonably) often with j = 1.
Not invariably, however = e.g., for M = 143, you have to go as far
as j = 3: T = 140 = 20 * 7 --> f^2 - g^2 = 351 = 13 * 27.

> Knock yourself out <wink>. I'm keener right now to account for the
> better-than-random behavior of what's already on the table. That's
not
> jaw-droppingly amazing, of course -- it's easy, e.g., to give a
factoring
> method for two-prime products that always succeeds with a single gcd
> (multiply all the primes through the square root of M, and do one gcd
with
> that). The thing that intrigues about _this_ method is that f-g
still just
> "looks like" poke-and-hope to me. If it's better than poke-and-hope,
there
> must be a reason for that.
>

  Certainly your experiments look like a lot more than chance
deviations from poke-and-hope expected. Try X = f^2 - g^2 also,
if you don't mind.

> >> Congratulations! This is the first "surrogate factoring"
algorithm
> >> I've tested that performs better than random-gcd (well, it's still
> >> much slower than random-gcd, but time isn't the measure here). I
> >> suggest we christen it the Baron-Harris method <LOL>.
>
> > I was hoping to patent it. Oh well.
>
> Change the exponent to 7, and it's all yours <0.1 wink>.
>
> > Anyway, your using it with T^6 makes it Baron-Harris-Peters.
>
> No, sadly for me, James was the first to use exponent 6, so that's
his idea.
> I don't have enough imagination to try exponents other than 6 either
-- I
> have to leave breakthroughs like that to the real mathematicians.
>
> >> I don't have time to think about "why" now, but it sure was a
surprise
> >> to me -- all I had in _mind_ by moving to T^6 was to increase the
> >> quality of the pseudorandom number generator "f-g".
> >>
> >> Looks like that holds for products of 20-bit primes too (these
take a
> >> _lot_ longer to run, so I stopped after 75 -- out of time for
tonight):
> >>
> >> nbits 20: 75 of 75 factored, 100%
> >> gcds: actual 13,320,947 expected random 28,033,396 no factor
0
>
> I let "my usual" 1000-trial test run overnight, and the end result
was a
> little worse, but not much worse, than that:
>
> nbits 20: 1000 of 1000 factored, 100%
> gcds: actual 209,981,730 expected random 383,897,119 no factor
0
>
> Because of the very high variance of the "number of trials needed for
first
> success" statistic for a Bernoulli process with such low probability
of
> success per trial, I also tracked how many times the algorithm beat
the
> expected-random # of trials: 844 of 1000. That seems plain
impossible if
> it beats expected-random only by chance. In the worst case the
algorithm
> used about 800000 more gcds than expected-random, and in the best
case about
> 500000 fewer. The two median values of
>
> expected_random - actual
>
> were 226468 and 226618, and the mean was 173915. Since the mean
value of
> expected_random was 383897, that's a big difference. It all adds up
to
> strong evidence of better-than-random behavior to my eyes.
>

  Yes - curious. No, wait, I mean "fascinating, wacky, cool".

> Just FYI, here's the distribution of j values (the first j at which a
factor
> was found); one "*" = 2 products with that j:
>
> 1 ******************************************************
> 2 ****************
> 3 ***********************
> 4 *************
> 5 *****************************
> 6 ****
> 7 ***********************************
> 8 *********
> 9 **********************
> 10 ******
> 11 *******************************
> 12 ****
> 13 *************************************
> 14 *****
> 15 ********
> 16 ***
> 17 ******************************
> 18 **
> 19 *******************
> 20 *****
> 21 ******
> 22 *****
> 23 ******************
> 24 *
> 25 **************
> 26 ***
> 27 ***
> 28 ****
> 29 ****************
> 31 *********
> 32 *
> 33 ***
> 34 **
> 35 **********
> 36 **
> 37 *****
> 38 ***
> 39 ****
> 40 *
> 41 ********
> 43 *****
> 44 **
> 45 *
> 46 *
> 47 ******
> 49 ***
> 50 *
> 51 *
> 52 *
> 53 ****
> 57 *
> 58 *
> 59 ***
> 61 **
> 62 *
> 63 *
> 64 *
> 65 **
> 67 **
> 71 *
> 73 *
> 83 *
> 85 *
> 87 *
> 95 *
> 127 *
>

  Nice way to display the data.

> As with all the specific JSH algorithms tested too, odd j is luckier
than
> even. I think that's "just because" odd j forces T=M-j to have a
small
> factor (namely 2), and that in turn tends to increase the number of
> splittings to try.
>

[snip]

>
> > Amazing! And reassuring, regarding the sanity.
>
> Ah, you like those! OK, here are a thousand more. Heh -- kidding.
I'll
> just give the worst one from that run (in the sense of the first win
> appearing at the largest j):
>
> M = 774661 * 755087 = 584936450507
> j = 127
> T = M-j = = 2^2 5 599 997 48973
> T^6 = obvious
> f = 6629023236001399900589330376301953749216234168750
> g = 6042303984767482296320
> check for yourself that T^6 = f*g (it does)
> gcd(f-g, M) = 774661 bingo
> number of gcds tried = 1,178,023
>
> ...
>
> >> Not to mention that Baron-Harris probably always finds a factor,
>
> > See above.

  Almost the identical argument works for X = f^2 - g^2.

>
> Ditto (there's no proof yet on the table that it always works when
splitting
> T^6, although it almost certainly does -- heck, a proof may even be
easy).
>

  May be. I haven't looked.

> ...
>
> >> Yup. The other half is that if M _could_ be factored from a
> >> factorization of T efficiently, it's more than a little puzzling
that
> >> nobody yet has found a way to do that. Well, maybe tomorrow's
Baron-
> >> Harris-Decker-Filho method will finally crack that nut.
>
> > Baron-Harris-Peters-Decker-Filho. I'm not holding my breath.
>
> I will, but not before Decker and Filho commit to devoting their
lives to it
> too <wink>.

  Conceivably we all have better things to do.

  Nora B.



Relevant Pages

  • Re: SF: Back to theory
    ... > splittings that gave a dramatic boost to your idea in testing, ... > I view picking gcd candidates at random as independent Bernoulli ... > success" statistic for a Bernoulli process with such low probability ... I also tracked how many times the algorithm beat ...
    (sci.crypt)
  • Re: Simple answer, surrogate factoring
    ... and take the gcd of Ax with M. ... outermost parentheses here is surprising, so is this really what you meant ... generally has few splittings, and especially when sticking to j=1. ... this is by far the fastest non-factoring algorithm of this kind I've ...
    (sci.math)
  • Re: Simple answer, surrogate factoring
    ... and take the gcd of Ax with M. ... outermost parentheses here is surprising, so is this really what you meant ... generally has few splittings, and especially when sticking to j=1. ... this is by far the fastest non-factoring algorithm of this kind I've ...
    (sci.crypt)