Re: JSH: The will to lie



Jesse F. Hughes wrote:

David Bernier <david250@xxxxxxxxxxxx> writes:


David Bernier wrote:

William Hughes wrote:
[snip]

I think it would be interesting to see by how much the sequence:
5-3*floor(5/3), ... 1000000007-3*floor(1000000007/3)
can be compressed by Winzip, Gzip, bzip2, etc.

In the primes mod 3 sequence, for primes>3 and < 10^7, we get a finite
subsequence of 664,577 numbers in {1,2}.

I used a simplistic encoding 1 -> a byte '1', 2 -> a byte '2'.
A 664,577-byte file didn't even compress by a factor of 8 using the
Winzip default. I think I got 84% reduction with the PPmD option;
but that leaves 16%, or 0.16 > 1/8.

The obvious thing is to code in binary, and then compress. But I've
put that on hold. Another possibility is to develop a custom
compressor based on all of Tim Peters pretty considerable
computations and comments. It's not clear to me how much
pseudo-entropy is lost for subsequences of length ~= pi(10^9) using
the correlations he found. I think a realistic probabilistic model
for these correlations could be based on Markov chains, but I don't
know how to compute entropies of Markov chains or processes.


Isn't the obvious compression algorithm for this sequence the one you
give above?

The sequence is given by a simple algorithm. All you need to know is
how many times you want to repeat that algorithm to generate the
initial segment. So it seems to me that you can compress a sequence
of length n to one of log_2(n) + C or so (where C accounts for the
code that calculates the next prime and then mods by 3).

Right. What I would like to know is how well
general purpose compressors (that don't try to find
correlations with sequences of primes) can do.

In the best of worlds, we would like to pack blocks of 8 consecutive
values of James's sequence into one byte, which is 8 bits. After
that, we can compress this packed file using any number of
compression programs. What I did amounted to getting a byte
which prints to a '1' or a '2'. In ASCII, a character '1' is
49 in decimal, 31 in hexadecimal and 00110001 as a block of 8 bits.
Similarly, a character of '2' is 50 decimal, 32 hex. or 00110010 as a
block of 8 bits. So any compressor looking at my 664,577-byte file
will see bunches of '00110001's and '00110010's , where each
bunch represents one term of James's sequence. I'd expect a file
with 100,000,000 terms of James's sequence to be compressed
to a greater extent. But I am making some progress towards
packing 8 terms in one byte. Maybe PPM or whatever can do
better if it's all binary:
http://en.wikipedia.org/wiki/Prediction_by_partial_matching

Or am I just mistaken on what you're doing. I assume you're looking
at the Chaitin/Kolmogorov/Solovay complexity of the sequence to see
how random it is in their sense. But since the sequence is defined by
a simple algorithm, the answer is "not at all".

(see above)

David Bernier
.



Relevant Pages

  • Re: Probability, compressible sequences: Backgrounds
    ... noise is caused simply because people do not use adequate definitions ... A "specific sequence" cannot be random. ... How can you compress a specific sequence wthin the decompressor to ... external trigger (me, dropping the coin). ...
    (comp.compression)
  • Re: Compression leads to encryption NEW COMPRESSION METHOD!
    ... Can you please present a simple pseudo code example of your alorithm? ... Looking at your PPT, it seems you have a simple substitution system inplace. ... in a 16 bit sequence, and replace it with the most compressible (via my ... still get 5 to 8 cycles before we can no longer compress, ...
    (sci.crypt)
  • Re: Probability, compressible sequences: Backgrounds
    ... In the compressed sequence you need to insert also the decompressor ... How can you compress a specific sequence wthin the decompressor to ... I'm not using the coin experiment to define what a random process is. ... the only reason why we use compressors is that we model data ...
    (comp.compression)
  • Re: JSH: The will to lie
    ... is more likely to be followed by a residue of 2> ... The obvious thing is to code in binary, and then compress. ... Isn't the obvious compression algorithm for this sequence the one you ... The sequence is given by a simple algorithm. ...
    (sci.math)
  • Re: Effective Lossy Compression of Bitstream
    ... I want to compress lossy this bitstring, ... the generated sequence is bounded by a threshold? ... then I could transmit the function to recreate the sequence. ...
    (comp.compression)