Re: More of an Algorithems question



On Sun, 20 Nov 2005 02:04:07 GMT, LC Killingbeck
<NOTlynn.killingbeck@xxxxxxxxxxxxxxxx> wrote:

>quasi <quasi@xxxxxxxx> wrote in news:d19qn1pbiaukij1tmslcr6p4e1e2hda5b5@
>4ax.com:
>
>> On Fri, 18 Nov 2005 00:11:37 GMT, LC Killingbeck
>> <NOTlynn.killingbeck@xxxxxxxxxxxxxxxx> wrote:
>>
>>>quasi <quasi@xxxxxxxx> wrote in
>>>news:t7ipn118hl17f4p3sbtetcnsqv7jj250f3@xxxxxxx:
>>>
>>>> On Thu, 17 Nov 2005 13:13:06 -0500, quasi <quasi@xxxxxxxx> wrote:
>>>>
>>>>>On Thu, 17 Nov 2005 17:59:28 +0000 (UTC), Dave Seaman
>>>>><dseaman@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>>On Thu, 17 Nov 2005 12:51:17 -0500, quasi wrote:
>>>>>>> On Thu, 17 Nov 2005 17:21:45 +0000 (UTC), Dave Seaman
>>>>>>><dseaman@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>>>On Thu, 17 Nov 2005 12:15:12 -0500, quasi wrote:
>>>>>>>>> On Thu, 17 Nov 2005 15:07:58 GMT, cri@xxxxxxxx (Richard Harter)
>>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>>On 15 Nov 2005 14:55:51 -0800, "Filter"
>>>>>>>>>><filtermedialtd@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>
>>>>>>>>>>>How can I proove that no Algorithm can compress every file of
>>>>>>>>>>>length 10^6?
>>>>>>>>>>
>>>>>>>>>>However there is an algorithm that will losslessly compress
>every
>>>>>>>>>>file of length 10^6 except one.
>>>>>>>>>>
>>>>>>>>
>>>>>>>>> I don't believe that.
>>>>>>>>
>>>>>>>>> The pigeonhole principle applies easily to invalidate such a
>>>>>>>>> claim ...
>>>>>>>>
>>>>>>>>> Let n=10^6.
>>>>>>>>
>>>>>>>>> There are 2^n possible files with n bits, but only 2^(n-1) files
>>>>>>>>> with n-1 bits. Hence, at most half the files can be compressed.
>>>>>>>>
>>>>>>>>There are 2^n-1 files with fewer than n bits. Hence, the claim
>>>>>>>>stands.
>>>>>>
>>>>>>> Not by my count.
>>>>>>
>>>>>>> For example, 16 possible files with 4 bits but only 8 possible
>>>>>>> files with 3 bits, no?
>>>>>>
>>>>>>Are there any other files with fewer than 4 bits that you have
>>>>>>overlooked in your count?
>>>>>
>>>>>Ok, I see my confusion.
>>>>>
>>>>>I failed to take into account files with less than (n-1) bits.
>>>>>
>>>>>I get it now -- thanks.
>>>>>
>>>>>quasi
>>>>
>>>> Ok, but then along the lines of what I was thinking, the following
>>>> claim is valid:
>>>>
>>>> In any lossless compression scheme, less than half of the files with
>n
>>>> or fewer bits can be compressed.
>>>>
>>>> quasi
>>>
>>>Isn't "less than half" even stronger as "none"? A 1-bit file cannot be
>>>compressed, since the only smaller file (0 bits) is needed to
>>>represent itself. Then a 2-bit file cannot be compressed, since all of
>>>the 0- and 1-bit files are needed to represent themselves. Induction.
>>
>> Yes, I saw that the two 1-bit files were a problem as far compression,
>> but I was assuming that a compression scheme is allowed to compress
>> some files while potentially expanding others. Thus, a compression
>> scheme could map some files of <=n bits to more than n bits, and, as
>> you point out, something like that is necessary. If anything gets
>> smaller, something has to get bigger -- I guess it's a kind of
>> conservation law for bits.
>>
>> quasi
>>
>
>Can one conserve the total size, over all files? For example, suppose we
>have all 5-bit and smaller files, and some 5-bit file compresses to a
>2-bit file. The original 2-bit file has to become larger - and there is a
>natural empty slot where the uncompressed 5-bit file came from. The net
>effect is that the 2-bit and the 5-bit file are simply interchanged, with
>one compressing and the other growing, while the total size over all
>files remains constant.
>
>If the above holds, and if there is a theorem that any random
>permutation of the integers 1,2,3,...,N can be re-ordered (or, even,
>placed into any other order) by pair-wise interchanges, then the
>result should be a theorem that the total size can be retained (or,
>of course made bigger). The truth of the pair-wise swap idea seems
>obvious: works trivially for N=2; and for N>2 do a swap that moves
>any number into its final position, reducing the remaining numbers
>to the same problem with only (N-1) to re-order.
>
>Lynn Killingbeck

Yes, as you suggest, any permutation on the set of n-or-less bit files
can be used to define a compression scheme.

Note that if such a compression scheme is used on the full set of
n-or-less bit files, less than half of the files will get smaller.

Of course, as I pointed out earlier in this thread, the pigeonhole
principle implies that in any compression scheme, less than half of
the full set of n-or-less bit files will get smaller.

Also, for the permutation scheme, at least one set must also end up
with exactly the same number of bits, so might as well let the
permutation fix that one.

quasi
.



Relevant Pages

  • Re: how do I do DCT on a 100x100 image?
    ... huhua wrote: ... lower frequencies less and usually leaves the DC element alone). ... compression techniques which are usually lossless. ... typical compression scheme you will get much greater data ...
    (comp.dsp)
  • Re: Optimize file
    ... not supported (has to do with compression scheme used LZW ... apps I have on my PPC. ... support for them was simply dropped from PPC image viewers ...
    (microsoft.public.pocketpc)
  • Re: Data encryption 360 degrees the nsa cannot break -- 01
    ... :_very_ good compression ratio would be 10000:1. ... An *average* lossless compression ratio of 10000:1 is not possible. ... Examined over all possible input files, no compression scheme can ... But that's over all possible input files. ...
    (comp.security.misc)
  • Re: compression API available in Java & C++?
    ... > I'm looking for a compression algorithm that works better than gzip ... No general-purpose compression scheme is going to be very much help for such ... The gzip compression scheme can be pre-trained ...
    (comp.lang.java.programmer)
  • Re: AES 256 key and anti-key
    ... the same permutation, and there is nothing wrong about that. ... Guaranteeing a surjective mapping from the set ... One doesn't even need to worry about anyting. ... Same with Burrows Wheeler compression systmes it bugged me ...
    (sci.crypt)