Re: How much lossless compression is possible in images?

From: Mxsmanic (mxsmanic_at_hotmail.com)
Date: 12/07/04


Date: Tue, 07 Dec 2004 05:18:39 +0100

Richard Tobin writes:

> Why?

Because compression only works if some messages of a given length are
more probable than others of that same length. If messages are
completely random, they are all equally probable. And if random
messages must be preserved bit-by-bit, it is not possible to represent
them with fewer bits than they originally contained. The entropy of the
messages must be preserved.

> Consider the algorithm that precedes all input strings with a 1-bit.
> It makes all strings longer.

This isn't really a compression algorithm in the sense being discussed
here.

A compression algorithm is simply a substitution algorithm. It performs
a one-for-one substitution of variable-length messages from one set A
for fixed-length messages from another set B. Set A is designed such
that the messages it contains corresponding to the most probable
messages in B are shorter than the fixed-length of the messages in B;
all other messages in A are longer. Since the most probable messages in
B are translated to shorter messages in A, the algorithm effectively
compresses messages. But this only works if the messages in B are not
all equally probable. If they are equally probable (as they must be if
they are random), there is no way to compress them, because a longer
message from A is just as likely to be substituted as a shorter message.

Note that the total length of all messages in set A must be at least
equal to the total length of all messages in set B. So if random
substitution of messages from A is forced by equal probability for all
messages from B (random messages in B), then the net result is zero
compression.

> Perhaps you don't consider that a compression algorithm.

I don't. A more rigorous definition can be found above.

> So maybe you mean an algorithm that, on average, doesn't change the
> length (which is the "best" kind of algorithm).

All algorithms will produce output of equal or greater length than input
if the input is random.

> Maybe it makes one
> very long string small, and lots of others just lightly bigger. This
> will obviously not make 50% of images smaller.

It will, however, guarantee that the average length change for all
messages is zero, which is nearly the same thing from a practical
standpoint.

-- 
Transpose hotmail and mxsmanic in my e-mail address to reach me directly.


Relevant Pages

  • Re: Paying for assistance
    ... forget about the input being 8 bit sequences. ... This is just uninteresting, and forget about the dictionary, this is only a technicality and we can assume without loss of generality that the input is sorted by probability. ... The expected lengths of files 1,2,3 are then given by multiplying the expected bit count by the entropy of the given file, i.e. ... them 'bait' for a compression algorithm. ...
    (comp.compression)
  • Compression is Equivalent to General Intelligence
    ... intelligence and compression that needs to be cleared up. ... (encoded as strings). ... The only possible probability distribution over ... The problem of finding the shortest program consistent with an agent's ...
    (comp.ai)
  • Re: optimizing huffman coding (especially for natural language)
    ... discover a set of symbols such that the probability of each symbol is ... Sounds like Shanon-Fano algorithm, ... improves compression a lot. ...
    (comp.compression)
  • Re: kosher jpgs
    ... compress a jpg with its proprietary jpg compression method. ... It is already compressed using a lossy compression algorithm. ... Lossy compression allows you to throw out entropy, while lossless only minimizes non-entropy data. ... definition of probability, and probability is only definable with ...
    (comp.lang.java.programmer)
  • Re: Compression by Use of Info String
    ... messages, you need to establish that probability of occurrence of each message, ... The PPMD encoding assumes that there ... So "Optimal Compression" is at best a theoretical idea, ... practical approach for achieving in practice. ...
    (comp.programming)