Re: Word count of minimum vocabulary



On Sat, 08 Jul 2006 12:10:18 +0800, Lee Sau Dan wrote:

"Richard" == Richard Wordingham <jrw0602@xxxxxxxxxxx> writes:

Richard> It depends on how you handle the dictionary. If each
Richard> file has its own dictionary, then that is not an issue.

Don't the LZ-based algorithms essentially build a brand new dictionary
for every compression session? And this dictionary is not restricted
to English words. It can also encode highly frequent digraphs (like
"an", "on") and trigraphs (such as "the", "ion") as well as frequently
occurring phrases. It considers a superset of the candidate tokens that
your algorithm considers, and chooses the best ones from this superset.
You have to explain why your algorithm can
do better by not considering some potentially good candidates that LZ*
would examine.
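A toy LZ78 parser shows Lee Sau Dan's point that an LZ dictionary is not limited to whole words. This is a minimal sketch, not any particular compressor's implementation. The function names `lz78_parse` and `lz78_decode` and the sample text are made up for illustration. LZ78 is used here because it keeps an explicit phrase dictionary, whereas LZ77 refers back into a window of recent text.

```python
from typing import Dict, List, Tuple


def lz78_parse(text: str) -> Tuple[List[Tuple[int, str]], Dict[str, int]]:
    """Greedy LZ78 parse of text (hypothetical helper for illustration).

    Returns the (phrase index, next character) pairs and the phrase
    dictionary built along the way. Index 0 is the empty phrase.
    """
    dictionary: Dict[str, int] = {"": 0}
    pairs: List[Tuple[int, str]] = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            # Keep extending the longest phrase already in the dictionary.
            phrase += ch
        else:
            # Emit the known prefix plus one new character, and learn the result.
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with no extra character.
        pairs.append((dictionary[phrase], ""))
    return pairs, dictionary


def lz78_decode(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the text; the decoder grows the same phrase list as the encoder."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        entry = phrases[index] + ch
        out.append(entry)
        phrases.append(entry)
    return "".join(out)


if __name__ == "__main__":
    pairs, dictionary = lz78_parse("the then there the")
    # Multi-character entries learned from the sample text.
    print(sorted(p for p in dictionary if len(p) > 1))
```

On this sample the learned entries are fragments such as "th", "en", "he" and "e ". The last one spans a word boundary. None of them is a dictionary word, and a word-based token selection would never consider them.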

The LZ78-based algorithms build the dictionary dynamically, so it
grows as they go through the text. (LZ77 uses a "sliding window",
which is similar in effect.) So if you have a static dictionary that
fits your text well, compression ratios will be better, particularly
at the beginning. Of course, that holds only if you don't have to
transmit the dictionary along with the text.
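Python's standard `zlib` module can be used to try Joachim's point. It implements deflate (LZ77 plus Huffman coding) and accepts a preset dictionary through the `zdict` argument. The compressed stream carries only a 4-byte checksum of that dictionary, not the dictionary itself, so the receiver must already have it. The sketch below is illustrative only. `PRESET`, `MESSAGE` and the helper names are made-up assumptions.

```python
import zlib
from typing import Optional

# Hypothetical static dictionary: text that sender and receiver are both
# assumed to hold in advance, so it never travels with the message.
PRESET = (
    b"the compression of the text depends on the dictionary; "
    b"a static dictionary that fits the text well helps most at the beginning"
)

# Hypothetical short message whose phrases mostly occur in PRESET.
MESSAGE = b"a static dictionary that fits the text well helps the compression"


def compressed_size(data: bytes, zdict: Optional[bytes] = None) -> int:
    """Size in bytes of the zlib stream for data, with or without a preset dictionary."""
    comp = zlib.compressobj() if zdict is None else zlib.compressobj(zdict=zdict)
    return len(comp.compress(data) + comp.flush())


def roundtrip(data: bytes, zdict: bytes) -> bytes:
    """Compress and decompress with the same preset dictionary on both sides."""
    comp = zlib.compressobj(zdict=zdict)
    blob = comp.compress(data) + comp.flush()
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    print("adaptive only:    ", compressed_size(MESSAGE), "bytes")
    print("preset dictionary:", compressed_size(MESSAGE, PRESET), "bytes")
    assert roundtrip(MESSAGE, PRESET) == MESSAGE
```

For a short message the gain comes from back-references into the preset text that the adaptive window has not yet had a chance to learn. With much longer inputs the window fills with the message's own text, so the advantage should be expected to shrink.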

Joachim


