One Way To Output LZ77 codes

Giganews Newsgroups
Subject: One Way To Output LZ77 codes
Posted by:  Gerald Tamayo (comp…@gmail.com)
Date: Wed, 12 Feb 2020

Reduced Length LZ (RLLZ)

Let me share here my one way to output LZ77 codes, which may improve LZ77/L=
ZSS or LZW.

LZ77:

LZ77 coding transmits <offset, length> codes. Many times in the compression=
process, the same matching string is encoded with an <offset> and the same=
<length> code. We can avoid transmitting the <length> code for many string=
s as well as not outputting a bit to identify a literal (as in LZSS) by the=
following:=20

Instead, read the whole input or block  and gather duplicated or similar st=
rings and encode <the whole string> (e.g., <length of the string (n>=3D2)> =
plus the actual string of characters) only this one time, <the number of th=
e same matching strings (i.e., number of following offset codes)>, and its =
succeeding occurrences in the input block by transmitting only the said <of=
fset> codes. (Or you can use an escape code to end the offset codes, perhap=
s BLOCK_SIZE.) Do this for all duplicated strings. This is actually a "Redu=
ced Length LZ (RLLZ)".=20

The literals are outputted last *without bit flags* since they fall into th=
e block buffers not covered or "not activated" by encoded strings. So just =
one array of literals can be outputted maybe at the end of the file, or, si=
nce block-based coding is more practical for shorter offset codes, at the e=
nd of the appropriately-sized block of <offsets>.

During decoding, the strings are written first in the output or write buffe=
r using the offsets, and the literals "fill in" the unwritten positions or =
"holes" in the write buffer. This makes very compact LZ.

                ***

No need to output bit flags for literals or the number of following consecu=
tive literals in some algorithms.=20

Then the completely filled write buffer is written to file.=20

That is, e.g. after gathering all distinct repeated strings on a block,=20

1) no transmission of <length> code for the next occurrences of the same st=
ring (but, in the simplest way, you have to output the number of succeeding=
strings);=20

2) transmitting the literals last (in the output buffer) means no need to t=
ransmit *bit flags* for literals (and matches). This is the novel or surpri=
sing idea here: deferred literals output;=20

3) if you know (LZT) LZ-Tamayo (2008) algorithm where it was demonstrated c=
omplete exclusion of the <length> code, this might as well be "LZ-Tamayo2".=
Should improve LZ77/LZSS/LZW based compressors. Decoding is also straight-=
forward.=20

Sorry that only now after a decade of LZT (2008 ) i am releasing this. I st=
opped coding compression in 2010 or 2011. (Note: it seems there is already =
"LZT" before i named my algorithm. LZ-Tischer.)=20
***

I have similar ideas as "rep-codes" in 2000s. My compression ideas are from=
the late 90s when i became interested again in data compression.=20

The ones i call here "holes" they call "gaps", even in LZW. But the idea of=
deferring transmission of literals for an output buffer avoids bit flags f=
or both literals and matches which is still used in some explanations of RO=
LZ. Other algorithms still need to output the number of following literals,=
or literal_length. Deferred, meaning this is not an "online" algorithm.=20

-- Gerald R. Tamayo

(reposted, edited from Sept. 2018)

Replies