- You are here
- Comp Newsgroups Archived.At
- comp.compression
- 2019 May
- A Random Data Compressor to solve

Subject: | A Random Data Compressor to solve |

Posted by: | Gerald Tamayo (comp…@gmail.com) |

Date: | Sat, 18 May 2019 |

Do you have a library to do arithmetic using Very Large Integers, BigNums o=

r BigInts, or Infinite-Precision Integers? Then maybe you can use it for th=

e compression algorithm i describe here. But really, we need to have small =

integers (frequencies) as much as possible to avoid very long computations.=

=20

The random data compression algorithm is best understood if you "visualize"=

a bar chart or a histogram, where new symbol frequencies are always trying=

to become greater than the current highest frequency, which we increment b=

y its delta with the new symbol's frequency. The new highest frequency beco=

mes the new symbol's frequency; or put simply, the new symbol must have the=

highest frequency. So at most, the new highest frequency can only "double"=

or add by 1 bit in length. (In decoding, the symbol with the highest frequ=

ency is the symbol to decode; this means it is stack based. We add the delt=

a to the highest frequency during encoding so we can preserve or get back t=

o the previous frequency of the symbol when decoding.) Output is actually t=

he frequency table, which is easy to compress or generate?=20

Algorithm pseudo-code:=20

/* initialize frequency table. */=20

for (i=3D0; i < 256; i++) freq[i] =3D i + 1;=20

max =3D freq[255];=20

do {=20

c =3D get_byte(infile);=20

if (c =3D=3D EOF) break;=20

freq[c] =3D max + (max - freq[c]);=20

max =3D freq[c];=20

} while (1);=20

No "runs" of a single character allowed in the input, no "run-lengths" as m=

uch as possible. "Random data" indeed.=20

New or recalled observations:=20

1. This algorithm ironically "expands" the frequencies at first. ? LOL=20

We're back to the early days of information theory or data compression hist=

ory!=20

2. The bombshell: It takes more than 1 bit added to encode for very small =

frequencies which suddenly must be maximum. The solution might be to "swap"=

them but this requires new information or codes. This is back to delta cod=

ing. haist=20

3. But a total cycling of the frequency table might work...=20

4. Instead of 8-bit bytes, use 4-bit symbols;=20

***=20

This is similar, i think, to WEB Technologies' algorithm as featured in BYT=

E magazine in 1992 and noted by comp.compression FAQ:

"WEB, in fact, says that virtually any amount of data can be=20

squeezed to under 1024 bytes by using DataFiles/16 to compress=20

its own output multiple times."=20

I think they were using or playing with a frequency table too, 256 32-bit f=

requencies =3D 1K.=20

They might had to output the MSbit of the highest frequency, the result of =

which may equal another byte frequency/ies?=20

That's why they had the problem that 4 numbers in a matrix are equal, a rar=

e case in their algorithm.=20

Just maybe.=20

(Ideally, at most 1 bit increase in frequency of output or new symbol, but =

the bombshell precludes that. If they are of the same bitsize, then only 1 =

bit increase in the new max frequency.=20

The current symbol has always the highest frequency.=20

You decode backwards, from last symbol to first; the symbol with the highes=

t frequency is the current symbol.

One parameter in decoding is the famed file_size().=20

)=20

The problem with the algorithm is that the emitted frequency table could be=

very large due to very large frequencies if you implement it by really usi=

ng BigNums or BigInts;=20

You then have to compress the very large frequency table.

Maybe to achieve compression, you can just consider the MSBit after the ari=

thmetic (addition) operation.=20

Or the solution is nearly just MTF (you have to output the character that *=

doubled* (MSBit activated)).=20

WEB Technologies' Datafiles/16 algorithm is clearly designed for compressio=

n of *random* data, and recursive, which are futile indeed.=20

- A Random Data Compressor to solve posted by Gerald Tamayo on Sun, 19 May 2019