A Random Data Compressor to solve

Giganews Newsgroups
Subject: A Random Data Compressor to solve
Posted by:  Gerald Tamayo (comp…@gmail.com)
Date: Sat, 18 May 2019

Do you have a library to do arithmetic using Very Large Integers, BigNums o=
r BigInts, or Infinite-Precision Integers? Then maybe you can use it for th=
e compression algorithm i describe here. But really, we need to have small =
integers (frequencies) as much as possible to avoid very long computations.=

The random data compression algorithm is best understood if you "visualize"=
a bar chart or a histogram, where new symbol frequencies are always trying=
to become greater than the current highest frequency, which we increment b=
y its delta with the new symbol's frequency. The new highest frequency beco=
mes the new symbol's frequency; or put simply, the new symbol must have the=
highest frequency. So at most, the new highest frequency can only "double"=
or add by 1 bit in length. (In decoding, the symbol with the highest frequ=
ency is the symbol to decode; this means it is stack based. We add the delt=
a to the highest frequency during encoding so we can preserve or get back t=
o the previous frequency of the symbol when decoding.) Output is actually t=
he frequency table, which is easy to compress or generate?=20

Algorithm pseudo-code:=20

/* initialize frequency table. */=20
for (i=3D0; i < 256; i++) freq[i] =3D i + 1;=20
max =3D freq[255];=20

do {=20
  c =3D get_byte(infile);=20
  if (c =3D=3D EOF) break;=20
  freq[c] =3D max + (max - freq[c]);=20
  max =3D freq[c];=20
} while (1);=20

No "runs" of a single character allowed in the input, no "run-lengths" as m=
uch as possible. "Random data" indeed.=20

New or recalled observations:=20

1.  This algorithm ironically "expands" the frequencies at first. ? LOL=20

We're back to the early days of information theory or data compression hist=

2.  The bombshell: It takes more than 1 bit added to encode for very small =
frequencies which suddenly must be maximum. The solution might be to "swap"=
them but this requires new information or codes. This is back to delta cod=
ing. haist=20

3.  But a total cycling of the frequency table might work...=20

4. Instead of 8-bit bytes, use 4-bit symbols;=20


This is similar, i think, to WEB Technologies' algorithm as featured in BYT=
E magazine in 1992 and noted by comp.compression FAQ:
"WEB, in fact, says that virtually any amount of data can be=20
squeezed to under 1024 bytes by using DataFiles/16 to compress=20
its own output multiple times."=20

I think they were using or playing with a frequency table too, 256 32-bit f=
requencies =3D 1K.=20

They might had to output the MSbit of the highest frequency, the result of =
which may equal another byte frequency/ies?=20

That's why they had the problem that 4 numbers in a matrix are equal, a rar=
e case in their algorithm.=20

Just maybe.=20

(Ideally, at most 1 bit increase in frequency of output or new symbol, but =
the bombshell precludes that. If they are of the same bitsize, then only 1 =
bit increase in the new max frequency.=20

The current symbol has always the highest frequency.=20

You decode backwards, from last symbol to first; the symbol with the highes=
t frequency is the current symbol.

One parameter in decoding is the famed file_size().=20

The problem with the algorithm is that the emitted frequency table could be=
very large due to very large frequencies if you implement it by really usi=
ng BigNums or BigInts;=20
You then have to compress the very large frequency table.

Maybe to achieve compression, you can just consider the MSBit after the ari=
thmetic (addition) operation.=20
Or the solution is nearly just MTF (you have to output the character that *=
doubled* (MSBit activated)).=20

WEB Technologies' Datafiles/16 algorithm is clearly designed for compressio=
n of *random* data, and recursive, which are futile indeed.=20