fast compression of MN's 415k test file, down to 5k bytes

Subject: fast compression of MN's 415k test file, down to 5k bytes
Posted by:  jules Gilbert (jules.stoc…@gmail.com)
Date: Sat, 6 Jun 2009

(I have other methods that are not affected by the "counting problem",
and they will happily drop a supplied input file down to only several
bytes.  Of course those methods are more than a hundred times slower
than this method, which, nicely, requires only tiny amounts of
memory.  One reason it's so nice is that bzip2 is a pretty
well-organized program.  As I write this I am still shelling out to
bzip2, but I expect to take a week soon and deal with that.
Then demos!)
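
For illustration only, here is a minimal sketch of the difference between
shelling out to bzip2 and calling it in-process.  It assumes Python and its
standard bz2 module; the post does not say which language or library the
actual program uses, and the function names are hypothetical.

```python
import bz2
import subprocess


def compress_via_shell(data: bytes) -> bytes:
    """Pipe data through the external bzip2 binary (must be on PATH)."""
    # With no file arguments, `bzip2 -c` reads stdin and writes stdout.
    result = subprocess.run(
        ["bzip2", "-c"], input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def compress_in_process(data: bytes) -> bytes:
    """Same compression via the library, with no child process per call."""
    return bz2.compress(data)


if __name__ == "__main__":
    sample = b"abc" * 1000
    packed = compress_in_process(sample)
    # Round trip to confirm nothing was lost.
    assert bz2.decompress(packed) == sample
    print(len(sample), "->", len(packed))
```

The in-process call avoids starting a new process on every pass, which
matters when a file is run through the compressor many times.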

If you live in New England and want to come to one of our
demonstrations, send me a resume, noting especially your current
employer.

However, my production method, shown here, can reduce the size of
any file to about 5k bytes, without regard for the content (I simply
XOR the input before each process, so all input appears random to the
core process).  And a real limit exists; it's just not the kind of
limit that several people who post here have maintained (sometimes I
think they just enjoy being naysayers, and don't care one bit for
the advancement of science).
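
The post does not say what the input is XORed with.  As a sketch of one
common way to do such a step, the code below XORs the data with a seeded
pseudo-random keystream; the keystream source, the seed parameter and the
function name are all assumptions, not details taken from this post.

```python
import random


def xor_whiten(data: bytes, seed: int) -> bytes:
    """XOR data with a keystream derived from seed (hypothetical helper)."""
    rng = random.Random(seed)
    # One pseudo-random byte per input byte; the same seed gives the same stream.
    keystream = bytes(rng.getrandbits(8) for _ in range(len(data)))
    return bytes(a ^ b for a, b in zip(data, keystream))


if __name__ == "__main__":
    original = b"AAAAAAAAAAAAAAAA"
    whitened = xor_whiten(original, seed=2009)
    # XOR is its own inverse, so applying it again with the same seed
    # restores the input exactly.
    assert xor_whiten(whitened, seed=2009) == original
    print(whitened.hex())
```

The output is always the same length as the input, and the step is only
reversible if the same seed is available when decoding.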

I standardized on bzip2, which I consider an excellent compressor,
really among the best.  I know that the PPM statistical methods can do
much better, but I had to standardize on something, and I think I made
an excellent choice.  Anyway, with this compressor several files (two
movie files and various other data sets) each stop compressing at
about 5k bytes.

(See listings.)

In this example I used Mark Nelson's test file as the original input.

This method will work with any input, most especially output files
produced as the result of conventional file compression.

Notice that after about fifty iterations this method no longer performs
in a stable fashion.  It simply stops compressing, and the sizes drift
between roughly 4.6k and 5.4k bytes.  Still, it made Mark's file about
82 times smaller.  More important, it will take any random-appearing
file and drop it to about the same size, say, about 5k bytes.
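
The "82 times" figure can be checked against the byte counts in the
listing below.  This is plain arithmetic on three of those sizes; the
helper name is only for illustration.

```python
def ratio(before: int, after: int) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


ORIGINAL = 415241   # MNtestfile
FINAL = 5044        # temp0099.tmp, the last file listed
SMALLEST = 4626     # temp0074.tmp, the smallest file listed

if __name__ == "__main__":
    print(f"final:    {ratio(ORIGINAL, FINAL):.1f}x")     # 82.3x
    print(f"smallest: {ratio(ORIGINAL, SMALLEST):.1f}x")  # 89.8x
```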

And it's very very fast!

And requires little memory.

415241 Jun  6 10:28 MNtestfile
=================================
339825 Jun  6 10:55 temp0001.tmp
277071 Jun  6 10:55 temp0002.tmp
228994 Jun  6 10:56 temp0003.tmp
189022 Jun  6 10:56 temp0004.tmp
156938 Jun  6 10:56 temp0005.tmp
131860 Jun  6 10:56 temp0006.tmp
110192 Jun  6 10:56 temp0007.tmp
 93373 Jun  6 10:56 temp0008.tmp
 79046 Jun  6 10:56 temp0009.tmp
 67666 Jun  6 10:56 temp0010.tmp
 58502 Jun  6 10:56 temp0011.tmp
 48800 Jun  6 10:56 temp0012.tmp
 42463 Jun  6 10:56 temp0013.tmp
 35973 Jun  6 10:56 temp0014.tmp
 30272 Jun  6 10:56 temp0015.tmp
 26652 Jun  6 10:57 temp0016.tmp
 23740 Jun  6 10:57 temp0017.tmp
 20921 Jun  6 10:57 temp0018.tmp
 18263 Jun  6 10:57 temp0019.tmp
 15971 Jun  6 10:57 temp0020.tmp
 14234 Jun  6 10:57 temp0021.tmp
 13305 Jun  6 10:57 temp0022.tmp
 12018 Jun  6 10:57 temp0023.tmp
 10940 Jun  6 10:57 temp0024.tmp
 10054 Jun  6 10:57 temp0025.tmp
  9331 Jun  6 10:57 temp0026.tmp
  8460 Jun  6 10:57 temp0027.tmp
  7878 Jun  6 10:57 temp0028.tmp
  7644 Jun  6 10:57 temp0029.tmp
  7171 Jun  6 10:57 temp0030.tmp
  6893 Jun  6 10:57 temp0031.tmp
  6601 Jun  6 10:57 temp0032.tmp
  6295 Jun  6 10:57 temp0033.tmp
  6121 Jun  6 10:57 temp0034.tmp
  5685 Jun  6 10:57 temp0035.tmp
  5591 Jun  6 10:57 temp0036.tmp
  5601 Jun  6 10:57 temp0037.tmp
  5717 Jun  6 10:57 temp0038.tmp
  5651 Jun  6 10:57 temp0039.tmp
  5514 Jun  6 10:57 temp0040.tmp
  5326 Jun  6 10:57 temp0041.tmp
  5253 Jun  6 10:57 temp0042.tmp
  4941 Jun  6 10:57 temp0043.tmp
  4906 Jun  6 10:57 temp0044.tmp
  4734 Jun  6 10:57 temp0045.tmp
  4754 Jun  6 10:57 temp0046.tmp
  4761 Jun  6 10:57 temp0047.tmp
  4796 Jun  6 10:57 temp0048.tmp
  4934 Jun  6 10:57 temp0049.tmp
  5108 Jun  6 10:57 temp0050.tmp
  5207 Jun  6 10:57 temp0051.tmp
  5255 Jun  6 10:57 temp0052.tmp
  5424 Jun  6 10:57 temp0053.tmp
  5293 Jun  6 10:57 temp0054.tmp
  5053 Jun  6 10:57 temp0055.tmp
  4932 Jun  6 10:57 temp0056.tmp
  4772 Jun  6 10:57 temp0057.tmp
  4894 Jun  6 10:57 temp0058.tmp
  4843 Jun  6 10:57 temp0059.tmp
  4663 Jun  6 10:57 temp0060.tmp
  4674 Jun  6 10:57 temp0061.tmp
  4825 Jun  6 10:57 temp0062.tmp
  4817 Jun  6 10:57 temp0063.tmp
  4880 Jun  6 10:57 temp0064.tmp
  4799 Jun  6 10:57 temp0065.tmp
  4699 Jun  6 10:57 temp0066.tmp
  4754 Jun  6 10:57 temp0067.tmp
  4708 Jun  6 10:57 temp0068.tmp
  4705 Jun  6 10:57 temp0069.tmp
  4697 Jun  6 10:57 temp0070.tmp
  4867 Jun  6 10:57 temp0071.tmp
  4829 Jun  6 10:57 temp0072.tmp
  4658 Jun  6 10:57 temp0073.tmp
  4626 Jun  6 10:57 temp0074.tmp
  4660 Jun  6 10:57 temp0075.tmp
  4809 Jun  6 10:57 temp0076.tmp
  4920 Jun  6 10:57 temp0077.tmp
  4894 Jun  6 10:57 temp0078.tmp
  4997 Jun  6 10:57 temp0079.tmp
  4871 Jun  6 10:57 temp0080.tmp
  4989 Jun  6 10:57 temp0081.tmp
  4958 Jun  6 10:57 temp0082.tmp
  4919 Jun  6 10:57 temp0083.tmp
  4762 Jun  6 10:57 temp0084.tmp
  4900 Jun  6 10:57 temp0085.tmp
  4934 Jun  6 10:57 temp0086.tmp
  4880 Jun  6 10:57 temp0087.tmp
  4867 Jun  6 10:57 temp0088.tmp
  4791 Jun  6 10:57 temp0089.tmp
  4815 Jun  6 10:57 temp0090.tmp
  4718 Jun  6 10:57 temp0091.tmp
  4786 Jun  6 10:57 temp0092.tmp
  4715 Jun  6 10:57 temp0093.tmp
  4704 Jun  6 10:57 temp0094.tmp
  4783 Jun  6 10:57 temp0095.tmp
  4896 Jun  6 10:57 temp0096.tmp
  5135 Jun  6 10:57 temp0097.tmp
  5130 Jun  6 10:57 temp0098.tmp
  5044 Jun  6 10:57 temp0099.tmp
