Oddities of popular archivers

Subject: Oddities of popular archivers
Posted by:  Elhana (tanarriscour…@yahoo.com)
Date: Wed, 10 Jul 2019

I used some popular archivers to compress a text file, and the results surprised me quite a lot.

The worst contender turned out to be gzip, with an average 60% reduction. Interestingly, the UTF-8 version compressed worse than the ISO one, with about 15% overhead. I was under the impression that both files contain the same amount of information, so they should compress to a comparable size.

The next result belongs to PKZIP. It managed to compress each file about 40 bytes better than gzip (the gzip header was 25 bytes long).

The next result belongs to xzip. It handled the UTF-8 text much better, giving only 8% overhead (which is still too much in my opinion). Average compression was 70%.

Next comes 7-zip with default settings, which failed spectacularly on the UTF-8 file: it turned out 8k larger than the xzip one. The other files compressed about 400 bytes better.

The silver prize went to bzip2, with its impressive 72% compression. Surprisingly, it processed the UTF-8 file even better, with only 5% overhead.

And the undisputed champion was WinRAR, with 81% compression.
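
For anyone who wants to repeat part of this comparison, here is a rough sketch in Python using only the standard library. It covers gzip, bzip2 and the xz format (via the lzma module, which may or may not match what is called xzip above); PKZIP, 7-zip and WinRAR are not covered. The function name compare_encodings, the iso-8859-1 default and the sample text are my own assumptions for illustration, not the file or encoding used in the test above. "Overhead" is taken here to mean how much larger the compressed UTF-8 file is than the compressed ISO file, in percent.

```python
import bz2
import gzip
import lzma

# Compressors available in the standard library, all with default settings.
COMPRESSORS = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "xz": lzma.compress,
}


def compare_encodings(text, legacy_encoding="iso-8859-1"):
    """Compress the same text in a legacy 8-bit encoding and in UTF-8.

    Returns {compressor: (legacy_size, utf8_size, overhead_percent)},
    where overhead_percent is the growth of the compressed UTF-8 output
    relative to the compressed legacy output.
    """
    legacy = text.encode(legacy_encoding)
    utf8 = text.encode("utf-8")
    results = {}
    for name, compress in COMPRESSORS.items():
        legacy_size = len(compress(legacy))
        utf8_size = len(compress(utf8))
        overhead = 100.0 * (utf8_size - legacy_size) / legacy_size
        results[name] = (legacy_size, utf8_size, overhead)
    return results


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample; use real file contents
    # (and the matching legacy encoding) for meaningful numbers.
    sample = "Größe, café, naïve, señor, über, déjà vu. " * 500
    for name, (legacy_size, utf8_size, overhead) in compare_encodings(sample).items():
        print(f"{name}: legacy={legacy_size} utf8={utf8_size} overhead={overhead:+.1f}%")
```

The numbers will depend heavily on the text and on how many non-ASCII characters it contains, so this only shows how the overhead figure can be measured, not what it should be.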

The following questions arose:

* Why does xzip suck?
* Why is UTF-8 not supported by mainstream compression software?
* And why proprietary compression software so easily outperforms the 'free'=