|Subject:||Oddities of popular archivers|
|Posted by:||Elhana (tanarriscour…@yahoo.com)|
|Date:||Wed, 10 Jul 2019|
I used some popular archivers to compress a text file, and the results surprised me quite a lot.
The worst contender turned out to be gzip, with an average 60% reduction. Interestingly, the UTF-8 version compressed worse than the ISO one, with about 15% overhead. I was under the impression that both files contain the same amount of information, so they should compress to a comparable size.
The next result belongs to PKZIP. It managed to compress each file about 40 bytes better than gzip (the gzip header was 25 bytes long).
The next result belongs to xzip. It coped with the UTF-8 text much better, giving only 8% overhead (which is still too much in my opinion). Average compression was 70%.
Next comes 7-zip with default settings, which failed spectacularly on the UTF-8 file: it turned out 8k larger than the xzip one. The other files compressed about 400 bytes better.
The silver prize went to bzip2, with its impressive 72% compression. Surprisingly, it processed the UTF-8 file even better, with only 5% overhead.
And the undisputed champion was WinRAR, with 81% compression.
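A comparison like this can be reproduced with a minimal Python sketch. It uses the standard-library zlib, bz2 and lzma modules as stand-ins for gzip, bzip2 and xz, so its numbers will not match the archivers above. The sample text, the choice of ISO-8859-1 as "the ISO one", and the helper names compare_encodings and overhead_percent are all assumptions made for illustration. "Overhead" is taken here to mean how much larger the compressed UTF-8 output is than the compressed ISO output.

```python
import bz2
import lzma
import zlib


def compare_encodings(text, encodings=("iso-8859-1", "utf-8")):
    """Compress the same text in each encoding with several codecs.

    Returns {codec_name: {encoding: compressed_size_in_bytes}}.
    """
    compressors = {
        "deflate (zlib)": lambda data: zlib.compress(data, 9),
        "bzip2": lambda data: bz2.compress(data, 9),
        "xz (lzma)": lambda data: lzma.compress(data),
    }
    results = {}
    for name, compress in compressors.items():
        results[name] = {
            enc: len(compress(text.encode(enc))) for enc in encodings
        }
    return results


def overhead_percent(sizes, base, other):
    """How much larger sizes[other] is than sizes[base], in percent."""
    return 100.0 * (sizes[other] - sizes[base]) / sizes[base]


if __name__ == "__main__":
    # Hypothetical sample: any text with non-ASCII characters will do,
    # as long as it can be encoded in the chosen ISO encoding.
    sample = "Größe, Füße, Äpfel, Öl, über, später, häufig, schön. " * 200
    for codec, sizes in compare_encodings(sample).items():
        pct = overhead_percent(sizes, "iso-8859-1", "utf-8")
        print(f"{codec}: {sizes} UTF-8 overhead {pct:.1f}%")
```

A short, repetitive sample like this one compresses far better than real prose, so a meaningful test needs the actual file in both encodings.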
The following questions arose:
* Why does xzip suck?
* Why is UTF-8 not supported by mainstream compression software?
* And why does proprietary compression software so easily outperform the 'free'=