Re: Efficient Floating Point Compression?!

Subject: Re: Efficient Floating Point Compression?!
Posted by:  cr88192 (cr881…@NOSPAM.hotmail.com)
Date: Wed, 3 May 2006

"Frashman" <frank.ro…@stud.tu-ilmenau.de> wrote in message
news:e37sin$htm$…@newsserver.rz.tu-ilmenau.de...
> Hello,
>
> I am searching for some information about floating-point compression.
>
> I want to pack and unpack huge masses of floating-point numbers.
>
lossy or lossless?...
within a limited or broad range?...
any likely statistical relations between the numbers?...
....

>
> Does anybody have an idea how to do this efficiently?
> - tools?
> - algorithms?
> - benchmarks?

sorry, this is not my area of expertise, so my thoughts may not be that
useful.

a lot depends on what you want.

for many things I do, I need neither that much range nor that much accuracy,
so I often just convert the numbers to fixed-point (or integers). this saves
space vs storing full floats, and is simple.
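
eg, a rough sketch of the sort of thing I mean (the range and the number of
fraction bits here are just made up; they depend entirely on the data):

    #include <stdint.h>

    /* 10 fraction bits => resolution of 1/1024, and a usable range of
       roughly -32.0 .. +32.0 in an int16_t; pick to suit the data */
    #define FIX_SHIFT 10

    static int16_t float_to_fix(float f)
    {
        return (int16_t)(f * (1 << FIX_SHIFT) + (f >= 0 ? 0.5f : -0.5f));
    }

    static float fix_to_float(int16_t x)
    {
        return (float)x / (1 << FIX_SHIFT);
    }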

another known representation is the 'hfloat' or half-float, which is
basically a 16 bit floating point format (not all that accurate, or common,
but it is 1/2 the size of a float).
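
a quick and dirty conversion could look about like this (truncates instead of
rounding, and doesn't bother with denormals or NaNs, which just end up as
+/-inf; going back the other way is the mirror image):

    #include <stdint.h>
    #include <string.h>

    /* float -> 16-bit half: 1 sign, 5 exponent, 10 mantissa bits */
    static uint16_t float_to_half(float f)
    {
        uint32_t x;
        memcpy(&x, &f, 4);

        uint16_t sign = (uint16_t)((x >> 16) & 0x8000);
        int32_t  exp  = (int32_t)((x >> 23) & 0xFF) - 127 + 15;
        uint32_t man  = x & 0x007FFFFF;

        if (exp <= 0)  return sign;            /* too small -> +/-0   */
        if (exp >= 31) return sign | 0x7C00;   /* too big   -> +/-inf */
        return sign | (uint16_t)(exp << 10) | (uint16_t)(man >> 13);
    }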

also possible and common in some things is merging duplicate vectors (then
again, I don't know if you have duplicates, or if the floats are part of
vectors).
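
in case it is vectors, the welding can be as dumb as this (linear search, so
only sane for smallish data sets; a hash table is the obvious upgrade):

    #include <string.h>

    /* replaces an array of 3-float vectors with a table of unique vectors
       plus one index per original vector (bitwise comparison) */
    static int weld_vectors(const float (*in)[3], int n,
                            float (*uniq)[3], int *index)
    {
        int i, j, nu = 0;
        for (i = 0; i < n; i++) {
            for (j = 0; j < nu; j++)
                if (memcmp(in[i], uniq[j], sizeof(float) * 3) == 0)
                    break;
            if (j == nu)
                memcpy(uniq[nu++], in[i], sizeof(float) * 3);
            index[i] = j;
        }
        return nu;   /* number of unique vectors kept */
    }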

more aggressive techniques (lossless):
it is also possible to develop a custom representation for said floats, for
example, via huffman coding the exponent, and maybe higher-order bits of the
mantissa (depending on the data, the lower order bits are likely to either
be some simple pattern, or something near-random). another huffman table for
all the remaining mantissa bits could help, but I am unsure how much.

eg:
    sign, encoded as a bit;
    exponent, coded using an exponent table;
    high 8, coded with first mantissa table;
    med 8, low 7 (shl 1), coded with second mantissa table.
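
pulling the fields apart would go something like this (just the field
extraction; the huffman tables themselves and the bit I/O are left out):

    #include <stdint.h>
    #include <string.h>

    /* split an IEEE single into the fields from the scheme above */
    static void split_float(float f, int *sign, int *exp,
                            int *man_hi, int *man_med, int *man_lo)
    {
        uint32_t x;
        memcpy(&x, &f, 4);

        *sign    = (x >> 31) & 0x01;   /* sign, 1 bit                */
        *exp     = (x >> 23) & 0xFF;   /* exponent, 8 bits           */
        *man_hi  = (x >> 15) & 0xFF;   /* high 8 mantissa bits       */
        *man_med = (x >>  7) & 0xFF;   /* middle 8 mantissa bits     */
        *man_lo  = (x <<  1) & 0xFE;   /* low 7 bits, shifted left 1 */
    }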

additionally, one could get by with only a single huffman table, eg, by
maintaining 3 or 4 MTF contexts (will save managing multiple tables, but
will make it slower).
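
the MTF part itself is simple enough, something like this (one of these per
field/context, with the resulting ranks all going through the shared table):

    #include <stdint.h>

    typedef struct { uint8_t order[256]; } MTFCtx;

    static void mtf_init(MTFCtx *c)
    {
        int i;
        for (i = 0; i < 256; i++) c->order[i] = (uint8_t)i;
    }

    /* returns the symbol's current rank and pulls it to the front */
    static int mtf_encode(MTFCtx *c, uint8_t sym)
    {
        int i, rank = 0;
        while (c->order[rank] != sym) rank++;
        for (i = rank; i > 0; i--)
            c->order[i] = c->order[i - 1];
        c->order[0] = sym;
        return rank;   /* this is what gets huffman coded */
    }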

a slight possible hack could be remembering whatever byte followed each byte
"last time", and then pull that byte to the front in the next context, which
in some cases could help with the ratios (depending on how strongly values
are associated), but this may not be worth the extra overhead.
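
if I were to actually try that, it might look something like this (just a
guess at the details, reusing the MTF sketch from above; the decoder has to
do the same front-pull before decoding the rank):

    /* remember which byte followed each byte last time, and pull that byte
       to the front of the next context, so repeats of a pair code as rank 0 */
    static uint8_t last_next[256];

    static int encode_with_followers(MTFCtx *c, uint8_t prev, uint8_t sym)
    {
        int rank;
        (void)mtf_encode(c, last_next[prev]); /* force predicted byte to front */
        rank = mtf_encode(c, sym);            /* then code the real byte       */
        last_next[prev] = sym;
        return rank;
    }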

additionally possible would be to modify a paq-style arithmetic coder for
use on floating point data, which may well give the best ratios, but is
likely to encode/decode more slowly than using huffman tables (but quite
possibly faster than using MTF), and (for any reasonable sized context)
require more memory (a 12 to 16 bit context would probably be sufficient, at
the cost of up to good 128 or 256kB of memory for a 16 bit context...).

lossy:
the previous ideas can be combined, eg, if one is willing to put up with
some loss, they can code the values in less space.

a 16-bit fixed-point number or hfloat is likely to compress quite
effectively with arithmetic coding and a 16-bit context (given the whole
number fits in the context, vs just part of it).

likewise, using 2 huffman tables is also likely to turn out fairly well
(albeit, probably not as good as if some context were retained from the high
byte when coding the low one, but oh well...).

also possible is that if one has fixed point numbers with a much larger
number of high-order bits than low-order ones (or the values are highly
clustered near 0), then the "prefix" and "extra" bits scheme may be fairly
effective.
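
eg, along the lines of deflate's length/distance codes or jpeg's DC
coefficient coding; the put_* functions below are just stand-ins for a real
huffman coder and bit writer:

    #include <stdio.h>
    #include <stdint.h>

    /* stand-ins, just so this compiles; a real version would write bits */
    static void put_prefix_symbol(int nbits)  { printf("prefix %d ", nbits); }
    static void put_bits(uint32_t v, int n)   { printf("bits %u/%d ", v, n); }

    /* prefix = number of significant bits (huffman coded), extra = the
       magnitude bits themselves plus a sign bit; values near 0 stay tiny
       (the leading magnitude bit is redundant, but kept for simplicity) */
    static void encode_prefix_extra(int32_t v)
    {
        uint32_t mag = (v < 0) ? -(uint32_t)v : (uint32_t)v;
        int nbits = 0;

        while (nbits < 32 && (1u << nbits) <= mag)
            nbits++;

        put_prefix_symbol(nbits);
        if (nbits > 0) {
            put_bits(mag, nbits);
            put_bits(v < 0, 1);
        }
    }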

or something...

>
> Thank you very much!
>
> Best Regards, Frank
