comp.compression, May 2006
- Re: Efficient Floating Point Compression?!

Subject: Re: Efficient Floating Point Compression?!

Posted by: cr88192 (cr881…@NOSPAM.hotmail.com)

Date: Wed, 3 May 2006

"Frashman" <frank.ro…@stud.tu-ilmenau.de> wrote in message

news:e37sin$htm$…@newsserver.rz.tu-ilmenau.de...

> Hello,

>

> I am searching for some information about floating-point compression.

>

> I want to pack and unpack huge masses of floating-point numbers.

>

lossy or lossless?...

within a limited or broad range?...

any likely statistical relations between the numbers?...

....

>

> Does anybody have an idea how to do this effectively?

> - tools?

> - algorithms?

> - benchmarks?

sorry, this is not my area of expertise, so my thoughts may not be that

useful.

a lot depends on what you want.

for many things I do, I need neither that much range nor that much accuracy,

so I often just convert the numbers to fixed-point (or integers). this saves

space vs storing full floats, and is simple.

another known representation is the 'hfloat' or half-float, which is

basically a 16-bit floating-point format (not all that accurate, or common,

but it is 1/2 the size of a float).
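a rough sketch of packing a float into such a 16-bit format (1 sign, 5 exponent, 10 mantissa bits); this version only handles normal values and flushes everything else, so it is an illustration rather than a full converter:

```c
/* Sketch: pack a float into a 16-bit half (1 sign, 5 exp, 10 mantissa).
 * Normal values only; subnormal/NaN handling omitted for brevity. */
#include <stdint.h>
#include <string.h>

uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);             /* reinterpret IEEE 754 bits */
    uint32_t sign = (bits >> 16) & 0x8000u;     /* sign bit into position 15 */
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15; /* rebias */
    uint32_t mant = (bits >> 13) & 0x3FFu;      /* keep top 10 mantissa bits */
    if (exp <= 0)  return (uint16_t)sign;               /* flush to zero */
    if (exp >= 31) return (uint16_t)(sign | 0x7C00u);   /* overflow -> Inf */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
}
```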

also possible and common in some things is merging duplicate vectors (then

again, I don't know if you have duplicates, or if the floats are part of

vectors).

more aggressive techniques (lossless):

it is also possible to develop a custom representation for said floats, for

example, via huffman coding the exponent, and maybe higher-order bits of the

mantissa (depending on the data, the lower order bits are likely to either

be some simple pattern, or something near-random). another huffman table for

all the remaining mantissa bits could help, but I am unsure how much.

eg:

sign, encoded as a bit;

exponent, coded using an exponent table;

high 8, coded with first mantissa table;

med 8, low 7 (shl 1), coded with second mantissa table.
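the field split described above, sketched for a 32-bit IEEE 754 single (the struct and names are just for illustration; each field would then feed its own huffman table):

```c
/* Sketch: pull an IEEE 754 single apart into sign, exponent, and three
 * mantissa groups, ready for per-field huffman tables. */
#include <stdint.h>
#include <string.h>

struct float_fields {
    uint8_t sign;     /* 1 bit */
    uint8_t exponent; /* 8 bits, biased */
    uint8_t mant_hi;  /* top 8 mantissa bits */
    uint8_t mant_med; /* middle 8 mantissa bits */
    uint8_t mant_lo;  /* low 7 bits, shifted left 1 as in the text */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    struct float_fields out;
    out.sign     = (uint8_t)(bits >> 31);
    out.exponent = (uint8_t)(bits >> 23);          /* cast masks to 8 bits */
    out.mant_hi  = (uint8_t)(bits >> 15);
    out.mant_med = (uint8_t)(bits >> 7);
    out.mant_lo  = (uint8_t)((bits & 0x7Fu) << 1);
    return out;
}
```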

additionally, one could get by only using a single huffman table, eg, by

maintaining 3 or 4 MTF contexts (will save managing multiple tables, but

will make it slower).
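for reference, a plain move-to-front transform over bytes, the kind of table rotation meant above (only one table shown; the scheme in the text would keep 3 or 4 of these, one per field position):

```c
/* Sketch: byte-wise move-to-front. Frequently seen symbols migrate to the
 * front, so their ranks cluster near 0 and code cheaply with one table. */
#include <stdint.h>

void mtf_init(uint8_t table[256])
{
    for (int i = 0; i < 256; i++) table[i] = (uint8_t)i;
}

/* returns the rank of 'sym' and moves it to the front of 'table' */
uint8_t mtf_encode(uint8_t table[256], uint8_t sym)
{
    int rank = 0;
    while (table[rank] != sym) rank++;   /* find current position */
    for (int i = rank; i > 0; i--)       /* shift the others down one */
        table[i] = table[i - 1];
    table[0] = sym;                      /* symbol now at the front */
    return (uint8_t)rank;
}
```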

a slight possible hack could be remembering whatever byte followed each byte

"last time", and then pulling that byte to the front in the next context, which

in some cases could help with the ratios (depending on how strongly values

are associated), but this may not be worth the extra overhead.

additionally possible would be to modify a paq-style arithmetic coder for

use on floating point data, which may well give the best ratios, but is

likely to encode/decode more slowly than using huffman tables (but quite

possibly faster than using MTF), and (for any reasonable sized context)

require more memory (a 12- to 16-bit context would probably be sufficient, at

the cost of a good 128 or 256 kB of memory for a 16-bit context...).

lossy:

the previous ideas can be combined, eg, if one is willing to put up with

some loss, they can code the values in less space.

a 16-bit fixed-point number or hfloat is likely to compress quite

effectively with arithmetic coding and a 16-bit context (given the whole

number fits in context, vs just part of it).

likewise, using 2 huffman tables is also likely to turn out fairly well

(albeit, probably not as good as if some context were retained from the high

byte when coding the low one, but oh well...).

also possible is that if one has fixed point numbers with a much larger

number of high-order bits than low-order ones (or the values are highly

clustered near 0), then the "prefix" and "extra" bits scheme may be fairly

effective.
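one way that prefix/extra-bits scheme is commonly done (deflate does something similar for match lengths): code the bit length of the magnitude with an entropy coder, then send the remaining bits raw. values clustered near 0 get very short codes. a sketch:

```c
/* Sketch: "prefix + extra bits" split for a non-negative magnitude.
 * The prefix (bit length of v) would be huffman/arithmetic coded; the
 * extra bits (v minus its leading 1 bit) are sent raw. */
#include <stdint.h>

/* returns the prefix (bit length); *extra gets v with its top bit stripped */
unsigned prefix_code(uint32_t v, uint32_t *extra)
{
    unsigned nbits = 0;
    uint32_t t = v;
    while (t) { nbits++; t >>= 1; }                   /* bit length of v */
    *extra = nbits ? v - (1u << (nbits - 1)) : 0;     /* drop leading 1 */
    return nbits;
}
```

decoding just reverses it: read the prefix, then read (prefix - 1) extra bits and re-attach the leading 1.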

or something...

>

> Thank you very much!

>

> Best Regards, Frank
