- You are here
- Comp Newsgroups Archived.At
- comp.compression
- 2008 March
- Probability, compressible sequences: Backgrounds

Subject: | Probability, compressible sequences: Backgrounds |

Posted by: | Thomas Richter (th…@math.tu-berlin.de) |

Date: | Fri, 07 Mar 2008 |

Hi folks,

since there was enough noise in this group once again, I afraid much

noise is caused simply because people do not use adequate definitions

for the subjects to be discussed here, and it's time to clean up the

mess. Folks, before you discuss, please get some background, really,

even if it only helps to reduce the noise level here.

A "specific sequence" cannot be random. And as such, as long as you know

your sequence, you can compress it. Actually, to zero bit, as you know

what you're talking about, so there's no point in transmitting it in

first place. "Entropy" and "compressibility" are statements that are not

defined on *specific* sequences, or on "subsequences of specific

sequences", but only on *random processes*. A *process* is something

different than a *sequence*.

By that I mean that, in order to define the terms, I need to have a

device (probably a coin) that generates symbols (head or tail) by an

external trigger (me, dropping the coin). I can measure the relative

frequencies of the symbols, and for a well defined random process, I

must ensure that these frequencies converge to probabilities as the

number of experiments grows to infinity. *This is nothing you can

prove*. It is an assumption you have to make about your device to have

the theory make some sense.

Second, compressibility does not speak about "compressing sequence A or

B". It speaks about "compressing sequences on average generated by a

random device as above". That means, operationally, you run a long

sequence of experiments, using the same device, understanding that it

doesn't change its statistics, and feed the output of each experiment

(each of which generating a lot of symobls) into your compression

algorithm. The *average* (expected) output length of your compression

algorithm, divided by the number of input symbols, is the

"compressibility" - *OF THE RANDOM PROCESS*. Note that I'm not talking

about a specific sequence, this doesn't make sense.

The reason why "compressors work", or rather, the philosophy behind

them, is to model (or rather "misunderstand") the input sequence "as if"

it had been generated by a random process, and build a compressor that

works well *on this* random process. Not on any *specific* sequence.

Second misunderstanding: "RAD" or "random process" does not imply that

the probabilities of all symbols of the random process are equal. That

is, in fact, the boring case as it is easy to show that you cannot

compress this (i.e. for any compression device, the *average* output

length is equal to the input size, taking the number of experiments and

the size of the sequence to infinity). Furthermore, in no way the

probability of the next symbol need to be independent of the symbols

before (i.e. there is no need for the process to be memory-less). This

is again the "boring case". Compressors work because they model the

input source by a random process that *does* have memory. And this is a

good model for text, language, programs, images... "Good" as in "it does

work". We "do not have" a random device that generates symbols *exactly*

as in text/speech/images... etc. But "it's close enough for the purpose

of compression to assume it" - that is, "a model".

Going back to the original statement: It would help a lot if, before you

would start discussing, you would start from a proper definition of

words. I'm not saying that the above one is the "only" or "correct" one,

but a lot of noise here could have been avoided if you would have stated

clearly what you're talking about.

Thanks for reading,

Thomas

- Re: Probability, compressible sequences: Backgrounds posted by Industrial One on Fri, 7 Mar 2008
- Re: Probability, compressible sequences: Backgrounds posted by a.deniā¦@yahoo.com on Sun, 9 Mar 2008
- Re: Probability, compressible sequences: Backgrounds posted by spamtrap@adsignum.com on Sun, 9 Mar 2008