# Probability, compressible sequences: Backgrounds

 Subject: Probability, compressible sequences: Backgrounds Posted by: Thomas Richter (th…@math.tu-berlin.de) Date: Fri, 07 Mar 2008

Hi folks,

since there was enough noise in this group once again, I afraid much
noise is caused simply because people do not use adequate definitions
for the subjects to be discussed here, and it's time to clean up the
mess. Folks, before you discuss, please get some background, really,
even if it only helps to reduce the noise level here.

A "specific sequence" cannot be random. And as such, as long as you know
your sequence, you can compress it. Actually, to zero bit, as you know
what you're talking about, so there's no point in transmitting it in
first place. "Entropy" and "compressibility" are statements that are not
defined on *specific* sequences, or on "subsequences of specific
sequences", but only on *random processes*. A *process* is something
different than a *sequence*.

By that I mean that, in order to define the terms, I need to have a
device (probably a coin) that generates symbols (head or tail) by an
external trigger (me, dropping the coin). I can measure the relative
frequencies of the symbols, and for a well defined random process, I
must ensure that these frequencies converge to probabilities as the
number of experiments grows to infinity. *This is nothing you can
prove*. It is an assumption you have to make about your device to have
the theory make some sense.

Second, compressibility does not speak about "compressing sequence A or
B". It speaks about "compressing sequences on average generated by a
random device as above". That means, operationally, you run a long
sequence of experiments, using the same device, understanding that it
doesn't change its statistics, and feed the output of each experiment
(each of which generating a lot of symobls) into your compression
algorithm. The *average* (expected) output length of your compression
algorithm, divided by the number of input symbols, is the
"compressibility" - *OF THE RANDOM PROCESS*. Note that I'm not talking
about a specific sequence, this doesn't make sense.

The reason why "compressors work", or rather, the philosophy behind
them, is to model (or rather "misunderstand") the input sequence "as if"
it had been generated by a random process, and build a compressor that
works well *on this* random process. Not on any *specific* sequence.

Second misunderstanding: "RAD" or "random process" does not imply that
the probabilities of all symbols of the random process are equal. That
is, in fact, the boring case as it is easy to show that you cannot
compress this (i.e. for any compression device, the *average* output
length is equal to the input size, taking the number of experiments and
the size of the sequence to infinity). Furthermore, in no way the
probability of the next symbol need to be independent of the symbols
before (i.e. there is no need for the process to be memory-less). This
is again the "boring case". Compressors work because they model the
input source by a random process that *does* have memory. And this is a
good model for text, language, programs, images... "Good" as in "it does
work". We "do not have" a random device that generates symbols *exactly*
as in text/speech/images... etc. But "it's close enough for the purpose
of compression to assume it" - that is, "a model".

Going back to the original statement: It would help a lot if, before you
would start discussing, you would start from a proper definition of
words. I'm not saying that the above one is the "only" or "correct" one,
but a lot of noise here could have been avoided if you would have stated