Source-Property Corpora?

Subject: Source-Property Corpora?
Posted by:  Metatron (
Date: Fri, 27 Feb 2009


Before I start generating my own, I'd like to ask if somebody is aware
of available streams with specific distribution properties, such as
gamma, Laplacian and Gaussian sources, with various parameter
variations and sizes if possible. I remember there was one corpus of
i.i.d. sources in different sizes (b7/b9).

The purpose is that if you want to track down inefficiency and/or
calibrate your entropy back-end for these kinds of sources, you don't
actually want to drive your entire front-end (e.g. an image
compressor). You'd be able to make far more test runs with just a
"simple" entropy coder.
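
For calibration you need a reference figure to compare the coder's
output size against. A minimal sketch of one such baseline, the
zeroth-order empirical entropy of a symbol stream, follows. The
function name empirical_entropy is made up for illustration, and the
i.i.d. (order-0) assumption is mine; a coder with context modelling
would need a different yardstick.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Zeroth-order empirical entropy of a symbol sequence, in bits/symbol."""
    if not symbols:
        raise ValueError("need at least one symbol")
    n = len(symbols)
    counts = Counter(symbols)  # occurrences of each distinct symbol
    # H = sum over symbols of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def ideal_size_bits(symbols):
    """Size an ideal order-0 coder would need, ignoring model side-information."""
    return empirical_entropy(symbols) * len(symbols)
```

For example, empirical_entropy([0, 1, 0, 1]) gives 1.0 bit/symbol.
The gap between ideal_size_bits(stream) and the actual output size is
then the inefficiency to track down.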

If there is no corpus, maybe there is a generic number-generator (a
DRNG? distributionshape-random-number-generator) creating that kind of
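
A minimal sketch of what such a generator could look like, using only
Python's standard random module. The function name make_stream, its
parameters, and the choice to round samples to the nearest integer
symbol are all assumptions for illustration, not an existing tool.

```python
import random

def make_stream(kind, n, seed=0, scale=4.0, shape=2.0):
    """Return n integer symbols drawn from the named distribution.

    kind  : "gaussian", "laplacian" or "gamma"
    scale : sigma (gaussian), b (laplacian) or scale parameter (gamma)
    shape : shape parameter, used by the gamma source only
    """
    rng = random.Random(seed)  # fixed seed makes the stream reproducible
    out = []
    for _ in range(n):
        if kind == "gaussian":
            x = rng.gauss(0.0, scale)
        elif kind == "laplacian":
            # The difference of two i.i.d. exponentials with mean `scale`
            # is Laplacian with location 0 and scale `scale`.
            x = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        elif kind == "gamma":
            x = rng.gammavariate(shape, scale)
        else:
            raise ValueError("unknown distribution: %r" % (kind,))
        out.append(round(x))  # quantise to an integer symbol
    return out
```

Calling make_stream("laplacian", 1 << 20, seed=1, scale=8.0) would
give a one-mega-symbol Laplacian stream; varying seed, scale and n
gives the parameter and size variations.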

What I'm actually concerned with is a generic entropy coder for very
large symbol alphabets (possibly with as many distinct symbols as
available symbols) over Laplacian/Gaussian mixture distributions. In
that context the density of symbols over an interval in the
source-symbol alphabet is likewise Laplacian/Gaussian distributed.
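
To make that kind of source concrete, here is a rough sketch of a
mixture sampler over a wide integer alphabet. It reflects one reading
of the description: each component is a Laplacian or Gaussian lobe
centred somewhere in the alphabet, and a weight picks the lobe per
symbol. The name mixture_stream and the (weight, kind, centre, scale)
tuple layout are invented for illustration.

```python
import random

def mixture_stream(n, components, seed=0):
    """Draw n integer symbols from a Laplacian/Gaussian mixture.

    components: list of (weight, kind, centre, scale) tuples, where
    kind is "laplacian" or "gaussian" (hypothetical layout).
    """
    rng = random.Random(seed)
    weights = [c[0] for c in components]
    out = []
    for _ in range(n):
        # Pick one lobe with probability proportional to its weight.
        _, kind, centre, scale = rng.choices(components, weights=weights)[0]
        if kind == "gaussian":
            x = rng.gauss(centre, scale)
        elif kind == "laplacian":
            # Difference of two i.i.d. exponentials is a zero-centred Laplacian.
            x = centre + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        else:
            raise ValueError("unknown component kind: %r" % (kind,))
        out.append(round(x))
    return out

# Example: a narrow Laplacian lobe near 0 plus a wide Gaussian lobe far away.
example = mixture_stream(
    10000,
    [(0.7, "laplacian", 0.0, 3.0), (0.3, "gaussian", 50000.0, 200.0)],
    seed=1,
)
```

With wide lobes and short streams, most drawn symbols are distinct,
which is the large-alphabet case of interest.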

Well, if nothing like that is around, would it be of help if I wrote
the code and published it together with some example streams?