# Alea

Define and compose random variables.

## Random numbers

First, we need a way to generate random numbers. Here, a random number generator is defined dynamically as an object having a method that returns a uniform number in [0,1]:

```
type Random = object
random: proc(): float
```

One can obtain instances of `Random`

by wrapping the RNG defined in
nim-random, such as in

```
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
```

The reason why we need to wrap them is that random number generators in nim-random are defined as a concept, while it will be simpler in the sequel to represent them as a single type.

## Random variables

A random variable of type `A`

is just something that can take a random number
generator and provide an instance of type `A`

:

```
type RandomVar[A] = concept x
var rng: Random
rng.sample(x) is A
```

In other word, the only operation that we need to define on a type `T`

to
make it an instance of `RandomVar[A]`

is

`proc sample(rng: var Random, t: T): A = ...`

Here we require the first parameter to be of type `var Random`

because drawing
a random number mutates the internal state of the RNG. It may be more clear to
return a new state together with the value of type `A`

, much like in the
state monad
but we avoid doing so for performance reason.

If we think of the internal state space of the random number generator as the
probability space `Ω`

, the similarity between our definition and the mathematical
definition of random variable is apparent.

A few core random variables are defined:

`ConstantVar[A]`

is a just a trivial random variable that always samples the same value`Uniform`

is a uniform variable over a real interval`Choice[A]`

is a discrete random variable that can take a finite number of values with equal probability`ClosureVar[A]`

is a wrapper over a`proc(rng: var Random): A`

Most random variables that arise by manipulating other variables are of
`ClosureVar`

type.

Here is an example showing how to costruct instances of these variables. Types are inferred, and are there just for explanatory purposes:

```
import alea
proc f(rng: var Random): float = 2 * rng.random()
let
c: ConstantVar[string] = constant("hello")
u: Uniform = uniform(2, 14)
d: Choice[int] = choice(@[1, 2, 3, 4, 5])
x: ClosureVar[float] = closure(f)
```

## Operations on random variables

A few common operations on random variables are supported - in particular mapping and filtering:

```
import alea, future
let
a = uniform(3, 12)
b = a.map((x: float) => 3 - x)
c = a.filter((x: float) => x > 5)
```

Mapping, in particular, is a common operation, and there is a macro `lift`

that takes a function `A => B`

and declares a function of the same name of
type `RandomVar[A] => ClosureVar[B]`

that is obtained by mapping. It can be
used with a type hint in case the function is overloaded, as in

```
import math
proc sq(x: float): float = x * x
lift(abs, float)
lift(sq) # No ambiguity here
let
a = uniform(3, 12)
b = sq(abs(a))
```

There is also a version of two arguments `map2`

, that takes two random variables
and a binary function. Generalization for more than two arguments can be done
using the fact that random variables form
a monad
but the relevant functions are still to be implemented.

Many mathematical functions, as well as the arithmetic operations, are already lifted, so the following is valid:

```
let
a = uniform(3, 12)
b = choice(@[1.0, 2.5, 3.7])
c = abs(a - b) * sqrt(a)
```

## Conditioning random variables

Random variables can also be conditioned with respect to each other. For
instance, if `a`

and `b`

are real random variables and we want to condition
`a`

to the occurrence that `b`

is positive, we can do:

`let c = a.where(b, (x: float) => x > 0)`

where the last parameter to `where`

is a predicate that should be satisfied
by samples from `b`

.

How to make this work? Drawing from `b`

will change the status of the random
number generator, which in theory prevents us from sampling `a`

at the same
point.

To avoid this issue, we use a fake random number generator that wraps another
instance of `Random`

, but repeats its result twice. That is, internally we
use an auxiliary (fake) RNG defined like

`var repeated = rng.repeat(2)`

You can use the same trick whenever there is the need to draw more than a single sample from the same point of the probability space.

## Statistics on random variables

A few common statistics are implemented on `RandomVar[float]`

, such as the mean,
variance and so on. There is a generic implementation that will work for any
random variable, but particular types of random variables can use more specialized
methods.

An example of their usage is:

```
let x = uniform(2, 5) + choice(@[1.2, 3.3, 4.5])
var rng = ...
echo rng.sample(x)
echo rng.mean(x)
echo rng.variance(x)
echo rng.stddev(x)
```

The covariance is also implemented, again by using the trick of a repeating random number generator to draw from the two distributions at the same time.

All there operations admit an optional parameter which is the number of samples to compute the statistics with more or less accuracy:

`echo rng.mean(x, samples = 1000000)`

Finally, complex random variables, that are represented by chains of closures,
can be approximated by sampling enough times. There is a function `discretize`

that will take any `RandomVar[A]`

and produce an instance of `Choice[A]`

that
will wrap a certain number of samples:

```
let f = ... # Some complex random variable
let d = f.discretize(samples = 20000)
```

## More distributions

A few common real random variables are implemented:

```
let
g = gaussian(mu = 0, sigma = 1)
p = poisson(3.5)
b = bernoulli(0.7)
```

Usually, the statistics for these notable random variables are known, so we have overloads, in such a way that, for instance, the mean of a Gaussian variable will always be exact.

## Defining custom distributions

To define your own random variables, you can take inspiration, say, from
`bernoulli.nim`

. The only mandatory operation for a type `T`

to be an instance
of `RandomVar[A]`

is

`proc sample(rng: var Random, t: T): A`

If other statistics are known (such as mean, variance and so on), one can also define overloads such as

`proc mean(rng: var Random, t: T, samples = 100000)`

## A complete example

Here is a small example that combines all of the above:

```
import future
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
let
a = uniform(0, 9)
b = choice([1, 2, 3, 4, 5]).map((x: int) => x.float)
c = poisson(13)
d = gaussian(mu = 3, sigma = 5).filter((x: float) => x > 3)
s = ln(abs((sqrt(a) * b) - (a.floor / log10(c)))) + d
t = c.where(s, (x: float) => x > 5)
u = rng.discretize(t)
echo rng.mean(s)
echo rng.stddev(u)
```

## TODO

- improve the DSL for conditioning
- higher moments and other statistics
- monad composition
- histograms
- add more standard distributions (beta, gamma, geometric...)
- entropy etc.