Random variable monads
Tags: haskell, monad, probability
In this note we give a mathematical translation of the monadic treatment of random variables in this amazing Stack Exchange answer, which is (figuratively) my sixth attempt at understanding monads.
We show the meanings of `return`, bind / `(>>=)`, `join`, `fmap` / `(<$>)` and `ap` / `(<*>)`, and of the monad laws, in this concrete example.
See Hackage for a clear and concise definition of monads.
For notational consistency, we use capital letters for random variables in Haskell code, as is the convention in probability theory.
We also use `==` in Haskell code to denote equality between expressions in a derivation.
For any state space \(S\), let \(R(S)\) be the space of random variables on \(S\). In the following, \(S, T, P\) denote state spaces, and \(X, Y, Z\) random variables.
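To have something concrete to compute with, here is a minimal toy model of \(R(S)\) in Haskell, assuming finitely supported distributions with exact rational weights. The names `Dist`, `runDist`, `normalize` and `die` are ours, not from the answer or any library:

```haskell
import qualified Data.Map.Strict as Map

-- A finitely supported random variable on s, as a list of
-- (value, probability) pairs whose weights sum to 1.
-- This is only a toy model of R(S).
newtype Dist s = Dist { runDist :: [(s, Rational)] }
  deriving (Eq, Show)

-- Merge duplicate values, so two distributions can be compared
-- as functions from values to probabilities.
normalize :: Ord s => Dist s -> Dist s
normalize (Dist xs) = Dist (Map.toList (Map.fromListWith (+) xs))

-- A fair n-sided die, used in later examples.
die :: Integer -> Dist Integer
die n = Dist [ (k, 1 / fromInteger n) | k <- [1 .. n] ]
```

Exact `Rational` weights let us compare distributions for equality, which we use below to check the derivations and the monad laws.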
In the Stack Exchange answer (we ignore the pseudorandomness and pretend everything is truly random), the operations translate as follows; a rendering in the toy model is sketched after the list:
- `return x` is defined as the atomic random variable \(\delta_x\).
- For \(f: S \to R(T)\), `X >>= f` is the random variable \(\expe_X f(X)\). It can be thought of in terms of conditional random variables: say \(X \in R(S)\) and \(Y \in R(T)\) are two random variables such that \(Y\) given \(X = x\) is \(f(x)\); then `X >>= f` is the (unconditional) random variable \(Y\).
- `join` can be derived from `(>>=)`: let \(X \in R(R(S))\) be a random random variable; then `join X = X >>= id` \(= \expe_X \mathrm{id}_{R(S)}(X) = \expe X\) is a random variable on \(S\), obtained by averaging over the random variables \(X\) can take values in.
- `X >> Y` is simply \(Y\).
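Continuing the toy model above, these definitions might read as follows (a sketch under our `Dist` representation; `joinD` is our name, chosen to avoid clashing with `Control.Monad.join`):

```haskell
instance Functor Dist where
  fmap g (Dist xs) = Dist [ (g x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]               -- return x: the atomic random variable δ_x
  df <*> dx = df >>= \f -> fmap f dx   -- see the ap / (<*>) section below

instance Monad Dist where
  -- X >>= f averages the conditional laws f x over the law of X:
  -- P(result = y) = Σ_x P(X = x) · P(f x = y).
  Dist xs >>= f = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (f x) ]

-- join X = X >>= id, averaging a random random variable.
joinD :: Dist (Dist s) -> Dist s
joinD x = x >>= id

-- X >> Y is X >>= \_ -> Y: the draw of X is discarded, and since the
-- weights of X sum to 1, the result normalizes to Y.
```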
Deriving `fmap` / `(<$>)`:
- For \(g: S \to T\), let \(\mu_X\) be the distribution of \(X\); then `fmap g X = X >>= (return . g)` \(= \expe_X (\delta_{g(\cdot)})(X) = \int \delta_{g(x)} \mu_X(dx) = g(X)\), which agrees with the definition of the `randomMap` function in that answer (a quick check in the toy model follows).
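In the toy model, the derived `fmap` can be compared directly against the `Functor` instance; `fmapViaBind` is our name:

```haskell
-- fmap written via (>>=) and return, following the derivation above.
fmapViaBind :: (s -> t) -> Dist s -> Dist t
fmapViaBind g x = x >>= (return . g)

-- ghci> normalize (fmapViaBind (`mod` 2) (die 6)) == normalize (fmap (`mod` 2) (die 6))
-- True
```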
As for `ap` / `(<*>)`:
- Let \(F \in R(T^S)\) and \(X \in R(S)\); then `F <*> X` is defined as `F <*> X = do {f <- F; x <- X; return (f x)} == do {f <- F; fmap f X} == F >>= (\f -> fmap f X)`, which is equal to \(\expe_F F(X)\), a random variable on \(T\) obtained by averaging over \(F\) (written out in the sketch below).
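Written out in the toy model (`apViaBind` is our name):

```haskell
-- F <*> X with do-notation, as in the derivation above.
apViaBind :: Dist (s -> t) -> Dist s -> Dist t
apViaBind df dx = do
  f <- df          -- draw a function F = f
  x <- dx          -- draw X = x
  return (f x)     -- δ_{f x}, averaged over both draws

-- A random choice between two functions, applied to a two-sided die:
-- ghci> normalize (apViaBind (Dist [((+ 1), 1/2), ((* 2), 1/2)]) (die 2))
-- Dist {runDist = [(2,1 % 2),(3,1 % 4),(4,1 % 4)]}
```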
Now we translate the monad laws (a concrete check in the toy model follows the list):
- For \(x \in S\) and \(f: S \to R(T)\), `return x >>= f == f x`: \(LHS = \delta_x >>= f = \expe_{X \sim \delta_x} f(X) = f(x) = RHS\).
- For \(X \in R(S)\), `X >>= return == X`: \(LHS = \expe_X \delta_X = X = RHS\).
- For \(X \in R(S)\), \(f: S \to R(T)\), \(g: T \to R(P)\), `X >>= (\x -> f x >>= g) == (X >>= f) >>= g`: suppose \(Y \in R(T)\) and \(Z \in R(P)\) are such that \((Y|X = x) = f(x)\) and \((Z|Y = y) = g(y)\); then
\begin{align}
LHS &= \expe_X (\expe_{f(\cdot)} g \circ f(\cdot))(X) = \expe_X \expe_{Y|X = \cdot} g(Y|X = \cdot) = \expe_X (Z|X = \cdot)(X) = Z, \\
RHS &= \expe_{\expe_X f(X)} g(\expe_X f(X)) = \expe_Y g(Y) = Z.
\end{align}
In the LHS we average first over \(Y\) and then over \(X\), whereas in the RHS we average over \(X\) first and then over \(Y\).
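In the toy model the laws can be checked on small concrete instances by comparing normalized distributions; the kernel `heads` and the particular examples are ours:

```haskell
-- A conditional kernel for the tests: flip a fair coin x times and
-- count the heads.
heads :: Integer -> Dist Integer
heads 0 = return 0
heads n = do
  c <- Dist [(0, 1/2), (1, 1/2)]
  h <- heads (n - 1)
  return (c + h)

-- The three laws on small instances; each of these evaluates to True.
leftIdentity, rightIdentity, associativity :: Bool
leftIdentity  = normalize (return 4 >>= heads) == normalize (heads 4)
rightIdentity = normalize (die 6 >>= return) == normalize (die 6)
associativity =
  normalize (die 3 >>= (\x -> heads x >>= g))
    == normalize ((die 3 >>= heads) >>= g)
  where g y = return (y `mod` 2)
```

Because the weights are exact `Rational`s, these are genuine equalities of distributions, not approximate comparisons of samples.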