Random variable monads
Tags: haskell, monad, probability
In this note we give a mathematical translation of the monadic treatment of random variables in this amazing Stack Exchange answer, which is (figuratively) my sixth attempt at understanding monads.
We show the meanings of `return`, bind / `(>>=)`, `join`, `fmap` / `(<$>)` and `ap` / `(<*>)`, and of the monad laws, in this concrete example.
See Hackage for a clear and concise definition of monads.
For notational consistency, we use capital letters for random variables in Haskell code, as is the convention in probability theory.
We also use `==` in Haskell code to denote equality between expressions in a derivation.
For any state space \(S\), let \(R(S)\) be the space of random variables on \(S\). In the following, \(S, T, P\) denote state spaces, and \(X, Y, Z\) random variables.
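To have something concrete to compute with, here is a minimal toy model of \(R(S)\) in Haskell, assuming finitely supported distributions with exact rational weights. The names `Dist`, `runDist`, `normalize` and `die` are ours, not from the answer or any library:

```haskell
import qualified Data.Map.Strict as Map

-- A finitely supported random variable on s, as a list of
-- (value, probability) pairs whose weights sum to 1.
-- This is only a toy model of R(S).
newtype Dist s = Dist { runDist :: [(s, Rational)] }
  deriving (Eq, Show)

-- Merge duplicate values, so two distributions can be compared
-- as functions from values to probabilities.
normalize :: Ord s => Dist s -> Dist s
normalize (Dist xs) = Dist (Map.toList (Map.fromListWith (+) xs))

-- A fair n-sided die, used in later examples.
die :: Integer -> Dist Integer
die n = Dist [ (k, 1 / fromInteger n) | k <- [1 .. n] ]
```

Exact `Rational` weights let us compare distributions for equality, which we use below to check the derivations and the monad laws.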
In the Stack Exchange answer (we ignore the pseudorandomness and pretend everything is truly random), the operations translate as follows; a rendering in the toy model is sketched after the list:
- `return x` is defined as the atomic random variable \(\delta_x\).
- For \(f: S \to R(T)\), `X >>= f` is the random variable \(\expe_X f(X)\). It can be thought of in terms of conditional random variables: say \(X \in R(S)\) and \(Y \in R(T)\) are two random variables such that \(Y\) given \(X = x\) is \(f(x)\); then `X >>= f` is the (unconditional) random variable \(Y\).
- `join` can be derived from `(>>=)`: let \(X \in R(R(S))\) be a random random variable; then `join X = X >>= id` \(= \expe_X \mathrm{id}_{R(S)}(X) = \expe X\) is a random variable on \(S\), obtained by averaging over the random variables \(X\) can take values in.
- `X >> Y` is simply \(Y\).
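Continuing the toy model above, these definitions might read as follows (a sketch under our `Dist` representation; `joinD` is our name, chosen to avoid clashing with `Control.Monad.join`):

```haskell
instance Functor Dist where
  fmap g (Dist xs) = Dist [ (g x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]               -- return x: the atomic random variable δ_x
  df <*> dx = df >>= \f -> fmap f dx   -- see the ap / (<*>) section below

instance Monad Dist where
  -- X >>= f averages the conditional laws f x over the law of X:
  -- P(result = y) = Σ_x P(X = x) · P(f x = y).
  Dist xs >>= f = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (f x) ]

-- join X = X >>= id, averaging a random random variable.
joinD :: Dist (Dist s) -> Dist s
joinD x = x >>= id

-- X >> Y is X >>= \_ -> Y: the draw of X is discarded, and since the
-- weights of X sum to 1, the result normalizes to Y.
```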
Deriving `fmap` / `(<$>)`:
- For \(g: S \to T\), let \(\mu_X\) be the distribution of \(X\); then `fmap g X = X >>= (return . g)` \(= \expe_X (\delta_{g(\cdot)})(X) = \int \delta_{g(x)} \mu_X(dx) = g(X)\), which agrees with the definition of the `randomMap` function in that answer (a quick check in the toy model follows).
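In the toy model, the derived `fmap` can be compared directly against the `Functor` instance; `fmapViaBind` is our name:

```haskell
-- fmap written via (>>=) and return, following the derivation above.
fmapViaBind :: (s -> t) -> Dist s -> Dist t
fmapViaBind g x = x >>= (return . g)

-- ghci> normalize (fmapViaBind (`mod` 2) (die 6)) == normalize (fmap (`mod` 2) (die 6))
-- True
```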
As for `ap` / `(<*>)`:
- Let \(F \in R(T^S)\) and \(X \in R(S)\); then `F <*> X` is defined as `F <*> X = do {f <- F; x <- X; return (f x)} == do {f <- F; fmap f X} == F >>= (\f -> fmap f X)`, which is equal to \(\expe_F F(X)\), a random variable on \(T\) obtained by averaging over \(F\) (written out in the sketch below).
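Written out in the toy model (`apViaBind` is our name):

```haskell
-- F <*> X with do-notation, as in the derivation above.
apViaBind :: Dist (s -> t) -> Dist s -> Dist t
apViaBind df dx = do
  f <- df          -- draw a function F = f
  x <- dx          -- draw X = x
  return (f x)     -- δ_{f x}, averaged over both draws

-- A random choice between two functions, applied to a two-sided die:
-- ghci> normalize (apViaBind (Dist [((+ 1), 1/2), ((* 2), 1/2)]) (die 2))
-- Dist {runDist = [(2,1 % 2),(3,1 % 4),(4,1 % 4)]}
```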
Now we translate the monad laws (a concrete check in the toy model follows the list):
- For \(x \in S\) and \(f: S \to R(T)\), `return x >>= f == f x`: \(LHS = \delta_x >>= f = \expe_{X \sim \delta_x} f(X) = f(x) = RHS\).
- For \(X \in R(S)\), `X >>= return == X`: \(LHS = \expe_X \delta_X = X = RHS\).
- For \(X \in R(S)\), \(f: S \to R(T)\), \(g: T \to R(P)\), `X >>= (\x -> f x >>= g) == (X >>= f) >>= g`: suppose \(Y \in R(T)\) and \(Z \in R(P)\) are such that \((Y|X = x) = f(x)\) and \((Z|Y = y) = g(y)\); then
\begin{align}
LHS &= \expe_X (\expe_{f(\cdot)} g \circ f(\cdot))(X) = \expe_X \expe_{Y|X = \cdot} g(Y|X = \cdot) = \expe_X (Z|X = \cdot)(X) = Z, \\
RHS &= \expe_{\expe_X f(X)} g(\expe_X f(X)) = \expe_Y g(Y) = Z.
\end{align}
In the LHS we average first over \(Y\) and then over \(X\), whereas in the RHS we average over \(X\) first and then over \(Y\).
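In the toy model the laws can be checked on small concrete instances by comparing normalized distributions; the kernel `heads` and the particular examples are ours:

```haskell
-- A conditional kernel for the tests: flip a fair coin x times and
-- count the heads.
heads :: Integer -> Dist Integer
heads 0 = return 0
heads n = do
  c <- Dist [(0, 1/2), (1, 1/2)]
  h <- heads (n - 1)
  return (c + h)

-- The three laws on small instances; each of these evaluates to True.
leftIdentity, rightIdentity, associativity :: Bool
leftIdentity  = normalize (return 4 >>= heads) == normalize (heads 4)
rightIdentity = normalize (die 6 >>= return) == normalize (die 6)
associativity =
  normalize (die 3 >>= (\x -> heads x >>= g))
    == normalize ((die 3 >>= heads) >>= g)
  where g y = return (y `mod` 2)
```

Because the weights are exact `Rational`s, these are genuine equalities of distributions, not approximate comparisons of samples.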