Giry monad

In mathematics, the Giry monad is a construction that assigns to a measurable space a space of probability measures over it, equipped with a canonical sigma-algebra.^[1]^[2]^[3]^[4]^[5] It is one of the main examples of a probability monad.

It is implicitly used in probability theory whenever one considers probability measures which depend measurably on a parameter (giving rise to Markov kernels), or when one has probability measures over probability measures (such as in de Finetti's theorem).

Like many iterable constructions, it has the category-theoretic structure of a monad, on the category of measurable spaces.

Construction

The Giry monad, like every monad, consists of three structures:^[6]^[7]^[8]

A functorial assignment, which in this case assigns to a measurable space $X$ a space of probability measures $PX$ over it;
A natural map $\delta :X\to PX$ called the unit, which in this case assigns to each element of a space the Dirac measure over it;
A natural map ${\mathcal {E}}:PPX\to PX$ called the multiplication, which in this case assigns to each probability measure over probability measures its expected value.

The space of probability measures

Let $(X,{\mathcal {F}})$ be a measurable space. Denote by $PX$ the set of probability measures over $(X,{\mathcal {F}})$ . We equip the set $PX$ with a sigma-algebra as follows. For every measurable set $A\in {\mathcal {F}}$ , define the evaluation map $\varepsilon _{A}:PX\to \mathbb {R}$ by $p\longmapsto p(A)$ . We then define the sigma-algebra ${\mathcal {PF}}$ on $PX$ to be the smallest sigma-algebra which makes the maps $\varepsilon _{A}$ measurable for all $A\in {\mathcal {F}}$ (where $\mathbb {R}$ is equipped with the Borel sigma-algebra).^[6]

Equivalently, ${\mathcal {PF}}$ can be defined as the smallest sigma-algebra on $PX$ which makes the maps

p\longmapsto \int _{X}f\,dp

measurable for all bounded measurable $f:X\to \mathbb {R}$ .^[9]

The assignment $(X,{\mathcal {F}})\mapsto (PX,{\mathcal {PF}})$ is part of an endofunctor on the category of measurable spaces, usually denoted again by $P$ . Its action on morphisms, i.e. on measurable maps, is via the pushforward of measures. Namely, given a measurable map $f:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$ , one assigns to $f$ the map $f_{*}:(PX,{\mathcal {PF}})\to (PY,{\mathcal {PG}})$ defined by

f_{*}p\,(B)=p(f^{-1}(B))

for all $p\in PX$ and all measurable sets $B\in {\mathcal {G}}$ . ^[6]
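For a finitely supported measure, the pushforward amounts to relabelling outcomes and summing the masses that land on the same point. The following Python sketch illustrates this under that finiteness assumption only; the dictionary representation and the names `pushforward`, `die` and `parity` are illustrative choices, not part of the general measure-theoretic construction.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(f, p):
    """Return f_* p for a finitely supported p given as {outcome: probability}.

    Assumption: p has finite support, so (f_* p)(B) = p(f^{-1}(B)) reduces
    to summing the masses of all points that f sends into B.
    """
    q = defaultdict(Fraction)
    for x, mass in p.items():
        q[f(x)] += mass  # the mass sitting at x is moved to f(x)
    return dict(q)


# A fair six-sided die, pushed forward along the parity map k -> k mod 2.
die = {k: Fraction(1, 6) for k in range(1, 7)}
parity = pushforward(lambda k: k % 2, die)
```

Here `parity` assigns mass 1/2 to each of the outcomes 0 and 1, since each has three preimages of mass 1/6.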

The Dirac delta map

Given a measurable space $(X,{\mathcal {F}})$ , the map $\delta :(X,{\mathcal {F}})\to (PX,{\mathcal {PF}})$ maps an element $x\in X$ to the Dirac measure $\delta _{x}\in PX$ , defined on measurable subsets $A\in {\mathcal {F}}$ by^[6]

\delta _{x}(A)=1_{A}(x)={\begin{cases}1&{\text{if }}x\in A,\\0&{\text{if }}x\notin A.\end{cases}}
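As a small illustration, the Dirac measure can be sketched in Python for finitely supported measures represented as dictionaries. The representation and the helper names `dirac` and `measure` are assumptions made for this example.

```python
from fractions import Fraction


def dirac(x):
    """The Dirac measure at x, as a finitely supported {outcome: probability}."""
    return {x: Fraction(1)}


def measure(p, A):
    """Evaluate p(A) for a finitely supported p and a set A of outcomes."""
    return sum((mass for x, mass in p.items() if x in A), Fraction(0))


# delta_x(A) is the indicator of A evaluated at x.
inside = measure(dirac(3), {1, 2, 3})
outside = measure(dirac(3), {4, 5})
```

With these definitions `inside` equals 1 and `outside` equals 0, matching the case distinction above.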

The expectation map

Let $\mu \in PPX$ , i.e. a probability measure over the probability measures over $(X,{\mathcal {F}})$ . We define the probability measure ${\mathcal {E}}\mu \in PX$ by

{\mathcal {E}}\mu (A)=\int _{PX}p(A)\,\mu (dp)

for all measurable $A\in {\mathcal {F}}$ . This gives a measurable, natural map ${\mathcal {E}}:(PPX,{\mathcal {PPF}})\to (PX,{\mathcal {PF}})$ .^[6]
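When $\mu$ is supported on finitely many finitely supported measures, the integral defining ${\mathcal {E}}\mu$ becomes a weighted sum. The sketch below illustrates this special case in Python. A measure over measures is represented as a list of `(measure, weight)` pairs; this encoding and the names `expectation`, `fair` and `biased` are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction


def expectation(mu):
    """Return E(mu) for mu given as a list of (p, weight) pairs.

    Assumption: each p is a finitely supported {outcome: probability} dict
    and the weights sum to 1, so (E mu)(A) = sum_i weight_i * p_i(A).
    """
    q = defaultdict(Fraction)
    for p, weight in mu:
        for x, mass in p.items():
            q[x] += weight * mass
    return dict(q)


# Choose one of two coins uniformly at random, then toss it.
fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(3, 4), "T": Fraction(1, 4)}
averaged = expectation([(fair, Fraction(1, 2)), (biased, Fraction(1, 2))])
```

The resulting `averaged` measure gives heads probability 1/2 · 1/2 + 1/2 · 3/4 = 5/8 and tails probability 3/8.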

Example: mixture distributions

A mixture distribution, or more generally a compound distribution, can be seen as an application of the map ${\mathcal {E}}$ . We illustrate this for the case of a finite mixture. Let $p_{1},\dots ,p_{n}$ be probability measures on $(X,{\mathcal {F}})$ , and consider the probability measure $q$ given by the mixture

q(A)=\sum _{i=1}^{n}w_{i}\,p_{i}(A)

for all measurable $A\in {\mathcal {F}}$ , for some weights $w_{i}\geq 0$ satisfying $w_{1}+\dots +w_{n}=1$ . We can view the mixture $q$ as the average $q={\mathcal {E}}\mu$ , where the measure on measures $\mu \in PPX$ , which in this case is discrete, is given by

\mu =\sum _{i=1}^{n}w_{i}\,\delta _{p_{i}}.

More generally, the map ${\mathcal {E}}:PPX\to PX$ can be seen as the most general, non-parametric way to form arbitrary mixture or compound distributions.

The triple $(P,\delta ,{\mathcal {E}})$ is called the Giry monad.^[1]^[2]^[3]^[4]^[5]

Relationship with Markov kernels

One of the properties of the sigma-algebra ${\mathcal {PF}}$ is that given measurable spaces $(X,{\mathcal {F}})$ and $(Y,{\mathcal {G}})$ , we have a bijective correspondence between measurable functions $(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$ and Markov kernels $(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$ . This allows one to view a Markov kernel, equivalently, as a measurably parametrized probability measure.^[10]

In more detail, given a measurable function $f:(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$ , one can obtain the Markov kernel $f^{\flat }:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$ as follows,

f^{\flat }(B|x)=f(x)(B)

for every $x\in X$ and every measurable $B\in {\mathcal {G}}$ (note that $f(x)\in PY$ is a probability measure). Conversely, given a Markov kernel $k:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$ , one can form the measurable function $k^{\sharp }:(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$ mapping $x\in X$ to the probability measure $k^{\sharp }(x)\in PY$ defined by

k^{\sharp }(x)(B)=k(B|x)

for every measurable $B\in {\mathcal {G}}$ . The two assignments are mutually inverse.
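Computationally, the two assignments amount to currying and uncurrying. The Python sketch below illustrates this on a two-point space, ignoring measurability (which is automatic for finite discrete spaces). The kernel `noisy` is a hypothetical example, and the names `flat` and `sharp` simply mirror the notation above.

```python
from fractions import Fraction


def flat(f):
    """Turn f : X -> PY into the kernel (B, x) |-> f(x)(B)."""
    return lambda B, x: f(x)(B)


def sharp(k):
    """Turn a kernel (B, x) |-> k(B|x) into the map x |-> (B |-> k(B|x))."""
    return lambda x: (lambda B: k(B, x))


def noisy(B, x):
    """Hypothetical kernel on {0, 1}: keep x with probability 3/4, flip it otherwise."""
    return Fraction(3, 4) * (x in B) + Fraction(1, 4) * ((1 - x) in B)


# Going from a kernel to a measure-valued map and back recovers the kernel.
round_trip = flat(sharp(noisy))
```

For instance, `round_trip({0}, 0)` and `noisy({0}, 0)` both evaluate to 3/4.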

From the point of view of category theory, we can interpret this correspondence as an adjunction

\mathrm {Hom} _{\mathrm {Meas} }(X,PY)\cong \mathrm {Hom} _{\mathrm {Stoch} }(X,Y)

between the category of measurable spaces and the category of Markov kernels. In particular, the category of Markov kernels can be seen as the Kleisli category of the Giry monad.^[3]^[4]^[5]
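Composition in a Kleisli category is obtained by applying the second kernel pointwise and then averaging; for Markov kernels this is the usual Chapman–Kolmogorov formula. The following Python sketch shows the finite discrete case, where the integral becomes a sum. Kernels are represented as functions returning `{outcome: probability}` dictionaries; this representation and the names `kleisli`, `flip` and `two_steps` are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction


def kleisli(k, h):
    """Compose kernels k : X -> PY and h : Y -> PZ into a kernel X -> PZ.

    Assumption: all measures are finitely supported, so the composite is
    (h after k)(x)(z) = sum_y k(x)(y) * h(y)(z).
    """
    def composite(x):
        out = defaultdict(Fraction)
        for y, py in k(x).items():
            for z, pz in h(y).items():
                out[z] += py * pz
        return dict(out)
    return composite


def flip(x):
    """Hypothetical kernel on {0, 1}: keep x with probability 3/4, flip it otherwise."""
    return {x: Fraction(3, 4), 1 - x: Fraction(1, 4)}


two_steps = kleisli(flip, flip)
```

Starting from 0, two steps return to 0 with probability 9/16 + 1/16 = 5/8 and end at 1 with probability 3/8.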

Product distributions

Given measurable spaces $(X,{\mathcal {F}})$ and $(Y,{\mathcal {G}})$ , one can form the measurable space $(X,{\mathcal {F}})\times (Y,{\mathcal {G}})=(X\times Y,{\mathcal {F}}\times {\mathcal {G}})$ with the product sigma-algebra, which is the product in the category of measurable spaces. Given probability measures $p\in PX$ and $q\in PY$ , one can form the product measure $p\otimes q$ on $(X\times Y,{\mathcal {F}}\times {\mathcal {G}})$ . This gives a natural, measurable map

(PX,{\mathcal {PF}})\times (PY,{\mathcal {PG}})\to {\big (}P(X\times Y),{\mathcal {P(F\times G)}}{\big )}

usually denoted by $\nabla$ or by $\otimes$ .^[4]

The map $\nabla :PX\times PY\to P(X\times Y)$ is in general not an isomorphism, since there are probability measures on $X\times Y$ which are not product distributions, for example those under which the two coordinates are correlated. However, the maps $\nabla :PX\times PY\to P(X\times Y)$ and the isomorphism $1\cong P1$ make the Giry monad a monoidal monad, and so in particular a commutative strong monad.^[4]
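The failure of $\nabla$ to be surjective can be checked by hand on a two-point space. The Python sketch below, restricted to finitely supported measures, builds a product measure and compares it with a perfectly correlated joint distribution that has the same marginals. The names `product_measure`, `marginals`, `fair` and `correlated` are illustrative.

```python
from fractions import Fraction


def product_measure(p, q):
    """Return the product measure of p and q on pairs (x, y)."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}


def marginals(joint):
    """Return the two marginals of a finitely supported joint measure."""
    first, second = {}, {}
    for (x, y), mass in joint.items():
        first[x] = first.get(x, Fraction(0)) + mass
        second[y] = second.get(y, Fraction(0)) + mass
    return first, second


fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
independent = product_measure(fair, fair)

# Two perfectly correlated fair bits: both marginals are fair,
# but the joint measure is not the product of its marginals.
correlated = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
```

Both `independent` and `correlated` have fair marginals, yet `independent` gives mass 1/4 to each of the four pairs while `correlated` gives no mass to `(0, 1)` or `(1, 0)`.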

Further properties

If a measurable space $(X,{\mathcal {F}})$ is standard Borel, so is $(PX,{\mathcal {PF}})$ . Therefore the Giry monad restricts to the full subcategory of standard Borel spaces.^[1]^[4]

The algebras for the Giry monad include compact convex subsets of Euclidean spaces, as well as the extended positive real line $[0,\infty ]$ , with the algebra structure map given by taking expected values.^[11] For example, for $[0,\infty ]$ , the structure map $e:P[0,\infty ]\to [0,\infty ]$ is given by

p\longmapsto \int _{[0,\infty )}x\,p(dx)

whenever $p$ is supported on $[0,\infty )$ and has finite expected value, and $e(p)=\infty$ otherwise.
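For finitely supported measures on $[0,\infty ]$ the structure map can be computed directly. In that case the expected value is infinite exactly when the point $\infty$ carries positive mass, since a finite sum of finite terms is always finite. The Python sketch below covers only this special case; the name `structure_map` and the use of `math.inf` to represent $\infty$ are assumptions made for the example.

```python
import math
from fractions import Fraction


def structure_map(p):
    """Compute e(p) for a finitely supported p on [0, inf].

    Assumption: p is a {point: probability} dict with points in [0, inf],
    where math.inf stands for the point at infinity.
    """
    if p.get(math.inf, 0) > 0:
        return math.inf  # p is not supported on [0, inf)
    # Supported on [0, inf) with finite support: an ordinary weighted average.
    return sum(x * mass for x, mass in p.items() if x != math.inf)


finite_case = structure_map({1: Fraction(1, 2), 3: Fraction(1, 2)})
infinite_case = structure_map({0: Fraction(1, 2), math.inf: Fraction(1, 2)})
```

Here `finite_case` is the average 2 and `infinite_case` is `math.inf`. On a Dirac measure the map returns the point itself, as expected of an algebra structure map.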

Citations

[1] Giry (1982)
[2] Avery (2016), pp. 1231–1234
[3] Jacobs (2018), pp. 205–106
[4] Fritz (2020), pp. 19–23
[5] Moss & Perrone (2022), pp. 3–4
[6] Giry (1982), p. 69
[7] Riehl (2016)
[8] Perrone (2024)
[9] Perrone (2024), p. 238
[10] Giry (1982), p. 71
[11] Doberkat (2006), pp. 1772–1776

References

Giry, Michèle (1982). "A categorical approach to probability theory". Categorical Aspects of Topology and Analysis. Lecture Notes in Mathematics. Vol. 915. Springer. pp. 68–85. doi:10.1007/BFb0092872. ISBN 978-3-540-11211-2.

Doberkat, Ernst-Erich (2006). "Eilenberg-Moore algebras for stochastic relations". Information and Computation. 204 (12): 1756–1781. doi:10.1016/j.ic.2006.09.001.

Avery, Tom (2016). "Codensity and the Giry monad". Journal of Pure and Applied Algebra. 220 (3): 1229–1251. arXiv:1410.4432. doi:10.1016/j.jpaa.2015.08.017. S2CID 119695729.

Jacobs, Bart (2018). "From probability monads to commutative effectuses". Journal of Logical and Algebraic Methods in Programming. 94: 200–237. doi:10.1016/j.jlamp.2016.11.006. hdl:2066/182000.

Fritz, Tobias (2020). "A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics". Advances in Mathematics. 370. arXiv:1908.07021. doi:10.1016/j.aim.2020.107239. S2CID 201103837.

Moss, Sean; Perrone, Paolo (2022). "Probability monads with submonads of deterministic states". LICS '22: Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science. arXiv:2204.07003. doi:10.1145/3531130.3533355.

Riehl, Emily (2016). "Chapter 5. Monads and their Algebras". Category Theory in Context. Dover. ISBN 978-0486809038.

Perrone, Paolo (2024). "Chapter 5. Monads and Comonads". Starting Category Theory. World Scientific. doi:10.1142/9789811286018_0005. ISBN 978-981-12-8600-1.

External links

What is a probability monad?, video tutorial.
