Let $Y$ be any random variable with $\mathbb{E}[|Y|]<\infty$. Suppose $\left\{\mathcal{F}_0,\mathcal{F}_1,\dots\right\}$ is a filtration, i.e. $\mathcal{F}_s\subset\mathcal{F}_t$ when $s<t$. Define
$$Z_t=\mathbb{E}[Y\mid\mathcal{F}_t];$$
then $\left\{Z_0,Z_1,\dots\right\}$ is a martingale,[2] namely a Doob martingale, with respect to the filtration $\left\{\mathcal{F}_0,\mathcal{F}_1,\dots\right\}$.
To see this, note that
$$\mathbb{E}[|Z_t|]=\mathbb{E}\left[\left|\mathbb{E}[Y\mid\mathcal{F}_t]\right|\right]\leq\mathbb{E}\left[\mathbb{E}[|Y|\mid\mathcal{F}_t]\right]=\mathbb{E}[|Y|]<\infty,$$
where the inequality follows from Jensen's inequality applied to the conditional expectation, and
$$\mathbb{E}[Z_t\mid\mathcal{F}_{t-1}]=\mathbb{E}\left[\mathbb{E}[Y\mid\mathcal{F}_t]\mid\mathcal{F}_{t-1}\right]=\mathbb{E}[Y\mid\mathcal{F}_{t-1}]=Z_{t-1}$$
by the tower property, since $\mathcal{F}_{t-1}\subset\mathcal{F}_t$.
In particular, for any sequence of random variables $\left\{X_1,X_2,\dots,X_n\right\}$ on a probability space $(\Omega,\mathcal{F},\mathrm{P})$ and any function $f$ such that $\mathbb{E}[|f(X_1,X_2,\dots,X_n)|]<\infty$, one can choose
$$Y:=f(X_1,X_2,\dots,X_n)$$
and the filtration $\left\{\mathcal{F}_0,\mathcal{F}_1,\dots\right\}$ given by
$$\begin{aligned}\mathcal{F}_0&:=\left\{\emptyset,\Omega\right\},\\ \mathcal{F}_t&:=\sigma(X_1,X_2,\dots,X_t),\quad\forall t\geq 1,\end{aligned}$$
i.e. the $\sigma$-algebra generated by $X_1,X_2,\dots,X_t$
. Then, by the definition of a Doob martingale, the process $\left\{Z_0,Z_1,\dots\right\}$ where
$$\begin{aligned}Z_0&:=\mathbb{E}[f(X_1,X_2,\dots,X_n)\mid\mathcal{F}_0]=\mathbb{E}[f(X_1,X_2,\dots,X_n)],\\ Z_t&:=\mathbb{E}[f(X_1,X_2,\dots,X_n)\mid\mathcal{F}_t]=\mathbb{E}[f(X_1,X_2,\dots,X_n)\mid X_1,X_2,\dots,X_t],\quad\forall t\geq 1,\end{aligned}$$
forms a Doob martingale. Note that $Z_n=f(X_1,X_2,\dots,X_n)$, since $f(X_1,X_2,\dots,X_n)$ is $\mathcal{F}_n$-measurable. This martingale can be used to prove McDiarmid's inequality.
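The construction above can be made concrete in a small numerical sketch. The example below (an illustration, not part of the source) takes $f(X_1,\dots,X_n)=X_1+\cdots+X_n$ with independent Bernoulli($p$) coordinates, for which the conditional expectation has the closed form $Z_t=X_1+\cdots+X_t+(n-t)p$; the function name `doob_martingale_path` is hypothetical.

```python
import random

# Sketch (assumption): Doob martingale for f(X_1,...,X_n) = X_1 + ... + X_n
# with independent Bernoulli(p) coordinates. Conditioning on X_1,...,X_t gives
# the closed form Z_t = X_1 + ... + X_t + (n - t) * p.

def doob_martingale_path(xs, p):
    """Return Z_0, Z_1, ..., Z_n for f = sum and Bernoulli(p) coordinates."""
    n = len(xs)
    path = []
    partial = 0.0  # running sum X_1 + ... + X_t
    for t in range(n + 1):
        path.append(partial + (n - t) * p)  # E[f | X_1, ..., X_t]
        if t < n:
            partial += xs[t]
    return path

random.seed(0)
n, p = 10, 0.5
xs = [1 if random.random() < p else 0 for _ in range(n)]
zs = doob_martingale_path(xs, p)
print(zs[0])               # Z_0 = E[f] = n * p = 5.0
print(zs[-1] == sum(xs))   # Z_n = f(X_1, ..., X_n): True
```

Each increment satisfies $Z_t-Z_{t-1}=X_t-p$, which has conditional mean zero, so the martingale property $\mathbb{E}[Z_t\mid\mathcal{F}_{t-1}]=Z_{t-1}$ holds at every step.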
McDiarmid's inequality
The Doob martingale was introduced by Joseph L. Doob in 1940 to establish concentration inequalities such as McDiarmid's inequality, which applies to functions that satisfy a bounded differences property (defined below) when evaluated on independent random arguments.
A function $f:\mathcal{X}_1\times\mathcal{X}_2\times\cdots\times\mathcal{X}_n\rightarrow\mathbb{R}$ satisfies the bounded differences property if substituting the value of the $i$th coordinate $x_i$ changes the value of $f$ by at most $c_i$. More formally, there are constants $c_1,c_2,\dots,c_n$ such that for all $i\in[n]$ and all $x_1\in\mathcal{X}_1,\,x_2\in\mathcal{X}_2,\,\dots,\,x_n\in\mathcal{X}_n$,
$$\sup_{x_i'\in\mathcal{X}_i}\left|f(x_1,\dots,x_{i-1},x_i,x_{i+1},\dots,x_n)-f(x_1,\dots,x_{i-1},x_i',x_{i+1},\dots,x_n)\right|\leq c_i.$$
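As a sanity check of this definition, one can verify bounded differences by brute force for a small example. The sketch below (an illustration under stated assumptions; `bounded_difference` is a hypothetical helper) confirms that the sample mean $f(x)=\frac{1}{n}\sum_i x_i$ on $\{0,1\}^n$ satisfies the property with $c_i=1/n$ for every coordinate.

```python
from itertools import product

# Assumption for illustration: f is the sample mean on {0,1}^n,
# so resubstituting any single coordinate moves f by at most 1/n.

def bounded_difference(f, points, i, values):
    """Largest change in f when only coordinate i is resubstituted."""
    worst = 0.0
    for x in points:
        for v in values:
            y = list(x)
            y[i] = v
            worst = max(worst, abs(f(x) - f(tuple(y))))
    return worst

n = 4
mean = lambda x: sum(x) / len(x)
points = list(product([0, 1], repeat=n))
diffs = [bounded_difference(mean, points, i, [0, 1]) for i in range(n)]
print(diffs)  # each entry equals 1/n = 0.25
```

The supremum in the definition is here a maximum over the finite set $\mathcal{X}_i=\{0,1\}$, which is why an exhaustive loop suffices.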
McDiarmid's Inequality[1] — Let $f:\mathcal{X}_1\times\mathcal{X}_2\times\cdots\times\mathcal{X}_n\rightarrow\mathbb{R}$ satisfy the bounded differences property with bounds $c_1,c_2,\dots,c_n$. Consider independent random variables $X_1,X_2,\dots,X_n$ where $X_i\in\mathcal{X}_i$ for all $i$. Then, for any $\varepsilon>0$,
$$\mathrm{P}\left(f(X_1,X_2,\dots,X_n)-\mathbb{E}[f(X_1,X_2,\dots,X_n)]\geq\varepsilon\right)\leq\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),$$
$$\mathrm{P}\left(f(X_1,X_2,\dots,X_n)-\mathbb{E}[f(X_1,X_2,\dots,X_n)]\leq-\varepsilon\right)\leq\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),$$
and as an immediate consequence,
$$\mathrm{P}\left(\left|f(X_1,X_2,\dots,X_n)-\mathbb{E}[f(X_1,X_2,\dots,X_n)]\right|\geq\varepsilon\right)\leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).$$
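The two-sided bound can be checked empirically. The sketch below (an illustration not taken from the source) takes $f$ to be the sample mean of $n$ Bernoulli($1/2$) variables, for which $c_i=1/n$, so the bound reads $2\exp(-2n\varepsilon^2)$; the trial count and seed are arbitrary choices.

```python
import math
import random

# Illustration (assumption): f = sample mean of n Bernoulli(1/2) variables.
# Each c_i = 1/n, so McDiarmid's two-sided bound is 2 * exp(-2 * n * eps^2).

random.seed(1)
n, eps, trials = 100, 0.1, 20000
bound = 2 * math.exp(-2 * eps**2 / (n * (1.0 / n) ** 2))  # = 2 exp(-2 n eps^2)

hits = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(mean - 0.5) >= eps:
        hits += 1
empirical = hits / trials

print(round(bound, 4))          # 2 exp(-2) ≈ 0.2707
print(empirical <= bound)       # the empirical tail stays under the bound
```

Note that McDiarmid's bound is distribution-free: for this particular $f$ the true tail probability (roughly the normal tail $\mathrm{P}(|Z|\geq 2)$) is much smaller than the bound, which holds for every product distribution with bounded differences.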