In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
normal-inverse-Wishart

Notation: $(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu)$

Parameters:
$\boldsymbol{\mu}_0 \in \mathbb{R}^D$: location (real vector)
$\lambda > 0$: (real)
$\boldsymbol{\Psi} \in \mathbb{R}^{D\times D}$: inverse scale matrix (positive definite)
$\nu > D-1$: (real)

Support: $\boldsymbol{\mu} \in \mathbb{R}^D$; $\boldsymbol{\Sigma} \in \mathbb{R}^{D\times D}$, a covariance matrix (positive definite)

PDF: $f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \mathcal{N}\left(\boldsymbol{\mu} \,\middle|\, \boldsymbol{\mu}_0, \tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)\,\mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$
Definition

Suppose

$\boldsymbol{\mu} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Sigma} \sim \mathcal{N}\left(\boldsymbol{\mu} \,\middle|\, \boldsymbol{\mu}_0, \tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)$

has a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol{\Sigma}$, where

$\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu \sim \mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$

has an inverse Wishart distribution. Then $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ has a normal-inverse-Wishart distribution, denoted as

$(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu).$
Probability density function
$f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \mathcal{N}\left(\boldsymbol{\mu} \,\middle|\, \boldsymbol{\mu}_0, \tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)\,\mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$
The full version of the PDF is as follows:[2]
$f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \frac{\lambda^{D/2} |\boldsymbol{\Psi}|^{\nu/2} |\boldsymbol{\Sigma}|^{-\frac{\nu+D+2}{2}}}{(2\pi)^{D/2}\, 2^{\frac{\nu D}{2}}\, \Gamma_D\!\left(\frac{\nu}{2}\right)} \exp\left\{-\frac{1}{2}\operatorname{Tr}(\boldsymbol{\Psi}\boldsymbol{\Sigma}^{-1}) - \frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}-\boldsymbol{\mu}_0)\right\}$
Here $\Gamma_D[\cdot]$ is the multivariate gamma function and $\operatorname{Tr}(\boldsymbol{\Psi})$ is the trace of the given matrix.
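The two forms of the density can be checked against each other numerically. The sketch below uses SciPy's `multivariate_normal` and `invwishart` for the factored form and evaluates the full closed-form expression on the log scale; the hyperparameter and argument values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import invwishart, multivariate_normal

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Factored form: log N(mu | mu0, Sigma/lam) + log W^{-1}(Sigma | Psi, nu)."""
    return (multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
            + invwishart.logpdf(Sigma, df=nu, scale=Psi))

def niw_logpdf_closed_form(mu, Sigma, mu0, lam, Psi, nu):
    """The full PDF from the text, evaluated on the log scale."""
    D = len(mu)
    Sinv = np.linalg.inv(Sigma)
    diff = mu - mu0
    return ((D / 2) * np.log(lam)
            + (nu / 2) * np.linalg.slogdet(Psi)[1]
            - ((nu + D + 2) / 2) * np.linalg.slogdet(Sigma)[1]
            - (D / 2) * np.log(2 * np.pi)
            - (nu * D / 2) * np.log(2)
            - multigammaln(nu / 2, D)                 # log of the multivariate gamma function
            - 0.5 * np.trace(Psi @ Sinv)
            - (lam / 2) * diff @ Sinv @ diff)

# Illustrative evaluation in D = 2 (all values hypothetical)
mu0, lam, Psi, nu = np.zeros(2), 2.0, np.eye(2), 4.0
mu, Sigma = np.array([0.1, -0.2]), np.array([[1.0, 0.3], [0.3, 2.0]])
lp = niw_logpdf(mu, Sigma, mu0, lam, Psi, nu)
lp_cf = niw_logpdf_closed_form(mu, Sigma, mu0, lam, Psi, nu)
```

The two log-densities agree because the product of the conditional normal and the inverse Wishart normalizers is exactly the closed-form normalizer shown above.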
Posterior distribution of the parameters
Suppose the sampling density is a multivariate normal distribution

$\boldsymbol{y}_i \mid \boldsymbol{\mu},\boldsymbol{\Sigma} \sim \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$

where $\boldsymbol{y}$ is an $n \times p$ matrix and $\boldsymbol{y}_i$ (of length $p$) is row $i$ of the matrix.
With the mean and covariance matrix of the sampling distribution unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:
$(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu).$
The resulting posterior distribution for the mean and covariance matrix will also be normal-inverse-Wishart:
$(\boldsymbol{\mu},\boldsymbol{\Sigma}) \mid \boldsymbol{y} \sim \mathrm{NIW}(\boldsymbol{\mu}_n,\lambda_n,\boldsymbol{\Psi}_n,\nu_n),$
where
$\boldsymbol{\mu}_n = \frac{\lambda\boldsymbol{\mu}_0 + n\bar{\boldsymbol{y}}}{\lambda + n}$

$\lambda_n = \lambda + n$

$\nu_n = \nu + n$

$\boldsymbol{\Psi}_n = \boldsymbol{\Psi} + \boldsymbol{S} + \frac{\lambda n}{\lambda + n}(\bar{\boldsymbol{y}} - \boldsymbol{\mu}_0)(\bar{\boldsymbol{y}} - \boldsymbol{\mu}_0)^T \quad \text{with} \quad \boldsymbol{S} = \sum_{i=1}^{n}(\boldsymbol{y}_i - \bar{\boldsymbol{y}})(\boldsymbol{y}_i - \bar{\boldsymbol{y}})^T.$
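The conjugate update above translates directly into a few lines of NumPy. A minimal sketch; the data matrix `y` and the prior hyperparameters are toy values chosen only for illustration.

```python
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate NIW update for an (n x p) data matrix y.

    Returns the posterior hyperparameters (mu_n, lam_n, Psi_n, nu_n)."""
    n = y.shape[0]
    ybar = y.mean(axis=0)
    # Scatter matrix about the sample mean: S = sum_i (y_i - ybar)(y_i - ybar)^T
    S = (y - ybar).T @ (y - ybar)
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n

# Toy data: n = 3 observations in p = 2 dimensions (values are illustrative)
y = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, mu0=np.zeros(2), lam=1.0, Psi=np.eye(2), nu=4.0)
```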
To sample from the joint posterior of $(\boldsymbol{\mu},\boldsymbol{\Sigma})$, one simply draws samples from $\boldsymbol{\Sigma} \mid \boldsymbol{y} \sim \mathcal{W}^{-1}(\boldsymbol{\Psi}_n,\nu_n)$, then draws $\boldsymbol{\mu} \mid \boldsymbol{\Sigma},\boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol{\mu}_n,\boldsymbol{\Sigma}/\lambda_n)$. To draw from the posterior predictive of a new observation, draw $\tilde{\boldsymbol{y}} \mid \boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$, given the already drawn values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.[3]
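This two-step posterior draw, followed by a predictive draw, can be sketched with SciPy's `invwishart` and NumPy's multivariate normal sampler. The posterior hyperparameter values passed in at the bottom are hypothetical placeholders for illustration.

```python
import numpy as np
from scipy.stats import invwishart

def sample_posterior_predictive(mu_n, lam_n, Psi_n, nu_n, rng):
    """One joint draw from the NIW posterior, then one predictive draw.

    Sigma | y  ~ W^{-1}(Psi_n, nu_n)
    mu | Sigma, y ~ N_p(mu_n, Sigma / lam_n)
    ytilde | mu, Sigma ~ N_p(mu, Sigma)"""
    Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
    mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
    ytilde = rng.multivariate_normal(mu, Sigma)
    return mu, Sigma, ytilde

rng = np.random.default_rng(0)
# Hypothetical posterior hyperparameters for p = 2
mu, Sigma, ytilde = sample_posterior_predictive(np.zeros(2), 4.0, np.eye(2), 6.0, rng)
```

Repeating the call gives i.i.d. draws from the posterior predictive, since each predictive sample is conditioned on its own fresh $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ draw.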
Generating normal-inverse-Wishart random variates
The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If $(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu)$, then $(\boldsymbol{\mu},\boldsymbol{\Sigma}^{-1}) \sim \mathrm{NW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi}^{-1},\nu)$.
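One way to exploit this correspondence when generating variates is to draw the precision matrix from a Wishart distribution with scale $\boldsymbol{\Psi}^{-1}$, invert it to get $\boldsymbol{\Sigma}$, and then draw $\boldsymbol{\mu}$ conditionally. A minimal sketch with hypothetical hyperparameter values; equivalently, `scipy.stats.invwishart` draws $\boldsymbol{\Sigma}$ directly.

```python
import numpy as np
from scipy.stats import wishart

def sample_niw(mu0, lam, Psi, nu, rng):
    """Draw (mu, Sigma) ~ NIW(mu0, lam, Psi, nu) via the normal-Wishart route:
    precision Sigma^{-1} ~ W(Psi^{-1}, nu), so invert a Wishart draw,
    then mu | Sigma ~ N(mu0, Sigma / lam)."""
    Lambda = wishart.rvs(df=nu, scale=np.linalg.inv(Psi), random_state=rng)
    Sigma = np.linalg.inv(Lambda)
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma

rng = np.random.default_rng(1)
# Hypothetical prior hyperparameters in D = 2
mu, Sigma = sample_niw(np.zeros(2), 2.0, np.eye(2), 5.0, rng)
```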
Related distributions

The normal-inverse-gamma distribution is the one-dimensional equivalent.
The multivariate normal distribution and inverse Wishart distribution are the component distributions from which this distribution is made.
References

^ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
^ Prince, Simon J. D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
^ Gelman, Andrew, et al. (2014). Bayesian Data Analysis. Vol. 2, p. 73. Boca Raton, FL: Chapman & Hall/CRC.