In probability theory and statistics, the Dirichlet negative multinomial distribution is a multivariate distribution on the non-negative integers. It is a multivariate extension of the beta negative binomial distribution. It is also a generalization of the negative multinomial distribution (NM(k, p)) that allows for heterogeneity or overdispersion in the probability vector. It is used in quantitative marketing research to flexibly model the number of household transactions across multiple brands.
Notation: $\textrm{DNM}(x_0,\,\alpha_0,\,\boldsymbol{\alpha})$
Parameters: $x_0 \in \mathbb{R}_{>0},\ \alpha_0 \in \mathbb{R}_{>0},\ \boldsymbol{\alpha} \in \mathbb{R}_{>0}^{m}$
Support: $x_i \in \{0, 1, 2, \ldots\},\ 1 \leq i \leq m$
PMF: $\frac{\mathrm{B}(x_\bullet, \alpha_\bullet)}{\mathrm{B}(x_0, \alpha_0)} \prod_{i=1}^{m} \frac{\Gamma(x_i + \alpha_i)}{x_i!\,\Gamma(\alpha_i)}$, where $x_\bullet = \sum_{i=0}^{m} x_i$, $\alpha_\bullet = \sum_{i=0}^{m} \alpha_i$, $\Gamma(x)$ is the gamma function and $\mathrm{B}$ is the beta function.
Mean: $\tfrac{x_0}{\alpha_0 - 1}\boldsymbol{\alpha}$ for $\alpha_0 > 1$
Variance: $\frac{x_0 (x_0 + \alpha_0 - 1)}{(\alpha_0 - 1)^2 (\alpha_0 - 2)} \left[ \boldsymbol{\alpha} \boldsymbol{\alpha}^{\operatorname{T}} + (\alpha_0 - 1) \operatorname{diag}(\boldsymbol{\alpha}) \right]$ for $\alpha_0 > 2$
MGF: does not exist
CF: $\frac{\mathrm{B}(x_0, \alpha_\bullet)}{\mathrm{B}(x_0, \alpha_0)} F_D^{(m)}(x_0, \boldsymbol{\alpha}; x_0 + \alpha_\bullet; e^{i t_1}, \cdots, e^{i t_m})$, where $F_D^{(m)}$ is the Lauricella function
If the probability vector follows a Dirichlet distribution with parameters $\alpha_0$ and $\boldsymbol{\alpha}$, that is, if
$$X \mid \mathbf{p} \sim \operatorname{NM}(x_0, \mathbf{p}),$$
where
$$\mathbf{p} \sim \operatorname{Dir}(\alpha_0, \boldsymbol{\alpha}),$$
then the marginal distribution of X is a Dirichlet negative multinomial distribution:
$$X \sim \operatorname{DNM}(x_0, \alpha_0, \boldsymbol{\alpha}).$$
In the above, $\operatorname{NM}(x_0, \mathbf{p})$ is the negative multinomial distribution and $\operatorname{Dir}(\alpha_0, \boldsymbol{\alpha})$ is the Dirichlet distribution.
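The two-stage definition above translates directly into a sampler. The following is a minimal sketch rather than a reference implementation: the function name `sample_dnm` is hypothetical, and the negative multinomial step is drawn through its gamma-Poisson mixture representation (an assumption of this sketch, chosen because NumPy has no negative multinomial sampler and because it also works for non-integer $x_0$).

```python
import numpy as np

def sample_dnm(x0, alpha0, alpha, size, rng):
    """Draw `size` vectors from DNM(x0, alpha0, alpha) by compounding (hypothetical helper)."""
    alpha_full = np.concatenate(([alpha0], np.asarray(alpha, dtype=float)))
    # Step 1: p = (p_0, p_1, ..., p_m) ~ Dir(alpha0, alpha), one row per draw.
    p = rng.dirichlet(alpha_full, size=size)
    # Step 2: X | p ~ NM(x0, p), drawn via the gamma-Poisson mixture:
    # lam ~ Gamma(shape=x0, rate=p_0), then X_i | lam ~ Poisson(lam * p_i).
    lam = rng.gamma(shape=x0, scale=1.0 / p[:, 0])
    return rng.poisson(lam[:, None] * p[:, 1:])

rng = np.random.default_rng(0)
x0, alpha0, alpha = 5.0, 10.0, np.array([2.0, 3.0])
draws = sample_dnm(x0, alpha0, alpha, size=200_000, rng=rng)

# Compare the empirical mean with x0 / (alpha0 - 1) * alpha (valid for alpha0 > 1).
empirical_mean = draws.mean(axis=0)
theoretical_mean = x0 / (alpha0 - 1.0) * alpha
print(empirical_mean, theoretical_mean)
```

A fairly large $\alpha_0$ is used here on purpose: for small $\alpha_0$ the distribution is heavy tailed and sample moments converge slowly or not at all.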
Dirichlet negative multinomial as a compound distribution
The Dirichlet distribution is a conjugate distribution to the negative multinomial distribution. This fact leads to an analytically tractable compound distribution.
For a random vector of category counts $\mathbf{x} = (x_1, \dots, x_m)$ distributed according to a negative multinomial distribution, the compound distribution is obtained by integrating over the distribution of $\mathbf{p}$, which is treated as a random vector following a Dirichlet distribution:
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \int_{\mathbf{p}} \mathrm{NegMult}(\mathbf{x} \mid x_0, \mathbf{p})\, \mathrm{Dir}(\mathbf{p} \mid \alpha_0, \boldsymbol{\alpha})\, \mathrm{d}\mathbf{p}$$
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=0}^{m} x_i\right)}{\Gamma(x_0) \prod_{i=1}^{m} x_i!} \frac{1}{\mathrm{B}(\boldsymbol{\alpha}_+)} \int_{\mathbf{p}} \prod_{i=0}^{m} p_i^{x_i + \alpha_i - 1}\, \mathrm{d}\mathbf{p}$$
which results in the following formula:
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=0}^{m} x_i\right)}{\Gamma(x_0) \prod_{i=1}^{m} x_i!} \frac{\mathrm{B}(\mathbf{x}_+ + \boldsymbol{\alpha}_+)}{\mathrm{B}(\boldsymbol{\alpha}_+)}$$
where $\mathbf{x}_+$ and $\boldsymbol{\alpha}_+$ are the $(m+1)$-dimensional vectors created by appending the scalars $x_0$ and $\alpha_0$ to the $m$-dimensional vectors $\mathbf{x}$ and $\boldsymbol{\alpha}$ respectively, and $\mathrm{B}$ is the multivariate version of the beta function. We can write this equation explicitly as
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = x_0 \frac{\Gamma\left(\sum_{i=0}^{m} x_i\right) \Gamma\left(\sum_{i=0}^{m} \alpha_i\right)}{\Gamma\left(\sum_{i=0}^{m} (x_i + \alpha_i)\right)} \prod_{i=0}^{m} \frac{\Gamma(x_i + \alpha_i)}{\Gamma(x_i + 1)\, \Gamma(\alpha_i)}.$$
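The leading factor $x_0$ in the explicit form can look surprising. A short sketch of the intermediate steps, using only the standard definition of the multivariate beta function and the identities $\Gamma(x_0 + 1) = x_0\,\Gamma(x_0)$ and $x_i! = \Gamma(x_i + 1)$, shows where it comes from:

$$\mathrm{B}(\boldsymbol{\alpha}_+) = \frac{\prod_{i=0}^{m} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=0}^{m} \alpha_i\right)}, \qquad \int_{\mathbf{p}} \prod_{i=0}^{m} p_i^{x_i + \alpha_i - 1}\, \mathrm{d}\mathbf{p} = \mathrm{B}(\mathbf{x}_+ + \boldsymbol{\alpha}_+) = \frac{\prod_{i=0}^{m} \Gamma(x_i + \alpha_i)}{\Gamma\left(\sum_{i=0}^{m} (x_i + \alpha_i)\right)}$$

$$\frac{\Gamma\left(\sum_{i=0}^{m} x_i\right)}{\Gamma(x_0) \prod_{i=1}^{m} x_i!} = \frac{x_0\, \Gamma\left(\sum_{i=0}^{m} x_i\right)}{\Gamma(x_0 + 1) \prod_{i=1}^{m} \Gamma(x_i + 1)} = x_0\, \frac{\Gamma\left(\sum_{i=0}^{m} x_i\right)}{\prod_{i=0}^{m} \Gamma(x_i + 1)}$$

Multiplying the second line by the ratio $\mathrm{B}(\mathbf{x}_+ + \boldsymbol{\alpha}_+) / \mathrm{B}(\boldsymbol{\alpha}_+)$ from the first line and collecting the products over $i = 0, \dots, m$ gives the explicit form.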
Alternative formulations exist. One convenient representation[1] is
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma(x_\bullet)}{\Gamma(x_0) \prod_{i=1}^{m} \Gamma(x_i + 1)} \times \frac{\Gamma(\alpha_\bullet)}{\prod_{i=0}^{m} \Gamma(\alpha_i)} \times \frac{\prod_{i=0}^{m} \Gamma(x_i + \alpha_i)}{\Gamma(x_\bullet + \alpha_\bullet)}$$
where $x_\bullet = x_0 + x_1 + \cdots + x_m$ and $\alpha_\bullet = \alpha_0 + \alpha_1 + \cdots + \alpha_m$.
This can also be written
$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\mathrm{B}(x_\bullet, \alpha_\bullet)}{\mathrm{B}(x_0, \alpha_0)} \prod_{i=1}^{m} \frac{\Gamma(x_i + \alpha_i)}{x_i!\, \Gamma(\alpha_i)}.$$
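For numerical work the PMF is usually evaluated on the log scale, since the gamma functions overflow quickly. The sketch below implements the beta-function form and the three-factor gamma form side by side so that they can be checked against each other; the function names are illustrative and only the standard library is assumed.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the (bivariate) beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dnm_logpmf(x, x0, alpha0, alpha):
    """log Pr(x) using B(x., alpha.) / B(x0, alpha0) * prod_i Gamma(x_i + a_i) / (x_i! Gamma(a_i))."""
    x_dot = x0 + sum(x)
    alpha_dot = alpha0 + sum(alpha)
    out = log_beta(x_dot, alpha_dot) - log_beta(x0, alpha0)
    for xi, ai in zip(x, alpha):
        out += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return out

def dnm_logpmf_gamma_form(x, x0, alpha0, alpha):
    """log Pr(x) using the three-factor gamma-function representation."""
    xs, als = [x0, *x], [alpha0, *alpha]
    x_dot, alpha_dot = sum(xs), sum(als)
    term1 = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    term2 = lgamma(alpha_dot) - sum(lgamma(ai) for ai in als)
    term3 = sum(lgamma(xi + ai) for xi, ai in zip(xs, als)) - lgamma(x_dot + alpha_dot)
    return term1 + term2 + term3

# Univariate example (m = 1): Pr(X = 0) reduces to B(3, 7) / B(3, 5) = 5/12.
p_zero = exp(dnm_logpmf([0], 3.0, 5.0, [2.0]))
print(p_zero)
```

Both functions accept non-integer $x_0$, since the formulas only require $x_0 > 0$.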
Marginal distributions
To obtain the marginal distribution over a subset of Dirichlet negative multinomial random variables, one only needs to drop the irrelevant $\alpha_i$'s (those of the variables that one wants to marginalize out) from the $\boldsymbol{\alpha}$ vector. The joint distribution of the remaining random variates is $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha}_{(-)})$, where $\boldsymbol{\alpha}_{(-)}$ is the vector with the removed $\alpha_i$'s. The univariate marginals are beta negative binomially distributed.
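The marginalization rule can be checked numerically by summing a bivariate PMF over one coordinate and comparing the result with the univariate PMF obtained by dropping the corresponding $\alpha_i$. This is only a sketch: the helper `dnm_pmf` is illustrative, the parameter values are arbitrary, and the infinite sum is truncated at a point where the remaining tail is negligible for these values.

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Pr(x) for DNM(x0, alpha0, alpha), evaluated through log-gamma terms."""
    x_dot = x0 + sum(x)
    alpha_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(alpha_dot) - lgamma(x_dot + alpha_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)

x0, alpha0, alpha = 2.0, 6.0, [1.5, 2.5]
x1 = 3

# Marginalize x2 out of the joint PMF by (truncated) summation ...
summed = sum(dnm_pmf([x1, x2], x0, alpha0, alpha) for x2 in range(4000))
# ... and compare with the DNM obtained by dropping alpha_2 from the vector.
marginal = dnm_pmf([x1], x0, alpha0, [alpha[0]])
print(summed, marginal)
```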
Conditional distributions
If the m-dimensional vector $\mathbf{x}$ is partitioned as follows
$$\mathbf{x} = \begin{bmatrix} \mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (m - q) \times 1 \end{bmatrix}$$
and accordingly $\boldsymbol{\alpha}$
$$\boldsymbol{\alpha} = \begin{bmatrix} \boldsymbol{\alpha}^{(1)} \\ \boldsymbol{\alpha}^{(2)} \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (m - q) \times 1 \end{bmatrix}$$
then the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is $\mathrm{DNM}(x_0^{\prime}, \alpha_0^{\prime}, \boldsymbol{\alpha}^{(1)})$, where
$$x_0^{\prime} = x_0 + \sum_{i=1}^{m-q} x_i^{(2)}$$
and
$$\alpha_0^{\prime} = \alpha_0 + \sum_{i=1}^{m-q} \alpha_i^{(2)}.$$
That is,
$$\Pr(\mathbf{x}^{(1)} \mid \mathbf{x}^{(2)}, x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\mathrm{B}(x_\bullet, \alpha_\bullet)}{\mathrm{B}(x_0^{\prime}, \alpha_0^{\prime})} \prod_{i=1}^{q} \frac{\Gamma(x_i^{(1)} + \alpha_i^{(1)})}{(x_i^{(1)}!)\, \Gamma(\alpha_i^{(1)})}$$
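The conditional result can be verified directly from the definition of conditional probability: the joint PMF divided by the marginal PMF of the conditioning block should equal the PMF of the DNM with updated parameters $x_0^{\prime}$ and $\alpha_0^{\prime}$. The sketch below does this for one arbitrary point; the helper names and the parameter values are illustrative.

```python
from math import lgamma

def dnm_logpmf(x, x0, alpha0, alpha):
    """log Pr(x) for DNM(x0, alpha0, alpha)."""
    x_dot = x0 + sum(x)
    alpha_dot = alpha0 + sum(alpha)
    out = (lgamma(x_dot) + lgamma(alpha_dot) - lgamma(x_dot + alpha_dot)
           - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        out += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return out

def conditional_params(x0, alpha0, x_2, alpha_2):
    """Updated (x0', alpha0') after conditioning on the block X^(2) = x_2."""
    return x0 + sum(x_2), alpha0 + sum(alpha_2)

x0, alpha0 = 2.5, 3.0
x_1, alpha_1 = [1, 4], [0.7, 1.3]   # block kept (q = 2)
x_2, alpha_2 = [2, 0, 3], [1.1, 0.4, 2.0]   # block conditioned on

# Definition: log Pr(x1 | x2) = log Pr(x1, x2) - log Pr(x2),
# where Pr(x2) is the marginal DNM(x0, alpha0, alpha_2).
by_definition = (dnm_logpmf(x_1 + x_2, x0, alpha0, alpha_1 + alpha_2)
                 - dnm_logpmf(x_2, x0, alpha0, alpha_2))

# Stated result: X^(1) | X^(2) = x2 ~ DNM(x0', alpha0', alpha_1).
x0_new, alpha0_new = conditional_params(x0, alpha0, x_2, alpha_2)
by_formula = dnm_logpmf(x_1, x0_new, alpha0_new, alpha_1)
print(by_definition, by_formula)
```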
Conditional on the sum
The conditional distribution of a Dirichlet negative multinomial distribution given $\sum_{i=1}^{m} x_i = n$ is a Dirichlet-multinomial distribution with parameters $n$ and $\boldsymbol{\alpha}$. That is,
$$\Pr\left(\mathbf{x} \,\Big|\, \sum_{i=1}^{m} x_i = n, x_0, \alpha_0, \boldsymbol{\alpha}\right) = \frac{n!\, \Gamma\left(\sum_{i=1}^{m} \alpha_i\right)}{\Gamma\left(n + \sum_{i=1}^{m} \alpha_i\right)} \prod_{i=1}^{m} \frac{\Gamma(x_i + \alpha_i)}{x_i!\, \Gamma(\alpha_i)}.$$
Notice that the expression does not depend on $x_0$ or $\alpha_0$.
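A brute-force check of this statement is to normalize the DNM probabilities over all count vectors with the same total $n$ and compare with the Dirichlet-multinomial formula. The sketch below does so for three categories and two unrelated choices of $(x_0, \alpha_0)$, which should give the same conditional probability; function names and parameter values are illustrative.

```python
from math import lgamma, exp

def dnm_logpmf(x, x0, alpha0, alpha):
    """log Pr(x) for DNM(x0, alpha0, alpha)."""
    x_dot = x0 + sum(x)
    alpha_dot = alpha0 + sum(alpha)
    out = (lgamma(x_dot) + lgamma(alpha_dot) - lgamma(x_dot + alpha_dot)
           - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        out += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return out

def dm_logpmf(x, alpha):
    """log PMF of the Dirichlet-multinomial with n = sum(x) and parameter alpha."""
    n, a = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for xi, ai in zip(x, alpha):
        out += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return out

def conditional_on_sum(x, x0, alpha0, alpha):
    """Pr(x | sum(x) = n) by explicit normalization (three categories only)."""
    n = sum(x)
    total = sum(exp(dnm_logpmf((a, b, n - a - b), x0, alpha0, alpha))
                for a in range(n + 1) for b in range(n + 1 - a))
    return exp(dnm_logpmf(x, x0, alpha0, alpha)) / total

alpha = [0.8, 1.5, 2.2]
x = (1, 3, 2)
dm = exp(dm_logpmf(x, alpha))
cond_a = conditional_on_sum(x, 2.0, 3.0, alpha)
cond_b = conditional_on_sum(x, 7.5, 0.4, alpha)
print(dm, cond_a, cond_b)
```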
If
$$X = (X_1, \ldots, X_m) \sim \operatorname{DNM}(x_0, \alpha_0, \alpha_1, \ldots, \alpha_m)$$
then, if the random variables with positive subscripts i and j are dropped from the vector and replaced by their sum,
$$X' = (X_1, \ldots, X_i + X_j, \ldots, X_m) \sim \operatorname{DNM}(x_0, \alpha_0, \alpha_1, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_m).$$
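The aggregation property can be confirmed with a finite sum: adding the joint probabilities of all pairs $(X_i, X_j)$ that share the same total should reproduce the PMF of the merged vector. A minimal sketch with three categories, merging the first two (the helper name and the numbers are illustrative):

```python
from math import lgamma, exp

def dnm_pmf(x, x0, alpha0, alpha):
    """Pr(x) for DNM(x0, alpha0, alpha), evaluated through log-gamma terms."""
    x_dot = x0 + sum(x)
    alpha_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(alpha_dot) - lgamma(x_dot + alpha_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)

x0, alpha0 = 3.0, 2.5
alpha = [0.9, 1.6, 2.4]
s, x3 = 5, 2   # target value of X1 + X2, and the value of X3

# Sum over every split of s between the first two coordinates.
aggregated = sum(dnm_pmf([k, s - k, x3], x0, alpha0, alpha) for k in range(s + 1))
# PMF of the merged vector (X1 + X2, X3) with merged parameter alpha1 + alpha2.
merged = dnm_pmf([s, x3], x0, alpha0, [alpha[0] + alpha[1], alpha[2]])
print(aggregated, merged)
```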
For $\alpha_0 > 2$ the entries of the correlation matrix are
$$\rho(X_i, X_i) = 1,$$
$$\rho(X_i, X_j) = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{\operatorname{var}(X_i) \operatorname{var}(X_j)}} = \sqrt{\frac{\alpha_i \alpha_j}{(\alpha_0 + \alpha_i - 1)(\alpha_0 + \alpha_j - 1)}}.$$
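The mean vector, covariance matrix and correlation matrix follow mechanically from the parameters. The sketch below computes the correlation matrix from the covariance formula, so that its off-diagonal entries can be compared with the closed form above; `dnm_moments` is an illustrative name and NumPy is assumed.

```python
import numpy as np

def dnm_moments(x0, alpha0, alpha):
    """Mean, covariance and correlation of DNM(x0, alpha0, alpha); needs alpha0 > 2."""
    if alpha0 <= 2:
        raise ValueError("the covariance matrix is infinite for alpha0 <= 2")
    alpha = np.asarray(alpha, dtype=float)
    mean = x0 / (alpha0 - 1.0) * alpha
    scale = x0 * (x0 + alpha0 - 1.0) / ((alpha0 - 1.0) ** 2 * (alpha0 - 2.0))
    cov = scale * (np.outer(alpha, alpha) + (alpha0 - 1.0) * np.diag(alpha))
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return mean, cov, corr

x0, alpha0, alpha = 4.0, 5.0, [1.0, 2.0, 3.0]
mean, cov, corr = dnm_moments(x0, alpha0, alpha)

# Closed form for rho(X_1, X_2): sqrt(a1 * a2 / ((a0 + a1 - 1) * (a0 + a2 - 1))).
rho_12 = np.sqrt(alpha[0] * alpha[1]
                 / ((alpha0 + alpha[0] - 1.0) * (alpha0 + alpha[1] - 1.0)))
print(mean, corr[0, 1], rho_12)
```

Note that the correlations are always positive and do not depend on $x_0$.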
The Dirichlet negative multinomial is a heavy-tailed distribution. It does not have a finite mean for $\alpha_0 \leq 1$ and it has an infinite covariance matrix for $\alpha_0 \leq 2$. Therefore the moment generating function does not exist.
Dirichlet negative multinomial as a Pólya urn model
When the $m + 2$ parameters $x_0, \alpha_0$ and $\boldsymbol{\alpha}$ are positive integers, the Dirichlet negative multinomial can also be motivated by an urn model, more specifically a basic Pólya urn model. Consider an urn initially containing $\sum_{i=0}^{m} \alpha_i$ balls of $m + 1$ different colors, including $\alpha_0$ red balls (the stopping color). The vector $\boldsymbol{\alpha}$ gives the respective counts of the balls of the $m$ non-red colors. At each step of the model, a ball is drawn at random from the urn and replaced, along with one additional ball of the same color. The process is repeated until $x_0$ red balls have been drawn. The random vector $\mathbf{X}$ of observed draws of the $m$ non-red colors is distributed according to a $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha})$. Note that, at the end of the experiment, the urn always contains the fixed number $x_0 + \alpha_0$ of red balls, while it contains the random numbers $\mathbf{X} + \boldsymbol{\alpha}$ of balls of the other $m$ colors.
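The urn scheme is easy to simulate, which gives an informal check of the description above. The following sketch uses only the standard library; the function name, the seed and the parameter values are arbitrary choices, and the comparison is against the mean $\tfrac{x_0}{\alpha_0 - 1}\boldsymbol{\alpha}$.

```python
import random

def polya_urn_draw(x0, alpha0, alpha, rng):
    """Run one urn experiment; return (non-red draw counts, final urn contents)."""
    counts = [alpha0, *alpha]          # index 0 is red, the stopping color
    observed = [0] * len(alpha)
    reds_drawn = 0
    while reds_drawn < x0:
        # Draw a ball with probability proportional to the current counts.
        color = rng.choices(range(len(counts)), weights=counts)[0]
        counts[color] += 1             # replace it and add one of the same color
        if color == 0:
            reds_drawn += 1
        else:
            observed[color - 1] += 1
    return observed, counts

rng = random.Random(1)
x0, alpha0, alpha = 3, 6, [2, 1]
n_sims = 20_000

totals = [0] * len(alpha)
for _ in range(n_sims):
    observed, counts = polya_urn_draw(x0, alpha0, alpha, rng)
    totals = [t + o for t, o in zip(totals, observed)]

empirical_mean = [t / n_sims for t in totals]
theoretical_mean = [x0 / (alpha0 - 1) * a for a in alpha]
print(empirical_mean, theoretical_mean)
```

In every run the final number of red balls is `x0 + alpha0`, matching the remark at the end of the paragraph above.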
[1] Farewell, Daniel & Farewell, Vernon (2012). Dirichlet negative multinomial regression for overdispersed correlated count data. Biostatistics (Oxford, England). 14. doi:10.1093/biostatistics/kxs050.