Dirichlet negative multinomial distribution

In probability theory and statistics, the Dirichlet negative multinomial distribution is a multivariate distribution on the non-negative integers. It is a multivariate extension of the beta negative binomial distribution. It is also a generalization of the negative multinomial distribution (NM(k, p)) that allows for heterogeneity or overdispersion in the probability vector. It is used in quantitative marketing research to flexibly model the number of household transactions across multiple brands.

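Notation: $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha})$
Parameters: $x_0 \in \mathbb{R}_{>0}$, $\alpha_0 \in \mathbb{R}_{>0}$, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_m) \in \mathbb{R}_{>0}^m$
Support: $x_i \in \{0, 1, 2, \ldots\}$ for $i = 1, \ldots, m$
PMF: $\dfrac{B(x_\bullet, \alpha_\bullet)}{B(x_0, \alpha_0)} \prod_{i=1}^m \dfrac{\Gamma(x_i + \alpha_i)}{x_i!\, \Gamma(\alpha_i)}$
where $x_\bullet = \sum_{i=0}^m x_i$ and $\alpha_\bullet = \sum_{i=0}^m \alpha_i$, $\Gamma(x)$ is the gamma function and $B$ is the beta function.
Mean: $\dfrac{x_0}{\alpha_0 - 1} \boldsymbol{\alpha}$ for $\alpha_0 > 1$
Variance: $\dfrac{x_0 (x_0 + \alpha_0 - 1)}{(\alpha_0 - 1)^2 (\alpha_0 - 2)} \left[ \boldsymbol{\alpha} \boldsymbol{\alpha}^{\mathsf{T}} + (\alpha_0 - 1) \operatorname{diag}(\boldsymbol{\alpha}) \right]$ for $\alpha_0 > 2$
MGF: does not exist
CF: $\dfrac{\Gamma(x_0 + \alpha_0)\, \Gamma(\alpha_\bullet)}{\Gamma(\alpha_0)\, \Gamma(x_0 + \alpha_\bullet)} F_D^{(m)}\left(x_0; \alpha_1, \ldots, \alpha_m; x_0 + \alpha_\bullet; e^{i t_1}, \ldots, e^{i t_m}\right)$
where $F_D^{(m)}$ is the Lauricella function.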
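The sketch below is a quick numerical check of the mean and variance formulas, written in standard-library Python. It sums the PMF over a truncated grid and compares the results with the closed forms. The helper names (`dnm_pmf`, `dnm_mean`, `dnm_cov`, `truncated_moments`) are illustrative, and so are the parameter values. Truncating the infinite sums at `n_max` is only accurate because the chosen $\alpha_0 = 8$ makes the tail light enough.

```python
from itertools import product
from math import exp, lgamma


def dnm_pmf(x, x0, alpha0, alpha):
    """PMF of DNM(x0, alpha0, alpha), evaluated in log space for stability."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    # log B(x_dot, a_dot) - log B(x0, alpha0)
    log_p = (lgamma(x_dot) + lgamma(a_dot) - lgamma(x_dot + a_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def dnm_mean(x0, alpha0, alpha):
    """Closed-form mean x0 * alpha / (alpha0 - 1), valid for alpha0 > 1."""
    return [x0 * a / (alpha0 - 1) for a in alpha]


def dnm_cov(x0, alpha0, alpha):
    """Closed-form covariance matrix, valid for alpha0 > 2."""
    c = x0 * (x0 + alpha0 - 1) / ((alpha0 - 1) ** 2 * (alpha0 - 2))
    m = len(alpha)
    return [[c * (alpha[i] * alpha[j] + (alpha0 - 1) * alpha[i] * (i == j))
             for j in range(m)] for i in range(m)]


def truncated_moments(x0, alpha0, alpha, n_max):
    """Mean and covariance from the PMF summed over {0, ..., n_max}^m."""
    m = len(alpha)
    mean = [0.0] * m
    second = [[0.0] * m for _ in range(m)]
    for x in product(range(n_max + 1), repeat=m):
        p = dnm_pmf(x, x0, alpha0, alpha)
        for i in range(m):
            mean[i] += x[i] * p
            for j in range(m):
                second[i][j] += x[i] * x[j] * p
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(m)]
           for i in range(m)]
    return mean, cov
```

With `truncated_moments(2.5, 8.0, [1.0, 2.0], n_max=200)` the truncated sums should agree with `dnm_mean` and `dnm_cov` to within about $10^{-5}$. For $\alpha_0$ close to 2 the truncation error grows quickly and the check stops being meaningful.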

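If the parameters of the Dirichlet distribution are $(\alpha_0, \boldsymbol{\alpha})$, and if

$$\mathbf{X} \mid \mathbf{p} \sim \mathrm{NM}(x_0, \mathbf{p}),$$

where

$$\mathbf{p} \sim \mathrm{Dir}(\alpha_0, \boldsymbol{\alpha}),$$

then the marginal distribution of $\mathbf{X}$ is a Dirichlet negative multinomial distribution:

$$\mathbf{X} \sim \mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha}).$$

In the above, $\mathrm{NM}(x_0, \mathbf{p})$ is the negative multinomial distribution and $\mathrm{Dir}(\alpha_0, \boldsymbol{\alpha})$ is the Dirichlet distribution.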
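The two-stage description translates directly into a sampler. Here is a minimal standard-library sketch; the function name `sample_dnm` is hypothetical. It assumes $x_0$ is a positive integer, which lets $\mathbf{X} \mid \mathbf{p}$ be simulated literally: draw categories $0, \ldots, m$ with probabilities $\mathbf{p}$ until category 0 has appeared $x_0$ times. The Dirichlet draw uses the standard construction of normalized independent gamma variates.

```python
import random
from itertools import accumulate


def sample_dnm(x0, alpha0, alpha, rng):
    """Draw one vector X from DNM(x0, alpha0, alpha) via the two-stage model.

    Assumes x0 is a positive integer, so that X | p can be simulated by
    repeated categorical draws; alpha0 and alpha may be any positive reals.
    """
    # Stage 1: p ~ Dir(alpha0, alpha_1, ..., alpha_m) as normalized gamma draws.
    g = [rng.gammavariate(a, 1.0) for a in [alpha0, *alpha]]
    total = sum(g)
    p = [gi / total for gi in g]
    # Stage 2: X | p ~ NM(x0, p). Draw categories 0..m with probabilities p
    # until category 0 has occurred x0 times; X_i counts draws of category i.
    cum = list(accumulate(p))
    categories = range(len(p))
    counts = [0] * len(alpha)
    stops = 0
    while stops < x0:
        k = rng.choices(categories, cum_weights=cum)[0]
        if k == 0:
            stops += 1
        else:
            counts[k - 1] += 1
    return counts
```

With `rng = random.Random(0)`, the average of many calls to `sample_dnm(2, 6.0, [1.0, 2.0], rng)` should approach the mean $x_0 \boldsymbol{\alpha} / (\alpha_0 - 1) = (0.4, 0.8)$. Because the distribution is heavy tailed, the running time per draw is unbounded when $\alpha_0$ is small.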


Motivation

Dirichlet negative multinomial as a compound distribution

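The Dirichlet distribution is a conjugate distribution to the negative multinomial distribution. This fact leads to an analytically tractable compound distribution. Consider a random vector of category counts $\mathbf{x} = (x_1, \ldots, x_m)$ distributed according to a negative multinomial distribution. The compound distribution is obtained by integrating over the distribution of $\mathbf{p}$, which can be thought of as a random vector following a Dirichlet distribution:

$$
\begin{aligned}
\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) &= \int_{\mathbf{p}} \mathrm{NM}(\mathbf{x} \mid x_0, \mathbf{p})\, \mathrm{Dir}(\mathbf{p} \mid \alpha_0, \boldsymbol{\alpha})\, \mathrm{d}\mathbf{p} \\
&= \int_{\mathbf{p}} \frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0) \prod_{i=1}^m x_i!} \prod_{i=0}^m p_i^{x_i} \, \frac{1}{B(\boldsymbol{\alpha}_+)} \prod_{i=0}^m p_i^{\alpha_i - 1} \, \mathrm{d}\mathbf{p}
\end{aligned}
$$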

which results in the following formula:

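$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=0}^m x_i\right)}{\Gamma(x_0) \prod_{i=1}^m x_i!} \, \frac{B(\mathbf{x}_+ + \boldsymbol{\alpha}_+)}{B(\boldsymbol{\alpha}_+)}$$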

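where $\mathbf{x}_+ = (x_0, x_1, \ldots, x_m)$ and $\boldsymbol{\alpha}_+ = (\alpha_0, \alpha_1, \ldots, \alpha_m)$ are the $(m+1)$-dimensional vectors formed by adjoining the scalars $x_0$ and $\alpha_0$ to the $m$-dimensional vectors $\mathbf{x}$ and $\boldsymbol{\alpha}$, respectively. $B(\mathbf{v}) = \prod_i \Gamma(v_i) / \Gamma\left(\sum_i v_i\right)$ is the multivariate version of the beta function. We can write this equation explicitly as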

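$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = x_0 \, \frac{\Gamma\left(\sum_{i=0}^m x_i\right) \Gamma\left(\sum_{i=0}^m \alpha_i\right)}{\Gamma\left(\sum_{i=0}^m (x_i + \alpha_i)\right)} \prod_{i=0}^m \frac{\Gamma(x_i + \alpha_i)}{\Gamma(x_i + 1)\, \Gamma(\alpha_i)}$$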
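For $m = 1$ the simplex reduces to $p_0 = 1 - p_1$, so the compounding integral can be checked against the closed form with a one-dimensional midpoint rule. The sketch below does this in standard-library Python. The function names, the grid size `n` and the example parameters are illustrative assumptions.

```python
from math import exp, lgamma, log


def closed_form(x1, x0, alpha0, alpha1):
    """m = 1 case of the compound formula:
    Gamma(x0 + x1) / (Gamma(x0) x1!) * B(x0 + alpha0, x1 + alpha1) / B(alpha0, alpha1).
    """
    log_lead = lgamma(x0 + x1) - lgamma(x0) - lgamma(x1 + 1)
    log_beta_ratio = (lgamma(x0 + alpha0) + lgamma(x1 + alpha1)
                      - lgamma(x0 + x1 + alpha0 + alpha1)
                      - lgamma(alpha0) - lgamma(alpha1) + lgamma(alpha0 + alpha1))
    return exp(log_lead + log_beta_ratio)


def by_quadrature(x1, x0, alpha0, alpha1, n=20_000):
    """Midpoint-rule value of the integral of NM(x1 | x0, p) * Dir(p | alpha0, alpha1)."""
    log_lead = lgamma(x0 + x1) - lgamma(x0) - lgamma(x1 + 1)
    log_inv_beta = lgamma(alpha0 + alpha1) - lgamma(alpha0) - lgamma(alpha1)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p1 = (k + 0.5) * h  # midpoint of the k-th cell; p0 = 1 - p1 on the simplex
        p0 = 1.0 - p1
        # NM kernel p0^x0 p1^x1 times Dirichlet kernel p0^(alpha0-1) p1^(alpha1-1)
        total += exp(log_lead + log_inv_beta
                     + (x0 + alpha0 - 1) * log(p0) + (x1 + alpha1 - 1) * log(p1))
    return total * h
```

For example, `by_quadrature(2, 3.0, 2.5, 1.5)` and `closed_form(2, 3.0, 2.5, 1.5)` should agree to several significant digits.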

Alternative formulations exist. One convenient representation[1] is

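$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{\Gamma(x_\bullet)}{\Gamma(x_0) \prod_{i=1}^m \Gamma(x_i + 1)} \times \frac{\Gamma(\alpha_\bullet)}{\prod_{i=0}^m \Gamma(\alpha_i)} \times \frac{\prod_{i=0}^m \Gamma(x_i + \alpha_i)}{\Gamma(x_\bullet + \alpha_\bullet)}$$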

where $x_\bullet = x_0 + x_1 + \cdots + x_m$ and $\alpha_\bullet = \alpha_0 + \alpha_1 + \cdots + \alpha_m$.

This can also be written

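$$\Pr(\mathbf{x} \mid x_0, \alpha_0, \boldsymbol{\alpha}) = \frac{B(x_\bullet, \alpha_\bullet)}{B(x_0, \alpha_0)} \prod_{i=1}^m \frac{\Gamma(x_i + \alpha_i)}{x_i!\, \Gamma(\alpha_i)}$$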
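The gamma-function representation and the beta-function form differ only in how the factors are grouped, so they should agree to rounding error. The sketch below checks that agreement in standard-library Python. It also checks that the PMF sums to 1 over a large grid. All function names and parameter values are illustrative. The normalization check relies on a comfortably large $\alpha_0$ so that the truncated tail is negligible.

```python
from itertools import product
from math import exp, lgamma


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def pmf_gamma_form(x, x0, alpha0, alpha):
    """Three-factor gamma-function representation of the PMF."""
    x_plus = [x0, *x]
    a_plus = [alpha0, *alpha]
    x_dot, a_dot = sum(x_plus), sum(a_plus)
    log_p = lgamma(x_dot) - lgamma(x0) - sum(lgamma(xi + 1) for xi in x)
    log_p += lgamma(a_dot) - sum(lgamma(ai) for ai in a_plus)
    log_p += sum(lgamma(xi + ai) for xi, ai in zip(x_plus, a_plus)) - lgamma(x_dot + a_dot)
    return exp(log_p)


def pmf_beta_form(x, x0, alpha0, alpha):
    """B(x_dot, a_dot) / B(x0, alpha0) times the per-category gamma ratios."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = log_beta(x_dot, a_dot) - log_beta(x0, alpha0)
    log_p += sum(lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai) for xi, ai in zip(x, alpha))
    return exp(log_p)


def max_relative_gap(x0, alpha0, alpha, n_max):
    """Largest relative difference between the two forms on {0, ..., n_max}^m."""
    gaps = []
    for x in product(range(n_max + 1), repeat=len(alpha)):
        a = pmf_gamma_form(x, x0, alpha0, alpha)
        b = pmf_beta_form(x, x0, alpha0, alpha)
        gaps.append(abs(a - b) / max(a, b))
    return max(gaps)
```

For instance, `max_relative_gap(1.5, 3.0, [0.5, 1.0, 2.0], 5)` should be at the level of floating-point rounding.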

Properties

Marginal distributions

To obtain the marginal distribution over a subset of Dirichlet negative multinomial random variables, one only needs to drop the irrelevant $\alpha_i$'s (those belonging to the variables one wants to marginalize out) from the $\boldsymbol{\alpha}$ vector. The joint distribution of the remaining random variates is $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha}_{(-)})$, where $\boldsymbol{\alpha}_{(-)}$ is the vector with those $\alpha_i$'s removed. In particular, each univariate marginal $X_i$ follows a beta negative binomial distribution.
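The marginalization rule can be checked numerically for $m = 2$. Summing the joint PMF over $x_2$ should reproduce the one-dimensional DNM with $\alpha_2$ dropped. The following standard-library sketch uses hypothetical helper names and illustrative parameters. The infinite sum over $x_2$ is truncated at `n_max`, which is harmless here because the summand decays like $x_2^{-1-\alpha_0-\alpha_1}$.

```python
from math import exp, lgamma


def dnm_pmf(x, x0, alpha0, alpha):
    """PMF of DNM(x0, alpha0, alpha) in its beta-function form."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(a_dot) - lgamma(x_dot + a_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def marginal_by_summation(x1, x0, alpha0, alpha, n_max=2000):
    """Pr(X_1 = x1) for m = 2, obtained by summing out X_2 up to n_max."""
    return sum(dnm_pmf((x1, x2), x0, alpha0, alpha) for x2 in range(n_max + 1))
```

For example, `marginal_by_summation(3, 2.0, 3.0, [1.0, 2.5])` should match `dnm_pmf((3,), 2.0, 3.0, [1.0])`, the DNM with $\alpha_2$ removed.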

Conditional distributions

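If the $m$-dimensional vector $\mathbf{x}$ is partitioned as follows

$$\mathbf{x} = \begin{bmatrix} \mathbf{x}^{(1)} \\ \mathbf{x}^{(2)} \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} n \times 1 \\ (m - n) \times 1 \end{bmatrix}$$

and accordingly

$$\boldsymbol{\alpha} = \begin{bmatrix} \boldsymbol{\alpha}^{(1)} \\ \boldsymbol{\alpha}^{(2)} \end{bmatrix}$$

then the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is $\mathrm{DNM}(x_0', \alpha_0', \boldsymbol{\alpha}^{(1)})$, where

$$x_0' = x_0 + \sum_{i=1}^{m-n} x_i^{(2)}$$

and

$$\alpha_0' = \alpha_0 + \sum_{i=1}^{m-n} \alpha_i^{(2)}.$$

That is,

$$\Pr\left(\mathbf{x}^{(1)} \mid \mathbf{x}^{(2)}, x_0, \alpha_0, \boldsymbol{\alpha}\right) = \frac{B(x_\bullet, \alpha_\bullet)}{B(x_0', \alpha_0')} \prod_{i=1}^n \frac{\Gamma\left(x_i^{(1)} + \alpha_i^{(1)}\right)}{x_i^{(1)}!\, \Gamma\left(\alpha_i^{(1)}\right)}$$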
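A minimal numerical check of the conditional formula for $m = 2$ and $n = 1$ follows, in standard-library Python. It computes $\Pr(x_1 \mid x_2)$ as the joint PMF divided by a truncated sum over $x_1$, then compares the result with $\mathrm{DNM}(x_0', \alpha_0', \alpha_1)$. The helper names and parameter values are hypothetical, and the truncation point `n_max` is an assumption that is safe for these parameters.

```python
from math import exp, lgamma


def dnm_pmf(x, x0, alpha0, alpha):
    """PMF of DNM(x0, alpha0, alpha) in its beta-function form."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(a_dot) - lgamma(x_dot + a_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def conditional_params(x0, alpha0, x2, alpha2):
    """(x0', alpha0') for X^(1) given X^(2) = x2."""
    return x0 + sum(x2), alpha0 + sum(alpha2)


def conditional_by_ratio(x1, x2, x0, alpha0, alpha, n_max=2000):
    """m = 2, n = 1: Pr(x1 | x2) = Pr(x1, x2) / sum_k Pr(k, x2), truncated at n_max."""
    denom = sum(dnm_pmf((k, x2), x0, alpha0, alpha) for k in range(n_max + 1))
    return dnm_pmf((x1, x2), x0, alpha0, alpha) / denom
```

With `alpha = [1.5, 2.0]` and $x_2 = 3$, the parameters become $x_0' = x_0 + 3$ and $\alpha_0' = \alpha_0 + 2$, and `conditional_by_ratio` should match `dnm_pmf((x1,), x0p, a0p, [1.5])`.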

Conditional on the sum

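The conditional distribution of a Dirichlet negative multinomial distribution given $\sum_{i=1}^m x_i = n$ is the Dirichlet-multinomial distribution with parameters $n$ and $\boldsymbol{\alpha}$. That is

$$\Pr\left(\mathbf{x} \,\middle|\, \sum_{i=1}^m x_i = n, x_0, \alpha_0, \boldsymbol{\alpha}\right) = \frac{n!\, \Gamma\left(\sum_{i=1}^m \alpha_i\right)}{\Gamma\left(n + \sum_{i=1}^m \alpha_i\right)} \prod_{i=1}^m \frac{\Gamma(x_i + \alpha_i)}{x_i!\, \Gamma(\alpha_i)}.$$

Notice that the expression does not depend on $x_0$ or $\alpha_0$.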
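Because the set of vectors with a fixed total $n$ is finite, this property can be checked exactly. The standard-library sketch below normalizes the joint PMF over that set and compares the result with the Dirichlet-multinomial PMF for two unrelated choices of $(x_0, \alpha_0)$. The function names and parameter values are illustrative.

```python
from itertools import product
from math import exp, lgamma


def dnm_pmf(x, x0, alpha0, alpha):
    """PMF of DNM(x0, alpha0, alpha) in its beta-function form."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(a_dot) - lgamma(x_dot + a_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def dm_pmf(x, alpha):
    """Dirichlet-multinomial PMF with n = sum(x) trials."""
    n, a = sum(x), sum(alpha)
    log_p = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def compositions(n, m):
    """All x in {0, ..., n}^m with sum(x) == n."""
    return [x for x in product(range(n + 1), repeat=m) if sum(x) == n]


def conditional_on_sum(x, x0, alpha0, alpha):
    """Pr(X = x | sum(X) = sum(x)), by normalizing the joint DNM PMF."""
    support = compositions(sum(x), len(x))
    denom = sum(dnm_pmf(y, x0, alpha0, alpha) for y in support)
    return dnm_pmf(x, x0, alpha0, alpha) / denom
```

The heavy-tailed choice $\alpha_0 = 0.7$ works just as well as $\alpha_0 = 2$ here, since only a finite sum is involved.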

Aggregation

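If

$$\mathbf{X} = (X_1, \ldots, X_m) \sim \mathrm{DNM}(x_0, \alpha_0, \alpha_1, \ldots, \alpha_m)$$

then, if the random variables with positive subscripts $i$ and $j$ are dropped from the vector and replaced by their sum,

$$\mathbf{X}' = (X_1, \ldots, X_i + X_j, \ldots, X_m) \sim \mathrm{DNM}\left(x_0, \alpha_0, \alpha_1, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_m\right).$$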
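The aggregation property involves only finite sums, so it can be verified exactly. The standard-library sketch below takes $m = 3$, merges $X_1$ and $X_2$, and compares $\Pr(X_1 + X_2 = s, X_3 = x_3)$ with the two-dimensional DNM that has $\alpha_1 + \alpha_2$ in place of the two separate parameters. The helper names and parameters are illustrative.

```python
from math import exp, lgamma


def dnm_pmf(x, x0, alpha0, alpha):
    """PMF of DNM(x0, alpha0, alpha) in its beta-function form."""
    x_dot = x0 + sum(x)
    a_dot = alpha0 + sum(alpha)
    log_p = (lgamma(x_dot) + lgamma(a_dot) - lgamma(x_dot + a_dot)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0))
    for xi, ai in zip(x, alpha):
        log_p += lgamma(xi + ai) - lgamma(xi + 1) - lgamma(ai)
    return exp(log_p)


def aggregated_by_summation(s, x3, x0, alpha0, alpha):
    """Pr(X_1 + X_2 = s, X_3 = x3) for m = 3, summing the joint PMF exactly."""
    return sum(dnm_pmf((k, s - k, x3), x0, alpha0, alpha) for k in range(s + 1))
```

For `alpha = [0.5, 1.5, 2.0]`, `aggregated_by_summation(s, x3, 2.0, 1.5, alpha)` should equal `dnm_pmf((s, x3), 2.0, 1.5, [2.0, 2.0])` up to rounding.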


Correlation matrix

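For $\alpha_0 > 2$ the entries of the correlation matrix are

$$\rho(X_i, X_i) = 1,$$
$$\rho(X_i, X_j) = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{\operatorname{var}(X_i) \operatorname{var}(X_j)}} = \sqrt{\frac{\alpha_i \alpha_j}{(\alpha_0 + \alpha_i - 1)(\alpha_0 + \alpha_j - 1)}} \quad (i \neq j).$$

Note that the correlations do not depend on $x_0$.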
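The closed-form correlation follows from normalizing the covariance matrix given in the summary above: the common prefactor cancels, which is also why $x_0$ drops out. Below is a minimal standard-library sketch of that step. The function names and parameter values are illustrative.

```python
from math import sqrt


def dnm_cov(x0, alpha0, alpha):
    """Closed-form covariance matrix, valid for alpha0 > 2."""
    c = x0 * (x0 + alpha0 - 1) / ((alpha0 - 1) ** 2 * (alpha0 - 2))
    m = len(alpha)
    return [[c * (alpha[i] * alpha[j] + (alpha0 - 1) * alpha[i] * (i == j))
             for j in range(m)] for i in range(m)]


def corr_from_cov(cov):
    """Normalize a covariance matrix to a correlation matrix."""
    m = len(cov)
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(m)]
            for i in range(m)]


def dnm_corr(alpha0, alpha):
    """Closed-form correlation matrix; note that x0 is not an argument."""
    m = len(alpha)
    return [[1.0 if i == j else
             sqrt(alpha[i] * alpha[j]
                  / ((alpha0 + alpha[i] - 1) * (alpha0 + alpha[j] - 1)))
             for j in range(m)] for i in range(m)]
```

For example, with $\alpha_0 = 4$, $\alpha_1 = 1$ and $\alpha_2 = 2$, the off-diagonal entry is $\sqrt{2 / (4 \cdot 5)} = \sqrt{0.1}$, whatever the value of $x_0$.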

Heavy tailed

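The Dirichlet negative multinomial is a heavy-tailed distribution. It does not have a finite mean for $\alpha_0 \le 1$, and its covariance matrix is infinite for $\alpha_0 \le 2$. More generally, the marginal moments $\operatorname{E}[X_i^k]$ are infinite for $k \ge \alpha_0$, so the moment generating function does not exist for any parameter values.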
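The effect of $\alpha_0$ on the mean can be seen numerically from partial sums $\sum_{k \le N} k \Pr(X_1 = k)$ of the univariate ($m = 1$) marginal. The standard-library sketch below uses hypothetical function names and illustrative parameters. With $\alpha_0 = 3$ the partial sums settle at $x_0 \alpha_1 / (\alpha_0 - 1)$. With $\alpha_0 = 0.5$ they keep growing, roughly like $\sqrt{N}$, because the PMF decays only like $k^{-\alpha_0 - 1}$.

```python
from math import exp, lgamma


def bnb_pmf(k, x0, alpha0, alpha1):
    """m = 1 case of the DNM PMF (a beta negative binomial marginal)."""
    log_p = (lgamma(x0 + k) + lgamma(alpha0 + alpha1)
             - lgamma(x0 + k + alpha0 + alpha1)
             - lgamma(x0) - lgamma(alpha0) + lgamma(x0 + alpha0)
             + lgamma(k + alpha1) - lgamma(k + 1) - lgamma(alpha1))
    return exp(log_p)


def truncated_mean(n_max, x0, alpha0, alpha1):
    """Partial sum of k * Pr(X = k) for k = 0, ..., n_max."""
    return sum(k * bnb_pmf(k, x0, alpha0, alpha1) for k in range(n_max + 1))
```

Comparing `truncated_mean(100, 2.0, 0.5, 1.0)` with `truncated_mean(10_000, 2.0, 0.5, 1.0)` shows roughly a tenfold increase. For `alpha0 = 3.0`, the same comparison stays close to 1.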

Applications

Dirichlet negative multinomial as a Pólya urn model

In the case when the $m + 2$ parameters $x_0$, $\alpha_0$ and $\boldsymbol{\alpha}$ are positive integers, the Dirichlet negative multinomial can also be motivated by an urn model, or more specifically a basic Pólya urn model. Consider an urn initially containing $\sum_{i=0}^m \alpha_i$ balls of $m + 1$ various colors, including $\alpha_0$ red balls (the stopping color). The vector $\boldsymbol{\alpha}$ gives the respective counts of the balls of the other $m$ non-red colors. At each step of the model, a ball is drawn at random from the urn and replaced, along with one additional ball of the same color. The process is repeated until $x_0$ red balls have been drawn. The random vector $\mathbf{x}$ of observed draws of the other $m$ non-red colors is then distributed according to $\mathrm{DNM}(x_0, \alpha_0, \boldsymbol{\alpha})$. Note that at the end of the experiment, the urn always contains the fixed number $x_0 + \alpha_0$ of red balls, while it contains the random numbers $\mathbf{x} + \boldsymbol{\alpha}$ of balls of the other $m$ colors.
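The urn scheme is easy to simulate with integer parameters. The standard-library sketch below follows the steps just described. The function name `polya_urn_dnm` and the example parameters are illustrative. With $x_0 = 2$, $\alpha_0 = 4$ and $\boldsymbol{\alpha} = (1, 1)$, the outcome $\mathbf{x} = (0, 0)$ requires the first two draws to be red. That happens with probability $(4/6)(5/7) = 10/21$, which also equals $B(2, 6) / B(2, 4)$ from the PMF.

```python
import random


def polya_urn_dnm(x0, alpha0, alpha, rng):
    """Simulate the Polya urn; all parameters must be positive integers."""
    urn = [alpha0, *alpha]      # urn[0] = red (stopping) balls, urn[i] = colour i
    colours = range(len(urn))
    counts = [0] * len(alpha)   # draws of each non-red colour
    red_draws = 0
    while red_draws < x0:
        k = rng.choices(colours, weights=urn)[0]  # draw proportional to ball counts
        urn[k] += 1             # replace the ball and add one more of its colour
        if k == 0:
            red_draws += 1
        else:
            counts[k - 1] += 1
    return counts
```

Over many runs of `polya_urn_dnm(2, 4, [1, 1], rng)`, the fraction of `[0, 0]` outcomes should approach $10/21 \approx 0.476$, and each coordinate's average should approach $x_0 \alpha_i / (\alpha_0 - 1) = 2/3$.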

References

  1. Farewell, Daniel; Farewell, Vernon (2012). "Dirichlet negative multinomial regression for overdispersed correlated count data". Biostatistics (Oxford, England). 14. doi:10.1093/biostatistics/kxs050.