In statistics, the negative log predictive density (NLPD) is a measure of error between a model's predictions and the associated true values; a smaller value is better. Importantly, the NLPD assesses the quality of the model's uncertainty quantification. It is used for both regression and classification.
To compute it: (1) find the probabilities the model assigns to the true labels; (2) take the negative log of the product of these probabilities. (In practice the negative of the sum of the logs is computed instead, for numerical reasons: the product of many small probabilities underflows.)
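As a minimal sketch in Python, assuming the probabilities the model assigned to the true labels are already available as an array (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def nlpd(probs_of_true_labels):
    """Negative log predictive density from the probabilities a
    model assigned to the true label of each example."""
    # Step (1): the caller supplies p(y_i | x_i, M) for each i.
    # Step (2): sum the logs and negate; this is numerically safer
    # than taking the log of a product of many small numbers.
    return -np.sum(np.log(probs_of_true_labels))

print(nlpd(np.array([0.8, 0.6, 0.9])))  # ~0.84
```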
Definition

Over a dataset of \(N\) points, the NLPD is

\[
\mathrm{NLPD} = -\sum_{i=1}^{N} \log p(y_i \mid x_i, \mathcal{M}),
\]

where \(\mathcal{M}\) is the model, \(x_i\) are the inputs (independent variables) and \(y_i\) are the observed outputs (dependent variable).

Often the mean rather than the sum is used (dividing by \(N\)):

\[
\mathrm{NLPD} = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i, \mathcal{M}).
\]
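For regression, the predictive density is evaluated at the observed outputs rather than looked up from a finite set of class probabilities. A sketch assuming Gaussian predictive distributions (all values and names are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical Gaussian predictive distribution for each test point.
pred_means = np.array([1.2, 0.4, -0.3])
pred_stds = np.array([0.5, 0.8, 0.6])
y_observed = np.array([1.0, 0.9, -0.2])

# log p(y_i | x_i, M) for each observation.
log_densities = norm.logpdf(y_observed, loc=pred_means, scale=pred_stds)

nlpd_sum = -np.sum(log_densities)    # summed form above
nlpd_mean = -np.mean(log_densities)  # mean form (divided by N)
```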
Example

Calculating the NLPD for a simple classification example
We have a method that classifies images as dogs or cats. Importantly, it assigns probabilities to the two classes.
We show it pictures of three dogs and three cats. It predicts the probabilities of the first three being dogs as 0.9, 0.4 and 0.7, and of the last three being cats as 0.8, 0.4 and 0.3.
The NLPD is

\[
-\bigl(\log 0.9 + \log 0.4 + \log 0.7 + \log 0.8 + \log 0.4 + \log 0.3\bigr) \approx 3.72.
\]

(Natural logarithms are used throughout this example.)
Comparing to a more accurate but overconfident classifier
We compare this to another classifier, which predicts the probabilities of the first three being dogs as 0.95, 0.98 and 0.02, and of the last three being cats as 0.99, 0.96 and 0.96. The NLPD for this classifier is 4.08. The first classifier only guessed half correctly, so it did worse on a traditional measure of accuracy (3/6, compared to 5/6 for the second classifier). However, it performs better on the NLPD metric: the second classifier is effectively 'confidently wrong' on the third dog, which this metric penalises heavily.
Comparing to a very under-confident classifier
A third classifier that just predicts 0.5 for every image has an NLPD of \(-6\log 0.5 \approx 4.16\) in this case: worse than either of the others.
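The three NLPD values above can be reproduced with a short script; each array holds the probabilities a classifier assigned to the true class of the six images in the running example:

```python
import numpy as np

classifiers = {
    "first (half right, calibrated)": [0.9, 0.4, 0.7, 0.8, 0.4, 0.3],
    "second (accurate, overconfident)": [0.95, 0.98, 0.02, 0.99, 0.96, 0.96],
    "third (uniform 0.5)": [0.5] * 6,
}

for name, probs in classifiers.items():
    nlpd = -np.sum(np.log(probs))
    print(f"{name}: {nlpd:.2f}")
# first: 3.72, second: 4.08, third: 4.16
```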
Usage
It is used extensively in probabilistic modelling research. Examples include:
- Quiñonero-Candela, Joaquin, et al. "Propagation of uncertainty in Bayesian kernel models - application to multiple-step ahead forecasting." 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 2. IEEE, 2003.
- Kersting, Kristian, et al. "Most likely heteroscedastic Gaussian process regression." Proceedings of the 24th International Conference on Machine Learning. 2007.
- See also https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/coin.12411 for background on other approaches. (Confusingly, the definition in that reference calls the *average* form given above, with the \(1/N\) factor, simply the NLPD.)
- Heinonen, Markus, et al. "Non-stationary Gaussian process regression with Hamiltonian Monte Carlo." Artificial Intelligence and Statistics. PMLR, 2016.