Out-of-bag error

Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. OOB error is the mean prediction error on each training sample $x i$ , using only the trees that did not have $x i$ in their bootstrap sample.^[1]

Bootstrap aggregating allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in the building of the next base learner.

Out-of-bag dataset

When bootstrap aggregating is performed, two independent sets are created. One set, the bootstrap sample, is the data chosen to be "in-the-bag" by sampling with replacement. The out-of-bag set is all data not chosen in the sampling process.

When this process is repeated, such as when building a random forest, many bootstrap samples and OOB sets are created. The OOB sets can be aggregated into one dataset, but each sample is only considered out-of-bag for the trees that do not include it in their bootstrap sample. The picture below shows that for each bag sampled, the data is separated into two groups.

Visualizing the bagging process. Sampling 4 patients from the original set with replacement and showing the out-of-bag sets. Only patients in the bootstrap sample would be used to train the model for that bag.

This example shows how bagging could be used in the context of diagnosing disease. A set of patients are the original dataset, but each model is trained only by the patients in its bag. The patients in each out-of-bag set can be used to test their respective models. The test would consider whether the model can accurately determine if the patient has the disease.

Calculating out-of-bag error

Since each out-of-bag set is not used to train the model, it is a good test for the performance of the model. The specific calculation of OOB error depends on the implementation of the model, but a general calculation is as follows.

Find all models (or trees, in the case of a random forest) that are not trained by the OOB instance.
Take the majority vote of these models' result for the OOB instance, compared to the true value of the OOB instance.
Compile the OOB error for all instances in the OOB dataset.

An illustration of OOB error

The bagging process can be customized to fit the needs of a model. To ensure an accurate model, the bootstrap training sample size should be close to that of the original set.^[2] Also, the number of iterations (trees) of the model (forest) should be considered to find the true OOB error. The OOB error will stabilize over many iterations so starting with a high number of iterations is a good idea.^[3]

Shown in the example to the right, the OOB error can be found using the method above once the forest is set up.

Comparison to cross-validation

Out-of-bag error and cross-validation (CV) are different methods of measuring the error estimate of a machine learning model. Over many iterations, the two methods should produce a very similar error estimate. That is, once the OOB error stabilizes, it will converge to the cross-validation (specifically leave-one-out cross-validation) error.^[3] The advantage of the OOB method is that it requires less computation and allows one to test the model as it is being trained.

Accuracy and Consistency

Out-of-bag error is used frequently for error estimation within random forests but with the conclusion of a study done by Silke Janitza and Roman Hornung, out-of-bag error has shown to overestimate in settings that include an equal number of observations from all response classes (balanced samples), small sample sizes, a large number of predictor variables, small correlation between predictors, and weak effects.^[4]

References

^ James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013). An Introduction to Statistical Learning. Springer. pp. 316–321.
^ Ong, Desmond (2014). A primer to bootstrapping; and an overview of doBootstrap (PDF). pp. 2–4.
^ ^a ^b Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (PDF). Springer. pp. 592–593.
^ Janitza, Silke; Hornung, Roman (2018-08-06). "On the overestimation of random forest's out-of-bag error". PLOS ONE. 13 (8): e0201904. Bibcode:2018PLoSO..1301904J. doi:10.1371/journal.pone.0201904. ISSN 1932-6203. PMC 6078316. PMID 30080866.

[islr-1] James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013). An Introduction to Statistical Learning. Springer. pp. 316–321.

[2] Ong, Desmond (2014). A primer to bootstrapping; and an overview of doBootstrap (PDF). pp. 2–4.

[:0-3] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (PDF). Springer. pp. 592–593.

[4] Janitza, Silke; Hornung, Roman (2018-08-06). "On the overestimation of random forest's out-of-bag error". PLOS ONE. 13 (8): e0201904. Bibcode:2018PLoSO..1301904J. doi:10.1371/journal.pone.0201904. ISSN 1932-6203. PMC 6078316. PMID 30080866.

[1]

[2]

[3]

[4]