Bayesian optimization

Bayesian optimization is a sequential design strategy for global optimization of black-box functions,^[1]^[2]^[3] that does not assume any functional forms. It is usually employed to optimize expensive-to-evaluate functions. With the rise of artificial intelligence innovation in the 21st century, Bayesian optimizations have found prominent use in machine learning problems, for optimizing hyperparameter values.^[4]^[5]

History

The term is generally attributed to Jonas Mockus [lt] and is coined in his work from a series of publications on global optimization in the 1970s and 1980s.^[6]^[7]^[1]

Strategy

Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.^[8]

Bayesian optimization is typically used on problems of the form ${\textstyle \max _{x\in A}f(x)}$ , where ${\textstyle A}$ is a set of points, ${\textstyle x}$ , which rely upon less (or equal to) than 20 dimensions ( ${\textstyle \mathbb {R} ^{d},d\leq 20}$ ), and whose membership can easily be evaluated. Bayesian optimization is particularly advantageous for problems where ${\textstyle f(x)}$ is difficult to evaluate due to its computational cost. The objective function, ${\textstyle f}$ , is continuous and takes the form of some unknown structure, referred to as a "black box". Upon its evaluation, only ${\textstyle f(x)}$ is observed and its derivatives are not evaluated.^[9]

Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the posterior distribution over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point.

There are several methods used to define the prior/posterior distribution over the objective function. The most common two methods use Gaussian processes in a method called kriging. Another less expensive method uses the Parzen-Tree Estimator to construct two distributions for 'high' and 'low' points, and then finds the location that maximizes the expected improvement.^[10]

Standard Bayesian optimization relies upon each $x\in A$ being easy to evaluate, and problems that deviate from this assumption are known as exotic Bayesian optimization problems. Optimization problems can become exotic if it is known that there is noise, the evaluations are being done in parallel, the quality of evaluations relies upon a tradeoff between difficulty and accuracy, the presence of random environmental conditions, or if the evaluation involves derivatives.^[9]

Acquisition functions

Examples of acquisition functions include

probability of improvement
expected improvement
Bayesian expected losses
upper confidence bounds (UCB) or lower confidence bounds
Thompson sampling

and hybrids of these.^[11] They all trade-off exploration and exploitation so as to minimize the number of function queries. As such, Bayesian optimization is well suited for functions that are expensive to evaluate.

Solution methods

The maximum of the acquisition function is typically found by resorting to discretization or by means of an auxiliary optimizer. Acquisition functions are maximized using a numerical optimization technique, such as Newton's Method or quasi-Newton methods like the Broyden–Fletcher–Goldfarb–Shanno algorithm.

Applications

The approach has been applied to solve a wide range of problems,^[12] including learning to rank,^[13] computer graphics and visual design,^[14]^[15]^[16] robotics,^[17]^[18]^[19]^[20] sensor networks,^[21]^[22] automatic algorithm configuration,^[23]^[24] automatic machine learning toolboxes,^[25]^[26]^[27] reinforcement learning,^[28] planning, visual attention, architecture configuration in deep learning, static program analysis, experimental particle physics,^[29]^[30] quality-diversity optimization,^[31]^[32]^[33] chemistry, material design, and drug development.^[9]^[34]^[35]

Bayesian Optimization has been applied in the field of facial recognition.^[36] The performance of the Histogram of Oriented Gradients (HOG) algorithm, a popular feature extraction method, heavily relies on its parameter settings. Optimizing these parameters can be challenging but crucial for achieving high accuracy.^[36] A novel approach to optimize the HOG algorithm parameters and image size for facial recognition using a Tree-structured Parzen Estimator (TPE) based Bayesian optimization technique has been proposed.^[36] This optimized approach has the potential to be adapted for other computer vision applications and contributes to the ongoing development of hand-crafted parameter-based feature extraction algorithms in computer vision.^[36]

References

^ ^a ^b Močkus, J. (1989). Bayesian Approach to Global Optimization. Dordrecht: Kluwer Academic. ISBN 0-7923-0115-3.
^ Garnett, Roman (2023). Bayesian Optimization. Cambridge University Press. ISBN 978-1-108-42578-0.
^ Hennig, P.; Osborne, M. A.; Kersting, H. P. (2022). Probabilistic Numerics (PDF). Cambridge University Press. pp. 243–278. ISBN 978-1107163447.
^ Snoek, Jasper (2012). "Practical Bayesian Optimization of Machine Learning Algorithms". Advances in Neural Information Processing Systems 25 (NIPS 2012).
^ Klein, Aaron (2017). "Fast bayesian optimization of machine learning hyperparameters on large datasets". Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR: 528–536.
^ Močkus, Jonas (1975). "On bayesian methods for seeking the extremum". Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974. Lecture Notes in Computer Science. Vol. 27. pp. 400–404. doi:10.1007/3-540-07165-2_55. ISBN 978-3-540-07165-5.
^ Močkus, Jonas (1977). "On Bayesian Methods for Seeking the Extremum and their Application". IFIP Congress: 195–200.
^ Wilson, Samuel (2019-11-22), ParBayesianOptimization R package, retrieved 2019-12-12
^ ^a ^b ^c Frazier, Peter I. (2018-07-08). "A Tutorial on Bayesian Optimization". arXiv:1807.02811 [stat.ML].
^ J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl: Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems: 2546–2554 (2011)
^ Matthew W. Hoffman, Eric Brochu, Nando de Freitas: Portfolio Allocation for Bayesian Optimization. Uncertainty in Artificial Intelligence: 327–336 (2011)
^ Eric Brochu, Vlad M. Cora, Nando de Freitas: A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. CoRR abs/1012.2599 (2010)
^ Eric Brochu, Nando de Freitas, Abhijeet Ghosh: Active Preference Learning with Discrete Choice Data. Advances in Neural Information Processing Systems: 409-416 (2007)
^ Eric Brochu, Tyson Brochu, Nando de Freitas: A Bayesian Interactive Optimization Approach to Procedural Animation Design. Symposium on Computer Animation 2010: 103–112
^ Yuki Koyama, Issei Sato, Daisuke Sakamoto, Takeo Igarashi: Sequential Line Search for Efficient Visual Design Optimization by Crowds. ACM Transactions on Graphics, Volume 36, Issue 4, pp.48:1–48:11 (2017). DOI: https://doi.org/10.1145/3072959.3073598
^ Yuki Koyama, Issei Sato, Masataka Goto: Sequential Gallery for Interactive Visual Design Optimization. ACM Transactions on Graphics, Volume 39, Issue 4, pp.88:1–88:12 (2020). DOI: https://doi.org/10.1145/3386569.3392444
^ Daniel J. Lizotte, Tao Wang, Michael H. Bowling, Dale Schuurmans: Automatic Gait Optimization with Gaussian Process Regression Archived 2017-08-12 at the Wayback Machine. International Joint Conference on Artificial Intelligence: 944–949 (2007)
^ Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos and Arnaud Doucet. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots. Volume 27, Issue 2, pp 93–103 (2009)
^ Scott Kuindersma, Roderic Grupen, and Andrew Barto. Variable Risk Control via Stochastic Optimization. International Journal of Robotics Research, volume 32, number 7, pp 806–825 (2013)
^ Roberto Calandra, André Seyfarth, Jan Peters, and Marc P. Deisenroth Bayesian optimization for learning gaits under uncertainty. Ann. Math. Artif. Intell. Volume 76, Issue 1, pp 5-23 (2016) DOI:10.1007/s10472-015-9463-9
^ Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias W. Seeger: Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory 58(5):3250–3265 (2012)
^ Garnett, Roman; Osborne, Michael A.; Roberts, Stephen J. (2010). "Bayesian optimization for sensor set selection". In Abdelzaher, Tarek F.; Voigt, Thiemo; Wolisz, Adam (eds.). Proceedings of the 9th International Conference on Information Processing in Sensor Networks, IPSN 2010, April 12–16, 2010, Stockholm, Sweden. ACM. pp. 209–219. doi:10.1145/1791212.1791238.
^ Frank Hutter, Holger Hoos, and Kevin Leyton-Brown (2011). Sequential model-based optimization for general algorithm configuration, Learning and Intelligent Optimization
^ J. Snoek, H. Larochelle, R. P. Adams Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems: 2951-2959 (2012)
^ J. Bergstra, D. Yamins, D. D. Cox (2013). Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms. Proc. SciPy 2013.
^ Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. KDD 2013: 847–855
^ Jasper Snoek, Hugo Larochelle and Ryan Prescott Adams. Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 2012
^ Berkenkamp, Felix (2019). Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics (Doctoral Thesis thesis). ETH Zurich. doi:10.3929/ethz-b-000370833. hdl:20.500.11850/370833.
^ Philip Ilten, Mike Williams, Yunjie Yang. Event generator tuning using Bayesian optimization. 2017 JINST 12 P04028. DOI: 10.1088/1748-0221/12/04/P04028
^ Evaristo Cisbani et al. AI-optimized detector design for the future Electron-Ion Collider: the dual-radiator RICH case 2020 JINST 15 P05009. DOI: 10.1088/1748-0221/15/05/P05009
^ Kent, Paul; Gaier, Adam; Mouret, Jean-Baptiste; Branke, Juergen (2023-07-19). "BOP-Elites, a Bayesian Optimisation Approach to Quality Diversity Search with Black-Box descriptor functions". arXiv:2307.09326 [math.OC]. Preprint: Arxiv.
^ Kent, Paul; Branke, Juergen (2023-07-12). "Bayesian Quality Diversity Search with Interactive Illumination". Proceedings of the Genetic and Evolutionary Computation Conference (PDF). GECCO '23. New York, NY, USA: Association for Computing Machinery. pp. 1019–1026. doi:10.1145/3583131.3590486. ISBN 979-8-4007-0119-1. S2CID 259833672.
^ Gaier, Adam; Asteroth, Alexander; Mouret, Jean-Baptiste (2018-09-01). "Data-Efficient Design Exploration through Surrogate-Assisted Illumination". Evolutionary Computation. 26 (3): 381–410. arXiv:1806.05865. doi:10.1162/evco_a_00231. ISSN 1063-6560. PMID 29883202. S2CID 47003986.
^ Gomez-Bombarelli et al. Automatic Chemical Design using a Data-Driven Continuous Representation of Molecules. ACS Central Science, Volume 4, Issue 2, 268-276 (2018)
^ Griffiths et al. Constrained Bayesian Optimization for Automatic Chemical Design using Variational Autoencoders Chemical Science: 11, 577-586 (2020)
^ ^a ^b ^c ^d Mohammed Mehdi Bouchene: Bayesian Optimization of Histogram of Oriented Gradients (Hog) Parameters for Facial Recognition. SSRN (2023)

[Mockus1989-1] Močkus, J. (1989). Bayesian Approach to Global Optimization. Dordrecht: Kluwer Academic. ISBN 0-7923-0115-3.

[garnett_bayesopt_2022-2] Garnett, Roman (2023). Bayesian Optimization. Cambridge University Press. ISBN 978-1-108-42578-0.

[HennigOsborneKersting2022-3] Hennig, P.; Osborne, M. A.; Kersting, H. P. (2022). Probabilistic Numerics (PDF). Cambridge University Press. pp. 243–278. ISBN 978-1107163447.

[4] Snoek, Jasper (2012). "Practical Bayesian Optimization of Machine Learning Algorithms". Advances in Neural Information Processing Systems 25 (NIPS 2012).

[5] Klein, Aaron (2017). "Fast bayesian optimization of machine learning hyperparameters on large datasets". Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR: 528–536.

[6] Močkus, Jonas (1975). "On bayesian methods for seeking the extremum". Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974. Lecture Notes in Computer Science. Vol. 27. pp. 400–404. doi:10.1007/3-540-07165-2_55. ISBN 978-3-540-07165-5.

[7] Močkus, Jonas (1977). "On Bayesian Methods for Seeking the Extremum and their Application". IFIP Congress: 195–200.

[8] Wilson, Samuel (2019-11-22), ParBayesianOptimization R package, retrieved 2019-12-12

[:0-9] Frazier, Peter I. (2018-07-08). "A Tutorial on Bayesian Optimization". arXiv:1807.02811 [stat.ML].

[10] J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl: Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems: 2546–2554 (2011)

[11] Matthew W. Hoffman, Eric Brochu, Nando de Freitas: Portfolio Allocation for Bayesian Optimization. Uncertainty in Artificial Intelligence: 327–336 (2011)

[12] Eric Brochu, Vlad M. Cora, Nando de Freitas: A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. CoRR abs/1012.2599 (2010)

[13] Eric Brochu, Nando de Freitas, Abhijeet Ghosh: Active Preference Learning with Discrete Choice Data. Advances in Neural Information Processing Systems: 409-416 (2007)

[14] Eric Brochu, Tyson Brochu, Nando de Freitas: A Bayesian Interactive Optimization Approach to Procedural Animation Design. Symposium on Computer Animation 2010: 103–112

[15] Yuki Koyama, Issei Sato, Daisuke Sakamoto, Takeo Igarashi: Sequential Line Search for Efficient Visual Design Optimization by Crowds. ACM Transactions on Graphics, Volume 36, Issue 4, pp.48:1–48:11 (2017). DOI: https://doi.org/10.1145/3072959.3073598

[16] Yuki Koyama, Issei Sato, Masataka Goto: Sequential Gallery for Interactive Visual Design Optimization. ACM Transactions on Graphics, Volume 39, Issue 4, pp.88:1–88:12 (2020). DOI: https://doi.org/10.1145/3386569.3392444

[17] Daniel J. Lizotte, Tao Wang, Michael H. Bowling, Dale Schuurmans: Automatic Gait Optimization with Gaussian Process Regression Archived 2017-08-12 at the Wayback Machine. International Joint Conference on Artificial Intelligence: 944–949 (2007)

[18] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos and Arnaud Doucet. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Autonomous Robots. Volume 27, Issue 2, pp 93–103 (2009)

[19] Scott Kuindersma, Roderic Grupen, and Andrew Barto. Variable Risk Control via Stochastic Optimization. International Journal of Robotics Research, volume 32, number 7, pp 806–825 (2013)

[20] Roberto Calandra, André Seyfarth, Jan Peters, and Marc P. Deisenroth Bayesian optimization for learning gaits under uncertainty. Ann. Math. Artif. Intell. Volume 76, Issue 1, pp 5-23 (2016) DOI:10.1007/s10472-015-9463-9

[21] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias W. Seeger: Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory 58(5):3250–3265 (2012)

[22] Garnett, Roman; Osborne, Michael A.; Roberts, Stephen J. (2010). "Bayesian optimization for sensor set selection". In Abdelzaher, Tarek F.; Voigt, Thiemo; Wolisz, Adam (eds.). Proceedings of the 9th International Conference on Information Processing in Sensor Networks, IPSN 2010, April 12–16, 2010, Stockholm, Sweden. ACM. pp. 209–219. doi:10.1145/1791212.1791238.

[23] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown (2011). Sequential model-based optimization for general algorithm configuration, Learning and Intelligent Optimization

[24] J. Snoek, H. Larochelle, R. P. Adams Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems: 2951-2959 (2012)

[25] J. Bergstra, D. Yamins, D. D. Cox (2013). Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms. Proc. SciPy 2013.

[26] Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. KDD 2013: 847–855

[27] Jasper Snoek, Hugo Larochelle and Ryan Prescott Adams. Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 2012

[28] Berkenkamp, Felix (2019). Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics (Doctoral Thesis thesis). ETH Zurich. doi:10.3929/ethz-b-000370833. hdl:20.500.11850/370833.

[29] Philip Ilten, Mike Williams, Yunjie Yang. Event generator tuning using Bayesian optimization. 2017 JINST 12 P04028. DOI: 10.1088/1748-0221/12/04/P04028

[30] Evaristo Cisbani et al. AI-optimized detector design for the future Electron-Ion Collider: the dual-radiator RICH case 2020 JINST 15 P05009. DOI: 10.1088/1748-0221/15/05/P05009

[31] Kent, Paul; Gaier, Adam; Mouret, Jean-Baptiste; Branke, Juergen (2023-07-19). "BOP-Elites, a Bayesian Optimisation Approach to Quality Diversity Search with Black-Box descriptor functions". arXiv:2307.09326 [math.OC]. Preprint: Arxiv.

[32] Kent, Paul; Branke, Juergen (2023-07-12). "Bayesian Quality Diversity Search with Interactive Illumination". Proceedings of the Genetic and Evolutionary Computation Conference (PDF). GECCO '23. New York, NY, USA: Association for Computing Machinery. pp. 1019–1026. doi:10.1145/3583131.3590486. ISBN 979-8-4007-0119-1. S2CID 259833672.

[33] Gaier, Adam; Asteroth, Alexander; Mouret, Jean-Baptiste (2018-09-01). "Data-Efficient Design Exploration through Surrogate-Assisted Illumination". Evolutionary Computation. 26 (3): 381–410. arXiv:1806.05865. doi:10.1162/evco_a_00231. ISSN 1063-6560. PMID 29883202. S2CID 47003986.

[34] Gomez-Bombarelli et al. Automatic Chemical Design using a Data-Driven Continuous Representation of Molecules. ACS Central Science, Volume 4, Issue 2, 268-276 (2018)

[35] Griffiths et al. Constrained Bayesian Optimization for Automatic Chemical Design using Variational Autoencoders Chemical Science: 11, 577-586 (2020)

[Bouchene2023-36] Mohammed Mehdi Bouchene: Bayesian Optimization of Histogram of Oriented Gradients (Hog) Parameters for Facial Recognition. SSRN (2023)

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]