CatBoost
Developer(s): Yandex Machine Learning Systems
Initial release: July 18, 2017
Stable release: 0.23.2[1] / May 26, 2020
Repository: github.com/catboost/catboost
Written in: Python, C++, CUDA
Platform: Linux, macOS, Windows
Type: Machine learning
License: Apache License 2.0
Website: www.catboost.ai

CatBoost[2] is a free and open-source machine learning library. Its core feature is the training of gradient boosting models. The library is published and supported by Yandex, a Russian corporation specializing in internet-related products and services, including transportation and search, and is also used by companies and research institutes worldwide.
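To illustrate what training a gradient boosting model means in general, the following is a minimal sketch in plain Python. It is not CatBoost's implementation: it assumes a single numeric feature with at least two distinct values, squared-error loss and one-split trees ("stumps"), and the names `fit_stump`, `fit_boosting` and `predict` are hypothetical helpers written for this example.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps, each trained on the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
    model = fit_boosting(xs, ys)
    print([round(predict(model, x), 2) for x in xs])
```

Each round fits a small tree to the errors left by the previous rounds and adds a damped version of it to the model. Production libraries such as CatBoost build on this general scheme with many features, deeper trees and other loss functions.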

The library works on Linux, Windows and macOS. It provides Python[3], R and command-line interfaces for training models and analysing the results. Trained models can be applied from Python, R, C, C++, Java, Rust and C#. Models can also be exported for later prediction in several formats, including CoreML, ONNX and PMML.

The library supports processing on CPU and GPU for both training and prediction.

The library supports training machine learning models on structured (tabular) data. It accepts numeric data, categorical data and text data as input for training.

Features


The CatBoost GitHub page claims that the library has the following advantages:
- Superior quality without parameter tuning compared to the other popular open-source gradient boosting libraries XGBoost and LightGBM
- The fastest implementation of GPU training compared to other popular open-source gradient boosting libraries
- Support for different types of input data, including categorical and text data
- Tools for model analysis and visualisation

Applications


CatBoost is used at Yandex for ranking search results, building recommendation feeds for music, films and blogs, weather prediction, spam detection, self-driving cars and the personal assistant Alice. It is used at CERN for particle classification, at Aviasales for ranking hotels, at Cloudflare for bot detection and at the taxi service Careem for destination prediction.

History


In 2009 Andrey Gulin, now the head of Yandex Ads Quality, developed Matrixnet, a proprietary gradient boosting library that was used at Yandex to rank search results. Since 2009 Matrixnet has been used in various Yandex projects, including recommendation systems and weather prediction.

In 2014-2015 Andrey Gulin and a team of researchers started a new project called Tensornet, aimed at solving the problem of "how to work with categorical data". It resulted in several proprietary gradient boosting libraries with different approaches to handling categorical data.

In 2016 Anna Veronika Dorogush started working on gradient boosting at Yandex, including Matrixnet and Tensornet. With her team she implemented and open-sourced the next version of the library, called CatBoost, with full support for categorical data, GPU training, model analysis and visualisation tools, and later text data support.

CatBoost was open-sourced in July 2017 and is under active development at Yandex and in the open-source community.


See also


References

  1. "CatBoost Release". Retrieved June 8, 2020.
  2. "GitHub project webpage".
  3. "Python Package Index PYPI: catboost". Retrieved June 1, 2019.
