
The Stata Journal
Volume 5 Number 3: pp. 330-354




Boosted regression (boosting): An introductory tutorial and a Stata plugin

Matthias Schonlau
RAND
Abstract. Boosting, or boosted regression, is a recent data-mining technique that has shown considerable success in predictive accuracy. This article gives an overview of boosting and introduces a new Stata command, boost, that implements the boosting algorithm described in Hastie, Tibshirani, and Friedman (2001, 322). The plugin is illustrated with a Gaussian and a logistic regression example. In the Gaussian regression example, the R² value computed on a test dataset is R² = 21.3% for linear regression and R² = 93.8% for boosting. In the logistic regression example, stepwise logistic regression correctly classifies 54.1% of the observations in a test dataset versus 76.0% for boosted logistic regression. Currently, boost accommodates Gaussian (normal), logistic, and Poisson boosted regression. boost is implemented as a Windows C++ plugin.


Keywords: boost, boosted regression, boosting, data mining
