Boosted Trees - Program Overview

The Statistica Boosted Trees module is a complete implementation of the method usually referred to as stochastic gradient boosting trees [Friedman, 1999a, b; Hastie, Tibshirani, & Friedman, 2001]. The same method is also known as TreeNet (Salford Systems, Inc.) and MART (Jerill, Inc.). In Statistica, it can be used for regression-type problems (to predict a continuous dependent variable) as well as classification problems (to predict a categorical dependent variable).

Estimation. You have full control over all key aspects of the estimation procedure, including the complexity of the trees fitted to the data, the maximum number of boosting steps, the subsampling rate for the training sample at each boosting step, and the learning (shrinkage) rate. You can also specify an independent testing sample; the predictive validity of each solution in the sequence of boosting steps is then evaluated in that sample. If no testing sample is specified, Statistica randomly selects one at each boosting step and then determines the best solution (best number of additive expansions or simple trees) based on how well the respective models predict the cases in those testing samples.
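The estimation controls described above can be illustrated with a minimal sketch of stochastic gradient boosting for a regression problem. This is not Statistica's implementation or interface; all names here (fit_stump, boost, max_steps, learning_rate, subsample) are hypothetical and chosen only for illustration. The sketch makes several simplifying assumptions: a single predictor, squared-error loss, tree complexity fixed at one split per tree, and one fixed testing sample rather than a sample redrawn at each boosting step.

```python
import math
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no split point between identical x values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, lm, rm)
    if best is None:  # all x identical: fall back to a constant tree
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def boost(train, test, max_steps=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Return (base, stumps, test_errors, best_n) for squared-error boosting."""
    rng = random.Random(seed)
    xs, ys = [x for x, _ in train], [y for _, y in train]
    tx, ty = [x for x, _ in test], [y for _, y in test]
    base = sum(ys) / len(ys)              # initial model: the training mean
    pred_train = [base] * len(xs)
    pred_test = [base] * len(tx)
    n_sub = max(2, int(subsample * len(xs)))
    stumps, test_errors = [], []
    for _ in range(max_steps):
        # Subsampling rate: each step sees only a random part of the training sample.
        idx = rng.sample(range(len(xs)), n_sub)
        # For squared error, the negative gradient is the ordinary residual.
        stump = fit_stump([xs[i] for i in idx], [ys[i] - pred_train[i] for i in idx])
        stumps.append(stump)
        # Learning (shrinkage) rate: add only a fraction of each new tree.
        pred_train = [p + learning_rate * predict_stump(stump, x)
                      for p, x in zip(pred_train, xs)]
        pred_test = [p + learning_rate * predict_stump(stump, x)
                     for p, x in zip(pred_test, tx)]
        test_errors.append(mse(ty, pred_test))
    # Best solution: the number of trees with the lowest testing-sample error.
    best_n = min(range(max_steps), key=test_errors.__getitem__) + 1
    return base, stumps, test_errors, best_n

# Synthetic data (an assumption for the demo): y = sin(x) plus small noise.
data_rng = random.Random(42)

def make_sample(n):
    sample = []
    for _ in range(n):
        x = data_rng.uniform(0, 6)
        sample.append((x, math.sin(x) + data_rng.gauss(0, 0.1)))
    return sample

train, test = make_sample(120), make_sample(60)
base, stumps, test_errors, best_n = boost(train, test)
baseline = mse([y for _, y in test], [base] * len(test))
print("baseline test MSE:", round(baseline, 4))
print("best number of trees:", best_n,
      "test MSE:", round(test_errors[best_n - 1], 4))
```

Lower learning rates generally need more boosting steps to reach the same fit, which is why the number of trees is chosen from the testing-sample error curve rather than fixed in advance.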

Results. As in all modules of Statistica, Statistica Data Miner, Statistica Enterprise Server, and Statistica Enterprise Server Data Miner, the results include a large number of graphs to aid in evaluating the final model.

Deployment for Data Mining. As is the case for all modules for predictive data mining, the final solution can be deployed by generating computer code in C/C++, Statistica Visual Basic (SVB), or PMML (for later deployment via the Statistica Rapid Deployment engine).