Incremental (vs. Non-Incremental) Learning Algorithms

Methods (algorithms) for predictive data mining are also referred to as "learning" algorithms because they derive information from existing data in order to predict new observations. These algorithms can be divided into those that require only one, or perhaps two, complete passes through the input data, and those that require iterative, repeated access to the data to complete the estimation. Algorithms of the former type are also referred to as incremental learning algorithms: they complete the computations necessary to fit the respective models by processing one case at a time, each time refining the solution, and once all cases have been processed, only a few additional computations are needed to produce the final results. Non-incremental learning algorithms, by contrast, must process all observations in each iteration of an iterative procedure that refines the solution. Consequently, incremental learning algorithms are usually much faster than non-incremental algorithms, and for extremely large data sets, non-incremental algorithms may not be applicable at all (without sub-sampling first).
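The contrast can be illustrated with a minimal Python sketch (not Statistica code): fitting a simple linear regression either by accumulating sufficient statistics in a single pass over the cases, or by an iterative gradient-descent procedure that must re-read the full data set on every step. The function names, the learning rate, and the number of steps are illustrative assumptions only.

```python
def fit_incremental(stream):
    """Single pass: accumulate sufficient statistics one case at a time."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in stream:                 # each case is seen exactly once
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    # Only a few additional computations once all cases have been processed.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def fit_non_incremental(data, steps=5000, lr=0.05):
    """Iterative gradient descent: every step re-reads the full data set."""
    a = b = 0.0
    n = len(data)
    for _ in range(steps):              # repeated passes over all observations
        grad_a = sum((a + b * x - y) for x, y in data) / n
        grad_b = sum((a + b * x - y) * x for x, y in data) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

if __name__ == "__main__":
    data = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
    print(fit_incremental(iter(data)))       # works on a one-shot stream of cases
    print(fit_non_incremental(data))         # needs the full data set in memory
```

The incremental version never needs to revisit a case, so its cost grows only with the number of observations seen; the iterative version must keep (or repeatedly re-query) the entire data set for every refinement step.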

Statistica Data Miner includes a large selection of incremental and non-incremental learning algorithms that can be scaled to predictive data mining projects involving only a few thousand observations and a few variables (data columns), as well as to situations where the data consist of many millions of observations and hundreds of thousands of variables (i.e., many gigabytes of data). Efficient methods for random sampling are also included, to select samples from huge databases for subsequent analyses.
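As a generic illustration of how a fixed-size random sample can be drawn from an arbitrarily large data source in a single pass, without holding the source in memory, a reservoir-sampling sketch in Python is shown below; it is not Statistica's implementation, and the record generator merely stands in for a database cursor.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of rows."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # replace with decreasing probability
            if j < k:
                sample[j] = row
    return sample

if __name__ == "__main__":
    huge_source = (f"record-{i}" for i in range(10_000_000))  # stand-in for a DB cursor
    print(reservoir_sample(huge_source, k=5, seed=42))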

See also, Predictive Data Mining and Streaming DB Connector.