Rapid Deployment of Predictive Models - Quick Tab

Select the Quick tab of the Rapid Deployment of Predictive Models dialog box to access the options described here.

Summary: Predicted & residual values (classifications). Click this button to compute predicted values or predicted classifications and other statistics for the current model from the current active data set. For regression-type problems, when observed values for the outcome variable exist, Statistica will also compute the average squared error (residual) for each prediction model; for classification-type problems, when observed values for the outcome variable exist, the program will also compute overall error rates for each prediction model.
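The two summary statistics mentioned above can be illustrated with a minimal Python sketch. This is not Statistica's implementation; the function names and the sample values are hypothetical and only show how an average squared residual and an overall error rate are typically computed when observed outcomes are available.

```python
def mean_squared_error(observed, predicted):
    """Average squared residual (observed - predicted) over all cases."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return sum(r * r for r in residuals) / len(residuals)


def error_rate(observed, predicted):
    """Proportion of cases whose predicted class differs from the observed class."""
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)


# Regression-type problem: hypothetical observed and predicted values.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])

# Classification-type problem: hypothetical observed and predicted classes.
rate = error_rate(["Yes", "No", "Yes", "No"], ["Yes", "Yes", "Yes", "No"])

print(mse, rate)
```

When several models are deployed at once, the same computation would be repeated per model, giving one average squared error or one error rate for each.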

Save predicted & residual values (classifications). Click this button to compute predicted values or predicted classifications and other statistics for the current model from the current active data set, and to place this information along with other variables into a spreadsheet marked for input for subsequent analyses. After clicking this button, a variable selection dialog box will be displayed, where you can select variables to save along with the predicted and residual values (classifications). This option is particularly useful for creating input data files for further analyses after computing predicted values (e.g., for further residual analyses).

Save predicted & residual values to input data. Click this button to compute predicted values or predicted classifications and other statistics for the current models from the current active data set, and to place this information into the input data file or database (see also Streaming Database Connector Technology). This option can be used to write predicted values, classification probabilities, or classifications back into the same data file or database that contains the data from which the respective models were estimated; this is a common operation in data mining projects when the predictions from one or more models are to be integrated into the data warehouse (see Data Warehousing).

After clicking this button, the Assign statistics to variables for saving in input data dialog box is displayed. Use these options to assign the statistics of interest (predicted values or classifications, prediction probabilities, residuals) to columns in the current input data file (or connection to an external database). Make sure that the variables to which you want to write this information already exist in the current data file, i.e., create those variables (or fields in a database) before using this option. Refer also to Streaming Database Connector Technology for details on how to define the cursor to permit writing of results (e.g., predicted classifications) to an external database.
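The requirement that the target columns exist before results are written back can be illustrated with a generic Python sketch using the standard sqlite3 module. This is not Statistica's database connector; the table name customers, the column predicted, and the predict function are hypothetical stand-ins for a real data source and a deployed model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The column that will receive the predictions (predicted) is created
# up front, mirroring the requirement that target fields already exist.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, income REAL, predicted REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, income) VALUES (?, ?)",
    [(1, 40.0), (2, 55.0)],
)


def predict(income):
    """Hypothetical stand-in for a deployed prediction model."""
    return 0.5 * income + 2.0


# Score every row, then write each prediction into the existing column.
rows = conn.execute("SELECT id, income FROM customers").fetchall()
conn.executemany(
    "UPDATE customers SET predicted = ? WHERE id = ?",
    [(predict(income), case_id) for case_id, income in rows],
)
conn.commit()

saved = conn.execute("SELECT id, predicted FROM customers ORDER BY id").fetchall()
print(saved)
```

An UPDATE against a column that does not exist would fail, which is why the fields must be created in the data file or database beforehand.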

Model file list. Click this button to display a spreadsheet with names of currently selected (PMML) files.

Include pred. probs in output. This option is available for classification models. When this check box is selected, the prediction probabilities for the categories of the dependent variable will be included in the output for the selected models.

Tree model options.

Include node id. Select this check box to add an extra column, Terminal Node, to the results spreadsheet. This column contains the ID of the node at which the tree algorithm stopped and generated the prediction for each case.

Predict case(s) with missing data in inputs. Select this check box to generate predictions for cases with missing inputs. Tree algorithms can generate a prediction for such a case by stopping at the node where no further split decision can be made (because of missing data) and returning the prediction for that node. By default, this check box is cleared; select it if a particular use case accepts such non-terminal node predictions from tree algorithms.
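The idea of stopping at a non-terminal node can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's algorithm: the Node class, the score function, the allow_missing flag, and the small example tree are all hypothetical, and every node is assumed to carry its own prediction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    prediction: float                 # value returned if scoring stops here
    split_var: Optional[str] = None   # None marks a terminal node
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch taken when value <= threshold
    right: Optional["Node"] = None    # branch taken when value > threshold


def score(node, case, allow_missing=False):
    """Return (node_id, prediction) for one case.

    If a split variable is missing, scoring stops at the current node when
    allow_missing is True; otherwise no prediction is made: (None, None).
    """
    while node.split_var is not None:
        value = case.get(node.split_var)
        if value is None:
            if allow_missing:
                return node.node_id, node.prediction
            return None, None
        node = node.left if value <= node.threshold else node.right
    return node.node_id, node.prediction


# Hypothetical two-level tree: split on age, then on income.
tree = Node(1, 10.0, "age", 40.0,
            left=Node(2, 6.0, "income", 50.0,
                      left=Node(4, 4.0), right=Node(5, 8.0)),
            right=Node(3, 15.0))

complete = score(tree, {"age": 30, "income": 60})
partial = score(tree, {"age": 30, "income": None}, allow_missing=True)
skipped = score(tree, {"age": 30, "income": None})
print(complete, partial, skipped)
```

The first element of each result corresponds to the kind of value reported in the Terminal Node column; for the case with missing income it is the ID of a non-terminal node.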

When unmatched categories are encountered. In some cases and for some models, it is possible to make predictions from categorical (class) variables when the actual categories or class values are not referenced by the respective prediction models (PMML).

For example, suppose you built a prediction model that included a categorical predictor EmploymentStatus with the categories Yes and No. Now suppose a new case is to be predicted (scored) where the individual responded Part Time instead of Yes or No. Some models, such as logistic regression, can in fact make valid predictions for such cases, assuming that the value for this variable is Not-Yes and Not-No. The coding that is performed before the prediction model is applied may code the respective variable as two binary variables (0/1) to indicate the Yes and No responses. If the category of a case matches neither response, both binary variables will be coded 0.
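The coding described in this example can be sketched in Python. The indicator_code function, the intercept, and the coefficients are hypothetical values chosen for illustration; they are not taken from any actual model or from the program's internal coding routine.

```python
import math


def indicator_code(value, categories):
    """Code a categorical value as one 0/1 variable per known category."""
    return [1 if value == category else 0 for category in categories]


def logistic_probability(intercept, coefficients, coded):
    """Predicted probability from a logistic regression on the coded inputs."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, coded))
    return 1.0 / (1.0 + math.exp(-linear))


categories = ["Yes", "No"]       # categories known to the model
intercept = -0.5                 # hypothetical model parameters
coefficients = [1.2, -0.8]

coded_yes = indicator_code("Yes", categories)
coded_unmatched = indicator_code("Part Time", categories)

p_yes = logistic_probability(intercept, coefficients, coded_yes)
p_unmatched = logistic_probability(intercept, coefficients, coded_unmatched)
print(coded_yes, coded_unmatched, p_yes, p_unmatched)
```

Because both indicators are 0 for Part Time, the prediction for that case depends only on the intercept (and on any other predictors in the model), so the unmatched case is scored as Not-Yes and Not-No.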

Sometimes this may not be desirable behavior. Instead, either no predictions should be made for those cases, or the entire deployment (computation of predicted values) should be interrupted to inform the operator or analyst that a previously unknown class value was encountered.

Let model handle unmatched categories. Select this option button to let the respective prediction model handle these cases. Some models will make valid predictions for such cases.

Set predictions to Missing Data. Select this option button to let the scoring proceed, but return a missing data value for the predicted value or class from all prediction models.

Interrupt scoring (processing), and show error. Select this option button to interrupt the analyses. The first instance of an unmatched class value will be shown in the error message.
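The three options above can be summarized as a minimal Python sketch. The UnmatchedPolicy enumeration, the UnmatchedCategoryError exception, the score_cases function, and the toy model are all hypothetical names introduced for illustration; they do not correspond to an actual Statistica API.

```python
from enum import Enum


class UnmatchedPolicy(Enum):
    LET_MODEL_HANDLE = 1   # pass the case to the model unchanged
    SET_MISSING = 2        # continue scoring, return missing data (None)
    INTERRUPT = 3          # stop at the first unmatched class value


class UnmatchedCategoryError(Exception):
    """Raised when scoring is interrupted by an unknown class value."""


def score_cases(cases, known_categories, model, policy):
    """Score each case, applying the chosen policy to unmatched categories."""
    results = []
    for index, case in enumerate(cases):
        unmatched = [
            (variable, value)
            for variable, value in case.items()
            if variable in known_categories
            and value not in known_categories[variable]
        ]
        if unmatched and policy is UnmatchedPolicy.INTERRUPT:
            variable, value = unmatched[0]
            raise UnmatchedCategoryError(
                f"case {index}: unknown value {value!r} for {variable}"
            )
        if unmatched and policy is UnmatchedPolicy.SET_MISSING:
            results.append(None)
            continue
        results.append(model(case))
    return results


known = {"EmploymentStatus": {"Yes", "No"}}
cases = [{"EmploymentStatus": "Yes"}, {"EmploymentStatus": "Part Time"}]


def toy_model(case):
    """Hypothetical model: predicts class 1 for Yes and class 0 otherwise."""
    return 1 if case["EmploymentStatus"] == "Yes" else 0


handled = score_cases(cases, known, toy_model, UnmatchedPolicy.LET_MODEL_HANDLE)
missing = score_cases(cases, known, toy_model, UnmatchedPolicy.SET_MISSING)
print(handled, missing)
```

With the INTERRUPT policy, the same call raises UnmatchedCategoryError for the second case, and the message names the first unmatched class value encountered.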