Example: SAS Deployment

The following STATISTICA modules support SAS code generation.

Accessed from the Statistics tab or menu:

ANOVA

General Linear Models (GLM)

Generalized Linear/Nonlinear Models (GLZ)

General Regression Models (GRM)

Data mining modules

General Classification and Regression Trees (GC&RT)

General CHAID and Exhaustive CHAID (GCHAID)

Interactive Trees (C&RT, CHAID)

Boosted Tree Classifiers and Regression

Random Forests for Regression and Classification

Multivariate Adaptive Regression Splines (MARSplines)

Automated Neural Networks

• Regression

• Classification

• Cluster analysis

The SAS code option is accessed from the Code generator button located in the results dialogs of these modules. The following image shows this option in the General Classification and Regression Trees (GC&RT) module.

This example illustrates SAS code generation for classification trees.

The data for the analysis were generated to mimic the way a faulty calculator would display numerals on a digital display (for a description of how these data were generated, see Breiman et al., 1984). The numerals one through nine and zero, as entered on the keypad of a calculator, form the observed classes of the dependent variable Digit. There are 7 categorical predictors, Var1 through Var7. The levels of these categorical predictors (0 = absent; 1 = present) indicate whether each of the 7 lines (3 horizontal and 4 vertical) on the digital display was illuminated when the numeral was entered on the calculator. The correspondence between predictor variables and lines is Var1 - top horizontal, Var2 - upper left vertical, Var3 - upper right vertical, Var4 - middle horizontal, Var5 - lower left vertical, Var6 - lower right vertical, and Var7 - bottom horizontal.
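To make the variable layout concrete, the following Python sketch generates data of a similar shape. It is an illustration only, not the procedure used to create Digit.sta: the segment patterns are the conventional seven-segment encodings, and the flip probability of 0.1, the seed, and the function names are assumptions introduced here.

```python
import random

# Conventional seven-segment patterns, ordered Var1..Var7:
# top, upper left, upper right, middle, lower left, lower right, bottom.
# (Assumed to match the example file; not taken from Digit.sta itself.)
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def noisy_display(digit, flip_prob=0.1, rng=random):
    """Return (Var1..Var7) for a digit, inverting each line with flip_prob."""
    return tuple(
        seg ^ 1 if rng.random() < flip_prob else seg
        for seg in SEGMENTS[digit]
    )


def make_cases(n_cases=500, flip_prob=0.1, seed=0):
    """Build a list of (Digit, Var1, ..., Var7) rows with a fixed seed."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        digit = rng.randrange(10)  # one of the ten numerals, chosen uniformly
        cases.append((digit, *noisy_display(digit, flip_prob, rng)))
    return cases


if __name__ == "__main__":
    for row in make_cases()[:10]:
        print(row)
```

With flip_prob set to 0, every row reproduces the clean pattern for its digit, so the predictors identify the digit exactly; the "faulty" behavior comes entirely from the random inversions.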

The first 10 cases of the data set are shown below. The complete data set containing a total of 500 cases is available in the example data file Digit.sta.

Open this data file by selecting Open Examples from the File menu (classic menus) or by selecting Open Examples from the Open menu on the Home tab (ribbon bar); it is in the Datasets folder.

Specifying the Analysis. Classic menus: From the Data Mining menu, select General Classification/Regression Tree Models. Ribbon bar: On the Statistics tab in the Trees/Partitioning group, click C&RT.

The General Classification and Regression Trees Startup Panel will be displayed.

We will perform a Standard C&RT analysis, which is selected by default, so click the OK button to display the Standard C&RT dialog box.

The dependent variable in this case is categorical in nature, so on the Quick tab, select the Categorical response (categorical dependent variable) check box.

Then click the Variables button to display the standard variable selection dialog. Here, select DIGIT as the Dependent variable and VAR1 through VAR7 as the Categorical predictor variables, and then click the OK button. There is no need to specify the Response codes or Factor codes explicitly in this case: because we will be using all of them, STATISTICA will automatically determine those codes from the data.

Next, click on the Classification tab. We will accept the Equal Misclassification costs and the Gini measure of Goodness of fit defaults. In the Prior probabilities group box, select the Equal option button.
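For readers unfamiliar with the Gini measure, the following Python sketch computes the Gini impurity of a node and the reduction in impurity achieved by a candidate split. It uses observed class proportions only and is not STATISTICA's implementation, which also takes the prior probabilities and misclassification costs into account; the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_j ** 2) over the class proportions p_j."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_improvement(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted


if __name__ == "__main__":
    parent = [0, 0, 1, 1]
    print(gini_impurity(parent))
    print(split_improvement(parent, [0, 0], [1, 1]))
```

A pure node (all cases in one class) has impurity 0, and a split is preferred when it yields a larger improvement.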

Then, click on the Validation tab. Select the V-fold cross-validation check box, and accept all other defaults.

Click OK to begin the computations. You will see a dialog indicating the progress of the analysis; in some cases, the v-fold cross-validation analysis can be time-consuming because the analysis is repeated v times. Next, the GC&RT Results dialog box will be displayed.
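The reason v-fold cross-validation repeats the analysis v times can be seen in the following Python sketch, which partitions the cases into v folds and holds each fold out in turn. It shows the general technique only; the value v = 10, the seed, and the function name are assumptions, not settings read from the dialog.

```python
import random


def v_fold_indices(n_cases, v=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # random assignment of cases to folds
    folds = [indices[i::v] for i in range(v)]
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in v_fold_indices(500, v=10):
        print(len(train_idx), len(test_idx))
```

Each case is held out exactly once, so a tree must be fitted once per fold, which accounts for the additional computing time.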

Generating SAS code. Select the Report tab, and click the Code generator button to display the drop-down list containing different code generation options. Select SAS code.

The Code Generator dialog box will be displayed.

By default, this dialog box will be displayed every time you generate SAS code. If you do not want to see this dialog box again, select the Don’t show this dialog again check box. You can also suppress or reinstate this dialog box via the Options dialog box: on the Display tab (located under Analyses/Graphs), select the When generating SAS code warn about SAS’s variable naming conventions check box to display the dialog box, or clear it to suppress the dialog box.
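As a rough illustration of the kind of check this warning concerns, the following Python sketch tests names against the conventional SAS rules (up to 32 characters, beginning with a letter or underscore, and containing only letters, digits, and underscores). These rules are assumed here; the Rules for SAS Variable Names topic is the authority for what STATISTICA actually enforces, and the function name is hypothetical.

```python
import re

# Assumed rule set: 1 to 32 characters, first character a letter or
# underscore, remaining characters letters, digits, or underscores.
_SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_name(name):
    """Return True if name satisfies the assumed SAS naming rules."""
    return _SAS_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Digit", "Var1", "1Var", "Lower left", "_tmp"]:
        print(candidate, is_valid_sas_name(candidate))
```

Under these assumed rules, the variable names in this example (Digit and Var1 through Var7) need no changes.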

Click the close button to generate the SAS code.

Users of the SAS deployment code are expected to know how to import data and view the work libraries in SAS. After importing the data into SAS, you can open the generated SAS code in the SAS program editor and run it to obtain predictions.
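The generated SAS code is not reproduced in this topic. To give a sense of what scoring with a deployed classification tree involves, the following Python sketch applies a cascade of if/else rules to the seven predictors. The tree is hand-built from the conventional noise-free seven-segment patterns purely for illustration; it is not the tree fitted by STATISTICA and not the SAS code it produces, and the function name is hypothetical.

```python
# Conventional noise-free patterns, ordered Var1..Var7 (an assumption,
# used here only to check the hand-built rules).
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def predict_digit(var1, var2, var3, var4, var5, var6, var7):
    """Score one case with a hand-built tree (var6 and var7 are not used)."""
    if var5 == 1:                       # lower left vertical lit
        if var4 == 0:                   # middle horizontal dark
            return 0
        if var2 == 0:                   # upper left vertical dark
            return 2
        return 6 if var3 == 0 else 8    # upper right vertical decides
    if var4 == 0:                       # lower left dark, middle dark
        return 1 if var1 == 0 else 7    # top horizontal decides
    if var1 == 0:
        return 4
    if var2 == 0:
        return 3
    return 5 if var3 == 0 else 9


if __name__ == "__main__":
    for digit, pattern in SEGMENTS.items():
        print(digit, predict_digit(*pattern))
```

Every case reaches exactly one terminal node, so each imported row receives one predicted class; a tree fitted to the noisy Digit.sta data will generally use different splits.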

See also Rules for SAS Variable Names.