SANN - Subsampling Dialog Box and Quick Tab

Select the Subsampling option button on the Quick tab of the SANN - Data selection dialog box and click OK to display the SANN - Subsampling dialog box.

In many ways, the Subsampling dialog box is similar to the Custom Neural Network (CNN) dialog box. As with CNN, you can specify options in the Subsampling dialog box to create individual neural networks with full specifications, such as size and architecture. The difference is that, while all networks created via CNN in one analysis use the same train, test, and validation samples, the Subsampling dialog box enables you to create multiple neural networks on different samples.
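The idea of giving each network its own train, test, and validation samples can be sketched in Python. This is only an illustration, not the SANN implementation: the helper name subsample_splits, the 70/15/15 proportions, and the seeded shuffling are assumptions made for the example.

```python
import random

def subsample_splits(n_cases, n_networks, train_frac=0.70, test_frac=0.15, seed=0):
    """Return one (train, test, validation) index partition per network.

    Hypothetical helper: the proportions and the random scheme are
    illustrative assumptions, not documented SANN defaults.
    """
    rng = random.Random(seed)
    n_train = round(n_cases * train_frac)
    n_test = round(n_cases * test_frac)
    splits = []
    for _ in range(n_networks):
        idx = list(range(n_cases))
        rng.shuffle(idx)  # a fresh random ordering for every network
        train = idx[:n_train]
        test = idx[n_train:n_train + n_test]
        validation = idx[n_train + n_test:]  # whatever remains
        splits.append((train, test, validation))
    return splits

splits = subsample_splits(n_cases=100, n_networks=3)
for k, (train, test, validation) in enumerate(splits, start=1):
    print(f"network {k}: {len(train)} train, {len(test)} test, {len(validation)} validation")
```

Each network in the sketch sees all cases exactly once, but the assignment of cases to the three samples differs from network to network.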

Depending on the analysis type selected in the Startup Panel (and in some cases, the selected network type), this dialog box can contain up to five tabs simultaneously. The available tabs are: Quick (options described below), MLP, RBF, Weight Decay, Initialization, and Real Time Training Graph. Use the options on these tabs to configure the neural networks. Note that you can also access the Subsampling dialog box from the SANN Results dialog boxes.

Active neural networks. The grid in the Active neural networks group box provides a quick view of the networks you have created for modeling your data. If you have not trained any networks or if you have not selected any active networks, this grid will be empty.

Train. Click the Train button to build (train) networks according to the specifications made on the tabs of this dialog. While the networks are being trained, the Neural networks training in progress dialog box will be displayed. This dialog box provides summary details of networks as they are created. When the requested number of networks has been trained, the SANN - Results dialog box is displayed.

Go to results. Click this button to display the Results dialog box without performing additional training. Note that if the Active neural networks grid is empty, this button will not be available.

Save networks. Click this button to display a drop-down list containing the following commands:

PMML. Select PMML to display the Save PMML file dialog box, which contains options to store the active networks for future use. Note that this dialog will be displayed only when the Active neural networks grid contains networks. Stored PMML networks can be opened by clicking the Load network files button in the SANN - New Analysis/Deployment Startup Panel.

C/C++ language. Select C/C++ language to display the Save C file dialog box, which contains options to store the active networks for future use.

C#. Select this command to generate code as C#.

Java. Select this command to generate code in Java.

SAS. Select SAS to display the Save SAS file dialog box, which contains options to save deployment code for the created model as SAS code (a .sas file). See also, Rules for SAS Variable Names.

SQL stored procedure in C#. Select this command to generate code as a C# class intended for use as a SQL Server stored procedure.

SQL User Defined Function in C#. Select this command to generate code as a C# class intended for use as a SQL Server user defined function.

Teradata. Select this command to generate code as a C language function intended for use as a user defined function in a Teradata querying environment.

Deployment to STATISTICA Enterprise. Select this command to deploy the results as an Analysis Configuration in STATISTICA Enterprise. Note that appropriately formatted data must be available in a STATISTICA Enterprise Data Configuration before the results can be deployed to an Analysis Configuration.

Data statistics. Click the Data statistics button to generate a spreadsheet containing the mean, standard deviation, minimum value, and maximum value for each continuous variable in the analysis. These data statistics are broken down by sample (training, testing, and validation) and are also reported for the overall data set. Because Subsampling assigns different train, test, and validation samples to each network, this option generates one spreadsheet per active network.
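As a rough illustration of what such a per-sample breakdown contains, the following Python sketch computes the four statistics for one continuous variable. The function names, the sample data, and the use of the sample (n - 1) standard deviation are assumptions for the example; the text above does not say which standard deviation formula SANN reports.

```python
import statistics

def describe(values):
    """Mean, standard deviation, minimum, and maximum of one continuous variable."""
    return {
        "mean": statistics.mean(values),
        # Assumption: sample (n - 1) standard deviation; 0.0 for a single case.
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

def data_statistics(values, train, test, validation):
    """Break the statistics down by sample and add the overall data set."""
    samples = {"train": train, "test": test, "validation": validation}
    report = {name: describe([values[i] for i in idx]) for name, idx in samples.items()}
    report["overall"] = describe(values)
    return report

# Made-up values and index lists, purely for illustration.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 1.0, 3.0]
report = data_statistics(values, train=[0, 1, 2, 3], test=[4, 5], validation=[6, 7])
for name, stats in report.items():
    print(name, stats)
```

With subsampling, a report like this would be produced once per network, because each network has its own train, test, and validation index lists.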

Summary. Click the Summary button to generate a spreadsheet containing the summary details listed in the Active neural networks grid. Note that if the Active neural networks grid is empty, this button will not be available.

Cancel. Click the Cancel button to exit the SANN - Subsampling dialog box and return to the SANN - Data selection dialog box. Any selections made will be disregarded, and you will be prompted to discard any networks in the Active neural networks grid.

Options. Click the Options button to display the Options menu.

Quick Tab

Select the Quick tab of the SANN - Subsampling dialog box to access the options described here.

Network type. Use the options in this group box to specify the type of network (multilayer perceptron or radial basis function).

Multilayer perceptron (MLP). Select the Multilayer perceptron (MLP) option button to generate multilayer perceptron networks. The multilayer perceptron is the most common form of network. It requires iterative training, but the resulting networks are quite compact, execute quickly once trained, and in most problems yield better results than the other types of networks.

Radial basis function (RBF). Select the Radial basis function (RBF) option button to generate radial basis function networks. Radial basis function networks tend to be slower and larger than multilayer perceptron networks and often have inferior performance, but they can be trained faster than MLP networks when the data set is large and linear output activation functions are used.

Error function. Specify the error function to be used in training a network.

Sum of squares. Select the Sum of squares option button to generate networks using the sum of squares error function. Note that this is the only error function available for regression type analyses.

Cross entropy. Select the Cross entropy option button to generate networks using the cross entropy error function. This error function assumes that the data are drawn from the exponential family of distributions (see Bishop, 1995, for more details) and supports a direct probabilistic interpretation of the network outputs. Note that this error function is available only for classification problems; it is disabled for regression type analyses. When the Cross entropy error function is selected, the Output neurons (in the Activation functions group box) are always set to Softmax.
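The two error functions can be compared on a single case with a small Python sketch. These are the common textbook forms; any scaling or averaging that SANN applies internally is not stated here, so treat the function names and the exact formulas as assumptions.

```python
import math

def sum_of_squares(targets, outputs):
    """Sum of squared differences between targets and network outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs))

def cross_entropy(targets, outputs, eps=1e-12):
    """Cross entropy for one-of-N encoded targets and probability outputs.

    eps guards against log(0); it is an implementation detail of this sketch.
    """
    return -sum(t * math.log(max(y, eps)) for t, y in zip(targets, outputs))

# One classification case: the true class is the second of three.
targets = [0.0, 1.0, 0.0]
outputs = [0.2, 0.7, 0.1]  # e.g., Softmax outputs that sum to 1

print("sum of squares:", sum_of_squares(targets, outputs))
print("cross entropy: ", cross_entropy(targets, outputs))
```

With one-of-N targets, the cross entropy depends only on the probability assigned to the correct class, which is why it pairs naturally with outputs that can be read as class probabilities.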

Activation functions. Use the options in this group box to select activation functions for the hidden and output neurons. The choice of activation function, i.e., the precise mathematical function, is crucial in building a neural network model because it directly affects the performance of the model. Generally, it is recommended that you choose the Tanh function for the hidden neurons and the Identity function for the output neurons of multilayer perceptron networks (the default settings) when the Sum of squares error function is used. For Radial basis function networks, the Hidden units are automatically set to Gaussian, and the Output units are set to either Identity (when the Sum of squares error function is used) or Softmax (when the Cross entropy error function is used).

Hidden units. Use the Hidden units drop-down list to select the activation function for the hidden layer neurons. For Multilayer perceptron networks, these include the Identity function, hyperbolic Tanh (recommended), Logistic sigmoid, Exponential, and Sine activation functions. For Radial basis function networks, a Gaussian activation function is always used for hidden neurons.

Identity. Uses the identity function. With this function, the activation level is passed on directly as the output.

Tanh. Uses the hyperbolic tangent function (recommended). The hyperbolic tangent function (tanh) is a symmetric S-shaped (sigmoid) function whose output lies in the range (-1, +1). It often performs better than the logistic sigmoid function because of its symmetry.

Logistic. Uses the logistic sigmoid function. This is an S-shaped (sigmoid) curve, with output in the range (0, 1).

Exponential. Uses the exponential activation function.

Sine. Uses the standard sine activation function.

Gaussian. Uses a Gaussian (or Normal) distribution. This is the only choice available for RBF neural networks.
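The output ranges quoted above can be checked with a short Python sketch. Only the functions whose form is unambiguous from the descriptions are included; the Exponential and Gaussian entries are left out because their exact parameterization is not given here. The dictionary name ACTIVATIONS is an assumption for the example.

```python
import math

# Hypothetical lookup table mirroring the drop-down list entries.
ACTIVATIONS = {
    "Identity": lambda a: a,                           # passes the activation through
    "Tanh": math.tanh,                                 # symmetric, output in (-1, +1)
    "Logistic": lambda a: 1.0 / (1.0 + math.exp(-a)),  # output in (0, 1)
    "Sine": math.sin,                                  # periodic, output in [-1, +1]
}

for a in (-2.0, 0.0, 2.0):
    row = ", ".join(f"{name}={f(a):+.4f}" for name, f in ACTIVATIONS.items())
    print(f"a={a:+.1f}: {row}")
```

The printed rows show the symmetry of Tanh around zero (tanh(-a) = -tanh(a)), whereas the Logistic function is centered on 0.5.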

Output units. Use the Output units drop-down list to select the activation function for the output neurons. For Multilayer perceptron networks, these include the Identity function (recommended), hyperbolic Tanh, Logistic sigmoid, Exponential, Sine, and Softmax activation functions. For Radial basis function networks, the choice of Output units depends on the selected Error function. For RBF networks with the Sum of squares error function, an Identity activation function is used. For RBF networks with the Cross entropy error function, the Softmax activation function is always used.

Identity. Uses the identity function (recommended). With this function, the activation level is passed on directly as the output.

Tanh. Uses the hyperbolic tangent function. The hyperbolic tangent function (tanh) is a symmetric S-shaped (sigmoid) function whose output lies in the range (-1, +1). It often performs better than the logistic sigmoid function because of its symmetry.

Logistic. Uses the logistic sigmoid function. This is an S-shaped (sigmoid) curve, with output in the range (0, 1).

Exponential. Uses the negative exponential activation function.

Sine. Uses the standard sine activation function.

Softmax. Uses a specialized activation function for one-of-N encoded classification networks. It performs a normalized exponential (i.e., the outputs add up to 1). In combination with the Cross entropy error function, it allows Multilayer perceptron networks to be modified for class probability estimation (Bishop, 1995; Bridle, 1990).
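The normalized exponential can be written in a few lines of Python. This is a generic sketch of the standard softmax formula, not SANN's code; subtracting the maximum activation is a common numerical safeguard added for the example.

```python
import math

def softmax(activations):
    """Normalized exponential: positive outputs that add up to 1."""
    shift = max(activations)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

# Raw output-layer activations for a three-class problem (made-up values).
probs = softmax([2.0, 1.0, 0.1])
print(probs, "sum =", sum(probs))
```

Because the outputs are positive and sum to 1, they can be read as estimated class membership probabilities, with the largest activation receiving the largest probability.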

No. of neurons. Specify the number of neurons in the hidden layer of the network. The more neurons the hidden layer contains, the more complex (flexible) the network becomes.
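One way to see how flexibility grows with the hidden layer is to count adjustable weights. The sketch below assumes a fully connected network with a single hidden layer and one bias term per hidden and output neuron; the function name and the example sizes are hypothetical.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs):
    """Number of adjustable weights in a one-hidden-layer MLP with biases."""
    input_to_hidden = (n_inputs + 1) * n_hidden    # +1 for each hidden bias
    hidden_to_output = (n_hidden + 1) * n_outputs  # +1 for each output bias
    return input_to_hidden + hidden_to_output

# Example: 4 inputs and 3 outputs with an increasing number of hidden neurons.
for n_hidden in (2, 5, 10, 20):
    print(n_hidden, "hidden neurons ->", mlp_weight_count(4, n_hidden, 3), "weights")
```

The count grows linearly with the number of hidden neurons, so a larger hidden layer can fit more complex relationships but also has more parameters to estimate from the training sample.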