SANN - Data Selection

Click the OK button in the SANN - New Analysis/Deployment Startup Panel to display the SANN - Data selection dialog box, which can contain up to four tabs: Quick, Sampling, Subsampling, and Time series. The Time series tab is available only for time series analysis. The options described here are available regardless of which tab is selected.

OK. Click the OK button to display the dialog box for the strategy selected on the Quick tab (either the SANN - Automated Network Search (ANS) dialog box or the SANN - Custom Neural Network dialog box). Note that if you have not already specified Variables, a standard variable selection dialog box will be displayed first.

Cancel. Click the Cancel button to close the dialog box and return to the SANN - New Analysis/Deployment Startup Panel.

Options. See Options Menu for descriptions of the commands on this menu.

MD handling (inputs). Specify how cases with missing values in the input variables of the selected models are treated. This group box is always disabled for time series analysis. There are two options:

Casewise. Cases with missing values are omitted when generating results. Cases with one or more missing target values are labeled "Missing" and form the "Missing" sample. This option is not available for time series tasks. Casewise deletion is the only missing-data method available for categorical variables.
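The casewise rule can be sketched in Python as follows. This is a minimal illustration, not SANN's implementation: the function name `casewise_split`, the dict-per-case layout, and the use of `None` for a missing value are all hypothetical. The sketch also assumes that a case with a missing input is dropped outright, and that only cases with complete inputs but missing targets go to the "Missing" sample (so they can still be used for predictions).

```python
from typing import Optional

# Hypothetical layout: each case maps a variable name to a value; None = missing.
Case = dict[str, Optional[float]]


def casewise_split(
    cases: list[Case], inputs: list[str], targets: list[str]
) -> tuple[list[Case], list[Case]]:
    """Return (complete_cases, missing_sample) under casewise handling."""
    complete, missing_sample = [], []
    for case in cases:
        if any(case.get(v) is None for v in inputs):
            # Assumption: a missing input value removes the case entirely.
            continue
        if any(case.get(t) is None for t in targets):
            # One or more missing target values: the case joins the "Missing" sample.
            missing_sample.append(case)
        else:
            complete.append(case)
    return complete, missing_sample
```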

Mean substitution. Missing values are "patched" before the network is trained or executed: when this option is selected, each missing value is replaced with the training sample mean of that variable. This option applies only to continuous variables. Consequently, it is disabled when the analysis contains no continuous inputs, and for classification tasks it cannot be applied to the (categorical) target variable; cases with missing target values are instead labeled "Missing". SANN groups such cases into the missing sample, which can be used to fix the basis functions of RBF neural networks and to make predictions. Mean substitution is also not applicable to time series analysis (regression or classification).
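A minimal sketch of mean substitution for a single continuous variable, assuming values are stored in plain Python lists with `None` marking a missing value (the function name `mean_substitute` is hypothetical). The mean is computed from the training sample only and is then used to fill gaps in any sample, such as the test, validation, or missing sample.

```python
from statistics import fmean
from typing import Optional


def mean_substitute(
    train: list[Optional[float]], values: list[Optional[float]]
) -> list[float]:
    """Fill missing entries of `values` with the training-sample mean.

    Intended for a continuous variable only; None marks a missing value.
    """
    observed = [v for v in train if v is not None]
    train_mean = fmean(observed)  # simple, unweighted arithmetic mean
    return [train_mean if v is None else v for v in values]
```

In a full analysis this would be applied once per continuous input; categorical inputs would still fall back to casewise handling.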

Note: The mean substitution option always replaces missing data with the simple arithmetic mean, even when case weights are in effect. SANN uses (interprets) weights as measures of case importance, i.e., they affect the estimation of the neural network parameters themselves. If the weights are intended to produce a weighted mean (e.g., a population average computed using weights) for replacing missing data in the input file, use Data - Data Filtering/Recoding - Replace Missing Data to replace missing values with weighted means.
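The difference between the two means can be seen with a small, made-up example: two cases with values 10 and 20 and hypothetical case weights 1 and 3. The helper functions below are illustrative only.

```python
def simple_mean(values: list[float]) -> float:
    # The mean used by SANN's mean substitution, regardless of case weights.
    return sum(values) / len(values)


def weighted_mean(values: list[float], weights: list[float]) -> float:
    # A weight-aware mean, e.g., a population average computed using weights.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [10.0, 20.0]
weights = [1.0, 3.0]
print(simple_mean(values))             # 15.0
print(weighted_mean(values, weights))  # 17.5
```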

Case selection. Click the Case selection button to display the Analysis/Graph Case Selection Conditions dialog box, which is used to specify conditions that determine which cases are included in (or excluded from) the current analysis. More information is available in the case selection conditions overview, syntax summary, and dialog box description.

Case weights. Click the Case weights button to display the Analysis/Graph Case Weights dialog box, which is used to adjust the contribution of individual cases to the outcome of the current analysis by weighting those cases in proportion to the values of a selected variable. In STATISTICA SANN, case weights are used to make a network emphasize or ignore learning specific cases, or even regions, of the data set. By default, all data cases have a case weight of 1. If a data case is assigned a case weight less than 1, for example 0.5, the error due to misfitting that case is halved, so the network places less emphasis on learning it because there is a smaller penalty for errors in its predictions. Similarly, a network will fit a data case with a weight of, say, 2 more closely, since the error in its prediction counts twice as much.
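To illustrate how a case weight scales that case's contribution to the training error, the sketch below assumes a weighted sum-of-squares error. SANN may use a different error function (for example, cross-entropy for classification), and the function name `weighted_sse` is hypothetical.

```python
def weighted_sse(
    targets: list[float], predictions: list[float], weights: list[float]
) -> float:
    """Sum of squared errors with each case's term multiplied by its weight."""
    if not (len(targets) == len(predictions) == len(weights)):
        raise ValueError("targets, predictions and weights must have equal length")
    return sum(w * (t - p) ** 2 for t, p, w in zip(targets, predictions, weights))


# The same single case (prediction error 1.0) under three different weights.
errors = [weighted_sse([1.0], [0.0], [w]) for w in (0.5, 1.0, 2.0)]
print(errors)  # [0.5, 1.0, 2.0]
```

A weight of 0.5 halves the penalty for that case and a weight of 2 doubles it, which is why training pushes harder to fit the more heavily weighted case.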

Note: Weights in SANN are used and interpreted as measures of case importance, i.e., they affect the estimation of the neural network parameters themselves, but nothing else. For example, case weights are not used in mean substitution of missing data or in calculating data statistics such as the mean and standard deviation of the variables. If you assign weights to cases in the data set, the neural network algorithm will try to predict cases with higher weights more accurately. This is useful in a number of situations, such as imbalanced data or data sets containing cases that are more important to predict accurately. Data cases with zero weights are excluded from the training, test, and validation samples (i.e., they are ignored in the analysis). Case weights can be integers or fractional numbers.
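The exclusion of zero-weight cases can be sketched as a filter applied before cases are divided into training, test, and validation samples. The function `partition`, the 70/15/15 split, and the fixed random seed are assumptions made for illustration only.

```python
import random


def partition(cases, weights, fractions=(0.70, 0.15), seed=0):
    """Drop zero-weight cases, then split the rest into train/test/validation.

    `fractions` gives the training and test shares; validation gets the remainder.
    Weights may be integers or fractional numbers.
    """
    kept = [(case, w) for case, w in zip(cases, weights) if w != 0]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_train = round(fractions[0] * len(kept))
    n_test = round(fractions[1] * len(kept))
    train = kept[:n_train]
    test = kept[n_train:n_train + n_test]
    validation = kept[n_train + n_test:]
    return train, test, validation
```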