SANN Example 4: Time Series (Classification)

Organizing (or segmenting) time series data into distinct classes is an important task in many fields (e.g., robotics, weather forecasting, quality control, etc.). In this example, we will explore some of the options that are available in STATISTICA Automated Neural Networks (SANN) for classification analysis of time series data.

Note: The results shown in this example may be slightly different from your analysis because the neural network algorithms use random number generators to fix the initial values of the weights (starting points) of the neural networks, which often results in slightly different solutions (local minima) each time you run the analysis. Also note that changing the seed for the random number generator used to create the train, test, and validation samples can change your results.

Data. IrisSNN.sta is a common data set used for classification purposes. The objective is to assign 150 irises to one of three categories based on four measurements: sepal length and width, and petal length and width. The particular arrangement of this data set lends itself to time series classification in that the frequency of switching between categories is low, and because of this, sequential values are likely to be from the same category. Part of the data set is shown below.

Specifying the analysis. To begin the example, open IrisSNN.sta and start SANN:

Ribbon bar. Select the Home tab. In the File group, click the Open arrow and select Open Examples to display the Open a STATISTICA Data File dialog box. Open the data file, which is located in the Datasets folder. Then, select the Statistics tab. In the Advanced/Multivariate group, click Neural Nets to display the SANN - New Analysis/Deployment Startup Panel. Or, select the Data Mining tab. In the Learning group, click Neural Networks to display the SANN - New Analysis/Deployment Startup Panel.

Classic menus. From the File menu, select Open Examples. In the Open a STATISTICA Data File dialog, double-click the Datasets folder, and then double-click on IrisSNN.sta. Then, from the Statistics menu or the Data Mining menu, select Automated Neural Networks to display the SANN - New Analysis/Deployment Startup Panel.

In the Startup Panel, you can select from five analysis types, or you can deploy previously saved networks. For this example, select Time series (classification) in the New analysis list.

Click the OK button to display the SANN - Data selection dialog box.

As with SANN time series regression analysis (used in SANN Example 3: Growth in Number of Airline Passengers over Time), use the options in the SANN - Data Selection dialog box to make variable selections and specify a method to use in splitting the data into training, testing, and validation subsets.

Also, there are three strategies to choose from: Automated network search (ANS), Custom neural networks (CNN), and Subsampling. For more details on how to use ANS for time series analyses, see Example 3: Growth in Number of Airline Passengers over Time.

On the Quick tab of the SANN - Data selection dialog box, click the Variables button. In SANN, the types of variables you can select (e.g., categorical or continuous targets, categorical or continuous inputs) are pre-determined by your analysis selection. Because classification analysis presupposes a single categorical target with either continuous or categorical inputs, the variable selection dialog box will not provide an option for continuous targets.

For this example, first select the Show appropriate variables only check box. Then, select Flower as the Categorical target, and select Slength, Swidth, Plength, and Pwidth as Continuous inputs.

Click the OK button to return to the SANN - Data Selection dialog box.

As you can see in the Analysis variables (present in the dataset) group box, SANN reports that there are no continuous targets.

Below this, in the Strategy for creating predictive models group box, select the Custom neural networks (CNN) option button.

Next, select the Sampling (CNN and ANS) tab. As mentioned in Example 3, you are able to divide the data into two or three subsets (train, test, and validation) without disrupting the sequential nature of the data (see the Note on sampling in Example 3). For this example, we will use a sampling variable to divide our data into two subsets: train and test.

Select the Sampling variable option button. The three buttons Training sample, Testing sample, and Validation sample will be activated, enabling you to specify one or more of these samples.

Click the Training sample button to display the Sampling variable dialog box.

Click the Sample Identifier Variable button to display the Cross-Validation Specifications dialog box. For this example, select NNSET, and click the OK button.

In the Sampling variable dialog box, the Code for training sample field now defaults to Select. Change this field to Train (by double-clicking in the field to display a variable selection dialog box, selecting Train, and clicking OK).

In the Sampling variable dialog box, select the On option button in the Status group box. The dialog box should look as shown below.

Click the OK button, and repeat this process to specify the variable (NNSET) and code (Test) for the Testing sample. Once you have specified both subsets, the Data Selection dialog box - Sampling (CNN and ANS) tab should look as shown below.

Next, select the Time Series tab, which contains options to set the values of the most important time series parameters, i.e., Number of time steps used as inputs and Number of steps ahead to predict. For this example, we will leave these parameters at their defaults, but depending on your future analyses, you may need to change these quantities so they reflect the time periodicity in your time series data. This is an additional decision that you need to make for time series problems: determining the number of time series steps to use as input to the network. For some problems, determining the correct number of input steps can require a certain amount of trial and error. However, if the problem contains a natural cycle (such as the Series_G data set from Example 3, which contains monthly airline passenger figures and therefore has a definite period of length 12), you should try specifying that cycle length, or an integral multiple of it.

In this case, we will use the default values. By leaving the Number of time steps used as inputs set to 1, we will be using a lag of 1 for making time series predictions.
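The lagging described above can be sketched outside STATISTICA. The helper below is a hypothetical illustration (not SANN's internal code) of how a lag of 1 turns a sequence of measurements into input/target pairs, with the parameter names chosen to mirror the two dialog options:

```python
import numpy as np

def make_lagged_inputs(features, targets, n_steps=1, steps_ahead=1):
    """Build lagged input windows for time series classification.

    Each row of X concatenates the feature vectors of the previous
    n_steps time steps; y holds the class label steps_ahead steps later.
    """
    X, y = [], []
    for t in range(n_steps - 1, len(features) - steps_ahead):
        window = features[t - n_steps + 1 : t + 1]
        X.append(np.asarray(window).ravel())
        y.append(targets[t + steps_ahead])
    return np.array(X), np.array(y)

# Toy sequence: 4 measurements per time step, as in the iris data
feats = np.arange(24).reshape(6, 4)      # 6 time steps, 4 features each
labels = ["A", "A", "B", "B", "C", "C"]
X, y = make_lagged_inputs(feats, labels, n_steps=1, steps_ahead=1)
print(X.shape)  # (5, 4): one lag of 4 inputs per case
print(y)        # labels shifted forward by one step
```

With a lag of 1, each case's inputs are the measurements at time t and its target is the category at time t+1, which is why sequential cases from the same category (as in IrisSNN.sta) are easy to predict.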

Next, click the OK button in the SANN - Data Selection dialog box to display the SANN - Custom Neural Network dialog box.

Training the networks. The Custom Neural Network (CNN) tool enables you to choose individual network architectures and training algorithms to exact specifications. You can use CNN to train multiple neural network models with exactly the same design specifications but with different random initializations of weights. As a result, each network will find one of the possible solutions for neural networks of the same architecture and configuration. In other words, each resulting network may converge to a different, possibly suboptimal, solution (i.e., a local minimum).
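The effect of random initialization can be demonstrated with a minimal sketch (not SANN code): a network's error surface is non-convex, so gradient-based training started from different random weights can settle in different local minima. Here a toy one-dimensional "error function" with two minima stands in for a network's error surface:

```python
import numpy as np

def error(w):
    return (w**2 - 1.0)**2           # two minima, at w = -1 and w = +1

def grad(w):
    return 4.0 * w * (w**2 - 1.0)    # derivative of the error function

rng = np.random.default_rng(0)
solutions = []
for _ in range(5):
    w = rng.uniform(-2.0, 2.0)       # random initialization ("starting point")
    for _ in range(200):             # plain gradient descent
        w -= 0.05 * grad(w)
    solutions.append(round(w, 3))
print(sorted(set(solutions)))        # typically both -1.0 and 1.0 appear
```

Each run converges, but which minimum it reaches depends only on where it started, which is exactly why training several identically configured networks and comparing them is worthwhile.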

For this example, leave the number of Networks to train (on the Quick tab) at its default. You can also experiment with the No. of neurons in the hidden layer. There are no general guidelines for setting this quantity, but if you want to automate this process (i.e., searching for the optimal number of neurons in the hidden layer), then we suggest you use the Automated Network Search (ANS).

Before we train our network, let’s review some of the options available on the tabs of this dialog box. Look at the Quick tab.

Network type. SANN enables you to train either a multilayer perceptron (MLP) or a radial basis function (RBF) network. The multilayer perceptron is the most common form of network. It requires iterative training, the networks are quite compact and execute quickly once trained, and in most problems they yield better results than other types of networks, including RBFs. In contrast, radial basis function networks tend to be slower and larger than multilayer perceptrons, and often have worse performance. They are also usually less effective than multilayer perceptrons if you have a large number of input variables (they are more sensitive to the inclusion of unnecessary inputs).

Error function. For classification analysis types, SANN enables you to select either the Sum of squares error function or Cross entropy. By default, the cross entropy error function is used with Softmax as the output activation function. Such networks support a probabilistic interpretation of the output confidence levels generated by the network: the outputs are true probabilities of class membership given the inputs.

Activation functions. SANN also provides options for specifying the activation functions for the input-hidden (hidden layer) and hidden-output (output layer) neurons of multilayer perceptron networks. You can always specify activation functions for the hidden units; however, you can only specify activation functions for the output units when the sum of squares error function is used. When the cross entropy error function is used, Softmax is always used as the activation function for the output units.
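The softmax/cross-entropy pairing described above can be sketched numerically. This is an illustrative computation with made-up raw output values, not SANN's internal implementation:

```python
import numpy as np

def softmax(z):
    """Softmax output activation: converts raw outputs to class probabilities."""
    e = np.exp(z - z.max())          # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(p, target_index):
    """Cross entropy error for a single case with a one-hot target."""
    return -np.log(p[target_index])

raw = np.array([2.0, 1.0, 0.1])      # hypothetical raw network outputs, 3 classes
probs = softmax(raw)
print(probs.sum())                   # 1.0: a proper probability distribution
print(cross_entropy(probs, 0))       # error is small when the true class gets high probability
```

Because the softmax outputs are non-negative and sum to 1, they can be read directly as class membership probabilities, which is what supports the probabilistic interpretation mentioned above.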

For our example, we will use the defaults as shown above. Next, select the MLP tab.

On this tab, you can specify the training algorithm [Gradient descent, BFGS (Broyden-Fletcher-Goldfarb-Shanno), or Conjugate gradient] to use in training the multilayer perceptron (MLP) network, the Network randomization technique (you can initialize the network with values drawn from a Normal distribution or values from a Uniform distribution), and any Stopping conditions that you want to use.

Again, we will use the defaults as shown above. Next, select the Weight Decay tab.

You can use the options on this tab to specify the use of Weight decay regularization for the hidden layer, the output layer, or both. This option encourages the development of smaller weights, which tends to reduce the problem of over-fitting, thereby potentially improving generalization performance of the network.

Weight decay works by modifying the network's error function to penalize large weights; the result is an error function that compromises between performance and weight size. Consequently, too large a weight decay term may damage network performance unacceptably, and experimentation is generally needed to determine an appropriate weight decay factor for a particular problem domain. For radial basis function (RBF) networks, you can only specify the use of weight decay for the hidden-output layer (output weight decay). For this example, we will not use weight decay.
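The modified error function described above can be written down directly. The decay coefficient and weight values below are made-up illustrations of the trade-off, not SANN defaults:

```python
import numpy as np

def decayed_error(base_error, weights, decay=0.01):
    """Weight-decay-regularized error: base error plus a penalty on weight size."""
    penalty = decay * np.sum(weights ** 2)   # large weights are penalized quadratically
    return base_error + penalty

w_small = np.array([0.1, -0.2, 0.05])
w_large = np.array([3.0, -4.0, 2.5])
print(decayed_error(0.5, w_small))   # penalty is negligible for small weights
print(decayed_error(0.5, w_large))   # large weights raise the effective error
```

Minimizing this combined quantity pushes training toward smaller weights, which is the mechanism behind the improved generalization; setting the decay coefficient too high shifts the compromise too far away from fitting the data.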

Reviewing the results. Now that we have reviewed the various options in the SANN Custom Neural Network tool, click the Train button to train the neural networks. After training, the SANN - Results dialog box is displayed.

This dialog box contains six tabs and provides options for generating predictions and other results (e.g., confidences and accuracy), for performing graphical analysis of the data, for reviewing network statistics, for creating lift charts, for generating time series predictions, and for making custom predictions.

Perhaps the first result you should review is the list of retained networks, as well as their performance and specifications, which is displayed in the Active neural networks data grid at the top of the Results dialog box. This information (also available in the Summary spreadsheet) enables you to quickly assess the overall quality of your networks as well as their specific architectures and types, such as number of inputs, number of hidden units, activation types, etc.

For example, the network illustrated above in the Active neural networks grid and below in the Summary spreadsheet is of type MLP with 4 inputs, 4 neurons in the hidden layer, and 3 outputs (corresponding to the 3 categories of the target variable FLOWER), with train and test classification rates of 96.5% and 97.10145%, respectively. The BFGS algorithm was used to train the network, and the best solution was found at training cycle number 12. The network has a Tanh activation function for the hidden units and a Softmax activation function for the outputs (and hence uses the Entropy error function).

Just because the classification rate of a network is high, it does not mean the solution found by the network is a good one. For example, a network trained on a data set with an unbalanced target variable consisting of, say, 90% category A and 10% category B is likely to correctly predict most of the patterns belonging to A while failing to correctly predict cases belonging to B. Such a network can yield a high classification rate even though its output is almost flat.
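The pitfall above is easy to demonstrate with made-up labels. The sketch below builds the 90/10 scenario and scores a degenerate "flat output" classifier that always predicts the majority class:

```python
# Why a high classification rate can mislead: with a 90/10 class split,
# a classifier that always predicts the majority class scores 90% accuracy
# while never identifying a single minority case.
targets = ["A"] * 90 + ["B"] * 10
predictions = ["A"] * 100            # degenerate network: flat output, always "A"

accuracy = sum(p == t for p, t in zip(predictions, targets)) / len(targets)
recall_B = sum(p == t == "B" for p, t in zip(predictions, targets)) / 10
print(accuracy)   # 0.9  - looks good
print(recall_B)   # 0.0  - useless for class B
```

The overall rate of 90% hides the fact that class B is never detected, which is exactly why per-category results must be examined.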

Given the above, you need to examine the performance of your networks in more depth rather than just considering the overall performance rates. One way to do this is to examine the confusion matrix and the percentage of correctly classified cases per category. The confusion matrix and classification summary are useful tools in evaluating the effectiveness of a classification network. To generate these two spreadsheets, click the Confusion matrix button on the Details tab of the Results dialog box.

The confusion matrix shows both the number of correctly classified cases and the number of incorrectly classified cases per category. For example, from the above spreadsheet we note that the network has correctly classified all 27 cases of category Setosa in the train set. Also, 25 cases out of a total of 26 belonging to Versicol are correctly classified, with the 1 remaining case misclassified as Setosa. Similarly, the Virginic class has 26 cases out of a total of 27 correctly classified, with 1 misclassified case.

The above information is further summarized (see illustration below) in the classification summary spreadsheet, where we can see the percentage of correctly classified cases for Setosa, Versicol, and Virginic. Note that the network is consistent in its classification rate, i.e., it classifies all categories with similar classification rates.
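The relationship between the two spreadsheets can be sketched numerically using the train-set counts reported earlier (27/27 Setosa, 25/26 Versicol, 26/27 Virginic). The placement of Virginic's single off-diagonal error is an assumption made for illustration, since the spreadsheet excerpt does not state it:

```python
import numpy as np

# Reconstructed train-set confusion matrix:
# rows = true class, columns = predicted class (Setosa, Versicol, Virginic).
# The Virginic error is assumed to fall in the Versicol column for illustration.
confusion = np.array([[27,  0,  0],
                      [ 1, 25,  0],
                      [ 0,  1, 26]])

# The classification summary is the diagonal divided by the row totals.
per_class_rate = confusion.diagonal() / confusion.sum(axis=1)
print(per_class_rate)   # fraction correctly classified per category
```

The per-category rates (100%, about 96.2%, and about 96.3%) are close to one another, which is what "consistent in its classification rate" means here.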

Note that the above analysis was carried out on the classification rate for the train data. You should also carry out the same analysis for the test (and validation) samples if selected. You can do this by selecting the appropriate check box in the Sample group box. Select all sample options to include train, test, and validation sample in the analysis.

Equally important for a classification analysis is the study of lift charts and ROC (Receiver Operating Characteristic) curves. For more details, see the step-by-step example The Iris Problem.

Finally, after completing your analysis and approving the network you have trained, the last step is to save the network so you can use it later. This process, i.e., using a network for predicting future data, is known as deployment. With STATISTICA, you can deploy SANN PMML models using either the Rapid Deployment module or SANN itself, but first you must save your networks.