Independent Component Analysis Example

To demonstrate the use of STATISTICA Independent Component Analysis, we will work through a step-by-step example using a simple synthetic data set. The data set contains two mixed signals (variables 3 and 4) formed by mixing two independent, non-Gaussian source signals (variables 1 and 2). We first create an ICA model using the Fast ICA algorithm, then save the model in PMML format, and finally reload the saved file for deployment.

NOTE: In real-life applications, the original signals (variables 1 and 2) are unknown because they cannot be measured directly. They are included in this data set for demonstration purposes only.
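
The structure of such a data set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the waveforms (a sine and a sawtooth), the number of cases, and the mixing weights are hypothetical choices, because the way SinMix.sta was actually constructed is not described here.

```python
import math

# Hypothetical mixing weights; the weights behind SinMix.sta are not documented here.
MIXING = ((0.6, 0.4),
          (0.3, 0.7))

def make_mixed_signals(n_cases=200):
    """Return one row per case: (source 1, source 2, mixed 1, mixed 2)."""
    rows = []
    for i in range(n_cases):
        t = i / n_cases
        s1 = math.sin(2 * math.pi * 5 * t)   # sine wave, 5 cycles (non-Gaussian)
        s2 = 2 * ((3 * t) % 1.0) - 1         # sawtooth, 3 ramps from -1 up to 1
        # Each observed variable is a weighted sum of both sources.
        x1 = MIXING[0][0] * s1 + MIXING[0][1] * s2
        x2 = MIXING[1][0] * s1 + MIXING[1][1] * s2
        rows.append((s1, s2, x1, x2))
    return rows

rows = make_mixed_signals()
```

Columns 1 and 2 of each row play the role of the hidden sources, and columns 3 and 4 play the role of the mixed signals that ICA would receive as input.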

Creating a New ICA Model (New Analysis)

To begin, open the file SinMix.sta via the File - Open Examples menu; it is in the Datasets folder.

Then, select Independent Components Analysis from the Data Mining menu to display the Fast Independent Component Analysis Startup Panel.

Click the Variables button to display a standard variable selection dialog box. Select one or more dependent variables (the mixed signals); for this example, select variables 3 and 4. The more variables you select, the more independent components you can extract, provided that those components actually exist and are embedded in the mixed signals (the selected variables). However, you cannot extract more independent components than the number of selected variables. Thus, in this case, you must set the Number of components option to a value no larger than 2. Other settings you may want to change when necessary are the Alpha parameter of the log-cosh function, which is used to measure the nongaussianity of the signals, and Tolerance (the convergence criterion). The Alpha parameter accepts only a restricted range of values.
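
For orientation, the log-cosh contrast function commonly used in the Fast ICA literature is G(u) = (1/alpha) log cosh(alpha u), with first derivative g(u) = tanh(alpha u). The sketch below evaluates these in Python. It assumes that the Alpha option plays the role of alpha in this standard form, which is not confirmed here, and the function names are illustrative only.

```python
import math

def logcosh_contrast(u, alpha=1.0):
    """G(u) = log(cosh(alpha * u)) / alpha, computed without overflow for large |u|."""
    a = abs(alpha * u)
    # log(cosh(a)) = a + log(1 + exp(-2a)) - log(2)
    return (a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)) / alpha

def logcosh_g(u, alpha=1.0):
    """First derivative of G: g(u) = tanh(alpha * u)."""
    return math.tanh(alpha * u)

def logcosh_g_prime(u, alpha=1.0):
    """Second derivative of G: g'(u) = alpha * (1 - tanh(alpha * u)**2)."""
    t = math.tanh(alpha * u)
    return alpha * (1.0 - t * t)
```

Larger values of alpha make tanh(alpha u) saturate sooner, so large signal values have less influence on the nongaussianity measure.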

Further settings of the ICA model can be accessed on the Options tab. Here you can select the method for implementing the Fast ICA algorithm, either Parallel or Deflation. You can also choose the function used to measure nongaussianity (Log-cosh negative entropy or Exponential negative entropy). We recommend selecting the Normalize variables check box. This applies a form of preprocessing that puts all variables on an equal scale, so that none of them can bias the analysis merely because of its scale or magnitude.
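
The effect of normalization can be illustrated with a minimal sketch. It assumes that normalizing means rescaling each variable to zero mean and unit standard deviation, and it uses the sample standard deviation; the exact formula applied by the Normalize variables option is not stated here.

```python
import statistics

def normalize(column):
    """Rescale one variable to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)   # raises or divides by zero for constant columns
    return [(value - mean) / sd for value in column]

# The same measurements recorded in two different units end up on the same scale.
in_grams = normalize([1000.0, 2000.0, 3000.0])
in_kilograms = normalize([1.0, 2.0, 3.0])
```

After this step, a variable measured in large units no longer dominates one measured in small units.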

After you have completed specifying the options for the ICA model, click the OK button to proceed with building the model. This will initiate the Fast ICA algorithm and then display the Fast Independent Component Analysis Results dialog, where you can conclude the analysis.
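
To give a feel for what happens when the model is built, here is a minimal sketch of a deflation-style Fast ICA for exactly two variables, written with the Python standard library only. It follows the textbook formulation (centering, whitening, then a fixed-point iteration with the log-cosh nonlinearity) and is not STATISTICA's implementation. The test signals, mixing weights, random seed, and all function names are hypothetical.

```python
import math
import random

def whiten(x1, x2):
    """Center two variables, then rotate and scale them to unit, uncorrelated variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a = sum(v * v for v in c1) / n                  # var(x1)
    c = sum(v * v for v in c2) / n                  # var(x2)
    b = sum(p * q for p, q in zip(c1, c2)) / n      # cov(x1, x2)
    theta = 0.5 * math.atan2(2 * b, a - c)          # principal-axis angle of the covariance
    cs, sn = math.cos(theta), math.sin(theta)
    l1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn    # variance along the first axis
    l2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs    # variance along the second axis
    # Assumes l2 > 0, i.e. the two variables are not perfectly collinear.
    return [((cs * p + sn * q) / math.sqrt(l1), (-sn * p + cs * q) / math.sqrt(l2))
            for p, q in zip(c1, c2)]

def fastica_deflation(z, n_components=2, alpha=1.0, tol=1e-8, max_iter=200, seed=0):
    """Extract components one at a time from whitened two-variable data z."""
    if not 1 <= n_components <= 2:
        raise ValueError("two variables allow at most two components")
    rng = random.Random(seed)
    n = len(z)
    vectors = []
    for k in range(n_components):
        w = [rng.gauss(0, 1), rng.gauss(0, 1)]      # random starting direction
        norm = math.hypot(*w)
        w = [w[0] / norm, w[1] / norm]
        for _ in range(max_iter):
            zg = [0.0, 0.0]     # running sum for E[z * g(w.z)]
            gp = 0.0            # running sum for E[g'(w.z)]
            for p, q in z:
                t = math.tanh(alpha * (w[0] * p + w[1] * q))
                zg[0] += p * t
                zg[1] += q * t
                gp += alpha * (1 - t * t)
            new = [zg[0] / n - gp / n * w[0], zg[1] / n - gp / n * w[1]]
            for v in vectors:   # deflation: remove directions already found
                dot = new[0] * v[0] + new[1] * v[1]
                new = [new[0] - dot * v[0], new[1] - dot * v[1]]
            norm = math.hypot(*new)
            new = [new[0] / norm, new[1] / norm]
            converged = 1 - abs(new[0] * w[0] + new[1] * w[1]) < tol
            w = new
            if converged:
                break
        vectors.append(w)
    components = [[v[0] * p + v[1] * q for p, q in z] for v in vectors]
    return vectors, components

# Hypothetical test signals: a sine and a sawtooth, mixed with made-up weights.
n = 200
s1 = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
s2 = [2 * ((3 * i / n) % 1.0) - 1 for i in range(n)]
x1 = [0.6 * p + 0.4 * q for p, q in zip(s1, s2)]
x2 = [0.3 * p + 0.7 * q for p, q in zip(s1, s2)]
vectors, components = fastica_deflation(whiten(x1, x2))
```

The estimated components are determined only up to sign, scale, and order, so they should be compared with the original signals by shape rather than by exact value. The Parallel method differs in that it updates all components together and orthogonalizes them jointly instead of one after another.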

The Summary box at the top of the Results dialog displays the settings and specifications of the ICA model you just created. You can print the displayed information to a spreadsheet by clicking the Summary button.

One of the most important output spreadsheets provided by the ICA model is the estimate of the independent components. Click the Components button to create this spreadsheet, which displays the estimated value of each independent component for every valid case in the data set.

Alternatively, you may want to view these components in various graph formats. To this end, select (highlight) Component 1 and Component 2 displayed in the Summaries for components list. Click the Line plots button to display the graph of each component against case number in one graph window.

To further analyze the independent components, you may want to print their descriptive statistics (mean, variance, minimum, and maximum) by clicking the Descriptives button.
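
The same four statistics are easy to reproduce for any exported component column. The following sketch assumes the sample variance (dividing by n - 1); which variance definition the Descriptives spreadsheet uses is not stated here, and the function name is illustrative.

```python
import statistics

def describe(values):
    """Return the mean, sample variance, minimum, and maximum of one component."""
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),   # sample variance (n - 1)
        "minimum": min(values),
        "maximum": max(values),
    }

summary = describe([1.0, 2.0, 3.0, 4.0])
```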

Advanced users may also produce further results using the options on the Advanced tab. These options and their definitions are further discussed in the STATISTICA Electronic Help and the Technical Notes topic.

The ICA model displayed in the Results dialog is not permanent: clicking the Cancel button discards the model you have created. To create a permanent record of your results, use the Code generator option to save the current model in the language of your choice. The available languages are C/C++, Visual Basic, and PMML. The C/C++ format is particularly suitable for deploying models outside the STATISTICA environment. If you want to deploy your ICA models via the STATISTICA Independent Component Analysis module, you must use PMML. To do so, select PMML script from the Code generator drop-down menu to display a standard STATISTICA Report containing the deployment code in PMML. Copy the entire content of this report, paste it into a text editor such as Notepad, and save it under a name and in a folder of your choice for later use.

ICA for Deployment

You can use your saved ICA models for deployment whenever needed. To do this, you first need to have a data set loaded in the STATISTICA environment. This data set need not be the one you used to create the model (the training data); in fact, it is often a different data set.

Select the Deployment of existing models check box on the Deployment tab of the Startup Panel. Now, you have the choice of selecting your analysis variables manually by clicking the Variables button, or via the Variable selection via PMML option.

Click the Load models button to display a file selection dialog. Locate and select a PMML file you have saved. Then, run the deployed model. As before, clicking the Components button will create a spreadsheet displaying the independent components as estimated by the deployed ICA model for the new (deployment) data set. The Line plots, Scatterplots, and Histogram options will display the same information in the respective graph formats.
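
Conceptually, deployment means applying a stored transformation to new cases without re-estimating anything. The sketch below illustrates that idea in Python, using a small JSON file as a stand-in for the saved model. It does not read or write PMML, and the file layout, field names, and numeric values are all hypothetical.

```python
import json
import os
import tempfile

def save_model(path, means, unmixing):
    """Store the variable means and the unmixing weights (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"means": means, "unmixing": unmixing}, handle)

def load_model(path):
    """Read a model previously written by save_model."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def deploy(model, rows):
    """Estimate components for new cases using the stored parameters only."""
    results = []
    for row in rows:
        centered = [value - mean for value, mean in zip(row, model["means"])]
        results.append([sum(w * c for w, c in zip(weights, centered))
                        for weights in model["unmixing"]])
    return results

# Save a made-up model, reload it, and apply it to made-up deployment data.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ica_model.json")
    save_model(path, means=[1.0, 2.0], unmixing=[[1.0, 0.0], [-0.5, 2.0]])
    model = load_model(path)

new_data = [(2.0, 4.0), (1.0, 2.0)]
scores = deploy(model, new_data)
```

Because only the stored parameters are used, the deployment data must contain the same variables, in the same order, as the data used to build the model.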