Cluster Analysis: Joining (Tree Clustering) - Advanced Tab

Joining (Tree Clustering)

Select the Advanced tab of the Cluster Analysis: Joining (Tree Clustering) dialog box to access the options described here.

Variables. Click the Variables button to display the standard variable selection dialog box. Note that STATISTICA interprets the selected variables as dimensions if Cases (rows) is selected in the Cluster box (see below); if Variables (columns) is selected in the Cluster box, the selected variables are interpreted as the objects to be clustered.

Input file. The Input file box contains two options: Raw data and Distance matrix.

Raw data. If you select Raw data, STATISTICA expects a standard raw data file as input.

Distance matrix. If you select Distance matrix, the input matrix may be either a correlation matrix or a distance (dissimilarity) matrix with numbers indicating the distances or dissimilarities between objects. STATISTICA will automatically determine the contents of the matrix, i.e., whether it contains correlations or dissimilarities (see Matrix file format). If the input matrix is a correlation matrix (which indicates the similarity or closeness between objects), it is converted to distances before the analysis begins; specifically, each correlation r is transformed to 1 - Pearson r.
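The 1 - Pearson r transformation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not STATISTICA code; the function name correlations_to_distances and the example matrix are hypothetical.

```python
def correlations_to_distances(corr):
    """Convert a square correlation matrix to dissimilarities using 1 - r."""
    return [[1.0 - r for r in row] for row in corr]

# Hypothetical 3 x 3 correlation matrix (illustrative values only).
corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]

dist = correlations_to_distances(corr)
for row in dist:
    print([round(d, 2) for d in row])
```

Under this transformation a perfect positive correlation gives a distance of 0, no correlation gives 1, and a perfect negative correlation gives 2, so strongly negatively correlated objects are treated as far apart.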

Note that if your Input file consists of correlation coefficients only (e.g., from a published source), and no means, standard deviations, or N are available, you may simply assume standardized data (mean = 0, standard deviation = 1) and an N of, for example, 100 (N must be greater than the number of variables in the analysis). You will first need to add the four required cases (means, standard deviations, number of cases, and matrix) to your spreadsheet before you can run the analysis. In that case, the descriptive statistics reported for each variable are not meaningful; however, the cluster analysis can still be performed based on the correlation coefficients alone.

Cluster. The Cluster box contains two options: Variables (columns) and Cases (rows). The option you select determines how STATISTICA interprets the selected Variables. Note that the Cluster box is only available if Raw data is selected as the Input file.

Variables (columns). If Variables (columns) is selected, STATISTICA interprets the selected Variables (see above) as objects.

Cases (rows). If Cases (rows) is selected, STATISTICA interprets the selected Variables as dimensions.
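The difference between the two Cluster options amounts to whether the rows or the columns of the raw data supply the objects. The following Python sketch illustrates this with a hypothetical data set; the function name objects_to_cluster and its cluster argument are assumptions made for the example and do not correspond to any STATISTICA function.

```python
# Hypothetical raw data: 3 cases (rows) measured on 2 variables (columns).
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 35.0],
]

def objects_to_cluster(rows, cluster="cases"):
    """Return one coordinate list per object to be clustered.

    cluster="cases":     each row is an object; variables are dimensions.
    cluster="variables": each column is an object; cases are dimensions.
    """
    if cluster == "cases":
        return [list(r) for r in rows]
    if cluster == "variables":
        # Transpose so that each column becomes one object.
        return [list(col) for col in zip(*rows)]
    raise ValueError("cluster must be 'cases' or 'variables'")

print(objects_to_cluster(data, "cases"))      # 3 objects in 2 dimensions
print(objects_to_cluster(data, "variables"))  # 2 objects in 3 dimensions
```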

Amalgamation (linkage) rule. There are seven different amalgamation rules available in the Amalgamation (linkage) rule box: Single Linkage, Complete Linkage, Unweighted pair-group average, Weighted pair-group average, Unweighted pair-group centroid, Weighted pair-group centroid (median), and Ward's method. The default rule is Single Linkage (also called the "method of the nearest neighbors").

One of the main parameters that guides the joining (tree-clustering) process is the linkage rule, that is, the rule that determines when two clusters are to be joined (linked or amalgamated). For a detailed description of amalgamation rules, see Joining (Tree Clustering) Introductory Overview - Amalgamation or Linkage Rules.
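To illustrate how the linkage rule changes the result, the following Python sketch implements a naive joining procedure for two of the rules, Single Linkage (distance between the two closest members) and Complete Linkage (distance between the two farthest members). It is a minimal illustration under simple assumptions, not the algorithm STATISTICA uses internally; the function join_clusters and the example distance matrix are hypothetical.

```python
def join_clusters(dist, linkage="single"):
    """Naive agglomerative joining on a symmetric distance matrix.

    Returns the merge history as (members_a, members_b, linkage_distance).
    """
    # Single linkage uses the nearest pair of members, complete the farthest.
    pick = {"single": min, "complete": max}[linkage]
    clusters = [(i,) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Hypothetical distances between four objects.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]

single = join_clusters(dist, "single")
complete = join_clusters(dist, "complete")
print([d for _, _, d in single])
print([d for _, _, d in complete])
```

With the same distances, the two rules join the objects in a different order and at different linkage distances, which is why the choice of rule can noticeably change the resulting tree.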

Distance measure. There are seven different distance measures that can be computed from Raw data: Squared Euclidean distances, Euclidean distances, City-block (Manhattan) distances, Chebychev distance metric, Power: SUM(ABS(x-y)**p)**1/r, Percent disagreement, and 1-Pearson r.

The joining algorithm starts by computing a matrix of distances between the objects that are to be clustered. For a detailed description of these distances, refer to Joining (Tree Clustering) Introductory Overview - Distance Measures.
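As a rough illustration of this first step, the Python sketch below computes a distance matrix for a few hypothetical objects using textbook definitions of several of the listed measures. The function names are assumptions made for the example, and Percent disagreement is taken here as the fraction of dimensions on which two objects differ; consult the Introductory Overview for the exact definitions used by STATISTICA.

```python
import math

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(squared_euclidean(x, y))

def city_block(x, y):
    # Sum of absolute differences across dimensions.
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    # Largest absolute difference on any single dimension.
    return max(abs(a - b) for a, b in zip(x, y))

def percent_disagreement(x, y):
    # Fraction of dimensions on which the two objects differ.
    return sum(a != b for a, b in zip(x, y)) / len(x)

def distance_matrix(objects, measure):
    """Symmetric matrix of distances between all pairs of objects."""
    return [[measure(p, q) for q in objects] for p in objects]

# Hypothetical objects described on two dimensions.
objects = [[0, 0], [3, 4], [3, 0]]
for row in distance_matrix(objects, euclidean):
    print(row)
```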

If Distance matrix is selected as the Input file, then Dissimilarities from matrix is automatically selected in the Distance measure box. If the input matrix is a correlation matrix, then the correlations (which denote the degree of similarity) will be transformed to dissimilarities (1 - r).

Power distance parameters. If the Power distances option is selected in the Distance measure box, specify the two parameters p and r for the power distance in these boxes.
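The roles of p and r can be seen in a small Python sketch of the power distance, computed as the sum of absolute differences raised to the power p, with the total then raised to the power 1/r. The function name power_distance is hypothetical and the code is only an illustration of the formula.

```python
def power_distance(x, y, p, r):
    """Power distance: (sum of |x_i - y_i| ** p) ** (1 / r)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

x, y = [0, 0], [3, 4]
print(power_distance(x, y, p=2, r=2))  # same as the Euclidean distance
print(power_distance(x, y, p=1, r=1))  # same as the city-block distance
print(power_distance(x, y, p=2, r=1))  # same as the squared Euclidean distance
```

In this formulation, p controls how heavily differences on individual dimensions are weighted, while r controls how the summed difference between objects is scaled.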

Batch processing and reporting. If you select the Batch processing and reporting check box, STATISTICA automatically performs the analysis (after you click the OK button) and sends the entire output from the analysis to a workbook, individual windows, and/or a report (depending on the options selected in the Analysis/Graph Output Manager).