k-Means Clustering Results - Quick Tab

Select the Quick tab of the k-Means Clustering Results dialog box to access the options described here.

Summary: Cluster means & Euclidean distances. Click the Summary: Cluster means & Euclidean distances button to create two spreadsheets:

A spreadsheet with the means for each cluster for each dimension;

A spreadsheet with the Euclidean distances (below the diagonal) and squared Euclidean distances (above the diagonal) between "cluster centers."

Specifically, this matrix shows the Euclidean distances between clusters, computed from the respective cluster means on the dimensions used for the classification. The distance between two objects or cluster centers i and j is computed as:

$$D_{i,j} = \sqrt{\sum \left[ (x_i - x_j)^2 / \mathrm{ND} \right]}$$

where the summation is over the ND dimensions in the current analysis.
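
As an illustration only, the two summary spreadsheets can be sketched in Python. This is a minimal sketch, not the program's own implementation: the function names (cluster_means, scaled_distance, distance_matrix) and the sample data are hypothetical, and the sketch assumes that every cluster contains at least one object.

```python
import math

def cluster_means(points, labels, k):
    """Mean of each dimension within each of the k clusters."""
    nd = len(points[0])
    sums = [[0.0] * nd for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(nd):
            sums[c][d] += p[d]
    # Assumes no cluster is empty (counts[c] > 0).
    return [[s / counts[c] for s in sums[c]] for c in range(k)]

def scaled_distance(a, b):
    """sqrt of the sum over dimensions of (a - b)^2 / ND, as in the formula above."""
    nd = len(a)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / nd)

def distance_matrix(means):
    """Distances below the diagonal, squared distances above it."""
    k = len(means)
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i):
            d = scaled_distance(means[i], means[j])
            m[i][j] = d        # below the diagonal: distance
            m[j][i] = d * d    # above the diagonal: squared distance
    return m

# Hypothetical data: four objects, two dimensions, two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (4.0, 6.0), (6.0, 6.0)]
labels = [0, 0, 1, 1]
means = cluster_means(points, labels, 2)
matrix = distance_matrix(means)
```

With this sample data the cluster means are (1, 0) and (5, 6), so the squared distance is (16 + 36) / 2 = 26 and the distance is its square root.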

Analysis of variance. The goal of the k-means clustering procedure is to classify objects (cases or variables, depending on the selection made in the Cluster box in the Cluster Analysis: K-Means Clustering dialog box) into a user-specified number of clusters. To evaluate the appropriateness of the classification, you can compare the within-cluster variability (small if the classification is good) to the between-cluster variability (large if the classification is good). In other words, you can perform a standard between-groups analysis of variance for each dimension (case or variable).

Click the Analysis of variance button to create a standard spreadsheet with those ANOVAs. Note that although the F ratios and p values are given in the table, statistical significance should be interpreted with caution because it does not have the same meaning as in an actual ANOVA of experimental data (see Cluster Analysis Overviews). In short, these are not a priori tests: the clusters are formed so as to produce the most statistically significant ANOVAs possible, so the tests capitalize on chance (see Hartigan, 1975, for a more detailed discussion of this point).
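
The comparison of between-cluster and within-cluster variability for a single dimension can be sketched as follows. This is a rough illustration under stated assumptions, not the program's own routine: the function name anova_for_dimension and the sample values are hypothetical, every cluster is assumed to be non-empty, and the within-cluster sum of squares is assumed to be nonzero. The p value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_for_dimension(values, labels, k):
    """One-way between-groups ANOVA for one dimension, grouped by cluster."""
    n = len(values)
    grand_mean = sum(values) / n
    groups = [[v for v, c in zip(values, labels) if c == g] for g in range(k)]
    group_means = [sum(g) / len(g) for g in groups]  # assumes no empty cluster

    # Between-cluster variability: large if the classification is good.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-cluster variability: small if the classification is good.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = k - 1
    df_within = n - k
    # Assumes ss_within > 0; otherwise the F ratio is undefined.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "df_between": df_between,
            "ss_within": ss_within, "df_within": df_within, "F": f_ratio}

# Hypothetical data: one dimension measured on four objects in two clusters.
values = [1.0, 3.0, 6.0, 10.0]
labels = [0, 0, 1, 1]
result = anova_for_dimension(values, labels, 2)
```

Running the function once per dimension gives one table row per dimension. A large F here mostly reflects that the clusters were chosen to separate the data, which is why the caution above applies.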

Graph of means. Click the Graph of means button to create a line graph of the means across clusters. This plot is very useful for visually summarizing the differences in means between clusters.