Data - Unstacking/Stacking

Ribbon bar. Select the Data tab. In the Transformations group, click Stack to display the Unstacking/Stacking dialog box...

Classic menus. From the Data menu, select Unstacking/Stacking to display the Unstacking/Stacking dialog box...

...which contains options to perform "stacking" or "unstacking" of data files to make them suitable for statistical analysis. The general operation or restructuring of the data is similar to a cross-tabulation of values (given certain categorical variables), and this type of operation is also referred to as a cross-tabulation (restructuring) of the data (even though this specific operation is not limited to frequency data, nor is that application the one most commonly encountered).

These operations are typically applied when accessing data from databases, where the individual data values are identified by case and variable IDs, and each unique combination of these IDs are contained in a new row of the data file.

Shown above is a simple example of unstacked and stacked data sets, containing identical information. Note how the information in the unstacked file (to the left) is organized so that each unique row ID identifies a case or observation, each with two values recorded in variables V1 and V2. In the stacked representation of this file (to the right), each row in the spreadsheet represents a measurement for each unique combination of values or codes in the RowID and ColumnID variables.

Now consider the following, more general, example data file:

The Unstacking operation will create a new data spreadsheet so that each case, row, or observation in the new (unstacked) data file will contain a unique combination of values or codes found in the Unit ID variables, and the variables or columns in the new data file will contain variables created from the unique combination of values for codes found in the Measure ID variables; the Measurements (1 through k) can either be aggregated (e.g., represented by the mean value over the measurements), or used to create separate variables (one for each set of unique combinations of values or codes found in the Measure ID variables).

So, for example, the file shown above, after being unstacked could look like this:

If the multiple Measurements in this example were aggregated, for example by computing their means, the resulting unstacked file would look like this (of course, without the explicit equations; i.e., only the means over the 2 measurements in each cell would be shown in this case):

In general, several methods are available to aggregate measures, or to deal with multiple references to unique combinations of Measure IDs or Unit IDs.

An alternative method for Unstacking can be used if your data set has no RowID variable(s). If no RowID variables are specified, measure values are written to the output spreadsheet in the order that ColumnID grouping values are found. For example, if we perform the Unstack operation on the spreadsheet on the left (below), selecting Group as the ColumnID and Value as the measurement variable, and selecting the option to include unselected variables (Age), then the result is the spreadsheet on the right:

Note that the values of the Value variable are written to the output in the order that the ColumnID values are encountered; i.e., the measure value from the first occurrence of Males goes to case 1, the second to case 2, etc. Then the measure value from the first occurrence of Females gets written to case 1, and so on. The value of the Age variable that gets copied over is the one encountered for the first group seen; subsequent values are ignored. That is, the value "20-30" in case 2 of the output came from case 2 in the input, not case 6. Note also that in this example the file is sorted by Group but that is not necessary; any order is possible here and the data file need not be sorted, as it must when using a RowID variable.

Stacking the data file. The example shown above illustrates the Unstacking operation. The Stacking operation accomplishes essentially the reverse restructuring of the data file. This operation can be useful when, for example, stacking the results of a predictive data mining operation into a format suitable for being written back to a database.

Operating on large data files. Note that the stacking and unstacking operations can be performed much faster if the respective input file is already sorted by the data columns that will be used for the case (row, unit) IDs and variable (column, measurement) IDs.

See also the Unstacking/Stacking dialog box topic for additional details.