Filter Duplicate Cases

Ribbon bar. Select the Data tab. In the Transformations group, click Filter/Recode and on the menu, select Filter Duplicate Cases to display the Filter Duplicate Cases dialog box.

Classic menus. On the Data - Data Filtering/Recoding submenu, select Filter Duplicate Cases to display the Filter Duplicate Cases dialog box.

Use the options in this dialog box to select the variables that specify the basis of distinction (i.e., the variables in the spreadsheet that are used to differentiate two cases) and to determine the format of the resulting (de-duped) spreadsheet. Note that any combination of variables can be used to define a duplicate.

Input. Use the options in the Input group box to specify information for the input spreadsheet.

Variables. Click the Variables button to display a standard variable selection dialog, which you can use to select the variables to use as the basis of distinction. Cases in the spreadsheet will be evaluated, and if identical responses are given for the variables selected here, the cases will be treated as duplicates.
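Note that the matching rule described here can be illustrated outside Statistica. The following is a minimal sketch in Python using the pandas library as a stand-in; it is not Statistica's API. The columns Age, Gender, and Score are hypothetical, and treating the first occurrence as the case to keep is an assumption of the sketch (this topic does not state which case is retained).

```python
import pandas as pd

# Hypothetical data; Age and Gender play the role of the selected variables.
df = pd.DataFrame({
    "Age": [34, 34, 51, 34],
    "Gender": ["F", "F", "M", "M"],
    "Score": [7.1, 8.3, 6.0, 5.5],
})

# Basis of distinction: only these columns are compared.
basis = ["Age", "Gender"]

# A case is flagged when an earlier case has identical values on every
# basis variable (keep="first" is an assumption of this sketch).
is_duplicate = df.duplicated(subset=basis, keep="first")
```

In this sketch the second case is flagged even though its Score differs, because Score is not part of the basis of distinction.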

Cases. Click the Cases button to display the Spreadsheet Case Selection Conditions dialog box, which contains options to select only specified observations or cases for the de-duping operations.

Use casenames. Select the Use casenames check box if you want to use case names as one of the comparison criteria. When this check box is selected, Statistica will treat as duplicates any cases that have the same case name (provided the cases also match on any other specified variables). When the check box is cleared, duplicate case names will be ignored. Note that to compare solely on case names, click the Variables button, clear the Select variables to be included in the subset edit field, click OK, and then select the Use casenames check box.
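Note that the effect of this check box can be pictured as adding the case name to the list of compared values. The following is a minimal sketch in Python with pandas, not Statistica's API; the function find_duplicates, its parameters, and the sample data are hypothetical, and case names are modeled as the DataFrame index.

```python
import pandas as pd

# Hypothetical data; the index stands in for Statistica case names.
df = pd.DataFrame(
    {"Age": [34, 34, 34], "Gender": ["F", "F", "F"]},
    index=["case_a", "case_a", "case_b"],
)

def find_duplicates(frame, basis, use_casenames):
    """Flag cases that repeat an earlier case on the basis variables,
    optionally also requiring the case names to match."""
    if not basis and not use_casenames:
        raise ValueError("select at least one variable or use case names")
    keys = frame[basis].copy()
    if use_casenames:
        # Treat the case name as one more comparison criterion.
        keys["_casename"] = frame.index.to_numpy()
    return keys.duplicated(keep="first").to_numpy()

with_names = find_duplicates(df, ["Age", "Gender"], use_casenames=True)
without_names = find_duplicates(df, ["Age", "Gender"], use_casenames=False)
names_only = find_duplicates(df, [], use_casenames=True)
```

With case names included, the third case is not a duplicate because its name differs; with the option off, it is flagged because it matches on Age and Gender.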

Data is sorted. Select the Data is sorted check box if the input spreadsheet is already sorted on the variable(s) you have selected as the comparison criteria. When this check box is selected, Statistica expects the input spreadsheet to be sorted; if it is not, an error message is displayed prompting you to clear the check box or sort the data. When the check box is cleared (i.e., the input data are not declared as sorted), Statistica performs an internal sort of the data before beginning the de-duping process. Depending on the size of your data, this sort can lengthen the de-duping process. Note that the actual input data set is not affected by this internal sort; however, any output spreadsheets that are created will be in sorted order unless the Preserve order check box is selected in the Output group box (see below).
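Note that the behavior of this check box can be sketched as a guard followed by an optional sort. The following Python example uses pandas as a stand-in and is not Statistica's implementation; the functions is_sorted_on and dedupe, the error text, and the sample data are hypothetical, and the sketch assumes the basis variables contain no missing values.

```python
import pandas as pd

def is_sorted_on(frame, basis):
    """Return True if the rows are in ascending order on the basis columns."""
    keys = list(frame[basis].itertuples(index=False, name=None))
    return all(a <= b for a, b in zip(keys, keys[1:]))

def dedupe(frame, basis, data_is_sorted):
    """Remove duplicate cases, honoring a 'data is sorted' style option."""
    if data_is_sorted:
        # The caller claims the data are sorted; verify instead of trusting.
        if not is_sorted_on(frame, basis):
            raise ValueError("data are not sorted: clear the option or sort the data")
        work = frame
    else:
        # Internal sort on a copy; the input frame itself is left untouched.
        work = frame.sort_values(basis)
    return work.drop_duplicates(subset=basis, keep="first")

# Hypothetical unsorted input.
df = pd.DataFrame({"Age": [51, 34, 34], "Gender": ["M", "F", "F"]})
result = dedupe(df, ["Age", "Gender"], data_is_sorted=False)
```

Here the result comes back in sorted order while df keeps its original order, which mirrors the description above of the internal sort.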

Output. Use the options in this group box to specify what types of new spreadsheets (if any) will be created during the de-duping (duplicate case removal) process. Note that when new spreadsheets are created, they will contain all variable properties of the parent spreadsheet, e.g., variable header formats, display formats, measurement types, etc. For more information on variable properties, see the variable specification dialog.

Variables. Click the Variables button to display a standard variable selection dialog, which you can use to select the variables in the input spreadsheet that will be included in the output (filtered) spreadsheet.

Create new spreadsheet. When this check box is selected, Statistica creates a new spreadsheet that contains only the unique cases. If this check box is cleared, then duplicate cases will be removed from the input spreadsheet.

Create duplicates spreadsheet. When this check box is selected, Statistica generates a new spreadsheet that contains the duplicate cases.

Preserve order. Select the Preserve order check box to maintain the case order from the input spreadsheet in the newly created spreadsheet. If this check box is cleared, then the new spreadsheets will be sorted by the variables that were selected as the basis of distinction (e.g., if Age and Gender were selected in the Input group box, the resulting spreadsheets will be sorted by Age and Gender).
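Note that the relationship between the unique-cases output, the duplicates output, and the Preserve order option can be sketched as follows. This Python example uses pandas as a stand-in and is not Statistica's API; the function split_cases and the columns Age, Gender, and Id are hypothetical. The sketch assumes that the first occurrence of each case counts as unique and that the duplicates output holds only the later repeats, which this topic does not state explicitly.

```python
import pandas as pd

def split_cases(frame, basis, preserve_order):
    """Split a frame into unique cases and duplicate cases on the basis columns."""
    dup_mask = frame.duplicated(subset=basis, keep="first")
    unique_cases = frame[~dup_mask]
    duplicate_cases = frame[dup_mask]
    if not preserve_order:
        # Without order preservation, both outputs are sorted by the basis variables.
        unique_cases = unique_cases.sort_values(basis)
        duplicate_cases = duplicate_cases.sort_values(basis)
    return unique_cases, duplicate_cases

# Hypothetical input; Id only makes the original case order visible.
df = pd.DataFrame({
    "Age": [51, 34, 51, 34],
    "Gender": ["M", "F", "M", "F"],
    "Id": [1, 2, 3, 4],
})

kept, dups = split_cases(df, ["Age", "Gender"], preserve_order=True)
kept_sorted, dups_sorted = split_cases(df, ["Age", "Gender"], preserve_order=False)
```

With preserve_order=True the outputs follow the input order; with preserve_order=False they are ordered by Age and then Gender, as in the example given for the Preserve order check box.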

Copy formatting. Select the Copy formatting check box to use the spreadsheet formats (e.g., cell formatting, variable header formatting, or even Spreadsheet Layouts) of the input spreadsheet in the output spreadsheet. When this check box is cleared, formatting in the input spreadsheet will not be copied to the new spreadsheet.

OK. Click OK to accept the options specified here and remove duplicate cases from the current spreadsheet.

Cancel. Click Cancel to close this dialog box without removing duplicate cases from the current spreadsheet.