Filter Sparse Data

Ribbon bar. Select the Data tab. In the Transformations group, click Filter/Recode and on the menu, select Filter Sparse Data to display the Filter Sparse Data dialog box.

Classic menus. On the Data - Data Filtering/Recoding submenu, select Filter Sparse Data to display the Filter Sparse Data dialog box.

Use these options to specify the criteria for determining sparse cases and/or variables.

Input. Use the options in the Input group box to specify the criteria for determining sparse data. Statistica uses the information you give here to determine which variables and/or cases should be removed from the data.

Variables. Click the Variables button to display a variable selection dialog box used to select the variables to filter for sparse data. The selected variables will be evaluated, and any variable or case whose percentage of missing data exceeds the specified maximum will be removed.

Cases. Click the Cases button to display the Spreadsheet Case Selection Conditions dialog box, which contains options to select only specified observations or cases for the sparse data filtering operation.

Max percent MD in Variables. In this box, specify the maximum percentage of missing data that is acceptable for a given variable. When a variable's percentage of missing data exceeds the value specified here, the variable will be removed from the filtered spreadsheet. For example, if you enter 10, then any variable (column) that is missing more than 10% of its observations (i.e., has empty cells for more than 10% of its cases) will be removed from the spreadsheet. Consider a data set with 10 variables (V1-V10) and 100 cases. If, for variable V4, cases 73-89 are missing, then the missing data percentage for V4 is 17% (17 of 100 cases).
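The V4 example above can be reproduced with a short Python sketch. This is an illustration only, not Statistica code: the function names percent_missing and sparse_variables are hypothetical, missing cells are represented as None, and the data are made up to match the example.

```python
def percent_missing(values):
    """Return the percentage of entries in `values` that are missing (None)."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def sparse_variables(columns, max_percent_md):
    """Return the names of columns whose missing-data percentage exceeds the limit."""
    return [name for name, values in columns.items()
            if percent_missing(values) > max_percent_md]


# Hypothetical data: 100 cases; V4 is missing cases 73-89 (1-based), V1 is complete.
columns = {
    "V1": [1.0] * 100,
    "V4": [None if 73 <= case <= 89 else 1.0 for case in range(1, 101)],
}

print(percent_missing(columns["V4"]))  # 17.0
print(sparse_variables(columns, 10))   # ['V4']
```

The comparison is strict (greater than), following the wording "exceeds" above, so a variable with exactly 10% missing data would be kept when the limit is 10.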

Max percent MD in Cases. In this box, specify the maximum percentage of missing data that is acceptable for any given case. When a case's percentage of missing data exceeds the value specified here, the case will be removed from the filtered spreadsheet. For example, if you enter 10, then any case (row) that is missing more than 10% of its observations (i.e., has empty cells for more than 10% of the variables on the row) will be removed from the spreadsheet. Consider a data set with 10 variables (V1-V10). If, for the 23rd case, variables V1, V3, and V5 are missing data, then the missing data percentage for case 23 is 30% (3 of 10 variables).
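The same arithmetic applies row by row. The sketch below is a hypothetical Python illustration (not Statistica code) of the case 23 example, with each case stored as a list of 10 values and None marking an empty cell; case_percent_missing and keep_cases are names invented for this example.

```python
def case_percent_missing(row):
    """Return the percentage of variables in one case (row) that are missing (None)."""
    if not row:
        return 0.0
    return 100.0 * sum(v is None for v in row) / len(row)


def keep_cases(rows, max_percent_md):
    """Return only the rows whose missing-data percentage does not exceed the limit."""
    return [row for row in rows if case_percent_missing(row) <= max_percent_md]


# Hypothetical data: one complete case and "case 23", which is missing V1, V3, and V5.
complete_case = [1.0] * 10
case_23 = [None, 1.0, None, 1.0, None, 1.0, 1.0, 1.0, 1.0, 1.0]
rows = [complete_case, case_23]

print(case_percent_missing(case_23))  # 30.0
print(len(keep_cases(rows, 10)))      # 1
```

With a limit of 10, case 23 (30% missing) is dropped and the complete case is kept.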

Output. Use the options in this group box to specify whether a new spreadsheet will be created and which variables (from the original data set) should be included in it. Note that when new spreadsheets are created, they will contain all variable properties of the parent spreadsheet, e.g., variable header formats, display formats, measurement types, etc. For more information on variable properties, see the variable specification dialog box.

Variables. Click the Variables button to display a standard variable selection dialog box, which you can use to select the variables in the input spreadsheet to include in the output (filtered) spreadsheet.

Create new spreadsheet. When this check box is selected, Statistica creates a new spreadsheet that contains only the filtered data. If this check box is cleared, sparse variables and cases are removed from the input spreadsheet.
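The difference between the two settings is analogous to returning a filtered copy versus modifying data in place. The following Python sketch illustrates that distinction only. It makes no claim about how Statistica implements the option, and the function filter_sparse_cases and its create_new parameter are hypothetical.

```python
def filter_sparse_cases(rows, max_percent_md, create_new=True):
    """Drop rows whose percentage of None entries exceeds max_percent_md.

    create_new=True leaves `rows` untouched and returns a filtered copy;
    create_new=False removes the sparse rows from `rows` itself.
    """
    def is_sparse(row):
        return bool(row) and 100.0 * sum(v is None for v in row) / len(row) > max_percent_md

    if create_new:
        return [list(row) for row in rows if not is_sparse(row)]
    rows[:] = [row for row in rows if not is_sparse(row)]
    return rows


data = [[1, 2, 3, 4], [None, None, 3, 4], [1, None, None, None]]

filtered = filter_sparse_cases(data, 50)          # new list; data is unchanged
print(len(filtered), len(data))                   # 2 3

filter_sparse_cases(data, 50, create_new=False)   # data itself is filtered
print(len(data))                                  # 2
```

Here the second row is missing exactly 50% of its values, which does not exceed the limit of 50, so only the third row (75% missing) is removed.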

Copy formatting. Select the Copy formatting check box to use the spreadsheet formats (e.g., cell formatting, variable header formatting, or even Spreadsheet Layouts) of the input spreadsheet in the output spreadsheet. When this check box is cleared, formatting in the input spreadsheet will not be copied to the new spreadsheet.

OK. Click OK to accept the options specified here and remove sparse data, either in a new spreadsheet or in the input spreadsheet, depending on the Create new spreadsheet setting.

Cancel. Click Cancel to close this dialog box without removing sparse data from the current spreadsheet.