Process Missing Data

Ribbon bar. Select the Data tab. In the Transformations group, click Filter/Recode and on the menu, select Process Missing Data to display the Process Missing Data dialog box.

Classic menus. On the Data - Data Filtering/Recoding submenu, select Process Missing Data to display the Process Missing Data dialog box.

Use these options to define missing data for selected variables and assign new (or additional) missing data values.

Input. Use the options in the Input group box to specify the variables and cases to process.

Variables. Click the Variables button to display a standard variable selection dialog box, which is used to select the variables to filter for missing data. The selected variables will be evaluated, and missing data in those variables will be recoded as specified in the Missing Data Parameters group box.

Cases. Click the Cases button to display the Spreadsheet Case Selection Conditions dialog box, which contains options to select only specified observations or cases to evaluate for missing data.

Use Caseweights. Select this check box to use the currently assigned spreadsheet case weights in processing missing data. When this check box is selected, values of the case weight variable specified on the Spreadsheet Case Weights dialog will be used as case multipliers before the missing data processing is applied. If the check box is cleared, the assigned case weight will be disregarded for this analysis. Note that when case weights have not been assigned, this check box will be dimmed.

Missing Data Options (MD). Use the options in this group box to define specific types of missing data.

Non-numeric data as MD. Select this check box to treat non-numeric data in variables that are not of type text as missing data. When selected, any non-numeric data in the selected variables that are not of type text will be treated as missing data and recoded using the selected recode action. This option does not apply to categorical variables, i.e., variables whose type is set to categorical or can be detected as categorical when the type is auto.

White space as MD. When cases in a text variable contain line spaces, tabs, character spaces, and carriage returns, you can choose to treat them as missing data. To do this, select the White space as MD check box. When this is selected, any cases containing white space will be recoded using the selected recode action.
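The two check boxes above can be pictured as a per-cell test. The following is a minimal Python sketch of that idea, not Statistica's implementation; the function name `is_missing`, its parameters, and the use of `None` for an already-missing cell are assumptions made for illustration.

```python
def is_missing(cell, numeric_variable, non_numeric_as_md=True, white_space_as_md=True):
    """Return True if a cell would count as missing data under the two options.

    Hypothetical helper: None stands in for a cell that is already missing.
    """
    if cell is None:
        return True
    if isinstance(cell, str):
        # White space as MD: text cells made up only of spaces, tabs, or line breaks.
        if not numeric_variable and white_space_as_md and cell.isspace():
            return True
        # Non-numeric data as MD: applies only to variables that are not of type text.
        if numeric_variable and non_numeric_as_md:
            try:
                float(cell)
            except ValueError:
                return True
    return False
```

In this sketch, clearing either option (passing `False`) leaves the corresponding cells untouched, which mirrors clearing the check box.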

Counts. Click the Counts button to initiate a scan for missing data in the selected variables. Statistica will scan the selected variables for missing data and return a count of the missing data in the Recode Count column of the Missing Data Parameters grid. No recoding will be performed. This feature enables you to quickly determine the number of missing data cells in the input spreadsheet.
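Conceptually, the scan tallies missing cells per selected variable and changes nothing. A rough Python sketch follows, assuming the data are held as a dictionary of columns and that `None` marks a missing cell; both are illustrative assumptions, not how Statistica stores data.

```python
def count_missing(columns, selected):
    """Count missing cells (None) in each selected variable without recoding anything."""
    return {name: sum(1 for cell in columns[name] if cell is None) for name in selected}


# Hypothetical example data.
columns = {
    "age": [34, None, 51, None],
    "city": ["Oslo", None, "Lima", "Rome"],
}
counts = count_missing(columns, ["age", "city"])
```

Here `counts` plays the role of the Recode Count column: one number per selected variable.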

Missing Data Parameters. Use the options in this grid to specify how missing data in the individual variables will be recoded.

Variable. This column shows the variables that have been selected for missing data processing. Note that you can select the variables one at a time to specify other options, or you can use the SHIFT or CTRL keys to select blocks of variables to specify the same options for all selected variables. To modify the variable selection, click the Variables button in the Input group box (see above).

Recode Action. In this column, select the action you want to take when missing data (MD) are discovered in the specified variable. There are several recoding actions available; however, not all actions are suitable for all variable types. For example, you cannot recode a categorical variable to its mean.

Ignore MD. Select this option to skip over or ignore the missing data in the selected variable. When this method is selected, the missing data are not processed.

Recode MD to Value. Select this recode action to recode all missing data to a user-specified value. When this action is selected, you must specify a value (either numeric or text) in the Recode Value column.

Recode MD to Mean. Select this recode action to replace all missing data in the variable with the mean of the variable. This action can only be used with numeric variables.

Recode MD to Median. Select this recode action to replace all missing data in the variable with the median of the variable. This action can only be used with numeric variables.

Recode MD to Mode. Select this recode action to replace all missing data in the variable with the mode of the variable. This action can only be used with numeric variables.
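The mean, median, and mode actions all follow the same pattern: compute the statistic from the non-missing values, then substitute it for each missing cell. The sketch below shows that pattern with Python's standard `statistics` module. The function `recode_missing` and the `None` marker are hypothetical, and how Statistica breaks ties for the mode is not stated here.

```python
import statistics


def recode_missing(values, action):
    """Replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to compute the statistic from; leave the variable unchanged.
        return list(values)
    statistic = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[action]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


values = [2, None, 4, 4, None, 10]
by_mean = recode_missing(values, "mean")      # fills with 5
by_median = recode_missing(values, "median")  # fills with 4.0
by_mode = recode_missing(values, "mode")      # fills with 4
```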

Flag MD. When you select this recode action, missing data are not replaced. Instead, Statistica will flag the variable (i.e., place an excluded symbol in its variable header) if the percentage of missing data in the variable exceeds the percentage specified in the Flag if MD% column. For example, when you select this action and enter a 50 in the Flag if MD% column, the selected variables will be flagged if they contain more than 50% missing data.

Recode MD to Value and Flag. When this recode action is selected, all missing data in the variable are set to the value specified in the Recode Value column, and the variable is flagged if its percentage of missing data exceeds the percentage specified in the Flag if MD% column. When this action is selected, you must specify a value (either numeric or text) in the Recode Value column and a percentage in the Flag if MD% column.

Additional MD Values. In this column, enter any additional missing data values you want to use for the selected variable. Although missing data cells are displayed as blank cells, they are in fact assigned a reserved number. By default, Statistica uses -999999998 as the missing data value. If you want to treat other values as missing data (e.g., treat n/a as missing data), enter those values here. Enter space-delimited values to specify more than one additional MD value. If you want to specify a string that contains spaces as MD, you must place it in quotes (e.g., to treat the string info not available as missing data, you must enter 'info not available' in the Additional MD Values column).
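The space-delimited entry with quoted strings resembles shell-style tokenizing. As an illustration only, Python's standard `shlex.split` splits such an entry the same way; the function name `parse_additional_md` is hypothetical, and Statistica's own parsing rules may differ in edge cases.

```python
import shlex


def parse_additional_md(entry):
    """Split a space-delimited entry into MD values, keeping quoted strings whole."""
    return shlex.split(entry)


md_values = parse_additional_md("n/a -1 'info not available'")
```

The result is a list of three strings: `n/a`, `-1`, and `info not available`. All values come back as text, so a numeric comparison would need a conversion step of its own.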

Recode Value. If you have selected Recode MD to Value or Recode MD to Value and Flag in the Recode Action column, specify the value to use for recoding in this column. When either of these actions is selected, the value specified here replaces the missing data.

Flag if MD%. You can elect to have variables flagged if the percentage of missing data exceeds a specific level (e.g., 10%). Enter the percentage to use in flagging variables in this column. For example, if you enter 10 and the selected variable has more than 10% missing data, the variable will be flagged. An option for removing flagged variables from the final spreadsheet is available in the Output group box. In order to flag variables, you must select either Flag MD or Recode MD to Value and Flag in the Recode Action column.
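The flagging rule is a strict comparison: a variable is flagged only when its percentage of missing data exceeds the entered value. A small Python sketch of that arithmetic follows, with `None` standing in for a missing cell and `should_flag` as a hypothetical name.

```python
def should_flag(values, flag_if_md_percent):
    """Return True when the share of missing cells is strictly above the threshold."""
    if not values:
        return False
    missing = sum(1 for v in values if v is None)
    md_percent = 100.0 * missing / len(values)
    return md_percent > flag_if_md_percent


exactly_half = should_flag([1, None, None, 4], 50)          # 50% missing
three_quarters = should_flag([None, None, None, 4], 50)     # 75% missing
```

With a threshold of 50, the first variable is not flagged because 50% does not exceed 50%, while the second one is.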

Recode Count. You can scan the input spreadsheet for missing data by clicking the Counts button in the Missing Data Options group box. When this button is clicked, Statistica scans the specified variables in the input spreadsheet and returns the number of missing data cells in this column. Note that recode counts are given only after the Counts button has been clicked.

Output. Use the options in this group box to specify whether a new spreadsheet will be created and which variables (from the original data set) should be included in it. Note that when new spreadsheets are created, they will contain all variable properties of the parent spreadsheet, e.g., variable header formats, display formats, measurement types, etc. For more information on variable properties, see the variable specification dialog.

Variables. Click the Variables button to display a variable selection dialog box to select variables in the input spreadsheet that will be included in the output (filtered) spreadsheet.

Create new spreadsheet. When this check box is selected, Statistica creates a new spreadsheet that contains only the filtered data. If this check box is cleared, then missing data actions will be applied to the input spreadsheet.

Remove flagged variables. Select this check box to remove flagged variables from the output spreadsheet. When this check box is selected, Statistica will remove any variables that have been flagged based on the Flag if MD% value specified in the Missing Data Parameters grid. By default, this check box is not selected.

Copy formatting. Select the Copy formatting check box to use the spreadsheet formats (e.g., cell formatting, variable header formatting, or even Spreadsheet Layouts) of the input spreadsheet in the output spreadsheet. When this check box is cleared, formatting in the input spreadsheet will not be copied to the new spreadsheet.

OK. Click the OK button to accept the options specified here and recode the missing data based on the input criteria.

Cancel. Click the Cancel button to close this dialog box without recoding missing data in the current spreadsheet.