MD Imputation

Ribbon bar. Select the Data tab. In the Transformations group, click Filter/Recode and select MD Imputation from the menu to display the MD Imputations dialog box.

Classic menus. From the Data - Data Filtering/Recoding submenu, select MD Imputation to display the MD Imputations dialog box.

Use the options in this dialog box to specify the parameters for the k-nearest neighbor algorithm and then perform missing data replacement using that algorithm. For more information on this algorithm, see K-Nearest Neighbors Introductory Overview.

Input/Output. Use the options in the Input/Output group box to specify the criteria for the k-nearest neighbor algorithm. Statistica uses the information you give here to determine what values to impute for the missing data. Note that all nearest neighbor calculations are based on standardized Euclidean distances.
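As an illustration of what a standardized Euclidean distance means, the following Python sketch divides each coordinate difference by that variable's standard deviation before summing the squares. It is a minimal sketch, not Statistica's implementation: the function name standardized_euclidean, the sample data, and the use of the sample (n - 1) standard deviation are assumptions made for this example.

```python
import math
import statistics

def standardized_euclidean(a, b, stdevs):
    """Euclidean distance after scaling each difference by the column's standard deviation."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, stdevs)))

# Hypothetical data: two input variables on very different scales.
rows = [
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 200.0],
]

# Assumption: the sample standard deviation of each column is used for scaling.
stdevs = [statistics.stdev(col) for col in zip(*rows)]

d01 = standardized_euclidean(rows[0], rows[1], stdevs)
d02 = standardized_euclidean(rows[0], rows[2], stdevs)

# Unscaled distances for comparison; the second variable dominates them.
raw01 = math.dist(rows[0], rows[1])
raw02 = math.dist(rows[0], rows[2])
```

In this example the unscaled distances make the third row look much closer to the first row than the second row is, purely because the second variable has larger units. After standardization, both rows are equally distant from the first row.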

Variables. Click the Variables button to display a variable selection dialog box. Select one or more continuous or categorical target variables, and a list of input variables (which can be continuous, categorical, or a mixture of both). Note that the variables you select here will be used in the algorithm and included in the resulting (output) spreadsheet.

Cases. Click the Cases button to display the Spreadsheet Case Selection Conditions dialog box, which contains options to select only specified observations or cases for the data filtering operation.

Use Caseweights. Select this check box to use the currently assigned spreadsheet case weights before applying the k-nearest neighbor algorithm. When this check box is selected, values of the case weight variable specified in the Spreadsheet Case Weights dialog box will be used as case multipliers before the algorithm is applied. If the check box is cleared, the assigned case weight will be disregarded for this analysis. Note that when case weights have not been assigned, this check box will be dimmed.
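One way to picture a case multiplier is that a case with weight 3 is treated as if it appeared three times in the data. The Python sketch below shows that interpretation only; it assumes non-negative integer weights, and the function name expand_by_weight and the sample values are hypothetical rather than taken from Statistica.

```python
def expand_by_weight(cases, weights):
    """Repeat each case as many times as its (integer) case weight."""
    expanded = []
    for case, weight in zip(cases, weights):
        # Assumption: weights are non-negative integers; a weight of 0 drops the case.
        expanded.extend([case] * int(weight))
    return expanded

# Hypothetical cases and case weights.
cases = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
weights = [2, 0, 1]

expanded = expand_by_weight(cases, weights)
```

Here the first case counts twice, the second is excluded, and the third counts once, so the algorithm would see three cases in total.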

K-value. In this field, specify the number of nearest neighbors, K. This option can significantly influence the quality of inference. By default, K is set to 3; however, you can specify any value between 1 and 500.
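To see why the choice of K matters, consider the following Python sketch, which estimates a missing target value from the K nearest cases with known values. It is an illustration under stated assumptions, not Statistica's implementation: the function name knn_estimate is hypothetical, the inputs are assumed to be already standardized, a continuous target is estimated as the plain average of its neighbors, and a categorical target as the most frequent neighbor value.

```python
import math
from collections import Counter

def knn_estimate(prototypes, query_inputs, k, categorical=False):
    """Estimate a target from the k prototypes nearest to query_inputs.

    prototypes is a list of (inputs, target) pairs with known targets.
    """
    # Rank prototypes by Euclidean distance to the query (inputs assumed standardized).
    ranked = sorted(prototypes, key=lambda p: math.dist(p[0], query_inputs))
    neighbors = [target for _, target in ranked[:k]]
    if categorical:
        # Assumption: majority vote for categorical targets.
        return Counter(neighbors).most_common(1)[0][0]
    # Assumption: unweighted mean for continuous targets.
    return sum(neighbors) / len(neighbors)

# Hypothetical prototype cases: (input values, known target value).
prototypes = [
    ([0.0, 0.0], 10.0),
    ([0.1, 0.0], 12.0),
    ([1.0, 1.0], 20.0),
    ([5.0, 5.0], 90.0),
]
query = [0.0, 0.1]  # inputs of a case whose target is missing

est_k1 = knn_estimate(prototypes, query, k=1)
est_k3 = knn_estimate(prototypes, query, k=3)
```

With K = 1 the estimate copies the single closest case (10.0), while with K = 3 it averages in more distant cases and shifts to 14.0. Small values of K follow local detail closely; larger values smooth the estimate at the cost of using less similar cases.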

No. of exemplars. Enter the number of cases to use in the prototype (exemplar) data set. Statistica K-Nearest Neighbors divides the data into prototype and test samples. The test sample is used to create a set of new query points, for which the outcomes are estimated from the known values of the K nearest cases in the prototype sample.
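The division into prototype cases and query points can be sketched in Python as follows. This is a simplified illustration: how Statistica actually selects the exemplars is not described here, so the sketch simply assumes that the first n_exemplars cases with a known target become prototypes and that cases with a missing target (represented as None) become query points. The function name split_cases and the sample data are hypothetical.

```python
def split_cases(cases, target_index, n_exemplars):
    """Split cases into prototypes (known target) and query points (missing target)."""
    complete = [c for c in cases if c[target_index] is not None]
    queries = [c for c in cases if c[target_index] is None]
    # Assumption: the first n_exemplars complete cases form the prototype sample.
    return complete[:n_exemplars], queries

# Hypothetical data: column 0 is an input variable, column 1 is the target.
cases = [
    [1.0, 5.0],
    [2.0, None],
    [3.0, 7.0],
    [4.0, 9.0],
    [5.0, None],
]

prototypes, queries = split_cases(cases, target_index=1, n_exemplars=2)
```

In this example two complete cases are kept as exemplars, and the two cases with a missing target are the query points whose values would be estimated from them.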

OK. Click OK to accept the options specified here and use the k-nearest neighbor algorithm on the current spreadsheet.

Cancel. Click Cancel to close this dialog box without using the k-nearest neighbor algorithm on the current spreadsheet.