Full-featured implementation of K-Nearest Neighbors (KNN) for multiple regression problems. The final solution is automatically stored for deployment.

**General**

**Detail of computed results reported**. Level of detail of the reported results. At the Minimal level, spreadsheets with the analysis summary, model specifications, and descriptive statistics (regression statistics) are displayed. At the Comprehensive level, a spreadsheet of predictions and residuals, together with their histogram plots, is displayed. In addition to the above, the All results level displays a spreadsheet (if the 'Creates residual statistics' option is selected) containing all data set variables and their statistics, including predictions and residuals (whichever are applicable).

**Missing data deletion**.

**Generate datasource, if N for input less than**. Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, where k is the value specified in this edit field. Note that k (number of observations) is compared against the total number of observations in the input data source, not the number of valid or selected observations.

**Sampling**

**Sampling method**. Sampling method used to divide the data set into example (training) and test subsets. Random sampling assigns cases to the example and test samples at random. The First N method, in contrast, selects the first N cases as the example set and uses the rest as the test sample. NOTE: You can also use a learning/testing indicator variable to define the samples. This functionality is available on the Advanced tab of the data spreadsheet in the Data Acquisition of Statistica Data Miner environment. Selecting the learning/testing indicator method overrides any sampling choice you make on this tab.

**Size of example set (%)**. Specifies the percentage of data cases that will be used as examples. The remaining valid cases in the data set will be used to form the test sample.

**Seed**. Specifies the random generator seed for dividing the data into the example and test sets.

**Use first N cases**. Selects the first N valid cases in the data set as the training (example) subset. The remaining cases are used for testing.
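The two sampling methods described above can be sketched as follows. This is a minimal illustration, not Statistica's implementation: the function name `split_cases`, its parameters, and the default values are hypothetical, and the exact rounding of the percentage is an assumption.

```python
import random

def split_cases(n_cases, method="random", example_pct=75.0, seed=1000, first_n=None):
    """Return (example_indices, test_indices) for n_cases valid cases.

    All names and defaults here are illustrative assumptions.
    """
    indices = list(range(n_cases))
    if method == "first_n":
        # First N: the first N valid cases form the example set,
        # the remaining cases form the test set.
        n = min(first_n, n_cases)
        return indices[:n], indices[n:]
    # Random: shuffle reproducibly with the given seed, then take the
    # requested percentage of cases as examples (truncating, by assumption).
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_example = int(n_cases * example_pct / 100.0)
    return sorted(indices[:n_example]), sorted(indices[n_example:])

example_idx, test_idx = split_cases(10, method="first_n", first_n=7)
print(example_idx, test_idx)
```

Using the same seed reproduces the same random split, which is the purpose of the Seed option.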

**Options**

**Number of nearest neighbors**.

**Distance measure**. Specifies the metric to be used for measuring the distance between two points in the input space.

**Standardize distances**. Select this option to standardize distances.

**Use weighted average/voting for predictions**.
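To make the options in this group concrete, here is a hedged sketch of KNN regression in plain Python. It rests on several assumptions that the text above does not confirm: that standardization means scaling each input to zero mean and unit standard deviation, that the distance measures include Euclidean and Manhattan (city-block), and that the weighted average uses inverse-distance weights. All function names are hypothetical.

```python
import math

def standardize(rows):
    """Scale each input column to zero mean and unit standard deviation (assumed scheme)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(examples, targets, query, k, distance=euclidean, weighted=False):
    """Predict the target for query from its k nearest example cases."""
    nearest = sorted((distance(x, query), y) for x, y in zip(examples, targets))[:k]
    if not weighted:
        # Plain average of the neighbors' target values.
        return sum(y for _, y in nearest) / len(nearest)
    # Inverse-distance weighting (an assumed scheme): an exact match
    # has zero distance, so return the matching target(s) directly.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

examples = [[0.0], [1.0], [2.0], [10.0]]
targets = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(examples, targets, [1.2], k=2))
print(knn_predict(examples, targets, [1.2], k=2, weighted=True))
```

With weighting enabled, closer neighbors contribute more to the prediction. When inputs are standardized, a query point has to be scaled with the same `means` and `stds` as the examples before distances are computed.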

**Cross-validation**

**Apply v-fold cross-validation**. Applies v-fold cross-validation to obtain an estimate of the number of nearest neighbors (K).

**V value**. Number of cross-validation folds.

**Seed**. Seed value for random data shuffling for cross-validation.

**Minimum K**. Start value for the number of nearest neighbors (used by the cross-validation grid search).

**Maximum K**. End value for the number of nearest neighbors (used by the cross-validation grid search).

**Increment in K**. Step size for the number of nearest neighbors (used by the cross-validation grid search).
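The cross-validation options above describe a grid search over the number of nearest neighbors. The following is a minimal sketch of that idea, not the procedure Statistica actually runs: the function names, the use of mean squared error as the selection criterion, and the way folds are formed are all assumptions.

```python
import random

def knn_mean(examples, targets, query, k):
    """Unweighted KNN regression using squared Euclidean distance."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(examples, targets)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_select_k(inputs, targets, v=5, k_min=1, k_max=10, k_step=1, seed=1000):
    """Return (best_k, errors), where errors maps each candidate K to its CV error."""
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)         # seeded shuffle of the cases
    folds = [order[i::v] for i in range(v)]    # v roughly equal folds
    errors = {}
    for k in range(k_min, k_max + 1, k_step):  # grid: minimum, maximum, increment
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in order if i not in held_out]
            train_x = [inputs[i] for i in train]
            train_y = [targets[i] for i in train]
            # Predict each held-out case from the remaining folds.
            for i in fold:
                pred = knn_mean(train_x, train_y, inputs[i], k)
                sq_err += (pred - targets[i]) ** 2
        errors[k] = sq_err / len(inputs)       # mean squared error (assumed criterion)
    best_k = min(errors, key=errors.get)
    return best_k, errors

inputs = [[float(i)] for i in range(30)]
targets = [2.0 * i for i in range(30)]
best_k, errors = cv_select_k(inputs, targets, v=5, k_min=1, k_max=5, k_step=2)
print(best_k, errors)
```

Each case is held out exactly once, so every candidate K is scored on all cases. A fixed seed makes the fold assignment, and therefore the selected K, reproducible.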

**Memory usage**

**Restrict memory usage**. Restricts the amount of memory that can be used by the analysis.

**Amount of memory that can be used by the analysis**. Specifies the amount of memory that can be used by the analysis.

**Results**

**Subset used to generate results**.

**Include inputs**. Includes the independent variables in spreadsheets and histograms.

**Include outputs**. Includes the dependent variables in spreadsheets and histograms.

**Include predictions**. Includes predictions in spreadsheets and histograms.

**Include residuals**. Includes residuals in spreadsheets and histograms.

**Include standard deviations**. Includes standard deviations for regression predictions in spreadsheets and histograms (meaningful only when the number of nearest neighbors is larger than one).
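The note about standard deviations can be illustrated with a short sketch. It assumes the reported standard deviation is the spread of the target values among the K nearest neighbors, computed here as a population standard deviation; neither detail is confirmed by the text above, and the function name is hypothetical.

```python
import math

def knn_mean_and_std(examples, targets, query, k):
    """Return the KNN prediction and the spread of the neighbors' target values."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(examples, targets)
    )[:k]
    ys = [y for _, y in nearest]
    mean = sum(ys) / len(ys)
    # Population standard deviation of the neighbors' targets (assumed definition).
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    return mean, std

examples = [[0.0], [1.0], [2.0]]
targets = [1.0, 3.0, 10.0]
print(knn_mean_and_std(examples, targets, [0.4], k=2))
print(knn_mean_and_std(examples, targets, [0.4], k=1))
```

With a single neighbor there is only one target value, so its spread is zero regardless of the data. This is why the statistic is informative only when more than one neighbor is used.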

**Creates residual statistics**. Creates predicted and residual statistics for each case, depending on the selected level of detail.

**Deployment**

Deployment is available if the Statistica installation is licensed for this feature.

**Generates C/C++ code**. Generates C/C++ code for deployment of the predictive model.

**Generates SVB code**. Generates Statistica Visual Basic code for deployment of the predictive model.

**Generates PMML code**. Generates PMML (Predictive Models Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

**Saves C/C++ code**. Saves C/C++ code for deployment of the predictive model.

**File name for C/C++ code**. Specifies the name and location of the file in which to save the (C/C++) deployment code information.

**Saves SVB code**. Saves Statistica Visual Basic code for deployment of the predictive model.

**File name for SVB code**. Specifies the name and location of the file in which to save the (SVB/VB) deployment code information.

**Saves PMML code**. Saves PMML (Predictive Models Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

**File name for PMML (XML) code**. Specifies the name and location of the file in which to save the (PMML/XML) deployment code information.