Variable Specification (for ETL ID-based Analyses)

On the STATISTICA ETL Advanced tab, click the Variable specs button to display the Variable specification dialog box, which provides data cleaning and aggregation options for each selected output variable.

Minimum permissible value. Converts any value less than the specified floating-point number to missing data. This data cleaning method applies to continuous variables only.

Maximum permissible value. Converts any value greater than the specified floating-point number to missing data. This data cleaning method applies to continuous variables only.
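
The effect of the two permissible-value options can be sketched in a few lines of Python. This is only an illustration, not STATISTICA code: the function name clean_range is hypothetical, NaN stands in for missing data, and the sketch assumes that a value exactly equal to a limit is kept, since the options describe only values less than or greater than the limit.

```python
import math


def clean_range(values, minimum=None, maximum=None):
    """Replace out-of-range values with NaN (standing in for missing data).

    Hypothetical helper: a limit of None means that limit is not applied.
    """
    cleaned = []
    for v in values:
        too_low = minimum is not None and v < minimum
        too_high = maximum is not None and v > maximum
        cleaned.append(math.nan if (too_low or too_high) else v)
    return cleaned


# Made-up readings with limits of 0.0 and 100.0.
readings = [12.5, -3.0, 47.1, 250.0, 0.0]
cleaned = clean_range(readings, minimum=0.0, maximum=100.0)
# -3.0 and 250.0 become NaN; 0.0 sits on the limit and is kept.
```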

Aggregation statistics type. Summarizes the data for each ID by central tendency (mean, median, mode), variation (std. dev.), range (minimum, maximum), or total (sum), or applies no aggregation (none).

    • Mean. The number calculated by adding a group of numbers and then dividing by the count of those numbers. This function is the default for continuous variables and applies to continuous variables only.

    • Median. The middle number of a group of numbers; that is, half the numbers have values that are greater than the median, and half the numbers have values that are less than the median. This function applies to continuous variables only.

    • Mode. The most frequently occurring value in a group of values. In case of a tie, the first value is selected. This function is the default for categorical variables and applies to categorical variables only.

    • Std. dev. (Standard Deviation). A measure of how widely values are dispersed from the average value (the mean). This function applies to continuous variables only.

    • Minimum. The smallest number in a set of numbers. This function applies to continuous variables only.

    • Maximum. The largest number in a set of numbers. This function applies to continuous variables only.

    • Sum. The total of a series of numbers. This function applies to continuous variables only.

    • None. Output variables are not aggregated. Data sources are merged as a Cartesian product (i.e., a cross product of variables). This function applies to both continuous and categorical variables.
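
The aggregation statistics listed above can be approximated with the Python standard library. The following is a rough sketch rather than the STATISTICA implementation: the names AGGREGATORS and aggregate_by_id and the sample data are hypothetical, and the use of the sample standard deviation is an assumption, because the option description does not say whether the sample or the population formula is used.

```python
import statistics
from collections import defaultdict

# Hypothetical mapping from statistic name to a standard-library function.
AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    # On Python 3.8+, statistics.mode returns the first value encountered
    # when several values tie, matching the tie rule described for Mode.
    "mode": statistics.mode,
    # Assumption: sample standard deviation (needs at least two values).
    "std_dev": statistics.stdev,
    "minimum": min,
    "maximum": max,
    "sum": sum,
}


def aggregate_by_id(rows, stat):
    """Group (id, value) pairs by ID and reduce each group to one value."""
    groups = defaultdict(list)
    for row_id, value in rows:
        groups[row_id].append(value)
    return {row_id: AGGREGATORS[stat](vals) for row_id, vals in groups.items()}


# Made-up continuous and categorical variables keyed by ID.
temps = [("A", 10.0), ("A", 14.0), ("B", 7.0), ("A", 12.0), ("B", 9.0)]
grades = [("A", "low"), ("A", "high"), ("B", "high"), ("B", "low")]

mean_temp = aggregate_by_id(temps, "mean")    # one mean per ID
mode_grade = aggregate_by_id(grades, "mode")  # ties resolve to the first value
```

Each result holds exactly one value per ID, which is the behavior described for every statistic other than None.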

Note: If None is specified for any of the rows, then the output preserves all multiples (repeated cases for the same ID). Conversely, if every row has a defined aggregation statistics type (i.e., not None), then the output has only one case for every unique ID (i.e., no multiples).
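
The difference between the two outcomes in the note can be illustrated with a small Python sketch. It is not STATISTICA code: the data and the function names merge_without_aggregation and merge_with_aggregation are hypothetical, and the sketch assumes that the Cartesian product is formed among the records that share an ID. It covers only the two extremes, where every variable is set to None or every variable is set to Mean.

```python
from itertools import product
from statistics import mean

# Hypothetical data sources: each maps an ID to its recorded values.
temperature = {"A": [10.0, 14.0], "B": [7.0]}
pressure = {"A": [1.0, 1.5, 2.0], "B": [0.9]}


def merge_without_aggregation(left, right):
    """None: pair every left record with every right record of the same ID."""
    rows = []
    for row_id in left:
        if row_id in right:
            for lval, rval in product(left[row_id], right[row_id]):
                rows.append((row_id, lval, rval))
    return rows


def merge_with_aggregation(left, right, stat=mean):
    """Aggregated: reduce each source to one value per ID before merging."""
    return [
        (row_id, stat(left[row_id]), stat(right[row_id]))
        for row_id in left
        if row_id in right
    ]


multiples = merge_without_aggregation(temperature, pressure)
one_per_id = merge_with_aggregation(temperature, pressure)
# multiples holds 2 * 3 cases for ID "A" plus 1 case for ID "B";
# one_per_id holds a single case for each ID.
```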