Statistica ETL: ID-Based - Advanced Tab

Select the Advanced tab of the Statistica ETL: ID-Based dialog box to access the options described here. This tab contains the items from the Quick tab plus less commonly used ID-Based Statistica ETL options.

The grid at the top lists the data sources and their associated properties such as data source ID (for recording/editing SVB macros), data type, and name. When your cursor hovers over a data source in the grid, a ToolTip displays the full path of the data source location.

Add data source. Click this button to display the Select Data Sources dialog box, where you can specify a data source to add to the list.

Remove data source. Select a data source in the grid at the top of the tab and click this button to remove that data source from the grid.

Note: To change the sequence in which the data sources are aligned/merged, select a data source and click the arrow buttons at the side of the grid. Click the down arrow to demote the selected source; click the up arrow to promote the selected source.

Options/properties applicable to selected data source above. These options apply to the data source selected in the grid.

Variables. Click this button to display the Select Variables dialog box, where you select ID and output variables. If you also select one or more optional Time variable(s), then the Time Variable Specs button is enabled.

Variable specs. Click this button to display the Variable Specification dialog box, which provides data cleaning and aggregation options for each selected output variable. Adjacent to this button, the word "default" is displayed if no output specifications have been changed.

Time Variable Specs. Available only if a Time variable was selected. Click this button to display the Time Variable Specification dialog box, which provides data cleaning and aggregation options for each selected Time variable. Adjacent to this button, the word "default" is displayed if no Time specifications have been changed.

Assume data is sorted ascending by ID variable (merge step will be faster). Select this check box to significantly decrease runtime for very large data sets that are pre-sorted in ascending order by the Identifier variable. If the data are not sorted ascending by the Identifier variable, a warning message is displayed.
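The precondition behind this check box can be verified before the data are loaded. The following Python sketch is only an illustration of what "sorted ascending by the Identifier variable" means; the function name is hypothetical and it is not part of Statistica or its SVB object model.

```python
def is_sorted_ascending(ids):
    """Return True if no ID is smaller than the ID before it.

    Repeated IDs are allowed; an empty sequence counts as sorted.
    """
    previous = None
    first = True
    for current in ids:
        if not first and current < previous:
            return False
        previous = current
        first = False
    return True


# Hypothetical identifier columns used only for illustration.
sorted_ids = [101, 102, 102, 105]
unsorted_ids = [101, 105, 102]

print(is_sorted_ascending(sorted_ids))
print(is_sorted_ascending(unsorted_ids))
```

A single pass of this kind is much cheaper than sorting, so it can be a reasonable check on a very large source before relying on the faster merge path.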

Use input data case selection conditions. Select this check box to specify that Statistica uses a subset of cases as defined in the input spreadsheet conditions. Click the Edit button to display the Analysis/Graph Case Selection Conditions dialog box.

Use variable prefix. When this check box is selected, variables in the output contain a default prefix, which is the data source name. The prefix can be changed in the adjacent edit box.

Only use when sources have duplicate variable names. When this check box is selected, a prefix is added only to variables whose names are duplicated across data sources.
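The effect of the two prefix options on output variable names can be sketched in Python as follows. This is an illustration only: the function name, the sample source names, and the underscore separator between prefix and variable name are assumptions, not documented Statistica behavior.

```python
from collections import Counter


def apply_prefixes(sources, only_duplicates=False):
    """Build output variable names from {prefix: [variable names]}.

    The "_" separator is an assumption made for this sketch.
    """
    # Count how often each variable name occurs over all data sources.
    counts = Counter(name for names in sources.values() for name in names)
    output = []
    for prefix, names in sources.items():
        for name in names:
            if only_duplicates and counts[name] == 1:
                output.append(name)  # unique name: leave unprefixed
            else:
                output.append(f"{prefix}_{name}")
    return output


# Hypothetical data sources; the prefix defaults to the data source name.
sources = {"Lab": ["Temp", "pH"], "Plant": ["Temp", "Flow"]}

print(apply_prefixes(sources))
print(apply_prefixes(sources, only_duplicates=True))
```

With both check boxes selected, only the clashing name Temp is prefixed in this example, which keeps the remaining output names short.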

Merge properties for all data sources. Data sources are merged by matching on the selected Identifier variables and optional time variables.

Preserve order in data. Select this check box to retain the original Identifier (i.e., Class ID) order. Merge results are sorted by the order of identifiers in the first data source, then by the order of identifiers in each subsequent data source, as numbered in the ID column of the data source grid.
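One reading of this ordering rule is "first appearance wins": identifiers keep the position at which they first occur, scanning the data sources in grid order. The Python sketch below illustrates that reading; the function name and sample identifiers are hypothetical, and the actual Statistica implementation may differ in detail.

```python
def merged_id_order(id_lists):
    """Return identifiers in order of first appearance.

    id_lists holds one list of identifiers per data source, in the
    order in which the sources appear in the grid.
    """
    seen = set()
    order = []
    for ids in id_lists:
        for ident in ids:
            if ident not in seen:
                seen.add(ident)
                order.append(ident)
    return order


# Hypothetical identifiers from two data sources.
source_1 = ["B", "A"]
source_2 = ["C", "A", "D"]

print(merged_id_order([source_1, source_2]))
```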

Unmatched cases. These options specify how unequal numbers of cases are handled.

Fill with MD. Select this option to pad unmatched cases with missing data. This is the default option.

Delete cases. If this option button is selected, cases from input data sources that cannot be matched are removed from the results spreadsheet.

Generate Cartesian. Select this option to create a cross product of every unmatched case with every other case; that is, if a case is found in only data source 1 or only data source 2, every combination of that case with every other case is created.

Abort merge. When this option button is selected, the presence of unmatched cases in any data source causes an error message to be displayed and the merge procedure to be abandoned. Note that the Abort merge option works only when the variable specifications for all data sources declare an Aggregation statistic type of None.
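The difference between Fill with MD, Delete cases, and Abort merge can be illustrated with a small two-source merge in Python. This is a simplified sketch, not Statistica's implementation: it assumes one case per identifier in each source, uses None to stand for missing data, and leaves out Generate Cartesian. The function name and mode strings are hypothetical.

```python
def merge_by_id(left, right, unmatched="fill"):
    """Merge two {ID: value} sources into (ID, left value, right value) rows.

    unmatched: "fill" pads with None (standing in for MD), "delete" drops
    the case, "abort" raises an error at the first unmatched case.
    """
    if unmatched not in ("fill", "delete", "abort"):
        raise ValueError(f"unknown mode: {unmatched!r}")
    # IDs of the first source, then IDs found only in the second source.
    all_ids = list(left) + [ident for ident in right if ident not in left]
    rows = []
    for ident in all_ids:
        matched = ident in left and ident in right
        if not matched:
            if unmatched == "delete":
                continue
            if unmatched == "abort":
                raise RuntimeError(f"unmatched case for ID {ident!r}")
        rows.append((ident, left.get(ident), right.get(ident)))
    return rows


# Hypothetical sources: ID 1 exists only on the left, ID 3 only on the right.
left = {1: "a", 2: "b"}
right = {2: "x", 3: "y"}

print(merge_by_id(left, right, unmatched="fill"))
print(merge_by_id(left, right, unmatched="delete"))
```

In this sketch, "fill" behaves like a full outer join and "delete" like an inner join, which is a convenient way to remember the two options.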

Multiple Cases. These options specify how duplicate matching cases are handled.

Fill with MD. Select this option to pad duplicate matched cases with missing data. This is the default option.

Copy down. Select this option to generate a Cartesian product for duplicate matches of the same value.
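The two Multiple Cases options can be compared for a single identifier that occurs more than once. The Python sketch below shows one plausible interpretation of the descriptions above, with None standing in for missing data; the function name and mode strings are hypothetical, and the exact row layout produced by Statistica is not specified here.

```python
from itertools import product, zip_longest


def merge_duplicates(left_values, right_values, mode="fill"):
    """Combine the cases that share one ID in two data sources.

    "fill" pairs cases by position and pads the shorter side with None;
    "copy_down" forms the Cartesian product of the two sides.
    """
    if mode == "fill":
        return list(zip_longest(left_values, right_values, fillvalue=None))
    if mode == "copy_down":
        return list(product(left_values, right_values))
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical cases: the ID occurs twice in source 1 and once in source 2.
left_values = ["a1", "a2"]
right_values = ["x"]

print(merge_duplicates(left_values, right_values, mode="fill"))
print(merge_duplicates(left_values, right_values, mode="copy_down"))
```

When one side has a single case, the Cartesian product simply repeats that case next to every duplicate on the other side, which is where the name Copy down comes from in this reading.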

See Statistica ETL: ID-Based - Startup Panel and Quick Tab for button descriptions.

See also Statistica Extract, Transform, and Load (ETL) Overview.