Variable Selection Filtering

Some data sources used in Statistica have hundreds of variables. If the variables are named in a predictable manner, you can use wildcard expressions or open-ended ranges to select variables instead of scrolling through the entire list and selecting individual variables.

The text expressions are persistent: if you close a variable selection dialog box after selecting variables and later need to edit the selections, open the dialog box again. The variable list(s) and the edit box(es) are repopulated with the previous selections.

Variable selection filtering can be used in analyses, graphs, and workspace nodes.

Wildcard Expressions

Type wildcard expressions into the edit box located below the variable list(s) in standard variable selection dialog boxes.

1.  Statistica supports the following special characters to define wildcard expressions:

a.  The * character will match 0 or more characters

b.  The ? character will match exactly 1 character

2.  Select all variables by entering a single * in the edit box.

3.  Pattern matching by wildcard expression is not case sensitive:

GEN* and gen* are treated the same.

5.  Wildcard characters can be placed anywhere in an expression.

a.  At the beginning: *DER matches GENDER, REMAINDER, etc.

b.  In the middle: C*E matches COKE, CAKE, COPE, etc.

c.  At the end: PRED* matches PREDICTOR1, PREDICATE, etc.

6.  Multiple wildcard characters can be used in a single expression:

MEAS*0? matches MEASURE01 and MEASURE05, but not MEASURE0 or MEASURE022.

7.  Statistica will first attempt to exactly match a single explicit variable name. If a variable with the given name is found, no additional pattern matching will occur.

a.  If a data set has variables named PRED* and PREDICTOR1, the string PRED* will match only the variable named PRED*

b.  The pattern PRED** will match all variables that match the PRED* pattern, including a variable named PRED*

8.  Place wildcard expressions in quotes to select variables that contain special characters such as dashes and spaces:

"GENERAL P*" will match GENERAL PREDICTOR, GENERAL POPULATION and GENERAL PATTON.

9.  When a wildcard expression is typed into the edit box, all variables that match that expression will become selected in the corresponding list box.

10. The order of the variables in the selection will be the order the matching variable names appear in the data file.

11. You can enter multiple variable names, numbers, and/or patterns in the edit box, and all the matches will be selected in the corresponding list box:

Entering GENDER MEAS*2? 9 15-20 will select the variable named GENDER, any variable that matches the pattern MEAS*2?, and variables 9 and 15 through 20.

12. If a pattern or text is entered into the edit box that does not match any existing variables, nothing will be selected in the corresponding list box, and when you click the OK button, an error will be generated if the dialog box requires a selection (some dialog boxes do not require a selection).

13. The Show appropriate variables only check box cannot be combined with a pattern that also matches variables the option filters out.
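
The matching rules above can be sketched in a few lines of Python. This is an illustrative model only, not Statistica code: the function names wildcard_to_regex and match_variables are hypothetical, and the case-insensitive handling of the exact-name check is an assumption (the rules above state case insensitivity only for pattern matching).

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (0 or more characters) and ? (exactly 1 character) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def match_variables(pattern, names):
    """Return matching names in data-file order, trying an explicit name first."""
    # Rule 7: an explicit variable name wins and stops further pattern matching.
    # Assumption: the explicit-name comparison ignores case, like patterns do.
    exact = [n for n in names if n.lower() == pattern.lower()]
    if exact:
        return exact
    regex = wildcard_to_regex(pattern)
    # Rule 10: iterating over names keeps the order of the data file.
    return [n for n in names if regex.fullmatch(n)]

names = ["GENDER", "PRED*", "PREDICTOR1", "MEASURE0",
         "MEASURE01", "MEASURE022", "MEASURE05"]
print(match_variables("gen*", names))     # ['GENDER']
print(match_variables("MEAS*0?", names))  # ['MEASURE01', 'MEASURE05']
print(match_variables("PRED*", names))    # ['PRED*']
print(match_variables("PRED**", names))   # ['PRED*', 'PREDICTOR1']
```

The last two calls show why PRED** is useful: doubling the trailing * avoids the explicit-name match, so the expression is evaluated as a pattern.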

Open-Ended Ranges

1.  In the edit box of a variable selection dialog box, enter a range of the form n-*, where n is an integer, to mean all variables from variable n to the last variable. For example: 4-* selects all variables from variable 4 to the last variable.

When specified as the variable selection in a macro, the variable selection will adjust dynamically to the data file. That is, if run against an input with 10 variables, 4-* will select 4-10. When run against an input with 15 variables, 4-* will select 4-15.

2.  Statistica interprets any unquoted dash (-) as a range operator, and treats the left side as a variable name or variable number; the right side must be a name, number, or *.

a.  Text on either side of the dash (-) operator that is not a valid variable name or number (or * on the right side) results in an error.

b.  Patterns are not allowed on either side, except for the * on the right side.

3.  To allow for variable names that contain a dash, such as "4-", dashes that occur in quoted strings are interpreted as part of a name or pattern.

Examples:

3-5 means variables 3 through 5

"3-5" means a variable named 3-5

2-* means variables 2 through the last variable

"2-*" will first match any variable explicitly named 2-*, and if none are found, will match all variable names beginning with 2-

"Duration of Credit" - * means the variable named Duration of Credit through the last variable
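
The range rules and examples above can be modeled with the following Python sketch. It is not Statistica code: split_range, to_index, and select_range are hypothetical helpers, variable numbers are assumed to be 1-based, and name comparison is assumed to ignore case. The source does not say how an inverted range (such as 5-3) is treated; this sketch simply returns an empty list.

```python
def split_range(token):
    """Split on the first dash outside double quotes; return None if there is none."""
    in_quotes = False
    for i, ch in enumerate(token):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "-" and not in_quotes:
            return token[:i], token[i + 1:]
    return None

def to_index(side, names):
    """Map one side of a range (variable number or exact name) to a 1-based index."""
    side = side.strip()
    quoted = len(side) >= 2 and side[0] == '"' and side[-1] == '"'
    if quoted:
        side = side[1:-1]          # quoted text is always treated as a name
    elif side.isdigit():
        index = int(side)
        if 1 <= index <= len(names):
            return index
        raise ValueError(f"variable number out of range: {side}")
    lowered = [n.lower() for n in names]
    if side.lower() in lowered:
        return lowered.index(side.lower()) + 1
    # Patterns end up here too: they are not valid on either side of a range.
    raise ValueError(f"not a variable name or number: {side!r}")

def select_range(token, names):
    """Resolve 'n-m', 'n-*', or 'name-*' to a list of 1-based variable numbers."""
    parts = split_range(token)
    if parts is None:
        raise ValueError(f"no range operator in {token!r}")
    left, right = parts
    start = to_index(left, names)
    # A lone * on the right means "through the last variable" of the current input.
    end = len(names) if right.strip() == "*" else to_index(right, names)
    return list(range(start, end + 1))

ten = [f"VAR{i}" for i in range(1, 11)]
fifteen = [f"VAR{i}" for i in range(1, 16)]
print(select_range("4-*", ten))      # [4, 5, 6, 7, 8, 9, 10]
print(select_range("4-*", fifteen))  # [4, 5, ..., 15] as a full list
print(split_range('"3-5"'))          # None: the dash is inside quotes
```

Because the end of an open-ended range is computed from the current variable list, the same 4-* text yields 4-10 for a 10-variable input and 4-15 for a 15-variable input.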

Macros and Variable Filtering

1.  When recording a macro from a statistical or graph analysis, Statistica records the variable selections as they are specified in the edit box of the variable selection dialog box. If you enter selection(s) as text, including but not limited to wildcard patterns, the text will be recorded.

2.  When you run an analysis/graph macro that selects variables by pattern or text, then make the analysis visible and display the variable selection dialog box, the pattern is preserved and displayed in the corresponding variable selection edit box. Recording a new macro from an analysis that was itself created by a macro records the variable selections as text/pattern.

3.  When selecting variables by wildcard pattern in a macro (more generally, from automation), the actual variables selected are determined dynamically at runtime based on the current input and which variables match the pattern. This could result in errors if the analysis expects a certain number of variables and the pattern does not match an appropriate number of variables.

4.  For any variable selection from script/automation outside of analysis/graph modules, such as data management operations, Statistica allows variable selection by pattern. Statistica does not record patterns in such operations in master macros, but you can edit the script and use a pattern for variable selection. For example, spreadsheet.Subset("FINAL*") will select either a variable named "FINAL*" or all variables beginning with the word FINAL.
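
Because patterns are resolved at runtime, a script may want to verify how many variables a pattern matched before passing the selection to an analysis. The following Python sketch illustrates that idea only; it is not Statistica Visual Basic and does not call any Statistica API. The function select_by_pattern and its expected_count parameter are hypothetical, and Python's fnmatch module is used as a stand-in for the matcher (it also treats square brackets as special, which the wildcard rules in this topic do not mention).

```python
from fnmatch import fnmatchcase

def select_by_pattern(pattern, names, expected_count=None):
    """Resolve a wildcard pattern against the current input's variable names.

    Raises ValueError when expected_count is given and the number of
    matches differs, so the mismatch surfaces before the analysis runs.
    """
    # Lowercasing both sides models case-insensitive matching.
    matches = [n for n in names if fnmatchcase(n.lower(), pattern.lower())]
    if expected_count is not None and len(matches) != expected_count:
        raise ValueError(
            f"pattern {pattern!r} matched {len(matches)} variable(s), "
            f"expected {expected_count}"
        )
    return matches

# The same pattern resolves differently against two inputs.
first_input = ["ID", "FINAL_SCORE", "FINAL_GRADE"]
second_input = ["ID", "FINAL_SCORE"]

print(select_by_pattern("FINAL*", first_input, expected_count=2))
# ['FINAL_SCORE', 'FINAL_GRADE']

try:
    select_by_pattern("FINAL*", second_input, expected_count=2)
except ValueError as err:
    print("selection rejected:", err)
```

Checking the count up front is a design choice for the sketch; it turns a confusing downstream analysis error into a clear message about the pattern itself.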