Case Selection Conditions - Syntax Summary
Recoding variable values,
performing case selection for analyses, or verifying quality of data involves
a comparison of conditions or values encountered in the spreadsheet with
conditions specified by an expression to determine a further action for
each case (i.e., respectively assigning a new variable value, including
the case in an analysis, or verifying that the data associated with a
case are correct). Assigning variable values via a spreadsheet formula
involves case-by-case calculation of values for a variable based directly
on the values of other spreadsheet variables and/or parameters not included
in the spreadsheet. For data transformations more extensive or more complex
than can be accomplished using the variable recode and spreadsheet formula
facilities provided, use Statistica Visual Basic.
Follow these guidelines
to create case selection, verification, or recode conditions.
Refer to
variables by either their numbers (e.g., v1 = 1) or their names
(e.g., Gender = 1). Note that you can type variable names in either
upper case or lower case letters (i.e., "GENDER" is equivalent
to "gender"). Note also that v0 refers to the
case number when used in expressions.
In expressions,
enclose text labels of a variable in single quotation marks (e.g., v1
= 'MALE'). Note that you can type text labels in either upper case
or lower case letters (i.e., 'YES' is equivalent to 'yes').
In expressions
or spreadsheet formulas, enclose variable names containing special characters
(e.g., spaces, plus or minus signs) in single quotation marks. If the
single quotation mark itself occurs in the variable name, use double quotation
marks instead (e.g., "A's Score"). Note that if double
quotation marks are used in the name, then the variable name must be placed
in single quotation marks.
Note that when there is
a conflict between text labels and variable names, variable names will
take precedence. Consider the following example:
Using a case selection
condition such as "v2='MALE'" with the above data set will select
only the last case. This is because the variable "MALE" will
take precedence over the variable "GENDER"'s text label. If
you intend to case select on a variable's text labels, but a conflict
with another variable's name arises, place a '$' at the end of the expression.
For example, "v2='MALE'$" will yield the following result:
Syntax conversions for spreadsheet
formulas. Spreadsheet formulas (specified in the Variable
specifications dialog box accessed by selecting Variable Specs
on the Data menu)
must start with an equal sign. When you enter a label that starts with
an equal sign, STATISTICA will assume that it is a formula and
will verify it for formal correctness. Note that a semicolon after a formula
starts a comment: e.g., = v1 + v2; this is a comment.
Missing values. The IsMD(x)
function will return a value of true (1) if the passed expression
is a missing data value. For details on how missing data values are handled
in arithmetic and logical expressions (e.g., in case selection conditions,
spreadsheet formulas), see Logical
and Arithmetic Expressions Involving Missing Data Values Always Evaluate
to FALSE or Missing Data.
Operators. A number of
arithmetic, relational, and logical operators are available for creating
expressions for recoding, case selection, and data verification or for
creating spreadsheet formulas.
Arithmetic: +,
-, *, /, ** or ^ (exponentiation),
( )
Relational:
= (equal to)
<>, >< (not equal to)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
Logical:
AND (equivalent to &)
OR (equivalent to !)
NOT (equivalent to ~)
Note that a common error
is caused by omitting parentheses needed to adjust for the precedence
of operators; for example, the expression x > 0 and x <
1 is incorrect and needs parentheses: (x > 0) and (x < 1)
because relational operators (>, <) have a lower precedence
than the conjunction (and).
Math functions. Math
functions can be used in expressions for recoding, case selection or data
verification as well as in spreadsheet formulas for calculating variable
values. If the value of any variable used in the expression or formula
is missing (in the current case), then the expression evaluates to missing
data (for the current case).
Abs(x) |
absolute value
of x |
Arccos(x) |
arc cosine of x |
Arcsin(x) |
arc sine of x |
Arctan(x) |
arc tangent of
x |
Cos(x) |
cosine of x |
CosH(x) |
hyperbolic cosine
of x |
Exp(x) |
e to the power
of x |
Hypot(x,y) |
returns hypotenuse
of x and y (square root(x2 + y2)) |
Log(x) |
natural logarithm
of x |
Log2(x) |
binary logarithm
of x |
Log10(x) |
common logarithm
of x |
Max(x,y) |
returns the greater
of x and y |
Min(x,y) |
returns the lesser
of x and y |
Rnd(x) |
random number in
the range of 0 to x |
RndNormal(x) |
random number from
a normal distribution with mu=0 and sigma=x |
RndPoisson(x) |
random number from
a Poisson distribution with parameter x |
Sign(x) |
sign of x: if
x>0 then +1, if x<0 then -1, if x = 0 remains 0 |
Sin(x) |
sine of x |
SinH(x) |
hyperbolic sine
of x |
Sqrt(x) |
square root of
x |
Tan(x) |
tangent of x |
TanH(x) |
hyperbolic tangent
of x |
Trunc(x) |
truncate x to an
integer "towards zero" |
Uniform(x) |
random value in
range 0 to x (same as Rnd(x)) |
Function names are not
case sensitive, i.e., Log(x) is the same as log(x) or LOG(x).
As indicated in the function list above, math functions accept one to
two arguments depending on the function. The position of the function
in the expression or spreadsheet formula will be replaced by the return
value of the function. Numeric values [e.g., Sqrt(155)],
variable names [e.g., Max(SCORE1,SCORE9)], or variable numbers [e.g., Log(v8)] are acceptable arguments. Additional
arguments acceptable to math functions are expressions that evaluate to
a number [e.g., Max(v7,(v5+v8-BASELINEVALUE)/3)]
or functions that return a numeric result [e.g., Sin(Sqrt(v5))]. Some commonly used constants
can also be specified in expressions and formulas by reference: e.g.,
Pi = 3.14... Euler (e) = 2.71...
Statistics functions.
Nine statistics functions that accept lists of values and/or ranges and
arguments, and adjust to missing data, are also available for use in recoding,
case selection, and verification expressions as well as in spreadsheet
formulas:
Mean(x1,
x2,..xn) |
mean of n arguments |
Median(x1,
x2,..xn) |
50th percentile
of n arguments |
Perc25(x1,
x2,..xn) |
25th percentile
of n arguments |
Perc75(x1,
x2,..xn) |
75th percentile
of n arguments |
Statmax(x1,
x2,..xn) |
maximum of n arguments |
Statmin(x1,
x2,..xn) |
minimum of n arguments |
Stdev(x1,
x2,..xn) |
standard deviation
of n arguments |
Sum(x1,
x2,..xn) |
sum of n arguments |
Validn(x1,
x2,..xn) |
number non-missing
of n arguments |
Note that the Statmin
and Statmax functions have the stat prefix to distinguish them
from the arithmetical Max and Min (math) functions discussed
above. Like math functions, the names of statistics functions are not
case sensitive when used in expressions and formulas.
All statistics functions
ignore arguments that contain missing values, basing their results on
non-missing data only. Unlike math functions, statistics functions will
not evaluate to a missing result unless all arguments are missing values
or an argument is encountered that evaluates to an undefined value (e.g.,
square root of a negative number or division by zero). Also unlike math
functions, each statistics function accepts any number of arguments placed
in parentheses and separated by commas.
The following are acceptable
statistics function arguments:
Numbers |
e.g., Mean(18,80,120,68,40) |
Variable names |
e.g., Mean(SALARY,1050,BONUS,500) |
Variable numbers |
e.g., Stdev(120,v3,v2,v8,255) |
Ranges of variables
designated by name or number (use colons to define ranges) |
e.g., Sum(v54,COST1:COST5,2550,v23:v35,1575,OVERHEAD) |
Expressions that
evaluate to a number |
e.g., Mean(500,(v6+456)/v3),TRAVELEXPENSE,v8) |
Functions that
return a numeric result |
e.g., Mean(220,Sqrt(v8+v7),v12) |
Distribution functions and their integrals. Statistica
provides a predefined broad selection of distribution functions, their
integrals and inverse distribution functions that can be used in spreadsheet
formulas and in recoding, case selection, and verification expressions
like all other functions.
For a listing of available
distributions see Spreadsheet
Formulas - Distributions and Their Functions.
Selection
Using Dates. Statistica can select cases based on dates by enclosing
the date with # symbols. For example, if you want to select the week of
January 18 through 24, 2009, for variable 6, the case selection condition
is: v6>=#1/18/2009# AND v6<=#1/24/2009#. Time functions can also
be used with this selection method, for example: v6 > #3/5/07 12:03AM#.
For more details, see Spreadsheet Case
Selection Conditions. See also: Case
Selection Conditions Overview and Case
Selection Condition Examples.
For transformations of
text variables (variables of type text), see also, Transformation
of Text Variables (Variables of Type Text). Note that Statistica
Spreadsheets also support text labels for numeric values (these are
labels "attached" to numeric values and are used for display
purposes only); when transforming the values with attached text labels,
the respective transformations are performed on the numeric representations
and not on the text labels.