2D Categorized Box Plots - Advanced Tab

Graphical Analytic Techniques

Select the Advanced tab of the 2D Categorized Box Plots Startup Panel to access the options described here.

Graph Type. Select the type of box plot to be plotted from the Graph Type list. Click the desired plot link below for a brief description of that type of graph.

Layout. Select the type of layout for the graph(s).

Separate. Select this option button to produce a Separate plot layout (where each subset of cases is displayed in a separate graph) for the categorized plots.

Overlaid. Select this option button to produce an Overlaid plot layout (where all subsets are overlaid in one graph and identified by patterns and colors) for the categorized plots.

Variables. Click this button to display a variable selection dialog box, in which you can select the X and (optional) Y grouping variables, the dependent variable(s) to be displayed in the box plots, and the grouping variable that will be used to categorize values of the dependent variable(s) in each small graph. A list of selected variables is displayed below the Variables button.

If more than one dependent variable is selected, a sequence of graphs (one for each dependent variable) will be produced using the same set of grouping variables. Note that the selected grouping variables do not have to be categorical variables (e.g., contain codes); you can use one of the methods of categorization (below) to categorize continuous variables. The selection of grouping variables is not necessary if the categories are defined via the Multiple subsets method.

X and Y categories. Select Integer mode, Unique values, or Categories to specify that method of categorization for each of the variables selected via the Change Variable button, or use the Boundaries, Codes, or Multiple subsets options. For more information about each of these methods of categorization, click on the links below:

Integer mode

Intervals. Use the options in this group box to choose the method of categorization for the box plot (i.e., for the grouping variable that will be used to categorize values of the dependent variable(s) in each small graph). Each of the methods is discussed in Methods of Categorization.

Graph Icon. The graph icon in the lower-left section of the dialog box previews the currently selected Graph Type and the values selected for the Middle Point, Box, and Whiskers.

Middle point. The options in this group box control the type of value and appearance of the middle point.

Value. The middle point can be either the Mean, Median, Mean/Median (uses the Mean as the middle point, plus it has an added marker for the Median), or Median/Mean (uses the Median as the middle point, plus it has an added marker for the Mean) of the selected variable. The options available for the Box and Whiskers depend on this selection.

Style. Use the Style box to control how the middle point is represented (by a Line or Point).

Pooled variance. This check box is available when you select Mean as the middle point Value (see above). The setting of this check box determines how the standard deviations and standard errors (for the means) are computed from grouped data. When the Pooled Variance check box is selected, Statistica computes the pooled within-group (category) variance for all groups (categories), and uses this value as an estimate of σ (sigma) when computing the standard errors for the means (see, for example, Milliken and Johnson, 1984). Specifically, the program computes the pooled within-group (category) variance as:

$$s_{pooled}^2 = \frac{1}{n-k}\left[s_1^2(n_1-1) + \dots + s_k^2(n_k-1)\right]$$

In this equation, k is the number of groups in the plot, $s_i^2$ is the variance in the i'th category or group, $n_i$ is the number of valid observations in the i'th category or group, and n is the overall number of valid observations in the plot.

The standard error of the mean for the i'th group is then computed as:

$$\text{s.e.}(\text{mean}_i) = \frac{s_{pooled}}{\sqrt{n_i}}$$
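The two formulas above can be checked by hand with a short script. The following is a minimal Python sketch, not Statistica code; the function name pooled_standard_errors and the sample data are hypothetical, and it assumes every group has at least two valid observations so that a within-group variance exists.

```python
from math import sqrt
from statistics import variance


def pooled_standard_errors(groups):
    """Return (pooled variance, list of standard errors of the mean per group)."""
    k = len(groups)                      # number of groups in the plot
    n = sum(len(g) for g in groups)      # overall number of valid observations
    # Sum of s_i^2 * (n_i - 1) over all groups (sample variance per group)
    weighted = sum(variance(g) * (len(g) - 1) for g in groups)
    pooled_var = weighted / (n - k)
    # s.e.(mean_i) = s_pooled / sqrt(n_i)
    std_errors = [sqrt(pooled_var) / sqrt(len(g)) for g in groups]
    return pooled_var, std_errors


# Hypothetical example data: two groups of different size
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
pooled_var, std_errors = pooled_standard_errors(groups)
print(pooled_var, std_errors)
```

Because every group shares the same pooled estimate, the standard errors differ between groups only through the group sizes.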

Box. If you select Median as the middle point, the range (box) can be represented by Percentiles or the Min-Max values of the selected variable, or a specified Constant value (when you want a fixed size box around the medians).

If you select Mean as the middle point, the range (box) can be defined in terms of standard deviations (Std. Dev), standard errors (Std. Error), Min-Max values of the selected variable, or a specified Constant value (when you want a fixed size box around the means).

You can also specify a Coefficient by which the selected range value is multiplied (by default, the Coefficient is 1). Note that, except for unusual applications, the default value of the coefficient (1) should not be changed if the box Value is Min-Max.
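As an illustration of how the Coefficient scales a mean-based box, here is a minimal Python sketch. It assumes that the box extends Coefficient times the selected range value (standard deviation or standard error) on either side of the mean; that symmetric reading, the function name mean_box, and the sample data are assumptions made for illustration, not a statement of Statistica's internal computation.

```python
from math import sqrt
from statistics import mean, stdev


def mean_box(values, kind="std_dev", coefficient=1.0):
    """Return (lower, upper) box edges around the mean (assumed symmetric)."""
    m = mean(values)
    if kind == "std_dev":
        half_height = stdev(values)                      # sample standard deviation
    elif kind == "std_error":
        half_height = stdev(values) / sqrt(len(values))  # unpooled standard error
    else:
        raise ValueError("kind must be 'std_dev' or 'std_error'")
    return m - coefficient * half_height, m + coefficient * half_height


# Hypothetical example data
values = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = mean_box(values, kind="std_dev", coefficient=2.0)
print(low, high)
```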

Whisker. If you select Median as the middle point, the range (whiskers) can be represented by Percentiles or the Min-Max values of the selected variable, a specified Constant value, or Non-Outlier Range (see Outliers and Extremes).

If you select Mean as the middle point, the range (whiskers) can be defined in terms of standard deviations (Std. Dev), standard errors (Std. Error), Min-Max values of the selected variable, or Non-Outlier Range.

If you select Non-Outlier Range, Statistica determines which points in the data set are outliers (see Outliers and Extremes), and then uses the highest and lowest data points that are not outliers as the ends of the whiskers in the plot.

You can also specify a Coefficient by which the selected range value will be multiplied (by default, the Coefficient is 1). In most typical applications the coefficient should be set to 1 when the value of the whisker is Min-Max or Non-Outlier Range.
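The Non-Outlier Range can be sketched in a few lines of Python. The exact outlier definition is given in Outliers and Extremes, not here, so this sketch assumes a common convention: the box spans the 25th to 75th percentiles, and a point is an outlier if it lies more than 1.5 box heights beyond a box edge. The function name, the 1.5 multiplier, and the sample data are assumptions for illustration.

```python
from statistics import quantiles


def non_outlier_whiskers(values, outlier_coef=1.5):
    """Return (lowest, highest) data points that are not outliers."""
    # Assumed box edges: 25th and 75th percentiles
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    height = q3 - q1
    lower_fence = q1 - outlier_coef * height
    upper_fence = q3 + outlier_coef * height
    # Whiskers end at the most extreme points still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


# Hypothetical example data: 50 lies far beyond the upper fence
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(non_outlier_whiskers(data))
```

In this example the value 50 is excluded, so the upper whisker ends at 9, the highest observed value that is not an outlier.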

Connect middle points. In box plots, the middle point (the Mean or Median of the selected variable, or its trimmed counterpart) is represented by the selected style (point or line; see Middle Point above). Select this check box to connect the middle points of the box plots with a line. This produces, for example, a line plot (or categorized line plot) of means with "error bars", or a line plot (or categorized line plot) of medians with quantiles and min-max range bars. Selecting the Overlaid option button (see Layout above) aligns the respective plots from each line. This format is also used in some statistical procedures to create predefined graphical output.

Trim distr. extremes. Use this box to specify the percent of cases to be "trimmed" from the extremes (i.e., tails) of the distribution of cases for the selected variable. For example, if you specify 10%, then for a variable with 100 cases, Statistica removes the 10 lowest-value cases and the 10 highest-value cases from the distribution, and plots only the 80 middle cases. If you enter a value for Trim distr. extremes for a mean-based box plot, so-called "trimmed means" will be plotted.
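The 10% example above can be reproduced with a minimal Python sketch. The function name trim_extremes is hypothetical, and how fractional case counts are rounded is not stated here; this sketch assumes they are truncated toward zero.

```python
from statistics import mean


def trim_extremes(values, percent):
    """Drop `percent` of cases from each tail and return the remaining cases."""
    ordered = sorted(values)
    # Number of cases removed from each tail (assumption: truncate fractions)
    cut = int(len(ordered) * percent / 100)
    return ordered[cut:len(ordered) - cut]


# 100 cases with values 1..100; trimming 10% removes 10 cases per tail
cases = list(range(1, 101))
kept = trim_extremes(cases, 10)
trimmed_mean = mean(kept)
print(len(kept), kept[0], kept[-1], trimmed_mean)
```

With percent set to 50 or more, nothing is left to plot, so a practical implementation would need to guard against that case.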

Outliers. You can elect to display none (select Off), only Outliers, only Extreme values, or both Outliers & Extremes in the box plot. For more details, see Outliers and Extremes.

Fit. You can fit an equation to the middle points in the box plots by selecting one of the predefined functions. If you select Off, no equation will be fitted.

Display raw data. Select this check box to display the raw data points.

Jitter. Use the options in this group box to jitter the data points, i.e., to shift each data point from its original position at the center of the graph, so that overlapping points are easier to identify and brush.

Off. If you select Off, no jitter is applied to the raw data points, outliers, and extremes.

Sequential. If you select Sequential, the jitter is applied sequentially to the raw data points, outliers, and extremes. The jitter is applied such that the first case in the data set is maximally shifted to the left and the last case is shifted maximally to the right.

Random. If you select Random, each data point is randomly shifted within the available range.

Width. With this option, you can specify the maximum jitter width, defined as a percentage of the box width. Possible percentages range from 0 to 250.
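The three jitter modes can be illustrated with a small Python sketch that computes horizontal offsets for the points in one box. Several details are assumptions made for illustration: that Width is the total span centered on the box, that Sequential spaces the cases evenly from left to right, and that Random draws uniformly within the span. The function name jitter_offsets and its parameters are hypothetical.

```python
import random


def jitter_offsets(n_points, box_width, width_percent, mode="sequential", seed=None):
    """Return one horizontal offset per case, in data-set order."""
    # Half of the total jitter span (assumption: span is centered on the box)
    half_span = box_width * width_percent / 100 / 2
    if mode == "off" or n_points == 0:
        return [0.0] * n_points
    if mode == "sequential":
        if n_points == 1:
            return [0.0]
        # First case is shifted maximally left, last case maximally right
        step = 2 * half_span / (n_points - 1)
        return [-half_span + i * step for i in range(n_points)]
    if mode == "random":
        rng = random.Random(seed)
        return [rng.uniform(-half_span, half_span) for _ in range(n_points)]
    raise ValueError("mode must be 'off', 'sequential', or 'random'")


# Five cases, box width 1.0, jitter width 100% of the box width
offsets = jitter_offsets(5, 1.0, 100, mode="sequential")
random_offsets = jitter_offsets(5, 1.0, 100, mode="random", seed=0)
print(offsets)
```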