2D Box Plots - Advanced Tab

Graphical Analytic Techniques

Select the Advanced tab of the 2D Box Plots Startup Panel to access a greater variety of options to create a box plot.

Graph type. Select the type of box plot to be plotted from the Graph type list. Click the desired plot link below for a brief description of that type of graph.

You can also choose between two types of graph formats. Click the desired graph format link below to learn more about it.

Variables. Click the Variables button to display the standard variable selection dialog box, in which you select the grouping (category) and dependent variables for the graph. If you select a category variable and more than one dependent variable for Regular box plots (see the graph Format, above), a sequence of graphs (one for each dependent variable) is produced; if the format is set to Multiple, box plots for all selected variables are combined in one graph.

Note that if you select only one dependent variable and no category variable, a single box plot representing the distribution of the dependent variable is produced. If you select multiple dependent variables and no category variable, a single graph with box plots for each selected variable is produced. The latter plot is useful for comparing the distribution of several variables.

Grouping intervals. Use the options under Grouping intervals to select a method of categorization for the selected variable(s). Each of the methods is discussed in Method of Categorization.

Change Variable. Click the Change Variable button to display a standard variable selection dialog box in which you can change the selection of the Dependent variable(s) and/or the Grouping variable. If you change the variable(s) using this button, the variable selection displayed under the Variables button (see above) changes accordingly.

Fit. You can choose to fit an equation to the mid-points in the box plots by selecting one of the predefined functions under Fit.

Graph icon. The graph icon in the middle of the tab represents the currently selected Graph Type and the selection of values for the Middle Point, Box, and Whiskers. The graph icon previews these three selections and the specific statistics that define the current box plot.

Middle point. The options under Middle point control the type of value and appearance of the middle point.

Value. The middle point can be either the Mean, Median, Mean/Median (uses the Mean as the middle point, plus it has an added marker for the Median), or Median/Mean (uses the Median as the middle point, plus it has an added marker for the Mean) of the selected variable. The options available for the Box and Whiskers depend on this selection.

Style. Use the Style box to control how the middle point is represented (by a Line or Point).

Pooled Variance. This check box is available when you select Mean as the middle point Value (see above). The setting of this check box determines how the standard deviations and standard errors (for the means) are computed from grouped data. When the Pooled Variance check box is selected, STATISTICA computes the pooled within-group (category) variance for all groups (categories), and uses this value as an estimate of σ (sigma) when computing the standard errors for the means (see, for example, Milliken and Johnson, 1984). Specifically, the program computes the pooled within-group (category) variance as:

$s_{pooled}^2 = \frac{1}{n-k} \left[ s_1^2 (n_1 - 1) + \dots + s_k^2 (n_k - 1) \right]$

In this equation, $k$ is the number of groups in the plot, $s_i^2$ is the variance in the i'th category or group, $n_i$ is the number of valid observations in the i'th category or group, and $n$ is the overall number of valid observations in the plot.

The standard error of the mean for the i'th group is then computed as:

$s.e.(mean_i) = s_{pooled} / \sqrt{n_i}$
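The pooled-variance computation above can be sketched in a few lines of Python. This is an illustrative sketch only, not STATISTICA code: the function name pooled_standard_errors and the sample data are hypothetical, and each group is assumed to contain at least two valid observations.

```python
import math
from statistics import variance

def pooled_standard_errors(groups):
    """Return (pooled variance, list of standard errors of the mean).

    `groups` is a list of lists of valid observations, one list per
    category. Hypothetical helper for illustration only.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # overall number of valid observations
    # Sum of s_i^2 * (n_i - 1) over all groups (sample variances).
    weighted = sum(variance(g) * (len(g) - 1) for g in groups)
    s2_pooled = weighted / (n - k)
    s_pooled = math.sqrt(s2_pooled)
    # Standard error for group i: s_pooled / sqrt(n_i)
    std_errors = [s_pooled / math.sqrt(len(g)) for g in groups]
    return s2_pooled, std_errors

# Made-up example data with two groups of different sizes.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
s2_pooled, std_errors = pooled_standard_errors(groups)
```

Because every group shares the same pooled estimate, the standard errors differ between groups only through the group sizes.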

Multiple box layout. When the Multiple box plot style is selected for the graph (see Graph Type - Format above), you can choose to display the boxes, whiskers, columns or box-whiskers in one of two styles:

Overlaid. Series of box plots are displayed one on top of the other.

Shifted. Series of box plots are displayed side by side.

Trim distrib. extremes. Use this box to specify the percent of cases to be "trimmed" from each extreme (i.e., tail) of the distribution of cases for the selected variable. For example, if you specify 10% for a variable with 100 cases, STATISTICA removes the 10 cases with the lowest values and the 10 cases with the highest values from the distribution, and plots only the 80 middle cases. If you enter a value for Trim distrib. extremes for a mean-based box plot, so-called "trimmed means" will be plotted.
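The trimming in the 10% example can be illustrated with a short Python sketch. The function name trim_extremes is hypothetical, and rounding the number of trimmed cases down when the percentage does not yield a whole number is an assumption; the text above does not say how STATISTICA handles that case.

```python
from statistics import mean

def trim_extremes(values, percent):
    """Drop `percent` percent of the cases from each tail of the distribution.

    Hypothetical helper for illustration; the number of cases cut per
    tail is rounded down (an assumption).
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be in the range [0, 50)")
    ordered = sorted(values)
    cut = int(len(ordered) * percent / 100)   # cases removed from each tail
    return ordered[cut:len(ordered) - cut]

cases = list(range(1, 101))          # 100 made-up cases: 1, 2, ..., 100
kept = trim_extremes(cases, 10)      # removes the 10 lowest and 10 highest
trimmed_mean = mean(kept)            # mean of the 80 middle cases
```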

Statistics. You can choose to include a variety of statistics as footnotes in the graph by selecting one or more of the statistics listed under Statistics.

Kruskal-Wallis. Select this check box to include the Kruskal-Wallis Test statistic as a footnote on the graph.

F test and p (ANOVA). Select this check box to include the F statistic and its p value as a footnote on the graph.

Box. If you selected the Median as the Middle point, the range (box) can be represented by Percentiles or the Min-Max values of the selected variable, or a specified Constant value (when you want a fixed size box around the medians).

If you selected Mean as the Middle point, the range (box) can be defined in terms of standard deviations (Std. Dev), standard errors (Std. Error), Min-Max values of the selected variable, or a specified Constant value (when you want a fixed size box around the means).

You can also specify a Coefficient by which the selected range value is multiplied (by default, the Coefficient is 1). Note that, except for unusual applications, the default value of the coefficient (1) should not be changed if the box Value is Min-Max. If Median is specified as the Middle point and Percentiles is specified as the Value, the Coefficient entered must be between 0.01 and 50.0. If Mean is specified as the Middle point and Conf. Interval is specified as the Value, the Coefficient entered must be between 0.15 and 0.9999.

Whisker. If you selected the Median as the Middle point, the range (whiskers) can be represented by Percentiles or the Min-Max values of the selected variable, a specified Constant value, or Non-outlier range (see Outliers and extremes).

If you selected Mean as the Middle point, the range (whiskers) can be defined in terms of standard deviations (Std. Dev), standard errors (Std. Error), or Min-Max values of the selected variable, or Non-Outlier Range.

If you select Non-Outlier Range, STATISTICA determines which points in the data set are outliers (see Outliers and extremes), and then uses the highest and lowest data points which are closest to the outliers (but are not outliers) as the whiskers in the plot.

You can also specify a Coefficient by which the selected range value will be multiplied (by default, the Coefficient is 1). In most typical applications the coefficient should be set to 1 when the value of the whisker is Min-Max or Non-Outlier Range.
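The Non-Outlier Range whiskers described above can be sketched as follows. The outlier rule used here (a point lying more than 1.5 box heights beyond the box, with the box spanning the quartiles) is a common convention and is an assumption; the rule STATISTICA actually applies is defined in Outliers and extremes. The function name non_outlier_range is hypothetical.

```python
from statistics import quantiles

def non_outlier_range(values, coefficient=1.5):
    """Return (low, high) whisker ends: the extreme points that are not outliers.

    Assumes the box spans the 25th to 75th percentiles and that a point is
    an outlier when it lies more than `coefficient` box heights beyond the
    box. Hypothetical helper for illustration only.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_height = q3 - q1
    lower_fence = q1 - coefficient * box_height
    upper_fence = q3 + coefficient * box_height
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Whiskers end at the most extreme data points that are not outliers.
    return min(inside), max(inside)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # made-up data; 50 is far from the rest
whiskers = non_outlier_range(data)
```

In this sketch the value 50 falls outside the upper fence, so the upper whisker stops at the largest remaining data point rather than at the maximum.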

Outliers. You can elect to display none (select Off), only Outliers, only Extreme values, or both Outliers & Extremes in the box plot. For more details, see Outliers and Extremes.

Connect middle points. Select this check box to connect the middle points of the box plots with a line. The middle point is the Mean or Median (or the trimmed Mean or Median) of the selected variable, drawn in the selected style (point or line; see Middle point above). You can use this option to produce, for example, a line plot (or categorized line plot) of means with "error bars," or a line plot (or categorized line plot) of medians with quantiles and min-max range bars. Selecting the Overlaid option button in the Multiple box layout group box aligns the respective plots from each line. This format is also used in some statistical procedures to create predefined graphical output.

Display raw data. Select this check box to display the raw data points.

Jitter. Use the options in this group box to jitter the data points, i.e., modify the original position of the data point from the center of the graph in order to more easily identify/brush overlapping points.

Off. If you select Off, no jitter is applied to the raw data points, outliers, and extremes.

Sequential. If you select Sequential, the jitter is applied sequentially to the raw data points, outliers, and extremes. The jitter is applied such that the first case in the data set is maximally shifted to the left and the last case is shifted maximally to the right.

Random. If you select Random, the data point is randomly shifted within the available range.  

Width. With this option, you can specify the maximum jitter width, defined as a percentage of the box width. Possible percentages range from 0 to 250.
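The three jitter modes and the Width setting can be sketched as horizontal offsets added to each point's position. This is an illustrative sketch, not STATISTICA's implementation: the function name jitter_offsets is hypothetical, and it assumes that the jitter range is centered on the box and that sequential jitter spaces the cases evenly from the leftmost to the rightmost position.

```python
import random

def jitter_offsets(n_points, box_width, width_percent, mode="sequential", rng=None):
    """Return one horizontal offset per data point, in case order.

    `width_percent` is the maximum jitter width as a percentage of the box
    width (0 to 250). Hypothetical helper; centering and even spacing are
    assumptions.
    """
    if not 0 <= width_percent <= 250:
        raise ValueError("width_percent must be between 0 and 250")
    half = box_width * width_percent / 100.0 / 2.0
    if mode == "off":
        return [0.0] * n_points
    if mode == "sequential":
        if n_points < 2:
            return [0.0] * n_points
        step = 2.0 * half / (n_points - 1)
        # First case is shifted fully left, last case fully right.
        return [-half + i * step for i in range(n_points)]
    if mode == "random":
        rng = rng or random.Random()
        # Each point is shifted randomly within the available range.
        return [rng.uniform(-half, half) for _ in range(n_points)]
    raise ValueError(f"unknown jitter mode: {mode!r}")

sequential = jitter_offsets(5, box_width=1.0, width_percent=50)
randomized = jitter_offsets(5, 1.0, 50, mode="random", rng=random.Random(0))
```

Passing a seeded random.Random instance makes the random mode repeatable, which is convenient when the same plot has to be regenerated.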