Descriptive Statistics - Normality Tab

In the Descriptive Statistics dialog box, select the Normality tab to access options for assessing the normality of the selected variables. Note that you can use Distribution Fitting, Process Analysis, or Categorized Graphs (e.g., Q-Q, P-P plots) to fit other distributions to the variables. You can also use Survival Analysis to fit censored data.

Distribution. A variety of options are contained in this group box to create statistics, graphs, and summary results to help you explore the normality of the selected Variables.

Frequency tables. Click this button to produce a cascade of spreadsheets with the frequency distributions for the selected variables (one spreadsheet per variable). The manner in which the selected variables are categorized depends on the selections made in the Categorization group box. An extensive selection of categorization methods and frequency table statistics is available in the Frequency Tables analysis.

Histograms. Click this button to produce a cascade of histograms, analogous to the Frequency tables. If the Normal expected frequencies check box is selected, the histograms also display the normal curve, superimposed over the observed frequencies.

Categorization. The options selected in this group box only affect the frequency tables and histograms produced via the options in the Distribution group box. There are two modes available for categorizing the values of selected variables for the frequency table; a number of additional options, graphs, and statistics are available via the Frequency tables analysis.

Number of intervals. Select this option button to divide the range of values for the selected variables into approximately the specified number of intervals (entered in the corresponding edit field) for subsequent frequency table spreadsheets or histograms. This option is appropriate when the variables to be tabulated are continuous in nature. The tests of normality (Normal expected frequencies, Kolmogorov-Smirnov & Lilliefors test for normality, and Shapiro-Wilk W test) are only available if this option is selected. Note that the actual number of categories that are produced may sometimes differ from the number of intervals requested. Statistica produces neat intervals; that is, interval boundaries and widths with the last digit being 1, 2, or 5 (e.g., 10.5, 11.0, 11.5, etc.). Such simple or neat intervals are more easily interpreted than interval boundaries defined by many significant digits (e.g., 10.12423, 10.13533, etc.). Full control over the method of categorization of variables is available via the Frequency tables analysis.
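The idea of neat intervals can be illustrated with a minimal Python sketch. Note that this is not the algorithm Statistica uses; the function names (neat_step, neat_edges) and the rule of picking the nearest step of the form 1, 2, or 5 times a power of ten are assumptions made for illustration only. The sketch also assumes that the largest value is greater than the smallest.

```python
import math

def neat_step(lo, hi, n_intervals):
    """Pick a step of the form m * 10**k (m in 1, 2, 5, 10) nearest to the raw width."""
    raw = (hi - lo) / n_intervals                # width if the range were split exactly
    power = 10 ** math.floor(math.log10(raw))    # largest power of ten not above raw
    candidates = [m * power for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda step: abs(step - raw))

def neat_edges(values, n_intervals):
    """Return interval boundaries that are whole multiples of the neat step."""
    lo, hi = min(values), max(values)
    step = neat_step(lo, hi, n_intervals)
    start = math.floor(lo / step) * step         # first boundary at or below the minimum
    edges = []
    i = 0
    while True:
        edge = round(start + i * step, 10)       # rounding suppresses float drift
        edges.append(edge)
        if edge > hi:                            # stop once the maximum is covered
            break
        i += 1
    return edges

sample = [10.12, 10.9, 11.4, 11.6, 11.8, 12.0, 12.1, 12.3, 12.7, 13.9]
edges = neat_edges(sample, 10)
print(edges)
print("intervals produced:", len(edges) - 1)
```

In this sketch, requesting 10 intervals for values between 10.12 and 13.9 yields a step of 0.5 and 8 intervals, which shows why the number of categories produced can differ from the number requested.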

Integer intervals (categories). Select this method of categorization if the variables to be tabulated can be interpreted as integer categories, or contain only integer values. If this method is selected, all non-integer values are ignored when producing Frequency tables or Histograms; the choice of categorization in this group box does not affect other computations (e.g., Detailed descriptive statistics on the Advanced or Quick tabs).
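The effect of integer categorization on a frequency table can be sketched in Python as follows. This is only an illustration of the rule described above (non-integer values are ignored); the function name integer_frequency_table is hypothetical and is not part of Statistica.

```python
from collections import Counter

def integer_frequency_table(values):
    """Count integer-valued observations; non-integer values are skipped."""
    integer_values = [int(v) for v in values if float(v).is_integer()]
    counts = Counter(integer_values)
    return sorted(counts.items())                # (category, count) pairs in order

# 2.5 and 4.7 are ignored; 3.0 is treated as the integer category 3.
table = integer_frequency_table([1, 2, 2, 3.0, 2.5, 3, 3, 4.7])
print(table)
```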

Normal expected frequencies. This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Normal expected frequencies check box is selected, subsequent spreadsheets contain the expected normal frequencies (cumulative frequencies and relative frequencies) for each category. The Histograms option displays the normal curve superimposed over the observed frequencies. A wide variety of non-normal distributions can be fit to observed data in the Distribution Fitting module; specialized distributions for survival and reliability studies are available in the Survival Analysis module.
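As an illustration of how expected normal frequencies can be obtained for a set of intervals, consider the following minimal Python sketch. It assumes that the normal distribution is fitted with the sample mean and sample standard deviation, and that each interval includes its upper boundary but not its lower boundary; both choices, as well as the function name, are assumptions and may differ from what Statistica does.

```python
from statistics import NormalDist, mean, stdev

def normal_expected_frequencies(values, edges):
    """Return (lower, upper, observed, expected) for each interval."""
    fitted = NormalDist(mean(values), stdev(values))   # normal curve fitted to the data
    n = len(values)
    rows = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Assumed convention: lower < x <= upper.
        observed = sum(1 for v in values if lower < v <= upper)
        # Expected count = n times the normal probability of the interval.
        expected = n * (fitted.cdf(upper) - fitted.cdf(lower))
        rows.append((lower, upper, observed, expected))
    return rows

sample = [10.12, 10.9, 11.4, 11.6, 11.8, 12.0, 12.1, 12.3, 12.7, 13.9]
rows = normal_expected_frequencies(sample, [10.0, 11.0, 12.0, 13.0, 14.0])
for lower, upper, observed, expected in rows:
    print(f"{lower:5.1f} < x <= {upper:5.1f}  observed={observed}  expected={expected:.2f}")
```

The expected frequencies sum to slightly less than the number of cases because the fitted normal curve also assigns some probability to values outside the tabulated range.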

Kolmogorov-Smirnov & Lilliefors test for normality. This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Kolmogorov-Smirnov & Lilliefors test for normality check box is selected, subsequent frequency spreadsheets include the results of the Kolmogorov-Smirnov one-sample test of normality. If the D statistic is significant, the hypothesis that the respective distribution is normal should be rejected. Two probability (significance) values are reported for each Kolmogorov-Smirnov D. The first is based on the probability values tabulated by Massey (1951); those values pertain to cases in which the mean and standard deviation of the normal distribution are known a priori and not estimated from the data. The second is the Lilliefors probability (Lilliefors, 1967). Because the mean and standard deviation are typically computed from the actual data, the test for normality then involves a complex conditional hypothesis ("how likely is it to obtain a D statistic of this magnitude or greater, contingent upon the mean and standard deviation computed from the data"), and in that case the Lilliefors probabilities should be interpreted. Note that, in recent years, the Shapiro-Wilk W test has become the preferred test of normality because of its good power properties as compared to a wide range of alternative tests (see Shapiro, Wilk, & Chen, 1968).
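The D statistic itself is the largest vertical distance between the empirical cumulative distribution of the data and the cumulative normal distribution. The following Python sketch computes D with the mean and standard deviation estimated from the data (the situation to which the Lilliefors probabilities apply). It does not compute either probability value, since those depend on tabulated or approximated critical values; the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_d_statistic(values):
    """Kolmogorov-Smirnov D against a normal curve fitted to the same data."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))     # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical distribution jumps from (i - 1) / n to i / n at x,
        # so the distance is checked on both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

sample = [10.12, 10.9, 11.4, 11.6, 11.8, 12.0, 12.1, 12.3, 12.7, 13.9]
print(f"D = {ks_d_statistic(sample):.4f}")
```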

Shapiro-Wilk W test. This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Shapiro-Wilk W test check box is selected, subsequent frequency spreadsheets include the results of the Shapiro-Wilk W test of normality. If the W statistic is significant, the hypothesis that the respective distribution is normal should be rejected. The Shapiro-Wilk W test is the preferred test of normality because of its good power properties as compared to a wide range of alternative tests (see Shapiro, Wilk, & Chen, 1968). The algorithm implemented in Statistica employs an extension to the test described in Royston (1992), which allows the test to be applied to samples with up to 5,000 observations; if there are more than 5,000 observations, this test cannot be performed.

3D histograms, bivariate distributions. Click this button to produce a cascade of 3D histograms for selected pairs of variables, one plot per pair. You are first prompted to select two lists of variables (from among those originally selected via the Variables button) via a variable selection dialog box. 3D bivariate histograms are produced for each variable in the first list with each variable in the second list.

Categorized histograms. Click this button to produce a cascade of categorized histograms for the selected variables, one plot per selected variable. You are first prompted to select up to two categorical variables via a variable selection dialog box. For each variable selected via the Variables button, Statistica produces a histogram, broken down (categorized) by the categorical variables. More complex breakdown analyses and graphs are available via the Breakdown & one-way ANOVA analysis.

Stem and leaf. The stem and leaf plot (Tukey, 1972) is an alternative to the histogram. Like the histogram, it is produced for the selected variables.

Stem & leaf plot. Click this button to produce a stem and leaf plot. In this plot, each stem represents an interval, just as in a regular histogram. However, whereas the histogram uses a vertical bar to indicate the number of cases that fall into the respective interval, the stem and leaf plot shows the actual values as leaves of the stem. The cases of the stem and leaf plot are displayed in the following format: stem°leaf. Hence, if leaf unit is 1.000000 and stem and leaf value is 7° 000038, this means that for the specified variable, there are four 7.0 values, one 7.3 value, and one 7.8 value.
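The way values map to stems and leaves in the example above can be sketched in Python. This minimal illustration assumes non-negative values, takes the integer part of each value as the stem and the first decimal digit as the leaf, and rounds values to one decimal place; it is not Statistica's implementation, and the function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one 'stem°leaves' string per stem for non-negative values."""
    groups = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)                   # value expressed in tenths
        stem, leaf = divmod(tenths, 10)          # integer part, first decimal digit
        groups[stem].append(str(leaf))
    return [f"{stem}°{''.join(leaves)}" for stem, leaves in sorted(groups.items())]

lines = stem_and_leaf([7.0, 7.8, 7.0, 7.3, 7.0, 7.0, 8.1])
for line in lines:
    print(line)
```

Here the four 7.0 values, the 7.3 value, and the 7.8 value all share the stem 7, so they appear together on one line as the leaves 0, 0, 0, 0, 3, and 8.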

Compressed. Select this check box to reduce the number of intervals created for the stem and leaf plot. When Compressed is selected, fewer intervals are displayed on the stem and leaf plot.