Distribution Fitting Startup Panel and Quick Tab

Ribbon bar. Select the Statistics tab. In the Base group, click Distribution Fitting to display the Distribution Fitting Startup Panel.

Classic menus. From the Statistics menu, select Distribution Fitting to display the Distribution Fitting Startup Panel.

The Startup Panel contains one tab: Quick. Use this module to fit various distributions to your data.

See also the Distribution Fitting Index, Overviews, and Example.

Continuous Distributions. Select the Continuous Distributions option button to fit a continuous distribution to your data. Select from the list of distributions, which are described in detail below.

Normal. Select Normal to fit the normal distribution to your data. The normal distribution (the "bell-shaped curve," which is symmetrical about the mean) is a theoretical function commonly used in inferential statistics as an approximation to sampling distributions (see also elementary concepts). In general, the normal distribution provides a good model for a random variable when:

  1. There is a strong tendency for the variable to take a central value;

  2. Positive and negative deviations from this central value are equally likely;

  3. The frequency of deviations falls off rapidly as the deviations become larger.

As an underlying mechanism that produces the normal distribution, one may think of an infinite number of independent random (binomial) events that bring about the values of a particular variable. For example, there are probably a nearly infinite number of factors that determine a person's height (thousands of genes, nutrition, diseases, etc.). Thus, height can be expected to be normally distributed in the population.
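The mechanism described above can be illustrated with a small simulation. This is a minimal sketch in Python (standard library only) and is not part of the module. The function name simulate_sums, the 100 binary events per value, the success probability of 0.5, and the 2,000 simulated values are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sums(n_events=100, p=0.5, n_samples=2000, seed=0):
    """Each simulated value counts the successes among n_events independent binary events."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_events) if rng.random() < p)
        for _ in range(n_samples)
    ]

sums = simulate_sums()
mean = statistics.fmean(sums)
sd = statistics.pstdev(sums)

# Theoretical values for 100 events with p = 0.5: mean n*p, sd sqrt(n*p*(1-p)).
print(f"mean = {mean:.2f} (theory {100 * 0.5:.2f})")
print(f"sd   = {sd:.2f} (theory {(100 * 0.5 * 0.5) ** 0.5:.2f})")
```

The simulated mean and standard deviation should land close to the theoretical values of the approximating normal distribution, and a histogram of the values would show the familiar bell shape.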

The normal distribution function is determined by the following formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, e is the base of the natural logarithm, sometimes called Euler's e (2.71...), and $\pi$ is the constant Pi (3.14...).
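As a rough illustration of the formula, the sketch below evaluates the normal density in Python and estimates its two parameters from a sample. The names normal_pdf, mu, and sigma (for the mean and standard deviation) and the data values are hypothetical. Using the sample mean and sample standard deviation as estimates is an assumption, since the estimation method used by the module is not described here.

```python
import math
import statistics

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma (sigma > 0)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# Hypothetical measurements, invented for illustration only.
data = [4.8, 5.1, 5.0, 4.7, 5.4, 5.2, 4.9, 5.3]

# Assumed estimators: sample mean and sample standard deviation (n - 1 divisor).
mu_hat = statistics.fmean(data)
sigma_hat = statistics.stdev(data)

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"fitted density at the mean: {normal_pdf(mu_hat, mu_hat, sigma_hat):.3f}")
```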

Rectangular. Select Rectangular to fit the rectangular distribution to your data. The rectangular distribution is useful for describing random variables with a constant probability density over a defined range from a to b (a < b):

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{elsewhere} \end{cases}$$

where a and b are constants with a < b.
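The piecewise definition translates directly into code. The following Python sketch is illustrative only: the function name rectangular_pdf and the data are hypothetical, and taking the smallest and largest observations as estimates of a and b is an assumption rather than a description of what the module does.

```python
def rectangular_pdf(x, a, b):
    """Constant density 1/(b - a) on a < x < b, and 0 elsewhere."""
    if not a < b:
        raise ValueError("the range requires a < b")
    return 1.0 / (b - a) if a < x < b else 0.0

# Hypothetical observations, invented for illustration only.
data = [2.3, 4.1, 3.7, 5.9, 2.8, 4.6]

# Assumed estimates of the range: the smallest and largest observed values.
a_hat, b_hat = min(data), max(data)

print(f"a_hat = {a_hat}, b_hat = {b_hat}")
print(f"density inside the range: {rectangular_pdf(4.0, a_hat, b_hat):.4f}")
```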

Exponential. Select Exponential to fit the exponential distribution to your data. If T is the time between occurrences of rare events that happen on average at a rate of $\lambda$ per unit of time, then T is distributed exponentially with parameter $\lambda$ (lambda). Thus, the exponential distribution is frequently used to model the time interval between successive random events. Examples of variables distributed in this manner would be the gap length between cars crossing an intersection, lifetimes of electronic devices, or arrivals of customers at the check-out counter in a grocery store.

The exponential distribution function is defined as:

$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty, \quad \lambda > 0$$

where $\lambda$ (lambda) is an exponential function parameter (an alternative parameterization is scale parameter $b = 1/\lambda$), and e is the base of the natural logarithm, sometimes called Euler's e (2.71...).
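To make the rate parameter concrete, here is a short Python sketch that evaluates the exponential density and estimates the rate from a set of gap times. The names exponential_pdf and lam (for lambda) and the data are hypothetical. The estimate shown is the maximum likelihood estimate, one over the sample mean, which is assumed here and not confirmed as the module's method.

```python
import math
import statistics

def exponential_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Hypothetical gaps (in seconds) between successive events.
gaps = [0.4, 1.6, 0.9, 2.1, 0.5, 0.5]

# Assumed estimator: maximum likelihood gives lam_hat = 1 / (sample mean).
lam_hat = 1.0 / statistics.fmean(gaps)
scale_hat = 1.0 / lam_hat  # alternative parameterization: scale b = 1 / lam

print(f"lam_hat = {lam_hat:.3f}, scale_hat = {scale_hat:.3f}")
print(f"fitted density at x = 1: {exponential_pdf(1.0, lam_hat):.3f}")
```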

Gamma. Select Gamma to fit the Gamma distribution to your data. The probability density function of the exponential distribution has a mode of zero. In many instances, it is known a priori that the mode of the distribution of a particular random variable of interest is not equal to zero (e.g., when modeling the distribution of the lifetimes of a product such as an electric light bulb, or the serving time taken at a ticket booth at a baseball game). In those cases, the Gamma distribution is more appropriate for describing the underlying distribution.

The Gamma distribution is defined as:

$$f(x) = \frac{1}{b\,\Gamma(c)} \left(\frac{x}{b}\right)^{c-1} e^{-x/b}, \quad 0 \le x, \quad c > 0$$

where $\Gamma$ (Gamma) is the Gamma function, c is the shape parameter, b is the scale parameter, and e is the base of the natural logarithm, sometimes called Euler's e (2.71...).
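A brief Python sketch may help show how the shape and scale parameters interact. The function name gamma_pdf and the lifetime data are hypothetical. The parameter estimates use the method of moments (mean = c*b, variance = c*b^2), which is an assumption made for simplicity and may differ from the method the module uses.

```python
import math
import statistics

def gamma_pdf(x, b, c):
    """Gamma density with scale b and shape c, evaluated for x > 0.

    Returns 0.0 for x <= 0; the boundary point x = 0 is not treated
    specially in this sketch.
    """
    if b <= 0 or c <= 0:
        raise ValueError("b and c must be positive")
    if x <= 0:
        return 0.0
    return (x / b) ** (c - 1.0) * math.exp(-x / b) / (b * math.gamma(c))

# Hypothetical lifetimes (in thousands of hours).
lifetimes = [1.2, 0.8, 1.9, 1.4, 1.1, 2.3, 1.6, 0.9]

# Assumed estimators (method of moments): mean = c * b, variance = c * b**2.
mean = statistics.fmean(lifetimes)
var = statistics.variance(lifetimes)
b_hat = var / mean
c_hat = mean * mean / var

print(f"b_hat = {b_hat:.3f}, c_hat = {c_hat:.3f}")
# For c > 1 the mode is (c - 1) * b, which is greater than zero.
print(f"mode of fitted density = {(c_hat - 1.0) * b_hat:.3f}")
```

With a shape parameter of 1 the density reduces to the exponential density with rate 1/b, whose mode is zero; a shape parameter above 1 moves the mode away from zero.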

Log-normal. Select Log-normal to fit the log-normal distribution to your data. The log-normal distribution is often used in simulations of variables such as personal incomes, age at first marriage, or tolerance to poison in animals. In general, if x is a sample from a normal distribution, then $y = e^{x}$ is a sample from a log-normal distribution.

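Thus, the log-normal distribution is defined as:

$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-\frac{[\ln(x)-\mu]^{2}}{2\sigma^{2}}}, \quad 0 < x < \infty, \quad \sigma > 0$$

where $\mu$ is the location parameter, $\sigma$ is the scale parameter, and e is the base of the natural logarithm, sometimes called Euler's e (2.71...).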
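Because the logarithm of a log-normal variable is normally distributed, the location and scale parameters can be estimated from the logged data. The Python sketch below assumes the standard log-normal density and uses hypothetical names (lognormal_pdf, mu, sigma) and invented income figures. The estimators shown are an assumption, not a statement about the module's internals.

```python
import math
import statistics

def lognormal_pdf(x, mu, sigma):
    """Standard log-normal density with location mu and scale sigma, for x > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Hypothetical incomes (in thousands), invented for illustration only.
incomes = [21.0, 35.5, 28.0, 52.0, 40.5, 95.0, 31.0, 24.5]

# Assumed estimators: mean and standard deviation of the natural logs.
logs = [math.log(v) for v in incomes]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.stdev(logs)

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"fitted density at 30: {lognormal_pdf(30.0, mu_hat, sigma_hat):.4f}")
```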

Chi-square. Select Chi-square to fit the Chi-square distribution to your data. The sum of $\nu$ independent squared random variables, each distributed following the standard normal distribution, is distributed as Chi-square with $\nu$ degrees of freedom. This distribution is most frequently used in the modeling of random variables (e.g., representing frequencies) in statistical applications.

The Chi-square distribution is defined by:

$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2}, \quad \nu = 1, 2, \ldots, \quad 0 < x$$

where $\nu$ is the degrees of freedom, e is the base of the natural logarithm, sometimes called Euler's e (2.71...), and $\Gamma$ (Gamma) is the Gamma function.
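The definition as a sum of squared standard normal variables can be checked by simulation. In this Python sketch, nu stands for the degrees of freedom; the function names chi_square_pdf and simulate_chi_square, the seed, and the sample size are hypothetical choices made for illustration.

```python
import math
import random
import statistics

def chi_square_pdf(x, nu):
    """Chi-square density with nu degrees of freedom, for x > 0."""
    if nu < 1:
        raise ValueError("nu must be at least 1")
    if x <= 0:
        return 0.0
    half = nu / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

def simulate_chi_square(nu, n_samples=5000, seed=0):
    """Each draw is the sum of nu squared standard normal values."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        for _ in range(n_samples)
    ]

draws = simulate_chi_square(nu=3)
# The mean of a Chi-square variable equals its degrees of freedom.
print(f"simulated mean = {statistics.fmean(draws):.2f} (theory 3.00)")
print(f"density at x = 2 with nu = 3: {chi_square_pdf(2.0, 3):.4f}")
```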

Note: See Survival Analysis for Weibull and Gompertz distributions (for complete/censored samples). Distribution and function fitting is also available in the Process Analysis, GLZ, Nonlinear Estimation, and Time Series modules, as well as from the Graphs tab or menu.

Discrete Distributions. Select the Discrete Distributions option button to fit a discrete distribution to your data. Select from the list of distributions, which are described in detail below.

Binomial. Select Binomial to fit the binomial distribution to your data. The binomial distribution is useful for describing distributions of binomial events, such as the number of males and females in a random sample of companies, or the number of defective components in samples of 20 units taken from a production process.

The binomial distribution is defined as:

$$f(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

where p is the probability that the respective event will occur, q is equal to 1-p, and n is the maximum number of independent trials.
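The defective-components example can be sketched in Python as follows. The function name binomial_pmf and the defect counts are hypothetical, and estimating p as the mean count divided by n is an assumption (it is the usual maximum likelihood estimate when n is known), not a description of the module's procedure.

```python
import math
import statistics

def binomial_pmf(x, n, p):
    """Probability of exactly x events in n independent trials with event probability p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1.0 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n = 20  # units per sample, as in the example in the text
# Hypothetical numbers of defective components found in eight samples.
defects = [1, 0, 2, 1, 0, 1, 3, 0]

# Assumed estimator: average count divided by the number of trials.
p_hat = statistics.fmean(defects) / n

print(f"p_hat = {p_hat:.3f}")
print(f"P(no defects in a sample) = {binomial_pmf(0, n, p_hat):.3f}")
```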

Poisson. Select Poisson to fit the Poisson distribution to your data. The Poisson distribution is sometimes referred to as the distribution of rare events. Examples of Poisson-distributed variables are the number of accidents per person, the number of sweepstakes won per person, or the number of catastrophic defects found in a production process.

The Poisson distribution is defined as:

$$f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \quad 0 < \lambda$$

where $\lambda$ (lambda) is the expected value of x (the mean), and e is the base of the natural logarithm, sometimes called Euler's e (2.71...).
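Since the single parameter is the expected value of x, a natural estimate is the sample mean. The Python sketch below follows that idea; the names poisson_pmf and lam (for lambda) and the accident counts are hypothetical, and the choice of estimator is an assumption.

```python
import math
import statistics

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number of events is lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        return 0.0
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Hypothetical numbers of accidents recorded for ten people.
accidents = [0, 1, 0, 2, 0, 0, 1, 0, 3, 1]

# Assumed estimator: the sample mean estimates lam.
lam_hat = statistics.fmean(accidents)

print(f"lam_hat = {lam_hat:.2f}")
for x in range(4):
    print(f"P(x = {x}) = {poisson_pmf(x, lam_hat):.3f}")
```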

Geometric. Select Geometric to fit the geometric distribution to your data. If independent Bernoulli trials are made until a "success" occurs, then the total number of trials required is a geometric random variable.

The geometric distribution is defined as:

$$f(x) = p\,(1-p)^{x-1}, \quad x = 1, 2, \ldots$$

where p is the probability that a particular event (e.g., success) will occur.
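The following Python sketch assumes that x counts the total number of trials up to and including the first success, so that x = 1, 2, ... and the probability of x is p*(1-p)^(x-1). If x instead counted the failures before the first success, it would start at 0 and the exponent would be x. The function name geometric_pmf, the data, and the estimator (one over the sample mean) are hypothetical choices for illustration.

```python
import statistics

def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    if x < 1:
        return 0.0
    return p * (1.0 - p) ** (x - 1)

# Hypothetical numbers of trials needed to reach the first success.
trials_needed = [1, 3, 2, 1, 5, 2, 1, 4, 2, 4]

# Assumed estimator: under this parameterization the mean is 1/p.
p_hat = 1.0 / statistics.fmean(trials_needed)

print(f"p_hat = {p_hat:.2f}")
print(f"P(success on first trial) = {geometric_pmf(1, p_hat):.2f}")
```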

Bernoulli. Select Bernoulli to fit the Bernoulli distribution to your data. This distribution best describes all situations where a "trial" is made resulting in either "success" or "failure," such as when tossing a coin or when modeling the success or failure of a surgical procedure.

The Bernoulli distribution is defined as:

$$f(x) = p^{x}(1-p)^{1-x}, \quad x \in \{0, 1\}$$

where p is the probability that a particular event (e.g., success) will occur.
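For a single success/failure trial the fit reduces to one number. In this Python sketch, outcomes are coded 1 for success and 0 for failure; the function name bernoulli_pmf and the outcome data are hypothetical, and using the observed proportion of successes as the estimate of p is an assumption.

```python
import statistics

def bernoulli_pmf(x, p):
    """Probability of outcome x, where x is 1 (success) or 0 (failure)."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1.0 - p) ** (1 - x)

# Hypothetical outcomes of eight trials (1 = success, 0 = failure).
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

# Assumed estimator: the proportion of successes.
p_hat = statistics.fmean(outcomes)

print(f"p_hat = {p_hat:.3f}")
print(f"P(failure) = {bernoulli_pmf(0, p_hat):.3f}")
```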

OK.  Select one of the option buttons on the Distribution Fitting Startup Panel - Quick tab, select a distribution type, and click the OK button to display either the Fitting Continuous Distributions dialog box or the Fitting Discrete Distributions dialog box, according to which option button you selected.

Cancel. Click the Cancel button to close the Startup Panel without performing an analysis.

Options. See Options Menu for descriptions of the commands on this menu.

Open Data. Click the Open Data button to display the Select Data Source dialog box, in which you can choose the spreadsheet on which to perform the analysis. The Select Data Source dialog box contains a list of the spreadsheets that are currently active.

Select Cases. Click the Select Cases button to display the Analysis/Graph Case Selection Conditions dialog box, which contains options to create conditions for which cases will be included (or excluded) in the current analysis. More information is available in the case selection conditions overview, syntax summary, and dialog box description.

W. Click the W (Weight) button to display the Analysis/Graph Case Weights dialog box, which contains options to adjust the contribution of individual cases to the outcome of the current analysis by "weighting" those cases in proportion to the values of a selected variable.