Distribution Fitting Introductory Overview: Types of Distributions
Bernoulli Distribution. This distribution describes any situation where a single
"trial" is made resulting in either "success" or "failure,"
such as when tossing a coin, or when modeling the success or failure of
a surgical procedure. The Bernoulli distribution is defined as:
$$f(x) = p^{x} (1-p)^{1-x}, \quad x \in \{0, 1\}$$
where
p is the probability that a particular event (e.g., success) will occur.
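The definition can be illustrated with a minimal sketch, assuming the usual Bernoulli probability function f(x) = p^x (1-p)^(1-x) with x coded as 1 for success and 0 for failure. The function name `bernoulli_pmf` and the value p = 0.3 are illustrative choices, not part of the original text.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Probability of outcome x (1 = success, 0 = failure) given success probability p."""
    if x not in (0, 1):
        return 0.0  # only the two outcomes 0 and 1 carry probability
    return p ** x * (1 - p) ** (1 - x)


p = 0.3  # hypothetical probability of success
print("P(success) =", bernoulli_pmf(1, p))
print("P(failure) =", bernoulli_pmf(0, p))
```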
Beta Distribution. The beta
distribution arises from a transformation of the F
distribution and is typically used to model the distribution of order
statistics. Because the beta
distribution is bounded on both sides, it is often used to represent
processes with natural lower and upper limits. For examples, refer to
Hahn and Shapiro (1967). The beta
distribution is defined as:
$$f(x) = \frac{\Gamma(\nu+\omega)}{\Gamma(\nu)\,\Gamma(\omega)}\, x^{\nu-1} (1-x)^{\omega-1}, \quad 0 < x < 1,\ \nu > 0,\ \omega > 0$$
where
Γ is the Gamma function
ν, ω are the shape parameters (shape1 and shape2, respectively)
The animation above shows the beta
distribution as the two shape parameters change.
Binomial Distribution. The binomial distribution is useful for describing
distributions of binomial events, such as the number of males and females
in a random sample of companies, or the number of defective components
in samples of 20 units taken from a production process. The binomial distribution
is defined as:
$$f(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
where
p is the probability that the respective event will occur
q is equal to 1 - p
n is the maximum number of independent trials.
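The defective-components example can be worked through numerically. The sketch below assumes the standard binomial probability function f(x) = C(n, x) p^x q^(n-x); the defect probability p = 0.05 is a made-up value chosen only for illustration, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x events in n independent trials."""
    if not 0 <= x <= n:
        return 0.0
    q = 1.0 - p
    return comb(n, x) * p ** x * q ** (n - x)


n = 20      # units per sample, as in the text
p = 0.05    # hypothetical probability that a unit is defective
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(f"P(no defective units)       = {probs[0]:.4f}")
print(f"P(at most one defective)    = {probs[0] + probs[1]:.4f}")
print(f"sum over all x = 0, ..., n  = {sum(probs):.6f}")
```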
Cauchy Distribution. The Cauchy distribution is interesting for
theoretical reasons. Although its mean can be taken as zero, since it
is symmetrical about zero, the expectation, variance, higher moments,
and moment generating function do not exist. The Cauchy distribution is
defined as:
$$f(x) = \frac{1}{\theta\,\pi\left[1 + \left(\frac{x-\eta}{\theta}\right)^{2}\right]}, \quad \theta > 0$$
where
η is the location parameter (median)
θ is the scale parameter
π is the constant Pi (3.1415...)
The animation above shows the changing shape of the Cauchy distribution
when the location parameter equals 0 and the scale parameter equals 1,
2, 3, and 4.
Chi-square Distribution. Chi-square
fits this continuous distribution to your data. The
sum of ν independent squared
random variables, each distributed following the standard normal distribution,
is distributed as Chi-square
with ν degrees of freedom. This
distribution is most frequently used in the modeling of random variables
(e.g., representing frequencies) in statistical applications. The
Chi-square distribution is defined as:
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2}, \quad \nu = 1, 2, \ldots,\ x > 0$$
where
ν is the degrees of freedom
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
Γ (Gamma) is the Gamma function.
The above animation shows the shape of the Chi-square
distribution as the degrees of freedom increase (1, 2, 5, 10, 25 and 50).
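The statement that a sum of squared standard normal variables follows the Chi-square distribution can be checked by simulation. The following is a rough sketch rather than a fitting routine: it draws such sums and compares the sample mean and variance with the theoretical values (the degrees of freedom and twice the degrees of freedom, respectively). The seed, sample size, and the name `chi_square_draw` are arbitrary assumptions.

```python
import random


def chi_square_draw(nu: int, rng: random.Random) -> float:
    """Sum of nu squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


rng = random.Random(12345)  # fixed seed so the run is repeatable
nu = 5                      # degrees of freedom
draws = [chi_square_draw(nu, rng) for _ in range(20_000)]

mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"sample mean     = {mean:.3f}  (theory: {nu})")
print(f"sample variance = {variance:.3f}  (theory: {2 * nu})")
```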
Exponential Distribution. Exponential
fits this continuous distribution to your data. If T is the time between occurrences of
rare events that happen on the average with a rate λ
per unit of time, then T is distributed
exponentially with parameter λ
(Lambda). Thus, the exponential distribution is frequently used to model
the time interval between successive random events. Examples of variables
distributed in this manner would be the gap length between cars crossing
an intersection, lifetimes of electronic devices, or arrivals of customers
at the checkout counter in a grocery store. The exponential distribution
is defined as:
$$f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty,\ \lambda > 0$$
where
λ is an exponential function parameter (an alternative parameterization
is the scale parameter b = 1/λ)
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
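As a small illustration of the rate and scale parameterizations, the sketch below simulates gaps between events with Python's standard `random.expovariate` and compares the average gap with the scale b = 1/λ. The rate of 2 events per unit of time, the seed, and the helper name `exponential_pdf` are assumptions made for the example.

```python
import random
from math import exp


def exponential_pdf(t: float, lam: float) -> float:
    """Exponential density with rate lam, evaluated at time t."""
    return lam * exp(-lam * t) if t >= 0 else 0.0


rng = random.Random(2024)  # fixed seed so the run is repeatable
lam = 2.0                  # hypothetical rate: 2 events per unit of time
gaps = [rng.expovariate(lam) for _ in range(20_000)]

mean_gap = sum(gaps) / len(gaps)
print(f"average simulated gap = {mean_gap:.4f}")
print(f"scale b = 1/lambda    = {1 / lam:.4f}")
print(f"density at t = 0      = {exponential_pdf(0.0, lam):.4f}")
```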
Extreme Value. The extreme value distribution is often used to model
extreme events, such as the size of floods, gust velocities encountered
by airplanes, maxima of stock market indices over a given year, etc.;
it is also often used in reliability testing, for example to
represent the distribution of failure times for electric circuits (see
Hahn and Shapiro, 1967). The extreme value (Type I) distribution has the
probability density function:
$$f(x) = \frac{1}{b}\, e^{-(x-a)/b}\, e^{-e^{-(x-a)/b}}, \quad -\infty < x < \infty,\ b > 0$$
where
a is the location parameter
b is the scale parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
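F Distribution. Snedecor's F distribution
is most commonly used in tests of variance (e.g., ANOVA).
The ratio of two independent chi-squares, each divided by its respective degrees of freedom,
is said to follow an F distribution. The F distribution (for 0 ≤ x)
has the probability density function (for ν = 1, 2, ...; ω = 1, 2, ...):
$$f(x) = \frac{\Gamma\left(\frac{\nu+\omega}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{\omega}{2}\right)} \left(\frac{\nu}{\omega}\right)^{\nu/2} x^{\nu/2-1} \left(1 + \frac{\nu}{\omega}\,x\right)^{-(\nu+\omega)/2}$$
where
ν, ω are the shape parameters, degrees of freedom
Γ is the Gamma function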
The animation above shows various tail areas (p-values)
for an F distribution with both degrees of freedom equal to 10.
Gamma
Distribution. The probability density function of the exponential
distribution has a mode of zero. In many instances, it is known a priori that the mode of the distribution
of a particular random variable of interest is not equal to zero (e.g.,
when modeling the distribution of the lifetimes of a product such as
an electric light bulb, or the serving time taken at a ticket booth at
a baseball game). In those cases, the gamma distribution is more appropriate
for describing the underlying distribution. The gamma distribution is
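defined as:
$$f(x) = \frac{1}{b\,\Gamma(c)} \left(\frac{x}{b}\right)^{c-1} e^{-x/b}, \quad 0 \le x,\ c > 0,\ b > 0$$
where
Γ is the Gamma function
c is the shape parameter
b is the scale parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)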
The animation above shows the gamma distribution as the shape parameter
changes from 1 to 6.
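The point about the mode can be made concrete with a short sketch. It assumes the usual gamma density with shape c and scale b, f(x) = (x/b)^(c-1) e^(-x/b) / (b Γ(c)), and locates the mode by a coarse grid search. With c = 1 the density reduces to the exponential case and the mode is zero; for c > 1 the mode moves to (c - 1) b. The helper names and the parameter values are illustrative only.

```python
from math import exp, gamma


def gamma_pdf(x: float, c: float, b: float) -> float:
    """Gamma density with shape c and scale b."""
    if x < 0:
        return 0.0
    if x == 0 and c < 1:
        return float("inf")  # the density is unbounded at zero when c < 1
    return (x / b) ** (c - 1) * exp(-x / b) / (b * gamma(c))


def grid_mode(c: float, b: float, step: float = 0.01, upper: float = 20.0) -> float:
    """Approximate the mode by evaluating the density on an evenly spaced grid."""
    xs = [i * step for i in range(int(upper / step) + 1)]
    return max(xs, key=lambda x: gamma_pdf(x, c, b))


b = 2.0  # hypothetical scale parameter
for c in (1.0, 2.0, 3.0):
    print(f"shape c = {c}: grid mode = {grid_mode(c, b):.2f}, (c - 1) * b = {(c - 1) * b:.2f}")
```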
Gaussian Distribution. The Gaussian distribution
is another name for the normal
distribution, a bell-shaped function. The normal distribution (the
term first used by Galton, 1889) function is determined by the following
formula:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$$
where
μ is the mean
σ is the standard deviation
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
π is the constant Pi (3.14...)
Geometric Distribution. If independent Bernoulli trials are made until
a "success" occurs, then the total number of trials required
is a geometric random variable. The geometric distribution is defined
as:
$$f(x) = p\,(1-p)^{x-1}, \quad x = 1, 2, \ldots$$
where
p is the probability that a particular event (e.g., success) will occur.
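The mechanism described above (repeat Bernoulli trials until the first success) translates directly into a simulation. The sketch below assumes the form f(x) = p (1-p)^(x-1) for x = 1, 2, ..., and compares the average simulated trial count with the theoretical mean 1/p. The success probability, seed, and function names are hypothetical choices for the example.

```python
import random


def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0.0


def trials_until_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli trials until the first success and return the trial count."""
    trials = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        trials += 1
    return trials


rng = random.Random(314)  # fixed seed so the run is repeatable
p = 0.25                  # hypothetical probability of success per trial
counts = [trials_until_success(p, rng) for _ in range(20_000)]

mean_trials = sum(counts) / len(counts)
share_first = counts.count(1) / len(counts)
print(f"average trials needed = {mean_trials:.3f}  (theory: {1 / p})")
print(f"share ending on trial 1 = {share_first:.3f}  (pmf: {geometric_pmf(1, p)})")
```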
Gompertz
Distribution. The Gompertz distribution is a theoretical
distribution of survival times. Gompertz (1825) proposed a probability
model for human mortality, based on the assumption that the "average
exhaustion of a man's power to avoid death to be such that at the end
of equal infinitely small intervals of time he lost equal portions of
his remaining power to oppose destruction which he had at the commencement
of these intervals" (Johnson, Kotz, Balakrishnan, 1995, p. 25). The
resultant hazard function:
$$r(x) = B c^{x}, \quad x \ge 0,\ B > 0,\ c \ge 1$$
is often used in Survival
Analysis. See Johnson, Kotz, Balakrishnan (1995) for additional
details.
Johnson Distribution. Johnson (1949) described
a system of frequency curves that represents transformations of the standard
normal curve
(see Hahn and Shapiro, 1967, for details). By applying these transformations
to a standard normal variable, a wide variety of non-normal distributions
can be approximated, including distributions that are bounded on either
one or both sides (e.g., U-shaped distributions).
See also Johnson From Moments Distribution and Fit Johnson From Moments Function.
Laplace Distribution. For interesting mathematical applications
of the Laplace distribution, see Johnson and Kotz (1995). The Laplace
(or Double Exponential) distribution is defined as:
$$f(x) = \frac{1}{2b}\, e^{-|x-a|/b}, \quad -\infty < x < \infty,\ b > 0$$
where
a is the location parameter (mean)
b is the scale parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
The graphic above shows the changing shape of the Laplace distribution
when the location parameter equals 0 and the scale parameter equals 1,
2, 3, and 4.
Logistic Distribution. The logistic distribution is used to model
binary responses (e.g., gender) and is commonly used in logistic
regression. The logistic distribution is defined as:
$$f(x) = \frac{1}{b}\, e^{-(x-a)/b} \left[1 + e^{-(x-a)/b}\right]^{-2}, \quad -\infty < x < \infty,\ b > 0$$
where
a is the location parameter (mean)
b is the scale parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
The graphic above shows the changing shape of the logistic distribution
when the location parameter equals 0 and the scale parameter equals 1,
2, and 3.
Lognormal Distribution. The lognormal distribution is often used
in simulations of variables such as personal incomes, age at first marriage,
or tolerance to poison in animals. In general, if x is a sample from a
normal distribution,
then y = e^x is a sample from
a lognormal distribution. Thus, the lognormal distribution is defined
as:
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-\frac{[\ln(x)-\mu]^{2}}{2\sigma^{2}}}, \quad 0 < x < \infty,\ \sigma > 0$$
where
μ is the scale parameter
σ is the shape parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
The animation above shows the lognormal distribution with mu equal
to 0 for sigma equals .10, .30, .50, .70, and .90.
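The relationship y = e^x between normal and lognormal samples can be sketched in a few lines. The example draws x from a normal distribution with mu = 0 and sigma = 0.5 (values picked to match the range shown in the caption above, but otherwise arbitrary), exponentiates each draw, and checks that the median of y is close to e^mu, which is the median of the lognormal distribution.

```python
import random
from math import exp
from statistics import median

rng = random.Random(7)   # fixed seed so the run is repeatable
mu, sigma = 0.0, 0.5     # parameters of the underlying normal distribution

x_sample = [rng.gauss(mu, sigma) for _ in range(20_001)]  # normal draws
y_sample = [exp(x) for x in x_sample]                     # lognormal draws

print(f"smallest y      = {min(y_sample):.4f}  (always positive)")
print(f"sample median y = {median(y_sample):.4f}  (theory: e^mu = {exp(mu):.4f})")
```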
Normal
Distribution. The normal distribution (the "bell-shaped
curve" which is symmetrical about the mean) is a theoretical function
commonly used in inferential statistics as an approximation to sampling
distributions (see also Elementary
Concepts). In general, the normal distribution provides a good model
for a random variable, when:
1. There is a strong tendency for the variable to take a central value;
2. Positive and negative deviations from this central value are equally
likely;
3. The frequency of deviations falls off rapidly as the deviations become
larger.
As an underlying mechanism that produces the normal distribution, one
may think of an infinite number of independent random (binomial) events
that bring about the values of a particular variable. For example, there
are probably a nearly infinite number of factors that determine a person's
height (thousands of genes, nutrition, diseases, etc.). Thus, height can
be expected to be normally distributed in the population. The normal distribution
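function is determined by the following formula:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$$
where
μ is the mean
σ is the standard deviation
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
π is the constant Pi (3.14...)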
The animation above shows several tail areas of the standard normal
distribution (i.e., the normal distribution with a mean of 0 and a standard
deviation of 1). The standard normal distribution is often used in hypothesis
testing.
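Tail areas of the standard normal distribution, as used in hypothesis testing, can be computed from the complementary error function in Python's `math` module. The sketch below is one way to do this under the usual definition of the normal density; the function names `normal_pdf` and `upper_tail_area` are hypothetical, and the cutoffs are chosen only as familiar examples.

```python
from math import erfc, exp, pi, sqrt


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sqrt(2 * pi) * sigma)


def upper_tail_area(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))


print(f"density at the mean = {normal_pdf(0.0):.4f}")
for z in (1.0, 1.645, 1.96, 2.576):
    print(f"z = {z}: upper tail = {upper_tail_area(z):.4f}, two-sided = {2 * upper_tail_area(z):.4f}")
```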
Pareto
Distribution. The Pareto distribution is commonly used in
monitoring production processes (see Quality Control
and Process
Analysis). For example, a machine which produces copper wire
will occasionally generate a flaw at some point along the wire. The Pareto
distribution can be used to model the length of wire between successive
flaws. The standard Pareto distribution is defined as:
$$f(x) = \frac{a\, b^{a}}{x^{a+1}}, \quad x \ge b,\ a > 0,\ b > 0$$
where
a is the shape parameter
b is the scale parameter
The animation above shows the Pareto distribution for the shape parameter
equal to 1, 2, 3, 4, and 5.
Poisson
Distribution. The Poisson distribution is also sometimes
referred to as the distribution of rare events. Examples of Poisson distributed
variables are number of accidents per person, number of sweepstakes won
per person, or the number of catastrophic defects found in a production
process. It is defined as:
$$f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots,\ \lambda > 0$$
where
λ (Lambda) is the expected value of x (the mean)
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
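A brief numerical check, assuming the standard Poisson probability function f(x) = λ^x e^(-λ) / x!, confirms that the probabilities sum to one and that the mean equals λ. The value λ = 2.5 and the truncation of the sum at x = 59 are arbitrary choices; the omitted tail is negligible for this λ.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of observing exactly x events when the expected count is lam."""
    if x < 0:
        return 0.0
    return lam ** x * exp(-lam) / factorial(x)


lam = 2.5  # hypothetical expected number of events
probs = [poisson_pmf(x, lam) for x in range(60)]

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
print(f"P(0 events)            = {probs[0]:.4f}")
print(f"sum of probabilities   = {total:.6f}")
print(f"mean of the distribution = {mean:.6f}  (lambda = {lam})")
```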
Rayleigh Distribution. If two variables y1
and y2 are independent
of each other and normally distributed with zero mean and equal variance, then the
variable x = √(y1² + y2²)
will follow the Rayleigh distribution. Thus, an example (and appropriate
metaphor) for such a variable would be the distance of darts from the
target in a dart-throwing game, where the errors in the two dimensions
of the target plane are independent and normally distributed. The Rayleigh
distribution is defined as:
$$f(x) = \frac{x}{b^{2}}\, e^{-x^{2}/(2b^{2})}, \quad 0 \le x < \infty,\ b > 0$$
where
b is the scale parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)
The graphic above shows the changing shape of the Rayleigh distribution
when the scale parameter equals 1, 2, and 3.
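The dart-throwing metaphor can be simulated directly. In this sketch, each dart's horizontal and vertical errors are independent normal draws centered on the target with standard deviation b, and the share of darts landing within a given radius is compared with the Rayleigh cumulative distribution function 1 - e^(-x²/(2b²)). The seed, sample size, radius, and function name are assumptions for illustration.

```python
import random
from math import exp, hypot


def rayleigh_cdf(x: float, b: float) -> float:
    """P(X <= x) for a Rayleigh variable with scale parameter b."""
    return 1.0 - exp(-x * x / (2 * b * b)) if x >= 0 else 0.0


rng = random.Random(99)  # fixed seed so the run is repeatable
b = 1.0                  # standard deviation of the error in each dimension

# Distance from the target = sqrt(y1^2 + y2^2) for independent errors y1, y2.
distances = [hypot(rng.gauss(0.0, b), rng.gauss(0.0, b)) for _ in range(20_000)]

radius = 1.0
empirical = sum(d <= radius for d in distances) / len(distances)
print(f"simulated share within radius {radius} = {empirical:.4f}")
print(f"Rayleigh CDF at radius {radius}        = {rayleigh_cdf(radius, b):.4f}")
```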
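Rectangular Distribution. The rectangular distribution is useful for
describing random variables with a constant probability density over the
defined range a < x < b. The rectangular distribution is defined as:
$$f(x) = \frac{1}{b-a} \ \text{for } a < x < b, \qquad f(x) = 0 \ \text{elsewhere}$$
where a and b are constants with a < b.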
Student's t Distribution. The Student's t distribution is symmetric
about zero, and its general shape is similar to that of the standard normal distribution.
It is most commonly used in testing hypotheses about the mean of a particular
population. The Student's t distribution is defined as (for
ν = 1, 2, ...):
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\, (\nu\pi)^{-1/2} \left(1 + \frac{x^{2}}{\nu}\right)^{-(\nu+1)/2}$$
where
ν is the shape parameter, degrees of freedom
Γ is the Gamma function
π is the constant Pi (3.14...)
The shape of the Student's t distribution is determined by the degrees
of freedom. As shown in the animation above, its shape changes as the
degrees of freedom increase.
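How the shape changes with the degrees of freedom can be seen numerically. The sketch below evaluates the usual Student's t density (computed on the log scale with `math.lgamma` to avoid overflow for large degrees of freedom) and compares it with the standard normal density at the center and in the tail. The function names and the chosen degrees of freedom are illustrative assumptions.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x: float, nu: float) -> float:
    """Student's t density with nu degrees of freedom."""
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_norm - (nu + 1) / 2 * log(1 + x * x / nu))


def standard_normal_pdf(x: float) -> float:
    """Standard normal density, used here as the limiting shape."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


for nu in (1, 5, 50, 1000):
    print(f"df = {nu:4d}: f(0) = {t_pdf(0.0, nu):.5f}, f(3) = {t_pdf(3.0, nu):.5f}")
print(f"normal  : f(0) = {standard_normal_pdf(0.0):.5f}, f(3) = {standard_normal_pdf(3.0):.5f}")
```

With few degrees of freedom the peak is lower and the tails are heavier than the normal curve; as the degrees of freedom grow, both values approach the normal ones.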
Weibull
Distribution. As described earlier, the exponential
distribution is often used as a model of time-to-failure measurements,
when the failure (hazard) rate is constant over time. When the failure
probability varies over time, then the Weibull distribution is appropriate.
Thus, the Weibull distribution is often used in reliability testing (e.g.,
of electronic relays, ball bearings, etc.; see Hahn and Shapiro, 1967).
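The Weibull distribution is defined as:
$$f(x) = \frac{c}{b} \left(\frac{x}{b}\right)^{c-1} e^{-(x/b)^{c}}, \quad 0 \le x < \infty,\ b > 0,\ c > 0$$
where
b is the scale parameter
c is the shape parameter
e is the base of the natural logarithm, sometimes called Euler's e (2.71...)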
The animation above shows the Weibull distribution as the shape parameter
increases (.5, 1, 2, 3, 4, 5, and 10).
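The link between the shape parameter and the failure rate can be sketched as follows, assuming the two-parameter Weibull density f(x) = (c/b)(x/b)^(c-1) e^(-(x/b)^c). Dividing the density by the survival probability e^(-(x/b)^c) gives the hazard h(x) = (c/b)(x/b)^(c-1), which is constant for c = 1 (the exponential case), increasing for c > 1, and decreasing for c < 1. The scale b = 2 and the function names are hypothetical.

```python
from math import exp


def weibull_pdf(x: float, b: float, c: float) -> float:
    """Weibull density with scale b and shape c, for x > 0."""
    return (c / b) * (x / b) ** (c - 1) * exp(-((x / b) ** c))


def weibull_hazard(x: float, b: float, c: float) -> float:
    """Failure (hazard) rate at time x > 0: density divided by survival probability."""
    return (c / b) * (x / b) ** (c - 1)


b = 2.0  # hypothetical scale parameter
for c in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(x, b, c) for x in (1.0, 3.0, 5.0)]
    print(f"shape c = {c}: hazard at x = 1, 3, 5 -> " + ", ".join(f"{r:.3f}" for r in rates))
```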