Example 1: Actuarial
In this example, you will compute a life (survival) table for the example
data, estimate the survival, probability density, and hazard functions for different
time intervals, and determine which theoretical distribution best fits the survival
function. See Survival
Analysis Examples - Overview and Data File for a description of the
data file used in this example.
Specifying the Analysis.
Open the Heart.sta data file
via the File - Open Examples
menu; it is in the Datasets folder.
Then, select Survival Analysis from the Statistics
- Advanced Linear/Nonlinear Models menu to display the Survival and Failure Time Analysis Startup Panel.
Next, double-click Life tables &
Distributions to display the Life Table & Distribution of Survival Times dialog.
The Survival Analysis module will automatically "understand" dates
as well as any other measurements of survival times. If you click the
Variables button and select 6
variables, then STATISTICA will
interpret the first three variables as the month, day, and year, respectively,
marking the beginning of the respective observation, and the subsequent
three variables as the month, day, and year, marking the termination of
the observation (due to failure or censoring).
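The six-variable date layout described above can be illustrated with a short Python sketch. This is not STATISTICA code: the function name, the four-digit year format, and the choice of days as the time unit are all assumptions made for illustration.

```python
from datetime import date

def survival_days(row):
    """Return elapsed days for one case.

    row is (start_month, start_day, start_year, end_month, end_day, end_year),
    mirroring the variable order described in the text. Four-digit years are
    assumed here; the actual data file may store years differently.
    """
    m1, d1, y1, m2, d2, y2 = row
    start = date(y1, m1, d1)   # beginning of the observation
    end = date(y2, m2, d2)     # termination (failure or censoring)
    return (end - start).days

# Hypothetical case: observed from 1 Jan 1968 until 15 Mar 1968
example_case = (1, 1, 1968, 3, 15, 1968)
print(survival_days(example_case))
```

The censoring indicator is kept separately from the dates; the elapsed time alone does not say whether the observation ended in failure or was censored.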
Now, click the Variables button
to display the standard variable
selection dialog. Here, select the first 6 variables as the Survival times (1), dates (2 or 6).
As explained above, STATISTICA
will interpret the first and fourth variable in the list as months, the
second and fifth as days, and the third and sixth as years. Next, specify
variable Censored as the Censoring indicator variable in the
variable selection dialog.
Click the OK button to return
to the Life Table & Distribution of Survival
Times dialog, which will now look like this.
Double-click in the Code for complete
responses field to display the Variable 7 dialog. Here, select Complete and click the OK
button. In the same manner, double-click the Code
for censored responses field and select Censored.
In addition, you could specify the
Number of intervals for the life table, or the Stepsize
(interval width). You can also control whether intervals
that contain no deaths/terminations are adjusted so that survival
distributions can be fitted: select the Correct
intervals containing no terminations/deaths check box when fitting
survival distributions, and clear it when generating a
life table for descriptive purposes only.
Reading an Aggregated
Life Table. Note that instead of raw data, the Survival Analysis module will also
accept already tabulated survival times as input (select the Table of survival times tab).
Specifically, a file with tabulated data should contain 3 variables
with the following information:
lower limits for each time interval,
number of individuals withdrawn alive from each interval, and
number of individuals dying in each interval.
This is not the case in the Heart.sta
data file, so return to the Raw data tab.
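To make the three-variable tabulated layout concrete, here is a minimal Python sketch that builds a life table from such input using the usual actuarial convention (cases withdrawn in an interval count as exposed for half of it). The function name, the dictionary keys, and the sample counts are hypothetical; the module's own computations may differ in detail.

```python
def life_table(lower_limits, withdrawn, deaths):
    """Build an actuarial life table from tabulated survival data.

    Assumes every case is accounted for in the table, so the number
    entering the first interval is the sum of all withdrawals and deaths.
    """
    n_entering = sum(withdrawn) + sum(deaths)
    cum_surviving = 1.0  # proportion surviving to the start of the interval
    rows = []
    for lower, w, d in zip(lower_limits, withdrawn, deaths):
        # Withdrawn cases are treated as exposed for half the interval
        exposed = n_entering - w / 2.0
        prop_dying = d / exposed if exposed > 0 else 0.0
        rows.append({
            "lower_limit": lower,
            "entering": n_entering,
            "exposed": exposed,
            "prop_dying": prop_dying,
            "cum_surviving": cum_surviving,
        })
        cum_surviving *= 1.0 - prop_dying
        n_entering -= w + d
    return rows

# Hypothetical tabulated data: 20 cases in three intervals of width 10
table = life_table(lower_limits=[0, 10, 20],
                   withdrawn=[0, 2, 2],
                   deaths=[5, 7, 4])
for row in table:
    print(row)
```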
We are now ready to begin the analysis. Accept all other default
selections and click the OK button.
After all cases have been processed, the Life Table & Survival Time Distribution
Results dialog will be displayed.
Click the Summary: Life table
button to display a spreadsheet of the complete life table.
Note that only a partial listing of the complete life table is shown
in the spreadsheet illustration above.
Fitting a Theoretical
Survival Distribution. The Survival Analysis
module will fit the major theoretical survival time distributions to the
data, using ordinary least squares and two methods of weighted least squares estimation.
Now, to choose the best fitting distribution, look first at the exponential
distribution (select Exponential
in the Results for model box).
Click the Parameter estimates
button to display the parameter estimates for that distribution as well
as the goodness of fit Chi-square
in a spreadsheet.
Goodness of fit. The logic of
this goodness of fit Chi-square
test is described in the Introductory
Overview. In short, the test is based on the comparison of the likelihood
of the respective model with the null model; that is, the model that allows
for separate hazard estimates in each interval. If this test is significant,
you can conclude that the fitted distribution is significantly different
from the observed data, and therefore, you reject it as a model for the
survival times. In the illustration above, none of the sets of parameter
estimates for the exponential distribution seems to fit the observed survival times.
Plot of survival function. To
see the lack of fit, click the Plot
of survival function button on the Function plots tab. As you can
see below, none of the lines approximates the observed distribution very
well. It seems that the observed survival times drop off faster than what
would be expected under this distribution.
Choosing a distribution. You
can review the parameter estimates for the different distributions by
first selecting the distribution from the Results
for model box and then clicking the
Parameter estimates button on the Quick tab. If you review all of
the distributions, you will find that the only one yielding a non-significant
fit is the Weibull
distribution with weighted least squares parameter estimates.
Shown below is a plot of the survival function with the expected values
under the Weibull distribution
indicated as lines in the plot. (Click the Plot
of survival function button on the Function plots tab.)
It appears that the third set of parameters (Weight
3) provides a reasonable fit to the data; the Chi-square
test for that model is not significant (p=.56). Therefore, you would conclude
that the Weibull
distribution with the third set of parameters provides a good theoretical
model for the data.
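As a rough illustration of how a least squares fit of a Weibull survival function can work, the sketch below linearizes the function and runs a weighted regression. It assumes the parameterization S(t) = exp(-lambda * t^gamma), so that ln(-ln S(t)) = ln(lambda) + gamma * ln(t). The function name and the weights argument are hypothetical; the two weighting schemes used by the module are not described in this section, so no attempt is made to reproduce them.

```python
import math

def fit_weibull_wls(times, surv, weights=None):
    """Fit S(t) = exp(-lam * t**gamma) by (weighted) least squares.

    times   : interval time points, all > 0
    surv    : observed survival proportions, all strictly between 0 and 1
    weights : optional non-negative weights; None means ordinary least squares
    """
    if weights is None:
        weights = [1.0] * len(times)
    # Linearize: y = ln(-ln S), x = ln t, so y = ln(lam) + gamma * x
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in surv]
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    gamma = sxy / sxx
    lam = math.exp(y_bar - gamma * x_bar)
    return lam, gamma

# Hypothetical survival proportions generated from a known Weibull curve
ts = [5.0, 10.0, 20.0, 40.0]
ss = [math.exp(-0.01 * t ** 1.5) for t in ts]
lam, gamma = fit_weibull_wls(ts, ss)
print(lam, gamma)
```

Under this parameterization the exponential distribution is the special case gamma = 1, which is one way to see why it can fit poorly when the observed survival curve drops off faster or slower than a constant hazard would predict.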
Hazard and probability density function.
The Introductory Overview
describes the computation of the hazard rate and probability density function.
In short, the hazard rate is an estimate of the probability (per time
unit) that an observation that has not failed prior to a particular interval
will fail in that interval; the probability density function is an estimate
of the probability density of failure per time unit in the respective interval.
In order to evaluate the goodness of fit of the chosen theoretical distribution,
you can also review these functions in plots, together with the values
for the observed distribution (click the Plot
of hazard function and Plot of
probability density function buttons on the Function plots tab). Usually, the
hazard rate will increase over time (see below), because the probability
of failure generally increases as time progresses.
The probability density will usually decrease over time (see below),
reflecting the fact that, overall, the probability (density) of failure
is greater in the earlier time intervals.
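The two functions can be sketched from life table quantities as follows. The formulas used here are one common actuarial formulation (density = S * q / h, hazard = 2q / (h * (1 + p)), evaluated per interval); the Introductory Overview remains the reference for the definitions the module actually uses. The function name and the sample values are hypothetical.

```python
def density_and_hazard(cum_surviving, prop_dying, widths):
    """Per-interval probability density and hazard rate estimates.

    cum_surviving : proportion surviving to the start of each interval (S)
    prop_dying    : conditional proportion failing in each interval (q)
    widths        : interval widths in time units (h)
    """
    results = []
    for s, q, h in zip(cum_surviving, prop_dying, widths):
        p = 1.0 - q                          # conditional proportion surviving
        density = s * q / h                  # failure density per time unit
        hazard = 2.0 * q / (h * (1.0 + p))   # hazard at the interval midpoint
        results.append((density, hazard))
    return results

# Hypothetical life table values for two intervals of width 10
for density, hazard in density_and_hazard([1.0, 0.75], [0.25, 0.5], [10.0, 10.0]):
    print(round(density, 4), round(hazard, 4))
```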
See also, Survival