Example 1: Actuarial Life Table

In this example, you will compute a life (survival) table for the data in Heart.sta, estimate the survival, probability density, and hazard functions for different time intervals, and determine which theoretical distribution best fits the survival function. See Survival Analysis Examples - Overview and Data File for a description of the data file used in this example.

Specifying the Analysis. Open the Heart.sta data file via the File - Open Examples menu; it is in the Datasets folder. Then, select Survival Analysis from the Statistics - Advanced Linear/Nonlinear Models menu to display the Survival and Failure Time Analysis Startup Panel.

Next, double-click Life tables & Distributions to display the Life Table & Distribution of Survival Times dialog.

The Survival Analysis module can interpret dates as well as any other measurements of survival times. If you select 6 variables in the variable selection dialog, STATISTICA interprets the first three variables as the month, day, and year, respectively, marking the beginning of the respective observation, and the subsequent three variables as the month, day, and year marking the termination of the observation (due to failure or censoring).

Now, click the Variables button to display the standard variable selection dialog. Here, select the first 6 variables as the Survival times (1), dates (2 or 6). As explained above, STATISTICA will interpret the first and fourth variable in the list as months, the second and fifth as days, and the third and sixth as years. Next, specify variable Censored as the Censoring indicator variable in the variable selection dialog.
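To make the date interpretation concrete, the following is a minimal Python sketch of how two month/day/year triples can be turned into a survival time in days. It is not STATISTICA's own code; the function name `survival_days` and the sample dates are hypothetical, and four-digit years are assumed.

```python
from datetime import date

def survival_days(start_mdy, end_mdy):
    """Elapsed days between two (month, day, year) triples.

    Assumption: years are given with four digits.
    """
    m0, d0, y0 = start_mdy
    m1, d1, y1 = end_mdy
    return (date(y1, m1, d1) - date(y0, m0, d0)).days

# Hypothetical case (not taken from Heart.sta):
# observation begins 15 Jan 1968 and terminates 3 Mar 1968.
elapsed = survival_days((1, 15, 1968), (3, 3, 1968))
print(elapsed)
```

Whether the termination is a failure or a censored observation is not encoded in the dates; that information comes from the separate censoring indicator variable.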

Click the OK button to return to the Life Table & Distribution of Survival Times dialog, which will now look like this.

Double-click in the Code for complete responses field to display the Variable 7 dialog. Here, select Complete and click the OK button. In the same manner, double-click the Code for censored responses field and select Censored.

In addition, you can specify either the Number of intervals for the life table or the Stepsize (interval width). The Correct intervals containing no terminations/deaths check box controls whether intervals with no deaths/terminations are adjusted so that survival distributions can be fitted: select it when fitting survival distributions, and clear it when generating a life table for descriptive purposes only.
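The relation between the number of intervals and the interval width can be illustrated with a short Python sketch that tabulates raw survival times into equal-width intervals. The rule used here (intervals start at 0 and the width is the largest time divided by the number of intervals) is an assumption made for illustration, not a description of how STATISTICA chooses its defaults; the function name and the data are hypothetical.

```python
def bin_survival_times(times, censored, n_intervals):
    """Tabulate raw survival times into equal-width intervals starting at 0.

    Returns (step, rows); each row is (lower_limit, n_withdrawn, n_dying).
    Assumed rule: interval width = largest observed time / n_intervals.
    """
    step = max(times) / n_intervals
    rows = [[i * step, 0, 0] for i in range(n_intervals)]
    for t, is_censored in zip(times, censored):
        # The largest time lies on the upper edge; keep it in the last interval.
        i = min(int(t // step), n_intervals - 1)
        rows[i][1 if is_censored else 2] += 1
    return step, [tuple(r) for r in rows]

# Hypothetical survival times in days and censoring flags.
times = [5, 12, 30, 45, 60, 88, 100]
censored = [False, True, False, False, True, False, True]

step, rows = bin_survival_times(times, censored, n_intervals=4)
print(step)
for row in rows:
    print(row)
```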

Reading an Aggregated Life Table. Instead of raw data, the Survival Analysis module also accepts already tabulated survival times as input (select the Table of survival times tab).

Specifically, a file with tabulated data should contain 3 variables with the following information:

  1. The lower limits for each time interval,

  2. The number of individuals withdrawn alive from each interval, and

  3. The number of individuals dying in each interval.

This is not the case in the Heart.sta data file, so return to the Raw data tab.
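For reference, the three-variable tabulated layout described above is enough to recover how many individuals enter each interval. The Python sketch below shows the bookkeeping under the assumption that every individual in the table enters the first interval; the function name and the counts are hypothetical and are not taken from Heart.sta.

```python
def number_entering(withdrawn, dying):
    """Number of individuals entering each interval, from per-interval counts.

    Assumption: all individuals in the table enter the first interval.
    """
    entering = []
    at_risk = sum(withdrawn) + sum(dying)
    for w, d in zip(withdrawn, dying):
        entering.append(at_risk)
        at_risk -= w + d  # individuals leaving before the next interval
    return entering

# Hypothetical tabulated data: lower limits, withdrawn alive, dying.
lower_limits = [0, 25, 50, 75]
withdrawn = [1, 0, 1, 1]
dying = [1, 2, 0, 1]

entering = number_entering(withdrawn, dying)
print(list(zip(lower_limits, entering)))
```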

Reviewing Results. You are now ready to begin the analysis. Accept all other default selections and click the OK button. After all cases have been processed, the Life Table & Survival Time Distribution Results dialog will be displayed.

Click the Summary: Life table button to display a spreadsheet of the complete life table.

Note that only a partial listing of the complete life table is shown in the spreadsheet illustration above.
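The core columns of a life table can be reproduced with the standard actuarial method, sketched here in Python. This is the textbook computation (withdrawn cases count as exposed for half an interval) and is offered as an assumption about what the columns represent; the exact computations used by STATISTICA are described in the Introductory Overview and may include adjustments not shown here. The function name, dictionary keys, and counts are hypothetical.

```python
def actuarial_life_table(entering, withdrawn, dying):
    """Standard actuarial life table from per-interval counts.

    cum_surviving is the proportion surviving to the START of each interval.
    """
    rows = []
    cum_surv = 1.0
    for n, w, d in zip(entering, withdrawn, dying):
        exposed = n - w / 2.0            # withdrawn cases are at risk for half the interval
        q = d / exposed if exposed > 0 else 0.0
        p = 1.0 - q
        rows.append({
            "exposed": exposed,
            "prop_dying": q,
            "prop_surviving": p,
            "cum_surviving": cum_surv,
        })
        cum_surv *= p                    # carried forward to the next interval
    return rows

# Hypothetical counts for three intervals.
table = actuarial_life_table(entering=[10, 7, 4],
                             withdrawn=[2, 0, 2],
                             dying=[1, 3, 1])
for row in table:
    print(row)
```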

Fitting a Theoretical Survival Distribution. The Survival Analysis module will fit the major theoretical survival time distributions to the data, using ordinary least squares and two methods of weighted least squares estimation. To choose the best fitting distribution, look first at the exponential distribution (select Exponential in the Results for model box). Click the Parameter estimates button to display the parameter estimates for that distribution as well as the goodness of fit Chi-square in a spreadsheet.

Goodness of fit. The logic of this goodness of fit Chi-square test is described in the Introductory Overview. In short, the test compares the likelihood of the respective model with that of the null model, that is, the model that allows for separate hazard estimates in each interval. If this test is significant, you can conclude that the fitted distribution differs significantly from the observed data, and you therefore reject it as a model for the survival times. In the illustration above, none of the three sets of parameter estimates for the exponential distribution seems to fit the observed survival distribution.

Plot of survival function. To see the lack of fit, click the Plot of survival function button on the Function plots tab. As you can see below, none of the lines approximates the observed distribution very well. It seems that the observed survival times drop off faster than what would be expected under this distribution.

Choosing a distribution. You can review the parameter estimates for the different distributions by first selecting the distribution from the Results for model box and then clicking the Parameter estimates button on the Quick tab. If you review all of the distributions, you will find that the only one yielding a non-significant fit is the Weibull distribution with weighted least squares parameter estimates.

Shown below is a plot of the survival function with the expected values under the Weibull distribution indicated as lines in the plot. (Click the Plot of survival function button on the Function plots tab.)

It appears that the third set of parameters (Weight 3) provides a reasonable fit to the data; the Chi-square test for that model is not significant (p=.56). Therefore, you would conclude that the Weibull distribution with the third set of parameters provides a good theoretical model for the data.
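The idea of fitting a Weibull distribution by least squares can be sketched in Python as follows. This is a simplified, unweighted stand-in and not the estimation procedure used by STATISTICA: it assumes the parameterization S(t) = exp(-(t/scale)^shape), linearizes it as ln(-ln S(t)) = shape * ln(t) - shape * ln(scale), and fits a straight line by ordinary least squares. The function name, parameter names, and data are hypothetical.

```python
import math

def fit_weibull_ols(times, surv):
    """Unweighted least-squares Weibull fit on the linearized survival function.

    Assumes S(t) = exp(-(t / scale) ** shape), with all t > 0 and 0 < S(t) < 1.
    Returns (shape, scale).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in surv]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx                      # slope of the fitted line
    intercept = mean_y - shape * mean_x    # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale

# Hypothetical survival proportions generated from a known Weibull curve,
# so the fit should recover the generating parameters.
true_shape, true_scale = 0.7, 120.0
times = [10.0, 50.0, 100.0, 200.0]
surv = [math.exp(-((t / true_scale) ** true_shape)) for t in times]

shape, scale = fit_weibull_ols(times, surv)
print(shape, scale)
```

A weighted variant would multiply each squared residual by an interval-specific weight before minimizing; the weights used for the Weight 2 and Weight 3 estimates are not given in this example, so they are left out of the sketch.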

Hazard and probability density function. The Introductory Overview describes the computation of the hazard rate and probability density function. In short, the hazard rate is an estimate of the probability (per time unit) that an observation that has not failed prior to a particular interval will fail in that interval; the probability density function is an estimate of the probability density of failure per time unit in the respective interval.
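As an illustration of the two definitions, the Python sketch below computes per-interval estimates with the standard actuarial formulas: the density is the drop in cumulative survival across the interval divided by the interval width, and the hazard is 2q / (h(1 + p)), where q and p are the proportions dying and surviving in the interval and h is its width. These formulas are assumed here as the conventional life-table estimates; see the Introductory Overview for the exact definitions used by the program. The function name and input values are hypothetical.

```python
def density_and_hazard(cum_surviving, prop_dying, width):
    """Actuarial estimates of probability density and hazard per interval.

    cum_surviving[i]: proportion surviving to the start of interval i
    prop_dying[i]:    proportion of exposed cases dying in interval i
    width:            common interval width (time units)
    """
    density, hazard = [], []
    for s, q in zip(cum_surviving, prop_dying):
        p = 1.0 - q
        density.append(s * q / width)            # (S_i - S_{i+1}) / h, with S_{i+1} = S_i * p
        hazard.append(2.0 * q / (width * (1.0 + p)))
    return density, hazard

# Hypothetical two-interval table with intervals 10 time units wide.
density, hazard = density_and_hazard(cum_surviving=[1.0, 0.8],
                                     prop_dying=[0.2, 0.5],
                                     width=10.0)
print(density)
print(hazard)
```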

To evaluate the goodness of fit of the chosen theoretical distribution, you can also review these functions in plots, together with the values for the observed distribution (click the Plot of hazard function and Plot of probability density function buttons on the Function plots tab). Usually, the hazard rate will increase over time (see below), because the probability of failure generally increases as time progresses.

The probability density will usually decrease over time (see below), reflecting the fact that, overall, the probability (density) of failure is greater in the earlier time intervals.

See also Survival Analysis Index.