# Survival Analysis Regression Models - Cox's Proportional Hazard Model with Time-Dependent Covariates

An assumption of the proportional hazard model is that the hazard function for an individual (i.e., observation in the analysis) depends on the values of the covariates and the value of the baseline hazard. Given two individuals with particular values for the covariates, the ratio of the estimated hazards over time will be constant, hence the name of the method: the proportional hazard model. The validity of this assumption may often be questionable. For example, age is often included in studies of physical health. Suppose we study survival after surgery. It is likely that age is a more important predictor of risk immediately after surgery than some time after the surgery (after initial recovery). In accelerated life testing, sometimes a stress covariate (e.g., amount of voltage) is used that is slowly increased over time until failure occurs (e.g., until the electrical insulation fails; see Lawless, 1982, page 393). In this case, the impact of the covariate is clearly dependent on time. We can specify arithmetic expressions to define covariates as functions of several variables and survival time.

Stratification. In the case of categorical covariates, such as whether a patient has had surgery, Kalbfleisch and Prentice (1980) recommend performing a stratified analysis. In a stratified analysis, the Survival Analysis module fits the proportional hazard model to the data separately for each group. In this manner, you can explicitly allow the hazard function to be different in each group.

Testing the Proportionality Assumption. As indicated by the previous examples, there are many applications where it is likely that the proportionality assumption does not hold. In this case, covariates can explicitly be defined as functions of time. For example, the analysis of a data set presented by Pike (1966) consists of survival times for two groups of rats that had been exposed to a carcinogen (see also Lawless, 1982, page 393, for a similar example). Suppose that z is a grouping variable with codes 1 and 0 to denote which of the two groups the respective rat belongs to. You could then fit the proportional hazard model:

h(t,z) = h0(t)*exp{b1*z + b2*z*[log(t)-5.4]}

Thus, in this model the conditional hazard at time t is a function of (1) the baseline hazard h0, (2) the covariate z, and (3) z times the logarithm of time. Note that the constant 5.4 is used here for scaling purposes only: the mean of the logarithm of the survival times in this data set is equal to 5.4. In other words, the conditional hazard at each point in time is a function of the covariate and time; the effect of the covariate on survival is therefore dependent on time, hence the name time-dependent covariate. With this model, you can specifically test the proportionality assumption. If parameter b2 is statistically significant (e.g., if its absolute value is at least twice as large as its standard error), you can conclude that the effect of the covariate z on survival does depend on time and, therefore, that the proportionality assumption does not hold.
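The test described above can be illustrated with a minimal Python sketch. It assumes that the time-dependent term is read as z multiplied by the centered log time, z*[log(t) - 5.4]; the function names and the coefficient values are hypothetical and chosen only for illustration, not estimates from the Pike (1966) data.

```python
import math

CENTER = 5.4  # mean of log survival time quoted in the text (scaling only)

def hazard_ratio(t, b1, b2, center=CENTER):
    """Hazard ratio for z = 1 versus z = 0 at time t.

    The baseline hazard h0(t) cancels, leaving
    exp{b1 + b2 * [log(t) - center]}.
    """
    return math.exp(b1 + b2 * (math.log(t) - center))

def wald_ratio(estimate, std_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / std_error

# Hypothetical values, for illustration only.
b1, b2, se_b2 = -0.6, -1.2, 0.5

# With b2 != 0 the hazard ratio changes over time (non-proportional hazards).
ratios = [hazard_ratio(t, b1, b2) for t in (100.0, 200.0, 300.0)]

# Rule of thumb from the text: |b2| at least twice its standard error.
b2_significant = abs(wald_ratio(b2, se_b2)) >= 2.0
```

If b2 were zero, `hazard_ratio` would return the same value, exp(b1), at every t, which is exactly the proportionality assumption.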

Specifying Time-Dependent Covariates. In the Survival Analysis module, you can type in arithmetic expressions to define the time-dependent covariates. These expressions can contain all standard arithmetic and logical operators and functions, and thus a wide variety of models can be specified.

Estimating Parameters for Time-Dependent Covariates. Technical Notes contains a description of the parameter estimation procedure for Cox proportional hazard regression models with time-dependent covariates. In general, the partial likelihood (see Breslow, 1974) for these types of models is slightly modified, to reflect the respective transformation of the covariates. As usual, the partial likelihood for a given set of parameters is the product of the likelihood contributions across cases (equivalently, the log-likelihood is the sum of the log contributions). For time-dependent covariates, to compute the contribution of a particular case, the program must process all cases with survival times as long or longer, re-evaluating their covariates at that case's survival time. Thus, when the data set is large, these computations may require noticeably more time than those necessary to estimate models with fixed covariates only.
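To make the computational cost concrete, the following Python sketch evaluates a log partial likelihood with time-dependent covariates. It is a simplified illustration, not the module's actual estimation procedure: the function and argument names are hypothetical, tied failure times are handled by letting each failure use the full risk set, and the data in the usage lines are made up.

```python
import math

def log_partial_likelihood(times, events, covariate_fn, beta):
    """Log partial likelihood for a Cox model with time-dependent covariates.

    times        -- observed survival or censoring time for each case
    events       -- 1 if the failure was observed, 0 if censored
    covariate_fn -- covariate_fn(i, t) returns the covariate values of
                    case i evaluated at time t
    beta         -- list of regression coefficients
    """
    def linear_predictor(i, t):
        return sum(b * x for b, x in zip(beta, covariate_fn(i, t)))

    total = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue  # censored cases enter only through the risk sets
        # Risk set: all cases with survival times as long or longer.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        # Covariates of every case at risk are re-evaluated at time t_i.
        denominator = sum(math.exp(linear_predictor(j, t_i)) for j in risk_set)
        total += linear_predictor(i, t_i) - math.log(denominator)
    return total

# Made-up data: three cases, all failures, group code z.
times = [100.0, 200.0, 300.0]
events = [1, 1, 1]
z = [1, 0, 1]

def covariates(i, t):
    # Fixed covariate z and the time-dependent term z * [log(t) - 5.4].
    return [z[i], z[i] * (math.log(t) - 5.4)]

value_at_zero = log_partial_likelihood(times, events, covariates, [0.0, 0.0])
```

Because the covariates of every case in the risk set are recomputed at each failure time, the work grows roughly with the square of the number of cases, which is why large data sets take noticeably longer.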

Categorical Variables and Coding. The arithmetic expressions that define the covariates do not have to include references to survival time. Instead, you can specify some functions of two or more other covariates. This may be, for example, a convenient method for evaluating models for data collected in multi-factor experiments. For each factor, you can create a variable in the data file to define the desired contrasts. The logic and selection of a priori contrast coefficients is explained in detail in General ANOVA/MANOVA. When specifying the covariates for the proportional hazard regression model, you can then type in the respective multiplications to define the interaction terms. For example, suppose Factor A has two levels. All individuals assigned to the first level of the factor were assigned a value of -1 in the respective variable in the data file (variable A), all individuals assigned to the second level of that factor were assigned a value of +1. A second Factor B, also with two levels, was coded in the same manner (variable B). You could now specify as the covariates variables A and B, and the expression A * B as a third covariate to test for the interaction between the two factors in the experiment.
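The coding scheme just described can be sketched in a few lines of Python. The helper name `factorial_covariates` is hypothetical; it simply returns the two contrast-coded variables and their product for one individual.

```python
def factorial_covariates(a, b):
    """Return the covariates A, B, and the interaction A * B for one case.

    a and b are the contrast codes (-1 or +1) stored in variables A and B.
    """
    return [a, b, a * b]

# All four cells of the 2 x 2 design: (level of A, level of B).
cells = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]
design = [factorial_covariates(a, b) for a, b in cells]

for (a, b), row in zip(cells, design):
    print(f"A={a:+d}  B={b:+d}  ->  covariates {row}")
```

With -1/+1 coding in a balanced design, the interaction column is orthogonal to both main-effect columns, so the parameter for A * B can be interpreted separately from the main effects.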

Segmented Time-Dependent Covariates. When specifying the arithmetic expressions for the time-dependent covariates, you can follow the same syntax as that used for entering spreadsheet formulas for transforming individual variables in the file. In some cases, you may hypothesize that the effect of one or more covariates on the hazard is a discontinuous function of time. For example, the hazard for a patient after surgery may depend on age during the first two days after the operation, and thereafter on some other factors. In that case, you can use the same logical operators that are also supported in spreadsheet formulas. For example, you could specify a time-dependent covariate as:

Age * (T_ <= 2)

Note that the logical expression T_ <= 2 will evaluate to 0 (false) if the survival time for an individual is greater than 2, and to 1 (true) otherwise. Thus, the parameter estimated for this time-dependent covariate pertains to the effect of age during the first two days only.
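The same logic can be written as a small Python function. This is only an illustration of how the expression evaluates; the name `segmented_age` and the `cutoff` parameter are hypothetical and not part of the module's formula syntax.

```python
def segmented_age(age, t, cutoff=2.0):
    """Evaluate Age * (T_ <= cutoff): age while t <= cutoff, otherwise 0."""
    indicator = 1 if t <= cutoff else 0  # logical expression as 0 or 1
    return age * indicator

# A 60-year-old patient evaluated on days 1, 2, and 3 after surgery.
values = [segmented_age(60, day) for day in (1, 2, 3)]
print(values)
```

A complementary covariate for the later period could be defined in the same way with the condition reversed (t greater than the cutoff), so that a separate parameter is estimated for each time segment.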