Lab 3: Cox Proportional Hazards Model
Setup
We will need to load the following packages before proceeding:
The Veterans’ Administration Lung Cancer Study
This time, we will work with the Veterans’ Administration Lung Cancer study (veteran) dataset from package survival.
According to Amaka and Etikan (2022), we have the following background information:
This study will be using the Veterans’ Administrative Lung Cancer dataset (
RDataset) of 137 patients having advanced inoperable Lung cancer, carried out by the US Veterans Administration. Out of 137 male patients, 69 patients received the Standard test while 68 patients received Chemotherapy test and 64 patients died in each treatment group while only 9 patients were alive (censored) at the end of the study.
Let us load the dataset.
We have a training set size \(n = 137\) subjects. That said, we will specifically work with these columns:
-
trt: The type of treatment received by the patient,1forStandardtest and2forChemotherapytest. It is of factor-type. -
celltype: Cell types of the tumor (cell type(1)-squamouscancer, cell type(2)-smallcellcancer, cell type(3)-adenocarcinoma, and cell type (4)-largecell carcinoma). It is of factor-type. -
Survival
timein days. -
Censoring
status:0for censored patient and1for dead patient. -
karno: Karnofsky performance score. This numerical variable tells us patients’ overall status at the start of the study, with100being a good status.
We will select our variables of interest. Then, we will rename trt levels 1 and 2 as Standard and Chemotherapy levels respectively.
We are mainly interested in the following:
-
In terms of the specific cancer treatments, is there a statistical difference between the
StandardandChemotherapytreatments? If so, can we quantify it? Moreover, can we obtain the corresponding estimated survival functions? - Does the Karnofsky performance score have a statistical impact on our hazard function? I.e., does a higher score lead to a statistically significant decrease in hazard? If so, by how much?
-
According to a consultation I had with a surgical oncologist (just to know more about this type of cancer), cell type(1)-
squamouscancer is not “that life-threatening.” Is that the case in this population (i.e., male patients having advanced inoperable Lung cancer) when compared to other cancer cell types? If so, by how much? Moreover, can we obtain the corresponding estimated survival functions by cancer cell type?
Q1.1. Exploratory Data Analysis
Based on our main statistical inquiries, let us answer the following:
Q1.1.1. What is our primary response of interest? What is its nature?
Available for MDS students.
Now, let us create a suitable side-by-side plots comparing survival time on the \(y\)-axis by each level in trt on the \(x\)-axis while controlling by censorship status. Run the following chunk of code to create the side-by-side plots for comparison:
Also note the following absolute frequencies by status (1 indicates a dead patient and 0 a censored one):
Q1.1.2. What do you observe about the relationship of survival times between treatments when controlling for censorship status?
Available for MDS students.
Now, let us create a suitable side-by-side plots comparing survival time on the \(y\)-axis by each level in celltype on the \(x\)-axis while controlling by censorship status with the following code:
Q1.1.3. What do you observe about the relationship of survival times between cancer cell type when controlling for censorship status? Note the following absolute frequencies by status (1 indicates a dead patient and 0 a censored one):
Available for MDS students.
Q1.2. Data Modelling Framework
Having defined the primary response and explanatory variables in this case, what is a suitable modelling framework?
Available for MDS students.
We will model the hazard function directly for the \(i\)th observation (\(i = 1, \dots, n\)) subject to the numerical regressor \(X_{i, \texttt{karno}}\) and the following dummy variables:
\[ X_{i, \texttt{chemotherapy}} = \begin{cases} 1 \; \; \; \; \text{if the patient was randomized to Chemotherapy},\\ 0 \; \; \; \; \mbox{otherwise}. \end{cases} \]
\[ X_{i, \texttt{small}} = \begin{cases} 1 \; \; \; \; \text{cancer cell type(2)-small cell cancer},\\ 0 \; \; \; \; \mbox{otherwise}. \end{cases} \]
\[ X_{i, \texttt{adeno}} = \begin{cases} 1 \; \; \; \; \text{cancer cell type(3)-adeno carcinoma},\\ 0 \; \; \; \; \mbox{otherwise}. \end{cases} \]
\[ X_{i, \texttt{large}} = \begin{cases} 1 \; \; \; \; \text{cancer cell type(4)-large cell carcinoma},\\ 0 \; \; \; \; \mbox{otherwise}. \end{cases} \]
Note that Standard treatment and cell type(1)-squamous cancer are the respective baseline categories.
We will model the hazard function as follows:
\[ \begin{align*} \lambda_i \left( t | X_{i, \texttt{karno}}, \dots, X_{i, \texttt{large}} \right) &= \lambda_0(t)\exp\big(\beta_1 X_{i, \texttt{karno}} + \beta_2 X_{i, \texttt{chemotherapy}} + \\ & \qquad \beta_3 X_{i, \texttt{small}} + \beta_4 X_{i, \texttt{adeno}} + \beta_5 X_{i, \texttt{large}}\big). \end{align*} \]
Note that we model the \(i\)th individual hazard function \(\lambda_i \left( t | X_{i, \texttt{karno}}, \dots, X_{i, \texttt{large}} \right)\) along with a baseline hazard \(\lambda_0(t)\), which is equal for all the \(n\) observations, multiplied by \(\exp\left(\beta_1 X_{i, \texttt{karno}} + \ldots + \beta_5 X_{i, \texttt{large}}\right)\).
Q1.3. Estimation
Using the corresponding fitting function, estimate the corresponding regression model of time (including censoring status) versus trt, celltype, and karno.
Q1.4. Inference
Using your fitted model, are the explanatory variables statistically associated to the survival time via the hazard function? State your conclusion with a significance level \(\alpha = 0.05\).
Available for MDS students.
Q1.5. Coefficient Interpretation
So far, we have solved our three inquiries in terms of inferential conclusions. Let us proceed to coefficient interpretation for those significant regressors. Obtain the estimates on the escale of the hazard function.
Available for MDS students.
Q1.6. Prediction
We still need to provide estimated survival function curves. Therefore, even though we already concluded that there is no difference between both cancer treatments, let us obtain their estimated survival functions.
Then, explain what you see in these plots.
Available for MDS students.
Now, let us do it for celltype. In this case, for our three estimated survival functions (note from our inferential conclusions, one celltype is not statistically significant with respect to the baseline level), we choose the baseline level Standard for trt and the mean value for karno.
Then, explain what you see in these plots.
Available for MDS students.
