In this lab assignment, you’ll make predictions and draw inferences/interpretations when the data are (1) censored and (2) ordinal.
These rubrics apply to your entire submission.
To get these mechanics marks, you are expected to:
To get the marks for this writing component, you should:
These marks apply to the collective code you write in this assignment.
For this exercise, we’ll use the lung dataset that ships with the survival R package. Consider the following variables:
time (with censor status in
ph.ecog (remove the single instance of 3 and NA)
Estimate the survival function using the Kaplan-Meier estimator. Then, estimate the survival function after making a Weibull distributional assumption. Display both estimates as a plot.
Hint: See the bottom of the ?survreg documentation to convert the intercept and scale parameters to Weibull’s shape and scale parameters.
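To make the parameter conversion concrete, here is a minimal sketch of the mechanics, not a model answer. It assumes the censoring indicator in lung is the status column (coded 1 = censored, 2 = dead) and uses base R graphics; the object names (dat, km, wb, wb_shape, wb_scale) are illustrative choices only.

```r
library(survival)  # provides lung, Surv(), survfit(), survreg()

# Drop the single ph.ecog == 3 row and the row with a missing ph.ecog.
dat <- subset(lung, !is.na(ph.ecog) & ph.ecog != 3)

# Kaplan-Meier estimate with no predictors (intercept-only formula).
km <- survfit(Surv(time, status) ~ 1, data = dat)

# Intercept-only Weibull fit.
wb <- survreg(Surv(time, status) ~ 1, data = dat, dist = "weibull")

# Conversion noted at the bottom of ?survreg:
#   Weibull shape = 1 / (survreg scale)
#   Weibull scale = exp(survreg intercept)
wb_shape <- 1 / wb$scale
wb_scale <- unname(exp(coef(wb)))

# Overlay the parametric curve (dashed) on the Kaplan-Meier step function.
plot(km, conf.int = FALSE, xlab = "Time", ylab = "Survival probability")
curve(pweibull(x, shape = wb_shape, scale = wb_scale, lower.tail = FALSE),
      from = 0, to = max(dat$time), add = TRUE, lty = 2)
legend("topright", legend = c("Kaplan-Meier", "Weibull"), lty = c(1, 2))
```

The converted values plug straight into pweibull(), qweibull(), and friends, which is why the conversion is worth doing once up front.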
Does your Kaplan-Meier estimate of the survival function allow for estimation of the mean? What about quantiles: are there any that cannot be estimated using this survival function? Why or why not?
Estimate the median and mean using both estimates of the survival function. If you need to calculate a restricted mean, indicate the restriction. Hint: check out Wikipedia’s page on the Weibull distribution for the mean of a Weibull distribution, noting that the Gamma function in R is
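The idea of a restricted mean can be sketched on a tiny, made-up dataset (hypothetical values, not taken from lung). The restricted mean is the area under the Kaplan-Meier step function from 0 up to a cutoff tau, taken here to be the largest observed time. The helper functions weibull_mean() and weibull_median() are illustrative names; they simply encode the standard Weibull formulas using R's gamma().

```r
library(survival)

# Hypothetical toy data: event = 1 means the time was observed,
# event = 0 means it was censored.
toy <- data.frame(time  = c(2, 4, 4, 7, 9, 10),
                  event = c(1, 1, 0, 1, 0, 0))

km <- survfit(Surv(time, event) ~ 1, data = toy)

# The largest times are censored, so the curve never reaches 0.
# Quantiles are only available where the curve has dropped far enough.
km_median <- as.numeric(quantile(km, probs = 0.5, conf.int = FALSE))
km_q80    <- as.numeric(quantile(km, probs = 0.8, conf.int = FALSE))  # NA

# Restricted mean: area under the step function on [0, tau].
tau      <- max(km$time)
knots    <- c(0, km$time)   # left ends of each step, plus tau
s        <- c(1, km$surv)   # height of the curve on each step
rmean_km <- sum(head(s, -1) * diff(knots))

# Standard Weibull mean and median for given shape and scale.
weibull_mean   <- function(shape, scale) scale * gamma(1 + 1 / shape)
weibull_median <- function(shape, scale) scale * log(2)^(1 / shape)
```

In this toy example the survival curve levels off at 4/9, so the median exists but the 0.8-quantile does not, and the mean can only be reported with the restriction tau = 10.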
Estimate the mean and median, this time ignoring the censoring (don’t bother with the parametric assumption). How do the estimates compare? How would you expect them to compare, and why?
Plot the survival times along with the two predictors. The response should be mapped to the vertical axis. Be sure to indicate whether or not an observation is censored.
Fit a Proportional Hazards model to survival time using your predictors.
Which predictors appear to have an influence on the response at the 0.05 significance level? Don’t concern yourself with the issue of multiple testing.
Choose one regression coefficient to interpret. Ideally, it should be “significant”, but for the purpose of this exercise, it doesn’t have to be. Be sure to also indicate its estimate.
Use your proportional hazards regression model to obtain an estimate of the survival function associated with one of the patients in the dataset. Of the (hypothetical) population of patients having these same predictor values, what is the 0.8-quantile?
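The workflow for these steps might look roughly like the sketch below. It is deliberately simplified: it uses ph.ecog (as a factor) as the only predictor, assumes the censoring indicator is the status column, and picks the first row of the filtered data as the example patient. Any further predictor would be added to the formula in the same way.

```r
library(survival)

dat <- subset(lung, !is.na(ph.ecog) & ph.ecog != 3)
dat$ph.ecog <- factor(dat$ph.ecog)

# Proportional hazards fit; summary(cox) shows estimates and p-values.
cox <- coxph(Surv(time, status) ~ ph.ecog, data = dat)

# exp(coefficient) is the hazard ratio relative to the baseline level
# (ph.ecog = 0), under the proportional hazards assumption.
hazard_ratios <- exp(coef(cox))

# Survival function for the population sharing one patient's predictors.
patient <- dat[1, ]
sf <- survfit(cox, newdata = patient)

# The 0.8-quantile is the time at which the survival curve falls to 0.2.
# It is NA if the estimated curve never gets that low.
q80 <- quantile(sf, probs = 0.8, conf.int = FALSE)

plot(sf, conf.int = FALSE, xlab = "Time", ylab = "Survival probability")
abline(h = 0.2, lty = 3)
```

Reading the quantile off the horizontal line at 0.2 in the plot is a useful sanity check on the number returned by quantile().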
For the same hypothetical population in Exercise 1.6, plot a density estimate. Hint: you could generate lots of data using the survival function, then use
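One way to generate data from a survival function is inverse-transform sampling: draw U from Uniform(0, 1) and return the first time at which S(t) falls to U or below. The sketch below uses a made-up step survival function (the times and surv vectors are hypothetical) and base R's density() as the smoother; both are assumptions of this example, so substitute your own fitted curve and preferred density tool.

```r
set.seed(1)

# Hypothetical step survival function: S(t) just after each jump time.
times <- c(2, 4, 7, 9, 12)
surv  <- c(0.9, 0.7, 0.4, 0.15, 0)

# Inverse-transform sampling from a step survival function.
# If the curve never reaches 0, small values of U have no matching time
# and the corresponding draws come back as NA.
draw_times <- function(n, times, surv) {
  u   <- runif(n)
  idx <- vapply(u, function(p) which(surv <= p)[1], integer(1))
  times[idx]
}

samples <- draw_times(10000, times, surv)

# Smooth the simulated times into a density estimate.
dens <- density(samples)
plot(dens, main = "Density estimate from simulated survival times")
```

Because a step survival function only places mass at its jump times, the smoothed density depends noticeably on the bandwidth, which is worth commenting on in your write-up.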
Reproduce your plot from Exercise 1.4, this time adding a model function of your choice (mean? median? some other quantile?). The model function should exist at least somewhere within the span of the predictor space.
For this exercise, we’ll use the happiness survey data collected by the GSS at NORC. Consider the following variables:
Not too happy<
Plot the data (only include the three variables). The response should be mapped to the vertical axis.
Fit a Proportional Odds model, with no interaction term.
Choose one regression coefficient to interpret. Be sure to also indicate its estimate.
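As a reminder of what a proportional odds coefficient means, here is a small sketch on simulated data (not the survey data). It assumes MASS::polr() as the fitting function; the variable names x and y, the level labels, and the true coefficient of 0.8 are all made up for illustration.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 500
x <- rnorm(n)

# Simulate an ordinal response from a latent logistic variable:
# larger x shifts the response toward higher categories.
latent <- 0.8 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
toy <- data.frame(x = x, y = y)

# Proportional odds fit; Hess = TRUE keeps the Hessian for standard errors.
po <- polr(y ~ x, data = toy, Hess = TRUE)

# polr models logit P(Y <= k) = zeta_k - beta * x, so exp(beta) is the
# multiplicative change in the odds of falling in a HIGHER category
# (at any cut point) for a one-unit increase in x.
beta_hat   <- coef(po)[["x"]]
odds_ratio <- exp(beta_hat)
```

The key point for interpretation is that a single odds ratio applies at every cut point between adjacent categories; that is the proportional odds assumption.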
Repeat the regression exercise, this time using linear regression (assuming that the response levels are actually numeric). In addition, state two negative consequences of using a linear regression model with these data. Are these consequences dire?
Choose one survey respondent. Using your proportional odds model, make both a probabilistic forecast (displayed as a plot) and a modal (i.e. the mode) prediction.
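The arithmetic behind a probabilistic forecast can be sketched by hand. The coefficient, cut points, predictor value, and level labels below are hypothetical placeholders rather than estimates from the survey data; with a fitted model you would typically obtain the same probabilities from its predict method.

```r
# Hypothetical fitted values from a proportional odds model.
beta  <- 0.8                                # slope for one predictor
zeta  <- c("low|mid" = -1, "mid|high" = 1)  # cut points
x_new <- 0.5                                # predictor value for one respondent

# Cumulative probabilities: P(Y <= k) = plogis(zeta_k - beta * x).
cum <- plogis(zeta - beta * x_new)

# Differences of the cumulative probabilities give the category probabilities.
probs <- diff(c(0, cum, 1))
names(probs) <- c("low", "mid", "high")

# Probabilistic forecast as a bar plot, and the modal prediction.
barplot(probs, ylab = "Probability")
modal <- names(probs)[which.max(probs)]
```

The bar plot is the probabilistic forecast; the modal prediction is just the tallest bar, which here is the middle category.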
Plot the model functions for both linear regression and the proportional odds model. No points will be awarded if you only plot the linear regression model.