- Worksheet: using the
`mice`

package - 15-minute break: 10 of which is to be used for instructor evaluations.
- Discussion: types of missingness
- Review of the course (time permitting)

- Identify and explain the three common types of missing data mechanisms.
- Identify a potential consequence of removing missing data on downstream analyses.
- Identify a potential consequence of a mean imputation method on downstream analyses.
- Identify the three steps involved with a multiple imputation method for handling missing data.
- Use the
`mice`

package in R to fit multiply imputed models.

Tutorials on the `mice`

package, in increasing order of depth:

- my brief
`mice`

tutorial - data science plus
- jstat article on
`mice`

R package- Extensively covers the idea behind the
`mice`

package. - Section 2.2 provides a good overview of the package workflow, which I recommend reading.
- Section 2.3 provides details on the imputation. It’s theoretical, so only take a look if you really need to know how the imputation is happening.
- Section 2.4 provides an example of using the
`mice`

package. Perhaps more verbose than it needs to be?

- Extensively covers the idea behind the

If you want a deeper look on multiple imputation:

- Book: “Multiple Imputation for Nonresponse in Surveys” by Rubin (1987), available through UBC library.
- Extensively covers the theory behind multiple imputation.
- Page 76-77 covers pooling of models, and uses variable names that you see in the output of
`mice::pool()`

.

- There are three common missing data mechanisms:
- Missing Completely At Random (MCAR): when the chance of missingness does not depend on any variable; missingness is totally random.
- Missing At Random (MAR): when the chance of missingness depends on other observed variables.
- Missing Not At Random (MNAR): when the chance of missingness depends on unobserved variables.

- Proceeding with an analysis by removing missing data can result in a model with standard errors of the estimates that are larger than they could be by including partially complete records.
- Proceeding with an analysis by imputing missing data by an estimate of the mean can result in a model with standard errors of the estimates that are smaller than they ought to be.
- An approach that uses the information contained in partially complete records, yet does not assume any more information, is to use multiple imputations. The approach contains three steps:
- Form multiple datasets containing imputed values. Each dataset should be formed by imputing the missing records in each unit/row with a random draw from a predictive distribution for those records.
- Fit the model of interest on each imputed dataset.
- Combine the models to obtain one final model.