Lecture 4

Linear Mixed-effects Models

(Please sign in on iClicker)

Today’s Learning Goals

By the end of this lecture, you should be able to:

  • Identify the model assumptions in a linear Mixed-effects model.

  • Associate a term (or combination of terms) in a Mixed-effects model with the following quantities:

    • Fixed effect estimates.
    • Variances of the random effects.
    • Regression coefficients for each group and population.
    • Predictions on existing groups and a new group.

Today’s Learning Goals (cont’d)

  • Fit a linear Mixed-effects model in R, and extract estimates of the above quantities.
  • Identify the consequences of fitting a linear Mixed-effects model when the data are grouped, and of pooling a slope parameter versus fitting it separately per group.
  • Explain the difference between the distributional assumption on the random effects and the fixed effects estimates’ sampling distribution.

Outline

  1. Linear Fixed-effects Model
  2. Linear Mixed-effects Model

1. Linear Fixed-effects Model

  • Let us start by recalling the modelling techniques you learned in DSCI 561.
  • So far, we have been working with regression models fitted with a training set of \(n\) independent elements.
  • Given a set of \(k\) regressors \(X_{i,j}\) and a continuous response \(Y_i\), we fit a model:

\[Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \ldots + \beta_k X_{i,k} + \varepsilon_{i} \; \; \; \; \text{for} \; i = 1, \ldots, n.\]

  • Parameters \(\beta_0, \dots, \beta_k\) are called fixed effects.

1.1. Grunfeld’s Investment Dataset

  • Consider the following example: to study how gross investment depends on the firm’s value and capital stock, Grunfeld (1958) collected data from eleven different American companies over the years 1935-1954.

The data frame Grunfeld, from package {AER}, contains 220 observations from a balanced panel of 11 sampled American firms from 1935 to 1954 (20 observations per firm). The dataset includes a continuous response investment subject to two explanatory variables, market_value and capital.

Data Summary

  • Firstly, we will load the data, which has the following variables (the continuous ones are in millions of USD):
  1. investment: the gross investment, a continuous response.
  2. market_value: the firm’s market value, a continuous explanatory variable.
  3. capital: stock of plant and equipment, a continuous explanatory variable.
  4. firm: a nominal explanatory variable with eleven levels indicating the firm (General Motors, US Steel, General Electric, Chrysler, Atlantic Refining, IBM, Union Oil, Westinghouse, Goodyear, Diamond Match, and American Steel).
  5. year: the year of the observation (it will not be used in our analysis).

Data in R!
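As a sketch of the loading step (assuming package {AER} is installed; its raw columns are named invest and value, so we rename them to match the lecture’s notation):

```r
# Load the Grunfeld panel data shipped with package {AER}
library(AER)
library(dplyr)

data("Grunfeld", package = "AER")

# Rename the raw columns (`invest`, `value`) to the lecture's names
Grunfeld <- Grunfeld %>%
  rename(investment = invest, market_value = value)

str(Grunfeld)  # 220 rows: investment, market_value, capital, firm, year
```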

iClicker Question

  • What class of data hierarchy do you observe in this dataset? Do you expect any class of correlation within the data points? Think about each hierarchical level as a possible sampling pool.

A. Yes, we have a data hierarchy with one level: firm. Still, there will not be a correlation among subsets of data points.

B. Yes, we have a data hierarchy with one level: firm. Hence, there will be a correlation among subsets of data points.

C. There is no data hierarchy at all. All observations in the training set are independent.

D. Yes, we have a data hierarchy with two levels: firm (level 1) and the corresponding yearly observations (level 2). Hence, there will be a correlation among subsets of data points.

Main Statistical Inquiries

  • We are interested in assessing the association of gross investment with market_value and capital in the population of American firms.
  • Then, how can we fit a linear model to this data?

1.2. Exploratory Data Analysis

  • Let us plot the 220 data points of investment versus market_value, faceted by firm, and use geom_smooth() to fit sub-models by firm.

Heads-up: Only for plotting, we will transform both \(x\) and \(y\)-axes on the logarithmic scale in base 10 (trans = "log10"). This allows us to compare those firms under dissimilar market values, capital, and gross investments.

Plots!
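A minimal sketch of the faceted plot, assuming the renamed columns from the loading step:

```r
# Scatterplots of investment vs. market_value, one panel per firm, with
# an OLS trend line per panel; both axes on a log10 scale for plotting
library(ggplot2)

ggplot(Grunfeld, aes(x = market_value, y = investment)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  facet_wrap(~firm) +
  labs(
    x = "Market Value (millions of USD)",
    y = "Gross Investment (millions of USD)"
  )
```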

More plots!

In-class Question

  • What do you observe in the previous plots?

1.3. OLS General Modelling Framework

  • Before digging into today’s new modelling approach, let us try a classical approach via Ordinary Least-squares (OLS) for comparative purposes.

Heads-up: Always keep in mind the main statistical inquiry when taking any given modelling approach.

1.3.1. Regression Alternatives

  • Based on what we have seen via OLS in DSCI 561, there might be four possible approaches:
  1. Take the average for each firm, and fit an ordinary least-squares (OLS) regression on the averages. This is not an ideal approach.
  2. We could ignore firm. This is not an ideal approach.
  3. Allow different intercepts for each firm.
  4. Allow a different slope and intercept for each firm (i.e., an interaction model!).

1.3.2. OLS Ignoring Firm


  • Let us start with this basic OLS model to warm up our modelling skills regarding setting up equations.
  • The regression equation for the \(i\)th sampled observation will be:

\[\begin{align*} \texttt{investment}_{i} &= \beta_0 + \beta_1 \texttt{marketValue}_{i} + \beta_2\texttt{capital}_{i} + \varepsilon_{i} \\ & \qquad \qquad \qquad \qquad \qquad \qquad \text{for} \; i = 1, \ldots, 220. \end{align*}\]

Code in R!
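A sketch of this basic OLS fit in R:

```r
# OLS ignoring firm: a single intercept and common slopes for all
# 220 observations
ordinary_model <- lm(investment ~ market_value + capital, data = Grunfeld)
summary(ordinary_model)
```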

1.3.3. OLS Regression with Varying Intercept

  • Let us estimate another OLS model with investment as a response to market_value and capital as regressors but with varying intercepts by each firm.
  • We can use lm() by adding - 1 on the right-hand side of the argument formula.
  • This - 1 drops the common intercept so that every firm, including the baseline, gets its own intercept (i.e., in the coefficient output, (Intercept) in column estimate is replaced by one firmCompanyName row per firm).

Fitting model in R!
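A sketch of the varying-intercept fit (glancing the coefficients via broom::tidy(), assuming {broom} is available):

```r
# OLS with a varying intercept: `- 1` drops the common intercept, so
# every level of `firm` (including the baseline) gets its own intercept
model_varying_intercept <- lm(
  investment ~ market_value + capital + firm - 1,
  data = Grunfeld
)

# One row per coefficient; each firmCompanyName row is that firm's intercept
broom::tidy(model_varying_intercept)
```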

Glancing model in R!


Model Comparison

  • By checking the adj.r.squared, we see that model_varying_intercept has a larger value (0.959) than ordinary_model (0.816).
  • Note that model_varying_intercept is equivalent to just fitting the OLS model using:

formula = investment ~ market_value + capital + firm.

  • Let us check the above fact!

Using lm() without -1
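A sketch of the check, under the assumed model names above (model_with_baseline is a hypothetical name for the fit without - 1):

```r
# Equivalent parameterization with a baseline intercept: each firm
# coefficient is now an offset from the baseline firm's intercept
model_with_baseline <- lm(
  investment ~ market_value + capital + firm,
  data = Grunfeld
)

# Both parameterizations produce identical fitted values
all.equal(fitted(model_with_baseline), fitted(model_varying_intercept))
```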

\(F\)-test

  • Going back to model_varying_intercept and ordinary_model, we can test if there is a gain in considering a varying intercept versus fixed intercept.
  • Hence, we will make a formal \(F\)-test to check whether the model_varying_intercept fits the data better than the ordinary_model.
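The test can be sketched with anova(), assuming both models have been fitted as above:

```r
# Nested-model F-test: H0 says the 10 extra firm intercepts add nothing
# beyond the common-intercept model
anova(ordinary_model, model_varying_intercept)
```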

Testing Conclusion

  • We obtain a \(p\text{-value} < .001\).
  • Thus, with \(\alpha = 0.05\), we have evidence to conclude that model_varying_intercept fits the data better than the ordinary_model.
  • However, this costs us one extra degree of freedom per firm except for the baseline.
  • Therefore, we lose another 10 degrees of freedom (column DF in the anova() output).

Heads-up: In this specific case, losing 10 degrees of freedom is not a big deal with 220 data points. Nonetheless, when data is scarce or the model’s complexity demands a large amount of regression parameters, this could be an issue.

iClicker Question


  • What is the sample’s regression equation for model_varying_intercept?

A. \[\begin{align*} \texttt{investment}_{i,j} &= \beta_{0} + \beta_1 \texttt{marketValue}_{i,j} + \beta_2\texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \quad \text{for} \; i = 1, \ldots, 20 \; \; \text{and} \; \; j = 1, \ldots, 11. \end{align*}\]

B. \[\begin{align*} \texttt{investment}_{i} &= \beta_{0} + \beta_1 \texttt{marketValue}_{i} + \beta_2\texttt{capital}_{i} + \varepsilon_{i} \\ & \qquad \qquad \quad \text{for} \; i = 1, \ldots, 220. \end{align*}\]

C. \[\begin{align*} \texttt{investment}_{i,j} &= \beta_{0,j} + \beta_1 \texttt{marketValue}_{i,j} + \beta_2\texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \quad \text{for} \; i = 1, \ldots, 20 \; \; \text{and} \; \; j = 1, \ldots, 11. \end{align*}\]

1.3.4. OLS Regression for Each Firm

  • We fit a more complex OLS model with two interactions.
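One plausible formula for this interaction model (a sketch; the course code may parameterize it differently):

```r
# One intercept and one pair of slopes per firm; `*` adds the main
# effects plus the firm:market_value and firm:capital interactions
model_by_firm <- lm(
  investment ~ firm * (market_value + capital),
  data = Grunfeld
)
summary(model_by_firm)
```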

Glancing model in R!


A note on this model!


  • In this case, we are fitting eleven linear regressions, each with 20 points.

Heads-up: We have plenty of data points in this case for all the degrees of freedom required to estimate each parameter. Nonetheless, this might not hold for other small datasets. Also, we must consider other modelling scenarios in which there may be more than two regressors of interest with different natures (discrete and continuous), which may require a large number of degrees of freedom.

iClicker Question

What is the sample’s regression equation for model_by_firm?

A.

\[\begin{align*} \texttt{investment}_{i,j} &= \beta_{0,j} + \beta_{1,j} \texttt{marketValue}_{i,j} + \beta_{2,j} \texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \quad \text{for} \; i = 1, \ldots, 20 \; \; \text{and} \; \; j = 1, \ldots, 11. \end{align*}\]

B.

\[\begin{align*} \texttt{investment}_{j} &= \beta_{0} + \beta_1 \texttt{marketValue}_{j} + \beta_2\texttt{capital}_{j} + \varepsilon_{j} \\ & \qquad \qquad \quad \text{for} \; j = 1, \ldots, 11. \end{align*}\]

C. \[\begin{align*} \texttt{investment}_{i,j} &= \beta_{0,i} + \beta_{1,i} \texttt{marketValue}_{i,j} + \beta_{2,i} \texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \quad \text{for} \; i = 1, \ldots, 20 \; \; \text{and} \; \; j = 1, \ldots, 11. \end{align*}\]

How to interpret the coefficients in this model?

Each regression coefficient is associated with a firm. For example, \(\texttt{firmUS Steel:capital} = 0.02\) means that, relative to the baseline slope of \(0.37\) for capital, the variable capital has a slope of \(0.02 + 0.37 = 0.39\) for US Steel. We can double-check this by estimating an individual linear regression for US Steel.
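The double-check can be sketched as follows (assuming the firm level is labelled "US Steel" in the data):

```r
# Sanity check: fit US Steel on its own 20 observations; its capital
# slope should match the baseline slope plus the interaction term
us_steel_model <- lm(
  investment ~ market_value + capital,
  data = subset(Grunfeld, firm == "US Steel")
)
coef(us_steel_model)
```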

In-class Question

  • Now, after all these OLS modelling approaches, are we in line with the main modelling objective with all these linear regression models?


2. Linear Mixed-effects Model

  • Let us take a step back and think about a population of companies. For instance, all American companies.
  • Grunfeld did not collect data on all the American companies but sampled 11 companies from this population. The author was interested in assessing whether market_value and capital were related to investment and by how much.

Mixed Intercept!

  • Let us assume that the \(j\)th sampled firm has its own intercept \(b_{0,j}\) and the overall fixed intercept is \(\beta_0\) for all American companies.
  • Therefore, for the \(j\)th firm we define the following mixed intercept:

\[\begin{equation*} \beta_{0,j} = \beta_0 + b_{0,j}. \end{equation*}\]

  • Term \(b_{0,j}\) will change due to chance since it is linked to the \(j\)th sampled firm, which makes it a random effect.

Changing our Regression Paradigm!

  • Our regression paradigm now changes: instead of a single fixed unknown intercept \(\beta_0\), the intercept \(\beta_{0,j}\) combines a fixed and a random component, which is what we call a mixed effect.

\[\begin{align*} \texttt{investment}_{i,j} &= \overbrace{\beta_{0,j}}^{\text{Mixed Effect}} + \beta_1 \texttt{marketValue}_{i,j} + \\ & \qquad \quad \beta_2\texttt{capital}_{i,j} + \varepsilon_{i,j} \\ &= (\beta_0 + b_{0,j}) + \beta_1 \texttt{marketValue}_{i,j} + \\ & \qquad \quad \beta_2\texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \; \; \; \; \text{for} \; i = 1, \ldots, n_j \; \; \text{and} \; \; j = 1, \ldots, 11. \end{align*}\]

Heads-up: Allowing a different number of observations \(n_j\) in each \(j\)th firm makes the model even more flexible.

Now…

\[b_{0,j}\sim \mathcal{N}(0, \sigma_0^2)\]

is called a random effect and we assume it is independent of the error component

\[\varepsilon_{i,j}\sim \mathcal{N}(0, \sigma^2).\]

Heads-up: The observations for the same firm (group) share the same random effect, which induces a correlation structure.

And the variance is…

  • The variance of the \(i\)th response for the \(j\)th firm will be \[\begin{equation*} \text{Var}(\texttt{investment}_{i,j}) = \text{Var}(b_{0,j}) + \text{Var}(\varepsilon_{i,j}) = \sigma_0^2 + \sigma^2. \end{equation*}\]

  • For the \(k\)th and \(l\)th responses, within the \(j\)th firm, the correlation is given by: \[\begin{equation*} \text{Corr}(\texttt{investment}_{k,j}, \texttt{investment}_{l,j}) = \frac{\sigma^2_0}{\sigma_0^2 + \sigma^2}. \end{equation*}\]
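This correlation follows from a short covariance computation, sketched here using the model’s independence assumptions:

\[\begin{align*} \text{Cov}(\texttt{investment}_{k,j}, \texttt{investment}_{l,j}) &= \text{Cov}(b_{0,j} + \varepsilon_{k,j},\; b_{0,j} + \varepsilon_{l,j}) \\ &= \text{Var}(b_{0,j}) = \sigma_0^2 \qquad \text{for} \; k \neq l, \end{align*}\]

since the errors are independent of each other and of \(b_{0,j}\). Dividing by the common standard deviation \(\sqrt{\sigma_0^2 + \sigma^2}\) of each response gives the correlation \(\sigma_0^2 / (\sigma_0^2 + \sigma^2)\).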

Random Slopes!

  • We could even go further and model random slopes, along with the existing fixed ones, as follows: \[\begin{align*} \texttt{investment}_{i,j} &= \overbrace{\beta_{0,j}}^{\text{Mixed Effect}} + \overbrace{\beta_{1,j}}^{\text{Mixed Effect}} \times \texttt{marketValue}_{i,j} + \\ & \qquad \quad \overbrace{\beta_{2,j}}^{\text{Mixed Effect}} \times \texttt{capital}_{i,j} + \varepsilon_{i,j} \\ &= (\beta_0 + b_{0,j}) + (\beta_1 + b_{1,j}) \times \texttt{marketValue}_{i,j} + \\ & \qquad \quad (\beta_2 + b_{2,j}) \times \texttt{capital}_{i,j} + \varepsilon_{i,j} \\ & \qquad \qquad \qquad \text{for} \; i = 1, \ldots, n_j \; \; \text{and} \; \; j = 1, \ldots, 11; \end{align*}\]

  • Note \((b_{0,j}, b_{1,j}, b_{2,j})^{T} \sim \mathcal{N}_3(\mathbf{0}, \mathbf{D})\), where \(\mathbf{0} = (0, 0, 0)^T\) and \(\mathbf{D}\) is a generic covariance matrix.

2.1. What is this Generic Covariance Matrix \(\mathbf{D}\)?

  • This is a standard form in linear mixed effects modelling.
  • Hence, this matrix becomes:

\[\begin{equation*} \mathbf{D} = \begin{bmatrix} \sigma_{0}^2 & \rho_{01} \sigma_{0} \sigma_{1} & \rho_{02} \sigma_{0} \sigma_{2}\\ \rho_{01} \sigma_{0} \sigma_{1} & \sigma_{1}^2 & \rho_{12} \sigma_{1} \sigma_{2}\\ \rho_{02} \sigma_{0} \sigma_{2} & \rho_{12} \sigma_{1} \sigma_{2} & \sigma_{2}^2 \end{bmatrix} = \begin{bmatrix} \sigma_{0}^2 & \sigma_{0, 1} & \sigma_{0, 2} \\ \sigma_{0, 1} & \sigma_{1}^2 & \sigma_{1,2}\\ \sigma_{0, 2} & \sigma_{1, 2} & \sigma_{2}^2 \end{bmatrix} \end{equation*}\]

Important Notes

  • The joint Normal distribution explains the variability of random regression coefficients and intercepts. The spread does not change when we collect more data.
  • The sampling distribution explains the uncertainty in the fixed regression estimates and gets narrower as we collect more data.

2.2. Model Fitting, Inference, and Coefficient Interpretation

  • Let us estimate the regression model with a mixed intercept only (mixed_intercept_model) via the function lmer() from package lme4.
  • Note that (1 | firm) allows the model to have a random intercept by firm.
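A sketch of the random-intercept fit (assuming {lme4} is installed):

```r
# Random-intercept model: fixed slopes for both regressors plus one
# random intercept per firm via (1 | firm)
library(lme4)

mixed_intercept_model <- lmer(
  investment ~ market_value + capital + (1 | firm),
  data = Grunfeld
)
```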

Estimate the Mixed-effects


  • Now, let us estimate the Mixed-effects regression model with mixed intercept and slopes (full_mixed_model).
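A sketch of the full model with random intercept and slopes:

```r
# Random intercept AND random slopes by firm; lme4 also estimates the
# correlations among the three random effects (the matrix D)
full_mixed_model <- lmer(
  investment ~ market_value + capital + (1 + market_value + capital | firm),
  data = Grunfeld
)
```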

Inference

  • Let us proceed with inference using mixed_intercept_model.
  • We now assess whether the fixed effects are statistically associated with investment in each model via summary().
  • We will use the package lmerTest via function summary().
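A sketch of the inference step; note that {lmerTest} overrides lmer(), so the model must be (re)fitted after the package is loaded for summary() to report p-values:

```r
# {lmerTest} adds p-values for the fixed effects to summary(), using
# Satterthwaite's degrees-of-freedom approximation
library(lmerTest)

mixed_intercept_model <- lmer(
  investment ~ market_value + capital + (1 | firm),
  data = Grunfeld
)
summary(mixed_intercept_model)
```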

Inferential Conclusion


  • We can see that market_value and capital are significant with \(\alpha = 0.05\) in both models.
  • Moreover, the regression coefficients’ interpretation for the fixed effects will be on the effect these regressors have on the population investment mean of the American companies.
  • We can obtain the estimated coefficients by firm along with the intercepts for both models via coef().

Getting coefficients in R!
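A sketch of extracting the by-firm coefficients, assuming both models have been fitted as above:

```r
# coef() returns, per firm, the sums of the fixed effects and the
# predicted random effects
coef(mixed_intercept_model)$firm  # only (Intercept) varies by firm
coef(full_mixed_model)$firm       # all three columns vary by firm

# The pieces separately:
fixef(full_mixed_model)  # population-level (fixed) estimates
ranef(full_mixed_model)  # predicted random effects by firm
```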


Some notes!

  • Column (Intercept) is the sum \(\hat{\beta}_0 + \hat{b}_{0,j}\).
  • Note the estimated regression coefficients for market_value and capital are the same since mixed_intercept_model only has \(\beta_1\) and \(\beta_2\) as its general modelling setup.
  • The coefficient summary changes in full_mixed_model given that we also include random effects for market_value and capital.
  • Columns market_value and capital are the sums \(\hat{\beta}_1 + \hat{b}_{1,j}\) and \(\hat{\beta}_2 + \hat{b}_{2,j}\), respectively.
  • Column (Intercept) is the sum \(\hat{\beta}_0 + \hat{b}_{0,j}\).

In-class Question

  • Note the standard errors for the estimated slopes in market_value and capital behave in a really particular way when comparing the OLS model_varying_intercept and the Mixed-effects full_mixed_model.
  • Therefore, what are the advantages of a Mixed-effects model over an OLS model with fixed-effects only?

2.3. Estimation

  • Mixed-effects models are still fit using likelihood ideas.
  • Compared to OLS, there is more to estimate:
    • Fixed effects: intercept and coefficients.
    • Overall variance: \(\sigma^2\).
    • Variance components (associated with the random effects): how much groups/subjects vary (and sometimes how the random effects move together).

REML: Restricted Maximum Likelihood

  • REML is a common estimation approach for Mixed-effects regression.
  • Intuition: estimating fixed effects “uses up” information, so plain maximum likelihood estimation (MLE) can underestimate random-effect variances.
  • REML focuses on variation left over after accounting for fixed effects.

Two Steps in Practice

  1. Estimate variance components (random-effect variances/covariances and \(\sigma^2\)).
  2. Estimate fixed effects given those variance estimates (a weighted regression accounting for within-group correlation).

2.4. Prediction

We can make two classes of predictions with Mixed-effects models:

  1. To predict on an existing group, we find that group’s regression coefficients (and therefore model function) by summing the fixed effects and (if present) the random effects, then use that model function to make predictions.
  2. To predict on a new group (using a mean prediction), we use the fixed effects as the regression coefficients (because the random effects have mean zero) and use that model function to make predictions.

Code in R!

  • For predictions on an existing group in our training set we have:

If we wanted to predict the investment for General Motors with a market_value of USD \(\$2,000\) million and capital of USD \(\$1,000\) million, then our answer would be USD \(\$537.4\) million.

  • This prediction uses \(\hat{\beta}_0\), \(\hat{\beta}_1\), \(\hat{\beta}_2\), \(\hat{b}_{0, j}\), \(\hat{b}_{1, j}\), and \(\hat{b}_{2, j}\).
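A sketch of this prediction (assuming the firm level is labelled "General Motors"):

```r
# Prediction for an existing firm: predict() uses the fixed effects
# plus that firm's predicted random effects
gm_new <- data.frame(
  firm = "General Motors", market_value = 2000, capital = 1000
)
predict(full_mixed_model, newdata = gm_new)
```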

Make prediction for new observations!


  • For predictions on American companies in general, we have:

If we wanted to predict the MEAN investment for American companies with a market_value of USD \(\$2,000\) million and capital of USD \(\$1,000\) million, then our answer would be USD \(\$341.51\) million. This prediction only uses \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\beta}_2\).
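A sketch of the population-mean prediction:

```r
# Mean prediction for a new (unseen) company: `re.form = NA` tells
# predict() to set all random effects to their mean of zero, so only
# the fixed effects are used
new_company <- data.frame(market_value = 2000, capital = 1000)
predict(full_mixed_model, newdata = new_company, re.form = NA)
```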

3. Wrapping Up

  • In many different cases, when there is a correlation structure in our observations, OLS models are not suitable for our inferential or predictive inquiries.
  • Therefore, linear Mixed-effects models are suitable for correlated observations.
  • Nonetheless, the model’s complexity will also be a function of our specific inquiries.
  • We can even extend the Mixed-effects approach to generalized linear models (GLMs)!