Lecture 7

Quantile Regression

(Please sign in on iClicker)

Today’s Learning Goals

By the end of this lecture, you should be able to:

  • Review what a quantile is.
  • Compare the error functions of Ordinary Least Squares (OLS) regression versus Quantile regression.
  • Recognize the impacts of parametric and distributional assumptions in Quantile regression.
  • Perform non-parametric Quantile regression.
  • Perform parametric Quantile regression.

Outline

  1. The Quantiles
  2. Applications of Quantile Regression
  3. Parametric Quantile Regression
  4. Non-Parametric Quantile Regression

Completing our regression toolbox!

  • We will close our regression mind map this block with an approach for conditional quantiles: Quantile regression.
  • We will check two approaches:
    • Parametric (for inference and prediction).
    • Non-parametric (for prediction).

1. The Quantiles

  • Let us set up an initial example to recap what a quantile is.
  • Suppose we measure the height of \(n = 1000\) randomly sampled UBC students.
  • Here is the sample average in centimetres:
set.seed(123)

# draw_sample() is a helper defined elsewhere in the lecture materials;
# it draws n student heights (in cm)
sample_heights <- draw_sample(1000)
sample_mean <- mean(sample_heights)

round(sample_mean, 2)
[1] 175.11

In-Class Question (a.k.a. Think-Pair-Share)

  • By only using that single sample average, try to answer the following:
  1. How much do you know about the distribution of these heights?
  2. Is it a symmetric distribution?
  3. Is it a heavy-tailed distribution?
  4. What is the variance of this distribution?

Summary of data

  • Hence, let us get more information from the sample via summary() and var().
summary(sample_heights)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  155.3   170.6   175.1   175.1   179.7   197.7 
paste("variance: ", round(var(sample_heights),2))
[1] "variance:  48.19"

iClicker Question

What about now? Do we have a better idea about the distribution?

A. Yes, having different quantiles gives us a better idea about the distribution of heights.

B. No, not at all!

Plot!

1.1. Let us recall what a quantile is!

  • Suppose we have a random variable \(X\sim F_X(\cdot)\), i.e., \(X\) has a cumulative distribution function (CDF) \(F_X(\cdot)\).
  • Then, the \(\tau\)-quantile (\(0\leq \tau \leq 1\)) is given by \(Q(\tau)\) such that \[\begin{align} F_X[Q(\tau)]= P[X\leq Q(\tau)] = \tau. \end{align}\]
  • In plain words, the quantile \(Q(\tau)\) is that observed value of \(X\) for which \(\tau \times 100\%\) of the population’s data is below (or the area under the curve on the left-hand side of a given \(x\)).

As an example …

  • We can check the case of the 0.75-quantile in the Standard Normal distribution using its probability density function (PDF).
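We can also check this numerically in base R: qnorm() gives \(Q(0.75)\) for the standard Normal, pnorm() recovers \(\tau\), and the empirical quantile of a large simulated sample approximates the theoretical one.

```r
# Theoretical 0.75-quantile of the standard Normal
q75 <- qnorm(0.75)

# Plugging Q(0.75) back into the CDF recovers tau = 0.75
tau <- pnorm(q75)

# The empirical 0.75-quantile of a large sample approximates Q(0.75)
set.seed(123)
x <- rnorm(1e5)
emp_q75 <- unname(quantile(x, probs = 0.75))

round(c(theoretical = q75, recovered_tau = tau, empirical = emp_q75), 4)
```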

Moreover…

  • We can also show it using the CDF where the cumulative probabilities are on the \(y\)-axis.

1.2. Motivation for Quantile Regression

  • In the OLS regression model, we were focused on the conditional mean given the regressors:

\[ \mathbb{E}(Y_i \mid X_{i,j} = x_{i,j}) = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k} \; \; \; \; \text{since} \; \; \; \; \mathbb{E}(\varepsilon_i) = 0. \]

  • Moreover, we discussed that (from the statistical point of view) regression models follow a stochastic relation.

Plot!

Nonetheless…

Is it possible to go beyond the conditioned mean?

  • Given the stochastic nature of the relation between the response and explanatory variables, we have a whole conditional distribution of the response \(Y_i\), given the \(k\) explanatory variables \(X_{i,j}\) (i.e., regressors).

A conditioned \(0.95\)-quantile!

2. Applications of Quantile Regression

  • In many practical cases, we have inferential questions that are related to quantiles and not the mean.
  • We might want to see if, at different levels, the response is affected differently by the explanatory variables:

How do drivers’ years of experience affect buses’ travel times when considering the 25% fastest trips, 50% fastest trips, and 90% fastest trips?

How does the alcohol price affect the weekly consumption of light drinkers, moderate drinkers, and heavy drinkers at given cut-off values of alcohol consumption?

Furthermore…


  • We might also be interested in a probabilistic prediction:

During December in Vancouver, with a 95% chance, what is the maximum snowfall threshold?

Furthermore (Cont’d)

  • Note we can translate these inquiries with \(\tau\)-quantiles (\(\tau \in [0, 1]\)) along with the regressor \(X\) and response \(Y\):

How do drivers’ years of experience (\(X\)) affect buses’ travel times (\(Y\)) when considering the 25% fastest trips (\(\tau = 0.25\)), 50% fastest trips (\(\tau = 0.5\)), and 90% fastest trips (\(\tau = 0.9\))?

How does the alcohol price (\(X\)) affect the weekly consumption (\(Y\)) of light drinkers (\(\tau = 0.25\)), moderate drinkers (\(\tau = 0.5\)), and heavy drinkers (\(\tau = 0.9\)) at given cut-off values of weekly alcohol consumption?

During December (\(X\)) in Vancouver, with a 95% (\(\tau = 0.95\)) chance, what is the maximum snowfall (\(Y\)) threshold?

2.1. The Teams Dataset

  • We will use the Teams dataset from the Lahman package.
  • It has \(n = 3075\) observations of 48 variables related to yearly statistics and standings for teams (one row per team) from different American baseball leagues.

Quick data wrangling!

  • We will use the following columns:
    • runs: Runs scored, a count-type variable.
    • hits: Hits by batters, a count-type variable.

Main Statistical Inquiries

  • You are interested in determining the following using this recorded data:
    1. Previous research has shown you that a large number of runs is associated with a large number of hits. Having said that, for those teams at the upper 75% threshold in runs, is this association significant? If so, by how much?
    2. For any given team that scores 1000 hits in a future tournament, how many runs can this team score with a 50% chance (centred around this future tournament’s median runs)?

2.2. Exploratory Data Analysis

Let us create a scatterplot of hits (our regressor) versus runs (our response), even though these variables are not continuous.

3. Parametric Quantile Regression

  • We can address inquiries (1) and (2) via parametric Quantile regression.
  • But before digging into this model, it is necessary to recall some fundamentals from OLS.
  • In OLS, we use a linear combination on the right-hand side to explain the conditioned mean

\[ \mathbb{E}(Y_i \mid X_{i,j} = x_{i,j}) = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k}. \]

Then…


  • We find \(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k\) that minimize the squared error (loss function) in our training set of size \(n\): \[ \sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_{i,1} - \ldots - \beta_k x_{i,k})^2. \]

3.1. General Modelling Framework

  • Suppose we have a random sample (or training set) of size \(n\) with responses \(Y_i\), for \(i = 1, 2, \dots, n\).
  • Each \(i\)th response is subject to \(k\) regressors \(X_{i, 1}, X_{i, 2}, \dots, X_{i, k}.\)
  • Furthermore, we are interested in modelling the response’s \(\tau\)-quantile, where \(\tau \in [0, 1]\), conditioned on these \(k\) regressors in a linear combination with an intercept \(\beta_0(\tau)\) and \(k\) regression coefficients \(\beta_1(\tau), \beta_2(\tau), \dots, \beta_k(\tau)\).

The equation!

\[ Y_i = \underbrace{\beta_0(\tau) + \beta_1(\tau) X_{i,1} + \beta_2(\tau) X_{i,2} + \ldots + \beta_k(\tau) X_{i,k}}_{\text{Systematic Component}} + \underbrace{\varepsilon_i(\tau)}_{\text{Random Component}}. \]

  • \(\beta_j(\tau)\) (for \(j = 0, 1, 2, \dots, k\)) means that the regression term depends on the quantile \(\tau\), which is the desired quantile.
  • For each quantile, we (might if our inquiry dictates so!) have a different regression function on the right-hand side of our modelling equation.
  • We still have a distribution-agnostic random component for the \(i\)th observation represented as \(\varepsilon_i(\tau)\), which also depends on the quantile \(\tau\).

Some mathematical rearrangements!

\[\begin{gather*} \varepsilon_i(\tau) = Y_i - \underbrace{\left[ \beta_0(\tau) + \beta_1(\tau) X_{i,1} + \beta_2(\tau) X_{i,2} + \ldots + \beta_k(\tau) X_{i,k} \right]}_{\text{Systematic Component}}. \end{gather*}\]

  • Quantile regression must ensure the following global condition: \[ P \left[ \varepsilon_i(\tau) < 0 \mid X_{i, 1}, X_{i, 2}, \dots, X_{i, k} \right] = \tau. \]

In a more practical setting…

  • Suppose you use the teams dataset to fit a Quantile regression of runs (\(Y\)) versus hits (\(X_1\)) on the response’s quantile \(\tau = 0.75\).
  • Then, your regression equation for the \(i\)th observation becomes: \[ Y_i = \underbrace{\beta_0(\tau = 0.75) + \beta_1(\tau = 0.75) X_{i,1}}_{\text{Systematic Component}} + \underbrace{\varepsilon_i(\tau = 0.75)}_{\text{Random Component}}. \]
  • This yields \[\begin{gather*} \varepsilon_i(\tau = 0.75) = Y_i - \underbrace{\left[ \beta_0(\tau = 0.75) + \beta_1(\tau = 0.75) X_{i,1} \right]}_{\text{Systematic Component}} \qquad \text{with} \\ P \left[ \varepsilon_i(\tau = 0.75) < 0 \mid X_{i, 1} \right] = \tau = 0.75. \end{gather*}\]

The estimated Quantile Regression Line with \(\hat{\beta_0}(\tau = 0.75)\) and \(\hat{\beta_1}(\tau = 0.75)\)

Summarizing…

  • A parametric Quantile regression will condition the \(\tau\)-quantile (\(\tau \in [0, 1]\)) to \(k\) regressors in a linear combination with an intercept \(\beta_0(\tau)\) and \(k\) regression coefficients: \[ Q_i( \tau \mid X_{i,j} = x_{i,j}) = \beta_0(\tau) + \beta_1(\tau) x_{i,1} + \ldots + \beta_k(\tau) x_{i,k}. \]
  • This is analogous to the conditional expected value that OLS aims to model such as: \[ \mathbb{E}(Y_i \mid X_{i,j} = x_{i,j}) = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k}. \]

Heads-up!

  • Parametric Quantile regression will aim to find \(\hat{\beta}_0(\tau), \hat{\beta}_1(\tau), \dots, \hat{\beta}_k(\tau)\) that minimize the following error function (loss function): \[\begin{equation*} \sum_{i} e_i[\tau - I(e_i < 0)] = \sum_{i: e_i \geq 0} \tau|e_i|+\sum_{i: e_i < 0}(1-\tau)|e_i|. \end{equation*}\]
  • Unlike OLS, where there is a single model fitting, any Quantile regression approach (either parametric or non-parametric) would require fitting as many models as quantiles we are interested in!

Some notes!

  • In the previous loss function we have the indicator variable: \[\begin{equation*} I(e_i < 0) = \begin{cases} 1 \; \; \; \; \mbox{if $e_i$ < 0},\\ 0 \; \; \; \; \mbox{otherwise;} \end{cases} \end{equation*}\]
  • The residual is defined as: \[\begin{gather*} e_i = y_i - \overbrace{\big[\hat{\beta_0}(\tau) + \hat{\beta_1}(\tau) x_{i,1} + \ldots + \hat{\beta_k}(\tau) x_{i,k}\big]}^{\hat{y_i}} \\ e_i = y_i - \hat{y_i} \end{gather*}\]
  • This error function is called fidelity (also known as check function or pinball loss function).
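This loss is simple to code directly. A minimal base-R sketch (the function names are just for illustration) that verifies the identity above and shows the asymmetry it induces:

```r
# Check (pinball) loss: rho_tau(e) = e * (tau - I(e < 0))
check_loss <- function(e, tau) {
  e * (tau - (e < 0))
}

# Equivalent piecewise form: tau * |e| if e >= 0, (1 - tau) * |e| if e < 0
check_loss_piecewise <- function(e, tau) {
  ifelse(e >= 0, tau * abs(e), (1 - tau) * abs(e))
}

e <- c(-2, -0.5, 0, 0.5, 2)
all.equal(check_loss(e, 0.25), check_loss_piecewise(e, 0.25))  # TRUE

# Asymmetry: for tau = 0.25, overprediction (e < 0) costs more
check_loss(-1, 0.25)  # 0.75
check_loss( 1, 0.25)  # 0.25
```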

Now…

  • To understand how the fidelity function works, using the teams dataset, let us plot the three estimated parametric Quantile regression lines for quantiles \(\tau = 0.25, 0.5, 0.75\).
  • Our response \(Y\) is runs, whereas our regressor \(X_1\) is hits.

Plot!

Heads-up!

  • A fundamental characteristic of a \(\tau\)-Quantile regression (either parametric or non-parametric) is that approximately \(\tau \times 100\%\) of our observed responses will lie below our estimated regression line via \(\hat{\beta}_0(\tau), \hat{\beta}_1(\tau), \dots, \hat{\beta}_k(\tau)\).

How does the fidelity function behave?

  • Let us assume we have the same set of regressor values \(x_{i,1}, \dots, x_{i,k}\) in the following panels for \(\tau = 0.25, 0.5, 0.75\).
  • Moreover, we will have the corresponding set of estimates \(\hat{\beta}_0(\tau), \hat{\beta}_1(\tau), \dots, \hat{\beta}_k(\tau)\) by \(\tau = 0.25, 0.5, 0.75\).
  • We have the values of the corresponding fidelity function on the \(y\)-axis, whereas we have the corresponding values on the \(x\)-axis for the residual \[\begin{gather*} e_i = y_i - \overbrace{\big[\hat{\beta_0}(\tau) + \hat{\beta_1}(\tau) x_{i,1} + \ldots + \hat{\beta_k}(\tau) x_{i,k}\big]}^{\hat{y_i}} \\ e_i = y_i - \hat{y_i}. \end{gather*}\]

Plots!

From left to right!

\(\tau = 0.25\)

  • Recall the residual is \(\quad e_i = y_i - \overbrace{\big[\hat{\beta_0}(\tau) + \hat{\beta_1}(\tau) x_{i,1} + \ldots + \hat{\beta_k}(\tau) x_{i,k}\big]}^{\hat{y_i}}.\)

Note that for low values of \(\tau\), we consider overpredicting \[e_i = y_i - \hat{y_i} < 0\] WORSE than underpredicting \[e_i = y_i - \hat{y_i} > 0.\]

From left to right! (Cont’d)

\(\tau = 0.5\)

  • For \(\tau = 0.5\), we care equally about underpredicting and overpredicting.
  • This corresponds to the median value.

From left to right! (Cont’d)

\(\tau = 0.75\)

  • Recall the residual is \(\quad e_i = y_i - \overbrace{\big[\hat{\beta_0}(\tau) + \hat{\beta_1}(\tau) x_{i,1} + \ldots + \hat{\beta_k}(\tau) x_{i,k}\big]}^{\hat{y_i}}.\)

Note that for high values of \(\tau\), we consider underpredicting \[e_i = y_i - \hat{y_i} > 0\] WORSE than overpredicting \[e_i = y_i - \hat{y_i} < 0.\]

Plot!

3.2. Estimation

  • If our linear systematic component has \(k + 1\) regression parameters (an intercept AND \(k\) coefficients), then our estimated Quantile regression function will interpolate \(k + 1\) points.
  • For example, if we are fitting a line \(Q_i( \tau | X = x_i) = \beta_0(\tau) + \beta_1(\tau) x_i\), we are estimating two parameters (\(\hat{\beta_0}\) and \(\hat{\beta_1}\)). So, our fitted line passes through exactly two observed points.
  • As we already mentioned, when we use the \(\tau\)-quantile with a training set of size \(n\), approximately \(n \times \tau\) points will be under the curve and \(n \times (1 - \tau)\) will be above the curve.
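To see why minimizing the check loss targets the \(\tau\)-quantile, here is a base-R sketch with an intercept-only model (no regressors): the constant \(q\) minimizing \(\sum_i \rho_\tau(y_i - q)\) is, up to numerical tolerance, the empirical \(\tau\)-quantile.

```r
set.seed(123)
y <- rexp(2000, rate = 1)  # a skewed sample
tau <- 0.7

# Total check loss for a candidate constant q
total_loss <- function(q) {
  e <- y - q
  sum(e * (tau - (e < 0)))
}

# Minimize numerically over the range of the data
q_hat <- optimize(total_loss, interval = range(y))$minimum

# Compare with the empirical 0.7-quantile
c(minimizer = q_hat, empirical = unname(quantile(y, tau)))
```

The two values agree closely; they are not bit-identical because the minimizer of a piecewise-linear loss is an interval between order statistics, while quantile() interpolates.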

For instance…

  • If we have \(n = 100\) points to fit the \(0.7\)-quantile, then:

    • The number of points above the line will be approximately 30.
    • The number of points below the line will be approximately 70.
  • The “approximately” is because \(k + 1\) of the points will lie exactly ON the estimated regression function.
  • Let us consider the teams dataset again. In this example, we will estimate the quantiles of runs as a linear function of hits. We can obtain these estimated \(0.25\), \(0.5\), and \(0.75\)-quantile regression lines via geom_quantile().

The code!

estimated_quantile_regression_lines <- ggplot(teams, aes(hits, runs)) +
  geom_point(alpha = 0.2, colour = "black") +
  geom_quantile(
    quantiles = c(0.25, 0.5, 0.75), aes(colour = as.factor(after_stat(quantile))),
    formula = y ~ x, linewidth = 1
  ) +
  theme_bw() +
  labs(
    x = "Number of Hits (X)",
    y = "Number of Runs (Y)"
  ) +
  theme(
    plot.title = element_text(size = 30, face = "bold"),
    axis.text = element_text(size = 26),
    axis.title = element_text(size = 26),
    legend.text = element_text(size = 24, margin = margin(r = 1, unit = "cm")),
    legend.title = element_text(size = 24, face = "bold"),
    legend.key.size = unit(1, "cm")
  ) +
  guides(colour = guide_legend(title = "Quantile", reverse = TRUE)) +
  ggtitle("Estimated Parametric Quantile Regression Lines of Hits versus Runs")

Plot!

Fit the model in R!

  • Via the {quantreg} package, we will use the rq() function to fit the parametric Quantile regression at \(\tau = 0.25, 0.5, 0.75\).
  • It has a formula argument as the previous fitting functions along with data.
  • This function also allows us to fit as many Quantile regressions as we want via tau.
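A minimal sketch of the call pattern, on simulated stand-in data since the wrangled teams data is not reproduced here (with it, the call would be rq(runs ~ hits, tau = c(0.25, 0.5, 0.75), data = teams)); the bootstrap is reduced to R = 200 here for speed, whereas the lecture uses R = 1000:

```r
# Sketch: parametric quantile regression with quantreg::rq()
if (requireNamespace("quantreg", quietly = TRUE)) {
  library(quantreg)

  # Simulated stand-in for the wrangled teams data
  set.seed(123)
  n <- 1000
  hits <- round(runif(n, 700, 1400))
  runs <- round(100 + 0.4 * hits + rnorm(n, sd = 60))
  sim_teams <- data.frame(hits, runs)

  # One call fits all three quantile regressions
  fit <- rq(runs ~ hits, tau = c(0.25, 0.5, 0.75), data = sim_teams)

  # Bootstrapped standard errors and p-values (pairwise "xy" bootstrap)
  summary(fit, se = "boot", bsmethod = "xy", R = 200)
}
```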

A quick note on the standard errors and \(p\)-values!

  • There is more than one way to obtain the standard errors and \(p\)-values of the regression estimates.
  • For the purpose of this course, we will use the pairwise bootstrapping method via the arguments se = "boot" and bsmethod = "xy".
  • Specifically, for this example, we will use R = 1000 replicates in function summary().

Modelling Output

3.3. Inference

  • We can determine whether a regressor is statistically associated with the \(\tau\)-quantile in our response through hypothesis testing for \(\beta_j(\tau)\).
  • We will need the estimated regression coefficient \(\hat{\beta_j}(\tau)\) and its bootstrapped standard error, \(\mbox{se}\left[\hat{\beta_j}(\tau) \right]\).
  • You can test the following hypotheses via the test statistic \(t_j = \frac{\hat{\beta_j}(\tau)}{\mbox{se}\left[\hat{\beta_j} (\tau) \right]}\): \[\begin{align*} H_0: \beta_j(\tau) &= 0 \\ H_a: \beta_j(\tau) &\neq 0. \end{align*}\]

Reminder!

  • Let us recall our inquiry (1):

Previous research has shown you that a large number of runs is associated with a large number of hits. Having said that, for those teams at the upper 75% threshold in runs, is this association significant? If so, by how much?

  • Note that this inferential question corresponds to the estimated \(0.75\)-Quantile regression.

Solving the inquiry!

Our sample gives us evidence to reject \(H_0\) with a \(p\text{-value} < .001\). So hits is statistically associated with the \(0.75\)-quantile of runs.

3.4. Coefficient Interpretation

  • We can quantify the statistical association of hits and the \(0.75\)-quantile of runs via the corresponding \(\hat{\beta_1}(0.75)\).
  • The interpretation is: “for each one hit increase, the \(0.75\)-quantile of runs will increase by 0.5.”

3.5. Predictions

  • Now, let us check our inquiry (2):

For any given team that scores 1000 hits in a future tournament, how many runs can this team score with a 50% chance (centred around this future tournament’s median runs)?

Then…

  • We can obtain this prediction via our \(0.25\) and \(0.75\)-Quantile regressions with predict().
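A sketch of this prediction step on simulated stand-in data (with the real wrangled teams data, replace sim_teams by teams; the predicted values below therefore differ from the lecture's 452 and 556):

```r
# Sketch: probabilistic prediction at hits = 1000 via predict()
if (requireNamespace("quantreg", quietly = TRUE)) {
  library(quantreg)

  # Simulated stand-in for the wrangled teams data
  set.seed(123)
  n <- 1000
  hits <- round(runif(n, 700, 1400))
  runs <- round(100 + 0.4 * hits + rnorm(n, sd = 60))
  sim_teams <- data.frame(hits, runs)

  fit <- rq(runs ~ hits, tau = c(0.25, 0.5, 0.75), data = sim_teams)

  # Predicted 0.25-, 0.5-, and 0.75-quantiles of runs at hits = 1000:
  # the middle value is the predicted median, and the outer two bound a
  # central 50% prediction interval
  predict(fit, newdata = data.frame(hits = 1000))
}
```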

Finally…

  • Our predictive inquiry can be answered as: “for any given team that scores 1000 hits in a future tournament, it is estimated to have a 50% (\([0.75 - 0.25] \times 100\%\)) chance of scoring between 452 and 556 runs.”
  • This is a probabilistic prediction!
  • Note we could also say that this team:
    • has a 25% (\([1 - 0.75] \times 100\%\)) chance of achieving over 556 runs,
    • has a 25% (\(0.25 \times 100\%\)) chance of getting less than 452 runs,
    • and would typically get 507 runs (median).

4. Non-Parametric Quantile Regression

  • The predictive inquiry (2) can also be dealt with via non-parametric Quantile regression.
  • This class of Quantile regression involves no distributional assumptions and no model function specification. It is a method based on splines.
  • The core idea is to allow the model to capture the local behaviour of the data (i.e., capturing its local structure).

4.1. General Modelling Framework

  • The struggle is choosing how local our model should be. In other words, what is the neighbourhood of a point \(x\)?
  • A parameter will control this. We will call it lambda, a penalization parameter:
    • A small neighbourhood (i.e., a small lambda) will provide a better approximation, but with too much variability – we might be prone to overfitting.
    • A too-large neighbourhood (i.e., a big lambda) will lose local information favouring smoothness – we might be prone to underfitting.

4.2. Estimation

  • In R, non-parametric Quantile regression is performed using the function rqss() from library quantreg.
  • It also works somewhat similarly to what we are used to with the lm() and glm() functions.
  • However, in rqss() each regressor must be introduced in the formula using the qss() function (as additive non-parametric terms). For example, qss(x, lambda = 10).

Syntax

  • The syntax would be:
rqss(y ~ qss(x, lambda = 10), tau = 0.5, data = my_data)
  • We use rqss() to run a non-parametric Quantile regression with runs as the response and hits as the regressor.
  • We use tau = 0.25 and tau = 0.75.
  • Note rqss() does not allow multiple values for tau.
  • Therefore, we fit a model per tau value. Here, we try a lambda = 100.
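A sketch of the two fits on simulated stand-in data (replace sim_teams by the wrangled teams data), including a check that roughly \(\tau \times 100\%\) of the responses fall below each fitted curve:

```r
# Sketch: non-parametric quantile regression with quantreg::rqss(),
# one fit per tau (rqss() does not accept a vector of taus)
if (requireNamespace("quantreg", quietly = TRUE)) {
  library(quantreg)

  # Simulated stand-in for the wrangled teams data
  set.seed(123)
  n <- 1000
  hits <- round(runif(n, 700, 1400))
  runs <- round(100 + 0.4 * hits + rnorm(n, sd = 60))
  sim_teams <- data.frame(hits, runs)

  fit_025 <- rqss(runs ~ qss(hits, lambda = 100), tau = 0.25, data = sim_teams)
  fit_075 <- rqss(runs ~ qss(hits, lambda = 100), tau = 0.75, data = sim_teams)

  # Sanity check: roughly tau * 100% of responses fall below each fit
  props <- c(
    below_025 = mean(sim_teams$runs < predict(fit_025, newdata = sim_teams)),
    below_075 = mean(sim_teams$runs < predict(fit_075, newdata = sim_teams))
  )
  props
}
```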

Non-parametric Quantile Regression for \(\tau = 0.25\)

Non-parametric Quantile Regression for \(\tau = 0.75\)

Then…

  • We plot the two model functions on the data plot. This is basically our prediction band (which can also apply to our parametric models!).

Plot!

What if we use lambda = 10?

Next model!

Plot!

A quick check!

  • As a sanity check, let us calculate the proportion of points below these estimated regression functions with \(\lambda = 10\).

4.3. Prediction

  • Suppose that we go ahead with the non-parametric models with \(\lambda = 10\), then we obtain the corresponding probabilistic interval:
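A sketch of obtaining this interval on simulated stand-in data with \(\lambda = 10\) (so the bounds below differ from the lecture's 491 and 625); note that predict.rqss() requires the new regressor value to lie within the range of the training data:

```r
# Sketch: 50% probabilistic interval at hits = 1000 from the two
# non-parametric fits
if (requireNamespace("quantreg", quietly = TRUE)) {
  library(quantreg)

  # Simulated stand-in for the wrangled teams data
  set.seed(123)
  n <- 1000
  hits <- round(runif(n, 700, 1400))
  runs <- round(100 + 0.4 * hits + rnorm(n, sd = 60))
  sim_teams <- data.frame(hits, runs)

  fit_025 <- rqss(runs ~ qss(hits, lambda = 10), tau = 0.25, data = sim_teams)
  fit_075 <- rqss(runs ~ qss(hits, lambda = 10), tau = 0.75, data = sim_teams)

  # hits = 1000 lies within the training range, so prediction is valid
  lwr <- unname(predict(fit_025, newdata = data.frame(hits = 1000)))
  upr <- unname(predict(fit_075, newdata = data.frame(hits = 1000)))
  c(lower = lwr, upper = upr)
}
```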

Thus…

  • Our predictive inquiry can be answered as:

For any given team that scores 1000 hits in a future tournament, it is estimated to have a 50% (\([0.75 - 0.25] \times 100\%\)) chance of scoring between 491 and 625 runs.

5. Wrapping Up

  • We can extend the regression paradigm beyond the conditioned response mean.
  • We can assess association (or causation, under a proper experimental framework!) between a set of regressors and different response statistics, such as quantiles.
  • Quantile regression can be non-parametric (more suitable for predictions) or parametric (to allow for coefficient interpretation).