Lab 2: Ordinal Logistic Regression

Setup

We will need to load the following packages before proceeding:
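A minimal sketch of this setup, assuming the lab relies on MASS (for `polr()`), carData (for `WVS`) and the tidyverse (for wrangling and plotting). The loading order is a deliberate choice: MASS also exports a `select()` function, so attaching it before the tidyverse keeps `dplyr::select()` as the one found first on the search path.

```r
# Load MASS before the tidyverse so that dplyr::select() is not masked by MASS::select()
library(MASS)      # polr(): proportional-odds (ordinal logistic) regression
library(carData)   # WVS: World Values Surveys data frame
library(tidyverse) # dplyr for wrangling, ggplot2 for plotting
```

If the packages are loaded in the opposite order, calls such as `select(poverty, age, country)` fail because R dispatches to `MASS::select()`; writing `dplyr::select()` explicitly avoids the problem either way.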

The World Values Surveys

One of the advantages of models such as ordinal logistic regression is their suitability for social science data, where discrete ordinal variables are quite common (e.g., Likert scales!). With that in mind, let us dig into the World Values Surveys (WVS) dataset from the package carData. According to its documentation, the data frame WVS is described as follows:

Data from the World Values Surveys 1995-1997 for Australia, Norway, Sweden, and the United States.

Main Statistical Inquiries

Suppose we are data scientists interested in untangling the differences in people’s views on their government’s efforts to fight poverty in Australia, Norway, Sweden, and the USA. Hence, statistically speaking, we want to determine the following:

  • Is a person’s age a significant factor in these views? If so, can we quantify this association, and by how much?
  • Is the country a significant factor in these views? If so, can we quantify this association, and by how much?

Data Collection and Wrangling

We have a training set of \(n = 5381\) subjects:
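A minimal sketch, assuming the training set is simply the full `WVS` data frame stored under the name `training_WVS`; its row count matches the \(n = 5381\) stated above.

```r
library(carData)

# Treat the whole WVS data frame as the training set
training_WVS <- WVS
nrow(training_WVS)  # number of subjects
```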

We will specifically work on the following columns:

  • poverty: The answer to the question: “Do you think that what the government is doing for people in poverty in this country is about the right amount, too much, or too little?”
  • age: Subject’s age in years.
  • country: Subject’s country (Australia, Norway, Sweden, or USA).

Hence, let us select the columns of interest in training_WVS:
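One way to do this with dplyr, sketched under the assumption that `training_WVS` starts as the full `WVS` data frame. The call is namespaced as `dplyr::select()` so it still works if MASS happens to mask `select()`.

```r
library(carData)
library(dplyr)

training_WVS <- WVS %>%
  dplyr::select(poverty, age, country)  # namespaced in case MASS::select() masks it

# poverty should show up as an ordered factor, age as numeric, country as a factor
str(training_WVS)
```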

Q1.1. Exploratory Data Analysis

Based on our main statistical inquiries, let us answer the following:

Q1.1.1. What is our response of interest?

Q1.1.2. What is the response’s nature?

Q1.1.3. What are our explanatory variables of interest?

Q1.1.4. What is the nature of the explanatory variables?

Answers

Available for MDS students.

Now, let us code suitable plots comparing age (on the \(y\)-axis) across the levels of poverty (on the \(x\)-axis), faceted by country:
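A possible ggplot2 sketch using side-by-side boxplots, one panel per country. Boxplots are an assumed choice here; violin plots would serve the same purpose. The axis labels and title are illustrative.

```r
library(carData)
library(dplyr)
library(ggplot2)

training_WVS <- WVS %>% dplyr::select(poverty, age, country)

# One panel per country; within each panel, the age distribution for each poverty view
age_boxplots <- ggplot(training_WVS, aes(x = poverty, y = age, fill = poverty)) +
  geom_boxplot() +
  facet_wrap(~ country) +
  labs(
    x = "View on government's efforts for people in poverty",
    y = "Age (years)",
    title = "Age by poverty view, faceted by country"
  ) +
  theme(legend.position = "none")  # fill repeats the x-axis, so the legend is redundant

age_boxplots
```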

Q1.1.5. What do you observe about the relationship of age and country with poverty?

Answer

Available for MDS students.

Now, let us explore the relationship between poverty and country via stacked bar charts:
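A sketch of proportion-stacked bars, assuming the goal is to compare the *distribution* of views across countries (which have different sample sizes). `position = "fill"` rescales each bar to sum to 1; the same proportions are also tabulated so exact values can be read off.

```r
library(carData)
library(dplyr)
library(ggplot2)

training_WVS <- WVS %>% dplyr::select(poverty, age, country)

# position = "fill" stacks the counts and rescales each bar to proportions summing to 1
poverty_bars <- ggplot(training_WVS, aes(x = country, fill = poverty)) +
  geom_bar(position = "fill") +
  labs(x = "Country", y = "Proportion of subjects", fill = "Poverty view",
       title = "Poverty views by country")

poverty_bars

# The same proportions as a table: each row (country) sums to 1
props <- prop.table(table(training_WVS$country, training_WVS$poverty), margin = 1)
round(props, 3)
```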

Q1.1.6. What do you observe about the relationship between country and poverty?

Answer

Available for MDS students.

Q1.2. Data Modelling Framework

Q1.2.1. Having defined the response and explanatory variables in this case, what is the right modelling framework?

Answer

Available for MDS students.

Q1.2.2. Suppose that the discrete ordinal response poverty, \(Y_i\) (for \(i = 1, \dots, n\)) in a training set of size \(n\), has categories Too Little, About Right, and Too Much.

Important

Categories Too Little, About Right, and Too Much imply an ordinal scale, i.e., Too Little \(<\) About Right \(<\) Too Much.

Moreover, \(X_{i, \texttt{age}}\) is a continuous regressor, and country is encoded through the following dummy variables:

\[ X_{i, \texttt{Norway}} = \begin{cases} 1 & \text{if the subject is from Norway},\\ 0 & \text{otherwise}. \end{cases} \]

\[ X_{i, \texttt{Sweden}} = \begin{cases} 1 & \text{if the subject is from Sweden},\\ 0 & \text{otherwise}. \end{cases} \]

\[ X_{i, \texttt{USA}} = \begin{cases} 1 & \text{if the subject is from the USA},\\ 0 & \text{otherwise}. \end{cases} \]

Important

Note that Australia will be the baseline for the variable country.
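A quick check that the data match this setup: the response is stored as an ordered factor in the order Too Little < About Right < Too Much, and with R's default treatment contrasts the first level of country (Australia) becomes the baseline, so only the three dummies above are created. When `polr()` fits the model it drops the intercept column shown here, since the thresholds play that role.

```r
library(carData)

# The response is an ordered factor; its level order defines the ordinal scale
levels(WVS$poverty)
is.ordered(WVS$poverty)

# With R's default treatment contrasts, the first level of country is the baseline
levels(WVS$country)

# Design columns R builds from age + country: no dummy for Australia
colnames(model.matrix(~ age + country, data = WVS))
```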

Now, briefly and in plain words, describe the two cumulative odds of our chosen modelling framework for the response of interest.

Answer

Available for MDS students.

Q1.2.3. Mathematically, what is the system of modelling equations?

Answer

Available for MDS students.

Q1.3. Estimation

Using the appropriate fitting function and training_WVS, estimate the regression model of poverty on age and country.

Answer
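A sketch assuming the fitting function is `MASS::polr()` and the fitted object is named `poverty_model` (the name used in Q1.5.1). Note that `polr()` documents its parameterization as \(\operatorname{logit} P(Y_i \le j) = \zeta_j - \eta_i\) with \(\eta_i = \beta_1 X_{i,\texttt{age}} + \beta_2 X_{i,\texttt{Norway}} + \beta_3 X_{i,\texttt{Sweden}} + \beta_4 X_{i,\texttt{USA}}\), so the signs of its slopes should be checked against however the system in Q1.2.3 is written. `Hess = TRUE` keeps the Hessian so `summary()` can report standard errors without refitting.

```r
library(MASS)
library(carData)

# Base-R subsetting sidesteps the dplyr::select() / MASS::select() clash
training_WVS <- WVS[, c("poverty", "age", "country")]

# Proportional-odds model: one threshold (zeta) per cumulative logit, shared slopes
poverty_model <- polr(poverty ~ age + country, data = training_WVS, Hess = TRUE)
summary(poverty_model)
```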

Q1.4. Inference

Using your fitted model, are the explanatory variables age and country statistically associated with the response poverty? Below, provide the corresponding code to carry out this inference.

Answer
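`summary()` on a `polr` fit reports estimates, standard errors and \(t\) values but no \(p\)-values. One common approach, sketched here, computes two-sided \(p\)-values from the large-sample normal approximation (the table also contains rows for the two thresholds, which are not of interest here). As an assumed complement, a likelihood-ratio test compares the full model with one that drops country, testing its three dummies jointly.

```r
library(MASS)
library(carData)

training_WVS <- WVS[, c("poverty", "age", "country")]
poverty_model <- polr(poverty ~ age + country, data = training_WVS, Hess = TRUE)

# Two-sided Wald p-values from the normal approximation to the t values
coef_table <- coef(summary(poverty_model))
p_values <- 2 * pnorm(abs(coef_table[, "t value"]), lower.tail = FALSE)
round(cbind(coef_table, "p value" = p_values), 4)

# Likelihood-ratio test for country as a whole (all three dummies at once)
no_country_model <- polr(poverty ~ age, data = training_WVS, Hess = TRUE)
anova(no_country_model, poverty_model)
```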

Now, state your inferential conclusion at a significance level of \(\alpha = 0.05\).

Answer

Available for MDS students.

Q1.5. Coefficient Interpretation

Q1.5.1. Obtain the estimates on the scale of the cumulative odds via poverty_model. Moreover, obtain the corresponding 95% confidence intervals.

Answer
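A sketch that exponentiates the slopes of `poverty_model` and their 95% profile-likelihood intervals (from MASS's `confint()` method for `polr`, which may print a profiling message). Because `polr()` uses \(\zeta_j - \eta_i\), each \(e^{\hat\beta_k}\) multiplies the odds \(P(Y_i > j)/P(Y_i \le j)\) of a *higher* category, while the odds \(P(Y_i \le j)/P(Y_i > j)\) change by \(e^{-\hat\beta_k}\). Under proportional odds, the same factor applies to both cumulative odds.

```r
library(MASS)
library(carData)

training_WVS <- WVS[, c("poverty", "age", "country")]
poverty_model <- polr(poverty ~ age + country, data = training_WVS, Hess = TRUE)

# Point estimates on the odds scale
odds_ratios <- exp(coef(poverty_model))

# 95% profile-likelihood intervals on the coefficient scale, then exponentiated
odds_ratio_cis <- exp(confint(poverty_model, level = 0.95))

round(cbind("Estimate" = odds_ratios, odds_ratio_cis), 4)
```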

Q1.5.2. Using the cumulative odds associated with the system of equations in Q1.2.3, provide the interpretation for age for each cumulative odds.

Answer

Available for MDS students.

Q1.5.3. Using the cumulative odds associated with the system of equations in Q1.2.3, provide the interpretation for country (versus the baseline) for each cumulative odds.

Answer

Available for MDS students.

Q1.6. Prediction

Going beyond our main statistical inquiries, use this model to predict the poverty view probabilities for an American subject who is 60 years old. Moreover, provide a complete description of this prediction in plain words.

Answer
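A sketch using `predict()` with `type = "probs"`, which returns one probability per poverty category, plus `type = "class"` for the most probable category. The new subject is built with the training levels of country so its dummies are encoded exactly as during fitting; the data frame name `usa_60` is illustrative.

```r
library(MASS)
library(carData)

training_WVS <- WVS[, c("poverty", "age", "country")]
poverty_model <- polr(poverty ~ age + country, data = training_WVS, Hess = TRUE)

# New subject: 60-year-old from the USA; reuse the training factor levels for country
usa_60 <- data.frame(
  age = 60,
  country = factor("USA", levels = levels(training_WVS$country))
)

# One probability per category: Too Little, About Right, Too Much
probs_usa <- predict(poverty_model, newdata = usa_60, type = "probs")
round(probs_usa, 4)

# Most probable category for this subject
predict(poverty_model, newdata = usa_60, type = "class")
```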

How about a 60-year-old Swedish subject? Provide the corresponding full description and the necessary code to support this answer.

Answer
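The same `predict()` call works for a 60-year-old Swedish subject. As a sketch of where these numbers come from, the block also rebuilds them by hand from `polr()`'s parameterization: cumulative probabilities \(P(Y \le j) = \operatorname{logit}^{-1}(\zeta_j - \eta)\), then category probabilities as successive differences. The names `sweden_60` and `x_sweden_60` are illustrative.

```r
library(MASS)
library(carData)

training_WVS <- WVS[, c("poverty", "age", "country")]
poverty_model <- polr(poverty ~ age + country, data = training_WVS, Hess = TRUE)

sweden_60 <- data.frame(
  age = 60,
  country = factor("Sweden", levels = levels(training_WVS$country))
)
probs_sweden <- predict(poverty_model, newdata = sweden_60, type = "probs")
round(probs_sweden, 4)

# Manual check with polr's parameterization: logit P(Y <= j) = zeta_j - eta
x_sweden_60 <- c(age = 60, countryNorway = 0, countrySweden = 1, countryUSA = 0)
eta <- sum(coef(poverty_model)[names(x_sweden_60)] * x_sweden_60)
cum_probs <- plogis(poverty_model$zeta - eta)  # P(Y <= Too Little), P(Y <= About Right)
manual_probs <- diff(c(0, cum_probs, 1))       # P(Too Little), P(About Right), P(Too Much)
names(manual_probs) <- poverty_model$lev
round(manual_probs, 4)
```

Comparing these probabilities with the American ones isolates the Sweden-versus-USA difference, since age is held at 60 in both.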