DSCI 561 Lecture 1 ‘Quiz’

Orientation to the course

1. What regression methods do you know?

Class: Use slack, or raise hand.
Me: Write on board.

2. What type of problems/tasks can regression address?

Class: Use slack, or raise hand.
Me: Write on board.

3. Of the regression methods, are there any that are inappropriate for certain regression tasks?

Class: Use slack, or raise hand.
Me: Write on board.

Check out the about page for the course for more details on what we’ll be focussing on in this course, and why.

Review Questions

1. In the context of supervised learning, what is a model function?

  1. It’s another word for the objective/loss function.
  2. It’s a function of the predictors, that evaluates to some probabilistic quantity of the response.
  3. It’s another word for the regression curve

2. What do the following model functions look like? Predictors are numeric.

  1. \(\beta_0 + \beta_1 x_1 + \beta_2 x_1^2\)
  2. \(\beta_0 + \beta_1 x_1 + \beta_2 x_2\)
  3. \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2\)
  4. A model function obtained using some local method (like kNN or loess).

3. What probabilistic quantity does the model function estimate in linear regression?

  1. The mean of the response given the predictors.
  2. The mean of the response.
  3. The predicted response.
  4. It’s not necessarily associated with any probabilistic quantity.

4. How can you know whether a formula is an estimator of some probabilistic quantity? For example, how do I know that \(\bar{y}\) is a valid estimator of \(E(Y)\), or \(s^2\) is a valid estimator of \(Var(Y)\)?

  1. The estimator must converge to the probabilistic quantity as the sample size increases.
  2. Trick question! These quantities are the same, i.e., \(E(Y)=\bar{y}\), and \(Var(Y)=s^2\).
  3. The estimator must be unbiased.
  4. The estimator must be the MLE

5. How would you know whether one regression model “fits better” than another regression model (using the same predictors)?

  1. The better one has the lower generalization error.
  2. The better one has the higher \(R^2\) value.
  3. The better one will have narrower residuals.
  4. All of the above.

6. If ML methods (non-parametric, like kNN or random forests) fit better than linear regression, why might we still bother with linear regression?

Discuss.