Lecture Learning Objectives

Review frequentist hypothesis testing.
Examine the misuse of \(p\)-values in scientific literature.
Demonstrate the misuse of \(p\)-values via a simulation.
Explain the macro-properties of carrying out many hypothesis tests in the context of a single dataset.
Use the Bonferroni and False Discovery Rate corrections to alter the macro-properties when conducting many hypothesis tests.
Contrast the Bonferroni correction with the False Discovery Rate correction.

Define Simpson’s paradox.
Illustrate the occurrence of Simpson’s paradox via simulated and real data.
Provide a basic guideline to avoid Simpson’s paradox.
Provide a definition of a confounding variable.
Argue why evidence from a randomized study is of a higher grade than evidence from an observational study.

Explain foundational concepts of statistical design and analysis of experiments in a Data Science context.
Relate A/B and A/B/n testing to common statistical models via ordinary least-squares (OLS).
Fit and interpret appropriate OLS models from a factorial experiment.
Relate the corresponding ANOVA table to the substantive question of interest.
Introduce the concept of blocking in experimental design.
Contrast a blocking experimental model with a non-blocking one.

Propose the use of control, blocking, randomization, and replication in the design of an A/B test.
Contrast the performance of blocking versus non-blocking designs with increasing overall sample sizes.
Relate the statistical notion of power to the real-world utility of an experiment in an A/B test sample size computation.
Contrast the performance of blocking versus non-blocking designs in terms of block homogeneity.

Comment on nuances of randomized experiments that arise specifically for website optimization problems.
Determine an adequate sample size (i.e., experiment duration) for such problems.
Apply clever plotting arrangements to communicate power analyses.
Relate the danger of “early stopping” to experiments in A/B testing.
Analyze the advantages and disadvantages of aggressive and principled peeking in A/B testing.

Define a set of variables that control for confounding.
Analyze stratified observational data.
Interpret regression model fits for observational data.
Supply appropriate caveats for causal claims from fitting regression models to observational data.
Comment on the differing goals of regression for prediction versus regression for explanation.

Introduce the use of different sampling schemes in observational studies.
Illustrate the concept of the proxy ground truth to assess different sampling schemes.
Apply complete simulation studies to a given sampling scheme via the proxy ground truth.
Explore the efficiency of the case-control sampling scheme via a modified power analysis.

Describe how case-control sampling and matching can simultaneously be used to design a study.
Demonstrate the use of contrasts in ordinal regressors.
Relate how data are analyzed to how they are collected.