Lecture 1 Depicting Uncertainty

September 9, 2019

Welcome to the course! The syllabus is on the README of the _students repo.

Today’s topics: probability, followed by distributions.

1.1 Lecture Learning Objectives

From today’s lecture, students are expected to be able to:

  • Identify probability as a proportion that converges to the truth as you collect more data.
  • Calculate probabilities using the inclusion-exclusion principle, the law of total probability, and probability distributions.
  • Convert between and interpret odds and probability.
  • Specify the usefulness of odds over probability.
  • Be aware that probability has multiple interpretations/philosophies.
  • Calculate and interpret mean, mode, entropy, variance, and standard deviation, from both a distribution and a sample.

(Hint: we make the quizzes based on lecture learning objectives)

1.2 Thinking about Probability

1.2.1 Defining Probability (5 min)

I like to play Mario Kart 8, a racing game with some “combat” involved using items. In the game, you are given an item at random whenever you get an “item box”.

Suppose you’re playing the game, and so far have gotten the following items in total:

Item Name Count
Banana 7
Bob-omb 3
Coin 37
Horn 1
Shell 2
Total: 50

Attribution: images from pngkey.

Questions that we’ll address:

  • What’s the probability that your next item is a coin?
  • How would you find the actual probability?
  • From this, how might you define probability?

In general, the probability of an event \(A\) occurring is denoted \(P(A)\) and is defined as the limit of the proportion \[\frac{\text{Number of times event } A \text{ is observed}}{\text{Total number of observations}}\] as the number of observations goes to infinity.
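As a rough illustration of this definition, the sketch below first computes the observed proportions from the 50 items in the table above, then simulates more and more item boxes to show the proportion of coins settling down. The "true" coin probability of 0.75, the seed, and the function name `coin_proportion` are assumptions made purely for the simulation; they are not known facts about the game.

```python
import random

# Observed counts from the table above (50 items in total).
counts = {"Banana": 7, "Bob-omb": 3, "Coin": 37, "Horn": 1, "Shell": 2}
total = sum(counts.values())

# Empirical proportion of each item: an estimate of its probability.
estimates = {item: n / total for item, n in counts.items()}
print(estimates["Coin"])  # 37 / 50

def coin_proportion(n_draws, p_coin=0.75, seed=551):
    """Proportion of coins in n_draws simulated item boxes.

    p_coin is an assumed true probability, used only for illustration.
    """
    rng = random.Random(seed)
    hits = sum(rng.random() < p_coin for _ in range(n_draws))
    return hits / n_draws

# The proportion should drift towards the assumed p_coin as n grows.
for n in (50, 500, 5_000, 50_000):
    print(n, coin_proportion(n))
```

With only 50 draws the proportion can be noticeably off; the larger runs should sit much closer to the assumed value.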

1.2.2 Calculating Probabilities using Logic

We’ll look at two laws for calculating probabilities of events. Suppose the table below shows the true probabilities of each item. Also, let’s add some properties to these items.

Item Name   Probability   Combat Type   Defeats blue shells
Banana      0.12          contact       no
Bob-omb     0.05          explosion     no
Coin        0.75          ineffective   no
Horn        0.03          explosion     yes
Shell       0.05          contact       no

Disclaimer: I don’t think these are the true probabilities, but I’m pretty sure the coin probability is correct, as long as you’re in the lead.

Law of Total Probability (5 min)

  • According to this table, are there any other items possible? Why or why not?
  • What’s the probability of getting something other than a coin? How did you arrive at that number?

Concept: When partitioning the sample space (= the set of all possibilities), the probabilities of each piece should add to one. That is, in this case, \[1 = P(\text{Banana}) + P(\text{Bob-omb}) + P(\text{Coin}) + P(\text{Horn}) + P(\text{Shell}).\]
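Here is a small sketch that checks the partition statement numerically using the probabilities from the table. It also answers the earlier question about getting something other than a coin in two ways: by adding up the other pieces, and by subtracting the coin probability from one. The variable names are illustrative only.

```python
import math

# Item probabilities from the table above.
item_probs = {
    "Banana": 0.12,
    "Bob-omb": 0.05,
    "Coin": 0.75,
    "Horn": 0.03,
    "Shell": 0.05,
}

# The pieces of a partition should add to one (up to floating-point error).
total_prob = sum(item_probs.values())
print(math.isclose(total_prob, 1.0))

# P(not Coin), computed by summing every other piece of the partition...
p_not_coin_sum = sum(p for item, p in item_probs.items() if item != "Coin")

# ...and by subtracting P(Coin) from one.
p_not_coin_subtract = 1 - item_probs["Coin"]

print(p_not_coin_sum, p_not_coin_subtract)
```

Both routes should agree at 0.25, up to floating-point rounding.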

A special case of this involves the complement of an event. This partitions the sample space into two – for example, getting a coin or not a coin. For a general event \(A\), the law becomes: \[1 = P(A) + P(\neg A),\] where \(\neg\) means the complement (read “not”).

Inclusion-Exclusion (5 min)

Let’s answer these questions by counting:

1. What’s the probability of getting an item that has an explosion combat type?

2. What’s the probability of getting an item that is both an explosion item and defeats blue shells?

This is written \(P(\text{explosion} \cap \text{defeats blue shells})\), where \(\cap\) means “and”.

3. What’s the probability of getting an item that is an explosion item or an item that defeats blue shells?

This is written \(P(\text{explosion} \cup \text{defeats blue shells})\), where \(\cup\) means “or”.

In general, we can answer the third question with the inclusion-exclusion principle: for events \(A\) and \(B\), \[P(A \cup B) = P(A) + P(B) - P(A \cap B).\]

We can extend this to three events, too: \[P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C).\]
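To see the two-event formula at work on the three questions above, here is a minimal sketch that treats each event as a set of item names and computes its probability by adding up the items in it. The data come from the table; the dictionary layout and the helper `prob` are just one convenient (hypothetical) way to organize it.

```python
import math

# Item table: probability, combat type, and whether it defeats blue shells.
items = {
    "Banana":  {"prob": 0.12, "combat": "contact",     "defeats_blue": False},
    "Bob-omb": {"prob": 0.05, "combat": "explosion",   "defeats_blue": False},
    "Coin":    {"prob": 0.75, "combat": "ineffective", "defeats_blue": False},
    "Horn":    {"prob": 0.03, "combat": "explosion",   "defeats_blue": True},
    "Shell":   {"prob": 0.05, "combat": "contact",     "defeats_blue": False},
}

def prob(event):
    """Probability of an event, given as a set of item names."""
    return sum(items[name]["prob"] for name in event)

explosion = {name for name, row in items.items() if row["combat"] == "explosion"}
defeats_blue = {name for name, row in items.items() if row["defeats_blue"]}

p_explosion = prob(explosion)              # question 1
p_both = prob(explosion & defeats_blue)    # question 2: "and" is set intersection
p_either = prob(explosion | defeats_blue)  # question 3: "or" is set union

# Inclusion-exclusion: add the two events, then remove the double-counted overlap.
p_either_formula = p_explosion + prob(defeats_blue) - p_both

print(p_explosion, p_both, p_either, p_either_formula)
```

Counting directly and using the formula should give the same answer for the third question, since the Horn is the only item counted twice when the two event probabilities are added.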

1.2.3 Comparing Probabilities (8 min)

True or False (2-3 min):

Suppose Vincenzo often wins at a game of solitaire, but that Tom is twice as good as Vincenzo. This means that \(P(\text{Tom wins}) = 2 \times P(\text{Vincenzo wins})\).

Probability is quite useful for communicating the chance of an event happening in an absolute sense, but it is less useful for comparing the chances of two events. Odds, on the other hand, are well suited to such comparisons. If \(p\) is the chance that Vincenzo wins at solitaire, his odds of winning are defined as \[\text{Odds} = \frac{p}{1-p}.\] Conversely, if his odds are \(o\), then his probability of winning is \[\text{Probability} = \frac{o}{o+1}.\]

For example, if Vincenzo wins 80% of the time, his odds are \(0.8/0.2 = 4\). This is sometimes written as 4:1 odds – that is, four wins for every loss. If Tom is twice as good as Vincenzo, it’s most useful to say that this means Tom wins twice as many games before experiencing a loss (on average) – that is, 8:1 odds, or simply 8, and a probability of \(8/9=0.888\ldots\).
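The two conversion formulas are easy to sketch in code. The function names `prob_to_odds` and `odds_to_prob` are made up for this example; the numbers are the Vincenzo and Tom figures from above.

```python
import math

def prob_to_odds(p):
    """Convert a probability p (0 <= p < 1) to odds p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def odds_to_prob(o):
    """Convert odds o (o >= 0) back to a probability o / (o + 1)."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    return o / (o + 1)

p_vincenzo = 0.8
odds_vincenzo = prob_to_odds(p_vincenzo)   # about 4, i.e. 4:1

# "Twice as good" read as twice the odds, not twice the probability.
odds_tom = 2 * odds_vincenzo               # about 8, i.e. 8:1
p_tom = odds_to_prob(odds_tom)             # about 8/9

print(odds_vincenzo, odds_tom, p_tom)
print(math.isclose(p_tom, 8 / 9))

# Doubling the probability instead would give a number above 1,
# which is not a valid probability.
print(2 * p_vincenzo)
```

The last line is the reason the True or False statement fails here: doubling a probability of 0.8 leaves the valid range, whereas doubling the odds always gives valid odds.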

1.2.4 Interpreting Probability (5 min)

Thought experiment:

  1. What’s the probability of seeing a 6 after rolling a die?
  2. I roll a die, and cover the outcome. What’s the probability of seeing a 6 after I uncover the face?

No philosophy is “wrong”! But why is this relevant in practice?

  • It often doesn’t actually make sense to talk about the probability of an event, such as the probability that a patient has a particular disease. Instead, it’s a belief system that can be modified.
  • It influences our choice of whether we choose a Bayesian or Frequentist analysis. More on this later in MDS.

1.3 Probability Distributions

So far, we’ve been discussing probabilities of single events. But it’s often useful to characterize the full “spectrum” of uncertainty associated with an outcome. The set of all outcomes and their corresponding probabilities is called a probability distribution (or, often, just distribution).

The outcome itself, which is uncertain, is called a random variable. (Note: technically, this definition only holds if the outcome is numeric, not categorical like our Mario Kart example, but we won’t concern ourselves with such details.)

When the outcomes are discrete, the distributions are called probability mass functions (or pmf’s for short).

1.3.1 Examples of Probability Distributions (3 min)

Mario Kart Example:

The distribution of items is given by the following table:

Item Name Probability
Banana 0.12
Bob-omb 0.05
Coin 0.75
Horn 0.03
Shell 0.05

The distribution of combat type is given by the following table:

Combat Type Probability
contact 0.17
explosion 0.08
ineffective 0.75

The distribution of defeating blue shells is given by the following table:

Defeats blue shells Probability
no 0.97
yes 0.03
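The last two tables can be derived from the item distribution by adding up the probabilities of all items that share a property. Below is a minimal sketch of that bookkeeping; the helper `distribution_of` and the tuple layout are assumptions made for the example.

```python
import math
from collections import defaultdict

# (item, probability, combat type, defeats blue shells) from the earlier table.
item_table = [
    ("Banana", 0.12, "contact", "no"),
    ("Bob-omb", 0.05, "explosion", "no"),
    ("Coin", 0.75, "ineffective", "no"),
    ("Horn", 0.03, "explosion", "yes"),
    ("Shell", 0.05, "contact", "no"),
]

def distribution_of(column):
    """Sum item probabilities by the value in the given column.

    column is 2 for combat type and 3 for defeats blue shells.
    """
    dist = defaultdict(float)
    for row in item_table:
        dist[row[column]] += row[1]
    return dict(dist)

combat_dist = distribution_of(2)
blue_shell_dist = distribution_of(3)

print(combat_dist)
print(blue_shell_dist)

# Each derived table is itself a distribution, so it should sum to one.
print(math.isclose(sum(combat_dist.values()), 1.0))
```

Printed values may show small floating-point rounding, but they should match the tables above to two decimal places.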

Ship example (New):

Suppose a ship that arrives at the port of Vancouver will stay at port according to the following distribution:

Length of stay (days) Probability
1 0.25
2 0.50
3 0.15
4 0.10

The fact that the outcome is numeric means that there are more ways we can talk about things, as we will see.

1.3.2 Measures of central tendency and uncertainty (3 min)

There are two concepts when communicating an uncertain outcome:

  • Central tendency: a “typical” value of the outcome.
  • Uncertainty: how “random” the outcome is.

There are many ways to measure these two concepts. They’re defined using a probability distribution, but just as probability can be defined as the limit of a fraction based on a sample, these measures often have a sample version (aka empirical version) from which they are derived.

As such, let’s call \(X\) the random outcome, and \(X_1, \ldots, X_n\) a set of \(n\) observations that form a sample (see the terminology page for alternative uses of the word sample).

Mode and Entropy (5 min)

No matter what scale a distribution has, we can always calculate the mode and entropy. And, when the outcome is categorical (like the Mario Kart example), we are pretty much stuck with these as our choices.

The mode of a distribution is the outcome having highest probability.

  • A measure of central tendency.
  • The sample version is the observation you saw the most.
  • Measured as an outcome, not as the probabilities.

The entropy of a distribution is defined as \[-\displaystyle \sum_x P(X=x)\log(P(X=x)).\]

  • A measure of uncertainty.
  • Probably the only measure that didn’t originate from a sample version (comes from information theory).
  • Measured as a transformation of probabilities, not as the outcomes – so, hard to interpret on its own.
  • Cannot be negative; zero-entropy means no randomness.

Mean and Variance (10 min)

When our outcome is numeric, we can take advantage of the numeric property and calculate the mean and variance:

The mean (aka expected value, or expectation) is defined as \[\displaystyle \sum_x x\cdot P(X=x).\]

  • A measure of central tendency, denoted \(E(X)\).
  • Its sample version is \(\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i,\) which gets closer and closer to the true mean as \(n \rightarrow \infty\) (this is in fact how the mean is originally defined!)
  • Useful if you want to compare totals of a bunch of observations (just multiply the mean by the number of observations to get a sense of the total).
  • Probably the most popular measure of central tendency.
  • Note that the mean might not be a possible outcome!

The variance is defined as \[E[(X-E(X))^2],\] which works out to be equivalent to the (sometimes) more useful form, \[E[X^2]-E[X]^2.\]

  • A measure of uncertainty, denoted \(\text{Var}(X)\).
  • Yes! This is an expectation – of the squared deviation from the mean.
  • Its sample version is \(s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2\), or sometimes \(s^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2\) – both get closer and closer to the true variance as \(n \rightarrow \infty\) (you’ll be able to compare the goodness of these at estimating the true variance in DSCI 552 next block).
  • Like entropy, cannot be negative, and a zero variance means no randomness.
  • Unlike entropy, depends on the actual values of the random variable.
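As a worked illustration, the sketch below computes the distribution-based measures defined so far (mode, entropy, mean, and variance) for the ship example. It assumes the natural logarithm for entropy, since the notes don't fix a base, and all function names are made up for this example.

```python
import math

# Length of stay (days) -> probability, from the ship example.
ship_pmf = {1: 0.25, 2: 0.50, 3: 0.15, 4: 0.10}

def pmf_mode(pmf):
    """Outcome with the highest probability."""
    return max(pmf, key=pmf.get)

def pmf_entropy(pmf):
    """Entropy, -sum p*log(p), using the natural log (an assumption)."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def pmf_mean(pmf):
    """Expected value: sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def pmf_variance(pmf):
    """Variance as the expected squared deviation from the mean."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def pmf_variance_shortcut(pmf):
    """Variance via the equivalent form E[X^2] - E[X]^2."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - pmf_mean(pmf) ** 2

ship_mode = pmf_mode(ship_pmf)
ship_entropy = pmf_entropy(ship_pmf)
ship_mean = pmf_mean(ship_pmf)
ship_var = pmf_variance(ship_pmf)
ship_var_shortcut = pmf_variance_shortcut(ship_pmf)

print(ship_mode, ship_entropy, ship_mean, ship_var, ship_var_shortcut)
```

By hand, the mean is \(1(0.25) + 2(0.50) + 3(0.15) + 4(0.10) = 2.1\) days, which is not itself a possible outcome, and the variance is \(5.2 - 2.1^2 = 0.79\) squared days. The two variance functions should agree up to floating-point rounding.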

The standard deviation is the square root of the variance.

  • Useful because it’s measured on the same scale as the outcome, as opposed to variance, which takes on squared outcome measurements.
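To connect the sample versions back to the distribution, here is a rough simulation sketch: it draws a large sample of ship stays from the ship example's distribution and computes the sample mode, mean, both variance versions, and standard deviation using Python's standard `statistics` module. The sample size and seed are arbitrary choices for illustration.

```python
import random
import statistics

# Ship example: possible lengths of stay (days) and their probabilities.
lengths = [1, 2, 3, 4]
probs = [0.25, 0.50, 0.15, 0.10]

# Draw a sample of stays; the seed and sample size are arbitrary.
rng = random.Random(551)
sample = rng.choices(lengths, weights=probs, k=50_000)

sample_mode = statistics.mode(sample)      # most frequently observed outcome
sample_mean = statistics.mean(sample)      # (1/n) * sum of observations
s2_n_minus_1 = statistics.variance(sample) # divides by n - 1
s2_n = statistics.pvariance(sample)        # divides by n
sample_sd = statistics.stdev(sample)       # square root of the n - 1 version

print(sample_mode, sample_mean, s2_n_minus_1, s2_n, sample_sd)
```

With a sample this large, the two variance versions should be nearly identical, and the sample mean and variance should land close to the distribution's mean and variance. The standard deviation is in days, the same unit as the outcome.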

Note: you may have heard of the median – we’ll hold off on this until later.