Introduction to Probability
Learning Objectives
After this unit, students should be able to
- describe the concepts of sample space and event.
- describe the concept of a random variable.
- describe the concept of a probability distribution.
- describe the concept of expected value.
From Statistics to Probability
Consider the experiment of tossing a fair coin. In theory, we have learnt that the probability of observing a head is the same as the probability of observing a tail, namely \(0.5\). But have you observed this in reality? Perform a simple experiment: toss a coin \(10\) times and record the number of times you observe a tail.
Consider the following figure that plots a computer simulation of the same experiment. On the horizontal axis we have plotted the number of experiments whereas on the vertical axis we have plotted the fraction of times we had observed the head during those experiments. The red line is drawn to denote the theoretical probability of \(0.5\).
We observe that the variability in the fraction of heads reduces as the number of experiments increases. It eventually converges to the theoretical line.
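The convergence described above can be reproduced with a short simulation. This is a minimal sketch (the function name and seed are illustrative, not from the original figure): it records the running fraction of heads after each toss of a simulated fair coin.

```python
import random

def running_fraction_of_heads(num_tosses, seed=0):
    """Toss a fair coin num_tosses times and return the running
    fraction of heads observed after each toss."""
    rng = random.Random(seed)
    heads = 0
    fractions = []
    for i in range(1, num_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 (a head)
        fractions.append(heads / i)
    return fractions

fractions = running_fraction_of_heads(10_000)
# Early fractions can drift far from 0.5; the final fraction
# settles close to the theoretical probability.
print(fractions[9], fractions[-1])
```

Plotting `fractions` against the toss index reproduces the figure: a noisy curve that flattens toward the red \(0.5\) line.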
Let us formalise a few definitions and the notation before we progress.
Definitions
- Random Experiment is a physical experiment whose outcome cannot be predicted until it is performed.
- Sample Space \(\Omega\) is the set of all possible outcomes of the experiment.
- Event (E) is any subset of the sample space.
- Probability \(Pr[E]\) of an event \(E\), when all outcomes in \(\Omega\) are equally likely (as in the examples below), is the fraction of outcomes that belong to \(E\): \(Pr[E] = \frac{|E|}{|\Omega|}\).
There are two kinds of experiments, and they lead to two kinds of sample spaces.
| | Discrete Sample Space | Continuous Sample Space |
| --- | --- | --- |
| Experiment | Rolling a fair die. | Average rainfall in Singapore on a random day. |
| Sample Space | \(\Omega = \{1, 2, 3, 4, 5, 6\}\) | \(\Omega = [0, 100]\) |
| Example of an event | An even number is rolled. | Low (\(\leq 20\)) rainfall is observed. |
Events on continuous sample spaces.
Events on a continuous sample space are defined as intervals on the sample space. We cannot define an event as a single specific value because of the limited precision of real-number measurements. For instance, a thermostat with a precision of \(0.01\) is unable to distinguish the infinitely many temperature values between \(4\) and \(4.009\).
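For an interval event on a continuous sample space, the probability can be computed from the distribution. As a minimal sketch, suppose (purely for illustration; this is not a claim about Singapore's actual weather) that rainfall were uniformly distributed on \([0, 100]\). Then the probability of an interval event is just its length divided by \(100\):

```python
def uniform_interval_prob(a, b, low=0.0, high=100.0):
    """Pr[X in (a, b)] for X uniform on [low, high].

    Illustrative helper (not from the text): under a uniform
    distribution, an interval's probability is its length
    divided by the length of the sample space."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)

# The "low rainfall" event from the table: Pr[X <= 20] = 0.2
print(uniform_interval_prob(0, 20))
```

Note that a single point has length \(0\), so under a continuous distribution every individual value has probability \(0\), which is another way to see why continuous events are intervals rather than points.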
Random Variable
A random variable is a real-valued function defined on the sample space. Sounds very abstract?
Let us consider an example. Suppose a pair of fair dice is rolled. Let \(X_{sum}\) denote the random variable that gives the sum of the numbers on the two dice. Mathematically, \(X_{sum}(\omega_1, \omega_2) = \omega_1 + \omega_2\), where \((\omega_1, \omega_2) \in \Omega = \{1, \ldots, 6\} \times \{1, \ldots, 6\}\).
What is the difference between random variable and event?
A random variable is often confused with events on the sample space. A random variable is not an event; rather, we can define multiple events using a random variable. Following are a few examples of events created using \(X_{sum}\).
- The sum is less than \(4\). Mathematically, \(E_1 = \{\omega \in \Omega ~|~ X_{sum}(\omega) < 4\}\).
- The sum is odd. Mathematically, \(E_2 = \{\omega \in \Omega ~|~ X_{sum}(\omega) \in \{3, 5, 7, 9, 11\} \}\).
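The distinction is easy to see in code. In this minimal sketch (function and variable names are illustrative), the random variable is a plain function on outcomes, while each event is a subset of the sample space carved out by a condition on that function:

```python
from itertools import product

# Sample space for a pair of fair dice: 36 ordered outcomes.
omega = list(product(range(1, 7), repeat=2))

def x_sum(outcome):
    """The random variable X_sum: a real-valued function on outcomes."""
    return outcome[0] + outcome[1]

# Events are subsets of omega, defined via the random variable.
E1 = [w for w in omega if x_sum(w) < 4]       # the sum is less than 4
E2 = [w for w in omega if x_sum(w) % 2 == 1]  # the sum is odd

print(len(E1), len(E2))  # 3 18
```

`E1` contains exactly \(\{(1,1), (1,2), (2,1)\}\), so under equally likely outcomes \(Pr[E_1] = 3/36\).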
Probability Distribution
Probability distribution of a random variable is a function that assigns the probability (a value between \(0\) and \(1\)) to every possible value that the random variable takes.
What is the distribution of \(X_{sum}\)?
| \(x_i\) | Event | \(Pr[X_{sum} = x_i]\) |
| --- | --- | --- |
| \(2\) | \(\{(1, 1)\}\) | \(1/36\) |
| \(3\) | \(\{(1, 2), (2, 1)\}\) | \(2/36\) |
| \(4\) | \(\{(1, 3), (3, 1), (2, 2)\}\) | \(3/36\) |
| \(5\) | \(\{(1, 4), (4, 1), (2, 3), (3, 2)\}\) | \(4/36\) |
| \(6\) | \(\{(1, 5), (5, 1), (2, 4), (4, 2), (3, 3)\}\) | \(5/36\) |
| \(7\) | \(\{(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)\}\) | \(6/36\) |
| \(8\) | \(\{(2, 6), (6, 2), (3, 5), (5, 3), (4, 4)\}\) | \(5/36\) |
| \(9\) | \(\{(3, 6), (6, 3), (4, 5), (5, 4)\}\) | \(4/36\) |
| \(10\) | \(\{(4, 6), (6, 4), (5, 5)\}\) | \(3/36\) |
| \(11\) | \(\{(5, 6), (6, 5)\}\) | \(2/36\) |
| \(12\) | \(\{(6, 6)\}\) | \(1/36\) |
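Rather than counting outcomes by hand, the whole table can be checked by enumeration. This is a minimal sketch that counts, for each value of \(X_{sum}\), how many of the 36 equally likely outcomes map to it:

```python
from itertools import product
from fractions import Fraction

# Sample space: 36 equally likely ordered outcomes.
omega = list(product(range(1, 7), repeat=2))

# Build the distribution of X_sum: each outcome contributes 1/36
# to the probability of its sum.
dist = {}
for w in omega:
    s = w[0] + w[1]
    dist[s] = dist.get(s, Fraction(0)) + Fraction(1, 36)

print(dist[7])             # 1/6
print(sum(dist.values()))  # 1
```

The enumeration confirms the table, including \(Pr[X_{sum} = 12] = 1/36\), and shows that the probabilities sum to \(1\), as any probability distribution must.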
For a discrete sample space, such as that of \(X_{sum}\), we can enumerate the values in the domain of the random variable and assign a probability to each. For a continuous sample space, we use continuous functions to denote probability distributions.
| | Probability Mass Function | Probability Density Function |
| --- | --- | --- |
| Probability distribution of | a discrete random variable. | a continuous random variable. |
| Condition 1 | \(Pr[X = x_i] = p(x_i) \geq 0\) | \(Pr[X \in (a, b)] = \int_a^b p(x)\, dx \geq 0\) |
| Condition 2 | \(\sum_{x_i} p(x_i) = 1\) | \(\int_{-\infty}^{\infty} p(x)\, dx = 1\) |
Definition: Cumulative Distribution Function (CDF)
The cumulative distribution function \(F\) for a random variable \(X\) is defined as follows: \(F(x) = Pr[X \leq x]\).
The inverse of the cumulative distribution, \(F^{-1}\), is often used in data analysis (for instance, to compute quantiles such as the median). It is defined as follows: \(F^{-1}(q) = \inf\{x ~|~ F(x) \geq q\}\) for \(q \in (0, 1)\), i.e., the smallest value \(x\) at which the cumulative probability reaches \(q\).
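For a discrete random variable such as \(X_{sum}\), both functions reduce to finite sums and searches. This is a minimal sketch (the helper names are illustrative) using the distribution of \(X_{sum}\) derived earlier:

```python
from fractions import Fraction

# PMF of X_sum (sum of two fair dice): Pr[X_sum = s] = (6 - |s - 7|)/36.
pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def cdf(x):
    """F(x) = Pr[X_sum <= x]: accumulate the PMF up to x."""
    return sum(p for s, p in pmf.items() if s <= x)

def inv_cdf(q):
    """Smallest value x with F(x) >= q (the quantile function)."""
    return min(s for s in pmf if cdf(s) >= q)

print(cdf(7))                   # 7/12
print(inv_cdf(Fraction(1, 2)))  # 7 (the median)
```

Using `min` over values whose cumulative probability reaches \(q\) is the discrete analogue of the \(\inf\) in the definition above.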
Expected Value
Expected value \(E[X]\) of a random variable \(X\) is defined as follows:
- For discrete random variable: \(E[X] = \sum_{x_i} x_i p(x_i)\)
- For continuous random variable: \(E[X] = \int_{-\infty}^{\infty} x p(x) dx\)
The expected value is also called the weighted average: each value is weighted by its probability. It is equal to the arithmetic mean when the distribution is uniform. We can quickly verify this using a uniform discrete random variable: for \(n\) possible values, each value occurs with probability \(1/n\), and plugging \(p(x_i) = 1/n\) into the formula for \(E[X]\) reduces it to the formula for the mean, \(\frac{1}{n}\sum_{x_i} x_i\).
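The reduction above can be checked concretely for a single fair die, a uniform distribution over \(\{1, \ldots, 6\}\). This minimal sketch computes \(E[X]\) from the definition and compares it with the plain arithmetic mean:

```python
from fractions import Fraction

# A fair die: uniform distribution over {1, ..., 6}.
values = list(range(1, 7))
p = Fraction(1, 6)  # p(x_i) = 1/n with n = 6

# E[X] = sum over x_i of x_i * p(x_i)
expected = sum(x * p for x in values)

# Arithmetic mean of the same values.
mean = Fraction(sum(values), len(values))

print(expected, mean)  # 7/2 7/2
```

Both computations give \(7/2 = 3.5\), confirming that for a uniform distribution the expected value coincides with the mean.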