Some Well-known Probability Distributions

Learning Objectives

After this unit, students should be able to

identify the random variable in a real-world scenario.
recognise Bernoulli trials in a real-world scenario.
recall the means and standard deviations of some well-known distributions.

Bernoulli Distribution

The Bernoulli distribution is a discrete probability distribution of a random variable that takes values in \(\{0, 1\}\). It has a single parameter \(p\), the probability of observing \(1\). Typically, \(1\) is associated with success and \(0\) with failure in an experiment with two possible outcomes. Any random experiment with exactly two possible outcomes is called a Bernoulli experiment or Bernoulli trial. The Bernoulli distribution has mean \(p\) and variance \(p(1-p)\).

Examples of Bernoulli experiments are as follows:

In a study investigating user conversion rates, researchers analyze user responses regarding subscription to the platform. Subscription can be taken as the successful event.
In a scholarship review process, researchers examine the gender of the next applicant in line for consideration. A female next applicant can be taken as the successful event.

We can use the following snippet to simulate the experiment of tossing a fair coin \(5\) times.

>>> from scipy.stats import bernoulli
>>> p = 0.5
>>> bernoulli.rvs(p, size = 5)
array([0, 1, 0, 1, 0])
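As a quick check on the Bernoulli distribution's mean \(p\) and variance \(p(1-p)\), the sketch below queries `scipy.stats.bernoulli` and compares the theoretical mean with the proportion of successes in a large simulated sample. The value \(p = 0.3\) and the seed are illustrative choices, not taken from the text.

```python
from scipy.stats import bernoulli

p = 0.3  # illustrative success probability (not from the text)

# Pr[X = 0] and Pr[X = 1]; these should be 1 - p and p
pmf_values = bernoulli.pmf([0, 1], p)

# Theoretical mean p and variance p(1 - p), as computed by scipy
mean, var = bernoulli.stats(p, moments="mv")

# Empirical check: the proportion of successes in many simulated trials
# should be close to p (the seed only makes the run reproducible)
samples = bernoulli.rvs(p, size=10_000, random_state=0)
sample_mean = samples.mean()

print(pmf_values, float(mean), float(var), sample_mean)
```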

Binomial Distribution

The binomial distribution is a discrete probability distribution of the random variable that counts the number of successes when a Bernoulli experiment is repeated independently \(n\) times. It has two parameters: \(n\), the number of trials, and \(p\), the probability of success in each trial.

Let \(X\) denote the random variable that counts the number of successes. The probability of observing \(k\) successes is computed using the probability mass function given as follows:

\[ Pr[X = k] = Binomial(k; n, p) = {n \choose k} p^k (1-p)^{n-k} \]
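The probability mass function can be evaluated directly from the formula above. This minimal sketch uses Python's `math.comb` for the binomial coefficient; the helper name `binomial_pmf` and the values \(n = 10\), \(p = 0.5\) are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr[X = k] for X ~ Binomial(n, p), computed directly from the formula."""
    if not 0 <= k <= n:
        return 0.0  # k successes are impossible outside 0..n
    # C(n, k) ways to choose which k of the n trials succeed,
    # each arrangement having probability p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # 252 / 1024 = 0.24609375
print(sum(probs))             # the probabilities over k = 0..n sum to 1
```

For the same arguments, `scipy.stats.binom.pmf(k, n, p)` should return the same values; the hand-written version only makes the formula concrete.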

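The binomial distribution has mean \(np\) and variance \(np(1-p)\). It is symmetric only when \(p = 0.5\); for \(p < 0.5\) it is skewed to the right, and for \(p > 0.5\) it is skewed to the left.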
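The sketch below checks the mean and variance with `scipy.stats.binom` and tests whether \(Pr[X = k] = Pr[X = n - k]\) for every \(k\), which holds only when \(p = 0.5\). The helper `summarize` and the parameter values are hypothetical, chosen only for illustration.

```python
from scipy.stats import binom

def summarize(n, p):
    """Return (mean, variance, is_symmetric) for Binomial(n, p)."""
    # Theoretical mean np and variance np(1 - p), as computed by scipy
    mean, var = binom.stats(n, p, moments="mv")
    # Symmetric about n/2 exactly when Pr[X = k] == Pr[X = n - k] for every k
    symmetric = all(
        abs(binom.pmf(k, n, p) - binom.pmf(n - k, n, p)) < 1e-12
        for k in range(n + 1)
    )
    return float(mean), float(var), symmetric

for p in (0.5, 0.2):
    print(p, summarize(10, p))
```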

Examples of binomially distributed counts are as follows:

In a study investigating user conversion rates, the number of users, out of \(n\) users, who subscribe to the platform.
In a scholarship review process, the number of female applicants among the next \(n\) applicants.

We can use the following snippet to simulate the experiment of counting the number of heads in five experiments, where each experiment tosses a fair coin \(100\) times.

>>> from scipy.stats import binom
>>> n, p = 100, 0.5
>>> binom.rvs(n, p, size = 5)
array([44, 42, 56, 52, 56])
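Because a binomial count is the number of successes in \(n\) independent Bernoulli trials, the simulation above can also be built from individual coin tosses. This sketch assumes the same \(n = 100\) and \(p = 0.5\); the seeds are arbitrary and only make the run reproducible.

```python
from scipy.stats import bernoulli

n, p = 100, 0.5

# Each row is one experiment: n independent Bernoulli(p) coin tosses (1 = heads)
tosses = bernoulli.rvs(p, size=(5, n), random_state=0)

# Summing each row counts the heads, giving one Binomial(n, p) draw per experiment
heads = tosses.sum(axis=1)
print(heads)

# With many repetitions, the counts should have mean close to np = 50
# and variance close to np(1 - p) = 25
many = bernoulli.rvs(p, size=(20_000, n), random_state=1).sum(axis=1)
print(many.mean(), many.var())
```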

Poisson Distribution

The Poisson distribution is a discrete probability distribution of the random variable that counts the number of events occurring in a fixed time interval. It has a single parameter \(\lambda\), known as the rate parameter, which denotes the expected number of events per time interval. The Poisson distribution assumes that events occur independently of each other and at a constant average rate.

Let \(X\) denote the random variable that counts the number of events in the next time interval. The probability of observing \(k\) events is computed using the probability mass function given as follows:

\[ Pr[X = k] = Poisson(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \]
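A direct translation of the Poisson probability mass function into Python, as a minimal sketch; the helper name `poisson_pmf` and the rate \(\lambda = 3\) are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Pr[X = k] for X ~ Poisson(lam), computed directly from the formula."""
    if k < 0:
        return 0.0  # counts cannot be negative
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # illustrative rate
print(poisson_pmf(0, lam))  # e^{-3}, about 0.0498

# The probabilities over k = 0, 1, 2, ... sum to 1; stopping at k = 50
# leaves only a negligible tail when lam = 3.
total = sum(poisson_pmf(k, lam) for k in range(51))
print(total)
```

For large \(k\) the intermediate values `lam**k` and `factorial(k)` become too large for floating point, so `scipy.stats.poisson.pmf(k, lam)` is the safer choice in practice.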

Both the mean and the variance of the Poisson distribution are equal to \(\lambda\).
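The sketch below checks that the mean and the variance are both \(\lambda\), using `scipy.stats.poisson` with an illustrative rate \(\lambda = 30\) and an arbitrary seed.

```python
from scipy.stats import poisson

rate = 30  # illustrative rate parameter lambda

# Theoretical mean and variance of Poisson(rate); both should equal rate
mean, var = poisson.stats(rate, moments="mv")
print(float(mean), float(var))

# Empirical check: a large simulated sample should have mean and variance near rate
samples = poisson.rvs(rate, size=100_000, random_state=0)
print(samples.mean(), samples.var())
```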

Examples of counts that are often modelled with a Poisson distribution are as follows:

In a food item delivery analysis, the number of orders received in the next hour.
The number of students failing in a certain final exam.

We can use the following snippet to simulate the number of failures in five exams, where on average \(30\) students fail each exam (about \(5\%\) of the students taking it).

>>> from scipy.stats import poisson
>>> rate = 30
>>> poisson.rvs(rate, size = 5)
array([26, 29, 37, 35, 26])
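The \(5\%\) figure and the rate \(\lambda = 30\) are connected through the Poisson approximation to the binomial: when \(n\) is large and \(p\) is small, \(Binomial(n, p)\) is close to \(Poisson(np)\). The sketch below assumes a hypothetical cohort of \(600\) students, each failing independently with probability \(0.05\), so that \(np = 30\), and measures the largest gap between the two probability mass functions.

```python
from scipy.stats import binom, poisson

# Hypothetical cohort: 600 students, each failing independently with probability 0.05
n, p = 600, 0.05
rate = n * p  # expected number of failures, np = 30

# Largest absolute difference between the two PMFs over counts 0..79,
# which covers essentially all of the probability mass
max_gap = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, rate)) for k in range(80))
print(rate, max_gap)
```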

Normal Distribution

The normal distribution (also known as the Gaussian distribution) is a continuous probability distribution over the real numbers. It is widely used to model continuous real-valued random variables, such as measurement errors, whose values cluster symmetrically around a central value. It has two parameters: the mean \(\mu\) and the standard deviation \(\sigma\). Its probability density function is given as follows:

\[ f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} \]
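The density formula translates directly into code. In this minimal sketch the helper `normal_pdf` is a hypothetical name; `scipy.stats.norm.pdf(x, loc=mu, scale=sigma)` computes the same quantity.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x; mu, sigma) of the normal distribution, from the formula above."""
    z = (x - mu) / sigma  # distance from the mean in units of sigma
    return exp(-0.5 * z * z) / sqrt(2 * pi * sigma**2)

# At x = mu the exponent is 0, so the density equals 1 / sqrt(2*pi*sigma^2)
print(normal_pdf(10, 10, 5))  # about 0.0798
```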

The normal distribution is also called the bell curve because of the shape of its density function. The mean \(\mu\) determines the location of the peak, whereas the standard deviation \(\sigma\) controls the spread of the curve: a larger \(\sigma\) gives a wider, flatter curve.

[Figure: normal distribution]
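To see how the two parameters shape the curve, the sketch below evaluates the peak height \(f(\mu; \mu, \sigma) = 1/(\sigma\sqrt{2\pi})\) and the probability of landing within one standard deviation of the mean for a few illustrative \((\mu, \sigma)\) pairs. Doubling \(\sigma\) halves the peak height, while the within-one-\(\sigma\) probability stays at about \(0.6827\) for every pair. The helper `describe` is a hypothetical name.

```python
from scipy.stats import norm

def describe(mu, sigma):
    """Return (peak height, Pr[|X - mu| < sigma]) for X ~ N(mu, sigma^2)."""
    dist = norm(loc=mu, scale=sigma)
    peak_height = dist.pdf(mu)  # the density is highest at x = mu
    within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
    return float(peak_height), float(within_one_sd)

for mu, sigma in [(0, 1), (0, 2), (3, 1)]:
    height, prob = describe(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak at x={mu}, height={height:.4f}, "
          f"Pr[|X - mu| < sigma]={prob:.4f}")
```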

Definition: Standard Normal Distribution

The normal distribution with mean \(0\) and standard deviation \(1\) is known as the standard normal distribution. A standard normal random variable is typically denoted by \(Z\).
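Any normal random variable \(X\) with mean \(\mu\) and standard deviation \(\sigma\) can be standardised as \(Z = (X - \mu)/\sigma\), so \(Pr[X \le x] = Pr[Z \le (x - \mu)/\sigma]\). The sketch below checks this with `scipy.stats.norm` for the illustrative values \(\mu = 10\), \(\sigma = 5\), \(x = 18\).

```python
from scipy.stats import norm

mu, sigma = 10, 5  # illustrative parameters
x = 18

# Standardise: z counts how many standard deviations x lies above the mean
z = (x - mu) / sigma  # (18 - 10) / 5 = 1.6

# The same probability computed two ways
p_direct = norm.cdf(x, loc=mu, scale=sigma)  # Pr[X <= 18] for X ~ N(10, 5^2)
p_standard = norm.cdf(z)                     # Pr[Z <= 1.6] for Z ~ N(0, 1)
print(z, p_direct, p_standard)
```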

We can use the following snippet to simulate five draws from a normal distribution with mean \(10\) and standard deviation \(5\).

>>> from scipy.stats import norm
>>> mu, sigma = 10, 5
>>> norm.rvs(mu, sigma, size = 5)
array([ 2.72825689,  5.60071064, 16.18071665, 13.23833799, 10.94629555])