
Popular Hypothesis Tests

Learning Objectives

After this unit, students should be able to

  • choose between \(Z\) test and \(t\) test.
  • choose between one-sample and two-sample tests.
  • choose between two-sample test and paired test.
  • conduct the popular tests.
  • learn new hypothesis tests that are not covered in the unit.

Please ensure that you have read and understood the basics of hypothesis tests and \(p\)-values from Unit 10 before you proceed. In these notes, we will only focus on the hypothesis tests that compare the mean to a constant (which quantifies the established belief).

One-Sample Tests

One-sample tests use a single sample drawn from the population to perform the hypothesis test. Specifically, we use the sample mean to assess hypotheses about the population mean.

The \(Z\)-test is a hypothesis test that uses the normal distribution as the sampling distribution and requires prior knowledge of the population standard deviation. According to the Central Limit Theorem, when the sample size is sufficiently large, the sample mean follows a normal distribution. Hence, the \(Z\)-test is applied to hypothesis testing for the mean with large sample sizes. The example provided in Unit 10 illustrates a two-tailed \(Z\)-test. What should we do when the population standard deviation is not known? What if the sample size is small?
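
As an illustration, the following sketch carries out a one-sample \(Z\)-test by computing the statistic directly with scipy.stats.norm; the data, the hypothesised mean \(\mu_0\), and the assumed (known) population standard deviation \(\sigma\) are made up for this example.

import numpy as np
from scipy.stats import norm

# Made-up sample of size 40; assume the population standard deviation is known
rng = np.random.default_rng(0)
sample = rng.normal(loc=52, scale=5, size=40)
mu_0, sigma, n = 50, 5, len(sample)

# Z statistic: (sample mean - mu_0) / (sigma / sqrt(n))
z = (sample.mean() - mu_0) / (sigma / np.sqrt(n))

# Two-tailed p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z))
print(z, p_value)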

Student's \(t\) distribution

For a sample of size \(n\) drawn from a population with mean \(\mu\), let \(\bar{X}\) and \(S^2\) respectively denote the sample mean and the sample variance. The statistic \(t\), defined as:

\[ t = \frac{\bar{X} - \mu}{\sqrt{S^2 / n}} \]

follows Student's \(t\) distribution with \(n - 1\) degrees of freedom (assuming the underlying population is normal).

The \(t\)-test is a hypothesis test that uses Student's \(t\) distribution as the sampling distribution and does not require prior knowledge of the population standard deviation. Hence, the \(t\)-test is applied to hypothesis testing for the mean with any sample size. Compared to the \(Z\)-test, the only difference lies in using the \(t\) distribution, in place of the normal distribution, to compute the critical region (or \(p\)-value). The rest of the process and the inference remain the same.
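
A minimal sketch of this parallel, using made-up data: the \(t\) statistic is computed from the formula above and the \(p\)-value is read off Student's \(t\) distribution with \(n - 1\) degrees of freedom; scipy.stats.ttest_1samp should report the same values.

import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(1)
sample = rng.normal(loc=52, scale=5, size=12)   # small made-up sample, sigma unknown
mu_0, n = 50, len(sample)

# t statistic: (X_bar - mu_0) / sqrt(S^2 / n), with the unbiased sample variance S^2
t_stat = (sample.mean() - mu_0) / np.sqrt(sample.var(ddof=1) / n)

# Two-tailed p-value from Student's t distribution with n - 1 degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df=n - 1)

# The built-in test; its statistic and p-value match the manual computation
result = ttest_1samp(sample, popmean=mu_0, alternative='two-sided')
print(t_stat, p_value)
print(result.statistic, result.pvalue)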

What is a sufficiently large sample size?

As a rule of thumb, any number larger than \(30\) is considered to be a sufficiently large sample size. The rationale behind this threshold ranges from anecdotes about fitting a statistically relevant table on a single page in the pre-calculator era to empirical observations that, beyond roughly \(30\) degrees of freedom, the \(t\)-distribution closely approximates the normal distribution. It is best treated as a convention within the community (just like the \(95\%\) confidence level).

In this course, we use the availability of the population standard deviation to choose between \(Z\)-test and \(t\)-test instead of fixating on the sample size.
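
For a quick check of the rule of thumb, one can compare the two-tailed critical values of the \(t\) distribution with that of the normal distribution at the \(95\%\) confidence level; by around \(30\) degrees of freedom the two are already close.

from scipy.stats import norm, t

# Two-tailed critical values at the 95% confidence level
z_crit = norm.ppf(0.975)
for df in (5, 10, 30, 100):
    print(df, t.ppf(0.975, df=df), z_crit)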

Two-Sample Tests

Two-sample tests use two samples to perform a hypothesis test that assesses whether they have the same mean.

For ease of discussion, let us denote the two samples as \(\{X_1^1, X_2^1, ..., X_{n_1}^1\}\) and \(\{X_1^2, X_2^2, ..., X_{n_2}^2\}\). In this case, the null hypothesis is \(\mu_1 = \mu_2\), where \(\mu_1\) and \(\mu_2\) refer to the population means for the first and second samples. The test statistic is given as follows:

\[ T = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)}} \]

where \(\bar{X_1}\) and \(\bar{X_2}\) refer to the sample means of the first and second samples, and \(S_1^2\) and \(S_2^2\) refer to their sample variances. \(T\) approximately follows Student's \(t\) distribution; when the two population variances are assumed to be equal, a pooled-variance version of the statistic with \(n_1 + n_2 - 2\) degrees of freedom is used, and otherwise the degrees of freedom are estimated from the data (the Welch–Satterthwaite approximation). This test is known as the two-sample \(t\)-test.
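
In scipy, the two-sample test is available as scipy.stats.ttest_ind; a sketch with made-up samples is shown below, where the equal_var flag switches between the pooled-variance version and Welch's version of the test.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
sample_1 = rng.normal(loc=5.0, scale=1.0, size=25)   # made-up first sample
sample_2 = rng.normal(loc=5.4, scale=1.2, size=30)   # made-up second sample

# Pooled-variance two-sample t-test (assumes equal population variances)
print(ttest_ind(sample_1, sample_2, equal_var=True))

# Welch's t-test (does not assume equal population variances)
print(ttest_ind(sample_1, sample_2, equal_var=False))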

Paired samples have a one-to-one correspondence between the values in the two samples. Typically, instead of keeping two samples, the paired data is reduced to a single sample of differences \(\{(X_1^1 - X_1^2), (X_2^1 - X_2^2), ..., (X_n^1 - X_n^2)\}\). The paired \(t\)-test is then a one-sample \(t\)-test on these differences, with null hypothesis \(\mu_1 - \mu_2 = d\), where \(d\) is a constant based on the established belief.
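
A sketch of the paired test on made-up before/after measurements: scipy.stats.ttest_rel tests \(\mu_1 - \mu_2 = 0\) directly, and the same result can be obtained by running a one-sample \(t\)-test on the differences (for a non-zero \(d\), pass popmean = d to the one-sample test).

import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Made-up paired measurements on the same six units
before = np.array([10.1, 9.8, 10.5, 10.0, 9.7, 10.3])
after = np.array([10.4, 9.9, 10.6, 10.3, 9.8, 10.5])

# Paired t-test with H_0: mu_1 - mu_2 = 0
print(ttest_rel(before, after))

# Equivalent one-sample t-test on the differences against popmean = 0
print(ttest_1samp(before - after, popmean=0))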

Examples

Identify the hypothesis test in each of the following examples. Use Python APIs to perform the hypothesis test. (A possible setup for the first and third examples is sketched after the table below.)

  1. A government contractor wants to estimate the carrying capacity of a newly constructed bridge. The bridge should withstand at least \(1000\) tonnes of active load. The bridge was subjected to a stress test at \(25\) different locations. The average load in the sample is \(1026.4\) tonnes with a standard deviation of \(60\) tonnes. Should the contractor sanction the bridge for public use?

  2. A pharmaceutical company claims that the new medicine is effective against at least \(60\%\) of the infected people. A random sample of \(50\) infected individuals was given the medicine and \(27\) people recovered. Do you reject the claim of the company? Hint: Binomial Test

  3. An analyst wants to choose between two classifiers. In order to make an informed choice, the analyst uses five datasets to compare the performance of the classifiers. The accuracy of each model on these datasets is tabulated below. Can you say, based on this data, that both models have the same accuracy?

Model     Dataset 1   Dataset 2   Dataset 3   Dataset 4   Dataset 5
Model A   \(0.85\)    \(0.80\)    \(0.75\)    \(0.82\)    \(0.78\)
Model B   \(0.86\)    \(0.82\)    \(0.82\)    \(0.79\)    \(0.81\)
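
One possible Python setup for the first and third examples is sketched below; it is not the definitive solution, and the choice of the alternative hypothesis and the significance level is left to you to justify. Since the first example only reports summary statistics, the \(t\) statistic is computed directly from them.

import numpy as np
from scipy.stats import t, ttest_rel

# Example 1: summary statistics only, H_0: mu = 1000 against H_1: mu > 1000
n, x_bar, s, mu_0 = 25, 1026.4, 60, 1000
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
p_value = t.sf(t_stat, df=n - 1)   # one-sided p-value
print(t_stat, p_value)

# Example 3: the accuracies are paired through the five datasets
model_a = np.array([0.85, 0.80, 0.75, 0.82, 0.78])
model_b = np.array([0.86, 0.82, 0.82, 0.79, 0.81])
print(ttest_rel(model_a, model_b, alternative='two-sided'))

# Example 2 can be handled along similar lines with scipy.stats.binomtest (see the hint)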

SciPy functions for various tests

scipy.stats provides various functions to conduct hypothesis tests. A test function typically takes the sample (or samples) required to conduct the test. Most tests also accept an alternative argument that specifies whether the test is two-sided or one-sided. The result consists of two values: the value of the test statistic and the \(p\)-value.

For instance, the following snippet conducts a two-sided one-sample \(t\)-test.

>>> from scipy.stats import norm, ttest_1samp
>>> X_sample = norm.rvs(loc = 3.0, size = 20)   # sample of size 20 with true mean 3

>>> # H_0 is rejected
>>> ttest_1samp(X_sample, popmean = 0, alternative='two-sided')
Ttest_1sampResult(statistic=16.090051943670648, pvalue=1.59255392178398e-12)

>>> # No evidence to reject H_0
>>> ttest_1samp(X_sample, popmean = 3, alternative='two-sided')
Ttest_1sampResult(statistic=-0.22444410348459137, pvalue=0.8248077584200367)
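
The alternative argument switches the test to a one-sided version; for instance, the following call (output omitted, since it depends on the random sample drawn above) tests whether the mean is less than \(3\).

>>> ttest_1samp(X_sample, popmean = 3, alternative='less')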

You can refer to the list of other available tests in the scipy.stats documentation.