Hypothesis Testing - The t-test

Based on the “Statistical Consulting Cheatsheet” by Prof. Kris Sankaran

If I had to bet on which test is used the most on any given day, I’d bet on the t-test. There are several variations, which are used to interrogate different null hypotheses, but the statistic used to test the null is similar across scenarios.

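Because it is so common, this test can be customised in several ways, including one-sided or two-sided alternatives, one-sample or two-sample comparisons, and unpaired or paired data.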

We provide a little bit more information on these tests in the subsequent subsections.

One-sided vs two-sided

An example: Two-sided test of the mean

Is the mean flight arrival delay significantly different from 0?

Two-sided test of the mean: Test the null hypothesis:

\(H_0: \mu = \mu_0 = 0\)
\(H_a: \mu \ne \mu_0 = 0\)
where \(\mu\) is the average arrival delay.
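
For reference, the one-sample statistic behind this test can be written in standard notation. The symbols \(\bar{x}\), \(s\) and \(n\) are not defined in the text above; here they are assumed to denote the sample mean, the sample standard deviation and the sample size:

\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\]

Under \(H_0\), and provided the usual conditions of the test hold, \(t\) follows a t-distribution with \(n - 1\) degrees of freedom. Values of \(t\) far from 0 are evidence against the null.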

library(tidyverse)
library(nycflights13)
# Sample mean of the arrival delay (in minutes), ignoring missing values
mean(flights$arr_delay, na.rm = TRUE)


Is this statistically significant?

# Two-sided, one-sample t-test; the outer parentheses print the result
(tt <- t.test(x = flights$arr_delay, mu = 0, alternative = "two.sided"))


The function t.test returns an object containing the following components:

names(tt)


# The p-value:
tt$p.value


# The 95% confidence interval for the mean:
tt$conf.int
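
To see where these components come from, the statistic, the p-value and the confidence interval can be recomputed by hand. The following is a minimal sketch on simulated data: the vector `x`, its parameters and the name `tt_sim` are made up for illustration and are not taken from the flights example.

```r
# Simulated sample; the values are illustrative only
set.seed(1)
x <- rnorm(50, mean = 1, sd = 3)
mu0 <- 0

n <- length(x)
se <- sd(x) / sqrt(n)                          # standard error of the mean
t_stat <- (mean(x) - mu0) / se                 # t statistic
df <- n - 1                                    # degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df)        # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

# Compare the hand-computed values with the components returned by t.test
tt_sim <- t.test(x, mu = mu0, alternative = "two.sided")
all.equal(unname(tt_sim$statistic), t_stat)
all.equal(tt_sim$p.value, p_two_sided)
all.equal(as.numeric(tt_sim$conf.int), ci)
```

Each `all.equal` call should return `TRUE`, which shows that `t.test` is a convenience wrapper around a few lines of arithmetic.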


An example: One-sided test of the mean

Test the null hypothesis:

\(H_0: \mu = \mu_0 = 5\)
\(H_a: \mu < \mu_0 = 5\)

In R: is the average delay 5 minutes, or is it lower?

# One-sided, one-sample t-test against the alternative mu < 5
(tt <- t.test(x = flights$arr_delay, mu = 5, alternative = "less"))


Failure to reject is not acceptance of the null hypothesis.
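
The one-sided p-value is a lower-tail probability of the same t statistic used in the two-sided test. A minimal sketch on simulated data (the vector `x` and the names `t_less` and `t_two` are assumptions made for illustration):

```r
# Simulated sample; the values are illustrative only
set.seed(2)
x <- rnorm(40, mean = 4, sd = 5)
mu0 <- 5

t_less <- t.test(x, mu = mu0, alternative = "less")
t_two  <- t.test(x, mu = mu0, alternative = "two.sided")

t_stat <- unname(t_less$statistic)   # same statistic in both tests
df <- unname(t_less$parameter)       # degrees of freedom, n - 1

# For alternative = "less", the p-value is the lower-tail probability
p_less_manual <- pt(t_stat, df)

# Equivalent conversion from the two-sided p-value
p_from_two <- if (t_stat < 0) t_two$p.value / 2 else 1 - t_two$p.value / 2

all.equal(t_less$p.value, p_less_manual)
all.equal(t_less$p.value, p_from_two)
t_less$conf.int   # one-sided interval: the lower bound is -Inf
```

When the sample mean lies above \(\mu_0\), the statistic is positive and the "less" p-value exceeds 0.5, however large the sample. A large p-value of this kind says the data do not support \(H_a\); it does not show that \(H_0\) is true.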



Single vs Two-sample t-test

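A one-sample test compares the mean of a single sample to a fixed value \(\mu_0\); a two-sample test compares the means of two groups. In R, pass both samples to t.test: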

t.test(x, y)
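
Note that `t.test(x, y)` runs Welch's test by default, which does not assume equal variances in the two groups; the classical pooled-variance test requires `var.equal = TRUE`. A minimal sketch on simulated data (the vectors `x` and `y` and their parameters are made up for illustration):

```r
# Two simulated groups with different sizes and spreads
set.seed(3)
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(45, mean = 11, sd = 4)

welch  <- t.test(x, y)                    # default: unequal variances (Welch)
pooled <- t.test(x, y, var.equal = TRUE)  # pooled-variance two-sample test

welch$method        # name of the test that was run
pooled$parameter    # degrees of freedom: length(x) + length(y) - 2
welch$parameter     # Welch degrees of freedom, usually not an integer
```

Both versions test the null hypothesis that the difference in group means is 0. Welch's version is the safer default when the group variances may differ.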

Single vs Paired t-tests

Pairing is a useful device for making the t-test applicable in a setting where individual-level variation would otherwise dominate effects coming from treatment vs. control. See Figure 1 for a toy example of this behavior. Instead of testing the difference in means between two groups, test whether the per-individual differences are centered around zero.

Figure 1: Pairing makes it possible to see the effect of treatment in this toy example. The points represent a value for patients (say, white blood cell count) measured at the beginning and end of an experiment. In general, the treatment leads to increases in counts on a per-person basis. However, the inter-individual variation is very large: looking at the difference between before and after without the lines joining pairs, we wouldn't think there is much of a difference. Pairing makes sure the effect of the treatment is not swamped by the variation between people, by controlling for each person's white blood cell count at baseline.

For example, in Darwin’s Zea mays data, a treatment and control plant are put in each pot. Since there might be a pot-level effect in the growth of the plants, it’s better to look at the per-pot difference (the differences are i.i.d.).
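
A paired t-test is the same as a one-sample t-test on the per-pair differences. The sketch below illustrates this with R's built-in `sleep` data (extra hours of sleep for 10 patients under two drugs), which is used here as a stand-in and is not one of the datasets discussed above:

```r
# Measurements for the same 10 patients under drug 1 and drug 2
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

paired   <- t.test(x, y, paired = TRUE)  # paired t-test
on_diffs <- t.test(x - y, mu = 0)        # one-sample test on the differences
unpaired <- t.test(x, y)                 # ignores the pairing (Welch test)

# The first two p-values are identical; the unpaired one is larger here
c(paired   = paired$p.value,
  on_diffs = on_diffs$p.value,
  unpaired = unpaired$p.value)
```

On this dataset, ignoring the pairing gives a larger p-value, because the variation between patients is left in the noise instead of being removed by differencing.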

Pairing is related to a few other common statistical ideas:

What needs to be true for these t-tests to be valid?