Based on the “Statistical Consulting Cheatsheet” by Prof. Kris Sankaran
If I had to bet on which test is used the most on any given day, I’d bet on the t-test. There are actually several variations, which are used to interrogate different null hypotheses, but the statistic used to test the null is similar across scenarios.
Because it is so common, this test is customisable:
We provide a little bit more information on these tests in the subsequent subsections.
An example: Two-sided test of the mean
Is the mean flight arrival delay statistically equal to 0?
Two-sided test of the mean: Test the null hypothesis:
\(H_0: \mu = \mu_0 = 0\)
\(H_a: \mu \ne \mu_0 = 0\)
where \(\mu\) is the average arrival delay.
library(tidyverse)
library(nycflights13)
mean(flights$arr_delay, na.rm = TRUE)
Is this statistically significant?
( tt = t.test(x=flights$arr_delay, mu=0, alternative="two.sided" ) )
The function t.test returns an object containing the following components:
names(tt)
# The p-value:
tt$p.value
# The 95% confidence interval for the mean:
tt$conf.int
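To see what t.test is computing, the statistic and p-value can be reproduced by hand. The following is a minimal sketch on simulated data; the vector `delay` and its parameters are made up for illustration and are not the flights data. It assumes the usual one-sample statistic \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\), compared against a t distribution with \(n - 1\) degrees of freedom.

```r
# Hypothetical delays, simulated for illustration only
set.seed(1)
delay <- rnorm(200, mean = 2, sd = 10)
mu0 <- 0

n <- length(delay)
# One-sample t statistic: distance of the sample mean from mu0, in standard errors
t_stat <- (mean(delay) - mu0) / (sd(delay) / sqrt(n))
# Two-sided p-value: probability in both tails beyond |t_stat|
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# Compare with the built-in test
tt_sim <- t.test(delay, mu = mu0, alternative = "two.sided")
all.equal(unname(tt_sim$statistic), t_stat)
all.equal(tt_sim$p.value, p_val)
```

Both comparisons should agree, since the hand computation follows the same formula.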
An example: One-sided test of the mean
Test the null hypothesis:
\(H_0: \mu = \mu_0 = 5\)
\(H_a: \mu < \mu_0 = 5\)
In R: Is the average delay 5 or is it lower?
( tt = t.test(x=flights$arr_delay, mu=5, alternative="less" ) )
Failure to reject is not acceptance of the null hypothesis.
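For a one-sided alternative, only one tail of the t distribution contributes to the p-value, and the reported confidence interval becomes one-sided as well. A small sketch, again on simulated data (`delay` here is hypothetical, not the flights data):

```r
# Hypothetical delays, simulated for illustration only
set.seed(2)
delay <- rnorm(200, mean = 4, sd = 10)
mu0 <- 5

tt_less <- t.test(delay, mu = mu0, alternative = "less")

n <- length(delay)
t_stat <- (mean(delay) - mu0) / (sd(delay) / sqrt(n))
# alternative = "less": p-value is the lower-tail probability only
p_less <- pt(t_stat, df = n - 1)
all.equal(tt_less$p.value, p_less)

# The interval is one-sided: its lower limit is -Inf
tt_less$conf.int
```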
In R, to compare the means of two samples x and y, you can simply use the command:
t.test(x, y)
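Note that with two vectors, t.test defaults to Welch's test, which does not assume equal variances; passing var.equal = TRUE gives the classical pooled-variance test. A brief sketch with simulated groups (the vectors `x` and `y` below are hypothetical):

```r
# Two hypothetical groups with different sizes and spreads
set.seed(3)
x <- rnorm(30, mean = 0, sd = 1)
y <- rnorm(40, mean = 0.5, sd = 3)

welch  <- t.test(x, y)                    # default: Welch, unequal variances
pooled <- t.test(x, y, var.equal = TRUE)  # pooled-variance two-sample test

welch$method
welch$parameter   # approximate (generally fractional) degrees of freedom
pooled$parameter  # length(x) + length(y) - 2
```

When the group variances differ noticeably, as in this simulation, the two versions can give different degrees of freedom and p-values, which is one reason the Welch default is often the safer choice.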
Pairing is a useful device for making the t-test applicable in a setting where individual level variation would otherwise dominate effects coming from treatment vs. control. See Figure 1 for a toy example of this behavior. Instead of testing the difference in means between two groups, test for whether the per-individual differences are centered around zero.
Figure 1: Pairing makes it possible to see the effect of treatment in this toy example. The points represent a value for patients (say, white blood cell count) measured at the beginning and end of an experiment. In general, the treatment leads to increases in counts on a per-person basis. However, the inter-individual variation is very large; looking at the difference between before and after without the lines joining pairs, we wouldn't think there is much of a difference. Pairing makes sure the effect of the treatment is not swamped by the variation between people, by controlling for each person's white blood cell count at baseline.
For example, in Darwin’s Zea mays data, a treatment and a control plant are put in each pot. Since there might be a pot-level effect in the growth of the plants, it’s better to look at the per-pot difference (the differences are i.i.d.).
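The effect of pairing can be sketched with simulated before/after measurements. The numbers below are invented to mimic the toy setting (large variation between individuals, small consistent treatment effect) and are not Darwin's data. The paired test is the same as a one-sample t-test on the per-individual differences.

```r
# Hypothetical before/after measurements on the same 20 individuals
set.seed(4)
n <- 20
baseline <- rnorm(n, mean = 50, sd = 15)            # large person-to-person variation
after    <- baseline + rnorm(n, mean = 2, sd = 1)   # small, consistent increase

unpaired <- t.test(after, baseline)                 # ignores the pairing
paired   <- t.test(after, baseline, paired = TRUE)  # tests per-person differences
on_diffs <- t.test(after - baseline, mu = 0)        # the same test, written explicitly

# Compare p-values with and without pairing
c(unpaired = unpaired$p.value, paired = paired$p.value)
all.equal(unname(paired$statistic), unname(on_diffs$statistic))
```

In this simulation the unpaired comparison is dominated by between-person variation, while the paired test isolates the within-person change.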
Pairing is related to a few other common statistical ideas: