Nonparametric tests

There are several reasons we may prefer a nonparametric test to a parametric one; the sections below cover the main families of such tests.

Section 1: Non-parametric, robust tests

These tests are typically rank-based: instead of the magnitudes of the observations, we look at their ranks. Rank-based tests are often used as substitutes for \(t\)-tests in small-sample settings. The most common are the Mann-Whitney test (a substitute for the two-sample \(t\)-test), the sign test (a substitute for the paired \(t\)-test), and the signed-rank test (also a substitute for the paired \(t\)-test, but using more information than the sign test). Some details are given below.
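As a minimal sketch, the two rank-based tests named above are available in SciPy; the data here are simulated for illustration (the sample sizes, effect sizes, and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=15)
group_b = rng.normal(0.8, 1.0, size=15)

# Mann-Whitney U test: rank-based substitute for the two-sample t-test.
u_stat, u_pval = stats.mannwhitneyu(group_a, group_b)

# Wilcoxon signed-rank test: substitute for the paired t-test,
# based on the ranks of the absolute paired differences.
before = rng.normal(0.0, 1.0, size=15)
after = before + rng.normal(0.5, 1.0, size=15)
w_stat, w_pval = stats.wilcoxon(before, after)

print(u_pval, w_pval)
```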

Since the sign test only requires the sign of the difference within each pair, it can be applied in settings where there is no numerical data (for example, survey data might consist of “likes” and “dislikes” before and after a treatment).
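A sketch of the sign test on paired outcomes of this kind: under the null of no treatment effect, each pair is equally likely to improve or worsen, so the number of positive signs is Binomial(n, 1/2). The responses below are hypothetical:

```python
from scipy import stats

# Hypothetical paired outcomes: +1 = improved after treatment, -1 = worsened
# (pairs with no change are conventionally dropped before the test).
signs = [+1, +1, +1, -1, +1, +1, -1, +1, +1, +1]

n_pos = sum(1 for s in signs if s > 0)
n = len(signs)

# Exact binomial test of H0: P(improve) = 1/2.
result = stats.binomtest(n_pos, n, p=0.5, alternative="two-sided")
print(result.pvalue)
```

Here 8 of 10 pairs improved, yet the two-sided p-value (about 0.11) is not significant at the 5% level, reflecting how little information the signs alone carry in small samples.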

Section 2: Computational methods

2.1. Permutation tests

Permutation tests are a kind of computationally intensive test that can be used quite generally. The typical setting in which it applies has two groups between which we believe there is some difference. The way we measure this difference might be more complicated than a simple difference in means, so no closed-form distribution under the null may be available.

The basic idea of the permutation test is that we can randomly form artificial groups in the data; by construction, there are then no systematic differences between the groups, so the shuffled datasets mimic the null hypothesis. Computing the statistic on many of these artificial datasets gives an approximation to the null distribution of that statistic, and comparing the statistic's value on the observed data with this approximate null yields a p-value. See the figure for a representation of this idea.


Figure: A representation of a two-sample difference-in-means permutation test. The values along the x-axis represent the measured data, and the colors represent two groups. The top row gives the values in the observed data, while each following row represents a permutation of the group labels of that same data. The crosses are the averages within the groups. Here, it looks like the blue group has a larger mean than the pink group in the real data; this is reflected in the fact that the difference in means is larger in the observed data than in the permuted data. The proportion of permutations in which the permuted data have a larger difference in means is used as the p-value.
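The procedure in the figure can be sketched as follows for a difference in means; the data are simulated, and the sample sizes, seed, and number of permutations are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, size=20)   # "blue" group
y = rng.normal(0.0, 1.0, size=20)   # "pink" group

observed = x.mean() - y.mean()
pooled = np.concatenate([x, y])

n_perm = 5000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)                    # shuffle the group labels
    diff = perm[:len(x)].mean() - perm[len(x):].mean()
    if diff >= observed:                              # one-sided comparison
        count += 1

# Add-one correction so the p-value is never exactly zero.
p_value = (count + 1) / (n_perm + 1)
print(p_value)
```

For a two-sided test, compare `abs(diff)` against `abs(observed)` instead.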

2.2. Bootstrap tests

While the bootstrap is typically used to construct confidence intervals (see Section: Inference in Linear Models), the bootstrap principle can also be used to perform hypothesis tests. Like permutation tests, it can be applied in a range of situations where classical testing may not be appropriate.

The main idea is to simulate data under the null and calculate the test statistic on these null datasets. The p-value can then be calculated by comparing the statistic on the real data to this approximate null distribution, as in permutation tests.

As with permutation tests, this procedure is always valid, but it is only powerful if the test statistic is attuned to the actual structure of departures from the null.

The trickiest part of this approach is typically devising an appropriate scheme for sampling under the null. This means we need to estimate \(\hat{F}_0\) from a class of CDFs \(F_0\) consistent with the null hypothesis. For example, in a two-sample difference-of-means test, we can center each group by subtracting its mean, so that \(H_0\) actually holds, and then simulate new data by sampling with replacement from this pooled, centered histogram.


Figure: To compute a p-value for a permutation test, refer to the permutation null distribution. Here, the histogram shows the value of the test statistic under many permutations of the group labeling; this approximates how the test statistic is distributed under the null hypothesis. The value of the test statistic in the observed data is marked by the vertical bar. The fraction of the histogram's area with a more extreme value of the statistic is the p-value, and it corresponds exactly to the usual interpretation of a p-value as the probability, under the null, of observing a test statistic that is as or more extreme.

2.3. Kolmogorov-Smirnov

The Kolmogorov-Smirnov (KS) test is a test for either (1) comparing two groups of real-valued measurements or (2) evaluating the goodness-of-fit of a collection of real-valued data to a prespecified reference distribution.
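Both variants are available in SciPy; as a minimal sketch with simulated data (sample sizes, distributions, and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=100)
b = rng.normal(0.5, 1.0, size=100)

# (1) Two-sample test: the statistic is the largest vertical gap
# between the two empirical CDFs.
ks2 = stats.ks_2samp(a, b)

# (2) Goodness-of-fit test: compare the empirical CDF of `a` against a
# prespecified reference distribution (here, the standard normal).
ks1 = stats.kstest(a, "norm")

print(ks2.statistic, ks2.pvalue, ks1.statistic, ks1.pvalue)
```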


Figure: The motivating idea of the two-sample and goodness-of-fit variants of the KS test. In the two-sample variant, the two colors represent two different empirical CDFs. The largest vertical gap between these CDFs is marked by the black bar, and this gap defines the KS statistic. Under the null hypothesis that the two groups have the same CDF, this statistic has a known distribution, which is used in the test. In the goodness-of-fit variant, the pink line instead represents the true CDF of the reference population. This test checks whether the observed empirical CDF (blue line) is consistent with the reference CDF, again by measuring the largest gap between the pair.