Hypothesis Testing - Power Analysis

Based on the “Statistical Consulting Cheatsheet” by Prof. Kris Sankaran

Before performing an experiment, it is important to get a rough sense of how many samples need to be collected for the follow-up analysis to have a chance of detecting the phenomena of interest. This exercise is called a power analysis, and it often comes up in consulting sessions because many grant agencies require that a power analysis be conducted before they agree to provide funding.

Analytical Power Analysis

Traditionally, power analyses have been done by deciding in advance which statistical test to apply to the collected data and then using basic statistical theory to work out the number of samples required to reject the null when the signal has some assumed strength. For example, if the true data distribution is assumed to be \(N(\mu, \sigma^2)\) and we are testing against the null \(N(0, \sigma^2)\) using a one-sample t-test, then the fact that \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \sim N(\mu, \frac{\sigma^2}{n})\), where \(n\) is the sample size, can be used to analytically calculate the probability that the observed mean will be above the t-test rejection threshold. The size of the signal \(\mu\) is assumed known (smaller signals require larger sample sizes to detect). Of course, this is the quantity of interest in the study, and if it were known, there would be no point in doing the study:

The idea though is to get a rough estimate of the number of samples required for a few different signal strengths.
Sometimes, a pilot study has been conducted previously, which can give an approximate range for the signal strength to expect.
We can also use similar studies in the literature to determine what a “reasonable effect size” could be.

There are many power calculators available; these can be useful to share or walk through with clients.
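
As a rough illustration of the analytical calculation, the sketch below computes power and the required sample size for the one-sample example above. It makes several simplifying assumptions that are not in the original text: \(\sigma\) is treated as known, so the test is a one-sided z-test (a normal approximation to the t-test, which would need a slightly larger sample), the level is 0.05, and the target power is 0.8. The function names `power_one_sided_z` and `required_n` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_one_sided_z(mu, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mean = 0 against mean > 0.

    Assumes sigma is known. We reject when sqrt(n) * xbar / sigma exceeds
    the (1 - alpha) normal quantile.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, sqrt(n) * xbar / sigma ~ N(mu * sqrt(n) / sigma, 1).
    return 1 - STD_NORMAL.cdf(z_crit - mu * sqrt(n) / sigma)


def required_n(mu, sigma, alpha=0.05, power=0.8):
    """Smallest n whose power reaches the target, from inverting the formula above."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) * sigma / mu) ** 2)


if __name__ == "__main__":
    # Rough sample sizes for a few assumed signal strengths (sigma fixed at 1).
    for mu in (0.2, 0.5, 0.8):
        n = required_n(mu, sigma=1.0)
        print(f"mu={mu:.1f}: n={n}, power={power_one_sided_z(mu, 1.0, n):.3f}")
```

Running the loop over a few values of `mu` gives the kind of table that is useful to walk through with a client: it shows how quickly the required sample size grows as the assumed signal shrinks.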

Computational Power Analysis

When more complex tests or designs are used, it is typically impossible to work out an analytical form for the sample size as a function of signal strength – we can’t invert the t-test/z-test formula to assess the appropriate number of samples.

In this situation, it is common to set up a simulation experiment to approximate this function.

1: The client needs to specify a simulation mechanism under which the data can be plausibly generated, along with a description of which knobs change the signal strengths in what ways.
2: The client needs to specify the actual analysis that will be applied to these data to declare significance.
3: From here, many simulated datasets are generated for every configuration of signal strengths along with a grid of sample sizes. The number of times the signal was correctly detected is recorded and is used to estimate the power under each configuration of signal strength and sample size.
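
The three steps above can be sketched as follows. This is a minimal illustration under assumed choices, not a prescribed procedure: the data-generating mechanism (`simulate_data`, draws from \(N(\text{effect}, 1)\)), the analysis (`sign_test_rejects`, a one-sided exact sign test), and the grids of effect sizes and sample sizes are all hypothetical stand-ins for whatever the client specifies.

```python
import math
import random


def simulate_data(n, effect, rng):
    """Step 1 (assumed mechanism): n independent draws from N(effect, 1)."""
    return [rng.gauss(effect, 1.0) for _ in range(n)]


def sign_test_rejects(data, alpha=0.05):
    """Step 2 (assumed analysis): one-sided sign test of median = 0 vs. median > 0."""
    n = len(data)
    k = sum(x > 0 for x in data)  # number of positive observations
    # Exact p-value: P(Binomial(n, 1/2) >= k).
    p_value = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return p_value <= alpha


def estimate_power(n, effect, n_sims=500, alpha=0.05, seed=0):
    """Step 3: fraction of simulated datasets in which the test rejects."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    rejections = sum(
        sign_test_rejects(simulate_data(n, effect, rng), alpha)
        for _ in range(n_sims)
    )
    return rejections / n_sims


if __name__ == "__main__":
    # Estimate power over a grid of signal strengths and sample sizes.
    for effect in (0.2, 0.5, 1.0):
        for n in (10, 20, 50, 100):
            power = estimate_power(n, effect)
            print(f"effect={effect:.1f} n={n:3d} estimated power={power:.2f}")
```

Keeping the mechanism and the analysis in separate functions means either can be swapped for the client's actual design without touching the power loop. Each estimate is a proportion over `n_sims` simulations, so it carries Monte Carlo error that shrinks as `n_sims` grows.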
