# Inference in Linear Models

One of the reasons linear models are popular across so many areas of science is that the machinery for doing inference with them is (1) relatively straightforward and (2) applicable to a wide variety of hypotheses. While we can fit linear models without making assumptions about the underlying data generating mechanism, these assumptions are crucial for doing valid inference.

The high-level strategy of inference in linear models is to compare different types of data generating mechanisms, favoring those that are both parsimonious and fit the data reasonably well. As a consequence, we need to make assumptions about the data generating mechanism. Limiting our attention to linear models for now (GLMs are considered below), we assume the responses $$Y_i$$ are drawn according to

$Y_i = x_i^\top \beta + \epsilon_i$

where $$x_i$$ is considered fixed and the $$\epsilon_i$$ are drawn i.i.d. from $$N(0, \sigma^2)$$.  Mathematically, the problem of comparing models can be approached by considering

$r= \frac{RSS_{sub} - RSS_{full}}{RSS_{full}}$

where $$RSS_{full}$$ and $$RSS_{sub}$$ are the residual sums of squares from fitting a full linear model and a submodel (the full model with some of the $$\beta_j$$'s set to 0), respectively. The idea is that if the submodel doesn't lose much explanatory power (when compared to the full model), then this quantity will be small, and we should favor the smaller model (Occam's razor). If it is large, however, then the improved fit of the full model is worth the cost in complexity.
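As a small illustration (a sketch using simulated data and plain NumPy least squares; the variable names and simulation settings are our own choices), the quantity $$r$$ can be computed directly from the two fits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Design matrix: intercept plus two predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, 0.0])        # the last coefficient is truly zero
y = X @ beta + rng.normal(scale=0.5, size=n)

def rss(design, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta_hat
    return resid @ resid

rss_full = rss(X, y)                    # full model
rss_sub = rss(X[:, :2], y)              # submodel: last beta_j set to 0
r = (rss_sub - rss_full) / rss_full     # small here, since the dropped term is null
```

Since the submodel is nested in the full model, $$RSS_{sub} \geq RSS_{full}$$ always holds, so $$r$$ is nonnegative.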

Suppose that the full and submodels have $$p_0$$ and $$p_1$$ parameters, respectively. Then, if the data are in fact generated by the submodel, it can be shown that $$F= \frac{ \frac{RSS_{sub} - RSS_{full}}{p_0-p_1}}{\frac{RSS_{full}}{n-p_0}} \sim F_{p_0-p_1, n-p_0}$$

since the numerator and denominator are independent $$\chi^2$$-distributed quantities, each scaled by its degrees of freedom.  This provides grounds for a formal hypothesis test: assume $$H_0$$, that the data are generated from the submodel, but if the statistic $$F$$ is too large when compared to the reference F-distribution, reject it in favor of $$H_1$$, that the data are generated by the full model. The two most common scenarios where this type of test is useful are:
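A minimal sketch of the test itself, on simulated data where $$H_0$$ holds (here `scipy.stats.f.sf` gives the upper-tail probability of the reference F-distribution; all names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p0, p1 = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)  # H_0 holds: last coefficient is 0

def rss(design, y):
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta_hat
    return resid @ resid

rss_full, rss_sub = rss(X, y), rss(X[:, :p1], y)
F = ((rss_sub - rss_full) / (p0 - p1)) / (rss_full / (n - p0))
p_value = stats.f.sf(F, p0 - p1, n - p0)  # reject H_0 when this is small
```

Because the data really were generated by the submodel, the p-value here behaves like a uniform draw; under the alternative it concentrates near zero.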

•  Testing $$H_0$$ : $$\beta_j = 0$$. When people talk about a variable in a linear model being significant, they mean that they have rejected this null hypothesis. The interpretation is that the submodel including all variables but this one does substantially worse than the full model that additionally includes $$\beta_j$$ (note the connection to the interpretation of $$\beta_j$$ as the amount that $$y$$ changes when $$x_j$$ is changed by a unit, conditional on all other variables, which make up the submodel). The results of this test are usually presented in terms of a t-statistic, but this follows directly from the fact that $$t^2 = F$$ in this special case.
• Testing $$H_0$$ : $$\beta_1 = \beta_2 = \cdots = \beta_p = 0$$, with the intercept $$\beta_0 \in \mathbb{R}$$ left unrestricted. This is the test of whether the model including the $$x_i$$'s does better than the model which just predicts the average of the $$y_i$$'s. Here, the submodel is the intercept-only model. Note that in an experimental setting (for example, where treatment and control were randomized), rejection of a submodel can be grounds for claiming causal relationships. This is because we have established that the response is nonconstant with respect to the terms not in the submodel, and all potentially confounding effects have been averaged out due to randomization.
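The $$t^2 = F$$ identity from the first scenario can be checked numerically (a sketch with simulated data; the t-statistic is built from the usual OLS covariance estimate $$\hat\sigma^2 (X^\top X)^{-1}$$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
p = X.shape[1]

# t-statistic for the last coefficient
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)
cov = sigma2_hat * np.linalg.inv(X.T @ X)
t = beta_hat[2] / np.sqrt(cov[2, 2])

# F-statistic from dropping that same column (numerator df is 1)
def rss(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ b
    return r @ r

F = (rss(X[:, :2], y) - rss(X, y)) / (rss(X, y) / (n - p))
assert np.isclose(t**2, F)   # t^2 = F when a single coefficient is tested
```

The equality is exact, not approximate: squaring a $$t_{n-p_0}$$ variable gives an $$F_{1, n-p_0}$$ variable, so the two tests always agree.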

A few variations on this standard linear model F- and t-testing are worth being familiar with: ANOVA and testing for GLMs.  ANOVA is actually just a name for a common special case of linear models, where there are only one or two predictors in $$x_i$$, all of which are categorical. We provide additional references for ANOVA in the corresponding section.