# Inference in Linear Models

One of the reasons linear models are popular across so many areas of science is that the machinery for doing inference with them is (1) relatively straightforward and (2) applicable to a wide variety of hypotheses. While we can fit linear models without making assumptions about the underlying data generating mechanism, these assumptions are crucial for doing valid inference.

The high-level strategy of inference in linear models is to compare different types of data generating mechanisms, favoring those that are both parsimonious and fit the data reasonably well. As a consequence, we need to make assumptions about the data generating mechanism. Limiting our attention to linear models for now (GLMs are considered below), we assume the responses $$Y_i$$ are drawn according to

$Y_i = x_i^\top \beta + \epsilon_i$

where $$x_i$$ is considered fixed and the $$\epsilon_i$$ are drawn i.i.d. from $$N(0, \sigma^2)$$.  Mathematically, the problem of comparing models can be approached by considering

$r= \frac{RSS_{sub} - RSS_{full}}{RSS_{full}}$

where $$RSS_{full}$$ and $$RSS_{sub}$$ are the residual sums of squares from fitting a full linear model and a submodel (the full model with some of the $$\beta_j$$'s set to 0), respectively. The idea is that if the submodel doesn't lose much explanatory power (when compared to the full model), then this quantity will be small, and we should favor the smaller model (Occam's razor). If it is large, however, then the improved fit of the full model is worth the cost in complexity.
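As a small illustration (a sketch using simulated data and plain NumPy least squares; the variable names and simulation settings are our own choices), the quantity $$r$$ can be computed directly from the two fits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Design matrix: intercept plus two predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, 0.0])        # the last coefficient is truly zero
y = X @ beta + rng.normal(scale=0.5, size=n)

def rss(design, y):
    """Residual sum of squares from an ordinary least squares fit."""
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta_hat
    return resid @ resid

rss_full = rss(X, y)                    # full model
rss_sub = rss(X[:, :2], y)              # submodel: last beta_j set to 0
r = (rss_sub - rss_full) / rss_full     # small here, since the dropped term is null
```

Since the submodel is nested in the full model, $$RSS_{sub} \geq RSS_{full}$$ always holds, so $$r$$ is nonnegative.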

Suppose that the full and submodels have $$p_0$$ and $$p_1$$ parameters, respectively. Then, if the data are in fact generated by the submodel, it can be shown that $$F= \frac{ \frac{RSS_{sub} - RSS_{full}}{p_0-p_1}}{\frac{RSS_{full}}{n-p_0}} \sim F_{p_0-p_1, n-p_0}$$

since the numerator and denominator are independent $$\chi^2$$-distributed quantities, each scaled by its degrees of freedom.  This provides grounds for a formal hypothesis test: assume $$H_0$$, that the data are generated from the submodel, but if the statistic $$F$$ is too large when compared to the reference F-distribution, reject it in favor of $$H_1$$, that the data are generated by the full model. The two most common scenarios where this type of test is useful are:
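A minimal sketch of the test itself, on simulated data where $$H_0$$ holds (here `scipy.stats.f.sf` gives the upper-tail probability of the reference F-distribution; all names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p0, p1 = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)  # H_0 holds: last coefficient is 0

def rss(design, y):
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta_hat
    return resid @ resid

rss_full, rss_sub = rss(X, y), rss(X[:, :p1], y)
F = ((rss_sub - rss_full) / (p0 - p1)) / (rss_full / (n - p0))
p_value = stats.f.sf(F, p0 - p1, n - p0)  # reject H_0 when this is small
```

Because the data really were generated by the submodel, the p-value here behaves like a uniform draw; under the alternative it concentrates near zero.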

•  Testing $$H_0$$ : $$\beta_j = 0$$. When people talk about a variable in a linear model being significant, they mean that they have rejected this null hypothesis. The interpretation is that the submodel including all variables but this one does substantially worse than the full model that additionally includes $$\beta_j$$ (note the connection to the interpretation of $$\beta_j$$ as the amount that $$y$$ changes when $$x_j$$ is changed by a unit, conditional on all other variables, which make up the submodel). The results of this test are usually presented in terms of a t-statistic, but this follows directly from the fact that $$t^2 = F$$ in this special case.
• Testing $$H_0$$ : $$\beta_1 = \beta_2 = \cdots = \beta_p = 0$$, with the intercept $$\beta_0 \in \mathbb{R}$$ left unrestricted. This is the test of whether the model including the $$x_i$$'s does better than the model which just predicts the average of the $$y_i$$'s. Here, the submodel is the intercept-only model. Note that in an experimental setting (for example, where treatment and control were randomized), rejection of a submodel can be grounds for claiming causal relationships. This is because we have established that the response is nonconstant with respect to the terms not in the submodel, and all potentially confounding effects have been averaged out due to randomization.
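The $$t^2 = F$$ identity from the first scenario can be checked numerically (a sketch with simulated data; the t-statistic is built from the usual OLS covariance estimate $$\hat\sigma^2 (X^\top X)^{-1}$$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
p = X.shape[1]

# t-statistic for the last coefficient
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)
cov = sigma2_hat * np.linalg.inv(X.T @ X)
t = beta_hat[2] / np.sqrt(cov[2, 2])

# F-statistic from dropping that same column (numerator df is 1)
def rss(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ b
    return r @ r

F = (rss(X[:, :2], y) - rss(X, y)) / (rss(X, y) / (n - p))
assert np.isclose(t**2, F)   # t^2 = F when a single coefficient is tested
```

The equality is exact, not approximate: squaring a $$t_{n-p_0}$$ variable gives an $$F_{1, n-p_0}$$ variable, so the two tests always agree.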

A few variations on this standard linear model F- and t-testing are worth being familiar with: ANOVA and testing for GLMs.  ANOVA is actually just a name for a common special case of linear models, where there are only one or two predictors in $$x_i$$, all of which are categorical. We provide additional references for ANOVA in the corresponding section.