Topics in Linear Regression
Learning Objectives
After this unit, students should be able to
- fit a polynomial regression model.
- explain the need for regularised linear regression.
- identify interaction and additive effects in a regression model.
- conduct a difference-in-differences analysis.
Polynomial Regression
The relationship between the predictors and the response in a dataset may not be linear, violating the linearity assumption of linear regression. In such cases, it can be advantageous to introduce quadratic, cubic or, in general, higher-degree polynomial versions of the existing predictors. We can then train an OLS regression on the original predictors as well as these synthetically generated ones. Such a regression is called polynomial regression. For instance, consider a model in which the response \(y\) depends on a single predictor \(x\). A polynomial regression can be trained by introducing new predictors that are the squares and cubes of the original predictor \(x\), e.g. \(y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \epsilon\).
Even though the predictors are transformed into polynomial terms, the model remains linear in terms of the parameters (coefficients). This linearity in parameters allows OLS to be used because OLS assumes a linear relationship between the dependent variable and the parameters. In polynomial regression, despite the non-linearity of the predictor terms, the relationship between the response variable and the parameters is linear. While polynomial regression can provide a better fit, it also introduces the risk of overfitting, especially with higher-degree polynomials. Overfitting occurs when the model captures noise or random fluctuations in the training data rather than the underlying pattern.
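As a minimal sketch of the idea (not part of the original text), the snippet below fits a degree-2 polynomial regression with scikit-learn; the synthetic data and the chosen degree are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic data: y depends quadratically on a single predictor x (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = 1.5 + 2.0 * x[:, 0] - 0.8 * x[:, 0] ** 2 + rng.normal(scale=0.5, size=100)

# Degree-2 polynomial features (x, x^2) followed by ordinary least squares;
# the model stays linear in the coefficients even though x enters non-linearly.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

print(model.named_steps["linearregression"].coef_)       # estimates of b1, b2
print(model.named_steps["linearregression"].intercept_)  # estimate of b0
```

Increasing `degree` beyond what the data support is exactly where the overfitting risk described above appears, so the degree is usually chosen by validation on held-out data.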
Structural Multicollinearity
The introduction of higher-order terms naturally introduces multicollinearity into the data. For instance, \(x\) and \(x^2\) typically have a strong positive correlation when \(x\) takes mostly positive values. To subdue this effect, it is recommended to center the predictor by subtracting its mean before forming the polynomial terms, as illustrated in the sketch below.
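A small illustrative check (the data are assumed, not from the text) of how centering lowers the correlation between the linear and quadratic terms:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=500)          # predictor measured on a positive scale

# Correlation between x and x^2 without centering: typically very high
print(np.corrcoef(x, x ** 2)[0, 1])

# Centering x before forming the quadratic term subdues the structural collinearity
xc = x - x.mean()
print(np.corrcoef(xc, xc ** 2)[0, 1])
```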
Regularised Regression
Regularised regression is a technique used to prevent overfitting and enhance the predictive performance of regression models by adding a penalty term to the loss function. This penalty term discourages the model from becoming overly complex and helps in handling multicollinearity among predictors. Regularised regression adds a regularisation term, which penalises large coefficient values, to the loss being optimised. A general version of regularised regression finds the coefficients as the solution to the following optimisation problem:
\[
\min_{b_0, b_1, \ldots, b_p} \; \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2 + \lambda \, R(b),
\]
where \(\hat{y}_i\) is the model's prediction for observation \(i\) and \(R(b)\) is the regularisation term.
\(\lambda \ge 0\) determines the strength of regularisation and can be set by tuning the model on the dataset at hand. Depending on the regularisation term, there are different kinds of regularised regression models. The most popular models are stated below, followed by a short code sketch:
- Ridge regression. The regularisation term is proportional to the squared \(L_2\) norm of the coefficients, \(\sum_i b_i^2\). Ridge regression penalises large coefficient values and shrinks them towards zero, though never exactly to zero. It is particularly useful when predictors are strongly correlated, as it stabilises the coefficient estimates.
- LASSO regression. The regularisation term is proportional to the \(L_1\) norm of the coefficients, \(\sum_i |b_i|\). LASSO regression can shrink some coefficients to exactly zero; thus, it indirectly selects an appropriate subset of predictors from a large set of predictors.
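A minimal sketch of both penalised fits with scikit-learn; here the regularisation strength \(\lambda\) corresponds to the `alpha` argument, and the synthetic, correlated predictors are an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data with a pair of strongly correlated predictors (illustrative assumption)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=200)

# alpha plays the role of lambda: larger alpha means stronger shrinkage
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)   # all coefficients shrunk towards zero, none exactly zero
print(lasso.coef_)   # some coefficients driven exactly to zero
```

In practice \(\lambda\) (here `alpha`) is chosen by cross-validation, for example with scikit-learn's `RidgeCV` or `LassoCV`.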
Additive Effect
An additive effect in linear regression refers to the situation where the effect of each predictor on the response is considered independently of the other predictors. The total effect on the response is the sum of the individual effects of each predictor. Predictors in a regression model with an additive effect are said to be non-interacting. The simplicity of additive models allows for a straightforward interpretation of the individual effect of each predictor.
Consider the example shown in the adjoining plot. The scatter plot displays data on gestation period versus baby weight, with the data points labeled by a binary variable indicating the mother's smoking habits. The regression model uses baby weight as the response variable, with smoking habits and gestation period as predictors. We observe that, for any given gestation period, the difference in predicted baby weight between smoking and non-smoking mothers is the same; the two fitted lines are parallel, which is the hallmark of an additive (non-interacting) model. A code sketch follows.
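A hedged sketch of an additive model in the spirit of the gestation/smoking example; the column names and the synthetic data frame are assumptions, not the actual dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative stand-in for the gestation / smoking data (not the real dataset)
rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "gestation": rng.normal(280, 10, n),   # gestation period in days
    "smoker": rng.integers(0, 2, n),       # 1 if the mother smokes
})
df["weight"] = (-1000 + 15 * df["gestation"]
                - 250 * df["smoker"] + rng.normal(0, 100, n))

# Additive model: smoking shifts the intercept only, so the fitted
# lines for smokers and non-smokers are parallel.
additive = smf.ols("weight ~ gestation + smoker", data=df).fit()
print(additive.params)
```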
Interaction Effect
An interaction effect in linear regression occurs when the effect of one predictor variable on the dependent variable depends on the level or value of another predictor variable. This means that the combined influence of two variables is not simply additive but rather multiplicative. To model an interaction effect between two predictor variables, say \(x_1\) and \(x_2\), we include a term in the regression model that represents the product of these variables. The interaction term allows the slope of one predictor to vary depending on the value of the other predictor.
The general form of a linear regression model with an interaction term is:
\[
y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + \epsilon.
\]
A statistically significant \(b_3\) indicates an interaction effect; otherwise, the model reduces to an additive one.
Consider the example shown in the adjoining plot. The scatter plot displays data on vaccine efficacy (\(y\)) versus age, with the data points labeled by the treatment given to the patients. The regression model uses vaccine efficacy as the response variable, with age and treatment as predictors. We observe that the effect of age on efficacy differs across treatment types (the fitted lines have different slopes), reflecting the interaction between treatment type and age. A sketch of such a model is given below.
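Continuing the same hedged setup (synthetic, assumed data rather than the actual vaccine dataset), an interaction term can be added with the `*` operator in a statsmodels formula; the significance of the product-term coefficient \(b_3\) is read off the coefficient table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative stand-in for the vaccine-efficacy data (not the real dataset)
rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "treatment": rng.integers(0, 2, n),    # two treatment types
})
# The slope of age differs by treatment: an interaction built into the simulation
df["efficacy"] = (90 - 0.2 * df["age"]
                  - 0.4 * df["age"] * df["treatment"]
                  + 5 * df["treatment"] + rng.normal(0, 3, n))

# "age * treatment" expands to age + treatment + age:treatment,
# where age:treatment is the product (interaction) term b3 * x1 * x2.
interaction = smf.ols("efficacy ~ age * treatment", data=df).fit()
print(interaction.summary().tables[1])     # inspect the p-value of age:treatment
```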