Revisiting Linear Regression
Learning Objectives
After this unit, students should be able to
- state the various kinds of linear regression models.
- interpret the parameters of a simple linear regression model.
- interpret the parameters of a multiple linear regression model.
Revise Unit 17
Readers should carefully revise Unit 17 before proceeding with this unit.
For the sake of completeness, we will restate the problem of linear regression.
We are given a labeled dataset \(D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}\) of \(n\) points, where \(\mathbf{x}_i \in \mathbb{R}^m\) and \(y_i \in \mathbb{R}\). We represent the features as a data matrix \(X \in \mathbb{R}^{n \times (m + 1)}\) (augmented with a column of ones to account for the intercept) and the labels as a vector \(\mathbf{y} \in \mathbb{R}^n\). Linear regression estimates parameters \(\mathbf{b} \in \mathbb{R}^{m+1}\) that explain the observed data as \(\mathbf{y} \approx X\mathbf{b}\).
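The following is a minimal NumPy sketch of this setup; the data values and sizes here are made up purely for illustration.

```python
import numpy as np

# Hypothetical toy data: n = 5 datapoints with m = 2 features each.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0],
                  [5.0, 2.5]])
y = np.array([3.1, 3.9, 6.2, 8.8, 10.1])

# Augment with a column of ones so that b[0] plays the role of the intercept.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Solve for the b minimising ||y - X b||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # b[0] is the intercept; b[1], b[2] are the feature coefficients
```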
Notation
Linear regression is one of the most widely used models in data analytics. It has been applied in a wide variety of domains such as econometrics, business, applied mathematics, sociology, the natural sciences, and medicine. Each domain draws on its own rich literature for the terminology used to denote features and labels.
- The features \(\mathbf{x}_i\) of a datapoint are called predictors, explanatory variables, independent variables, or exogenous variables. To avoid confusion, we will call them predictors.
- The labels \(y_i\) of a datapoint are called responses, scores, outcomes, dependent variables, or endogenous variables. To avoid confusion, we will call them responses.
Types of Linear Regression
Linear regression can be classified into one of the following types based on the loss function.
- Ordinary Least Squares (OLS) regression. Linear regression that optimises the least squares loss on the training data, as presented in Unit 17, is called ordinary least squares regression.
- Weighted Least Squares (WLS) regression. Linear regression that optimises the weighted least squares loss on the training data, as presented in Unit 17, is called weighted least squares regression. In the equation below, \(W\) is a diagonal matrix called the weight matrix:
\[
\hat{\mathbf{b}} = \arg\min_{\mathbf{b}} \; (\mathbf{y} - X\mathbf{b})^\top W (\mathbf{y} - X\mathbf{b})
\]
The weights can be set so as to give varying importance to the datapoints: the minimisation fits datapoints with higher weights more closely than those with lower weights (see the sketch after this list).
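As a sketch of how WLS can be computed, the snippet below solves the normal equations \(X^\top W X \mathbf{b} = X^\top W \mathbf{y}\) directly; the data and weights are hypothetical.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares via the normal equations
    (X^T W X) b = X^T W y, where W = diag(w).
    Assumes X is already augmented with an intercept column."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Hypothetical data: the last datapoint carries 10x the weight of the
# others, so the fitted line tracks it more closely.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([2.0, 4.1, 5.9, 9.0])
w = np.array([1.0, 1.0, 1.0, 10.0])
print(wls_fit(X, y, w))  # [intercept, slope]
```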
Linear regression can also be classified into one of the following types based on the dimensionality of the predictors and responses (a code sketch follows the list).
- Simple Linear Regression. Linear regression trained on data with a single predictor and a single response is called simple linear regression. For every datapoint \((x_i, y_i)\), where \(x_i, y_i \in \mathbb{R}\), simple linear regression learns a vector \(\mathbf{b} \in \mathbb{R}^2\) such that
\[
y_i \approx b_0 + b_1 x_i.
\]
- Multiple Linear Regression. Linear regression trained on data with multiple predictors and a single response is called multiple linear regression. For every datapoint \((\mathbf{x}_i, y_i)\), where \(\mathbf{x}_i \in \mathbb{R}^d\) and \(y_i \in \mathbb{R}\), multiple linear regression learns a vector \(\mathbf{b} \in \mathbb{R}^{d+1}\) such that
\[
y_i \approx b_0 + b_1 x_{i1} + \cdots + b_d x_{id}.
\]
- Multivariate Linear Regression. Linear regression trained on data with multiple predictors and multiple responses is called multivariate linear regression. For every datapoint \((\mathbf{x}_i, \mathbf{y}_i)\), where \(\mathbf{x}_i \in \mathbb{R}^d\) and \(\mathbf{y}_i \in \mathbb{R}^k\), multivariate linear regression learns a matrix \(B \in \mathbb{R}^{(d+1) \times k}\) such that
\[
\mathbf{y}_i \approx B^\top \begin{bmatrix} 1 \\ \mathbf{x}_i \end{bmatrix}.
\]
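To make the dimensions concrete, here is a small NumPy sketch that fits all three variants on synthetic data; the sizes \(n = 50\), \(d = 3\), \(k = 2\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 3, 2  # arbitrary sizes for illustration

# Simple: one predictor, one response -> b has 2 entries.
x = rng.normal(size=(n, 1))
y_simple = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.1, size=n)
X1 = np.hstack([np.ones((n, 1)), x])
b_simple, *_ = np.linalg.lstsq(X1, y_simple, rcond=None)  # shape (2,)

# Multiple: d predictors, one response -> b has d + 1 entries.
Xd = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])
y_multi = Xd @ rng.normal(size=d + 1) + rng.normal(scale=0.1, size=n)
b_multi, *_ = np.linalg.lstsq(Xd, y_multi, rcond=None)    # shape (d + 1,)

# Multivariate: d predictors, k responses -> B is (d + 1) x k.
Y = Xd @ rng.normal(size=(d + 1, k)) + rng.normal(scale=0.1, size=(n, k))
B, *_ = np.linalg.lstsq(Xd, Y, rcond=None)                # shape (d + 1, k)

print(b_simple.shape, b_multi.shape, B.shape)
```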
In this text, we will focus on ordinary least squares regression in both its simple and multiple linear regression forms.
Interpreting Linear Regression
Let us consider a simple linear regression modeled by the equation \(y = b_0 + b_1x\). Here \(b_0\) is the intercept of the line and \(b_1\) is its slope. The intercept \(b_0\) is the value of the response when the predictor equals zero. The slope \(b_1\) is the change in the response per unit change in the value of the predictor.
Consider a linear regression modeled as \(\text{GPA} = 0.5 + 0.75 * \text{Score}\). We can interpret the equation as follows: a student with a score of zero obtains a GPA of \(0.5\), and the GPA increases by \(0.75\) per unit increment in the score. Had the equation been modeled as \(\text{GPA} = 0.5 - 0.75 * \text{Score}\), we would have interpreted the slope as a decrement of \(0.75\) in GPA per unit increment in the score.
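A quick numerical check of this interpretation, using the made-up model above:

```python
# Made-up fitted model: GPA = 0.5 + 0.75 * Score.
def gpa(score):
    return 0.5 + 0.75 * score

print(gpa(0))           # 0.5  -> the intercept: GPA when Score = 0
print(gpa(3) - gpa(2))  # 0.75 -> the slope: change in GPA per unit Score
```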
One has to be extra careful when interpreting the equation if the predictor is a categorical variable. For instance, consider the linear regression \(\text{GPA} = 0.5 + 0.75 * \text{Gender}\), and assume that Gender is ordinally encoded as \(0\) for men and \(1\) for women. The interpretation of the intercept does not change. A unit increment in Gender, however, must be interpreted as moving from the data for men to the data for women. Thus we would interpret the equation as follows: women tend to obtain a GPA \(0.75\) higher than men.
We can naturally extend this interpretation to multiple linear regression. The intercept is still the value of the response when all predictors are set to zero. The coefficient \(b_i\) of the predictor \(x_i\) denotes the change in the response per unit change in the \(i^{th}\) predictor, holding the values of all other predictors constant.
Consider the linear regression modeled as \(\text{Salary} = 1212 + 340 * \text{PastEXP} + 650 * \text{Test}\). The salary of a person with no past work experience and a test score of zero is \(1212\). The salary increases by \(650\) dollars per unit increment in the test score for two persons with the same past work experience. The salary increases by \(340\) dollars per unit increment in past work experience for two persons with the same test score.
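The following snippet verifies these statements numerically for the made-up salary model, holding one predictor constant while incrementing the other:

```python
# Made-up fitted model: Salary = 1212 + 340 * PastEXP + 650 * Test.
def salary(past_exp, test):
    return 1212 + 340 * past_exp + 650 * test

print(salary(0, 0))                   # 1212: both predictors at zero
print(salary(5, 81) - salary(5, 80))  # 650: Test +1, PastEXP held constant
print(salary(6, 80) - salary(5, 80))  # 340: PastEXP +1, Test held constant
```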
Homework
- How can we interpret the intercept of a simple linear regression when a zero value for the predictor does not make sense?
- Prove that the slope represents the change in the response per unit change in the value of the predictor for simple linear regression.