---
authors:
description: Remedy for numerical instability
image: images/posts/2022-09-22-regularization-in-regression/cover.jpg
layout: post
subtitle: Remedy for numerical instability
tags:
title: Regularization in Regression
math: true
---
There are many ways to improve the accuracy of an ML model. They include feature engineering, missing value imputation, improvements in data quality, etc.
One of the effective approaches is regularization. It is a popular technique that helps to keep the model coefficients under control when a computational task such as model training becomes numerically unstable.
In this article, we will take a closer look at why you might want to regularize your model. As an example, we will apply a basic regularization technique to a simple linear regression model and learn how it influences the model.
Linear regression is a supervised machine learning model, which can be expressed in a matrix form as follows:

$$\hat{y} = Xw$$

where $X$ is the feature matrix, $w$ is the vector of weights, and $\hat{y}$ is the vector of predictions.
After some transformations, described in the Training Linear Regression: Normal Equation{:target="_blank"} lecture of Machine Learning Zoomcamp{:target="_blank"} (and sketched briefly after the list below), the weight vector $w$ can be found as:

$$w = (X^T X)^{-1} X^T y$$
where
- $X^T$ is the transpose{:target="_blank"} of $X$,
- $X^T X$ is a Gram matrix{:target="_blank"},
- $(X^T X)^{-1}$ is the inverse{:target="_blank"} of the Gram matrix.
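The transformations mentioned above boil down to minimizing the squared error and setting its gradient to zero. Here is a brief sketch (standard matrix calculus, not copied from the lecture):

$$
\begin{aligned}
J(w) &= \lVert Xw - y \rVert^2 \\
\nabla_w J(w) &= 2X^T(Xw - y) = 0 \\
X^T X\, w &= X^T y \\
w &= (X^T X)^{-1} X^T y
\end{aligned}
$$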
A matrix inversion should be treated with caution. If a matrix contains a column that is a linear combination of its other columns, the matrix is singular, which means its inverse does not exist.
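For instance (a tiny illustration of my own, not taken from the post's notebook), asking NumPy to invert a matrix whose second column is exactly twice the first fails immediately:

```python
import numpy as np

# the second column is exactly 2x the first, so the matrix is singular
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

np.linalg.inv(A)  # raises numpy.linalg.LinAlgError: Singular matrix
```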
Linearly dependent columns are not a typical case in real-world problems; however, due to noise in the data, or the characteristics of your machine, OS, or NumPy version, there may be columns that are nearly identical in the above sense. When this happens, the weight vector $w$ blows up to huge, unstable values.
To overcome this numerical instability problem we can turn to regularization. Regularization in linear regression guarantees the existence of the inverse matrix we need to compute the weights.
One of the regularization techniques is adding a factor to the main diagonal of the matrix $X^T X$ before inverting it:

$$w = (X^T X + \alpha I)^{-1} X^T y$$
where
- $I$ is an Identity matrix{:target="_blank"} and
- $\alpha$ is a (typically small) factor.
This modification of the linear regression is commonly called Ridge Regression{:target="_blank"}.
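The whole training procedure then fits into a few lines of NumPy. Below is a minimal sketch under my own naming (the function `train_ridge` and its signature are not taken from the notebook); with `alpha=0.0` it reduces to plain linear regression:

```python
import numpy as np

def train_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 0.0) -> np.ndarray:
    """Solve the regularized normal equation w = (X^T X + alpha * I)^{-1} X^T y."""
    XTX = X.T.dot(X)                         # Gram matrix
    XTX = XTX + alpha * np.eye(X.shape[1])   # add alpha to the main diagonal
    return np.linalg.inv(XTX).dot(X.T).dot(y)
```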
Let’s demonstrate the effect of regularization through an example and see that the more regularization we add (the larger the factor $\alpha$), the smaller the values in the weight vector $w$ become.
We will build a Linear Regression model for predicting car prices based on a dataset from Kaggle - Car Prices Dataset{:target="_blank"}.
The full code is in the notebook here{:target="_blank"}.
For the sake of simplicity we won’t use any specific ML packages; instead, we will train a simple linear regression model in vector form:
```python
import numpy as np

# define feature matrix X of size 6x3 with nearly identical second and third columns
X = np.array([[4, 4, 4],
              [3, 5, 5],
              [5, 1, 1],
              [5, 4, 4],
              [7, 5, 5],
              [4, 5, 5.00000001]])

# define target vector y with 6 elements
y = np.array([1, 2, 3, 1, 2, 3])

# calculate the Gram matrix for X
XTX = X.T.dot(X)
XTX
```

```
array([[140.        , 111.        , 111.00000004],
       [111.        , 108.        , 108.00000005],
       [111.00000004, 108.00000005, 108.0000001 ]])
```

```python
# take the inverse of the Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
```

```
array([[ 3.86409478e-02, -1.26839821e+05,  1.26839770e+05],
       [-1.26839767e+05,  2.88638033e+14, -2.88638033e+14],
       [ 1.26839727e+05, -2.88638033e+14,  2.88638033e+14]])
```

```python
# calculate the weight vector w
w = XTX_inv.dot(X.T).dot(y)
w
```

```
array([-1.93908875e-01, -3.61854375e+06,  3.61854643e+06])
```
As you can see, the second and the third values of the weight vector $w$ are huge and have opposite signs, which is a clear symptom of the numerical instability caused by the nearly identical columns.
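One way to quantify how ill-conditioned the Gram matrix is (this check is my addition, it is not part of the original example) is its condition number; an extremely large value warns that the computed inverse cannot be trusted:

```python
# a huge condition number indicates a nearly singular matrix
print(np.linalg.cond(XTX))  # prints an extremely large value for this matrix
```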
Let’s introduce a regularization term and see how the vector $w$ changes:
```python
# add regularization factor 0.01 to the main diagonal of the Gram matrix
XTX = XTX + 0.01 * np.eye(3)

# take the inverse of the regularized Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
```

```
array([[ 3.85624712e-02, -1.98159300e-02, -1.98158861e-02],
       [-1.98159300e-02,  5.00124975e+01, -4.99875026e+01],
       [-1.98158861e-02, -4.99875026e+01,  5.00124974e+01]])
```

```python
# calculate the weight vector w
w = XTX_inv.dot(X.T).dot(y)
w
```

```
array([0.33643484, 0.04007035, 0.04007161])
```
The weights in the vector $w$ are now small and of comparable magnitude: the regularization term has removed the instability caused by the nearly identical columns.
The example of applying regularization in Linear Regression for car price prediction can be found in this notebook{:target="_blank"}.
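If you prefer an ML package, the same idea is available out of the box. For example (scikit-learn is not used in the post, so this is only an aside), scikit-learn's `Ridge` with `fit_intercept=False` solves the same regularized normal equation:

```python
from sklearn.linear_model import Ridge

# alpha plays the same role as the factor added to the diagonal above
model = Ridge(alpha=0.01, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # should be close to the manually computed w
```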
The main purpose of regularization techniques is to keep the weight vector $w$ under control so that the solution remains numerically stable.
Regularization makes it possible to find a solution when there are correlated columns, and in many cases it also reduces overfitting and improves model performance.