---
authors:
description: Remedy for numerical instability
image: images/posts/2022-09-22-regularization-in-regression/cover.jpg
layout: post
subtitle: Remedy for numerical instability
tags:
title: Regularization in Regression
math: true
---
There are many ways to improve the accuracy of an ML model. They include feature engineering, missing value imputation, improvements in data quality, etc.
One of the effective approaches is regularization. It is a popular technique that helps to keep the model coefficients under control when a computational task such as model training becomes numerically unstable.
In this article, we will take a closer look at why you might want to regularize your model. As an example, we will apply a basic regularization technique to a simple linear regression model and learn how it influences the model.
Linear regression is a supervised machine learning model, which can be expressed in a matrix form as follows:

$$\hat{y} = Xw$$

where $X$ is the feature matrix, $w$ is the vector of weights, and $\hat{y}$ is the vector of predictions.
After some transformations, described in the Training Linear Regression: Normal Equation{:target="_blank"} lecture of Machine Learning Zoomcamp{:target="_blank"} (and sketched briefly after the list below), the weight vector $w$ can be found as:

$$w = (X^T X)^{-1} X^T y$$
where
- $X^T$ is the transpose{:target="_blank"} of $X$,
- $X^T X$ is a Gram matrix{:target="_blank"},
- $(X^T X)^{-1}$ is the inverse{:target="_blank"} of the Gram matrix.
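The transformations mentioned above boil down to minimizing the squared error and setting its gradient to zero. Here is a brief sketch (standard matrix calculus, not copied from the lecture):

$$
\begin{aligned}
J(w) &= \lVert Xw - y \rVert^2 \\
\nabla_w J(w) &= 2X^T(Xw - y) = 0 \\
X^T X\, w &= X^T y \\
w &= (X^T X)^{-1} X^T y
\end{aligned}
$$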
A matrix inversion should be treated with caution. If a matrix contains a column that is a linear combination of its other columns, the matrix is singular, which means its inverse does not exist.
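For instance (a tiny illustration of my own, not taken from the post's notebook), asking NumPy to invert a matrix whose second column is exactly twice the first fails immediately:

```python
import numpy as np

# the second column is exactly 2x the first, so the matrix is singular
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

np.linalg.inv(A)  # raises numpy.linalg.LinAlgError: Singular matrix
```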
Linearly dependent columns are not a typical case in real-world problems; however, due to noise in the data, or the characteristics of your machine, OS, or NumPy version, there may be columns that are nearly identical in the above sense. When this happens, the weight vector $w$ blows up to huge, unstable values.
To overcome this numerical instability problem we can turn to regularization. Regularization in linear regression guarantees the existence of the inverse matrix we need to compute the weights.
One of the regularization techniques is adding a factor to the main diagonal of the matrix $X^T X$ before inverting it:

$$w = (X^T X + \alpha I)^{-1} X^T y$$
where
- $I$ is an Identity matrix{:target="_blank"} and
- $\alpha$ is a (typically small) factor.
This modification of the linear regression is commonly called Ridge Regression{:target="_blank"}.
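The whole training procedure then fits into a few lines of NumPy. Below is a minimal sketch under my own naming (the function `train_ridge` and its signature are not taken from the notebook); with `alpha=0.0` it reduces to plain linear regression:

```python
import numpy as np

def train_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 0.0) -> np.ndarray:
    """Solve the regularized normal equation w = (X^T X + alpha * I)^{-1} X^T y."""
    XTX = X.T.dot(X)                         # Gram matrix
    XTX = XTX + alpha * np.eye(X.shape[1])   # add alpha to the main diagonal
    return np.linalg.inv(XTX).dot(X.T).dot(y)
```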
Let’s demonstrate the effect of regularization through an example and see that the more regularization we add (the larger the factor $\alpha$), the smaller the values in the weight vector $w$ become.
We will build a Linear Regression model for predicting car prices based on a dataset from Kaggle - Car Prices Dataset{:target="_blank"}.
The full code is in the notebook here{:target="_blank"}.
For the sake of simplicity we won’t use any specific ML packages; instead, we will train a simple linear regression model in vector form:
```python
import numpy as np

# define feature matrix X of size 6x3 with nearly identical second and third columns
X = np.array([[4, 4, 4],
              [3, 5, 5],
              [5, 1, 1],
              [5, 4, 4],
              [7, 5, 5],
              [4, 5, 5.00000001]])

# define target vector y with 6 elements
y = np.array([1, 2, 3, 1, 2, 3])

# calculate the Gram matrix for X
XTX = X.T.dot(X)
XTX
```

```
array([[140.        , 111.        , 111.00000004],
       [111.        , 108.        , 108.00000005],
       [111.00000004, 108.00000005, 108.0000001 ]])
```

```python
# take the inverse of the Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
```

```
array([[ 3.86409478e-02, -1.26839821e+05,  1.26839770e+05],
       [-1.26839767e+05,  2.88638033e+14, -2.88638033e+14],
       [ 1.26839727e+05, -2.88638033e+14,  2.88638033e+14]])
```

```python
# calculate the weight vector w
w = XTX_inv.dot(X.T).dot(y)
w
```

```
array([-1.93908875e-01, -3.61854375e+06,  3.61854643e+06])
```
As you can see, the second and the third values of the weight vector $w$ are huge and have opposite signs, which is a clear symptom of the numerical instability caused by the nearly identical columns.
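One way to quantify how ill-conditioned the Gram matrix is (this check is my addition, it is not part of the original example) is its condition number; an extremely large value warns that the computed inverse cannot be trusted:

```python
# a huge condition number indicates a nearly singular matrix
print(np.linalg.cond(XTX))  # prints an extremely large value for this matrix
```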
Let’s introduce a regularization term and see how the vector $w$ changes:
```python
# add regularization factor 0.01 to the main diagonal of the Gram matrix
XTX = XTX + 0.01 * np.eye(3)

# take the inverse of the regularized Gram matrix
XTX_inv = np.linalg.inv(XTX)
XTX_inv
```

```
array([[ 3.85624712e-02, -1.98159300e-02, -1.98158861e-02],
       [-1.98159300e-02,  5.00124975e+01, -4.99875026e+01],
       [-1.98158861e-02, -4.99875026e+01,  5.00124974e+01]])
```

```python
# calculate the weight vector w
w = XTX_inv.dot(X.T).dot(y)
w
```

```
array([0.33643484, 0.04007035, 0.04007161])
```
The weights in the vector $w$ are now small and of comparable magnitude: the regularization term has removed the instability caused by the nearly identical columns.
The example of applying regularization in Linear Regression for car price prediction can be found in this notebook{:target="_blank"}.
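If you prefer an ML package, the same idea is available out of the box. For example (scikit-learn is not used in the post, so this is only an aside), scikit-learn's `Ridge` with `fit_intercept=False` solves the same regularized normal equation:

```python
from sklearn.linear_model import Ridge

# alpha plays the same role as the factor added to the diagonal above
model = Ridge(alpha=0.01, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # should be close to the manually computed w
```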
The main purpose of regularization techniques is to keep the weight vector $w$ under control so that the solution remains numerically stable.
Regularization makes it possible to find a solution when there are correlated columns, and in many cases it also reduces overfitting and improves model performance.