
A Look at Linear Regression

  • Writer: Andrew Cole
  • Mar 29, 2020
  • 4 min read

A Quick Breakdown of Linear Regression Types

Linear regression is perhaps the most basic type of machine learning model, but also one of the most essential. Even though it is considered a “basic” model, there are still several layers of mathematical and statistical operations occurring in the various types of linear models, and having a sense of what each type of regression does is essential to building a strong model.

A linear regression is a statistical method for estimating the strength and direction of the relationship between two or more variables. By establishing a model that accurately captures the relationships between one or more feature variables (independent variables) and a target variable (dependent variable), we can in turn use the model to predict unknown values of the dependent variable in the future. We call it a “linear” regression because we assume that the relationship between the dependent and independent variables can be captured with a straight line.

  1. Simple Linear Regression: a single independent variable is used to model its relationship with the target variable. The best straight line through those data points is your “best-fit line”.

  2. Multiple Linear Regression: uses multiple independent variables to predict the value of the target variable.

Types of Linear Regression:

Linear

A standard linear regression does exactly what was described above. Simple linear regressions are great when you have a smaller amount of data because there is not much computational complexity, as we can see in the single-variable (“singular”) equation below:
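
In standard notation, the simple (“singular”), multiple, and polynomial regression equations take the following form (a generic reconstruction, since the original figure is not shown here), with β0 as the bias term and β1…βn as the weights:

Simple (“singular”) linear regression: y = β0 + β1·x

Multiple linear regression: y = β0 + β1·x1 + β2·x2 + … + βn·xn

Polynomial regression: y = β0 + β1·x^1 + β2·x^2 + … + βd·x^d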


The weight that each of these variables carries in the regression, along with the bias term, is determined via the underlying gradient descent optimization, which arrives at a best-fit line for the variables in the regression.

As soon as we add multiple independent variables, we move to the second equation above: the multiple linear regression.
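
As a minimal sketch of both cases (using scikit-learn and synthetic data, neither of which appears in the original post), fitting a simple and a multiple linear regression might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simple linear regression: one independent variable
x = rng.uniform(0, 10, size=(100, 1))
y_simple = 3.0 * x[:, 0] + 2.0 + rng.normal(scale=1.0, size=100)

simple = LinearRegression().fit(x, y_simple)
print(simple.intercept_, simple.coef_)   # bias term and single weight

# Multiple linear regression: several independent variables
X = rng.uniform(0, 10, size=(100, 3))
y_multi = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=1.0, size=100)

multi = LinearRegression().fit(X, y_multi)
print(multi.intercept_, multi.coef_)     # one weight per independent variable
```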

Polynomial

A polynomial regression uses a similar procedure to the linear regression, except that now the feature data does not have to be linear. Where the linear regression could only fit a straight line, a polynomial regression allows us to weight certain variables more heavily than others (based on their importance in determining the dependent variable), resulting in a non-linear best-fit curve. We change the weight and bias of each variable by changing the exponent of that variable in the regression equation.

With polynomial regressions, it is absolutely essential that we have domain knowledge. This is paramount in most data science applications, but especially here. Understanding the scale of your data and what each feature holds will help you determine the exponents for your variables (see the polynomial regression equation above). If we do not have domain knowledge and give an exponent to a variable that does not actually deserve it, we can end up with a severely overfit regression.

Polynomial regressions give us full control over our features and the ability to model non-linear relationships in the data.
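
One common way to set this up in practice (sketched here with scikit-learn, which is an illustrative choice since the post does not name a library) is to expand each feature to the chosen exponents and then fit an ordinary linear regression on the expanded terms:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# One feature with a clearly non-linear (quadratic) relationship to the target
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + 2.0 + rng.normal(scale=0.3, size=200)

# degree controls the exponents added for each feature; domain knowledge
# should guide this choice to avoid overfitting
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

print(poly_model.named_steps["linearregression"].coef_)
```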

Ridge (L2-Regularization)

A ridge regression is a method of regularization: the process of adding information to prevent overfitting or limit collinearity in a model. A ridge regression alters the cost function underlying our regression to add a penalty equivalent to the squared magnitude of the coefficients. In other words, we constrain the coefficients of the equation with a penalty term (lambda) so that the coefficients shrink in the presence of multicollinearity.

If we do not address collinearity, the model will be extremely rigid, meaning that the weights of each variable will be very strong and there will not be enough flexibility for the line of best fit to settle between them. The result is a regression model with very high variance.

A ridge regression adds a small squared bias factor to the variables so that each feature coefficient is pulled away from this rigidity. It is important to note that these coefficients do not shrink all the way to zero, although values near zero are possible.

Ridge regressions often come into play when we are working with many categorical variables. A good way to check for collinearity is to inspect the correlation matrix of the features (for example, with a heatmap or pairwise joint plots).
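
A minimal sketch with scikit-learn's Ridge (illustrative only; note that scikit-learn exposes the lambda penalty described above as the alpha parameter), including a quick correlation check on two deliberately collinear features:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Two highly collinear features plus one independent feature
x1 = rng.uniform(0, 10, size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)     # nearly a copy of x1
x3 = rng.uniform(0, 10, size=200)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 0.5 * x3 + rng.normal(scale=1.0, size=200)

# Quick collinearity check: correlation matrix of the features
print(np.corrcoef(X, rowvar=False))

# alpha plays the role of the lambda penalty described above
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # coefficients are shrunk, but not exactly zero
```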

Lasso (L1-Regularization)

A lasso regression is very similar to the ridge regression in that it addresses collinearity between our feature variables. The lasso regression also adds bias to the optimization function of the regression, but it uses an absolute-value penalty instead of a squared one. By doing so, we gain the ability to actually perform feature selection (not possible with ridge).

In ridge, the coefficient values cannot reach exactly zero, but in lasso they can. This means that if a coefficient equals zero, we can remove that feature from the equation to reduce complexity! If we keep it in the equation, we are just giving the regression unnecessary information. Lasso feature selection is a great method when your dataset contains hundreds of features.
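
A short illustrative sketch with scikit-learn's Lasso and synthetic data, showing coefficients being driven exactly to zero and then used for feature selection:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)

# Ten features, but only the first two actually drive the target
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)

# Features whose coefficients were driven exactly to zero can be dropped
selected = np.flatnonzero(lasso.coef_ != 0)
print("keep features:", selected)
```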

ElasticNet Regression

This is your best-of-both-worlds regression method. When performing an ElasticNet regression, we allow some of the rigidity of L2-regularization to be inherited while also allowing coefficients to approach (or reach) zero, which enables the all-important feature selection.
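
A brief sketch with scikit-learn's ElasticNet (again illustrative; alpha sets the overall penalty strength, and l1_ratio blends the L1 and L2 penalties):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)

X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# l1_ratio blends the two penalties: 1.0 is pure lasso, 0.0 is pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)   # some coefficients shrink toward zero, some hit zero exactly
```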

 
 
 
