Linear Regression
Linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables, respectively).
The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single scalar variable. If the explanatory variables are measured with error, errors-in-variables models (also known as measurement error models) are required.
Formulation
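Given a dataset $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ observations, a linear regression model assumes that the response $y_i$ is a linear function of the regressors $x_{i1}, \ldots, x_{ip}$ plus an error term $\varepsilon_i$:
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where $^{\mathsf{T}}$ denotes the transpose, so that $\mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta}$ is the inner product between the vectors $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^{\mathsf{T}}$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^{\mathsf{T}}$.
Often these $n$ equations are stacked together and written in matrix notation as
$$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{y}$ is the vector of the $n$ observed responses, $X$ is the $n \times (p+1)$ matrix whose $i$-th row is $\mathbf{x}_i^{\mathsf{T}}$, and $\boldsymbol{\varepsilon}$ is the vector of the $n$ error terms.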
Notation and terminology
Sometimes one of the regressors can be a non-linear function of another regressor or of the data values, as in polynomial regression and segmented regression. The model remains linear as long as it is linear in the parameter vector β.
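As a minimal sketch of this point, the NumPy snippet below fits a quadratic curve by treating the squared regressor as one more column of the design matrix. The synthetic data, the values in `true_beta`, and the choice of ordinary least squares (via `np.linalg.lstsq`) as the estimator are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends quadratically on x.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
true_beta = np.array([1.0, -2.0, 0.5])  # intercept, coefficient of x, coefficient of x^2
noise = rng.normal(scale=0.1, size=x.size)
y = true_beta[0] + true_beta[1] * x + true_beta[2] * x**2 + noise

# Design matrix with columns [1, x, x^2]. The x^2 column is a non-linear
# function of another regressor, yet the model stays linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimate of beta for the stacked system y = X beta + error.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print("estimated beta:", beta_hat)
```

With this small amount of noise, the estimated coefficients should land close to `true_beta`, even though the fitted curve is a parabola rather than a straight line.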
The values $x_{ij}$ may be viewed either as observed values of random variables $X_j$ or as fixed values chosen prior to observing the dependent variable. Both interpretations may be appropriate in different cases, and they generally lead to the same estimation procedures; however, different approaches to asymptotic analysis are used in these two situations.
Applications:
Linear regression is widely used in various fields for prediction, forecasting, and understanding the relationships between variables. Some common applications include:
Predicting house prices based on features such as size, number of bedrooms, and location.
Forecasting sales based on advertising spending and economic indicators.
Analyzing the impact of independent variables on a dependent variable in scientific research.
Advantages and Disadvantages:
Advantages:
Simple and easy to understand.
Provides interpretable coefficients for each independent variable.
Can be applied to both numerical and categorical independent variables, the latter after encoding them numerically (for example, as dummy variables).
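The interpretability of the coefficients and the handling of a categorical variable can be sketched together on a toy version of the house-price example. All names and numbers below (`size`, `area`, the pricing rule) are hypothetical, and the prices are generated without noise so that the fit can be checked against the known rule.

```python
import numpy as np

# Hypothetical toy data: house size (numeric) and neighbourhood (categorical).
size = np.array([50.0, 80.0, 120.0, 60.0, 100.0, 90.0])
area = np.array(["north", "south", "north", "south", "north", "south"])

# Noise-free prices from a known rule, so the fitted coefficients can be checked:
# price = 20 + 2 * size, plus 30 if the neighbourhood is "south".
price = 20.0 + 2.0 * size + 30.0 * (area == "south")

# Dummy coding of the categorical variable, with "north" as the reference level.
is_south = (area == "south").astype(float)
X = np.column_stack([np.ones_like(size), size, is_south])

# Least-squares fit; each coefficient has a direct reading.
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, per_unit_size, south_premium = beta_hat

print("baseline price:", intercept)
print("price change per unit of size:", per_unit_size)
print("price difference, south vs north:", south_premium)
```

Each coefficient reads as the expected change in the response for a one-unit change in its regressor with the others held fixed. For the dummy column, that change is the difference between the "south" level and the reference level.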
Disadvantages:
Assumes a linear relationship between variables, which may not always be the case.
Sensitive to outliers, which can distort the fitted coefficients, and to multicollinearity, which makes the coefficient estimates unstable.
May not capture complex patterns in the data, such as interactions or non-linear effects, unless suitably transformed regressors are added.
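The sensitivity to outliers and multicollinearity can be illustrated with a small NumPy sketch. The data, the helper `fit_line`, the size of the corrupted value, and the use of the condition number as a collinearity diagnostic are all assumptions chosen for illustration, with ordinary least squares as the estimator.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Exactly linear data: intercept 1, slope 2.
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
clean = fit_line(x, y)

# Outliers: corrupt a single observation and refit.
y_outlier = y.copy()
y_outlier[-1] += 50.0
shifted = fit_line(x, y_outlier)
print("slope without outlier:", clean[1])
print("slope with one outlier:", shifted[1])

# Multicollinearity: add a regressor that is almost a copy of x.
x_near_copy = x + 1e-6 * np.sin(x)
X_plain = np.column_stack([np.ones_like(x), x])
X_collinear = np.column_stack([np.ones_like(x), x, x_near_copy])
cond_plain = np.linalg.cond(X_plain)
cond_collinear = np.linalg.cond(X_collinear)
print("condition number, plain design:", cond_plain)
print("condition number, collinear design:", cond_collinear)
```

A single corrupted point noticeably changes the fitted slope. The near-duplicate column inflates the condition number of the design matrix by several orders of magnitude, which is one way to see why the corresponding coefficient estimates become unstable.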
Implementation:
In Python, linear regression can be implemented using libraries such as scikit-learn or statsmodels.
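A basic scikit-learn workflow might look like the sketch below. The data are synthetic, and the coefficient values, sample size, noise level, and train/test split are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): two regressors and a noisy linear response.
rng = np.random.default_rng(42)
n_samples = 200
X = rng.uniform(0.0, 10.0, size=(n_samples, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Hold out a quarter of the data to check predictions on unseen observations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model; an intercept is estimated by default.
model = LinearRegression()
model.fit(X_train, y_train)

# Inspect the fitted parameters and the held-out fit quality.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("R^2 on held-out data:", r2)
```

With real data, `X` and `y` would be replaced by the feature matrix and response of interest. statsmodels is a reasonable alternative when standard errors and hypothesis tests for the coefficients are needed.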