




Basic Assumptions In A Linear Regression Model



When we build a Supervised Machine Learning Regression Model, there are certain Basic Assumptions that come as part of the Model Building. Many beginners don't get a clear picture of what those are. But it is a Best Practice to know and understand that no model is perfect - everything works within certain limits and boundaries. This keeps us in control when deciding where and when a specific model should (or should not) be used. The pointers below cover the Basic Assumptions used in a Linear Regression Model -

  • The most Basic Assumption in a Linear Regression model comes from the term "Linear" itself. It is assumed that there is a linear and additive relationship between the Target/Dependent/Response/Output variable and the Feature/Independent/Input variable(s) (a minimal fitting sketch follows this list).
  • There should be no Multi-collinearity. What we mean by this is that the Feature variables used in the model should not be correlated with each other. Normally the Variance Inflation Factor (VIF) is the measure used to check Multi-collinearity (see the VIF sketch after this list). If the variables are correlated, it becomes extremely difficult for the model to determine the true effect of each Input variable on the Output variable.
  • Residual Error is the difference between the Actual and Predicted values. There should be no correlation between the Residual Errors - the error for one observation should tell us nothing about the error for the next (see the Durbin-Watson sketch after this list).
  • The Residual Errors should have a property called Homoskedasticity, which means the Residual Errors should have a constant variance. When you plot the Residual Errors against the Predicted Values, it should give a relatively shapeless graph (without any clear patterns), generally symmetrical around the zero line and without particularly large residuals. Heteroskedasticity is the opposite of that (the Residual Error vs Predicted Value graph shows a clear pattern, e.g. a funnel shape), which is not good. We don't want Heteroskedasticity (see the residual-plot sketch after this list).
  • The Residual Error terms must be normally distributed. We normally use a Q-Q plot to check Normality. If the plot is a straight line, it is a happy case. If the plot is a curved or distorted line, then the error terms are not Normally Distributed (see the Q-Q plot sketch after this list).
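
To make the "linear and additive" assumption concrete, here is a minimal sketch that fits an Ordinary Least Squares model with the statsmodels library. The data is synthetic and all variable names are hypothetical - the point is only to show the shape of the workflow:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical synthetic data: the Output variable y depends
# linearly (and additively) on the Input variable x, plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 200)

X = sm.add_constant(x)       # add the intercept column
model = sm.OLS(y, X).fit()   # fit the Linear Regression model
print(model.summary())       # coefficients, R-squared, diagnostics
```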
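
For the Multi-collinearity check, a minimal sketch using statsmodels' variance_inflation_factor. The DataFrame and column names are made up, and x3 is deliberately constructed to be correlated with x1 so the check has something to flag:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200),
                   "x2": rng.normal(size=200)})
df["x3"] = 0.9 * df["x1"] + rng.normal(scale=0.1, size=200)  # correlated on purpose

X = add_constant(df)  # VIF is computed with the intercept included
for i, col in enumerate(X.columns):
    if col != "const":
        # Common rule of thumb: VIF above 5 (or 10) signals Multi-collinearity
        print(col, round(variance_inflation_factor(X.values, i), 2))
```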
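
To check that the Residual Errors are not correlated with each other, the Durbin-Watson statistic is one common test (the original text names no specific test, so this is an illustrative choice). The setup repeats the hypothetical data from the first sketch so this block runs on its own:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Same hypothetical setup as the first sketch
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 200)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Durbin-Watson: ~2 means no correlation between consecutive residuals;
# values toward 0 suggest positive, toward 4 negative autocorrelation.
print(durbin_watson(model.resid))
```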
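
For Homoskedasticity, a minimal sketch that draws the Residual vs Predicted plot described above and adds the Breusch-Pagan test - a standard heteroskedasticity test included here as an illustrative extra, since the original text only describes the visual check:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan

# Same hypothetical setup as the first sketch
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 200)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Visual check: the cloud should be shapeless and symmetric around zero
plt.scatter(model.fittedvalues, model.resid, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residual errors")
plt.show()

# Breusch-Pagan: a small p-value suggests Heteroskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)
```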
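
Finally, the Normality check via a Q-Q plot, again as a self-contained sketch on the same hypothetical data:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Same hypothetical setup as the first sketch
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 200)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Points hugging the 45-degree line => error terms look Normally Distributed
sm.qqplot(model.resid, line="45", fit=True)
plt.show()
```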
  Additional Read -