When we build a supervised machine learning regression model, there are certain basic assumptions behind a linear regression model. These come as part of the model building itself. Many beginners don't get a clear picture of what they are. But it is best practice to know and understand that no model is perfect - everything works within certain limits and boundaries. Knowing those limits keeps us in control of the practical question of where and when a specific model should (or should not) be used. The pointers below cover the basic assumptions used in a linear regression model -

- The most basic assumption in a linear regression model comes from the term "Linear" itself. It is assumed that there is a linear and additive relationship between the Target/Dependent/Response/Output variable and the Features/Independent/Input variable(s).
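The "linear and additive" assumption above can be sketched with a few lines of NumPy. This is a minimal illustration on synthetic data (the coefficients 3.0, 2.0 and -1.5 are made up for the example): when the target really is a linear, additive function of the features, ordinary least squares recovers the coefficients exactly.

```python
import numpy as np

# Synthetic data: y is an exactly linear, additive function of x1 and x2,
# i.e. y = b0 + b1*x1 + b2*x2 (no interactions, no curvature).
rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 3.0 + 2.0 * x1 - 1.5 * x2

# Ordinary least squares fit (intercept column + the two features).
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# Because the relationship truly is linear and additive,
# coef comes back as (3.0, 2.0, -1.5).
```

If the true relationship were curved or involved interactions between features, this straight-line fit would systematically miss it, which is exactly what this assumption warns about.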
- There should be no multicollinearity. What we mean by this is that the feature variables used in the model should not be correlated with each other. Normally the **Variance Inflation Factor (VIF)** is the measure used to check for multicollinearity. If the features are correlated, it becomes extremely difficult for the model to determine the true effect of each input variable on the output variable.
- The residual error is the difference between the actual and predicted values. There should be no correlation between the residual errors.
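The two checks above can be computed directly. Below is a minimal NumPy sketch (the function names are mine, not a standard API): a hand-rolled VIF, defined as 1 / (1 - R²) from regressing each feature on the others, and the classic Durbin-Watson statistic for first-order correlation between residual errors. Libraries such as statsmodels provide ready-made versions of both.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X.
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (plus an intercept).
    Values well above ~5-10 flag multicollinearity."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return out

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 means no first-order correlation
    between consecutive residuals; values toward 0 or 4 flag
    positive or negative autocorrelation."""
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

For example, adding a feature that is (almost) a linear combination of two others sends its VIF into the thousands, while two independent features sit near VIF = 1.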
- The residual errors should have a property called **homoskedasticity**, which means they should have a constant variance. When you plot the residual errors against the predicted values, the graph should be relatively shapeless (without any clear pattern) and generally symmetrical around the zero line, without particularly large residuals. **Heteroskedasticity** is the opposite of that (the residual-vs-predicted plot shows a clear pattern, e.g. a funnel shape), which is not good. We don't want heteroskedasticity.
- The residual error terms must be normally distributed. We normally use a **Q-Q plot** to check normality. If the plot is a straight line, it is the happy case. If the plot is a curved or distorted line, the error terms are not normally distributed.
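Both residual checks above are usually done visually, but they can also be sketched numerically. The helpers below are my own illustrative stand-ins (not a standard API), using only NumPy and the standard library: one scores the "funnel shape" as the correlation between absolute residuals and predicted values, and the other measures how straight the Q-Q plot is as the correlation between sorted residuals and theoretical normal quantiles.

```python
import numpy as np
from statistics import NormalDist

def heteroskedasticity_score(residuals, fitted):
    """Crude funnel-shape check: correlation between |residual| and
    the predicted value. Near 0 -> roughly constant variance
    (homoskedastic); clearly positive -> variance grows with the
    prediction (heteroskedastic)."""
    return np.corrcoef(np.abs(residuals), fitted)[0, 1]

def qq_straightness(residuals):
    """Numeric analogue of 'how straight is the Q-Q plot': correlation
    between sorted standardized residuals and theoretical normal
    quantiles. Values very close to 1.0 suggest the residuals are
    approximately normally distributed."""
    e = np.sort((residuals - residuals.mean()) / residuals.std())
    n = len(e)
    q = np.array([NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)])
    return np.corrcoef(e, q)[0, 1]
```

In practice you would still draw the residual-vs-predicted scatter and the Q-Q plot (e.g. with matplotlib), but these scores make the same diagnostics easy to automate.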
