We all know some conditions must be fulfilled in order to perform a task. Don’t we?
For example, is it possible to play cricket without a ball? I know your answer is “NO”. Having a ball is a necessary condition for playing cricket. Put another way, when we say we are playing cricket, it is assumed that we have a ball.
The same is true of regression models: certain assumptions must hold before we can rely on the results of a linear regression.
Some of these are:
Linear relationship :
This assumption is obvious from the name itself. There must be a linear relationship between the dependent variable and the independent ones. Linearity means that the change in the dependent variable caused by a one-unit change in an independent variable X is constant, no matter what the value of X is. Didn’t get it?
Let’s take the example of cricket again… (You might have guessed by now that I love cricket.) Linearity would say, for instance, that the speed of a delivery (the dependent variable) increases by 1 kmph when the bowler’s height increases by 1 inch. It doesn’t matter how tall the bowler was to begin with: be it 5'1" or 6'2", one extra inch leads to the same 1 kmph increase in the speed of the delivery.
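To make the bowler example concrete, here is a minimal sketch. The intercept of 70 and the slope of 1 kmph per inch are made-up numbers chosen only for illustration, and `predicted_speed` and `one_inch_effect` are hypothetical helper names, not part of any library.

```python
def predicted_speed(height_in, intercept=70.0, slope=1.0):
    """Hypothetical linear model: delivery speed (kmph) from height (inches)."""
    return intercept + slope * height_in


def one_inch_effect(height_in):
    """Change in predicted speed when the bowler is one inch taller."""
    return predicted_speed(height_in + 1) - predicted_speed(height_in)


short_bowler = 5 * 12 + 1  # 5'1" expressed in inches
tall_bowler = 6 * 12 + 2   # 6'2" expressed in inches

# Under a linear model the effect of one extra inch is the slope,
# whatever the starting height.
print(one_inch_effect(short_bowler), one_inch_effect(tall_bowler))
```

Both calls return the slope, which is exactly what linearity promises. If the relationship were curved, the two numbers would differ.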
No correlation in the residual terms :
Ironic, isn’t it? We use a regression model to analyse the relationship between variables, yet we assume there is no correlation among the residual terms. In other words, the errors should show no autocorrelation. Autocorrelation is generally a problem in time series data, such as stock prices, where the value at one point in time depends on the previous one.
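One common way to check for autocorrelation in residuals is the Durbin-Watson statistic: the sum of squared differences between consecutive residuals, divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below computes it by hand. The two residual lists are invented for illustration and do not come from a real regression.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that drift slowly: each one resembles the previous one.
trending = [1.0, 0.8, 0.6, 0.4, 0.2, -0.2, -0.4, -0.6, -0.8, -1.0]

# Made-up residuals that flip sign at every step.
alternating = [1.0, -1.0] * 5

print(durbin_watson(trending))     # well below 2: positive autocorrelation
print(durbin_watson(alternating))  # well above 2: negative autocorrelation
```

In practice you would pass in the residuals of your fitted model, ordered by time.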
No multicollinearity :
It states that there shouldn’t be a high correlation among the independent variables, because that makes it difficult to work out how much each independent variable contributes to the response variable. For example, suppose we want to predict a player’s speed during a match and we use two independent variables: the player’s energy level measured by a machine before the match, and the number of energy bars the player ate before the match. The results wouldn’t be reliable, because the two variables are highly collinear.
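A quick first check for multicollinearity is the correlation between pairs of independent variables. With exactly two predictors, the variance inflation factor (VIF) of each is 1 / (1 - r²), where r is their correlation, and a VIF above about 10 is a widely used rule-of-thumb warning sign. The sketch below uses invented numbers for the energy example, and `pearson_r` is a hypothetical helper written out by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data for six players (illustrative only).
energy_bars = [1, 2, 3, 4, 5, 6]             # bars eaten before the match
machine_energy = [52, 61, 69, 82, 90, 101]   # machine reading before the match

r = pearson_r(energy_bars, machine_energy)
vif = 1 / (1 - r ** 2)  # valid in this form only for the two-predictor case

print(round(r, 3), vif > 10)
```

When two predictors move together this closely, dropping one of them or combining them into a single variable is usually the simplest remedy.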
Homoscedasticity :
It states that the variance of the error terms should be constant across all observations in the model. This assumption is often violated when the data contain outliers.
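A rough way to check for constant error variance is to sort the residuals by the value of the predictor, split them into a lower and an upper half, and compare the spread of the two halves. This is only an informal sketch, not a formal statistical test. `variance_ratio` is a hypothetical helper and both residual lists are invented for illustration.

```python
from statistics import pvariance


def variance_ratio(residuals_sorted_by_x):
    """Variance of the upper half of the residuals divided by the lower half."""
    half = len(residuals_sorted_by_x) // 2
    lower = residuals_sorted_by_x[:half]
    upper = residuals_sorted_by_x[-half:]
    return pvariance(upper) / pvariance(lower)


# Made-up residuals with roughly the same spread throughout.
steady = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]

# Made-up residuals whose spread grows as the predictor grows.
fanning = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.6]

print(variance_ratio(steady))   # close to 1: spread looks constant
print(variance_ratio(fanning))  # far above 1: spread is not constant
```

A ratio far from 1 hints that the variance is not constant. Plotting residuals against fitted values is a useful companion check.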
Can you solve this???
Written by Sarthak Goel (pre-final year student of B.Com (Hons) at Hindu College, with 4 actuarial papers passed from IFOA and IAI)