Thanks for your feedback! I am glad you are finding the article helpful.

2 min read · Nov 21, 2020

In real-world data sets that have not been 'curated' for educational or competition use, the residuals from a fitted OLS model are rarely normally distributed. If the residuals 'look' approximately normally distributed, and the other performance characteristics of the OLS model are acceptable, you may want to simply accept the residuals as they are.
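One way to go beyond eyeballing the residuals is a Jarque-Bera test, which checks whether their skewness and kurtosis are consistent with a normal distribution. The sketch below is only an illustration: the data are synthetic, and the helper names `ols_residuals` and `jarque_bera` are made up for this example.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit OLS (with an intercept) by least squares and return the residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def jarque_bera(resid):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = resid.size
    centered = resid - resid.mean()
    sigma2 = np.mean(centered**2)
    skew = np.mean(centered**3) / sigma2**1.5
    kurt = np.mean(centered**4) / sigma2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function has the closed form exp(-jb / 2).
    p_value = np.exp(-jb / 2.0)
    return jb, p_value

# Synthetic data (an assumption for illustration): a linear signal plus normal noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=500)

jb, p = jarque_bera(ols_residuals(x, y))
print(f"Jarque-Bera = {jb:.2f}, p-value = {p:.3f}")
```

A small p-value is evidence against normality. With large samples the test tends to flag even mild departures, so it is worth reading alongside a histogram or Q-Q plot of the residuals.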

Trust me, your semi-perfect model will still be able to produce results that are useful to your stakeholders. Just be sure to report the confidence interval for every single prediction you make.
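Reporting an interval alongside each prediction can be sketched as follows. This computes an interval for a new observation (often called a prediction interval) from the usual OLS formula. As a simplifying assumption it uses a normal quantile, which is reasonable for large samples; with small samples a Student's t quantile is normally used instead. The function name `ols_prediction_interval` and the data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def ols_prediction_interval(X, y, X_new, alpha=0.05):
    """Return point predictions and an approximate (1 - alpha) interval for each."""
    X1 = np.column_stack([np.ones(len(y)), X])          # training design matrix
    X1_new = np.column_stack([np.ones(len(X_new)), X_new])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n, p = X1.shape
    s2 = resid @ resid / (n - p)                        # residual variance estimate
    xtx_inv = np.linalg.inv(X1.T @ X1)
    # x0' (X'X)^-1 x0 for every new row x0
    leverage = np.einsum("ij,jk,ik->i", X1_new, xtx_inv, X1_new)
    se_pred = np.sqrt(s2 * (1.0 + leverage))            # std. error of a new observation
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)         # large-sample approximation
    y_hat = X1_new @ beta
    return y_hat, y_hat - z * se_pred, y_hat + z * se_pred

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=200)

x_new = np.array([2.0, 5.0, 8.0])
y_hat, lower, upper = ols_prediction_interval(x, y, x_new)
for xi, yh, lo, hi in zip(x_new, y_hat, lower, upper):
    print(f"x = {xi:.1f}: prediction = {yh:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
```

These intervals lean on the normality assumption, so when the residuals are clearly non-normal they should be treated as rough guides.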

When the residuals from the OLS model (or for that matter, from pretty much any regression model) are not normally distributed, it can mean that there is some variance (some 'signal') in the data that the model has not been able to adequately 'explain'. This residual signal has leaked into the model's errors.

This situation can be addressed by doing one or more of 3 things:

1. Perform one or more suitable data transformations, such as an inflation adjustment, a seasonal adjustment, a simple de-trending using the first difference, or a sqrt or log transformation of the data, before you fit the OLS model. An appropriate transformation will have the effect of 'explaining away' some of the variance in the data.

2. Check if there are any regression variables that might be missing from the model and whose absence is causing some of the variance to leak through into the residuals.

3. Experiment with a different model.
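The first two remedies above can be illustrated on synthetic data. The sketch below compares a Jarque-Bera statistic (larger means further from normal) before and after a log transformation, and before and after adding an omitted regression variable. All of the data, coefficients, and helper names are assumptions made up for this example.

```python
import numpy as np

def fit_residuals(X, y):
    """Fit OLS (with an intercept) by least squares and return the residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def jarque_bera_stat(resid):
    """Jarque-Bera statistic: close to 0 when skewness is ~0 and kurtosis is ~3."""
    centered = resid - resid.mean()
    sigma2 = np.mean(centered**2)
    skew = np.mean(centered**3) / sigma2**1.5
    kurt = np.mean(centered**4) / sigma2**2
    return resid.size / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(42)
n = 500
x1 = rng.uniform(0, 10, n)

# Remedy 1: a log transformation when the errors are multiplicative.
y_mult = np.exp(1.0 + 0.3 * x1 + rng.normal(0, 0.5, n))
jb_raw = jarque_bera_stat(fit_residuals(x1, y_mult))
jb_log = jarque_bera_stat(fit_residuals(x1, np.log(y_mult)))
print(f"JB on raw y: {jb_raw:.1f}   JB on log(y): {jb_log:.1f}")

# Remedy 2: adding a variable that was missing from the model.
x2 = rng.exponential(1.0, n)                 # skewed variable, initially left out
y_lin = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)
jb_missing = jarque_bera_stat(fit_residuals(x1, y_lin))
jb_full = jarque_bera_stat(fit_residuals(np.column_stack([x1, x2]), y_lin))
print(f"JB without x2: {jb_missing:.1f}   JB with x2: {jb_full:.1f}")
```

In both cases the unexplained structure ends up in the residuals of the weaker model, and the statistic should drop once the transformation or the extra variable accounts for it.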

I hope I was able to answer your question.

Best,

Written by Sachin Date
