Great questions Fomins, and apologies for the delay in replying! Let me see how best to address each of your queries:
I have used statsmodels to compare the lagged model with the mean model a.k.a. the intercept only model, a.k.a. the null model. That’s because the F-statistic reported by statsmodels is always implicitly w.r.t. the mean model. So in this case, k1=degrees of freedom of the mean model=1 and k2=degrees of freedom of the lagged model=2.
To your second question, you would set k1=2 and k2=3. Then you would use the formula I have mentioned in the article, namely,
F(k2-k1, n-k2) = [(RSS1-RSS2)/(RSS2)]*[(n-k2)/(k2-k1)]
to calculate the F-statistic. Then you would use the F-table to look up the p-value for F(k2-k1, n-k2) at, say, the 95% confidence level. If p ≥ 0.05, you *accept* H0 (strictly speaking, you fail to reject it), namely that model2 really doesn’t explain the variance in y any better than model1.
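If you would rather not use an F-table, here is a minimal sketch of the same calculation in Python. The function name nested_f_test and the RSS values are made up purely for illustration, and I am assuming SciPy is available for the F-distribution's survival function:

```python
from scipy import stats


def nested_f_test(rss1, rss2, k1, k2, n):
    """F-test for two nested OLS models.

    rss1, k1: residual sum of squares and parameter count of the restricted model1
    rss2, k2: the same quantities for the unrestricted model2
    n: number of observations
    """
    df_num = k2 - k1
    df_den = n - k2
    # F(k2-k1, n-k2) = [(RSS1-RSS2)/RSS2] * [(n-k2)/(k2-k1)]
    f_stat = ((rss1 - rss2) / rss2) * (df_den / df_num)
    # p-value = probability of seeing an F at least this large under H0
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value


# Illustrative (invented) numbers: k1=2, k2=3, n=53
f_stat, p_value = nested_f_test(rss1=120.0, rss2=100.0, k1=2, k2=3, n=53)
print(f_stat, p_value)
```

With these invented numbers the p-value comes out below 0.05, so you would reject H0 in favor of model2. In practice you would plug in the RSS values from your own two fitted models.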
Regarding your third question, the following insight might help a bit:
The formula for the F-statistic is actually a generalization of the formula for F when model1 is the mean model. When model1 is the null model, RSS1, which is the residual sum of squares of the mean model, is what’s known as the total sum of squares (TSS). RSS2 is the residual sum of squares (RSS) of your unrestricted model with ‘p’ variables, i.e. model2. Here k1=1 and k2=p+1. The F-statistic in this special case becomes:
F(p, n-p-1) = [(TSS-RSS)/RSS]*[(n-p-1)/p]
In OLSR models, it can be shown that:
TSS = ESS + RSS
Where ESS is the explained sum of squares i.e. Σ(y_pred_i - y_mean)² over all i. Hence we have:
F(p, n-p-1) = [ESS/RSS]*[(n-p-1)/p]
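To convince yourself of the TSS = ESS + RSS identity and of the equivalence of the two forms of F, a rough sketch along these lines may help. The data is synthetic, the coefficients are arbitrary, and I am fitting the OLSR model with NumPy's least-squares solver rather than statsmodels just to keep it self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2

# Synthetic data: y depends linearly on p regressors plus normal noise
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

# Fit OLS with an intercept column
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_pred = X_design @ beta

tss = np.sum((y - y.mean()) ** 2)       # RSS of the mean model
ess = np.sum((y_pred - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_pred) ** 2)         # RSS of the fitted model

f_from_tss = ((tss - rss) / rss) * ((n - p - 1) / p)
f_from_ess = (ess / rss) * ((n - p - 1) / p)

print(tss, ess + rss)          # the two should agree up to rounding
print(f_from_tss, f_from_ess)  # and so should these
```

Note that the identity relies on the model having an intercept. Without one, TSS and ESS + RSS will generally differ.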
Now both ESS and RSS are sums of squares of de-meaned quantities, i.e. quantities centered at zero. Why? Because ESS is Σ(y_pred_i - y_mean)², and the terms (y_pred_i - y_mean) average to zero, since in an OLSR model with an intercept the mean of the fitted values equals y_mean. And the residual errors of the fitted OLSR model, whose squares make up RSS, are always centered at zero by a property of the OLS estimation algorithm (again, when the model includes an intercept). So, assuming normally distributed errors, both the numerator and the denominator are sums of squares of zero-centered normal variables, and that is all that is needed for those sums to be (variance-scaled) Chi-squared distributed. You may want to try this fact out as a little experiment. You can set the variance of the random variables to something other than 1. As long as they are N(0, σ²), the sum of their squares, multiplied by a scaling factor of 1/σ², is Chi-squared distributed!
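Here is one way the little experiment could look, as a sketch only. The values of sigma, k and the number of draws are arbitrary choices of mine. It checks the scaled sum of squares against the known mean (k) and variance (2k) of a Chi-squared distribution with k degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 3.0        # standard deviation deliberately different from 1
k = 5              # number of N(0, sigma^2) variables being squared and summed
n_draws = 200_000  # number of simulated sums

# Each row holds k independent draws from N(0, sigma^2)
z = rng.normal(loc=0.0, scale=sigma, size=(n_draws, k))
sum_sq = np.sum(z ** 2, axis=1)

# Apply the 1/sigma^2 scaling factor
scaled = sum_sq / sigma ** 2

# A Chi-squared(k) variable has mean k and variance 2k
print("mean of scaled sums:    ", scaled.mean(), "expected about", k)
print("variance of scaled sums:", scaled.var(), "expected about", 2 * k)
```

Without the scaling, the mean of the sums is close to k·σ² instead of k, which is the "variance scaled" part of the statement.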
I hope that answers your questions.
Best