Image by Clker-Free-Vector-Images from Pixabay (Pixabay License)

Integer ARIMA models are used for modeling time series data consisting of whole-numbered counts.

Such data sets pose a few unique challenges:

The data is auto-correlated: Time series data is often auto-correlated, i.e. correlated with its own past values. Any model we build for such data needs to account for these serial correlations…


Hands-on Tutorials

In this article, we’ll cover the following topics:

  • What is efficiency?
  • What is a statistical estimator, and how is its efficiency defined?
  • How to calculate the efficiency of an estimator?
  • How to use efficiency to build better regression models?

Let’s dive in!

What is Efficiency?

In layperson terms:

Efficiency is a measure of…


Thoughts and Theory

Fisher information provides a way to measure the amount of information that a random variable contains about some parameter θ (such as the true mean) of the random variable’s assumed probability distribution.

We’ll start with the raw definition and the formula for Fisher Information.

Definition and formula of Fisher Information

Given a random variable y that…


Absolute difference between sample and population standard deviation plotted against sample size (Image by Author)

A consistent estimator is one that produces a better and better estimate of the quantity it is estimating as the size of the data sample it works on increases. …
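As a rough illustration of consistency, the sketch below draws samples of increasing size from a normal distribution and reports the absolute difference between the sample standard deviation and the population standard deviation. The function name `abs_std_error`, the population standard deviation of 2.0, and the sample sizes are all assumptions chosen for illustration; they are not taken from the article.

```python
import random
import statistics


def abs_std_error(sample_size, population_std=2.0, seed=0):
    """Absolute gap between the sample and population standard deviation.

    The population is assumed (for illustration only) to be normal with
    mean 0 and the given standard deviation.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, population_std) for _ in range(sample_size)]
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return abs(statistics.stdev(sample) - population_std)


# The gap is random for any single sample, but it tends to shrink as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(abs_std_error(n), 4))
```

Any single run can show a larger gap at a bigger sample size by chance; the downward trend is what consistency describes, and it becomes clearer when the gap is averaged over many seeds.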


Sea Surface Temperatures of the North Atlantic. Image source: NOAA OSPO under Terms of Use

A statistical estimator can be evaluated on the basis of how biased its predictions are, how consistent its performance is, and how efficiently it can make predictions. And the quality of your model’s predictions is only as good as the quality of the estimator it uses.

In this…


Midland beach (Source: NY Public Library digital collections — free for use without restrictions)

Suppose you know the mean value of a sample and you want to use the sample mean to estimate the interval that the population’s mean will lie in. The Interval Estimation technique can be used to arrive at this estimate at some specified confidence level. …
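A minimal sketch of one common form of interval estimation follows, assuming a normal approximation (a z-interval) for the sample mean. The function name `mean_confidence_interval` and the sample values are hypothetical, and the article may well use a different interval construction (for example, one based on the t-distribution).

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes the sample mean is approximately normally distributed,
    which is reasonable for large samples.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, since a larger critical value is needed to cover the population mean more often.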


Image by Alexandra_Koch from Pixabay (Pixabay License)

Nonlinear Least Squares (NLS) is an optimization technique that can be used to build regression models for data sets that contain nonlinear features. Models for such data sets are nonlinear in their coefficients.

Structure of this article:

PART 1: The concepts and theory underlying the NLS regression model. This section…


Image by Clker-Free-Vector-Images from Pixabay (Pixabay License)

Poisson and Poisson-like regression models are often used for count-based data sets, i.e. data that contain whole-numbered counts. For example, the number of people walking into the emergency room of a hospital every hour is one such data set.

Ordinary Least Squares Regression based linear models or non-linear…


Vaccine Efficacy (VE) is a measure of how much the vaccine was able to reduce the incidence of the disease in the vaccinated group of people as compared to the non-vaccinated group.
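As a sketch of the point estimate, assuming the commonly used definition of VE as one minus the ratio of the attack rate in the vaccinated group to the attack rate in the non-vaccinated group, the calculation looks like this. The function name `vaccine_efficacy` and the trial counts are made up for illustration and do not come from any real study.

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated,
                     cases_unvaccinated, n_unvaccinated):
    """Point estimate of VE as 1 minus the ratio of attack rates."""
    attack_rate_vaccinated = cases_vaccinated / n_vaccinated
    attack_rate_unvaccinated = cases_unvaccinated / n_unvaccinated
    return 1.0 - attack_rate_vaccinated / attack_rate_unvaccinated


# Hypothetical counts: 10 cases among 10,000 vaccinated participants
# and 100 cases among 10,000 non-vaccinated participants.
ve = vaccine_efficacy(10, 10_000, 100, 10_000)
print(f"Vaccine efficacy: {ve:.0%}")
```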

When reported by itself, Vaccine Efficacy is of limited clinical use as it’s only a point estimate of…


Observed and expected frequencies of NUMBIDS (Image by Author)

The Chi-Squared test (pronounced as Kai-squared as in Kaizen or Kaiser) is one of the most versatile tests of statistical significance.

Here are some of the uses of the Chi-Squared test:

  1. Goodness of fit to a distribution: The Chi-squared test can be used to determine whether your data obeys a…
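The goodness-of-fit use can be sketched as follows: the test statistic is the sum of squared differences between observed and expected frequencies, each divided by the expected frequency. The function name `chi_squared_statistic` and the die-roll counts are hypothetical, and the degrees of freedom shown assume that no distribution parameters were estimated from the data.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic for observed vs. expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die, tested against a fair die
# (20 expected rolls per face).
observed = [18, 22, 16, 25, 19, 20]
expected = [20] * 6

statistic = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # no parameters estimated from the data

print(f"chi-squared = {statistic:.2f}, df = {degrees_of_freedom}")
```

The statistic is then compared with a critical value of the Chi-squared distribution for the given degrees of freedom. For 5 degrees of freedom at the 5% significance level that value is about 11.07, so a statistic well below it gives no grounds for rejecting the hypothesized distribution.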

Sachin Date

In-depth explanations of regression and time series models. Get the intuition behind the equations.
