Confidence intervals (CIs) are commonly used in data science. So, using an intuitive example, let us learn the concept with confidence!
Imagine you are waiting for the bus at a bus stop. Usually, the bus arrives at 9.30 am. But the arrival time varies.
Another person arrives at the bus stop to catch the same bus and asks you, "Based on your experience, what percentage of the time has the bus arrived here between 9.25 am and 9.35 am?"
You think and answer, "90% of the time".
He asks again, "And what about between 9.20 am and 9.40 am?"
You answer, "95% of the time".
This is the main logic behind confidence intervals.
A confidence interval provides an estimated range of values for a given confidence level. Here, 90% and 95% are the confidence levels. The 95% level is the most common, while 90% and 99% are also used, though less often.
A) 95% Confidence Interval
The 95% Confidence Interval = [9.20 am - 9.40 am] = 9.30 am ± 10 minutes.
To make this easier to see, I have plotted it below. (Note that confidence intervals and prediction intervals are different. The following graphics are for illustration only.)
95% confidence interval - wider than 90% CI
B) 90% Confidence Interval
The 90% Confidence Interval = [9.25 am - 9.35 am] = 9.30 am ± 5 minutes.
90% confidence interval - narrower
- A higher confidence level produces a wider confidence interval, as shown above
- Of the three common levels, the 90% CI is the narrowest, the 95% CI is wider, and the 99% CI is the widest
- The higher the variability in the sample, the wider the confidence interval
- Keeping everything else constant, a larger sample leads to a narrower confidence interval
- Understanding how to calculate and interpret confidence intervals is an important skill for any data scientist
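The points above can be checked with a small sketch. It is only an illustration under stated assumptions: the arrival data is made up, the helper names `mean_confidence_interval` and `width` are hypothetical, and the interval uses a normal (z) approximation for the mean, whereas a t-based interval is usually preferred for small samples. Also note that this is a confidence interval for the average arrival time, which is not the same as the range in which a single bus arrives.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Return (low, high) for the sample mean using a normal (z) approximation."""
    n = len(sample)
    center = mean(sample)
    std_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. roughly 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * std_error, center + z * std_error


def width(interval):
    """Width of an interval given as (low, high)."""
    low, high = interval
    return high - low


# Hypothetical arrival offsets in minutes relative to 9.30 am (made-up data).
arrivals = [-4, 2, 0, 5, -3, 1, -1, 3, -2, 0, 4, -5]

# Higher confidence level -> wider interval.
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(arrivals, level)
    print(f"{level:.0%} CI: [{low:.2f}, {high:.2f}] min, width {high - low:.2f}")

# More variability (same data stretched by 2) -> wider interval.
spread_out = [2 * x for x in arrivals]
print("more variable:", round(width(mean_confidence_interval(spread_out)), 2))

# Larger sample with similar spread (data repeated, for illustration only)
# -> narrower interval.
larger = arrivals * 4
print("larger sample:", round(width(mean_confidence_interval(larger)), 2))
```

Running it prints three intervals whose widths grow with the confidence level, followed by a wider interval for the more variable data and a narrower one for the larger sample.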