In this blog post, I will outline the foundational statistical concepts that every Data Scientist should know. Unlike the constantly evolving versions of the software we use for data analysis, these concepts are fundamental to Data Science and do not change. If you learn them thoroughly, your skills will stay relevant whichever tools you end up using.
- Scales of measurement (nominal, ordinal, interval and ratio) - the scale a variable is measured on determines which analyses are appropriate for it
- Degrees of Freedom - the number of values that are free to vary when a statistic is calculated
- Z-score - how many standard deviations a data point is from the mean
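- Central Limit Theorem - the distribution of sample means approaches a normal distribution as the sample size grows, provided the population has finite variance
- Standard Deviation vs Standard Error - the standard deviation describes the spread of individual observations, while the standard error describes the precision of an estimate such as the sample mean; the two are often confused
- Confidence Interval - a range of plausible values for a population parameter at a stated confidence level, useful for interpreting estimates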
- Confusion matrix - a table of predicted versus actual classes, used to measure the accuracy of a classification model
- Occam's Razor, Bias-Variance Tradeoff, No Free Lunch Theorem and the Curse of Dimensionality - to understand the limitations of machine learning
- Train-Test split and Cross-validation - for building a model that neither underfits nor overfits the dataset
- Components of Time Series (TCSI: Trend, Cyclical, Seasonal and Irregular) - the fundamental concept for time series analysis
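
To make a few of the items in the list concrete, here is a minimal Python sketch of the z-score, the standard deviation versus the standard error, and an approximate confidence interval. The sample values and the names `sample` and `z_score` are made up for illustration. The interval uses the normal critical value 1.96, which is only an approximation for a sample this small; a t-distribution critical value would be the more careful choice.

```python
import math
import statistics

# Hypothetical sample, invented purely for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = statistics.mean(sample)

# Sample standard deviation: divides by n - 1 (the degrees of freedom),
# because one value is no longer free to vary once the mean is fixed.
sd = statistics.stdev(sample)

# Standard error of the mean: how precisely the sample mean estimates
# the population mean. It shrinks as n grows; the SD does not.
se = sd / math.sqrt(n)


def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from mu."""
    return (x - mu) / sigma


z_of_max = z_score(max(sample), mean, sd)

# Approximate 95% confidence interval for the mean (assumes the normal
# critical value 1.96 is acceptable for this sketch).
ci_low = mean - 1.96 * se
ci_high = mean + 1.96 * se

print(f"mean={mean:.2f}, sd={sd:.2f}, se={se:.2f}")
print(f"z-score of the largest value: {z_of_max:.2f}")
print(f"approx. 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

Note that the standard error is always smaller than the standard deviation for n > 1, which is one quick way to check which of the two a report is quoting.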