Data Science Simplified: November 2018

Random Forest: A Beginner's Guide with Visual Illustrations & Examples

In this post, let us explore:

Random Forest
When to use
Advantages
Disadvantages
Hyperparameters
Examples

Basics of Ensemble Models

In this post, let us explore:

Ensemble Models
Bagging
Boosting
Stacking

Train-Test split and Cross-validation: Visual Illustrations & Examples

Building an optimal model that neither underfits nor overfits the dataset takes effort. To estimate how our model will perform on unseen data, we can split the dataset into train and test sets and also perform cross-validation.
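
As a rough illustration of the two ideas, here is a minimal sketch that uses only the Python standard library. The helper names `train_test_split` and `k_fold_indices` and the toy data are assumptions made for this example; libraries such as sklearn provide their own, more complete versions.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, val_idx) pairs; each row is used for validation exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

data = list(range(10))  # toy dataset (hypothetical)
train, test = train_test_split(data, test_fraction=0.2, seed=42)
folds = list(k_fold_indices(len(train), k=4))
```

The test set is held back until the very end, while the folds are built from the training rows only, so the final test score is not influenced by model tuning.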

Heatmap: Visual Examples

A heatmap depicts two-dimensional data (in matrix form) as a color-coded graph.

Occam's Razor, Bias-Variance Tradeoff, No Free Lunch Theorem and The Curse of Dimensionality

In this post, let us discuss some of the basic concepts/theorems used in Machine Learning:

Occam's Razor (Law of Parsimony)
Bias-Variance Tradeoff
No Free Lunch Theorem
The Curse of Dimensionality

Mastering Decision Trees with Visual Examples

Decision Tree models are simple and easy to interpret.

In this post, let us explore:

What are decision trees
When to use decision trees
Advantages
Disadvantages
Examples with code (Python)

"A picture is worth a thousand words"

A complex idea can be understood effectively with the help of visual representations. Exploratory Data Analysis (EDA) helps us understand the nature of the data through summary statistics and visualizations, which capture details that numbers alone cannot.

In this post, let us explore:

Visualizing the data
Summarizing the data
Correlation matrix

Data Preprocessing: Transformation - Explained with Visual Examples

Data preprocessing is an important step before fitting any model. The following steps are performed under data preprocessing:

Handling missing values
Handling outliers
Transforming nominal variables to dummy variables
Converting ordinal data to numbers
Transformation

In this post, with the help of an example, let us explore transformation:

Standardization
Normalization
Log transformation
How to transform data in Python
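
The three transformations can be sketched with the standard library alone. This example assumes that "normalization" means min-max scaling to the range [0, 1] and that standardization uses the population standard deviation; the sample values are made up for illustration.

```python
import math
import statistics

values = [1.0, 10.0, 100.0, 1000.0]  # hypothetical, strongly skewed data

# Standardization: subtract the mean, divide by the standard deviation.
mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation (an assumption)
standardized = [(v - mean) / std for v in values]

# Normalization (assumed here to mean min-max scaling to [0, 1]).
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Log transformation: compresses large values; requires strictly positive inputs.
logged = [math.log10(v) for v in values]
```

Standardized values have mean 0 and standard deviation 1, normalized values lie between 0 and 1, and the log transform turns multiplicative gaps into evenly spaced steps.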

Data Preprocessing - Creating Dummy Variables and Converting Ordinal Variables to Numbers with Examples

Data cleaning is a critical step before fitting any statistical model. It includes:

Handling missing values
Handling outliers
Transforming nominal variables to dummy variables (discussed in this post)
Converting ordinal data to numbers (discussed in this post)
Transformation (discussed in this post)
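
A small sketch of the two conversions, written in plain Python. The column values, the `color_` prefix and the `size_order` mapping are hypothetical choices made for this example.

```python
colors = ["red", "green", "blue", "green"]     # nominal: no inherent order
sizes = ["small", "large", "medium", "small"]  # ordinal: ordered categories

# Dummy variables: one 0/1 column per category of the nominal variable.
categories = sorted(set(colors))
dummies = [
    {f"color_{c}": int(value == c) for c in categories}
    for value in colors
]

# Ordinal data: map each level to a number that preserves the order.
size_order = {"small": 1, "medium": 2, "large": 3}
size_codes = [size_order[s] for s in sizes]
```

Nominal variables get one indicator column per category because assigning them numbers such as 1, 2, 3 would imply an order that does not exist; ordinal variables can be mapped to numbers directly because their order is meaningful.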

Confusion Matrix, Accuracy, Precision, Recall, F score Explained with Intuitive Visual Examples

In this post, we will learn:

What is accuracy
What are precision, recall, specificity and F score
How to manually calculate these measures
How to interpret these measures
What is a confusion matrix and how to construct it
What is the AUC score and its interpretation
How to get the confusion matrix and classification report in sklearn
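
To make the manual calculation concrete, here is a minimal sketch for a binary problem. The labels in `y_true` and `y_pred` are invented for illustration, and class 1 is assumed to be the positive class.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # actual labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # model predictions (hypothetical)

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

# Confusion matrix with rows = actual class, columns = predicted class.
confusion_matrix = [[tn, fp],
                    [fn, tp]]

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)    # of the predicted positives, how many are correct
recall = tp / (tp + fn)       # of the actual positives, how many were found
specificity = tn / (tn + fp)  # of the actual negatives, how many were found
f1 = 2 * precision * recall / (precision + recall)
```

For this toy data the counts are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, which gives an accuracy of 0.8 and a precision, recall and F1 score of 0.75 each.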

Handling Outliers in Python: Explained with Visual Examples

In this post, we will discuss:

How to identify outliers
How to handle the outliers
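
One common way to do both steps is the interquartile range (IQR) rule, sketched below with the standard library. The 1.5 x IQR threshold and the choice to cap values rather than drop them are assumptions for this example, not necessarily the method used in the post; the data is made up.

```python
import statistics

values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]  # hypothetical data

# Identify: anything further than 1.5 * IQR outside the quartiles is flagged.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

# Handle: cap extreme values at the fences instead of discarding rows.
capped = [min(max(v, lower), upper) for v in values]
```

Capping keeps the row count unchanged, which can matter for small datasets; dropping the flagged rows is the other simple option.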

Handling Missing Values in Python: Different Methods Explained with Visual Examples

In this post, we will discuss:

How to check for missing values
Different methods to handle missing values
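
As a minimal sketch, missing values are represented below as `None` in a plain list, and two simple strategies are shown: dropping them and replacing them with the mean. The `ages` data is hypothetical, and these are only two of several possible methods.

```python
import statistics

ages = [25, None, 31, 40, None, 28]  # hypothetical column with gaps

# Check: count how many entries are missing.
n_missing = sum(v is None for v in ages)

# Method 1: drop the missing entries.
dropped = [v for v in ages if v is not None]

# Method 2: impute with the mean of the observed values.
mean_age = statistics.mean(dropped)
imputed = [mean_age if v is None else v for v in ages]
```

Dropping is simplest but loses rows; mean imputation keeps every row at the cost of shrinking the variance of the column.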

Scales of Measurement - Data types: Nominal, Ordinal, Interval and Ratio scale

There are four measurement scales:

Nominal
Ordinal
Interval
Ratio

Importing data into Python

In this post, we will learn:

How to import data into Python
How to import time series data
How to handle different time series formats while importing
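
The sketch below reads a small time series with the standard `csv` and `datetime` modules and tries several date formats in turn. The inline sample data, the `parse_date` helper and the list of formats are assumptions made for illustration; a real file would be opened from disk instead.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; in practice this would come from a file on disk.
raw_csv = "date,value\n2018-11-01,10\n02/11/2018,12\n20181103,15\n"

# Candidate formats, tried in order (an assumption about what the data contains).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]

def parse_date(text):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

series = [
    (parse_date(row["date"]), float(row["value"]))
    for row in csv.DictReader(io.StringIO(raw_csv))
]
```

Stating the expected formats explicitly avoids silent misreads of ambiguous dates such as 02/11/2018, which could otherwise be taken as either 2 November or 11 February.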
