Z score (Part 1)

What if I told you a secret? It is the secret of describing how data is distributed in any situation.


Let us start. The mean will be the central point here.

Let us assume dataset A is: 10, 12, 14, 16, 19.

Dataset B is: 10, 200, 350, 600, 900.

Here, both the mean and the standard deviation (SD) of B are higher than those of A.


Now suppose dataset C is: 10, 30, 40, 60.

Dataset D is: 31, 33, 36, 40.


The means of dataset C and dataset D are the same (35). But what about the standard deviation? Is it the same? No, the SD of dataset C is higher than that of dataset D, because the values in C lie farther from 35.


So, to understand the distribution of data, we need both the mean and the standard deviation.


While the mean conveys the central point, the SD tells us the spread of the data.
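
The comparison of datasets C and D can be checked with a few lines of Python. This is a minimal sketch using the standard library; the variable names are only illustrative, and it assumes the population standard deviation (`pstdev`), since the post does not say whether the population or sample formula is meant. Either formula gives a larger SD for C than for D.

```python
from statistics import mean, pstdev

# Datasets C and D from the text above
dataset_c = [10, 30, 40, 60]
dataset_d = [31, 33, 36, 40]

# Central point of each dataset
mean_c, mean_d = mean(dataset_c), mean(dataset_d)

# Spread of each dataset (population SD is an assumption here)
sd_c, sd_d = pstdev(dataset_c), pstdev(dataset_d)

print(f"C: mean = {mean_c}, SD = {sd_c:.2f}")
print(f"D: mean = {mean_d}, SD = {sd_d:.2f}")
```

Both means come out as 35, while the SD is about 18.03 for C and about 3.39 for D.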


The next concept is the Z score. What is a Z score? In simple words, the Z score combines the mean and the standard deviation of the data: it tells us how many standard deviations a value lies from the mean.


Z = (x-mean)/standard deviation.


What is x here? x is an individual value (a data point) whose Z score we want.


Now let us understand Z scores with an example. If the mean is, let’s say, 20 and the SD is 2, then the Z score for x = 20 is (20 - 20) / 2 = 0.


But if the value of x is far above the mean, let us say x is 30, then the Z score is (30 - 20) / 2 = 5.


Similarly, if the value of x is on the lower side, let us say x is 10, then the Z score is (10 - 20) / 2 = -5. The negative sign shows that the value lies below the mean.
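
The three examples above can be reproduced with a small helper. This is only a sketch: the function name `z_score` and its parameter names are chosen here for illustration and do not come from any library.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd == 0:
        # With zero spread the Z score is undefined
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / sd

# Same numbers as in the text: mean = 20, SD = 2
for x in (20, 30, 10):
    print(f"x = {x}: Z = {z_score(x, mean=20, sd=2)}")
```

This prints Z scores of 0.0, 5.0 and -5.0 for x = 20, 30 and 10.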


(to be continued)

Basics of Blockchain technology

Blockchain technology is popular these days. In this blog, let us understand some of the basic concepts.


What is blockchain?

Why do we need this blockchain?

How does blockchain ensure trust?

Who invented it?

When to use it?

When not to use it?


Let us start.


What is blockchain?


In the simplest words, a blockchain is a chain of blocks. Okay, then what are blocks? Each block contains some information:

  1. Transaction details

  2. Participants

  3. Something unique about that block


So, a block is a holder of digital information.


Okay, now why do we need this blockchain?


A blockchain is like a ledger or a record-keeping book. The problem with an ordinary record-keeping book is that anyone who gets hold of it can steal it or modify it. A blockchain is designed to overcome this problem.


But how does blockchain ensure trust?


  • A blockchain is a distributed ledger. Instead of one person owning it, the ledger is collectively owned by its participants.

  • Since it is a distributed ledger, the consensus of the majority is needed to change it or to write new information to a block.

  • Further, once data has been written to a block, it cannot be changed retroactively without being noticed, since one participant cannot fool all the others.
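
One way to picture why a retroactive change gets noticed is a chain in which every block stores a hash of the block before it. The sketch below is a simplified illustration under stated assumptions, not how any particular blockchain is implemented: it runs on one machine, has no network and no consensus step, and the field names and helper functions (`block_hash`, `make_block`, `is_valid`) are hypothetical. The "something unique about that block" is assumed here to be a SHA-256 hash of its contents.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents (everything except its own stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def make_block(transaction, participants, previous_hash):
    """Build a block holding transaction details, participants and a unique hash."""
    block = {
        "transaction": transaction,
        "participants": participants,
        "previous_hash": previous_hash,
    }
    block["hash"] = block_hash(block)
    return block

def is_valid(chain):
    """Check that every block is unmodified and points at its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False  # contents changed after the hash was recorded
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True

genesis = make_block("A pays B 5", ["A", "B"], previous_hash="0" * 64)
second = make_block("B pays C 2", ["B", "C"], previous_hash=genesis["hash"])
chain = [genesis, second]

valid_before = is_valid(chain)            # untouched chain
genesis["transaction"] = "A pays B 500"   # tamper with an old block
valid_after = is_valid(chain)             # tampering is detected

print(valid_before, valid_after)
```

The chain is valid before the edit and invalid after it. In a real distributed ledger, the other participants hold their own copies and would reject the altered version.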


Who invented it?

A person (or group of people) using the name Satoshi Nakamoto, in 2008. Their real identity is still not known.


When to use it?

When there is a need for decentralization or for a shared ledger/database.


When not to use it?

Since transactions on a blockchain take time and consume a lot of resources, blockchain is not well suited to cases where fast performance is needed.


Feature Selection using sklearn

In this post, we will understand how to perform Feature Selection using sklearn.

  • Dropping features which have low variance
    • Dropping features with zero variance
    • Dropping features with variance below the threshold variance
  • Univariate feature selection
  • Model based feature selection
  • Feature Selection using pipeline

Feature Engineering for machine learning

In this post, let us explore:

  • What is the difference between Feature Selection, Feature Extraction, Feature Engineering and Feature Learning?
  • The process of Feature Engineering
  • Examples of Feature Engineering

Feature Selection: Filter method, Wrapper method and Embedded method

In this post, let us explore:
  • What is feature selection?
  • Why do we need to perform feature selection?
  • Methods

Naïve Bayes classification model for Natural Language Processing problem using Python

In this post, let us understand how to fit a classification model using Naïve Bayes (read about Naïve Bayes in this post) to a natural language processing (NLP) problem.

Natural Language Processing made simple: Word Cloud, Sentiment Analysis and Topic Modelling

In this chapter, let us understand
  • What is NLP?
  • Concepts
  • How to create a word cloud?
  • How to perform sentiment analysis?
  • How to build a topic model?
  • Summary