Z score (Part 1)

What if I tell you a secret? The secret of how data is distributed in any situation.

Let us start. The mean is the central point of the data.

Let us assume dataset A is: 10, 12, 14, 16, 19.

Dataset B is: 10, 200, 350, 600, 900.

Both the mean and the standard deviation (SD) of B are higher than those of A: the mean of A is 14.2, while the mean of B is 412.

Now take dataset C: 10, 30, 40, 60.

Dataset D is: 31, 33, 36, 40.

The means of dataset C and dataset D are the same (35). But what about the standard deviation? Is it the same? No, the SD of dataset C is higher than that of dataset D, because the values in C lie farther from the mean.

So, to understand the distribution of data, we need both the mean and the standard deviation.

While the mean conveys the central point, the SD tells us the spread of the data.
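
The comparison of datasets C and D can be checked with a few lines of Python. This is only a minimal sketch: it assumes the population standard deviation (`pstdev`), which the post does not specify. With the sample standard deviation the numbers change, but C still has the larger spread.

```python
from statistics import mean, pstdev

dataset_c = [10, 30, 40, 60]
dataset_d = [31, 33, 36, 40]

# Both datasets have the same central point.
mean_c, mean_d = mean(dataset_c), mean(dataset_d)

# Population standard deviation (an assumption, see the text above).
sd_c, sd_d = pstdev(dataset_c), pstdev(dataset_d)

print(f"C: mean={mean_c}, SD={sd_c:.2f}")
print(f"D: mean={mean_d}, SD={sd_d:.2f}")
```

The means match, while the SD of C comes out several times larger than the SD of D.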

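The next concept is the Z score. What is a Z score? In simple words, the Z score combines the mean and the standard deviation of the data: it tells us how many standard deviations a value lies from the mean.

Z = (x - mean) / standard deviation

What is x here? x is the individual value whose Z score we want.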

Now let us understand Z scores with an example. If the mean is, let’s say, 20 and the SD is 2, then the Z score for x = 20 is (20 - 20) / 2 = 0.

But if the value of x is far from the mean, let us say x is 30, then the Z score is (30 - 20) / 2 = 5.

Similarly, if the value of x is on the lower side, let us say x is 10, then the Z score is (10 - 20) / 2 = -5, that is, minus 5.
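
The three examples above can be reproduced with a small helper. The function name `z_score` is just an illustrative choice, not something from a library.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / sd


# Mean 20 and SD 2, as in the example above.
for x in (20, 30, 10):
    print(x, z_score(x, mean=20, sd=2))
```

A value equal to the mean gets a Z score of 0, values above the mean get positive scores, and values below the mean get negative scores.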

(to be continued)

Addictive algorithms

Seth Godin, the famous American author and, more than that, a thinker and marketing guru, says that we should write daily. He writes a blog post every day. Inspired by that, I am planning to write more regularly, if not daily.

The topic I am writing about today is how powerful algorithms and statistical models contribute to our addiction to scrolling the feed.

Around 13 years ago, it was the era of Orkut, and Facebook had also entered the market. I used to like Facebook a lot. At that time, the content on the homepage feed was all from my friends, arranged chronologically. But something changed, and Facebook took control of the homepage feed. I was not happy that Facebook was controlling what I saw on my page.

Now, in 2020, the algorithm-determined homepage is the norm: Twitter, YouTube, Netflix, Amazon Prime, you name it. Most of the views on Netflix come from the recommended section. That shows the power of these models. Models can detect the underlying patterns so accurately that they know more about us than we consciously know about ourselves. Over time and with more data, they only get better. Of course, this is both scary and interesting at the same time.

One big potential application would be in the education sector: designing courses and content in a way that maximizes learners' engagement and keeps them hooked.

In the short run, tech companies will be happy that views and app usage are up. But the question I cannot answer is: what will the impact of these addictive algorithms be in the long run?

What will be the tipping point? Will we get bored with the recommendations? Will the models evolve with our changing tastes and keep us continually hooked?

Basics of Blockchain technology

Blockchain technology is popular these days. In this blog, let us understand some of the basic concepts.

What is blockchain?

Why do we need this blockchain?

How does blockchain ensure trust?

Who invented it?

When to use it?

When not to use it?

Let us start.

What is blockchain?

In the simplest words, a blockchain is a chain of blocks. Okay, then what are blocks? Blocks contain some information:

  1. Transaction details

  2. Participants

  3. Something unique about that block

So, a block is a holder of digital information.

Okay, now why do we need this blockchain?

A blockchain is like a ledger or a record-keeping book. The problem with a record-keeping book is that anyone can steal it or modify it. Blockchain overcomes this problem.

But how does blockchain ensure trust?

  • Blockchain is a distributed ledger. Instead of one person owning it, the ledger is collectively owned.

  • Since it is a distributed ledger, the consensus of the majority is needed to change a block or write new information to one.

  • Further, once data has been written to a block, it cannot be changed retroactively without being noticed, since we cannot fool all the participants.
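
To make these three points concrete, here is a toy sketch in Python. It is not how any real blockchain is implemented: the helper names (`block_hash`, `add_block`, `is_valid`, `majority_tip`) are made up for illustration, the "something unique" about each block is assumed to be a SHA-256 hash, and a simple majority vote stands in for real consensus mechanisms, which are far more involved.

```python
import hashlib
import json
from collections import Counter
from copy import deepcopy

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """Hash the block's contents together with the previous block's hash."""
    payload = {k: block[k] for k in ("transaction", "participants", "prev_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def add_block(chain, transaction, participants):
    """Append a block that points back to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {
        "transaction": transaction,
        "participants": participants,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain):
    """A chain is valid if every stored hash matches its contents and links."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


def majority_tip(copies):
    """Return the last-block hash held by most of the valid ledger copies."""
    tips = Counter(c[-1]["hash"] for c in copies if is_valid(c))
    return tips.most_common(1)[0][0]


ledger = []
add_block(ledger, "A pays B 5", ["A", "B"])
add_block(ledger, "B pays C 2", ["B", "C"])

# Three participants each hold their own copy of the ledger.
copies = [deepcopy(ledger) for _ in range(3)]

# One participant quietly edits an old transaction in their own copy.
copies[0][0]["transaction"] = "A pays B 500"

print(is_valid(copies[0]))                         # False: the edit is detected
print(majority_tip(copies) == ledger[-1]["hash"])  # True: the honest copies win
```

Because each block's hash depends on its contents and on the previous block's hash, editing an old block breaks that copy of the chain, and the untouched copies held by the other participants still agree with each other.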

Who invented it?

A person (or group of persons) going by the name Satoshi Nakamoto, in 2008. Their identity is still unknown.

When to use it?

When there is a need for decentralization or for a shared ledger/database.

When not to use it?

Transactions on a blockchain take time and consume a lot of resources, so blockchain is not suited to cases that need fast performance.

Basics of Pandas - A Python Library - Video

In this video, let us discuss the basics of pandas, a Python library.

If you like my channel, you may consider subscribing:

Basics of Python - Video

In this video, let us discuss the basics of Python.


Feature Selection using sklearn

In this post, we will see how to perform feature selection using sklearn. The topics are:

  • Dropping features which have low variance
    • Dropping features with zero variance
    • Dropping features with variance below the threshold variance
  • Univariate feature selection
  • Model-based feature selection
  • Feature selection using a pipeline
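
As a rough preview of these topics, the sketch below applies each technique to the iris dataset that ships with sklearn. The dataset, the extra constant column, the threshold of 0.5, `k=2` and the choice of estimators are all assumptions made for illustration, not the setup used in the post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    VarianceThreshold,
    f_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
# Append a constant column so there is a zero-variance feature to drop.
X = np.hstack([X, np.ones((X.shape[0], 1))])

# 1a. Drop features with zero variance (the default threshold is 0.0).
X_nonconstant = VarianceThreshold().fit_transform(X)

# 1b. Drop features whose variance is not above a chosen threshold.
#     The value 0.5 is arbitrary and only for illustration.
X_high_variance = VarianceThreshold(threshold=0.5).fit_transform(X)

# 2. Univariate selection: keep the k features with the best ANOVA F-score.
X_kbest = SelectKBest(score_func=f_classif, k=2).fit_transform(X_nonconstant, y)

# 3. Model-based selection: keep features whose importance in a fitted
#    random forest is at least the mean importance (the default rule here).
forest = RandomForestClassifier(n_estimators=100, random_state=0)
X_model = SelectFromModel(forest).fit_transform(X_nonconstant, y)

# 4. Pipeline: chain the selection steps with a final classifier so that
#    selection is fitted together with the model.
pipeline = Pipeline([
    ("variance", VarianceThreshold()),
    ("kbest", SelectKBest(score_func=f_classif, k=2)),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

print(X.shape, X_nonconstant.shape, X_high_variance.shape)
print(X_kbest.shape, X_model.shape)
print(pipeline.score(X, y))
```

Putting the selectors inside a `Pipeline` means they are refitted whenever the model is refitted, which helps avoid selecting features on data the model is later evaluated on (for example during cross-validation).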

Feature Engineering for machine learning

In this post, let us explore:

  • The difference between Feature Selection, Feature Extraction, Feature Engineering and Feature Learning
  • The process of Feature Engineering
  • Examples of Feature Engineering
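
To give a flavour of what feature engineering can look like, here is a small pandas sketch. The data and the column names are entirely made up for illustration; the derived features (month, weekend flag, unit price, one-hot encoded city) are common examples and not necessarily the ones covered in the post.

```python
import pandas as pd

# Hypothetical raw data, made up for illustration only.
raw = pd.DataFrame({
    "order_date": ["2020-01-03", "2020-02-15", "2020-02-16"],
    "price": [100.0, 250.0, 80.0],
    "quantity": [2, 5, 1],
    "city": ["Pune", "Delhi", "Pune"],
})

features = raw.copy()

# Date parts: turn a raw date string into features a model can use.
features["order_date"] = pd.to_datetime(features["order_date"])
features["order_month"] = features["order_date"].dt.month
features["is_weekend"] = features["order_date"].dt.dayofweek >= 5

# Ratio feature: combine two columns into a more meaningful one.
features["unit_price"] = features["price"] / features["quantity"]

# One-hot encoding: turn a categorical column into indicator columns.
features = pd.get_dummies(features, columns=["city"])

print(features)
```

Each new column is built from columns that already exist, which is the core idea of feature engineering: making the information in the raw data easier for a model to use.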