Tableau vs Other Tools

Tableau is a data visualization and analysis tool that is popular in the data science community.

For better understanding, differences between Tableau and other similar tools are provided below.

Tableau                                         | Excel
Mainly for visualizations & creating dashboards | Spreadsheet tool mainly for calculations
Can handle big data                             | Small data
Can connect to Python & R                       | -


Tableau                                  | Python
Data visualization tool                  | Programming language for data analytics and visualization
For basic use, programming is not needed | Simple programming language

Tableau             | Microsoft Power BI
Can handle big data | Small data
Slightly difficult  | Easy

Features of Tableau

  • It helps to develop visualizations

  • It allows users to work with large volumes of data

  • It is widely used in businesses where data-driven decisions are common

  • It helps to create dashboards

  • It can extract data from various sources including cloud


    Some of the Tableau products

    Tableau Desktop

    Largely used for business intelligence

    • Visualizing large data

    • For creating worksheets, dashboards and stories, stored as files or on a server, which can be shared

    Tableau Prep

    Tool designed to prepare data, that is, to clean it before performing analysis


    Tableau Server

    Used to publish work prepared using Tableau Desktop

    Tableau Online

    Analytics platform fully hosted on cloud

    • Dashboards can be published and shared with anyone

    • Dashboards are accessible on browser and mobile app



    File types in Tableau

    Tableau workbook (.twb)

    • Stores visualizations but not the source data

    Tableau data source (.tds)

    • Stores the information required to connect to a data source, for example one on a server

    Tableau bookmark (.tbm)

    • Stores a single worksheet so that it can be shared with, and opened in, another Tableau workbook

    Tableau extract (.tde)

    • Stores a local copy of the data as an extract; if an aggregate calculation is done on a filtered subset of a large dataset, the extract holds the filtered and aggregated data

    Tableau packaged workbook (.twbx)

    • Keeps both the extracted data and the visualizations

    • Hence, this file type is used when both data and visualizations need to be shared

    • Visualizations can be viewed in Tableau Reader or Tableau Online



Z score (Part 1)

What if I tell you a secret? The secret of how data is distributed in any situation.


Let us start. The mean will be the central point here.

Let us assume dataset A is: 10, 12, 14, 16, 19.

Dataset B is: 10, 200, 350, 600, 900.

Now both the mean and the SD (standard deviation) of B are higher than those of A.


If the dataset C is: 10, 30, 40, 60.

Dataset D is: 31,33,36,40.


Now the means of dataset C and dataset D are the same (35). But what about the standard deviation? Is it the same? No, the SD of dataset C is higher than that of dataset D, because the values of C lie much further from 35.


So to understand the distribution of data we need both the mean and the standard deviation.


While the mean conveys the central point, SD tells us the spread of the data.
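
The comparison of the four datasets above can be checked with a few lines of Python. This is a minimal sketch that assumes the population standard deviation (`statistics.pstdev`); the post does not say whether the population or the sample formula is meant, and the conclusions are the same with either.

```python
from statistics import mean, pstdev

# The four example datasets from the text.
datasets = {
    "A": [10, 12, 14, 16, 19],
    "B": [10, 200, 350, 600, 900],
    "C": [10, 30, 40, 60],
    "D": [31, 33, 36, 40],
}

# For each dataset, store (mean, population SD).
# Assumption: population SD; use statistics.stdev for the sample SD instead.
summary = {name: (mean(values), pstdev(values)) for name, values in datasets.items()}

for name, (center, spread) in summary.items():
    print(f"{name}: mean={center:.2f}, SD={spread:.2f}")
```

C and D share the same mean, so only the SD column shows that C is far more spread out than D.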


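The next concept is the Z score. What is a Z score? In simple words, the Z score combines both the mean and the standard deviation of the data: it tells us how many standard deviations a value lies above or below the mean.


Z = (x - mean) / standard deviation


What is x here? x is the individual value whose Z score we want.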


Now let us understand Z scores with an example. If the mean is, say, 20 and the SD is 2, then the Z score for x = 20 is (20 - 20) / 2 = 0.


But if the value of x is far above the mean, say x is 30, then the Z score is (30 - 20) / 2 = 5.


Similarly, if the value of x is on the lower side, say x is 10, then the Z score is (10 - 20) / 2 = -5.
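
The three worked examples can be reproduced with a small function. This is only a sketch; the function name `z_score` and its parameter names are chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# Mean 20 and SD 2, as in the example above.
print(z_score(20, 20, 2))  # 0.0
print(z_score(30, 20, 2))  # 5.0
print(z_score(10, 20, 2))  # -5.0
```

A positive Z score means the value is above the mean, a negative one means it is below, and 0 means it sits exactly on the mean.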


(to be continued)

Basics of Blockchain technology

Blockchain technology is popular these days. In this blog, let us understand some of the basic concepts.


What is blockchain?

Why do we need this blockchain?

How does blockchain ensure trust?

Who invented it?

When to use it?

When not to use it?


Let us start.


What is blockchain?


In the simplest terms, a blockchain is a chain of blocks. Okay, then what are blocks? Blocks contain some information:

  1. Transaction details

  2. Participants

  3. Something unique about that block (for example, a hash)


So, a block is a digital information holder.
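
To make the three pieces of information concrete, here is a minimal sketch of a block in Python. The class name `Block`, its field names and the choice of a SHA-256 hash as the "something unique" are assumptions made for illustration; the post does not prescribe a particular layout.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    transaction: str     # 1. transaction details
    participants: tuple  # 2. who took part in the transaction

    def fingerprint(self) -> str:
        """3. Something unique: here, a SHA-256 hash of the block's contents (an assumption)."""
        payload = json.dumps(
            {"transaction": self.transaction, "participants": list(self.participants)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


block = Block(transaction="Alice pays Bob 5 coins", participants=("Alice", "Bob"))
other = Block(transaction="Alice pays Bob 50 coins", participants=("Alice", "Bob"))

print(block.fingerprint())
# Changing the transaction details changes the fingerprint.
print(block.fingerprint() != other.fingerprint())  # True
```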


Okay, now why do we need this blockchain?


Blockchain is like a ledger or a record-keeping book. The problem with an ordinary record-keeping book is that someone can steal it or modify it. Blockchain overcomes this problem.


But how does blockchain ensure trust?


  • Blockchain is a distributed ledger. Instead of one person owning it, the ledger is collectively owned.

  • Since it is a distributed ledger, the consensus of the majority is needed to change a block or to write new information to it.

  • Further, once data has been written to a block, it cannot be changed retroactively, since a person who alters their own copy cannot fool everyone else.
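
The idea that one altered copy cannot fool the majority can be sketched as a toy example. This is a deliberately simplified illustration, not how any real blockchain network reaches consensus; the node names, the records and the functions `fingerprint` and `majority_ledger` are all hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(ledger):
    """Hash a ledger (a list of records) so that copies can be compared cheaply."""
    return hashlib.sha256("\n".join(ledger).encode("utf-8")).hexdigest()


def majority_ledger(copies):
    """Return the ledger version held by more than half of the participants, else None."""
    votes = Counter(fingerprint(ledger) for ledger in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        return None  # no majority, so nothing is accepted
    return next(ledger for ledger in copies.values() if fingerprint(ledger) == winner)


honest = ["Alice pays Bob 5", "Bob pays Carol 2"]

# Every participant holds their own copy of the same ledger.
copies = {name: list(honest) for name in ["node1", "node2", "node3", "node4", "node5"]}

# One participant rewrites history in their own copy.
copies["node3"][0] = "Alice pays Bob 500"

accepted = majority_ledger(copies)
print(accepted == honest)  # True: the tampered copy is outvoted
```

Because four of the five copies still agree, the altered record is simply ignored; an attacker would have to change the majority of copies to rewrite the ledger.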


Who invented it?

A person (or group of people) using the name Satoshi Nakamoto, whose real identity is still unknown, introduced it in 2008.


When to use it?

When there is a need for decentralization or for a shared ledger/database.


When not to use it?

Transactions on a blockchain take time and consume a lot of resources, so blockchain is not suited to situations that need fast performance.


Feature Selection using sklearn

In this post, we will understand how to perform Feature Selection using sklearn.

  • Dropping features which have low variance
    • Dropping features with zero variance
    • Dropping features with variance below the threshold variance
  • Univariate feature selection
  • Model based feature selection
  • Feature Selection using pipeline
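
As a rough preview of the four topics listed above, here is a minimal sketch using scikit-learn on the built-in Iris data. The added constant column, the variance threshold of 0.2, `k=2` and the choice of estimators are arbitrary assumptions made for illustration, not the settings used in the full post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    VarianceThreshold,
    f_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Iris data (150 rows, 4 features) plus one constant column added for illustration.
X, y = load_iris(return_X_y=True)
X_with_const = np.hstack([X, np.ones((X.shape[0], 1))])

# 1a. Low variance: the default threshold of 0.0 drops zero-variance features.
X_no_const = VarianceThreshold().fit_transform(X_with_const)

# 1b. A higher threshold (0.2 is an arbitrary choice) also drops nearly constant features.
X_low_var = VarianceThreshold(threshold=0.2).fit_transform(X_with_const)

# 2. Univariate selection: keep the k features with the highest ANOVA F-score.
X_k_best = SelectKBest(score_func=f_classif, k=2).fit_transform(X_no_const, y)

# 3. Model-based selection: keep features whose importance exceeds the mean importance.
model_based = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_model = model_based.fit_transform(X_no_const, y)

# 4. Pipeline: chain the selection steps with a classifier so they are fitted together.
pipeline = Pipeline([
    ("drop_constant", VarianceThreshold()),
    ("k_best", SelectKBest(score_func=f_classif, k=2)),
    ("classifier", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_with_const, y)
train_accuracy = pipeline.score(X_with_const, y)

print("after zero-variance filter:", X_no_const.shape)
print("after threshold 0.2:", X_low_var.shape)
print("after SelectKBest:", X_k_best.shape)
print("after SelectFromModel:", X_model.shape)
print("pipeline training accuracy:", round(train_accuracy, 3))
```

Putting the selectors inside a pipeline means they are refitted on each training split during cross-validation, which avoids leaking information from the held-out data into the selection step.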

Feature Engineering for machine learning

In this post, let us explore:

  • The difference between Feature Selection, Feature Extraction, Feature Engineering and Feature Learning
  • The process of Feature Engineering
  • Examples of Feature Engineering

Feature Selection: Filter method, Wrapper method and Embedded method

In this post, let us explore:
  • What is feature selection?
  • Why do we need to perform feature selection?
  • Methods

Naïve Bayes classification model for Natural Language Processing problem using Python

In this post, let us understand how to fit a classification model using Naïve Bayes (read about Naïve Bayes in this post) to a natural language processing (NLP) problem.