In this post, we will see how to fit a Naïve Bayes classification model (read about Naïve Bayes in this post) to a natural language processing (NLP) problem.

Step 1: Download the sample dataset and extract the files. This dataset contains 5,572 SMS messages, each labelled ham (legitimate) or spam.

Step 2: Import the text dataset and provide column names.
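The loading code is not shown here, so the following is only a minimal sketch using the Python standard library. It assumes the extracted file is tab-separated with one `label<TAB>message` pair per line; the two sample rows and the column names `label` and `message` are made up for illustration.

```python
import csv
import io

# Hypothetical sample in an assumed "label<TAB>message" layout.
# Replace it with the extracted file once its real name and layout are known.
sample = "ham\tSee you at lunch?\nspam\tWIN a free prize now\n"

COLUMNS = ["label", "message"]  # column names chosen for illustration


def load_messages(handle):
    """Read tab-separated rows and attach the column names to each row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


rows = load_messages(io.StringIO(sample))
print(rows[0])
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. If you prefer pandas, `pandas.read_csv(path, sep="\t", header=None, names=[...])` covers the same step.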

Step 3: Convert labels (ham and spam) to numbers (0 and 1).
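One simple way to do this conversion is a dictionary lookup. The sketch below uses a made-up list of labels; the name `LABEL_TO_NUMBER` is hypothetical.

```python
# Mapping described in this step: ham -> 0, spam -> 1.
LABEL_TO_NUMBER = {"ham": 0, "spam": 1}

labels = ["ham", "spam", "ham"]  # made-up labels for illustration

# A KeyError here would flag an unexpected label in the data.
encoded = [LABEL_TO_NUMBER[label] for label in labels]
print(encoded)  # [0, 1, 0]
```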

Step 4: Split the dataset into training and test sets.
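The split ratio used for this post is not stated, so the sketch below assumes 25% test data and a fixed seed for reproducibility. The helper `split_train_test` is hypothetical and uses only the standard library; scikit-learn's `train_test_split` (with `test_size` and `random_state`) does the same job.

```python
import random


def split_train_test(messages, labels, test_fraction=0.25, seed=42):
    """Shuffle indices with a fixed seed, then cut off the test portion."""
    indices = list(range(len(messages)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(messages) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(messages, train_idx), pick(messages, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Made-up data: eight messages with alternating labels.
messages = [f"msg{i}" for i in range(8)]
labels = [i % 2 for i in range(8)]

X_train, X_test, y_train, y_test = split_train_test(messages, labels)
print(len(X_train), len(X_test))  # 6 2
```

Shuffling indices rather than the messages themselves keeps each message paired with its own label.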

Step 5: Vectorize

In this step, the words in each message are converted to a numerical representation that a model can work with. You can read more on this here.
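To make the idea concrete, here is a small bag-of-words sketch: each message becomes a set of word counts. The tokenizer (lowercase, then split on anything that is not a letter, digit, or apostrophe) is an assumption for illustration, not necessarily the rule used in this post.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


counts = Counter(tokenize("Free prize! Claim your FREE prize now"))
print(counts["free"], counts["prize"], counts["now"])  # 2 2 1
```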

Step 6: Vectorize training dataset

Step 7: Vectorize test dataset
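Steps 6 and 7 go together: the vocabulary is learned from the training messages only, and the same vocabulary is then applied to the test messages. The sketch below shows that pattern with hypothetical helpers `fit_vocabulary` and `transform` on made-up messages; scikit-learn's `CountVectorizer` follows the same fit-then-transform idea.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_vocabulary(messages):
    """Map each distinct training word to a column index (alphabetical order)."""
    words = sorted({w for m in messages for w in tokenize(m)})
    return {word: i for i, word in enumerate(words)}


def transform(messages, vocabulary):
    """Turn messages into count vectors; words outside the vocabulary are skipped."""
    vectors = []
    for message in messages:
        row = [0] * len(vocabulary)
        for word in tokenize(message):
            if word in vocabulary:
                row[vocabulary[word]] += 1
        vectors.append(row)
    return vectors


train_messages = ["free prize now", "see you at lunch"]  # made-up data
test_messages = ["free lunch tomorrow"]

vocabulary = fit_vocabulary(train_messages)     # learned from training data only
X_train = transform(train_messages, vocabulary)
X_test = transform(test_messages, vocabulary)   # "tomorrow" is unseen, so it is dropped

print(sorted(vocabulary))
print(X_test[0])
```

Building the vocabulary from the test messages as well would leak information about the test set into training, which is why the test data is only transformed.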

Step 8: Build the Naïve Bayes classification model. If you want to know what a Naïve Bayes model is, read my post on Naïve Bayes.
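For readers who want to see what such a model computes, below is a from-scratch sketch of a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing. It is not the code used for this post: the functions `train_naive_bayes` and `predict`, the smoothing value `alpha=1.0`, and the tiny training set are all assumptions for illustration. In practice a library class such as scikit-learn's `MultinomialNB` would normally be used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(token_lists, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods for each class."""
    vocabulary = {w for tokens in token_lists for w in tokens}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(token_lists, labels):
        word_counts[label].update(tokens)

    model = {}
    for label, n_docs in class_counts.items():
        # Denominator: all word occurrences in the class plus the smoothing mass.
        total = sum(word_counts[label].values()) + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / total)
                for w in vocabulary
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    def score(label):
        params = model[label]
        likelihood = params["log_likelihood"]
        return params["log_prior"] + sum(likelihood[w] for w in tokens if w in likelihood)

    return max(model, key=score)


# Made-up training data: 1 = spam, 0 = ham.
train_tokens = [
    ["free", "prize", "now"],
    ["win", "free", "cash"],
    ["see", "you", "at", "lunch"],
    ["lunch", "at", "noon"],
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_tokens, train_labels)
print(predict(model, ["free", "cash", "prize"]))  # 1
print(predict(model, ["lunch", "at", "noon"]))    # 0
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that never appeared in one class from forcing that class's probability to zero.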

Step 9: Measure the accuracy on test data

The accuracy of the Naïve Bayes model in classifying the test data is 0.98851.
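Accuracy here means the fraction of test messages whose predicted label matches the true label. A minimal sketch of that calculation, on made-up labels rather than the real predictions, could look like this (scikit-learn's `accuracy_score` computes the same quantity):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [0, 0, 1, 1, 0]  # made-up true labels
y_pred = [0, 0, 1, 0, 0]  # made-up predictions; one spam message is missed

print(accuracy(y_true, y_pred))  # 0.8
```

Because ham messages usually far outnumber spam in SMS data, it can be worth looking at per-class errors as well as overall accuracy.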

- Download sample dataset
- Split dataset into test and train data
- Vectorize
- Build and measure the accuracy of the model

#### Example

