## Naive Bayes Classifier Essentials

The **Naive Bayes classifier** is a simple and powerful method that can be used for binary and multiclass classification problems.

The naive Bayes classifier predicts the class membership probability of observations using Bayes' theorem, which is based on conditional probability: the probability that an event occurs given that another event has already occurred.
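Written out, the idea looks as follows. The notation is introduced here for illustration only and is not part of the original text: $C_k$ denotes a class and $x = (x_1, \dots, x_p)$ the predictor values of one observation. The classifier is called "naive" because it assumes the predictors are conditionally independent given the class.

```latex
% Bayes' theorem: posterior probability of class C_k given the predictors x
P(C_k \mid x) = \frac{P(C_k)\, P(x \mid C_k)}{P(x)}

% "Naive" assumption: predictors are conditionally independent given the class
P(x \mid C_k) = \prod_{j=1}^{p} P(x_j \mid C_k)
```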

Observations are assigned to the class with the largest probability score.
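The assignment rule can be sketched by hand in base R. This is a minimal illustration, not the implementation used by any package: the toy data values are made up, and a Gaussian density for the single numeric predictor is an assumption chosen for simplicity.

```r
# Toy training data (hypothetical values, for illustration only)
toy <- data.frame(
  glucose  = c(85, 90, 100, 150, 160, 170),
  diabetes = factor(c("neg", "neg", "neg", "pos", "pos", "pos"))
)

# Class priors: relative frequency of each class
priors <- prop.table(table(toy$diabetes))

# Per-class mean and standard deviation of the predictor
means <- tapply(toy$glucose, toy$diabetes, mean)
sds   <- tapply(toy$glucose, toy$diabetes, sd)

# Unnormalized score for a new observation: prior x Gaussian likelihood
new.glucose <- 140
scores <- priors * dnorm(new.glucose, mean = means, sd = sds)

# Normalize to posterior probabilities and pick the class with the largest one
posterior <- scores / sum(scores)
predicted.class <- names(which.max(posterior))
predicted.class
```

With several predictors, the per-predictor likelihoods would be multiplied together before normalizing.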

In this chapter, you’ll learn how to perform naive Bayes classification in R using the `klaR` and `caret` packages.

## Loading required R packages

- `tidyverse` for easy data manipulation and visualization
- `caret` for easy machine learning workflow

```
library(tidyverse)
library(caret)
```

## Preparing the data

The input predictor variables can be categorical and/or numeric variables.

Here, we’ll use the `PimaIndiansDiabetes2` data set [in the `mlbench` package], introduced in Chapter @ref(classification-in-r), for predicting the probability of being diabetes positive based on multiple clinical variables.

We’ll randomly split the data into a training set (80%, for building the predictive model) and a test set (20%, for evaluating the model). Make sure to set the seed for reproducibility.

```
# Load the data and remove NAs
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(PimaIndiansDiabetes2, 3)
# Split the data into training and test set
set.seed(123)
training.samples <- PimaIndiansDiabetes2$diabetes %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data <- PimaIndiansDiabetes2[-training.samples, ]
```
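When given a factor outcome, `createDataPartition()` samples within each class, so both subsets should keep roughly the original class balance. The sketch below checks this on a made-up outcome vector; `y` and `idx` are hypothetical names used only for this illustration.

```r
library(caret)

# Hypothetical imbalanced outcome: 70 "neg" and 30 "pos" labels
y <- factor(rep(c("neg", "pos"), times = c(70, 30)))

# Stratified 80/20 split on the outcome
set.seed(123)
idx <- createDataPartition(y, p = 0.8, list = FALSE)

# Class proportions in the training and test parts
prop.table(table(y[idx]))
prop.table(table(y[-idx]))
```

The same two `prop.table()` calls can be applied to `train.data$diabetes` and `test.data$diabetes` to check the split above.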

## Computing Naive Bayes

```
library("klaR")
# Fit the model
model <- NaiveBayes(diabetes ~., data = train.data)
# Make predictions
predictions <- model %>% predict(test.data)
# Model accuracy
mean(predictions$class == test.data$diabetes)
```

`## [1] 0.821`
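Besides the predicted class, the object returned by `predict()` for a `klaR` model also carries the posterior class probabilities. The sketch below repeats the data preparation so that it runs on its own, then inspects those probabilities and cross-tabulates predictions against the observed classes. It assumes the `klaR`, `caret`, and `mlbench` packages are installed.

```r
library(klaR)
library(caret)

# Same data preparation as above
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
set.seed(123)
training.samples <- createDataPartition(
  PimaIndiansDiabetes2$diabetes, p = 0.8, list = FALSE
)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data <- PimaIndiansDiabetes2[-training.samples, ]

# Fit the model and predict on the test set
model <- NaiveBayes(diabetes ~ ., data = train.data)
predictions <- predict(model, test.data)

# Posterior probability of each class for the first few test observations
head(predictions$posterior)

# Predicted versus observed classes
table(predicted = predictions$class, observed = test.data$diabetes)
```

The cross-tabulation shows which class the errors fall in, which a single accuracy figure hides.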

## Using caret R package

The `caret` R package can automatically train the model and assess its accuracy using k-fold cross-validation (Chapter @ref(cross-validation)).

```
library(klaR)
# Build the model
set.seed(123)
model <- train(diabetes ~., data = train.data, method = "nb",
               trControl = trainControl("cv", number = 10))
# Make predictions
predicted.classes <- model %>% predict(test.data)
# Model accuracy
mean(predicted.classes == test.data$diabetes)
```
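To see what the cross-validation selected, the fitted `train` object can be inspected. The sketch below repeats the setup so that it runs on its own, then prints the chosen tuning parameters, the resampled accuracy for each candidate, and a confusion matrix on the test set. Treating `"pos"` as the positive class is an assumption made here, and `confusionMatrix()` may additionally require the `e1071` package to be installed.

```r
library(caret)
library(klaR)

# Same data preparation as above
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
set.seed(123)
training.samples <- createDataPartition(
  PimaIndiansDiabetes2$diabetes, p = 0.8, list = FALSE
)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data <- PimaIndiansDiabetes2[-training.samples, ]

# Train with 10-fold cross-validation
set.seed(123)
model <- train(diabetes ~ ., data = train.data, method = "nb",
               trControl = trainControl("cv", number = 10))
predicted.classes <- predict(model, test.data)

# Tuning parameters selected by cross-validation
model$bestTune

# Cross-validated accuracy for each candidate parameter combination
model$results

# Test-set confusion matrix (assumption: "pos" is the class of interest)
confusionMatrix(predicted.classes, test.data$diabetes, positive = "pos")
```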

## Discussion

This chapter introduces the basics of naive Bayes classification and provides practical examples in R using the `klaR` and `caret` packages.