Intro to Naive Bayes Classifiers — Machine Learning 101
The Basics:
Let’s say someone approaches you and says:
“Wanna play a game? I have either a soccer ball or a basketball in this bag. My ball has lines, has pentagons, and is black and white. Can you tell me what ball I have?”
What would be your answer? If you said soccer ball, I really wouldn’t be surprised. Let’s dig a little deeper and analyze our approach.

To arrive at an answer, we use the probabilities (of whether the object is a soccer ball or a basketball) that each piece of information implies. All of these calculated probabilities then culminate in our final decision that it is indeed a soccer ball.
In essence, this is also how Naive Bayes (nai·eev beiz) classifiers work. A classifier is a supervised learning model that maps input data to a discrete/categorical output. Naive Bayes classifiers can be used to categorize a lot of things, such as spam emails and even breast cancer diagnoses!
In this article, we will talk more about Naive Bayes classification and go into further detail about how the classifier functions.
Important Concepts: Explaining how “naive” and “bayes” fit into Naive Bayes
Conditional Independence: (explaining “naive”)
The “naive” part of Naive Bayes comes from how the model assumes conditional independence among its predictors.
Conditional independence is an important term when discussing Naive Bayes. It refers to a situation where, for a hypothesis (A), two observations (B and C) are independent of each other once A is known.
For Naive Bayes, conditional independence means that when the model classifies, it evaluates each observation/predictor individually, “naively” assuming that all predictors are independent of each other.

B and C are conditionally independent given A if:
Pr(B ∩ C | A) = Pr(B | A) Pr(C | A)
B ∩ C = B intersect C
B | A = B given A
Pr( ) = probability of
It is important that Naive Bayes classifiers maintain conditional independence. If we were to model the relationships between predictors, the classifier would need to estimate the probability of each particular combination of predictors occurring together with each classification. To do this accurately, the training data would need many examples of every such combination, which demands a drastically larger dataset. Sourcing that data is often unfeasible, or simply not worth it, given the usually small loss of accuracy caused by the conditional independence simplification.
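To make the “naive” assumption concrete, here is a minimal sketch in Python applied to the ball example from the introduction. All of the probabilities are made up purely for illustration; the point is that, under conditional independence, the likelihood of observing all the clues together is simply the product of the per-clue probabilities.

```python
# Minimal sketch of the naive factorization (all probabilities are hypothetical).
# Under conditional independence, P(clue1, clue2, clue3 | class) is approximated
# by P(clue1 | class) * P(clue2 | class) * P(clue3 | class).

p_given_soccer = {"has_lines": 0.9, "has_pentagons": 0.95, "black_and_white": 0.8}
p_given_basketball = {"has_lines": 0.8, "has_pentagons": 0.05, "black_and_white": 0.1}

def naive_likelihood(per_clue_probabilities):
    # Multiply each clue's probability individually instead of needing
    # the probability of that exact combination of clues.
    result = 1.0
    for p in per_clue_probabilities.values():
        result *= p
    return result

print(naive_likelihood(p_given_soccer))      # 0.684
print(naive_likelihood(p_given_basketball))  # 0.004
```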
Bayes Theorem: (explaining “bayes”)
The “bayes” in Naive Bayes classifiers comes from Bayes theorem, a major part of how the classifier functions! So what is it?
The Bayes theorem is a probabilistic theorem which examines the conditional probability of an event. Conditional probability is the probability of an event (A) given another event or events (B). To put it simply, the Bayes theorem is just a method to find the probability of an event given the occurrence of another event or events.
It can be stated like this:
P(A|B) = P(B|A) × P(A) / P(B)
A = event
B = event(s)
P(A), P(B) = probability of A, probability of B
P (B|A) = probability of B given A
P (A|B) = probability of A given B
When referring to Naive Bayes classification, A and B can also be denoted, respectively, as y and X.
An example to illustrate the use of Bayes theorem is finding the probability that there is good weather given that you go for a walk, where A represents good weather and B represents walking.
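Continuing that walking example with made-up numbers (the probabilities below are assumptions purely for illustration), the arithmetic looks like this:

```python
# Worked Bayes theorem example with hypothetical numbers.
# A = good weather, B = going for a walk.
p_a = 0.6          # P(A): prior probability of good weather
p_b = 0.5          # P(B): probability of going for a walk on any given day
p_b_given_a = 0.7  # P(B|A): probability of walking when the weather is good

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.84 -> knowing you walked raises the estimate from 0.6 to 0.84
```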
How does it work:
Naive Bayes classifiers use the Bayes theorem to classify data.

However, the function used is adapted to suit different input types (e.g. boolean, discrete, and continuous data).
Naive Bayes classifiers estimate P(A), P(B), and P(B|A) using the training data as a reference.
By doing so, the model is able to estimate the probability of each class (A) given the predictors (B). Naive Bayes classifiers can then infer the most probable class, thus classifying the data.
For classification of huge, multivariate datasets, it is possible that a particular class never appears together with certain predictor values in the training data. In this instance, the estimated conditional probability for that predictor is zero, and because the classifier multiplies probabilities together, a single zero wipes out the whole product and prevents a sensible classification.
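To tie the pieces together before moving on to the pros and cons, here is a minimal sketch of the fit/predict flow using scikit-learn’s CategoricalNB on the ball example. The tiny training set and the 0/1 encoding of the features are made up for illustration; a real application would use far more data.

```python
# Minimal Naive Bayes classification sketch (hypothetical data).
# Features: [has_lines, has_pentagons, black_and_white], each encoded as 0 or 1.
from sklearn.naive_bayes import CategoricalNB

X_train = [
    [1, 1, 1],  # soccer ball
    [1, 1, 1],  # soccer ball
    [1, 0, 0],  # basketball
    [0, 0, 0],  # basketball
]
y_train = ["soccer", "soccer", "basketball", "basketball"]

model = CategoricalNB()        # estimates P(A) and P(B|A) from the training data
model.fit(X_train, y_train)

# The bag described in the intro: lines, pentagons, black and white.
print(model.predict([[1, 1, 1]]))  # ['soccer']
```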
Pros and Cons:
Pros:
- Naive Bayes classifiers classify quickly due to the relative simplicity of the calculations involved.
- Naive Bayes classifiers require less data to train a functioning model (if covariates maintain a certain degree of conditional independence)
- Naive Bayes classifiers function well with discrete data and can be applied efficiently for multiclass classification
Cons:
- If a categorical predictor value appears in the test data but was never observed alongside a given class in the training data, it is assigned a zero probability, which prevents the model from classifying. This is known as the zero-frequency problem. When this issue arises, we can use smoothing techniques like Laplace estimation (a rough sketch follows this list).
- Naive Bayes assumes conditional independence of all of its predictors. However, in the real world, covariates often have relationships with each other. As such, the Naive Bayes classifier does not use all of the potential information the data has to offer.
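As mentioned in the first point above, smoothing keeps unseen predictor values from zeroing out the entire product of probabilities. Here is a rough sketch of Laplace (add-one) smoothing; the counts are made up for illustration.

```python
# Rough sketch of Laplace (add-one) smoothing for the zero-frequency problem.
def smoothed_probability(count, class_total, n_categories, alpha=1.0):
    # Without smoothing, an unseen predictor value has count = 0, so its
    # estimated probability is 0 and it wipes out the whole product.
    # Adding alpha pseudo-counts keeps every estimate strictly positive.
    return (count + alpha) / (class_total + alpha * n_categories)

# Hypothetical case: "has_pentagons = 1" was never observed with "basketball"
# across 50 training examples of that class, and the feature has 2 categories.
print(smoothed_probability(count=0, class_total=50, n_categories=2))  # ~0.019 instead of 0
```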
Applications:
Naive Bayes is mostly applied to:
- real-time classification - the speed of Naive Bayes allows it to classify data in real time
- text classification - Naive Bayes functions well in multi-class classification. As such, it is excellent for text classification. It is popularly used to separate spam from non-spam (a small sketch follows this list).
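As a small illustration of the text classification use case, here is a minimal spam/not-spam sketch using scikit-learn’s CountVectorizer and MultinomialNB. The example messages and labels are invented purely for demonstration.

```python
# Minimal text classification sketch (hypothetical messages and labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",
    "limited offer, claim your free money",
    "are we still meeting for lunch",
    "please review the attached report",
]
labels = ["spam", "spam", "not spam", "not spam"]

vectorizer = CountVectorizer()   # turns each message into word-count features
X = vectorizer.fit_transform(messages)

model = MultinomialNB()          # well suited to discrete word counts
model.fit(X, labels)

print(model.predict(vectorizer.transform(["claim your free prize"])))  # ['spam']
```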
TL;DR:
- Naive Bayes classifiers are used for univariate and multivariate classification of discrete and continuous predictors
- Bayes theorem defines the conditional probability of an event
- Naive Bayes classifiers assume conditional independence
- Naive Bayes classifiers use the Bayes theorem to find the most probable class given certain covariates
- Naive Bayes can be applied to real-time classification and text classification due to its speed and proficiency in multi-class classification.
I hope you’ve found this article interesting. Thanks for reading!
Feel free to contact me at my email: martintin@rogers.com