==Naive Bayes==
Multinomial Naive Bayes: https://www.youtube.com/watch?v=O2L2Uv9pdDA
 
  
Gaussian Naive Bayes: https://www.youtube.com/watch?v=uHK1-Q8cKAw       https://www.youtube.com/watch?v=H3EjCKtlVog
  
https://www.youtube.com/watch?v=Q8l0Vip5YUw
 
  
https://www.youtube.com/watch?v=l3dZ6ZNFjo0
  
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
 
  
https://scikit-learn.org/stable/modules/naive_bayes.html
  
  
  
Noel's Lecture and Tutorial:
 
https://moodle.cct.ie/mod/scorm/player.php?a=4&currentorg=tuto&scoid=8&sesskey=wc2PiHQ6F5&display=popup&mode=normal
 
  
Note: in all the Naive Bayes examples given, the Performance operator used is Performance (Binomial Classification).
  
  
 
<br />
 
<br />
'''Naive Bayes classifiers''' are a family of "probabilistic classifiers" that apply Bayes' theorem to calculate the conditional probability of an event A given that another event B (or several other events) has occurred.
  
  
'''The Naïve Bayes algorithm''' is named as such because it makes a couple of naïve assumptions about the data. In particular, '''it assumes that all of the features in a dataset are equally important and independent''' (strong independence assumptions between the features, hence «naïve»; the features are the conditional events).
  
  
These assumptions are rarely true in most real-world applications. However, even when these assumptions are violated, Naïve Bayes still performs fairly well. This holds even in extreme circumstances where strong dependencies are found among the features.
  
  
Bayesian classifiers utilize training data to calculate an observed probability for each class based on feature values (the values of the conditional events). When such classifiers are later used on unlabeled data, they use those observed probabilities to predict the most likely class, given the features in the new data.
  
  
Due to the algorithm's versatility and accuracy across many types of conditions, Naïve Bayes is often a strong first candidate for classification learning tasks.
  
  
 
<br />
 
<br />
'''Bayesian classifiers have been used for:'''
* '''Text classification:'''
 
:* Spam filtering: It uses the frequency of the occurrence of words in past emails to identify junk email.
 
:* Author identification and topic modeling
 
  
  
* '''Weather forecast:''' The chance of rain describes the proportion of prior days with similar measurable atmospheric conditions in which precipitation occurred. A 60 percent chance of rain, therefore, suggests that in 6 out of 10 days on record where there were similar atmospheric conditions, it rained.
  
  
* Diagnosis of medical conditions, given a set of observed symptoms.
  
* Intrusion detection and anomaly detection on computer networks
 
  
  
 
<br />
 
<br />
===Probability===
The probability of an event can be estimated from observed data by dividing the number of trials in which an event occurred by the total number of trials.
 
  
  
* '''Events'''  are possible outcomes, such as a heads or tails result in a coin flip, sunny or rainy weather, or spam and not spam email messages.
  
* '''A trial''' is a single opportunity for the event to occur, such as a coin flip, a day's weather, or an email message.
 
  
  
* '''Examples:'''
 
:* If it rained 3 out of 10 days, the probability of rain can be estimated as 30 percent.
 
:* If 10 out of 50 email messages are spam, then the probability of spam can be estimated as 20 percent.
 
  
 
* '''The notation''' <math>P(A)</math> is used to denote the probability of event <math>A</math>, as in <math>P(spam) = 0.20</math>
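
A minimal Python sketch (not part of the original notes) of this counting estimate, using the rain and spam numbers from the examples above:

<syntaxhighlight lang="python">
# Estimating P(A) as (trials in which A occurred) / (total trials)
rainy_days, total_days = 3, 10
spam_emails, total_emails = 10, 50

p_rain = rainy_days / total_days      # 0.3  -> 30 percent
p_spam = spam_emails / total_emails   # 0.2  -> 20 percent
print(f"P(rain) = {p_rain}, P(spam) = {p_spam}")
</syntaxhighlight>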
  
  
 
<br />
 
<br />
  
===Independent and dependent events===
 
If two events are totally unrelated, they are called '''independent events'''. For instance, the outcome of a coin flip is independent of whether the weather is rainy or sunny.
 
  
On the other hand, a rainy day and the presence of clouds are '''dependent events'''. The presence of clouds is likely to be predictive of a rainy day. In the same way, the appearance of the word Viagra is predictive of a spam email.
If all events were independent, it would be impossible to predict any event using data about other events. Dependent events are the basis of predictive modeling.
 
  
  
 
<br />
 
<br />
===Mutually exclusive and collectively exhaustive===
<!-- Events are '''mutually exclusive''' and '''collectively exhaustive'''. -->
 
  
In probability theory and logic, a set of events is '''Mutually exclusive''' or '''disjoint''' if they cannot both occur at the same time. A clear example is the set of outcomes of a single coin toss, which can result in either heads or tails, but not both. https://en.wikipedia.org/wiki/Mutual_exclusivity
 
  
  
A set of events is '''jointly''' or '''collectively exhaustive''' if at least one of the events must occur. For example, when rolling a six-sided die, the events <math>1, 2, 3, 4, 5,\ and\ 6</math> (each consisting of a single outcome) are collectively exhaustive, because they encompass the entire range of possible outcomes. https://en.wikipedia.org/wiki/Collectively_exhaustive_events
 
  
 
If a set of events is mutually exclusive and collectively exhaustive, such as <math>heads</math> or <math>tails</math>, or <math>spam</math> and <math>ham\ (non\text{-}spam)</math>, then knowing the probability of <math>n-1</math> outcomes reveals the probability of the remaining one. In other words, if there are two outcomes and we know the probability of one, then we automatically know the probability of the other. For example, given the value <math>P(spam) = 0.20</math>, we are able to calculate <math>P(ham) = 1 - 0.20 = 0.80</math>
  
  
 
<br />
 
<br />
  
===Marginal probability===
 
The marginal probability is the probability of a single event occurring, independent of other events. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution
 
  
  
<br />
 
===Joint Probability===
 
Joint Probability (Independence)
 
  
  
For any two independent events A and B, the probability of both happening (Joint Probability) is:
 
  
  
<div style="font-size: 14pt; text-align: center; margin-left:0px">
 
<math>P(A \cap B) = P(A) \times P(B)</math>
 
</div>
 
  
  
[[File:Joint_probability1.png|400px|thumb|center|Taken from https://corporatefinanceinstitute.com/resources/knowledge/other/joint-probability/ <br /> See also: [[Mathematics#Union - Intersection - Complement]] ]]
 
  
  
Often, we are interested in monitoring several non-mutually exclusive events for the same trial. If some other events occur at the same time as the event of interest, we may be able to use them to make predictions.
 
  
  
In the case of spam detection, consider, for instance, a second event based on the outcome that the email message contains the word Viagra. This word is likely to appear in a spam message. Its presence in a message is therefore a very strong piece of evidence that the email is spam.
 
  
  
We know that <math>20%</math> of all messages were <math>Spam</math> and <math>5%</math> of all messages contain the word <math>Viagra</math>. Our job is to quantify the degree of overlap between these two probabilities. In other words, we hope to estimate the probability of both <math>Spam</math> and the word <math>Viagra</math> co-occurring, which can be written as <math>P(spam \cap Viagra)</math>.
  
  
If we assume that <math>P(spam)</math> and <math>P(Viagra)</math> are '''independent''' (note, however! that they are not independent), we could then easily calculate the probability of both events happening at the same time, which can be written as <math>P(spam \cap Viagra)</math>
  
  
Because <math>20%</math> of all messages are spam, and <math>5%</math> of all emails contain the word Viagra, we could assume that <math>5%</math> of the <math>20%</math> of spam messages contain the word <math>Viagra</math>. Thus, <math>5%</math> of the <math>20%</math> represents <math>1%</math> of all messages <math>( 0.05 \times 0.20 = 0.01 )</math>. So, <math>1%</math> of all messages are <math>Spams\ that\ contain\ the\ word\ Viagra \ \rightarrow \ P(spam \cap Viagra) = 1%</math>
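
As a quick sketch of this (incorrect, independence-assuming) calculation in Python:

<syntaxhighlight lang="python">
# Joint probability under the independence assumption: P(A and B) = P(A) * P(B)
p_spam   = 0.20   # 20 percent of all messages are spam
p_viagra = 0.05   # 5 percent of all messages contain the word "Viagra"

p_joint = p_spam * p_viagra
print(round(p_joint, 2))   # 0.01 -> 1 percent of all messages
</syntaxhighlight>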
  
  
In reality, it is far more likely that <math>P(spam)</math> and <math>P(Viagra)</math> are highly '''dependent''', which means that this calculation is incorrect. Hence the importance of the '''conditional probability'''.
  
  
 
<br />
 
<br />
  
===Conditional probability===
 
Conditional probability is a measure of the probability of an event occurring, given that another event has already occurred. If the event of interest is <math>A</math> and the event <math>B</math> is known or assumed to have occurred, "the conditional probability of <math>A</math> given <math>B</math>", or "the probability of <math>A</math> under the condition <math>B</math>", is usually written as <math>P(A|B)</math>, or sometimes <math>P_{B}(A)</math> or <math>P(A/B)</math>. https://en.wikipedia.org/wiki/Conditional_probability
 
  
 
For example, the probability that any given person has a cough on any given day may be only <math>5%</math>. But if we know or assume that the person is sick, then they are much more likely to be coughing. The conditional probability that someone who is sick is coughing might be <math>75%</math>, in which case we would have <math>P(Cough) = 5%</math> and <math>P(Cough|Sick) = 75%</math>. https://en.wikipedia.org/wiki/Conditional_probability
  
  
 
<br />
 
<br />
====Kolmogorov definition of Conditional probability====
Apparently, the most common definition is Kolmogorov's.
 
  
  
Given two events <math>A</math> and <math>B</math> from the sigma-field of a probability space, with the unconditional probability of <math>B</math> being greater than zero (i.e., <math>P(B)>0</math>), the conditional probability of <math>A</math> given <math>B</math> is defined as the quotient of the probability of the joint occurrence of events <math>A</math> and <math>B</math>, and the probability of <math>B</math>: https://en.wikipedia.org/wiki/Conditional_probability
  
  
<div style="font-size: 14pt; text-align: center; margin-left:-100px">
<math>
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
 
</math>
 
</div>
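
A small Python sketch of this definition (the 0.04 and 0.05 figures are illustrative only; they happen to match the spam/Viagra counts used later in these notes):

<syntaxhighlight lang="python">
def conditional(p_a_and_b, p_b):
    """Kolmogorov definition: P(A|B) = P(A and B) / P(B), defined only for P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a_and_b / p_b

# e.g. P(spam and Viagra) = 0.04 and P(Viagra) = 0.05  ->  P(spam|Viagra) = 0.8
print(conditional(0.04, 0.05))   # -> 0.8 (up to floating-point rounding)
</syntaxhighlight>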
 
  
  
 
<br />
 
<br />
  
====Bayes' theorem====
 
Also called Bayes' rule and Bayes' formula
 
  
  
'''Thomas Bayes (1763)''': An essay towards solving a problem in the doctrine of chances, Philosophical Transactions of the Royal Society, 370-418.
 
  
  
Bayes's Theorem provides a way of calculating the conditional probability when we know the conditional probability in the other direction.
 
  
  
It cannot be assumed that <math>P(A|B) \approx P(B|A)</math>. Now, very often we know a conditional probability in one direction, say <math>P(B|A)</math>, but we would like to know the conditional probability in the other direction, <math>P(A|B)</math>. https://web.stanford.edu/class/cs109/reader/3%20Conditional.pdf. So, we can say that Bayes' theorem provides a way of reversing conditional probabilities: how to find <math>P(A|B)</math> from <math>P(B|A)</math> and vice-versa.
 
  
  
Bayes's Theorem is stated mathematically as the following equation:
 
  
  
<div style="font-size: 14pt; text-align: center; margin-left:-100px">
 
<math>
 
P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}
 
</math>
 
</div>
 
  
  
<math>P(A \mid B)</math> can be read as the probability of event <math>A</math> given that event <math>B</math> occurred. This is known as conditional probability since the probability of <math>A</math> is dependent or '''conditional''' on the occurrence of event <math>B</math>.
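
A one-line Python sketch of this reversal (illustrative numbers only; they match Scenario 1 further down):

<syntaxhighlight lang="python">
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# P(Viagra|spam) = 0.20, P(spam) = 0.20, P(Viagra) = 0.05  ->  P(spam|Viagra) ~ 0.8
print(bayes(0.20, 0.20, 0.05))
</syntaxhighlight>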
 
  
  
'''The terms are usually called:'''
 
  
{|
| <math>\bold{P(B|A)}</math>
| Likelihood <ref name=":1" />; also called Update <ref name=":2" />
|-
| <math>\bold{P(B)}</math>
| Marginal likelihood; also called Evidence <ref name=":1" /> or Normalization constant <ref name=":2" />
|-
| <math>\bold{P(A)}</math>
| Prior probability <ref name=":1" />; also called Prior <ref name=":2" />
|-
| <math>\bold{P(A|B)}</math>
| Posterior probability <ref name=":1" />; also called Posterior <ref name=":2" />
|}
 
  
  
 
<br />
 
<br />
=====Likelihood and Marginal Likelihood=====
When we are calculating the probabilities of discrete data, like individual words in our example, and not the probability of something continuous, like weight or height, these '''Probabilities''' are also called '''Likelihoods'''. However, in some sources, you can find the use of the term '''Probability''' even when talking about discrete data. https://www.youtube.com/watch?v=O2L2Uv9pdDA
 
  
  
In our example:
* The probability that the word 'Viagra' was used in previous spam messages is called the '''Likelihood'''.
* The probability that the word 'Viagra' appeared in any email (spam or ham) is known as the '''Marginal likelihood.'''
 
  
  
 
<br />
 
<br />
 
=====Prior Probability=====
 
Suppose that you were asked to guess the probability that an incoming email was spam. Without any additional evidence (other dependent events), the most reasonable guess would be the probability that any prior message was spam (that is, 20% in the preceding example). This estimate is known as the prior probability. It is sometimes referred to as the «initial guess»
 
  
  
 
<br />
 
<br />
=====Posterior Probability=====
Now suppose that you obtained an additional piece of evidence. You are told that the incoming email contains the word <math>Viagra</math>.
 
  
By applying Bayes' theorem to the evidence, we can compute the posterior probability that measures how likely the message is to be spam.
 
  
In the case of spam classification, if the posterior probability is greater than 50 percent, the message is more likely to be <math>Spam</math> than <math>Ham</math>, and it can potentially be filtered out.
  
The following equation is the Bayes' theorem for the given evidence:
 
  
 
<div style="font-size: 14pt; text-align: center; margin-left:-150px">
<math>
 
\overbrace{ P(Spam | Viagra) }^{\bold{\color{salmon}{\text{Posterior probability}}}} = \frac{ \overbrace{ P(Viagra|Spam) }^{\bold{\color{salmon}\text{Likelihood}}} \overbrace{P(Spam)}^{\bold{\color{salmon}\text{Prior probability}}} } { \underbrace{P(Viagra) }_{\bold{\color{salmon}\text{Marginal likelihood}}} }
 
</math>
 
</div>
 
<!-- [[File:BayesTheorem-Posterior_probability.png|500px|thumb|center|]] -->
 
  
  
 
<br />
 
<br />
===Applying Bayes' Theorem===
https://stats.stackexchange.com/questions/66079/naive-bayes-classifier-gives-a-probability-greater-than-1
 
  
Let's say that we are training a Spam classifier.
 
  
We need information about the frequency of words in spam or ham (non-spam) emails. We will assume that the Naïve Bayes learner was trained by constructing a likelihood table for the appearance of these four words in 100 emails, as shown in the following table:
  
  
<div style="text-align: center; margin-left:-150px; font-size: 12pt">
{| class="wikitable" style="width: 20px; height: 20px; margin: 0 auto; border: 0px"
|+
 
|style="background:white; border: 0px"|
 
! colspan="2" |Viagra
 
! colspan="2" |Money
 
! colspan="2" |Groceries
 
! colspan="2" |Unsubscribe
 
|style="background:white; border: 0px"|
 
|-
 
|style="background:white; border: 0px"|
 
|'''Yes'''
 
|'''No'''
 
|'''Yes'''
 
|'''No'''
 
|'''Yes'''
 
|'''No'''
 
|'''Yes'''
 
|'''No'''
 
|'''Total'''
 
|- style="background: #f7a8b8" |
 
|'''Spam'''
 
|4/20
 
|16/20
 
|10/20
 
|10/20
 
|0/20
 
|20/20
 
|12/20
 
|8/20
 
|'''20'''
 
|- style="background: #92bce8" |
 
|'''Ham'''
 
|1/80
 
|79/80
 
|14/80
 
|66/80
 
|8/80
 
|72/80
 
|23/80
 
|57/80
 
|'''80'''
 
|-
 
|'''Total'''
 
|5/100
 
|95/100
 
|24/100
 
|76/100
 
|8/100
 
|92/100
 
|35/100
 
|65/100
 
|'''100'''
 
|}
 
</div>
 
  
  
As new messages are received, the posterior probability must be calculated to determine whether the messages are more likely to be spam or ham, given the likelihood of the words found in the message text.
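
For reference in the scenarios below, the likelihood table can also be written as plain Python counts (a sketch; the variable names are mine, not from the lecture):

<syntaxhighlight lang="python">
# "Yes" counts for each word, out of 20 spam and 80 ham messages (100 in total)
spam_yes = {"Viagra": 4, "Money": 10, "Groceries": 0, "Unsubscribe": 12}
ham_yes  = {"Viagra": 1, "Money": 14, "Groceries": 8, "Unsubscribe": 23}
n_spam, n_ham = 20, 80

# e.g. the likelihood P(Viagra|Spam):
print(spam_yes["Viagra"] / n_spam)   # 0.2
</syntaxhighlight>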
  
  
 
<br />
 
<br />
====Scenario 1 - A single feature====
Suppose we received a message that contains the word <math>\bold{Viagra}</math>:
  
We can define the problem as shown in the equation below, which captures the probability that a message is spam, given that the word 'Viagra' is present:
 
  
<div style="font-size: 14pt; text-align: center; margin-left:-150px">
<math>
P(Spam|Viagra) = \frac{P(Viagra|spam)P(spam)}{P(Viagra)}
 
</math>
 
</div>
 
 
 
{|
 
|
 
* <math>\bold{P(Viagra|Spam)}</math>
 
|
 
|(Likelihood)
 
|<div style="margin:  5pt">:</div>
 
|The probability that a spam message contains the term <math>Viagra</math>
 
|<div style="margin: 10pt"><math>\rightarrow</math></div>
 
|<math>4/20 = 0.20 = 20%</math>
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
|
 
* <math>\bold{P(Viagra)}</math>
 
|
 
|(Marginal likelihood)
 
|<div style="margin:  5pt">:</div>
 
|The probability that the word <math>Viagra</math> appeared in any email (spam or ham)
 
|<div style="margin: 10pt"><math>\rightarrow</math></div>
 
|<math>5/100 = 0.05 = 5%</math>
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
|
 
* <math>\bold{P(Spam)}</math>
 
|
 
|(Prior probability)
 
|<div style="margin:  5pt">:</div>
 
|The probability that an email is Spam
 
|<div style="margin: 10pt"><math>\rightarrow</math></div>
 
|<math>20/100 = 0.20 = 20%</math>
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
|
 
* <math>\bold{P(Spam|Viagra)}</math>
 
|
 
|(Posterior probability)
 
|<div style="margin:  5pt">:</div>
 
|The probability that an email is Spam given that it contains the word <math>Viagra</math>
 
|<div style="margin: 10pt"><math>\rightarrow</math></div>
 
|<math>\frac{0.2 \times 0.2}{0.05} = 0.8 = 80%</math>
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
| colspan="7" |
 
* '''The probability that a message is spam, given that it contains the word "Viagra" is <math>{\bold{80%}}</math>. Therefore, any message containing this term should be filtered.'''
 
|}
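
The same Scenario 1 calculation as a short Python sketch (counts taken from the likelihood table above):

<syntaxhighlight lang="python">
p_viagra_given_spam = 4 / 20     # likelihood
p_spam              = 20 / 100   # prior probability
p_viagra            = 5 / 100    # marginal likelihood

p_spam_given_viagra = p_viagra_given_spam * p_spam / p_viagra
print(p_spam_given_viagra)       # ~0.8 -> 80 percent
</syntaxhighlight>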
 
  
  
 
<br />
 
<br />
  
====Scenario 2 - Class-conditional independence====
 
Suppose we received a new message that contains the words <math>\bold{Viagra,\ Money}</math> and <math>\bold{Unsubscribe}</math>:
 
  
  
<div style="font-size: 14pt; text-align: center; margin-left:-150px">
 
<math>
 
P(Spam|Viagra \cap Money \cap Unsubscribe) = \frac{P(Viagra \cap Money \cap Unsubscribe | spam)P(spam)}{P(Viagra \cap Money \cap Unsubscribe)}
 
</math>
 
</div>
 
  
 
<span style="color: #007bff">For a number of reasons, this is computationally difficult to solve. As additional features are added, tremendous amounts of memory are needed to store probabilities for all of the possible intersecting events. Therefore, '''Class-conditional independence''' can be assumed to simplify the problem.</span>
  
  
 
<br />
 
<br />
'''Class-conditional independence'''
  
The work becomes much easier if we can exploit the fact that Naïve Bayes assumes independence among events. Specifically, Naïve Bayes assumes '''class-conditional independence''', which means that events are independent so long as they are conditioned on the same class value.
 
  
Assuming conditional independence allows us to simplify the equation using the probability rule for independent events <math>P(A \cap B) = P(A) \times P(B)</math>. This results in a much easier-to-compute formulation:
  
  
<div style="font-size: 11pt; text-align: left; margin-left:80px">
<math>
P(Spam|Viagra \cap Money \cap Unsubscribe) = \frac{P(Viagra|Spam) \cdot P(Money|Spam) \cdot P(Unsubscribe|Spam) \cdot P(spam)}{P(Viagra \cap Money \cap Unsubscribe)}
 
</math>
 
</div>
 
  
  
<div style="font-size: 11pt; text-align: left; margin-left:80px">
<math>
P(Non\text{-}spam|Viagra \cap Money \cap Unsubscribe) = \frac{P(Viagra|Non\text{-}spam) \cdot P(Money|Non\text{-}spam) \cdot P(Unsubscribe|Non\text{-}spam) \cdot P(Non\text{-}spam)}{P(Viagra \cap Money \cap Unsubscribe)}
 
</math>
 
</div>
 
  
  
<div style="background: #ededf2; padding: 5px">
<span style="color:#007bff; font-weight: bold"> Es <span style="color: red">EXTREMADAMENTE IMPORTANTE</span> notar que the independence assumption made in Naïve Bayes is <span style="color: red; font-weight: bold">Class-conditional</span>. This means that the words a and b appear independently, given that the message is Spam (and also, given that the message is not Spam). This is why we cannot apply this assumption to the denominator of the equation. This is, we CANNOT assume that <span style="border: 0px solid blue; padding: 5px 0px 5px 0px"><math>P(word\ a \cap word\ b) = P(word\ a)P(word\ b)</math></span> because in this case the words are not conditioned to belong to one class (Span or Non-spam). Esto no me queda del todo claro. See this post:</span> https://stats.stackexchange.com/questions/66079/naive-bayes-classifier-gives-a-probability-greater-than-1
</div>
 
  
  
So, we are not able to simplify the denominator. Therefore, what is done in Naïve Bayes is to calculate the numerator for both classes (<math>Spam</math> and <math>Non\text{-}spam</math>). Because the denominator is the same for both classes, the class with the greater numerator has the greater conditional probability and is therefore the more likely class for the given features.
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
<math>
P(Viagra|Spam) \cdot P(Money|Spam) \cdot P(Unsubscribe|Spam) \cdot P(Spam) = \frac{4}{20} \cdot \frac{10}{20} \cdot \frac{12}{20} \cdot \frac{20}{100} = 0.012
 
</math>
 
</div>
 
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
<math>
P(Viagra|Non\text{-}spam) \cdot P(Money|Non\text{-}spam) \cdot P(Unsubscribe|Non\text{-}spam) \cdot P(Non\text{-}spam) = \frac{1}{80} \cdot \frac{14}{80} \cdot \frac{23}{80} \cdot \frac{80}{100} = 0.0005
 
</math>
 
</div>
 
  
  
Because <math>0.012/0.0005 \approx 24</math>, we can say that this message is 24 times more likely to be <math>Spam</math> than <math>Ham</math>.
  
  
Finally, the probability of spam is equal to the likelihood that the message is spam divided by the likelihood that the message is either <math>Spam</math> or <math>Ham</math>:
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
<math>
\text{The probability that the message is}\ Spam\ \text{is} = \frac{0.012}{(0.012 + 0.0005)} = 0.96 = 96%
 
</math>
 
</div>
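
The whole Scenario 2 computation as a Python sketch (same counts as in the likelihood table):

<syntaxhighlight lang="python">
# Numerators for each class under class-conditional independence
spam_numerator = (4/20) * (10/20) * (12/20) * (20/100)   # ~0.012
ham_numerator  = (1/80) * (14/80) * (23/80) * (80/100)   # ~0.0005

print(round(spam_numerator / ham_numerator))                         # ~24 times more likely spam
print(round(spam_numerator / (spam_numerator + ham_numerator), 2))   # ~0.96
</syntaxhighlight>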
 
  
  
 
<br />
 
<br />
 
====Scenario 3 - Laplace Estimator====
<!-- Naïve Bayes problem -->
 
  
Suppose we received another message, this time containing the terms: <math>Viagra</math>, <math>Money</math>, <math>Groceries</math>, and <math>Unsubscribe</math>.
 
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
 
<math>
 
P(Viagra|Spam) \cdot P(Money|Spam) \cdot P(Groceries|Spam) \cdot P(Unsubscribe|Spam) \cdot P(Spam) = \frac{4}{20} \cdot \frac{10}{20} \cdot \frac{0}{20} \cdot \frac{12}{20} \cdot \frac{20}{100} = 0
 
</math>
 
</div>
 
  
  
Surely this is a misclassification, right? This problem arises when an event never occurs for one or more levels of the class. For instance, the term Groceries had never previously appeared in a spam message. Consequently, <math>P(Spam|Groceries) = 0%</math>
 
  
This <math>0%</math> value causes the posterior probability of <math>Spam</math> to be zero, giving the presence of the word <math>Groceries</math> the ability to effectively nullify and overrule all of the other evidence.
  
Even if the email was otherwise overwhelmingly expected to be spam, the zero likelihood for the word <math>Groceries</math> will always result in a probability of <math>spam</math> being zero.
 
  
  
A solution to this problem involves using the '''Laplace estimator'''
 
  
  
The '''Laplace estimator''', named after the French mathematician Pierre-Simon Laplace, essentially adds a small number to each of the counts in the frequency table, which ensures that each feature has a nonzero probability of occurring with each class.
 
  
Typically, the Laplace estimator is set to 1, which ensures that each class-feature combination is found in the data at least once. The Laplace estimator can be set to any value and does not necessarily even have to be the same for each of the features.
  
Using a value of 1 for the Laplace estimator, we add one to each numerator in the likelihood function. The sum of all the 1s added to the numerator must then be added to each denominator. The likelihood of <math>Spam</math> is therefore:
 
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
 
<math>
 
P(Viagra|Spam) \cdot P(Money|Spam) \cdot P(Groceries|Spam) \cdot P(Unsubscribe|Spam) \cdot P(Spam) = \frac{5}{20} \cdot \frac{11}{20} \cdot \frac{1}{20} \cdot \frac{13}{20} \cdot \frac{20}{100} = 0.0009
 
</math>
 
</div>
 
  
  
While the likelihood of ham is:
 
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
 
<math>
 
P(Viagra|Non\text{-}spam) \cdot P(Money|Non\text{-}spam) \cdot P(Groceries|Non\text{-}spam) \cdot P(Unsubscribe|Non\text{-}spam) \cdot P(Non\text{-}spam) = \frac{2}{80} \cdot \frac{15}{80} \cdot \frac{9}{80} \cdot \frac{24}{80} \cdot \frac{80}{100} = 0.0001
 
</math>
 
</div>
 
  
  
This means that the probability of spam is about <math>90%</math> and the probability of ham is about <math>10%</math>; a more plausible result than the one obtained when Groceries alone determined the result:
 
  
  
<div style="font-size: 10pt; text-align: left; margin-left:80px">
 
<math>
 
\text{The probability that the message is}\ Spam\ \text{is} = \frac{0.0009}{(0.0009 + 0.0001)} = 0.9 = 90%
 
</math>
 
</div>
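
A Python sketch of the smoothed calculation (add-one counts, with the denominators kept at 20 and 80 as in the formulas above):

<syntaxhighlight lang="python">
# Laplace-smoothed numerators (each "Yes" count increased by 1)
spam_numerator = (5/20) * (11/20) * (1/20) * (13/20) * (20/100)   # ~0.0009
ham_numerator  = (2/80) * (15/80) * (9/80) * (24/80) * (80/100)   # ~0.0001

print(round(spam_numerator, 4), round(ham_numerator, 4))
# With the rounded figures, as in the text: 0.0009 / (0.0009 + 0.0001) = 0.9
</syntaxhighlight>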
 
  
  
<div class="mw-collapsible mw-collapsed" style="width:100%; background: #ededf2; padding: 1px 5px 1px 5px">
 
''' The presentation shows this example this way. I think there are mistakes in this presentation: '''
 
<div class="mw-collapsible-content">
 
* Let's extend our spam filter by adding a few additional terms to be monitored: "money", "groceries", and "unsubscribe".
 
* We will assume that the Naïve Bayes learner was trained by constructing a likelihood table for the appearance of these four words in 100 emails, as shown in the following table:
 
  
  
[[File:ApplyingBayesTheorem-Example.png|800px|thumb|center|]]
 
  
  
As new messages are received, the posterior probability must be calculated to determine whether the messages are more likely to be spam or ham, given the likelihood of the words found in the message text.
 
  
  
We can define the problem as shown in the equation below, which captures the probability that a message is spam, given that the words 'Viagra' and Unsubscribe are present and that the words 'Money' and  'Groceries' are not.
 
  
  
[[File:ApplyingBayesTheorem-ClassConditionalIndependance.png|800px|thumb|center|]]
 
  
Using the values in the likelihood table, we can start filling numbers into these equations. Because the denominator is the same in both cases, it can be ignored for now. The overall likelihood of spam is then:
  
  
<math>
\frac{4}{20} \cdot \frac{10}{20} \cdot \frac{20}{20} \cdot \frac{12}{20} \cdot \frac{20}{100} = 0.012
</math>
  
  
While the likelihood of ham given the occurrence of these words is:
  
  
<math>
\frac{1}{80} \cdot \frac{60}{80} \cdot \frac{72}{80} \cdot \frac{23}{80} \cdot \frac{80}{100} = 0.002
</math>
 
  
  
Because 0.012/0.002 = 6, we can say that this message is six times more likely to be spam than ham. However, to convert these numbers to probabilities, we need one last step.
  
  
The probability of spam is equal to the likelihood that the message is  spam divided by the likelihood that the message is either spam or  ham:
  
  
<math>
\frac{0.012}{(0.012 + 0.002)} = 0.857
</math>
 
  
  
The probability that the message is spam is 0.857. As this is over the threshold of 0.5, the message is classified as spam.
</div>
</div>
 
  
  
 
<br />
 
<br />
  
===Naïve Bayes -  Numeric Features===
 
Because Naïve Bayes uses frequency tables for learning the data, each feature must be categorical in order to create the combinations of class and feature values comprising the matrix.
 
  
Since numeric features do not have categories of values, the preceding algorithm does not work directly with numeric data.
  
One easy and effective solution is to discretize numeric features, which simply means that the numbers are put into categories known as bins. For this reason, discretization is also sometimes called '''binning'''.
 
  
This method is ideal when there are large amounts of training data, a common condition when working with Naïve Bayes.
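
A minimal sketch of binning a numeric feature with NumPy (the ages and cut points are made up for illustration):

<syntaxhighlight lang="python">
import numpy as np

ages = np.array([22, 35, 58, 41, 19, 67, 30])
bin_edges = [30, 50]                    # three bins: <30, 30-49, >=50
binned = np.digitize(ages, bin_edges)   # bin index for each value
print(binned)                           # [0 1 2 1 0 2 1]
</syntaxhighlight>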
  
There is also a version of Naïve Bayes that uses a '''kernel density estimator''' that can be used on numeric features with a normal distribution.
 
  
 
[[File:NaiveBayes-NumericFeatures.mp4|700px|thumb|center|]]
  
  
 
<br />
 
<br />
 
===RapidMiner Examples===
 
 
 
* '''Example 1:'''
 
:* [[File:NaiveBayes-RapidMiner_Example1.zip]]
 
  
  
 
<br />
 
<br />
* '''Example 2:'''
[[File:NaiveBayes-RapidMiner_Example2_1.png|950px|thumb|center|Download the directory including the data, video explanation and RapidMiner process file at [[File:NaiveBayes-RapidMiner_Example2.zip]] ]]
 
  
  
 
<br />
 
<br />
* '''Example 3:'''
:* [[File:NaiveBayes-RapidMiner_Example3.zip]]
 
  
  
 
<br />
 
<br />
