Difference between revisions of "Supervised Machine Learning for Fake News Detection"

Revision as of 14:22, 15 June 2019

https://github.com/adeloaleman/gofaaas

https://github.com/ShiMarinho/MachineLearning-FakeNews

Declaration

Acknowledgement

Thanks for Muhammad, Graham and Mark

Abstract

Introduction

Chapter 1 Chapter 2 - Training a Supervised Machine Learning Model for fake news detection

Procedure

The Dataset:

Splitting the data into Train and Test data:

Cleaning the data:

Building the Document-Term Matrix:

Model Building:

Cross validation:

Making predictions from the model created and displaying a Confusion matrix:

Results

Summary of Results

Algorithms	Author	Package	Keyword	Accuracy
				Kaggle Fake News Dataset 1		Kaggle Fake News Dataset 2		Gofaaas Fake News Dataset		Using the KFN1* Model to make predictions over the Gofaaas Dataset	Using the KFN2** Model to make predictions over the Gofaaas Dataset
				Train:70% Test: 30%	Cross validation	Train:70% Test: 30%	Cross validation	Train:70% Test: 30%	Cross validation
Naive Bayes	Bayes, Thomas	e1071	NB	${\color {blue}76.41\%}$
Support vector machine	Meyer et al., 2012	We used RTextTools, which depends on e1071	SVM	${\color {blue}95.42\%}$
Random forest	Liawand Wiener, 2002	randomForest	RF
Extreme Gradient Boosting	Chen & Guestrin, 2016	xgboost	XGBOOST	${\color {red}97.42\%}$
General linearized models	Friedman et al., 2010	We used RTextTools, which depends on wglmnet	GLMNET	${\color {blue}94.58\%}$
Maximum entropy	Jurka, 2012	We used RTextTools, which depends on maxent	MAXENT	${\color {blue}96.09\%}$
* KFN1 (Kaggle Fake News Model 1): Model created using 70% (train data) of the Kaggle Fake News Dataset 1. ** KFN2 (Kaggle Fake News Model 2): Model created using 70% (train data) of the Kaggle Fake News Dataset 2.

Evaluation of Results

We evaluate our approach in different settings. First, weperform cross-validation on our noisy training set; second,and more importantly, we train models on the training setand validate them against a manually created gold standard.17Moreover, we evaluate two variants, i.e., including and exclud-ing user features. [smb:home/adelo/1-system/1-disco_local/1-mis_archivos/1-pe/1-ciencia/1-computacion/2-data_analysis-machine_learning/gofaaaz-machine_learning/5-References/7-Weakly_supervised_searning_for_fake_news_detection_on_twitter.pdf]

Datasets used

Kaggle Fake News Dataset 1

https://www.kaggle.com/c/fake-news/data

Distribution of the data:

The distribution of Stance classes in train_stances.csv is as follows:

rows	unrelated	discuss	agree	disagree
49972	0.73131	0.17828	0.0736012	0.0168094

Kaggle Fake News Dataset 2

https://www.kaggle.com/jruvika/fake-news-detection

Gofaaas Fake News Dataset

Algorithms

Naive Bayes

Naïve Bayes is based on the Bayesian theorem, there in order to understand Naïve Bayes it is important to first understand the Bayesian theorem.

Bayesian theorem is a mathematical formula for determining conditional probability which is the probability of something, happening given that something else has already occurred.

P(c|x) is the posterior probability of class (target) given predictor (attribute).
P(c) is the prior probability of class.
P(x|c) is the likelihood which is the probability of predictor given class.
P(x) is the prior probability of predictor.

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected.

Posterior probability is the revised probability of an event occurring after taking into consideration new information.

In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.

Revision as of 14:21, 15 June 2019 (view source) Adelo Vieira (talk \| contribs) ← Older edit		Revision as of 14:22, 15 June 2019 (view source) Adelo Vieira (talk \| contribs) Newer edit →
Line 1:		Line 1:
−	<div style="position:fixed;top:0;right:0">__TOC__</div>	+	<div style="position:fixed;top:20;right:20">__TOC__</div>

	https://github.com/adeloaleman/gofaaas		https://github.com/adeloaleman/gofaaas

Difference between revisions of "Supervised Machine Learning for Fake News Detection"

Revision as of 14:22, 15 June 2019

Contents

Declaration

Acknowledgement

Abstract

Introduction

Chapter 1

Chapter 2 - Training a Supervised Machine Learning Model for fake news detection

Procedure

Results

Summary of Results

Evaluation of Results

Datasets used

Kaggle Fake News Dataset 1

Kaggle Fake News Dataset 2

Gofaaas Fake News Dataset

Algorithms

Naive Bayes

Support vector machine

Random forest

Extreme Gradient Boosting

The XGBoost R package

The RTextTools package

Chapter 3 - The Gofaaas-Fake News Detector R Package

Installation

Functions

Chapter 4 - Gofaas Web App

Conclusion

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Tools