Difference between revisions of "Página de pruebas 3"

From Sinfronteras
{{Sidebar}}


<html><button class="averte" onclick="aver()">aver</button></html>


<html>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function aver() {
  // Build the edit URL and strip the HTML-encoded "amp;" left over from the wikitext.
  var link = "http://wiki.sinfronteras.ws/index.php?title=P%C3%A1gina_de_pruebas_3+&+action=edit";
  var link2 = link.replace("amp;", "");
  window.location = link2;
  // sleep() does not exist in JavaScript; schedule the style change instead.
  // (It only takes effect if the navigation above does not complete first.)
  setTimeout(function() {
    window.document.getElementById('firstHeading').style.color = "red";
  }, 2000);
}
$(document).ready(function() {
  $('#totalItems, #enteredItems').keyup(function() {
    window.document.getElementById('firstHeading').style.color = "red";
  });
  window.document.getElementById('firstHeading').style.color = "red";
});
</script>
</html>
<br />
==Projects portfolio==


<br />
==Data Analytics courses==


<br />
==Possible sources of data==


<br />
==What is data==


<br />
===Qualitative vs quantitative data===


<br />
====Discrete and continuous data====


<br />
===Structured vs Unstructured data===


<br />
===Data Levels and Measurement===


<br />
===What is an example===


<br />
===What is a dataset===


<br />
===What is Metadata===


<br />
==What is Data Science==


<br />
===Supervised Learning===


<br />
===Unsupervised Learning===


<br />
===Reinforcement Learning===


<br />
==Some real-world examples of big data analysis==


<br />
==Statistics==


<br />
==Descriptive Data Analysis==


<br />
===Central tendency===


<br />
====Mean====


<br />
=====When not to use the mean=====


<br />
====Median====


<br />
====Mode====


<br />
====Skewed Distributions and the Mean and Median====


<br />
====Summary of when to use the mean, median and mode====
measures-central-tendency-mean-mode-median-faqs.php


<br />
===Measures of Variation===


<br />
====Range====


<br />
====Quartile====


<br />
====Box Plots====


<br />
====Variance====


<br />
====Standard Deviation====


<br />
====Z Score====


<br />
===Shape of Distribution===


<br />
====Probability distribution====


<br />
=====The Normal Distribution=====


<br />
====Histograms====
 
<br />
====Skewness====


<br />
====Kurtosis====


<br />
====Visualization of measures of variation on a Normal distribution====


<br />
==Simple and Multiple regression==


<br />
===Correlation===


<br />
====Measuring Correlation====


<br />
=====Pearson correlation coefficient - Pearson's r=====


<br />
=====The coefficient of determination <math>R^2</math>=====


<br />
====Correlation <math>\neq</math> Causation====


<br />
====Testing the "generalizability" of the correlation====


<br />
===Simple Linear Regression===


<br />
===Multiple Linear Regression===


<br />
===RapidMiner Linear Regression examples===


<br />
==K-Nearest Neighbour==


<br />
==Decision Trees==


<br />
===The algorithm===


<br />
====Basic explanation of the algorithm====


<br />
====Algorithms addressed in Noel's Lecture====


<br />
=====The ID3 algorithm=====


<br />
=====The C5.0 algorithm=====


<br />
===Example in RapidMiner===


<br />
==Random Forests==
https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=4s


<br />
==Naive Bayes==


<br />
===Probability===


<br />
===Independent and dependent events===


<br />
===Mutually exclusive and collectively exhaustive===


<br />
===Marginal probability===
The marginal probability is the probability of a single event occurring, independent of other events. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution


<br />
===Joint Probability===


<br />
===Conditional probability===


<br />
====Kolmogorov definition of Conditional probability====


<br />
====Bayes's theorem====


<br />
=====Likelihood and Marginal Likelihood=====


<br />
=====Prior Probability=====


<br />
=====Posterior Probability=====


<br />
===Applying Bayes' Theorem===


<br />
====Scenario 1 - A single feature====


<br />
====Scenario 2 - Class-conditional independence====


<br />
====Scenario 3 - Laplace Estimator====


<br />
===Naïve Bayes - Numeric Features===


<br />
===RapidMiner Examples===


<br />
==Perceptrons - Neural Networks and Support Vector Machines==


<br />
==Boosting==


<br />
===Gradient boosting===


<br />
==K Means Clustering==


<br />
===Clustering class of the Noel course===


<br />
====RapidMiner example 1====


<br />
==Principal Component Analysis PCA==


<br />
==Association Rules - Market Basket Analysis==


<br />
===Association Rules example in RapidMiner===


<br />
==Time Series Analysis==


<br />
==[[Text Analytics|Text Analytics / Mining]]==


<br />
==Model Evaluation==


<br />
===Why evaluate models===


<br />
===Evaluation of regression models===

{| class="wikitable"
|+
|-
! colspan="3" style="vertical-align:top;" |Regression Error:
The evaluation of regression models involves calculations on the errors (also known as residuals or innovations).
Errors are the differences between the predicted values, represented as <math>\hat{y}</math>, and the actual values, denoted <math>y</math>.
{|
![[File:Regression_errors.png|300px|center|link=Special:FilePath/Regression_errors.png]]
!
{| class="wikitable"
!<math>y</math>
!<math>\hat{y}</math>
!<math>\left \vert y - \hat{y} \right \vert</math>
|-
|5
|6
|1
|-
|6.5
|5.5
|1
|-
|8
|9.5
|1.5
|-
|8
|6
|2
|-
|7.5
|10
|2.5
|}
|}
|-
! style="vertical-align:top;" |<h5 style="text-align:left; vertical-align:top">Mean Absolute Error - MAE</h5>
|The Mean Absolute Error (MAE) is calculated by taking the sum of the absolute differences between the actual and predicted values (i.e. the errors with the sign removed) and multiplying it by the reciprocal of the number of observations.
Note that the value returned by the equation depends on the range of the values of the dependent variable: it is '''scale dependent'''.
MAE is preferred by many as the evaluation metric of choice, as it gives equal weight to all errors irrespective of their magnitude.
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>
MAE = \frac{1}{n} \sum_{i=1}^{n} \left \vert Y_i - \hat{Y}_i \right \vert
</math>
<div class="mw-collapsible-content">
<br /><math>
MAE = \frac{1 + 1 + 1.5 + 2 + 2.5}{5} = \frac{8}{5} = 1.6
</math>
</div>
</div>
|-
! style="vertical-align:top;" |<h5 style="text-align:left; vertical-align:top">Mean Squared Error - MSE</h5>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
The Mean Squared Error (MSE) is very similar to the MAE, except that it is calculated by taking the sum of the squared differences between the actual and predicted values and multiplying it by the reciprocal of the number of observations. Note that squaring the differences also removes their sign.
<div class="mw-collapsible-content">
<br />
As with MAE, the value returned by the equation depends on the range of the values of the dependent variable. It is '''scale dependent'''.
</div>
</div>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>
MSE = \frac{1}{n} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2
</math>
<div class="mw-collapsible-content">
<br /><math>
MSE = \frac{1^2 + 1^2 + 1.5^2 + 2^2 + 2.5^2}{5} = \frac{14.5}{5} = 2.9
</math>
</div>
</div>
|-
! style="vertical-align:top;" |<h5 style="text-align:left; vertical-align:top">Root Mean Squared Error</h5>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
The Root Mean Squared Error (RMSE) is basically the same as MSE, except that it is calculated by taking the square root of the mean of the squared differences between the actual and predicted values.
<div class="mw-collapsible-content">
<br />
As with MAE and MSE, the value returned by the equation depends on the range of the values of the dependent variable. It is '''scale dependent'''.

MSE and its related metric, RMSE, have both been criticized because they give heavier weight to larger-magnitude errors (outliers). However, this property may be desirable in some circumstances, where large-magnitude errors are undesirable, even in small numbers.
</div>
</div>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 }
</math>
<div class="mw-collapsible-content">
<br /><math>
RMSE = \sqrt{2.9} \approx 1.70
</math>
</div>
</div>
|-
! style="vertical-align:top;" |<h5 style="text-align:left; vertical-align:top">Mean Absolute Percentage Error</h5>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
Mean Absolute Percentage Error (MAPE) is a '''scale-independent''' measure of the performance of a regression model. It is calculated by summing the absolute values of the errors, each divided by its actual value, multiplying by the reciprocal of the number of observations, and then multiplying by 100 to obtain a percentage.
<div class="mw-collapsible-content">
<br />
Although it offers a scale-independent measure, MAPE is not without problems:
* It cannot be used if any of the actual values is exactly zero, as this would result in a division by zero.
* Where predicted values frequently exceed the actual values, the percentage error can exceed 100%.
* It penalizes negative errors more than positive errors, meaning that models that routinely predict below the actual values will have a higher MAPE.
</div>
</div>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>
MAPE = \frac{1}{n} \sum_{i=1}^n \left \vert \frac{Y_i - \hat{Y}_i}{Y_i} \right \vert \times 100
</math>
<div class="mw-collapsible-content">
<br /><math>
MAPE = \frac{1}{5} \left( \frac{1}{5} + \frac{1}{6.5} + \frac{1.5}{8} + \frac{2}{8} + \frac{2.5}{7.5} \right) \times 100 \approx 22.49\%
</math>
</div>
</div>
|-
! style="vertical-align:top;" |<h5 style="text-align:left; vertical-align:top">R squared</h5>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>R^2</math>, or the Coefficient of Determination, is the ratio of the amount of variance explained by a model to the total amount of variance in the dependent variable, and normally lies in the range [0,1].

Values close to 1 indicate that a model will be better at predicting the dependent variable.
<div class="mw-collapsible-content">
<br />
R squared is calculated by summing up the squared differences between the predicted values and the actual values (the top part of the equation) and dividing that by the squared deviations of the actual values from their mean (the bottom part of the equation). The resulting value is then subtracted from 1.

A high <math>R^2</math> is not necessarily an indicator of a good model, as it could be the result of overfitting.
</div>
</div>
|
<div class="mw-collapsible mw-collapsed" data-expandtext="+/-" data-collapsetext="+/-">
<math>
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
= 1 - \frac{\sum_{i=1}^n(y_i - \hat{y}_i)^2}{\sum_{i=1}^n(y_i - \bar{y})^2}
</math>
</div>
|}


<br />
===Evaluation of classification models===


<br />
===References===
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-174. DOI: 10.2307/2529310.


<br />
==[[Python for Data Science]]==


<br />
===[[NumPy and Pandas]]===


<br />
===[[Data Visualization with Python]]===


<br />
===[[Text Analytics in Python]]===


<br />
===[[Dash - Plotly]]===


<br />
===[[Scrapy]]===


<br />
==[[R]]==


<br />
===[[R tutorial]]===


<br />
==[[RapidMiner]]==


<br />
==Assessments==


<br />
===Diploma in Predictive Data Analytics assessment===


<br />
==Notes==


<br />
==References==
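The worked examples in the regression-error table under Model Evaluation can be cross-checked with a short script. This is a minimal sketch, not part of the original page: the variable names are illustrative, and the five <math>(y, \hat{y})</math> pairs are taken from the table.

```python
import math

# (actual, predicted) pairs from the Regression Error table.
y     = [5, 6.5, 8, 8, 7.5]
y_hat = [6, 5.5, 9.5, 6, 10]
n = len(y)

errors = [yi - yh for yi, yh in zip(y, y_hat)]

mae  = sum(abs(e) for e in errors) / n                          # 1.6
mse  = sum(e ** 2 for e in errors) / n                          # 2.9
rmse = math.sqrt(mse)                                           # about 1.70
mape = sum(abs(e / yi) for e, yi in zip(errors, y)) / n * 100   # about 22.49

# R squared: 1 - SS_res / SS_tot
y_bar  = sum(y) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, mape, r2)
```

Note that on this toy data <math>R^2</math> comes out negative, which simply means these five predictions do worse than always predicting the mean of <math>y</math>.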

Latest revision as of 21:50, 10 March 2021


