Difference between revisions of "Página de pruebas 3"

Latest revision as of 21:50, 10 March 2021


1 Projects portfolio
2 Data Analytics courses
3 Possible sources of data
4 What is data
- 4.1 Qualitative vs quantitative data
  - 4.1.1 Discrete and continuous data
- 4.2 Structured vs Unstructured data
- 4.3 Data Levels and Measurement
- 4.4 What is an example
- 4.5 What is a dataset
- 4.6 What is Metadata
5 What is Data Science
- 5.1 Supervised Learning
- 5.2 Unsupervised Learning
- 5.3 Reinforcement Learning
6 Some real-world examples of big data analysis
7 Statistics
8 Descriptive Data Analysis
- 8.1 Central tendency
- 8.2 Measures of Variation
- 8.3 Shape of Distribution
9 Simple and Multiple regression
- 9.1 Correlation
- 9.2 Simple Linear Regression
- 9.3 Multiple Linear Regression
- 9.4 RapidMiner Linear Regression examples
10 K-Nearest Neighbour
11 Decision Trees
- 11.1 The algorithm
  - 11.1.1 Basic explanation of the algorithm
  - 11.1.2 Algorithms addressed in Noel's Lecture
    - 11.1.2.1 The ID3 algorithm
    - 11.1.2.2 The C5.0 algorithm
- 11.2 Example in RapidMiner
12 Random Forests
13 Naive Bayes
- 13.1 Probability
- 13.2 Independent and dependent events
- 13.3 Mutually exclusive and collectively exhaustive
- 13.4 Marginal probability
- 13.5 Joint Probability
- 13.6 Conditional probability
  - 13.6.1 Kolmogorov definition of Conditional probability
  - 13.6.2 Bayes' theorem
- 13.7 Applying Bayes' Theorem
- 13.8 Naïve Bayes - Numeric Features
- 13.9 RapidMiner Examples
14 Perceptrons - Neural Networks and Support Vector Machines
15 Boosting
- 15.1 Gradient boosting
16 K Means Clustering
- 16.1 Clustering class of Noel's course
  - 16.1.1 RapidMiner example 1
17 Principal Component Analysis PCA
18 Association Rules - Market Basket Analysis
- 18.1 Association Rules example in RapidMiner
19 Time Series Analysis
20 Text Analytics / Mining
21 Model Evaluation
- 21.1 Why evaluate models
- 21.2 Evaluation of regression models
- 21.3 Evaluation of classification models
- 21.4 References
22 Python for Data Science
- 22.1 NumPy and Pandas
- 22.2 Data Visualization with Python
- 22.3 Text Analytics in Python
- 22.4 Dash - Plotly
- 22.5 Scrapy
23 R
- 23.1 R tutorial
24 RapidMiner
25 Assessments
- 25.1 Diploma in Predictive Data Analytics assessment
26 Notes
27 References

Projects portfolio

Data Analytics courses

Possible sources of data

What is data

Qualitative vs quantitative data

Discrete and continuous data

Structured vs Unstructured data

Data Levels and Measurement

What is an example

What is a dataset

What is Metadata

What is Data Science

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Some real-world examples of big data analysis

Statistics

Descriptive Data Analysis

Central tendency

Mean

When not to use the mean

Median

Mode

Skewed Distributions and the Mean and Median

Summary of when to use the mean, median and mode

measures-central-tendency-mean-mode-median-faqs.php

Measures of Variation

Range

Quartile

Box Plots

Variance

Standard Deviation

Z Score

Shape of Distribution

Probability distribution

The Normal Distribution

Histograms

Skewness

Kurtosis

Visualization of measure of variations on a Normal distribution

Simple and Multiple regression

Correlation

Measuring Correlation

Pearson correlation coefficient - Pearson's r

The coefficient of determination $R^{2}$

Correlation $\neq$ Causation

Testing the "generalizability" of the correlation

Simple Linear Regression

Multiple Linear Regression

RapidMiner Linear Regression examples

K-Nearest Neighbour

Decision Trees

The algorithm

Basic explanation of the algorithm

Algorithms addressed in Noel's Lecture

The ID3 algorithm

The C5.0 algorithm

Example in RapidMiner

Random Forests

https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=4s

Naive Bayes

Probability

Independent and dependent events

Mutually exclusive and collectively exhaustive

Marginal probability

The marginal probability is the probability of a single event occurring, regardless of the outcomes of other events; it is obtained by summing the joint probabilities over all values of the other variables. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution
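The following minimal Python sketch illustrates the distinction above. It computes a marginal probability by summing a joint probability table over the other variable. It computes a conditional probability by dividing a joint probability by a marginal one. The joint table, its outcome labels (rain/dry, umbrella/no_umbrella) and the helper functions `marginal` and `conditional` are hypothetical and were chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution P(Weather, Umbrella) over two binary variables
# (illustrative numbers only, not taken from any dataset); the four entries sum to 1.
joint = {
    ("rain", "umbrella"): Fraction(3, 10),
    ("rain", "no_umbrella"): Fraction(1, 10),
    ("dry", "umbrella"): Fraction(1, 10),
    ("dry", "no_umbrella"): Fraction(5, 10),
}

def marginal(joint, index, value):
    """Marginal probability of variable `index` taking `value`:
    sum the joint probabilities over every value of the other variable."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def conditional(joint, a, b):
    """Conditional probability P(Weather = a | Umbrella = b) = P(a, b) / P(b)."""
    return joint[(a, b)] / marginal(joint, 1, b)

p_rain = marginal(joint, 0, "rain")                              # 3/10 + 1/10
p_rain_given_umbrella = conditional(joint, "rain", "umbrella")   # (3/10) / (4/10)
print(p_rain, p_rain_given_umbrella)  # prints 2/5 3/4
```

Exact `Fraction` arithmetic is used so that the probabilities add up to exactly 1 without floating-point rounding.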

Joint Probability

Conditional probability

Kolmogorov definition of Conditional probability

Bayes' theorem

Likelihood and Marginal Likelihood

Prior Probability

Posterior Probability

Applying Bayes' Theorem

Scenario 1 - A single feature

Scenario 2 - Class-conditional independence

Scenario 3 - Laplace Estimator

Naïve Bayes - Numeric Features

RapidMiner Examples

Perceptrons - Neural Networks and Support Vector Machines

Boosting

Gradient boosting

K Means Clustering

Clustering class of Noel's course

RapidMiner example 1

Principal Component Analysis PCA

Association Rules - Market Basket Analysis

Association Rules example in RapidMiner

Time Series Analysis

Text Analytics / Mining

Model Evaluation

Why evaluate models

Evaluation of regression models

Evaluation of classification models

References

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-174. DOI: 10.2307/2529310.

Working on - Job search	Another	Another
Interview Classic questions CV_-_Skills_and_Qualifications_1 Could improve the CGGVeritas experience and the IDG experience (adding something related to Data or the deputy Team leader) Could also improve the profile description a bit CV Data science All the Python for data science .ipynb Descriptive Data Analysis Second example of Kurtosis Data Science Programming StockMarketSimulator-Java1 code StockMarketSimulator-Python code Web development example1 code ---- ZooManagementSystem Class diagram ZooManagementSystem-Java1 code ZooManagementSystem-Python1 code Object-Oriented_Concepts_and_Constructs JavaScript & React Database BookDB implementation example (.ipynb) ---- LaboratoryDB design and implementation example Databases

@@ Line 1: / Line 1: @@
-This is the dashboard we have just built to analyze laptops' data from Amazon.
+{{Sidebar}}
-It is currently displaying a dataset that includes laptops of different brands and series. Remember that we built this dataset by scraping data from Amazon, but the module intended to load a new dataset is not ready yet; we are currently working on it. When this module is ready, the application will be able to scrape data from Amazon, from a page like this, in real time.
+<html><button class="averte" onclick="aver()">aver</button></html>
-The other page we are currently working on is the Sentiment analysis page. It is also a very important topic for the application, but it is not ready yet.
+<html>
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
+<script>
+function aver() {
+  // Edit URL of this page; "amp;" is stripped in case MediaWiki escaped "&" as "&amp;"
+  var link = "http://wiki.sinfronteras.ws/index.php?title=P%C3%A1gina_de_pruebas_3+&+action=edit";
+  var link2 = link.replace("amp;", "");
+  // Navigate to the edit page; anything run after this line acts on the page being replaced
+  window.location = link2;
+}
+$(document).ready( function() {
+    $('#totalItems, #enteredItems').keyup(function(){
+        window.document.getElementById('firstHeading').style.color = "red"
+    });
+    window.document.getElementById('firstHeading').style.color = "red"
+});
+</script>
+</html>
-So before showing the main features of the application, I wanted to show you how we are scraping the data from Amazon.
+<br />
-So this is the script we built with a Python framework called Scrapy. I'm going to run it: here we are scraping the data from Amazon. It is quite fast; here we are just scraping about 35 laptops.
+==Projects portfolio==
-It has created this file; let's look at it. It is a JSON file with the details of 35 computers, so we can see all the details of the laptops: the technical details and the reviews. When this module is ready, it will run the script we have just seen and scrape the data from Amazon in real time.
-So let's talk about the home page. The home page has been designed to let the user discover and visualize the data. You can customize the data you want to display by selecting the brand, the series, and the price range.
+<br />
+==Data Analytics courses==
-So let's say that you want to analyze all the brands at the same time. When you select a brand, it automatically selects all the series for that brand, but then you can filter the ones you want. It takes a moment because it is processing the data.
-Or maybe we are interested in expensive laptops. Let's select, for example, computers over 1000 dollars. You can see that the application now only offers computers in this price range; there aren't many, actually. You can see that they are gaming laptops, which are usually expensive.
+<br />
+==Possible sources of data==
-But in other cases it is better to analyze only one brand. Let's analyze Acer computers, for example.
-The first chart we have included compares average customer reviews and prices. You can see that the blue bars show the values for all items, that is, for all the series of the brand,
+<br />
-but we have also included a red bar that displays the values for the selected items only. For example, if you want to know the price of a specific computer, let's select this one... You can see that it is a very expensive one, $1088, and it actually has a very good customer review score of 4.3, so it is apparently a very good computer.
+==What is data==
-The second panel we have included is a bubble chart that shows average customer reviews vs. prices.
-We have included this chart because one of the main features that can be analyzed when talking about sales is the relationship between price and customer satisfaction. With this kind of chart, we try to identify a trend that relates price to customer reviews.
+<br />
+===Qualitative vs quantitative data===
-One nice feature of these charts is that you can select the brands you want to visualize. If you click on a brand, it is excluded from the chart, but if you double-click it, only that brand is shown.
-The other panel we have included visualizes the most frequent words in customer reviews. Word clouds provide a nice visualization of the most frequent words, but if you need to be more precise, you can use the word count chart, which gives the exact number of times each word has been mentioned.
+<br />
+====Discrete and continuous data====
-So let's analyse, for example, the information provided by the word cloud. We can see that some of the most frequent words in customer reviews are:
-Words like good or great indicate that users have liked this computer, but we already knew that from its average customer review score of 4.3.
+<br />
+===Structured vs Unstructured data===
-But one piece of information we didn't know, and that the word cloud provides through words like gaming or game, is that this is a gaming laptop.
-Finally, we find the word “Screen”, which is probably the word that provides the most important information in this word cloud. We can see that users are talking about the screen of this laptop, but we cannot be sure whether they are saying something good or bad about it. We can infer that it is something good based on the high customer review score, or on other words in the word cloud that convey a positive sentiment, like great and good, but in the end we cannot be sure what customers are saying about the screen. This is why other analyses, like sentiment analysis, which is the topic we are currently working on, can bring more information.
+<br />
+===Data Levels and Measurement===
+<br />
+===What is an example===
+<br />
+===What is a dataset===
+<br />
+===What is Metadata===
+<br />
+==What is Data Science==
+<br />
+===Supervised Learning===
+<br />
+===Unsupervised Learning===
+<br />
+===Reinforcement Learning===
+<br />
+==Some real-world examples of big data analysis==
+<br />
+==Statistics==
+<br />
+==Descriptive Data Analysis==
+<br />
+===Central tendency===
+<br />
+====Mean====
+<br />
+=====When not to use the mean=====
+<br />
+====Median====
+<br />
+====Mode====
+<br />
+====Skewed Distributions and the Mean and Median====
+<br />
+====Summary of when to use the mean, median and mode====
+measures-central-tendency-mean-mode-median-faqs.php
+<br />
+===Measures of Variation===
+<br />
+====Range====
+<br />
+====Quartile====
+<br />
+====Box Plots====
+<br />
+====Variance====
+<br />
+====Standard Deviation====
+<br />
+==== Z Score ====
+<br />
+===Shape of Distribution===
+<br />
+====Probability distribution====
+<br />
+=====The Normal Distribution=====
+<br />
+====Histograms====
+<br />
+====Skewness====
+<br />
+====Kurtosis====
+<br />
+====Visualization of measure of variations on a Normal distribution====
+<br />
+==Simple and Multiple regression==
+<br />
+===Correlation===
+<br />
+====Measuring Correlation====
+<br />
+=====Pearson correlation coefficient - Pearson's r=====
+<br />
+=====The coefficient of determination <math>R^2</math>=====
+<br />
+====Correlation <math>\neq</math> Causation====
+<br />
+====Testing the "generalizability" of the correlation ====
+<br />
+===Simple Linear Regression===
+<br />
+===Multiple Linear Regression===
+<br />
+===RapidMiner Linear Regression examples===
+<br />
+==K-Nearest Neighbour==
+<br />
+==Decision Trees==
+<br />
+===The algorithm===
+<br />
+====Basic explanation of the algorithm====
+<br />
+====Algorithms addressed in Noel's Lecture====
+<br />
+=====The ID3 algorithm=====
+<br />
+=====The C5.0 algorithm=====
+<br />
+===Example in RapidMiner===
+<br />
+==Random Forests==
+https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=4s
+<br />
+==Naive Bayes==
+<br />
+===Probability===
+<br />
+===Independent and dependent events===
+<br />
+===Mutually exclusive and collectively exhaustive===
+<br />
+===Marginal probability===
+The marginal probability is the probability of a single event occurring, regardless of the outcomes of other events; it is obtained by summing the joint probabilities over all values of the other variables. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution
+<br />
+===Joint Probability===
+<br />
+===Conditional probability===
+<br />
+====Kolmogorov definition of Conditional probability====
+<br />
+====Bayes' theorem====
+<br />
+=====Likelihood and Marginal Likelihood=====
+<br />
+=====Prior Probability=====
+<br />
+=====Posterior Probability=====
+<br />
+===Applying Bayes' Theorem===
+<br />
+====Scenario 1 - A single feature====
+<br />
+====Scenario 2 - Class-conditional independence====
+<br />
+====Scenario 3 - Laplace Estimator====
+<br />
+===Naïve Bayes - Numeric Features===
+<br />
+===RapidMiner Examples===
+<br />
+==Perceptrons - Neural Networks and Support Vector Machines==
+<br />
+==Boosting==
+<br />
+===Gradient boosting===
+<br />
+==K Means Clustering==
+<br />
+===Clustering class of Noel's course===
+<br />
+====RapidMiner example 1====
+<br />
+==Principal Component Analysis PCA==
+<br />
+==Association Rules - Market Basket Analysis==
+<br />
+===Association Rules example in RapidMiner===
+<br />
+==Time Series Analysis==
+<br />
+==[[Text Analytics|Text Analytics / Mining]]==
+<br />
+==Model Evaluation==
+<br />
+===Why evaluate models===
+<br />
+===Evaluation of regression models===
+<br />
+===Evaluation of classification models===
+<br />
+===References===
+Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-174. DOI: 10.2307/2529310.
+<br />
+==[[Python for Data Science]]==
+<br />
+===[[NumPy and Pandas]]===
+<br />
+===[[Data Visualization with Python]]===
+<br />
+===[[Text Analytics in Python]]===
+<br />
+===[[Dash - Plotly]]===
+<br />
+===[[Scrapy]]===
+<br />
+==[[R]]==
+<br />
+===[[R tutorial]]===
+<br />
+==[[RapidMiner]]==
+<br />
+==Assessments==
+<br />
+===Diploma in Predictive Data Analytics assessment===
+<br />
+==Notes==
+<br />
+==References==
+<br />
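The dashboard notes in this diff describe a bubble chart for finding a trend between price and average customer review. One way to put a number on such a linear trend is Pearson's r. For a simple least-squares linear regression with an intercept, its square equals the coefficient of determination R². The minimal Python sketch below makes several assumptions. The prices, the review scores and the `pearson_r` helper are hypothetical illustrations, not the dashboard's data or code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from the mean of x
    dy = [y - mean_y for y in ys]  # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))  # joint variation of x and y
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one of the sequences is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical prices (USD) and average review scores for five laptops; illustrative only.
prices = [400, 600, 800, 1000, 1200]
avg_reviews = [3.8, 4.1, 3.9, 4.4, 4.3]

r = pearson_r(prices, avg_reviews)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # prints r = 0.81, r^2 = 0.65
```

Even a high r describes a linear association only; it does not show that a higher price causes higher customer satisfaction.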
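The dashboard notes in this diff also mention a word count chart that gives the exact number of times each word appears in customer reviews. The minimal Python sketch below shows that counting step. It assumes simple lower-casing, a regular-expression tokenizer and a tiny stop-word list. The reviews, `STOP_WORDS` and `word_counts` are hypothetical and are not the dashboard's actual code.

```python
import re
from collections import Counter

# Hypothetical customer reviews (illustrative text, not scraped data).
reviews = [
    "Great laptop, great screen for gaming.",
    "Good price and the screen is bright.",
    "Good for games, battery could be better.",
]

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "for", "is", "be", "could", "a"}

def word_counts(texts, stop_words=STOP_WORDS):
    """Count how often each word appears across all texts
    (lower-cased, punctuation dropped, stop words skipped)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

counts = word_counts(reviews)
print(counts.most_common(3))
```

Without stemming or lemmatization, related forms such as "gaming" and "games" are counted as separate words.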