Difference between revisions of "Página de pruebas 3"

Latest revision as of 21:50, 10 March 2021

1 Projects portfolio
2 Data Analytics courses
3 Possible sources of data
4 What is data
- 4.1 Qualitative vs quantitative data
  - 4.1.1 Discrete and continuous data
- 4.2 Structured vs Unstructured data
- 4.3 Data Levels and Measurement
- 4.4 What is an example
- 4.5 What is a dataset
- 4.6 What is Metadata
5 What is Data Science
- 5.1 Supervised Learning
- 5.2 Unsupervised Learning
- 5.3 Reinforcement Learning
6 Some real-world examples of big data analysis
7 Statistics
8 Descriptive Data Analysis
- 8.1 Central tendency
- 8.2 Measures of Variation
- 8.3 Shape of Distribution
9 Simple and Multiple regression
- 9.1 Correlation
- 9.2 Simple Linear Regression
- 9.3 Multiple Linear Regression
- 9.4 RapidMiner Linear Regression examples
10 K-Nearest Neighbour
11 Decision Trees
- 11.1 The algorithm
  - 11.1.1 Basic explanation of the algorithm
  - 11.1.2 Algorithms addressed in Noel's Lecture
    - 11.1.2.1 The ID3 algorithm
    - 11.1.2.2 The C5.0 algorithm
- 11.2 Example in RapidMiner
12 Random Forests
13 Naive Bayes
- 13.1 Probability
- 13.2 Independent and dependent events
- 13.3 Mutually exclusive and collectively exhaustive
- 13.4 Marginal probability
- 13.5 Joint Probability
- 13.6 Conditional probability
  - 13.6.1 Kolmogorov definition of Conditional probability
  - 13.6.2 Bayes' theorem
- 13.7 Applying Bayes' Theorem
- 13.8 Naïve Bayes - Numeric Features
- 13.9 RapidMiner Examples
14 Perceptrons - Neural Networks and Support Vector Machines
15 Boosting
- 15.1 Gradient boosting
16 K-Means Clustering
- 16.1 Clustering class of the Noel course
  - 16.1.1 RapidMiner example 1
17 Principal Component Analysis (PCA)
18 Association Rules - Market Basket Analysis
- 18.1 Association Rules example in RapidMiner
19 Time Series Analysis
20 Text Analytics / Mining
21 Model Evaluation
- 21.1 Why evaluate models
- 21.2 Evaluation of regression models
- 21.3 Evaluation of classification models
- 21.4 References
22 Python for Data Science
- 22.1 NumPy and Pandas
- 22.2 Data Visualization with Python
- 22.3 Text Analytics in Python
- 22.4 Dash - Plotly
- 22.5 Scrapy
23 R
- 23.1 R tutorial
24 RapidMiner
25 Assessments
- 25.1 Diploma in Predictive Data Analytics assessment
26 Notes
27 References

Projects portfolio

Data Analytics courses

Possible sources of data

What is data

Qualitative vs quantitative data

Discrete and continuous data

Structured vs Unstructured data

Data Levels and Measurement

What is an example

What is a dataset

What is Metadata

What is Data Science

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Some real-world examples of big data analysis

Statistics

Descriptive Data Analysis

Central tendency

Mean

When not to use the mean

Median

Mode

Skewed Distributions and the Mean and Median

Summary of when to use the mean, median and mode

measures-central-tendency-mean-mode-median-faqs.php

Measures of Variation

Range

Quartile

Box Plots

Variance

Standard Deviation

Z Score

Shape of Distribution

Probability distribution

The Normal Distribution

Histograms

Skewness

Kurtosis

Visualization of measures of variation on a Normal distribution

Simple and Multiple regression

Correlation

Measuring Correlation

Pearson correlation coefficient - Pearson's r

The coefficient of determination $R^{2}$

Correlation $\neq$ Causation

Testing the "generalizability" of the correlation

Simple Linear Regression

Multiple Linear Regression

RapidMiner Linear Regression examples

K-Nearest Neighbour

Decision Trees

The algorithm

Basic explanation of the algorithm

Algorithms addressed in Noel's Lecture

The ID3 algorithm

The C5.0 algorithm

Example in RapidMiner

Random Forests

https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=4s

Naive Bayes

Probability

Independent and dependent events

Mutually exclusive and collectively exhaustive

Marginal probability

The marginal probability is the probability of a single event occurring, regardless of the outcomes of other events; given a joint distribution, it is obtained by summing the joint probabilities over all values of the other variables. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution
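To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical joint table of counts for two events, A (an email is "spam" or "ham") and B (it contains the word "offer" or not); the numbers and the names counts, marginal_a, marginal_b and conditional_a_given_b are illustrative only, not taken from any dataset. The marginal probability is obtained by summing the joint probabilities over the other event, and the conditional probability by dividing a joint probability by the marginal probability of the conditioning event, P(A | B) = P(A ∩ B) / P(B).

```python
from fractions import Fraction

# Hypothetical joint counts (illustrative numbers only):
# A = "spam" / "ham", B = "offer" / "no offer".
counts = {
    ("spam", "offer"): 20,
    ("spam", "no offer"): 10,
    ("ham", "offer"): 5,
    ("ham", "no offer"): 65,
}
total = sum(counts.values())  # 100 emails in total

# Joint probabilities P(A = a, B = b), kept as exact fractions.
joint = {key: Fraction(n, total) for key, n in counts.items()}


def marginal_a(a):
    """P(A = a): sum the joint probabilities over every value of B."""
    return sum(p for (ai, _), p in joint.items() if ai == a)


def marginal_b(b):
    """P(B = b): sum the joint probabilities over every value of A."""
    return sum(p for (_, bi), p in joint.items() if bi == b)


def conditional_a_given_b(a, b):
    """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    return joint[(a, b)] / marginal_b(b)


print(marginal_a("spam"))                      # 3/10
print(conditional_a_given_b("spam", "offer"))  # 4/5
```

Exact fractions (fractions.Fraction) are used instead of floats so that the results print as 3/10 and 4/5 rather than rounded decimals. Knowing that an email contains "offer" raises the probability of spam from the marginal 0.3 to the conditional 0.8.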

Joint Probability

Conditional probability

Kolmogorov definition of Conditional probability

Bayes' theorem

Likelihood and Marginal Likelihood

Prior Probability

Posterior Probability

Applying Bayes' Theorem

Scenario 1 - A single feature

Scenario 2 - Class-conditional independence

Scenario 3 - Laplace Estimator

Naïve Bayes - Numeric Features

RapidMiner Examples

Perceptrons - Neural Networks and Support Vector Machines

Boosting

Gradient boosting

K-Means Clustering

Clustering class of the Noel course

RapidMiner example 1

Principal Component Analysis (PCA)

Association Rules - Market Basket Analysis

Association Rules example in RapidMiner

Time Series Analysis

Text Analytics / Mining

Model Evaluation

Why evaluate models

Evaluation of regression models

Evaluation of classification models

References

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-174. DOI: 10.2307/2529310.

@@ Line 1: / Line 1: @@
-==CA - Network design for high availability==
+{{Sidebar}}
-[[:File:Network_design_for_high_availability-CA_description.pdf]]
-[[:File:Network_design_for_high_availability-PacketTracerFile.zip]]
+<html><button class="averte" onclick="aver()">aver</button></html>
+<html>
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
+<script>
+// Open this page in edit mode.
+function aver() {
+  const link = "http://wiki.sinfronteras.ws/index.php?title=P%C3%A1gina_de_pruebas_3+&+action=edit";
+  // Remove "amp;" in case "&" was escaped to "&amp;" when the wiki rendered this HTML.
+  const link2 = link.replace("amp;", "");
+  window.location = link2;
+  window.document.getElementById('firstHeading').style.color = "red";
+}
+// Once the DOM is ready, colour the page heading red, and do so again on every key-up in #totalItems / #enteredItems.
+$(document).ready( function() {
+    $('#totalItems, #enteredItems').keyup(function(){
+        window.document.getElementById('firstHeading').style.color = "red";
+    });
+    window.document.getElementById('firstHeading').style.color = "red";
+});
+</script>
+</html>
 <br />
-===Group justification report===
+==Projects portfolio==
-[[:File:Network_design_for_high_availability-GroupJustificationReport.pdf]]
-To decide on the most suitable network design for the new data center of Dublin Computer School (DCS), we considered the following points from the project description:
+<br />
+==Data Analytics courses==
+<br />
+==Possible sources of data==
+<br />
+==What is data==
+<br />
+===Qualitative vs quantitative data===
+<br />
+====Discrete and continuous data====
+<br />
+===Structured vs Unstructured data===
+<br />
+===Data Levels and Measurement===
+<br />
+===What is an example===
+<br />
+===What is a dataset===
+<br />
+===What is Metadata===
+<br />
+==What is Data Science==
+<br />
+===Supervised Learning===
+<br />
+===Unsupervised Learning===
+<br />
+===Reinforcement Learning===
+<br />
+==Some real-world examples of big data analysis==
+<br />
+==Statistics==
+<br />
+==Descriptive Data Analysis==
-*The business is home-grown in Dublin, and the organization is expanding rapidly both in Dublin and at many sites around Ireland,
+<br />
-*The growth expected by the Infrastructure Manager,
+===Central tendency===
-*The need for a new Moodle system and a CRM system for the Marketing department,
-Dublin Computer School (DCS) clearly expects significant growth over the next few years. Based on this, and after evaluating the budget, we chose an ambitious design that ensures not only the availability and reliability of the network but also its scalability. This matters all the more because the data center will serve all the sites around Ireland, where the company also expects to grow.
+<br />
+====Mean====
 <br />
-====Dublin====
+=====When not to use the mean=====
-As mentioned above, the new data center will be placed in the Dublin LAN; this network must also be designed to provide connectivity for end-user devices (wired and wireless).
-Our design is based on the concepts described in Cisco's «Campus LAN and Wireless LAN Design Guide» [\cite]. We built a hierarchical Three-Tier Design with Core, Distribution and Access layers.
+<br />
+====Median====
-At the beginning of the project, we thought a Two-Tier Design was the most suitable option, but after considering many factors, the expected growth of the network tipped the scale in favor of the Three-Tier Design (see Figure \ref).
+<br />
+====Mode====
-Our design for the Dublin network (see Figure XX) is composed of:
-*A layer 3 switch in the core.
+<br />
-*Two layer 3 distribution switches.
+====Skewed Distributions and the Mean and Median====
-*Four access switches.
 <br />
-=====VLANs=====
+====Summary of when to use the mean, median and mode====
-We created 5 VLANs:
+measures-central-tendency-mean-mode-median-faqs.php
-*VLAN10 (Student)
-*VLAN20 (Marketing)
-*VLAN30 (HR)
-*VLAN40 (Finance)
-*VLAN99 (Management)
+<br />
+===Measures of Variation===
-We applied the following settings:
-*We configured the management interface (VLAN99) on every switch with an IP address.
+<br />
-*We manually configured 802.1Q trunks between the switches.
+====Range====
-*On the access switches, we configured access ports for the end-user devices and servers and assigned each interface to the correct VLAN (see Figure XX). The server interfaces were assigned to the management VLAN99.
 <br />
-=====Rapid spanning tree between switches=====
+====Quartile====
-In our implementation, we made sure the root bridge is in a suitable position. To do so, we manually configured the bridge priority to influence the root election:
-*We placed the root bridge in the core of our design for all VLANs.
-*We placed the root secondary at the distribution layer and configured load balancing by splitting the root secondary role between the 2 distribution switches.
-<syntaxhighlight>
+<br />
-MS1(config)#spanning-tree vlan 1,10,20,30,40,99 root primary
+====Box Plots====
-MS2(config)#spanning-tree vlan 1,10,20 root secondary
-MS3(config)#spanning-tree vlan 30,40,99 root secondary
-</syntaxhighlight>
-With this configuration, RSTP prevents loops over the redundant links by blocking ports, mostly in the access layer.
-Because we load-balanced the root secondary role between the 2 distribution switches, and because we are running «Per-VLAN rapid spanning tree mode», which port is blocked depends on the VLAN. Consider '''S4''', for example: Rapid Spanning Tree blocks the '''F0/18''' port for the VLANs whose '''root secondary''' is '''MS2''', whereas for the VLANs whose '''root secondary''' is '''MS3''' it blocks the '''Fa0/14''' port. That is why all the ports are shown in green in our network: no port is blocked for all VLANs (see Figure XX).
+<br />
+====Variance====
+<br />
+====Standard Deviation====
 <br />
-=====Configuring 802.1Q trunk-based inter-VLAN routing=====
+==== Z Score ====
-No key decisions had to be made in this part; we simply configured 802.1Q trunk-based inter-VLAN routing to route traffic between our VLANs. All configured IP addresses and interfaces are listed in the Addressing table.
 <br />
-=====Wireless access for a GUEST wifi network=====
+===Shape of Distribution===
-The GUEST wifi network was configured using a wireless router attached to one of the access switches. Figure XX shows the configuration performed. We attached the wireless router to VLAN10 and created a new wifi network. A DHCP server was also enabled on the wireless router so that devices can request an IP address via DHCP (Figure XX).
-Some security configurations were also performed:
+<br />
-* We configured a passphrase for the GUEST network: duboffice2019
+====Probability distribution====
-* We enabled encryption.
 <br />
-====WAN====
+=====The Normal Distribution=====
-We created a WAN connecting a total of 5 sites: Dublin, Galway, Limerick, Cork and Sligo. The WAN IP addresses, listed in the Addressing table, belong to the 10.0.0.0 network.
-We made sure to include redundant paths between Dublin and Galway, which is the main concern of our WAN.
+<br />
+====Histograms====
-We configured the OSPF routing protocol. OSPF is widely used and has a low administrative distance (110), lower than that of IS-IS (115) or RIP (120). That is why, if multiple routing protocols such as RIP or IS-IS are configured on a router, the route learned through OSPF is preferred and used to forward packets. OSPF determines the shortest path to a destination by adding up the costs of the links along each path.
+<br />
+====Skewness====
 <br />
-====Addressing table====
+====Kurtosis====
-{| class="wikitable" style="margin: 0 auto;"
-|+
-!Site
-!Device
-!Interface
-!IP Address
-!Subnet Mask
-!Default Gateway
-!Comments
-|-
-| rowspan="28" |'''Dublin'''
-| rowspan="9" |'''R1'''
-|G0/1.1
-|172.16.1.1
-|255.255.255.0
-|
-|
-|-
-|G0/1.10
-|172.16.10.1
-|255.255.255.0
-|
-|
-|-
-|G0/1.20
-|172.16.20.1
-|255.255.255.0
-|
-|
-|-
-|G0/1.30
-|172.16.30.1
-|255.255.255.0
-|
-|
-|-
-|G0/1.40
-|172.16.40.1
-|255.255.255.0
-|
-|
-|-
-|G0/1.99
-|172.16.99.1
-|255.255.255.0
-|
-|
-|-
-|S0/0/0
-DCE
-|10.16.1.1
-|255.255.255.252
-|
-|
-|-
-|S0/0/1
-|10.16.2.1
-|255.255.255.252
-|
-|
-|-
-|S0/1/0
-DCE
-|10.16.3.1
-|255.255.255.252
-|
-|
-|-
-|'''MS1'''
-|VLAN 99
-|172.16.99.11
-|255.255.255.0
-|
-|Root primary for all VLANs
-|-
-|'''MS2'''
-|VLAN 99
-|172.16.99.12
-|255.255.255.0
-|
-|Root secondary for VLANs 1, 10, 20
-|-
-|'''MS3'''
-|VLAN 99
-|172.16.99.13
-|255.255.255.0
-|
-|Root secondary for VLANs 30, 40, 99
-|-
-|'''S1'''
-|VLAN 99
-|172.16.99.21
-|255.255.255.0
-|
-|
-|-
-|'''S2'''
-|VLAN 99
-|172.16.99.22
-|255.255.255.0
-|
-|
-|-
-|'''S3'''
-|VLAN 99
-|172.16.99.23
-|255.255.255.0
-|
-|
-|-
-|'''S4'''
-|VLAN 99
-|172.16.99.24
-|255.255.255.0
-|
-|
-|-
-|'''Server1'''
-|G0
-(vlan99)
-|172.16.99.80
-|255.255.255.0
-|172.16.99.1
-|
-|-
-|
-|G1
-(vlan99)
+<br />
-|
+====Visualization of measures of variation on a Normal distribution====
-|
-|
-|
+<br />
-|-
+==Simple and Multiple regression==
-|'''Server2'''
-|G0
-(vlan99)
+<br />
-|172.16.99.82
+===Correlation===
-|255.255.255.0
-|172.16.99.1
-|
+<br />
-|-
+====Measuring Correlation====
-|
-|G1
-(vlan99)
+<br />
-|
+=====Pearson correlation coefficient - Pearson's r=====
-|
-|
-|
+<br />
-|-
+=====The coefficient of determination <math>R^2</math>=====
-|'''PC1'''
-|NIC
-(vlan10)
+<br />
-|172.16.10.51
+====Correlation <math>\neq</math> Causation====
-|255.255.255.0
-|172.16.10.1
-|
+<br />
-|-
+====Testing the "generalizability" of the correlation ====
-|'''PC2'''
-|NIC
-(vlan20)
+<br />
-|172.16.20.52
+===Simple Linear Regression===
-|255.255.255.0
-|172.16.20.1
-|
+<br />
-|-
+===Multiple Linear Regression===
-|'''PC3'''
-|NIC
-(vlan30)
+<br />
-|172.16.30.53
+===RapidMiner Linear Regression examples===
-|255.255.255.0
-|172.16.30.1
-|
+<br />
-|-
+==K-Nearest Neighbour==
-|'''PC4'''
-|NIC
-(vlan40)
+<br />
-|172.16.40.54
+==Decision Trees==
-|255.255.255.0
-|172.16.40.1
-|
+<br />
-|-
+===The algorithm===
-| rowspan="2" |'''Wireless router0'''
-|Internet setup
-|172.16.10.101
+<br />
-|255.255.255.0
+====Basic explanation of the algorithm====
-|172.16.10.1
-|
-|-
+<br />
-|Network setup
+====Algorithms addressed in Noel's Lecture====
-|172.16.50.1
-|255.255.255.0
-|
+<br />
-|
+=====The ID3 algorithm=====
-|-
-|'''Laptop1'''
-|
+<br />
-|
+=====The C5.0 algorithm=====
-|
-|
-|
+<br />
-|-
+===Example in RapidMiner===
-|'''Laptop2'''
-|
-|
+<br />
-|
+==Random Forests==
-|
+https://www.youtube.com/watch?v=J4Wdy0Wc_xQ&t=4s
-|
-|-
-| colspan="7" | -
+<br />
-|-
+==Naive Bayes==
-| rowspan="4" |'''Limerick'''
-| rowspan="3" |'''R2'''
-|S0/0/0
+<br />
-|10.16.1.2
+===Probability===
-|255.255.255.252
-|
-|
+<br />
-|-
+===Independent and dependent events===
-|S0/0/1
-DCE
-|10.16.4.1
+<br />
-|255.255.255.252
+===Mutually exclusive and collectively exhaustive===
-|
-|
-|-
+<br />
-|G0/0
+===Marginal probability===
-|172.18.1.1
+The marginal probability is the probability of a single event occurring, regardless of the outcomes of other events; given a joint distribution, it is obtained by summing the joint probabilities over all values of the other variables. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. https://en.wikipedia.org/wiki/Marginal_distribution
-|255.255.255.0
-|
-|
+<br />
-|-
+===Joint Probability===
-|'''PC7'''
-|NIC
-|172.18.1.57
+<br />
-|255.255.255.0
+===Conditional probability===
-|172.18.1.1
-|
-|-
+<br />
-| colspan="7" | -
+====Kolmogorov definition of Conditional probability====
-|-
-| rowspan="9" |'''Galway'''
-| rowspan="3" |'''R3'''
+<br />
-|S0/0/1
+====Bayes' theorem====
-|10.16.4.2
-|255.255.255.252
-|
+<br />
-| rowspan="3" |'''Standby router in HSRP'''
+=====Likelihood and Marginal Likelihood=====
-Slow path
-|-
-|S0/1/1
+<br />
-|10.16.5.1
+=====Prior Probability=====
-|255.255.255.252
-|
-|-
+<br />
-|G0/1
+=====Posterior Probability=====
-|172.17.1.1
-|255.255.255.0
-|
+<br />
-|-
+===Applying Bayes' Theorem===
-| rowspan="2" |'''R4'''
-|S0/0/0
-DCE
+<br />
-|10.16.6.1
+====Scenario 1 - A single feature====
-|255.255.255.252
-|
-| rowspan="2" |'''Active router in HSRP'''
+<br />
-(Because the other path is slow)
+====Scenario 2 - Class-conditional independence====
-|-
-|G0/0
-|172.17.1.2
+<br />
-|255.255.255.0
+====Scenario 3 - Laplace Estimator====
-|
-|-
-|'''Switch0'''
+<br />
-|VLAN 1
+===Naïve Bayes - Numeric Features===
-|172.17.1.6
-|255.255.255.0
-|<code>172.17.1.254</code> (virtual IP for <code>HSRP</code>)  <s>172.17.1.1</s>
+<br />
-|
+===RapidMiner Examples===
-|-
-|'''Switch1'''
-|VLAN 1
+<br />
-|172.17.1.7
+==Perceptrons - Neural Networks and Support Vector Machines==
-|255.255.255.0
-|<code>172.17.1.254</code> (virtual IP for <code>HSRP</code>)  <s>172.17.1.2</s>
-|
+<br />
-|-
+==Boosting==
-|'''PC5'''
-|NIC
-|172.17.1.55
+<br />
-|255.255.255.0
+===Gradient boosting===
-|<code>172.17.1.254</code> (virtual IP for <code>HSRP</code>)   <s>172.17.1.1</s>
-|
-|-
+<br />
-|'''PC6'''
+==K-Means Clustering==
-|NIC
-|172.17.1.56
-|255.255.255.0
+<br />
-|<code>172.17.1.254</code> (virtual IP for <code>HSRP</code>)   <s>172.17.1.2</s>
+===Clustering class of the Noel course===
-|
-|-
-| colspan="7" | -
+<br />
-|-
+====RapidMiner example 1====
-| rowspan="4" |'''Cork'''
-| rowspan="3" |'''R5'''
-|S0/0/0
+<br />
-|10.16.6.2
+==Principal Component Analysis (PCA)==
-|255.255.255.252
-|
-|
+<br />
-|-
+==Association Rules - Market Basket Analysis==
-|S0/0/1
-DCE
-|10.16.2.2
+<br />
-|255.255.255.252
+===Association Rules example in RapidMiner===
-|
-|
-|-
+<br />
-|G0/0
+==Time Series Analysis==
-|172.19.1.1
-|255.255.255.0
-|
+<br />
-|
+==[[Text Analytics|Text Analytics / Mining]]==
-|-
-|'''PC8'''
-|NIC
+<br />
-|172.19.1.58
+==Model Evaluation==
-|255.255.255.0
-|172.19.1.1
-|
+<br />
-|-
+===Why evaluate models===
-| colspan="7" | -
-|-
-| rowspan="4" |'''Sligo'''
+<br />
-| rowspan="3" |'''R6'''
+===Evaluation of regression models===
-|S0/1/0
-|10.16.3.2
-|255.255.255.252
+<br />
-|
+===Evaluation of classification models===
-|
-|-
-|S0/1/1
+<br />
-DCE
+===References===
-|10.16.5.2
+Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-174. DOI: 10.2307/2529310.
-|255.255.255.252
-|
-|
+<br />
-|-
+==[[Python for Data Science]]==
-|G0/0
-|172.20.1.1
-|255.255.255.0
+<br />
-|
+===[[NumPy and Pandas]]===
-|
-|-
-|'''PC9'''
+<br />
-|NIC
+===[[Data Visualization with Python]]===
-|172.20.1.59
-|255.255.255.0
-|172.20.1.1
+<br />
-|
+===[[Text Analytics in Python]]===
-|}
+<br />
+===[[Dash - Plotly]]===
+<br />
+===[[Scrapy]]===
+<br />
+==[[R]]==
+<br />
+===[[R tutorial]]===
+<br />
+==[[RapidMiner]]==
+<br />
+==Assessments==
+<br />
+===Diploma in Predictive Data Analytics assessment===
+<br />
+==Notes==
+<br />
+==References==
+<br />