Difference between revisions of "Data Science"

From Sinfronteras
{{Sidebar}}
<accesscontrol>
 
Autoconfirmed users
 
</accesscontrol>
 
 
 
==Projects portfolio==
 
<div style="margin-left: 20px; width: 550pt; margin-top: 50px !important">
 
<ul>
 
{{#lst:Mis páginas|portfolio_data_science}}
 
</ul>
 
</div>
 
 
 
 
 
<br />
 
 
 
==Data Analytics courses==
 
Data Science courses
 
 
 
 
 
* Posts
 
:* '''Top 50 Machine Learning interview questions:''' https://www.linkedin.com/posts/mariocaicedo_machine-learning-interviews-activity-6573658058562555904-CzeV
 
:* https://www.linkedin.com/feed/update/urn:li:ugcPost:6547849699011977216/
 
 
 
 
 
* Udemy: https://www.udemy.com/
 
:* Python for Data Science and Machine Learning Bootcamp - Basic level
 
:: https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
 
 
 
:* Machine Learning, Data Science and Deep Learning with Python - Basic level - Similar to the previous one
 
:: https://www.udemy.com/course/data-science-and-machine-learning-with-python-hands-on/
 
 
 
:* Data Science: Supervised Machine Learning in Python - Higher level
 
:: https://www.udemy.com/course/data-science-supervised-machine-learning-in-python/
 
 
 
:*  Mathematical Foundation For Machine Learning and AI
 
:: https://www.udemy.com/course/mathematical-foundation-for-machine-learning-and-ai/
 
 
 
:*  The Data Science Course 2019: Complete Data Science Bootcamp
 
:: https://www.udemy.com/course/the-data-science-course-complete-data-science-bootcamp/
 
 
 
 
 
* Coursera - By Stanford University
 
:* https://www.coursera.org/learn/machine-learning/home/welcome
 
 
 
 
 
* Udacity: https://eu.udacity.com/
 
 
 
 
 
* Columbia University - COURSE FEES USD 1,400
 
:* https://www2.emeritus.org/programs/applied-machine-learning-fb?utm_source=Facebook&utm_medium=Other+Countries&utm_campaign=B-5551_WW_FB_INT_AML_JUNE_19_Interest-06-06&utm_content=square+1&utm_term=23843404626490040&fbclid=IwAR1GlbCvK8F2RdJ29lAmhFm6BEZMUdiUZSQil5USaVLyL84R7Wz5iUTz880
 
 
 
 
 
<br />
 
 
 
==Possible sources of data==
 
<br />
 
 
 
{| class="wikitable"
 
| Irish Government Data Portal        || https://data.gov.ie/
 
|-
 
| UK Government Data Portal || https://data.gov.uk/
 
|-
 
| UK National Health Service Data      || https://digital.nhs.uk/data-and-information                                             
 
|-
 
| EU Open Data Portal || http://data.europa.eu/euodp/en/data/
 
|-
 
| US Government Data Portal || https://www.data.gov/
 
|-
 
| Canadian Government Data Portal      || https://open.canada.ca/en/open-data
 
|-
 
| Indian Government Open Data || https://data.gov.in/
 
|-
 
| World Bank  || https://data.worldbank.org/
 
|-
 
| International Monetary Fund || https://www.imf.org/en/Data
 
|-
 
| World Health Organisation || http://www.who.int/gho/en/
 
|-
 
| UNICEF || https://data.unicef.org/
 
|-
 
| Food and Drug Administration || https://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
 
|-
 
| Google Public Data Explorer || https://www.google.com/publicdata/directory
 
|-
 
| Human Rights Data Analysis Group || https://hrdag.org/
 
|-
 
| Armed Conflict Data || http://www.pcr.uu.se/research/UCDP/
 
|-
 
| Amazon Web Services Open Data Registry || https://registry.opendata.aws/
 
|-
 
| Pew Research Datasets || http://www.pewinternet.org/datasets/
 
|-
 
| CERN Open Data || http://opendata.cern.ch/
 
|-
 
| Kaggle || https://www.kaggle.com/
 
|-
 
| UCI Machine Learning Repository || https://archive.ics.uci.edu/ml/index.php
 
|-
 
| Open Data Network || https://www.opendatanetwork.com/
 
|-
 
| Linked Open Data - University of Münster || https://www.uni-muenster.de/LODUM/
 
|-
 
| US National Climate Data || https://www.ncdc.noaa.gov/data-access/quick-links#loc-clim
 
|-
 
| US Medicare Hospital Quality Data      || https://data.medicare.gov/data/hospital-compare
 
|-
 
| Yelp Data || https://www.yelp.com/dataset/challenge
 
|-
 
| US Census Data || https://www.census.gov/data.html
 
|-
 
| Broad Institute Cancer Program Data    || http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi
 
|-
 
| National Centers for Environmental Information    || https://www.ncdc.noaa.gov/data-access
 
|-
 
| Centers for Disease Control and Prevention || https://www.cdc.gov/datastatistics/
 
|-
 
| Open Data Monitor || https://opendatamonitor.eu/
 
|-
 
| Plenario || http://plenar.io/
 
|-
 
| British Film Institute || http://www.bfi.org.uk/education-research/film-industry-statistics-research
 
|-
 
| Edinburgh University Datasets || http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html
 
|-
 
| DataHub || http://datahub.io
 
|}
 
 
 
 
 
<br />
 
==What is data==
 
It is difficult to define such a broad concept, but the definition I like is that data is a collection (or any set) of characters or files, such as numbers, symbols, words, text files, images, audio files, etc., that represent measurements, observations, or just descriptions, and that are gathered and stored for some purpose. https://www.mathsisfun.com/data/data.html  https://www.computerhope.com/jargon/d/data.htm
 
 
 
 
 
<br />
 
===Qualitative vs quantitative data===
 
https://learn.g2.com/qualitative-vs-quantitative-data
 
 
 
[[File:Qualitative_quantitative_data1.png|400px|thumb|right|Taken from https://www.mathsisfun.com/data/data.html]]
 
 
 
 
 
[[File:Qualitative_quantitative_data2.png|400px|thumb|right|Taken from https://www.mathsisfun.com/data/data.html]]
 
 
 
 
 
{| class="wikitable"
 
|+
 
!'''Qualitative data'''
 
!'''Quantitative data'''
 
|-
 
|'''Qualitative data is descriptive and conceptual information (it describes something)'''
 
|'''Quantitative data is numerical information (numbers)'''
 
|-
 
|It is subjective, interpretive, and exploratory
 
|It is objective, to-the-point, and conclusive
 
|-
 
|It is non-statistical
 
|It is statistical
 
|-
 
|It is typically unstructured or semi-structured.
 
|It is typically structured
 
|-
 
|'''Examples:'''
 
 
 
See unstructured data examples below.
 
|'''Examples:'''
 
 
 
See structured data examples below.
 
|}
 
 
 
 
 
<br />
 
====Discrete and continuous data====
 
https://www.youtube.com/watch?v=cz4nPSA9rlc
 
 
 
 
 
Quantitative data can be discrete or continuous.
 
 
 
* '''Continuous data can take on any value in an interval.'''
 
:* We usually say that continuous data is measured.
 
:* Examples:
 
::* Measurements of temperature: <math>[83.6, 99.46, 103.31, 105.91]</math> °F.
 
::: Temperature can be any value within an interval and it is measured (not counted)
 
 
 
 
 
* '''Discrete data can only have specific values.'''
 
:* We usually say that discrete data is counted.
 
:* Discrete data is usually (but not always) whole numbers: <math>[-2,-1,0,1,2,3,4,5...]</math>
 
:* Examples:
 
::* Possible outcomes of a die roll: <math>[1,2,3,4,5,6]</math>
 
::* Shoe sizes: <math>[\ldots, 6, 6.5, 7, 7.5, \ldots]</math>. They are not all whole numbers, but they cannot take just any value.
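
As a rough illustration of the distinction above, the following Python sketch checks whether a set of observed values lies on a fixed grid of allowed values, which is one simple heuristic for recognising discrete data. The helper name <code>lies_on_grid</code> and the step sizes are assumptions made for this example; they do not come from any library, and real data would need a more careful test.

```python
from fractions import Fraction


def lies_on_grid(values, step):
    """Return True if every value is an exact multiple of `step`.

    Hypothetical helper: discrete data can only take specific values
    (whole numbers, half sizes, ...), so all values sit on a grid.
    Fractions built from strings avoid floating-point rounding issues.
    """
    step = Fraction(str(step))
    return all((Fraction(str(v)) / step).denominator == 1 for v in values)


dice_rolls = [1, 2, 3, 4, 5, 6]        # counted: whole numbers
shoe_sizes = [6, 6.5, 7, 7.5]          # discrete, but not all whole numbers
temperatures = [83.6, 99.46, 103.31]   # measured: any value in an interval

print(lies_on_grid(dice_rolls, 1))        # whole-number grid
print(lies_on_grid(shoe_sizes, 0.5))      # half-size grid
print(lies_on_grid(shoe_sizes, 1))        # shoe sizes are not all whole numbers
print(lies_on_grid(temperatures, 0.5))    # measurements do not sit on the grid
```

Note that a finite sample of continuous measurements is always recorded with limited precision, so a check like this can only suggest, not prove, that a variable is discrete.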
 
 
 
 
 
<div style="text-align: center;">
 
<ul>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Discrete_and_continuous_data_examples1.png|400px|thumb|right|Taken from https://www.youtube.com/watch?v=cz4nPSA9rlc]]
 
</li>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Discrete_and_continuous_data_examples2.png|400px|thumb|right|Taken from https://www.youtube.com/watch?v=cz4nPSA9rlc]]
 
</li>
 
</ul>
 
</div>
 
 
 
 
 
<br />
 
===Structured vs Unstructured data===
 
https://learn.g2.com/structured-vs-unstructured-data
 
 
 
http://troindia.in/journal/ijcesr/vol3iss3/36-40.pdf
 
 
 
 
 
{| class="wikitable"
 
|+
 
! style="width: 350pt" |'''Structured data'''
 
! style="width: 350pt" |'''Unstructured data'''
 
! style="width: 200pt" |Semi-structured data
 
|- style="vertical-align:top;"
 
|'''Structured data is organized within fixed fields or columns, usually in relational databases (or spreadsheets) so it can be easily queried with SQL'''
 
 
 
https://learn.g2.com/structured-vs-unstructured-data
 
 
 
https://www.talend.com/resources/structured-vs-unstructured-data
 
|'''It's data that doesn't fit easily into a spreadsheet or a relational database.'''
 
|The line between semi-structured data and unstructured data has always been unclear. Semi-structured data usually refers to information that is not stored in a traditional database but has some organizational properties that make it easier to process.
 
|- style="vertical-align:top;"
 
|
 
* '''Examples of structured data include:'''
 
 
 
:* '''Quantitative data:'''
 
::* Weather forecast data: Measurements of temperature, precipitation (in millimeters (mm)), atmospheric pressure, wind speed, cloud coverage
 
::* Seismic data: Measurement of ground movement caused by seismic activity.
 
::* Housing data: Gathered housing data consisting of, for example, Price, Area of the house, Number of rooms, House age, Area population, Avg. Income of residents of the city
 
::* Numeric financial information and Market reports
 
 
 
 
 
:* Another good example of structured data is a company's database where the company stores all the data that is usually associated with the ERP (Enterprise resource planning: A suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities), such as:
 
 
 
::* Human resource data: For example, an «Employees» table: id, fname, lname, dob, email, phone_number, address
 
::* Customer data (Customer relationship management (CRM)): «Client» table
 
::* Projects data
 
::* Accounting data
 
|
 
* '''Examples of unstructured data include:''' https://www.m-files.com/blog/what-is-structured-data-vs-unstructured-data
 
 
 
:* '''Text files''': Word docs, PowerPoint presentations, Email, Chat logs, Text messages, Customer reviews, News articles, etc.
 
::* Email: There’s some internal metadata structure, so it’s sometimes called semi-structured, but the message field is unstructured and difficult to analyze with traditional tools.
 
 
 
 
 
:* '''Media files''' '''(Images, Audio, and Video files)''': Satellite images, surveillance images/videos, Call recordings (Call logs), Music audios/videos, Locations, etc.
 
 
 
 
 
:* '''Some sources of data are:'''
 
::* Social Media data: Data from social networking sites like Facebook, Twitter, and LinkedIn
 
::* Mobile data: Text messages
 
::* Call centers data
 
|
 
For example, NoSQL documents are considered to be semi-structured data since they contain keywords that can be used to process the documents more easily. https://www.youtube.com/watch?v=dK4aGzeBPkk
 
|}
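
The three categories in the table above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions: the <code>employees</code> table reuses some of the column names mentioned above, while the sample rows, the JSON document, and the e-mail addresses are invented for the example.

```python
import json
import sqlite3

# Structured: fixed columns in a relational table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO employees (fname, lname, email) VALUES (?, ?, ?)",
    [
        ("Ada", "Lovelace", "ada@example.com"),    # invented sample rows
        ("Alan", "Turing", "alan@example.com"),
    ],
)
structured_rows = conn.execute(
    "SELECT fname, lname FROM employees WHERE lname = ?", ("Turing",)
).fetchall()
conn.close()

# Semi-structured: a JSON document. The keys give some organization,
# but nothing forces every document to have the same fields.
raw_document = (
    '{"from": "ada@example.com", "subject": "Report", '
    '"body": "Please find the figures attached."}'
)
document = json.loads(raw_document)
subject = document.get("subject")
cc = document.get("cc")  # this key is absent, so the result is None

# Unstructured: the free-text body has no fields to query; without
# further tooling only crude operations such as word counting apply.
word_count = len(document["body"].split())

print(structured_rows)
print(subject, cc)
print(word_count)
```

The e-mail-like document mirrors the remark above: its metadata keys are easy to process, while the message body remains free text.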
 
 
 
 
 
It is important to highlight that the huge increase in data over the last 10 years has been driven by the growth of unstructured data. Currently, some estimates indicate that there are around 300 exabytes of data, of which around 80% is unstructured.
 
 
 
The prefix exa indicates multiplication by the sixth power of 1000 (<math>10^{18}</math>).
 
<math>1\ \text{EB} = 10^{18}\ \text{bytes} = 1000^6\ \text{bytes} = 1,000,000,000,000,000,000\ \text{B}</math>
 
 
 
 
 
Some sources also suggest that the amount of data is doubling every 2 years.
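
The arithmetic behind these figures can be worked through in a few lines of Python. This is only a sketch: it takes the estimates quoted above (about 300 EB in total, about 80% unstructured, doubling every 2 years) at face value and assumes the doubling rate stays constant, which is an assumption of this example and not a claim from the sources. The function name <code>projected_eb</code> is hypothetical.

```python
EXABYTE = 1000 ** 6  # bytes in one exabyte, i.e. 10**18

total_eb = 300               # estimated total amount of data, in EB
unstructured_percent = 80    # estimated share of unstructured data
doubling_period_years = 2    # "doubling every 2 years"

# Integer arithmetic keeps the result exact.
unstructured_eb = total_eb * unstructured_percent // 100
structured_eb = total_eb - unstructured_eb


def projected_eb(years_ahead):
    """Project the total (in EB) assuming a constant doubling period."""
    return total_eb * 2 ** (years_ahead / doubling_period_years)


print(EXABYTE == 10 ** 18)
print(unstructured_eb, structured_eb)
print(projected_eb(2), projected_eb(10))
```

Under these assumptions, ten years correspond to five doublings, that is a factor of <math>2^5 = 32</math>.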
 
 
 
 
 
<div style="text-align: center;">
 
<ul>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data1.png|500px|thumb|center|Source: IDC. Taken from https://www.youtube.com/watch?v=WBU7sW1jy2o]]
 
</li>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data4.jpg|500px|thumb|center|Taken from http://mis587pushkarmaid.blogspot.com/2016/03/big-unstructured-data-vs-structured.html]]
 
</li>
 
</ul>
 
</div>
 
 
 
 
 
<div style="text-align: center;">
 
<ul>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data2.png|500px|thumb|center|Taken from https://docplayer.net/3430405-Self-service-bi-for-big-data-applications-using-apache-drill.html]]
 
</li>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data3.png|500px|thumb|center|Taken from https://www.datanami.com/solution_content/hpe/media-entertainment/navigating-unstructured-retail-data-storm/]]
 
</li>
 
</ul>
 
</div>
 
 
 
 
 
 
 
<div style="text-align: center;">
 
<ul>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data6.png|500px|thumb|center|Taken from https://docplayer.net/3430405-Self-service-bi-for-big-data-applications-using-apache-drill.html]]
 
</li>
 
<li style="display: inline-block; height: 100%; vertical-align: middle">
 
[[File:Structured_vs_Unstructured_data5.png|500px|thumb|center|Taken from https://itbrandpulse.com/enterprise-storage-tco-case-study/]]</li>
 
</ul>
 
</div>
 
 
 
 
 
<br />
 
 
 
===Data Levels and Measurement===
 
Levels of M
 

Latest revision as of 11:52, 27 February 2026

~ Migrated