Python for Data Science
Anaconda
Anaconda is a free and open-source distribution of the Python and R programming languages for data science and machine learning applications.
In other words, Anaconda can be seen as a package (a distribution) that includes not only Python (or R) but also many libraries used in data science, as well as its own virtual environment system. It is an "all-in-one" install that is extremely popular in data science and machine learning.
Anaconda comes with several IDEs and interactive environments:
- Jupyter Lab
- Jupyter Notebook
- Spyder
- Qtconsole
- and others
Jupyter
Jupyter comes with Anaconda. It is a development environment (IDE) where we can write code, display images, and write notes in Markdown.
- GitHub:
  - https://docs.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github
  - Example: https://github.com/adeloaleman/AmazonLaptopsDashboard/blob/master/DataAnalysis/data_analysis2.ipynb
- nbviewer:
  - https://nbviewer.jupyter.org/
  - Example: https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/main/tutorial/06%20-%20Linking%20and%20Interactions.ipynb
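GitHub and nbviewer can render notebooks because an .ipynb file is plain JSON holding a list of Markdown and code cells. The sketch below builds a minimal notebook by hand to show that layout. The cell contents and the file name are made up for illustration, and real notebooks written by Jupyter usually carry more metadata than this.

```python
import json

# Minimal notebook in the nbformat 4 layout: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Notes\n", "Some *Markdown* text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # the cell has not been run yet
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialize to the text that would be saved as example.ipynb (hypothetical file name).
notebook_text = json.dumps(notebook, indent=1)

# Reading it back shows the cell types a viewer has to render.
cell_types = [cell["cell_type"] for cell in json.loads(notebook_text)["cells"]]
print(cell_types)
```

Saving notebook_text to a file with the .ipynb extension should give a file that Jupyter can open, although this sketch does not validate it against the official schema.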
Online Jupyter
Many sites let you run Jupyter notebooks in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/
For example: https://colab.research.google.com
Popular Python Data Science Libraries
- NumPy
- SciPy
- Pandas
- Seaborn
- scikit-learn
- Matplotlib
- Plotly
- PySpark
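A quick way to see which of these libraries are available in the current environment is to look up their import names. This is a minimal sketch using only the standard library. The mapping from display name to import name is written out by hand, and it only checks that a module can be found, not that it imports without errors.

```python
from importlib.util import find_spec

# Display name -> import name (note that scikit-learn is imported as "sklearn").
LIBRARIES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "Seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "PySpark": "pyspark",
}

# find_spec returns None when a top-level module cannot be located.
availability = {
    name: find_spec(module) is not None for name, module in LIBRARIES.items()
}

for name, installed in availability.items():
    print(f"{name:<14}{'installed' if installed else 'missing'}")
```

In a full Anaconda install most of these are typically reported as installed; any that are missing can be added with conda or pip.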
NumPy and Pandas
Data Visualization with Python
Natural Language Processing
Plotly Dash
Using SQL in Jupyter
Connecting to a database in Jupyter
https://pypi.org/project/ipython-sql/
https://stackoverflow.com/questions/454854/no-module-named-mysqldb
pip install ipython-sql
sudo apt install default-libmysqlclient-dev
pip install mysqlclient
sudo apt-get install python3-mysqldb
Then add SQL syntax highlighting to Jupyter, as described in the corresponding source above.
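Once the packages are installed, ipython-sql connects through a SQLAlchemy-style connection URL. The sketch below shows one way to build such a URL for the mysqlclient (MySQLdb) driver. The helper name mysql_url and the user, password, host, and database values are hypothetical and only serve as an illustration.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy-style URL for the mysqlclient (MySQLdb) driver."""
    # quote_plus escapes characters such as '@' or '/' that would otherwise
    # be misread as URL separators.
    return (
        f"mysql+mysqldb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials, for illustration only.
url = mysql_url("analyst", "p@ss/word", "localhost", "sales")
print(url)

# In a notebook cell, a URL of this form is then given to the magics
# provided by ipython-sql, for example:
#   %load_ext sql
#   %sql mysql+mysqldb://user:password@localhost:3306/database
#   %sql SELECT 1;
```

Escaping the password matters because characters like "@" would otherwise be interpreted as the separator between the credentials and the host.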