Difference between revisions of "Python for Data Science"

From Sinfronteras
(Anaconda)
(Keep a python script running on a remote server)
 
(117 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
 +
<br />
 
For a standard Python tutorial go to [[Python]]
 
For a standard Python tutorial go to [[Python]]
 +
 +
 +
<br />
 +
==Courses==
 +
*Udemy - Python for Data Science and Machine Learning Bootcamp
 +
 +
:https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
  
  
Line 11: Line 20:
 
<br />
 
<br />
 
===Installation===
 
===Installation===
https://linuxize.com/post/how-to-install-anaconda-on-ubuntu-18-04/
+
Installation from the official Anaconda Web site: https://docs.anaconda.com/anaconda/install/
  
https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04
 
  
 +
<br />
  
<br />
 
 
===Anaconda comes with a few IDE===
 
===Anaconda comes with a few IDE===
  
Line 35: Line 43:
  
 
<br />
 
<br />
==Pandas==
 
  
 
==Jupyter==
 
==Jupyter==
Line 50: Line 57:
  
 
<br />
 
<br />
===Online Jupyter===
+
===Remote connection===
There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/
+
https://jupyter-notebook.readthedocs.io/en/stable/public_server.html
 +
 
 +
 
 +
A**1
  
I have tried:
 
  
*https://cocalc.com/app
+
<syntaxhighlight lang="shell">
 +
(base) adelo@vmi346715:~/.jupyter$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
 +
Generating a RSA private key
 +
......................................+++++
 +
....................................+++++
 +
writing new private key to 'mykey.key'
 +
-----
 +
You are about to be asked to enter information that will be incorporated
 +
into your certificate request.
 +
What you are about to enter is what is called a Distinguished Name or a DN.
 +
There are quite a few fields but you can leave some blank
 +
For some fields there will be a default value,
 +
If you enter '.', the field will be left blank.
 +
-----
 +
Country Name (2 letter code) [AU]:IE
 +
State or Province Name (full name) [Some-State]:Dublin
 +
Locality Name (eg, city) []:Dublin
 +
Organization Name (eg, company) [Internet Widgits Pty Ltd]:.
 +
Organizational Unit Name (eg, section) []:.
 +
Common Name (e.g. server FQDN or YOUR name) []:sinfronteras   
 +
Email Address []:adeloaleman@gmail.com
 +
</syntaxhighlight>
  
::https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
 
::Looks good, but some of its options are not free
 
  
 +
<br />
 +
===Share Jupyter Notebook online===
 +
* '''GitHub:'''
 +
: https://docs.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github
 +
: Example: https://github.com/adeloaleman/AmazonLaptopsDashboard/blob/master/DataAnalysis/data_analysis2.ipynb
  
*https://www.kaggle.com/
 
  
::https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
+
* '''Nbviewer'''
::Looks good, but I did not find a way to add a TOC
+
: https://nbviewer.jupyter.org/
 +
: Example: https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/main/tutorial/06%20-%20Linking%20and%20Interactions.ipynb
  
  
*https://drive.google.com
+
<br />
  
:*https://colab.research.google.com
+
===Customize Jupyter===
::This is the one I am using now
 
  
  
 
<br />
 
<br />
==Courses==
+
====Themes====
 +
https://github.com/dunovank/jupyter-themes
 +
 
 +
See the theme shown on this page: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab
 +
 
 +
 
 +
jt  -t oceans16    -cellw 98%  -lineh 120  -fs 14  -nfs 14  -dfs 14  -ofs 14
 +
 
 +
 
 +
https://www.kaggle.com/getting-started/97540
 +
jt  -t monokai      -cellw 98%  -lineh 120  -fs 14  -nfs 14  -dfs 14  -ofs 14  -f fira  -nf ptsans  -N  -kl  -cursw 2  -cursc r  -T
  
*Udemy - Python for Data Science and Machine Learning Bootcamp
 
  
:https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
+
<br />
  
 +
====Extensions====
 +
This post mentions some nice extensions and configurations: https://towardsdatascience.com/bringing-the-best-out-of-jupyter-notebooks-for-data-science-f0871519ca29
  
 
<br />
 
<br />
==Most popular Python Data Science Libraries==
+
=====Unofficial Jupyter Notebook Extensions=====
 +
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html
  
*NumPy
+
<span style="color: green">'''This is very important. There are very nice extensions in this package:'''</span>
*SciPy
 
*Pandas
 
*Seaborn
 
*Scikit-learn
 
*Matplotlib
 
*Plotly
 
*PySpark
 
  
 +
* toc2: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html
 +
* Collapsible Headings
 +
* ... etc
  
 
<br />
 
<br />
 +
======Installation======
 +
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html
 +
 +
<span style="color: red">'''I had some issues installing it. The default method given in the documentation:'''</span>
  
==NumPy==
+
pip install jupyter_contrib_nbextensions
 +
jupyter contrib nbextension install --user
  
*NumPy (or Numpy) is a linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.
+
<span style="color: red">'''With the method above I could not install the package correctly. The installation returned no errors and the extension tab was shown in Jupyter Notebook, but I could not enable the extensions.'''</span>
  
*NumPy is also incredibly fast, as it has bindings to C libraries. For more info on why you would want to use arrays instead of lists, check out this great [http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists StackOverflow post].
+
 
 +
<span style="color: red">'''It seems to be a problem with the installation location. I was using conda, but conda was causing problems: installing packages took a very long time, and afterwards the package did not seem to be available.'''</span>
 +
 
 +
 
 +
<span style="color: red">'''In the following post I found a solution for installing nbextensions using pip:'''</span>
 +
https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1127
 +
 
 +
pip install --upgrade jupyter_contrib_nbextensions
 +
jupyter contrib nbextension install  --sys-prefix  --symlink
 +
 
 +
<span style="color: red">'''I think I used «--symlink», but I am not completely sure. I also ran the --upgrade, but I believe the options --sys-prefix  --symlink are what made the difference.'''</span>
 +
 
 +
 
 +
 
 +
If the '''Nbextensions''' tab is not shown, try reinstalling https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator
 +
 
 +
pip install jupyter_nbextensions_configurator
 +
or
 +
conda install -c conda-forge jupyter_nbextensions_configurator
  
  
 
<br />
 
<br />
===Installation===
 
It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies (such as linear algebra libraries) stay in sync when installing with conda.
 
  
 +
====CustomJS and CustomCSS files====
 +
This is a good post: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
 +
 +
Keyboard Shortcut Customization: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Custom%20Keyboard%20Shortcuts.html
 +
 +
 +
<br />
 +
custom.js
 +
<syntaxhighlight lang="js">
 +
/** My settings */
 +
 +
// This is to enable syntax highlighting for SQL code:
 +
// https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
 +
require(['notebook/js/codecell'], function(codecell) {
 +
  codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = {'reg':[/^%%sql/]} ;
 +
  Jupyter.notebook.events.one('kernel_ready.Kernel', function(){
 +
  Jupyter.notebook.get_cells().map(function(cell){
 +
      if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;
 +
  });
 +
});
 +
 +
 +
// My plain theme
 +
// This is a good post from which I took some ideas to write the following function: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
 +
function plainTheme() {
 +
    var input_promp_fields = document.getElementsByClassName("prompt_container");
 +
    var text_render_fields = document.getElementsByClassName("text_cell_render");
 +
 +
    if (input_promp_fields[0].style.visibility == "collapse"){
 +
        action = "visible";
 +
        input_marginLeft = "0px";
 +
        border_top  = "3px";
 +
        prompt_width = "74px";
 +
        padding_top = "0px";
 +
        output_margin = "40px";
 +
    }else{
 +
        action = "collapse";
 +
        input_marginLeft = "74px";
 +
        border_top  = '0px';
 +
        prompt_width = "74px";
 +
        padding_top = "40px";
 +
        output_margin = "40px";
 +
    }
 +
 +
    // To use !important we must do it this way, using jQuery:
 +
    // https://makitweb.com/how-to-add-important-to-css-property-with-jquery/
 +
    var text_cell_fields = document.getElementsByClassName("text_cell");
 +
    $(text_cell_fields).ready(function(){
 +
        $('.input_prompt').css({
 +
            'cssText': `width: 40px !important; max-width: ${prompt_width} !important; min-width: ${prompt_width} !important;`
 +
        });
 +
    });
 +
 +
    $(document).ready(function(){
 +
        $(".prompt_container").css(
 +
            'visibility', `${action}`
 +
        );
 +
       
 +
        $(".input").css(
 +
            'padding-left', `${input_marginLeft}`
 +
        );
 +
       
 +
        $(".output_subarea").css(
 +
            'margin-left', `${output_margin}`
 +
        );
 +
                   
 +
        $('.cell').css({
 +
            'cssText': `border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
 +
        });
 +
       
 +
        $(".collapsible_headings_ellipsis").css({
 +
            'cssText': `padding-top:${padding_top} !important; border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
 +
        });
  
If you have Anaconda, install NumPy by:
+
        $(".text_cell_render").css({
 +
            'cssText': `margin-left: -10px;`
 +
        });
 +
    });           
 +
}
  
conda install numpy
+
Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-Ctrl-Q', {
<br />If you are not using Anaconda distribution:
+
    help : '...',
 +
    help_index : 'zz',
 +
    handler : function (event) {
 +
        plainTheme();
 +
    return false;
 +
    }}
 +
);
  
*
+
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Alt-Ctrl-Q', {
 +
    help : '...',
 +
    help_index : 'zz',
 +
    handler : function (event) {
 +
        plainTheme();
 +
    return false;
 +
    }}
 +
);
  
pip install numpy
 
  
 +
// This could be very useful. It allows adding text automatically into a cell
 +
// https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064/27
 +
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Ctrl-Shift-J', {
 +
    help : '...',
 +
    help_index : 'zz',
 +
    handler : function (event) {
 +
        document.body.style.background = 'blue'
 +
        var target = Jupyter.notebook.get_selected_cell()
 +
        var cursor = target.code_mirror.getCursor()
 +
        var before = target.get_pre_cursor()
 +
        var after = target.get_post_cursor()
 +
        target.set_text(before + 'from IPython.core.display import display, HTML; \n\taverrrdisplay(HTML("<style>.container { width:98% !important;}</style>"))' + after)
 +
        cursor.ch += 20 // where to put your cursor
 +
        target.code_mirror.setCursor(cursor)
 +
        return false;
 +
    }}
 +
);
  
  
Then, to use it:<syntaxhighlight lang="python3">
+
// To get the real value of a css field: https://stackoverflow.com/questions/26074476/document-body-style-backgroundcolor-doesnt-work-with-external-css-style-sheet
import numpy as np
+
// window.getComputedStyle(document.body).backgroundColor
arr = np.arange(0,10)
+
// window.getComputedStyle(document.getElementsByClassName("input_area")[0]).backgroundColor
 
</syntaxhighlight>
 
</syntaxhighlight>
  
  
===Arrays===
+
<br />
{| class="wikitable" style="width: 100%;"
+
custom.css
! colspan="2" rowspan="2" |
+
<syntaxhighlight lang="css">
! colspan="2" rowspan="2" |Method/Operation
+
/*  My settings  */
! rowspan="2" |Description/Comments
+
 
!Example
+
.container { width:98% !important; }
|-
+
/* document.getElementById("notebook-container").style.minWidth = "50%"; */
!<syntaxhighlight lang="python3">
+
/* document.getElementById("notebook-container").style.maxWidth = "50%"; */
import numpy as np
+
 
</syntaxhighlight>
+
#notebook-container {
|- style="vertical-align:top;"
+
width:98% !important;
! rowspan="10" |<h5 style="text-align:left">Methods for creating NumPy Arrays</h5>
+
}
|<h6 style="text-align:left">From a Python List</h6>
+
 
| colspan="2" |'''''<code>array()</code>'''''
+
.CodeMirror-gutters {
|We can create an array by directly converting a list or list of lists.
+
background-color: transparent !important;
|<code>my_list = [1,2,3]</code>
+
background: transparent !important;
<code>np.array(my_list)</code>
+
}
 +
 
 +
.CodeMirror-linenumber {
 +
margin-left: -20px !important;
 +
}
  
 +
.output_subarea {
 +
margin-left: 40px !important;
 +
}
  
<code>my_matrix = [[1,2,3],[4,5,6],[7,8,9]]</code>
+
#toc .fa-fw {
 +
color: blue !important;
 +
}
  
<code>np.array(my_matrix)</code>
+
#toc .highlight_on_scroll {
|- style="vertical-align:top;"
+
margin-left: -4px !important;
| rowspan="9" |<h6 style="text-align:left">From Built-in NumPy Methods</h6>
+
| colspan="2" |'''''<code>arange()</code>'''''
+
}
|Return evenly spaced values within a given interval.
 
|<code>np.arange(0,10)</code>
 
<code>np.arange(0,11,2)</code>
 
|-
 
| colspan="2" |'''''<code>zeros()</code>'''''
 
|Generate arrays of zeros.
 
|<code>np.zeros(3)</code>
 
<code>np.zeros((5,5))</code>
 
|-
 
| colspan="2" |'''''<code>ones()</code>'''''
 
|Generate arrays of ones.
 
|<code>np.ones(3)</code>
 
<code>np.ones((3,3))</code>
 
|-
 
| colspan="2" |'''''<code>linspace()</code>'''''
 
|Return evenly spaced numbers over a specified interval.
 
|<code>np.linspace(0,10,3)</code>
 
<code>np.linspace(0,10,50)</code>
 
|-
 
| colspan="2" |'''''<code>eye()</code>'''''
 
|Creates an identity matrix.
 
|<code>np.eye(4)</code>
 
|-
 
| rowspan="4" |'''''<code>random</code>'''''
 
|'''''<code>rand()</code>'''''
 
|Create an array of the given shape and populate it with random samples from a uniform distribution over <code>[0, 1)</code>.
 
|<syntaxhighlight lang="python3">
 
np.random.rand(2)
 
np.random.rand(5,5)
 
  
 +
#toc {
 +
padding-left: 10px !important;
 +
}
  
# Another way to invoke a function:
+
/*  I have also changed the color
from numpy.random import rand
+
/*  #a6e22e  by  #388bfd
# Then you can call the function directly
+
  *  in the entire custom.css
rand(5,5)
+
*/
</syntaxhighlight><br />
 
|-
 
|'''''<code>randn()</code>'''''
 
|Return a sample (or samples) from the "standard normal" distribution, unlike <code>rand()</code>, which samples from a uniform distribution.
 
|<code>np.random.randn(2)</code>
 
<code>np.random.randn(5,5)</code>
 
|-
 
|'''''<code>randint()</code>'''''
 
|Return random integers from <code>low</code> (inclusive) to <code>high</code> (exclusive).
 
|<code>np.random.randint(1,100)</code>
 
<code>np.random.randint(1,100,10)</code>
 
|-
 
|'''<code>seed()</code>'''
 
|Sets the seed of the NumPy pseudo-random number generator, so that the same sequence of pseudo-random numbers is produced on every run. See [[wikipedia:Random_seed|s1]] and [https://www.sharpsightlabs.com/blog/numpy-random-seed/ s2] for an explanation.
 
|<code>np.random.seed(101)</code>
 
|- style="vertical-align:top;"
 
! rowspan="4" |<h5 style="text-align:left">Other Array Attributes and Methods</h5>
 
| rowspan="4" |
 
| colspan="2" |''<code>'''reshape()'''</code>''
 
|Returns an array containing the same data with a new shape.
 
|<code>arr.reshape(5,5)</code>
 
|-
 
| colspan="2" |'''''<code>max()</code>, <code>min()</code>, <code>argmax()</code>, <code>argmin()</code>'''''
 
|Find the max or min values, or their index locations using <code>argmax()</code> or <code>argmin()</code>.
 
|<code>arr.max()</code>
 
<code>arr.argmax()</code>
 
|-
 
| colspan="2" |''<code>'''shape'''</code>''
 
|Shape is an attribute that arrays have (not a method).
 
|(I did not understand this yet. To be reviewed.)
 
  
 +
/* I have also changed some of the properties of the toc directly above in the code:
 +
 +
#toc-wrapper {
 +
z-index: 90;
 +
position: fixed !important;
 +
display: flex;
 +
flex-direction: column;
 +
overflow: hidden;
 +
padding: 10px;
 +
padding-top: 40px !important;
 +
border-style: solid;
 +
border-width: thin;
 +
border-right-width: medium !important;
 +
background-color: #1e1e1e !important;
 +
}
 +
#toc-wrapper.ui-draggable.ui-resizable.sidebar-wrapper {
 +
border-color: rgba(93,92,82,.25) !important;
 +
}
 +
#toc a,
 +
#navigate_menu a,
 +
.toc {
 +
color: #f8f8f0 !important;
 +
font-size: 16pt !important;
 +
}
 +
#toc li > span:hover {
 +
background-color: rgba(93,92,82,.25) !important;
 +
}
 +
#toc a:hover,
 +
#navigate_menu a:hover,
 +
.toc {
 +
color: #DAA520 !important;
 +
font-size: 16pt !important;
 +
}
 +
#toc-wrapper .toc-item-num {
 +
color: #388bfd !important;
 +
font-size: 16pt !important;
 +
}
 +
*/
 +
</syntaxhighlight>
  
<nowiki>#</nowiki>Length of array
 
  
arr_length = arr2d.shape[1]
 
 
<br />
 
<br />
|-
 
| colspan="2" |''<code>'''dtype'''</code>''
 
|You can also grab the data type of the object in the array.
 
|<code>arr.dtype</code>
 
|-
 
!<nowiki>-</nowiki>
 
!-
 
! colspan="2" |-
 
!-
 
!-
 
|- style="vertical-align:top;"
 
! rowspan="8" |<h5 style="text-align:left">Indexing and Selection</h5>
 
  
<div style="text-align:left">
+
====Configurations from the Jupyter notebook====
*How to select elements or groups of elements from an array.
+
 
*The general format is '''arr_2d[row][col]''' or '''arr_2d[row,col]'''. I recommend usually using the comma notation for clarity.
 
</div>
 
|
 
| colspan="2" |
 
| colspan="2" |<div class="mw-collapsible mw-collapsed" style="">
 
'''Creating sample array for the following examples:'''
 
<div class="mw-collapsible-content">
 
 
<syntaxhighlight lang="python3">
 
<syntaxhighlight lang="python3">
import numpy as np
+
from IPython.core.display import display, HTML;
arr = np.arange(0,10)
+
 
# 1D Array:
+
display(HTML("<style>.container { width:98% !important;}</style>"))
arr = np.arange(0,11)
+
 
#Show
+
display(HTML('<style>.prompt.input_prompt{display:none !important;}</style>'))
arr
+
display(HTML('<style>.prompt.input_prompt{visibility: visible !important;}</style>'))
Output: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+
display(HTML('<style>.prompt.input_prompt{margin-left: 50px}</style>'))
 +
display(HTML('<style>.prompt.input_prompt{visibility: visible !important; width: 0px !important; min-width: 0px !important}</style>')) 
 +
 
 +
display(HTML('<style>.input_area{margin-left: -50px;}</style>'))
 +
display(HTML('<style>.input{margin-left: -20px;}</style>'))
 +
 
 +
display(HTML('<style>.output_area{margin-left: 55px}</style>'))
 +
 
 +
# display(HTML('<style>.cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
 +
# display(HTML('<style>.code_cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
  
# 2D Array
+
# display(HTML('<style>.output_wrapper{margin-bottom: 0px !important; margin-top: 0px !important;}</style>'))
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
 
#Show
 
arr_2d
 
Output:
 
array([[ 5, 10, 15],
 
      [20, 25, 30],
 
      [35, 40, 45]])
 
 
</syntaxhighlight>
 
</syntaxhighlight>
</div>
 
</div>
 
|- style="vertical-align:top;"
 
| rowspan="2" |<h6 style="text-align:left">Bracket Indexing and Selection (Slicing)</h6>
 
| colspan="2" |
 
|Note: When we create a sub-array by slicing an array (slice_of_arr = arr[0:6]), the data is not copied; the slice is a view of the original array, which avoids unnecessary memory use. To get a copy, use the method '''copy()'''. See the important note below.
 
|<syntaxhighlight lang="python3">
 
#Get a value at an index
 
arr[8]
 
  
#Get values in a range
 
arr[1:5]
 
  
slice_of_arr = arr[0:6]
+
<br />
 +
 
 +
===Online Jupyter===
 +
There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/
 +
 
 +
I have tried:
 +
 
 +
*https://cocalc.com/app
 +
 
 +
::https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
 +
::Looks good, but some of its options are not free
 +
 
 +
 
 +
*https://www.kaggle.com/
 +
 
 +
::https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
 +
::Looks good, but I did not find a way to add a TOC
 +
 
 +
 
 +
*https://drive.google.com
 +
 
 +
:*https://colab.research.google.com
 +
::This is the one I am using now
 +
 
 +
 
 +
<br />
 +
===Some remarks===
 +
 
  
#2D
+
<br />
arr_2d[1]
+
====Executing Terminal Commands in Jupyter Notebooks====
arr_2d[1][0]
+
https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks
arr_2d[1,0] # The same that above
 
  
#Shape (2,2) from top right corner
+
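If we are in the Notebook and want to run a shell command rather than notebook code, we prefix it with <code>'''!'''</code>; a few common commands (such as <code>ls</code>, <code>cd</code> and <code>pwd</code>) are also available as IPython magics with the <code>'''%'''</code> prefix.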
arr_2d[:2,1:]
 
#Output:
 
array([[10, 15],
 
      [25, 30]])
 
  
#Shape bottom row
+
Try, for example:
arr_2d[2,:]
+
%ls
</syntaxhighlight><br />
+
!pwd
|-
 
| colspan="2" |
 
| colspan="2" |<div class="mw-collapsible mw-collapsed" style="">
 
'''Fancy Indexing''':
 
<div class="mw-collapsible-content">
 
Fancy indexing allows you to select entire rows or columns out of order.
 
  
Example:<syntaxhighlight lang="python3">
+
It's the same as if you opened up a terminal and typed it without the <code>'''!'''</code>
# Set up matrix
 
arr2d = np.zeros((10,10))
 
  
# Length of array
 
arr_length = arr2d.shape[1]
 
  
# Set up array
+
<br />
for i in range(arr_length):
 
    arr2d[i] = i
 
   
 
arr2d
 
# Output:
 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 
      [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
 
      [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
 
      [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
 
      [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
 
      [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
 
      [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
 
      [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
 
      [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
 
      [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
 
  
# Fancy indexing allows the following
+
===[[HTML presentation with Reveal.js#Creating Presentations in Jupyter Notebook with RevealJS|Creating Presentations in Jupyter Notebook with RevealJS]]===
arr2d[[6,4,2,7]]
+
 
# Output:
+
 
array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
+
<br />
      [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
+
 
      [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
+
==Some of the most popular Python Data Science Libraries==
      [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
+
 
</syntaxhighlight><br />
+
*NumPy
</div>
+
*SciPy
</div>
+
*Pandas
|- style="vertical-align:top;"
+
*Seaborn
| rowspan="2" |<h6 style="text-align:left">Broadcasting</h6>
+
*Scikit-learn
 +
*Matplotlib
 +
*Plotly
 +
*PySpark
  
  
(Setting a value with index range)
+
<br />
| colspan="2" rowspan="2" |
 
| rowspan="2" |Setting a value with index range:
 
Numpy arrays differ from a normal Python list because of their ability to broadcast.
 
|arr[0:5]=100<br />'''#'''Show
 
arr
 
  
Output: array([100, 100, 100, 100, 100,  5,  6,  7,  8,  9,  10])
+
==[[NumPy and Pandas]]==
|-
 
|'''#'''Setting all the values of an Array
 
arr[:]=99
 
|- style="vertical-align:top;"
 
|<h6 style="text-align:left">Get a copy of an Array</h6>
 
| colspan="2" |'''<code>copy''()''</code>'''
 
|Note: When we create a sub-array by slicing an array (slice_of_arr = arr[0:6]), the data is not copied; the slice is a view of the original array, which avoids unnecessary memory use. To get a copy, use the method '''copy()'''. See the important note below.
 
|arr_copy = arr.copy()
 
|- style="vertical-align:top;"
 
|<h6 style="text-align:left">Important notes on Slices</h6>
 
| colspan="2" |
 
| colspan="2" |<div class="mw-collapsible mw-collapsed" style=""><syntaxhighlight lang="python3">
 
slice_of_arr = arr[0:6]
 
#Show slice
 
slice_of_arr
 
Output: array([0, 1, 2, 3, 4, 5])
 
  
#Making changes in slice_of_arr
 
slice_of_arr[:]=99
 
#Show slice
 
slice_of_arr
 
Output: array([99, 99, 99, 99, 99, 99])
 
  
#Now note the changes also occur in our original array!
+
<br />
#Show
+
==[[Data Visualization with Python]]==
arr
 
Output: array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])
 
  
#When we create a sub-array slicing an array (slice_of_arr = arr[0:6]), data is not copied, it's a view of the original array! This avoids memory problems!
 
  
#To get a copy, need to use the method copy()
+
<br />
</syntaxhighlight>
+
 
</div>
+
==[[Natural Language Processing]]==
|- style="vertical-align:top;"
+
 
|<h6 style="text-align:left">Using brackets for selection based on comparison operators and booleans</h6>
+
 
| colspan="2" |
+
<br />
| colspan="2" |<div class="mw-collapsible mw-collapsed" style=""><syntaxhighlight lang="python3">
+
==[[Dash - Plotly]]==
arr = np.arange(1,11)
+
 
arr > 4
+
 
# Output:
+
<br />
array([False, False, False, False,  True,  True,  True,  True,  True,
+
==[[Scrapy]]==
        True])
+
 
 +
 
 +
<br />
 +
==Using SQL in Jupyter==
 +
Connecting to a database in Jupyter
 +
 
 +
 
 +
https://pypi.org/project/ipython-sql/
  
bool_arr = arr>4
+
https://stackoverflow.com/questions/454854/no-module-named-mysqldb
bool_arr
 
# Output:
 
array([False, False, False, False,  True,  True,  True,  True,  True,
 
        True])
 
  
arr[bool_arr]
+
https://stackoverflow.com/questions/5178292/pip-install-mysql-python-fails-with-environmenterror-mysql-config-not-found
# Output:
 
array([ 5,  6,  7,  8,  9, 10])
 
  
arr[arr>2]
+
https://docs.kyso.io/guides/sql-interface-within-jupyterlab
# Output:
 
array([ 3,  4,  5,  6,  7,  8,  9, 10])
 
  
x = 2
+
https://www.datacamp.com/community/tutorials/sql-interface-within-jupyterlab
arr[arr>x]
 
# Output:
 
array([ 3,  4,  5,  6,  7,  8,  9, 10])
 
</syntaxhighlight>
 
</div>
 
|-
 
!-
 
!-
 
! colspan="2" |-
 
!-
 
!-
 
|- style="vertical-align:top;"
 
!<h5 style="text-align:left">Arithmetic operations</h5>
 
|
 
| colspan="2" |<code>arr + arr</code>
 
<code>arr - arr</code>
 
  
<code>arr * arr</code>
+
https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
  
<code>arr/arr</code>
+
https://www.sqlshack.com/learn-jupyter-notebooks-for-sql-server/
  
<code>1/arr</code>
 
  
<code>arr**3</code>
+
Check the sources above. I think the only steps I needed the last time I installed it came from the first three sources:
|Warning on division by zero, but not an error!
 
<code>0/0 -> nan</code>
 
  
<code>1/0 -> inf</code>
+
pip install ipython-sql
|<syntaxhighlight lang="python3">
+
import numpy as np
+
sudo apt install default-libmysqlclient-dev
arr = np.arange(0,10)
+
 +
pip install mysqlclient
 +
 +
sudo apt-get install python3-mysqldb
  
arr + arr
 
# Output:
 
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
 
  
arr**3
+
Then add SQL syntax highlighting to Jupyter as described above in the corresponding source.
# Output:
 
array([  0,  1,  8,  27,  64, 125, 216, 343, 512, 729])
 
</syntaxhighlight>
 
|- style="vertical-align:top;"
 
! rowspan="5" |<h5 style="text-align:left">[https://docs.scipy.org/doc/numpy/reference/ufuncs.html Universal Array Functions]</h5>
 
| rowspan="5" |
 
| colspan="2" |<code>np.sqrt(arr)</code>
 
|Taking Square Roots
 
| rowspan="5" |<syntaxhighlight lang="python3">
 
np.sin(arr)
 
# Output:
 
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
 
      -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
 
</syntaxhighlight>
 
|-
 
| colspan="2" |<code>np.exp(arr)</code>
 
|Calculating the exponential (e^x)
 
|-
 
| colspan="2" |<code>np.max(arr)</code>
 
same as <code>arr.max()</code>
 
|Max
 
|-
 
| colspan="2" |<code>np.sin(arr)</code>
 
|Sin
 
|-
 
| colspan="2" |<code>np.log(arr)</code>
 
|Natural logarithm
 
|}
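
The view-versus-copy behaviour noted in the table above can be checked directly. The following is a minimal sketch; the variable names are only illustrative, and <code>np.shares_memory</code> is used here simply as a convenient way to confirm whether two arrays share the same underlying data.

```python
import numpy as np

# A slice is a view: it shares memory with the original array.
arr = np.arange(0, 11)
slice_view = arr[0:6]
slice_view[:] = 99          # broadcasting assigns 99 to every element of the view

# A copy owns its data: changes do not reach the original array.
arr2 = np.arange(0, 11)
slice_copy = arr2[0:6].copy()
slice_copy[:] = 99

print(arr)                                  # the first six entries are now 99
print(arr2)                                 # unchanged: 0 to 10
print(np.shares_memory(arr, slice_view))    # True
print(np.shares_memory(arr2, slice_copy))   # False
```

Working on a view is cheap because no data is duplicated, but it also means that an in-place change to the slice silently changes the original array.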
 
  
  
 
<br />
 
<br />

Latest revision as of 15:47, 11 September 2024


For a standard Python tutorial go to Python



Courses

  • Udemy - Python for Data Science and Machine Learning Bootcamp
https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/



Anaconda

Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning related applications (large-scale data processing, predictive analytics, scientific computing), that aims to simplify package management and deployment. Package versions are managed by the package management system conda. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)

In other words, Anaconda can be seen as a package (a distribution) that includes not only Python (or R) but also many libraries used in data science, as well as its own virtual environment system. It is an "all-in-one" install that is extremely popular in data science and machine learning.



Installation

Installation from the official Anaconda Web site: https://docs.anaconda.com/anaconda/install/



Anaconda comes with a few IDEs

  • Jupyter Lab
  • Jupyter Notebook
  • Spyder
  • Qtconsole
  • and others



Anaconda Navigator

Anaconda Navigator is a GUI that helps you easily start important applications and manage the packages in your local Anaconda installation.

You can open the Anaconda Navigator from the Terminal:

anaconda-navigator



Jupyter

Jupyter comes with Anaconda.

  • It is a development environment (IDE) where we can write code; it also lets us display images and write Markdown notes.
  • It is one of the most popular environments in data science for exploring and analyzing data.
  • Other popular editors and IDEs for Python are Sublime Text and PyCharm.
  • It comes in two flavors: JupyterLab and Jupyter Notebook.



Remote connection

https://jupyter-notebook.readthedocs.io/en/stable/public_server.html


A**1


(base) adelo@vmi346715:~/.jupyter$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Generating a RSA private key
......................................+++++
....................................+++++
writing new private key to 'mykey.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:IE	
State or Province Name (full name) [Some-State]:Dublin
Locality Name (eg, city) []:Dublin
Organization Name (eg, company) [Internet Widgits Pty Ltd]:.
Organizational Unit Name (eg, section) []:.
Common Name (e.g. server FQDN or YOUR name) []:sinfronteras    
Email Address []:adeloaleman@gmail.com
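
Once the certificate and key exist, the notebook server has to be told to use them. The sketch below only prints a configuration fragment that could be pasted into ~/.jupyter/jupyter_notebook_config.py; it does not write any file. The option names (c.NotebookApp.certfile, keyfile, ip, open_browser, port) are the ones I recall from the public-server page linked above for the classic Notebook; newer releases based on Jupyter Server are expected to use c.ServerApp instead. The helper name render_notebook_config and the port 9999 are assumptions made for illustration.

```python
from pathlib import Path


def render_notebook_config(certfile: str, keyfile: str, port: int = 9999) -> str:
    """Return the text of a minimal jupyter_notebook_config.py fragment.

    Hypothetical helper: it only builds text, Jupyter itself is not imported.
    """
    lines = [
        "# `c` is provided by Jupyter when it loads this configuration file",
        f"c.NotebookApp.certfile = {certfile!r}",
        f"c.NotebookApp.keyfile = {keyfile!r}",
        "c.NotebookApp.ip = '*'                # listen on all interfaces",
        "c.NotebookApp.open_browser = False    # no browser on a remote server",
        f"c.NotebookApp.port = {port}",
    ]
    return "\n".join(lines) + "\n"


# File names match the openssl command above (mycert.pem, mykey.key).
jupyter_dir = Path.home() / ".jupyter"
config_text = render_notebook_config(
    certfile=str(jupyter_dir / "mycert.pem"),
    keyfile=str(jupyter_dir / "mykey.key"),
)
print(config_text)
```

Because the certificate is self-signed, the browser will show a warning the first time the server is opened over https.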



Share Jupyter Notebook online

  • GitHub:
https://docs.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github
Example: https://github.com/adeloaleman/AmazonLaptopsDashboard/blob/master/DataAnalysis/data_analysis2.ipynb


  • Nbviewer
https://nbviewer.jupyter.org/
Example: https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/main/tutorial/06%20-%20Linking%20and%20Interactions.ipynb



Customize Jupyter


Themes

https://github.com/dunovank/jupyter-themes

See the theme shown on this page: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab


jt   -t oceans16     -cellw 98%   -lineh 120   -fs 14   -nfs 14   -dfs 14   -ofs 14


https://www.kaggle.com/getting-started/97540

jt   -t monokai      -cellw 98%   -lineh 120   -fs 14   -nfs 14   -dfs 14   -ofs 14   -f fira   -nf ptsans   -N   -kl   -cursw 2   -cursc r   -T



Extensions

This post mentions some nice extensions and configurations: https://towardsdatascience.com/bringing-the-best-out-of-jupyter-notebooks-for-data-science-f0871519ca29


Unofficial Jupyter Notebook Extensions

https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html

This is very important. There are very nice extensions in this package:


Installation

https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html

I had some issues installing it. The default method given in the documentation is:

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user

With the method above I could not install the package correctly. The installation returned no errors and the extension showed up in Jupyter Notebook, but I could not enable the extensions.


It seems to be a problem with the installation location. I was using conda, but conda was giving me problems: installing packages took a very long time, and afterwards the package did not seem to be available.


In the following post I found a solution for installing nbextensions using pip: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1127

pip install --upgrade jupyter_contrib_nbextensions
jupyter contrib nbextension install  --sys-prefix  --symlink

I think I used «--symlink», but I am not completely sure. I also ran the --upgrade, but I believe the options --sys-prefix --symlink are what made the difference.
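
Since the problem seems to be about where things get installed, it may help to check, from inside the notebook, which Python environment the kernel is running in and whether the package is visible from there. A minimal sketch, with a hypothetical helper name describe_install; sys.prefix is the location that the --sys-prefix option installs into.

```python
import importlib.util
import sys


def describe_install(module_name):
    """Report where this interpreter lives and whether a module is importable."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,  # the location that --sys-prefix installs into
        "importable": importlib.util.find_spec(module_name) is not None,
    }


for key, value in describe_install("jupyter_contrib_nbextensions").items():
    print(f"{key}: {value}")
```

If importable is False, the package was probably installed into a different environment from the one running the notebook.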


If the Nbextensions tab is not shown, try reinstalling jupyter_nbextensions_configurator (https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator):

pip install jupyter_nbextensions_configurator

or

conda install -c conda-forge jupyter_nbextensions_configurator



CustomJS and CustomCSS files

This is a good post: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064

Keyboard Shortcut Customization: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Custom%20Keyboard%20Shortcuts.html



custom.js
/** My settings */

// This is to enable syntax highlighting for SQL code: 
// https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
require(['notebook/js/codecell'], function(codecell) {
  codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = {'reg':[/^%%sql/]} ;
  Jupyter.notebook.events.one('kernel_ready.Kernel', function(){
  Jupyter.notebook.get_cells().map(function(cell){
      if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;
  });
});


// My plain theme
// This is a good post where I took some ideas to write the following function: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
function plainTheme() {
    var input_promp_fields = document.getElementsByClassName("prompt_container");
    var text_render_fields = document.getElementsByClassName("text_cell_render");

    if (input_promp_fields[0].style.visibility == "collapse"){
        action = "visible";
        input_marginLeft = "0px";
        border_top  = "3px";
        prompt_width = "74px";
        padding_top = "0px";
        output_margin = "40px";
    }else{
        action = "collapse";
        input_marginLeft = "74px";
        border_top  = '0px';
        prompt_width = "74px";
        padding_top = "40px";
        output_margin = "40px";
    }

    // To use !important we must do it this way, using jQuery:
    // https://makitweb.com/how-to-add-important-to-css-property-with-jquery/
    var text_cell_fields = document.getElementsByClassName("text_cell");
    $(text_cell_fields).ready(function(){
        $('.input_prompt').css({
            'cssText': `width: 40px !important; max-width: ${prompt_width} !important; min-width: ${prompt_width} !important;`
        });
    });

    $(document).ready(function(){
        $(".prompt_container").css(
            'visibility', `${action}`
        );
        
        $(".input").css(
            'padding-left', `${input_marginLeft}`
        );
        
        $(".output_subarea").css(
            'margin-left', `${output_margin}`
        );
                    
        $('.cell').css({
            'cssText': `border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
        });
        
        $(".collapsible_headings_ellipsis").css({
            'cssText': `padding-top:${padding_top} !important; border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
        });

        $(".text_cell_render").css({
            'cssText': `margin-left: -10px;`
        });
    });            
}

Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-Ctrl-Q', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        plainTheme();
    return false;
    }}
);

Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Alt-Ctrl-Q', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        plainTheme();
    return false;
    }}
);


// This could be very useful. It allows adding text automatically into a cell
// https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064/27
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Ctrl-Shift-J', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        document.body.style.background = 'blue'
        var target = Jupyter.notebook.get_selected_cell()
        var cursor = target.code_mirror.getCursor()
        var before = target.get_pre_cursor()
        var after = target.get_post_cursor()
        target.set_text(before + 'from IPython.display import display, HTML\ndisplay(HTML("<style>.container { width:98% !important;}</style>"))' + after)
        cursor.ch += 20 // where to put your cursor
        target.code_mirror.setCursor(cursor)
        return false;
    }}
);


// To get the real value of a css field: https://stackoverflow.com/questions/26074476/document-body-style-backgroundcolor-doesnt-work-with-external-css-style-sheet
// window.getComputedStyle(document.body).backgroundColor
// window.getComputedStyle(document.getElementsByClassName("input_area")[0]).backgroundColor



custom.css
/*  My settings  */

.container { width:98% !important; }
/* document.getElementById("notebook-container").style.minWidth = "50%"; */
/* document.getElementById("notebook-container").style.maxWidth = "50%"; */

#notebook-container {
 width:98% !important;
}

.CodeMirror-gutters {
 background-color: transparent !important;
 background: transparent !important;
}

.CodeMirror-linenumber {
 margin-left: -20px !important;
}

.output_subarea {
 margin-left: 40px !important;
}

#toc .fa-fw {
 color: blue !important;
}

#toc .highlight_on_scroll {
 margin-left: -4px !important;
 
}

#toc {
 padding-left: 10px !important;
}

/* I have also replaced the color #a6e22e with #388bfd in the entire custom.css */

/* I have also changed some of the TOC properties directly in the code above:

#toc-wrapper {
 z-index: 90;
 position: fixed !important;
 display: flex;
 flex-direction: column;
 overflow: hidden;
 padding: 10px;
 padding-top: 40px !important;
 border-style: solid;
 border-width: thin;
 border-right-width: medium !important;
 background-color: #1e1e1e !important;
}
#toc-wrapper.ui-draggable.ui-resizable.sidebar-wrapper {
 border-color: rgba(93,92,82,.25) !important;
}
#toc a,
#navigate_menu a,
.toc {
 color: #f8f8f0 !important;
 font-size: 16pt !important;
}
#toc li > span:hover {
 background-color: rgba(93,92,82,.25) !important;
}
#toc a:hover,
#navigate_menu a:hover,
.toc {
 color: #DAA520 !important;
 font-size: 16pt !important;
}
#toc-wrapper .toc-item-num {
 color: #388bfd !important;
 font-size: 16pt !important;
}
*/



Configurations from the Jupyter notebook

from IPython.display import display, HTML

display(HTML("<style>.container { width:98% !important;}</style>"))

display(HTML('<style>.prompt.input_prompt{display:none !important;}</style>'))
display(HTML('<style>.prompt.input_prompt{visibility: visible !important;}</style>'))
display(HTML('<style>.prompt.input_prompt{margin-left: 50px}</style>'))
display(HTML('<style>.prompt.input_prompt{visibility: visible !important; width: 0px !important; min-width: 0px !important}</style>'))  

display(HTML('<style>.input_area{margin-left: -50px;}</style>'))
display(HTML('<style>.input{margin-left: -20px;}</style>'))

display(HTML('<style>.output_area{margin-left: 55px}</style>'))

# display(HTML('<style>.cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
# display(HTML('<style>.code_cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))

# display(HTML('<style>.output_wrapper{margin-bottom: 0px !important; margin-top: 0px !important;}</style>'))
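
Hand-written style strings like the ones above are easy to break with a missing brace or colon. One possible alternative is to build the CSS from a dictionary and inject it in a single call. The sketch below assumes a hypothetical helper build_style and reuses two of the rules above; it falls back to printing the CSS when IPython is not available.

```python
def build_style(rules):
    """Turn {selector: {property: value}} into one <style> element."""
    blocks = []
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        blocks.append(f"{selector} {{ {body} }}")
    return "<style>" + " ".join(blocks) + "</style>"


notebook_css = build_style({
    ".container": {"width": "98% !important"},
    ".output_area": {"margin-left": "55px"},
})

try:
    from IPython.display import HTML, display
except ImportError:
    # Outside Jupyter/IPython there is nothing to render into.
    print(notebook_css)
else:
    display(HTML(notebook_css))
```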



Online Jupyter

There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/

I have tried:

https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
It looks good, but some of its options are not free.


https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
It looks good, but I could not find a way to add a TOC.


This is the one I am using now.



Some remarks


Executing Terminal Commands in Jupyter Notebooks

https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks

Inside a notebook, prefix a line with ! to run it as a shell command. A % prefix runs an IPython magic command; some magics, such as %ls and %pwd, mirror common shell commands.

Try, for example:

%ls 
!pwd

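The result is the same as typing the command in a terminal without the !. Note that each ! command runs in its own subshell, so !cd does not change the notebook's working directory; use %cd for that.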
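
The ! syntax only works inside IPython and Jupyter. In a plain Python script, roughly the same effect can be obtained with the standard subprocess module. A minimal sketch, assuming a POSIX shell; the helper name run_shell is hypothetical.

```python
import subprocess


def run_shell(command):
    """Run a command line through the shell and return its output as text."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the line, as ! does
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


print(run_shell("pwd"))
```

Because shell=True hands the whole string to the shell, this should only be used with commands you wrote yourself, never with untrusted input.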



Creating Presentations in Jupyter Notebook with RevealJS


Some of the most popular Python Data Science Libraries

  • NumPy
  • SciPy
  • Pandas
  • Seaborn
  • Scikit-learn
  • Matplotlib
  • Plotly
  • PySpark



NumPy and Pandas


Data Visualization with Python


Natural Language Processing


Dash - Plotly


Scrapy


Using SQL in Jupyter

Connecting to a database in Jupyter


https://pypi.org/project/ipython-sql/

https://stackoverflow.com/questions/454854/no-module-named-mysqldb

https://stackoverflow.com/questions/5178292/pip-install-mysql-python-fails-with-environmenterror-mysql-config-not-found

https://docs.kyso.io/guides/sql-interface-within-jupyterlab

https://www.datacamp.com/community/tutorials/sql-interface-within-jupyterlab

https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic

https://www.sqlshack.com/learn-jupyter-notebooks-for-sql-server/


Check the sources above. I think everything I had to do the last time I installed it came from the first three sources:

pip install ipython-sql

sudo apt install default-libmysqlclient-dev

pip install mysqlclient

sudo apt-get install python3-mysqldb


Then add SQL syntax highlighting to Jupyter as described in the corresponding source above.
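
Once everything is installed, ipython-sql connects through a SQLAlchemy-style database URL. The sketch below builds such a URL for MySQL and percent-encodes the password so that characters like @ or / do not break it. The helper name mysql_url and the credentials are hypothetical, for illustration only.

```python
from urllib.parse import quote


def mysql_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy-style MySQL URL; the password is percent-encoded."""
    return f"mysql://{user}:{quote(password, safe='')}@{host}:{port}/{database}"


# Hypothetical credentials, for illustration only.
url = mysql_url("analyst", "p@ss/word", "localhost", "sales")
print(url)
```

In a notebook, the extension would then be loaded with %load_ext sql and the connection opened with %sql followed by the URL.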