Difference between revisions of "Python for Data Science"

From Sinfronteras
(DataFrames)
(Keep a python script running on a remote server)
 
(191 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
 +
<br />
 
For a standard Python tutorial go to [[Python]]
 
 +
 +
 +
<br />
 +
==Courses==
 +
*Udemy - Python for Data Science and Machine Learning Bootcamp
 +
 +
:https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
  
  
Line 11: Line 20:
 
<br />
 
<br />
 
===Installation===
 
https://linuxize.com/post/how-to-install-anaconda-on-ubuntu-18-04/
+
Installation from the official Anaconda Web site: https://docs.anaconda.com/anaconda/install/
  
https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04
 
  
 +
<br />
  
<br />
 
 
===Anaconda comes with a few IDEs===
 
  
Line 35: Line 43:
  
 
<br />
 
<br />
 +
 
==Jupyter==
 
 
Jupyter comes with Anaconda.
 
Line 48: Line 57:
  
 
<br />
 
<br />
===Online Jupyter===
+
===Remote connection===
There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/
+
https://jupyter-notebook.readthedocs.io/en/stable/public_server.html
 +
 
 +
 
 +
A**1
  
I have tried:
 
  
*https://cocalc.com/app
+
<syntaxhighlight lang="shell">
 +
(base) adelo@vmi346715:~/.jupyter$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
 +
Generating a RSA private key
 +
......................................+++++
 +
....................................+++++
 +
writing new private key to 'mykey.key'
 +
-----
 +
You are about to be asked to enter information that will be incorporated
 +
into your certificate request.
 +
What you are about to enter is what is called a Distinguished Name or a DN.
 +
There are quite a few fields but you can leave some blank
 +
For some fields there will be a default value,
 +
If you enter '.', the field will be left blank.
 +
-----
 +
Country Name (2 letter code) [AU]:IE
 +
State or Province Name (full name) [Some-State]:Dublin
 +
Locality Name (eg, city) []:Dublin
 +
Organization Name (eg, company) [Internet Widgits Pty Ltd]:.
 +
Organizational Unit Name (eg, section) []:.
 +
Common Name (e.g. server FQDN or YOUR name) []:sinfronteras   
 +
Email Address []:adeloaleman@gmail.com
 +
</syntaxhighlight>
  
::https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
 
::It looks good, but some of its options are not free
 
  
 +
<br />
 +
===Share Jupyter Notebook online===
 +
* '''GitHub:'''
 +
: https://docs.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github
 +
: Example: https://github.com/adeloaleman/AmazonLaptopsDashboard/blob/master/DataAnalysis/data_analysis2.ipynb
  
*https://www.kaggle.com/
 
  
::https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
+
* '''Nbviewer'''
::It looks good, but I could not find a way to add a TOC
+
: https://nbviewer.jupyter.org/
 +
: Example: https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/main/tutorial/06%20-%20Linking%20and%20Interactions.ipynb
  
  
*https://drive.google.com
+
<br />
  
:*https://colab.research.google.com
+
===Customize Jupyter===
::This is the one I am using now
 
  
  
 
<br />
 
<br />
==Courses==
+
====Themes====
 +
https://github.com/dunovank/jupyter-themes
  
*Udemy - Python for Data Science and Machine Learning Bootcamp
+
See the theme shown on this page: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab
  
:https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
 
  
 +
jt  -t oceans16    -cellw 98%  -lineh 120  -fs 14  -nfs 14  -dfs 14  -ofs 14
  
<br />
 
==Most popular Python Data Science Libraries==
 
  
*NumPy
+
https://www.kaggle.com/getting-started/97540
*SciPy
+
jt  -t monokai      -cellw 98%  -lineh 120  -fs 14  -nfs 14  -dfs 14  -ofs 14  -f fira  -nf ptsans  -N  -kl  -cursw 2  -cursc r  -T
*Pandas
 
*Seaborn
 
*scikit-learn

*Matplotlib

*Plotly

*PySpark
 
  
  
 
<br />
 
<br />
==NumPy==
 
 
*NumPy (or Numpy) is a linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.
 
 
*NumPy is also very fast, because it has bindings to C libraries. For more on why you would want to use arrays instead of lists, see this [http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists StackOverflow post].
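
A minimal sketch of the point above, assuming NumPy is installed. The variable names (<code>values</code>, <code>squares_list</code>, <code>squares_np</code>) are illustrative and do not come from the original notes. It compares a pure-Python list comprehension with the equivalent vectorized NumPy expression:

```python
import numpy as np

# Pure-Python version: the loop runs in the interpreter, element by element.
values = list(range(10))
squares_list = [v * v for v in values]

# NumPy version: one vectorized expression, executed in compiled code.
arr = np.arange(10)
squares_np = arr * arr

# Both approaches produce the same numbers.
print(squares_list)
print(squares_np.tolist())
```

The results are identical; the difference is that the NumPy expression avoids a Python-level loop, which is where most of the speed-up on large arrays tends to come from.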
 
  
 +
====Extensions====
 +
This post mentions some nice extensions and configurations that can be applied: https://towardsdatascience.com/bringing-the-best-out-of-jupyter-notebooks-for-data-science-f0871519ca29
  
 
<br />
 
<br />
===Installation===
+
=====Unofficial Jupyter Notebook Extensions=====
It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies (such as linear algebra libraries) stay in sync when you use a conda install.
+
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html
  
 +
<span style="color: green">'''This is very important. There are very nice extensions in this package:'''</span>
  
If you have Anaconda, install NumPy by:
+
* toc2: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html
 +
* Collapsible Headings
 +
* ... etc
  
conda install numpy
+
<br />
<br />If you are not using Anaconda distribution:
+
======Installation======
 +
https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html
  
*
+
<span style="color: red">'''I had some issues installing it. The default method given in the documentation is:'''</span>
  
  pip install numpy
+
  pip install jupyter_contrib_nbextensions
 +
jupyter contrib nbextension install --user
  
 +
<span style="color: red">'''I could not install the package correctly with the method above. The installation returned no errors and the extension showed up in Jupyter Notebook, but I could not enable the extensions.'''</span>
  
  
Then, to use it:<syntaxhighlight lang="python3">
+
<span style="color: red">'''Apparently the problem is the installation location. I was using conda, but conda is giving me problems: installing packages takes a very long time, and afterwards the package does not seem to be available.'''</span>
import numpy as np
 
arr = np.arange(0,10)
 
</syntaxhighlight>
 
  
  
===Arrays===
+
<span style="color: red">'''In the following post I found a solution for installing nbextensions using pip:'''</span>
{| class="wikitable"
+
https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1127
! colspan="2" rowspan="2" |
 
! colspan="2" rowspan="2" |Method/Operation
 
! rowspan="2" |Description/Comments
 
!Example
 
|-
 
!<syntaxhighlight lang="python3">
 
import numpy as np
 
</syntaxhighlight>
 
|-
 
! rowspan="10" |<h5 style="text-align:left">Methods for creating NumPy Arrays</h5>
 
|<h5 style="text-align:left">From a Python List</h5>
 
| colspan="2" |'''''<code>array()</code>'''''
 
|We can create an array by directly converting a list or list of lists.
 
|<code>my_list = [1,2,3]</code>
 
<code>np.array(my_list)</code>
 
  
 +
pip install --upgrade jupyter_contrib_nbextensions
 +
jupyter contrib nbextension install  --sys-prefix  --symlink
  
<code>my_matrix = [[1,2,3],[4,5,6],[7,8,9]]</code>
+
<span style="color: red">'''I think I used «--symlink», but I am not completely sure. I also ran --upgrade, but I believe the options --sys-prefix  --symlink are what made the difference.'''</span>
  
<code>np.array(my_matrix)</code>
 
|-
 
| rowspan="9" |<h5 style="text-align:left">From Built-in NumPy Methods</h5>
 
| colspan="2" |'''''<code>arange()</code>'''''
 
|Return evenly spaced values within a given interval.
 
|<code>np.arange(0,10)</code>
 
<code>np.arange(0,11,2)</code>
 
|-
 
| colspan="2" |'''''<code>zeros()</code>'''''
 
|Generate arrays of zeros.
 
|<code>np.zeros(3)</code>
 
<code>np.zeros((5,5))</code>
 
|-
 
| colspan="2" |'''''<code>ones()</code>'''''
 
|Generate arrays of ones.
 
|<code>np.ones(3)</code>
 
<code>np.ones((3,3))</code>
 
|-
 
| colspan="2" |'''''<code>linspace()</code>'''''
 
|Return evenly spaced numbers over a specified interval.
 
|<code>np.linspace(0,10,3)</code>
 
<code>np.linspace(0,10,50)</code>
 
|-
 
| colspan="2" |'''''<code>eye()</code>'''''
 
|Creates an identity matrix.
 
|<code>np.eye(4)</code>
 
|-
 
| rowspan="4" |'''''<code>random</code>'''''
 
|'''''<code>rand()</code>'''''
 
|Create an array of the given shape and populate it with random samples from a uniform distribution over <code>[0, 1)</code>.
 
|<syntaxhighlight lang="python3">
 
np.random.rand(2)
 
np.random.rand(5,5)
 
  
  
# Another way to invoke a function:
+
If the '''Nbextensions''' tab is not shown, try reinstalling https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator
from numpy.random import rand
 
# Then you can call the function directly
 
rand(5,5)
 
</syntaxhighlight><br />
 
|-
 
|'''''<code>randn()</code>'''''
 
|Return a sample (or samples) from the "standard normal" distribution, unlike <code>rand()</code>, which samples from a uniform distribution.
 
|<code>np.random.randn(2)</code>
 
<code>np.random.randn(5,5)</code>
 
|-
 
|'''''<code>randint()</code>'''''
 
|Return random integers from <code>low</code> (inclusive) to <code>high</code> (exclusive).
 
|<code>np.random.randint(1,100)</code>
 
<code>np.random.randint(1,100,10)</code>
 
|-
 
|'''<code>seed()</code>'''
 
|Sets the seed of the NumPy pseudo-random number generator, so that the same sequence of pseudo-random numbers is generated on every run. See [[wikipedia:Random_seed|s1]] and [https://www.sharpsightlabs.com/blog/numpy-random-seed/ s2] for an explanation.
 
|<code>np.random.seed(101)</code>
 
|-
 
! rowspan="4" |<h5 style="text-align:left">Other Array Attributes and Methods</h5>
 
| rowspan="4" |
 
| colspan="2" |''<code>'''reshape()'''</code>''
 
|Returns an array containing the same data with a new shape.
 
|<code>arr.reshape(5,5)</code>
 
|-
 
| colspan="2" |'''''<code>max()</code>, <code>min()</code>, <code>argmax()</code>, <code>argmin()</code>'''''
 
|Find the max or min values, or their index locations using <code>argmax()</code> or <code>argmin()</code>.
 
|<code>arr.max()</code>
 
<code>arr.argmax()</code>
 
|-
 
| colspan="2" |''<code>'''shape'''</code>''

|<code>shape</code> is an attribute that arrays have (not a method), so it is accessed without parentheses.

|I DID NOT UNDERSTAND THIS.. REVIEW!
 
  
 +
pip install jupyter_nbextensions_configurator
 +
or
 +
conda install -c conda-forge jupyter_nbextensions_configurator
  
<nowiki>#</nowiki>Length of array
 
  
arr_length = arr2d.shape[1]
 
 
<br />
 
<br />
|-
 
| colspan="2" |''<code>'''dtype'''</code>''

|<code>dtype</code> is an attribute that gives the data type of the elements in the array.
 
|<code>arr.dtype</code>
 
|-
 
!<nowiki>-</nowiki>
 
!-
 
! colspan="2" |-
 
!-
 
!-
 
|-
 
! rowspan="8" |<h5 style="text-align:left">Indexing and Selection</h5>
 
  
<div style="text-align:left">
+
====Custom JS and Custom CSS files====
*How to select elements or groups of elements from an array.
+
This is a good post: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
*The general format is '''arr_2d[row][col]''' or '''arr_2d[row,col]'''. I recommend usually using the comma notation for clarity.
+
 
</div>
+
Keyboard Shortcut Customization: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Custom%20Keyboard%20Shortcuts.html
|
 
| colspan="2" |
 
| colspan="2" |<div class="mw-collapsible mw-collapsed" style="">
 
'''Creating sample array for the following examples:'''
 
<div class="mw-collapsible-content">
 
<syntaxhighlight lang="python3">
 
import numpy as np

# 1D array:
arr = np.arange(0,11)

# Show
arr

# Output: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# 2D array:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

# Show
arr_2d

# Output:
# array([[ 5, 10, 15],
#        [20, 25, 30],
#        [35, 40, 45]])
 
</syntaxhighlight>
 
</div>
 
</div>
 
|-
 
| rowspan="2" |<h5 style="text-align:left">Bracket Indexing and Selection (Slicing)</h5>
 
| colspan="2" |
 
|Note: slicing an array (<code>slice_of_arr = arr[0:6]</code>) does not copy the data; the slice is a view of the original array, which avoids memory problems. To get a copy, use the method '''copy()'''. See the important note below.
 
|<syntaxhighlight lang="python3">
 
#Get a value at an index
 
arr[8]
 
  
#Get values in a range
+
<br />
arr[1:5]
+
custom.js
 +
<syntaxhighlight lang="js">
 +
/** My settings */
  
slice_of_arr = arr[0:6]
+
// This is to enable syntax highlighting for SQL code:
 +
// https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
 +
require(['notebook/js/codecell'], function(codecell) {
 +
  codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = {'reg':[/^%%sql/]} ;
 +
  Jupyter.notebook.events.one('kernel_ready.Kernel', function(){
 +
  Jupyter.notebook.get_cells().map(function(cell){
 +
      if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;
 +
  });
 +
});
  
#2D
 
arr_2d[1]
 
arr_2d[1][0]
 
arr_2d[1,0] # Same as above
 
  
#Shape (2,2) from top right corner
+
// My plain theme
arr_2d[:2,1:]
+
// This is a good post where I took some ideas to write the following function: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
#Output:
+
function plainTheme() {
array([[10, 15],
+
    var input_promp_fields = document.getElementsByClassName("prompt_container");
      [25, 30]])
+
    var text_render_fields = document.getElementsByClassName("text_cell_render");
  
#Shape bottom row
+
    if (input_promp_fields[0].style.visibility == "collapse"){
arr_2d[2,:]
+
        action = "visible";
</syntaxhighlight><br />
+
        input_marginLeft = "0px";
|-
+
        border_top  = "3px";
| colspan="2" |
+
        prompt_width = "74px";
| colspan="2" |<div class="mw-collapsible mw-collapsed" style="">
+
        padding_top = "0px";
'''Fancy Indexing''':
+
        output_margin = "40px";
<div class="mw-collapsible-content">
+
    }else{
Fancy indexing allows you to select entire rows or columns out of order.
+
        action = "collapse";
 +
        input_marginLeft = "74px";
 +
        border_top  = '0px';
 +
        prompt_width = "74px";
 +
        padding_top = "40px";
 +
        output_margin = "40px";
 +
    }
  
Example:<syntaxhighlight lang="python3">
+
    // To use !important we have to set it this way, using jQuery:
# Set up matrix
+
    // https://makitweb.com/how-to-add-important-to-css-property-with-jquery/
arr2d = np.zeros((10,10))
+
    var text_cell_fields = document.getElementsByClassName("text_cell");
 +
    $(text_cell_fields).ready(function(){
 +
        $('.input_prompt').css({
 +
            'cssText': `width: 40px !important; max-width: ${prompt_width} !important; min-width: ${prompt_width} !important;`
 +
        });
 +
    });
  
# Length of array
+
    $(document).ready(function(){
arr_length = arr2d.shape[1]
+
        $(".prompt_container").css(
 +
            'visibility', `${action}`
 +
        );
 +
       
 +
        $(".input").css(
 +
            'padding-left', `${input_marginLeft}`
 +
        );
 +
       
 +
        $(".output_subarea").css(
 +
            'margin-left', `${output_margin}`
 +
        );
 +
                   
 +
        $('.cell').css({
 +
            'cssText': `border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
 +
        });
 +
       
 +
        $(".collapsible_headings_ellipsis").css({
 +
            'cssText': `padding-top:${padding_top} !important; border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
 +
        });
  
# Set up array
+
        $(".text_cell_render").css({
for i in range(arr_length):
+
            'cssText': `margin-left: -10px;`
    arr2d[i] = i
+
        });
      
+
     });           
arr2d
+
}
# Output:
 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
 
      [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
 
      [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
 
      [3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
 
      [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
 
      [5., 5., 5., 5., 5., 5., 5., 5., 5., 5.],
 
      [6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
 
      [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.],
 
      [8., 8., 8., 8., 8., 8., 8., 8., 8., 8.],
 
      [9., 9., 9., 9., 9., 9., 9., 9., 9., 9.]])
 
  
# Fancy indexing allows the following
+
Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-Ctrl-Q', {
arr2d[[6,4,2,7]]
+
    help : '...',
# Output:
+
    help_index : 'zz',
array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
+
    handler : function (event) {
      [4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
+
        plainTheme();
      [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
+
    return false;
      [7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])
+
    }}
</syntaxhighlight><br />
+
);
</div>
 
</div>
 
|-
 
| rowspan="2" |<h5 style="text-align:left">Broadcasting</h5>
 
  
 +
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Alt-Ctrl-Q', {
 +
    help : '...',
 +
    help_index : 'zz',
 +
    handler : function (event) {
 +
        plainTheme();
 +
    return false;
 +
    }}
 +
);
  
(Setting a value with index range)
 
| colspan="2" rowspan="2" |
 
| rowspan="2" |Setting a value with index range:
 
Numpy arrays differ from a normal Python list because of their ability to broadcast.
 
|arr[0:5]=100<br />'''#'''Show
 
arr
 
  
Output: array([100, 100, 100, 100, 100,  5,  6,  7,  8,  9,  10])
+
// This could be very useful. It allows text to be inserted automatically into a cell
|-
+
// https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064/27
|'''#'''Setting all the values of an Array
+
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Ctrl-Shift-J', {
arr[:]=99
+
    help : '...',
|-
+
    help_index : 'zz',
|<h5 style="text-align:left">Get a copy of an Array</h5>
+
    handler : function (event) {
| colspan="2" |'''<code>copy''()''</code>'''
+
        document.body.style.background = 'blue'
|Note: slicing an array (<code>slice_of_arr = arr[0:6]</code>) does not copy the data; the slice is a view of the original array, which avoids memory problems. To get a copy, use the method '''copy()'''. See the important note below.
+
        var target = Jupyter.notebook.get_selected_cell()
|arr_copy = arr.copy()
+
        var cursor = target.code_mirror.getCursor()
|-
+
        var before = target.get_pre_cursor()
|<h5 style="text-align:left">Important notes on Slices</h5>
+
        var after = target.get_post_cursor()
| colspan="2" |
+
        target.set_text(before + 'from IPython.core.display import display, HTML; \ndisplay(HTML("<style>.container { width:98% !important;}</style>"))' + after)
| colspan="2" |<div class="mw-collapsible mw-collapsed" style=""><syntaxhighlight lang="python3">
+
        cursor.ch += 20 // where to put your cursor
slice_of_arr = arr[0:6]
+
        target.code_mirror.setCursor(cursor)
#Show slice
+
        return false;
slice_of_arr
+
    }}
Output: array([0, 1, 2, 3, 4, 5])
+
);
  
#Making changes in slice_of_arr
 
slice_of_arr[:]=99
 
#Show slice
 
slice_of_arr
 
Output: array([99, 99, 99, 99, 99, 99])
 
  
#Now note the changes also occur in our original array!
+
// To get the real value of a css field: https://stackoverflow.com/questions/26074476/document-body-style-backgroundcolor-doesnt-work-with-external-css-style-sheet
#Show
+
// window.getComputedStyle(document.body).backgroundColor
arr
+
// window.getComputedStyle(document.getElementsByClassName("input_area")[0]).backgroundColor
Output: array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])
+
</syntaxhighlight>
  
#When we create a sub-array slicing an array (slice_of_arr = arr[0:6]), data is not copied, it's a view of the original array! This avoids memory problems!
 
  
#To get a copy, need to use the method copy()
+
<br />
</syntaxhighlight>
+
custom.css
</div>
+
<syntaxhighlight lang="css">
|-
+
/* My settings */
|<h5 style="text-align:left">Using brackets for selection based on comparison operators and booleans</h5>
 
| colspan="2" |
 
| colspan="2" |<div class="mw-collapsible mw-collapsed" style=""><syntaxhighlight lang="python3">
 
arr = np.arange(1,11)
 
arr > 4
 
# Output:
 
array([False, False, False, False, True, True,  True,  True,  True,
 
        True])
 
  
bool_arr = arr>4
+
.container { width:98% !important; }
bool_arr
+
/* document.getElementById("notebook-container").style.minWidth = "50%"; */
# Output:
+
/* document.getElementById("notebook-container").style.maxWidth = "50%"; */
array([False, False, False, False,  True,  True,  True,  True,  True,
 
        True])
 
  
arr[bool_arr]
+
#notebook-container {
# Output:
+
width:98% !important;
array([ 5,  6,  7,  8,  9, 10])
+
}
  
arr[arr>2]
+
.CodeMirror-gutters {
# Output:
+
background-color: transparent !important;
array([ 3, 4,  5,  6,  7,  8,  9, 10])
+
  background: transparent !important;
 +
}
  
x = 2
+
.CodeMirror-linenumber {
arr[arr>x]
+
  margin-left: -20px !important;
# Output:
+
}
array([ 3,  4,  5, 6,  7,  8,  9, 10])
 
</syntaxhighlight>
 
</div>
 
|-
 
!-
 
!-
 
! colspan="2" |-
 
!-
 
!-
 
|-
 
!<h5 style="text-align:left">Arithmetic operations</h5>
 
|
 
| colspan="2" |<code>arr + arr</code>
 
<code>arr - arr</code>
 
  
<code>arr * arr</code>
+
.output_subarea {
 +
margin-left: 40px !important;
 +
}
  
<code>arr/arr</code>
+
#toc .fa-fw {
 +
color: blue !important;
 +
}
  
<code>1/arr</code>
+
#toc .highlight_on_scroll {
 +
margin-left: -4px !important;
 +
 +
}
  
<code>arr**3</code>
+
#toc {
|Warning on division by zero, but not an error!
+
padding-left: 10px !important;
<code>0/0 -> nan</code>
+
}
  
<code>1/0 -> inf</code>
+
/*  I have also changed the color
|<syntaxhighlight lang="python3">
+
/*  #a6e22e  by  #388bfd
import numpy as np
+
*  in the entire custom.css
arr = np.arange(0,10)
+
*/
  
arr + arr
+
/* I have also changed some of the properties of the toc directly above in the code:
# Output:
 
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
 
  
arr**3
+
#toc-wrapper {
# Output:
+
z-index: 90;
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
+
position: fixed !important;
 +
display: flex;
 +
flex-direction: column;
 +
overflow: hidden;
 +
padding: 10px;
 +
padding-top: 40px !important;
 +
border-style: solid;
 +
border-width: thin;
 +
border-right-width: medium !important;
 +
background-color: #1e1e1e !important;
 +
}
 +
#toc-wrapper.ui-draggable.ui-resizable.sidebar-wrapper {
 +
border-color: rgba(93,92,82,.25) !important;
 +
}
 +
#toc a,
 +
#navigate_menu a,
 +
.toc {
 +
color: #f8f8f0 !important;
 +
  font-size: 16pt !important;
 +
}
 +
#toc li > span:hover {
 +
  background-color: rgba(93,92,82,.25) !important;
 +
}
 +
#toc a:hover,
 +
#navigate_menu a:hover,
 +
.toc {
 +
color: #DAA520 !important;
 +
font-size: 16pt !important;
 +
}
 +
#toc-wrapper .toc-item-num {
 +
color: #388bfd !important;
 +
font-size: 16pt !important;
 +
}
 +
*/
 
</syntaxhighlight>
 
</syntaxhighlight>
|-
 
! rowspan="5" |<h5 style="text-align:left">[https://docs.scipy.org/doc/numpy/reference/ufuncs.html Universal Array Functions]</h5>
 
| rowspan="5" |
 
| colspan="2" |<code>np.sqrt(arr)</code>
 
|Taking Square Roots
 
| rowspan="5" |<syntaxhighlight lang="python3">
 
np.sin(arr)
 
# Output:
 
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
 
      -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
 
</syntaxhighlight>
 
|-
 
| colspan="2" |<code>np.exp(arr)</code>
 
|Calculating the exponential (e^x)
 
|-
 
| colspan="2" |<code>np.max(arr)</code>
 
same as <code>arr.max()</code>
 
|Max
 
|-
 
| colspan="2" |<code>np.sin(arr)</code>
 
|Sin
 
|-
 
| colspan="2" |<code>np.log(arr)</code>
 
|Natural logarithm
 
|}
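
The note on slices in the table above (a slice is a view, <code>copy()</code> gives independent data) can be checked directly. This is a minimal sketch, assuming NumPy is installed; the names <code>view</code> and <code>arr_copy</code> are illustrative, and <code>np.shares_memory</code> is used only to make the difference visible:

```python
import numpy as np

arr = np.arange(0, 11)

# A slice is a view: writing through it changes the original array.
view = arr[0:6]
view[:] = 99

# copy() allocates new memory, so changes to the copy leave arr untouched.
arr_copy = arr.copy()
arr_copy[:] = 0

print(arr)
print(np.shares_memory(arr, view))      # the view points at arr's data
print(np.shares_memory(arr, arr_copy))  # the copy does not
```

After running this, the first six elements of <code>arr</code> are 99, while the later write to <code>arr_copy</code> has no effect on <code>arr</code>.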
 
  
  
 
<br />
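
The arithmetic rows of the NumPy table mention that division by zero produces a warning rather than an error. The sketch below, assuming NumPy is installed, shows the resulting <code>nan</code> and <code>inf</code> values; the names <code>ratio</code> and <code>inverse</code> are illustrative, and <code>np.errstate</code> is used here only to silence the warning:

```python
import numpy as np

arr = np.arange(0, 4)

# np.errstate temporarily silences the RuntimeWarning NumPy would otherwise emit.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr    # 0/0 at index 0 gives nan
    inverse = 1 / arr    # 1/0 at index 0 gives inf

print(ratio)
print(inverse)
```

Without the <code>np.errstate</code> block the same values are produced, but NumPy prints a warning for each operation.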
 
<br />
==Pandas==
 
You can think of pandas as an extremely powerful version of Excel, with a lot more features. In this section of the course, you should go through the notebooks in this order:
 
  
 +
====Configurations from the Jupyter notebook====
  
<br />
+
<syntaxhighlight lang="python3">
===Series===
+
from IPython.core.display import display, HTML;
A Series is very similar to a NumPy array (in fact it is built on top of the NumPy array object). What differentiates the NumPy array from a Series, is that a Series can have axis labels, meaning it can be indexed by a label, instead of just a number location. It also doesn't need to hold numeric data, it can hold any arbitrary Python Object.
 
  
{| class="wikitable"
+
display(HTML("<style>.container { width:98% !important;}</style>"))
! rowspan="2" |
 
! rowspan="2" |
 
! rowspan="2" |Method/Operator
 
! rowspan="2" |Description/Comments
 
!Example
 
|-
 
!<syntaxhighlight lang="python3">
 
import pandas as pd
 
</syntaxhighlight>
 
|-
 
! rowspan="3" |<h4 style="text-align:left">Creating Pandas Series</h4>
 
  
 +
display(HTML('<style>.prompt.input_prompt{display:none !important;}</style>'))
 +
display(HTML('<style>.prompt.input_prompt{visibility: visible !important;}</style>'))
 +
display(HTML('<style>.prompt.input_prompt{margin-left: 50px}</style>'))
 +
display(HTML('<style>.prompt.input_prompt{visibility: visible !important; width: 0px !important; min-width: 0px !important}</style>')) 
  
<div style="text-align:left">
+
display(HTML('<style>.input_area{margin-left: -50px;}</style>'))
You can convert a <code>list</code>, <code>numpy array</code>, or <code>dictionary</code> to a Series.
+
display(HTML('<style>.input{margin-left: -20px;}</style>'))
</div>
 
|<h5 style="text-align:left">From a List</h5>
 
|<code>pd.Series(my_list)</code>
 
| colspan="2" rowspan="3" |<syntaxhighlight lang="python3">
 
# Creating some test data:
 
labels = ['a','b','c']
 
my_list = [10,20,30]
 
arr = np.array([10,20,30])
 
d = {'a':10,'b':20,'c':30}
 
  
 +
display(HTML('<style>.output_area{margin-left: 55px}</style>'))
  
pd.Series(data=my_list)
+
# display(HTML('<style>.cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
pd.Series(my_list)
+
# display(HTML('<style>.code_cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
pd.Series(arr)
 
# Output:
 
0    10
 
1    20
 
2    30
 
dtype: int64
 
  
pd.Series(data=my_list,index=labels)
+
# display(HTML('<style>.output_wrapper{margin-bottom: 0px !important; margin-top: 0px !important;}</style>'))
pd.Series(my_list,labels)
 
pd.Series(arr,labels)
 
pd.Series(d)
 
# Output:
 
a    10
 
b    20
 
c    30
 
dtype: int64
 
 
</syntaxhighlight>
 
</syntaxhighlight>
|-
 
|<h5 style="text-align:left">From a NumPy Array</h5>
 
|<code>pd.Series(arr)</code>
 
|-
 
|<h5 style="text-align:left">From a Dictionary</h5>
 
|<code>pd.Series(d)</code>
 
|-
 
!<h4 style="text-align:left">Data in a Series</h4>
 
  
|
 
|
 
| colspan="2" |A pandas Series can hold a variety of object types, even functions (although you are unlikely to need this).<syntaxhighlight lang="python3">
 
pd.Series(data=labels)
 
# Output:
 
0    a
 
1    b
 
2    c
 
dtype: object
 
  
# Holding functions in a Series
+
<br />
pd.Series([sum,print,len])
+
 
# Output:
+
===Online Jupyter===
0      <built-in function sum>
+
There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/
1      <built-in function print>
 
2      <built-in function len>
 
dtype: object
 
</syntaxhighlight>
 
|-
 
!<h4 style="text-align:left">Index in Series</h4>
 
|
 
|
 
| colspan="2" |The key to using a Series is understanding its index. Pandas uses these index names or numbers to allow fast lookups of information (it works like a hash table or dictionary).<syntaxhighlight lang="python3">
 
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])
 
ser1
 
# Output:
 
USA        1
 
Germany    2
 
USSR      3
 
Japan      4
 
dtype: int64
 
  
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])
+
I have tried:
  
ser1['USA']
+
*https://cocalc.com/app
# Output:
 
1
 
  
# Operations are also aligned on the index:
+
::https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
ser1 + ser2
+
::It looks good, but some of its options are not free
# Output:
 
Germany    4.0
 
Italy      NaN
 
Japan      8.0
 
USA        2.0
 
USSR      NaN
 
dtype: float64
 
</syntaxhighlight>
 
|}
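
The index alignment shown in the last row of the table can be explored a little further. This is a minimal sketch, assuming pandas is installed; it reuses <code>ser1</code> and <code>ser2</code> from the table, while <code>total</code> and <code>total_filled</code> are illustrative names. <code>Series.add</code> with <code>fill_value</code> is one way to avoid the <code>NaN</code> results for labels that exist in only one Series:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=["USA", "Germany", "USSR", "Japan"])
ser2 = pd.Series([1, 2, 5, 4], index=["USA", "Germany", "Italy", "Japan"])

# Plain addition aligns on labels; a label missing from either side gives NaN.
total = ser1 + ser2

# Series.add with fill_value treats a missing label as 0 instead.
total_filled = ser1.add(ser2, fill_value=0)

print(total)
print(total_filled)
```

Whether a missing label should count as 0 or stay <code>NaN</code> depends on the data, so <code>fill_value</code> is a choice to make case by case, not a default to apply blindly.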
 
  
  
<br />
+
*https://www.kaggle.com/
  
===DataFrames===
+
::https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let's use pandas to explore this topic!
+
::It looks good, but I could not find a way to add a TOC
  
  
<syntaxhighlight lang="python">
+
*https://drive.google.com
import pandas as pd
 
import numpy as np
 
  
from numpy.random import randn
+
:*https://colab.research.google.com
np.random.seed(101)
+
::This is the one I am using now
  
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
 
  
df
+
<br />
# Output:
+
===Some remarks===
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
C  -2.018168    0.740122    0.528813  -0.589001
 
D  0.188695  -0.758872  -0.933237    0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
</syntaxhighlight>
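
The idea that a DataFrame is a set of Series sharing one index can be made concrete by building one that way. This is a minimal sketch, assuming pandas is installed; the names <code>w</code>, <code>x</code> and <code>df_small</code> and their values are made up for illustration:

```python
import pandas as pd

# Two Series that share the same index labels.
w = pd.Series([1.0, 2.0, 3.0], index=["A", "B", "C"])
x = pd.Series([10.0, 20.0, 30.0], index=["A", "B", "C"])

# Each dictionary key becomes a column; the shared index becomes the row labels.
df_small = pd.DataFrame({"W": w, "X": x})

print(df_small)
print(df_small.index.tolist())
```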
 
  
  
 +
<br />
 +
====Executing Terminal Commands in Jupyter Notebooks====
 +
https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks
  
'''DataFrame Columns are just Series:'''<syntaxhighlight lang="python3">
+
If we are in a notebook and want to run a shell command rather than Python code, we prefix it with <code>'''!'''</code>; some common commands are also available as <code>'''%'''</code> magics.
type(df['W'])
 
# Output:
 
pandas.core.series.Series
 
</syntaxhighlight>
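
A related detail is that the type returned depends on how the column is requested. The sketch below, assuming NumPy and pandas are installed, uses a small made-up frame (<code>df_demo</code>) to show that a single column name returns a Series, while a list of names returns a DataFrame:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["A", "B", "C"],
    columns=["W", "X"],
)

one_col = df_demo["W"]      # single name -> Series
sub_frame = df_demo[["W"]]  # list of names -> DataFrame, even with one column

print(type(one_col).__name__)
print(type(sub_frame).__name__)
```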
 
{| class="wikitable"
 
!
 
!
 
!Method/
 
Operator
 
!Description/Comments
 
! colspan="2" |Example
 
|-
 
! rowspan="5" |<h4 style="text-align:left">Selection and Indexing</h4>
 
  
 +
Try, for example:
 +
%ls
 +
!pwd
  
<div style="text-align:left">
+
It's the same as if you opened up a terminal and typed it without the <code>'''!'''</code>
Let's learn the various
 
  
methods to grab data
 
  
from a DataFrame
+
<br />
</div>
 
  
|Standard syntax
+
===[[HTML presentation with Reveal.js#Creating Presentations in Jupyter Notebook with RevealJS|Creating Presentations in Jupyter Notebook with RevealJS]]===
|<code>'''df[<nowiki>''</nowiki>]'''</code>
 
|
 
| colspan="2" rowspan="2" |<syntaxhighlight lang="python3">
 
# Pass a list of column names:
 
df[['W','Z']]
 
  
          W          Z
 
A  2.706850    0.503826
 
B  0.651118    0.605965
 
C  -2.018168  -0.589001
 
D  0.188695    0.955057
 
E  0.190794    0.683509
 
</syntaxhighlight>
 
|-
 
|SQL syntax (NOT
 
RECOMMENDED!)
 
|<code>'''df.W'''</code>
 
|
 
|-
 
|Selecting Rows
 
|'''<code>df.loc[<nowiki>''</nowiki>]</code>'''
 
|
 
| colspan="2" |<syntaxhighlight lang="python3">
 
# Select a row by its label:
df.loc['A']

# Or select by integer position instead of label (position 0 is row 'A'):
df.iloc[0]
 
# Output:
 
W    2.706850
 
X    0.628133
 
Y    0.907969
 
Z    0.503826
 
Name: A, dtype: float64
 
</syntaxhighlight>
 
|-
 
|Selecting subset of
 
  
rows and columns
+
<br />
|'''<code>df.loc[<nowiki>''</nowiki>,<nowiki>''</nowiki>]</code>'''
 
|
 
| colspan="2" |<syntaxhighlight lang="python3">
 
df.loc['B','Y']
 
# Output:
 
-0.84807698340363147
 
  
df.loc[['A','B'],['W','Y']]
+
==Some of the most popular Python Data Science Libraries==
# Output:
 
          W          Y
 
A  2.706850    0.907969
 
B  0.651118  -0.848077
 
</syntaxhighlight>
 
|-
 
|Conditional Selection
 
|
 
| colspan="3" |An important feature of pandas is conditional selection using bracket notation, very similar to numpy:<syntaxhighlight lang="python3">
 
df
 
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
C  -2.018168    0.740122    0.528813  -0.589001
 
D  0.188695  -0.758872  -0.933237    0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
  
df>0
+
*NumPy
# Output:
+
*SciPy
    W      X      Y      Z
+
*Pandas
A  True    True    True    True
+
*Seaborn
B  True    False  False  True
+
*Scikit-learn
C  False  True    True    False
+
*Matplotlib
D  True    False  False  True
+
*Plotly
E  True    True    True    True
+
*PySpark
  
df[df>0]
 
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118    NaN        NaN        0.605965
 
C  NaN        0.740122    0.528813    NaN
 
D  0.188695    NaN        NaN        0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
  
df[df['W']>0]
+
<br />
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
D  0.188695  -0.758872  -0.933237    0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
  
df[df['W']>0]['Y']
+
==[[NumPy and Pandas]]==
# Output:
 
A    0.907969
 
B  -0.848077
 
D  -0.933237
 
E    2.605967
 
Name: Y, dtype: float64
 
  
df[df['W']>0][['Y','X']]
 
# Output:
 
          Y          X
 
A  0.907969    0.628133
 
B  -0.848077  -0.319318
 
D  -0.933237  -0.758872
 
E  2.605967    1.978757
 
  
# For two conditions you can use | and & with parentheses:
+
<br />
df[(df['W']>0) & (df['Y'] > 1)]
+
==[[Data Visualization with Python]]==
# Output:
 
          W          X          Y          Z
 
E  0.190794    1.978757    2.605967    0.683509
 
</syntaxhighlight><br />
 
|-
 
!<h4 style="text-align:left">Creating a new column</h4>
 
|
 
|
 
|
 
| colspan="2" |<syntaxhighlight lang="python3">
 
df['new'] = df['W'] + df['Y']
 
</syntaxhighlight>
 
|-
 
!<h4 style="text-align:left">Removing Columns</h4>
 
|
 
|'''<code>df.drop()</code>'''
 
| colspan="3" |
 
<div class="mw-collapsible mw-collapsed" style="">
 
<syntaxhighlight lang="python3">
 
df.drop('new',axis=1)
 
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
C  -2.018168    0.740122    0.528813  -0.589001
 
D  0.188695  -0.758872  -0.933237    0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
  
# Not inplace unless specified!
 
df
 
# Output:
 
          W          X          Y          Z        new
 
A  2.706850    0.628133    0.907969    0.503826    3.614819
 
B  0.651118  -0.319318  -0.848077    0.605965  -0.196959
 
C  -2.018168    0.740122    0.528813  -0.589001  -1.489355
 
D  0.188695  -0.758872  -0.933237    0.955057  -0.744542
 
E  0.190794    1.978757    2.605967    0.683509    2.796762
 
  
df.drop('new',axis=1,inplace=True)
+
<br />
df
 
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
C  -2.018168    0.740122    0.528813  -0.589001
 
D  0.188695  -0.758872  -0.933237    0.955057
 
E  0.190794    1.978757    2.605967    0.683509
 
  
 +
==[[Natural Language Processing]]==
  
# Can also drop rows this way:
 
df.drop('E',axis=0)
 
# Output:
 
          W          X          Y          Z
 
A  2.706850    0.628133    0.907969    0.503826
 
B  0.651118  -0.319318  -0.848077    0.605965
 
C  -2.018168    0.740122    0.528813  -0.589001
 
D  0.188695  -0.758872  -0.933237    0.955057
 
</syntaxhighlight>
 
</div>
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|-
 
|
 
|
 
|
 
|
 
|
 
|
 
|}
 
  
 +
<br />
 +
==[[Dash - Plotly]]==
  
  
 +
<br />
 +
==[[Scrapy]]==
  
  
 
<br />
 
<br />
 +
==Using SQL in Jupyter==
 +
Connecting to a database in Jupyter
  
===Missing Data===
 
  
 +
https://pypi.org/project/ipython-sql/
  
<br />
+
https://stackoverflow.com/questions/454854/no-module-named-mysqldb
===GroupBy===
+
 
 +
https://stackoverflow.com/questions/5178292/pip-install-mysql-python-fails-with-environmenterror-mysql-config-not-found
 +
 
 +
https://docs.kyso.io/guides/sql-interface-within-jupyterlab
 +
 
 +
https://www.datacamp.com/community/tutorials/sql-interface-within-jupyterlab
 +
 
 +
https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
  
 +
https://www.sqlshack.com/learn-jupyter-notebooks-for-sql-server/
  
<br />
 
===Merging,Joining,and Concatenating===
 
  
 +
Check the sources above. I think that everything I had to do the last time I installed it came from the first three sources:
  
<br />
+
pip install ipython-sql
===Operations===
+
 +
sudo apt install default-libmysqlclient-dev
 +
 +
pip install mysqlclient
 +
 +
sudo apt-get install python3-mysqldb
  
  
<br />
+
Then add SQL syntax highlighting to Jupyter, as described in the corresponding source above.
===Data Input and Output===
 
  
  
 
<br />
 
<br />

Latest revision as of 15:47, 11 September 2024


For a standard Python tutorial go to Python



Courses

  • Udemy - Python for Data Science and Machine Learning Bootcamp
https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/



Anaconda

Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning related applications (large-scale data processing, predictive analytics, scientific computing), that aims to simplify package management and deployment. Package versions are managed by the package management system conda. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)

In other words, Anaconda can be seen as a package (a distribution) that includes not only Python (or R) but also many libraries used in data science, as well as its own virtual environment system. It is an "all-in-one" install that is extremely popular in data science and machine learning.



Installation

Installation from the official Anaconda Web site: https://docs.anaconda.com/anaconda/install/



Anaconda comes with a few IDEs

  • Jupyter Lab
  • Jupyter Notebook
  • Spyder
  • Qtconsole
  • and others



Anaconda Navigator

Anaconda Navigator is a GUI that helps you easily start important applications and manage the packages in your local Anaconda installation.

You can open the Anaconda Navigator from the Terminal:

anaconda-navigator



Jupyter

Jupyter comes with Anaconda.

  • It is a development environment (IDE) where we can write code; it also allows us to display images and write Markdown notes.
  • It is the most popular IDE in data science for exploring and analyzing data.
  • Other famous IDEs for Python are Sublime Text and PyCharm.
  • It comes in two flavors: Jupyter Lab and Jupyter Notebook.



Remote connection

https://jupyter-notebook.readthedocs.io/en/stable/public_server.html


A**1


(base) adelo@vmi346715:~/.jupyter$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Generating a RSA private key
......................................+++++
....................................+++++
writing new private key to 'mykey.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:IE	
State or Province Name (full name) [Some-State]:Dublin
Locality Name (eg, city) []:Dublin
Organization Name (eg, company) [Internet Widgits Pty Ltd]:.
Organizational Unit Name (eg, section) []:.
Common Name (e.g. server FQDN or YOUR name) []:sinfronteras    
Email Address []:adeloaleman@gmail.com
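
Once the certificate and key exist, the public-server guide linked above has the server pick them up from `jupyter_notebook_config.py`. The following is a minimal sketch of that step, assuming the classic Notebook server (whose options live under `c.NotebookApp`; newer Jupyter Server releases use `c.ServerApp` instead) and assuming the two files sit in `~/.jupyter` as in the session above. The helper name `build_config` and the port 9999 are illustrative choices, not part of the original notes.

```python
from pathlib import Path


def build_config(cert: Path, key: Path, port: int = 9999) -> str:
    """Return config lines that enable HTTPS for a classic Notebook server.

    Assumption: option names follow the classic `c.NotebookApp` scheme.
    """
    lines = [
        f"c.NotebookApp.certfile = {str(cert)!r}",
        f"c.NotebookApp.keyfile = {str(key)!r}",
        "c.NotebookApp.ip = '*'  # listen on all interfaces",
        "c.NotebookApp.open_browser = False",
        f"c.NotebookApp.port = {port}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed location: the directory where openssl was run above.
    jupyter_dir = Path.home() / ".jupyter"
    print(build_config(jupyter_dir / "mycert.pem", jupyter_dir / "mykey.key"))
```

The printed lines can be pasted into `~/.jupyter/jupyter_notebook_config.py` (created with `jupyter notebook --generate-config` if it does not exist yet).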



Share Jupyter Notebook online

  • GitHub:
https://docs.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github
Example: https://github.com/adeloaleman/AmazonLaptopsDashboard/blob/master/DataAnalysis/data_analysis2.ipynb


  • Nbviewer:
https://nbviewer.jupyter.org/
Example: https://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/main/tutorial/06%20-%20Linking%20and%20Interactions.ipynb



Customize Jupyter


Themes

https://github.com/dunovank/jupyter-themes

See the theme shown on this page: https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab


jt   -t oceans16     -cellw 98%   -lineh 120   -fs 14   -nfs 14   -dfs 14   -ofs 14


https://www.kaggle.com/getting-started/97540

jt   -t monokai      -cellw 98%   -lineh 120   -fs 14   -nfs 14   -dfs 14   -ofs 14   -f fira   -nf ptsans   -N   -kl   -cursw 2   -cursc r   -T



Extensions

This post mentions some nice extensions and configurations that can be applied: https://towardsdatascience.com/bringing-the-best-out-of-jupyter-notebooks-for-data-science-f0871519ca29


Unofficial Jupyter Notebook Extensions

https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html

This is very important. There are very nice extensions in this package:


Installation

https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html

I had some issues installing it. The method indicated by default is:

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user

With the method above I could not install the package correctly. The installation returned no errors and the extension showed up in Jupyter Notebook, but I could not enable the extensions.


It seems to be a problem with the installation location. I was using conda, but conda was giving problems: installing packages took a very long time, and afterwards the package did not seem to be available.


In the following post I found a solution for installing nbextensions using pip: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1127

pip install --upgrade jupyter_contrib_nbextensions
jupyter contrib nbextension install  --sys-prefix  --symlink

I think I used «--symlink», but I am not completely sure. I also ran the --upgrade, but I think the options --sys-prefix --symlink are what made the difference.


If the Nbextensions tab is not shown, try reinstalling https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator

pip install jupyter_nbextensions_configurator

or

conda install -c conda-forge jupyter_nbextensions_configurator
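
Because the problem described above appears to come from where the packages were installed, it can help to check from inside the notebook kernel which environment is active and whether it can see them. A minimal sketch, assuming the import names match the pip package names (`jupyter_contrib_nbextensions` and `jupyter_nbextensions_configurator`); the helper `is_installed` is a hypothetical name used only for illustration.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The environment whose site-packages this interpreter is actually using.
    print("Active environment:", sys.prefix)
    # Assumption: these are the import names of the two packages above.
    for name in ("jupyter_contrib_nbextensions", "jupyter_nbextensions_configurator"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If a package is reported as missing even though pip or conda said it was installed, it most likely went into a different environment than the one running the kernel.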



Custom JS and Custom CSS files

This is a good post: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064

Keyboard Shortcut Customization: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Custom%20Keyboard%20Shortcuts.html



custom.js
/** My settings */

// This is to enable syntax highlighting for SQL code: 
// https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic
require(['notebook/js/codecell'], function(codecell) {
  codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = {'reg':[/^%%sql/]} ;
  Jupyter.notebook.events.one('kernel_ready.Kernel', function(){
  Jupyter.notebook.get_cells().map(function(cell){
      if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;
  });
});


// My plain theme
// This is a good post where I took some ideas to write the following function: https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064
function plainTheme() {
    var input_promp_fields = document.getElementsByClassName("prompt_container");
    var text_render_fields = document.getElementsByClassName("text_cell_render");

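    // Declare the style values locally instead of leaking them as implicit globals
    var action, input_marginLeft, border_top, prompt_width, padding_top, output_margin;
    if (input_promp_fields[0].style.visibility == "collapse"){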
        action = "visible";
        input_marginLeft = "0px";
        border_top  = "3px";
        prompt_width = "74px";
        padding_top = "0px";
        output_margin = "40px";
    }else{
        action = "collapse";
        input_marginLeft = "74px";
        border_top  = '0px';
        prompt_width = "74px";
        padding_top = "40px";
        output_margin = "40px";
    }

    // To use !important we have to do it this way, using jQuery:
    // https://makitweb.com/how-to-add-important-to-css-property-with-jquery/
    var text_cell_fields = document.getElementsByClassName("text_cell");
    $(text_cell_fields).ready(function(){
        $('.input_prompt').css({
            'cssText': `width: 40px !important; max-width: ${prompt_width} !important; min-width: ${prompt_width} !important;`
        });
    });

    $(document).ready(function(){
        $(".prompt_container").css(
            'visibility', `${action}`
        );
        
        $(".input").css(
            'padding-left', `${input_marginLeft}`
        );
        
        $(".output_subarea").css(
            'margin-left', `${output_margin}`
        );
                    
        $('.cell').css({
            'cssText': `border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
        });
        
        $(".collapsible_headings_ellipsis").css({
            'cssText': `padding-top:${padding_top} !important; border-top-width: ${border_top} !important; border-bottom-width: ${border_top} !important;`
        });

        $(".text_cell_render").css({
            'cssText': `margin-left: -10px;`
        });
    });            
}

Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-Ctrl-Q', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        plainTheme();
    return false;
    }}
);

Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Alt-Ctrl-Q', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        plainTheme();
    return false;
    }}
);


// This could be very useful. It allows text to be added automatically into a cell
// https://forums.fast.ai/t/jupyter-notebook-enhancements-tips-and-tricks/17064/27
Jupyter.keyboard_manager.edit_shortcuts.add_shortcut('Ctrl-Shift-J', {
    help : '...',
    help_index : 'zz',
    handler : function (event) {
        document.body.style.background = 'blue'
        var target = Jupyter.notebook.get_selected_cell()
        var cursor = target.code_mirror.getCursor()
        var before = target.get_pre_cursor()
        var after = target.get_post_cursor()
        target.set_text(before + 'from IPython.display import display, HTML\ndisplay(HTML("<style>.container { width:98% !important;}</style>"))' + after)
        cursor.ch += 20 // where to put your cursor
        target.code_mirror.setCursor(cursor)
        return false;
    }}
);


// To get the real value of a css field: https://stackoverflow.com/questions/26074476/document-body-style-backgroundcolor-doesnt-work-with-external-css-style-sheet
// window.getComputedStyle(document.body).backgroundColor
// window.getComputedStyle(document.getElementsByClassName("input_area")[0]).backgroundColor



custom.css
/*  My settings  */

.container { width:98% !important; }
/* document.getElementById("notebook-container").style.minWidth = "50%"; */
/* document.getElementById("notebook-container").style.maxWidth = "50%"; */

#notebook-container {
 width:98% !important;
}

.CodeMirror-gutters {
 background-color: transparent !important;
 background: transparent !important;
}

.CodeMirror-linenumber {
 margin-left: -20px !important;
}

.output_subarea {
 margin-left: 40px !important;
}

#toc .fa-fw {
 color: blue !important;
}

#toc .highlight_on_scroll {
 margin-left: -4px !important;
 
}

#toc {
 padding-left: 10px !important;
}

/*  I have also changed the color
/*  #a6e22e   by   #388bfd 
 *  in the entire custom.css
 */

/* I have also changed some of the properties of the toc directly above in the code: 

#toc-wrapper {
 z-index: 90;
 position: fixed !important;
 display: flex;
 flex-direction: column;
 overflow: hidden;
 padding: 10px;
 padding-top: 40px !important;
 border-style: solid;
 border-width: thin;
 border-right-width: medium !important;
 background-color: #1e1e1e !important;
}
#toc-wrapper.ui-draggable.ui-resizable.sidebar-wrapper {
 border-color: rgba(93,92,82,.25) !important;
}
#toc a,
#navigate_menu a,
.toc {
 color: #f8f8f0 !important;
 font-size: 16pt !important;
}
#toc li > span:hover {
 background-color: rgba(93,92,82,.25) !important;
}
#toc a:hover,
#navigate_menu a:hover,
.toc {
 color: #DAA520 !important;
 font-size: 16pt !important;
}
#toc-wrapper .toc-item-num {
 color: #388bfd !important;
 font-size: 16pt !important;
}
*/



Configurations from the Jupyter notebook

from IPython.display import display, HTML

display(HTML("<style>.container { width:98% !important;}</style>"))

display(HTML('<style>.prompt.input_prompt{display:none !important;}</style>'))
display(HTML('<style>.prompt.input_prompt{visibility: visible !important;}</style>'))
display(HTML('<style>.prompt.input_prompt{margin-left: 50px}</style>'))
display(HTML('<style>.prompt.input_prompt{visibility: visible !important; width: 0px !important; min-width: 0px !important}</style>'))

display(HTML('<style>.input_area{margin-left: -50px;}</style>'))
display(HTML('<style>.input{margin-left: -20px;}</style>'))

display(HTML('<style>.output_area{margin-left: 55px}</style>'))

# display(HTML('<style>.cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))
# display(HTML('<style>.code_cell{margin-bottom: -5px !important; margin-top: -5px !important;}</style>'))

# display(HTML('<style>.output_wrapper{margin-bottom: 0px !important; margin-top: 0px !important;}</style>'))



Online Jupyter

There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/

I have tried:

https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
It looks good, but some of its options are not free.


https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
It looks good, but I did not find a way to add a TOC.


It is the one I am using now.



Some remarks


Executing Terminal Commands in Jupyter Notebooks

https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks

To run a shell command from a Notebook cell instead of Python code, prefix it with !; a few common commands (such as ls) are also available as % magics.

Try, for example:

%ls 
!pwd

This is the same as opening a terminal and typing the command without the !
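
The `!` prefix only exists inside IPython and Jupyter. In a plain Python script, a rough equivalent is the standard-library `subprocess` module. The sketch below is only an illustration of that idea; the helper name `run_shell` is hypothetical and not part of Jupyter.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run `command` through the system shell and return its stdout, stripped."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Roughly what `!pwd` shows in a notebook cell.
    print(run_shell("pwd"))
```

With `check=True`, a command that exits with a non-zero status raises `subprocess.CalledProcessError` instead of failing silently.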



Creating Presentations in Jupyter Notebook with RevealJS


Some of the most popular Python Data Science Libraries

  • NumPy
  • SciPy
  • Pandas
  • Seaborn
  • Scikit-learn
  • Matplotlib
  • Plotly
  • PySpark



NumPy and Pandas


Data Visualization with Python


Natural Language Processing


Dash - Plotly


Scrapy


Using SQL in Jupyter

Connecting to a database in Jupyter


https://pypi.org/project/ipython-sql/

https://stackoverflow.com/questions/454854/no-module-named-mysqldb

https://stackoverflow.com/questions/5178292/pip-install-mysql-python-fails-with-environmenterror-mysql-config-not-found

https://docs.kyso.io/guides/sql-interface-within-jupyterlab

https://www.datacamp.com/community/tutorials/sql-interface-within-jupyterlab

https://stackoverflow.com/questions/43641362/adding-syntax-highlighting-to-jupyter-notebook-cell-magic

https://www.sqlshack.com/learn-jupyter-notebooks-for-sql-server/


Check the sources above. I think that everything I had to do the last time I installed it came from the first three sources:

pip install ipython-sql

sudo apt install default-libmysqlclient-dev

pip install mysqlclient

sudo apt-get install python3-mysqldb


Then add SQL syntax highlighting to Jupyter, as described in the corresponding source above.
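
After installation, ipython-sql is normally loaded in a notebook with `%load_ext sql`, and queries are then run with the `%sql` and `%%sql` magics. Those magics are not plain Python, so the sketch below illustrates the same idea (sending a query from Python and getting rows back) with the standard-library `sqlite3` module and an in-memory database as a stand-in for MySQL, so that it runs without a server. The table `laptops` and its rows are made up for the example.

```python
import sqlite3


def query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Execute `sql` with optional parameters and return all rows."""
    return conn.execute(sql, params).fetchall()


def demo() -> list:
    # In-memory SQLite database: a stand-in for the MySQL server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE laptops (brand TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO laptops VALUES (?, ?)",
        [("A", 500.0), ("B", 900.0), ("C", 1200.0)],
    )
    rows = query(
        conn, "SELECT brand FROM laptops WHERE price > ? ORDER BY brand", (800,)
    )
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Passing values through `?` placeholders rather than string formatting keeps the query safe from SQL injection; the same habit applies when connecting to MySQL through `mysqlclient`.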