Data Science

From Sinfronteras

Revision as of 16:37, 27 October 2018

Social Media Sentiment Analysis

https://www.dezyre.com/article/top-10-machine-learning-projects-for-beginners/397

https://elitedatascience.com/machine-learning-projects-for-beginners#social-media

https://en.wikipedia.org/wiki/Sentiment_analysis

https://en.wikipedia.org/wiki/Social_media_mining

Remote development

Eclipse - Connect to a remote file system

https://us.informatiweb.net/tutorials/it/6-web/148--eclipse-connect-to-a-remote-file-system.html

Mount a remote filesystem on your local machine

https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh

https://stackoverflow.com/questions/32747819/remote-java-development-using-intellij-or-eclipse

https://serverfault.com/questions/306796/sshfs-problem-when-losing-connection

https://askubuntu.com/questions/358906/sshfs-messes-up-everything-if-i-lose-connection

https://askubuntu.com/questions/716612/sshfs-auto-reconnect

sshfs root@sinfronteras.ws: /home/adelo/1-system/3-cloud
sshfs -o reconnect,ServerAliveInterval=5,ServerAliveCountMax=3 root@sinfronteras.ws: /home/adelo/1-system/3-cloud
sshfs -o allow_other root@sinfronteras.ws: /home/adelo/1-system/3-cloud
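The commands above can be wrapped in a small script so that the mount options live in one place. This is a minimal sketch. It assumes sshfs and FUSE are installed locally and that SSH access to the host already works. The host and mount point are taken from the commands above, and the helper names mount_remote and unmount_remote are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mount and unmount a remote directory over SSH with sshfs.
# REMOTE and MOUNTPOINT come from the commands above; change them for your setup.
REMOTE="root@sinfronteras.ws:"            # nothing after ':' means the remote user's home directory
MOUNTPOINT="/home/adelo/1-system/3-cloud"

# reconnect: re-establish the connection if it drops
# ServerAliveInterval/ServerAliveCountMax: passed to ssh; keep-alive every 5 s, give up after 3 misses
SSHFS_OPTS="reconnect,ServerAliveInterval=5,ServerAliveCountMax=3"

# Hypothetical helper: create the mount point if needed, then mount the remote directory
mount_remote() {
  mkdir -p "$MOUNTPOINT"
  sshfs -o "$SSHFS_OPTS" "$REMOTE" "$MOUNTPOINT"
}

# Hypothetical helper: unmount the FUSE filesystem as a regular user
unmount_remote() {
  fusermount -u "$MOUNTPOINT"
}
```

Call mount_remote to mount the directory and unmount_remote when you are done. Adding allow_other to SSHFS_OPTS lets other local users see the mount. For non-root users, FUSE only accepts this option when user_allow_other is enabled in /etc/fuse.conf.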


A faster way to mount a remote file system than sshfs: https://superuser.com/questions/344255/faster-way-to-mount-a-remote-file-system-than-sshfs

Git and GitHub

https://github.com/

Installing Git

https://www.digitalocean.com/community/tutorials/how-to-install-git-on-ubuntu-18-04

sudo apt install git

Configuring GitHub

https://www.howtoforge.com/tutorial/install-git-and-github-on-ubuntu/

First, set up the configuration details of the GitHub user with the following two commands. Replace "user_name" with your GitHub username and "email_id" with the email address you used to create your GitHub account:

git config --global user.name "user_name"
git config --global user.email "email_id"
git config --global user.name "adeloaleman"
git config --global user.email "adeloaleman@gmail.com"
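To check what Git actually stored, the values can be read back. This is a minimal sketch and assumes Git is installed. It points HOME at a temporary directory so the demo does not modify your real ~/.gitconfig. The name and email are the placeholders from the text.

```shell
#!/usr/bin/env sh
# Sketch: set and read back the global Git identity.
# HOME is redirected to a scratch directory so your real ~/.gitconfig is untouched;
# drop these two lines to configure your actual account.
HOME="$(mktemp -d)"
export HOME

git config --global user.name "user_name"
git config --global user.email "email_id"

# Read individual values back
git config --global --get user.name
git config --global --get user.email

# Or list everything stored in the global configuration file
git config --global --list
```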

Creating a local repository

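git init /home/adelo/1-system/1-disco_local/1-mis_archivos/1-pe/1-ciencia/1-computacion/1-programacion/GitHubLocalRepository
cd /home/adelo/1-system/1-disco_local/1-mis_archivos/1-pe/1-ciencia/1-computacion/1-programacion/GitHubLocalRepository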

Creating a README file to describe the repository

Now create a README file and enter some text, such as "This is a git setup on Linux". A README file generally describes what the repository contains or what the project is about. Example:

vi README

This is Adelo's git repo

Adding repository files to an index

This is an important step. Here we add to an index everything that needs to be pushed to the remote repository on GitHub. These may be text files or programs that you are adding to the repository for the first time, or existing files that have changed (a newer, updated version).

We already have the README file, so let's create another file, sample.c, containing a simple C program:

vi sample.c

#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    return 0;
}

Now that we have two files, README and sample.c, add them to the index with the following two commands:

git add README
git add sample.c

Note that the git add command can add any number of files and folders to the index. The index (also called the staging area) is a buffer-like space that holds the files and folders to be included in the next commit.

Committing changes made to the index

Once all the files are added, we can commit them. A commit records the staged additions and changes as a snapshot in the local repository. They are then ready to be uploaded to the remote repository. Use the command:

git commit -m "some_message"

"some_message" in the above command can be any simple message like "my first commit" or "edit in readme", etc.

Creating a repository on GitHub

Create a repository on GitHub. It is convenient, though not required, to give it the same name as the local repository, in this case "GitHubLocalRepository". To do this, log in to your account on https://github.com, click the "plus (+)" symbol at the top right corner of the page, and select "New repository". Fill in the details and click the "Create repository" button. Leave the new repository empty (do not initialize it with a README), otherwise the first push of the local history will be rejected.

Once it is created, we can push the contents of the local repository to the GitHub repository in your profile. First, connect the local repository to it by adding it as a remote named origin:

git remote add origin https://github.com/adeloaleman/GitHubLocalRepository

Pushing files in local repository to GitHub repository

The final step is to push the local repository contents into the remote host repository (GitHub), by using the command:

git push origin master
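The whole local workflow above can be rehearsed in a throwaway directory before touching a real project. This is a sketch and assumes Git is installed. The temporary path, the repository-local identity, and the remote URL (with USER as a placeholder) are illustrative. The final push is left out because it needs network access and GitHub credentials.

```shell
#!/usr/bin/env bash
# Sketch of the init / add / commit / remote steps, run in a temporary directory.
set -eu

repo="$(mktemp -d)/GitHubLocalRepository"
git init -q "$repo"
cd "$repo"

# Name the initial branch 'master' to match the push command in the text
git symbolic-ref HEAD refs/heads/master

# Repository-local identity (placeholder values) so the commit works without a global config
git config user.name "Example User"
git config user.email "user@example.com"

# README describing the repository
echo "This is a git setup on Linux" > README

# The sample C program from the text
cat > sample.c <<'EOF'
#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    return 0;
}
EOF

# Stage both files, then commit them
git add README sample.c
git commit -q -m "my first commit"

# Register the GitHub repository as 'origin' (placeholder URL; nothing is pushed here)
git remote add origin https://github.com/USER/GitHubLocalRepository.git

git log --oneline
git remote -v
```

After replacing USER with a real account and creating the empty repository on GitHub, git push origin master would upload this commit.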

GUI Clients

Git comes with built-in GUI tools for committing (git-gui) and browsing (gitk), but there are several third-party tools for users looking for a platform-specific experience.

https://desktop.github.com/

It seems that the official GitHub Desktop application is not available for Ubuntu. However, there are similar applications available for Linux: https://git-scm.com/download/gui/linux

One example for Linux is GitKraken: https://www.gitkraken.com/

Anaconda

Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning related applications (large-scale data processing, predictive analytics, scientific computing), that aims to simplify package management and deployment. Package versions are managed by the package management system conda. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)

Installation

https://www.anaconda.com/download/#linux

https://linuxize.com/post/how-to-install-anaconda-on-ubuntu-18-04/

https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04

Jupyter Notebook

https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook

Courses

eu.udacity.com

https://classroom.udacity.com/courses/ud120

www.coursera.org

https://www.coursera.org/learn/machine-learning/home/welcome

Other

https://www.udemy.com/machine-learning-course-with-python/

https://stackoverflow.com/questions/19181999/how-to-create-a-keyboard-shortcut-for-sublimerepl

R programming language

The R Project for Statistical Computing: https://www.r-project.org/

R is an open-source programming language that specializes in statistical computing and graphics. Supported by the R Foundation for Statistical Computing, it is widely used for developing statistical software and performing data analysis.

Installing R on Ubuntu 18.04

https://www.digitalocean.com/community/tutorials/how-to-install-r-on-ubuntu-18-04

Because R is a fast-moving project, the latest stable version isn’t always available from Ubuntu’s repositories, so we’ll start by adding the external repository maintained by CRAN.

Let’s first add the relevant GPG key:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

Once we have the trusted key, we can add the repository. (If you’re not using 18.04, you can find the relevant repository, named for each release, in the R Project Ubuntu list.)

sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/'
sudo apt update
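Because the repository line depends on the Ubuntu release, it can be generated from the release codename instead of being hard-coded. This is a minimal sketch. The helper name cran_source_line is hypothetical. The default -cran35 suffix matches the R 3.5 repository used above; for R 4.x, CRAN publishes -cran40 repositories instead.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the CRAN apt source line for an Ubuntu codename.
# $1 = release codename (e.g. bionic), $2 = repository suffix (defaults to cran35, as in the text)
cran_source_line() {
  codename="$1"
  suffix="${2:-cran35}"
  printf 'deb https://cloud.r-project.org/bin/linux/ubuntu %s-%s/\n' "$codename" "$suffix"
}

# Example (commented out because it changes system configuration):
# sudo add-apt-repository "$(cran_source_line "$(lsb_release -cs)")"
# sudo apt update
```

lsb_release -cs prints the codename of the running release (bionic on 18.04), so the same two commands work unchanged on other releases as long as CRAN provides a matching repository.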

At this point, we're ready to install R with the following command:

sudo apt install r-base

To confirm that R has been installed successfully, start its interactive shell, either as your regular user or as root:

R
sudo -i R

Installing R Packages from CRAN

Part of R’s strength is the abundance of available add-on packages. For demonstration purposes, we'll install txtplot, a library that outputs ASCII graphs, including scatterplots, line plots, density plots, acf plots, and bar charts. We'll start R as root so that the library will be available to all users automatically:

sudo -i R
> install.packages('txtplot')

When the installation is complete, we can load txtplot:

> library('txtplot')

If there are no error messages, the library has loaded successfully. Let’s put it into action with an example that demonstrates a basic plotting function with axis labels. The example data, supplied by R's datasets package, contains the speed of cars and the distance required to stop, based on data from the 1920s:

> txtplot(cars[,1], cars[,2], xlab = 'speed', ylab = 'distance')

      +----+-----------+------------+-----------+-----------+--+
  120 +                                                   *    +
      |                                                        |
d 100 +                                                   *    +
i     |                                    *                *  |
s  80 +                          *         *                   +
t     |                                       * *    *    *    |
a  60 +                          *  *      *    *      *       +
n     |                        *         * *  * *              |
c  40 +                *       * *    *  *    * *              +
e     |         *      *  * *  * *  *                          |
   20 +           *    *  * *       *                          +
      |  *      *    *                                         |
    0 +----+-----------+------------+-----------+-----------+--+
           5          10           15          20          25   
                                speed
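The same installation and plot can also be run non-interactively from the shell with Rscript, which is handy in setup scripts. This is a sketch and assumes R is installed as above. The CRAN mirror passed as repos and the helper names are assumptions.

```shell
#!/usr/bin/env sh
# Sketch: install txtplot and draw the cars plot without opening the R console.
CRAN_MIRROR="https://cloud.r-project.org"

# Hypothetical helper: install txtplot. Run the script as root (e.g. with sudo)
# so the package goes to the system-wide library, as with 'sudo -i R'.
install_txtplot() {
  Rscript -e "install.packages('txtplot', repos = '$CRAN_MIRROR')"
}

# Hypothetical helper: reproduce the speed/distance plot shown above
plot_cars() {
  Rscript -e "library(txtplot); txtplot(cars[,1], cars[,2], xlab = 'speed', ylab = 'distance')"
}
```

Calling install_txtplot once and then plot_cars should print the same kind of ASCII plot as the interactive session. Passing repos explicitly avoids the interactive mirror prompt that install.packages otherwise shows.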

To learn more about txtplot, use help(txtplot) from within the R interpreter.

Package that provides functions to connect to MySQL databases

The package is named RMySQL. To install it, type the following command at the R prompt:

> install.packages('RMySQL')

Package manager

Display packages currently installed in your computer:

> installed.packages()

This produces a long output with each line containing a package, its version information, the packages it depends on, and so on.


A more user-friendly, although less complete, list of the installed packages can be obtained by issuing:

> library()


The following command can be very useful as it allows you to check whether there are newer versions of your installed packages at CRAN:

> old.packages()


Moreover, you can use the following command to update all your installed packages:

> update.packages()
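The package-manager commands above can also be run from the shell with Rscript, for example in a periodic maintenance script. This is a sketch and assumes R is installed. The CRAN mirror and the helper names are assumptions. Updating packages in the system library requires running the script as root.

```shell
#!/usr/bin/env sh
# Sketch: shell wrappers around R's package-manager functions.
CRAN_MIRROR="https://cloud.r-project.org"

# Hypothetical helper: print the names of all installed packages, one per line
r_list_packages() {
  Rscript -e 'cat(rownames(installed.packages()), sep = "\n")'
}

# Hypothetical helper: show installed packages that have a newer version on CRAN
# (old.packages() returns NULL when everything is current)
r_outdated_packages() {
  Rscript -e "op <- old.packages(repos = '$CRAN_MIRROR'); if (is.null(op)) cat('All packages are up to date\n') else print(op[, c('Package', 'Installed', 'ReposVer'), drop = FALSE])"
}

# Hypothetical helper: update every outdated package without asking for confirmation
r_update_packages() {
  Rscript -e "update.packages(ask = FALSE, repos = '$CRAN_MIRROR')"
}
```

ask = FALSE is what makes update.packages() usable unattended. Without it, R asks about each package in turn, as in the interactive command above.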

Data Mining with R - Luis Torgo

http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/

The book is accompanied by a set of freely available R source files that can be obtained at the book's Web site. These files include all the code used in the case studies. They facilitate the "do-it-yourself" approach followed in this book. All data used in the case studies is available at the book's Web site as well. Moreover, we have created an R package called DMwR that contains several functions used in the book as well as the datasets already in R format. You should install and load this package to follow the code in the book (details on how to do this are given in the first chapter).

Installing the DMwR package

Chapter 3 - Predicting Stock Market Returns

We will address some of the difficulties of incorporating data mining tools and techniques into a concrete business problem. The specific domain used to illustrate these problems is that of automatic stock trading systems. We will address the task of building a stock trading system based on prediction models obtained with daily stock quotes data. Several models will be tried with the goal of predicting the future returns of the S&P 500 market index (the Standard & Poor's 500, often abbreviated as the S&P 500 or just the S&P, is an American stock market index based on the market capitalizations of 500 large companies whose common stock is listed on the NYSE or NASDAQ). These predictions will be used together with a trading strategy to reach a decision regarding the market orders to generate.

This chapter addresses several new data mining issues, among them:

  • How to use R to analyze data stored in a database,
  • How to handle prediction problems with a time ordering among data observations (also known as time series), and
  • An example of the difficulties of translating model predictions into decisions and actions in real-world applications.

The Available Data

In our case study we will concentrate on trading the S&P 500 market index. Daily data concerning the quotes of this security are freely available in many places, for example, on the Yahoo Finance site.

The data we will use is available in the book package: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/datasets3.html

S_and_P_500_market_index_data