R tutorial
Contents
- 1 Data types
- 2 Arithmetic Operators
- 3 Logical Operators
- 4 Functions
- 5 Subsetting Vectors, Matrices and Data Frames using the subset function
- 6 The R Graph Gallery
- 7 Embedding R In A Website
Data types
Checking the type of data
- We can check the type of a variable with the class function:
x <- 28
class(x)
y <- "R is Fantastic"
class(y)
z <- TRUE
class(z)
To assign a value to a variable, use <- or =.
Numeric
- 4 is an integer value. However, R stores a literal like 4 as numeric (double precision) by default; write 4L to get the integer type.
- 4.5 is a decimal value. R also stores it as numeric.
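A quick way to see the difference between numeric and integer storage is to ask R directly. This is a minimal sketch using only base R; the variable names n, i and d are illustrative.

```r
# A plain number literal is stored as a double-precision numeric
n <- 4
class(n)   # "numeric"
typeof(n)  # "double"

# The L suffix creates a true integer
i <- 4L
class(i)   # "integer"

# Decimal values are numeric as well
d <- 4.5
class(d)   # "numeric"

# is.integer() distinguishes the two storage types
is.integer(n)  # FALSE
is.integer(i)  # TRUE
```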
Character - String
- Values enclosed in double (" ") or single (' ') quotes are text (strings). In R, this type is called character.
Logical - Boolean
- TRUE or FALSE is a Boolean value, which is called logical in R.
Vector - One-dimensional array
A vector is a one-dimensional array. We can create a vector from any of the basic data types above. The simplest way to build a vector in R is with the c() function.
vec_num <- c(1, 10, 49)
vec_chr <- c("a", "b", "c")
vec_bool <- c(TRUE, FALSE, TRUE)
We can do arithmetic calculations on vectors:
vect_1 <- c(1, 3, 5)
vect_2 <- c(2, 4, 6)
sum_vect <- vect_1 + vect_2
We can use [1:5] to extract the elements in positions 1 to 5:
slice_vector <- c(1,2,3,4,5,6,7,8,9,10)
slice_vector[1:5]
We can write c(1:10), or simply 1:10, to create a vector of the values from one to ten:
c(1:10)
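The sketch below gathers the vector operations above in one runnable snippet and adds one point the text implies but does not show: a vector holds a single type, so mixing types coerces every element (the mixed vector is illustrative).

```r
vect_1 <- c(1, 3, 5)
vect_2 <- c(2, 4, 6)

# Arithmetic is applied element by element
sum_vect <- vect_1 + vect_2
sum_vect               # 3 7 11
vect_1 * 2             # 2 6 10

# Slicing with a range or a vector of positions
slice_vector <- 1:10
slice_vector[1:5]      # 1 2 3 4 5
slice_vector[c(2, 4)]  # 2 4

# All elements share one type; mixing types coerces them to character here
mixed <- c(1, "a", TRUE)
class(mixed)           # "character"
```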
Matrix - Two-dimensional array
Note: A matrix always has two dimensions; arrays with more than two dimensions can be created with the array() function.
# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow = TRUE:
matrix_a <-matrix(1:10, byrow = TRUE, nrow = 5)
# Print dimension of the matrix with dim()
dim(matrix_a)
# Construct a matrix with 5 rows that contain the numbers 1 up to 10 and byrow = FALSE
matrix_b <-matrix(1:10, byrow = FALSE, nrow = 5)
Note: The command matrix_b <- matrix(1:10, byrow = FALSE, ncol = 2) has the same effect as above.
You can also create a 4x3 matrix using ncol. R creates 3 columns and fills each column from top to bottom. For example:
# Construct a matrix with 3 columns that contain the numbers 1 up to 12, filled column by column
matrix_c <- matrix(1:12, byrow = FALSE, ncol = 3)
matrix_c
Output:
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
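To make the fill order concrete, here is a small base-R sketch comparing byrow = FALSE and byrow = TRUE, and checking the note above that giving nrow or ncol alone is enough. The names m_col and m_row are illustrative.

```r
# Filled column by column (the default, byrow = FALSE)
m_col <- matrix(1:12, byrow = FALSE, ncol = 3)
m_col[1, ]   # 1 5 9

# Filled row by row
m_row <- matrix(1:12, byrow = TRUE, ncol = 3)
m_row[1, ]   # 1 2 3

# Giving nrow or ncol is enough; R infers the other dimension
identical(matrix(1:10, byrow = FALSE, nrow = 5),
          matrix(1:10, byrow = FALSE, ncol = 2))  # TRUE
dim(m_col)   # 4 3
```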
You can add a column to a matrix with the cbind() command. cbind() means column binding, and it can concatenate as many matrices or columns as specified. For example, matrix_a from the earlier example is a 5x2 matrix. We concatenate a third column and verify that the dimension is 5x3:
# concatenate c(1:5) to the matrix_a
matrix_a1 <- cbind(matrix_a, c(1:5))
# Check the dimension
dim(matrix_a1)
# Output:
# [1] 5 3
matrix_a1
# Output
# [,1] [,2] [,3]
# [1,] 1 2 1
# [2,] 3 4 2
# [3,] 5 6 3
# [4,] 7 8 4
# [5,] 9 10 5
We can also add more than one column at once. Let's create matrix_a2, a 4x3 matrix holding the next sequence of numbers (13 to 24), and bind it to matrix_c, a 4x3 matrix of the numbers 1 to 12. The new matrix is 4x6 and contains the numbers 1 to 24.
matrix_a2 <- matrix(13:24, byrow = FALSE, ncol = 3)
matrix_a2
# Output:
# [,1] [,2] [,3]
# [1,] 13 17 21
# [2,] 14 18 22
# [3,] 15 19 23
# [4,] 16 20 24
matrix_c <-matrix(1:12, byrow = FALSE, ncol = 3)
matrix_d <- cbind(matrix_a2, matrix_c)
dim(matrix_d)
# Output:
# [1] 4 6
NOTE: The matrices must have the same number of rows for cbind() to work.
cbind() concatenates columns; rbind() appends rows.
Let's add one row to the 4x3 matrix_c and verify that the dimension becomes 5x3:
matrix_c <-matrix(1:12, byrow = FALSE, ncol = 3)
# Create a vector of 3 elements, one per column
add_row <- c(1:3)
# Append it to the matrix as a new row
matrix_c <- rbind(matrix_c, add_row)
# Check the dimension
dim(matrix_c)
# Output:
# [1] 5 3
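The following sketch checks the binding rules above in base R: cbind() adds columns, rbind() adds rows, and binding matrices with different row counts fails. tryCatch() is used only so the error can be captured instead of stopping the script; the matrix names are illustrative.

```r
m4x3 <- matrix(1:12, ncol = 3)   # 4 rows, 3 columns
dim(cbind(m4x3, m4x3))           # 4 6
dim(rbind(m4x3, 1:3))            # 5 3

# Binding matrices with different numbers of rows raises an error
m5x2 <- matrix(1:10, nrow = 5)
result <- tryCatch(cbind(m4x3, m5x2),
                   error = function(e) conditionMessage(e))
print(result)   # the error message returned by cbind()
```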
Factor
Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
In a dataset, we can distinguish two types of variables: categorical and continuous:
- In a categorical variable, the value is limited and usually based on a particular finite group. For example, a categorical variable can be countries, year, gender, occupation.
- A continuous variable, however, can take any value, integer or decimal. Examples are revenue or the price of a share.
Categorical variables
R stores categorical variables in a factor. The code below converts a character variable into a factor variable. This matters because many machine learning algorithms do not accept character data directly, so strings need to be encoded as categories; R stores each factor level internally as an integer code. Let's create a factor vector:
# Create gender vector
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
class(gender_vector)
# Convert gender_vector to a factor
factor_gender_vector <-factor(gender_vector)
class(factor_gender_vector)
# Output:
# [1] "character"
# [1] "factor"
It is important to convert strings into factors when we perform machine learning tasks.
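Under the hood a factor is a set of levels plus an integer code per value. This base-R sketch shows both for the gender vector above; whether a given modelling package uses these codes directly depends on that package.

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# Levels are sorted alphabetically by default
levels(factor_gender_vector)      # "Female" "Male"

# Each value is stored as an integer code pointing into the levels
as.integer(factor_gender_vector)  # 2 1 1 2 2

# Count the values per level
table(factor_gender_vector)
```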
A categorical variable can be divided into nominal categorical variable and ordinal categorical variable:
Nominal categorical variable
A nominal categorical variable has several values, but their order does not matter. For instance, a male/female categorical variable has no ordering.
# Create a color vector
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
# Convert the vector to factor
factor_color <- factor(color_vector)
factor_color
# Output:
# [1] blue red green white black yellow
# Levels: black blue green red white yellow
From factor_color, we can't tell any order; R simply lists the levels alphabetically.
Ordinal categorical variable
Ordinal categorical variables do have a natural ordering. We specify the order, from the lowest to the highest, in the levels argument and set ordered = TRUE. With ordered = FALSE (the default), the factor is unordered; it does not reverse the order.
We can use summary() to count the occurrences of each level.
# Create Ordinal categorical vector
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered levels
factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
# Print the new variable
factor_day
Output:
## [1] evening   morning   afternoon midday    midnight  evening
## Levels: morning < midday < afternoon < evening < midnight
# Append this line to the code above
# Count the number of occurrences of each level
summary(factor_day)
Output:
## morning midday afternoon evening midnight
## 1 1 1 2 1
R ordered the levels from 'morning' to 'midnight', as specified in the levels argument.
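Because the levels are ordered, comparison operators and max() follow the level order rather than the alphabet. This base-R sketch rebuilds factor_day so it runs on its own.

```r
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

is.ordered(factor_day)          # TRUE

# 'morning' comes before 'evening' in the level order
factor_day[2] < factor_day[1]   # TRUE

# Compare every value with a level given as a string
factor_day > "afternoon"        # TRUE FALSE FALSE FALSE TRUE TRUE

# The "largest" value is the last level present
max(factor_day)                 # midnight
```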
Continuous variables
Continuous variables are the default for numbers in R: they are stored as numeric or integer. We can see this in the dataset below. mtcars is a built-in dataset that gathers information on different types of cars. We load it by assigning mtcars and check the class of the variable mpg (miles per gallon). It returns numeric, indicating a continuous variable.
dataset <- mtcars
class(dataset$mpg)
#Output
# [1] "numeric"
Data Frame
A data frame is a list of vectors which are of equal length. A matrix contains only one type of data, while a data frame accepts different data types (numeric, character, factor, etc.).
When we print a data frame in R, the result is shown as a table. If we create it from vectors, each vector becomes a column, and by default the column takes the name of the corresponding vector.
Create a data frame
We can create a data frame with the data.frame() function:
data.frame(data, stringsAsFactors = TRUE)
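- data can be a matrix to convert to a data frame or a collection of variables (vectors, for example) to join.
- stringsAsFactors: whether string columns are converted to factors.
- Before R 4.0.0, data.frame() converted strings to factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE, so pass stringsAsFactors = TRUE to get factors.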
We can create our first data frame by combining four vectors of same length:
a <- c(10, 20, 30, 40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE, FALSE, TRUE, FALSE)
d <- c(2.5, 8, 10, 7)
# Join the vectors to create a data frame (strings stored as factors)
df <- data.frame(a, b, c, d, stringsAsFactors = TRUE)
df
## Output:
##    a           b     c    d
## 1 10        book  TRUE  2.5
## 2 20         pen FALSE  8.0
## 3 30    textbook  TRUE 10.0
## 4 40 pencil_case FALSE  7.0
We can see the column headers have the same names as the variables. We can change the column names with the names() function. Check the example below:
# Name the data frame
names(df) <- c('ID', 'items', 'store', 'price')
df
## Output:
##   ID       items store price
## 1 10        book  TRUE   2.5
## 2 20         pen FALSE   8.0
## 3 30    textbook  TRUE  10.0
## 4 40 pencil_case FALSE   7.0
# Print the structure
str(df)
## Output (items shows as a factor when df is created with stringsAsFactors = TRUE; otherwise it is chr):
## 'data.frame': 4 obs. of 4 variables:
##  $ ID   : num  10 20 30 40
##  $ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
##  $ store: logi  TRUE FALSE TRUE FALSE
##  $ price: num  2.5 8 10 7
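Since the default for stringsAsFactors changed in R 4.0.0, it is safest to set it explicitly. This minimal sketch builds the same string column both ways and also renames a single column by position with names(); the data frame names are illustrative.

```r
b <- c('book', 'pen', 'textbook', 'pencil_case')

df_chr <- data.frame(b, stringsAsFactors = FALSE)  # the default since R 4.0.0
df_fct <- data.frame(b, stringsAsFactors = TRUE)

class(df_chr$b)   # "character"
class(df_fct$b)   # "factor"
levels(df_fct$b)  # "book" "pen" "pencil_case" "textbook"

# names() can also rename a single column by position
names(df_fct)[1] <- "items"
names(df_fct)     # "items"
```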
Tibble
A tibble is a modern class of data frame within R, available in the dplyr and tibble packages, that has a convenient print method, will not convert strings to factors, and does not use row names.
tibble() function:
We can also create a data frame with the tibble() function, which is available after library(dplyr) (dplyr re-exports it from the tibble package):
a <- c(10, 20, 30, 40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE, FALSE, TRUE, FALSE)
d <- c(2.5, 8, 10, 7)
library(dplyr)
tible <- tibble(a,b,c,d)
tible
## Output:
## # A tibble: 4 x 4
##       a b           c         d
##   <dbl> <chr>       <lgl> <dbl>
## 1    10 book        TRUE    2.5
## 2    20 pen         FALSE   8
## 3    30 textbook    TRUE   10
## 4    40 pencil_case FALSE   7
Slice Data Frame
It is possible to slice the values of a data frame. We write the rows and columns to return inside square brackets, preceded by the name of the data frame.
A data frame is composed of rows and columns, df[A, B], where A represents the rows and B the columns. We can slice by specifying the rows, the columns, or both.
The part before the comma selects rows and the part after the comma selects columns. Note that the symbol : means "to". For instance, 1:3 selects the values from 1 to 3.
The examples below show how to access different selections of the data frame:
- df[1, 2] selects row 1 in column 2
- df[1:2, ] selects rows 1 to 2
- df[, 1] selects column 1
- df[1:3, 3:4] selects rows 1 to 3 and columns 3 to 4
Note that if we leave the left part blank, R selects all the rows. By analogy, if we leave the right part blank, R selects all the columns.
We can run the code in the console:
## Select row 1 in column 2
df[1,2]
Output:
## [1] book
## Levels: book pen pencil_case textbook
## Select Rows 1 to 2
df[1:2,]
Output:
## ID items store price
## 1 10 book TRUE 2.5
## 2 20 pen FALSE 8.0
## Select column 1
df[,1]
Output:
## [1] 10 20 30 40
## Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]
Output:
## store price
## 1 TRUE 2.5
## 2 FALSE 8.0
## 3 TRUE 10.0
It is also possible to select the columns with their names. For instance, the code below extracts two columns: ID and store.
# Slice with columns name
df[, c('ID', 'store')]
Output:
## ID store
## 1 10 TRUE
## 2 20 FALSE
## 3 30 TRUE
## 4 40 FALSE
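The slicing examples above can be run together in one self-contained script. This sketch rebuilds the renamed df from scratch and assumes R 4.0.0 or later, where items stays a character column; it also shows drop = FALSE, which keeps a single selected column as a data frame.

```r
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c('book', 'pen', 'textbook', 'pencil_case'),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7))

df[1, 2]                 # "book"
dim(df[1:2, ])           # 2 4
df[, 1]                  # 10 20 30 40
df[1:3, 3:4]             # store and price for rows 1 to 3
df[, c('ID', 'store')]   # columns selected by name

# A single column comes back as a vector unless drop = FALSE
class(df[, 1])                # "numeric"
class(df[, 1, drop = FALSE])  # "data.frame"
```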
Append a Column to Data Frame
You can also append a column to a data frame. One way is to assign a new variable with the $ symbol.
# Create a new vector
quantity <- c(10, 35, 40, 5)
# Add `quantity` to the `df` data frame
df$quantity <- quantity
df
Output:
## ID items store price quantity
## 1 10 book TRUE 2.5 10
## 2 20 pen FALSE 8.0 35
## 3 30 textbook TRUE 10.0 40
## 4 40 pencil_case FALSE 7.0 5
Note: The number of elements in the vector has to be equal to the number of rows in the data frame. Executing the following statement:
quantity <- c(10, 35, 40)
# Add `quantity` to the `df` data frame
df$quantity <- quantity
Gives error:
Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(10, 35, 40))
replacement has 3 rows, data has 4
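The sketch below reproduces the rule above without stopping the script: a full-length vector is accepted, a single value is recycled to every row, and a vector of the wrong length is rejected (tryCatch() captures the error message). The currency column and its value "EUR" are hypothetical, chosen only for illustration.

```r
df <- data.frame(ID = c(10, 20, 30, 40), price = c(2.5, 8, 10, 7))

# Matching length: one value per row
quantity <- c(10, 35, 40, 5)
df$quantity <- quantity

# A single value is recycled to every row (hypothetical column)
df$currency <- "EUR"

# A vector of the wrong length is rejected
short <- c(10, 35, 40)
msg <- tryCatch({
  df$short <- short
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

names(df)   # "ID" "price" "quantity" "currency"
```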
Select a Column of a Data frame
Sometimes, we need to store a column of a data frame for future use or perform operation on a column. We can use the $ sign to select the column from a data frame:
# Select the column ID
df$ID
Output:
## [1] 10 20 30 40
Subset a Data frame
In the previous section, we selected an entire column without condition. It is possible to subset based on whether or not a certain condition was true.
We use the subset() function:
subset(x, subset)
Arguments:
* x: data frame used to perform the subset
* subset: logical expression that defines which rows to keep
We want to return only the items with a price above 5, so we can do:
# Select price above 5
subset(df, subset = price > 5)
Output:
ID items store price
2 20 pen FALSE 8
3 30 textbook TRUE 10
4 40 pencil_case FALSE 7
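subset() also accepts a select argument for choosing columns, and conditions can be combined with & or |. This base-R sketch rebuilds df so it runs on its own and compares subset() with the equivalent bracket notation, where the column must be written as df$price.

```r
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c('book', 'pen', 'textbook', 'pencil_case'),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7))

# Rows with price above 5, keeping only two columns
with_subset <- subset(df, subset = price > 5, select = c(ID, price))
with_subset

# Conditions can be combined: price above 5 AND sold in store
subset(df, price > 5 & store)   # only the textbook row

# Equivalent bracket notation
with_brackets <- df[df$price > 5, c("ID", "price")]
with_brackets
```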
Importing a Data Frame
Before creating our own data frames, we can have a look at datasets available online. Here we read the first five columns of the prison dataset, which gives a data frame of dimension 714x5. We can get a quick look at the bottom of a data frame with the tail() function; by analogy, head() displays the top. You can specify the number of rows shown, as in head(df, 5). We will learn more about the read.csv() function in a future tutorial.
# Print the head of the data
PATH<-'https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/wooldridge/prison.csv'
df <- read.csv(PATH)[1:5]
head(df, 5)
## Output:
## X state year govelec black
## 1 1 1 80 0 0.2560
## 2 2 1 81 0 0.2557
## 3 3 1 82 1 0.2554
## 4 4 1 83 0 0.2551
## 5 5 1 84 0 0.2548
List
A list is a great tool to store many kinds of objects in a given order. We can include matrices, vectors, data frames or other lists. We can imagine a list as a bag in which we put many different items. When we need an item, we open the bag and use it. A list is similar: we can store a collection of objects and use them when we need them.
- Step 1: Create a Vector:
# Vector with numerics from 1 up to 5
vect <- 1:5
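- Step 2: Create a Matrix:
# A 2x5 matrix. Only 9 values are given for 10 cells, so R recycles the first value and warns about the data length
mat <- matrix(1:9, ncol = 5)
dim(mat)
## Output:
## [1] 2 5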
- Step 3: Select part of a data set:
# Select the first 10 rows of the built-in R data set EuStockMarkets
# (EuStockMarkets is a multivariate time series, so the result is a matrix)
df <- EuStockMarkets[1:10,]
- Step 4: Create a List:
Now, we can put the three objects into a list.
# Construct a list with vect, mat, and df
my_list <- list(vect, mat, df)
my_list
## Output:
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8    1
##
## [[3]]
##           DAX    SMI    CAC   FTSE
##  [1,] 1628.75 1678.1 1772.8 2443.6
##  [2,] 1613.63 1688.5 1750.5 2460.2
##  [3,] 1606.51 1678.6 1718.0 2448.2
##  [4,] 1621.04 1684.1 1708.1 2470.4
##  [5,] 1618.16 1686.6 1723.1 2484.7
##  [6,] 1610.61 1671.6 1714.3 2466.8
##  [7,] 1630.75 1682.9 1734.5 2487.9
##  [8,] 1640.17 1703.6 1757.4 2508.4
##  [9,] 1635.47 1697.5 1754.0 2510.5
## [10,] 1645.89 1716.3 1754.3 2497.4
Select elements from list
After we build our list, we can access it quite easily. We use the index of an element to select it: the value inside the double square brackets is the position of the item we want to extract. For instance, if we pass 2 inside the double square brackets, R returns the second element of the list. Let's select the second item of my_list with my_list[[2]]:
# Print second element of the list
my_list[[2]]
## Output:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 1
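Double brackets return the element itself, while single brackets return a smaller list. Lists can also be named and accessed with $. In this sketch the matrix is built from 1:10 (not 1:9) to avoid the recycling warning, and the names numbers and grid are illustrative.

```r
vect <- 1:5
mat <- matrix(1:10, ncol = 5)
my_list <- list(vect, mat)

# [[ ]] returns the element itself
is.matrix(my_list[[2]])   # TRUE
# [ ] returns a list containing that element
is.list(my_list[2])       # TRUE

# Elements can be named and then accessed with $
named_list <- list(numbers = vect, grid = mat)
named_list$numbers        # 1 2 3 4 5
length(named_list)        # 2
```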
Arithmetic Operators
- +, -, *, /
- Exponentiation: ^ or **
- Modulo: %%
(5+5)/2
Modulo:
28%%6
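A few of the operators above with their results, as a quick base-R check:

```r
(5 + 5) / 2   # 5
28 %% 6       # 4, because 28 = 4 * 6 + 4
2^3           # 8
2**3          # 8, ** is accepted as an alias for ^
7 / 2         # 3.5, / always performs decimal division
```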
Logical Operators
- < , > , >= , <=
- == : Exactly equal to
- != : Not equal to
- !x : Not x
- x | y : x OR y
- x & y : x AND y
- isTRUE(x) : Test if x is TRUE
To filter a vector with a logical statement, we wrap the statement inside []. We can combine as many conditional statements as we like, but each one should be enclosed in parentheses. We can follow this structure to create a conditional statement:
variable_name[(conditional_statement)]
# Create a vector from 1 to 10
logical_vector <- c(1:10)
logical_vector>5
Output:
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
In the example below, we want to extract only the values that meet the condition 'is strictly greater than five':
# Print value strictly above 5
logical_vector[(logical_vector>5)]
Output:
## [1] 6 7 8 9 10
# Print 5 and 6
logical_vector <- c(1:10)
logical_vector[(logical_vector>4) & (logical_vector<7)]
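The sketch below exercises the remaining operators from the list (OR, NOT, isTRUE) on the same vector, plus which(), a base-R function that returns the positions where a condition holds.

```r
logical_vector <- 1:10

# AND: both conditions must hold
logical_vector[(logical_vector > 4) & (logical_vector < 7)]  # 5 6
# OR: at least one condition must hold
logical_vector[(logical_vector < 3) | (logical_vector > 8)]  # 1 2 9 10
# NOT: negate a condition
logical_vector[!(logical_vector > 2)]                        # 1 2
# which() returns the positions where the condition is TRUE
which(logical_vector %% 2 == 0)                              # 2 4 6 8 10
# isTRUE() is TRUE only for a single TRUE value
isTRUE(TRUE)                                                 # TRUE
isTRUE(c(TRUE, TRUE))                                        # FALSE
```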
Functions
A function:
- is written to carry out a specified task
- may or may not take arguments
- contains a body
- may return one value, or several values bundled together (for example in a list).
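Each of these parts appears in the small sketch below. The functions square_and_add() and say_hello() are hypothetical examples, not part of base R.

```r
# Hypothetical function with one required and one optional argument
square_and_add <- function(x, add = 0) {
  result <- x^2 + add   # body: the task the function carries out
  return(result)        # value handed back to the caller
}

square_and_add(3)           # 9
square_and_add(3, add = 1)  # 10
square_and_add(1:3)         # 1 4 9, works element-wise on vectors

# Hypothetical function without arguments; the last expression is returned
say_hello <- function() {
  "Hello, R"
}
say_hello()                 # "Hello, R"
```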
We will see three groups of functions in action:
- General function
- Maths function
- Statistical function
General functions
We are already familiar with general functions like cbind() and rbind(); range(), sort() and order() are other examples. Each of these functions has a specific task and takes arguments to return an output. The following are important functions to know.
diff function
If you work on time series, you often need to make the series stationary, for example by differencing it (subtracting its lagged values). A stationary process has a constant mean, variance and autocorrelation over time, which generally makes the series easier to model and forecast. Differencing is easily done with the function diff(). We can build a random time series with a trend (a random walk) and then use diff() to make it stationary. diff() takes a vector (plus the optional lag and differences arguments) and returns suitably lagged and iterated differences.
Note: We often need to create random data, but for learning and comparison we want the numbers to be identical across machines. To ensure we all generate the same data, we use the set.seed() function with an arbitrary value, here 123. set.seed() fixes the starting state of R's pseudorandom number generator, so every computer using the same seed (and the same R version and random number settings) generates the same sequence of numbers. Without set.seed(), each of us would get a different sequence.
set.seed(123)
## Create the data
x <- rnorm(1000)
ts <- cumsum(x)
## Make the series stationary with first differences
diff_ts <- diff(ts)
## Show two plots side by side
par(mfrow = c(1, 2))
## Plot the series
plot(ts, type = 'l')
plot(diff_ts, type = 'l')
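To see exactly what diff() computes, here is a small base-R sketch on a short vector, using the lag and differences arguments. It also checks why the example above works: differencing undoes cumsum(), so diff() recovers the random steps (all but the first). The names steps and walk are illustrative.

```r
x <- c(1, 4, 9, 16, 25)

diff(x)                   # 3 5 7 9   (x[i+1] - x[i])
diff(x, lag = 2)          # 8 12 16   (x[i+2] - x[i])
diff(x, differences = 2)  # 2 2 2     (difference of the differences)

# Differencing a cumulative sum gives back the original steps
set.seed(123)
steps <- rnorm(10)
walk <- cumsum(steps)
all.equal(diff(walk), steps[-1])  # TRUE
```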
length function
In many cases, we want to know the length of a vector for a computation or to use it in a for loop. The length() function counts the number of elements in a vector x. The following code loads the built-in cars dataset and returns its number of columns and rows.
Note: length() returns the number of elements in a vector. For a data frame, it returns the number of columns; for a matrix, it returns the total number of cells.
dt <- cars
## number columns
length(dt)
##Output:
## [1] 2
## number rows
length(dt[,1])
## Output:
## [1] 50
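The sketch below contrasts what length() counts for each kind of object and shows nrow() and ncol(), the base-R functions that report rows and columns directly. The names v and m are illustrative.

```r
v <- c(5, 10, 15)
length(v)     # 3: number of elements

m <- matrix(1:6, nrow = 2)
length(m)     # 6: a matrix counts every cell
dim(m)        # 2 3

dt <- cars
length(dt)    # 2: a data frame counts its columns
nrow(dt)      # 50
ncol(dt)      # 2
```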
Math functions
R has an array of mathematical functions.
- abs(x): Takes the absolute value of x
- log(x,base=y): Takes the logarithm of x with base y; if base is not specified, returns the natural logarithm
- exp(x): Returns the exponential of x
- sqrt(x): Returns the square root of x
- factorial(x): Returns the factorial of x (x!)
# Sequence of numbers from 45 to 55, both included, incremented by 1
x_vector <- seq(45,55, by = 1)
#logarithm
log(x_vector)
Output:
## [1] 3.806662 3.828641 3.850148 3.871201 3.891820 3.912023 3.931826
## [8] 3.951244 3.970292 3.988984 4.007333
#exponential
exp(x_vector)
# square root
sqrt(x_vector)
Output:
## [1] 6.708204 6.782330 6.855655 6.928203 7.000000 7.071068 7.141428
## [8] 7.211103 7.280110 7.348469 7.416198
#factorial
factorial(x_vector)
Output:
## [1] 1.196222e+56 5.502622e+57 2.586232e+59 1.241392e+61 6.082819e+62
## [6] 3.041409e+64 1.551119e+66 8.065818e+67 4.274883e+69 2.308437e+71
## [11] 1.269640e+73
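The long vectors above make the results hard to check by eye, so here is each math function applied to a small value with an easy answer (base R only):

```r
abs(-3)            # 3
log(8, base = 2)   # 3
log(exp(1))        # 1, natural logarithm when base is not given
exp(0)             # 1
sqrt(16)           # 4
factorial(5)       # 120
# Applied to a vector, each function works element by element
sqrt(c(1, 4, 9))   # 1 2 3
```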
Subsetting Vectors, Matrices and Data Frames using the subset function
https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/subset
The R Graph Gallery
Examples of plots, graphs, etc. are shown here:
https://www.r-graph-gallery.com/
Embedding R In A Website
http://fabian-kostadinov.github.io/2015/09/21/embedding-r-in-a-website/
Shiny
http://rstudio.github.io/shiny/tutorial/#hello-shiny
http://shiny.rstudio.com/tutorial/
https://shiny.rstudio.com/articles/js-build-widget.html
Examples of Shiny web apps: https://www.rstudio.com/products/shiny/shiny-user-showcase/
Hosting and deployment
http://shiny.rstudio.com/deploy/
https://docs.rstudio.com/shinyapps.io/getting-started.html#deploying-applications
Run R Shiny App on Apache Server (Not possible): https://stackoverflow.com/questions/43527041/run-r-shiny-app-on-apache-server/43528264
How to Deploy Interactive R Apps with Shiny Server
https://www.linode.com/docs/development/r/how-to-deploy-rshiny-server-on-ubuntu-and-debian/
- Shiny Server serves the apps from:
/srv/shiny-server/
- It uses port 3838 by default
- To make my application work, it was necessary to run:
chown -R shiny:shiny gofaaas/
- After making changes in the application directory, it is necessary to restart Shiny Server:
sudo systemctl restart shiny-server
Deploy to the cloud with Shinyapps
Open in Google Chrome: