Laboratory 1

DATA ANALYSIS AND VISUALIZATION IN R, WINTER 2024 EDITION

Work in the console and in RStudio
Data types
- Atomic
- Data structures
Indexing in R
Operations on vectors and matrices
Loops and conditional statements
Selected basic functions
Writing your own functions
Cheat sheets

Work in the console and in RStudio

Console

We start a new R session in the console/terminal by entering the R command.
To close the session, type quit() or q() in the console.
A command is executed after pressing the Enter key.
Several commands can be written on one line if they are separated by semicolons. A semicolon at the end of a line is redundant and is ignored.

2*pi;cos(1)

## [1] 6.283185

## [1] 0.5403023

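There are three assignment operators: =, <-, <<- (the arrow forms can also be written in reverse: ->, ->>).
- <- is the standard assignment operator in R and has higher precedence than =,
- <<- is used inside a function to assign to a variable in an enclosing environment, typically to overwrite the value of a global variable,
- = is used mainly for passing argument values to a function; at the top level it also works as an assignment.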

a <- 5
2 -> b
c = 3
a; b; c

## [1] 5

## [1] 2

## [1] 3

b <- a = 10 # gives error

## Error in b <- a = 10: could not find function "<-<-"

a; b

## [1] 5

## [1] 2

b = a <- 10
a; b

## [1] 10

## [1] 10

increment <- function(x){
  x <<- x+1
}
x <- 1; increment (x); x

## [1] 2

rnorm (n = 3, mean = 2, sd = 0.5)

## [1] 2.698830 2.821433 1.984869
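The <<- operator assigns in an enclosing environment, which does not have to be the global one. The following is only a minimal sketch; make_counter is a hypothetical helper introduced for illustration and is not part of the laboratory material.

```r
# make_counter is a hypothetical helper used only for illustration.
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
first; second                # 1, then 2
```

Each call to counter() modifies the count variable captured when make_counter() was called, not a global variable.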

Writing and executing code in the console line by line is not an efficient way of programming in R. A better approach is to put the code in a script and run it with a single command, source().

########################## test_script.R ##############################

# Function
# f <- function(x, y) {
#   x <- 2*x
#   y <<- 2*y
# }

# Main part of the script

# x <- 2
# y <- 2
# 
# print(x)
# print(y)
# x
# 
# f(2,2)
# 
# cat("x =",x,"\n")
# cat("y =",y,"\n")

########################## Script execution ###############################
source("test_script.R")

## [1] 2
## [1] 2
## x = 2 
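## y = 4

Note that the bare x line in the script prints nothing, because source() does not echo the values of top-level expressions by default. Inside f, the argument y equals 2, so y <<- 2*y sets the global y to 4, while the global x is unchanged.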

RStudio

A comment in the script begins with the # character.
To run a script opened in RStudio, press the Source button.
You can run a selected chunk of code by highlighting it and pressing the Ctrl+Enter key combination or the Run button.
The print() and cat() functions are used to print variable values to the screen.

x <- 2
y <- x+1
z <- x^2

cat("y =",y,"\n")

## y = 3

print(z)

## [1] 4

Data types

Atomic

Numeric

The default numeric type in R is double, i.e. a double-precision real number.

a <- 10
a; typeof(a)

## [1] 10

## [1] "double"

To create an integer variable, add the letter L after the number.

b <- 10L
typeof(b)

## [1] "integer"

typeof(b+1)

## [1] "double"

typeof(b+1L)

## [1] "integer"
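The type of the result depends on the operator as well as on the operands. A short sketch, using only base R, of how integers behave under division and conversion:

```r
b <- 10L
is.integer(b)            # TRUE
typeof(b / 2L)           # "double": the / operator always returns a double
typeof(b %/% 3L)         # "integer": integer division keeps the type
as.integer(3.9)          # 3: conversion truncates toward zero
```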

You can create a complex number using the format a+bi.

d <- 2+3i
d; typeof(d)

## [1] 2+3i

## [1] "complex"

sqrt(-1)

## Warning in sqrt(-1): NaNs produced

## [1] NaN

sqrt(-1+0i)

## [1] 0+1i

Scientific notation is allowed.

a <- 2.3e3
a

## [1] 2300

The special values are NaN (Not a Number) and the infinities Inf and -Inf.

1/0; exp(-Inf); 0 * Inf

## [1] Inf

## [1] 0

## [1] NaN

Character

A text value (string) starts and ends with a single quote ' or a double quote ".

string <- "Ala ma kota"; string

## [1] "Ala ma kota"

string <- 'Pakiet R'; string; typeof (string)

## [1] "Pakiet R"

## [1] "character"

The paste() function is used to combine strings.

word1 <- "I"
word2 <- "like"
word3 <- "trains"
paste (word1, word2, word3, sep = " ")

## [1] "I like trains"

A variable of type character holds a whole string, not a vector of individual characters, so its length is 1.

length (word3)

## [1] 1
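To work with the individual characters of a string, base R offers nchar(), substr() and strsplit(). A small sketch (the variable names n_chars, chars and first3 are chosen only for this example):

```r
word3 <- "trains"
n_chars <- nchar(word3)              # number of characters: 6
chars <- strsplit(word3, "")[[1]]    # vector of single characters
first3 <- substr(word3, 1, 3)        # "tra"
n_chars; chars; length(chars); first3
```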

Logical

Represents logical true (TRUE or T) and false (FALSE or F).
When used in an arithmetic expression, it is automatically converted to a number: FALSE to 0 and TRUE to 1.

1 == 7

## [1] FALSE

z <- 1 == 1
z; typeof(z)

## [1] TRUE

## [1] "logical"

y <- (1 == 1) + 1
y; typeof(y)

## [1] 2

## [1] "double"
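This conversion makes it easy to count or average the elements that satisfy a condition. A minimal sketch with made-up data (values is an arbitrary example vector):

```r
values <- c(3, 7, 1, 9)            # arbitrary example data
above_two <- values > 2            # TRUE TRUE FALSE TRUE
n_above <- sum(above_two)          # number of TRUE values: 3
share_above <- mean(above_two)     # fraction of TRUE values: 0.75
n_above; share_above
```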

Data structures

Vector

An ordered set of objects of the same type (except for NA - Not Available).
Primary data structure in R: operations performed on vectors are the most efficient.
The c() function creates a vector from individual elements of the same type.

v <- c(-1,2,5)
v

## [1] -1  2  5

Arithmetic sequences can be easily generated by the seq() function or with a simple colon.

u <- 1:10
u

##  [1]  1  2  3  4  5  6  7  8  9 10

w <- seq(-10,10,2)
w

##  [1] -10  -8  -6  -4  -2   0   2   4   6   8  10

The rep() function is used to generate vectors with repetitions.

x <- rep(TRUE, 5)
x

## [1] TRUE TRUE TRUE TRUE TRUE

y <- rep(c(1,2,3),3)
y

## [1] 1 2 3 1 2 3 1 2 3

z <- rep(c(1,2,3), each=3)
z

## [1] 1 1 1 2 2 2 3 3 3
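Because all elements of a vector must share one type, c() silently converts mixed input to the most general type. The sketch below illustrates this, together with the length.out argument of seq(); the variable names are chosen only for this example.

```r
mixed <- c(1, "a", TRUE)
mixed; typeof(mixed)               # "1" "a" "TRUE", type "character"

num_log <- c(1.5, TRUE)
num_log; typeof(num_log)           # 1.5 1.0, type "double"

grid <- seq(0, 1, length.out = 5)  # 5 equally spaced points
grid                               # 0.00 0.25 0.50 0.75 1.00
```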

Factor

Useful for storing vectors of values occurring at several levels.
It is used to represent categorical and qualitative data.
Created using the factor() function.
The levels() function returns the levels.

education <- factor (c ("primary", "tertiary", "secondary", "secondary", "tertiary", "secondary"))
education

## [1] primary   tertiary  secondary secondary tertiary  secondary
## Levels: primary secondary tertiary

levels (education)

## [1] "primary"   "secondary" "tertiary"

A factor usually takes up less memory than the corresponding character vector, because it is stored as consecutive integers (level codes). Arithmetic operations cannot be performed on it, however.

typeof (education)

## [1] "integer"

education+1

## Warning in Ops.factor(education, 1): '+' not meaningful for factors

## [1] NA NA NA NA NA NA
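The integer codes behind a factor can be inspected with as.integer(), and table() counts the observations at each level. A short sketch, in which the order of levels is given explicitly instead of relying on the default alphabetical order (answers, codes and counts are names used only here):

```r
answers <- c("primary", "tertiary", "secondary",
             "secondary", "tertiary", "secondary")
education <- factor(answers,
                    levels = c("primary", "secondary", "tertiary"))
codes <- as.integer(education)     # 1 3 2 2 3 2
counts <- table(education)         # primary 1, secondary 3, tertiary 2
codes; counts
```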

List

An ordered set of objects (e.g. vectors) that can be of various types and any length.
Created using the list() function.

L <- list (int = 1:10, x = 2.71, text = c("a", "b", "c"), logic = rep(T, 5))
L

## $int
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $x
## [1] 2.71
## 
## $text
## [1] "a" "b" "c"
## 
## $logic
## [1] TRUE TRUE TRUE TRUE TRUE
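Since the elements of a list may differ in type and length, it is often useful to inspect them all at once. A minimal sketch using sapply(), a base R function not discussed above; the element name extra is hypothetical.

```r
L <- list(int = 1:10, x = 2.71, text = c("a", "b", "c"), logic = rep(TRUE, 5))
sizes <- sapply(L, length)     # int 10, x 1, text 3, logic 5
types <- sapply(L, typeof)     # "integer" "double" "character" "logical"
sizes; types

L$extra <- "new element"       # adds a fifth element to the list
names(L)
```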

Matrix

The two-dimensional matrix is created by the matrix() function.

A <- matrix (0, 2, 3); A

##      [,1] [,2] [,3]
## [1,]    0    0    0
## [2,]    0    0    0

A <- matrix (1:8, 4, 2); A

##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8

A <- matrix (c ("a", "b", "c", "d"),2 , 2); A

##      [,1] [,2]
## [1,] "a"  "c" 
## [2,] "b"  "d"

The matrix is filled column by column by default; this can be changed by setting the parameter byrow = TRUE.

A <- matrix (1:8, 4, 2, byrow = TRUE); A

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8

For arrays with more than two dimensions (D > 2), we use the array() function.

B <- array (1:27, dim = c (3, 3, 3)); B

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   10   13   16
## [2,]   11   14   17
## [3,]   12   15   18
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   19   22   25
## [2,]   20   23   26
## [3,]   21   24   27

Data frame

A list of vectors of the same length.
Elements in each column are of the same type.
Elements in different columns can be of different types.
Very often used as a fundamental type in various R packages (e.g. ggplot2).
Created using the data.frame() function.

frame <- data.frame (numbers = 5:1, logic = T, text = letters[1:5]); frame

##   numbers logic text
## 1       5  TRUE    a
## 2       4  TRUE    b
## 3       3  TRUE    c
## 4       2  TRUE    d
## 5       1  TRUE    e
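A data frame can be inspected with dim(), str() and sapply(), and extended with new columns. The following is a sketch only; stringsAsFactors = FALSE is set explicitly because its default differs between R versions, and the column name squares is made up for this example.

```r
frame <- data.frame(numbers = 5:1, logic = TRUE, text = letters[1:5],
                    stringsAsFactors = FALSE)
dims <- dim(frame)                   # 5 rows, 3 columns
col_types <- sapply(frame, class)    # "integer" "logical" "character"
str(frame)                           # compact overview of the structure

frame$squares <- frame$numbers^2     # add a new column
frame
```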

Indexing in R

Indexing in R is very important because it is very fast and can often replace loops.
All data structures (vectors, matrices, lists, etc.) are indexed starting from the index 1 (not zero as in C, C++ or Python) using square brackets [].
We can select the requested elements using the c() and seq() functions or the colon operator.
Preceding the index with a minus sign means that we want to omit the elements of the vector at this index.

w <- 11:20
w[1:5]

## [1] 11 12 13 14 15

w[-1]

## [1] 12 13 14 15 16 17 18 19 20

w[c(1:4,8)]

## [1] 11 12 13 14 18

w[c(-2,-5)]

## [1] 11 13 14 16 17 18 19 20
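Indexes can also be logical vectors, which is how indexing replaces many loops: the condition is evaluated for all elements at once and only those with TRUE are selected. A minimal sketch (big, even and w_capped are names used only here):

```r
w <- 11:20
big <- w[w > 15]                 # 16 17 18 19 20
even <- w[w %% 2 == 0]           # 12 14 16 18 20

w_capped <- w
w_capped[w_capped > 18] <- 0L    # replace the selected elements in place
big; even; w_capped
```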

In the case of a matrix, the above methods must be applied to all dimensions of the matrix using a comma.

M <- matrix(1:9, 3, 3)
M

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# First row
M[1,]

## [1] 1 4 7

# First column
M[,1]

## [1] 1 2 3

# Two first rows
M[1:2,]

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8

# Omit third column
M[,-3]

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

# Omit second row and second column
M[-2,-2]

##      [,1] [,2]
## [1,]    1    7
## [2,]    3    9
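Note that selecting a single row or column returns a plain vector. If the result should remain a matrix, the drop = FALSE argument can be added, as in this short sketch:

```r
M <- matrix(1:9, 3, 3)
row_vec <- M[1, ]                  # plain vector, no dimensions
row_mat <- M[1, , drop = FALSE]    # still a 1 x 3 matrix
dim(row_vec)                       # NULL
dim(row_mat)                       # 1 3
```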

Vectors and matrices can also be indexed by names (elements, columns, rows), if they have been defined.

named.vec <- c (v1 = 1, v2 = 0, v3 = 2, v4 = -1)
named.vec

## v1 v2 v3 v4 
##  1  0  2 -1

names (named.vec)

## [1] "v1" "v2" "v3" "v4"

named.vec["v1"]

## v1 
##  1

named.vec["v3"]

## v3 
##  2

colnames (M) <- c ("col1", "col2", "col3") 
rownames (M) <- c ("row1", "row2", "row3")
M

##      col1 col2 col3
## row1    1    4    7
## row2    2    5    8
## row3    3    6    9

names (M)

## NULL

M["row2",]

## col1 col2 col3 
##    2    5    8

M[,"col3"]

## row1 row2 row3 
##    7    8    9

M["row1","col2"]

## [1] 4

Lists can be indexed in two ways: by numerical indexes given in square brackets and by the names of individual elements using the $ operator, i.e. list_name$variable_name. It is worth remembering that a single square bracket returns a sublist (i.e. the result is a list); to get the same result as with the $ operator, you must use double brackets [[...]].

L$int

##  [1]  1  2  3  4  5  6  7  8  9 10

L[1]

## $int
##  [1]  1  2  3  4  5  6  7  8  9 10

L[[1]]

##  [1]  1  2  3  4  5  6  7  8  9 10

Data frames can be indexed as lists or as matrices.

frame

##   numbers logic text
## 1       5  TRUE    a
## 2       4  TRUE    b
## 3       3  TRUE    c
## 4       2  TRUE    d
## 5       1  TRUE    e

# First three rows
frame[1:3,]

##   numbers logic text
## 1       5  TRUE    a
## 2       4  TRUE    b
## 3       3  TRUE    c

# Second column
frame[,2]

## [1] TRUE TRUE TRUE TRUE TRUE

# First column
frame$numbers

## [1] 5 4 3 2 1
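Matrix-style indexing combined with a logical condition gives a simple way to filter rows. A sketch under the assumption that the frame object is defined as above (it is recreated here so that the example is self-contained; large and selected are names used only here):

```r
frame <- data.frame(numbers = 5:1, logic = TRUE, text = letters[1:5],
                    stringsAsFactors = FALSE)

# Rows in which numbers is greater than 2, all columns
large <- frame[frame$numbers > 2, ]

# The same rows, but only two columns selected by name
selected <- frame[frame$numbers > 2, c("numbers", "text")]
large; selected
```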

It is also possible to index the indexing result.

w[6:10][1:2]

## [1] 16 17

L[[3]][2:3]

## [1] "b" "c"

frame$numbers[1:3]

## [1] 5 4 3

frame[1:3,][1]

##   numbers
## 1       5
## 2       4
## 3       3

frame[1:3,]["numbers"]

##   numbers
## 1       5
## 2       4
## 3       3

Operations on vectors and matrices

Let us define the following vectors w, v and matrices A, B:

w <- c (1,2)
v <- c (3,4)
A <- matrix (1:4, 2, 2)
B <- matrix (4:1, 2, 2)
w; v; A; B

## [1] 1 2

## [1] 3 4

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

##      [,1] [,2]
## [1,]    4    2
## [2,]    3    1

We can perform the following operations on vectors:

vector addition,

w + v

## [1] 4 6

adding a number to a vector,

w + 5

## [1] 6 7

multiplying a vector by a number,

2 * w

## [1] 2 4

dot product of vectors,

w %*% v

##      [,1]
## [1,]   11

element-wise multiplication of vectors.

w * v

## [1] 3 8
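Adding a number to a vector is a special case of the recycling rule: when the operands differ in length, the shorter one is repeated until it matches the longer one. A minimal sketch with made-up vectors:

```r
long <- c(1, 2, 3, 4)
short <- c(10, 20)
recycled <- long + short     # short is used as 10 20 10 20
recycled                     # 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.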

Similar operations can be performed on matrices. In addition, several other very useful functions are available:

matrix addition,

A + B

##      [,1] [,2]
## [1,]    5    5
## [2,]    5    5

adding a number to a matrix,

A + 1

##      [,1] [,2]
## [1,]    2    4
## [2,]    3    5

multiplying a matrix by a number,

A * 2

##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8

transposed matrix,

t (A)

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

calculating the determinant of a matrix,

det (A)

## [1] -2

matrix product,

A %*% B

##      [,1] [,2]
## [1,]   13    5
## [2,]   20    8

calculating matrix eigenvalues and eigenvectors,

eigen (A)

## eigen() decomposition
## $values
## [1]  5.3722813 -0.3722813
## 
## $vectors
##            [,1]       [,2]
## [1,] -0.5657675 -0.9093767
## [2,] -0.8245648  0.4159736
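Another frequently used function is solve(), which was not listed above. The sketch below uses it to invert the matrix A and to solve a linear system; the right-hand side b is an arbitrary example.

```r
A <- matrix(1:4, 2, 2)
A_inv <- solve(A)                  # inverse of A
identity_check <- A_inv %*% A      # identity matrix up to rounding error

b <- c(5, 6)                       # arbitrary right-hand side
x <- solve(A, b)                   # solves A %*% x = b
A_inv; x                           # x is -1 2
```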

Loops and conditional statements

FOR and WHILE loops

x <- 1:10
for(i in x) print(i)

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

x <- 1
while(x < 5) {
  print(x)
  x <- x + 1
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
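When a loop fills a vector, it is good practice to allocate the result before the loop. In many cases the loop can be replaced by a single vectorised expression, as in this sketch (squares_loop and squares_vec are names used only here):

```r
n <- 10
squares_loop <- numeric(n)         # preallocate the result vector
for (i in seq_len(n)) {
  squares_loop[i] <- i^2
}

squares_vec <- (1:n)^2             # the same result without a loop
squares_loop; squares_vec
```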

IF… ELSE… conditional statement

x <- 5
if(x < 5) print(x) else print(x ^ 2)

## [1] 25

The condition must have length one. Since R 4.2.0 a longer condition raises an error (earlier versions used only the first element and issued a warning).

x <- 1:10
if(x %% 3) {
  print("It is not divisible by 3")
} else {
  print("It is divisible by 3")
}

## Error in if (x%%3) {: the condition has length > 1
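If a single decision has to be made for the whole vector, the condition can first be reduced to one logical value with any() or all(). A minimal sketch (divisible, msg and all_divisible are names used only here):

```r
x <- 1:10
divisible <- x %% 3 == 0           # logical vector of length 10

if (any(divisible)) {
  msg <- "At least one element is divisible by 3"
} else {
  msg <- "No element is divisible by 3"
}
all_divisible <- all(divisible)    # FALSE: not every element is divisible
msg; all_divisible
```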

IFELSE(…,…,…) function

Conditions of length greater than one can be used in the ifelse() function, which checks the condition for each element of a given vector.

x <- 1:10
ifelse(x %% 3, "It is not divisible by 3", "It is divisible by 3")

##  [1] "It is not divisible by 3" "It is not divisible by 3"
##  [3] "It is divisible by 3"     "It is not divisible by 3"
##  [5] "It is not divisible by 3" "It is divisible by 3"    
##  [7] "It is not divisible by 3" "It is not divisible by 3"
##  [9] "It is divisible by 3"     "It is not divisible by 3"

Selected basic functions

Vector handling functions

The following functions are very useful when processing data in vector format:

number of vector elements,

x <- c (2,-1,0,3,-5)
length (x)

## [1] 5

average value of vector elements,

mean (x)

## [1] -0.2

standard deviation of vector elements,

sd (x)

## [1] 3.114482

reversing the order of vector elements,

rev (x)

## [1] -5  3  0 -1  2

sum of vector elements,

sum (x)

## [1] -1

cumulative sum of vector elements,

cumsum (x)

## [1]  2  1  1  4 -1

product of vector elements,

prod (x)

## [1] 0

cumulative product of vector elements,

cumprod (x)

## [1]  2 -2  0  0  0

smallest element of a vector,

min (x)

## [1] -5

index of the smallest element of a vector,

which.min (x)

## [1] 5

largest element of a vector,

max (x)

## [1] 3

index of the largest element of a vector,

which.max (x)

## [1] 4

a function that arranges elements in ascending or descending order,

sort (x)

## [1] -5 -1  0  2  3

sort (x, decreasing = TRUE)

## [1]  3  2  0 -1 -5

sort (x, index = TRUE)

## $x
## [1] -5 -1  0  2  3
## 
## $ix
## [1] 5 2 3 1 4

Many of the above functions (e.g. sum(), mean(), sd(), min(), max()) return NA if the given vector contains NA (Not Available) elements, unless the parameter na.rm = TRUE is used.

y <- c(1, NA, 2, 5, 7)
sum(y)

## [1] NA

mean(y)

## [1] NA

sum(y, na.rm = TRUE)

## [1] 15

mean(y, na.rm = TRUE)

## [1] 3.75

The which function returns the indexes of elements that meet a given condition.

which(y > 2)

## [1] 4 5

which(y == 2)

## [1] 3

which(y == NA)

## integer(0)

Calling which(y == NA) will not give the expected result, because any comparison with NA yields NA. To find the indexes of NA, NaN or infinite elements, use the is.na(), is.nan(), is.finite() and is.infinite() functions inside which().

z <- c(0/0, NA, 1/0, -1/0, 10, 15); z

## [1]  NaN   NA  Inf -Inf   10   15

is.na(z)

## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

is.nan(z)

## [1]  TRUE FALSE FALSE FALSE FALSE FALSE

is.infinite(z)

## [1] FALSE FALSE  TRUE  TRUE FALSE FALSE

which(is.na(z))

## [1] 1 2

which(is.nan(z))

## [1] 1

which(is.infinite(z))

## [1] 3 4
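The same functions can be used to remove or replace missing values. The sketch below also shows why comparing with NA fails (cmp, clean and y_filled are names used only here; replacing NA with 0 is just an example, not a recommendation):

```r
y <- c(1, NA, 2, 5, 7)
NA == NA                           # NA, not TRUE
cmp <- y == NA                     # every element is NA

clean <- y[!is.na(y)]              # drop missing values: 1 2 5 7
y_filled <- y
y_filled[is.na(y_filled)] <- 0     # replace missing values with 0
clean; y_filled
```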

Matrix handling functions

Some of the above functions can also be applied to matrices. However, which.min() and which.max() return only the position in the underlying vector (counted column by column); to obtain the row and column indexes, additionally use the arrayInd() function.

A <- matrix(1:16, 4, 4)
A

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

length (A)

## [1] 16

sum (A)

## [1] 136

mean (A)

## [1] 8.5

sd (A)

## [1] 4.760952

min (A)

## [1] 1

max (A)

## [1] 16

which.min (A)

## [1] 1

which.max (A)

## [1] 16

arrayInd (which.min (A), dim (A))

##      [,1] [,2]
## [1,]    1    1

arrayInd (which.max (A), dim (A))

##      [,1] [,2]
## [1,]    4    4
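Summaries by row or by column are also common. The sketch below uses rowSums(), colMeans() and apply(), base R functions not covered above, and which() with arr.ind = TRUE as an alternative to arrayInd():

```r
A <- matrix(1:16, 4, 4)
row_totals <- rowSums(A)                        # 28 32 36 40
col_means <- colMeans(A)                        # 2.5 6.5 10.5 14.5
row_max <- apply(A, 1, max)                     # maximum of each row: 13 14 15 16
pos_max <- which(A == max(A), arr.ind = TRUE)   # row 4, column 4
row_totals; col_means; row_max; pos_max
```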

Writing your own functions

As in any programming language, writing your own functions is a very important part of working in R. The syntax for creating a function is as follows.

function_name <- function(x, y, ...) {
  ...
  ...
  return (...)
}

The function parameters (arguments) can be any data type or data structure, e.g. vectors, matrices or lists.

multiplication_table <- function (range1, range2) {
  return (range1 %o% range2)
}
multiplication_table (1:10, 1:10)

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    3    4    5    6    7    8    9    10
##  [2,]    2    4    6    8   10   12   14   16   18    20
##  [3,]    3    6    9   12   15   18   21   24   27    30
##  [4,]    4    8   12   16   20   24   28   32   36    40
##  [5,]    5   10   15   20   25   30   35   40   45    50
##  [6,]    6   12   18   24   30   36   42   48   54    60
##  [7,]    7   14   21   28   35   42   49   56   63    70
##  [8,]    8   16   24   32   40   48   56   64   72    80
##  [9,]    9   18   27   36   45   54   63   72   81    90
## [10,]   10   20   30   40   50   60   70   80   90   100

The return() call is optional: the value of a function is the value of the last expression evaluated in its body. The dot character . has no special syntactic role in R names, so it can be used in the names of variables and functions.

add.two <- function (x, y) {
  x*y
  cos(x)
  x+y
}
add.two (2,5)

## [1] 7
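An explicit return() is still useful for leaving a function early. The following is only a sketch; safe_sqrt and its fallback argument are hypothetical names, and the example also shows a default argument value.

```r
# safe_sqrt is a hypothetical function used only for illustration.
safe_sqrt <- function(x, fallback = NA_real_) {
  if (x < 0) {
    return(fallback)         # early exit for negative input
  }
  sqrt(x)                    # last expression is the returned value
}

safe_sqrt(9)                 # 3
safe_sqrt(-4)                # NA
safe_sqrt(-4, fallback = 0)  # 0
```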

Arguments passed to a function behave as local copies, so changes made to them inside the function are not visible outside it. If you need to change the value of a global variable, use the assignment operator <<-.

f <- function (x, y) {
  x <- x * 2
  y <<- y * 2
}
x <- 2
y <- 2
f(2,2)
x; y

## [1] 2

## [1] 4
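In practice it is often clearer to return the new values and assign them at the call site than to modify global variables with <<-. A minimal sketch of this alternative; double_both is a hypothetical function that returns both results in a list.

```r
# double_both is a hypothetical function used only for illustration.
double_both <- function(x, y) {
  list(x = x * 2, y = y * 2)   # return both results, change nothing globally
}

x <- 2
y <- 2
result <- double_both(x, y)
x <- result$x                  # the caller decides what to overwrite
y <- result$y
x; y                           # 4, then 4
```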

Cheat sheets

Cheat sheets are a popular form of help for data analysis in R. They can be found at https://www.rstudio.com/resources/cheatsheets/. Alternatively, open the Help menu in RStudio and select Cheat Sheets from the drop-down list.