ADVANCED R PROGRAMMING, SUMMER 2025 EDITION



Functions

The functional programming paradigm distinguishes functions as the most important components of a program. Each calculation is treated as a mathematical function whose result depends only on the input data. Consider the example function f, which computes the value of the constant \(\pi\) by summing the \(n+1\) first elements of the Leibniz series, \(\pi = 4 \sum_{i=0}^{\infty} (-1)^i/(2i+1)\).

f <- function(n) 4*sum (c (1,-1) / (2*(0:n)+1))
f (1000)
## [1] 3.142592

The f function is composed of predefined elements that are also functions: *, sum(), c(), /, +. Each of these functions has defined arguments and returns a clearly defined value:

The function f can therefore also be written as

f <- function(n) `*` (4, sum (`/` (c(1, -1), `+` (`*` (2, `:` (0, n)), 1))))
f (1000)
## [1] 3.142592


Defining R functions

To define a function, execute the following command

function (parameter_list) body

gdzie:

  • body is an R expression executed on the given arguments. The resulting object is the return value of the function.
  • parameter_list is a comma-separated list of elements of the form:
    • parameter_name,
    • parameter_name = default_value, or
    • ....

It’s worth noting that no name needs to be assigned to the function.

(function(x) x^2)(1:5) # square
## [1]  1  4  9 16 25

In the above example, we applied an anonymous function to the vector 1:5. Anonymous functions are especially useful when combined with functions from the *apply family.

lapply (list(1:3, 4:6), function(x) x^2)
## [[1]]
## [1] 1 4 9
## 
## [[2]]
## [1] 16 25 36

Most often, however, we will use named functions.

square <- function(x) x^2
square (1:5)
## [1]  1  4  9 16 25
is.function (square)
## [1] TRUE
is.atomic (square)
## [1] FALSE
is.vector (square)
## [1] FALSE
is.recursive (square)
## [1] TRUE
typeof (square)
## [1] "closure"
mode (square)
## [1] "function"
square
## function(x) x^2

The body of a function is a single expression. Multiple expressions can be grouped together using curly braces. Interestingly, { is also a function. Returns the value resulting from the last expression.

`{`(1, 2, 3)
## [1] 3
{
  1 
  2 
  3
}
## [1] 3

Every function must return something. In some cases, we want to write a function that writes or draws something and does not necessarily have to return anything. Then it is convenient to use the invisible (x=NULL) function.

printvec <- function(x) {
  cat (x, sep = ", ")
  cat ("\n")
  invisible ()
}

val <- printvec (1:4)
## 1, 2, 3, 4
val
## NULL


Variables inside functions

Any value assignment using the <- operator inside a function is local.

g <- function(x) {
  gy <- x+1
}
g(3)
gy
## Error in eval(expr, envir, enclos): object 'gy' not found

The gy variable only exists inside the g() function. However, it is possible to refer to global variables within functions and even change them using the <<- operator. However, such behavior is strongly discouraged because it may generate problems with maintaining and developing the code.

g <- function(x) {
  z <<- x + y
}
z <- 0
y <- 5
z <- g(1:5)
z
## [1]  6  7  8  9 10


Functions’ arguments

In R, you can pass any object to any function. However, this does not mean that the function has to accept it.

square (square)
## Error in x^2: non-numeric argument to binary operator
square (c ("a", "b"))
## Error in x^2: non-numeric argument to binary operator
square (list (1, 2, 3))
## Error in x^2: non-numeric argument to binary operator

It is the responsibility of the programmer to pass the right objects to the function. In such a situation, a very useful function is stopifnot (cond_1, cond_2, ..., cond_n), which will throw an error if any of the conditions are not met.

mean2 <- function (x) {
  stopifnot (is.numeric(x))
  stopifnot (length(x) > 0, is.finite(x))
  mean (x, na.rm=TRUE)
}

mean (numeric())
## [1] NaN
mean2 (numeric())
## Error in mean2(numeric()): length(x) > 0 is not TRUE
mean ( c(1,2,3,Inf))
## [1] Inf
mean2 ( c(1,2,3,Inf))
## Error in mean2(c(1, 2, 3, Inf)): is.finite(x) are not all TRUE

Arguments are passed to the function by value. Inside functions, they act as local variables. Changing them in the function does not change their global value.

squareroot <- function(x) {
  x <- sqrt(x)
  x
}

x <- 2
squareroot (x)
## [1] 1.414214
x
## [1] 2

When defining a function, you can specify its default arguments. If an argument is omitted when calling a function, its default value is used.

f <- function (a = 1, b = 2) a*b

f()
## [1] 2
f(3,)
## [1] 6
f(,4)
## [1] 4
f(3,4)
## [1] 12

Default arguments (if any) should be provided at the end of the argument list. For optimization reasons, function arguments are not used until they are needed.

f <- function (x) {
  cat("before ")
  cat(x)
  cat(" after\n")
}

f(1)
## before 1 after
f({cat("now "); 1})
## before now 1 after

This behavior is called lazy evaluation. Some arguments may never be used. You can easily check whether an argument was omitted when calling a function.

f <- function (x = 1) {
  cat (missing (x), x)
}
f(1)
## FALSE 1
f(2)
## FALSE 2
f()
## TRUE 1

Moreover, using the match.arg() function we can tell our function to accept only selected character values. Using the first letters is sufficient as long as the arguments can be distinguished.

f <- function (direction = c("left", "right", "forward", "backward")) {
  direction <- match.arg (direction)
  direction
}
f ("right")
## [1] "right"
f ("r")
## [1] "right"
f ("l")
## [1] "left"
f ("for")
## [1] "forward"
f ("back")
## [1] "backward"
f ("eft")
## Error in match.arg(direction): 'arg' should be one of "left", "right", "forward", "backward"
f ("up")
## Error in match.arg(direction): 'arg' should be one of "left", "right", "forward", "backward"

We can also find out what expression was passed to the function as its argument.

f <- function(x) cat (deparse (substitute (x)), "=", x)
f (5)
## 5 = 5
vals <- 1:5
f (vals)
## vals = 1 2 3 4 5
f (round (log (vals^2) + vals, 1))
## round(log(vals^2) + vals, 1) = 1 3.4 5.2 6.8 8.2

This is often used in graphics functions.

height <- rnorm (250, 1.79, 0.07)
weight <- rnorm (250, 23, 3) * height^2
plot (height * 100, weight, las = 1) # notice the axis labels

The ... (dot-dot-dot) parameter groups multiple arguments into one. It allows you to create functions that take an (a priori) unknown number of parameters or makes it easier to pass a set of parameters to another function. Each argument grouped in ... can be accessed by the two dot operator: ..1, ..2, ....

f <- function (...) {
  print (..1)
  print (..2)
}
f (1,2,3)
## [1] 1
## [1] 2
f (1)
## [1] 1
## Error in print(..2): the ... list contains fewer than 2 elements

The ... argument can be easily converted to a list. The str() function transparently displays any R structure.

f <- function (...) {
  list (...)
}
str (f (1,2,3))
## List of 3
##  $ : num 1
##  $ : num 2
##  $ : num 3

The three dot can also be passed to another function. Examples of functions that use the dot-dot-dot are c(), list(), sum(), mean(), cbind(), rbind().

f <- function (x, g, ...) {
  x + g (...)
}
f (1, c, 10, 20, 30)
## [1] 11 21 31
f (1, sum, 2, 3)
## [1] 6

Note that the g argument in the example above is a function, not variable. In R, functions can be also passed as an arguments to other functions.


Attributes

Almost all objects (except NULL) in R can be equipped with attributes. Attributes are, in a sense, metadata, i.e. they contain additional information about objects. Setting certain attributes can have a significant impact on how an object interacts with functions.


Getting and setting attributes

The attr() function can be used to set attributes. An attribute is a key-value pair. With the exception of a few special attributes, we can freely add metadata to R objects.

x <- (-5):5
attr(x, "color") <- "green"
attr(x, "which_positive") <- which(x > 0)
attr(x, "favorite_function") <- exp

The above code is equivalent to the following:

x <- structure ((-5):5,
                color = "green",
                which_positive = which(x > 0),
                favorite_function = exp
                )

The attr() function can also be used to read the attributes of an object.

attr (x, "color")
## [1] "green"
attr (x, "which_positive")
## [1]  7  8  9 10 11
attr (x, "favorite_function")
## function (x)  .Primitive("exp")

It is worth paying attention to how the object with added metadata is printed.

x
##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
## attr(,"color")
## [1] "green"
## attr(,"which_positive")
## [1]  7  8  9 10 11
## attr(,"favorite_function")
## function (x)  .Primitive("exp")
str (x)
##  int [1:11] -5 -4 -3 -2 -1 0 1 2 3 4 ...
##  - attr(*, "color")= chr "green"
##  - attr(*, "which_positive")= int [1:5] 7 8 9 10 11
##  - attr(*, "favorite_function")=function (x)

The most important thing, however, is that x is still an “regular” numeric vector and therefore you can perform exactly the same operations on it as on other numeric vectors.

mode (x)
## [1] "numeric"
typeof (x)
## [1] "integer"
x[1]
## [1] -5
mean (x)
## [1] 0
x[attr (x, "which_positive")]
## [1] 1 2 3 4 5

To remove an attribute, simply assign the value NULL to it.

attr (x, "favorite_function") <- NULL
x
##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
## attr(,"color")
## [1] "green"
## attr(,"which_positive")
## [1]  7  8  9 10 11

Another way to see all of an object’s attributes is with the attributes() function, which returns a named list.

attributes (x)
## $color
## [1] "green"
## 
## $which_positive
## [1]  7  8  9 10 11


Special attributes

There are some special attributes in R. Their values must meet certain constraints. These include:

  • comment - ignored by the print() function; type character,
  • names - the names of vector elements; type character vector,
  • class - the name of the S3 class object; type character,
  • dim - dimensionality of the matrix, i.e. the number of columns and rows; type numeric vector,
  • dimnames - the names of the matrix’ dimensions; type character vector,
  • row.names - the row names of the matrix or data frame; type character vector,
  • colnames - the column names of the matrix or data frame; type character vector.

Special attributes have their own functions with which we can set or read them.


The comment attribute

The role of the special attribute comment is to store metainformation that is invisible during normal display of the object.

x <- 1:5
comment (x) <- "Luke is Vader's son"
x
## [1] 1 2 3 4 5
str (x)
##  int [1:5] 1 2 3 4 5
##  - attr(*, "comment")= chr "Luke is Vader's son"
comment (x)
## [1] "Luke is Vader's son"
attr (x, "comment")
## [1] "Luke is Vader's son"
comment (x) <- 5
## Error in `comment<-`(`*tmp*`, value = 5): attempt to set invalid 'comment' attribute


The names attribute

The special attribute names is used to label vector elements. Its length must be exactly the same as the vector length.

names (x) <- c("1st", "2nd", "3rd", "4th", "5th")
x
## 1st 2nd 3rd 4th 5th 
##   1   2   3   4   5
structure (1:4, names = c("a", "b", "c", "d", "e"))
## Error in attributes(.Data) <- c(attributes(.Data), attrib): 'names' attribute [5] must be the same length as the vector [4]

Although indexing using labels is possible in R, they do not constitute identifiers of individual elements of the vector because they do not have to be unique.

x <- c (one = 1, two = 2, one = 3)
x["one"]
## one 
##   1
x[names(x) == "one"]
## one one 
##   1   3


The class attribute

This attribute refers to the so-called S3 class of the object. S3 class is the default object-oriented programming mechanism implemented in R. Not all objects used in R need to be S3 objects. The class() function called on a generic object will return the same as the mode() function instead of the actual class attribute.

x <- c ("a", "b", "c")
class (x)
## [1] "character"
attr (x, "class")
## NULL
h <- hist (rnorm (1000), plot = FALSE)
class (h)
## [1] "histogram"
attr (h, "class")
## [1] "histogram"


The dim attribute

Matrices in R are nothing more than vectors with an appropriately set dim attribute that informs how many rows and columns the matrix should consist of.

x <- 1:6
dim (x) <- c(2, 3)
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
class (x)
## [1] "matrix" "array"
is.matrix (x)
## [1] TRUE


Processing the objects with the attributes

When an object with attributes is passed to a function, some of its attributes may be passed to the object returned by the function.

x <- structure (1:5, class = "xx", names = letters[1:5], attrib = "val")
y <- structure (1, attrib2 = "val2")
z <- structure (5:1, attrib2 = "val2")
  • The indexing operator [ will skip all attributes except names, dim and dimnames.
x[2]
## b 
## 2
  • In the case of binary operators, such as “+”, most attributes are taken from the longer argument. Labels will be copied from the first one if it has the same length as the result, and from the second one if not. If the arguments have the same length, the attributes will be copied from both.
x * y
## a b c d e 
## 1 2 3 4 5 
## attr(,"class")
## [1] "xx"
## attr(,"attrib")
## [1] "val"
x + z
## a b c d e 
## 6 6 6 6 6 
## attr(,"attrib2")
## [1] "val2"
## attr(,"class")
## [1] "xx"
## attr(,"attrib")
## [1] "val"
  • Vectorized math functions should retain all attributes in most cases.
sin (x)
##          a          b          c          d          e 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
## attr(,"class")
## [1] "xx"
## attr(,"attrib")
## [1] "val"
  • Aggregation functions typically omit all attributes.
mean (x)
## [1] 3


Unit testing

Imagine a situation where the function f() calls the function g(), which calls the function h(), which in turn calls the function k() inside the function sapply(). It turns out that the f() function returns unexpected values when one of its arguments, n, is less than zero. Such an error is difficult to locate. Let’s assume that the bug is in the k() function, whose developer thought to himself No one will call this function with n<0. He made life difficult for other programmers by failing to ensure that his function behaved predictably under all conditions, even borderline ones.

Unit tests ensure that the function always handles input correctly. A unit test is a strict condition that a function must fulfill. They act as a behavioral specification for a function, e.g.

Unit tests should be performed as often as possible and therefore must not be computationally intensive. This concept is called continuous testing, which leads to better code quality. Unit tests should cover as many cases as possible. Thanks to them

It is suggested to write unit tests before implementing a function. Suppose we want to write a function that computes the \(n\)th term of the Fibbonaci sequence for \(n \geq 0\).

\(F_0 = 1\)

\(F_1 = 1\)

\(F_n = F_{n-1} + F_{n-2}, \quad n \geq 2\)

What unit tests can be written for such a function?

In R, a function can accept and return anything. That’s why it’s so important to make sure your function doesn’t return something wrong for unexpected input objects. Another important issue is that the functions we write are compatible with programming paradigms in R. We should take care of:

There are many packages in R for testing code. During the classes we will use the testthat package. It provides the test_that (test_name, expectations) function and a number of functions to check whether our function behaves as expected:

Of course, don’t forget to use the stopifnot (cond_1,...,cond_n) function, which will make unit tests much more efficient.

fibbonaci <- function(n) {
  stopifnot (is.numeric(n),
             length(n)==1,
             n>=0)
            
  n <- round(n)
  if (n<=1)
    return (1)
  else
    return (fibbonaci (n-1) + fibbonaci (n-2))
}

library(testthat)
test_that ("fibonnaci works", {
  expect_equal (fibbonaci (0), 1)
  expect_equal (fibbonaci (1), 1)
  expect_equal (fibbonaci (1.5), 2)
  expect_equal (fibbonaci (3.1), 3)
  expect_length (fibbonaci(4), 1)
  expect_error (fibonnaci (-1))
  expect_error (fibbonaci (1:5))
  expect_error (fibbonaci (mean))
  expect_error (fibbonaci (list))
  expect_error (fibbonaci ("a"))
})
## Test passed 🎉


Exception handling

The R language provides a mechanism for handling unusual events. There are three types of exceptions:


Messages

The message (character) function can be used to print diagnostic messages on the standard output of stderr errors.

message ("This is the diagnostic message")
## This is the diagnostic message
library (dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:testthat':
## 
##     matches
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Messages can be suppressed using the suppressMessages() function.

f <- function (x) {
  message ("Awsome function for computing the logarithm of the absolute value.")
  log (abs (x))
}
f (-5)
## Awsome function for computing the logarithm of the absolute value.
## [1] 1.609438
suppressMessages (f (3))
## [1] 1.098612


Warnings

The second type, warnings, can be used to draw the user’s attention to potential problems.

warning ("Attention! Potentially wrong value!")
## Warning: Attention! Potentially wrong value!
sqrt (-1)
## Warning in sqrt(-1): NaNs produced
## [1] NaN
1:2 + 3:5
## Warning in 1:2 + 3:5: longer object length is not a multiple of shorter object
## length
## [1] 4 6 6

Warnings can also be disabled. The suppressWarnings() function is used for this purpose.

suppressWarnings (sqrt (-1))
## [1] NaN

However, it is not recommended to use this function unless you are absolutely sure you know what you are doing. Generally, it is advisable to inform the user that not everything may have gone smoothly during code execution. Warnings can also be turned into errors. This behavior is useful when testing code.

options (warn = 2)
sqrt (-1)
## Error in sqrt(-1): (converted from warning) NaNs produced


Errors

Errors are thrown using the stop() function.

stop ("Very serious error!")
## Error in eval(expr, envir, enclos): Very serious error!
nonexisting_function (x)
## Error in nonexisting_function(x): could not find function "nonexisting_function"
cat (geterrmessage ())
## Error in nonexisting_function(x) : 
##   could not find function "nonexisting_function"

Sometimes we want a certain expression to execute regardless of an error in the function. The on.exit() command is used for this purpose.

erroneus_function <- function() {
  on.exit (print ("C"))
  on.exit (print ("D"), add = TRUE)
  print ("A")
  stop ("An error occured!")
  print ("B")
}
erroneus_function ()
## [1] "A"
## Error in erroneus_function(): An error occured!
## [1] "C"
## [1] "D"

The on.exit() function is useful when you want to make sure that certain resources are restored to their original state, even if the program exits unexpectedly due to an error. For example:

  • automatic closing of a file or database connection,
  • restoring graphic settings,
  • restoring global options, locales, etc.


Try-catch

The tryCatch() function allows you to ensure that any error is handled. Its syntax is as follows:

tryCatch (expression_to_try,
          error = error_handling_function_1arg,
          finally = expression_to_eval_at_the_end)

The tryCatch() function is beneficial for long simulations. What if the ten thousandth iteration causes an error? Without proper handling of this error, the program will terminate, and the results obtained in previous iterations may be lost if they are not saved to an external file on an ongoing basis.

test <- function(x) {
  tryCatch ({
    sum (as.numeric (x))},
    error = function (e) {
      NA
  })
}

test (1:5)
## [1] 15
test (mean)
## [1] NA


Debugging

A guide on how to use the debugger in RStudio can be found here:

https://support.posit.co/hc/en-us/articles/200713843-Debugging-with-RStudio