The `knitr` package is used to generate documents written in the Markdown markup language. With this package you can easily combine R code, its execution results and text descriptions. To create a report, you need a file with the `Rmd` extension, a name that reflects the combination of R and Markdown.
The `Rmd` file begins with a header in the YAML format (YAML Ain’t Markup Language), which specifies the configuration of the conversion process. Lines describing the author, title and date are optional. The last line specifies the output format; in this case it is `output: html_document`, so the file will be converted to an HTML file. Other popular choices are `pdf_document` and `word_document`.
A chunk is a piece of code placed between two lines of triple backticks. The curly brackets following the opening triple backticks indicate what language should be used to run the chunk (in our case `r`) and various additional options, the most frequently used of which are:
- `message=FALSE`, `warning=FALSE`: by default all messages and warnings are pasted into the resulting document. These most often just take up space, so it is worth turning them off.
- `cache=TRUE`: by default each chunk is evaluated every time an Rmd document is converted. However, if the calculation in a specific chunk takes a long time and the content of the chunk has not changed, then with the cache option enabled the chunk will not be re-executed and its result will be loaded from the local cache.
- `eval=FALSE`: disables chunk processing. The chunk will not be executed, although the corresponding source code will be included in the report. This is useful when we want to present a fragment of code but do not need to execute it (e.g. we illustrate how to download data from the Internet, but in fact load it from a local copy).
- `echo=FALSE`: causes only the output of the R code to appear in the report, but not the source. Thanks to this, the report contains only the results and can be sent to people for whom the R code means nothing and may be distracting.
- `include=FALSE`: causes neither the chunk itself nor the result of its execution to be shown in the document at the place where the chunk was placed (but the chunk is executed and its result can be used later).
- `fig.width=`, `fig.height=`: these options determine the size (in inches) of the figure pasted into the report.
More options are described in the knitr documentation.
If R code placed in a chunk draws a plot (e.g. with the `plot` or `hist` function), the plot will appear in the document, unless `eval=FALSE` is set in the chunk options.
It is possible to insert graphics from a file using the `include_graphics("file path")` function. You can also use the chunk options to change the size or position of the graphics (e.g. `out.width="50%"` and `fig.align="center"`).
Data frames can be placed in a document in two ways. The first one is an ordinary listing, analogous to displaying a frame in the console.
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The second method is more elegant: it uses the `kable(data, caption)` function, which also lets you easily add a caption to the table.
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
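Both displays above can be reproduced with a short chunk. The sketch below assumes the built-in `mtcars` data set (its first six rows match the output shown); the caption text is only an illustration and does not come from the original example.

```r
library(knitr)

# Ordinary listing: the same output as printing the frame in the console
first_rows <- head(mtcars)
print(first_rows)

# Formatted table with a caption (the caption text is an assumption)
tab <- kable(first_rows, caption = "First six rows of the mtcars data set")
tab
```

In an Rmd document the `kable()` result is rendered as a formatted table, while `print()` produces the plain listing prefixed with `##`.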
It is possible to set common options for all chunks using the `knitr::opts_chunk$set(...)` command. For this command to take effect, it should be placed in a chunk at the top of the document.
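A minimal sketch of such a setup chunk is shown below. The particular option values are only an assumption chosen for illustration; any of the chunk options discussed earlier can be set this way.

```r
# Typically placed in the first chunk of the Rmd file
knitr::opts_chunk$set(
  echo = FALSE,        # hide the source code in all chunks
  message = FALSE,     # do not paste messages into the report
  warning = FALSE,     # do not paste warnings into the report
  fig.align = "center" # center all figures
)
```

An individual chunk can still override these defaults by setting the option in its own header.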
The `knitr` package also allows you to place mathematical formulas using the \(\LaTeX\) syntax. The formula should be placed between `$` symbols. For example, writing `$f(x) = \frac{x^2}{x-1}$` will result in \(f(x) = \frac{x^2}{x-1}\).
HTML tags can also be used directly in an `Rmd` document. This way you can easily change the text style, e.g. `<p style="color: green; font-weight: bold;">green bold text</p>` gives green bold text.
The Markdown markup language was created to make text creation and formatting as simple as possible. It was originally implemented in Perl and later rewritten in many other languages. It is distributed under the BSD license and is available as a plug-in for several content management systems. It is currently widely used on GitHub.
A paragraph is one or more lines of text separated from the rest by one or more blank lines. To create a header, insert from one to six `#` characters at the beginning of the line, e.g.
# Header 1
## Header 2
### Header 3
Blocks of quotes are marked with `>` characters borrowed from e-mails.
> This is a block quote.
>
> This is a second paragraph in a block quote.
>
> ## This is a H2 header in a block quote.
This is a block quote.
This is a second paragraph in a block quote.
This is a H2 header in a block quote.
Markdown uses asterisks and underscores to highlight text. In a regular paragraph, you can format text as code by surrounding it with grave accents (`).
Some of this text *is italic*.
Part of this text _is also italic_.
Use two asterisks for **bold**.
Or, if you prefer, __use two underscores__.
Part of this line is formatted LaTeX code `$a_{ij} = b_i \cdot c_j^2$`.
Some of this text is italic. Part of this text is also italic.
Use two asterisks for bold. Or, if you prefer, use two underscores.
Part of this line is formatted LaTeX code `$a_{ij} = b_i \cdot c_j^2$`.
Bulleted lists use asterisks, pluses, and minuses (*, +, and -) as list markers. These three markers are interchangeable.
A numbered list uses numbers ending with a dot as list markers.
Bullet lists can be nested using four spaces or a tab:
- The first item on the list
    - The first item of the first item in the list.
    - The second item of the first list item.
    - The third element of the first element.
- The second item on the list.
- The third item on the list.
Markdown supports two styles of hyperlinking: “inline” and reference. Both styles use square brackets to mark the boundaries of the text to be linked. “Inline” links use parentheses immediately after the link text:
This is a link to Faculty of Physics, WUT.
Reference-style hyperlinks allow you to place links using names that are defined elsewhere in the document:
The most popular search engines for scientific publications are [Google Scholar][1], [Scopus][2]
and [Web of Science][3], only the first one is completely free.
[1]: http://scholar.google.com/ "Google"
[2]: http://scopus.com/
[3]: http://webofknowledge.com/
The most popular search engines for scientific publications are Google Scholar, Scopus and Web of Science, only the first one is completely free.
The basic command for inserting external graphics into a document is `![caption](file path)`. This command differs from the command that inserts a link only by the leading exclamation mark. It is important to leave a blank line below the command; otherwise, the image will be inserted into the text and will look unattractive.
If the image is too large, it can be made smaller by adding `{width="value"}` at the end, where the value can be given in pixels or as a percentage of the page width. The original image is 368 pixels wide; let’s reduce it by half.
Creating a table in Markdown is also possible, although it is somewhat cumbersome. The first row contains column headings separated by vertical bars. In the second row, dashes are placed between the vertical bars separating the columns. The second row also specifies, by means of colons, the alignment of the elements in each column; it may be different in each column. The remaining rows contain the cell contents separated by vertical bars. A table made in Markdown will always take up the entire width of the page.
| Default alignment | Align left | Center | Align right |
|---|:--|:-:|--:|
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 | 8 |
| 9 | 10 | 11 | 12 |
Default alignment | Align left | Center | Align right |
---|---|---|---|
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 |
There are several reasons why you should create your own packages. First, it forces you to better organize and document your code. Second, it makes it easier to use the same functions in different projects without having to copy code. Third, it allows you to easily share your code with others. To create your own package you need two packages:
install.packages ("devtools")
install.packages ("roxygen2")
# Optionally to check the names of existing packages
install.packages("available")
There are several ways to create a new package. In this manual we will use RStudio for this purpose. Additionally, we will learn how to publish a package on GitHub.
From the menu at the top, select `File` > `New Project...` > `New Directory` > `R Package`. A dialog box will appear in which we must provide the package name. We can use the `suggest()` and `available()` functions from the `available` package to choose the package name.
## ── testr ───────────────────────────────────────────────────────────────────────
## Name valid: ✔
## Available on CRAN: ✔
## Available on Bioconductor: ✔
## Available on GitHub: ✖
## Abbreviations: http://www.abbreviations.com/test
## Wikipedia: https://en.wikipedia.org/wiki/test
## Wiktionary: https://en.wiktionary.org/wiki/test
## Sentiment:???
If we intend to make the package available in an external repository, e.g. GitHub, we should check the `Create a git repository` box. Done, the package project already exists. Let’s take a look at what it contains.
- `.gitignore` and `.Rbuildignore` contain lists of files to be ignored by git or R.
- `DESCRIPTION` and `NAMESPACE` contain metadata about the package. The former contains information about the package name, its author, version, and dependencies on other packages. The latter specifies which functions will be available to the user after loading the package and, possibly, which functions are imported from other packages. The `NAMESPACE` file does not need to be edited manually; the `devtools` package will do it for us. For this to happen, we first need to delete the automatically created `NAMESPACE` file. More information about `DESCRIPTION` can be found in the official R documentation (Writing R Extensions).
Package: testr
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true
- `man` will contain all manuals for our functions. We don’t have to look there either if we use `devtools`.
- The `R` folder will contain all the functions that make up the package. Functions can be collected in one file or scattered among many.
- `<projectname>.Rproj` refers to the project created in RStudio, not the package itself. When creating a package without using RStudio, this file will not appear.
This step is optional. Currently our project uses a local git repository (which is also optional). To connect your project to GitHub, log in to the site and create a new repository with the same name as your local project. When creating it, do not check the `Add a README file` box. In the RStudio terminal (not the R console), enter the following commands (make sure you are in the project directory):
git remote add origin https://github.com/user-name/project-name.git
git branch -M main
Now you can easily create commits and update the repository from the `Tools` > `Version Control` menu or using the Git button at the top of the window. Attention! If the `Push` option is grayed out after the first commit, you should perform this operation from the terminal by typing
git push -u origin main
Another solution is to simply make the first commit from the terminal, but then remember to add the files first with the `git add --all` command (or, instead of `--all`, specify all the files one by one).
To make automatic documentation and code organization run smoothly, it is recommended that each function in the package be located in a separate `.R` file with the same name as the function. Each function will contain documentation according to the `roxygen` schema. Detailed instructions can be found by typing `vignette("rd")` in the R console. Below is a simple example.
#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @return A number.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}
A `roxygen` comment starts with the `#'` characters. The full description of a function consists of four parts, of which only the first one is absolutely mandatory.

- The first line is the title (equivalent to the `@title` tag).
- The next paragraph is the description (`@description` tag).
- Tags describing the function, usually `@param`, `@return` and `@examples`:
    - `@param` describes the function parameters,
    - `@return` explains what the function returns,
    - `@examples` demonstrates how to use the function.
- Any further paragraphs form the details section (`@details`).

It is also worth adding the `@export` tag because it indicates that the name of the function is to be placed in the `NAMESPACE` file. Once the function file is ready, we invoke the `devtools::document()` command to update the contents of `man` and `NAMESPACE`.
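To show how the four parts and the `@export` tag fit together, here is a sketch that extends the earlier `add()` example. The description and details paragraphs are made up for illustration and are not part of the original example.

```r
#' Add together two numbers
#'
#' Returns the sum of two numeric values.
#'
#' Both arguments are passed to the + operator, so the usual
#' recycling rules for vectors apply.
#'
#' @param x A number.
#' @param y A number.
#' @return A number.
#' @examples
#' add(1, 1)
#' add(10, 1)
#' @export
add <- function(x, y) {
  x + y
}
```

Assuming this file is saved as `R/add.R`, running `devtools::document()` in the project directory should regenerate `man/add.Rd` and add the corresponding export entry to `NAMESPACE`.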
Our package can, of course, use other packages that must be installed for it to work properly. Let’s consider the following example.
#' Read many data frames with the same columns from many text files.
#'
#' @param prefix A string with the first part of the filename.
#' @param range Numeric vector with the numbers which differ from file to file.
#' @param suffix A string with the last part of the filename. Typically consists of a file extension.
#' @param is.header A logical value indicating whether the file contains the names of the variables
#' as its first line. The default value is TRUE.
#' @return A single data frame.
#' @examples
#' read.many.tables ("N1000_k8_O0.2_beta0.5_part", 1:10, ".txt")
#' read.many.tables ("precision_N", c(100,200,500,1000,2000,5000), ".txt")
read.many.tables <- function (prefix, range, suffix, is.header=TRUE) {
  # Read each file whose name is built from prefix, number and suffix
  dataframes <- lapply (range,
    function(ii) read.table (paste (prefix,ii,suffix,sep=""), header=is.header))
  # Bind all data frames row-wise into a single table
  data.table::rbindlist(dataframes)
}
In the above example we use the `rbindlist()` function from the `data.table` package. It is very important that we do this by specifying the package name and the `::` operator, and not by using the `library` function. Calling `library` or `require` is absolutely not allowed inside functions included in the package. Always use the full function name preceded by the package name and the `::` operator. Moreover, the `DESCRIPTION` file should state that the `data.table` package is required by our package:
Imports: data.table
It is also possible to require a specific version of a package in `Imports`. Different package names should be separated by commas.
Imports: data.table (>= 1.9.4), dplyr
However, if our package uses the `data.table` package only to a small extent and its absence does not affect the main functionality of our package, we can merely suggest its installation.
Suggests: data.table, dplyr, tidyr
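When a package is listed only under `Suggests`, code that uses it usually has to cope with its absence. Below is a minimal sketch of one common pattern; the helper `bind_tables()` is hypothetical (it is not part of the original example) and falls back to base R when `data.table` is not installed.

```r
# Hypothetical helper: bind a list of data frames row-wise,
# using data.table only if it is available
bind_tables <- function(dataframes) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # Fast path: the suggested package is installed
    as.data.frame(data.table::rbindlist(dataframes))
  } else {
    # Fallback: slower, but needs nothing beyond base R
    do.call(rbind, dataframes)
  }
}

parts <- list(data.frame(a = 1:2), data.frame(a = 3:4))
combined <- bind_tables(parts)
```

Unlike `library()`, `requireNamespace()` only checks for and loads the namespace without attaching it, so it is consistent with always calling functions through the `::` operator.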
Data sets are often added to R packages so that the package’s functionality can be tested conveniently. To add a data set, save it in the `.RData` or `.rda` format using the `save` function in the `data/` directory of the project folder. It is best if the name of the data file is the same as the variable’s name in R.
earnings <- data.frame (name = c("Kowalski", "Nowak", "Wiśniewska"),
                        salary = c(4100, 5700, 6300))
# The file argument must be named; unnamed arguments are objects to save
save (earnings, file = "path-to-project/data/earnings.RData")
Then, we create a `.R` file with the data set description written in `Roxygen2` and place it together with the remaining `.R` files in the `R` folder. The file name should be the same as the data set name, e.g., `earnings.R`.
Inside, we place the following elements:
- `@docType` specifies the document type (enter `data`).
- `@usage` tells you how to load the data set (just specify `data(set_name)`).
- `@format` specifies the data format, most often `data.frame` or `matrix`, possibly `array` or `list`.
- The `\describe{}` function allows you to add a detailed description of the set, in which we describe each variable (column) separately.
- The `@references` tag is essential if the data is not synthetic and its source must be provided.
- `@source` - similar to `@references`.
- `@keywords` - keywords, often simply `datasets`.
- `@examples` - examples of using the set by other functions, e.g. `plot`.

#' Earnings of employees of XYZ company
#'
#' The data set contains the monthly salary of employees of company XYZ.
#'
#' @docType data
#' @usage data (earnings)
#' @format A data frame
#' @keywords datasets earnings
#' @examples
#' data (earnings)
#' barplot (earnings$salary, names.arg = earnings$name)
"earnings"
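Before documenting a data set it can be useful to check that the saved file really contains the expected object. The sketch below assumes a temporary directory as a stand-in for the project's `data/` folder (the actual project path is not known here) and loads the file back into a separate environment.

```r
earnings <- data.frame(name = c("Kowalski", "Nowak", "Wiśniewska"),
                       salary = c(4100, 5700, 6300))

# Hypothetical stand-in for the data/ directory of the package project
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
data_file <- file.path(data_dir, "earnings.RData")

# Save the object under a file name matching the variable name
save(earnings, file = data_file)

# Load into a fresh environment to confirm the round trip
check_env <- new.env()
loaded <- load(data_file, envir = check_env)
```

`load()` returns the names of the restored objects, so `loaded` should contain the single name `earnings`.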
Installing the package is very simple. If we have a local copy of the source code, just use the `install` function from the `devtools` package, pointing it to the directory containing our package.
However, if our package is in a repository, e.g. GitHub, CRAN, or Bioconductor, we can use the appropriate functions: `install_github`, `install_cran`, or `install_bioc`.
If we want to install a package from a private repository, we must provide the appropriate token.
The Rcpp package allows us to write R functions directly in C++ language, which may increase the performance of our code hundreds of times. To use it, we will need the following tools:
C++ is a compiled (not interpreted) general-purpose programming language. It is portable, object-oriented, generic, and provides low-level memory manipulation facilities. It has been developed by Bjarne Stroustrup since 1979 as an extension of the C programming language. It was initially standardized in 1998 as ISO/IEC 14882:1998. C++ is characterized by high performance, efficiency, and flexibility.
R is implemented in C (and R and Fortran). As a result, everything we can do in R may be implemented in C/C++. For example, the following functions call some compiled code directly:
## function (..., na.rm = FALSE) .Primitive("sum")
## function (...) .Primitive("c")
The R/C API provides the fastest way (in terms of performance) to communicate with R from compiled code. However, it is definitely not the most convenient one. All R objects are handled with the type `SEXP`, which is a pointer to the `SEXPREC` structure. For example, this is C/C++ code equivalent to an R call to `c(123.45, 67.89)`:
SEXP createVectorOfLength2 () { // R/C API
    SEXP result;
    result = PROTECT (allocVector (REALSXP, 2));
    REAL (result)[0] = 123.45;
    REAL (result)[1] = 67.89;
    UNPROTECT (1);
    return result;
}
The R/C API may be complicated for many R users to learn and use. Thus, during this lesson, we will discuss the `Rcpp` package. It simplifies writing compiled code. Just compare the above to:
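A minimal sketch of what an Rcpp counterpart of `createVectorOfLength2()` could look like is given below. It is an illustration rather than the original example, and it assumes that the Rcpp package and a working C++ compiler are installed.

```r
# Assumes Rcpp and a C++ toolchain are available
Rcpp::cppFunction("
NumericVector createVectorOfLength2() {
  // Rcpp takes care of allocating and protecting the result
  return NumericVector::create(123.45, 67.89);
}
")

vec <- createVectorOfLength2()
```

There is no explicit `PROTECT`/`UNPROTECT` pair and no `SEXP` in sight; the `NumericVector` wrapper handles these details.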
`Rcpp` is a set of convenient C++ wrappers for the whole R/C API. We may use it to remove performance bottlenecks in our R code, implement code that is difficult to vectorize, or whenever we need advanced algorithms, recursion, or data structures.
Let us compare two implementations of the function which computes the n-th Fibonacci number:
\(F_0 = 1\)
\(F_1 = 1\)
\(F_n = F_{n-1} + F_{n-2}, \quad n \geq 2\)
Here is an R solution:
fib1 <- function (n) {
  if (n <= 1) return(1)
  last12 <- c(1, 1)
  for (i in 2:n)
    last12 <- c(last12[1]+last12[2], last12[1])
  last12[1]
}
sapply (0:7, fib1)
## [1] 1 1 2 3 5 8 13 21
And this is how we may implement it in Rcpp:
Rcpp::cppFunction ("
  int fib2(int n) {
    if (n <= 1) return 1;
    int last1 = 1;
    int last2 = 1;
    for (int i=2; i<=n; ++i) {
      int last3 = last2;
      last2 = last1;
      last1 = last2+last3;
    }
    return last1;
  }
")
Note that Rcpp::cppFunction compiles, links, and loads a dynamic library.
## function (n)
## .Call(<pointer: 0x7f29d0e59fe0>, n)
## [1] 1 1 2 3 5 8 13 21
Importantly, Rcpp automatically checks the types of function arguments:
## [1] 1
## [1] 1
## Error in eval(expr, envir, enclos): Not compatible with requested type: [type=character; target=integer].
## Error in eval(expr, envir, enclos): Expecting a single value: [extent=3].
Here are some benchmarks:
## Unit: nanoseconds
## expr min lq mean median uq max neval
## fib1(40) 7797 8100 9388.350 8292.5 8684.5 41826 200
## fib2(40) 441 466 779.315 528.0 589.0 26687 200
Typically, we may get a speed gain of ca. 2–50x or more. However, the time needed to write the code increases significantly. Thus, Rcpp makes sense if the writing time is much shorter than the execution time, or in code re-used by others, e.g. whenever we would like to guarantee the best possible performance.
Apart from inline usage of Rcpp, we may put our C++ code in separate source files. Here are the contents of the `test.cpp` file:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int fib3 (int n) {
    if (n <= 1) return 1;
    int last1 = 1;
    int last2 = 1;
    for (int i=2; i <= n; ++i) {
        int last3 = last2;
        last2 = last1;
        last1 = last2 + last3;
    }
    return last1;
}
Call the `sourceCpp()` function to compile the source file. The `fib3()` function may now be called in R.
## function (n)
## .Call(<pointer: 0x7f29d0e48fe0>, n)
## [1] 1 1 2 3 5 8 13 21
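The whole `sourceCpp()` workflow can be sketched as follows. To keep the example self-contained, the C++ source is written to a temporary file first, which is an assumption made only for this illustration; in a real project `test.cpp` would simply sit in the working directory. Rcpp and a C++ compiler are assumed to be installed.

```r
# Write the C++ source to a temporary file (stand-in for test.cpp)
cpp_file <- file.path(tempdir(), "test.cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "using namespace Rcpp;",
  "",
  "// [[Rcpp::export]]",
  "int fib3 (int n) {",
  "    if (n <= 1) return 1;",
  "    int last1 = 1;",
  "    int last2 = 1;",
  "    for (int i=2; i <= n; ++i) {",
  "        int last3 = last2;",
  "        last2 = last1;",
  "        last1 = last2 + last3;",
  "    }",
  "    return last1;",
  "}"
), cpp_file)

# Compile the file and export fib3() to the R session
Rcpp::sourceCpp(cpp_file)

fib_values <- sapply(0:7, fib3)
```

The `// [[Rcpp::export]]` comment is what tells `sourceCpp()` to make `fib3()` available in R.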