ADVANCED R PROGRAMMING, SUMMER 2025 EDITION



ggplot2 extensions

The ggplot2 package offers a great deal of flexibility for data visualization but takes some practice. Experienced ggplot2 users build their own plotting libraries on top of it. An overview of extensions can be found in the online ggplot2 extensions gallery.

As a reminder, the first step to making a plot in ggplot2 is to place the data in a data frame. If the data cover many series, a long frame is preferable to a wide one; in other words, the frame should follow the tidy data paradigm. The entire chart can then be drawn with a single call to a geom_* function instead of one call per series.

library (ggplot2)

data ("mtcars")
g1 <- ggplot (mtcars)
g1 + geom_point (aes(x=hp, y=qsec, size=mpg)) + theme_bw()

data ("iris")
g2 <- ggplot (iris)
g2 + geom_point (aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + theme_bw()
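
The advantage of the long format mentioned above is easiest to see on a small example. The sketch below uses a made-up data frame (the names wide, long, north, and south are hypothetical and not part of any package) and reshapes it with base R only; a dedicated reshaping package would work just as well.

```r
library(ggplot2)

# Hypothetical wide data: one column per series
wide <- data.frame(
  year  = 2001:2005,
  north = c(10, 12, 15, 14, 18),
  south = c(8, 9, 11, 15, 16)
)

# Long (tidy) form: one row per (year, series) pair
long <- data.frame(
  year   = rep(wide$year, times = 2),
  series = rep(c("north", "south"), each = nrow(wide)),
  value  = c(wide$north, wide$south)
)

# A single geom_line() call now draws every series
g_long <- ggplot(long, aes(x = year, y = value, colour = series)) +
  geom_line() +
  theme_bw()
g_long
```

With the wide frame, each series would need its own geom_line() call and a manually built legend.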


patchwork

The patchwork package aims to make combining separate charts into one image as simple as possible. It solves the same problem as the gridExtra and cowplot packages but uses an API that scales to arbitrarily complex layouts.

Let’s start by making some sample plots:

library (patchwork)

p1 <- ggplot (mtcars) + 
  geom_point (aes(mpg, disp)) + 
  ggtitle ('Plot 1')

p2 <- ggplot (mtcars) + 
  geom_boxplot (aes(gear, disp, group = gear)) + 
  ggtitle ('Plot 2')

p3 <- ggplot (mtcars) + 
  geom_point (aes(hp, wt, colour = mpg)) + 
  ggtitle ('Plot 3')

p4 <- ggplot (mtcars) + 
  geom_bar (aes(gear)) + 
  facet_wrap (~cyl) + 
  ggtitle ('Plot 4')


+ (add) operator

The most straightforward use of the patchwork package is the + operator, which adds two images together.

p1 + p2

When we add images in this way, the last added image has the active status, and all subsequent functions refer to it.

p1 + p2 + labs (subtitle = 'This will appear in the last plot')
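
Because + only affects the active (last) plot, changing every plot at once needs a different operator. The sketch below uses the & operator from patchwork for that purpose; the names pa, pb, and combined are ours, chosen so that the plots defined earlier are not overwritten.

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars) + geom_point(aes(mpg, disp))
pb <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))

# `+` would change only pb; `&` applies the theme to every plot in the patchwork
combined <- (pa + pb) & theme_bw()
combined
```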

By default, patchwork tries to keep the grid square and fills it by row.

p1 + p2 + p3 + p4

This behavior can be changed using the plot_layout function.

p1 + p2 + p3 + p4 + plot_layout (nrow = 3, byrow = FALSE)

p1 + p2 + p3 + p4 + plot_layout (widths = c(2, 1), heights = unit(c(5, 1), c('cm', 'null')))
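
For layouts that are hard to describe with row and column counts, plot_layout also accepts a textual design. The following is a minimal sketch under the assumption that three plots should share a 2 x 3 grid; the names pa, pb, pc, design, and laid_out are illustrative only.

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars) + geom_point(aes(mpg, disp))
pb <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
pc <- ggplot(mtcars) + geom_point(aes(hp, wt, colour = mpg))

# Each letter marks the area of one plot, in the order the plots are added;
# a "#" would leave a cell empty
design <- "
AAB
CCB
"

laid_out <- pa + pb + pc + plot_layout(design = design)
laid_out
```

Here the first plot spans two columns of the top row, the third plot spans two columns of the bottom row, and the second plot fills the right-hand column.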


/ (over) and | (beside) operators

Of course, filling a square grid is not the only layout the package offers. Another option is to stack the charts on top of each other using the / operator or to place them side by side using |.

p1 / p2

p1 | p4

The difference between + and | is that the + operator will always fill a square grid with graphs, while | will always place graphs side by side.

p1 | p2 | p3 | p4

It is possible to nest layouts using parentheses.

p1 | (p2 / p3)

If we want to add empty space to our composition, we can use the plot_spacer function.

p1 + plot_spacer() + p2 + plot_spacer() + p3 + plot_spacer()


Plot annotation

Sometimes, it is necessary to add a title to your chart composition. This can be done using the plot_annotation() function.

(p1 | (p2 / p3)) + plot_annotation (title = 'The surprising story about mtcars')
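
Besides a title, plot_annotation can also label the individual panels, which is handy when a figure caption refers to them. A short sketch, assuming letter tags are wanted; the plot names and the annotation texts are made up for the illustration.

```r
library(ggplot2)
library(patchwork)

pa <- ggplot(mtcars) + geom_point(aes(mpg, disp))
pb <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))

tagged <- (pa | pb) +
  plot_annotation(
    title      = "Two views of mtcars",    # overall title
    caption    = "Source: mtcars dataset", # overall caption
    tag_levels = "A"                       # tags panels as A, B, ...
  )
tagged
```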


Inset

Another important functionality is adding insets to our charts using the inset_element function.

p1 + inset_element (p2, left = 0.6, bottom = 0.6, right = 1, top = 1)


More

More tips on using the patchwork package can be found on the project’s GitHub page.


gganimate

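The gganimate package extends the ggplot2 syntax to include animation descriptions. It does this with a set of new grammar elements that can be added to the plot and that determine how it should change over time:

  - transition_*() defines how the data are spread out and how they relate to themselves across time.
  - view_*() defines how the positional scales change along the animation.
  - shadow_*() defines how data from other points in time are presented at a given point in time.
  - enter_*() / exit_*() define how new data appear and how old data disappear during the animation.
  - ease_aes() defines how the different aesthetics are eased during transitions.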

The following supporting packages are needed for the library to function correctly:

Let’s look at the example below: we make a simple boxplot of fuel economy as a function of the number of cylinders and allow it to cycle through the number of gears available in the cars.

library (gganimate)

ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot() + 
  # Here comes the gganimate code
  transition_states(
    gear,
    transition_length = 2,
    state_length = 1
  ) +
  enter_fade() + 
  exit_shrink() +
  ease_aes('sine-in-out')

Since this is a discrete partition (gear is best treated as a categorical, factor-like variable), we use transition_states and specify the relative lengths of the transitions (transition_length) and of the pauses at each state (state_length). Since not every value of gear occurs with all three values of cyl, some boxes are missing in some states. For this reason, we specify that a box enters the plot with a fade effect (enter_fade) and exits by shrinking to zero (exit_shrink). Finally, we apply sine easing to all aesthetics (in this case, only y changes).
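
Printing the object renders the animation with default settings. To control the number of frames or to save the result, the object can be passed to animate() and anim_save(). The sketch below assumes that the gifski package is installed for GIF output; the frame count, frame rate, file name, and the helper render_gears are arbitrary choices made for the illustration.

```r
library(ggplot2)
library(gganimate)

anim <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  transition_states(gear, transition_length = 2, state_length = 1) +
  enter_fade() +
  exit_shrink() +
  ease_aes("sine-in-out")

# Rendering is the slow step, so it is wrapped in a (hypothetical) helper
render_gears <- function(file = "gears.gif") {
  rendered <- animate(anim, nframes = 60, fps = 10,
                      renderer = gifski_renderer())
  anim_save(file, animation = rendered)
}

# render_gears()  # uncomment to render and write the GIF
```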

We will use data from the gapminder package for the following example.

library (gapminder)

ggplot (gapminder, aes (gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point (alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual (values = country_colors) +
  scale_size (range = c(2, 12)) +
  scale_x_log10 () +
  facet_wrap (~continent) +
  # Here comes the gganimate specific bits
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')

The above example uses transition_time(), which is intended for continuous variables such as year. There is no need to provide relative transition and state lengths for this type of transition because the variable itself dictates them (the transition from 1980 to 1990 should take twice as long as the transition from 1990 to 1995). The example also shows how to tie the chart title (or any other label) to the time variable through the {frame_time} placeholder.

Examples come from https://gganimate.com/.
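
The remark about transition_time() and proportional durations can be checked with simple arithmetic. The following sketch is ours, not taken from the gganimate site; it assumes that frames are divided in proportion to elapsed time, as described above, and uses an arbitrary total of 90 frames.

```r
years   <- c(1980, 1990, 1995)
nframes <- 90                          # hypothetical total frame count

gaps   <- diff(years)                  # 10 and 5 years
frames <- nframes * gaps / sum(gaps)   # frames per transition
frames
```

The first transition receives 60 frames and the second 30, that is, twice as long for twice the time span.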


ggpubr

The creators of the ggpubr package point out that the default ggplot2 plots fall short of the standards expected in scientific publications and take considerable effort to polish. They therefore created a set of functions that produce publication-ready charts with less code than plain ggplot2. The table below lists some of these functions.

Function      What it draws
ggscatter     Scatter plot
gghistogram   Histogram
ggdensity     Kernel density estimator
ggboxplot     Boxplot
ggviolin      Violin plot
ggbarplot     Barplot
ggdotchart    Lollipop plot

Moreover, the functions from the ggpubr package depart from the layer-adding scheme known from ggplot2, which makes them more similar to the base graphics package. This makes them easier to use than classic ggplot2 and friendlier to people without programming experience. Let’s compare both packages using an ordinary scatter plot as an example.


Scatter plot

library (ggplot2)
library (ggpubr)

# ggplot2
p1 <- ggplot (mtcars) + geom_point( aes (x=mpg, y=disp)) + ggtitle("ggplot2")

# ggpubr
p2 <- ggscatter (mtcars, x="mpg", y="disp", title = "ggpubr")

p1 + p2

The ggscatter function comes with an impressive set of arguments. What cannot be set inside it can be set in the ggpar() function, allowing you to change further graphic parameters.

# ggplot2
p1 <- p1 + scale_y_log10()

# ggpubr
p2 <- ggpar (p2, yscale = "log10")

p1 + p2
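
To give an impression of what the arguments of ggscatter can do, the sketch below adds a regression line with a confidence band and prints the correlation coefficient, then renames the axes with ggpar. The choice of options and the axis labels are ours; the object name sp is illustrative.

```r
library(ggpubr)

sp <- ggscatter(mtcars, x = "mpg", y = "disp",
                add = "reg.line",        # fitted regression line
                conf.int = TRUE,         # confidence band around the line
                cor.coef = TRUE,         # print the correlation coefficient
                cor.method = "pearson")

# Parameters not covered by ggscatter itself can still be set with ggpar()
sp <- ggpar(sp, xlab = "Miles per gallon", ylab = "Displacement (cu. in.)")
sp
```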

However, please note that the objects returned by functions from the ggpubr package are still of class ggplot and therefore you can add additional layers to them.

p2 + scale_x_reverse()


Histogram and kernel density estimator

The gghistogram function creates histograms based on unaggregated data placed in a data frame.

set.seed(1234)
wdata <- data.frame (sex = factor (rep (c ("F", "M"), each=200)),
                     weight = c (rnorm (200, 55), rnorm (200, 58)))

gghistogram (wdata, x = "weight", color = "sex")

As you can see above, gghistogram overlays the histograms by default (one behind another), whereas the classic geom_histogram stacks them (its position parameter defaults to stack, which can be changed manually to identity). Two useful parameters of gghistogram are add (draws the mean or median line) and rug (draws a tick mark for each observation under the histogram). You can set the desired colors manually with the palette parameter.

gghistogram (wdata, x = "weight",
             add = "mean",
             rug = TRUE,
             color = "sex",
             fill = "sex",
             palette = c("#00AFBB", "#E7B800"))
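
The difference between stacked and overlaid histograms described earlier can also be reproduced in plain ggplot2. A minimal sketch, assuming the same simulated data as above; the names wd, h_stack, and h_ident, the bin count, and the transparency level are arbitrary.

```r
library(ggplot2)

set.seed(1234)
wd <- data.frame(sex = factor(rep(c("F", "M"), each = 200)),
                 weight = c(rnorm(200, 55), rnorm(200, 58)))

# Default: bars of the two groups are stacked on top of each other
h_stack <- ggplot(wd, aes(weight, fill = sex)) +
  geom_histogram(bins = 30)

# position = "identity": both histograms start at zero and overlap,
# so some transparency is needed to see the one behind
h_ident <- ggplot(wd, aes(weight, fill = sex)) +
  geom_histogram(bins = 30, position = "identity", alpha = 0.5)

h_stack
h_ident
```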

The ggdensity function is analogous to gghistogram but draws a kernel density estimate instead.

ggdensity(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#00AFBB", "#E7B800"))


Box and violin plots

Another class of plots used in graphical data analysis comprises box and violin plots. A box plot shows basic statistics of the sample: median, quartiles, and outliers.

data ("ToothGrowth")
head (ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

ggboxplot (ToothGrowth, x = "dose", y = "len", color = "dose")

We can easily improve the plot by adding points with shapes depending on the series.

p <- ggboxplot (ToothGrowth, x = "dose", y = "len", color = "dose", 
                add = "jitter", shape = "dose",
                palette = c ("#00AFBB", "#E7B800", "#FC4E07"))
p

When using box plots, information about the (in)equality of group means is often reported along with the p-value of an appropriate statistical test (usually ANOVA or Kruskal-Wallis). With the stat_compare_means function, we can place this information directly in the figure.

my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
p + stat_compare_means (label.y = 50) + stat_compare_means (comparisons = my_comparisons)           
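
The p-values drawn on the figure can also be obtained as a table with the compare_means function from ggpubr, which is useful for checking what exactly is being tested. A short sketch; the object names pairwise and overall are ours.

```r
library(ggpubr)

data("ToothGrowth")

# Pairwise comparisons between dose levels (Wilcoxon test by default)
pairwise <- compare_means(len ~ dose, data = ToothGrowth)
pairwise

# One global test across all three dose levels
overall <- compare_means(len ~ dose, data = ToothGrowth, method = "anova")
overall
```

The pairwise table has one row per pair of dose levels, matching the three brackets defined in my_comparisons.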

Violin plots usually contain more information than box plots and are especially useful when the distribution being examined is not unimodal.

ggviolin (ToothGrowth, x = "dose", y = "len", fill = "dose",
          add = "median_iqr", palette = c("#00AFBB", "#E7B800", "#FC4E07"))

It is not uncommon to place a box plot inside a violin plot.

ggviolin (ToothGrowth, x = "dose", y = "len", fill = "dose",
          add = "boxplot", add.params = list(fill = "white"),
          palette = c("#00AFBB", "#E7B800", "#FC4E07"))


Bar and lollipop plots

Bar plots are also a popular way to present data. They are helpful when we want to show the relationship between a numerical and a categorical variable. An example would be fuel economy (mpg) by car model. The number of engine cylinders may serve as an additional grouping factor.

mtcars$cyl <- as.factor (mtcars$cyl)
mtcars$name <- rownames (mtcars)

ggbarplot (mtcars, x = "name", y = "mpg",
           fill = "cyl",              
           sort.val = "desc",          
           x.text.angle = 90)          

We can easily switch off sorting within cylinder groups (sort.by.groups = FALSE) and give the bars white outlines, which are invisible against a white background.

ggbarplot (mtcars, x = "name", y = "mpg",
           fill = "cyl",               
           color = "white",            
           palette = "jco",            
           sort.val = "desc",          
           sort.by.groups = FALSE,    
           x.text.angle = 90 )         

Bar charts are often presented with horizontal bars. Changing one parameter (rotate = TRUE) is enough to rotate the chart by 90 degrees, but for a better result it is also worth checking the orientation of the axis labels.

ggbarplot (mtcars, x = "name", y = "mpg",
           fill = "cyl",              
           color = "white",           
           palette = "jco",            
           sort.val = "desc",         
           sort.by.groups = FALSE,     
           rotate = TRUE,               
           ggtheme = theme_minimal()) 

A distinctive variant of the bar plot is the lollipop chart. It is used when a categorical variable takes on many values and the bars would therefore be very narrow.

ggdotchart (mtcars, x = "name", y = "mpg",
            color = "cyl",
            palette = "jco", 
            sorting = "ascending",
            add = "segments") 

Let’s try to make this figure even more readable.

ggdotchart (mtcars, x = "name", y = "mpg",
            color = "cyl",                                
            palette = c("#00AFBB", "#E7B800", "#FC4E07"), 
            sorting = "descending",                       
            add = "segments",                             
            rotate = TRUE,                        
            group = "cyl",                         
            dot.size = 6,                         
            label = round (mtcars$mpg),           
            font.label = list (color = "white", 
                               size = 9, 
                               vjust = 0.5))


More

More tips on using the ggpubr package can be found at https://rpkgs.datanovia.com/ggpubr/.


ggiraph

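ggiraph is a package for creating interactive charts. Interactivity is added to geometries, legends, and theme elements through the following aesthetics:

  - tooltip: text displayed when the mouse hovers over an element.
  - onclick: JavaScript code executed when an element is clicked.
  - data_id: an identifier associated with an element (used for hover and click actions).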

The package’s functionalities are particularly applicable to Shiny applications. You can make individual points in the chart clickable and available as reactive values. Using ggiraph in R comes down to three steps:

  1. Use a geometry with the _interactive suffix instead of the standard one, e.g. geom_point_interactive instead of geom_point.
  2. Use at least one of the aesthetics: tooltip, onclick, data_id.
  3. Call the girafe function, passing the interactive plot as an argument.

library (ggiraph)

gg_point <- ggplot(mtcars) + 
  geom_point_interactive (aes (x = hp,
                               y = qsec,
                               color = mpg,
                               size = mpg,
                               tooltip = name,
                               data_id = name)) +
  theme_bw()

girafe(ggobj = gg_point)
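
The Shiny use case mentioned above can be sketched as follows. This is a minimal illustration rather than a complete application: it assumes the shiny package is installed, and the output ids plot and picked as well as the data frame cars are names chosen for the example. The selected data_id values are read from input$plot_selected, following ggiraph's convention of appending _selected to the output id.

```r
library(shiny)
library(ggplot2)
library(ggiraph)

cars <- mtcars
cars$name <- rownames(cars)

ui <- fluidPage(
  girafeOutput("plot"),          # interactive chart
  verbatimTextOutput("picked")   # shows what the user clicked
)

server <- function(input, output, session) {
  output$plot <- renderGirafe({
    gg <- ggplot(cars) +
      geom_point_interactive(aes(x = hp, y = qsec,
                                 tooltip = name, data_id = name))
    girafe(ggobj = gg)
  })

  # Selected data_id values arrive as a reactive input
  output$picked <- renderPrint(input$plot_selected)
}

# shinyApp(ui, server)  # uncomment to launch the app
```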


esquisse

The last extension presented is esquisse, which differs significantly from the previously mentioned packages. First of all, it is an application written in Shiny and can be run both from the console and RStudio (Menu Tools -> Addins -> Browse Addins… -> esquisse). After starting, select the data set you want to work with. Alternatively, you can invoke the application in the console with the appropriate command and immediately indicate the data frame.

library (esquisse)
esquisser (mtcars)
View of the application window after running with the mtcars data set.


esquisse is used to build plots interactively. It allows for very fast data exploration without writing a line of code. What’s more, it generates the corresponding code for later use. It also allows you to export the resulting plot directly to a graphics file.

Scatterplot of quarter-mile time versus horsepower.


We set the aesthetics by dragging tiles with variable names onto the corresponding fields. The tile’s color indicates the variable type: blue for a numeric variable, orange for an enumerated (factor) variable, gray for a character string. The geometry of the chart adapts automatically to the aesthetics, but it can be changed using the button on the left above the chart. Below the plot are buttons for configuring graphic options and the data range. The first button on the bottom right opens a window with the code that generates the current image. At the very top of the window, on the navy blue bar, there are two essential buttons: the first imports data, and the second previews the data table.