The ggplot2 package offers many possibilities for data visualization
but requires some practice. Experienced ggplot2 users build their own
plotting libraries on top of this package. An overview of extensions can
be found here.
As a reminder, the first step to making a plot in ggplot2 is to place
the data in a data frame. If the data covers many series, it is best if
the frame is long, not wide; in other words, it follows the tidy data
paradigm. It is then possible to make the entire chart with one call to
a geom_* function instead of calling that function separately for each
series.
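library(ggplot2)

# Scatter plot of mtcars: point size encodes miles per gallon
data("mtcars")
g1 <- ggplot(mtcars)
g1 + geom_point(aes(x = hp, y = qsec, size = mpg)) + theme_bw()

# Scatter plot of iris: point color encodes species
data("iris")
g2 <- ggplot(iris)
g2 + geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + theme_bw()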
The patchwork package aims to make combining different charts into one
image as simple as possible. It serves the same purpose as the gridExtra
and cowplot packages but uses an API that allows you to create
arbitrarily complex graphics.
Let’s start by making some sample plots:
library(patchwork)

p1 <- ggplot(mtcars) +
  geom_point(aes(mpg, disp)) +
  ggtitle('Plot 1')
p2 <- ggplot(mtcars) +
  geom_boxplot(aes(gear, disp, group = gear)) +
  ggtitle('Plot 2')
p3 <- ggplot(mtcars) +
  geom_point(aes(hp, wt, colour = mpg)) +
  ggtitle('Plot 3')
p4 <- ggplot(mtcars) +
  geom_bar(aes(gear)) +
  facet_wrap(~cyl) +
  ggtitle('Plot 4')
The most straightforward use of the patchwork package is the + operator,
which adds two images together.
When we add images in this way, the last added image has the active status, and all subsequent functions refer to it.
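A minimal sketch of both points, using two throwaway plots of mtcars defined inline so the snippet runs on its own (the names left and right and the subtitle text are illustrative only). The labs() call lands on the plot that was added last:

```r
library(ggplot2)
library(patchwork)

left  <- ggplot(mtcars) + geom_point(aes(mpg, disp)) + ggtitle("Plot 1")
right <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear)) + ggtitle("Plot 2")

# `+` combines the two plots into a single patchwork object
combined <- left + right

# The last plot added (right) is the active one, so the subtitle is attached to it
combined_labelled <- left + right +
  labs(subtitle = "This subtitle belongs to the last plot")
combined_labelled
```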
By default, patchwork tries to keep the grid square and fills it by row.
This behavior can be changed using the plot_layout function.
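As a rough illustration, the sketch below arranges four simple plots first with the default layout and then with plot_layout(). The plot names q1 to q4 and the chosen values nrow = 3 and byrow = FALSE are assumptions made for this example:

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
q2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
q3 <- ggplot(mtcars) + geom_point(aes(hp, wt, colour = mpg))
q4 <- ggplot(mtcars) + geom_bar(aes(gear))

# Default layout: four plots end up in a 2 x 2 grid, filled row by row
default_grid <- q1 + q2 + q3 + q4

# Custom layout: three rows, filled column by column
custom_grid <- q1 + q2 + q3 + q4 + plot_layout(nrow = 3, byrow = FALSE)
custom_grid
```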
Of course, filling a square grid is not the only possibility for the
package. Another option is to stack the charts on top of each other
using the / operator or to place them side by side using |.
The difference between + and | is that the + operator will always fill
a square grid with graphs, while | will always place graphs side by
side.
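The operators can be compared on four small plots, as in the sketch below (q1 to q4 are illustrative names). Four plots are used because that is where the grid produced by + visibly differs from the single row produced by |:

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
q2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
q3 <- ggplot(mtcars) + geom_point(aes(hp, wt, colour = mpg))
q4 <- ggplot(mtcars) + geom_bar(aes(gear))

stacked      <- q1 / q2             # q1 placed above q2
side_by_side <- q1 | q2 | q3 | q4   # all four plots in a single row
gridded      <- q1 + q2 + q3 + q4   # the same four plots wrapped into a grid

side_by_side
```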
It is possible to nest layouts using parentheses.
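For instance, one plot can take the left half while two others are stacked on the right. This is only a sketch with illustrative plot names:

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
q2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
q3 <- ggplot(mtcars) + geom_point(aes(hp, wt, colour = mpg))

# The parenthesised stack (q2 over q3) is treated as one unit next to q1
nested <- q1 | (q2 / q3)
nested
```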
If we want to add empty space to our composition, we can use the
plot_spacer function.
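A small sketch of the idea, with an empty cell placed between two plots (plot names are illustrative):

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
q2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))

# plot_spacer() occupies a cell in the layout but draws nothing
with_gap <- q1 + plot_spacer() + q2
with_gap
```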
Sometimes, it is necessary to add a title to your chart composition.
This can be done using the plot_annotation() function.
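A possible usage is sketched below. The title text and plot names are made up for the example:

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
q2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
q3 <- ggplot(mtcars) + geom_point(aes(hp, wt, colour = mpg))

# The title applies to the whole composition, not to a single plot
titled <- (q1 | (q2 / q3)) +
  plot_annotation(title = "Three views of the mtcars data")
titled
```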
Another important functionality is adding insets to our charts using
the inset_element function.
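As an illustration, the sketch below places a second plot in the top-right corner of the first one. The coordinates are fractions of the main plot area and were chosen arbitrarily for this example:

```r
library(ggplot2)
library(patchwork)

main_plot  <- ggplot(mtcars) + geom_point(aes(mpg, disp))
small_plot <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))

# left/bottom/right/top give the position of the inset relative to the main plot
with_inset <- main_plot +
  inset_element(small_plot, left = 0.6, bottom = 0.6, right = 1, top = 1)
with_inset
```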
The gganimate package extends the ggplot2 syntax to include animation
descriptions. It does this using a set of new features/layers that can
be added to the graph and which determine how it should change over
time:

- transition_* specifies how the data should be distributed and how it
  relates to each other over time,
- view_* specifies how positional scales should change during
  animation,
- shadow_* specifies how data from other points in time should be
  presented at a given point in time,
- enter_* / exit_* specifies how new data should appear and how old
  data should disappear during the animation,
- ease_aes specifies how various aesthetics should be smoothed during
  transitions.

The following supporting packages are needed for the library to function correctly:

- gifski to save animations in GIF format (requires Rust compiler -
  cargo),
- av to create animated video files.

Let’s look at the example below: we make a simple boxplot of fuel economy as a function of the number of cylinders and allow it to cycle through the number of gears available in the cars.
library(gganimate)

ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  # Here comes the gganimate code
  transition_states(
    gear,
    transition_length = 2,
    state_length = 1
  ) +
  enter_fade() +
  exit_shrink() +
  ease_aes('sine-in-out')
Since this is a discrete partition (gear is best described as a factor
(enumerated) variable), we use transition_states and specify the
relative lengths of the transitions (transition_length) and of the
states (state_length). Since not every value of gear occurs together
with all three values of cyl, some states will be missing (no box on the
chart). For this reason, we define the box as one that appears on the
graph with a fade effect and disappears by shrinking to zero. Finally,
we decide to use sine smoothing for all aesthetics (in this case, only y
changes).
We will use data from the gapminder package for the following example.
library(gapminder)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # Here comes the gganimate specific bits
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')
The above example uses transition_time(), which can be used with
continuous variables such as year. There is no need to provide relative
transition times and states for these types of transitions because the
choice of variable dictates this (the transition from 1980 to 1990
should take twice as long as the transition from 1990 to 1995). You can
also see how you can associate the chart title (or any other labels)
with a time variable.
Examples come from https://gganimate.com/.
The creators of the ggpubr package point out that the plots generated
by default by the ggplot2 package fall short of the standards known
from scientific publications and that bringing them up to those
standards takes considerable effort. So, they created a set of
functions that can be used to create elegant charts more efficiently
than in ggplot2 itself. The table below shows some of these functions.
Function | What it draws |
---|---|
ggscatter | Scatter plot |
gghistogram | Histogram |
ggdensity | Kernel density estimator |
ggboxplot | Boxplot |
ggviolin | Violin plot |
ggbarplot | Barplot |
ggdotchart | Lollipop plot |
Moreover, the functions from the ggpubr package are characterized by a
departure from the layer-adding scheme known from ggplot2, which makes
them more similar to the graphics package. This makes them easier to use
than the classic ggplot2 and more friendly to people without
programming experience. Let’s compare both packages using an ordinary
scatter plot as an example.
library(ggplot2)
library(ggpubr)

# ggplot2
p1 <- ggplot(mtcars) + geom_point(aes(x = mpg, y = disp)) + ggtitle("ggplot2")

# ggpubr
p2 <- ggscatter(mtcars, x = "mpg", y = "disp", title = "ggpubr")

# Side-by-side comparison; relies on patchwork, loaded earlier
p1 + p2
The ggscatter function comes with an impressive set of arguments. What
cannot be set inside it can be set in the ggpar() function, allowing you
to change further graphic parameters.
However, please note that the objects returned by functions from the
ggpubr package are still of class ggplot and therefore you can add
additional layers to them.
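Both points can be sketched as follows. The axis labels, the legend position and the regression line are arbitrary choices made for this illustration, not settings taken from the text:

```r
library(ggplot2)
library(ggpubr)

p <- ggscatter(mtcars, x = "mpg", y = "disp", title = "ggpubr")

# ggpar() adjusts graphical parameters after the plot has been created
p <- ggpar(p, xlab = "Miles per gallon", ylab = "Displacement", legend = "bottom")

# The result is still a ggplot object, so ordinary ggplot2 layers can be added
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```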
The gghistogram function creates histograms based on unaggregated data
placed in a data frame.
set.seed(1234)
wdata <- data.frame(sex = factor(rep(c("F", "M"), each = 200)),
                    weight = c(rnorm(200, 55), rnorm(200, 58)))
gghistogram(wdata, x = "weight", color = "sex")
As you can see above, gghistogram by default draws the histograms of
the groups one behind another (overlapping), unlike the classic
geom_histogram, where the default is one on top of the other (the
position parameter defaults to stack, which can be manually changed to
identity). Interesting parameters of the gghistogram function are add
(you can add the mean or median) and rug (a rug of tick marks marking
the individual observations is shown under the histogram). You can
manually set the desired colors with the palette parameter.
gghistogram(wdata, x = "weight",
            add = "mean",
            rug = TRUE,
            color = "sex",
            fill = "sex",
            palette = c("#00AFBB", "#E7B800"))
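For comparison, the stacking behaviour of plain geom_histogram mentioned earlier can be reproduced with the sketch below. It rebuilds the same simulated data so that it runs on its own. The bin count and transparency are arbitrary choices for this illustration:

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# ggplot2 default (position = "stack"): bars of the two groups are stacked
h_stack <- ggplot(wdata, aes(weight, fill = sex)) +
  geom_histogram(bins = 30)

# position = "identity": the groups overlap, as in gghistogram
h_identity <- ggplot(wdata, aes(weight, fill = sex)) +
  geom_histogram(bins = 30, position = "identity", alpha = 0.5)

h_identity
```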
An analogous function to gghistogram is ggdensity, which represents a
kernel density estimator.
ggdensity(wdata, x = "weight",
          add = "mean", rug = TRUE,
          color = "sex", fill = "sex",
          palette = c("#00AFBB", "#E7B800"))
Another class of graphical data analysis is box and violin plots. The former shows some basic statistics about the sample: median, quartiles, and outliers.
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
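The rows above are the first six observations of the built-in ToothGrowth data set. A plain starting point might look like the sketch below. The exact call used for the basic plot is an assumption, since only the data preview is shown here:

```r
library(ggpubr)

data("ToothGrowth")
head(ToothGrowth)  # first six rows of the data set

# A basic box plot of tooth length by dose, without any embellishments
p0 <- ggboxplot(ToothGrowth, x = "dose", y = "len")
p0
```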
We can easily improve the plot by adding points with shapes depending on the series.
p <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose",
               add = "jitter", shape = "dose",
               palette = c("#00AFBB", "#E7B800", "#FC4E07"))
p
When using box plots, information about the (in)equality of means is
often provided along with the p-value of the appropriate statistical
test (usually ANOVA or Kruskal-Wallis). Thanks to the
stat_compare_means function we can place this information directly in
the figure.
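# Pairs of dose levels to compare
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
# Global p-value at y = 50, plus p-values for each pairwise comparison
p + stat_compare_means(label.y = 50) + stat_compare_means(comparisons = my_comparisons)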
Violin plots usually contain more information than box plots and are especially useful when the distribution being examined is not unimodal.
ggviolin(ToothGrowth, x = "dose", y = "len", fill = "dose",
         add = "median_iqr", palette = c("#00AFBB", "#E7B800", "#FC4E07"))
It is not uncommon to place a box plot inside a violin plot.
ggviolin(ToothGrowth, x = "dose", y = "len", fill = "dose",
         add = "boxplot", add.params = list(fill = "white"),
         palette = c("#00AFBB", "#E7B800", "#FC4E07"))
Bar plots are also a popular way to present data. They are helpful when we want to show the relationship between a numerical and categorical variable. An example would be the dependence of fuel consumption on the car brand. An additional grouping factor may be the number of engine cylinders.
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$name <- rownames(mtcars)
ggbarplot(mtcars, x = "name", y = "mpg",
          fill = "cyl",
          sort.val = "desc",
          x.text.angle = 90)
We can easily remove the sorting of bars according to the number of cylinders and make the frames of the bars invisible on a white background.
ggbarplot(mtcars, x = "name", y = "mpg",
          fill = "cyl",
          color = "white",
          palette = "jco",
          sort.val = "desc",
          sort.by.groups = FALSE,
          x.text.angle = 90)
Bar charts are often presented rotated, with horizontal bars and the categories listed vertically. It is enough to change one parameter to rotate the chart by 90 degrees, but for a better effect, it is also worth ensuring the correct orientation of the labels on the axes.
ggbarplot(mtcars, x = "name", y = "mpg",
          fill = "cyl",
          color = "white",
          palette = "jco",
          sort.val = "desc",
          sort.by.groups = FALSE,
          rotate = TRUE,
          ggtheme = theme_minimal())
A rather unusual type of bar plot is the lollipop chart. It is used in situations where a categorical variable takes on a lot of values, and therefore, the bars would be very narrow.
ggdotchart(mtcars, x = "name", y = "mpg",
           color = "cyl",
           palette = "jco",
           sorting = "ascending",
           add = "segments")
Let’s try to make this figure even more readable.
ggdotchart(mtcars, x = "name", y = "mpg",
           color = "cyl",
           palette = c("#00AFBB", "#E7B800", "#FC4E07"),
           sorting = "descending",
           add = "segments",
           rotate = TRUE,
           group = "cyl",
           dot.size = 6,
           label = round(mtcars$mpg),
           font.label = list(color = "white",
                             size = 9,
                             vjust = 0.5))
ggiraph is a package that allows you to create interactive charts.
Interactivity has been added to geometry, legends, and appearance
elements through the following aesthetics:

- tooltip - the tooltip will appear when you hover the mouse over the
  element,
- onclick - when you click on an element, a JavaScript function will be
  launched,
- data_id - a data-related identifier used in other aesthetics.

The package’s functionalities are particularly applicable to Shiny
applications. You can make individual points in the chart clickable and
available as reactive values. Using ggiraph in R comes down to three
steps:
1. Use an interactive geometry instead of the standard one, e.g.
   geom_point_interactive instead of geom_point.
2. Provide at least one of the interactive aesthetics: tooltip, onclick,
   data_id.
3. Call the girafe function, providing an interactive graph as an
   argument.

The last extension presented is esquisse, which differs significantly
from the previously mentioned packages. First of all, it is an
application written in Shiny and can be run both from the console and
RStudio (Menu Tools -> Addins -> Browse Addins… -> esquisse). After
starting, select the data set you want to work with. Alternatively, you
can invoke the application in the console with the appropriate command
and immediately indicate the data frame.
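A console call might look like the sketch below. It assumes the esquisse package is installed and uses mtcars purely as an example data frame. The guard keeps the Shiny app from being launched in non-interactive runs:

```r
library(esquisse)

# Launching the app only makes sense in an interactive R session
if (interactive()) {
  esquisser(mtcars)  # open esquisse with the mtcars data frame preloaded
}
```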
esquisse is used to create plots manually. It allows for high-speed
data exploration without writing a line of code. What’s more, it
generates the corresponding code for later use. It also allows you to
export the resulting plot directly to a graphic file.
We determine the aesthetics by dragging tiles with variable names to the fields and marking the appropriate connections. The tile’s color indicates the variable type: blue - numeric variable, orange - enumerated, gray - string. The geometry of the chart automatically adapts to the aesthetics, but it can be changed using the button on the left above the chart. Below the plot are buttons allowing you to configure graphic options and data range. The first button on the bottom right opens a window with the code that generates a given image. At the very top of the window, on the navy blue bar, there are two essential buttons. The first is for import, and the second is for previewing the data table.