DATA ANALYSIS AND VISUALIZATION IN R, WINTER 2025 EDITION



Basics of ggplot2

The ggplot2 package allows the creation of elegant graphics and plots. It is particularly useful for analyzing multidimensional and multi-series data and as an aid in data mining.

Two basic features of the package:

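Every plot starts with a call to the ggplot() function. Its two most common arguments are data (the data frame to be plotted) and mapping (the aesthetic mapping, created with aes(), that links data columns to visual properties such as the x and y positions).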

However, calling ggplot() on its own, even with a dataset and a mapping specified, only draws an empty panel with labelled axes; no data are shown.

library (ggplot2)
df1 <- data.frame (a = 1:10, b = (1:10) + runif (10,-2,2), c = rnorm (10, mean=1:10))
df1
##     a            b        c
## 1   1 -0.537723668 2.534975
## 2   2  0.007775793 3.343147
## 3   3  4.733468513 1.905554
## 4   4  3.684972042 2.584301
## 5   5  5.754120714 3.507788
## 6   6  4.306885478 6.228185
## 7   7  5.813534214 6.057211
## 8   8  6.033869652 6.415369
## 9   9  9.634629773 9.457097
## 10 10 11.214659069 9.123157
ggplot (df1, aes (x = a, y = b))

Only after adding a layer with the + operator (here a scatter plot drawn with the geom_point() geometry) do the data appear on the graph.

ggplot (df1, aes (x = a, y = b)) + geom_point ()

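The ggplot2 package is quite flexible: the same graph can be obtained in several ways. The data and the mapping can be given in ggplot(), in which case all layers inherit them, or they can be given in an individual layer.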

ggplot (df1, aes (x = a, y = b)) + geom_point ()
ggplot (df1) + geom_point (aes (x = a, y = b))
ggplot () + geom_point (data = df1, aes (x = a, y = b))
ggplot (df1, aes (x = a)) + geom_point (aes(y = b))
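To check that two of these specifications really are equivalent, one can compare the data that ggplot2 computes for the first layer of each plot with layer_data(). In this sketch the seeded data frame is a stand-in for df1 (the original uses unseeded random values), and the names p1, p2, d1 and d2 are only illustrative.

```r
library(ggplot2)

# Stand-in for df1, seeded so the result is reproducible
set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))

# Data and mapping given in ggplot() and inherited by the layer ...
p1 <- ggplot(df1, aes(x = a, y = b)) + geom_point()
# ... versus data and mapping given only in the layer itself
p2 <- ggplot() + geom_point(data = df1, aes(x = a, y = b))

# layer_data() returns the data frame used to draw layer i (default i = 1)
d1 <- layer_data(p1)
d2 <- layer_data(p2)
all.equal(d1[, c("x", "y")], d2[, c("x", "y")])
```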

You can also store the base plot in a variable and then add subsequent layers to it; this is recommended because it keeps the code cleaner.

g <- ggplot (df1, aes (x = a, y = b))
g + geom_point ()
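A useful property of this approach is that adding a layer with + returns a new plot object and leaves the stored base plot unchanged, so the same base can be reused for several variants. A minimal sketch, again with a seeded stand-in for df1 and illustrative variable names:

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))

# Base plot: data and mapping only, no layers yet
g <- ggplot(df1, aes(x = a, y = b))

# Each "+" builds a new plot; g itself is not modified
p_points <- g + geom_point()
p_both   <- g + geom_point() + geom_line()

length(g$layers)        # 0
length(p_points$layers) # 1
length(p_both$layers)   # 2
```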

The %+% operator “replaces” the data frame of an existing plot “on the fly” while keeping its mapping, so the new data frame must contain the mapped columns (here a and b).

df2 <- data.frame (a = seq (-2,2,0.1), b = dnorm (seq (-2,2,0.1)))
g %+% df2 + geom_point ()


Visual properties

Graphical parameters of layers

Each type of layer (called a geometry) has a variety of parameters that allow you to customize its appearance. The most commonly used are size, shape, color and fill.

g <- ggplot(df1, aes(x = a))
g + 
  geom_point (aes (y = b), size = 4, shape = 22, color = "red") + 
  geom_point (aes (y = c), size = 3, shape = 21, fill = "blue")
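Whether fill has any effect depends on the shape code: only shapes 21 to 25 have a separate border (color) and interior (fill), while the other codes are drawn in the color alone. The sketch below, which is an illustration rather than part of the original material, draws all codes 0 to 25 with the same color and fill so the difference is visible; scale_shape_identity() makes ggplot2 use the numbers as shape codes directly.

```r
library(ggplot2)

# Shape codes 0-25 laid out on a grid with 6 columns
shapes <- data.frame(shape = 0:25)
shapes$x <- shapes$shape %% 6
shapes$y <- -(shapes$shape %/% 6)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  # Same color and fill for every code: only 21-25 show the blue fill
  geom_point(aes(shape = shape), size = 5, color = "red", fill = "blue") +
  # Print the code under each symbol
  geom_text(aes(label = shape), nudge_y = -0.35) +
  # Use the values as shape codes instead of mapping them to a palette
  scale_shape_identity() +
  theme_void()
p_shapes
```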

A layer with lines instead of points is created with the geom_line() geometry; besides color, its appearance is controlled mainly by the linewidth and linetype parameters (e.g. linetype = 3 draws a dotted line).

g + 
  geom_point (aes (y = b), size = 4, shape = 22, color = "red", fill = "blue") + 
  geom_line (aes (y = c), linewidth = 1.2, linetype = 3, color = "darkgreen")

The geom_smooth() layer adds a fitted trend to a series. With method = "lm" it fits a linear model and, by default, also draws a shaded 95% confidence band around the fitted line.

g +
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black")
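The confidence band can be switched off with se = FALSE or widened with the level argument; passing formula = y ~ x explicitly also silences the message geom_smooth() prints about the formula it uses. A short sketch with a seeded stand-in for df1 (the names p_line and p_band are illustrative), which also checks that the band limits ymin and ymax are only computed when se = TRUE:

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))
g <- ggplot(df1, aes(x = a, y = b))

# Fitted line only, without the confidence band
p_line <- g + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red")

# Wider 99% confidence band instead of the default 95%
p_band <- g + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, level = 0.99, color = "red")

# Columns of the smooth layer (layer 2) in each plot
names(layer_data(p_line, 2))
names(layer_data(p_band, 2))
```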

The alpha parameter sets the opacity of a layer, from 0 (fully transparent) to 1 (fully opaque). It is especially useful for large numbers of overlapping points, because regions where many points overlap then appear darker.

df.norm <- data.frame (x = rnorm (1e4, 0, 1), y = rnorm (1e4, 0, 1))
g.norm <- ggplot (df.norm, aes (x = x, y = y))

# No transparency
g.norm + geom_point (shape = 21)

# 75% transparency (alpha = 0.25)
g.norm + geom_point (shape = 21, alpha = 0.25)

df.norm2 <- data.frame (x = rnorm (1e4,2,1), y = rnorm( 1e4,-2,1))
df.norm3 <- data.frame (x = rnorm (1e4,-2,1), y = rnorm (1e4,-2,1))

# No transparency
g.norm + 
  geom_point (shape=21, fill="blue", colour="blue") + 
  geom_point (data = df.norm2, shape=21, fill = "green", colour = "green") + 
  geom_point (data = df.norm3, shape=21, fill = "red", colour = "red")

# 90% transparency (alpha = 0.1)
g.norm + 
  geom_point (shape=21, fill = "blue", colour = "blue", alpha = 0.1) + 
  geom_point (data = df.norm2, shape = 21, fill = "green", colour = "green", alpha = 0.1) + 
  geom_point (data = df.norm3, shape = 21,fill = "red", colour = "red", alpha = 0.1)
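The three nearly identical layers above can also be drawn as a single layer if the clouds are first stacked into one long data frame with a series label, anticipating the grouping approach described in the Grouping section. In this sketch the data are seeded stand-ins, and df.all, cols and the series names norm1 to norm3 are hypothetical; the rows are drawn in order, so the red cloud still ends up on top.

```r
library(ggplot2)
library(dplyr)

set.seed(1)
df.norm  <- data.frame(x = rnorm(1e4, 0, 1),  y = rnorm(1e4, 0, 1))
df.norm2 <- data.frame(x = rnorm(1e4, 2, 1),  y = rnorm(1e4, -2, 1))
df.norm3 <- data.frame(x = rnorm(1e4, -2, 1), y = rnorm(1e4, -2, 1))

# Stack the three clouds; .id stores the argument names in a "series" column
df.all <- bind_rows(norm1 = df.norm, norm2 = df.norm2, norm3 = df.norm3,
                    .id = "series")

# One color per series, used for both border and interior
cols <- c(norm1 = "blue", norm2 = "green", norm3 = "red")
p_alpha <- ggplot(df.all, aes(x = x, y = y, color = series, fill = series)) +
  geom_point(shape = 21, alpha = 0.1) +
  scale_color_manual(values = cols) +
  scale_fill_manual(values = cols)
p_alpha
```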

Themes

Besides the graphical parameters of individual layers, the non-data elements of the whole plot (axis titles and text, legend, background, grid lines) can be modified with the theme() function.

g + 
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black") +
  theme (axis.title = element_text (size = 14),
         axis.text = element_text (size = 14))
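Each theme element is set with a constructor matching its type: element_text() for text, element_rect() for backgrounds and borders, and element_blank() to remove an element entirely. The sketch below combines a few of them; the data are a seeded stand-in for df1, the title is illustrative, and converting the plot with ggplotGrob() renders it without opening a plot window.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))

p_theme <- ggplot(df1, aes(x = a, y = b)) +
  geom_point(shape = 21, size = 3, fill = "black") +
  labs(title = "Series b") +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5), # centered bold title
    axis.title = element_text(size = 14),
    panel.grid.minor = element_blank(),                        # remove minor grid lines
    panel.background = element_rect(fill = "white", colour = "grey50")
  )

# Render to a gtable (the drawable object) without opening a device
gt <- ggplotGrob(p_theme)
```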

ggplot2 also provides several complete predefined themes, e.g. theme_bw(), theme_minimal() and theme_classic(), which are a good starting point for neat figures.

g + 
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black") +
  theme_bw ()

g + 
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black") +
  theme_minimal ()

g + 
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black") +
  theme_classic ()

The predefined themes can be further modified by adding the theme() function to them.

my_theme <- theme_dark() + theme (axis.title = element_text (size = 14),
                                  axis.text = element_text (size = 14))

g + 
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes(y = b), shape = 21, size = 3, fill = "black") +
  my_theme
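Instead of adding a custom theme to every plot, it can be made the default with theme_set(), which returns the previous default so that it can be restored later. A minimal sketch, assuming a seeded stand-in for df1 and the illustrative name old_theme:

```r
library(ggplot2)

# Make a modified theme_bw() the default for all subsequent plots;
# theme_set() returns the previous default theme
old_theme <- theme_set(
  theme_bw(base_size = 14) + theme(panel.grid.minor = element_blank())
)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))
p <- ggplot(df1, aes(x = a, y = b)) + geom_point()  # uses the new default
p

# Restore the previous default theme
theme_set(old_theme)
```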

To access all the data from which a plot is built, pass the plot object stored in a variable to the ggplot_build() function. It returns an object of class ggplot_built: a list whose data element holds one data frame per layer (with the computed and mapped aesthetics), while layout and plot describe the panel layout and the original plot specification.

p <- g +
  geom_smooth (aes (y = b), method = "lm", linewidth = 1.2, color = "red") + 
  geom_point (aes (y = b), shape = 21, size = 3, fill = "black")

ggplot_build(p)
## $data
## $data[[1]]
##            x           y        ymin      ymax        se flipped_aes PANEL
## 1   1.000000  0.03007699 -1.94364548  2.003799 0.8559059       FALSE     1
## 2   1.113924  0.15753375 -1.78081786  2.095885 0.8405673       FALSE     1
## 3   1.227848  0.28499052 -1.61826500  2.188246 0.8253478       FALSE     1
## 4   1.341772  0.41244728 -1.45600239  2.280897 0.8102543       FALSE     1
## 5   1.455696  0.53990405 -1.29404657  2.373855 0.7952937       FALSE     1
## 6   1.569620  0.66736081 -1.13241516  2.467137 0.7804739       FALSE     1
## 7   1.683544  0.79481757 -0.97112702  2.560762 0.7658029       FALSE     1
## 8   1.797468  0.92227434 -0.81020224  2.654751 0.7512894       FALSE     1
## 9   1.911392  1.04973110 -0.64966229  2.749124 0.7369429       FALSE     1
## 10  2.025316  1.17718786 -0.48953009  2.843906 0.7227732       FALSE     1
## 11  2.139241  1.30464463 -0.32983010  2.939119 0.7087909       FALSE     1
## 12  2.253165  1.43210139 -0.17058840  3.034791 0.6950073       FALSE     1
## 13  2.367089  1.55955816 -0.01183279  3.130949 0.6814346       FALSE     1
## 14  2.481013  1.68701492  0.14640708  3.227623 0.6680855       FALSE     1
## 15  2.594937  1.81447168  0.30409969  3.324844 0.6549737       FALSE     1
## 16  2.708861  1.94192845  0.46121152  3.422645 0.6421137       FALSE     1
## 17  2.822785  2.06938521  0.61770696  3.521063 0.6295211       FALSE     1
## 18  2.936709  2.19684197  0.77354830  3.620136 0.6172121       FALSE     1
## 19  3.050633  2.32429874  0.92869562  3.719902 0.6052041       FALSE     1
## 20  3.164557  2.45175550  1.08310679  3.820404 0.5935153       FALSE     1
## 21  3.278481  2.57921227  1.23673748  3.921687 0.5821650       FALSE     1
## 22  3.392405  2.70666903  1.38954115  4.023797 0.5711733       FALSE     1
## 23  3.506329  2.83412579  1.54146915  4.126782 0.5605613       FALSE     1
## 24  3.620253  2.96158256  1.69247084  4.230694 0.5503510       FALSE     1
## 25  3.734177  3.08903932  1.84249371  4.335585 0.5405652       FALSE     1
## 26  3.848101  3.21649608  1.99148367  4.441508 0.5312273       FALSE     1
## 27  3.962025  3.34395285  2.13938534  4.548520 0.5223614       FALSE     1
## 28  4.075949  3.47140961  2.28614238  4.656677 0.5139918       FALSE     1
## 29  4.189873  3.59886638  2.43169802  4.766035 0.5061432       FALSE     1
## 30  4.303797  3.72632314  2.57599555  4.876651 0.4988402       FALSE     1
## 31  4.417722  3.85377990  2.71897896  4.988581 0.4921071       FALSE     1
## 32  4.531646  3.98123667  2.86059361  5.101880 0.4859675       FALSE     1
## 33  4.645570  4.10869343  3.00078705  5.216600 0.4804442       FALSE     1
## 34  4.759494  4.23615019  3.13950974  5.332791 0.4755588       FALSE     1
## 35  4.873418  4.36360696  3.27671595  5.450498 0.4713309       FALSE     1
## 36  4.987342  4.49106372  3.41236456  5.569763 0.4677785       FALSE     1
## 37  5.101266  4.61852048  3.54641986  5.690621 0.4649170       FALSE     1
## 38  5.215190  4.74597725  3.67885231  5.813102 0.4627593       FALSE     1
## 39  5.329114  4.87343401  3.80963913  5.937229 0.4613153       FALSE     1
## 40  5.443038  5.00089078  3.93876483  6.063017 0.4605915       FALSE     1
## 41  5.556962  5.12834754  4.06622160  6.190473 0.4605915       FALSE     1
## 42  5.670886  5.25580430  4.19200942  6.319599 0.4613153       FALSE     1
## 43  5.784810  5.38326107  4.31613613  6.450386 0.4627593       FALSE     1
## 44  5.898734  5.51071783  4.43861721  6.582818 0.4649170       FALSE     1
## 45  6.012658  5.63817459  4.55947543  6.716874 0.4677785       FALSE     1
## 46  6.126582  5.76563136  4.67874035  6.852522 0.4713309       FALSE     1
## 47  6.240506  5.89308812  4.79644766  6.989729 0.4755588       FALSE     1
## 48  6.354430  6.02054489  4.91263850  7.128451 0.4804442       FALSE     1
## 49  6.468354  6.14800165  5.02735860  7.268645 0.4859675       FALSE     1
## 50  6.582278  6.27545841  5.14065747  7.410259 0.4921071       FALSE     1
## 51  6.696203  6.40291518  5.25258759  7.553243 0.4988402       FALSE     1
## 52  6.810127  6.53037194  5.36320359  7.697540 0.5061432       FALSE     1
## 53  6.924051  6.65782870  5.47256147  7.843096 0.5139918       FALSE     1
## 54  7.037975  6.78528547  5.58071796  7.989853 0.5223614       FALSE     1
## 55  7.151899  6.91274223  5.68772982  8.137755 0.5312273       FALSE     1
## 56  7.265823  7.04019900  5.79365338  8.286745 0.5405652       FALSE     1
## 57  7.379747  7.16765576  5.89854404  8.436767 0.5503510       FALSE     1
## 58  7.493671  7.29511252  6.00245589  8.587769 0.5605613       FALSE     1
## 59  7.607595  7.42256929  6.10544141  8.739697 0.5711733       FALSE     1
## 60  7.721519  7.55002605  6.20755126  8.892501 0.5821650       FALSE     1
## 61  7.835443  7.67748281  6.30883410  9.046132 0.5935153       FALSE     1
## 62  7.949367  7.80493958  6.40933646  9.200543 0.6052041       FALSE     1
## 63  8.063291  7.93239634  6.50910267  9.355690 0.6172121       FALSE     1
## 64  8.177215  8.05985311  6.60817486  9.511531 0.6295211       FALSE     1
## 65  8.291139  8.18730987  6.70659294  9.668027 0.6421137       FALSE     1
## 66  8.405063  8.31476663  6.80439464  9.825139 0.6549737       FALSE     1
## 67  8.518987  8.44222340  6.90161556  9.982831 0.6680855       FALSE     1
## 68  8.632911  8.56968016  6.99828921 10.141071 0.6814346       FALSE     1
## 69  8.746835  8.69713692  7.09444714 10.299827 0.6950073       FALSE     1
## 70  8.860759  8.82459369  7.19011896 10.459068 0.7087909       FALSE     1
## 71  8.974684  8.95205045  7.28533250 10.618768 0.7227732       FALSE     1
## 72  9.088608  9.07950722  7.38011383 10.778901 0.7369429       FALSE     1
## 73  9.202532  9.20696398  7.47448741 10.939441 0.7512894       FALSE     1
## 74  9.316456  9.33442074  7.56847615 11.100365 0.7658029       FALSE     1
## 75  9.430380  9.46187751  7.66210153 11.261653 0.7804739       FALSE     1
## 76  9.544304  9.58933427  7.75538366 11.423285 0.7952937       FALSE     1
## 77  9.658228  9.71679103  7.84834136 11.585241 0.8102543       FALSE     1
## 78  9.772152  9.84424780  7.94099228 11.747503 0.8253478       FALSE     1
## 79  9.886076  9.97170456  8.03335295 11.910056 0.8405673       FALSE     1
## 80 10.000000 10.09916133  8.12543885 12.072884 0.8559059       FALSE     1
##    group colour   fill linewidth linetype weight alpha
## 1     -1    red grey60       1.2        1      1   0.4
## 2     -1    red grey60       1.2        1      1   0.4
## 3     -1    red grey60       1.2        1      1   0.4
## 4     -1    red grey60       1.2        1      1   0.4
## 5     -1    red grey60       1.2        1      1   0.4
## 6     -1    red grey60       1.2        1      1   0.4
## 7     -1    red grey60       1.2        1      1   0.4
## 8     -1    red grey60       1.2        1      1   0.4
## 9     -1    red grey60       1.2        1      1   0.4
## 10    -1    red grey60       1.2        1      1   0.4
## 11    -1    red grey60       1.2        1      1   0.4
## 12    -1    red grey60       1.2        1      1   0.4
## 13    -1    red grey60       1.2        1      1   0.4
## 14    -1    red grey60       1.2        1      1   0.4
## 15    -1    red grey60       1.2        1      1   0.4
## 16    -1    red grey60       1.2        1      1   0.4
## 17    -1    red grey60       1.2        1      1   0.4
## 18    -1    red grey60       1.2        1      1   0.4
## 19    -1    red grey60       1.2        1      1   0.4
## 20    -1    red grey60       1.2        1      1   0.4
## 21    -1    red grey60       1.2        1      1   0.4
## 22    -1    red grey60       1.2        1      1   0.4
## 23    -1    red grey60       1.2        1      1   0.4
## 24    -1    red grey60       1.2        1      1   0.4
## 25    -1    red grey60       1.2        1      1   0.4
## 26    -1    red grey60       1.2        1      1   0.4
## 27    -1    red grey60       1.2        1      1   0.4
## 28    -1    red grey60       1.2        1      1   0.4
## 29    -1    red grey60       1.2        1      1   0.4
## 30    -1    red grey60       1.2        1      1   0.4
## 31    -1    red grey60       1.2        1      1   0.4
## 32    -1    red grey60       1.2        1      1   0.4
## 33    -1    red grey60       1.2        1      1   0.4
## 34    -1    red grey60       1.2        1      1   0.4
## 35    -1    red grey60       1.2        1      1   0.4
## 36    -1    red grey60       1.2        1      1   0.4
## 37    -1    red grey60       1.2        1      1   0.4
## 38    -1    red grey60       1.2        1      1   0.4
## 39    -1    red grey60       1.2        1      1   0.4
## 40    -1    red grey60       1.2        1      1   0.4
## 41    -1    red grey60       1.2        1      1   0.4
## 42    -1    red grey60       1.2        1      1   0.4
## 43    -1    red grey60       1.2        1      1   0.4
## 44    -1    red grey60       1.2        1      1   0.4
## 45    -1    red grey60       1.2        1      1   0.4
## 46    -1    red grey60       1.2        1      1   0.4
## 47    -1    red grey60       1.2        1      1   0.4
## 48    -1    red grey60       1.2        1      1   0.4
## 49    -1    red grey60       1.2        1      1   0.4
## 50    -1    red grey60       1.2        1      1   0.4
## 51    -1    red grey60       1.2        1      1   0.4
## 52    -1    red grey60       1.2        1      1   0.4
## 53    -1    red grey60       1.2        1      1   0.4
## 54    -1    red grey60       1.2        1      1   0.4
## 55    -1    red grey60       1.2        1      1   0.4
## 56    -1    red grey60       1.2        1      1   0.4
## 57    -1    red grey60       1.2        1      1   0.4
## 58    -1    red grey60       1.2        1      1   0.4
## 59    -1    red grey60       1.2        1      1   0.4
## 60    -1    red grey60       1.2        1      1   0.4
## 61    -1    red grey60       1.2        1      1   0.4
## 62    -1    red grey60       1.2        1      1   0.4
## 63    -1    red grey60       1.2        1      1   0.4
## 64    -1    red grey60       1.2        1      1   0.4
## 65    -1    red grey60       1.2        1      1   0.4
## 66    -1    red grey60       1.2        1      1   0.4
## 67    -1    red grey60       1.2        1      1   0.4
## 68    -1    red grey60       1.2        1      1   0.4
## 69    -1    red grey60       1.2        1      1   0.4
## 70    -1    red grey60       1.2        1      1   0.4
## 71    -1    red grey60       1.2        1      1   0.4
## 72    -1    red grey60       1.2        1      1   0.4
## 73    -1    red grey60       1.2        1      1   0.4
## 74    -1    red grey60       1.2        1      1   0.4
## 75    -1    red grey60       1.2        1      1   0.4
## 76    -1    red grey60       1.2        1      1   0.4
## 77    -1    red grey60       1.2        1      1   0.4
## 78    -1    red grey60       1.2        1      1   0.4
## 79    -1    red grey60       1.2        1      1   0.4
## 80    -1    red grey60       1.2        1      1   0.4
## 
## $data[[2]]
##               y  x PANEL group shape colour size  fill alpha stroke
## 1  -0.537723668  1     1    -1    21  black    3 black    NA    0.5
## 2   0.007775793  2     1    -1    21  black    3 black    NA    0.5
## 3   4.733468513  3     1    -1    21  black    3 black    NA    0.5
## 4   3.684972042  4     1    -1    21  black    3 black    NA    0.5
## 5   5.754120714  5     1    -1    21  black    3 black    NA    0.5
## 6   4.306885478  6     1    -1    21  black    3 black    NA    0.5
## 7   5.813534214  7     1    -1    21  black    3 black    NA    0.5
## 8   6.033869652  8     1    -1    21  black    3 black    NA    0.5
## 9   9.634629773  9     1    -1    21  black    3 black    NA    0.5
## 10 11.214659069 10     1    -1    21  black    3 black    NA    0.5
## 
## 
## $layout
## <ggproto object: Class Layout, gg>
##     coord: <ggproto object: Class CoordCartesian, Coord, gg>
##         aspect: function
##         backtransform_range: function
##         clip: on
##         default: TRUE
##         distance: function
##         expand: TRUE
##         is_free: function
##         is_linear: function
##         labels: function
##         limits: list
##         modify_scales: function
##         range: function
##         render_axis_h: function
##         render_axis_v: function
##         render_bg: function
##         render_fg: function
##         setup_data: function
##         setup_layout: function
##         setup_panel_guides: function
##         setup_panel_params: function
##         setup_params: function
##         train_panel_guides: function
##         transform: function
##         super:  <ggproto object: Class CoordCartesian, Coord, gg>
##     coord_params: list
##     facet: <ggproto object: Class FacetNull, Facet, gg>
##         compute_layout: function
##         draw_back: function
##         draw_front: function
##         draw_labels: function
##         draw_panels: function
##         finish_data: function
##         init_scales: function
##         map_data: function
##         params: list
##         setup_data: function
##         setup_params: function
##         shrink: TRUE
##         train_scales: function
##         vars: function
##         super:  <ggproto object: Class FacetNull, Facet, gg>
##     facet_params: list
##     finish_data: function
##     get_scales: function
##     layout: data.frame
##     map_position: function
##     panel_params: list
##     panel_scales_x: list
##     panel_scales_y: list
##     render: function
##     render_labels: function
##     reset_scales: function
##     resolve_label: function
##     setup: function
##     setup_panel_guides: function
##     setup_panel_params: function
##     train_position: function
##     super:  <ggproto object: Class Layout, gg>
## 
## $plot

## 
## attr(,"class")
## [1] "ggplot_built"
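The built data can be used, for example, to confirm that the smooth layer is an ordinary least-squares fit: the y column of the first data frame should match the predictions of lm() evaluated at the same x positions. A sketch with a seeded stand-in for df1; fit, pred and smooth_data are illustrative names.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2))

p <- ggplot(df1, aes(x = a, y = b)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point()

# Data of the first layer (the smooth): x, y, ymin, ymax, se, ...
smooth_data <- ggplot_build(p)$data[[1]]

# Refit the same model directly and predict at the same x positions
fit  <- lm(b ~ a, data = df1)
pred <- predict(fit, newdata = data.frame(a = smooth_data$x))

all.equal(unname(pred), smooth_data$y)
```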


Grouping

Although the ggplot2 package can plot data stored in wide tables, it works much better with long tables (tidy data). If the series are stacked one below another instead of side by side, and a character or factor column with the series label is added, ggplot2 recognizes the series automatically and generates a legend; this is called grouping. To enable it, map the label column to an aesthetic such as fill, color or shape inside the aes() function.

library(dplyr)
library (tidyr)

# Pivot df1 from a wide to a long table and add a column n (a random permutation of 1 to 20)
df1 %>%
  pivot_longer (cols = 2:3, names_to = "ser") %>% 
  mutate (n = sample (20)) -> df2
df2
## # A tibble: 20 × 4
##        a ser      value     n
##    <int> <chr>    <dbl> <int>
##  1     1 b     -0.538       8
##  2     1 c      2.53       10
##  3     2 b      0.00778    18
##  4     2 c      3.34       11
##  5     3 b      4.73       17
##  6     3 c      1.91        3
##  7     4 b      3.68        5
##  8     4 c      2.58       16
##  9     5 b      5.75       12
## 10     5 c      3.51        1
## 11     6 b      4.31       14
## 12     6 c      6.23        2
## 13     7 b      5.81        7
## 14     7 c      6.06        6
## 15     8 b      6.03       19
## 16     8 c      6.42       13
## 17     9 b      9.63        9
## 18     9 c      9.46       15
## 19    10 b     11.2        20
## 20    10 c      9.12        4
g2 <- ggplot (df2, aes (x = a, y = value))
g2 + geom_point (aes (fill = ser), size = 3, shape = 21)

g2 + geom_point (aes (shape = ser), size = 3, color = "blue")
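The automatic series recognition can be seen in the built layer data: ggplot2 creates a group column with one value per level of the discrete variable mapped in aes(). A sketch with a seeded stand-in for df1 and the illustrative names p_grp and d:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

set.seed(1)
df1 <- data.frame(a = 1:10, b = (1:10) + runif(10, -2, 2),
                  c = rnorm(10, mean = 1:10))

# Long format: one row per (a, series) pair
df2 <- df1 %>% pivot_longer(cols = 2:3, names_to = "ser")

p_grp <- ggplot(df2, aes(x = a, y = value)) +
  geom_point(aes(fill = ser), size = 3, shape = 21)

# One group per level of ser, 10 points in each
d <- layer_data(p_grp)
table(d$group)
```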

Mapping a variable to size is also useful, for example to show how many observations each point represents.

g2 + geom_point (aes (fill = ser, size = n), shape = 21)
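By default, scale_size() maps the values onto a size range (1 to 6), so even the smallest count gets a visible point and the areas are not proportional to the counts. When the point area should be proportional to n, scale_size_area() maps 0 to size 0 and the largest value to max_size. A sketch with hypothetical long-format data standing in for df2:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for df2 with an observation count n per point
df2 <- data.frame(a = rep(1:10, each = 2),
                  ser = rep(c("b", "c"), times = 10),
                  value = rep(1:10, each = 2) + rnorm(20),
                  n = sample(20))

p_size <- ggplot(df2, aes(x = a, y = value)) +
  geom_point(aes(fill = ser, size = n), shape = 21) +
  # Point area proportional to n; the largest n gets size 8
  scale_size_area(max_size = 8)
p_size
```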

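Grouping also matters for statistical layers such as geom_smooth(). Below, the fill mapping is defined only in geom_point(), so geom_smooth() treats all points as one series and fits a single line. To fit a separate line for each series, map the series column to the group aesthetic inside aes() of geom_smooth().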

# Data generation
df3 <- data.frame (x = c (0:10, 5:15, 10:20), 
                   y = c (0:10 + runif(11,-2,2), 5:15 + runif(11,-3,3), 10:20 + runif(11,-2,2)),
                   ser = rep (c ("R1","R2","R3"), each = 11))

g3 <- ggplot (df3, aes (x = x, y = y))
p <- geom_point (aes (fill = ser), size = 3, shape = 21)

# Without linear fit
g3 + p

# Linear fit without grouping
g3 + p + geom_smooth (method = "lm")

# Linear fit with grouping
g3 + p + geom_smooth (aes (group = ser), method = "lm")
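The difference between the two fits shows up in the group column of the smooth layer's data: a single group without the mapping, and one group per series with it. A sketch with seeded stand-in data in place of df3; fit_all and fit_grp are illustrative names.

```r
library(ggplot2)

set.seed(1)
df3 <- data.frame(x = c(0:10, 5:15, 10:20),
                  y = c(0:10 + runif(11, -2, 2), 5:15 + runif(11, -3, 3),
                        10:20 + runif(11, -2, 2)),
                  ser = rep(c("R1", "R2", "R3"), each = 11))

g3 <- ggplot(df3, aes(x = x, y = y))

# No series mapping visible to geom_smooth(): one fit for all points
fit_all <- layer_data(g3 + geom_smooth(method = "lm", formula = y ~ x))
# group = ser: one fit per series
fit_grp <- layer_data(g3 + geom_smooth(aes(group = ser), method = "lm",
                                       formula = y ~ x))

length(unique(fit_all$group))  # 1
length(unique(fit_grp$group))  # 3
```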

Manual selection of colors

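ggplot2 chooses the colors of the individual series automatically, but you can set them yourself with scale_fill_manual() and scale_color_manual(). An unnamed vector of values is assigned to the series in the order of their levels (here R1, R2, R3).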

g3 + p + 
  geom_smooth (aes (color = ser), method = "lm") +
  scale_fill_manual (values = c ("red", "orange", "gold")) +
  scale_color_manual (values = c ("red", "orange", "gold"))
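A named vector of values ties each color to a specific series label, so the assignment does not depend on the order of the levels and stays correct if series are added or reordered. A sketch with seeded stand-in data for df3; cols and p_col are illustrative names.

```r
library(ggplot2)

set.seed(1)
df3 <- data.frame(x = c(0:10, 5:15, 10:20),
                  y = c(0:10 + runif(11, -2, 2), 5:15 + runif(11, -3, 3),
                        10:20 + runif(11, -2, 2)),
                  ser = rep(c("R1", "R2", "R3"), each = 11))

# Colors matched to series by name, not by position
cols <- c(R1 = "red", R2 = "orange", R3 = "gold")

p_col <- ggplot(df3, aes(x = x, y = y)) +
  geom_point(aes(fill = ser), size = 3, shape = 21) +
  geom_smooth(aes(color = ser), method = "lm", formula = y ~ x) +
  scale_fill_manual(values = cols) +
  scale_color_manual(values = cols)
p_col
```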

Labels

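The axis titles, legend titles and plot title can be set with the labs() function. Legend titles are given by the name of the aesthetic (here color and fill); because both receive the same title, ggplot2 merges them into a single legend.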

g3 + p + 
  geom_smooth (aes (color = ser), method = "lm") +
  labs (x = "voltage [V]",
        y = "current [A]",
        color = "resistor",
        fill = "resistor",
        title = "Measurement of current-voltage characteristics")


Histograms

Another important type of plot is the histogram. The geom_histogram() layer displays counts (or densities) as bars, while geom_freqpoly() shows the same binned distribution as lines. Let’s start by generating some random series from normal distributions with two standard deviations (0.5 and 1) and three sample sizes (100, 1000 and 10000).

# Data generation
sizes <- c( 1e2, 1e3, 1e4)
sigmas <- c (0.5, 1)
mu <- 0

x1 <- rnorm (sizes[1], mu, sigmas[1])
x2 <- rnorm (sizes[2], mu, sigmas[1])
x3 <- rnorm (sizes[3], mu, sigmas[1])
x4 <- rnorm (sizes[1], mu, sigmas[2])
x5 <- rnorm (sizes[2], mu, sigmas[2])
x6 <- rnorm (sizes[3], mu, sigmas[2])

df.norm <- data.frame (x = c(x1, x2, x3, x4, x5, x6),
                       size = rep (sizes, sizes) %>% as.factor(),
                       sigma = rep (sigmas, each = sum (sizes)) %>% as.factor())

df.norm %>% filter (size == as.character (sizes[3])) %>% ggplot () -> gh
gh + geom_histogram (aes (x = x, fill = sigma, color = sigma))

gh + geom_freqpoly (aes (x = x, color = sigma))
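By default geom_histogram() uses 30 bins and prints a message suggesting a better choice; the bin width can be set in data units with binwidth (or the number of bins with bins). Every observation falls into exactly one bin, so the counts of all bars add up to the number of observations. A sketch with seeded stand-in data for the filtered df.norm; df.h and p_bw are illustrative names.

```r
library(ggplot2)

set.seed(1)
# Stand-in for the filtered df.norm: 1e4 draws for each sigma
df.h <- data.frame(x = c(rnorm(1e4, 0, 0.5), rnorm(1e4, 0, 1)),
                   sigma = factor(rep(c(0.5, 1), each = 1e4)))

# Bars 0.1 wide in data units instead of the default 30 bins
p_bw <- ggplot(df.h, aes(x = x, fill = sigma, color = sigma)) +
  geom_histogram(binwidth = 0.1)

counts <- layer_data(p_bw)
sum(counts$count)  # 20000
```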

The position parameter controls how several histograms are placed relative to each other. The default, "stack", stacks the bars of different series on top of one another; "fill" also stacks them but normalizes each stack to a total height of one; and "dodge" places the bars side by side.

gh + geom_histogram (aes (x = x, fill = sigma, color = sigma), 
                     position = "dodge")

With position = "identity" all series are drawn from the same baseline, so one series can hide another; adding transparency (alpha) keeps both visible.

gh + geom_histogram (aes (x = x, fill = sigma, color = sigma), 
                     position = "identity", alpha = 0.3)

To show the probability density instead of counts, map y to after_stat(density); after_stat() refers to a variable computed by the layer's statistic, here the density calculated by stat_bin(). To also plot a function over the entire x-axis range (e.g. the probability density of the normal distribution), use stat_function(): pass the function as fun and its additional arguments as a list in args.

gh + 
  geom_histogram (aes (x = x, y = after_stat (density), fill = sigma, colour = sigma),
                  position="identity", alpha = 0.3) +
  stat_function (fun = dnorm, args = list (mean = 0, sd = 0.5))
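The density computed by stat_bin() equals count / (total count × bin width), so the bar areas within each series sum to one, which is what makes the comparison with dnorm() meaningful. The sketch below overlays one reference curve per sigma and checks the areas; the data are a seeded stand-in, and df.h, p_dens and areas are illustrative names. As in the original, the mapping is placed in the histogram layer only, so stat_function() does not inherit after_stat(density).

```r
library(ggplot2)

set.seed(1)
df.h <- data.frame(x = c(rnorm(1e4, 0, 0.5), rnorm(1e4, 0, 1)),
                   sigma = factor(rep(c(0.5, 1), each = 1e4)))

p_dens <- ggplot(df.h) +
  geom_histogram(aes(x = x, y = after_stat(density), fill = sigma),
                 binwidth = 0.1, position = "identity", alpha = 0.3) +
  # One theoretical density per sigma, drawn over the whole x range
  stat_function(fun = dnorm, args = list(mean = 0, sd = 0.5), color = "red") +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "blue")

# Bar area = density * bin width; it sums to 1 within each group
d <- layer_data(p_dens, 1)
areas <- tapply(d$density * (d$xmax - d$xmin), d$group, sum)
areas
```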


Faceting

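When there are many series that can be grouped by one or more variables, it is convenient to split the plot into panels (facets) with the facet_grid() or facet_wrap() functions. In facet_grid(size ~ sigma), the variable to the left of ~ defines the rows of the grid and the variable to the right defines the columns.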

ggplot (df.norm) +
  geom_histogram (aes (x = x, y = after_stat (density), fill = size, colour = size), alpha = 0.6) +
  facet_grid (size ~ sigma)
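facet_wrap() takes a one-sided formula and wraps the panels defined by a single variable into a rectangular layout; scales = "free_y" lets each panel choose its own y range, which helps when the counts differ by orders of magnitude. A sketch with seeded stand-in data (df.f and p_wrap are illustrative names):

```r
library(ggplot2)

set.seed(1)
sizes <- c(1e2, 1e3, 1e4)
df.f <- data.frame(x = rnorm(sum(sizes)),
                   size = factor(rep(sizes, sizes)))

# One panel per sample size, each with its own y-axis range
p_wrap <- ggplot(df.f) +
  geom_histogram(aes(x = x), bins = 30) +
  facet_wrap(~ size, ncol = 3, scales = "free_y")

# Each facet becomes a separate panel in the built plot
length(unique(layer_data(p_wrap)$PANEL))  # 3
```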


Saving plots

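Plots made with the ggplot2 package are saved to a file differently than with the base graphics package, where a device such as pdf() is opened and then closed with dev.off(). Instead, the ggsave() function is used. Its basic argument, filename, is a string with the file name, and the additional parameters include plot (the plot to save, by default the last one displayed), device, width, height and units (the physical size of the image).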

# Saves the most recently displayed plot as a 15 x 10 cm PDF
ggsave ("histograms.pdf", device = "pdf", width = 15, height = 10, units = "cm")
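In scripts it is safer to pass the plot object explicitly with plot = rather than rely on the last displayed plot; when device is omitted, ggsave() picks it from the file extension. A sketch that writes to a temporary directory (p_hist and out_file are illustrative names); for raster formats such as PNG, the dpi argument additionally sets the resolution.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(100))
p_hist <- ggplot(df, aes(x = x)) + geom_histogram(bins = 20)

# The .pdf extension selects the PDF device; the plot is passed explicitly
out_file <- file.path(tempdir(), "histogram.pdf")
ggsave(out_file, plot = p_hist, width = 15, height = 10, units = "cm")

file.exists(out_file)
```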