The standard R command for creating a simple scatter plot is
plot (x,y), where x and y are
numerical vectors of equal length.
The plot() function has a substantial set of options, of which the
most commonly used are:
xlab, ylab, main - axis (and main) titles,
xlim, ylim - axis ranges,
type - plot type: "p" - points (default), "l" - lines, "b" - both points and lines (see ?plot for more information),
pch - point character/symbol (see ?points for possible values),
lty - line type (see ?par for possible values),
col - color of points or lines,
bg - fill (background) color of the symbols (used by pch = 21 to 25),
cex - size of points,
lwd - line thickness,
font - font type: 1 - normal, 2 - bold, 3 - italic, 4 - bold italic.
In addition to the parameters of the plot() function,
there are also global plotting options that can be controlled using
the par() function, for example:
bg - the color to be used for the background of the device region,
mai, mar - the size of the margins (in inches and in lines of text, respectively),
mfrow, mfcol - parameters for plotting multi-panel plots.
For axis titles we can use the expression() function,
which encodes mathematical expressions and Greek letters
(e.g. ^ is a superscript, [..] is a
subscript).
plot (x,
      x^2,
      xlab = "x",
      ylab = expression(f(xi)==x^2),
      main = "The plot of f(x) function",
      col = "red",
      pch = 19,
      font = 2,
      font.lab = 4,
      font.main = 3,
      cex = 2)
To add further series to the panel, use the points() or lines()
function, depending on the type of the next series. Calling plot()
again replaces the existing plot instead of adding another series.
It is worth remembering that the range of the axes is determined
automatically when the plot() function is called and is not
adjusted later when subsequent series are added. In the example below, the
red series is truncated because it falls outside the range of the blue
series.
y <- 2*x+2
z <- 3*x-1
plot (x, y, type = "l", lwd = 2, col = "blue")
lines (x, z, lwd = 2, col = "red")
To save the plot to a graphics file, use one of the commands
png(), jpeg() or tiff(), and
after drawing the plot, close the graphics device with the dev.off()
command.
png ("fig2.png")
plot (x, y, type = "l", lwd = 2, col = "blue")
lines (x, z, lwd = 2, col = "red")
dev.off()
## png
## 2
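As a minimal sketch combining the two previous points, the snippet below computes a common axis range with range() before calling plot(), so that the red series is no longer truncated, and then writes the result to a PNG file. The definition of x, the temporary file name and the width, height and res values are illustrative assumptions, not part of the original example.

```r
# Illustrative data (assumed here so that the snippet is self-contained)
x <- 1:10
y <- 2 * x + 2
z <- 3 * x - 1

# Common y-range covering both series, so that neither is cut off
y_range <- range(c(y, z))

# Write to a temporary file to avoid cluttering the working directory
out_file <- tempfile(fileext = ".png")
png(out_file, width = 800, height = 600, res = 120)
plot(x, y, type = "l", lwd = 2, col = "blue", ylim = y_range)
lines(x, z, lwd = 2, col = "red")
invisible(dev.off())  # close the device; the file is complete only after this call
```

Passing ylim (and, if needed, xlim) in the first plot() call is one simple way to make room for the series added later with lines() or points().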
A legend can be added to the plot with the legend()
function, whose most important parameters are:
x - the coordinate of the left edge of the legend or one of the predefined keywords: top, bottom, left, right, center, topleft, topright, bottomleft, bottomright,
y - the coordinate of the top edge of the legend (NULL by default; it is used in conjunction with the x parameter),
legend - a character vector of names of legend elements,
pch - an integer vector of symbols to display next to the names of legend elements,
lty - an integer vector of types of lines to display next to the names of legend elements,
col - a vector with the (border) colors of symbols,
pt.bg - a vector with the background colors of symbols.
x <- 1:10
plot (x,
      x^2,
      xlab = "x",
      ylab = expression(f(x)),
      col = "red",
      bg = "red",
      pch = 21,
      cex = 2)
points (x,
        x^1.5,
        col = "blue",
        bg = "blue",
        pch = 22,
        cex = 2)
legend ("topleft",
        legend = c (expression(x^1.5),expression(x^2)),
        pch = c (22,21),
        col = c ("blue","red"),
        pt.bg = c("blue","red"))
There is also a dedicated function, curve(), for plotting
a continuous function over a given interval. This function can be used
instead of plot() (it creates a fresh canvas) or to add a further
series to an existing plot when called with add = TRUE.
plot (x,
      x^2,
      xlab = "x",
      ylab = expression(f(x)),
      col = "red",
      bg = "red",
      pch = 21,
      cex = 2)
curve (x^2, from = 0, to = 11, col = "red", add = TRUE)
points (x,
        x^1.5,
        col = "blue",
        bg = "blue",
        pch = 22,
        cex = 2)
curve (x^1.5, from = 0, to = 11, col = "blue", add = TRUE)
The most straightforward function for counting repeated values is
table(), which transforms numeric or character vectors into
factors before counting. It can be used with one or two vectors. In the
latter case, it builds a contingency table of the counts at each
combination of factor levels.
## v
## 0 1 2 3 4
## 2 2 2 2 1
## v
##   0 1.1 1.9   2 2.1   3
##   2   1   1   1   1   1
##    y
## x   1 2 3 5
##   1 0 2 0 0
##   2 1 0 1 0
##   3 0 0 0 1
##   4 0 0 0 1
##   5 0 0 0 1
It is also possible to pass a data frame to the table()
function instead of two separate vectors.
##   x y
## 1 1 2
## 2 1 2
## 3 2 3
## 4 2 1
## 5 3 5
## 6 4 5
## 7 5 5
##    y
## x   1 2 3 5
##   1 0 2 0 0
##   2 1 0 1 0
##   3 0 0 0 1
##   4 0 0 0 1
##   5 0 0 0 1
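The calls that produce tables like the ones above can be sketched as follows. The vectors x and y are read off the data frame printed above; the object names tab_x, tab_xy, df and tab_df are assumptions made for illustration.

```r
# Vectors taken from the data frame printed above
x <- c(1, 1, 2, 2, 3, 4, 5)
y <- c(2, 2, 3, 1, 5, 5, 5)

# One vector: number of occurrences of each distinct value
tab_x <- table(x)
print(tab_x)

# Two vectors: contingency table of the (x, y) combinations
tab_xy <- table(x, y)
print(tab_xy)

# A two-column data frame gives the same contingency table
df <- data.frame(x = x, y = y)
tab_df <- table(df)
print(tab_df)

# Individual counts are looked up by level name (as character strings)
tab_xy["1", "2"]
```

Because the values are converted to factor levels, the table is indexed by the level names ("1", "2", ...) rather than by numeric position.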
Although the table() command is very useful, the generic
function for creating a histogram is hist(). By default, it
plots the histogram as soon as it is called. The essential argument of
the function is a numeric vector. The second most important is
breaks, which can be a vector of breakpoints, a
single number giving the number of bins (treated as a suggestion only),
or a function that returns one of these two.
The hist() function not only plots the histogram,
but also returns a list with all the information needed for further
plotting, such as breaks, counts or
mids.
## $breaks
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
##
## $counts
## [1] 3 5 2 2 1
##
## $density
## [1] 0.23076923 0.38461538 0.15384615 0.15384615 0.07692308
##
## $mids
## [1] 1 2 3 4 5
##
## $xname
## [1] "x"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
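A minimal sketch of how such an object can be obtained and inspected is given below. The data vector is hypothetical: it was chosen only so that the counts agree with the output shown above, since the original data are not shown. The names h and rel_freq are likewise assumptions.

```r
# Hypothetical data, chosen to be consistent with the counts shown above
x <- rep(1:5, times = c(3, 5, 2, 2, 1))

# plot = FALSE computes the histogram object without drawing it
h <- hist(x, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)

h$counts  # number of observations in each bin
h$mids    # midpoints of the bins

# The components can be reused, e.g. to obtain relative frequencies
rel_freq <- h$counts / sum(h$counts)
```

The stored object h can be drawn later by passing it to plot(), which is convenient when the bin counts are needed for computation first.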
One of the base R packages, namely stats,
implements a set of popular probability distributions (normal,
binomial, exponential, etc.; see ?distributions for
details). Each distribution has its own short name (like
norm, unif, binom), which is used
in combination with one of four letters denoting different types of
function:
r - random generation (e.g. runif(5)
generates five random numbers from the uniform distribution),
## [1] 0.9850465 0.9748526 0.1752976 0.9008859 0.2586559
## [1] 5.481890 1.797279 6.756874 8.291918 -8.843545
p - the cumulative distribution function
(e.g. pnorm (2, 0, 1.5) returns the value of the cumulative
distribution function of the normal distribution with
mean = 0 and sd = 1.5 for
x = 2),
d - the probability density function
(e.g. dexp (2, 0.1) gives the value of the probability
density function of the exponential distribution with
rate = 0.1 for x = 2),
x <- seq (-2, 2, 0.1)
sigmas <- c (0.5, 1, 0.3)
plot (x, dnorm(x, 0, sigmas[1]), ylim = c(0,1.5), pch = 19)
points (x, dnorm(x, 0, sigmas[2]), pch = 19, col = "blue")
points (x, dnorm(x, 0, sigmas[3]), pch = 19, col = "green", type = "o")
curve (pnorm(x, 0, sigmas[3]), from = -2, to = 2, col = "red", lwd = 2, add = TRUE)
legend ("topleft",
        legend = c (expression (sigma==0.5),
                    expression (sigma==1.0),
                    expression (sigma==0.3),
                    expression (sigma==0.3)),
        lty = c (0, 0, 1, 1),
        pch = c (19, 19, 19, NA),
        col = c ("black","blue", "green", "red")
)
q - the quantile function
(e.g. qnorm (0.95, 0, 1) returns the quantile of order
p = 0.95 of the normal distribution with
mean = 0 and sd = 1).
## [1] 1.644854
## [1] 3
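The four prefixes can be compared side by side in a short sketch. The seed and the variable names (mu, sigma, r_vals, p_val, d_val, q_val) are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make the random draws reproducible

mu <- 0
sigma <- 1.5

r_vals <- rnorm(5, mean = mu, sd = sigma)      # five random draws
p_val  <- pnorm(2, mean = mu, sd = sigma)      # P(X <= 2)
d_val  <- dnorm(2, mean = mu, sd = sigma)      # density at x = 2
q_val  <- qnorm(p_val, mean = mu, sd = sigma)  # inverts pnorm, so it returns 2

# The same naming scheme applies to the other distributions
dexp(2, rate = 0.1)   # equals 0.1 * exp(-0.2)
qnorm(0.95, 0, 1)     # approximately 1.644854, as in the output above
```

The q and p functions are inverses of each other, which gives a quick consistency check when the parameters of a distribution are passed by position.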
A very useful function in base R is sample().
Calling sample (x) returns a random permutation of a vector
x. Calling sample (x, size) returns size randomly
chosen elements (without replacement) from a vector
x. Sampling with replacement can be done by setting
the parameter replace = TRUE. Finally, setting the
parameter prob allows for sampling elements from a vector
x with unequal probabilities. It is worth noting that the
vector x does not have to be of numeric type.
## [1] 2 1 7 10 9 5 4 8 6 3
## [1] 9 8
## [1] 3 6 4 7
## [1] 5 1 8 5 5 9 1 7 1 1 2 9 2 3 10 7 3 6 7 9
## [1] 2 2 2 2 2 2 1 2 2 3
## [1] "a" "c" "b" "c" "a" "c" "c" "b" "a" "c"
## [1] 0.2 0.1 0.1 0.1 0.2 0.1 0.3 0.3 0.1 0.1
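The variants described above can be sketched as follows. The seed, the input vectors and the probabilities are illustrative assumptions and are not the ones that produced the output shown above.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility

x <- 1:10

perm      <- sample(x)                       # random permutation of x
two       <- sample(x, 2)                    # two distinct elements
with_rep  <- sample(x, 20, replace = TRUE)   # size may exceed length(x)
weighted  <- sample(1:3, 10, replace = TRUE,
                    prob = c(0.1, 0.8, 0.1)) # value 2 is drawn most often
letters_s <- sample(c("a", "b", "c"), 10, replace = TRUE)  # non-numeric input
```

One caveat worth keeping in mind: when x is a single number greater than or equal to 1, sample(x) samples from 1:x rather than from the one-element vector.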
The ecdf() function is a quick and straightforward
method for obtaining the empirical cumulative distribution function,
which is often a first step in identifying the proper distribution. The
object returned by this function has its own plot() method,
which means that it can be plotted directly.
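As a minimal sketch, the object returned by ecdf() is itself a function and can be evaluated at any point. The sample, the seed and the name Fn are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, used only for reproducibility
x <- rnorm(50)

Fn <- ecdf(x)      # Fn is a step function built from the sample

Fn(0)              # fraction of observations less than or equal to 0
Fn(max(x))         # equals 1 at the largest observation
Fn(min(x) - 1)     # equals 0 below the smallest observation
```

Evaluating Fn at a point q gives the same value as mean(x <= q), which is a convenient way to check the result.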
It is interesting to see how the appearance of the empirical
cumulative distribution function changes with the sample size. The function
par() called in the chunk of code below is used for
changing global plotting parameters such as the margins or the division
of the canvas into panels.
make.plot <- function(N) {
  x <- rnorm (N, 0, 1)
  plot (ecdf(x), main = N)
  curve (pnorm(x, 0, 1), from = min(x), to = max(x), col = "red", lwd = 2, add=TRUE)
}
par (mfrow = c(2,2))
N <- c(10, 20, 50, 100)
sapply (N, make.plot)
##   [,1]        [,2]        [,3]        [,4]
## x numeric,101 numeric,101 numeric,101 numeric,101
## y numeric,101 numeric,101 numeric,101 numeric,101