While R’s traditional graphics offer a nice set of plots, some of them require a lot of work. Viewing the same plot for different groups in your data is particularly difficult. The ggplot2 package is extremely flexible, and repeating plots for groups is quite easy. The “gg” in ggplot2 stands for the Grammar of Graphics, a comprehensive theory of graphics that Leland Wilkinson described in his book of the same name. In it, Wilkinson showed how you could describe plots not as discrete types like bar plot or pie chart, but using a “grammar” that would work not only for the plots we commonly use but for almost any conceivable graphic. From this perspective, a pie chart is just a bar chart with a circular (polar) coordinate system replacing the rectangular Cartesian coordinate system. Wilkinson’s book is perhaps the most important one on graphics ever written. However, it is not a light read, and the abstract graphical syntax it presents is meant only to clarify his concepts. It is not a language you can use to recreate his graphs!
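To make the pie chart remark concrete, here is a minimal sketch written with the ggplot2 package. The data frame `toy` and its values are invented purely for illustration; they are not the practice data set used in this article.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  workshop = c("R", "R", "R", "SAS", "SAS", "SPSS", "Stata")
)

# One stacked bar in ordinary Cartesian coordinates
bar <- ggplot(toy, aes(x = factor(1), fill = workshop)) +
  geom_bar(width = 1)

# The same bar with a polar coordinate system: the bar's height
# (the y axis) is mapped to the angle, which yields a pie chart
pie <- bar + coord_polar(theta = "y")

print(bar)
print(pie)
```

Nothing about the data or the bars changes between the two plots; only the coordinate system does.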
The ggplot2 package is a simplified implementation of the grammar of graphics, written by Hadley Wickham for R. It is simplified only in that it relies on R for data transformation and restructuring, rather than implementing those steps in its own syntax. Wickham’s book, ggplot2: Elegant Graphics for Data Analysis, provides a detailed presentation of the ggplot2 package. Here I will review the basic examples presented in my books. The practice data set is shown here. The programs and the data they use are also available for download here.
To make it easy to get started, the ggplot2 package offers two main functions: quickplot() and ggplot(). The quickplot() function, also known as qplot(), mimics R’s traditional plot() function in many ways. It is particularly easy to use for simple plots. Below is an example of the default plots that qplot() makes. The command that created each plot is shown in the title of each graph. Most of them are useful, except for the middle one in the left column, qplot(workshop, gender). A plot of two factors like that simply shows which combinations of the factors exist, which is certainly not worth drawing a graph to discover.
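The default behavior described above can be sketched as follows. The data frame `mydata` here is a small hypothetical stand-in with invented values, not the actual practice data set, and the variable names are assumptions based on the plot titles mentioned in the text. Recent ggplot2 releases mark qplot() as deprecated, so these calls may emit a warning.

```r
library(ggplot2)

# Hypothetical stand-in for the practice data (values are invented)
mydata <- data.frame(
  workshop = factor(c("R", "SAS", "R", "SAS", "R", "SAS", "R", "SAS")),
  gender   = factor(c("Female", "Female", "Female", "Female",
                      "Male", "Male", "Male", "Male")),
  pretest  = c(72, 74, 80, 75, 70, 78, 82, 77),
  posttest = c(80, 78, 92, 81, 79, 84, 90, 85)
)

# qplot() picks a default geom from the types of the variables it is given
p_bar     <- qplot(workshop, data = mydata)            # one factor: bar chart
p_hist    <- qplot(posttest, data = mydata)            # one numeric: histogram
p_scatter <- qplot(pretest, posttest, data = mydata)   # two numerics: scatter plot
p_strip   <- qplot(workshop, posttest, data = mydata)  # factor by numeric: points per level
p_factors <- qplot(workshop, gender, data = mydata)    # two factors: one point per combination

print(p_scatter)
```

The last plot, `p_factors`, is the uninformative case: its points overlap at each workshop and gender combination, so it only reveals which combinations occur.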
While qplot() is easy to use for simple graphs, it does not use the powerful grammar of graphics. The ggplot() function does that. To understand ggplot(), you need to ask yourself: what are the fundamental parts of every data graph? They are:
- Aesthetics – these are the roles that the variables play in each graph. A variable may control where points appear, the color or shape of a point, the height of a bar and so on.
- Geoms – these are the geometric objects. Do you need bars, points, lines?
- Statistics – these are the functions, such as linear regression, that you might need in order to draw a line.
- Scales – these are the legends that show, for example, that circles represent females while triangles represent males.
- Facets – these are the groups in your data. Faceting by gender would cause the graph to repeat for the two genders.
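As a minimal sketch of how these five parts fit together in a single ggplot() call, consider the example below. The data frame `mydata` is a hypothetical stand-in with invented values, and the particular shapes and layout are illustrative choices rather than anything prescribed by the article.

```r
library(ggplot2)

# Hypothetical stand-in for the practice data (values are invented)
mydata <- data.frame(
  gender   = factor(c("Female", "Female", "Female", "Female",
                      "Male", "Male", "Male", "Male")),
  pretest  = c(72, 74, 80, 75, 70, 78, 82, 77),
  posttest = c(80, 78, 92, 81, 79, 84, 90, 85)
)

p <- ggplot(mydata,
            # Aesthetics: the roles the variables play
            aes(x = pretest, y = posttest, shape = gender)) +
  # Geom: draw the observations as points
  geom_point(size = 3) +
  # Statistic: fit a linear regression and draw it as a line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Scale: the legend maps circles (16) and triangles (17) to the genders
  scale_shape_manual(values = c(Female = 16, Male = 17)) +
  # Facets: repeat the graph once for each gender
  facet_grid(. ~ gender)

print(p)
```

Each part is added as its own layer with `+`, so any one of them can be swapped out, for example `geom_line()` for `geom_point()`, without touching the rest.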