**Creating More Effective Graphs**

*by Naomi B. Robbins*

This is a book about how to make effective graphs, *not* how to actually create them in software like R. Some books that teach effective graphing methods, like Tufte’s excellent *The Visual Display of Quantitative Information*, present graphical theories such as “less ink is better” in a graph. Others, like Cleveland’s *The Elements of Graphing Data*, describe experiments into the accuracy of human perception regarding various graphical design choices. Robbins demonstrates many of the same points by simply providing examples that make them clear. The book has many excellent examples, each with a brief description of its strengths and weaknesses. You can read it in just a few hours, so read it again in a few months to see what you’ve forgotten. If you only read one book on how to make effective data graphics, make it this one.

**Data Analysis and Graphics Using R: an Example Based Approach**

*by John Maindonald and W. John Braun*

This book really hits the “sweet spot” for me. The authors explain a lot about R and about data analysis while managing to make it all seem easy (well, easy for data analysis anyway). It covers basic through intermediate R and statistics. The statistics coverage starts at ground zero with descriptive stats and goes through basic group comparisons (chi-squared, t-tests, etc.), regression (linear & logistic), analysis of variance, generalized linear models, survival analysis, time series, multi-level models, trees and even a bit of multivariate (PCA, discriminant). Many of those topics are complex enough to benefit from an entire book, but there is enough to be quite useful on each. I probably learned more about R from this book than any other. Highly recommended!

**Data Manipulation with R**

*by Phil Spector*

Whenever R is doing something I don’t expect when managing data, this is the first book I turn to. Every page is filled with deep insights into how R works. It could go into some topics in more depth, but you could spend hours in R’s help files and with trial and error to discover what this book provides in clear, concise form. At 164 pages it’s a bit thin for a book, but it’s a clear case of quality over quantity.

**Data Mining with Rattle and R**

*by Graham Williams*

Rattle is a tab-oriented user interface that is similar to Microsoft Office’s ribbon interface. It makes getting started with data mining in R very easy. This book covers Rattle, the R code that Rattle creates, and writing some R code from scratch. Therefore it will appeal both to people seeking the ease of use that is very much missing from R and to people looking to learn R programming. The book is very enjoyable reading and is filled with useful information. It is aimed at both students learning data mining and data miners who are using or learning R. People are likely to read it through the first time as a textbook and then later use it as a reference, especially for the details of the R language. One of the strongest aspects of this book is Dr. Williams’ ability to simplify complex topics and explain them clearly. His descriptions of bagging and boosting are the clearest that I have ever read.

**ggplot2: Elegant Graphics for Data Analysis**

*by Hadley Wickham*

While Leland Wilkinson invented the seminal grammar of graphics concept, it’s Hadley Wickham’s ggplot2 package for R that really brought it to life. The “gg” in ggplot2 stands for **g**rammar of **g**raphics. Wickham’s version of the grammar is simpler than Wilkinson’s because he depends upon the data management and transformation capabilities in R.

His software, ggplot2, is free so you can recreate every plot in the book and countless variations of your own. While the book is only 212 pages, if you work your way through all the examples, you will gain a deep understanding of this powerful concept and you will probably be able to create any data graphic that you can think of. Still, knowing how to create a graph is only half the story. Knowing what type of graph communicates your point well is the other half. See *Creating More Effective Graphs* above for that topic.

**The Grammar of Graphics**

*by Leland Wilkinson*

This book outlines the *Grammar of Graphics*, a comprehensive theory of graphics by the author, Leland Wilkinson. In it, Wilkinson shows that plots are not unrelated types of displays, but rather they all share an underlying “grammar”. From this perspective a pie chart and a bar chart are almost identical. The bar chart just uses rectangular Cartesian coordinates, while the pie chart uses the circular polar coordinate system. This is perhaps the most important book on graphics ever written. However, it is not a light read, and it presents an abstract graphical syntax that is meant to clarify his concepts. It is not a language you can use to recreate his graphs. To see how to use these concepts in R, see *ggplot2: Elegant Graphics for Data Analysis* above. If you want to study graphics at its deepest level, *The Grammar of Graphics* is the book for you.

**Introductory Statistics with R**

*by Peter Dalgaard*

This is an excellent book for people just starting out with both statistics and R. It moves at a nice easy pace, explaining statistics from the ground up. You won’t learn that much about R but Dalgaard carefully shows you the best ways to do things, avoiding much of R’s complexity that you’re likely to encounter as you start analyzing real data. For an intro statistics class at the undergraduate level, it can’t be beat.

**Modern Applied Statistics with S**

*by W. N. Venables and B. D. Ripley*

This is widely considered the best statistics book using R for advanced users. R is a variant of the S language, so most of the book’s examples work fine in R. The authors usually show where R differs, but there are a few places they missed, so don’t be surprised if something doesn’t run. This book is written at a fairly high level, so if you are not very mathematically inclined, you might want to try some of the other books described here.

**The R Book**

*by Michael J. Crawley*

This book is a beast, running a staggering 950 pages. I view it as an encyclopedia: not much fun to read but good to have around. Some parts seem aimed at total beginners, others seem aimed at advanced users. The high number of cross-references can drive you crazy. When I need answers, it’s the last book I turn to before sending my question out to the R-Help list.

**R in Action**

*by Robert Kabacoff*

I thoroughly enjoyed reading this book. Kabacoff’s writing is clear and concise. He covers a wide range of statistics and graphics, including topics that many books skip, such as missing values analysis and interactive graphics. One of the best things about R is the huge number of packages available for it. But which ones are worth investigating? This book covers quite a few packages I have never used before, so even experienced R users will make many useful discoveries reading it. I have no doubt this book will become as popular as his web site, Quick-R (statmethods.net).

**SAS and R**

*by Ken Kleinman and Nicholas J. Horton*

*SAS and R* is a well-crafted dictionary of how to do things in both SAS and R. Its 343 pages are about evenly divided between R and SAS. For each topic the authors clearly and concisely show how to perform that task in SAS, then in R. They typically provide a paragraph of description for each. The brevity of explanation allows the authors to cover a wide range of topics. If you need to know more about a topic, at least they have given you a good start and you’ll know what SAS statements or R functions to pursue. Each chapter concludes with example programs and their output, which demonstrate the topics covered. Output for both packages is shown. The book does include brief introductions to both SAS and R in the appendices but, as the authors state in the preface, their book is not meant to be read cover to cover. However, unlike a standard dictionary, the entries are organized by category, so reading several entries in a row is usually helpful.

In comparison, *R for SAS and SPSS Users, 2nd Edition* is a step-by-step introductory text, meant to be read in order. Its 715 pages are devoted almost exclusively to R. I assume you already know SAS or SPSS, and the only discussion of them is used to help you learn R. Rather than a paragraph of explanation per topic, I typically provide several pages, stepping through complete example programs, and pointing out where beginners typically make mistakes. *SAS and R* covered more topics than the first edition of *R for SAS and SPSS Users*, but not more than the second edition.

Someone wanting a brief reference for both SAS and R would prefer *SAS and R*. Someone wanting to focus on learning R would prefer *R for SAS and SPSS Users*. In either case, you’ll probably need additional books devoted to the particular methods of analysis you need.

Another good alternative to Dalgaard is “Discovering Statistics Using R”.
