R is free and powerful software for data analysis and graphics. However, its flexible approach is so different from other software that it can be frustrating to learn. This workshop introduces R in a way that takes advantage of what you already know. For many topics we will begin with R’s built-in commands that offer sparse but flexible output. Then we’ll cover add-on commands that work similarly to your current software. We will also discuss aspects of R that are likely to trip you up. For example, many R functions let you specify which data set to use in a way that looks identical to SAS, but which differs in a way that is likely to lead to perplexing error messages.
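As a rough illustration of the kind of pitfall mentioned above, the sketch below contrasts a function that accepts a `data=` argument with one that does not. The data frame `mydata` and the variables `q1` and `q2` are hypothetical names made up for this example, and this may not be the exact case the workshop uses.

```r
# Hypothetical data frame, for illustration only.
mydata <- data.frame(
  q1 = c(1, 2, 3, 4, 5),
  q2 = c(2, 4, 5, 4, 5)
)

# Modeling functions take a formula plus data=, which looks much like
# DATA= in SAS, and find q1 and q2 inside mydata.
fit <- lm(q2 ~ q1, data = mydata)

# mean() has no data= argument, so R looks for q1 in the workspace
# rather than inside mydata, and the call fails with an error.
result <- tryCatch(
  mean(q1, data = mydata),
  error = function(e) conditionMessage(e)
)
print(result)

# Two forms that do work:
m_dollar <- mean(mydata$q1)
m_with <- with(mydata, mean(q1))
```

The design point is that `data=` is an argument of particular functions, not a general feature of the language, so it only helps where a function's author chose to support it.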
Most of our time will be spent working through examples that you may run simultaneously on your computer. You will see both the instructor’s screen and yours, as we run the examples and discuss the output. However, the handouts include each step and its output, so feel free to skip the computing; it’s easy to just relax and take notes. The slides and programming steps are numbered so you can easily switch from computing to slides and back again.
Most of the examples come from the instructor’s highly regarded books, R for SAS and SPSS Users and R for Stata Users (no knowledge of those languages is required). That makes it easy to review later what we did, with full explanations, or to learn more about a particular subject by extending an example that you have already seen.
This workshop is available in three ways: site visits, webinars, and interactive video.
The on-site version is the most engaging by far, generating much discussion and occasionally veering off briefly to cover topics specific to a particular organization. The instructor presents a topic for around twenty minutes. Then we switch to exercises, which are already open in another tabbed window. The exercises contain hints that show the general structure of the solution; you adapt those hints to get the final solution. The complete solutions are in a third tabbed window, so if you get stuck the answers are a click away. The typical schedule for on-site training is located here.
A webinar version is also available. This approach saves travel expenses and is especially useful for organizations with branch offices. It’s offered as two half-day sessions, often with a day or two skipped in between to give participants a chance to do the exercises and catch up on other work. There is time for questions on the lecture topics (live) and the exercises (via email). However, webinar participants are typically much less engaged, and far less discussion takes place.
The interactive video version is available at DataCamp.com. It uses a similar lecture / exercise combination, but you can stop and restart at any time. This lets you learn at your own pace and minimizes the disruption to your regular work. However, there is no way to ask the instructor questions or to generate discussion among the participants.
For further details or to arrange a webinar or site visit, contact the instructor, Bob Muenchen, at firstname.lastname@example.org.
Despite the title, this workshop requires no knowledge of other software. However, if the audience has expertise in SAS, SPSS, or Stata, the instructor will adapt his presentation to use language they’re most familiar with. Some knowledge of statistics is helpful, but not required. The instructor is well aware that knowledge of statistics fades rapidly when not used!
When finished, participants will be able to use R to import data, transform it, create publication-quality graphics, and perform commonly used statistical analyses, and they will know how to generalize that knowledge to more advanced methods. They will also have an especially thorough understanding of how R compares to SAS, SPSS and Stata.
Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in analytics software and helping people learn the R language. Bob is an ASA Accredited Professional Statistician™ with 35 years of experience and is currently the manager of OIT Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has taught workshops on research computing topics for more than 500 organizations and has offered training in partnership with the American Statistical Association, DataCamp.com, New Horizons Computer Learning Centers, Revolution Analytics, RStudio and Xerox Learning Services. Bob has written or coauthored over 70 articles published in scientific journals and conference proceedings, and has provided guidance on more than 1,000 graduate theses and dissertations.
Bob has served on the advisory boards of SAS Institute, SPSS Inc., StatAce OOD, Intuitics, the Statistical Graphics Corporation and PC Week Magazine (now eWeek). His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analytics, and data mining.
On-site training is best done in a computer lab with a projector and, for large rooms, a PA system. The webinar version is delivered to your computer using Zoom (or a similar webinar system if your organization has a preference).
Course programs, data, and exercises will be sent to you a week before the workshop. The instructions include installing R, which you can download for free here: http://www.r-project.org/. We will also use RStudio, which you can download for free here: http://RStudio.com. If you already know a different R editor, that’s fine too.
(In-depth data management topics are covered in an optional separate workshop that usually follows immediately after this one.)
Introduction and statement of goals
- Overview of R
- Installing and maintaining R
- Getting help
Programming Language Basics – including creating, subsetting and analyzing:
- Vectors (variables)
- Factors (categorical variables)
- Data frames (data sets)
- “Tibbles” (dplyr’s tbl_df data frames)
Managing your files and workspace
- Listing their names
- Examining structure of data sets, etc.
Controlling functions (procedures or commands) using
- Arguments (options or parameters)
- An object’s class
- How to change class
- Model formulas
Data Acquisition – Reading files (includes whichever formats your organization needs)
- Comma separated value files
- Tab-delimited files
- Excel files
- Minitab data sets
- SAS data sets
- SPSS save file
- Stata data sets
Data Transformations using
- Math formulas
- Conditional (logical) formulas
Selecting variables and observations using:
- Dollar format
- The “attach” function
- The “with” function
- Subscripting (a.k.a. indexing)
- dplyr’s select and filter functions
- Model formulas and the “data=” argument
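The selection styles listed above can be sketched side by side as follows. This is a minimal illustration, not workshop material: the data frame `mydata` and its variables are hypothetical, `attach` is left out for brevity, and the dplyr step runs only if that package happens to be installed.

```r
# Hypothetical data frame, for illustration only.
mydata <- data.frame(
  gender = c("f", "m", "f", "m"),
  q1 = c(1, 2, 5, 4),
  q2 = c(3, 4, 5, 2)
)

# Dollar format: data frame name, then $, then variable name.
m1 <- mean(mydata$q1)

# with(): evaluate an expression using the variables inside the data frame.
m2 <- with(mydata, mean(q1))

# Subscripting: [rows, columns]. Here: females only, two variables.
females <- mydata[mydata$gender == "f", c("q1", "q2")]

# Model formula plus the data= argument.
fit <- lm(q2 ~ q1, data = mydata)

# dplyr equivalent of the subscripting step, if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  females2 <- dplyr::select(dplyr::filter(mydata, gender == "f"), q1, q2)
}
```

All of these reach the same variables; they differ in how much typing they need and in which functions accept them.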
Writing functions (macros)
- Why they’re more important in R than most languages
- How to create functions
- How to apply functions to data frames
- Applying functions by group
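A small sketch of the steps above: writing a function, applying it to the variables of a data frame, and applying a function by group. The function name `mystats`, the data frame `mydata` and its variables are all hypothetical, chosen only for this example.

```r
# Hypothetical data frame, for illustration only.
mydata <- data.frame(
  gender = c("f", "m", "f", "m"),
  q1 = c(1, 2, 5, 6),
  q2 = c(3, 4, 5, 2)
)

# A user-written function that returns two named statistics.
mystats <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

# Apply the function to each selected variable of the data frame.
# The result is a matrix with one column per variable.
overall <- sapply(mydata[c("q1", "q2")], mystats)

# Apply a function by group: the mean of q1 for each gender.
by_group <- aggregate(q1 ~ gender, data = mydata, FUN = mean)

print(overall)
print(by_group)
```

Here `sapply` and `aggregate` are base R functions; other approaches, such as dplyr's grouped summaries, would serve equally well.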
Traditional graphics (similar to old SAS and SPSS graphics) including:
- Bar charts
- Scatter plots
- Strip plots
- Box plots
- Repeating above plots by groups
- Adding titles, etc.
- Adding regression lines
- Lattice graphics (similar to new SAS SG* and Stata graphics) – a brief overview
The Grammar of Graphics approach using the ggplot2 package (similar to SPSS GPL) including:
- qplot vs. ggplot
- Bar charts
- Scatter plots
- Strip plots
- Multi-layered plots
- Group plots
- Adding titles, etc.
- Adding regression lines
- Interactive graphics – a brief overview (similar to JMP, SAS/INSIGHT, SAS/IML Studio)
- Graphics resources
Statistics – many are done showing sparse R output and the richer output that most people prefer.
- Descriptive statistics
- Crosstabulation with chi-squared test
- Repeating an analysis by groups or departments (a.k.a. “By” or “split file”)
- Correcting p-values for the effects of multiple testing
- Correlation: Pearson, Spearman
- Linear regression
- Extractor functions (a.k.a. postestimation commands)
- Wilcoxon Mann-Whitney rank sum test
- Paired t-test
- Wilcoxon signed-rank test
- Analysis of variance
- Post hoc tests
Getting publication-quality output into
- LaTeX (optional)
Ways to run R (includes only those of interest to your organization)
- Programs that include other programs
- Running R from within SAS
- Running R from within SPSS
- Running R as an adjunct to Stata
Graphical User Interfaces:
- R Commander
- Rattle data mining interface
- Excel integration
- Summary of topics learned
Here is a slide show of previous workshops.