Before you can analyze data, it must be in the right form. Getting it into that form is often where we spend most of our time. This workshop shows how to perform the most commonly used data management tasks in R. We will cover how to use R’s most popular add-on packages (dplyr, stringr, lubridate, tidyr, broom, compare, sqldf, etc.) and compare them to R’s older built-in functions.
Most of our time will be spent working through examples that you may run simultaneously on your computer. You will see both the instructor’s screen and yours, side-by-side, as we run the examples and discuss the output. However, the handouts include each step and its output, so feel free to skip the computing; it’s easy to just relax and take notes.
Most of the examples come from the highly regarded books by the instructor, R for SAS and SPSS Users and R for Stata Users. That makes it easy to review later what we did, with full explanations, or to learn more about a particular subject by extending an example that you have already seen.
The workshops are available on-site or via webinar.
The on-site workshops are the most thorough since direct face-to-face interaction is the most flexible. The instructor presents a topic for around twenty minutes. Then we switch to exercises, which are already open in another tabbed window. The exercises contain hints that show the general structure of the solution that you adapt to get the final solution. The complete solutions are in a third tabbed window, so if you get stuck the answers are a click away. There is plenty of time to handle in-depth questions on any of the topics covered, and the discussion often veers off into a broad range of interesting areas. The usual schedule for an on-site workshop is here.
The webinar version is particularly easy to work into a busy schedule. It’s offered in half-day sessions with a day or two skipped in between to give participants a chance to do the exercises and catch up on other work. There is time for questions on the lecture topics (live) and the exercises (via email). The lecture is recorded and available for review for 30 days.
For further details or to arrange a site visit, contact the instructor, Bob Muenchen, at firstname.lastname@example.org.
Attendees should know basic R programming, including how to read data files and call functions.
When finished, participants will be able to prepare most data sets for analysis.
Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in analytics software and helping people learn the R language. Bob is an ASA Accredited Professional Statistician™ with 33 years of experience and is currently the manager of OIT Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has taught workshops on research computing topics for more than 500 organizations and currently offers training in partnership with DataCamp.com, Revolution Analytics, RStudio, New Horizons Computer Learning Centers, and Xerox Learning Services. Bob has written or coauthored over 70 articles published in scientific journals and conference proceedings, and has provided guidance on more than 1,000 graduate theses and dissertations.
Bob has served on the advisory boards of SAS Institute, SPSS Inc., StatAce OOD, Intuitics, the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analytics, and data mining.
On-site training is best done in a computer lab with a projector and, for large rooms, a PA system. The interactive video version requires only a web browser and an Internet connection fast enough to display video.
Course programs, data, and exercises will be sent a week before the workshop. The instructions include installing R, which you can download for free here: http://www.r-project.org/. We will also use RStudio, which you can download for free here: http://RStudio.com. If you already know a different R editor, that’s fine too.
- Intro to the dplyr package
- Selecting/keeping variables and observations
- Combining steps: subset vs. nest vs. pipe
- Copying & deleting (assigning/removing)
- Renaming data sets & variables
- Transforming variables
- Conditional transformations
- Summarizing / aggregating columns
- Summarizing / aggregating rows
- By-group / split-file calculations (using by)
- By-group / split-file analysis with output management (using dplyr & purrr)
- Sorting / ordering data
- Selecting first or last observation per group
- Stacking / concatenating data frames
- Finding and removing duplicate observations
- Merging / joining data frames
- Reshaping data frames
- Comparing objects
- Character string manipulations
- Date & time manipulations
- Using SQL within R
Here is a slide show of previous workshops.