Before you can analyze data, it must be in the right form. Getting it into that form is often where we spend most of our time. This 4-hour workshop shows how to perform the most commonly used data management tasks in R. We will cover how to use R’s popular add-on packages and compare them to R’s older built-in functions.
Most of our time will be spent working through examples that you may run simultaneously on your computer. However, the handouts include each step and its output, so feel free to just relax and take notes. Most examples come from the extensive data management examples in R for SAS and SPSS Users, R for Stata Users, and http://r4stats.com. That makes it easy to review later what we did, with full explanations, or to learn more about a particular subject by extending an example that you have already seen.
At the end of the workshop, you will receive a set of practice exercises to do on your own time, along with their solutions. The instructor is available via email to address these problems or any other topics in his workshops or books.
Attendees should know basic R programming, including how to read data files and call functions.
When finished, you will be able to prepare most data sets for analysis.
Robert A. Muenchen is the creator of the web site http://r4stats.com and the author of R for SAS and SPSS Users and, with Joseph Hilbe, R for Stata Users. An Accredited Professional Statistician™ with 30 years of experience, Bob is the Manager of Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. Bob has served on the advisory boards of SAS Institute, SPSS Inc., the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements and/or programming code have been incorporated into SAS®, SPSS®, JMP®, STATGRAPHICS® and several R packages.
- Transformation basics (mutate, transform, within)
- Conditional transformations (ifelse, recode)
- Summarization of columns and rows (laply, aaply)
- Summarization by group (ddply, colwise, by)
- Analysis by group (ddply with stat functions)
- Sorting vectors and data frames (sort, order, arrange)
- Miscellaneous variable tools (rename, keep, drop)
- Stacking data frames (rbind, rbind.fill)
- Merging data frames (merge)
- Reshaping data frames (reshape package)
- Character string manipulations (stringr package)
- Date / time manipulations (lubridate package)
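To give a flavor of the built-in side of these topics, here is a minimal sketch that uses only base R functions. The data frames and column names (scores, groups, q1, q2) are hypothetical and are not taken from the workshop handouts, and aggregate is used here as one base R option for a grouped summary; the workshop itself may use different functions for the same task.

```r
# Hypothetical toy data; all names and values are illustrative only.
scores <- data.frame(id = c(3, 1, 2),
                     q1 = c(4, 2, 5),
                     q2 = c(3, 1, 5))
groups <- data.frame(id = c(1, 2, 3),
                     group = c("a", "b", "a"),
                     stringsAsFactors = FALSE)

# Transformation: add a column computed from existing columns.
scores <- transform(scores, total = q1 + q2)

# Conditional transformation: label each row based on a condition.
scores$level <- ifelse(scores$total >= 7, "high", "low")

# Sorting: order() returns the row positions that sort by id.
scores <- scores[order(scores$id), ]

# Merging: join the two data frames on their shared key.
merged <- merge(scores, groups, by = "id")

# Summarization by group: mean of total within each group.
means <- aggregate(total ~ group, data = merged, FUN = mean)

print(merged)
print(means)
```

The add-on packages named in the list above offer alternatives to several of these steps, for example mutate in place of transform and arrange in place of indexing with order.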