Two new R packages are quickly becoming standards in the R community:
Hadley Wickham’s dplyr and tidyr. The dplyr package almost completely replaces his popular plyr package for data manipulation. Most importantly for general R use, it makes it much easier to select variables. For example,
if your data included variables for race, gender, pretest, posttest, and four survey items q1 through q4, you could select various sets of variables using:
library("dplyr")
select(mydata, race, gender)         # Just those two variables.
select(mydata, gender:posttest)      # From gender through posttest.
select(mydata, contains("test"))     # Gets pretest & posttest.
select(mydata, starts_with("q"))     # Gets all vars starting with "q".
select(mydata, ends_with("test"))    # All vars ending with "test".
select(mydata, num_range("q", 1:4))  # q1 thru q4 regardless of location.
select(mydata, matches("^q"))        # Matches any regular expression.
As I show in my books, these were all possible in R before, but they required much more programming.
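For comparison, here is a minimal sketch of how a few of those selections might be written in base R alone. The values in mydata are invented for illustration, and this is only one of several possible base R approaches, not necessarily the one used in the books.

```r
# Hypothetical data set with the variables described above.
mydata <- data.frame(
  race     = c("white", "black", "asian"),
  gender   = c("f", "m", "f"),
  pretest  = c(70, 82, 65),
  posttest = c(78, 85, 74),
  q1 = c(1, 2, 3), q2 = c(4, 5, 1), q3 = c(2, 2, 5), q4 = c(3, 4, 4)
)

vars <- names(mydata)

# Just race and gender.
two_vars <- mydata[, c("race", "gender")]

# From gender through posttest: look up both positions, then build a range.
from <- which(vars == "gender")
to   <- which(vars == "posttest")
gender_to_posttest <- mydata[, from:to]

# Variables whose names contain "test".
test_vars <- mydata[, grep("test", vars, fixed = TRUE)]

# Variables whose names start with "q" (regular expression).
q_vars <- mydata[, grep("^q", vars)]

# q1 through q4 regardless of location: build the names explicitly.
q1_to_q4 <- mydata[, paste0("q", 1:4)]
```

Each selection takes an extra step of name matching or position lookup that the dplyr helper functions do for you.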
The tidyr package replaces Hadley’s popular reshape and reshape2 packages with a data reshaping approach that is simpler and more focused just on the reshaping process, especially converting from “wide” to “long” form and back.
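As a rough illustration of the wide-to-long conversion and back, the sketch below uses tidyr's gather() and spread() functions on a small data set. The id column and the score values are made up for the example, and the column names "test" and "score" are arbitrary choices.

```r
library("tidyr")

# Hypothetical wide data: one row per subject, one column per test.
mydata <- data.frame(
  id       = 1:3,
  pretest  = c(70, 82, 65),
  posttest = c(78, 85, 74)
)

# Wide to long: stack the two test columns into key/value pairs,
# giving one row per subject per test.
long <- gather(mydata, key = "test", value = "score", pretest, posttest)

# Long back to wide: spread the key column out into separate columns.
wide <- spread(long, key = test, value = score)
```

Later releases of tidyr also provide pivot_longer() and pivot_wider(), which cover the same two conversions.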
I’ve integrated dplyr into my workshop R for SAS, SPSS and Stata Users, and both tidyr and dplyr now play extensive roles in my Managing Data with R workshop. I’ll be presenting the next Virtual Instructor-led Classroom (webinar) version of those workshops in partnership with Revolution Analytics during the week of October 6, 2014. I’m also available to teach them at your organization’s site in partnership with RStudio.com (contact me at Muenchen.email@example.com to schedule a visit). These workshops will also soon be available 24/7 at Datacamp.com. “You’ll be able to take Bob’s popular workshops using an interactive combination of video and live exercises in the comfort of your own browser,” said Jonathan Cornelissen, CEO of Datacamp.com.