Introduction to Modern R

I enjoyed teaching R workshops for many years, but I have retired from teaching them. I am leaving the workshop pages up only to let people know that.

Learn R the easy way, by focusing on modern “tidyverse” functions. This 2-day workshop starts from scratch and shows you how to import data, then transform, visualize, and analyze it. You’ll have hands-on experience every step of the way. The slides, examples, and output are all integrated into a single document. You can add your own notes as you go, and when you finish, you “knit” it all together into a single 156-page book.

This workshop is available at your organization’s site, or via webinars.

The on-site version is the most engaging, generating much discussion and occasionally veering off briefly to cover topics specific to a particular organization. The instructor and participants work through a topic hands-on for around twenty minutes. Then we switch to exercises, which are already open in another tabbed window. The exercises contain hints that show the general structure of the solution; you adapt those hints to get the final solution. The complete solutions are in a third tabbed window, so if you get stuck the answers are a click away. The typical schedule for training on site is located here.

A webinar version is also available. This approach saves travel expenses and is especially useful for organizations with branch offices. It’s offered as two half-day sessions, often with a day or two skipped in between to give participants a chance to do the exercises and catch up on other work. There is time for questions on the lecture topics (live) and the exercises (via email). However, webinar participants are typically much less engaged, and far less discussion takes place.

For further details or to arrange a webinar or site visit, contact the instructor, Bob Muenchen, at muenchen.bob@gmail.com.

Prerequisites

This workshop assumes no prior knowledge of R. Some knowledge of statistics is helpful, but not required. The instructor is well aware that knowledge of statistics fades rapidly when not used.

Learning Outcomes

When finished, participants will be able to use R to import data, transform it, create publication-quality graphics, and perform commonly used statistical analyses, and they will know how to generalize that knowledge to more advanced methods.

Presenter

Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular website devoted to analyzing trends in analytics software and helping people learn the R language. Bob is an ASA Accredited Professional Statistician™ with 35 years of experience and is currently the manager of OIT Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has taught workshops on research computing topics for more than 500 organizations and has offered training in partnership with the American Statistical Association, DataCamp.com, New Horizons Computer Learning Centers, Revolution Analytics, RStudio, and Xerox Learning Services. Bob has written or coauthored over 70 articles published in scientific journals and conference proceedings and has provided guidance on more than 1,000 graduate theses and dissertations.

Bob has served on the advisory boards of SAS Institute, SPSS Inc., StatAce OOD, Intuitics, the Statistical Graphics Corporation and PC Week Magazine (now eWeek). His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS, and many R packages. His research interests include statistical computing, data graphics and visualization, text analytics, machine learning, and data mining.

Computer Requirements

We will use the free and open-source version of R, which you can download here: http://www.r-project.org/. We will also use RStudio, which you can download for free here: http://RStudio.com. If you already know a different R editor, that’s fine too.

On-site training is best done in a computer lab with a projector and, for large rooms, a PA system. The webinar version is delivered to your computer using Zoom (or a similar webinar system if your organization has a preference).

Course Materials

Course notes, programs, data sets, practice exercises, and solutions will be sent to you in electronic form a week before the workshop. For ease of searching, the course notes are indexed by keywords from Excel, SQL, SAS, SPSS, and Stata. Searching on any fundamental topic from those languages is likely to take you directly to the R equivalent.

Other searchable keywords include alerts on topics where people often make mistakes, as well as common R warning and error messages along with their meanings and solutions. The notes, code, and output are summarized in the 156-page book, Introduction to Modern R, which has an interactive table of contents, allowing you to jump quickly to any topic.

Course Outline
(In-depth data management topics are covered in an optional separate workshop that usually follows immediately after this one.)

1. INTRODUCTION
1.1 Topics
1.2 Preparing Your Computer
1.3 Note to System Administrators

2. OVERVIEW OF R
2.1 What is R?
2.2 R’s Advantages
2.3 R’s Disadvantages
2.4 Is R Accurate?
2.5 The Five Main Parts of SAS / SPSS / Stata
2.6 Workshops vs. Books

3. INSTALLING & MAINTAINING R
3.1 Package Installation & Loading
3.2 Choosing a “Mirror”
3.3 Finding Packages
3.4 What if Packages Change?

4. RSTUDIO BASICS & WORKSHOP FILES
4.1 Starting an R Script File
4.2 RStudio Tips
4.3 Workshop Files
4.4 Keywords, Alerts, Warnings, & Errors

5. R MARKDOWN
5.1 Starting an R Markdown File
5.2 R Markdown Language
5.3 R Markdown Knitting Options
5.4 R Markdown Chunk Options

6. R LANGUAGE BASICS
6.1 Objects & Their Names
6.2 Console Prompts
6.3 R Comments
6.4 Expressions
6.5 Assignments
6.6 Commands
6.7 Spacing Example
6.8 Impact of (Parentheses)
6.9 Impact of {Braces}
6.10 Getting Package Info
6.11 Package Conflicts
6.12 Resolving Package Conflicts

7. HELP & DOCUMENTATION
7.1 Help Files
7.2 Help Details
7.3 More Specific Help
7.4 Help for a Whole Package
7.5 Documentation
7.6 Free Internet Support
7.7 Practice Time

8. DATA STRUCTURES
8.1 A Quick Poll
8.2 R vs. Other Software
8.3 Numeric Vectors
8.4 Printing Vectors (or Any Object)
8.5 Vector Operations
8.6 Example Operations
8.7 Vector Attributes
8.8 Character Vectors
8.9 More Numeric Vectors
8.10 Example Function Calls
8.11 Selecting Vector Elements
8.12 Factors
8.13 Creating a Factor
8.14 Value Labels
8.15 Factor Arguments
8.16 Selecting by Factor Label
8.17 Factor from Character Vector
8.18 Adding Value Labels
8.19 Our Data So Far
8.20 Data Frame Creation
8.21 Why Use Data Frames?
8.22 Data Frame Details
8.23 More Data Frame Details
8.24 Tibble Creation
8.25 How Tibbles Improve Printing
8.26 Other Tibble Advantages
8.27 Matrices
8.28 Matrix Creation via the matrix Function
8.29 Matrix Printing
8.30 Matrix Creation via Column Binding
8.31 Matrix Function Use
8.32 Arrays
8.33 List Creation
8.34 List Details
8.35 Naming Components
8.36 Getting Component Names
8.37 Table of Data Structures, Modes & Classes
8.38 Practice Time

9. MANAGING FILES & WORKSPACE
9.1 Preparing the Workspace
9.2 Introduction
9.3 Listing Objects
9.4 ls Examples
9.5 Printing Objects
9.6 Displaying Attributes
9.7 Examining Object Structure
9.8 Deleting Objects
9.9 rm Examples
9.10 Working Directory
9.11 Saving Your Work
9.12 Quitting & Restarting
9.13 R “Helps” Automate Saving
9.14 Blocking Automatic Saving/Loading
9.15 Special R Files
9.16 Practice Time

10. CONTROLLING FUNCTIONS
10.1 Preparing the Workspace
10.2 R Functions
10.3 Function Output
10.4 Argument Name vs. Position
10.5 A Common Error
10.6 The Triple-dot Argument
10.7 Controlling Functions with Class
10.8 Seeing What Methods Exist
10.9 Changing Class Changes Output
10.10 Combining Function Calls
10.11 Practice Time

11. DATA ACQUISITION
11.1 Preparing the Workspace
11.2 A Quick Poll
11.3 Comma Separated Values
11.4 CSV File Details
11.5 Resulting Tibble
11.6 Data Within a Program Using read_csv
11.7 Data Within a Program Using tribble
11.8 Tab Delimited mydata.tab
11.9 Tab File Details
11.10 Reading Tab File
11.11 Excel Files
11.12 SAS, SPSS, Stata Files
11.13 Database via ODBC
11.14 Database Directly
11.15 Other Databases
11.16 Other Data Sources
11.17 Practice Time

12. CHOOSING VARIABLES FROM DATA FRAMES
12.1 Preparing the Workspace
12.2 The Way You Choose Variables Matters!
12.3 Which Data Frame(s)?
12.4 Choosing Vars Using Dollar Notation
12.5 dplyr’s select Function
12.6 select Variable Options
12.7 Subscripting or Indexing
12.8 Column Position Can Contain…
12.9 Leaving Out the Comma
12.10 Comma Impact on Tibbles
12.11 Choose Vars Using [[ ]] Notation
12.12 Choosing Vars Using Formulas
12.13 The attach and with Functions
12.14 Recommendations
12.15 A Common Selection Error
12.16 Practice Time

13. CHOOSING OBSERVATIONS FROM DATA FRAMES
13.1 Preparing the Workspace
13.2 Using dplyr’s filter Function
13.3 Using Subscripting [ ]
13.4 Logic Rules
13.5 Impact of the “which” Function
13.6 Effect of “which” on Logical Vectors
13.7 Using Selections in Analyses
13.8 Table of Logical Comparisons
13.9 Practice Time

14. CHOOSING BOTH VARS & OBS FROM DATA FRAMES
14.1 Preparing the Workspace
14.2 Combining select and filter
14.3 select & filter Details
14.4 Using Both Subscripts
14.5 Saving Subsets
14.6 Practice Time

15. TRANSFORMATIONS
15.1 Preparing the Workspace
15.2 Using Dollar Notation
15.3 Using mutate
15.4 Resulting Data
15.5 mutate Details
15.6 Table of Transformations
15.7 Practice Time

16. MISSING VALUES
16.1 Preparing the Workspace
16.2 Reading Blanks as Missing
16.3 Reading Other Values as Missing
16.4 Missing Value Codes
16.5 How Missing Values Sort
16.6 Logic for Missing Values
16.7 Using naniar to Count Missing & Valid
16.8 Using Logic to Count Missing & Valid
16.9 Action on Missing Values
16.10 Manual Listwise Deletion
16.11 Mean/Median Substitution
16.12 Advanced Imputation Methods
16.13 The “simputation” Package
16.14 Practice Time

17. GRAPHICS: BASE
17.1 Preparing the Workspace
17.2 Importance of Graphing
17.3 Base Graphics Overview
17.4 Barplot
17.5 Barplot Stacked
17.6 Boxplot
17.7 Scatterplot
17.8 Histogram
17.9 Adding Embellishments
17.10 Graphics Parameters
17.11 What Objects Can plot Handle?
17.12 Plotting Groups
17.13 Practice Time

18. GRAPHICS: ggplot2 PACKAGE
18.1 Preparing the Workspace
18.2 The ggplot2 Package
18.3 ggplot vs. qplot
18.4 The Grammar Components
18.5 Barplot
18.6 ggplot Syntax
18.7 Barplot Stacked
18.8 Barplot Dodged
18.9 Barplot with Facets
18.10 Boxplot with Overlay of Points
18.11 Boxplot with Facets
18.12 Simple Scatterplot
18.13 Scatterplot with Points & Shapes Set by a Factor
18.14 Regression Lines Set by a Factor
18.15 Scatterplot with Facets
18.16 Changing Colors & Styles
18.17 Grey Scale
18.18 Black and White Background with Grid
18.19 Color Palettes
18.20 Applying a Color Palette
18.21 Example Theme: Wall Street Journal
18.22 Color Blind Correction
18.23 Interactive Graphics
18.24 Graphics Resources
18.25 Practice Time

19. WRITING & APPLYING FUNCTIONS
19.1 Preparing the Workspace
19.2 Applying Functions to Data Frames
19.3 A map Example
19.4 The Family of Map Functions
19.5 Using map_dbl
19.6 An Example Function
19.7 Rules for Writing Functions
19.8 Applying mystats
19.9 Anonymous Functions
19.10 Including Functions from Files
19.11 Practice Time

20. STATISTICS REVIEW
20.1 Goals of Statistical Analysis
20.2 Meaning of Significance
20.3 Impact of Data Size
20.4 Impact of Multiple Testing

21. BASIC STATISTICS
21.1 Preparing the Workspace
21.2 R’s summary Function
21.3 The skim Function
21.4 jmv Package’s descriptives Function
21.5 Frequency & Percent Tables
21.6 Cross-tabulation & Chi-Square
21.7 R’s Built-in table Function
21.8 Table-Related Functions
21.9 Cross-tabulations Using the jmv Package
21.10 Other Categorical Functions

22. CORRELATION & REGRESSION
22.1 Preparing the Workspace
22.2 Correlation
22.3 R’s Built-in cor Function
22.4 Built-in Test of Significance
22.5 R Commander’s rcorr.adjust Does More
22.6 Multiple Regression
22.7 Modeling Functions
22.8 Linear Models Using lm
22.9 Model Contents
22.10 Printing Entire Model Contents
22.11 Finding Relevant Functions
22.12 Table of Regression Formulas

23. COMPARING GROUPS
23.1 Preparing the Workspace
23.2 Two Independent Groups
23.3 Independent Samples t.test
23.4 Independent Samples Non-parametric wilcox.test
23.5 Paired Samples
23.6 Paired Samples t.test
23.7 Paired Samples Non-parametric wilcox.test
23.8 Comparing More Than 2 Groups
23.9 Getting Means & Variances
23.10 Test for Equality of Variances
23.11 Create AOV Model
23.12 Plot Diagnostics
23.13 Types of ANOVA Tests
23.14 Get ANOVA Table
23.15 Types of Means
23.16 Estimated Marginal Means
23.17 Comparing All Means
23.18 Comparing Means to Control
23.19 Methods of Comparison
23.20 Adjustment Types
23.21 Compact Letter Display (CLD)
23.22 Plotting All Comparisons
23.23 EM Means Interaction Plot (MIP)
23.24 Table of ANOVA / ANCOVA Formulas
23.25 Practice Time

24. HIGH QUALITY OUTPUT
24.1 Preparing the Workspace
24.2 Output Formatting Options
24.3 The kable Function
24.4 Create Models to Display
24.5 xtable: One Model
24.6 texreg: One Model
24.7 texreg: Two Models
24.8 texreg Reference
24.9 apaTables
24.10 Practice Time

25. DEBUGGING CODE
25.1 Preparing the Workspace
25.2 Debugging Steps
25.3 Objects Not Loaded
25.4 Formulas and Data
25.5 Misspelled Object Names
25.6 Misspelled Package Names
25.7 Package Not Installed
25.8 Missing Quotes for Character Variables
25.9 Forgetting to Reference a Data Frame
25.10 Simple Functions Need Vectors
25.11 Messages About Arguments
25.12 Messages About Arguments II
25.13 Too Many Functions
25.14 Too Many Results for map_dbl
25.15 Missing Pipe Operator
25.16 Missing Quotes in Subscripts I
25.17 Quotes for File Names
25.18 Quotes for Package Names
25.19 Missing Commas, Parentheses, etc.
25.20 Missing Functions
25.21 “Error in Select”
25.22 Use Two Colons in Function Calls
25.23 Use One Colon When Detaching

26. GRAPHICAL USER INTERFACES TO R
26.1 The R Commander
26.2 BlueSky Statistics
26.3 jamovi

27. CONCLUSION
27.1 Brief Review
27.2 Providing Feedback
27.3 Future Support
27.4 Question Time
