R for SAS, SPSS and Stata Users

I enjoyed teaching R workshops for many years, but I have retired from teaching them. I leave the workshop pages up just to let people know.

R is free and powerful software for data analysis and graphics. However, its flexible approach is so different from other software that it can be frustrating to learn. This workshop introduces R in a way that takes advantage of what you already know. For many topics, we will begin with R’s built-in commands, which offer sparse but flexible output. Then we’ll cover add-on commands that work similarly to your current software. We will also discuss aspects of R that are likely to trip you up. For example, many R functions let you specify which data set to use in a way that looks identical to SAS but differs enough to produce perplexing error messages.

Most of our time will be spent working through examples that you may run simultaneously on your computer. You will see both the instructor’s screen and yours, as we run the examples and discuss the output. However, the handouts include each step and its output, so feel free to skip the computing; it’s easy to just relax and take notes.

Most of the examples come from the highly regarded books by the instructor, R for SAS and SPSS Users and R for Stata Users (no knowledge of those languages is required). That makes it easy to review what we did later with full explanations, or to learn more about a particular subject by extending an example that you have already seen.

This workshop is available on-site or via webinar.

The on-site version is the most engaging by far, generating much discussion and occasionally veering off briefly to cover topics specific to a particular organization. The instructor presents a topic for around twenty minutes. Then we switch to exercises, which are already open in another tabbed window. The exercises contain hints that show the general structure of the solution; you adapt those hints to get the final solution. The complete solutions are in a third tabbed window, so if you get stuck the answers are a click away. The typical schedule for training on site is located here.

A webinar version is also available. The approach saves travel expenses and is especially useful for organizations with branch offices. It’s offered as two half-day sessions, often with a day or two skipped in between to give participants a chance to do the exercises and catch up on other work. There is time for questions on the lecture topics (live) and the exercises (via email). However, webinar participants are typically much less engaged, and far less discussion takes place.

For further details or to arrange a webinar or site visit, contact the instructor, Bob Muenchen, at muenchen.bob@gmail.com.

Prerequisites

Despite the title, this workshop requires no knowledge of other software. However, if the audience has expertise in SAS, SPSS, or Stata, the instructor will adapt his presentation to use the language they’re most familiar with. Some knowledge of statistics is helpful, but not required. The instructor is well aware that knowledge of statistics fades rapidly when not used!

Learning Outcomes

When finished, participants will be able to use R to import data, transform it, create publication-quality graphics, and perform commonly used statistical analyses, and they will know how to generalize that knowledge to more advanced methods. They will also have an especially thorough understanding of how R compares to SAS, SPSS and Stata.

Presenter

Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in analytics software and helping people learn the R language. Bob is an ASA Accredited Professional Statistician™ with 35 years of experience and is currently the manager of OIT Research Computing Support (formerly the Statistical Consulting Center) at the University of Tennessee. He has taught workshops on research computing topics for more than 500 organizations and has offered training in partnership with the American Statistical Association, DataCamp.com, New Horizons Computer Learning Centers, Revolution Analytics, RStudio and Xerox Learning Services. Bob has written or coauthored over 70 articles published in scientific journals and conference proceedings, and has provided guidance on more than 1,000 graduate theses and dissertations.

Bob has served on the advisory boards of SAS Institute, SPSS Inc., StatAce OOD, Intuitics, the Statistical Graphics Corporation and PC Week Magazine (now eWeek). His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analytics, and data mining.

Computer Requirements

On-site training is best done in a computer lab with a projector and, for large rooms, a PA system. The webinar version is delivered to your computer using Zoom (or a similar webinar system if your organization has a preference).

Course programs, data, and exercises will be sent to you a week before the workshop. The instructions include installing R, which you can download for free here: http://www.r-project.org/. We will also use RStudio, which you can download for free here: http://RStudio.com. If you already know a different R editor, that’s fine too.

Course Outline
(In-depth data management topics are covered in an optional separate workshop that usually follows immediately after this one.)

1. INTRODUCTION
1.1 Topics
1.2 Preparing Your Computer
1.3 Note to System Administrators

2. OVERVIEW OF R
2.1 What is R?
2.2 R’s Advantages
2.3 R’s Disadvantages
2.4 Is R Accurate?
2.5 The Five Main Parts of SAS / SPSS / Stata
2.6 Workshops vs. Books

3. INSTALLING & MAINTAINING R
3.1 Package Installation & Loading
3.2 Choosing a “Mirror”
3.3 Finding Packages
3.4 What if Packages Change?

4. RSTUDIO BASICS & WORKSHOP FILES
4.1 Starting an R Script File
4.2 RStudio Tips
4.3 Workshop Files
4.4 Keywords, Alerts, Warnings, & Errors

5. R MARKDOWN
5.1 Starting an R Markdown File
5.2 R Markdown Language
5.3 R Markdown Knitting Options
5.4 R Markdown Chunk Options

6. R LANGUAGE BASICS
6.1 Objects & Their Names
6.2 Console Prompts
6.3 R Comments
6.4 Expressions
6.5 Assignments
6.6 Commands
6.7 Spacing Example
6.8 Impact of (Parentheses)
6.9 Impact of {Braces}
6.10 Getting Package Info
6.11 Package Conflicts
6.12 Resolving Package Conflicts

7. HELP & DOCUMENTATION
7.1 Help Files
7.2 Help Details
7.3 More Specific Help
7.4 Help for a Whole Package
7.5 Documentation
7.6 Free Internet Support
7.7 Practice Time

8. DATA STRUCTURES
8.1 A Quick Poll
8.2 R vs. Other Software
8.3 Numeric Vectors
8.4 Printing Vectors (or Any Object)
8.5 Vector Operations
8.6 Example Operations
8.7 Vector Attributes
8.8 Character Vectors
8.9 More Numeric Vectors
8.10 Example Function Calls
8.11 Selecting Vector Elements
8.12 Factors
8.13 Creating a Factor
8.14 Value Labels
8.15 Factor Arguments
8.16 Selecting by Factor Label
8.17 Factor from Character Vector
8.18 Adding Value Labels
8.19 Our Data So Far
8.20 Data Frame Creation
8.21 Why Use Data Frames?
8.22 Data Frame Details
8.23 More Data Frame Details
8.24 Tibble Creation
8.25 How Tibbles Improve Printing
8.26 Other Tibble Advantages
8.27 Matrices
8.28 Matrix Creation via the matrix Function
8.29 Matrix Printing
8.30 Matrix Creation via Column Binding
8.31 Matrix Function Use
8.32 Arrays
8.33 List Creation
8.34 List Details
8.35 Naming Components
8.36 Getting Component Names
8.37 Table of Data Structures, Modes & Classes
8.38 Practice Time

9. MANAGING FILES & WORKSPACE
9.1 Preparing the Workspace
9.2 Introduction
9.3 Listing Objects
9.4 ls Examples
9.5 Printing Objects
9.6 Displaying Attributes
9.7 Examining Object Structure
9.8 Deleting Objects
9.9 rm Examples
9.10 Working Directory
9.11 Saving Your Work
9.12 Quitting & Restarting
9.13 R “Helps” Automate Saving
9.14 Blocking Automatic Saving/Loading
9.15 Special R Files
9.16 Practice Time

10. CONTROLLING FUNCTIONS
10.1 Preparing the Workspace
10.2 R Functions
10.3 Function Output
10.4 Argument Name vs. Position
10.5 A Common Error
10.6 The Triple-dot Argument
10.7 Controlling Functions with Class
10.8 Seeing What Methods Exist
10.9 Changing Class Changes Output
10.10 Combining Function Calls
10.11 Practice Time

11. DATA ACQUISITION
11.1 Preparing the Workspace
11.2 A Quick Poll
11.3 Comma Separated Values
11.4 CSV File Details
11.5 Resulting Tibble
11.6 Data Within a Program Using read_csv
11.7 Data Within a Program Using tribble
11.8 Tab Delimited mydata.tab
11.9 Tab File Details
11.10 Reading Tab File
11.11 Excel Files
11.12 SAS, SPSS, Stata Files
11.13 Database via ODBC
11.14 Database Directly
11.15 Other Databases
11.16 Other Data Sources
11.17 Practice Time

12. CHOOSING VARIABLES FROM DATA FRAMES
12.1 Preparing the Workspace
12.2 The Way You Choose Variables Matters!
12.3 Which Data Frame(s)?
12.4 Choosing Vars Using Dollar Notation
12.5 dplyr’s select Function
12.6 select Variable Options
12.7 Subscripting or Indexing
12.8 Column Position Can Contain…
12.9 Leaving Out the Comma
12.10 Comma Impact on Tibbles
12.11 Choose Vars Using [[ ]] Notation
12.12 Choosing Vars Using Formulas
12.13 The attach and with Functions
12.14 Recommendations
12.15 A Common Selection Error
12.16 Practice Time

13. CHOOSING OBSERVATIONS FROM DATA FRAMES
13.1 Preparing the Workspace
13.2 Using dplyr’s filter Function
13.3 Using Subscripting [ ]
13.4 Logic Rules
13.5 Impact of the “which” Function
13.6 Effect of “which” on Logical Vectors
13.7 Using Selections in Analyses
13.8 Table of Logical Comparisons
13.9 Practice Time

14. CHOOSING BOTH VARS & OBS FROM DATA FRAMES
14.1 Preparing the Workspace
14.2 Combining select and filter
14.3 select & filter details
14.4 Using Both Subscripts
14.5 Saving Subsets
14.6 Practice Time

15. TRANSFORMATIONS
15.1 Preparing the Workspace
15.2 Using Dollar Notation
15.3 Using mutate
15.4 Resulting Data
15.5 mutate Details
15.6 Table of Transformations
15.7 Practice Time

16. MISSING VALUES
16.1 Preparing the Workspace
16.2 Reading Blanks as Missing
16.3 Reading Other Values as Missing
16.4 Missing Value Codes
16.5 How Missing Values Sort
16.6 Logic for Missing Values
16.7 Using naniar to Count Missing & Valid
16.8 Using Logic to Count Missing & Valid
16.9 Action on Missing Values
16.10 Manual Listwise Deletion
16.11 Mean/Median Substitution
16.12 Advanced Imputation Methods
16.13 The “simputation” Package
16.14 Practice Time

17. GRAPHICS: BASE
17.1 Preparing the Workspace
17.2 Importance of Graphing
17.3 Base Graphics Overview
17.4 Barplot
17.5 Barplot Stacked
17.6 Boxplot
17.7 Scatterplot
17.8 Histogram
17.9 Adding Embellishments
17.10 Graphics Parameters
17.11 What Objects Can plot Handle?
17.12 Plotting Groups
17.13 Practice Time

18. GRAPHICS: ggplot2 PACKAGE
18.1 Preparing the Workspace
18.2 The ggplot2 Package
18.3 ggplot vs. qplot
18.4 The Grammar Components
18.5 Barplot
18.6 ggplot Syntax
18.7 Barplot Stacked
18.8 Barplot Dodged
18.9 Barplot with Facets
18.10 Boxplot with Overlay of Points
18.11 Boxplot with Facets
18.12 Simple Scatterplot
18.13 Scatterplot with Points & Shapes Set by a Factor
18.14 Regression Lines Set by a Factor
18.15 Scatterplot with Facets
18.16 Changing Colors & Styles
18.17 Grey Scale
18.18 Black and White Background with Grid
18.19 Color Palettes
18.20 Applying a Color Palette
18.21 Example Theme: Wall Street Journal
18.22 Color Blind Correction
18.23 Interactive Graphics
18.24 Graphics Resources
18.25 Practice Time

19. WRITING & APPLYING FUNCTIONS
19.1 Preparing the Workspace
19.2 Applying Functions to Data Frames
19.3 A map Example
19.4 The Family of Map Functions
19.5 Using map_dbl
19.6 An Example Function
19.7 Rules for Writing Functions
19.8 Applying mystats
19.9 Anonymous Functions
19.10 Including Functions from Files
19.11 Practice Time

20. STATISTICS REVIEW (optional)
20.1 Goals of Statistical Analysis
20.2 Meaning of Significance
20.3 Impact of Data Size
20.4 Impact of Multiple Testing

21. BASIC STATISTICS
21.1 Preparing the Workspace
21.2 R’s summary function
21.3 The skim Function
21.4 jmv Package’s descriptives Function
21.5 Frequency & Percent Tables
21.6 Cross-tabulation & Chi-Square
21.7 R’s Built-in table Function
21.8 Table-Related Functions
21.9 Cross-tabulations Using the jmv Package
21.10 Other Categorical Functions

22. CORRELATION & REGRESSION
22.1 Preparing the Workspace
22.2 Correlation
22.3 R’s Built-in cor Function
22.4 Built-in Test of Significance
22.5 R Commander’s rcorr.adjust Does More
22.6 Multiple Regression
22.7 Modeling Functions
22.8 Linear Models Using lm
22.9 Model Contents
22.10 Printing Entire Model Contents
22.11 Finding Relevant Functions
22.12 Table of Regression Formulas

23. COMPARING GROUPS
23.1 Preparing the Workspace
23.2 Two Independent Groups
23.3 Independent Samples t.test
23.4 Independent Samples Non-parametric wilcox.test
23.5 Paired Samples
23.6 Paired Samples t.test
23.7 Paired Samples Non-parametric wilcox.test
23.8 Comparing More Than 2 Groups
23.9 Getting Means & Variances
23.10 Test for Equality of Variances
23.11 Create AOV Model
23.12 Plot Diagnostics
23.13 Types of ANOVA Tests
23.14 Get ANOVA Table
23.15 Types of Means
23.16 Estimated Marginal Means
23.17 Comparing All Means
23.18 Comparing Means to Control
23.19 Methods of Comparison
23.20 Adjustment Types
23.21 Compact Letter Display (CLD)
23.22 Plotting All Comparisons
23.23 EM Means Interaction Plot (MIP)
23.24 Table of ANOVA / ANCOVA Formulas
23.25 Practice Time

24. HIGH QUALITY OUTPUT
24.1 Preparing the Workspace
24.2 Output Formatting Options
24.3 The kable Function
24.4 Create Models to Display
24.5 xtable: One Model
24.6 texreg: One Model
24.7 texreg: Two Models
24.8 texreg Reference
24.9 apaTables
24.10 Practice Time

25. DEBUGGING CODE
25.1 Preparing the Workspace
25.2 Debugging Steps
25.3 Objects Not Loaded
25.4 Formulas and Data
25.5 Misspelled Object Names
25.6 Misspelled Package Names
25.7 Package Not Installed
25.8 Missing Quotes for Character Variables
25.9 Forgetting to Reference a Data Frame
25.10 Simple Functions Need Vectors
25.11 Messages About Arguments
25.12 Messages About Arguments II
25.13 Too Many Functions
25.14 Too Many Results for map_dbl
25.15 Missing Pipe Operator
25.16 Missing Quotes in Subscripts I
25.17 Quotes for File Names
25.18 Quotes for Package Names
25.19 Missing Commas, Parentheses, etc.
25.20 Missing Functions
25.21 “Error in Select”
25.22 Use Two Colons in Function Calls
25.23 Use One Colon When Detaching

26. GRAPHICAL USER INTERFACES TO R
26.1 R Commander
26.2 BlueSky Statistics
26.3 jamovi

27. CONCLUSION
27.1 Brief Review
27.2 Providing Feedback
27.3 Future Support
27.4 Question Time


12 thoughts on “R for SAS, SPSS and Stata Users”

  1. Hi Bob,
    I was wondering how the training will be held next week.
    Will the training be a WebEx and will it be recorded like SAS training so I can view it later for 20 business days.

    I appreciate your reply as it would help me in preparing for the training.

    Regards,

    Amit

  2. Hi Santiago,

    It’s being done via WebEx. If you register, Revolution Analytics will send you the login info and where to download the handouts, practice programs, data sets and exercises.

    Cheers,
    Bob
