Selecting variables in most statistics packages is very simple. For example, SAS uses VAR Q1-Q4 to select variables q1, q2, q3, and q4. Selecting observations, on the other hand, usually uses logic like GENDER=="F" to select all the females. That logic is used in various commands like WHERE, IF, and so on.
R is radically different in that it allows you to use many of the same methods to select both variables and observations. For example, you could use logic to select all your numeric variables, and row names (the observation counterpart of variable names) to select observations. This perspective offers such great flexibility that our books include three chapters of variations. That requires too much discussion for this website, so I'll present just some basic examples. However, these examples do cover many of the most common selection tasks.
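To make that idea concrete, here is a minimal sketch of the same bracket notation doing both jobs. It assumes a small stand-in data frame rather than the real practice file: the variable names follow the practice data, but the values and the helper names (isNumeric, numericVars, rowsByName, malesQ) are invented for illustration.

```r
# Hypothetical stand-in for the practice data; the values are invented.
mydata <- data.frame(
  workshop = c(1, 2, 1, 2, 1, 2, 1, 2),
  gender   = c("f", "f", "f", "f", "m", "m", "m", "m"),
  q1       = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2       = c(1, 1, 2, 1, 5, 4, 3, 5),
  stringsAsFactors = FALSE
)

# Logic to select variables: TRUE for each numeric column.
isNumeric   <- sapply(mydata, is.numeric)
numericVars <- mydata[ , isNumeric]

# Row names to select observations: by default they are "1", "2", ...
rowsByName <- mydata[c("5", "6"), ]

# Logic to select observations, names to select variables.
malesQ <- mydata[mydata$gender == "m", c("q1", "q2")]

print(numericVars)
print(rowsByName)
print(malesQ)
```

In each case the index before the comma picks observations and the index after it picks variables, and either position accepts logic, names, or numbers.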
The example programs below select the males and the variables workshop and q1 through q4, and save them to a new data set called myMalesWQ. The R program calls print() explicitly even though R prints a result by default when you type an expression at the console. This is to show how the selection would look inside other function calls. The R and Stata programs demonstrate this selection in several different ways. The SAS and SPSS programs focus on the main way you would do this in those packages. The practice data set is shown here. The programs and the data they use are also available for download here.
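As a rough illustration of why print() is written out, the sketch below passes the same selection to print(), to colMeans(), and to subset(). It uses a made-up stand-in for the practice data (the values are invented, and the names males, vars, and maleMeans are mine), so treat it as a sketch rather than the downloadable program.

```r
# Hypothetical stand-in for the practice data; the values are invented.
mydata <- data.frame(
  workshop = c(1, 2, 1, 2, 1, 2, 1, 2),
  gender   = c("f", "f", "f", "f", "m", "m", "m", "m"),
  q1       = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2       = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3       = c(5, 4, 4, 3, 2, 5, 4, 5),
  q4       = c(1, 1, 3, 3, 4, 5, 4, 5),
  stringsAsFactors = FALSE
)

males <- mydata$gender == "m"                      # logical vector, one value per row
vars  <- c("workshop", "q1", "q2", "q3", "q4")     # variable names to keep

# Explicit print(); typing mydata[males, vars] alone at the console
# would display the same thing.
print(mydata[males, vars])

# The identical selection works as the argument of any other function.
maleMeans <- colMeans(mydata[males, vars])
print(maleMeans)

# The subset() function expresses the same selection and saves it.
myMalesWQ <- subset(mydata,
  subset = gender == "m",
  select = c(workshop, q1:q4)
)
print(myMalesWQ)
```

Whatever appears inside print() can be dropped into another function call unchanged, which is the point of showing it that way in the program.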
R
setwd("c:/myRfolder")
load(file = "mydata.RData")
attach(mydata)
print(mydata)

# The subset Function

# Select in a function:
print(
  subset(mydata,
    subset = gender == "m",
    select = c(workshop, q1:q4)
  )
)

# Select to a new set:
myMalesWQ <- subset(mydata,
  subset = gender == "m",
  select = c(workshop, q1:q4)
)
print(myMalesWQ)
summary(myMalesWQ)

# Logic for Obs,
# Names for Vars
print(
  mydata[
    which(gender == "m"),
    c("workshop", "q1", "q2", "q3", "q4")
  ]
)

myMales <- which(gender == "m")
myVars <- c("workshop", "q1", "q2", "q3", "q4")
myVars
print( mydata[myMales, myVars] )

# Row and Variable Names
print(
  mydata[
    c("5", "6", "7", "8"),
    c("workshop", "q1", "q2", "q3", "q4")
  ]
)

myMales <- c("5", "6", "7", "8")
myVars <- c("workshop", "q1", "q2", "q3", "q4")
print( mydata[myMales, myVars] )

# Numeric Index Vectors
print( mydata[ c(5, 6, 7, 8), c(1, 3, 4, 5, 6) ] )
print( mydata[ 5:8, c(1, 3:6) ] )

myMales <- c(5, 6, 7, 8)
myVars <- c(1, 3:6)
print( mydata[myMales, myVars] )

# Saving and
# Loading Subsets
myMalesWQ <- subset(mydata,
  subset = gender == "m",
  select = c(workshop, q1:q4)
)
save(mydata, myMalesWQ, file = "myBoth.RData")
load("myBoth.RData")

SAS
LIBNAME myLib 'C:\myRfolder';
OPTIONS _LAST_=myLib.mydata;

PROC PRINT;
  VAR workshop q1 q2 q3 q4;
  WHERE gender="m";
RUN;

* Creating a data set from selected variables;
DATA myLib.myMalesWQ;
  SET myLib.mydata;
  WHERE gender="m";
  KEEP workshop q1-q4;
RUN;

PROC PRINT DATA=myLib.myMalesWQ;
RUN;

SPSS
CD 'c:\myRfolder'.
GET FILE='mydata.sav'.
SELECT IF (gender EQ "m").
LIST workshop q1 TO q4.
SAVE OUTFILE='myMalesWQ.sav'.
EXECUTE.

Stata
use c:\myRfolder\mydata, clear
display

* ---Equivalent to the Subset Function---
list workshop q* if gender=="m"
preserve
keep if gender=="m"
keep workshop q*
save c:\myRfolder\mymalesWQ
list
summarize

* ---Logic for Obs, Names for Vars---
list
gen id = 0
replace id=_n+4
order id workshop q*
save c:\myRfolder\mymaleWQ, replace
list
restore
list
use c:\myRfolder\mymalesWQ, clear
list

* ---Names for Both---
* preserve again so that the restore below has a snapshot to return to
preserve
list
gen id = 0
replace id=_n+4
order id workshop q*
list
restore
list workshop q*

* ---Numeric Indexes for Both---
di gender[1]        // display value of first observation of gender
di q1[1] + q1[4]    // display sum of first and fourth observations of q1
di q1[2] * q2[2]    // display product of second observations of q1 and q2

* ---Saving and Loading Subsets---
use c:\myRfolder\mymalesWQ, clear
keep workshop q*
save, replace
list