Selecting Variables and Observations

Selecting variables in most statistics packages is very simple. For example, SAS uses VAR Q1-Q4 to select variables q1, q2, q3, and q4. Selecting observations, on the other hand, usually uses logic like GENDER=="F" to select all the females. That logic appears in commands such as WHERE and IF.

R is radically different in that it lets you use many of the same methods to select both variables and observations. For example, you could use logic to select all your numeric variables, and row names (the counterpart of variable names, but for observations) to select observations. This approach is so flexible that our books devote three chapters to its variations. That is too much to discuss on this website, so I’ll present only some basic examples. However, these examples do cover many of the most common selection tasks.
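As a rough illustration of that idea, the sketch below uses logic to pick the numeric variables and row names to pick observations. The small data frame is a made-up stand-in for the practice data, and the names isNumeric, myNumerics, myRows, and myBoth are hypothetical, chosen only for this example.

```r
# Hypothetical stand-in for the practice data; the values are made up.
mydata <- data.frame(
  workshop = c(1, 2, 1, 2),
  gender   = c("f", "f", "m", "m"),
  q1       = c(1, 2, 4, 5),
  q2       = c(1, 1, 5, 4),
  stringsAsFactors = FALSE
)

# Logic selecting variables: TRUE for each numeric column.
isNumeric  <- sapply(mydata, is.numeric)
myNumerics <- mydata[ , isNumeric]

# Row names selecting observations: the default
# row names are the character strings "1", "2", ...
myRows <- mydata[c("3", "4"), ]

# Both kinds of selection in one step.
myBoth <- mydata[c("3", "4"), isNumeric]
print(myBoth)
```

The same square-bracket form accepts a logical vector, a vector of names, or a vector of positions in either slot, which is why one set of methods serves for both variables and observations.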

The example programs below select the males and the variables workshop and q1 through q4, and save them to a new data set called myMalesWQ. The R program calls the print() function explicitly even though printing is the default. This is to show how the selection would look inside other function calls. The R and Stata programs demonstrate this selection in several different ways. The SAS and SPSS programs focus on the main way you would do this in those packages. The practice data set is shown here. The programs and the data they use are also available for download here.

R

setwd("c:/myRfolder")
load(file = "mydata.RData")
attach(mydata)
print(mydata)

# The subset Function

# Select in a function:
print( subset(
  mydata,
  subset = gender=="m",
  select =
    c(workshop, q1:q4)
) )

# Select to a new set:
myMalesWQ <- subset(mydata,
  subset = gender=="m",
  select = c(workshop, q1:q4)
)

print(myMalesWQ)
summary(myMalesWQ)

# Logic for Obs,
# Names for Vars

print(
mydata[
which(gender == "m") ,
  c("workshop", "q1",
    "q2", "q3", "q4")
] )

myMales <-
which(gender == "m")
myVars <-
  c("workshop", "q1", "q2", "q3", "q4")
myVars

print( mydata[myMales, myVars] )

# Row and Variable Names

print( mydata[
  c("5", "6", "7", "8"),
  c("workshop", "q1",
    "q2", "q3", "q4")
] )

myMales <- 
  c("5", "6", "7", "8")
myVars <-
  c("workshop", "q1",
    "q2", "q3", "q4")

print( mydata[myMales, myVars] )

# Numeric Index Vectors

print( mydata[
  c(5, 6, 7, 8),
  c(1, 3, 4, 5, 6) ] )
print(
  mydata[ 5:8,
          c(1, 3:6) ] )

myMales <- c(5,6,7,8)
myVars  <- c(1,3:6)

print(
mydata[myMales, myVars] )

# Saving and
# Loading Subsets

myMalesWQ <- subset(mydata,
  subset = gender == "m",
  select = c(workshop, q1:q4)
)

save(mydata, myMalesWQ, file = "myBoth.RData")
load("myBoth.RData")
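The R program above wraps its logic in which() rather than placing the comparison directly in the brackets. A minimal sketch of one reason that matters, assuming a made-up data frame (here called demo, a hypothetical name) in which one gender value is missing:

```r
# Hypothetical data with one missing gender; the values are made up.
demo <- data.frame(
  gender = c("f", NA, "m", "m"),
  q1     = c(1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# A bare logical index turns the NA comparison
# into an extra row filled with NA values.
bareLogic <- demo[demo$gender == "m", ]

# which() returns only the positions where the
# comparison is TRUE, so the NA case is dropped.
withWhich <- demo[which(demo$gender == "m"), ]

print(bareLogic)
print(withWhich)
```

The subset() function also treats a missing comparison as FALSE, so it drops such rows without needing which().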

SAS

LIBNAME myLib 'C:\myRfolder';
OPTIONS _LAST_=myLib.mydata;

PROC PRINT; VAR workshop q1 q2 q3 q4;
WHERE gender="m";
RUN;

* Creating a data set from selected variables;
DATA myLib.myMalesWQ;
SET myLib.mydata;
WHERE gender="m";
KEEP workshop q1-q4;
RUN;

PROC PRINT DATA=myLib.myMalesWQ; RUN;

SPSS

CD 'c:\myRfolder'.
GET FILE='mydata.sav'.

SELECT IF (gender EQ "m").
LIST workshop q1 TO q4.

SAVE OUTFILE='myMalesWQ.sav'
  /KEEP=workshop q1 TO q4.
EXECUTE.

Stata

use c:\myRfolder\mydata, clear
display

* ---Equivalent to the Subset Function---
list workshop q* if gender=="m"
preserve
keep if gender=="m"
keep workshop q*
save c:\myRfolder\mymalesWQ
list
summarize

* ---Logic for Obs, Names for Vars---
list
gen id = 0
replace id=_n+4
order id workshop q*
save c:\myRfolder\mymaleWQ, replace
list
restore
list
use c:\myRfolder\mymalesWQ, clear
list

* ---Names for Both---
* preserve again so the restore below has a copy to return to
preserve
list
gen id = 0
replace id=_n+4
order id workshop q*
list
restore
list workshop q*

* ---Numeric Indexes for Both---
di gender[1] // display the value of the first observation of gender
di q1[1] + q1[4] // display sum of first and fourth observations of q1
di q1[2] * q2[2] // display product of second observations of q1 and q2

* ---Saving and Loading Subsets---
use c:\myRfolder\mymalesWQ, clear
keep workshop q*
save, replace
list
