How to Search for Data Science Jobs

by Robert A. Muenchen

This article describes the technical details of how to search for jobs in the field of data science. The results of the searches are displayed and discussed in The Popularity of Data Analysis Software.

IMPORTANT UPDATE: The protocols described below are significantly simpler than the ones I first posted on 2/20/14. This approach is easier to use and it treats each software package more consistently. While this new method underestimates the actual number of jobs by a factor of roughly 2, it correlates highly with the actual number of jobs (r=.99), so the relative popularity of each package stays essentially the same.

Overview

Here are the steps I use to search at Indeed.com in brief:

1. For software with non-ambiguous names, simply search on their names and add inclusion terms that ensure the jobs are for data science rather than for, say, simple report writing.

2. For software whose names are ambiguous, follow step 1, but also add exclusion terms that remove that ambiguity (i.e. use disambiguation).

Software Names and Disambiguation Terms

Below I list the actual search strings for each software plus disambiguation terms, when needed. When possible, I used the shortest version of a name. For example, there are far more jobs that ask for skills in “SPSS” than for the more official name, “IBM SPSS Statistics”. The shorter version will also capture the longer one.

A space between terms implies a logical “or” so to find a name like “Enterprise Miner”, the quotes are critically important; without them you would get thousands of extra hits that contain either term by itself. The exclamation point excludes a term. For example, the abbreviation “BMDP” is ambiguous because it stands not only for our target, Bio-Medical Data Package, but also Bone Marrow Donor Program. So adding “!marrow” eliminates such jobs.

The C language is a bit tricky since “C” is very ambiguous. However, “C programmer” and “C developer” are not. Note that all variants of C are included except for Objective C, which is used almost exclusively for Apple iPads and iPhones.

Alpine !"Alpine Investors" !"skiing"
Alteryx
Angoss or KnowledgeStudio or KnowledgeSeeker or KnowledgeReader
BMDP !SMDP !BMCP !marrow
("C programmer" or "C programming" or "C developer" or
 "C++" or "C#" !("objective c"))
"Enterprise Miner"
FICO
Hadoop
Infocentricity or Xeno
JMP
Julia (done manually w/ data science terms added)
KNIME
KXEN or InfiniteInsight
Lavastorm
Megaputer or Polyanalyst
Minitab
NCSS
Salford and (SPM or CART or MARS or TreeNet 
    or RandomForests or GPS or RuleLearner or ISLE)
R 
  !"R D" !"A R" !"H R" !"R N" 
  !toys !kids !" R Walgreen" !walmart
  !"HVAC R" !"R Bard" 
RapidMiner
SAS 
  !"system administrator" !"school age" 
  !sata !firmware !scsi !raid 
  !samsung !scandinavian !sonar !nurse
Spotfire
SPSS
"SPSS Modeler"
Stata
Statgraphics
Statistica
Systat
WEKA or Pentaho
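
The pattern used in the list above (a name, followed by exclusion terms prefixed with "!", with quotes around anything that contains a space) is mechanical enough to generate. Here is a minimal Python sketch of that formatting. The function names quote_term and software_query are hypothetical helpers made up for illustration; they are not part of any Indeed.com tool, and the sketch only builds the text of a query, it does not run a search.

```python
from typing import Sequence


def quote_term(term: str) -> str:
    """Wrap a term in double quotes if it contains a space, so it is treated as a phrase."""
    return f'"{term}"' if " " in term else term


def software_query(name: str, exclusions: Sequence[str] = ()) -> str:
    """Build 'name !excl1 !excl2 ...' in the style of the list above (hypothetical helper)."""
    parts = [quote_term(name)]
    parts += ["!" + quote_term(term) for term in exclusions]
    return " ".join(parts)


print(software_query("BMDP", ["SMDP", "BMCP", "marrow"]))
# BMDP !SMDP !BMCP !marrow
print(software_query("Enterprise Miner"))
# "Enterprise Miner"
```

Entries that combine several names with "or", such as the C language or Salford strings, do not fit this simple name-plus-exclusions shape and would still need to be written by hand.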

Especially Ambiguous: SAS

The SAS of interest is, of course, the software from the SAS Institute. The software is focused on performing data science tasks; however, “SAS” as a search term stands for many things:

SAS Shoes
Samsung Austin Semiconductor
School Age Services
Scandinavian Airlines System
SCSI Attached Storage (jobs often include keywords SATA and firmware)
Sedation Agitation Scoring
Senior Analysts (abbreviated SAs, which also spells SAS)
Sexual Assault Services (often advertised as “SAS Nurse”)
Specialist Alarm Services
Student Access to Science
Surgical Admissions Suite
Synthetic Aperture Sonar
System Administrators (abbreviated SAs, which also spells SAS)

and even the British special forces group, Special Air Service. Luckily, the latter do not seem to advertise for jobs on Indeed.com!

Although the above string eliminates over 500 jobs that otherwise would have been counted as relevant SAS jobs, it is almost impossible to prevent a slight over-count for SAS because there are several unavoidable abbreviations. For example, “Senior Analysts” are often referred to as “SAs” in job descriptions (e.g. “…the job will require working with other SAs on a team”) and many of them are analysts who do use SAS software. This is balanced by the fact that a relatively small number of “Systems Administrators” may also be doing data science with SAS, which that search string excludes. The net result is that the above search string probably over-counts relevant SAS jobs, but only by a small proportion.

Especially Ambiguous: R

The search situation for R is much worse than searching for SAS since “R” appears in a vast number of non-analytic job descriptions in categories such as:

R&D = Research and Development
H.R. = Human Resources
A/R = Accounts Receivable

R also appears in the names of companies who place large numbers of job advertisements such as:

Toys R Us (almost 8,000 jobs!)
Kids R Kids
HVAC/R
Walgreens, whose ads point out it was founded by Charles R. Walgreen
C.R. Bard

However, when combined with the terms that define the concept of data science, the relatively short search string shown above does a good job of finding only relevant job listings.

Defining the Concept of Data Science

To determine which search terms best describe the concept of data science, I compiled a list of 34 terms commonly used in data science job descriptions, then I searched for all jobs whose descriptions included them, one at a time. I counted the number of jobs for each term and tracked how likely it was to result in an accurate hit. The latter was done using the time-honored “I know it when I see it” approach. (I’m quite familiar with advanced text analytics, but I don’t have time to extract all the data and do it.)

       Search Terms                         Number of Jobs
 1  Analytics (not well focused)               65,136
 2  Survey (not well focused)                  47,342
 3  Statistical (not well focused)             42,890
 4  Statistics (not well focused)              37,783
 5  Business intelligence (too much reporting) 19,805
 6  Analyze data (not well focused)            13,339
 7  Big Data *                                 10,378
 8  Statistical analysis *                      9,719
 9  Data mining *                               7,776
 10 Data analytics *                            6,209
 11 Machine learning *                          3,658
 12 Quantitative analysis *                     3,365
 13 Research associate (too vague)              3,022
 14 Business analytics *                        2,867
 15 Statistical software *                      2,102
 16 Predictive modeling *                       1,804
 17 Research analyst *                          1,722
 18 Statistician *                              1,711
 19 Predictive analytics *                      1,497
 20 Statistical modeling *                      1,462
 21 Quantitative research                       1,380
 22 Econometric  (specific field)               1,265
 23 Statistical tools                           1,121
 24 Data Scientist * (very well focused)          974
 25 Data Science                                  973
 26 Artificial intelligence                       794
 27 Statistical packages                          559
 28 Survey research                               559
 29 Quantitative modeling                         322
 30 Statistical research                          174
 31 Statistical analyst                           141
 32 Statistical computing                         108
 33 Research computing                             97
 34 Data miner                                     19

I then selected the subset that was most frequently used and which produced focused hits (marked with a “*” above). Notable terms that returned far too many irrelevant hits include: “analytics”, “analyze data”, “business intelligence”, and “statistics”. I moved down the list until my queries became too long for Indeed.com to accept. Here is my resulting search for jobs in the field of data science (in order from most to least widely used). I present this list in query form to make it as easy as possible to use:

("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling"
or "research analyst"
or "statistician"
or "predictive analytics"
or "statistical modeling"
or "data scientist")

This entire search string depends upon the job counts taken on 2/20/14, but given that I’m using the most popular terms, these search criteria should change very slowly over time.
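
The selection rule described above (keep the starred terms, order them by job count, and stop when the query gets too long) can be written down as a short Python sketch. The counts are copied from the table; the function name inclusion_clause and its max_terms parameter are hypothetical. Since the length limit Indeed.com enforces is not stated here, the sketch cuts by number of terms rather than by characters.

```python
# (term, job count on 2/20/14, marked with "*" in the table?)
# Rows below "Data Scientist" are omitted because none of them is starred.
ROWS = [
    ("Analytics", 65136, False),
    ("Survey", 47342, False),
    ("Statistical", 42890, False),
    ("Statistics", 37783, False),
    ("Business intelligence", 19805, False),
    ("Analyze data", 13339, False),
    ("Big Data", 10378, True),
    ("Statistical analysis", 9719, True),
    ("Data mining", 7776, True),
    ("Data analytics", 6209, True),
    ("Machine learning", 3658, True),
    ("Quantitative analysis", 3365, True),
    ("Research associate", 3022, False),
    ("Business analytics", 2867, True),
    ("Statistical software", 2102, True),
    ("Predictive modeling", 1804, True),
    ("Research analyst", 1722, True),
    ("Statistician", 1711, True),
    ("Predictive analytics", 1497, True),
    ("Statistical modeling", 1462, True),
    ("Quantitative research", 1380, False),
    ("Econometric", 1265, False),
    ("Statistical tools", 1121, False),
    ("Data Scientist", 974, True),
]


def inclusion_clause(rows, max_terms=None):
    """Return the starred terms, most popular first, as a quoted 'or' clause."""
    starred = [(term, count) for term, count, focused in rows if focused]
    starred.sort(key=lambda pair: pair[1], reverse=True)
    if max_terms is not None:
        starred = starred[:max_terms]  # drop the least popular terms first
    quoted = [f'"{term.lower()}"' for term, _ in starred]
    return "(" + "\nor ".join(quoted) + ")"


# The nine most popular starred terms, one per line.
print(inclusion_clause(ROWS, max_terms=9))
```

Called without max_terms, the function returns all 14 starred terms in the same order as the query shown above.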

When searching for trends, it’s helpful to be able to list two packages in the same query. Here is a subset that will fit twice in the same query. See Searching for Trends below for details.

("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling")

To search for any particular piece of software, choose the string from the first table and add to it the data science terms above. For example, to search for R jobs, use this complete string:

R 
!"R D" !"A R" !"H R" !"R N" 
!toys !kids !" R Walgreen" 
!walmart !"HVAC R" !"R Bard"
("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling"
or "research analyst"
or "statistician"
or "predictive analytics"
or "statistical modeling"
or "data scientist")

Searching for Trends

Indeed.com has a Job Trends tool that lets you see how jobs are changing across the last several years. You can enter a single search from one of the examples above to see one trend line.

More interestingly, you can enter multiple searches separated by commas to see multiple trend lines. If both packages are easily searched by just their names, then this can be very easy to do. For example, to compare jobs for SPSS and Stata, simply enter the string, “spss, stata”.

However, a job trends search that compares R to SAS is fairly complex. Both have to exclude the irrelevant topics as discussed above in the sections that focus on R and SAS. Since an R query has to include the seek terms that help disambiguate its name, the SAS part of the query must also include those terms. However, the query then becomes too long, so you have to delete seek terms, starting with the least popular ones (those at the bottom of the list), until the query is short enough to run. Here’s the resulting query of R vs. SAS:

R 
!"R D" !"A R" !"H R" !"R N" 
!toys !kids !" R Walgreen" !walmart
!"HVAC R" !"R Bard" 
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics")
,
SAS 
!"system administrator"
!"school age" 
!sata !firmware !scsi !raid
!samsung !scandinavian !sonar !nurse
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics")

With these specific seek terms, this query shows SAS as having only 72% more jobs. The queries that seek R and SAS separately show SAS with a 200% advantage. What’s missing of course are the jobs found with all those other search terms.
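
The trimming procedure used for the R vs. SAS comparison can be sketched in Python as follows. This is only an illustration under assumptions: the function name trend_query is hypothetical, and max_chars stands in for whatever length Indeed.com actually accepts, which is not stated here. The value 500 in the example is arbitrary and should not be read as the real limit.

```python
R_QUERY = ('R !"R D" !"A R" !"H R" !"R N" !toys !kids '
           '!" R Walgreen" !walmart !"HVAC R" !"R Bard"')
SAS_QUERY = ('SAS !"system administrator" !"school age" !sata !firmware '
             '!scsi !raid !samsung !scandinavian !sonar !nurse')

# Data science seek terms, most popular first.
TERMS = ["big data", "statistical analysis", "data mining", "data analytics",
         "machine learning", "quantitative analysis", "business analytics",
         "statistical software", "predictive modeling"]


def trend_query(packages, terms, max_chars):
    """Join package queries with commas, giving each the same seek terms.

    Seek terms are dropped from the bottom of the list (least popular first)
    until the whole query fits in max_chars, an assumed length limit.
    Returns the query text and the list of terms that were kept.
    """
    kept = list(terms)
    while kept:
        clause = "and (" + " or ".join(f'"{t}"' for t in kept) + ")"
        query = ", ".join(f"{pkg} {clause}" for pkg in packages)
        if len(query) <= max_chars:
            return query, kept
        kept.pop()  # remove the least popular remaining term and retry
    raise ValueError("the package strings alone exceed the length limit")


query, kept = trend_query([R_QUERY, SAS_QUERY], TERMS, max_chars=500)
print(len(kept), "seek terms kept")
print(query)
```

Both halves of the comparison always receive the identical set of seek terms, which is what keeps the two trend lines comparable.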

Although simply moving down the data science terms list will get you the most widely used terms, you may want to put more thought into the term selection process. Comparing R to SPSS is a good example. Since the name SPSS does not require exclusion terms, we have room to add inclusion terms. Here’s the query that uses the most popular search terms:

R 
!"R D" !"A R" !"H R" !"R N" 
!toys !kids !" R Walgreen" !walmart
!"HVAC R" !"R Bard" 
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling")
,
SPSS 
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling")

By fascinating coincidence, the next most popular term on the list is “research analyst”. The fact that it was left off has much more impact on the SPSS results than the R ones. That’s because SPSS is heavily used in market research. This is also referred to as “marketing research” in the job descriptions. So here’s a query that shows SPSS has a 276% lead over R in that arena (on 2/22/14):

R 
!"R D" !"A R" !"H R" !"R N" 
!toys !kids !" R Walgreen" !walmart
!"HVAC R" !"R Bard"   
and ("market research" or "marketing research")
,
SPSS and ("market research" or "marketing research")
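
The percentages quoted in this section are easier to compare as ratios. Here is a small Python sketch of the arithmetic, assuming that a "lead" or "advantage" of p% means one package has p% more jobs than the other (that is, the difference divided by the smaller count). The function names are hypothetical, and no job counts are shown because the underlying counts are not listed here.

```python
def percent_lead(count_a: float, count_b: float) -> float:
    """Percent by which count_a exceeds count_b."""
    return (count_a / count_b - 1) * 100


def ratio_from_percent_lead(lead: float) -> float:
    """Convert a percent lead into a ratio of job counts."""
    return 1 + lead / 100


# The three figures quoted in this section, expressed as ratios.
for lead in (72, 200, 276):
    print(f"{lead}% more jobs means {ratio_from_percent_lead(lead):.2f} times as many")
```

Under that assumption, a 72% lead corresponds to 1.72 times as many jobs, a 200% advantage to 3 times as many, and a 276% lead to 3.76 times as many.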

To compare R to Python, I used a query that’s almost identical to the R vs. SPSS one. I simply substituted “Python” where “SPSS” had been.

R 
!"R D" !"A R" !"H R" !"R N" 
!toys !kids !" R Walgreen" !walmart
!"HVAC R" !"R Bard" 
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling")
,
Python 
and ("big data"
or "statistical analysis"
or "data mining"
or "data analytics"
or "machine learning"
or "quantitative analysis"
or "business analytics"
or "statistical software"
or "predictive modeling")

I’m very interested in improving this methodology so if you have ideas, please comment below or send me email at muenchen.bob@gmail.com.

12 Responses to How to Search for Data Science Jobs

  3. Soumya Boral says:

    A very good study indeed. By the way do you use the Advanced Job Search Option on indeed.com or on the ordinary search option .

  4. boral1 says:

    Very good study . By the way are you using the Advanced Job Search to get the numbers or the ordinary search option ?

  6. Thanks Bob – applied to several positions today using this precise methodology… Wonderful!

    • Bob Muenchen says:

      Hi Joe,

      Since my goal was to measure the popularity or market share of analytics tools, it did not even occur to me that people would use this info to actually find jobs. Doh! I’m glad it helped!

      Cheers,
      Bob

  9. ProQuotient says:

    This is an extremely helpful post for people who are looking for jobs in the Data Science industry. Using advanced search is quite effective to find the posts you are looking for. Thank you for sharing this!
