*by Robert A. Muenchen*

This article describes the technical details of how to search for jobs in the field of data science. The results of the searches are displayed and discussed in The Popularity of Data Analysis Software.

**IMPORTANT UPDATE:** The protocol described below has been significantly simplified since I first posted it on 2/20/14. This approach is easier to use and it treats each software package more consistently. While this new method *underestimates* the actual number of jobs by a factor of roughly 2, it correlates highly with the actual number of jobs (r=.99). This ensures that the relative popularity of each software package is the same.

**Overview**

In brief, here are the steps I use to search at Indeed.com:

1. For software with non-ambiguous names, simply search on their names and add inclusion terms that ensure the jobs are for data science rather than for, say, simple report writing.

2. For software whose names are ambiguous, follow step 1, but also add exclusion terms that remove that ambiguity (i.e. use disambiguation).

**Software Names and Disambiguation Terms**

Below I list the actual search strings for each software plus disambiguation terms, when needed. When possible, I used the shortest version of a name. For example, there are far more jobs that ask for skills in “SPSS” than for the more official name, “IBM SPSS Statistics”. The shorter version will also capture the longer one.

A space between terms implies a logical “or” so to find a name like “Enterprise Miner”, the quotes are critically important; without them you would get thousands of extra hits that contain either term by itself. The exclamation point excludes a term. For example, the abbreviation “BMDP” is ambiguous because it stands not only for our target, Bio-Medical Data Package, but also Bone Marrow Donor Program. So adding “!marrow” eliminates such jobs.

The C language is a bit tricky since “C” is very ambiguous. However, “C programmer” and “C developer” are not. Note that all variants of C are included except for Objective C, which is used almost exclusively for Apple iPads and iPhones.

Alpine !"Alpine Investors" !"skiing"

Alteryx

Angoss or KnowledgeStudio or KnowledgeSeeker or KnowledgeReader

BMDP !SMDP !BMCP !marrow

("C programmer" or "C programming" or "C developer" or "C++" or "C#" !("objective c"))

"Enterprise Miner"

FICO

Hadoop

Infocentricity or Xeno

JMP

Julia (done manually w/ data science terms added)

KNIME

KXEN or InfiniteInsight

Lavastorm

Megaputer or Polyanalyst

Minitab

NCSS

Salford and (SPM or CART or MARS or TreeNet or RandomForests or GPS or RuleLearner or ISLE)

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard"

RapidMiner

SAS !"system administrator" !"school age" !sata !firmware !scsi !raid !samsung !scandinavian !sonar !nurse

Spotfire

SPSS

"SPSS Modeler"

Stata

Statgraphics

Statistica

Systat

WEKA or Pentaho
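The quoting and exclusion rules described above can be sketched in a few lines of Python. This is only an illustration of how the strings are put together: `quote_if_needed` and `build_search_string` are hypothetical helper names, not part of any Indeed.com tool, and the sketch assumes that only multi-word terms need quotes.

```python
def quote_if_needed(term):
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_search_string(name, exclusions):
    """Join a software name with its !-prefixed exclusion terms."""
    parts = [quote_if_needed(name)]
    parts += ["!" + quote_if_needed(term) for term in exclusions]
    return " ".join(parts)


# Examples taken from the list of search strings above.
print(build_search_string("BMDP", ["SMDP", "BMCP", "marrow"]))
# BMDP !SMDP !BMCP !marrow
print(build_search_string("Enterprise Miner", []))
# "Enterprise Miner"
```

Strings that combine several names with "or", such as the one for Angoss, would need an extra step that this sketch leaves out.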

**Especially Ambiguous: SAS**

The SAS of interest is, of course, the software from the SAS Institute. The software is focused on performing data science tasks; however, “SAS” as a search term stands for many things:

SAS Shoes

Samsung Austin Semiconductor

School Age Services

Scandinavian Airlines System

SCSI Attached Storage (jobs often include keywords SATA and firmware)

Sedation Agitation Scoring

Senior Analysts (abbreviated SAs, which also spells SAS)

Sexual Assault Services (often advertised as “SAS Nurse”)

Specialist Alarm Services

Student Access to Science

Surgical Admissions Suite

Synthetic Aperture Sonar

System Administrators (abbreviated SAs, which also spells SAS)

and even the British special forces group, Special Air Service. Luckily, the latter do not seem to advertise for jobs on Indeed.com!

Although the above string eliminates over 500 jobs that otherwise would have been counted as relevant SAS jobs, it is almost impossible to prevent a slight over-count for SAS because there are several unavoidable abbreviations. For example, “Senior Analysts” are often referred to as “SAs” in job descriptions (e.g. “…the job will require working with other SAs on a team”) and many of them are analysts who do use SAS software. This is balanced by the fact that a relatively small number of “Systems Administrators” may also be doing data science with SAS, which that search string excludes. The net result is that the above search string probably over-counts relevant SAS jobs, but only by a small proportion.

**Especially Ambiguous: R**

The search situation for R is much worse than searching for SAS since “R” appears in a *vast* number of non-analytic job descriptions in categories such as:

R&D = Research and Development

H.R. = Human Resources

A/R = Accounts Receivable

R also appears in the names of companies who place large numbers of job advertisements such as:

Toys R Us (almost 8,000 jobs!)

Kids R Kids

HVAC/R

Walgreens, whose ads point out it was founded by Charles R. Walgreen

C.R. Bard

However, when combined with the terms that define the concept of data science, the relatively short search string shown above does a good job of finding only relevant job listings.

**Defining the Concept of Data Science**

To determine which search terms best describe the concept of data science, I compiled a list of 34 terms commonly used in data science job descriptions, then I searched for all jobs whose descriptions included them, one at a time. I counted the number of jobs for each term and tracked how likely it was to result in an accurate hit. The latter was done using the time-honored “I know it when I see it” approach. (I’m quite familiar with advanced text analytics, but I don’t have time to extract all the data and do it.)

| Rank | Search Terms | Number of Jobs |
|---|---|---|
| 1 | Analytics (not well focused) | 65,136 |
| 2 | Survey (not well focused) | 47,342 |
| 3 | Statistical (not well focused) | 42,890 |
| 4 | Statistics (not well focused) | 37,783 |
| 5 | Business intelligence (too much reporting) | 19,805 |
| 6 | Analyze data (not well focused) | 13,339 |
| 7 | Big Data * | 10,378 |
| 8 | Statistical analysis * | 9,719 |
| 9 | Data mining * | 7,776 |
| 10 | Data analytics * | 6,209 |
| 11 | Machine learning * | 3,658 |
| 12 | Quantitative analysis * | 3,365 |
| 13 | Research associate (too vague) | 3,022 |
| 14 | Business analytics * | 2,867 |
| 15 | Statistical software * | 2,102 |
| 16 | Predictive modeling * | 1,804 |
| 17 | Research analyst * | 1,722 |
| 18 | Statistician * | 1,711 |
| 19 | Predictive analytics * | 1,497 |
| 20 | Statistical modeling * | 1,462 |
| 21 | Quantitative research | 1,380 |
| 22 | Econometric (specific field) | 1,265 |
| 23 | Statistical tools | 1,121 |
| 24 | Data Scientist * (very well focused) | 974 |
| 25 | Data Science | 973 |
| 26 | Artificial intelligence | 794 |
| 27 | Statistical packages | 559 |
| 28 | Survey research | 559 |
| 29 | Quantitative modeling | 322 |
| 30 | Statistical research | 174 |
| 31 | Statistical analyst | 141 |
| 32 | Statistical computing | 108 |
| 33 | Research computing | 97 |
| 34 | Data miner | 19 |

I then selected the subset that was most frequently used and which produced focused hits (marked with a “*” above). Some notable terms that included far too many irrelevant hits are “analytics”, “analyze data”, “business intelligence”, and “statistics”. I moved down the list until my queries became too long for Indeed.com to accept. Here is my resulting search for jobs in the field of data science (in order from most to least widely used). I present this list in query form to make it as easy as possible to use:

("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling" or "research analyst" or "statistician" or "predictive analytics" or "statistical modeling" or "data scientist")

This entire search string depends upon the job counts taken on 2/20/14, but given that I’m using the most popular terms, these search criteria should change very slowly over time.
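The selection process described above (keep the focused terms, order them by job count, stop when the query gets too long) can be sketched as follows. The rows are a small sample of the table above. The `max_chars` limit is a placeholder, since I have not stated the exact length at which Indeed.com rejects a query, and the function names are hypothetical.

```python
# (term, job count on 2/20/14, produced focused hits?) for a few rows of the table.
TERMS = [
    ("analytics", 65136, False),
    ("big data", 10378, True),
    ("statistical analysis", 9719, True),
    ("data mining", 7776, True),
    ("machine learning", 3658, True),
    ("research associate", 3022, False),
]


def or_group(terms):
    """Render terms as a parenthesized group of quoted phrases joined by "or"."""
    return "(" + " or ".join(f'"{t}"' for t in terms) + ")"


def select_terms(rows, max_chars):
    """Take focused terms, most popular first, while the query still fits."""
    focused = sorted((r for r in rows if r[2]), key=lambda r: r[1], reverse=True)
    chosen = []
    for term, _count, _is_focused in focused:
        if len(or_group(chosen + [term])) > max_chars:
            break  # adding this term would make the query too long
        chosen.append(term)
    return chosen


# max_chars=60 is an arbitrary value chosen only to show the cutoff.
print(or_group(select_terms(TERMS, max_chars=60)))
# ("big data" or "statistical analysis" or "data mining")
```

The "focused" flag still comes from reading the job descriptions by hand; the code only automates the ordering and the length cutoff.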

When searching for trends, it’s helpful to be able to list two packages in the same query. Here is a subset that will fit twice in the same query. See *Searching for Trends* below for details.

("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling")

To search for any particular piece of software, choose the string from the first table and add to it the data science terms above. For example, to search for R jobs, use this complete string:

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard" ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling" or "research analyst" or "statistician" or "predictive analytics" or "statistical modeling" or "data scientist")

**Searching for Trends**

Indeed.com has a Job Trends tool that lets you see how jobs are changing across the last several years. You can enter a single search from one of the examples above to see one trend line.

More interestingly, you can enter multiple searches *separated by commas* to see multiple trend lines. If both packages are easily searched by just their names, then this can be very easy to do. For example, to compare jobs for SPSS and Stata, simply enter the string, “spss, stata”.

However, a job trends search that compares R to SAS is fairly complex. Both have to exclude the irrelevant topics as discussed above in the sections that focus on R and SAS. Since an R query has to include its seek terms that help disambiguate its name, the SAS part of the query must also include those terms. However, the query becomes too long so you have to then start deleting the seek terms starting with the least popular ones (those at the bottom of the list) until the query is short enough to run. Here’s the resulting query of R vs. SAS:

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard" and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics") , SAS !"system administrator" !"school age" !sata !firmware !scsi !raid !samsung !scandinavian !sonar !nurse and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics")

With these specific seek terms, this query shows SAS as having only 72% more jobs. The queries that seek R and SAS separately show SAS with a 200% advantage. What’s missing of course are the jobs found with all those other search terms.
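For readers who want to reproduce these percentages from raw job counts, here is a minimal sketch. It assumes that "X% more jobs" means the difference in counts divided by the smaller package's count. The function name is hypothetical and the counts in the example are invented round numbers, not actual Indeed.com results.

```python
def percent_lead(leader_jobs, other_jobs):
    """Percentage by which leader_jobs exceeds other_jobs."""
    if other_jobs <= 0:
        raise ValueError("other_jobs must be positive")
    return 100 * (leader_jobs - other_jobs) / other_jobs


# Hypothetical counts, used only to illustrate the arithmetic.
print(percent_lead(172, 100))  # 72.0
print(percent_lead(300, 100))  # 200.0
```

Under this reading, a 200% advantage means three times as many jobs, not twice as many.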

Although simply moving down the data science terms list will get you the most widely used terms, you may want to put more thought into the term selection process. Comparing R to SPSS is a good example. Since the name SPSS does not require exclusion terms, we have room to add inclusion terms. Here’s the query that uses the most popular search terms:

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard" and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling") , SPSS and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling")

By fascinating coincidence, the next most popular term on the list is “research analyst”. The fact that it was left off has much more impact on the SPSS results than the R ones. That’s because SPSS is heavily used in market research. This is also referred to as “marketing research” in the job descriptions. So here’s a query that shows SPSS has a 276% lead over R in that arena (on 2/22/14):

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard" and ("market research" or "marketing research") , SPSS and ("market research" or "marketing research")

To compare R to Python, I used a query that’s almost identical to the R vs. SPSS one. I simply substituted “Python” where “SPSS” had been.

R !"R D" !"A R" !"H R" !"R N" !toys !kids !" R Walgreen" !walmart !"HVAC R" !"R Bard" and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling") , Python and ("big data" or "statistical analysis" or "data mining" or "data analytics" or "machine learning" or "quantitative analysis" or "business analytics" or "statistical software" or "predictive modeling")
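All of the trend queries in this section follow the same pattern: each software string is joined to the same group of inclusion terms with "and", the pieces are separated by commas, and the least popular inclusion terms are dropped until the whole query fits. A rough sketch of that pattern is below. The function names are hypothetical and `max_chars` is a placeholder, since the actual length limit of the Job Trends tool is not stated here.

```python
def or_group(terms):
    """Render terms as a parenthesized group of quoted phrases joined by "or"."""
    return "(" + " or ".join(f'"{t}"' for t in terms) + ")"


def trend_query(software_strings, terms, max_chars):
    """Build a comma-separated trends query.

    `terms` must be ordered from most to least popular; terms are dropped
    from the end of the list until the query is at most `max_chars` long.
    """
    kept = list(terms)
    while kept:
        query = " , ".join(f"{s} and {or_group(kept)}" for s in software_strings)
        if len(query) <= max_chars:
            return query
        kept.pop()  # drop the least popular remaining term
    raise ValueError("query does not fit even with a single inclusion term")


# max_chars=50 is an arbitrary value chosen only to force one term to be dropped.
print(trend_query(["SPSS", "Stata"], ["big data", "data mining"], max_chars=50))
# SPSS and ("big data") , Stata and ("big data")
```

Because both halves of the query share one term list, the comparison stays fair: neither package gets inclusion terms that the other lacks.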

I’m very interested in improving this methodology so if you have ideas, please comment below or send me email at muenchen.bob@gmail.com.

A very good study indeed. By the way, do you use the Advanced Job Search option on Indeed.com or the ordinary search option?

Hi Soumya,

I used the standard search. My searches were so specific that I did not find the advanced search added any capabilities that I needed.

Cheers,

Bob

Thanks Bob – applied to several positions today using this precise methodology… Wonderful!

Hi Joe,

Since my goal was to measure the popularity or market share of analytics tools, it did not even occur to me that people would use this info to actually find jobs. Doh! I’m glad it helped!

Cheers,

Bob

This is an extremely helpful post for people who are looking for jobs in the Data Science industry. Using advanced search is quite effective to find the posts you are looking for. Thank you for sharing this!

Hi ProQuotient,

I’m glad you’re finding the site useful!

Cheers,

Bob