SAS Dominates Analytics Job Market; R up 42%


I’m continuing to gather and analyze data to update The Popularity of Data Analysis Software. In this installment I cover the latest employment figures.

Employment is important to us all, so what software skills are employers seeking? A thorough answer to this question would require a time consuming content analysis of job descriptions. However, we can get a rough idea by searching on job advertising sites. Indeed.com is the most popular job search site in the world. As their  CEO and co-founder Paul Forster stated, it includes “all the jobs from over 1,000 unique sources, comprising the major job boards – Monster, Careerbuilder, Hotjobs, Craigslist – as well as hundreds of newspapers, associations, and company websites.” I used a program that went there weekly and searched jobs descriptions for keywords such as “SPSS” or “Minitab.” This was repeated during the 2nd, 3rd and 4th weeks of March in 2012 and 2013. (The data were meant to be for the complete two years, but the automated process went awry.)

The abbreviation “SAS” is common in computer storage, so I avoided those by searching for “SAS !SATA !storage !firmware” (the exclamation point represents a logical “not”). I focused on R while avoiding related topics like “R&D” by using “R SAS” or “SAS R”, including each package in the graph. The data for 2013 are presented in Figure 11.

Figure 11. Mean number of jobs per week available on Indeed.com for each software ( March 2013).
Figure 11. Mean number of jobs per week available on Indeed.com for each software (March 2013) [last label should read “BMDP”].

SAS has a very substantial lead in job openings, with SPSS coming in second with just over a quarter of the jobs. R comes in third place with slightly more than half the jobs available for SPSS. Compared to R or Minitab, SAS has over seven times as many jobs available!

Since 2012, job descriptions that included SAS declined by 961 (7.3%) and those containing Minitab declined by 154 (8.7%). Jobs for R increased by 497 (42%) pushing it past Minitab into third place by a slim margin. In fact, all packages except for SPSS and Systat showed significant though much smaller absolute changes (via Holm-corrected paired t-tests (Table 2). Since these comparisons are based on only three data points in each year, I would not put much stock in most of them, but the 48% increase for R is notable.

Given the extreme dominance of SAS, a data analyst would do well to know it unless he or she was seeking a job in a field in which one of the other packages is dominant.

                  2012      2013   Difference  Ratio
1        SAS     13234     12272      -961      0.93
2       SPSS      3299      3289       -10      1.00
3          R      1196      1693       497      1.42
4    Minitab      1769      1615      -154      0.91
5      Stata       842       898        56      1.07
6        JMP       644       619       -25      0.96
7 Statistica        61        71        10      1.17
8     Systat        14        15         1      1.07
9       BMDP         6        10         3      1.53

Table 2. Number of jobs on Indeed.com that list each software in March of 2012 and 2013. Changes are significant for all software except SPSS and Systat.

Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

[Since this was originally published in 2013, I’ve collected new data that renders this article obsolete. You can always see the most recent data here. -Bob Muenchen]

I recently updated my plots of the data analysis tools used in academia in my ongoing article, The Popularity of Data Analysis Software. I repeat those here and update my previous forecast of data analysis software usage.

Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is likely to predict what the next generation of analysts will use for years to come. As you can see in Fig. 1, the use of most analytic software is growing rapidly in academia. The only one growing slowly, very slowly, is Statistica.

Fig_7b_ScholarlyImpactLittle6
Figure 1. The growth of data analysis packages with SAS and SPSS removed.

While they remain dominant, the use of SAS and SPSS has been declining rapidly in recent years. Figure 2 plots the same data, adding SAS and SPSS and dropping JMP and Statistica (and changing all colors and symbols!)

Fig_7a_ScholarlyImpactBig6
Figure 2. Scholarly use of data analysis software with SAS and SPSS added, JMP and Statistica removed.

Since Google changes its search algorithm, I recollect all the data every year. Last year’s plot (below, Fig. 3) ended with the data from 2011 and contained some notable differences. For SPSS, the 2003 data value is quite a bit lower than the value collected in the current year. If the data were not collected by a computer program, I would suspect a data entry error. In addition, the old 2011 data value in Fig. 3 for SPSS showed a marked slowing in the rate of usage decline. In the 2012 plot (above, Fig. 2), not only does the decline not slow in 2011, but both the 2011 and 2012 points continue the sharp decline of the previous few years.

Figure 3. Scholarly use of data analysis software, collected in 2011. Note how different the SPSS value for 2011 is compared to that in Fig. 2.

Let’s take a more detailed look at what the future may hold for R, SAS and SPSS Statistics.

Here is the data from Google Scholar:

         R   SAS SPSS   Stata
1995     7  9120 7310      24
1996     4  9130 8560      92
1997     9 10600 11400    214
1998    16 11400 17900    333
1999    25 13100 29000    512
2000    51 17300 50500    785
2001   155 20900 78300    969
2002   286 26400 66200   1260
2003   639 36300 43500   1720
2004  1220 45700 156000  2350
2005  2210 55100 171000  2980
2006  3420 60400 169000  3940
2007  5070 61900 167000  4900
2008  7000 63100 155000  6150
2009  9320 60400 136000  7530
2010 11500 52000 109000  8890
2011 13600 44800  74900 10900
2012 17000 33500  49400 14700

ARIMA Forecasting

I forecast the use of R, SAS, SPSS and Stata five years into the future using Rob Hyndman’s forecast package and the default settings of its auto.arima function. The dip in SPSS use in 2002-2003 drove the function a bit crazy as it tried to see a repetitive up-down cycle, so I modeled the SPSS data only from its 2005 peak onward.  Figure 4 shows the resulting predictions.

Forecast
Figure 4. Forecast of scholarly use of the top four data analysis software packages, 2013 through 2017.

The forecast shows R and Stata surpassing SPSS and SAS this year (2013), with Stata coming out on top. It also shows all scholarly use of SPSS and SAS stopping in 2014 and 2015, respectively. Any forecasting book will warn you of the dangers of looking too far beyond the data and above forecast does just that.

Guestimate Forecasting

So what will happen? Each reader probably has his or her own opinion, here’s mine. The growth in R’s use in scholarly work will continue for three more years at which point it will level off at around 25,000 articles in 2015. This growth will be driven by:

  • The continued rapid growth in add-on packages
  • The attraction of R’s powerful language
  • The near monopoly R has on the latest analytic methods
  • Its free price
  • The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (IBM is loosening up on this a bit)

What will slow R’s growth is its lack of a graphical user interface that:

  • Is powerful
  • Is easy to use
  • Provides direct cut/paste access to journal style output in word processor format
  • Is standard, i.e. widely accepted as The One to Use
  • Is open source

While programming has important advantages over GUI use, many people will not take the time needed to learn to program. Therefore they rarely come to fully understand those advantages. Conversely, programmers seldom take the time to fully master a GUI and so often underestimate its full range of capabilities and its speed of use. Regardless of which is best, GUI users far outnumber programmers and, until resolved, this will limit R’s long term growth. There are GUIs for R, but with so many to choose from that none becomes the clear leader (Deducer, R Commander, Rattle, at least two from commercial companies and still more here.) If from this “GUI chaos” a clear leader were to emerge, then R could continue its rapid growth and end up as the most used software.

The use of SAS for scholarly work will continue to decline until it matches R at the 25,000 level. This is caused by competition from R and other packages (notably Stata) but also by SAS Instute’s self-inflicted GUI chaos.  For years they have offered too many GUIs such as SAS/Assist, SAS/Insight, IML/Studio, the Analyst application, Enterprise Guide, Enterprise Miner and  even JMP (which runs SAS nicely in recent versions). Professors looking to meet student demand for greater ease of use are not sure which GUI to teach, so they continue teaching SAS as a programming language. Even now that Enterprise Guide has evolved into a respectable GUI, many SAS users do not know what it is. If SAS Institute were to completely replace their default Display Manager System with Enterprise Guide, they could bend the curve and end up at a higher level of perhaps 27,000.

The use of SPSS for scholarly work will decline less sharply in 2013 and will level off in in 2015 at around 27,000 articles because:

  • Many of the people who needed advanced methods and were not happy calling R functions from within SPSS have already switched to R or Stata
  • Many of the people who like to program and want a more flexible language than SPSS offers have already switched to R or Stata
  • Many of the people who needed more interactive visualization have already switched to JMP

The GUI users will stick with SPSS until a GUI as good (or close to as good) comes to R and becomes widely accepted. At The University of Tennessee where I work, that’s the great majority of SPSS users.

Although Stata is currently the fastest growing package, it’s growth will slow in 2013 and level off by 2015 at around 23,000 articles, leaving it in fourth place. The main cause of this will be inertia of users of the established leaders, SPSS and SAS, as well as the competition from all the other packages, most notably R. R and Stata share many strengths and with one being free, I doubt Stata will be able to beat R in the long run.

The other packages shown in Fig. 1 will also level off around 2015, roughly maintaining their current place in the rankings. A possible exception is JMP, whose interface is radically superior to the the others for exploratory analysis. Its use could continue to grow, perhaps even replacing Stata for fourth place.

The future of SAS Enterprise Miner and IBM SPSS Modeler are tied to the success of each company’s more mainstream products, SAS and SPSS Statistics respectively. Use of those products is generally limited to one university class in data mining, while the other software discussed here is widely used in many classes. Both companies could significantly shift their future by combining their two main GUIs. Imagine a menu & dialog-box system that draws a simple flowchart as you do things. It would be easy to learn and users would quickly get the idea that you could manipulate the flowchart directly, increasing its window size to make more room. The flowchart GUI lets you see the big picture at a glance and lets you re-use the analysis without switching from GUI to programming, as all other GUI methods require. Such a merger could give SAS and SPSS a game-changing edge in this competitive marketplace.

So there you have it: the future of analytics revealed. No doubt each reader has found a wide range of things to disagree with, so I encourage you to do your own forecasts and add links to them in the comment section below. You can use my data or follow the detailed blog at Librestats to collect your own. One thing is certain: the coming decade in the field of analytics will be interesting indeed!

SAS, SPSS, Stata Users: Learn R from Home June 17

R--67

Has learning R been driving you a bit crazy? If so, it may be that you’re “lost in translation.” On June 17 and 19, I’ll be teaching a webinar, R for SAS, SPSS and Stata Users. With each R concept, I’ll introduce it using terminology that you already know,  then translate it into R’s very different view of the world. You’ll be following along, with hands-on practice, so that by the end of the workshop R’s fundamentals should be crystal clear. The examples we’ll do come right out of my books, R for SAS and SPSS Users and R for Stata Users. That way if you need more explanation later or want to dive in more deeply, the book of your choice will be very familiar. Plus, the table of contents and the index contain topics listed by SAS/SPSS/Stata terminology and R terminology so you can use either to find what you need.

A complete outline of the workshop plus a registration link is here. I have no artistic skills, but I’ve always been amazed at what artists can do. I taught this workshop in Knoxville on April 29, and pro photographer Steve Chastain made it look way more exciting than I recall! His view of it is here; turn your speakers up and get ready to boogie!