Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

[Since this was originally published in 2013, I’ve collected new data that renders this article obsolete. You can always see the most recent data here. -Bob Muenchen]

I recently updated my plots of the data analysis tools used in academia in my ongoing article, The Popularity of Data Analysis Software. I repeat those here and update my previous forecast of data analysis software usage.

Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is likely to predict what the next generation of analysts will use for years to come. As you can see in Fig. 1, the use of most analytic software is growing rapidly in academia. The only one growing slowly, very slowly, is Statistica.


Figure 1. The growth of data analysis packages with SAS and SPSS removed.

While they remain dominant, the use of SAS and SPSS has been declining rapidly in recent years. Figure 2 plots the same data, adding SAS and SPSS and dropping JMP and Statistica (and changing all colors and symbols!).


Figure 2. Scholarly use of data analysis software with SAS and SPSS added, JMP and Statistica removed.

Because Google changes its search algorithm, I re-collect all the data every year. Last year’s plot (below, Fig. 3) ended with the data from 2011 and contained some notable differences. For SPSS, the 2003 data value is quite a bit lower than the value collected in the current year. If the data were not collected by a computer program, I would suspect a data entry error. In addition, the old 2011 data value in Fig. 3 for SPSS showed a marked slowing in the rate of usage decline. In the 2012 plot (above, Fig. 2), not only does the decline not slow in 2011, but both the 2011 and 2012 points continue the sharp decline of the previous few years.

Figure 3. Scholarly use of data analysis software, collected in 2011. Note how different the SPSS value for 2011 is compared to that in Fig. 2.

Let’s take a more detailed look at what the future may hold for R, SAS and SPSS Statistics.

Here is the data from Google Scholar:

         R   SAS   SPSS Stata
1995     7  9120   7310    24
1996     4  9130   8560    92
1997     9 10600  11400   214
1998    16 11400  17900   333
1999    25 13100  29000   512
2000    51 17300  50500   785
2001   155 20900  78300   969
2002   286 26400  66200  1260
2003   639 36300  43500  1720
2004  1220 45700 156000  2350
2005  2210 55100 171000  2980
2006  3420 60400 169000  3940
2007  5070 61900 167000  4900
2008  7000 63100 155000  6150
2009  9320 60400 136000  7530
2010 11500 52000 109000  8890
2011 13600 44800  74900 10900
2012 17000 33500  49400 14700
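If you want to reproduce the plots or forecasts, the table above can be pasted straight into R. This is a minimal sketch; the column layout matches the listing above.

```r
# Load the Google Scholar hit counts from this post into a data frame.
counts <- read.table(header = TRUE, text = "
year     R   SAS   SPSS Stata
1995     7  9120   7310    24
1996     4  9130   8560    92
1997     9 10600  11400   214
1998    16 11400  17900   333
1999    25 13100  29000   512
2000    51 17300  50500   785
2001   155 20900  78300   969
2002   286 26400  66200  1260
2003   639 36300  43500  1720
2004  1220 45700 156000  2350
2005  2210 55100 171000  2980
2006  3420 60400 169000  3940
2007  5070 61900 167000  4900
2008  7000 63100 155000  6150
2009  9320 60400 136000  7530
2010 11500 52000 109000  8890
2011 13600 44800  74900 10900
2012 17000 33500  49400 14700
")

# Which package had the most hits in the latest year?
latest <- counts[counts$year == max(counts$year), ]
names(which.max(unlist(latest[-1])))   # SPSS, despite its steep decline
```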

ARIMA Forecasting

I forecast the use of R, SAS, SPSS and Stata five years into the future using Rob Hyndman’s forecast package and the default settings of its auto.arima function. The dip in SPSS use in 2002-2003 drove the function a bit crazy as it tried to see a repetitive up-down cycle, so I modeled the SPSS data only from its 2005 peak onward. Figure 4 shows the resulting predictions.
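For the curious, the fitting step looks roughly like this. Note that this sketch uses base R’s stats::arima() with a hand-picked ARIMA(0,2,0) order (an assumption, chosen so the example runs without add-on packages); the model auto.arima() actually selected may well differ.

```r
# Scholarly hits for R, 1995-2012, from the table above.
r_hits <- ts(c(7, 4, 9, 16, 25, 51, 155, 286, 639, 1220, 2210, 3420,
               5070, 7000, 9320, 11500, 13600, 17000), start = 1995)

# Stand-in model: ARIMA(0,2,0), i.e. a random walk in the first
# differences, which extrapolates the recent linear trend.
# auto.arima() searches over many such orders automatically.
fit <- arima(r_hits, order = c(0, 2, 0))
fc  <- predict(fit, n.ahead = 5)   # forecasts for 2013-2017
round(fc$pred)

# With Rob Hyndman's package installed, the call used in the post is:
# library(forecast)
# plot(forecast(auto.arima(r_hits), h = 5))
```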


Figure 4. Forecast of scholarly use of the top four data analysis software packages, 2013 through 2017.

The forecast shows R and Stata surpassing SPSS and SAS this year (2013), with Stata coming out on top. It also shows all scholarly use of SPSS and SAS stopping in 2014 and 2015, respectively. Any forecasting book will warn you of the dangers of looking too far beyond the data, and the above forecast does just that.

Guesstimate Forecasting

So what will happen? Each reader probably has his or her own opinion; here’s mine. The growth in R’s use in scholarly work will continue for three more years, at which point it will level off at around 25,000 articles in 2015. This growth will be driven by:

  • The continued rapid growth in add-on packages
  • The attraction of R’s powerful language
  • The near monopoly R has on the latest analytic methods
  • Its free price
  • The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (IBM is loosening up on this a bit)

What will slow R’s growth is its lack of a graphical user interface that:

  • Is powerful
  • Is easy to use
  • Provides direct cut/paste access to journal style output in word processor format
  • Is standard, i.e. widely accepted as The One to Use
  • Is open source

While programming has important advantages over GUI use, many people will not take the time needed to learn to program, and therefore rarely come to fully understand those advantages. Conversely, programmers seldom take the time to fully master a GUI and so often underestimate its full range of capabilities and its speed of use. Regardless of which is best, GUI users far outnumber programmers, and until this is resolved it will limit R’s long-term growth. There are GUIs for R, but with so many to choose from, none has become the clear leader (Deducer, R Commander, Rattle, at least two from commercial companies, and still more here). If from this “GUI chaos” a clear leader were to emerge, then R could continue its rapid growth and end up as the most used software.

The use of SAS for scholarly work will continue to decline until it matches R at the 25,000 level. This is caused by competition from R and other packages (notably Stata), but also by SAS Institute’s self-inflicted GUI chaos. For years they have offered too many GUIs, such as SAS/Assist, SAS/Insight, IML/Studio, the Analyst application, Enterprise Guide, Enterprise Miner, and even JMP (which runs SAS nicely in recent versions). Professors looking to meet student demand for greater ease of use are not sure which GUI to teach, so they continue teaching SAS as a programming language. Even now that Enterprise Guide has evolved into a respectable GUI, many SAS users do not know what it is. If SAS Institute were to completely replace their default Display Manager System with Enterprise Guide, they could bend the curve and end up at a higher level of perhaps 27,000.

The use of SPSS for scholarly work will decline less sharply in 2013 and will level off in 2015 at around 27,000 articles because:

  • Many of the people who needed advanced methods and were not happy calling R functions from within SPSS have already switched to R or Stata
  • Many of the people who like to program and want a more flexible language than SPSS offers have already switched to R or Stata
  • Many of the people who needed more interactive visualization have already switched to JMP

The GUI users will stick with SPSS until a GUI as good (or close to as good) comes to R and becomes widely accepted. At The University of Tennessee where I work, that’s the great majority of SPSS users.

Although Stata is currently the fastest growing package, its growth will slow in 2013 and level off by 2015 at around 23,000 articles, leaving it in fourth place. The main cause of this will be the inertia of users of the established leaders, SPSS and SAS, as well as competition from all the other packages, most notably R. R and Stata share many strengths, and with one being free, I doubt Stata will be able to beat R in the long run.

The other packages shown in Fig. 1 will also level off around 2015, roughly maintaining their current place in the rankings. A possible exception is JMP, whose interface is radically superior to the others for exploratory analysis. Its use could continue to grow, perhaps even replacing Stata in fourth place.

The future of SAS Enterprise Miner and IBM SPSS Modeler is tied to the success of each company’s more mainstream product, SAS and SPSS Statistics respectively. Use of those products is generally limited to one university class in data mining, while the other software discussed here is widely used in many classes. Both companies could significantly shift their future by combining their two main GUIs. Imagine a menu-and-dialog-box system that draws a simple flowchart as you work. It would be easy to learn, and users would quickly get the idea that they could manipulate the flowchart directly, increasing its window size to make more room. The flowchart GUI lets you see the big picture at a glance and lets you re-use the analysis without switching from GUI to programming, as all other GUI approaches require. Such a merger could give SAS and SPSS a game-changing edge in this competitive marketplace.

So there you have it: the future of analytics revealed. No doubt each reader has found a wide range of things to disagree with, so I encourage you to do your own forecasts and add links to them in the comment section below. You can use my data or follow the detailed blog at Librestats to collect your own. One thing is certain: the coming decade in the field of analytics will be interesting indeed!

This entry was posted in Analytics, R, SAS, SPSS, Statistics. Bookmark the permalink.

136 Responses to Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

  1. Paul A. Thompson says:

    The problem with R is that it is not validated. I cannot imagine a pharma going from SAS to R. In addition, the mixed tools in R seem to be in a continual state of development, and sometimes people note that the current versions have different results. I am a SAS user, and learned it in 1975, so I am not on the cutting edge. However, until R can 1) demonstrate that the system is dependable and 2) be accepted by the FDA for an NDA, you will see SAS used. In 20 years? Dunno about that.

    • Not wholly true; see the document on the R web site, “R: Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments”.

      R is accepted by the FDA, in fact my understanding is that they do not explicitly state any software should be used, merely that whatever should be used is validated (there has been various discussion on this matter on the R-help mailing list over the years and more recently on the MedStats mailing list, search the archives of each if you want to know more).

    • R is already used for submission-related work in big pharma, and we have set up validated R servers for this purpose for our customers, so if there is to be a transition, it has already started.

    • ucfagls says:

      Utter FUD! R already is in use by Pharma and is acceptable to the FDA; they don’t mandate *any* software to the best of my knowledge, all have to be validated. R has a compliance statement in that regard.

      It is somewhat sad that you have this backwards. How can you know that SAS is doing what it says – it is closed source. R is completely open so you or anyone can check what it is doing and validate it. Something that many people do all the time.

    • ph says:

      I would like to point out one thing that often goes unstated in these R vs. SAS testing discussions: yes, R is tested, and I can certainly write my own tests with it to convince myself that my specific routine works correctly, BUT SAS guarantees numerical precision across different chip sets and OSes. This is very difficult, tedious work that I’m not sure is done in CRAN R.

      Has anyone ever compared a numerical routine in R, especially one requiring numerical derivatives, across say 32-bit Windows and 64-bit Linux? If you get different numbers, which numbers are right?

      • Bob Muenchen says:

        Hi ph,

        I handle the research software contracts for The University of Tennessee and I’ve read each new contract with SAS Institute. I’ve never seen them guarantee the accuracy of their software in writing. On the other hand, I think they’re a very trustworthy company that goes to great lengths to ensure the accuracy of their products. The main R download is also tested very carefully for accuracy and it has been found to be quite accurate by independent researchers (see http://www.r‐‐FDA.pdf, A comparative study of the reliability of nine statistical software packages, Keeling & Pavur; The reliability of statistical functions in four software packages freely used in numerical computation, Almiron et al). R calls LAPACK, the same set of subroutines used by two commercial packages that are highly regarded for their accuracy: MATLAB and Maple.

        Where accuracy becomes more of a concern is with add-on packages. As with a SAS Macro presented at the Global Forum and downloadable from the author’s web site, you have to decide if you trust the source or if you’re going to test it yourself against known solutions.


      • Might be a bit late to the game, but I’d like to also note that Stata also makes fairly extensive use of LAPACK. Stata seems to get left out of many of these conversations. Personally, I think there is room for them all to coexist. The biggest issue I see with SAS and SPSS is that they are cost-prohibitive for smaller organizations and in some cases inaccessible to individuals. Their language/style conventions also don’t seem to say much about the respective organizations’ willingness to move toward more modern semantic paradigms. There are things that I am really starting to enjoy about R and things that I still find infinitely easier in Stata. In general, it seems that discussions like this tend to just create tension between each platform’s fanboys, so to speak. Instead, maybe more fruitful conversations could come from discussing the various strengths of each platform.

        • Bob Muenchen says:

          Hi Billy,

          Discussing the pros and cons of the various packages does tend to be about as contentious as comparing religions! That’s why for the most part I focus on measuring popularity or market share. At the moment R and Stata are the two most rapidly growing analytics packages used in scholarly articles. I think that’s due to the fact that both are extremely extensible.


    • biostat says:

      Please, tell that to Amgen, Merck, Pfizer, Novartis, the FDA (yes!) and others 🙂 R is not validated? R is perfectly validated by more than 2 million users having the ability to look into the code. Don’t even try to spread the panic. And please don’t write “dunno”; just check the facts before you post any comment if you want to be seen as a professional.

      • Peter B. says:

        Did you know that every thematic section of the R repository has its own academic supervisor? Many of them come from prestigious academic sites (Princeton, Stanford). I trust them more than closed sources and claims that “they do their best”. I have been working with R for the last 8 years and have always validated results against SAS and STATA. I have not discovered any dramatic discrepancies. It means that both SAS and R are well done. With the difference that I can go to CRAN, download any package I want, unpack it and look into the code (R, FORTRAN and C alike). Most authors of packages I use include references to the literature, and I can compare the written and implemented formulas. And I did this 3-4 times – just out of curiosity how it is done – and found no issues. Many people working in clinical research use R in validated environments, having their results validated against other packages following SOPs. And this is excellent, indirect validation of the quality provided by R. Keep your head cool, don’t get mad – the FDA does NOT endorse or require any specific software! The software you use must be compliant with the 21 CFR Part 11 guidance. And regarding R, there is a document covering this topic. Thousands of biostatisticians use R, and great, well-known pharmaceutical companies use R in their submissions. SAS is not the only package you can make use of.

        • Bob Muenchen says:

          Hi Peter,

          Thanks for reporting on your experience comparing the accuracy of R to other software. When you mention “academic supervisors” are you referring to what CRAN labels the “maintainers” of the Task Views section? I wasn’t under the impression that they did any testing on the vast array of packages in their task views, but if you know of documentation that says otherwise, please send it my way.


  2. I thought we already had a winner in R GUIs called R-Studio. I took two data analysis courses in coursera and they used and recommended all students use R-Studio.

    • wrathematics says:

      When bob refers to a gui, he’s talking about something more like SPSS than Rstudio. Rstudio has its place, but it still requires R scripting. His dream is to have an easier to install and more functional version of deducer.

  3. While R-Studio looks like a beautiful IDE, it isn’t a GUI application for business analysts who won’t/can’t invest in learning R as a language.

    By GUI, they mean something like SAS Enterprise Guide or Rattle. Most business analysts I know have never learned an object-oriented language, they might pick up bits of Visual Basic and the most popular language they learn is SQL.

    R-Studio, IDE:
    SAS EG, modern GUI:
    Rattle, old-school GUI:

  4. Roland E Andersson says:

    How can we explain that the total number of hits for all the packages shows such a peak in 2005? Do I understand correctly that the calculations are done on publications that report which software was used? Maybe this practice is changing?

    • ucfagls says:

      If anything I would have thought that the opposite is true; that we are becoming more concerned with reproducibility. That said, I do know journals in some fields that instruct authors not to bother stating what stats software they used – for shame!

      • Bob Muenchen says:

        Hi ucfagls,

        I agree completely. I hope the push toward more reproducible research starts getting journals to require stating the software complete with version and code used. I frequently get researchers coming in to do an analysis “just like they did in this article” but it’s impossible to tell WHAT they did. Luckily most authors are happy to explain when I write them.


    • Bob Muenchen says:

      Hi Roland,

      That’s a question that has mystified me for several years. Other than SAS and SPSS, Systat was #3 during the “hump” years but it did not show that hump. I added all the packages together to see if it smoothed things out. It did not; the plot looked much like SPSS by itself, just at a higher level.

      I’ve asked quite a few people about this, including the people in charge of SPSS development, and no one is quite sure why there’s such a radical shift.

      One possibility for the drop off is that the recession cut government grant funding sharply. But you’d have to assume that for some reason it affected SAS and SPSS but not the other packages. That might be caused by the fact that more established researchers A) got more grants (and so more cuts) and B) being older they also used the older packages.

      Another possibility is the retirement of the baby boomers. If they tended to use the packages that have been around longer, then SPSS and SAS would have been disproportionately affected.


      • I work as a researcher for a government research foundation (in Brussels). We have cut our number of SPSS licences in half. Add-on modules were all reviewed. If we were not sure that a module would be used in the coming year by a certain person, we skipped it. The days of ‘let’s get one for him/her too’ to be on the safe side have turned into ‘let’s skip it for him/her’ to be on the safe side. Moreover, funding for data collection has also been skimmed, so there’s less data-driven research.

        • Bob Muenchen says:

          Hi Hendrik,

          I’ll bet sales of add-on modules for SAS and SPSS are being closely examined in most organizations. It’s often easy to do 90% of your work in the main commercial product and then occasionally call an R package to do something specialized. That approach can save a lot of money without having to retrain the whole staff in R all at once. However, having less data-driven research in general runs counter to overall trends. I never thought I’d see the day when the state of statistical analysis (excuse me, that’s “analytics” now) was routinely discussed in the general press.


  5. I am not arguing with your analysis, but I think we’ll have to wait a very long time before SAS loses its strength in the analytics market. The reason behind this is simple: legacy systems and legacy code – SAS has been here for more than 30 years – and it is cheaper to upgrade current systems (not necessarily in money but in time) with new SAS products than to replace them with systems based on R, for example. Did you see SAS’s profit last year?

    Academia may not be the best predictor of the end of SAS (perhaps of SPSS 😉).

    • Bob Muenchen says:

      Hi Alberto,

      I agree, the momentum of SAS and SPSS will be slow to turn. In academia old code is helpful to have around for new projects, but it’s nowhere near as important as hundreds or thousands of SAS reports in industry. Even after scholarly work reaches some sort of long-term equilibrium, industry will have another 10-20 years before that pattern takes hold.


  6. Jesus FV says:

    Dear Bob

    I was surprised by the continuous fast growth of STATA. I thought that its peak moment was behind it. Even in economics, where it is quite popular (for a number of reasons – basically its powerful panel data capabilities), more and more people are moving to R (in my own department, first-year Ph.D. classes are now 100% R).

    Any thoughts?

    • Bob Muenchen says:

      Dear Jesus,

      I, too, was surprised by the continued strength of Stata. Below is a list of similarities from my book with Joe Hilbe, R for Stata Users. I thought these similarities would mean that Stata would be hit the hardest by competition from R. However, lately in my workshops when I ask the Stata users if they program or use the GUI, around 90% say the GUI. That’s quite a switch from the early Stata users that I knew, who were there for its excellent programming language. So I suspect that R’s lack of a point-and-click style GUI is keeping many Stata users from migrating. I also think that even for programmers Stata’s language is easier to learn, though it may be slightly less flexible (e.g. it’s easy in R to create entirely new data structures).


      From “R for Stata Users”:

      “Perhaps more than any other two research computing environments, R
      and Stata share many of the features that make them outstanding:

      • Both include rich programming languages designed for writing new analytic
      methods, not just a set of prewritten commands.

      • Both contain extensive sets of analytic commands written in their own
      languages. [for a clarification, see

      • The pre-written commands in R, and most in Stata, are visible and open
      for you to change as you please.

      • Both save command or function output in a form you can easily use as
      input to further analysis.

      • Both do modeling in a way that allows you to readily apply your models
      for tasks such as making predictions on new data sets. Stata calls these
      postestimation commands and R calls them extractor functions.

      • In both, when you write a new command, it is on an equal footing with
      commands written by the developers. There are no additional “Developer’s
      Kits” to purchase.

      • Both have legions of devoted users who have written numerous extensions
      and who continue to add the latest methods many years before their competitors.

      • Both can search the Internet for user-written commands and download
      them automatically to extend their capabilities quickly and easily.

      • Both hold their data in the computer’s main memory, offering speed but
      limiting the amount of data they can handle.”

      • Nick Cox says:

        Bob: Your GUI-versus-programming distinction doesn’t make much sense for Stata. It has menus (in practice mostly only for official commands), it has a command line interface, and it has a do-file editor for developing scripts. So, what are you calling the GUI? Many users move back and forth between these. In practice, according to many people I’ve discussed this with, only novice or occasional users use the menus. Also, this balance hasn’t shifted much over the years. (I’ve been using Stata since 1991.) In short, Stata is highly command-oriented.

        • Bob Muenchen says:

          Hi Nick,

          I love Stata’s command language; it’s clear and concise. What I’ve noticed lately though is that the younger Stata users taking my workshops depend far more on the menus and dialog boxes (what I called the GUI) rather than commands. All questions I used to get were about the command language, but it has shifted over time. It could just be that more novice users are taking my workshops.


      • Jesus FV says:

        Dear Bob

        Thanks for your thoughtful response. Your analysis moved my posterior quite a bit from my prior 🙂

    • Blair says:

      My $0.02 worth, as a STATA and R user:
      Stata is quite cheap, and you can actually understand the user guides (and most of the error messages).
      STATA and R both have massive ranges of (free) add-on packages.
      Just installing SAS is like going back 20 years (repeatedly swapping between 6 CDs etc). This company cannot rely on inertia for ever.

      • Bob Muenchen says:

        Hi Blair,

        Stata is superb software and for a single user system it’s not too expensive. However, our server version cost $14,000 for one of our small servers. There’s no way we could afford it on a big cluster.

        The SAS installation is indeed in a class by itself. Not a good class, either. It has been years since I counted all the pages of instructions, but it was over 500!


  7. Robert Young says:

    — … so people tend to continue using the tool they learned in college for much of their careers.

    Would that it were so. If it were, COBOL and VSAM and RPG would have died around 1980. For those who work independently, any tool which fits the hand will do. Academics come to mind. Anywhere else, not so much. Just as COBOL has 50 years of code hanging out (and being lipsticked with Java/JavaScript to a faretheewell), SAS/SPSS has about 40. And mindshare.

    • Bob Muenchen says:

      Hi Robert,

      I had to laugh because at first I thought you were saying that people didn’t stick with what they learned in college because NEWER tools came along! I agree that there will be great pressure for new graduates to drop R and switch to SAS if that’s what their new employers use. Even if a company were trying to migrate from SAS to R, it is likely to take a decade or more. We just retired our mainframe, and it was a top priority to do so for almost 20 years!


  8. Mark Ezzo says:

    A very interesting article, but I seriously question its validity. I am software agnostic; the tool fits the problem, not vice versa. I work at a site that has R, SAS, SPSS, Stata, Matlab and several one-offs. This is for Health Care research and the majority of younger Researchers do enter knowing primarily R. However, they almost always gravitate to a more robust, enterprise environment, and it is usually the SAS Grid. Stata is used for Health Economics (good tool) and many do use SPSS. We have found that 70% of our Researchers use SAS. As you know, SAS can incorporate R, but in the era of Monstrous Data (especially Health Care), I do not see R supplanting any of the mainstream products. It is my understanding that market share for SAS is increasing. Obviously, in such a lucrative market, there will be competition, and entry into the market will curtail massive growth, but I do not see a decline. I have consulted in the Financial, Health Care, Pharma, etc. industries and several Government venues. I do not see this analysis as a real-market, applicable model, but more as a case for R (which is also a good tool).

    • Bob Muenchen says:

      Hi Mark,

      I agree that each of the methods I use to estimate popularity or market share is flawed in one way or another. It’s the combination of them all that I find most compelling. I also don’t mean to imply that R is better than the alternatives. I use SAS and SPSS a lot and like them both. While I needed co-author Joe Hilbe to go in-depth on Stata, I hold it in very high regard.

      I think that SAS sales are going up because they continue to introduce useful vertically-integrated solutions (e.g. SAS Fraud Network Analysis). However I suspect that the market share of SAS/Stat is decreasing simply due to competition from all sides. There are some fairly major competitors that I’m not even covering, such as Tableau and Spotfire.

      Even if these trends were to continue in academia, I suspect that it would be a decade or two before they would make their way through industry.


      • Mark Ezzo says:

        Hi Bob, I would agree with your comments. I enjoy using all of them. Essentially, we are seeing the results of opportunity coupled with market saturation. Onward and upward!

    • Fr. says:

      I agree with a lot of that, because this view is problem-driven and takes staff turnover into account. My own experience of teaching R and Stata also converges with Bob’s “free puppy” remark below, and the comments that describe RStudio as the clear winner among R interfaces are also correct in my view, although there is indeed a difference between pushing buttons in a GUI and using an IDE.

      As far as I can tell, the current market can be summarized in three trends: fast ubiquitous growth through cutting-edge innovation (R), slower sectoral growth driven by path dependency (SAS) and specificity (Stata), and decline or stagnation (everything else, including SPSS).

    • I agree with your point that the choice of software relates to the kind of work. For my research, data manipulation and producing publication-ready tables are the core of my work. Lots of variables, lots of crosstabs – that’s not the target environment of R, I think.

  9. Russell Dimond says:

    Seems to me that the headline on this article ought to be “Total citations of stat software in Google Scholar drop 50% over 4 years.” Since I very much doubt total usage of stat software has fallen, that suggests trends in total citations do not reflect trends in total usage. That raises the question of whether trends in the proportion of citations for each stat package actually reflect trends in the proportion of usage. Perhaps SPSS users are disproportionately more likely to have stopped citing the stat package they used to obtain their results? (I can think of reasons why that might be so, but they wouldn’t explain the same thing happening to SAS.)

    But putting that aside for the moment, Stata’s strength relative to R does not surprise me at all. Stata’s simply much easier to learn–even if you insist on writing programs rather than using the menus (which you should). Also keep in mind that many academic users don’t pay for their own Stata licenses, so R being free does not affect their decision-making.

    (To expand on Nick Cox’s point: you have to distinguish between people that use Stata’s GUI as an IDE for writing programs and people who use Stata’s GUI to avoid writing programs. I always teach people to do the former, and if you’re seeing more of the latter that’s disappointing but not terribly surprising.)

    • Bob Muenchen says:

      Hi Russell,

      How about the headline, “Number of Publications that use Stat Software has Increased 635% Since 1996, with a Weird Hump in the Middle.” The hump in the graph is quite bizarre! Competition from the packages shown in this set of graphs definitely cannot balance out that hump (I’ve plotted it to make sure), but it’s possible that competition from other packages that do statistics might. MATLAB, Mathematica, RapidMiner, Weka, SPSS Modeler, SAS Enterprise Miner, Spotfire, Tableau, KXEN, and Salford’s CART, TREENET, MARS, etc. must have been used in a lot of scholarly publications, and many were not popular before 2005. However, in academia I don’t see the classic SPSS user using any of them. Stata is the only thing I see chomping away at the SPSS market share in academia.

      I agree that Stata is easier to learn and use than R. In fact, I suspect that if Stata were to become an open source project, it would become the most widely used software in academia in short order. Now excuse me while I put on my Kevlar vest!


      • Nick Cox says:

        I don’t see Stata going open source as far as proprietary code is concerned. But a more immediate point is simple, but often missed. Stata and R are converging in real price. The price of Stata — although more than most individuals prefer to pay — is modest compared with the commercial opposition, and once you have a current Stata licence free technical support is included and lots of free user-written software is available to you. The real price of R has to include whatever training and books and support from specialist companies that users pay for. In that sense the statement “R is free” is completely accurate but nevertheless incomplete. Naturally I am not denying that Stata is commercial and R is not: just saying “measure how much you pay”.

        • Bob Muenchen says:

          Hi Nick,

          Good point. Open source fanatics love the two “frees” (free as in beer, i.e. free to use, and free as in speech, i.e. freedom to change) but tend to downplay the “free puppy” one: you may totally love it, but it’s gonna cost you! I agree that Stata pricing is quite a deal for a single-user SE license for business ($845/yr), and the Small Stata license for students at $49/yr is decent. Where Stata pricing gets crazy is on servers: a 64-core server with 25 users is $75K/$40K for business/academia. The smallest cluster our group has is 5,000 cores, and the largest has 100,000. Such needs are not common, but we do use R at high scale. SAS Institute recently woke up and now licenses unlimited copies on unlimited servers at all our campuses for not much more than Stata charges for one small server. I’m sure they realized that too much Big Data work in academia was using R and they needed to address that.

          I don’t mean to imply that any of these packages are not worth their asking prices. As long as they are able to sell the software, it’s worth the price. But I’m glad open source projects such as R and RapidMiner are there to help drive prices down.


  10. I recently had to return to SPSS for one project after a long period of using R and MS Access. At first I was using the GUI, but I found I couldn’t stand the lack of repeatability, as the tasks had to be repeated over numerous datasets. Then I moved to syntax and it wasn’t so bad, but I now strongly prefer R.

    I also noticed how limited the joining of datasets in SPSS is. I would have thought this feature would be more versatile by now, but it looks like the fields to join on still have to have the same name, etc.
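    For comparison, base R’s merge() does not require matching key names: by.x and by.y name the key column on each side. A minimal sketch (the data frames and column names are made up for illustration):

    ```r
    # Hypothetical data frames whose join keys have different names
    orders    <- data.frame(cust_id = c(1, 2, 3), amount = c(10, 20, 30))
    customers <- data.frame(customer_id = c(1, 2, 4), name = c("A", "B", "C"))

    # by.x / by.y name the key column on each side; adding all.x = TRUE
    # would give a left join, analogous to SQL's LEFT JOIN
    merged <- merge(orders, customers, by.x = "cust_id", by.y = "customer_id")
    ```

    By default merge() keeps only the matching rows (an inner join), so here only customers 1 and 2 survive.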

    • Bob Muenchen says:

      Hi Justin,

      As you point out, the repeatability factor is very important. I think that’s why so many of the newer packages like RapidMiner, Orange and Knime have adopted the flowchart GUI used by SPSS Modeler and SAS Enterprise Miner. It’s the only non-programming GUI that allows you to use it repeatedly without having to switch to programming. The Red-R GUI for R is like this, but unfortunately its progress seems to have stalled.


  11. Berry says:

    Very interesting, thanks for the analysis!
    I would be delighted to see matlab included the next time…

    • Bob Muenchen says:

      Guten Tag Berry,

      While MATLAB and R have much in common, MATLAB use is dominated by solving engineering problems rather than statistical analysis or data mining. If I could think of a way to split that use out, I would love to do it.


  12. Masanori Yoshida says:

    Thank you very much for your interesting article.
    I translated your article into Japanese. Please let me know if you are not comfortable with my translation.

  13. Tim Daciuk says:

    Hey Bob,
    Tim Daciuk here; I think that we did a couple of presentations and/or were on a panel together, back “in the day” (when I was part of SPSS Inc). Interesting article and interesting use of forecasting. Certainly the use of R is expanding, mostly due to the cost of R. I think, however, that measuring trends primarily from an academic/scholastic bent may be problematic. If you take a look at industry, I think that SPSS and SAS are still the big gorillas in the market and will be for the foreseeable future. I think that this is due to a number of factors: 1) the ‘one throat to choke’ ability of having a company stand behind the product; 2) the end-to-end solutions that SAS and SPSS offer (as predictive analytics becomes a business-integrated function), which is not the case with R; 3) the development of vertical applications (mentioned earlier); and 4) the existing ancillary development and integration network around SPSS and SAS (though this is changing).

    P.S. I tend to rely on the Colbert statistic for a lot of my work!

    • Bob Muenchen says:

      Hi Tim,

      It’s nice to hear from you! I miss those SPSS Directions meetings. IBM priced them out of the academic market. I agree with all your points. I’ll write a new post soon based on job advertisements (mostly corporate) that reinforces your point.


  14. I really enjoyed reading your article and the comments. There is, however, another bit that belongs in the discussion and has not yet been mentioned (or I over-read it). Let’s start with a provocative statement: in my book, teaching SAS/SPSS at universities almost amounts to misappropriation of funds. Let me explain. By spending vast amounts of cash on software deals, this money is then lacking in other places like lab seats or smaller classes. In return, Unis get programs that de facto lock their students (and faculty) into a vendor. A common counter-argument is that Unis need to teach what industry requires. However, I believe that a Uni’s job is to teach knowledge, not to lock their students into specific software. If someone understands statistics, learning SAS or SPSS is not that much of a hassle anymore. However, teaching statistics with R, being able to demonstrate how calculations are done and how results come about, gives any educator a definite advantage. So I hope that R (or any other free successor, for that matter) will eventually dominate.

    Forgive my tone, but at the moment I’m a little bit disgruntled because I just spent half a year trying to convince some Unis in Serbia to use exclusively R in their newly established statistics program, and failed.

    • Bob Muenchen says:

      Hi Christoph,

      At The University of Tennessee I’m in charge of software licensing for research tools, and you’re right, we spend a LOT of money on them. It’s around $350K for research alone and well over $1M if we include productivity tools, ERP software and databases. When it comes to teaching, professors face tough choices:

      Use what’s free or cheap to save the university money?
      Use what’s easy so students can focus on analytic concepts instead of programming and debugging?
      Use the tool that’s powerful so students will learn maximum flexibility?
      Use the tool that’s most likely to get the students a job?

      Depending on your perspective, each answer may lead to a different product! I hope that as the menu and dialog-box GUIs for R improve and employers use R more, all of these needs can be met by one package. However, I suspect that SAS and SPSS will be #1 and #2 with employers for many years to come. I’ll have a blog post on that soon with the latest data.


      • tophcito says:

        Hey Bob,
        thank you for sharing the figures of UT. I had not realized that you work there, I had the privilege of being an exchange student to the Bartlett area around Memphis a long time ago. Since then, Rocky Top never fails to increase my heart rate. 😉

        Back to topic: I totally agree with those hard choices and certainly share your hope of R interfaces improving and thus gaining a bigger market share. They also entirely depend on the audience. For Statistics majors I would go 100% R from day one, complementing it with a scripting language as data retrieval and manipulation becomes an issue.

        When teaching other subjects the choice is less obvious for me, unfortunately. I got social science majors started on R using R Commander with quite some success. However, a key issue there is applying survey weights to data, and here R GUIs don’t help. To my surprise, most of them took quite easily to the command line. And once you get to the point where you tell them that using survey weights correctly in SPSS involves much more than issuing a WEIGHT BY command, they accept R’s solution willingly. Anyway, most social science students end up producing SPSS tables and guessing their meaning. So while SPSS aids them in producing results more quickly, it does not help them to produce correct results, let alone understand them.
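        To make the survey-weights point concrete, here is a minimal sketch using the CRAN survey package (the data frame and weight values are invented for illustration). svydesign() declares the weights as part of the sampling design, and estimators such as svymean() then produce design-based standard errors, rather than treating weights as replication counts the way SPSS’s plain WEIGHT BY does:

        ```r
        library(survey)  # install.packages("survey") if needed

        # Hypothetical respondents with sampling weights
        d <- data.frame(y = c(1, 0, 1, 1), w = c(2.0, 0.5, 1.0, 1.5))

        # Declare the design once; subsequent analyses use it consistently
        dsgn <- svydesign(ids = ~1, weights = ~w, data = d)

        svymean(~y, dsgn)  # weighted mean with a design-based standard error
        ```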

        I’ll be looking forward to your post about employer preferences.


        • Bob Muenchen says:

          Hi Christoph,

          We had a large class of non-stat majors switch from programming in SAS to clicking in SPSS. There was a real concern that the SPSS approach would let the students be lazy and not learn as much about what the output meant. That happened over in the Statistics Department. Our research support group sees the students years later when it’s thesis or dissertation time. It was clear to us that the SPSS approach allowed the students to learn much MORE about what the statistics meant. With SAS programming, they spent far too much time debugging their programs. I’m sure this had nothing to do with SAS per se, but just the debugging time you’d have with any language.

          However, with stat majors, I agree that they must dive in and learn to program or they’ll never do well on the job.


          • Fr. says:

            Bob, why not Stata? It’s the right middle ground between programming and point-and-click for non-stats students.

          • Bob Muenchen says:

            Hi Fr.,

            I’m sure Stata would have done as well. SPSS was chosen by the departments that were requiring the students to take the class. My point was that when non-stat majors have only two stat classes in their entire PhD program, they’ll learn more about statistics if they don’t have to spend that time learning both programming and statistics. I think that would apply when comparing any two decent stat packages, one using programming and the other using a point-and-click GUI. Of course this was not a carefully controlled study, just an observational one. Decent research may well exist that would do a better job of settling that question!


  15. selva rajan says:

    The problem with open source software is that no one is responsible if there is a crash, a bug in a calculation, or a flaw in crucial machine data analysis. With paid software there is a company that can be held responsible (well, partly at least) and can be asked to fix the problem; with open source you can’t call a specific person. Business needs service people to support whatever runs the company or school or research group, so SPSS and SAS will stay on as long as they are paid software. Piracy has made paid software equivalent to open source!, so the number of adopters and learners of SPSS and SAS is so high that even in the future there will hardly be any let-up in the usage of those two. I can’t see R surging ahead in the future, though I will continue to work against the blank looks I get when I say “Why don’t you do the 3×3 matrix Fisher test in R? SPSS can handle only 2×2.” The questions I get are “What is R?” and later “Why would you download all those different packages and write a program for it?”
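    For what it’s worth, the 3×3 case needs no extra packages: base R’s fisher.test() accepts general r × c contingency tables, computing the exact test via a network algorithm. A sketch with made-up counts:

    ```r
    # Hypothetical 3 x 3 contingency table of counts
    tab <- matrix(c(5, 2, 1,
                    3, 6, 2,
                    1, 2, 7),
                  nrow = 3, byrow = TRUE)

    # Exact test for r x c tables, not just 2 x 2
    fisher.test(tab)
    ```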

    • Bob Muenchen says:

      Hi Selva,

      Revolution Analytics is betting that most people will agree with you and pay them for Revolution R Enterprise. Then they can pick up the phone and get immediate support.


  16. Academic Researcher says:

    I was wondering if you took into account the “renaming” of some SPSS products to PASW in and around 2009-2010. It would explain your rapid decrease of SPSS hits in that time period.

    • Bob Muenchen says:

      Hi Researcher,

      Thanks for asking this question. I thought my query included PASW, but I just checked and it did not. I’ll fix that for next year, but the impact will be small. From its peak at 155K hits, SPSS fell to 49.4K. Only 7.7% of the decline since 2009 was due to the exclusion of PASW as a search term.


  17. Pingback: 2014 será o ano do fim do SAS e do SPSS? Não exatamente! | Mineração de Dados

  18. Reblogged this on psicometricae.

  19. Greetings from the University of Tennessee at Chattanooga. Thanks for the article. I completely agree with your prediction that R will level off without a GUI, and that 80% of people will not use a code language. That was the revolution behind the development of Windows. Today’s generation is even more anti-code, as everything is graphics based.

    • Bob Muenchen says:

      Hi Isaac,

      It’s nice to hear from UTC! I really like the combination of a GUI that writes a program that I can then customize. That way I get an error-free start and as much flexibility as I need. Most just want to point and click though.


  20. shrio says:

    To me this report is quite one-sided if we look at it from this perspective:
    1) How much are universities and schools willing to pay for an environment that encourages learning technologies that are widely used in the demanding real world?
    Nowadays universities and schools look at profitability and/or cost saving rather than quality. If the course being delivered is on analytics, one may even consider analytics software or even business intelligence software, whichever does the job.

    • Bob Muenchen says:

      Hi Shrio,

      Academia is under pressure to control costs, but I’m not yet aware of any major universities that have stopped licensing SAS, SPSS, or Stata. So far only S-PLUS has been eliminated through the use of R. I suspect that it will be at least 10 years before R eliminates any others.


  21. radjaye says:

    In the full enterprise environment SAS has tools for marketing automation and marketing optimisation. I doubt R etc. can provide anything that comes close to handling the tasks required in a busy marketing or CRM environment. SAS is not just about analysing data, but about taking things a step further and handling customer lifecycles.