Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

I recently updated my plots of the data analysis tools used in academia in my ongoing article, The Popularity of Data Analysis Software. I repeat those here and update my previous forecast of data analysis software usage.

Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is likely to predict what the next generation of analysts will use for years to come. As you can see in Fig. 1, the use of most analytic software is growing rapidly in academia. The only one growing slowly, very slowly, is Statistica.

Figure 1. The growth of data analysis packages with SAS and SPSS removed.

While they remain dominant, the use of SAS and SPSS has been declining rapidly in recent years. Figure 2 plots the same data, adding SAS and SPSS and dropping JMP and Statistica (and changing all colors and symbols!)

Figure 2. Scholarly use of data analysis software with SAS and SPSS added, JMP and Statistica removed.

Because Google changes its search algorithm, I re-collect all the data every year. Last year’s plot (below, Fig. 3) ended with the data from 2011 and contained some notable differences. For SPSS, the 2003 data value is quite a bit lower than the value collected in the current year. If the data were not collected by a computer program, I would suspect a data entry error. In addition, the old 2011 data value in Fig. 3 for SPSS showed a marked slowing in the rate of usage decline. In the 2012 plot (above, Fig. 2), not only does the decline not slow in 2011, but both the 2011 and 2012 points continue the sharp decline of the previous few years.

Figure 3. Scholarly use of data analysis software, collected in 2011. Note how different the SPSS value for 2011 is compared to that in Fig. 2.

Let’s take a more detailed look at what the future may hold for R, SAS and SPSS Statistics.

Here is the data from Google Scholar:

Year      R    SAS    SPSS  Stata
1995      7   9120    7310     24
1996      4   9130    8560     92
1997      9  10600   11400    214
1998     16  11400   17900    333
1999     25  13100   29000    512
2000     51  17300   50500    785
2001    155  20900   78300    969
2002    286  26400   66200   1260
2003    639  36300   43500   1720
2004   1220  45700  156000   2350
2005   2210  55100  171000   2980
2006   3420  60400  169000   3940
2007   5070  61900  167000   4900
2008   7000  63100  155000   6150
2009   9320  60400  136000   7530
2010  11500  52000  109000   8890
2011  13600  44800   74900  10900
2012  17000  33500   49400  14700
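
For readers who would rather work with the numbers than read them off the plots, here is a minimal Python sketch. Python is used only for illustration; the forecast in this post was done in R. It copies the four columns above and reports each package's peak year and how far its 2012 count sits from that peak. The names YEARS, HITS and peak_and_change are hypothetical helpers made up for this example.

```python
# Google Scholar hit counts copied from the table above (1995 through 2012).
YEARS = list(range(1995, 2013))
HITS = {
    "R": [7, 4, 9, 16, 25, 51, 155, 286, 639, 1220, 2210, 3420,
          5070, 7000, 9320, 11500, 13600, 17000],
    "SAS": [9120, 9130, 10600, 11400, 13100, 17300, 20900, 26400, 36300,
            45700, 55100, 60400, 61900, 63100, 60400, 52000, 44800, 33500],
    "SPSS": [7310, 8560, 11400, 17900, 29000, 50500, 78300, 66200, 43500,
             156000, 171000, 169000, 167000, 155000, 136000, 109000,
             74900, 49400],
    "Stata": [24, 92, 214, 333, 512, 785, 969, 1260, 1720, 2350, 2980,
              3940, 4900, 6150, 7530, 8890, 10900, 14700],
}


def peak_and_change(series):
    """Return (peak year, peak value, percent change from peak to 2012)."""
    peak_value = max(series)
    peak_year = YEARS[series.index(peak_value)]
    change = 100.0 * (series[-1] - peak_value) / peak_value
    return peak_year, peak_value, change


summary = {name: peak_and_change(series) for name, series in HITS.items()}

for name, (year, value, change) in summary.items():
    print(f"{name:6s} peak {value:>7d} in {year}; 2012 is {change:+.1f}% vs. peak")
```

On these numbers SAS peaks in 2008 and SPSS in 2005, while R and Stata are still at their highest in 2012.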

ARIMA Forecasting

I forecast the use of R, SAS, SPSS and Stata five years into the future using Rob Hyndman’s forecast package and the default settings of its auto.arima function. The dip in SPSS use in 2002-2003 drove the function a bit crazy as it tried to see a repetitive up-down cycle, so I modeled the SPSS data only from its 2005 peak onward. Figure 4 shows the resulting predictions.

Figure 4. Forecast of scholarly use of the top four data analysis software packages, 2013 through 2017.

The forecast shows R and Stata surpassing SPSS and SAS this year (2013), with Stata coming out on top. It also shows all scholarly use of SPSS and SAS stopping in 2014 and 2015, respectively. Any forecasting book will warn you of the dangers of looking too far beyond the data, and the above forecast does just that.
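
To see how quickly a straight-line projection runs into nonsense, here is a rough Python sketch. It is not the auto.arima model behind Figure 4 and will not match that figure in detail. It swaps in a much cruder rule: keep repeating the latest year-over-year change, which is the point forecast of an ARIMA(0,2,0) model. The names LAST_TWO, extrapolate and first_nonpositive_year are hypothetical and exist only for this illustration.

```python
# Last two observed Google Scholar counts (2011, 2012) from the table above.
LAST_TWO = {
    "R": (13600, 17000),
    "SAS": (44800, 33500),
    "SPSS": (74900, 49400),
    "Stata": (10900, 14700),
}
LAST_YEAR = 2012


def extrapolate(prev, last, horizon=5):
    """Continue the latest year-over-year change for `horizon` more years."""
    step = last - prev
    return [last + h * step for h in range(1, horizon + 1)]


def first_nonpositive_year(forecast):
    """Return the first forecast year whose value is <= 0, or None."""
    for offset, value in enumerate(forecast, start=1):
        if value <= 0:
            return LAST_YEAR + offset
    return None


# Assumption: a naive straight-line rule, NOT the model chosen by auto.arima.
forecasts = {name: extrapolate(*pair) for name, pair in LAST_TWO.items()}
zero_years = {name: first_nonpositive_year(f) for name, f in forecasts.items()}

for name, values in forecasts.items():
    print(name, values, "reaches zero in", zero_years[name])
```

Under this crude rule the SPSS line reaches zero in 2014 and the SAS line in 2015, and both keep falling below zero afterward. A negative article count is impossible, which is exactly the kind of artifact that extrapolating too far produces.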

Guesstimate Forecasting

So what will happen? Each reader probably has his or her own opinion; here’s mine. The growth in R’s use in scholarly work will continue for three more years, at which point it will level off at around 25,000 articles in 2015. This growth will be driven by:

  • The continued rapid growth in add-on packages
  • The attraction of R’s powerful language
  • The near monopoly R has on the latest analytic methods
  • Its free price
  • The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (IBM is loosening up on this a bit)

What will slow R’s growth is its lack of a graphical user interface that:

  • Is powerful
  • Is easy to use
  • Provides direct cut/paste access to journal style output in word processor format
  • Is standard, i.e. widely accepted as The One to Use
  • Is open source

While programming has important advantages over GUI use, many people will not take the time needed to learn to program. Therefore they rarely come to fully understand those advantages. Conversely, programmers seldom take the time to fully master a GUI and so often underestimate its full range of capabilities and its speed of use. Regardless of which is best, GUI users far outnumber programmers and, until this is resolved, it will limit R’s long-term growth. There are GUIs for R, but with so many to choose from, none has become the clear leader (Deducer, R Commander, Rattle, at least two from commercial companies, and still more). If from this “GUI chaos” a clear leader were to emerge, then R could continue its rapid growth and end up as the most used software.

The use of SAS for scholarly work will continue to decline until it matches R at the 25,000 level. This is caused by competition from R and other packages (notably Stata) but also by SAS Institute’s self-inflicted GUI chaos. For years they have offered too many GUIs such as SAS/Assist, SAS/Insight, IML/Studio, the Analyst application, Enterprise Guide, Enterprise Miner and even JMP (which runs SAS nicely in recent versions). Professors looking to meet student demand for greater ease of use are not sure which GUI to teach, so they continue teaching SAS as a programming language. Even now that Enterprise Guide has evolved into a respectable GUI, many SAS users do not know what it is. If SAS Institute were to completely replace their default Display Manager System with Enterprise Guide, they could bend the curve and end up at a higher level of perhaps 27,000.

The use of SPSS for scholarly work will decline less sharply in 2013 and will level off in 2015 at around 27,000 articles because:

  • Many of the people who needed advanced methods and were not happy calling R functions from within SPSS have already switched to R or Stata
  • Many of the people who like to program and want a more flexible language than SPSS offers have already switched to R or Stata
  • Many of the people who needed more interactive visualization have already switched to JMP

The GUI users will stick with SPSS until a GUI as good (or close to as good) comes to R and becomes widely accepted. At The University of Tennessee where I work, that’s the great majority of SPSS users.

Although Stata is currently the fastest growing package, its growth will slow in 2013 and level off by 2015 at around 23,000 articles, leaving it in fourth place. The main cause of this will be inertia of users of the established leaders, SPSS and SAS, as well as the competition from all the other packages, most notably R. R and Stata share many strengths and with one being free, I doubt Stata will be able to beat R in the long run.

The other packages shown in Fig. 1 will also level off around 2015, roughly maintaining their current place in the rankings. A possible exception is JMP, whose interface is radically superior to the others for exploratory analysis. Its use could continue to grow, perhaps even replacing Stata for fourth place.

The futures of SAS Enterprise Miner and IBM SPSS Modeler are tied to the success of each company’s more mainstream products, SAS and SPSS Statistics respectively. Use of those products is generally limited to one university class in data mining, while the other software discussed here is widely used in many classes. Both companies could significantly shift their future by combining their two main GUIs. Imagine a menu & dialog-box system that draws a simple flowchart as you do things. It would be easy to learn and users would quickly get the idea that you could manipulate the flowchart directly, increasing its window size to make more room. The flowchart GUI lets you see the big picture at a glance and lets you re-use the analysis without switching from GUI to programming, as all other GUI methods require. Such a merger could give SAS and SPSS a game-changing edge in this competitive marketplace.

So there you have it: the future of analytics revealed. No doubt each reader has found a wide range of things to disagree with, so I encourage you to do your own forecasts and add links to them in the comment section below. You can use my data or follow the detailed blog at Librestats to collect your own. One thing is certain: the coming decade in the field of analytics will be interesting indeed!

About Bob Muenchen

I help researchers analyze their data and write books about research computing.

119 Responses to Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

  1. Paul A. Thompson says:

    The problem with R is that it is not validated. I cannot imagine a pharma going from SAS to R. In addition, the mixed tools in R seem to be in a continual state of development, and sometimes people note that the current versions have different results. I am a SAS user, and learned it in 1975, so I am not on the cutting edge. However, until R 1) can demonstrate that the system is dependable and 2) is accepted by the FDA for an NDA, you will see SAS used. In 20 years? Dunno about that.

    • Not wholly true; see the document on the R web-site “R: Regulatory Compliance and Validation Issues A Guidance Document for the Use of R in Regulated Clinical Trial Environments” (http://www.r-project.org/doc/R-FDA.pdf).

      R is accepted by the FDA; in fact, my understanding is that they do not explicitly state which software should be used, merely that whatever is used must be validated (there has been much discussion on this matter on the R-help mailing list over the years and more recently on the MedStats mailing list; search the archives of each if you want to know more).

    • R is already used for submission-related work in big pharma and we have set up validated R servers for this purpose for our customers, so if there is to be a transition it has already started.

    • ucfagls says:

      Utter FUD! R already is in use by Pharma and is acceptable to the FDA; they don’t mandate *any* software to the best of my knowledge, all have to be validated. R has a compliance statement in that regard.

      It is somewhat sad that you have this backwards. How can you know that SAS is doing what it says – it is closed source. R is completely open so you or anyone can check what it is doing and validate it. Something that many people do all the time.

    • ph says:

      I would like to point out one thing that often goes unstated in these R v. SAS testing discussions … yes R is tested and I can certainly write my own tests with it to convince myself that my specific routine works correctly … BUT SAS guarantees numerical precision across different chip sets and OS’ … this is very difficult, tedious work that I’m not sure is done in CRAN R.

      Has anyone ever compared a numerical routine in R, especially one requiring numerical derivatives, across say 32-bit Windows and 64-bit Linux? If you get different numbers, which numbers are right?

      • Bob Muenchen says:

        Hi ph,

        I handle the research software contracts for The University of Tennessee and I’ve read each new contract with SAS Institute. I’ve never seen them guarantee the accuracy of their software in writing. On the other hand, I think they’re a very trustworthy company that goes to great lengths to ensure the accuracy of their products. The main R download is also tested very carefully for accuracy and it has been found to be quite accurate by independent researchers (see http://www.r-project.org/doc/R-FDA.pdf; “A comparative study of the reliability of nine statistical software packages,” Keeling & Pavur; “The reliability of statistical functions in four software packages freely used in numerical computation,” Almiron et al.). R calls LAPACK, the same set of subroutines used by two commercial packages that are highly regarded for their accuracy: MATLAB and Maple.

        Where accuracy becomes more of a concern is with add-on packages. As with a SAS Macro presented at the Global Forum and downloadable from the author’s web site, you have to decide if you trust the source or if you’re going to test it yourself against known solutions.

        Cheers,
        Bob

      • Might be a bit late to the game, but I’d like to note that Stata also makes fairly extensive use of LAPACK (http://www.stata.com/manuals13/m-5lapack.pdf). Stata seems to get left out of many of the conversations. Personally, I think there is room for them all to coexist. The biggest issue I see with SAS and SPSS is that they are cost prohibitive for smaller organizations and in some cases inaccessible to individuals. Their language/style conventions also don’t seem to say much about the respective organizations’ willingness to move toward more modern semantic paradigms. There are things that I am really starting to enjoy about R and things that I still find infinitely easier in Stata. In general, it seems that discussions like this tend to just create tension between each platform’s fanboys – so to speak. Instead, maybe more fruitful conversations could come from discussing the various strengths of each platform.

      • Bob Muenchen says:

        Hi Billy,

        Discussing the pros and cons of the various packages does tend to be about as contentious as comparing religions! That’s why for the most part I focus on measuring popularity or market share. At the moment R and Stata are the two most rapidly growing analytics packages used in scholarly articles. I think that’s due to the fact that both are extremely extensible.

        Cheers,
        Bob

    • biostat says:

      Please tell that to Amgen, Merck, Pfizer, Novartis, the FDA (yes!) and others :) R is not validated? R is perfectly validated by more than 2 million users who have the ability to look into the code. Don’t even try to spread panic. And please don’t write “dunno”; just check things before you post a comment if you want to be seen as a professional.

      • Peter B. says:

        Did you know that every thematic section of the R repository has its own academic supervisor? Many of them come from prestigious academic sites (Princeton, Stanford). I trust them more than closed sources and claims that “they do their best”. I have been working with R for the last 8 years and have always validated results with SAS and STATA. I have not discovered any dramatic discrepancies. It means that both SAS and R are well done. The difference is that I can go to cran.r-project.org/src/contrib/Archive/, download any package I want, unpack it and look into the code (R, FORTRAN and C). Most authors of the packages I use include references to the literature, so I can compare the written and implemented formulas. I did this 3-4 times, just out of curiosity about how it is done, and found no issues. Many people working in clinical research use R in validated environments, having their results validated with other packages following SOPs. And this is excellent, indirect validation of the quality provided by R. Keep your head cool, don’t get mad: the FDA does NOT endorse or require any specific software! The software you use must be compliant with 21 CFR Part 11 guidance. And regarding R, there is a document covering this topic. Thousands of biostatisticians use R, and great, well-known pharmaceutical companies use R in their submissions. SAS is not the only package you can make use of.

      • Bob Muenchen says:

        Hi Peter,

        Thanks for reporting on your experience comparing the accuracy of R to other software. When you mention “academic supervisors” are you referring to what CRAN labels the “maintainers” of the Task Views section? I wasn’t under the impression that they did any testing on the vast array of packages in their task views, but if you know of documentation that says otherwise, please send it my way.

        Cheers,
        Bob

  2. I thought we already had a winner in R GUIs called R-Studio. I took two data analysis courses in coursera and they used and recommended all students use R-Studio.

    • wrathematics says:

      When bob refers to a gui, he’s talking about something more like SPSS than Rstudio. Rstudio has its place, but it still requires R scripting. His dream is to have an easier to install and more functional version of deducer.

      • Bob Muenchen says:

        Exactly. I love RStudio for what it does. I’m in it all day long. But even at the graduate student level, 80% or more of people just will not program.

  3. While R-Studio looks like a beautiful IDE, it isn’t a GUI application for business analysts who won’t/can’t invest in learning R as a language.

    By GUI, they mean something like SAS Enterprise Guide or Rattle. Most business analysts I know have never learned an object-oriented language, they might pick up bits of Visual Basic and the most popular language they learn is SQL.

    R-Studio, IDE: http://www.rstudio.com/ide/screenshots/
    SAS EG, modern GUI: http://www.sas.com/technologies/bi/query_reporting/guide/#section=4
    Rattle, old-school GUI: http://techpad.co.uk/custom/images/large/5007d3c919774.jpg

  4. Roland E Andersson says:

    How can we explain that the total number of hits for all the packages shows such a peak in 2005? Do I understand correctly that the calculations are done on publications that report which software was used? Maybe this practice is changing?

    • ucfagls says:

      If anything I would have thought that the opposite is true; that we are becoming more concerned with reproducibility. That said, I do know journals in some fields that instruct authors not to bother stating what stats software they used – for shame!

      • Bob Muenchen says:

        Hi ucfagls,

        I agree completely. I hope the push toward more reproducible research starts getting journals to require stating the software complete with version and code used. I frequently get researchers coming in to do an analysis “just like they did in this article” but it’s impossible to tell WHAT they did. Luckily most authors are happy to explain when I write them.

        Cheers,
        Bob

    • Bob Muenchen says:

      Hi Roland,

      That’s a question that has mystified me for several years. Other than SAS and SPSS, Systat was #3 during the “hump” years but it did not show that hump. I added all the packages together to see if it smoothed things out. It did not; the plot looked much like SPSS by itself, just at a higher level.

      I’ve asked quite a few people about this, including the people in charge of SPSS development, and no one is quite sure why there’s such a radical shift.

      One possibility for the drop off is that the recession cut government grant funding sharply. But you’d have to assume that for some reason it affected SAS and SPSS but not the other packages. That might be caused by the fact that more established researchers A) got more grants (and so more cuts) and B) being older they also used the older packages.

      Another possibility is the retirement of the baby boomers. If they tended to use the packages that have been around longer, then SPSS and SAS would have been disproportionately affected.

      Cheers,
      Bob

      • I work as a researcher for a government research foundation (in Brussels). We have cut our number of SPSS licences in half. Add-on modules were all reviewed. If we were not sure that a module would be used in the coming year by a certain person, we skipped it. The days of ‘let’s get one for him/her too’ to be on the safe side have turned into ‘let’s skip it for him/her’ to be on the safe side. Moreover, funding for data collection is also skimmed, so there’s less data-driven research.

      • Bob Muenchen says:

        Hi Hendrik,

        I’ll bet sales of add-on modules for SAS and SPSS are being closely examined in most organizations. It’s often easy to do 90% of your work in the main commercial product and then occasionally call an R package to do something specialized. That approach can save a lot of money without having to retrain the whole staff in R all at once. However, having less data-driven research in general runs counter to overall trends. I never thought I’d see the day when the state of statistical analysis (excuse me, that’s “analytics” now) was routinely discussed in the general press.

        Cheers,
        Bob

  5. I am not arguing with your analysis, but I think we’ll have to wait a very long time before SAS loses its strength in the analytics market. The reason behind this is simple: legacy systems and legacy code. SAS has been here for more than 30 years, and it is cheaper to upgrade current systems (not necessarily in money but in time) with new SAS products than to replace them with systems based on R, for example. Did you see SAS’s profit last year?

    Academia may not be the best indicator for predicting the end of SAS (perhaps SPSS ;-)

    • Bob Muenchen says:

      Hi Alberto,

      I agree, the momentum of SAS and SPSS will be slow to turn. In academia old code is helpful to have around for new projects, but it’s nowhere near as important as hundreds or thousands of SAS reports in industry. Even after scholarly work reaches some sort of long-term equilibrium, industry will have another 10-20 years before that pattern takes hold.

      Cheers,
      Bob

  6. Jesus FV says:

    Dear Bob

    I was surprised by the continuous fast growth of STATA. I thought that its peak moment was behind it. Even in economics (where, for a number of reasons, basically its powerful panel data capabilities, it is quite popular), more and more people are moving to R (in my own department, first year Ph.D. classes are now 100% R).

    Any thoughts?

    • Bob Muenchen says:

      Dear Jesus,

      I, too, was surprised by the continued strength of Stata. Below is a list of similarities from my book with Joe Hilbe, R for Stata Users. I thought these similarities would mean that Stata would be hit the hardest by competition from R. However, lately in my workshops when I ask the Stata users if they program or use the GUI, around 90% say the GUI. That’s quite a switch from the early Stata users that I knew who were there for its excellent programming language. So I suspect that R’s lack of a point-and-click style GUI is keeping many Stata users from migrating. I also think that even for programmers Stata’s language is easier to learn, though it may be slightly less flexible (e.g. it’s easy in R to create entirely new data structures).

      Cheers,
      Bob

      From “R for Stata Users” (http://r4stats.com/books/r4stata/):

      “Perhaps more than any other two research computing environments, R
      and Stata share many of the features that make them outstanding:

      • Both include rich programming languages designed for writing new analytic
      methods, not just a set of prewritten commands.

      • Both contain extensive sets of analytic commands written in their own
      languages. [for a clarification, see http://librestats.com/2011/08/29/how-much-of-r-is-written-in-r-part-2-contributed-packages/]

      • The pre-written commands in R, and most in Stata, are visible and open
      for you to change as you please.

      • Both save command or function output in a form you can easily use as
      input to further analysis.

      • Both do modeling in a way that allows you to readily apply your models
      for tasks such as making predictions on new data sets. Stata calls these
      postestimation commands and R calls them extractor functions.

      • In both, when you write a new command, it is on an equal footing with
      commands written by the developers. There are no additional “Developer’s
      Kits” to purchase.

      • Both have legions of devoted users who have written numerous extensions
      and who continue to add the latest methods many years before their competitors.

      • Both can search the Internet for user-written commands and download
      them automatically to extend their capabilities quickly and easily.

      • Both hold their data in the computer’s main memory, offering speed but
      limiting the amount of data they can handle.”

      • Nick Cox says:

        Bob: Your GUI — programming distinction doesn’t make much sense for Stata. It has menus (in practice mostly only for official commands), it has a command line interface, and it has a do-file editor for developing scripts. So, what are you calling the GUI? Many users move back and forth between these. In practice, according to many people I’ve discussed this with, only novice or occasional users use the menus. Also, this balance hasn’t shifted much over the years. (I’ve been using Stata since 1991.) In short, Stata is highly command-oriented.

      • Bob Muenchen says:

        Hi Nick,

        I love Stata’s command language; it’s clear and concise. What I’ve noticed lately though is that the younger Stata users taking my workshops depend far more on the menus and dialog boxes (what I called the GUI) rather than commands. All questions I used to get were about the command language, but it has shifted over time. It could just be that more novice users are taking my workshops.

        Cheers,
        Bob

      • Jesus FV says:

        Dear Bob

        Thanks for your thoughtful response. Your analysis moved my posterior quite a bit from my prior :)

      • Bob Muenchen says:

        Jesus, Haha! Well said, well said! -Bob

    • Blair says:

      My $0.02 worth, as a STATA and R user:
      Stata is quite cheap, and you can actually understand the user guides (and most of the error messages).
      STATA and R both have massive ranges of (free) add-on packages.
      Just installing SAS is like going back 20 years (repeatedly swapping between 6 CDs etc). This company cannot rely on inertia for ever.

      • Bob Muenchen says:

        Hi Blair,

        Stata is superb software and for a single user system it’s not too expensive. However, our server version cost $14,000 for one of our small servers. There’s no way we could afford it on a big cluster.

        The SAS installation is indeed in a class by itself. Not a good class, either. It has been years since I counted all the pages of instructions, but it was over 500!

        Cheers,
        Bob

  7. Robert Young says:

    – … so people tend to continue using the tool they learned in college for much of their careers.

    Would that it were so. If it were, COBOL and VSAM and RPG would have died around 1980. For those who work independently, any tool which fits the hand will do. Academics come to mind. Anywhere else, not so much. Just as COBOL has 50 years of code hanging out (and being lipsticked with java/javascript to a fare-thee-well), SAS/SPSS has about 40. And mindshare.

    • Bob Muenchen says:

      Hi Robert,

      I had to laugh because at first I thought you were saying that people didn’t stick with what they learned in college because NEWER tools came along! I agree that there will be great pressure for new graduates to drop R and switch to SAS if that’s what their new employers use. Even if a company were trying to migrate from SAS to R, it is likely to take a decade or more. We just retired our mainframe, and it was a top priority to do so for almost 20 years!

      Cheers,
      Bob

  8. Mark Ezzo says:

    A very interesting article, but I seriously question the validity. I am software agnostic; the tool fits the problem, not vice versa. I work at a site that has R, SAS, SPSS, Stata, Matlab and several one-offs. This is for Health Care research and the majority of younger Researchers do enter knowing primarily R. However, they almost always gravitate to a more robust, enterprise environment and it is usually the SAS/Grid. Stata is used for Health Economics (good tool) and many do use SPSS. We have found that 70% of our Research use SAS. As you know, SAS can incorporate R, but in the era of Monstrous Data (especially Health Care), I do not see R supplanting any of the mainstream products. It is my understanding that market-share for SAS is increasing. Obviously, in such a lucrative market, there will be competition and entry into the market will curtail massive growth, but I do not see a declination. I have consulted in the Financial, Health Care, Pharma, etc. industries and several Government venues. I do not see this analysis as a real-market, applicable model, but more of being a proponent for R (which is also a good tool).

    • Bob Muenchen says:

      Hi Mark,

      I agree that each of the methods I use to estimate popularity or market share is flawed in one way or another. It’s the combination of them all that I find most compelling (see http://bit.ly/statpop). I also don’t mean to imply that R is better than the alternatives. I use SAS and SPSS a lot and like them both. While I needed co-author Joe Hilbe to go in-depth on Stata, I hold it in very high regard.

      I think that SAS sales are going up because they continue to introduce useful vertically-integrated solutions (e.g. SAS Fraud Network Analysis). However I suspect that the market share of SAS/Stat is decreasing simply due to competition from all sides. There are some fairly major competitors that I’m not even covering, such as Tableau and Spotfire.

      Even if these trends were to continue in academia, I suspect that it would be a decade or two before they would make their way through industry.

      Cheers,
      Bob

      • Mark Ezzo says:

        Hi Bob, I would agree with your comments. I enjoy using all of them. Essentially, we are seeing the results of opportunity coupled with market saturation. Onward and upward!

    • Fr. says:

      I agree with a lot of that, because this view is problem-driven and takes staff turnover into account. My own experience of teaching R and Stata also converges with Bob’s “free puppy” remark below, and the comments that describe RStudio as the clear winner among R interfaces are also correct in my view, although there is indeed a difference between pushing buttons in a GUI and using an IDE.

      As far as I can tell, the current market can be summarized in three trends: fast ubiquitous growth through cutting-edge innovation (R), slower sectoral growth driven by path dependency (SAS) and specificity (Stata), and decline or stagnation (everything else, including SPSS).

    • I agree with your point that the choice of software relates to the kind of work. For my research, data manipulation and producing publication-ready tables are the core of my work. Lots of variables, lots of crosstabs; that’s not the target environment of R, I think.

  9. Russell Dimond says:

    Seems to me that the headline on this article ought to be “Total citations of stat software in Google Scholar drop 50% over 4 years.” Since I very much doubt total usage of stat software has fallen, that suggests trends in total citations do not reflect trends in total usage. That raises the question of whether trends in the proportion of citations for each stat package actually reflect trends in the proportion of usage. Perhaps SPSS users are disproportionately more likely to have stopped citing the stat package they used to obtain their results? (I can think of reasons why that might be so, but they wouldn’t explain the same thing happening to SAS.)

    But putting that aside for the moment, Stata’s strength relative to R does not surprise me at all. Stata’s simply much easier to learn–even if you insist on writing programs rather than using the menus (which you should). Also keep in mind that many academic users don’t pay for their own Stata licenses, so R being free does not affect their decision-making.

    (To expand on Nick Cox’s point: you have to distinguish between people that use Stata’s GUI as an IDE for writing programs and people who use Stata’s GUI to avoid writing programs. I always teach people to do the former, and if you’re seeing more of the latter that’s disappointing but not terribly surprising.)

    • Bob Muenchen says:

      Hi Russell,

      How about the headline, “Number of Publications that use Stat Software has Increased 635% Since 1996, with a Weird Hump in the Middle.” The hump in the graph is quite bizarre! Competition from the packages shown in this set of graphs definitely cannot balance out that hump (I’ve plotted it to make sure) but it’s possible that competition from other packages that do statistics might. MATLAB, Mathematica, RapidMiner, Weka, SPSS Modeler, SAS Enterprise Miner, Spotfire, Tableau, KXEN, and Salford’s CART, TREENET, MARS, etc. must have been used in a lot of scholarly publications and many were not popular before 2005. However, in academia I don’t see the classic SPSS user using any of them. Stata is the only thing I see chomping away at the SPSS market share in academia.

      I agree that Stata is easier to learn and use than R. In fact, I suspect that if Stata were to become an open source project, it would become the most widely used software in academia in short order. Now excuse me while I put on my Kevlar vest!

      Cheers,
      Bob

      • Nick Cox says:

        I don’t see Stata going open source as far as proprietary code is concerned. But a more immediate point is simple, but often missed. Stata and R are converging in real price. The price of Stata — although more than most individuals prefer to pay — is modest compared with the commercial opposition, and once you have a current Stata licence free technical support is included and lots of free user-written software is available to you. The real price of R has to include whatever training and books and support from specialist companies that users pay for. In that sense the statement “R is free” is completely accurate but nevertheless incomplete. Naturally I am not denying that Stata is commercial and R is not: just saying “measure how much you pay”.

      • Bob Muenchen says:

        Hi Nick,

        Good point. Open source fanatics love the two “frees” — free beer i.e. free to use and freedom to change — but tend to downplay the “free puppy” one. You may totally love it, but it’s gonna cost you! I agree that Stata pricing is quite a deal for a single-user SE license for business ($845/yr) and the Small Stata license for students at $49/yr is decent. Where Stata pricing gets crazy is on servers. A 64-core server with 25 users is $75K/$40K for business/academia. The smallest cluster our group has is 5,000 cores, and the largest has 100,000. Such needs are not common, but we do use R at high scale (e.g. http://www.r-bloggers.com/r-at-12000-cores/). SAS Institute recently woke up and made their licensing for unlimited copies on unlimited servers at all our campuses for not much more than Stata charges for one small server. I’m sure they realized that too much Big Data work in academia was using R and they needed to address that.

        I don’t mean to imply that any of these packages are not worth their asking prices. As long as they are able to sell the software, it’s worth the price. But I’m glad open source projects such as R and RapidMiner are there to help drive prices down.

        Cheers,
        Bob

  10. I recently had to return to SPSS for one project after a long period of using R and MS Access. At first I was using the GUI but then found I couldn’t stand the lack of repeatability as the tasks had to be repeated over numerous datasets. Then I moved to syntax and it wasn’t so bad but I now strongly prefer R.

    I also noticed how limited the joining of datasets in SPSS is. I would have thought this feature would be more versatile by now but it looks like the fields to join on still have to be the same name, etc.
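To make the kind of join described above concrete, here is a minimal Python sketch of matching two datasets whose key fields have different names. It is neither SPSS nor R syntax; the function `inner_join`, the field names `patient_id` and `pid`, and the sample records are all hypothetical and exist only for illustration.

```python
def inner_join(left, right, left_key, right_key):
    """Join two lists of dict records on keys that may have different names."""
    # Index the right-hand records by their key so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            merged = dict(left_row)
            # Copy every right-hand field except the duplicate key column.
            merged.update({k: v for k, v in right_row.items() if k != right_key})
            joined.append(merged)
    return joined


# Hypothetical data: the key is "patient_id" in one file and "pid" in the other.
patients = [
    {"patient_id": 1, "age": 34},
    {"patient_id": 2, "age": 51},
    {"patient_id": 3, "age": 29},
]
visits = [
    {"pid": 1, "visit": "2013-01-05"},
    {"pid": 1, "visit": "2013-02-11"},
    {"pid": 3, "visit": "2013-03-02"},
]

for record in inner_join(patients, visits, "patient_id", "pid"):
    print(record)
```

The point of the sketch is only that the two key names are passed separately, so neither dataset has to be renamed before the join.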

    • Bob Muenchen says:

      Hi Justin,

      As you point out, the repeatability factor is very important. I think that’s why so many of the newer packages like RapidMiner, Orange and Knime have adopted the flowchart GUI used by SPSS Modeler and SAS Enterprise Miner. It’s the only non-programming GUI that allows you to use it repeatedly without having to switch to programming. The Red-R GUI for R is like this, but unfortunately its progress seems to have stalled.

      Cheers,
      Bob

  11. Berry says:

    Very interesting, thanks for the analysis!
    I would be delighted to see matlab included the next time…

    • Bob Muenchen says:

      Guten Tag Berry,

      While MATLAB and R have much in common, MATLAB use is dominated by solving engineering problems rather than statistical analysis or data mining. If I could think of a way to split that use out, I would love to do it.

      Tschüss,
      Bob

  12. Masanori Yoshida says:

    Thank you very much for your interesting article.
    I translated your article into Japanese. Please let me know if you are not comfortable with my translation.

  13. Tim Daciuk says:

    Hey Bob,
    Tim Daciuk here; I think that we did a couple of presentations and/or were on a panel together, back “in the day” (when I was part of SPSS Inc). Interesting article and interesting use of forecasting. Certainly the use of R is expanding, mostly due to the cost of R. I think, however, that measuring trends primarily from an academic/scholastic bent may be problematic. If you take a look at ‘industry’, I think that SPSS and SAS are still the big gorillas in the market and will be for the foreseeable future. I think that this is due to a number of factors: 1) the ‘one throat to choke’ ability of having a company stand behind the product; 2) the end-to-end solutioning that SAS and SPSS offer (as predictive analytics becomes a business integrated function), which is not the case with R; 3) the development of vertical applications (mentioned earlier); and 4) the existing ancillary development and integration network around SPSS and SAS (though this is changing).

    P.S. I tend to rely on the Colbert statistic for a lot of my work!

    • Bob Muenchen says:

      Hi Tim,

      It’s nice to hear from you! I miss those SPSS Directions meetings. IBM priced them out of the academic market. I agree with all your points. I’ll write a new post soon based on job advertisements (mostly corporate) that reinforces your point.

      Cheers,
      Bob

  14. I really enjoyed reading your article and the comments. There is, however, another point that belongs in the discussion and has not yet been mentioned (or I overlooked it). Let’s start with a provocative statement: in my book, teaching SAS/SPSS at universities almost amounts to misappropriation of funds. Let me explain. When vast amounts of cash are spent on software deals, that money is then lacking in other places, like lab seats or smaller classes. In return, Unis get programs that de facto lock their students (and faculty) in to a vendor. A common counterargument is that Unis need to teach what industry requires. However, I believe that a Uni’s job is to teach knowledge, and not to lock its students in to specific software. If someone understands statistics, learning SAS or SPSS is not that much of a hassle anymore. However, teaching statistics with R, being able to demonstrate how calculations are done and results come about, gives any educator a definite advantage. So I hope that R (or any other free successor, for that matter) will eventually dominate.

    Forgive my tone, but at the moment I’m a little bit disgruntled because I just spent half a year trying to convince some Unis in Serbia to use exclusively R in their newly established statistics program, and failed.

    • Bob Muenchen says:

      Hi Christoph,

      At The University of Tennessee I’m in charge of software licensing for research tools and you’re right, we spend a LOT of money on them. It’s around $350K for research only and well over $1M if we include productivity tools, ERP software and databases. When it comes to teaching, professors face tough choices:

      Use what’s free or cheap to save the university money?
      Use what’s easy so students can focus on analytic concepts instead of programming and debugging?
      Use the tool that’s powerful so students will learn maximum flexibility?
      Use the tool that’s most likely to get the students a job?

      Depending on your perspective, each answer may lead to a different product! I hope that as the menus & dialog box GUIs for R improve and employers use R more, all these could be fulfilled by one package. However, I suspect that SAS and SPSS will be #1 and #2 with employers for many years to come. I’ll have a blog post on that soon with the latest data.

      Cheers,
      Bob

      • tophcito says:

        Hey Bob,
        thank you for sharing the figures for UT. I had not realized that you work there. I had the privilege of being an exchange student in the Bartlett area around Memphis a long time ago. Since then, Rocky Top never fails to increase my heart rate. ;-)

        Back to topic: I totally agree with those hard choices and certainly share your hope of R interfaces improving and thus gaining a bigger market share. They also entirely depend on the audience. For Statistics majors I would go 100% R from day one, complementing it with a scripting language as data retrieval and manipulation becomes an issue.

        When teaching other subjects the choice is less obvious for me, unfortunately. I got social science majors started on R using R Commander with quite some success. However, a key issue there is applying survey weights to data, and here R GUIs don’t help. To my surprise, most of them took quite easily to the command line. And once you get to the point where you tell them that using survey weights correctly in SPSS involves much more than issuing a WEIGHT BY command, they accept R’s solution willingly. Anyway, most social science students end up producing SPSS tables and guessing their meaning. So while SPSS aids them in producing results more quickly, it does not help them to produce correct results, let alone understand them.

        I’ll be looking forward to your post about employer preferences.

        Best,
        Christoph

      • Bob Muenchen says:

        Hi Christoph,

        We had a large class of non-stat majors switch from programming in SAS to clicking in SPSS. There was a real concern that the SPSS approach would let the students be lazy and not learn as much about what the output meant. That happened over in the Statistics Department. Our research support group sees the students years later when it’s thesis or dissertation time. It was clear to us that the SPSS approach allowed the students to learn much MORE about what the statistics meant. With SAS programming, they spent far too much time debugging their programs. I’m sure this had nothing to do with SAS per se, but just the debugging time you’d have with any language.

        However, with stat majors, I agree that they must dive in and learn to program or they’ll never do well on the job.

        Cheers,
        Bob

      • Fr. says:

        Bob, why not Stata? It’s the right middle point between programming and point and click for non stats students.

      • Bob Muenchen says:

        Hi Fr.,

        I’m sure Stata would have done as well. SPSS was chosen by the departments that were requiring the students to take the class. My point was that when non-stat majors have only two stat classes in their entire PhD program, they’ll learn more about statistics if they don’t have to spend that time learning both programming and statistics. I think that would apply when comparing any two decent stat packages, one using programming and the other using a point-and-click GUI. Of course this was not a carefully controlled study, just an observational one. Decent research may well exist that would do a better job of settling that question!

        Cheers,
        Bob

  15. selva rajan says:

    The problem with open source software is that no one is responsible if there is a crash, a bug in a calculation, or a flaw in crucial machine data analysis. With paid software there is a company that can be held responsible (well, partly at least) and can be asked to fix the problem. You can’t call a specific person. Business needs service people to service them for whatever runs the company or school or research, so SPSS and SAS will stay on as long as they are paid software. Piracy has made paid software equivalent to open source! So the adopters and learners of SPSS and SAS are so numerous that even in the future there will hardly be any letup in the usage of those two packages. I can’t see R surging ahead in the future, though I will continue to work against the blank looks I get when I say “Why don’t you do the 3×3 matrix data Fisher test in R? SPSS can handle only 2×2.” The question I get is “What is R?” and later “Why would you download all those different packages and write a program for it?”
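For readers wondering what a Fisher test on a 3×3 table involves, the following is a minimal Python sketch of the usual generalization of Fisher’s exact test to tables larger than 2×2 (often called the Fisher-Freeman-Halton test). It is not the commenter’s code and not R’s implementation: it simply enumerates every table with the same row and column totals, which is only practical for small counts. The function names and the example table are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Probability of one table, conditional on its row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    numerator = prod(factorial(t) for t in row_totals + col_totals)
    denominator = factorial(n) * prod(factorial(x) for row in table for x in row)
    return Fraction(numerator, denominator)


def fill_row(total, caps):
    """Yield every way to split `total` across cells bounded above by `caps`."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in fill_row(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_margins(row_totals, col_totals):
    """Yield every non-negative integer table with the given margins."""
    if len(row_totals) == 1:
        yield (tuple(col_totals),)  # the last row is forced by what is left
        return
    for row in fill_row(row_totals[0], col_totals):
        remaining = [c - x for c, x in zip(col_totals, row)]
        for rest in tables_with_margins(row_totals[1:], remaining):
            yield (row,) + rest


def fisher_exact_rxc(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = table_probability(table)
    candidates = map(table_probability, tables_with_margins(row_totals, col_totals))
    p_value = sum((p for p in candidates if p <= p_observed), Fraction(0))
    return float(p_value)


# Hypothetical 3x3 table of counts, for illustration only.
example = [[3, 1, 0], [1, 3, 1], [0, 1, 3]]
print(fisher_exact_rxc(example))
```

Exact fractions are used so that ties between equally likely tables are compared without floating-point tolerance; the brute-force enumeration is a deliberate simplification and would be far too slow for large samples.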

    • Bob Muenchen says:

      Hi Selva,

      Revolution Analytics is betting that most people will agree with you and pay them for Revolution R Enterprise. Then they can pick up the phone and get immediate support.

      Cheers,
      Bob

  16. Academic Researcher says:

    I was wondering if you took into account the “renaming” of some SPSS products to PASW in and around 2009-2010. It would explain your rapid decrease of SPSS hits in that time period.

    • Bob Muenchen says:

      Hi Researcher,

      Thanks for asking this question. I thought my query included PASW, but I just checked and it did not. I’ll fix that for next year but the impact of it will be small. From its peak at 155K hits, SPSS fell to 49.4K. Only 7.7% of the decline since 2009 was due to the exclusion of PASW as a search term.

      Cheers,
      Bob
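The arithmetic in the reply above can be sketched as follows. This is a rough illustration that assumes “the decline” means the full drop from the 155K peak to 49.4K; the reply says “since 2009” and does not state whether the peak falls in that year, so the resulting hit counts are approximations under that assumption, not figures from the article. The function and variable names are hypothetical.

```python
def decline_breakdown(peak_hits, current_hits, share_explained):
    """Split a drop in hits into the part explained by one cause and the rest."""
    total_decline = peak_hits - current_hits
    explained = total_decline * share_explained
    return total_decline, explained, total_decline - explained


# Figures quoted in the reply; treating the peak-to-current drop as "the decline"
# is an assumption made here for illustration.
peak_hits = 155_000
current_hits = 49_400
pasw_share = 0.077

total_decline, pasw_hits, other_hits = decline_breakdown(peak_hits, current_hits, pasw_share)
print(f"Total decline: {total_decline:,} hits ({total_decline / peak_hits:.1%} of the peak)")
print(f"Attributable to omitting PASW: about {pasw_hits:,.0f} hits")
print(f"Decline left unexplained by PASW: about {other_hits:,.0f} hits")
```

Under that assumption, the omitted search term accounts for only a small slice of the drop, which is the point the reply makes.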

  17. Pingback: 2014 será o ano do fim do SAS e do SPSS? Não exatamente! | Mineração de Dados

  18. Reblogged this on psicometricae.

  19. Greetings from the University of Tennessee at Chattanooga. Thanks for the article. I completely agree with your prediction that R will level off without a GUI and that 80% of people will not use a coding language. That was the revolution behind the development of Windows. Today’s generation is even more anti-code as everything is graphics-based.

    • Bob Muenchen says:

      Hi Isaac,

      It’s nice to hear from UTC! I really like the combination of a GUI that writes a program that I can then customize. That way I get an error-free start and as much flexibility as I need. Most just want to point and click though.

      Cheers,
      Bob

  20. shrio says:

    To me this report is quite one sided if we were to look from this perspective:
    1) How much are universities and schools willing to pay for an environment that encourages learning technologies that are widely used in demanding reality?
    Nowadays universities and schools are looking at profitability and/or cost saving rather than quality. If delivery of a course is on analytics, one may even consider analytics software or even business intelligence software, whichever does the job.

    • Bob Muenchen says:

      Hi Shrio,

      Academia is under pressure to control costs, but I’m not yet aware of any major universities that have stopped licensing SAS, SPSS, or Stata. So far only S-PLUS has been eliminated through the use of R. I suspect that it will be at least 10 years before R eliminates any others.

      Cheers,
      Bob

  21. radjaye says:

    In the full enterprise environment SAS has tools for SAS marketing automation and marketing optimisation… I doubt R etc. can provide anything that comes close to handling these tasks required in a busy marketing or CRM environment… SAS is not just about analysing data, but taking things a step further and handling customer lifecycles…

  22. boral says:

    Currently I am an undergraduate student of statistics. When I want to do some analysis, I personally like Minitab due to the GUI (I have not used SPSS much). On the other hand, I do love R, but as discussed here, the lack of a proper GUI stops many of us from using R whenever we want. For simple analyses, Minitab, Excel and perhaps SPSS are good, while R takes a lot of labour for simple analysis, which is not always good.

  23. Mark says:

    Radjaye is right on with his comment. I liken SAS to public transportation and R to a scooter. It can go some places more quickly than SAS, but cannot carry the load needed to make it truly effective for now. What I have been doing lately is combining the SAS and R worlds: allowing folks to use the power of SAS, divert to an R module and then port the results back to SAS to continue the process. Very interesting and very well-accepted. What I see in the future is SAS assimilating the entire R concept into their suite. I also think this will eventually push the costs of SAS downward. All good outcomes.

  24. Mark says:

    My mission will be optimizing R within SAS to maximum benefit. If you would like to collaborate, that would be fine.

  25. I think “R” should release its software for tablets as well… It would be a revolutionary step and go beyond all competitors at one go… maybe an R Android app… I love “R”.

    • Bob Muenchen says:

      Hi Ravi,

      One of the odd things about tablets is that they gain simplification by hiding their file system. When I do a single analysis project, I may have files in many formats: Excel, R, SAS, SPSS, Word and LaTeX. I don’t want to go to each app to find the files for that project. I want them to be all in one folder as they would be on a PC, Mac or Linux computer. I don’t know if the iPad/Android tablets will go that direction. Windows tablets have the file manager still, but they get grief from reviewers as not being as easy to use. It will be very interesting to watch what happens in the tablet space!

      Cheers,
      Bob

  26. Why didn’t you include PSPP in the analysis? For a lot of users it will do all they did with SPSS.
    I also wonder if PSPP would lead people to use SPSS afterwards.

    • Bob Muenchen says:

      Hi Hendrik,

      That’s an excellent question. Anyone can download the free SPSS clone amusingly named PSPP from http://www.gnu.org/software/pspp/. I’ve been following the software off and on for many years. Although it offers quite a lot of what SPSS does, its support for analysis of variance (ANOVA) is very weak. It does only one-way ANOVA and it lacks mixed linear models that are widely used here at The University of Tennessee. It also has no multivariate methods outside of factor analysis. If you don’t need those methods, it might be good for you. The price is certainly right!

      Cheers,
      Bob

  27. anilde says:

    For Enterprise people they can use Revolution R, which is the business version of R. The cost is much less than SAS. Also Revolution R can handle very big data, can work from Hadoop cluster and much more. Revolution R gives full support for business. Also if any company wants to switch to R from SAS they convert the code for free. Revolution R is such a software which tweaks R and does computation much faster than the regular R version. Also it can handle really big amounts of data.
    Revolution R is free of cost for academics and researchers.
    Just have a look at it :- http://www.revolutionanalytics.com/

  28. boral1 says:

    Revolution R is really a good option for business people to use R. It costs much less than SAS and it can handle very large amounts of data. Not only that, Revolution R can work in a Hadoop cluster and it has a lot of tweaks compared with the free R version. The support is extraordinarily good and it is free for academics and researchers. If any company wants to switch from SAS to R, then Revolution R converts the code automatically free of cost.
    Have a look at it here :- http://revolutionanalytics.com/

    • Mark says:

      I use both SAS and R. In fact, I recently worked R into our SAS environment. I agree that SAS is expensive, but if you compare SAS and any form of R together, I don’t see any area where R is superior. Point of fact, according to members of our research community, much of the R code they find does not validate correctly. SAS is justified at many sites as the software of choice, for its superior capabilities. As R competes more, it will become more expensive. SAS market share is increasing, not decreasing. I am completely software agnostic; this is how I and others perceive it.

      • anilde says:

        Yes SAS is the software of choice in many places and I think this is because of the fact that SAS is older than R. Many people are used to SAS and so they don’t want to trust a newcomer in the analytics field. As for superiority of R over SAS, take these as examples :-
        The graphics produced by R (with the package ggplot2) are far superior to those produced by SAS. Also, integration of R (actually Revolution R) with Hadoop makes it much easier to handle big data sets conveniently at lower cost. There is no doubt that SAS still dominates the job market, but the growth of R is much greater than the growth of SAS. And when big companies like Google use R for their analysis, there is something in R. What do you think?

      • Mark says:

        I find your reasoning very flawed in many respects. If you had seen the recent release of SAS 9.4, you could never have made those comments. I have seen the R and SAS graphics and I cannot agree with your comment of R superiority. Please view the SAS/Graph package, ODS delivery and the graphics inherent in their statistical procedures. SAS has released a new language that allows for much better data handling in any relational DB (FEDSQL), and an interface with Hadoop that allows for seamless interaction. They have also added a new language in the Data Step to augment their Hash program that will process data several times faster than R can even imagine. Their tools are reliable, leading-edge, proven, innovative and continuously enhanced. The growth of R is slowing as it is aging in the market. Don’t you think that if R could replace SAS companies would flock to it? I see the opposite happening. SAS has released a very inexpensive version to academia and I have seen a vast increase of SAS individuals graduating. Finally, for every Google I hear in regards to R usage, there are 50 times as many firms using SAS (Financials, Pharmaceuticals, Health Care Research and Providers, Governments, etc.).

  29. Cris says:

    Bob, wonderful article and forum to discuss this topic. You mentioned RapidMiner in one of your earlier posts. I am curious as to what your thoughts are on KDNuggets’ yearly poll on what analytics software is currently being used (http://www.kdnuggets.com/polls/2013/analytics-big-data-mining-data-science-software.html). RapidMiner seems to be rising in popularity and is becoming more of a commercial solution, has some good open source backing, and seems to be providing a good GUI interface layer to a fairly robust backend. It supports many of the newer datamining techniques and seems to be becoming more polished. I currently work for a U.S. telecom and the predictive analytics department I run currently uses SPSS Modeler, but I have been keeping my eye on RapidMiner and R (although I have used neither) as they seem to be rising in popularity in the business community. Thoughts?

    • Bob Muenchen says:

      Hi Cris,

      I think RapidMiner is the most interesting open source analytics software next to R. I like their AGPL approach of making the older versions free while charging for the most recent “commercial” one. I’ve only gone through a couple of tutorials on it, but the interface looks well designed. It’s certainly easier to get started in RapidMiner than R.

      Cheers,
      Bob

  30. Jani Erola says:

    My 2 cents: I teach quant methods in sociology and do research using super complicated register data, meaning that easily 95% of the time invested in a paper goes to data manipulation. Stata is currently way superior in data work to any competition I know. So even if we want to use a method that is only available in R (which is the case quite a few times in fact) the data work is almost always done in Stata. This also applies to quite a few people who would describe themselves as primarily R users. This may explain why R and Stata growth goes hand in hand, at least for now. This is also the reason why R can easily be used for teaching statistics relying more on the methods themselves but is not that well suited for teaching how methods are applied in research in practise (I have used both).

    A second advantage of Stata over R is that using PCs Stata’s multicore application does not require any additional steps from a user — a convenience that is not available in R as far as I know.

    I really think the GUI advantage of SPSS is overrated.

    • Bob Muenchen says:

      Hi Jani,

      When I started using R in 2005, I often saw people on Internet forums saying they needed SAS for data management and R for analysis and graphics. But some R gurus said that R could do all those tasks, so long as the data fit into RAM. To see who was right, data management was the first area I studied in R. The result was my workshop Managing Data with R, which summarizes the 160 pages in R for SAS and SPSS Users, 2nd edition. My coverage of the subject in R for Stata Users was somewhat less, at 108 pages. R is not only capable of handling all the data management situations I’ve seen, but it does so with great elegance thanks to Hadley Wickham’s packages plyr and reshape2.

      R has had multi-core support since version 2.14.

      I think Stata is a wonderful package, with a more consistent and extensible language than most other packages. It’s also much easier for a beginner to do a lot with a small amount of Stata know-how. A small amount of R knowledge just leads to frustration.

      Regarding GUIs, I think Stata’s is about as good as SPSS’.

      Cheers,
      Bob

  31. Trang says:

    Check out the free version of SAS Enterprise Guide called SAS® OnDemand for Academics http://www.sas.com/govedu/edu/programs/od_academics.html. SAS provides a one-year license at no cost for instructors and students who wish to use SAS. It works only on the Windows operating system. There is also the free SAS Web Editor, a service that allows you to access SAS over the internet. No need to download. The only drawback I have seen so far with the free SAS® OnDemand for Academics is that it is pretty slow due to the server issue.

    • Bob Muenchen says:

      Hi Trang,

      I have no doubt that those free versions are the direct result of competition from R and RapidMiner. I’ve heard they’ve sped it up with bigger servers, but students can only analyze data that the professor has put online. That makes experimenting with your own data frustrating.

      Cheers,
      Bob

      • Mark says:

        We use SAS, Stata, SPSS and R. The new version of SAS 9.4 with additional data languages (DS2, FedSQL, etc.), the upgrades to the interfaces and statistical procedures, puts both Open-Source and Commercial R back some years. We also have a SAS Grid, which increases our Statistical ROI minimally 3X. 75% of the non-SAS entrants migrate to SAS as their primary language. The other gentleman is correct; since SAS has given a very low-cost option for Academia, I am seeing a monstrous increase in students with a SAS background. Remember it is a moving target and I think SAS has just raised the pot! If you are comparing contemporary R with SAS of even a few years ago, you will be astounded as to the improvements.

  32. me says:

    Hi Mark,

    When I started to learn statistics, I heard about SAS first and R was very distant from me. Then I did all my work with Excel and Minitab, because they were enough for then. But I personally had a strong urge to learn SAS, as my professors told me that it is really a good language. I spent almost half a year trying to learn SAS in a convenient way, but I didn’t find one. The main obstacle was that SAS is not free even for students. The low-cost option that you mention is not low enough for a student who just wants to learn SAS out of interest. Also, SAS OnDemand is not always helpful. Being really frustrated, I shifted to R and am now learning it. R is always available at my fingertips while SAS is not. Also Revolution R, as mentioned by boral1 and anilde above, is a very good software and it really has enhanced R a lot. Many things which one cannot do in R can be done in Revolution R. Revolution R is free for academics and is really a good software and I personally will support R (and also Revolution R) for this openness for students, which SAS doesn’t have.

    • Mark says:

      Interesting comment and I understand your position completely. However, I believe that my point was that R and Revolution R are fine products for academia or very small projects. However, in the corporate environment, government applications or any “Real-World” application, it really competes in a very small niche. I use both and others as they apply to my projects. Like most shops that contain many SAS, R, Stata or SPSS, the vast majority (75%) is done in SAS and split between the rest. SAS gave many academic institutions a very low-cost cloud option and the number of grads with SAS experience has increased exponentially in the last few years. I also support R, but as your own professors said, SAS is a very good language to learn via its dominance. Good luck in your studies!

  33. drannmaria says:

    I use SAS to teach and I ONLY use real-world examples: the California Health Interview Survey, Trends in International Math & Science Study and more. It is true that I have to upload the data to the SAS Web Editor, as the professor, but all of our data analysis is done with real data and there is nothing forbidden by SAS. When students did dissertations, they could use SAS as well.

    • Bob Muenchen says:

      Hi Drannmaria,

      I like real-world examples, especially when they allow students to solve new problems. That’s usually where SAS Institute draws the line. You’re welcome to use public third-party data as you mentioned, but if you want your students to solve real-world problems for companies, SAS won’t allow it (unless they’ve changed their licensing very recently). I handle the academic contracts for about 30 software vendors and they almost all have that clause. I believe Revolution Analytics is an exception, but I haven’t read their contracts in around five years.

      Cheers,
      Bob

  34. Robin Rappaport says:

    Interesting. I learned SAS in college (in 1979) and with the exception of a brief period where my office was using SPSS have always used SAS for work. We presently support both SAS and R.

    I thought SAS had a program in place that provides SAS free to educational institutions.

    • Bob Muenchen says:

      Hi Robin,

      Yes, SAS Institute does offer the use of SAS on their cloud systems for free. The professor submits data sets and students can analyze those. However, to install the software on their own PC, the school will have to pay for the license. It’s inexpensive per copy, but still runs into tens of thousands of dollars.

      Cheers,
      Bob

  35. Hi, a better GUI for R is Tableau Software. You can try it for free on our website. It needs no programming.

    • Bob Muenchen says:

      Hi David,

      I don’t quite understand your comment. Your website’s document titled “Using R and Tableau” says:

      “Who is this feature intended for?
      This feature is primarily targeted for users who are already proficient at R. It is NOT meant for beginners with R. Anyone who wishes to use the new functions must first learn how to use R in order to leverage its capabilities in Tableau.”

      Which is correct, your comment or your company’s documentation?

      Cheers,
      Bob

  36. William Caughey says:

    Hello all. First and foremost, what a delightful analysis Bob! I enjoyed the read.

    Now, on to business. Currently, I am a PhD student in Biomedical Informatics and I use both R and SAS. I have seen the work with Bioconductor, which I really enjoy, and I look forward to further developments in the Bioconductor project. It is my understanding that some hospital systems, most notably the Mayo Clinic, have started to use R in their data analysis and management. My question is: do you believe the medical sector will ever trend towards using R more frequently than SAS and, if so, do you have an estimate of when this might occur?

    Thanks!

  37. Bob Muenchen says:

    Hi William,

    In the area of bioinformatics, I would guess that R is already more widely used than SAS due to the wide range of functions that have been added to the Bioconductor project. However, I would expect that SAS still dominates in the larger biomedical market. Those are both just guesses though. It’s hard enough to get solid data on research as a whole, let alone to break it down by application segment.

    Cheers,
    Bob

  38. Roland Andersson says:

    Bob
    I have used Stata regularly for many years and have also introduced some of my PhD students to this program. I am now attending a course in Statistical learning that uses R. I have done my first introductory lesson and am almost lost. There are so many assumptions. Just an example – to define the working directory you cannot use the backslash (which is normal for Windows) but must use “/” instead. It took me 10 minutes to find that out by trial and error. And the help files use such difficult neologisms or technical terminology that they are uninterpretable for the uninitiated. Compare
    ?summary in R and help summarize in Stata for instance.
    I am a surgeon and use Stata maybe 3-5 hours a week. I very often need to check the help file to get things right. Maybe R can work for someone who stays at the keyboard 20-30 h/week and is also a professional statistician.
    I have heard so many enthusiastic comments about R that I decided to try it out, but I am so far doubtful. I can probably do all I want in Stata.
    Roland E Andersson

    • Bob Muenchen says:

      Hi Roland,

      I heartily agree with your comments. In fact, in my book, “R for Stata Users,” I use the help files for displaying your data as an example. For Stata it’s crystal clear: “LIST displays case values for variables in the active data set.” But the R help file for the similar print function has this cryptic description: “print prints its argument and returns it invisibly (via invisible(x)). It is a generic function, which means new printing methods can be easily added for new classes.” So it prints “invisibly”?? Well no, but you get that impression. You also need to understand classes and methods to know what it’s talking about.

      I provide several other reasons why R is not for occasional users in my post, “Why R is Hard to Learn”.

      Cheers,
      Bob

  39. ROMIO SAHA says:

    I need information on SAS vs. SPSS: which one is more popular and more widely useful in market research (broad toolset, easy error correction, easy data analysis)?

    • Bob Muenchen says:

      Hi Romio,

      If you’re looking for software for market research that’s easy to use, I recommend SPSS. That package is very dominant in the field of market research so knowing it is a good thing to have on your resume. It’s also quite easy to learn and use.

      Cheers,
      Bob

  40. Dale Lehman says:

    I am interested in your perceptions on why JMP attracts so little attention. I have been using it for years and find it far easier to teach students to use, easier to use myself, and quite powerful – extending to a number of machine learning methods. My own personal belief is that JMP has been handicapped by being owned by SAS – fear of product cannibalization. But academics have not embraced JMP to the extent I think it deserves. An additional factor in its favor is its extremely attractive academic licensing compared to all of the competition (except R of course).

    • Bob Muenchen says:

      Hi Dale,

      JMP is really nice software. It’s so enjoyable to have all the graphs linked & interactive. I too have been surprised that it has not become one of the top stat packages. All I can guess is there’s so much competition.

      Cheers,
      Bob

  41. Mark Ezzo says:

    This is a highly contrived and restricted study of dubious conclusions at best. The usage of SAS has risen quite dramatically in the academic world since SAS offered a very inexpensive, if not free, version to schools. I am seeing a huge influx of applicants out of school with SAS and/or other analytical package skills (including the ones you mention). I work with > 4000 researchers in the Health Care field and we offer R, SAS, SPSS, Stata, Matlab, and many one-offs. SAS is the OVERWHELMING choice, as 78% of the population (average age would be ~38) use SAS, followed by Stata, SPSS and R. Many of the individuals who used R in school switch to SAS within a six-month period. My company is constantly receiving inquiries for SAS talent and we cannot keep up with the demand. I respectfully dispute these findings as a biased study to push the usage of R for commercial remuneration.

  42. Pingback: SPSSやSASは終わりの始まりなのか、他 | Scientific-Global.net

  43. What do you think of considering a combination instead of contrasting the different packages? JMP seems to be very user-friendly and could be used for everyday statistical work. In addition it has an interface to R for more complex and/or innovative statistical methods.

    • Bob Muenchen says:

      Hi Winfried,

      I think that’s a great way to use R. JMP is a wonderful package and it has a good interface. Many packages – SAS, SPSS, Statistica – let you do all your work in the main package and then just call R for the things they don’t yet do. I show the basics of R for such use, plus ways to call R from several other packages here.

      Cheers,
      Bob

      • Winfried Koch says:

        Dear Bob,
        Thanks for your positive feedback and the pointer to your valuable document. I intend to explore the way forward with JMP and R further. I have 12 years of experience with JMP but I am just starting with R.
        Best regards
        Winfried

  44. David says:

    Interesting blog post. Thanks for sharing!
    Here is one more way of looking at these data (total hits, and proportion of total hits):

    https://docs.google.com/spreadsheets/d/1hJ2qg8F9G_1xLBiM6BdQnYwlYq2tGOa7PY65Dtx_uTU/pubhtml

    • Bob Muenchen says:

      Hi David,

      Those are nice, thanks!

      Bob

    • Mark Ezzo says:

      Very unscientific with anecdotal results. Conclusions are spurious at best. Any Researchers disagree with that?

      • David says:

        Mark Ezzo,
        Since I didn’t draw any conclusions, I will assume your comment is in response to the blog post rather than my comment, even though you quoted me. I would not disagree that there are many significant limitations to his analysis and conclusions, not the least of which is the use of counts over rates, a point my graphs subtly make for those astute enough to pick up on it. Maybe you could improve your feedback by offering your take on how his analysis could be improved. Better yet, offer actual analysis. If nothing else, his work is a conversation starter.
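
The counts-versus-rates distinction raised in the comment above can be illustrated with a small sketch. The yearly hit counts here are made-up placeholders, not values from the post or from the linked spreadsheet, and the names `to_proportions`, `hypothetical_hits`, "A" and "B" are invented for illustration. The sketch only shows how a package's raw count can rise while its share of the total falls.

```python
def to_proportions(counts_by_year):
    """Convert {year: {package: hits}} into {year: {package: share of that year's total}}."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        if total == 0:
            # No hits at all in this year, so shares are undefined; skip it.
            continue
        shares[year] = {pkg: hits / total for pkg, hits in counts.items()}
    return shares


# Placeholder numbers for illustration only (NOT real data).
hypothetical_hits = {
    2010: {"A": 100, "B": 100},
    2011: {"A": 120, "B": 280},
}

shares = to_proportions(hypothetical_hits)
for year in sorted(shares):
    print(year, {pkg: round(share, 2) for pkg, share in shares[year].items()})
```

In this invented example, package "A" gains hits from one year to the next, yet its proportion of all hits drops, so a plot of counts and a plot of proportions could tell different stories about the same package.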
