Rexer Analytics Survey Results

Rexer Analytics has released preliminary results showing the usage of various data science tools. I’ve added the results to my continuously-updated article, The Popularity of Data Analysis Software. For your convenience, the new section is repeated below.

Surveys of Use

One way to estimate the relative popularity of data analysis software is through a survey. Rexer Analytics conducts such a survey every other year, asking a wide range of questions regarding data science (which the survey itself previously referred to as data mining). Figure 6a shows the tools that the 1,220 respondents reported using in 2015.

Figure 6a. Analytics tools used by respondents to the Rexer Analytics Survey. In this view, each respondent was free to check multiple tools.

We see that R has a more than 2-to-1 lead over the next most popular packages, SPSS Statistics and SAS. Microsoft’s Excel Data Mining software is slightly less popular, but note that it is rarely used as the primary tool. Tableau comes next, also rarely used as the primary tool. That’s to be expected as Tableau is principally a visualization tool with minimal capabilities for advanced analytics.

The next batch of software appears at first to be all in the 15% to 20% range, but KNIME and RapidMiner are listed both in their free versions and, much further down, in their commercial versions. These data come from a “check all that apply” type of question, so if we add the two amounts, we may be overcounting: a respondent who checked both versions would be counted twice. However, the survey also asked, “What one (my emphasis) data mining / analytic software package did you use most frequently in the past year?” Using these data, I combined the free and commercial versions and plotted the top 10 packages again in Figure 6b. Since other software combinations are likely (e.g. SAS and Enterprise Miner, or SPSS Statistics and SPSS Modeler), I combined a few others as well.
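The combining step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind Figure 6b: the option labels, the `PRODUCT_GROUPS` mapping, the `primary_tool_shares` function and the sample answers are all hypothetical, and the numbers are made up rather than taken from the survey. Because each respondent names exactly one primary tool, adding the free and commercial counts here cannot count anyone twice.

```python
from collections import Counter

# Hypothetical mapping from individual survey options to a combined label.
# The option names are assumptions; the real survey wording may differ.
PRODUCT_GROUPS = {
    "KNIME (free)": "KNIME",
    "KNIME (commercial)": "KNIME",
    "RapidMiner (free)": "RapidMiner",
    "RapidMiner (commercial)": "RapidMiner",
    "SAS": "SAS",
    "SAS Enterprise Miner": "SAS",
    "IBM SPSS Statistics": "IBM SPSS",
    "IBM SPSS Modeler": "IBM SPSS",
}


def primary_tool_shares(primary_answers, groups=PRODUCT_GROUPS, top_n=10):
    """Return (label, percent) pairs for the top_n combined primary tools.

    primary_answers holds one answer per respondent, so summing the
    variants of a product never double-counts a person.
    """
    # Options without an entry in the mapping keep their own label.
    combined = Counter(groups.get(answer, answer) for answer in primary_answers)
    total = sum(combined.values())
    return [
        (label, 100.0 * count / total)
        for label, count in combined.most_common(top_n)
    ]


# Made-up answers for demonstration only; these are NOT survey results.
sample_answers = (
    ["R"] * 6
    + ["KNIME (free)"] * 2
    + ["KNIME (commercial)"]
    + ["SAS"]
)

for label, percent in primary_tool_shares(sample_answers):
    print(f"{label}: {percent:.0f}%")
```

The same mapping applied to the “check all that apply” answers would need to deduplicate per respondent first, which is why the single-answer question is the safer basis for combining.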

Figure 6b. The percent of survey respondents who checked each package as their primary tool. Note that free and commercial versions of KNIME and RapidMiner are combined. Multiple tools from the same company are also combined. Only the top 10 are shown.

In this view we see R even more dominant, with over a 3-to-1 advantage compared to the software from IBM SPSS and SAS Institute. The overall ranking of the top three didn’t change, however. KNIME rises from 9th place to 4th, and RapidMiner rises as well, from 10th place to 6th. KNIME has roughly a 2-to-1 lead over RapidMiner, even though these two packages have similar capabilities and both use a workflow user interface. This may be due to RapidMiner’s move to a more commercially oriented licensing approach. For free, you can still get an older version of RapidMiner or a version of the latest release that is quite limited in the types of data files it can read. Even the academic license for RapidMiner is constrained by the fact that the company views “funded activity” (e.g. research done on government grants) the same as commercial work. The KNIME license is much more generous, as the company makes its money from add-ons that increase productivity, collaboration and performance, rather than limiting analytic features or access to popular data formats.

If you found this interesting, you can read about the results of other surveys and several other ways to measure software popularity here.

Is your organization still learning R?  I’d be happy to stop by and help. I also have a workshop, R for SAS, SPSS and Stata Users, on DataCamp.com. If you found this post useful, I invite you to follow me on Twitter.

13 thoughts on “Rexer Analytics Survey Results”

  1. Bob.

    Nice work, as always. Looking forward to release of the complete survey findings.

    Looking at results from Rexer’s last round:

    — Primary and total usage for R increased as % of respondents
    — SPSS/Statistics total usage declined slightly, but remains #2
    — RapidMiner collapsed from #3 to #9, total usage down from ~30% to ~20%
    — SAS remained constant at ~ 30%, moved up to #3
    — Excel Data Mining and Tableau did not appear in previous surveys. I suspect that Rexer has broadened the survey audience.
    — Weka and MATLAB hold steady in reported usage, but Tableau and Excel displace them in the ranking

    In general, the rankings are remarkably stable from the previous survey.

    Your plot of the primary tools makes sense.

    Regards,

    Thomas
