Rexer Analytics has released preliminary results showing the usage of various data science tools. I’ve added the results to my continuously-updated article, The Popularity of Data Analysis Software. For your convenience, the new section is repeated below.
Surveys of Use
One way to estimate the relative popularity of data analysis software is through a survey. Rexer Analytics conducts such a survey every other year, asking a wide range of questions regarding data science (previously referred to as data mining by the survey itself). Figure 6a shows the tools that the 1,220 respondents reported using in 2015.
We see that R has a more than 2-to-1 lead over the next most popular packages, SPSS Statistics and SAS. Microsoft’s Excel Data Mining software is slightly less popular, but note that it is rarely used as the primary tool. Tableau comes next, also rarely used as the primary tool. That’s to be expected as Tableau is principally a visualization tool with minimal capabilities for advanced analytics.
The next batch of software appears at first to be all in the 15% to 20% range, but KNIME and RapidMiner are listed both in their free versions and, much further down, in their commercial versions. These data come from a “check all that apply” type of question, so if we add the two amounts, we may be overcounting: a respondent who checked both versions of a package would be counted twice. However, the survey also asked, “What one (my emphasis) data mining / analytic software package did you use most frequently in the past year?” Since each respondent could name only one package here, the counts can be added without that risk. Using these data, I combined the free and commercial versions and plotted the top 10 packages again in figure 6b. Since other software combinations are likely (e.g. SAS and Enterprise Miner, or SPSS Statistics and SPSS Modeler), I combined a few others as well.
In this view we see R even more dominant, with over a 3-to-1 advantage compared to the software from IBM SPSS and SAS Institute. The overall ranking of the top three didn’t change, but KNIME rises from 9th place to 4th. RapidMiner rises as well, from 10th place to 6th. KNIME has roughly a 2-to-1 lead over RapidMiner, even though these two packages have similar capabilities and both use a workflow user interface. This may be due to RapidMiner’s move to a more commercially oriented licensing approach. For free, you can still get an older version of RapidMiner or a version of the latest release that is quite limited in the types of data files it can read. Even the academic license for RapidMiner is constrained by the fact that the company views “funded activity” (e.g. research done on government grants) the same as commercial work. The KNIME license is much more generous as the company makes its money from add-ons that increase productivity, collaboration and performance, rather than limiting analytic features or access to popular data formats.
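For readers who want to see why combining free and commercial versions is risky with “check all that apply” data but safe with the “one package used most” question, here is a minimal sketch in Python. The four responses are made up purely for illustration (they are not Rexer data), and the names `PRODUCT_FAMILY`, `naive_used_counts`, `distinct_used_counts` and `primary_counts` are my own hypothetical helpers, not anything from the survey.

```python
from collections import Counter

# Hypothetical toy responses, invented for illustration only (NOT Rexer data).
# "used" is the check-all-that-apply answer; "primary" is the single most-used tool.
responses = [
    {"used": {"R", "KNIME (free)", "KNIME (commercial)"}, "primary": "KNIME (free)"},
    {"used": {"R", "SAS"}, "primary": "R"},
    {"used": {"KNIME (commercial)", "RapidMiner (free)"}, "primary": "KNIME (commercial)"},
    {"used": {"R", "RapidMiner (free)", "RapidMiner (commercial)"}, "primary": "R"},
]

# Assumed mapping from each edition to the product family it is merged into.
PRODUCT_FAMILY = {
    "KNIME (free)": "KNIME",
    "KNIME (commercial)": "KNIME",
    "RapidMiner (free)": "RapidMiner",
    "RapidMiner (commercial)": "RapidMiner",
}


def family(tool):
    """Return the merged product name, or the tool itself if it has no editions."""
    return PRODUCT_FAMILY.get(tool, tool)


def naive_used_counts(responses):
    """Add up edition-level ticks; counts a respondent twice if they ticked both editions."""
    counts = Counter()
    for r in responses:
        for tool in r["used"]:
            counts[family(tool)] += 1
    return counts


def distinct_used_counts(responses):
    """Count each respondent at most once per product family (needs respondent-level data)."""
    counts = Counter()
    for r in responses:
        for fam in {family(tool) for tool in r["used"]}:
            counts[fam] += 1
    return counts


def primary_counts(responses):
    """Primary-tool answers are mutually exclusive, so merged counts can simply be added."""
    return Counter(family(r["primary"]) for r in responses)


print("naive   :", dict(naive_used_counts(responses)))
print("distinct:", dict(distinct_used_counts(responses)))
print("primary :", dict(primary_counts(responses)))
```

In this toy data the naive sum credits KNIME with three users even though only two respondents used it. With only published percentages, as in a preliminary report, the distinct count isn't recoverable, which is why the primary-tool question is the safer basis for a combined plot.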
If you found this interesting, you can read about the results of other surveys and several other ways to measure software popularity here.
Is your organization still learning R? I’d be happy to stop by and help. I also have a workshop, R for SAS, SPSS and Stata Users, on DataCamp.com. If you found this post useful, I invite you to follow me on Twitter.
Bob.
Nice work, as always. Looking forward to release of the complete survey findings.
Looking at results from Rexer’s last round:
— Primary and total usage for R increased as % of respondents
— SPSS/Statistics total usage declined slightly, but remains #2
— RapidMiner collapsed from #3 to #9, total usage down from ~30% to ~20%
— SAS remained constant at ~30%, moved up to #3
— Excel Data Mining and Tableau did not appear in previous surveys. I suspect that Rexer has broadened the survey audience.
— Weka and MATLAB hold steady in reported usage, but Tableau and Excel displace them in the ranking
In general, the rankings are remarkably stable from the previous survey.
Your plot of the primary tools makes sense.
Regards,
Thomas
Hi Thomas,
Thanks for the comparison. I started to do that, but the combination of primary tool or not and free or commercial blew my mind.
Cheers,
Bob
How many respondents were there?
Hi Michael,
Excellent question! There were 1,220 respondents from 72 countries. I’ll add that to the text.
Cheers,
Bob
Was Python an option in the survey?
Hi Scott,
Excellent question! Python was handled separately. I hope to add that when the full report is released.
Cheers,
Bob