Rexer Analytics has released preliminary results showing the usage of various data science tools. I’ve added the results to my continuously-updated article, The Popularity of Data Analysis Software. For your convenience, the new section is repeated below.
Surveys of Use
One way to estimate the relative popularity of data analysis software is though a survey. Rexer Analytics conducts such a survey every other year, asking a wide range of questions regarding data science (previously referred to as data mining by the survey itself.) Figure 6a shows the tools that the 1,220 respondents reported using in 2015.
We see that R has a more than 2-to-1 lead over the next most popular packages, SPSS Statistics and SAS. Microsoft’s Excel Data Mining software is slightly less popular, but note that it is rarely used as the primary tool. Tableau comes next, also rarely used as the primary tool. That’s to be expected as Tableau is principally a visualization tool with minimal capabilities for advanced analytics.
The next batch of software appears at first to be all in the 15% to 20% range, but KNIME and RapidMiner are listed both in their free versions and, much further down, in their commercial versions. These data come from a “check all that apply” type of question, so if we add the two amounts, we may be over counting. However, the survey also asked, “What one (my emphasis) data mining / analytic software package did you use most frequently in the past year?” Using these data, I combined the free and commercial versions and plotted the top 10 packages again in figure 6b. Since other software combinations are likely, e.g. SAS and Enterprise Miner; SPSS Statistics and SPSS Modeler; etc. I combined a few others as well.
In this view we see R even more dominant, with over a 3-to-1 advantage compared to the software from IBM SPSS and SAS Institute. However, the overall ranking of the top three didn’t change. KNIME however rises from 9th place to 4th. RapidMiner rises as well, from 10th place to 6th. KNIME has roughly a 2-to-1 lead over RapidMiner, even though these two packages have similar capabilities and both use a workflow user interface. This may be due to RapidMiner’s move to a more commercially oriented licensing approach. For free, you can still get an older version of RapidMiner or a version of the latest release that is quite limited in the types of data files it can read. Even the academic license for RapidMiner is constrained by the fact that the company views “funded activity” (e.g. research done on government grants) the same as commercial work. The KNIME license is much more generous as the company makes its money from add-ons that increase productivity, collaboration and performance, rather than limiting analytic features or access to popular data formats.
If you found this interesting, you can read about the results of other surveys and several other ways to measure software popularity here.
Is your organization still learning R? I’d be happy to stop by and help. I also have a workshop, R for SAS, SPSS and Stata Users, on DataCamp.com. If you found this post useful, I invite you to follow me on Twitter.