How Valuable is a #1 Ranking for Analytics Software? Not as Much as You Might Think!

In my never-ending quest to study the Popularity of Data Analysis Software, I recently read the 2013 Edition of the Wisdom of Crowds Business Intelligence Market Study by Dresner Advisory Services, LLC. In it, I found the table below, which displays the “Wisdom of Crowds”, or what most people would call survey results.

Dresner-2013-5-20-Tableau
Wisdom of Crowds report showing Tableau with highest score.

As you can see, among high growth business intelligence vendors, Tableau comes out on top with a mean score of 4.40. Not long afterwards, I saw this press release that claimed, “TIBCO Spotfire Named the Leader in ‘High Growth Business Intelligence’ Market Segment. Spotfire Achieves ‘Best in Class’ in Wisdom of Crowds SME Business Intelligence Study.” That can’t be right, can it? I downloaded the report, expecting it to be for 2014. However, it was from the same year as the previous one I had read, 2013, and it contained the following familiar-looking table.

Table 1.
Wisdom of Crowds report showing Spotfire in first place.

To my surprise, Spotfire was now in first place, with Tableau in 3rd. Then I belatedly noticed the “SME” in TIBCO’s headline. It turns out that the second report was based on a subset of the first, selecting responses only from Small and Mid-sized Enterprises (SMEs). While the reports looked identical to me at first, the subset report does include “Small and Mid-sized Enterprises” right in its title. So both company claims are correct once you figure out who is being surveyed.

But what does this mean for the value of a #1 ranking from Dresner? The reports break the 23 vendors down into 5 groups: Titans, Large Established Pure-Play, High Growth, Specialized, and Emerging (they mention Early Stage ones too, but don’t rank them). This approach results in four or five vendors per table. By doing the report twice, once for all respondents and again for SMEs, there are 10 opportunities for companies to be #1 in any given year. In the two Dresner reports from 2013, 7 of the 23 companies are #1. Companies are more likely to purchase distribution rights to reports when they come out looking good. Since Dresner makes a living selling reports, that gives a whole new meaning to the term Business Intelligence!
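The counting behind that paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted above; the variable names are mine, not Dresner's.

```python
# Figures taken from the paragraph above; variable names are illustrative.
groups = ["Titans", "Large Established Pure-Play", "High Growth",
          "Specialized", "Emerging"]
reports = ["All respondents", "SMEs only"]

# One first-place slot per group, per report.
first_place_slots = len(groups) * len(reports)

vendors_ranked = 23
vendors_with_a_first_place = 7
share_with_first_place = vendors_with_a_first_place / vendors_ranked

print(f"First-place slots per year: {first_place_slots}")
print(f"Share of vendors holding a #1: {share_with_first_place:.0%}")
```

In other words, roughly three in ten of the ranked vendors can claim a #1 from the 2013 reports.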

JMbayes R package (webinar)

A free webinar will provide an introduction to the “JMbayes” R package, which provides methods for Joint Modeling of Longitudinal and Time-to-Event Data under a Bayesian Approach.

Webinar Format:

– Introduction to Joint Models and the JMbayes R package
– Live demonstration
– Question and Answer period

Speaker:

– Dimitris Rizopoulos, JMbayes Package Maintainer

For more information on the JMbayes package, please visit this site:

http://cran.r-project.org/package=JMbayes

Please note that in addition to attending from your laptop or desktop computer, you can also attend from a Wi-Fi connected iPhone, iPad, Android phone or Android tablet by installing the GoToMeeting App.

Registration:

https://www3.gotomeeting.com/register/187219462

This event is brought to you by The Orange County R User Group.

R Continues Its Rapid Growth

I’ve just updated the section below from The Popularity of Data Analysis Software. Note that the overall article is still under construction and all the figure numbers have changed from previous versions.

Growth in Capability

The capability of analytics software has grown significantly over the years. It would be helpful to be able to plot the growth of each software package’s capabilities, but such data is hard to obtain. John Fox (2009) acquired it for R’s main distribution site http://cran.r-project.org/. I collected the data for later versions following his method.

Figure 8 shows that the growth in R packages is following a rapid parabolic arc (quadratic fit with R-squared=.998). The right-most point is for version 3.0.2, the last version released in 2013.

Fig_8_CRAN
Figure 8. Number of R packages plotted for each major release of R.
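
For readers who want to try a similar fit on their own counts, here is a minimal Python sketch of a least-squares quadratic fit and its R-squared. It is not the code used to produce Figure 8. The function names fit_quadratic and r_squared are made up for illustration, and the example data are synthetic points placed on an exact parabola, not the CRAN package counts.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations for the basis [1, x, x^2], stored as an augmented matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def r_squared(xs, ys, coeffs):
    """Proportion of the variance in ys explained by the fitted quadratic."""
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic demonstration data (NOT the CRAN counts): y = 3 + 2x + 5x^2.
demo_x = list(range(10))
demo_y = [3 + 2 * x + 5 * x * x for x in demo_x]

coeffs = fit_quadratic(demo_x, demo_y)
fit_quality = r_squared(demo_x, demo_y, coeffs)
print(coeffs, fit_quality)
```

With real release-by-release package counts in place of the synthetic data, an R-squared close to 1 would indicate the kind of near-parabolic growth described above.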

As rapid as this growth has been, these data represent only the main CRAN repository. R has eight other software repositories, such as the one at http://www.bioconductor.org/, that are not included in this graph. A program run on 4/7/2014 counted 7,364 R packages at all major repositories, 5,323 of which were at CRAN. So the growth curve for the software at all repositories would be roughly 38% higher on the y-axis than the one shown in Figure 8. As with any analysis software, individuals also maintain their own separate collections, typically available on their web sites.
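
The 38% figure follows directly from the two counts above. A small Python sketch of the calculation, with variable names of my own choosing:

```python
# Package counts from the 4/7/2014 run quoted above.
packages_all_repositories = 7364
packages_cran_only = 5323

ratio = packages_all_repositories / packages_cran_only
percent_higher = (ratio - 1) * 100

print(f"All repositories hold {percent_higher:.0f}% more packages than CRAN alone")
```

This assumes the other repositories have grown in proportion to CRAN, so the scaling applies across the whole curve rather than only to its last point.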

To put this astonishing growth in perspective, let us compare it to the most dominant commercial package, SAS. In version 9.3, SAS contains around 1,200 commands that are roughly equivalent to R functions (procs, functions, etc. in Base, Stat, ETS, HP Forecasting, Graph, IML, Macro, OR, QC). R packages contain a median of 5 functions (Rasmus Bååth, 12/2012 personal communication). Multiplying the 7,364 packages counted across all major repositories by that median gives approximately 36,820 R functions, compared to SAS’s 1,200. In fact, during 2013 alone, R added more functions/procs than SAS Institute has written in its entire history! That’s 835 packages, counting only CRAN, or around 4,175 functions. Of course, these are not perfectly equivalent. Some SAS procedures have many more options to control their output than R functions do. However, R functions can nest inside one another, creating nearly infinite combinations. Also, SAS is now out with version 9.4 and I have not repeated the arduous task of recounting its commands. If SAS Institute would provide the figure, I would be happy to list it here. While the comparison is not perfect, it does provide an interesting perspective on the size and growth rate of R.
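
The estimates in this comparison can be reproduced with the short Python sketch below. The variable names are illustrative. Note that multiplying a package count by the median functions per package is only a rough approximation, since a median is not a mean.

```python
# Figures quoted in the paragraph above.
median_functions_per_package = 5
r_packages_all_repositories = 7364
cran_packages_added_2013 = 835
sas_commands_v93 = 1200

# Rough estimates: package count times the median functions per package.
est_r_functions = r_packages_all_repositories * median_functions_per_package
est_functions_added_2013 = cran_packages_added_2013 * median_functions_per_package

print(f"Estimated R functions: {est_r_functions:,}")
print(f"Estimated functions added in 2013 (CRAN only): {est_functions_added_2013:,}")
print(f"R-to-SAS ratio: about {est_r_functions / sas_commands_v93:.0f} to 1")
```

By this rough count, R offers about 31 times as many functions as SAS 9.3 has commands.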