My latest update to The Popularity of Data Analysis Software is an attempt to use blog counts to estimate the popularity of analytics software. While I was able to greatly broaden the coverage of packages when studying job data, I made very little progress on the blog measure, adding new coverage only for Python and updating the previous count for Stata. I post the results here mostly as a request for input from people who may know of more sources of blog lists than I have found so far.
I’ve also updated the jobs data slightly, both in the main article and in the background one, How to Search for Data Science Jobs. The search algorithm is now greatly simplified, but the changes are worth reading about only if you are doing your own searches. Rather than jump through hoops to estimate the total number of jobs for each software package, I now count only the jobs that match the main set of search terms. The relative results from the new search algorithm are nearly identical to those from the previous, more complex one (r = .99).
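The r = .99 figure is a correlation between the per-package job counts produced by the old and new searches. As a minimal sketch of that comparison, assuming the two sets of counts are available as parallel lists in the same package order, the Pearson correlation can be computed as follows. The function name pearson_r is hypothetical, and no actual job counts are shown here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of counts.

    Hypothetical helper: xs could hold the old search's job count per
    package and ys the new search's count for the same packages.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)
```

Because the correlation is taken across packages, a value near 1 means the new counts are close to a linear rescaling of the old ones. The absolute totals may differ, but the comparisons between packages are essentially unchanged.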
Here’s the update on blogs:
On Internet blogs, people write about software that interests them, showing how to solve problems and interpreting events in the field. Blog posts contain a great deal of information about their topic, and although a blog is not as time-consuming to write as a book, maintaining one certainly requires effort. Therefore, the number of bloggers writing about analytics software has potential as a measure of popularity or market share. Unfortunately, counting the number of relevant blogs is often a difficult task. General-purpose software such as Java, Python, the C language variants and MATLAB have many more bloggers writing about general programming topics than about analytics, and separating the two groups isn’t easy. The name of a blog and the title of its latest post may not give you a clue that it routinely includes articles on analytics.
Another problem arises because what some companies would write up as a newsletter, others publish as a set of blogs: several people in the company each contribute their own blog, and these are also combined into a single company blog. StatSoft and Minitab offer examples of this. What’s really of interest here is not company employees who are assigned to write blogs, but rather volunteers who freely give their time.
In a few lucky cases, lists of such blogs are maintained, usually by blog consolidators, who combine many blogs into large “metablogs.” All I have to do is find such lists and count the blogs. I don’t attempt to remove the few blogs by vendor employees that I know are blended into such lists. However, I skip lists that are exclusively employee-based (or very close to it). The results are shown in Table 1.
Software   Number of Blogs   Source
R          452               R-Bloggers.com
Python     60                SciPy.org
SAS        40                PROC-X.com, sasCommunity.org Planet
Stata      11                Stata-Bloggers.com
Table 1. Number of blogs devoted to each software package on March 5, 2014, and the source of the data.
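The counting procedure is simple enough to sketch. What follows is a minimal illustration, not the method actually used to produce Table 1. It assumes each source list has been saved as a collection of blog URLs, and the BlogList structure, the employee_based flag, and the crude URL normalization are all hypothetical. Where a package draws on more than one source, as SAS does, the sketch counts the union of the lists so that a blog appearing on both is counted once. Whether the SAS figure above was deduplicated this way is not stated.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class BlogList:
    """One published list of blogs, e.g. a metablog's roster (hypothetical)."""
    package: str
    source: str
    blog_urls: List[str]
    employee_based: bool = False  # assumed flag: list is (nearly) all vendor employees


def normalize(url: str) -> str:
    """Crude normalization so the same blog listed twice is counted once."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def count_blogs(lists: Iterable[BlogList]) -> Dict[str, int]:
    """Count distinct blogs per package, skipping employee-based lists."""
    seen: Dict[str, set] = {}
    for blog_list in lists:
        if blog_list.employee_based:
            continue  # employee-only lists are excluded entirely
        urls = seen.setdefault(blog_list.package, set())
        urls.update(normalize(u) for u in blog_list.blog_urls)
    return {package: len(urls) for package, urls in seen.items()}
```

Note that the few employee blogs mixed into volunteer lists are still counted, matching the approach described in the text; only lists flagged as employee-based are dropped.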
R’s 452 blogs is quite an impressive number. For Python, the only list I could find was of the 60 blogs devoted to the SciPy subroutine library. Some of those likely cover topics besides analytics, but determining which ones never cover it would be quite time-consuming. The 40 blogs about SAS are still an impressive figure, given that Stata was the only other software that even garnered a list anywhere. That list is hosted by the vendor itself, StataCorp, but all of its bloggers except one are non-employees.
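Since blog counts are meant as a rough gauge of popularity or market share, it can help to express the Table 1 figures as percentages of their total. The short sketch below does only that arithmetic. The shares are relative to these four packages alone, and the Python figure covers SciPy blogs only, so its share should be read with the caveats above.

```python
# Blog counts from Table 1 (March 5, 2014)
blog_counts = {"R": 452, "Python": 60, "SAS": 40, "Stata": 11}


def shares(counts):
    """Return each package's percentage of the combined blog count."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


for name, pct in shares(blog_counts).items():
    print(f"{name:<7}{pct:5.1f}%")
```

With these inputs, R accounts for about 80% of the 563 blogs counted.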
While searching for lists of blogs on other software, I did find individual blogs that at least occasionally covered a particular package. However, compiling and maintaining such a list myself would be far too time-consuming given the relative ease with which the other popularity measures are collected.
If you know of other lists of relevant blogs, please let me know and I’ll add them. If you’re a software vendor CEO reading this, and your company does not build a metablog or at least maintain a list of your bloggers, I recommend taking advantage of this important source of free publicity.