I’m continuing to gather and analyze data to update The Popularity of Data Analysis Software. In this installment I cover the latest employment figures.
Employment is important to us all, so what software skills are employers seeking? A thorough answer to this question would require a time-consuming content analysis of job descriptions. However, we can get a rough idea by searching on job advertising sites. Indeed.com is the most popular job search site in the world. As their CEO and co-founder Paul Forster stated, it includes “all the jobs from over 1,000 unique sources, comprising the major job boards – Monster, Careerbuilder, Hotjobs, Craigslist – as well as hundreds of newspapers, associations, and company websites.” I used a program that went there weekly and searched job descriptions for keywords such as “SPSS” or “Minitab.” This was repeated during the 2nd, 3rd and 4th weeks of March in 2012 and 2013. (The data were meant to be for the complete two years, but the automated process went awry.)
The abbreviation “SAS” is common in computer storage, so I avoided those ads by searching for “SAS !SATA !storage !firmware” (the exclamation point represents a logical “not”). I focused on R while avoiding related topics like “R&D” by searching for it next to another package name, such as “R SAS” or “SAS R”, and repeating that for each package in the graph. The data for 2013 are presented in Figure 11.
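As a rough illustration of how those search strings could be assembled, here is a minimal Python sketch. It is not the program I actually used. The function names are made up for the example, the package list is taken from Table 2, and how the individual R queries get combined on Indeed.com is left open.

```python
# Package names taken from Table 2; everything else here is illustrative.
PACKAGES = ["SAS", "SPSS", "Minitab", "Stata", "JMP", "Statistica", "Systat", "BMDP"]


def sas_query(excluded=("SATA", "storage", "firmware")):
    """Build the SAS search string, negating storage-related terms with '!'."""
    return "SAS " + " ".join("!" + term for term in excluded)


def r_queries(others=PACKAGES):
    """Pair R with each other package in both orders, e.g. 'R SAS' and 'SAS R'."""
    queries = []
    for other in others:
        queries.append(f"R {other}")
        queries.append(f"{other} R")
    return queries


if __name__ == "__main__":
    print(sas_query())
    for query in r_queries():
        print(query)
```

Pairing R with a second package name is what keeps hits such as “R&D” out of the results, at the cost of missing any ad that asks for R alone.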
SAS has a very substantial lead in job openings, with SPSS coming in second with just over a quarter of the jobs. R comes in third place with slightly more than half the jobs available for SPSS. Compared to R or Minitab, SAS has over seven times as many jobs available!
Since 2012, job descriptions that included SAS declined by 961 (7.3%) and those containing Minitab declined by 154 (8.7%). Jobs for R increased by 497 (42%), pushing it past Minitab into third place by a slim margin. In fact, all packages except for SPSS and Systat showed significant, though much smaller, absolute changes (via Holm-corrected paired t-tests; see Table 2). Since these comparisons are based on only three data points in each year, I would not put much stock in most of them, but the 42% increase for R is notable.
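For readers unfamiliar with the Holm correction, here is a minimal Python sketch of the step-down adjustment. It is a generic illustration rather than the code behind the analysis. The function name is made up, and the p-values it takes would have to come from the paired t-tests, which are not reported here.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A comparison counts as significant at the 0.05 level when its adjusted p-value is below 0.05.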
Given the extreme dominance of SAS, a data analyst would do well to know it unless he or she was seeking a job in a field in which one of the other packages is dominant.
                  2012    2013   Difference   Ratio
1  SAS           13234   12272         -961    0.93
2  SPSS           3299    3289          -10    1.00
3  R              1196    1693          497    1.42
4  Minitab        1769    1615         -154    0.91
5  Stata           842     898           56    1.07
6  JMP             644     619          -25    0.96
7  Statistica       61      71           10    1.17
8  Systat           14      15            1    1.07
9  BMDP              6      10            3    1.53
Table 2. Number of jobs on Indeed.com that list each software in March of 2012 and 2013. Changes are significant for all software except SPSS and Systat.
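The Difference and Ratio columns, and the percentages quoted in the text, can be recomputed from the two count columns. The sketch below does that in Python using the counts as printed in Table 2. The function name is made up for the example. Because the printed counts appear to be rounded, a few results differ slightly from the published columns (SAS and BMDP, for instance).

```python
# Counts copied from Table 2: (March 2012, March 2013).
JOBS = {
    "SAS": (13234, 12272),
    "SPSS": (3299, 3289),
    "R": (1196, 1693),
    "Minitab": (1769, 1615),
    "Stata": (842, 898),
    "JMP": (644, 619),
    "Statistica": (61, 71),
    "Systat": (14, 15),
    "BMDP": (6, 10),
}


def summarize(counts):
    """Return (name, difference, ratio, percent change) for each package."""
    rows = []
    for name, (y2012, y2013) in counts.items():
        diff = y2013 - y2012
        ratio = y2013 / y2012
        pct = 100 * diff / y2012
        rows.append((name, diff, round(ratio, 2), round(pct, 1)))
    return rows


if __name__ == "__main__":
    for name, diff, ratio, pct in summarize(JOBS):
        print(f"{name:<11}{diff:>6}{ratio:>7.2f}{pct:>8.1f}%")
```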
It’s BMDP. I didn’t even know it was still in use. I spent more time than I want to say at an 059, building input for it.
Though most of those job sites tend toward corporate, and are thus biased toward SAS/SPSS/Minitab, the latter if you’re into QA stats.
R is still in guerilla mode, by and large. Storm the ramparts!
Hi Robert,
Even though you told me the spelling, I still could not see that I had a typo! I checked and the search string was correct; it’s just the label that was wrong. I fixed it in the table and added a note to the graph.
Thanks!
Bob
Hi Robert,
Yeah, I’m amazed there are any jobs that list BMDP at this point. They had such a wonderful set of multivariate procedures and could run almost any analysis in 160K (that’s kilobytes, remember those?).
Did you notice that it had the largest percent increase – 53% – although that was only 3 or 4 jobs? (There’s some rounding error in the table.)
Cheers,
Bob
It sounds like you did the best you could with a terrible (for searching) name like R, but it shocks me that Minitab is as popular as R. My first guess would be that Minitab is more of a floor for stats packages and R must therefore be under-represented. But then I see Stata is also lower than Minitab and it’s a fairly unique term. Guess I just circulate in the wrong circles!
To clarify the “R SAS” remark in the article, do you *only* find “R” when it is adjacent to “SAS”? I’d wonder if “ R, ”, “ R.”, and “ R or” might not also be reasonable search terms (assuming “.” is not a wildcard in the search language).
Hi Wayne,
I was also surprised by the popularity of Minitab, but I’ve since talked to many six-sigma people who say that’s the #1 package in that area. Stata is growing rapidly in academia, but in my workshops I rarely find anyone in the corporate world using it.
The name “R” is a curse! I’ve spent insane amounts of time learning how to track it down on different sites. For this job data, it turns out that advertisements almost never ask for only R expertise. They’ll say, “R or SAS” or “R or SPSS” or they might flip the order on them. So the search I do is really extensive, looking for R in combination with all other packages in either order. I look carefully at the results to make sure that I’m getting nearly 100% good hits. R’s Twitter tag is #RStats. How I wish the R-project would rename it to that!
Cheers,
Bob
I “mentored” an MBA, who was anointed the 6-sigma guru (he knew nothing about stats, of course) at a previous company (just the sort of approach that ticked Deming off, also of course). So, yes, Minitab is the lingua franca in 6-sigma. My recollection is that the training/cert vendors have widely standardized on Minitab. Windows, point&click, and such.
Stata has some traction in “development” jobs (World Bank, NGOs). Your stats confirm this is a small niche.
Not sure you need to do a t test. You don’t have a random sample of anything here; it is more like a universe (all the data for specific time periods). So any differences are “real.” I could be wrong, but I think all the information is encapsulated in the absolute size of the difference; nothing is added by determining whether that sized difference would be “significant” if it were from a random sample.
Hi Dolllar,
That’s a good point. I was viewing the ads Indeed.com could get access to as a random sample, which I knew was quite doubtful. It’s a sample, but certainly not random. As their founders state, “We are including all the jobs from over 1,000 unique sources, comprising the major job boards – Monster, Careerbuilder, Hotjobs, Craigslist – as well as hundreds of newspapers, associations, and company websites.” That’s actually more extensive than I had realized when I settled on t-tests, so I probably could have just called it the population of jobs and just reported the raw values.
Cheers,
Bob
Revolution Analytics just published a post on companies using R, which seems relevant to the topic.
Hi Fr.,
I hadn’t seen that. Thanks!
Cheers,
Bob
Bob, be sure to learn Chinese if you have not done so, then read this post by a Chinese hackeR: http://cos.name/cn/topic/110647 (well, you do not really have to understand Chinese, since the R code almost tells the whole story)
Hi Yihui,
Thanks very much for pointing out this R program to automate the job search as described in my article! A friend of mine did this in a UNIX script, a variation of which is described here:
http://librestats.com/2012/04/12/statistical-software-popularity-on-google-scholar/.
I much prefer the R approach of course!
Thanks,
Bob