<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>r4stats.com &#187; SAS</title>
	<atom:link href="http://r4stats.com/category/sas/feed/" rel="self" type="application/rss+xml" />
	<link>http://r4stats.com</link>
	<description>Analyzing the World of Analytics</description>
	<lastBuildDate>Thu, 23 May 2013 18:35:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='r4stats.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>r4stats.com &#187; SAS</title>
		<link>http://r4stats.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://r4stats.com/osd.xml" title="r4stats.com" />
	<atom:link rel='hub' href='http://r4stats.com/?pushpress=hub'/>
		<item>
		<title>Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?</title>
		<link>http://r4stats.com/2013/05/14/beginning-of-the-end-v2/</link>
		<comments>http://r4stats.com/2013/05/14/beginning-of-the-end-v2/#comments</comments>
		<pubDate>Tue, 14 May 2013 15:21:31 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[R-Project]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=985</guid>
		<description><![CDATA[I recently updated my plots of the data analysis tools used in academia in my ongoing article, The Popularity of Data Analysis Software. I repeat those here and update my previous forecast of data analysis software usage. Learning to use &#8230; <a href="http://r4stats.com/2013/05/14/beginning-of-the-end-v2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I recently updated my plots of the data analysis tools used in academia in my ongoing article, <a title="The Popularity of Data Analysis Software" href="http://r4stats.com/articles/popularity/" target="_blank">The Popularity of Data Analysis Software</a>. I repeat those here and update my previous <a title="Will 2015 be the Beginning of the End for SAS and SPSS?" href="http://r4stats.com/2012/05/09/beginning-of-the-end/">forecast</a> of data analysis software usage.</p>
<p>Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is likely to predict what the next generation of analysts will use for years to come. As you can see in Fig. 1, the use of most analytic software is growing rapidly in academia. The only one growing slowly, very slowly, is Statistica.</p>
<div id="attachment_965" class="wp-caption aligncenter" style="width: 650px"><a href="http://r4stats.files.wordpress.com/2012/04/fig_7b_scholarlyimpactlittle61.png"><img class="size-full wp-image-965" alt="Fig_7b_ScholarlyImpactLittle6" src="http://r4stats.files.wordpress.com/2012/04/fig_7b_scholarlyimpactlittle61.png?w=640&#038;h=443" width="640" height="443" /></a><p class="wp-caption-text">Figure 1. The growth of data analysis packages with SAS and SPSS removed.</p></div>
<p>While SAS and SPSS remain dominant, their use has been declining rapidly in recent years. Figure 2 plots the same data, adding SAS and SPSS and dropping JMP and Statistica (and changing all colors and symbols!).</p>
<div id="attachment_964" class="wp-caption aligncenter" style="width: 650px"><a href="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig61.png"><img class="size-full wp-image-964" alt="Fig_7a_ScholarlyImpactBig6" src="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig61.png?w=640&#038;h=443" width="640" height="443" /></a><p class="wp-caption-text">Figure 2. Scholarly use of data analysis software with SAS and SPSS added, JMP and Statistica removed.</p></div>
<p>Because Google changes its search algorithm over time, I re-collect all the data every year. Last year&#8217;s plot (below, Fig. 3) ended with the data from 2011 and contained some notable differences. For SPSS, the 2003 value in Fig. 3 is quite a bit lower than the 2003 value collected this year. If the data were not collected by a computer program, I would suspect a data entry error. In addition, the old 2011 value for SPSS in Fig. 3 showed a marked slowing in the rate of usage decline. In the 2012 plot (above, Fig. 2), not only does the decline <em>not</em> slow in 2011, but both the 2011 and 2012 points continue the sharp decline of the previous few years.</p>
<div id="attachment_643" class="wp-caption aligncenter" style="width: 650px"><a href="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png"><img class="size-full wp-image-643" title="Fig_7a_ScholarlyImpactBig6" alt="" src="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png?w=640&#038;h=640" width="640" height="640" /></a><p class="wp-caption-text">Figure 3. Scholarly use of data analysis software, collected in 2011. Note how different the SPSS value for 2011 is compared to that in Fig. 2.</p></div>
<p style="text-align:center;">Let&#8217;s take a more detailed look at what the future may hold for R, SAS and SPSS Statistics.</p>
<p style="text-align:left;">Here is the data from Google Scholar (number of articles per year):</p>
<pre>Year      R    SAS    SPSS  Stata
1995      7   9120    7310     24
1996      4   9130    8560     92
1997      9  10600   11400    214
1998     16  11400   17900    333
1999     25  13100   29000    512
2000     51  17300   50500    785
2001    155  20900   78300    969
2002    286  26400   66200   1260
2003    639  36300   43500   1720
2004   1220  45700  156000   2350
2005   2210  55100  171000   2980
2006   3420  60400  169000   3940
2007   5070  61900  167000   4900
2008   7000  63100  155000   6150
2009   9320  60400  136000   7530
2010  11500  52000  109000   8890
2011  13600  44800   74900  10900
2012  17000  33500   49400  14700</pre>
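<p>If you want to work with these counts directly, the minimal Python sketch below stores the table as a dictionary and reports each package&#8217;s peak year and its change from 2011 to 2012. The post&#8217;s own analysis was done in R. The names <code>COUNTS</code>, <code>peak_year</code> and <code>pct_change</code> are illustrative only and do not come from any package.</p>

```python
# Google Scholar article counts per year, copied from the table above.
YEARS = list(range(1995, 2013))
COUNTS = {
    "R":     [7, 4, 9, 16, 25, 51, 155, 286, 639, 1220, 2210, 3420,
              5070, 7000, 9320, 11500, 13600, 17000],
    "SAS":   [9120, 9130, 10600, 11400, 13100, 17300, 20900, 26400, 36300,
              45700, 55100, 60400, 61900, 63100, 60400, 52000, 44800, 33500],
    "SPSS":  [7310, 8560, 11400, 17900, 29000, 50500, 78300, 66200, 43500,
              156000, 171000, 169000, 167000, 155000, 136000, 109000,
              74900, 49400],
    "Stata": [24, 92, 214, 333, 512, 785, 969, 1260, 1720, 2350, 2980,
              3940, 4900, 6150, 7530, 8890, 10900, 14700],
}


def peak_year(package):
    """Return (year, count) for the year with the highest count."""
    values = COUNTS[package]
    i = max(range(len(values)), key=values.__getitem__)
    return YEARS[i], values[i]


def pct_change(package, year):
    """Percent change in the count from the previous year to `year`."""
    values = COUNTS[package]
    i = YEARS.index(year)
    return 100.0 * (values[i] - values[i - 1]) / values[i - 1]


if __name__ == "__main__":
    for pkg in COUNTS:
        year, count = peak_year(pkg)
        print(f"{pkg:6s} peak {year} ({count}), "
              f"2012 change {pct_change(pkg, 2012):+.1f}%")
```

<p>Running it as a script shows SPSS peaking in 2005 and SAS in 2008. R and Stata both peak in 2012, the last year of data.</p>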
<p><strong>ARIMA Forecasting</strong></p>
<p>I forecast the use of R, SAS, SPSS and Stata five years into the future using <a href="http://robjhyndman.com/software/forecast/">Rob Hyndman&#8217;s</a> forecast package and the default settings of its auto.arima function. The dip in SPSS use in 2002-2003 drove the function a bit crazy as it tried to see a repetitive up-down cycle, so I modeled the SPSS data only from its 2005 peak onward.  Figure 4 shows the resulting predictions.</p>
<div id="attachment_1001" class="wp-caption aligncenter" style="width: 650px"><a href="http://r4stats.files.wordpress.com/2013/05/forecast.png"><img class="size-full wp-image-1001" alt="Forecast" src="http://r4stats.files.wordpress.com/2013/05/forecast.png?w=640&#038;h=443" width="640" height="443" /></a><p class="wp-caption-text">Figure 4. Forecast of scholarly use of the top four data analysis software packages, 2013 through 2017.</p></div>
<p>The forecast shows R and Stata surpassing SPSS and SAS this year (2013), with Stata coming out on top. It also shows all scholarly use of SPSS and SAS stopping in 2014 and 2015, respectively. Any forecasting book will warn you of the dangers of looking too far beyond the data, and the above forecast does just that.</p>
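<p>auto.arima chooses its model order automatically, so the fitted models are not reproduced here. As a much simpler stand-in, the Python sketch below fits a random walk with drift (an ARIMA(0,1,0) model with a constant) to the 2005-2012 SPSS counts and extrapolates five years. It illustrates trend extrapolation under that one assumed model. It is not the forecast behind Figure 4, and its numbers will differ from that figure. The function name <code>drift_forecast</code> is hypothetical.</p>

```python
# SPSS counts from its 2005 peak onward, taken from the table above.
SPSS_YEARS = list(range(2005, 2013))
SPSS = [171000, 169000, 167000, 155000, 136000, 109000, 74900, 49400]


def drift_forecast(series, horizon):
    """Random walk with drift, i.e. ARIMA(0,1,0) plus a constant.

    The drift is the mean of the year-to-year differences, which
    equals (last - first) / (n - 1). Forecasts are floored at zero
    because an article count cannot be negative.
    """
    n = len(series)
    drift = (series[-1] - series[0]) / (n - 1)
    return [max(0.0, series[-1] + h * drift) for h in range(1, horizon + 1)]


if __name__ == "__main__":
    for year, value in zip(range(2013, 2018), drift_forecast(SPSS, 5)):
        print(year, round(value))
```

<p>The drift term is simply the average yearly change, about -17,400 articles. The projection therefore falls to roughly 32,000 in 2013 and 14,700 in 2014, then hits the floor of zero. That is exactly the kind of far-beyond-the-data extrapolation warned about above.</p>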
<p><strong style="line-height:1.5;">Guesstimate Forecasting</strong></p>
<p>So what will happen? Each reader probably has his or her own opinion; here&#8217;s mine. The growth in R&#8217;s use in scholarly work will continue for three more years, leveling off at around 25,000 articles in 2015. This growth will be driven by:</p>
<ul>
<li>The continued rapid growth in add-on packages</li>
<li>The attraction of R&#8217;s powerful language</li>
<li>The near monopoly R has on the latest analytic methods</li>
<li>Its free price</li>
<li>The freedom to teach with real-world examples from outside organizations, which is forbidden to academics by SAS and SPSS licenses (IBM is loosening up on this a bit)</li>
</ul>
<p>What will slow R&#8217;s growth is its lack of a graphical user interface that:</p>
<ul>
<li>Is powerful</li>
<li>Is easy to use</li>
<li>Provides direct cut/paste access to journal style output in word processor format</li>
<li>Is standard, i.e. widely accepted as <em>The One to Use</em></li>
<li>Is open source</li>
</ul>
<p>While programming has important advantages over GUI use, many people will not take the time needed to learn to program, so they rarely come to fully understand those advantages. Conversely, programmers seldom take the time to fully master a GUI and so often underestimate its full range of capabilities and its speed of use. Regardless of which is best, GUI users far outnumber programmers, and until R has a GUI that meets the criteria above, this will limit R&#8217;s long-term growth. There are GUIs for R, but there are so many to choose from that none has become the clear leader (Deducer, R Commander, Rattle, at least two from commercial companies, and still more <a href="http://en.wikipedia.org/wiki/R_(programming_language)#Graphical_user_interfaces">here</a>). If a clear leader were to emerge from this &#8220;GUI chaos,&#8221; R could continue its rapid growth and end up as the most used software.</p>
<p>The use of SAS for scholarly work will continue to decline until it matches R at the 25,000 level. This decline is driven by competition from R and other packages (notably Stata), but also by SAS Institute&#8217;s self-inflicted GUI chaos. For years the company has offered too many GUIs, such as SAS/Assist, SAS/Insight, IML/Studio, the Analyst application, Enterprise Guide, Enterprise Miner and even JMP (which runs SAS nicely in recent versions). Professors looking to meet student demand for greater ease of use are not sure which GUI to teach, so they continue teaching SAS as a programming language. Even now that Enterprise Guide has evolved into a respectable GUI, many SAS users do not know what it is. If SAS Institute were to completely replace its default Display Manager System with Enterprise Guide, it could bend the curve and end up at a higher level of perhaps 27,000.</p>
<p>The use of SPSS for scholarly work will decline less sharply in 2013 and will level off in 2015 at around 27,000 articles because:</p>
<ul>
<li>Many of the people who needed advanced methods and were not happy <a title="Calling R from Other Software" href="http://r4stats.com/articles/calling-r/">calling R functions from within SPSS</a> have already switched to R or Stata</li>
<li>Many of the people who like to program and want a more flexible language than SPSS offers have already switched to R or Stata</li>
<li>Many of the people who needed more interactive visualization have already switched to JMP</li>
</ul>
<p>The GUI users will stick with SPSS until a GUI as good (or close to as good) comes to R and becomes widely accepted. At The University of Tennessee, where I work, GUI users make up the great majority of SPSS users.</p>
<p>Although Stata is currently the fastest-growing package, its growth will slow in 2013 and level off by 2015 at around 23,000 articles, leaving it in fourth place. The main causes will be the inertia of users of the established leaders, SPSS and SAS, and competition from all the other packages, most notably R. R and Stata share many strengths, and with one being free, I doubt Stata will be able to beat R in the long run.</p>
<p>The other packages shown in Fig. 1 will also level off around 2015, roughly maintaining their current places in the rankings. A possible exception is JMP, whose interface is radically superior to the others for exploratory analysis. Its use could continue to grow, perhaps even taking fourth place from Stata.</p>
<p>The futures of SAS Enterprise Miner and IBM SPSS Modeler are tied to the success of each company&#8217;s more mainstream products, SAS and SPSS Statistics, respectively. Use of those products is generally limited to one university class in data mining, while the other software discussed here is widely used in many classes. Both companies could significantly shift their future by combining their two main GUIs. Imagine a menu &amp; dialog-box system that draws a simple flowchart as you work. It would be easy to learn, and users would quickly get the idea that they could manipulate the flowchart directly, enlarging its window to make more room. A flowchart GUI lets you see the big picture at a glance and lets you re-use the analysis without switching from GUI to programming, as all other GUI methods require. Such a merger could give SAS and SPSS a game-changing edge in this competitive marketplace.</p>
<p>So there you have it: the future of analytics revealed. No doubt each reader has found a wide range of things to disagree with, so I encourage you to do your own forecasts and add links to them in the comment section below. You can use my data or follow the detailed blog at <a href="http://librestats.com/2012/04/12/statistical-software-popularity-on-google-scholar/">Librestats</a> to collect your own. One thing is certain: the coming decade in the field of analytics will be interesting indeed!</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2013/05/14/beginning-of-the-end-v2/feed/</wfw:commentRss>
		<slash:comments>51</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_7b_scholarlyimpactlittle61.png" medium="image">
			<media:title type="html">Fig_7b_ScholarlyImpactLittle6</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig61.png" medium="image">
			<media:title type="html">Fig_7a_ScholarlyImpactBig6</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png" medium="image">
			<media:title type="html">Fig_7a_ScholarlyImpactBig6</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2013/05/forecast.png" medium="image">
			<media:title type="html">Forecast</media:title>
		</media:content>
	</item>
		<item>
		<title>SAS, SPSS, Stata Users: Learn R from Home June 17</title>
		<link>http://r4stats.com/2013/05/08/r-sas-spss-stata-june/</link>
		<comments>http://r4stats.com/2013/05/08/r-sas-spss-stata-june/#comments</comments>
		<pubDate>Wed, 08 May 2013 14:33:22 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=957</guid>
		<description><![CDATA[Has learning R been driving you a bit crazy? If so, it may be that you&#8217;re &#8220;lost in translation.&#8221; On June 17 and 19, I&#8217;ll be teaching a webinar, R for SAS, SPSS and Stata Users. With each R concept, &#8230; <a href="http://r4stats.com/2013/05/08/r-sas-spss-stata-june/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://r4stats.files.wordpress.com/2012/05/r-67.jpg"><img class="aligncenter size-full wp-image-941" alt="R--67" src="http://r4stats.files.wordpress.com/2012/05/r-67.jpg?w=640&#038;h=426" width="640" height="426" /></a></p>
<p>Has learning R been driving you a bit crazy? If so, it may be that you&#8217;re &#8220;lost in translation.&#8221; On June 17 and 19, I&#8217;ll be teaching a webinar, R for SAS, SPSS and Stata Users. With each R concept, I&#8217;ll introduce it using terminology that you already know,  then translate it into R&#8217;s very different view of the world. You&#8217;ll be following along, with hands-on practice, so that by the end of the workshop R&#8217;s fundamentals should be crystal clear. The examples we&#8217;ll do come right out of my books, <a title="R for SAS and SPSS Users" href="http://r4stats.com/books/r4sas-spss/" target="_blank">R for SAS and SPSS Users</a> and <a title="R for Stata Users" href="http://r4stats.com/books/r4stata/" target="_blank">R for Stata Users</a>. That way if you need more explanation later or want to dive in more deeply, the book of your choice will be very familiar. Plus, the table of contents and the index contain topics listed by SAS/SPSS/Stata terminology and R terminology so you can use either to find what you need.</p>
<p>A complete outline of the workshop plus a registration link is <a title="R for SAS, SPSS and Stata Users" href="http://r4stats.com/workshops/r4sas-spss-stata/" target="_blank">here</a>. I have no artistic skills, but I&#8217;ve always been amazed at what artists can do. I taught this workshop in Knoxville on April 29, and pro photographer <a href="http://www.stevechastainphotography.com/" target="_blank">Steve Chastain</a> made it look <em>way</em> more exciting than I recall! His view of it is <a href="http://www.youtube.com/watch?v=8HhEkx_T8qs&amp;feature=em-share_video_user" target="_blank">here</a>; turn your speakers up and get ready to boogie!</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2013/05/08/r-sas-spss-stata-june/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/05/r-67.jpg" medium="image">
			<media:title type="html">R--67</media:title>
		</media:content>
	</item>
		<item>
		<title>R&#8217;s 2012 Growth in Capability Exceeds SAS&#8217; All Time Total</title>
		<link>http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/</link>
		<comments>http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/#comments</comments>
		<pubDate>Tue, 19 Mar 2013 20:00:11 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=881</guid>
		<description><![CDATA[by Robert A. Muenchen I&#8217;m slowly gathering all the data needed to update my ongoing article, The Popularity of Data Analysis Software. The section below is the latest installment. Growth in Capability The capability of all the software in this &#8230; <a href="http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><em>by Robert A. Muenchen</em></p>
<p>I&#8217;m slowly gathering all the data needed to update my ongoing article, <a title="The Popularity of Data Analysis Software" href="http://r4stats.com/articles/popularity/">The Popularity of Data Analysis Software</a>. The section below is the latest installment.</p>
<p><strong>Growth in Capability</strong></p>
<p>The capability of all the software in this article has grown significantly over the years. It would be helpful to be able to plot the growth of each software package’s capabilities, but such data is hard to obtain. John Fox (2009) acquired it for R’s main distribution site <a href="http://cran.r-project.org/" rel="nofollow">http://cran.r-project.org/</a>. I collected the data for later versions following his method.</p>
<p>Figure 10 shows that the growth in R packages is following a rapid parabolic arc (quadratic fit with R-squared = .995). Early version numbers of R increased by 0.10, while more recent ones increase by 0.01. To keep the x-axis consistent, the graph simply displays the order in which the versions were released. The right-most point is for version 2.15.2, the last version released in 2012.</p>
<div>
<dl id="attachment_879">
<dt><a href="http://r4stats.files.wordpress.com/2012/04/fig_10_cran1.png"><img alt="Fig_10_CRAN" src="http://r4stats.files.wordpress.com/2012/04/fig_10_cran1.png?w=640&#038;h=640" width="640" height="640" /></a></dt>
<dd>Figure 10. Number of R packages plotted for each major release of R. The last value on the x-axis represents version 2.15.2, the final release in 2012.</dd>
</dl>
</div>
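<p>The R-squared quoted above comes from a quadratic least-squares fit. The package counts behind Figure 10 are not listed in this post, so the minimal Python sketch below only shows how such a fit can be computed with the standard library. It solves the 3x3 normal equations by Cramer&#8217;s rule. The function name <code>fit_quadratic</code> is hypothetical.</p>

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2.

    Returns (a, b, c, r_squared). Solves the 3x3 normal equations
    with Cramer's rule, which is adequate for short series like a
    list of R releases.
    """
    n = len(xs)
    # Sums of powers of x and of x-weighted y for the normal equations.
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    m = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]

    def det3(a):
        """Determinant of a 3x3 matrix given as a list of rows."""
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coefs = []
    for j in range(3):
        # Cramer's rule: replace column j with the right-hand side.
        mj = [row[:j] + [rhs[i]] + row[j + 1:] for i, row in enumerate(m)]
        coefs.append(det3(mj) / d)
    a, b, c = coefs

    # R-squared = 1 - residual sum of squares / total sum of squares.
    mean_y = t0 / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values are constant; R-squared is undefined")
    ss_res = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, c, 1.0 - ss_res / ss_tot
```

<p>To reproduce a fit like the one in Figure 10, pass the release order (1, 2, 3, ...) as <code>xs</code> and the package counts as <code>ys</code>. The last value returned is the R-squared.</p>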
<p>As rapid as this growth has been, the data in Figure 10 represents only the main CRAN repository. R also has eight other software repositories, such as the one at <a href="http://www.bioconductor.org/" rel="nofollow">http://www.bioconductor.org/</a>, that are not included in this graph. A program run on 3/19/2013 counted 6,275 R packages at all major repositories, 4,315 of which were at CRAN. So the growth curve for the software at all repositories would be roughly 45% higher on the y-axis than the one shown in Figure 10. As with any analysis software, individuals also maintain their own separate collections, typically available on their web sites.</p>
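<p>The scale-up from CRAN to all repositories is simply the ratio of the two package counts given above. The quick Python check below also computes the share of packages that live outside CRAN. That is a different percentage, and it is easy to confuse with the first.</p>

```python
cran = 4315        # CRAN packages counted on 3/19/2013
all_repos = 6275   # packages at all major repositories, same date

ratio = all_repos / cran
print(f"All repositories / CRAN = {ratio:.3f} "
      f"({100 * (ratio - 1):.0f}% more than CRAN)")
# prints: All repositories / CRAN = 1.454 (45% more than CRAN)
print(f"Share of packages outside CRAN = "
      f"{100 * (all_repos - cran) / all_repos:.0f}%")
# prints: Share of packages outside CRAN = 31%
```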
<p>To put this astonishing growth in perspective, let us compare it to the dominant commercial package, SAS. In its most recent version, 9.3, SAS offers 100 programming statements, 258 procedures (Base, STAT, ETS, Graph, HP Forecasting, Macro, OR, QC), 520 SAS functions and call routines, and 314 IML statements, functions and subroutines, for a total of 1,192 items that are roughly equivalent to R functions. R packages contain a median of 5 functions (Rasmus Bååth, 12/2012 personal communication). Multiplying that median by the 6,275 packages gives approximately 31,375 R functions, compared to SAS&#8217; 1,192. <em>In fact, during 2012 alone, R added more functions/procs than SAS Institute has provided in its entire history!</em> That&#8217;s 701 packages, counting only CRAN, or around 3,505 new functions in 2012.</p>
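<p>The arithmetic behind these comparisons can be checked with a few lines of Python. Multiplying a package count by the <em>median</em> number of functions per package gives only a rough estimate of the total. The mean would be the right multiplier for an exact total, but it is not given here.</p>

```python
# SAS 9.3 items treated as roughly equivalent to R functions.
sas_items = {
    "programming statements": 100,
    "procedures": 258,
    "functions and call routines": 520,
    "IML statements, functions and subroutines": 314,
}
sas_total = sum(sas_items.values())

# Median functions per R package, used as a rough multiplier.
MEDIAN_FUNCTIONS_PER_PACKAGE = 5
r_packages_all = 6275        # all major repositories, 3/19/2013
r_packages_2012_cran = 701   # new CRAN packages during 2012

r_functions = r_packages_all * MEDIAN_FUNCTIONS_PER_PACKAGE
r_functions_2012 = r_packages_2012_cran * MEDIAN_FUNCTIONS_PER_PACKAGE

print(sas_total, r_functions, r_functions_2012)  # 1192 31375 3505
print(r_functions_2012 > sas_total)              # True
```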
<p>Of course these R functions and SAS procedures / functions are not perfectly equivalent. Some SAS procedures have many more options to control their output than R functions do, giving them potentially more output per command. However, R functions can nest inside one another, creating nearly infinite combinations of output. While the comparison is not perfect, it is certainly an eye opener.</p>
<p>Stay tuned for future updates, which will include what employers are now advertising for and recent trends in academic use of analytic software.</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_10_cran1.png" medium="image">
			<media:title type="html">Fig_10_CRAN</media:title>
		</media:content>
	</item>
		<item>
		<title>R Tackles Big Garbage</title>
		<link>http://r4stats.com/2013/03/01/r-tackles-big-garbage/</link>
		<comments>http://r4stats.com/2013/03/01/r-tackles-big-garbage/#comments</comments>
		<pubDate>Fri, 01 Mar 2013 13:08:09 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=890</guid>
		<description><![CDATA[April 1, 2013 – Although the capabilities of the R system for data analytics have been expanding with impressive speed, it has heretofore been missing important fundamental methods. A new function works with the popular plyr package to provide these missing &#8230; <a href="http://r4stats.com/2013/03/01/r-tackles-big-garbage/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>April 1, 2013 – Although the capabilities of the R system for data analytics have been expanding with <a title="R's Rapid 2012 Growth" href="http://bit.ly/r-2012-growth" target="_blank">impressive speed</a>, it has heretofore been missing important fundamental methods. A new function works with the popular <a title="Plyr Article in JSS" href="http://www.jstatsoft.org/v40/i01/paper">plyr package</a> to provide these missing algorithms. Function names in plyr begin with two letters which indicate their input and output. For example, with the ddply function, the first “d” in its name indicates that a data frame will be read in, and the second “d” indicates that a data frame of results will be written out. Those two letters could also be “a” for array and “l” for list, in any combination.</p>
<p>While R’s vast array of functions covers most data analysis situations, it has been completely unable to handle data that bears no actual relationship to the research questions at hand. Robert A. Muenchen, author of <a href="http://www.amazon.com/gp/product/1461406846/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1461406846&amp;linkCode=as2&amp;tag=r4statscom-20" target="_blank"><i>R for SAS and SPSS Users</i></a>, has written a new ggply function, which can adroitly handle the all-too-popular “garbage in, garbage out” research situation. The function has only one argument, the garbage to analyze. It automatically performs the analysis strongly preferred by “gg” researchers by splitting numeric variables at the median and performing all possible cross tabulations and chi-square tests, repeated for the levels of all factors. The integration of functions from the new <a href="http://thirteen-01.stat.iastate.edu/snoweye/pbdr/" target="_blank">pbdR package</a> allows ggply to handle even Big Garbage using <a title="R at 12,000 Cores" href="http://bit.ly/R12000" target="_blank">12,000 cores</a>.</p>
<p>While the median split approach offers the benefit of <a title="The Cost of Dichotomization" href="http://www.unc.edu/~rcm/psy282/cohen.1983.pdf">decreasing power by 33%</a>, further precautions are taken by applying Muenchen&#8217;s new <i>Triple <a title="Bonferroni Correction" href="http://en.wikipedia.org/wiki/Bonferroni_correction">Bonferroni </a>with Backpropagation</i> correction. This algorithm controls the <a title="Family-wise Error Rate" href="http://en.wikipedia.org/wiki/Familywise_error_rate">garbage-wise error rate</a> by multiplying the p-values by 3k, where k is the number of tests performed. While most experiment-wise adjustment calculations set the worst-case p-value to the theoretical upper limit of 1.0, simulations run by Muenchen indicate that this is far too liberal for this type of analysis. “By removing this artificial constraint, I have already found cases where the final p-value was as high as 3,287, indicating a very, very, very non-significant result,” reported Muenchen. The “backpropagation” part of the method re-scales any p-values that might have survived the initial correction by setting them automatically to 0.06. As Muenchen states, “this level was chosen to protect the researcher from believing an actual useful result was found, while offering hope that achieving tenure might still be possible.”</p>
<p>Reaction from the R community was swift and enthusiastic. Bill Venables, co-author of the popular book <i>Modern Applied Statistics with S</i>, said, “Muenchen’s new approach for calculating <a title="Exegeses on Linear Models" href="http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf" target="_blank">Type III Sums of Squares</a> from chi-squared tests finally puts my mind at ease about using R for statistical analysis.” R programmer extraordinaire Patrick Burns said, “The ggply function is good, but what really excites me is the VBA plugin Bob wrote for <a title="Spreadsheet Addiction" href="http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/" target="_blank">Excel</a>. Now I can fully integrate ggply into my workflow.” Graphics guru Hadley Wickham, author of <a href="http://www.amazon.com/gp/product/0387981403/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0387981403&amp;linkCode=as2&amp;tag=r4statscom-20" target="_blank"><em>ggplot2: Elegant Graphics for Data Analysis</em></a>, grumbled, “After writing ggplot and ddply, I’m stunned that I didn’t think of ggply myself. That Muenchen fellow is constantly bugging me to add irritating new features to my packages. I have to admit, though, that this is a breakthrough of epic proportions. As they say in Muenchen’s neck of the woods, even a <a title="Blind Squirrel" href="http://www.amazon.com/gp/product/B007LUN414/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B007LUN414&amp;linkCode=as2&amp;tag=r4statscom-20" target="_blank">blind squirrel</a> finds a nut now and then.”</p>
<p>The SAS Institute, already concerned with competition from R, reacted swiftly. SAS CEO Jim Goodnight said, “SAS is the leader in Big Data, and we’ll soon catch up to R and become the leader in Big Garbage as well. PROC GGPLY is already in development. It will be included in SAS/GG, which is, of course, an additional cost product.”</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2013/03/01/r-tackles-big-garbage/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>
	</item>
		<item>
		<title>What Analytic Software are People Discussing?</title>
		<link>http://r4stats.com/2013/02/12/what-analytic-software-are-people-discussing/</link>
		<comments>http://r4stats.com/2013/02/12/what-analytic-software-are-people-discussing/#comments</comments>
		<pubDate>Tue, 12 Feb 2013 16:24:32 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=861</guid>
		<description><![CDATA[by Robert A. Muenchen How can we measure the popularity or market share of analytic software? One way is to see what people are discussing. I&#8217;m in the process of updating my annual article, The Popularity of Data Analysis Software. Below &#8230; <a href="http://r4stats.com/2013/02/12/what-analytic-software-are-people-discussing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><em>by Robert A. Muenchen</em></p>
<p>How can we measure the popularity or market share of analytic software? One way is to see what people are discussing. I&#8217;m in the process of updating my annual article, <a title="The Popularity of Data Analysis Software" href="http://r4stats.com/articles/popularity/"><em>The Popularity of Data Analysis Software</em></a>. Below is the newly updated Internet Discussion section. The rest of the main article has not been updated yet, so don&#8217;t bother to read it unless you&#8217;re in a hurry. I&#8217;ve been collecting data for several of the other, more interesting plots and will have more to report in upcoming posts. As always, I&#8217;m very interested in getting feedback. If you know of other discussion forums that I can collect data on without too much effort, please let me know.</p>
<p><strong>Internet Discussion</strong></p>
<p>There are some stable and objective measures of analytic software popularity. Schwartz (2009) suggested estimating relative popularity by plotting the amount of email discussion devoted to each package. The most widely used packages all have discussion lists, or &#8220;listservs&#8221;, devoted to them. The less popular ones either do not have such discussions or, like the lists for Minitab or S-PLUS, may have only a dozen or so emails per year. Some software packages have multiple discussion lists. For example, there are 21 devoted to using R for various focused areas such as graphics, mapping, ecology, epidemiology, etc. (<a href="http://www.r-project.org/mail.html" rel="nofollow">http://www.r-project.org/mail.html</a>). A broader list, including a version of R-Help in Spanish, lists 49 discussions (<a href="https://stat.ethz.ch/mailman/listinfo" rel="nofollow">https://stat.ethz.ch/mailman/listinfo</a>). Figure 1a shows the level of activity on each software&#8217;s main discussion listserv only (i.e. forums, news groups and Google groups are excluded). Each point represents the sum of the 12 monthly counts for that year.</p>
<p>This plot contains data through the end of 2012. If you read this article in previous years, this plot used to display the mean number of emails per month rather than the sum. Therefore the scale of the <em>y</em>-axis is different, but the relative locations of the points are virtually identical. I made this change to enable a better comparison to discussion forums (e.g. Fig. 1b).</p>
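<p>Because a full year has 12 months, the yearly sum is exactly 12 times the monthly mean, which is why switching from the mean to the sum rescales the <em>y</em>-axis without changing where the points sit relative to one another. The short R sketch below illustrates this with made-up monthly counts for two hypothetical lists (ListA and ListB); they are not the data behind Figure 1a, and a year with missing months would break the exact 12-to-1 ratio.</p>

```r
# Made-up monthly email counts for two hypothetical lists (illustration only;
# these are NOT the data behind Figure 1a).
monthly <- data.frame(
  list  = rep(c("ListA", "ListB"), each = 12),
  month = rep(1:12, times = 2),
  count = c(seq(80, 135, by = 5), seq(200, 310, by = 10))
)

# Yearly total per list: what Figure 1a now plots
yearly_sum <- tapply(monthly$count, monthly$list, sum)

# Average per month per list: what earlier versions of the plot showed
yearly_mean <- tapply(monthly$count, monthly$list, mean)

# With all 12 months present, sum = 12 * mean, so the ordering is unchanged
print(yearly_sum)
print(yearly_mean)
print(yearly_sum / yearly_mean)
```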
<div>
<dl id="attachment_855">
<dt><a href="http://r4stats.files.wordpress.com/2012/04/fig_1a_listserv11.png"><img alt="Fig_1a_Listserv" src="http://r4stats.files.wordpress.com/2012/04/fig_1a_listserv11.png?w=600&#038;h=400" width="600" height="400" /></a></dt>
<dd>Figure 1a. Sum of monthly email traffic on each software&#8217;s main listserv discussion list.</dd>
</dl>
</div>
<div>
<p>We can see that discussion of R has grown the most rapidly and, for the past few years, R has been the most discussed software by an almost two-to-one margin. In recent years, it is followed by Stata, SAS and SPSS, respectively. Stata showed steady discussion growth until it passed SAS in 2010. SAS saw rapid growth in its discussion until 2006, when it leveled off and then declined. That decline coincided with the strong growth of both R and Stata, which offered competition to SAS. SPSS held steady at a low rate across the time frame, which may be attributable to its great ease of use relative to the other packages. With both the interface and the documentation aimed at people who prefer GUIs over programming, there&#8217;s less need to ask how to do variations on an analysis. In fact, there&#8217;s less <em>ability</em> to do such variations. As a result, I doubt SPSS&#8217; low showing in this graph is indicative of its popularity or market share.</p>
<p>It would be interesting to see what topics were most discussed on each list. The only such analysis of which I am aware was done by Arthur Tabachneck (2010) for the SAS list. The most popular topic in 2009 turned out to be&#8230;R! You can read his full analysis <a href="http://www.sascommunity.org/wiki/SAS-L_BOF" target="_blank" rel="nofollow">here</a> under <em>slides from the 2010 session.</em></p>
<p>In the last year or two, R and Stata have joined SAS in the decline of listserv discussion. Given the sharp increase in the popularity of business analytics, Big Data, and so on, it is unlikely that people are using or talking about these tools less. Instead, alternative discussion forums have appeared. The site Stack Overflow (<a href="http://stackoverflow.com" rel="nofollow">http://stackoverflow.com</a>) covers a wide range of programming and statistical topics, while its sister site, Cross Validated (<a href="http://stats.stackexchange.com/" rel="nofollow">http://stats.stackexchange.com/</a>), focuses only on statistical analysis. A third site, Talk Stats (<a href="http://www.talkstats.com" rel="nofollow">http://www.talkstats.com</a>), also focuses on statistical analysis. At all three sites, users tag their posts by topic, making it particularly easy to focus searches. Figure 1b shows the software people are discussing there.</p>
<div>
<dl id="attachment_841">
<dt><a href="http://r4stats.files.wordpress.com/2012/04/fig_1b_forums.png"><img alt="Figure 1b. Number of posts on each forum on 2/10/2013." src="http://r4stats.files.wordpress.com/2012/04/fig_1b_forums.png?w=640&#038;h=640" width="640" height="640" /></a></dt>
<dd>Figure 1b. Number of posts per software on each forum on 2/10/2013.</dd>
</dl>
</div>
<p>We can see that discussion of R is dramatically higher than that of the other packages, which don&#8217;t differ very much from one another. Much of this difference is due to the influence of Stack Overflow, reflecting the vastly greater popularity of R as a programming language. However, even removing that effect, it is easy to see that R still dominates the discussions on the more statistically oriented forums. These counts are cumulative; it would be very interesting to see how they grew by year. Without access to such data, at least we have the data in Fig. 1a to give us a feel for history.</p>
</div>
<p>Other popular discussion forum sites are LinkedIn.com and Quora.com. Neither of these sites makes it easy to count the number of posts, but they do display the number of people who have joined discussion groups (Figure 1c).</p>
<div>
<dl id="attachment_858">
<dt><a href="http://r4stats.files.wordpress.com/2012/04/fig_1c_forum_groups.png"><img alt="Fig_1c_Forum_Groups" src="http://r4stats.files.wordpress.com/2012/04/fig_1c_forum_groups.png?w=640&#038;h=640" width="640" height="640" /></a></dt>
<dd>Figure 1c. Number of people registered in the main discussion group for each software.</dd>
</dl>
</div>
<p>In Figure 1c we get a better view of corporate software use. I do not know the ratio of corporate to academic use of LinkedIn, but the academics I know (quite a few of them) use it very little. In this world, SAS is the leader with R close behind. It&#8217;s interesting to see SPSS with a 50% lead over Stata; it was also slightly higher in Fig. 1b. Remember that these are people who have joined a group, not necessarily people who are talking, as in the previous two figures. Still, group membership should be a reasonable proxy for popularity or market share. In the coming weeks, I&#8217;ll be updating the data on which software scholars are using, the growth of R packages and what skills employers are seeking in their new hires.</p>
<p><em>Copyright 2013, Robert A. Muenchen</em></p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2013/02/12/what-analytic-software-are-people-discussing/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_1a_listserv11.png" medium="image">
			<media:title type="html">Fig_1a_Listserv</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_1b_forums.png" medium="image">
			<media:title type="html">Figure 1b. Number of posts on each forum on 2/10/2013.</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_1c_forum_groups.png" medium="image">
			<media:title type="html">Fig_1c_Forum_Groups</media:title>
		</media:content>
	</item>
		<item>
		<title>R for SAS, SPSS, Stata Users Workshop Redesigned</title>
		<link>http://r4stats.com/2012/10/02/workshop-redesigned/</link>
		<comments>http://r4stats.com/2012/10/02/workshop-redesigned/#comments</comments>
		<pubDate>Tue, 02 Oct 2012 13:02:56 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=779</guid>
		<description><![CDATA[My workshop R for SAS, SPSS and Stata Users has been popular over the years, but it&#8217;s time for an overhaul. A common request has been to simplify it, so I have moved data management to a separate 4-hour workshop, &#8230; <a href="http://r4stats.com/2012/10/02/workshop-redesigned/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>My workshop <a title="R for SAS,SPSS and Stata Users Workshop" href="http://bit.ly/r4sasWorkshop" target="_blank"><em>R for SAS, SPSS and Stata Users</em></a> has been popular over the years, but it&#8217;s time for an overhaul. A common request has been to simplify it, so I have moved data management to a separate 4-hour workshop, <a title="Managing Data with R Workshop" href="http://bit.ly/ManagingDataR" target="_blank"><em>Managing Data with R</em></a>. This makes it much easier to absorb the basics in the remaining two 4-hour sessions. When you&#8217;re ready for more, you can take the other workshop, which I&#8217;ll be offering several times per year. Detailed course outlines are available at the workshop links above and at the <a title="Revolution Analytics Training" href="http://bit.ly/RevoTraining" target="_blank">Revolution Analytics</a> web site.</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2012/10/02/workshop-redesigned/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>
	</item>
		<item>
		<title>SAS Beats R on July 2012 TIOBE Rankings</title>
		<link>http://r4stats.com/2012/07/10/sas-beats-r/</link>
		<comments>http://r4stats.com/2012/07/10/sas-beats-r/#comments</comments>
		<pubDate>Tue, 10 Jul 2012 13:43:40 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=743</guid>
		<description><![CDATA[The TIOBE Community Programming Index ranks the popularity of programming languages, but from a programming language perspective rather than as analytical software (http://www.tiobe.com). It extracts measurements from blogs, entries in Wikipedia, books on Amazon, search engine results, etc. and combines them into a single index. &#8230; <a href="http://r4stats.com/2012/07/10/sas-beats-r/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html" target="_blank">TIOBE Community Programming Index</a> ranks the popularity of programming languages, but from a programming language perspective rather than as analytical software (<a href="http://www.tiobe.com" rel="nofollow" target="_blank">http://www.tiobe.com</a>). It extracts measurements from blogs, entries in Wikipedia, books on Amazon, search engine results, etc. and combines them into a single <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/tpci_definition.htm">index</a>. The July 2012 rankings place SAS in 24th place and R in 28th. This is a reversal from the January rankings, which had R in 24th place and SAS at 31st.</p>
<p>The <a href="http://lang-index.sourceforge.net/#categ" target="_blank">Transparent Language Popularity Index</a> is very similar to the TIOBE Index except that, as you might guess, its ranking software, algorithm and data are published for all to see. I didn’t find this index until July of 2012, at which time it ranked R 12th and SAS 25th.</p>
<p>I have updated this information in my ongoing article, <a href="http://r4stats.com/articles/popularity/">The Popularity of Data Analysis Software</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2012/07/10/sas-beats-r/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>
	</item>
		<item>
		<title>Why R is Hard to Learn</title>
		<link>http://r4stats.com/2012/06/13/why-r-is-hard-to-learn/</link>
		<comments>http://r4stats.com/2012/06/13/why-r-is-hard-to-learn/#comments</comments>
		<pubDate>Wed, 13 Jun 2012 12:40:29 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=734</guid>
		<description><![CDATA[The open source R software for analytics has a reputation for being hard to learn. It certainly can be, especially for people who are already familiar with similar packages such as SAS, SPSS or Stata. Training and documentation that leverages &#8230; <a href="http://r4stats.com/2012/06/13/why-r-is-hard-to-learn/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The open source R software for analytics has a reputation for being hard to learn. It certainly can be, especially for people who are already familiar with similar packages such as SAS, SPSS or Stata. Training and documentation that leverage their existing knowledge and point out where that knowledge is likely to mislead them can save much frustration. This is the approach used in my books, <a title="R for SAS and SPSS Users" href="http://r4stats.com/books/r4sas-spss/">R for SAS and SPSS Users</a> and <a title="R for Stata Users" href="http://r4stats.com/books/r4stata/">R for Stata Users</a>, as well as the <a title="Workshops" href="http://r4stats.com/workshops/">workshops</a> that are based on them. My next Internet-based workshop starts <a title="R for SAS, SPSS and Stata Users" href="http://r4stats.com/workshops/r4sas-spss-stata/">June 26</a>.</p>
<p>Here is a list of complaints about R that I commonly hear from people learning it. In the comments section below, I’d like to hear about things that drive you crazy about R.</p>
<p><strong>Misleading Function or Parameter Names (data=, sort, if)</strong></p>
<p>The most difficult time people have learning R is when functions don’t do the “obvious” thing. For example, when sorting data, SAS, SPSS and Stata users all use commands appropriately named “sort.” Turning to R, they look for such a command and, sure enough, there’s one named exactly that. However, it does not sort data sets! Instead it sorts individual variables, which is often a very dangerous thing to do: if you store a sorted variable back into its data set, its values no longer line up with the rest of each observation. In R, data sets are sorted with the “order” function, and it does so in a somewhat convoluted way: order returns the row numbers in sorted order, which you then use to index the data set. However, there are add-on packages that have sorting functions that work just as SAS/SPSS/Stata users would expect.</p>
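<p>Here is a minimal base-R sketch of the difference; the data set <code>dat</code> and its values are invented for illustration. It shows sort() returning one sorted variable, the misalignment that results if that variable is written back into the data set, and the order()-based idiom that moves whole rows.</p>

```r
# An invented data set: each id has its own score
dat <- data.frame(id = c(3, 1, 2), score = c(85, 92, 78))

# sort() returns a sorted copy of ONE variable, not the data set
sort(dat$score)

# The dangerous move: writing that sorted variable back into the data set
bad <- dat
bad$score <- sort(bad$score)   # id 3 now appears to have scored 78, not 85

# Sorting the data set: order() returns row numbers in sorted order,
# and indexing with them moves whole rows, keeping id and score together
good <- dat[order(dat$score), ]
print(good)

# Descending order: negate a numeric key
good_desc <- dat[order(-dat$score), ]
print(good_desc)
```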
<p>Perhaps the biggest shock comes when the new R user discovers that R often does not even need sorting. Other packages require sorting before they can do three common tasks:</p>
<ol>
<li>Summarizing / aggregating data</li>
<li>Repeating an analysis for each group (“by” or “split file” processing)</li>
<li>Merging files by key variables</li>
</ol>
<p>R does not need to sort files before any of these tasks! So while sorting is a very helpful thing to be able to do for other reasons, R does not require it for these common situations.  </p>
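<p>A base-R sketch of the three tasks on an invented, deliberately unsorted data set, using aggregate(), split() with lapply(), and merge(). Note that merge() sorts its result by the key by default, but neither input has to be sorted beforehand.</p>

```r
# Invented, deliberately unsorted data
sales <- data.frame(
  region = c("West", "East", "West", "North", "East"),
  amount = c(100, 250, 150, 300, 50)
)

# 1. Summarizing / aggregating: totals per region, no sort first
totals <- aggregate(amount ~ region, data = sales, FUN = sum)
print(totals)

# 2. Repeating an analysis for each group ("by" / "split file" processing)
by_region <- lapply(split(sales, sales$region),
                    function(d) mean(d$amount))
print(by_region)

# 3. Merging by a key variable; neither data set is sorted by region
managers <- data.frame(region  = c("North", "West", "East"),
                       manager = c("Ann", "Ben", "Cal"))
combined <- merge(sales, managers, by = "region")
print(combined)
```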
<p><strong>Nonstandard Output</strong></p>
<p>R’s output is often quite sparse. For example, when doing crosstabulation, other packages routinely provide counts, cell percents, row/column percents and even marginal counts and percents. R’s built-in table function (e.g. table(a,b)) provides only counts. The reason for this is that such sparse output can be readily used as input to further analysis. Getting a bar plot of a crosstabulation is as simple as barplot( table(a,b) ). This piecemeal approach is what allows R to dispense with separate output management systems such as SAS’ ODS or SPSS’ OMS. However there are add-on packages that provide more comprehensive output that is essentially identical to that provided by other packages.</p>
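<p>The piecemeal approach can be illustrated with base R's prop.table() and addmargins(), which build the percentages and margins other packages print by default from the plain counts table. The variables <code>a</code> and <code>b</code> below are invented for the example.</p>

```r
# Two invented categorical variables
a <- c("x", "y", "x", "x", "y", "y", "x")
b <- c("m", "m", "n", "m", "n", "n", "m")

tab <- table(a, b)       # counts only
print(tab)

# Each extra piece is one more function applied to the same small object
print(prop.table(tab))                  # cell proportions
print(prop.table(tab, margin = 1))      # row proportions
print(prop.table(tab, margin = 2))      # column proportions
print(addmargins(tab))                  # counts plus row/column totals
print(round(100 * addmargins(prop.table(tab)), 1))  # cell percents with margins
```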
<p><strong>Too Many Commands</strong></p>
<p>Other statistics packages have relatively few analysis <em>commands</em>, but each of them has many <em>options</em> to control its output. R’s approach is quite the opposite, which takes some getting used to. For example, when doing a linear regression in SAS or SPSS, you usually specify everything in advance and then see all the output at once: equation coefficients, ANOVA table, and so on. However, when you create a model in R, one command (summary) will provide the parameter estimates while another (anova) provides the ANOVA table. There is even a command, “coefficients”, that extracts only that part of the model. So there are more commands to learn, but fewer options are needed for each.</p>
<p>R&#8217;s commands are also consistent, working across all the modeling types that they might apply to. For example the “predict” function works the same way for all types of models that might make predictions.</p>
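<p>A sketch of this style using R's built-in mtcars data; the choice of variables (mpg, wt, am) is mine, for illustration only. One model object is created once and then queried by several small functions, and predict() is called the same way for a linear model and a logistic model.</p>

```r
# Built-in mtcars data: fuel economy (mpg) modeled on weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

summary(fit)        # parameter estimates, standard errors, R-squared
anova(fit)          # the ANOVA table
coefficients(fit)   # only the coefficients (coef() is a shorter alias)

# predict() uses the same call pattern across model types
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(fit, newdata = new_cars)

logit <- glm(am ~ wt, data = mtcars, family = binomial)
predict(logit, newdata = new_cars, type = "response")  # probabilities
```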
<p><strong>Sloppy Control of Variables</strong></p>
<p>When I learned R, it came as quite a shock that in a single analysis you can include variables from multiple data sets. That usually requires that the observations be in identical order in each data set. Over the years I have had countless clients come in to merge data sets that they thought had observations in the same order, but were not! It’s always safer to merge by key variables (like ID) if possible. So by enabling such analyses R seems to be asking for disaster. I still recommend merging files when possible by key variables before doing an analysis.</p>
<p>So why does R allow this “sloppiness”? It does so because it provides very useful flexibility. For example, you might plot regression lines of variable X against variable Y for each of three groups on the same plot. Then you can add group labels directly onto the graph. This lets you avoid a legend that makes your readers look back and forth between the legend and the lines. The label data would contain only three variables: the group labels and the x and y coordinates at which you wish them to appear. That’s a data set of only 3 observations, so merging it with the main data set makes little sense.</p>
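<p>Both halves of this argument can be sketched in base R. The first part uses two invented data sets holding the same three people in different row orders, showing how pairing by position silently gives a different answer than merging by <code>id</code>. The second part uses the built-in mtcars data, with cylinder count as the three groups, and a separate 3-row label data set; the label positions were picked by eye and are only illustrative. <code>pdf(NULL)</code> opens a null graphics device so the sketch runs without a screen; drop it and the final <code>dev.off()</code> when working interactively.</p>

```r
# Two invented data sets holding the same three people in different row orders
demog  <- data.frame(id = c(1, 2, 3), age   = c(25, 40, 33))
scores <- data.frame(id = c(3, 1, 2), score = c(88, 72, 95))

# Risky: pairs rows by position, silently matching the wrong people
risky <- cor(demog$age, scores$score)

# Safer: merge by the key variable first
both <- merge(demog, scores, by = "id")
safe <- cor(both$age, both$score)
print(c(risky = risky, safe = safe))   # here the sign even flips

# Where mixing data sets is handy: a 3-row label data set on a plot
pdf(NULL)   # null graphics device (no file, no window)
plot(mpg ~ wt, data = mtcars)
for (g in c(4, 6, 8)) {
  abline(lm(mpg ~ wt, data = mtcars[mtcars$cyl == g, ]))
}
# Label positions chosen by eye for illustration
grp_labels <- data.frame(label = c("4 cyl", "6 cyl", "8 cyl"),
                         x = c(2.0, 3.2, 4.8),
                         y = c(33, 21, 13))
text(grp_labels$x, grp_labels$y, labels = grp_labels$label)
invisible(dev.off())
```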
<p><strong>Loop-a-phobia</strong></p>
<p>R has loops to control program flow, but people (especially beginners) are told to avoid them. Since loops are so critical to applying the same function to multiple variables, this seems strange. R instead uses the “apply” family of functions. You tell R to apply the function to either rows or columns. It’s a mental adjustment to make, but the result is the same.</p>
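<p>A small comparison on an invented survey data set: the same per-variable means computed with an explicit loop and with sapply(). For a data frame, sapply() works through the columns because a data frame is a list of its variables, so no loop index is needed.</p>

```r
# Invented survey responses, one column per question
survey <- data.frame(q1 = c(4, 5, 3), q2 = c(2, 4, 4), q3 = c(5, 5, 4))

# Loop version: familiar to SAS/SPSS/Stata users
loop_means <- numeric(ncol(survey))
names(loop_means) <- names(survey)
for (v in names(survey)) {
  loop_means[v] <- mean(survey[[v]])
}

# apply-family version: sapply() visits each column (variable) for you
apply_means <- sapply(survey, mean)

print(loop_means)
print(apply_means)
```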
<p><strong>Functions That Act Like Procedures</strong></p>
<p>Many other packages, including SAS, SPSS and Stata, have <em>procedures</em> or <em>commands</em> that do typical data analyses which go “down” through all the observations. They also have <em>functions</em> that usually do a single calculation across rows, such as taking the mean of some scores for each observation in the data set. But R has only functions, and those functions can do both. How does it get away with that? Functions may have a preference to go down rows or across columns, but for many functions you can use the “apply” family of functions to force them to go in either direction. So it’s true that in R, functions act like procedures <em>and</em> functions. Coming from other software, that’s a wild new idea.</p>
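<p>The direction switch is the MARGIN argument of apply(): 2 goes down the columns (procedure-style), 1 goes across each row (function-style). The data are invented; apply() converts the data frame to a matrix first, which is harmless here because every column is numeric.</p>

```r
# Invented survey responses, one column per question
survey <- data.frame(q1 = c(4, 5, 3), q2 = c(2, 4, 4), q3 = c(5, 5, 4))

# Procedure-style: go DOWN the observations, one result per variable
down <- apply(survey, 2, mean)     # MARGIN = 2 means columns

# Function-style: go ACROSS each observation, one result per row,
# like a mean of several scores computed for every case
across <- apply(survey, 1, mean)   # MARGIN = 1 means rows

print(down)
print(across)

# Base R also has dedicated, faster helpers for this common case
print(colMeans(survey))
print(rowMeans(survey))
```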
<p><strong>Naming and Renaming Variables is Way Too Complicated</strong></p>
<p>Often when people learn how R names and renames its variables they, well, freak out. There are <em>many</em> ways to name and rename variables because R stores the names as a character variable. Think of all the ways you know how to fiddle with character variables and you’ll realize that if you could use them all to name or rename variables, you have <em>way</em> more flexibility than the other data analysis packages. However, how long did it take you to learn all those tricks? Probably quite a while! So until someone needs that much flexibility, I recommend simply using R to read variable names from the same source as you read the data. When you need to rename them, use an add-on package that will let you do so in a style that is similar to SAS, SPSS or Stata. An example is <a title="Data Management" href="http://r4stats.com/examples/data-management/">here</a>. You can convert to R&#8217;s built-in approach when you need more flexibility. </p>
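<p>For readers curious about R's built-in approach, here is a short base-R sketch; the data set and variable names are invented. Because names() returns an ordinary character vector, renaming is just assigning into that vector, and any character function (here sub()) can be used on it.</p>

```r
# Invented data set with generic variable names
mydata <- data.frame(x1 = c(21, 35, 48), x2 = c(3, 5, 4),
                     gender = c("m", "f", "m"))

# The names are just a character vector that you can inspect...
print(names(mydata))

# ...rename one variable by matching its current name...
names(mydata)[names(mydata) == "x1"] <- "age"

# ...or use any character function, here replacing an "x" prefix with "q"
names(mydata) <- sub("^x", "q", names(mydata))

print(names(mydata))
```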
<p><strong>Inability to Analyze Multiple Variables</strong></p>
<p>One of the first functions beginners typically learn is mean(X). As you might guess, it gets the mean of the X variable’s values. That&#8217;s simple enough. It also seems likely that to get the means of two variables, you would just enter mean(X, Y). However, that’s wrong: the second argument of mean is the trimming fraction, so R treats Y as that setting rather than as a second variable. More generally, functions in R typically accept only a single object to analyze. The solution is to put those two variables into a single object such as a data frame: mean( data.frame(X, Y) ). So the generalization you need to make isn’t from one variable to multiple variables, but rather from one object (a variable) to another (a data set). Since other software packages are not object oriented, this is a mental adjustment people have to make when coming to R from other packages. (Note to R gurus: I could have used colMeans but it does not make this example as clear.)</p>
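<p>A base-R sketch of both points, with invented X and Y. One caveat for readers on newer versions: current releases of R no longer accept a data frame in mean() (it returns NA with a warning), so this sketch uses colMeans() and sapply(), which return one mean per variable.</p>

```r
X <- c(1, 2, 3)
Y <- c(4, 5, 6)

# mean(X, Y) does NOT average two variables: Y is matched to mean()'s
# second argument, trim, which must be a single number, so R stops.
msg <- tryCatch(mean(X, Y), error = function(e) conditionMessage(e))
print(msg)

# Put both variables into one object, then ask for one mean per variable
both <- data.frame(X, Y)
print(colMeans(both))
print(sapply(both, mean))
```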
<p><strong>Poor Ability to Select Variable Sets</strong></p>
<p>Most data analysis packages allow you to select variables that are next to one another in the data set (e.g. A--Z or A TO Z). R generally lacks this useful ability. It does have a “subset” function that allows the form A:Z, but that form works only in that function. There are many work-arounds for this problem, but most seem rather convoluted compared to other software. Nothing’s perfect!</p>
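<p>A sketch of the subset() form and one common work-around outside it, on an invented data set: look up the column positions by name with match(), then build the range of positions with the ordinary <code>:</code> operator.</p>

```r
# Invented data set with adjacent variables a, b, c, d
dat <- data.frame(id = 1:2, a = 3:4, b = 5:6, c = 7:8, d = 9:10)

# Inside subset(), select= understands a range of adjacent names
print(subset(dat, select = b:d))

# Elsewhere, one work-around is to look up the positions by name and
# build the range of column numbers yourself
first <- match("b", names(dat))
last  <- match("d", names(dat))
print(dat[, first:last])
```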
<p><strong>Too Much Complexity</strong></p>
<p>People complain that R has too much complexity overall compared to other software. This comes from the fact that you can start learning software like SAS and SPSS with relatively few commands: the basic ones to read and analyze data. However, when you start to become more productive, you then have to learn whole new languages! To help reduce repetition in your programs, you&#8217;ll need to learn the macro language. To use the output from one procedure in another, you&#8217;ll need to learn an output management system like SAS ODS or SPSS OMS. To add new capabilities, you need to learn a matrix language like SAS IML, SPSS Matrix or Stata Mata. Each of these languages has its own commands and rules. There are also steps for transferring data or parameters from one language to another. R has no need for that added complexity because it integrates all these capabilities into R itself. So it&#8217;s true that beginners see more complexity in R at first. However, as they learn more about R, they begin to realize that there is actually less complexity and more power in R!</p>
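<p>A sketch of how the jobs of a macro language, an output management system and a matrix language all live in ordinary R code, using the built-in mtcars data. The helper name <code>fit_on()</code> and the choice of variables are hypothetical, for illustration only; the last step checks that plain matrix algebra reproduces the coefficients from lm().</p>

```r
# A reusable "macro": an ordinary R function (fit_on is a hypothetical
# name) that fits the same model for any response variable
fit_on <- function(yvar, data = mtcars) {
  lm(reformulate("wt", response = yvar), data = data)
}
fits <- lapply(c("mpg", "hp"), fit_on)
names(fits) <- c("mpg", "hp")

# "Output management": one function's output is the next one's input
est <- coef(summary(fits$mpg))   # matrix: Estimate, Std. Error, t value, p
slope_p <- est["wt", "Pr(>|t|)"]
print(est)

# "Matrix language": least squares via the normal equations (X'X) b = X'y,
# written in plain R rather than a separate matrix language
X <- cbind(1, mtcars$wt)
y <- mtcars$mpg
beta <- solve(crossprod(X), crossprod(X, y))
print(beta)
```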
<p><strong>Lack of Graphical User Interface (GUI)</strong></p>
<p>Like most other packages, R’s full power is only accessible through programming. However, unlike the others, it does not offer a standard GUI to help non-programmers do analyses. The two GUIs most like those of SAS, SPSS and Stata are <a href="http://socserv.mcmaster.ca/jfox/Misc/Rcmdr" target="_blank">R Commander</a> and <a href="http://www.deducer.org" target="_blank">Deducer</a>. While they offer enough analytic methods to make it through an undergraduate degree in statistics, they offer less control than a powerful GUI such as those of SPSS or JMP. Worse, beginners must initially see a programming environment and then figure out how to find, install, and activate either GUI. Given that GUIs are aimed at people with fewer computer skills, this is a problem.</p>
<p><strong>Conclusion</strong></p>
<p>Most of the issues described above are misunderstandings caused by expecting R to work like other software that the person already knows. What examples like this have you come across?</p>
<p><strong>Acknowledgements</strong></p>
<p>Thanks to Patrick Burns and Tal Galili for their suggestions that improved this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2012/06/13/why-r-is-hard-to-learn/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>
	</item>
		<item>
		<title>Poll Shows Open Source Almost Even with Commercial Analytics Software</title>
		<link>http://r4stats.com/2012/05/31/open-source-almost-even/</link>
		<comments>http://r4stats.com/2012/05/31/open-source-almost-even/#comments</comments>
		<pubDate>Thu, 31 May 2012 15:10:08 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=717</guid>
		<description><![CDATA[The 2012 results of the annual KDnuggets poll are in. It shows R in first place with 30.7% of users reporting having used it for a real project. Excel is almost as popular. It seems out of place among so &#8230; <a href="http://r4stats.com/2012/05/31/open-source-almost-even/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The 2012 results of the annual <a href="http://www.kdnuggets.com/2012/05/top-analytics-data-mining-big-data-software.html">KDnuggets poll</a> are in. It shows R in first place with 30.7% of users reporting having used it for a real project. Excel is almost as popular. It seems out of place among so many more capable packages, but Excel is a tool that almost everyone has and knows how to use.</p>
<p>It’s interesting to note that four of the top five packages used were open source. While open source packages are clearly playing a major role in analytics, people still reported using more commercial software (1086) than open source (927).</p>
<p>For many other ways to measure analytic software popularity, see <a href="http://r4stats.com/articles/popularity/">The Popularity of Data Analysis Software</a>. I&#8217;ve just added this graph to that article.</p>
<p><a href="http://r4stats.files.wordpress.com/2012/04/fig_4_kdnuggets1.png"><img title="Fig_4_KDnuggets" src="http://r4stats.files.wordpress.com/2012/04/fig_4_kdnuggets1.png?w=640&#038;h=1280" alt="" width="640" height="1280" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2012/05/31/open-source-almost-even/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_4_kdnuggets1.png?w=640&#38;h=1280" medium="image">
			<media:title type="html">Fig_4_KDnuggets</media:title>
		</media:content>
	</item>
		<item>
		<title>Will 2015 be the Beginning of the End for SAS and SPSS?</title>
		<link>http://r4stats.com/2012/05/09/beginning-of-the-end/</link>
		<comments>http://r4stats.com/2012/05/09/beginning-of-the-end/#comments</comments>
		<pubDate>Wed, 09 May 2012 22:37:19 +0000</pubDate>
		<dc:creator>Bob Muenchen</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[R-Project]]></category>

		<guid isPermaLink="false">http://r4stats.com/?p=666</guid>
		<description><![CDATA[Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is &#8230; <a href="http://r4stats.com/2012/05/09/beginning-of-the-end/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Learning to use a data analysis tool well takes significant effort, so people tend to continue using the tool they learned in college for much of their careers. As a result, the software used by professors and their students is likely to predict what the next generation of analysts will use for years to come. I track this trend, and many others, in my article <a title="The Popularity of Data Analysis Software" href="http://r4stats.com/articles/popularity/"><em>The Popularity of Data Analysis Software</em></a>. In the latest update (4/13/2012) I forecast that, if current trends continued, the use of the R software would exceed that of SAS for scholarly applications in 2015. That was based on the data shown in Figure 7a, which I repeat here:</p>
<p><a href="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png"><img class="aligncenter size-full wp-image-643" title="Fig_7a_ScholarlyImpactBig6" src="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png?w=640&#038;h=640" alt="" width="640" height="640" /></a></p>
<p>Let&#8217;s take a more detailed look at what the future may hold for R, SAS and SPSS Statistics.</p>
<p>Here are the yearly counts of scholarly articles that Google Scholar found for each package:</p>
<pre>Year     R   SAS   SPSS
1995     8  8620   6450
1996     2  8670   7600
1997     6 10100   9930
1998    13 10900  14300
1999    26 12500  24300
2000    51 16800  42300
2001   133 22700  68400
2002   286 28100  88400
2003   627 40300  78600
2004  1180 51400 137000
2005  2180 58500 147000
2006  3430 64400 142000
2007  5060 62700 131000
2008  6960 59800 116000
2009  9220 52800  61400
2010 11300 43000  44500
2011 14600 32100  32000</pre>
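<p>The forecasting code below refers to objects named <code>R</code>, <code>SAS</code> and <code>SPSS</code> without showing how they were created. Here is a minimal sketch that builds them from the table above; the post does not say how the data were actually loaded. Because the forecast output numbers its rows 18 to 22, I assume the series were stored as plain numeric vectors indexed 1 to 17, and the extra <code>year</code> vector is my own addition.</p>
<pre class="brush: r; title: ; notranslate">
# Annual Google Scholar article counts, 1995-2011, copied from the table above.
# Assumption: plain numeric vectors, so observations are indexed 1 to 17 and
# forecast rows 18 to 22 correspond to 2012 to 2016.
year = 1995:2011   # hypothetical helper vector, not in the original post

R    = c(8, 2, 6, 13, 26, 51, 133, 286, 627, 1180, 2180, 3430, 5060,
         6960, 9220, 11300, 14600)
SAS  = c(8620, 8670, 10100, 10900, 12500, 16800, 22700, 28100, 40300,
         51400, 58500, 64400, 62700, 59800, 52800, 43000, 32100)
SPSS = c(6450, 7600, 9930, 14300, 24300, 42300, 68400, 88400, 78600,
         137000, 147000, 142000, 131000, 116000, 61400, 44500, 32000)

# Sanity check: one value per year for every package
stopifnot(length(R) == length(year),
          length(SAS) == length(year),
          length(SPSS) == length(year))
</pre>
<p>With these objects defined and the forecast package installed, the <code>auto.arima()</code> calls below should run as shown.</p>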
<p><strong>ARIMA Forecasting</strong></p>
<p>We can project R&#8217;s use five years into the future with the handy <code>auto.arima</code> function from <a href="http://robjhyndman.com/software/forecast/">Rob Hyndman&#8217;s</a> forecast package:</p>
<pre class="brush: r; title: ; notranslate">
&gt; library(&quot;forecast&quot;)

&gt; R_fit &lt;- auto.arima(R)

&gt; R_forecast &lt;- forecast(R_fit, h=5)

&gt; R_forecast

   Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
18          18258 17840 18676 17618 18898
19          22259 21245 23273 20709 23809
20          26589 24768 28409 23805 29373
21          31233 28393 34074 26889 35578
22          36180 32102 40258 29943 42417
</pre>
<p>The <em>Point Forecast</em> column, where rows 18 through 22 represent the years 2012 through 2016, shows that even if the use of SAS and SPSS were to remain at their 2011 levels (about 32,000 articles each), R would surpass them in 2016.</p>
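<p>That comparison can be checked with a few lines of base R. This sketch simply copies the R point forecasts printed above and holds SAS and SPSS at their 2011 counts (32,100 and 32,000), as the paragraph assumes; the variable names are my own.</p>
<pre class="brush: r; title: ; notranslate">
# R point forecasts copied from the output above (rows 18-22 = 2012-2016)
forecast_year = 2012:2016
r_point = c(18258, 22259, 26589, 31233, 36180)

# Hold SAS and SPSS constant at their 2011 levels
sas_2011  = 32100
spss_2011 = 32000

# First forecast year in which R exceeds both
passes = r_point > max(sas_2011, spss_2011)
first_year = forecast_year[which(passes)[1]]
print(first_year)  # 2016
</pre>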
<p>If we follow the same steps for SAS we get:</p>
<pre class="brush: r; title: ; notranslate">
&gt; SAS_fit &lt;- auto.arima(SAS)

&gt; SAS_forecast &lt;- forecast(SAS_fit, h=5)

&gt; SAS_forecast

   Point Forecast     Lo 80   Hi 80    Lo 95 Hi 95
18          21200  16975.53 25424.5  14739.2 27661
19          10300    853.79 19746.2  -4146.7 24747
20           -600 -16406.54 15206.5 -24774.0 23574
21         -11500 -34638.40 11638.4 -46887.1 23887
22         -22400 -53729.54  8929.5 -70314.4 25514
</pre>
<p>It appears that if the use of SAS continues to decline at its current precipitous rate, all scholarly use of it will stop in 2014 (the number of articles published cannot be less than zero, so treat the negatives as zero). I would bet Mitt Romney <a href="http://www.washingtonpost.com/blogs/election-2012/post/mitt-romney-challenges-rick-perry-to-10000-bet-in-gop-debate/2011/12/11/gIQAudrBnO_blog.html">$10,000</a> that that is not going to happen!</p>
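<p>Treating the negative forecasts as zero takes one call to <code>pmax()</code>. The sketch below copies the SAS point forecasts from the output above (variable names are my own). It also shows that each point forecast is simply the previous one minus 10,900 articles, the size of the 2010 to 2011 drop (43,000 to 32,100); that is my reading of the printed numbers, not something the output states.</p>
<pre class="brush: r; title: ; notranslate">
# SAS point forecasts copied from the output above (rows 18-22 = 2012-2016)
forecast_year = 2012:2016
sas_point = c(21200, 10300, -600, -11500, -22400)

# Article counts cannot be negative, so floor the forecasts at zero
sas_floored = pmax(sas_point, 0)
print(sas_floored)      # 21200 10300 0 0 0

# First forecast year with no SAS articles
first_zero_year = forecast_year[which(sas_floored == 0)[1]]
print(first_zero_year)  # 2014

# The forecasts fall by a constant amount each year,
# equal to the 2010-2011 decline of 43000 - 32100 = 10900
print(diff(sas_point))  # -10900 four times
</pre>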
<p>I find the SPSS prediction the most interesting:</p>
<pre class="brush: r; title: ; notranslate">
&gt; SPSS_fit &lt;- auto.arima(SPSS)

&gt; SPSS_forecast &lt;- forecast(SPSS_fit, h=5)

&gt; SPSS_forecast

   Point Forecast   Lo 80 Hi 80   Lo 95  Hi 95
18        13653.2  -16301 43607  -32157  59463
19        -4693.6  -57399 48011  -85299  75912
20       -23040.4 -100510 54429 -141520  95439
21       -41387.2 -145925 63151 -201264 118490
22       -59734.0 -193590 74122 -264449 144981
</pre>
<p>The forecast takes the logical approach of focusing on the steeper decline from 2005 through 2010, and it predicts that this year (2012) will be the last one in which SPSS sees use in scholarly publications. However, the part of the graph that I find most interesting is the shift from 2010 to 2011, which shows SPSS use still declining but at a much slower rate.</p>
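<p>The change in slope can be seen directly by computing the year-over-year change in SPSS articles with base R&#8217;s <code>diff()</code>. A minimal sketch using the last five years of the table (variable names are my own): the absolute drop shrinks from 54,600 articles (2008 to 2009) to 16,900 (2009 to 2010) and then to 12,500 (2010 to 2011).</p>
<pre class="brush: r; title: ; notranslate">
# SPSS article counts for 2007-2011, copied from the table above
year = 2007:2011
spss = c(131000, 116000, 61400, 44500, 32000)

# Change from the previous year; each name is the later year of the pair
change = diff(spss)
names(change) = year[-1]
print(change)
</pre>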
<p>Any forecasting book will warn you of the dangers of looking too far beyond the data, and I think these forecasts do just that. The 2015 figure in the <em>Popularity</em> paper and in the title of this blog post came from an exponential smoothing approach that did not match the rate of acceleration as well as the ARIMA approach does.</p>
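<p>The post does not say which exponential smoothing method produced the 2015 figure. As an illustration only, the sketch below fits Holt&#8217;s linear-trend smoothing with base R&#8217;s <code>HoltWinters()</code>, setting <code>gamma = FALSE</code> because annual data have no seasonal component, and prints a five-year projection for R that can be set beside the ARIMA output above. The choice of Holt&#8217;s method and the variable names are my assumptions; this does not reproduce the original numbers.</p>
<pre class="brush: r; title: ; notranslate">
# Annual R article counts, 1995-2011, from the table above
r_articles = ts(c(8, 2, 6, 13, 26, 51, 133, 286, 627, 1180, 2180, 3430,
                  5060, 6960, 9220, 11300, 14600),
                start = 1995, frequency = 1)

# Holt's exponential smoothing (assumed method): level and trend,
# smoothing parameters estimated from the data, no seasonal term
holt_fit = HoltWinters(r_articles, gamma = FALSE)

# Five-year projection, 2012-2016
holt_forecast = predict(holt_fit, n.ahead = 5)
print(round(holt_forecast))
</pre>
<p>Holt&#8217;s method extrapolates a linear trend from the most recent level and slope, while the ARIMA model chosen above for R projects a growing annual increase, which is one way two smoothing-style forecasts of the same data can disagree about the crossover year.</p>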
<p><strong>Colbert Forecasting</strong></p>
<p>While ARIMA forecasting has an impressive mathematical foundation, it&#8217;s always fun to follow <a href="http://www.colbertnation.com/">Stephen Colbert&#8217;s</a> approach: go from the gut. So now I&#8217;ll present the future of analytics software that must be true, because it feels so right to me personally. This analysis has Colbert&#8217;s most important attribute: <a href="http://en.wikipedia.org/wiki/Truthiness">truthiness</a>.</p>
<p>The growth in R&#8217;s use in scholarly work will continue for two more years, leveling off at around 25,000 articles in 2014. This growth will be driven by:</p>
<ul>
<li>The continued rapid growth in add-on packages (<a href="http://r4stats.com/articles/popularity/">Figure 10</a>)</li>
<li>The attraction of R&#8217;s powerful language</li>
<li>The near monopoly R has on the latest analytic methods</li>
<li>Its free price</li>
<li>The freedom to teach with real-world examples from outside organizations, which SAS and SPSS licenses forbid academics to do (the vendors argue that because such work benefits those organizations, the organizations should have their own software licenses).</li>
</ul>
<p>What will slow R&#8217;s growth is its lack of a graphical user interface that:</p>
<ul>
<li>Is powerful</li>
<li>Is easy to use</li>
<li>Provides journal style output in word processor format</li>
<li>Is standard, i.e. widely accepted as <em>The One to Use</em></li>
<li>Is open source</li>
</ul>
<p>While programming has important advantages over GUI use, many people will not take the time needed to learn to program, so they rarely come to fully understand those advantages. Conversely, programmers seldom take the time to fully master a GUI and so often underestimate its capabilities. Regardless of which is best, GUI users far outnumber programmers, so until R gains a GUI with the qualities listed above, its long-term growth will be limited. There are GUIs for R, but there are so many to choose from that none has become the clear leader (Deducer, R Commander, Rattle, Red-R, at least two from commercial companies, and still more <a href="http://en.wikipedia.org/wiki/R_(programming_language)#Graphical_user_interfaces">here</a>). If a clear leader were to emerge from this &#8220;GUI chaos,&#8221; R could continue its rapid growth and end up as the most used package.</p>
<p>The use of SAS for scholarly work will continue to decline until it matches R at the 25,000 level. This decline is caused by competition from R and other packages (notably Stata), but also by SAS Institute&#8217;s self-inflicted GUI chaos. For years they have offered too many GUIs, such as SAS/Assist, SAS/Insight, IML/Studio, the Analyst application, Enterprise Guide, Enterprise Miner and even JMP (which runs SAS nicely in recent versions). Professors looking to meet student demand for greater ease of use could not decide what to teach, so they continued teaching SAS as a programming language. Even now that Enterprise Guide has evolved into a good GUI, many SAS users do not know what it is. If SAS Institute were to completely replace their default Display Manager System with Enterprise Guide, they could bend the curve and end up at a higher level of perhaps 27,000.</p>
<p>The use of SPSS for scholarly work will decline only slightly this year and will level off in 2013 because:</p>
<ul>
<li>The people who needed advanced methods and were not happy <a title="Calling R from Other Software" href="http://r4stats.com/articles/calling-r/">calling R functions from within SPSS</a> have already switched to R or Stata</li>
<li>The people who like to program and want a more flexible language than SPSS offers have already switched to R or Stata</li>
<li>The people who needed a more advanced GUI have already switched to JMP</li>
</ul>
<p>The GUI users will stick with SPSS until a GUI as good (or nearly as good) comes to R and becomes widely accepted. At The University of Tennessee, where I work, GUI users make up the great majority of SPSS users.</p>
<p>Stata&#8217;s growth will level off in 2013 at a level that will leave it in fourth place. The other packages shown in <a title="The Popularity of Data Analysis Software" href="http://r4stats.com/articles/popularity/">Figure 7b</a> will also level off around the same time, roughly maintaining their current place in the rankings. A possible exception is JMP, whose interface is radically superior to the others for exploratory analysis. Its use could continue to grow, perhaps even displacing Stata from fourth place.</p>
<p>The futures of Enterprise Miner and SPSS Modeler are tied to the success of each company&#8217;s more mainstream product, SAS and SPSS Statistics respectively. Use of Enterprise Miner and SPSS Modeler is generally limited to one university class in data mining, while the other software discussed here is widely used in many classes.</p>
<p>So there you have it: the future of analytics revealed. No doubt each reader has found a wide range of things to disagree with, so I encourage you to follow the detailed blog at <a href="http://librestats.com/2012/04/12/statistical-software-popularity-on-google-scholar/">Librestats</a> to collect your own data from Google Scholar and do your own set of forecasts. Or simply go from the gut!</p>
]]></content:encoded>
			<wfw:commentRss>http://r4stats.com/2012/05/09/beginning-of-the-end/feed/</wfw:commentRss>
		<slash:comments>53</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/1bf5f1f5f75ff7d2bd346940cae93b3f?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">r4stats</media:title>
		</media:content>

		<media:content url="http://r4stats.files.wordpress.com/2012/04/fig_7a_scholarlyimpactbig6.png" medium="image">
			<media:title type="html">Fig_7a_ScholarlyImpactBig6</media:title>
		</media:content>
	</item>
	</channel>
</rss>
