See BlueSky Statistics GUI for R at JSM 2023

Are you attending this year’s Joint Statistical Meetings in Toronto? If so, stop by booth 404 to see the latest features of BlueSky Statistics. A menu-based graphical user interface for the R language, BlueSky lets people access the power of R without having to learn to program. Programmers can easily add code to BlueSky’s menus, sharing their expertise with non-programmers. My detailed review of BlueSky is here, a brief comparison to other R GUIs is here, and the BlueSky User Guide is here. I hope to see you in Toronto!

Update to Data Science Software Popularity

I’ve updated The Popularity of Data Science Software’s market share estimates based on scholarly articles. I posted it below, so you don’t have to sift through the main article to read the new section.

Scholarly Articles

Scholarly articles provide a rich source of information about data science tools. Because publishing requires significant effort, analyzing the type of data science tools used in scholarly articles provides a better picture of their popularity than a simple survey of tool usage. The more popular a software package is, the more likely it will appear in scholarly publications as an analysis tool or even as an object of study.

Since scholarly articles tend to use cutting-edge methods, the software used in them can be a leading indicator of where the overall market of data science software is headed. Google Scholar offers a way to measure such activity. However, no search of this magnitude is perfect; each will include some irrelevant articles and reject some relevant ones. The details of the search terms I used are complex enough to move to a companion article, How to Search For Data Science Articles.  

Figure 2a shows the number of articles found for the more popular software packages and languages (those with at least 4,500 articles) in the most recent complete year, 2022.

Figure 2a. The number of scholarly articles found on Google Scholar for data science software. Only those with more than 4,500 citations are shown.

SPSS is the most popular package, as it has been for over 20 years. This may be due to its balance between power and its graphical user interface’s (GUI) ease of use. R is in second place with around two-thirds as many articles. It offers extreme power, but as with all languages, it requires memorizing and typing code. GraphPad Prism, another GUI-driven package, is in third place. The packages from MATLAB through TensorFlow are roughly at the same level. Next comes Python and Scikit Learn. The latter is a library for Python, so there is likely much overlap between those two. Note that the general-purpose languages: C, C++, C#, FORTRAN, Java, MATLAB, and Python are included only when found in combination with data science terms, so view those counts as more of an approximation than the rest. Old stalwart FORTRAN appears last in this plot. While its count seems close to zero, that’s due to the wide range of this scale, and its count is just over the 4,500-article cutoff for this plot.

Continuing on this scale would make the remaining packages appear too close to the y-axis to read, so Figure 2b shows the remaining software on a much smaller scale, with the y-axis going to only 4,500 rather than the 110,000 used in Figure 2a. I chose that cutoff value because it allows us to see two related sets of tools on the same plot: workflow tools and GUIs for the R language that make it work much like SPSS.

Figure 2b. Number of scholarly articles using each data science software found using Google Scholar. Only those with fewer than 4,500 citations are shown.

JASP and jamovi are both front-ends to the R language and are way out front in this category. The next R GUI is R Commander, with half as many citations. Still, that’s far more than the rest of the R GUIs: BlueSky Statistics, Rattle, RKWard, R-Instat, and R AnalyticFlow. While many of these have low counts, we’ll soon see that the use of nearly all is rapidly growing.

Workflow tools are controlled by drawing 2-dimensional flowcharts that direct the flow of data and models through the analysis process. That approach is slightly more complex to learn than SPSS’ simple menus and dialog boxes, but it gets closer to the complete flexibility of code. In order of citation count, these include RapidMiner, KNIME, IBM SPSS Modeler, Orange Data Mining, SAS Enterprise Miner, Alteryx, and R AnalyticFlow. From RapidMiner to KNIME to SPSS Modeler, the citation count is roughly cut in half each time. Orange Data Mining comes next, at around 30% less. KNIME, Orange, and R AnalyticFlow are all free and open-source.

While Figures 2a and 2b help study market share now, they don’t show how things are changing. It would be ideal to have long-term growth trend graphs for each software, but collecting that much data is too time-consuming. Instead, I’ve collected data only for the years 2019 and 2022. This provides the data needed to study growth over that period.

Figure 2c shows the percent change across those years, with the growing “hot” packages shown in red (right side) and the declining or “cooling” ones shown in blue (left side).

Figure 2c. Change in Google Scholar citation rate from 2019 to the most recent complete year, 2022. BlueSky (2,960%) and jamovi (452%) growth figures were shrunk to make the plot more legible.

Seven of the 14 fastest-growing packages are GUI front-ends that make R easy to use. BlueSky’s actual percent growth was 2,960%, which I recoded as 220% because the original value made the rest of the plot unreadable. In 2022 the company released a Mac version, and the Mayo Clinic announced its migration from JMP to BlueSky; both likely had an impact. Similarly, jamovi’s actual growth was 452%, which I recoded to 200%. One of the reasons the R GUIs were able to obtain such high percentages of change is that they were all starting from low numbers compared to most of the other software. So be sure to look at Figure 2b to see the raw counts for all the R GUIs.
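The arithmetic behind a growth plot like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the package names and article counts are made up, and the helper names `percent_change` and `recode_for_plot` are hypothetical. Only the 220% cap comes from the text.

```python
def percent_change(old: int, new: int) -> float:
    """Percent change from an earlier count to a later one."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting count is zero")
    return 100.0 * (new - old) / old


def recode_for_plot(pct: float, cap: float) -> float:
    """Clamp a growth figure to a cap so one outlier does not flatten the plot."""
    return min(pct, cap)


# Hypothetical (2019, 2022) article counts; NOT the data behind Figure 2c.
counts = {
    "small_gui": (10, 250),            # tiny base, so a huge percent change
    "large_package": (80000, 72000),   # big base, modest decline
}

growth = {name: percent_change(a, b) for name, (a, b) in counts.items()}
plotted = {name: recode_for_plot(g, cap=220.0) for name, g in growth.items()}

for name in counts:
    print(f"{name}: actual {growth[name]:.0f}%, plotted {plotted[name]:.0f}%")
```

The small base is what drives the large percentage here: going from 10 to 250 articles is a 2,400% increase, even though the absolute gain is far smaller than the large package's loss.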

The most impressive point on this plot is the one for PyTorch. Back in Figure 2a, we saw that PyTorch was the fifth most popular tool for data science. Here we see it’s also the third fastest growing. Being big and growing fast is quite an achievement!

Of the workflow-based tools, Orange Data Mining is growing the fastest. There is a good chance that the next time I collect this data Orange will surpass SPSS Modeler.

The big losers in Figure 2c are the expensive proprietary tools: SPSS, GraphPad Prism, SAS, BMDP, Stata, Statistica, and Systat. However, open-source R is also declining, perhaps a victim of Python’s rising popularity.

I’m particularly interested in the long-term trends of the classic statistics packages. So in Figure 2d, I have plotted the same scholarly-use data for 1995 through 2016.

Figure 2d. The number of Google Scholar citations for each classic statistics package per year from 1995 through 2016.

SPSS has a clear lead overall, but now you can see that its dominance peaked in 2009, and its use is in sharp decline. SAS never came close to SPSS’s level of dominance, and its usage peaked around 2010. GraphPad Prism followed a similar pattern, though it peaked a bit later, around 2013.

In Figure 2d, the extreme dominance of SPSS makes it hard to see long-term trends in the other software. To address this problem, I have removed SPSS and all the data from SAS except for 2014 and 2015. The result is shown in Figure 2e.

Figure 2e. The number of Google Scholar citations for each classic statistics package from 1995 through 2016, with SPSS removed and SAS included only in 2014 and 2015. Removing SPSS and most of the SAS data expands the scale, making it easier to see the rapid growth of the less popular packages.
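For readers who want to try the same rescaling trick on their own data, here is a minimal Python sketch. The yearly counts are invented for illustration and the function name `rescale_view` is hypothetical. The idea is simply to drop the dominant series and keep only a couple of reference years from the second-largest one.

```python
# Hypothetical yearly article counts; NOT the data behind Figures 2d and 2e.
counts = {
    "SPSS": {2013: 900, 2014: 850, 2015: 800, 2016: 750},
    "SAS": {2013: 300, 2014: 280, 2015: 260, 2016: 240},
    "R": {2013: 60, 2014: 80, 2015: 100, 2016: 120},
}


def rescale_view(data, drop, keep_years):
    """Drop whole series, and restrict selected series to the given years."""
    view = {}
    for package, by_year in data.items():
        if package in drop:
            continue  # remove the dominant series entirely
        if package in keep_years:
            wanted = keep_years[package]
            by_year = {y: n for y, n in by_year.items() if y in wanted}
        view[package] = dict(by_year)
    return view


view = rescale_view(counts, drop={"SPSS"}, keep_years={"SAS": {2014, 2015}})

# The y-axis only needs to reach the largest remaining value.
y_max_before = max(n for series in counts.values() for n in series.values())
y_max_after = max(n for series in view.values() for n in series.values())
print(y_max_before, y_max_after)
```

Keeping a few points from the larger series gives the reader a reference level on the new scale without letting that series dominate the plot.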

Figure 2e shows that most of the remaining packages grew steadily across the time period shown. R and Stata grew especially fast, as did Prism until 2012. The decline in the number of articles that used SPSS, SAS, or Prism is not balanced by the increase in the other software shown in this graph.

These results apply to scholarly articles in general. The results in specific fields or journals are likely to differ.

You can read the entire Popularity of Data Science Software here; the above discussion is just one section.

Mayo Clinic Announces Move from SAS’ JMP to BlueSky Statistics

At the useR! 2022 Conference, the world-renowned Mayo Clinic announced that after 20 years of using SAS Institute’s JMP software, they have migrated to the BlueSky Statistics user interface for R. Ross Dierkhising, a principal biostatistician with the Clinic, described the process. They reviewed 16 commercial statistical software packages, and none met their needs as well as JMP. Then they investigated three graphical user interfaces for the powerful R language: BlueSky Statistics, jamovi, and JASP.

They found that BlueSky met their needs as well as JMP did, at significantly lower cost. Then Mayo’s staff added over 40 new dialogs to BlueSky, including things that JMP did not offer. Dierkhising said, “I have nothing but the highest respect [for] the BlueSky development team and how they worked with us.” Among others, Mayo’s additions to BlueSky include:

  • Kaplan-Meier, one group and compare groups
  • Competing risks, one group, and compare groups
  • Cox models, single model, and advanced single model
  • Stratified Cox model
  • Fine-Gray Cox model
  • Cox model, with binary time-dependent covariate
  • Large-scale data/model summaries via the arsenal package
  • Frequency table in list format
  • Compare datasets like SAS’ compare procedure
  • Single tables of multiple model fits
  • Bland-Altman plots
  • Cohen’s and Fleiss’ kappa
  • Concordance correlation coefficients
  • Intraclass correlation coefficients
  • Diagnostic testing with a gold standard

Although Dierkhising said BlueSky included a “ton” of data wrangling methods, the Mayo team added a dozen more. The result was “gigantic” cost savings, and a tool that, in the end, did things that JMP could not do.

Anyone can download a free and open source copy of BlueSky statistics from the company website. You can read my detailed review of BlueSky here, and see how it compares to other graphical user interfaces to R here. The BlueSky User Guide is online here.

You can watch Ross Dierkhising’s entire 17-minute presentation here:

Updated Comparison of R Graphical User Interfaces

I have just updated my detailed reviews of Graphical User Interfaces (GUIs) for R, so let’s compare them again. It’s not too difficult to rank them based on the number of features they offer, so let’s start there. I’m basing the counts on the number of dialog boxes in each of four categories:

  • Ease of Use
  • General Usability
  • Graphics
  • Analytics

This is trickier data to collect than you might think. Some software has fewer menu choices, depending instead on more detailed dialog boxes. Studying every menu and dialog box is very time-consuming, but that is what I’ve tried to do. I’m putting the details of each measure in the appendix so you can adjust the figures and create your own categories. If you decide to make your own graphs, I’d love to hear from you in the comments below.

Figure 1 shows how the various GUIs compare on the average rank of the four categories. R Commander is abbreviated Rcmdr, and R AnalyticFlow is abbreviated RAF. We see that BlueSky is in the lead with R-Instat close behind. As my detailed reviews of those two point out, they are extremely different pieces of software! Rather than spend more time on this summary plot, let’s examine the four categories separately.

Figure 1. Mean of each R GUI’s ranking of the four categories. To make this plot consistent with the others below, the larger the rank, the better.

For the category of ease-of-use, I’ve defined it mostly by how well each GUI does what GUI users are looking for: avoiding code. They get one point each for being able to install, start, and use the GUI to its maximum effect, including publication-quality output, without knowing anything about the R language itself. Figure 2 shows the result. JASP comes out on top here, with jamovi and BlueSky right behind.

Figure 2. The number of ease-of-use features that each GUI has.

Figure 3 shows the general usability features each GUI offers. This category is dominated by data-wrangling capabilities, where data scientists and statisticians spend most of their time. This category also includes various types of data input and output. BlueSky and R-Instat come out on top not just due to their excellent selection of data wrangling features but also due to their use of the rio package for importing and exporting files. The rio package combines the import/export capabilities of many other packages, and it is easy to use. I expect the other GUIs will eventually adopt it, raising their scores by around 40 points. JASP shows up at the bottom of this plot due to its philosophy of encouraging users to prepare the data elsewhere before importing it into JASP.

Figure 3. Number of general usability features for each GUI.

Figure 4 shows the number of graphics features offered by each GUI. R-Instat has a solid lead in this category. In fact, this underestimates R-Instat’s ability if you…


R Graphical User Interface Comparison

I have recently updated my detailed reviews of Graphical User Interfaces (GUIs) for R, so it’s time for another comparison post. It’s not too difficult to rank them based on the number of features they offer, so let’s start there. I’m basing the counts on the number of dialog boxes in each of four categories:

  • Ease of Use
  • General Usability
  • Graphics
  • Analytics

This is trickier data to collect than you might think. Some software has fewer menu choices, depending instead on more detailed dialog boxes. Studying every menu and dialog box is very time-consuming, but that is what I’ve tried to do. I’m putting the details of each measure in the appendix so you can adjust the figures and create your own categories. If you decide to make your own graphs, I’d love to hear from you in the comments below.

Figure 1 shows how the various GUIs compare on the average rank of the four categories. R Commander is abbreviated Rcmdr, and R AnalyticFlow is abbreviated RAF. We see that BlueSky (User Guide online here) and R-Instat are nearly tied for the lead. As my detailed reviews of those two point out, they are extremely different pieces of software! Rather than spend more time on this summary plot, let’s examine the four categories separately.

Figure 1. Mean of each R GUI’s ranking of the four categories. To make this plot consistent with the others below, the larger the rank, the better.
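To make the "mean rank" idea concrete, here is a small Python sketch of one way to compute it. It is an illustration only: the GUI names and feature counts are hypothetical, and giving tied scores their average rank is an assumption, since the text does not say how ties are handled.

```python
def rank_ascending(scores: dict[str, float]) -> dict[str, float]:
    """Rank so the largest score gets the largest rank; ties share the average rank."""
    ordered = sorted(scores, key=scores.get)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of names tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


# Hypothetical feature counts for three imaginary GUIs; NOT the article's data.
categories = {
    "Ease of Use": {"A": 10, "B": 8, "C": 8},
    "General Usability": {"A": 40, "B": 90, "C": 20},
    "Graphics": {"A": 25, "B": 30, "C": 35},
    "Analytics": {"A": 200, "B": 150, "C": 100},
}

per_category = {cat: rank_ascending(s) for cat, s in categories.items()}
mean_rank = {
    gui: sum(r[gui] for r in per_category.values()) / len(per_category)
    for gui in ["A", "B", "C"]
}
print(mean_rank)
```

Ranking within each category before averaging keeps a category with very large counts, such as analytics, from swamping the others.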

For the category of ease-of-use, I’ve defined it mostly by how well each GUI does what GUI users are looking for: avoiding code. They get one point each for being able to install, start, and use the GUI to its maximum effect, including publication-quality output, without having to know anything about the R language itself. Figure 2 shows the result. JASP comes out on top here, with jamovi and BlueSky right behind.

Figure 2. The number of ease-of-use features that each GUI has.

Figure 3 shows the general usability features each GUI offers. This category is dominated by data-wrangling capabilities, where data scientists and statisticians spend the majority of their time. This category also includes various types of data input and output. R-Instat comes out on top not just due to its excellent selection of data wrangling features, but also due to its use of the rio package for importing and exporting files. The rio package combines the import/export capabilities of many other packages and it is easy to use. I expect the other GUIs will eventually adopt it, raising their scores by around 40 points. JASP shows up at the bottom on this plot due to its philosophy of encouraging users to prepare the data elsewhere before importing it into JASP.

Figure 3. Number of general usability features for each GUI.

Figure 4 shows the number of graphics features offered by each GUI. R-Instat has a solid lead in this category. In fact, this is actually an underestimate of R-Instat’s ability if you include its options to layer any “geom” on top of any graph. However, that requires knowing what the geoms are and how to use them. That’s knowledge of R code, of course.

When studying these graphs, it’s important to consider the difference between the relative and absolute performance. For example, relatively speaking, JASP and R Commander are not doing well here, but they do offer over 25 types of plots! That absolute figure might be fine for your needs.

Figure 4. Number of graphics features offered by each GUI.

Finally, we get to what is, for many people, the main reason for using this type of software: analytics. Figure 5 shows how the GUIs compare on the number of statistics, machine learning, and artificial intelligence methods. Here R Commander shows, well, a “commanding” lead! This GUI has been around the longest, and so has had more time for people to contribute to its capabilities. If you read an earlier version of this article, R Commander was not as dominant. That was due to the fact that I had not yet taken the time necessary to load and study every one of its 42 add-ons. That required a substantial amount of time, and these updated figures reflect a more complete view of its capabilities.

Again, it’s worth considering the absolute values on the x-axis. JASP and jamovi are in the middle of the pack, but they both have nearly 200 methods. If that is sufficient for your needs, you can then focus on the other categories.

Many important details are buried in these simple counts. For example, I enjoy using jamovi for statistical analyses, but it currently lacks machine learning and artificial intelligence. I like BlueSky too, but it doesn’t yet do any Bayesian statistics (jamovi and JASP do). Rattle comes out near the bottom due to its focus on machine learning, but it does an excellent job of introducing students to that area.

Figure 5. Number of analytics features offered by each GUI.

Overview of Each R GUI

The above plots help show us overall feature sets, but each package offers methods that the others lack. Let’s look at a brief overview of each. Remember that each of these has a detailed review that follows my standard template. I present them in alphabetical order.

BlueSky Statistics – This software was created by former SPSS employees, and it shares many of SPSS’ features. BlueSky is only a few years old, and it converted from commercial to open source mid-way through 2018. Its developers have been adding features at a rapid rate. When using BlueSky, it’s not initially apparent that R is involved at all. Unless you click the code button “</>” included in every dialog box, you’ll never see the R code. If you want to learn R code, seeing what BlueSky uses for each step can help. BlueSky saves the dialog settings for every step, providing GUI-based reproducibility. For R code, it uses the popular, but controversial, tidyverse style, while most of the other GUIs use base R functions. BlueSky’s output is in publication-quality tables that follow the popular style of the American Psychological Association. It’s stronger than most of the others at AI/ML and psychometrics. It is now available for Windows and Mac (previous versions were Windows-only).

Deducer – This has a very nice-looking interface, and it’s probably the first R GUI to offer output in true APA-style word processing tables. Being able to just cut and paste a table into your word processor saves a lot of time and it’s a feature that has been copied by several others. Deducer was released in 2008, and when I first saw it, I thought it would quickly gain developers. It got a few, but development seems to have halted. Deducer’s installation is quite complex, and it depends on the troublesome Java software. It also uses JGR, which never became as popular as the similar RStudio. The main developer, Ian Fellows, has moved on to another interesting GUI project called Vivid. I ran this most recently in February, 2022, and the output had many odd characters in it, perhaps due to a lack of support for Unicode.

jamovi – The developers who form the core of the jamovi project used to be part of the JASP team. Although they started a couple of years later, they’re ahead of JASP in several ways at the moment. Its developers decided that the R code it used should be visible and any R code should be executable, features that differentiated it from JASP. jamovi has an extremely interactive interface that shows you the result of every selection in each dialog box (JASP does too). It also saves the settings in every dialog box, and lets you re-use every step on a new dataset by saving a “template.” That’s extremely useful since GUI users often prefer to avoid learning R code. jamovi’s biggest weakness is its dearth of data management features, though there are plans to address that. The most recent version of jamovi borrowed the Bayesian analysis methods from JASP, making those two tied as the leaders in that approach. jamovi can help you learn R code by showing what it does at each step, though it uses its own functions from the jmv package. While those functions are not standard R, they do combine the capability of many R functions in each one.

JASP – The biggest advantage JASP offers is its emphasis on Bayesian analysis. If that’s your preference, this might be the one for you. Another strength is JASP’s Machine Learning module. At the moment JASP is very different from all the other GUIs reviewed here because it can’t show you the R code it’s writing. The development team plans to address that issue, but it has been planned for a couple of years now, so it must not be an easy thing to add.

R AnalyticFlow – This is unique among R GUIs as it is the only one that lets you organize your analyses using flowchart-like workflow diagrams. That approach makes it easy to visualize what a complex analysis is doing and to rerun it. It writes very clean base R code and provides easy access to the powerful lattice graphics package. It also supports the ggplot2 graphics package, but only through its more limited quickplot function. R AnalyticFlow also lets you extend its capability making it easier for R power users to interact with non-programmers. However, it has some serious limitations. Its set of analytic and graphical methods is quite sparse. It also lacks the important advantage that most workflow-based tools have: the ability to re-use the workflow on a new dataset by changing only the data input nodes. Since each node requires the name of the dataset used, you must change it in each location.

Rattle – If your work involves ML/AI (a.k.a. data mining) instead of standard statistical methods, Rattle may be the GUI for you. It’s focused on ML/AI, and its tab-based interface makes quick work of it. However, it’s the weakest of them all when it comes to statistical analysis. It also lacks many standard data management features.

R Commander – This is the oldest GUI, having been around since at least 2005. There are an impressive 42 add-ons developed for it. It is currently one of only three R GUIs that save R Markdown files (the others being BlueSky and RKWard), but it does not create word processing tables by default, as some of the others do. The R code it writes is classic, rarely using the newer tidyverse functions. It works as a partner to R; you install R separately, then use it to install and start R Commander. R Commander makes it easy to blend menu-based analysis with coding. If your goal is to learn to code using base R, this is an excellent choice. The software’s main developer, John Fox, told me in January 2022 that he has no future development plans for R Commander. However, others can still extend its feature set by writing add-ons.

R-Instat – This offers one of the most extensive collections of data wrangling, graphics, and statistical analysis methods of any R GUI. At a basic level, its graphics dialogs are easy to use, and it offers powerful multi-layer support for people who are familiar with the ggplot2 package’s geom functions. To use its full modeling capabilities, you need to know what R’s packages (e.g. MASS) are and what each one’s functions (e.g. rlm) do. For an R programmer, recognizing a known package::function combination is much easier than recalling it without assistance. Such a user would find R-Instat’s GUI extremely helpful.

RKWard – This GUI blends a nice point-and-click interface with an integrated development environment (IDE) that is the most advanced of all the GUIs reviewed here. It’s easy to install and start, and it saves all your dialog box settings, allowing you to rerun them. However, that’s done step-by-step, not all at once as jamovi’s templates allow. The code RKWard creates is classic R, with no tidyverse at all. RKWard is one of only three R GUIs that support R Markdown.


I hope this brief comparison will help you choose the R GUI that is right for you. Each offers unique features that can make life easier for non-programmers. Instructors of introductory classes in statistics or ML/AI should find these enable their students to focus on the material rather than on learning the R language. If one catches your eye, don’t forget to read the full review of it here.


Writing this set of reviews has been a monumental undertaking. It would not have been possible without the assistance of Bruno Boutin, Anil Dabral, Ian Fellows, John Fox, Thomas Friedrichsmeier, Rachel Ladd, Jonathan Love, Ruben Ortiz, Danny Parsons, Christina Peterson, Josh Price, David Stern, Roger Stern, Eric-Jan Wagenmakers, and Graham Williams.

Appendix: Guide to Scoring

The four categories are defined by the following. The yes/no items get scored 1 for yes, and 0 for no. The “how many” items consist of simple unweighted counts of the number of features, e.g., the number of file types a package can import without relying on R code. I used to plot the total number of features, but that is now dominated by the large values for analytics features, making that total fairly redundant.

Category | Feature | Scores
Ease_of_Use | Installs without the use of R | 1.
Ease_of_Use | Starts without the use of R | 1.
Ease_of_Use | Remembers recent files | 0.
Ease_of_Use | Hides R code by default |
Ease_of_Use | Use its full capability without using R | 1.
Ease_of_Use | Data Editor | 1.
Ease_of_Use | Reuse the entire workflow without using R | 1.
Ease_of_Use | Pub-quality tables w/out R code steps | 1.
Ease_of_Use | Hides field-specific menus initially | 0.
Ease_of_Use | Table of Contents to ease navigation | 0.
Ease_of_Use | Easy to move blocks of output | 1.
Ease_of_Use | Easy to repeat any step by groups | 1.
General_Features | Operating Systems (how many) |
General_Features | Import Data File Types (how many) | 7.00 15.
General_Features | Import Database (how many) |
General_Features | Export Data File Types (how many) |
General_Features | Multiple Data Files Open at Once | 1.
General_Features | Multiple Output Windows | 1.
General_Features | Multiple Code Windows | 0.
General_Features | Variable Metadata View | 1.
General_Features | Variable Search in Dialogs | 0.
General_Features | Variable Filtering (limit vars shown in data and dialogs) |
General_Features | Model Builder adds N-way interactions | 1.
General_Features | Magnify GUI for teaching | 1.
General_Features | R Code Editor | 1.
General_Features | Reuse work via Code | 1.
General_Features | Package Management | 1.
General_Features | Output: Word Processing Features | 1.
General_Features | Output: R Markdown | 1.
General_Features | Output: LaTeX | 1.
General_Features | Data_Wrangling (how many) |
General_Features | Transform Across Variables at Once | 1.
General_Features | Transform Down Many Variables at Once | 1.
General_Features | Label Across Many Variables at Once | 0.
Graphics | Types of Graphs (how many) | 29.00 16.00 20.00 14.00 11.00 24.00 19.00 35.00 19.00
Graphics | Small Multiples | 1.
Graphics | Large Multiples | 1.
Graphics | Export Graphics Formats (how many) |
Analytics | Model Objects | 1.
Analytics | Statistics – Frequentist | 159.00 35.00 71.00 168.00 17.00 8.00 591.00 209.00 55.00
Analytics | Statistics – Bayesian | 0.00 0.00 43.00 12.
Analytics | Statistics – Distributions | 18.00 0.00 40.
Analytics | Machine Learning / AI | 35.00 0.00 16.00 0.00 4.00 27.
Analytics | Model Validation Methods (how many) |
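If you want to build your own categories from a sheet like the one above, the scoring rule is easy to sketch in Python. This is a rough illustration under stated assumptions: the feature values below are hypothetical, not taken from the table, and summing the item scores within each category is my reading of "simple unweighted counts" rather than a confirmed detail of how the figures were produced.

```python
# Hypothetical feature sheet for one imaginary GUI; NOT values from the table above.
# Yes/no items are booleans; "how many" items are plain integer counts.
features = [
    ("Ease_of_Use", "Installs without the use of R", True),
    ("Ease_of_Use", "Remembers recent files", False),
    ("General_Features", "Import Data File Types (how many)", 12),
    ("General_Features", "R Code Editor", True),
    ("Graphics", "Types of Graphs (how many)", 18),
]


def item_score(value) -> int:
    """Score yes/no items as 1 or 0; pass unweighted counts through unchanged."""
    if isinstance(value, bool):
        return 1 if value else 0
    return int(value)


def category_totals(rows) -> dict[str, int]:
    """Add up the item scores within each category (assumed aggregation)."""
    totals: dict[str, int] = {}
    for category, _feature, value in rows:
        totals[category] = totals.get(category, 0) + item_score(value)
    return totals


totals = category_totals(features)
print(totals)
```

Because counts and yes/no items are added on the same scale, a single "how many" item can outweigh a dozen yes/no items, which is worth keeping in mind when regrouping features.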

A Comparative Review of the R-Instat GUI for R

by Robert A. Muenchen


R-Instat is a free and open source graphical user interface for the R software that focuses on people who want to point-and-click their way through data science analyses. Written in Visual Basic, it is currently only available for Microsoft Windows. However, a Linux version is in development using the cross-platform Mono implementation of the .NET framework.

This post is one of a series of reviews that aim to help non-programmers choose the Graphical User Interface (GUI) that is best for them. Although I wrote the BlueSky User’s Guide, I hope to remain objective in these reviews. There is no one perfect user interface for everyone; each GUI for R has features that appeal to a different set of people.


There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.


The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as jamovi or RKWard, install in a single step. Others, such as Deducer, install in multiple steps (up to seven steps, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The HelpDesks at most universities are flooded with such calls at the beginning of each semester!

R-Instat is easy to install, requiring only a single step. It provides its own embedded copy of R. This simplifies the installation and ensures complete compatibility between R-Instat and the version of R it’s using. However, it also means if you already have R installed, you’ll end up with a second copy. You can have R-Instat control any version of R you choose, but if the version differs too much, you may run into occasional problems.

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” that add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Rattle, Deducer) through medium (JASP 15) to high (jamovi 43, R Commander 43).

While the R-Instat project welcomes contributions from anyone, there are not any modules to add at this time. All of its capabilities are included in its initial installation.


Some user interfaces for R, such as jamovi or JASP, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and then finally call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.

You start R-Instat directly by double-clicking its icon from your desktop or choosing it from your Start Menu (i.e., not from within R).

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.

R-Instat starts up by showing its screen (Fig. 1). Under Start, I chose “New Data Frame” and it showed me the rather perplexing dialog shown in Fig. 2.

Figure 1. The R-Instat startup screen.

As an R user, I know what expressions are, but what did the R-Instat designers mean by the term?

Figure 2. The New Dataframe dialog.

Clicking the “Construct Examples” button brought up the suggestions shown in Fig. 3. These are standard R expressions, which came as quite a surprise! It seems that the R-Instat designers want people to start using R programming code immediately.

Figure 3. Examples R-Instat provides of expressions you can use to create a dataset.

Clicking the Help button brings up the advice, “the simplest option is Empty” (the developers say this will become the default in a future version). Clicking that button brings up a simple prompt for the number of rows and columns you would like to create. After that, you’re looking at a basic spreadsheet (Fig. 4) that easily lets you enter data. As you enter data, it determines if it is numeric or character. Scientific notation is accepted, but dates are saved as character variables. Logical values (TRUE, FALSE) are recognized as such and are stored appropriately.

Right-clicking on any column allows you to convert variables to be a factor, ordered factor, numeric, logical, or character. These changes are recorded as function calls to a custom “convert_column_to_type” function for reproducibility. Such interactive changes are not usually recorded by other R GUIs. Date/time conversion is not available on that menu, as that process is trickier. Those conversions are on the “Prepare> Column Date” menu item. Other things you can do from the right-click menu are: rename, duplicate, reorder, set levels/labels, sort, and filter/remove filter.

The class of each variable is indicated by a character code that follows each variable name in parentheses: (C) for character, (F) for factor, (O.F) for ordered factor, (D) for date, (L) for logical. When no code follows a variable name, it is numeric.
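For readers curious about what such conversions look like in plain R, here is a minimal sketch using base R functions. It is not R-Instat's own “convert_column_to_type” code, which I have not reproduced; the data frame and its column names are made up for illustration.

```r
# Hypothetical data entered as text, as a data editor might first store it.
survey <- data.frame(
  language = c("English", "Spanish", "English"),
  rating   = c("low", "high", "medium"),
  visit    = c("2023-01-15", "2023-02-01", "2023-03-20"),
  score    = c("1e2", "2.5", "30"),
  stringsAsFactors = FALSE
)

# Character to factor
survey$language <- factor(survey$language)

# Character to ordered factor, with the level order stated explicitly
survey$rating <- factor(survey$rating,
                        levels = c("low", "medium", "high"),
                        ordered = TRUE)

# Character to date; the format must be given or guessed, which is why
# date conversion is trickier than the others
survey$visit <- as.Date(survey$visit, format = "%Y-%m-%d")

# Character to numeric; scientific notation such as "1e2" is accepted
survey$score <- as.numeric(survey$score)

# Show the first class of each column
sapply(survey, function(x) class(x)[1])
```

Because each conversion is a line of code rather than a mouse click, rerunning the script reproduces the same data types every time.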

Figure 4. The R-Instat Data View (left) and Output Window (right).

The name of the dataset appears on a tab at the bottom of the Data View window. This lets you easily manage multiple datasets, an ability that is popular among professionals, but which is rarely offered in R GUIs (BlueSky and R Commander are the only others that offer it).

Once the dataset is saved, you can choose “Prepare > Data Frame > Insert rows/columns” to add new rows or columns at any position in the data frame. New columns can be added with a specified default value, which can be a big time-saver when entering blocks of related data.

There is a quicker method that works for inserting new rows. You right-click the row numbers, and a pop-up menu lets you insert rows above or below. The number of rows selected is the number of rows added, as in Excel.

When editing data, R-Instat lets you type new values on top of the old. As soon as you press the Enter key, it generates R code to execute the change. For example, in a language variable, when changing the value “English” to “Spanish,” it wrote,

# Replace Value in Data
data_book$replace_value_in_data(data_name="wakefield", col_name="Language", rows="78", new_value="Spanish")

This is important for reproducibility, but R-Instat is the only GUI reviewed here that tracks such important manual changes. In fact, even among expensive proprietary software, Stata is the only one that I’m aware of that keeps track of such changes using code.
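To see why recording such an edit as code matters, here is a rough base R equivalent of the change described above. The data_book object belongs to R-Instat and is not reproduced here; the small data frame below is a made-up stand-in for the wakefield data.

```r
# Hypothetical stand-in for the dataset being edited
wakefield <- data.frame(
  ID       = 1:100,
  Language = rep("English", 100),
  stringsAsFactors = FALSE
)

# The manual edit, written as code: change row 78 of Language
wakefield[78, "Language"] <- "Spanish"

# Confirm the change
wakefield[77:79, ]
```

If the raw data file is ever re-imported, running this one line again restores the correction, which would be lost if the cell had simply been typed over with no record kept.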

If you have another data set to enter, you can restart the process by choosing “File> New Data…” again. You can switch data sets simply by clicking on a tab, and its window will pop to the front for you to see. When doing analyses, or saving data, the data set that is displayed in the editor does not influence what appears in dialog boxes. That means that you can be looking at one dataset while analyzing another! Since each dialog allows you to choose the dataset to use, that is technically not a problem, but if you have several datasets that contain the same variable names, remember that what you see may not be what you get! That’s the opposite of BlueSky Statistics, which automatically analyzes the dataset you see. R-Instat’s ability to work with multiple datasets in a single instance of the software is not a feature found in all R GUIs. For example, jamovi and JASP can only work with a single dataset at a time.

Saving the data is done with a fairly standard “File> Save As> Save Dataset As” menu. By default, it will save all open datasets, filters, graphs, and models to a single file called a “data book.” That makes complex projects much easier to open and close.

Data Import

R-Instat supports the following file formats, most of which are automatically opened using “File> Import from File”. The ODK and NetCDF file formats have their own Import menus. R-Instat’s ability to open many formats related to climate science hints at what the software excels at. For details, see the Analysis Methods section below.

  1. Comma Separated Values (.csv)
  2. Plain text files (.txt)
  3. Excel (old and new xls file types)
  4. xBASE database files (dBase, etc.)
  5. SPSS (.sav)
  6. SAS binary files (sas7bdat and *.xpt)
  7. Standard R workspace files (RData, but it just opens one dataframe of its choosing)
  8. Open Data Kit (ODK)
  9. OpenRefine
  10. Network Common Data Form (NetCDF)
  11. SST Sea Surface Temperature formatted files
  12. IRI Data Library (API download)
  13. Climate Data Store (CDS) (API download)
  14. Shapefile
  15. Climsoft (Climatic database)
  16. .dly (ASCII files)
  17. .dat (ASCII files)
  18. Tab Separated Values (.tsv)
  19. Stata (.dta)
  20. JSON (.json)
  21. epiinfo (.rec)
  22. Minitab (.mtb)
  23. Systat (.syd)
  24. CSV with a YAML metadata header (.csvy)
  25. Feather R/Python interchange format (.feather)
  26. Pipe separated files (.psv)
  27. YAML (.yml)
  28. Weka Attribute-Relation File Format (.arff)
  29. Data Interchange Format (.dif)
  30. OpenDocument Spreadsheet (*.ods)
  31. Shallow XML documents (*.xml)
  32. Single-table HTML documents (*.html)


BlueSky Statistics Intro and User Guides Now Available

BlueSky Statistics is an easy-to-use menu system that uses the R language to do all its work. My detailed review of BlueSky is available here, and a brief comparison of the various menu systems for R is here. I’ve just released the BlueSky Statistics 7.1 User Guide in printed form on the world’s largest independent bookstore, A description and detailed table of contents are available here.

Cover design by Kiran Rafiq.

I’ve also released the BlueSky Statistics 7.1 Intro Guide. It is a complete subset of the User Guide, and you can download it for free here (if you have trouble downloading it, your company may have security blocking Microsoft OneDrive; try it at home). Its description and table of contents are here, and soon you will also be able to purchase a printed copy of it from

Cover design by Kiran Rafiq.

I’m enthusiastic about getting feedback on these books. If you have comments or suggestions, please send them to me at muenchen.bob at gmail dot com.

Other books that feature BlueSky Statistics include:
Introduction to Biomedical Data Science
Applying the Rasch Model in Social Sciences Using R
Data Preparation and Exploration, Applied to Healthcare Data

Publishing with has been a very pleasant experience. They put the author in complete control, making one responsible for every detail of the contents, obtaining reviewers, and creating a cover file that includes the front, back, and spine of the book to match the dimensions of the book (e.g., more pages means a wider spine). Advertising is left up to the writer as well, hence this blog post! If you are thinking about writing a book, I highly recommend both and getting a cover design from The latter let me run a contest in which a dozen artists submitted several ideas each. Their built-in survey system let me ask many colleagues for their opinions to help me decide. Altogether, it was a very interesting experience.

To follow the progress of these and other R-related books, subscribe to my blog, or follow me on Twitter.

R GUI Update: BlueSky User’s Guide, New Features

The BlueSky Statistics graphical user interface (GUI) for the R language has added quite a few new features (described below). I’m also working on a BlueSky User Guide, a draft of which you can read about and download here. [Update: don’t download that, get the full Intro Guide download instead.] Although I’m spending a lot of time on BlueSky, I still plan to be as obsessive as ever about reviewing all (or nearly all) of the R GUIs, which is summarized here.

The new data management features in BlueSky are:

  • Date Order Check – this lets you quickly check across the dates stored in many variables, and it reports if it finds any rows whose dates are not always increasing from left to right.
  • Find Duplicates – generates a report of duplicates and saves a copy of the data set from which the duplicates are removed. Duplicates can be based on all variables, or a set of just ID variables.
  • Select First/Last Observation per Group – finding the first or last observation in a group can create new datasets from the “best” or “worst” case in each group, find the most current record, and so on.
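For those who want to see the logic behind these three menu items, here is a minimal sketch in base R. It is not the code that BlueSky generates; the data frames and variable names are made up, and treating tied dates as “in order” is my assumption.

```r
# --- Date Order Check: flag rows whose dates decrease from left to right ---
stages <- data.frame(
  admit     = as.Date(c("2023-01-01", "2023-02-10", "2023-03-05")),
  surgery   = as.Date(c("2023-01-03", "2023-02-08", "2023-03-06")),
  discharge = as.Date(c("2023-01-07", "2023-02-15", "2023-03-06"))
)
day_numbers  <- sapply(stages, as.numeric)   # one numeric column per date variable
out_of_order <- which(apply(day_numbers, 1, function(r) any(diff(r) < 0)))

# --- Find Duplicates: on all variables, or on ID variables only ---
visits <- data.frame(
  id    = c(1, 1, 2, 2, 2, 3),
  visit = as.Date(c("2023-01-05", "2023-01-05", "2023-02-01",
                    "2023-03-10", "2023-02-20", "2023-01-15")),
  score = c(10, 10, 7, 9, 8, 5)
)
dup_all <- duplicated(visits)       # TRUE for rows that repeat an earlier row
dup_id  <- duplicated(visits$id)    # TRUE for repeated IDs
deduped <- visits[!dup_all, ]       # copy of the data with duplicates removed

# --- Select First/Last Observation per Group ---
sorted       <- deduped[order(deduped$id, deduped$visit), ]
first_per_id <- sorted[!duplicated(sorted$id), ]
last_per_id  <- sorted[!duplicated(sorted$id, fromLast = TRUE), ]
```

Here the last observation per group is the most current record for each id. Sorting by a score instead of a date would pick out the “best” or “worst” case in each group.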

Model Fitting / Tuning

One of the more interesting features in BlueSky is its offering of what they call Model Fitting and Model Tuning. Model Fitting gives you direct control over the R function that does the work. That provides precise control over every setting, and it can teach you the code that the menus create, but it also means that model tuning is left to you. However, it does standardize scoring so that you do not have to keep up with the wide range of parameters that each of those functions needs for scoring. Model Tuning controls models through the caret package, which lets you do things like K-fold cross-validation and model tuning. However, it does not allow control over every model setting.
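To make K-fold cross-validation concrete, here is a minimal sketch of the idea written in base R so it runs without extra packages. It does not use caret and is not what BlueSky’s Model Tuning menus generate; the choice of the built-in mtcars data, the predictors, and five folds are all assumptions made for illustration.

```r
set.seed(42)   # make the random fold assignment repeatable
k <- 5

# Assign each row of mtcars to one of k folds at random
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[folds != i, ]   # fit on k - 1 folds
  test  <- mtcars[folds == i, ]   # score on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

# Average out-of-sample error across the folds
mean(rmse)
```

Every row is used for testing exactly once, so the averaged error is a fairer estimate of how the model will do on new data than the error on the data used to fit it. Packages such as caret automate this loop and repeat it across candidate settings of a model.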

New Model Fitting menu items are:

  • Cox Proportional Hazards Model: Cox Single Model
  • Cox Multiple Models
  • Cox with Formula
  • Cox Stratified Model
  • Extreme Gradient Boosting
  • KNN
  • Mixed Models
  • Neural Nets: Multi-layer Perceptron
  • NeuralNets (i.e. the package of that name)
  • Quantile Regression

There are so many Model Tuning entries that it’s easier to just paste in the list from the main BlueSky review, which I updated earlier this morning:

  • Model Tuning: Adaboost Classification Trees
  • Model Tuning: Bagged Logic Regression
  • Model Tuning: Bayesian Ridge Regression
  • Model Tuning: Boosted trees: gbm
  • Model Tuning: Boosted trees: xgbtree
  • Model Tuning: Boosted trees: C5.0
  • Model Tuning: Bootstrap Resample
  • Model Tuning: Decision trees: C5.0tree
  • Model Tuning: Decision trees: ctree
  • Model Tuning: Decision trees: rpart (CART)
  • Model Tuning: K-fold Cross-Validation
  • Model Tuning: K Nearest Neighbors
  • Model Tuning: Leave One Out Cross-Validation
  • Model Tuning: Linear Regression: lm
  • Model Tuning: Linear Regression: lmStepAIC
  • Model Tuning: Logistic Regression: glm
  • Model Tuning: Logistic Regression: glmnet
  • Model Tuning: Multivariate Adaptive Regression Splines (MARS via earth package)
  • Model Tuning: Naive Bayes
  • Model Tuning: Neural Network: nnet
  • Model Tuning: Neural Network: neuralnet
  • Model Tuning: Neural Network: dnn (Deep Neural Net)
  • Model Tuning: Neural Network: rbf
  • Model Tuning: Neural Network: mlp
  • Model Tuning: Random Forest: rf
  • Model Tuning: Random Forest: cforest (uses ctree algorithm)
  • Model Tuning: Random Forest: ranger
  • Model Tuning: Repeated K-fold Cross-Validation
  • Model Tuning: Robust Linear Regression: rlm
  • Model Tuning: Support Vector Machines: svmLinear
  • Model Tuning: Support Vector Machines: svmRadial
  • Model Tuning: Support Vector Machines: svmPoly

You can download the free open-source version from

Updates to R GUIs: BlueSky, jamovi, JASP, & RKWard

Graphical User Interfaces (GUIs) for the R language help beginners get started learning R, help non-programmers get their work done, and help teams of programmers and non-programmers work together by turning code into menus and dialog boxes. There has been quite a lot of progress on R GUIs since my last post on this topic. Below I describe some of the features added to several R GUIs.

BlueSky Statistics

BlueSky Statistics has added mixed-effects linear models. Its dialog shows an improved model builder that will be rolled out to the other modeling dialogs in future releases. Other new statistical methods include quantile regression, survival analysis using both Kaplan-Meier and Cox Proportional Hazards models, Bland-Altman plots, Cohen’s Kappa, Intraclass Correlation, odds ratios and relative risk for M by 2 tables, and sixteen diagnostic measures such as sensitivity, specificity, PPV, NPV, Youden’s Index, and the like. The ability to create complex tables of statistics was added via the powerful arsenal package. Some examples of the types of tables you can create with it are shown here.
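As a reminder of what a few of those diagnostic measures mean, here is a short sketch that computes them by hand from a 2 by 2 table. The counts are made up for illustration, and this is simple arithmetic rather than the code BlueSky runs.

```r
# Hypothetical counts from comparing a test against the true condition
tp <- 90    # true positives
fn <- 10    # false negatives
fp <- 30    # false positives
tn <- 170   # true negatives

sensitivity <- tp / (tp + fn)   # proportion of true cases the test detects
specificity <- tn / (tn + fp)   # proportion of non-cases the test clears
ppv         <- tp / (tp + fp)   # positive predictive value
npv         <- tn / (tn + fn)   # negative predictive value
youden      <- sensitivity + specificity - 1   # Youden's Index

round(c(sensitivity = sensitivity, specificity = specificity,
        PPV = ppv, NPV = npv, Youden = youden), 3)
```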

Several new dialogs have been added to the Data menu. The Compute Dummy Variables dialog creates dummy (aka indicator) variables from factors for use in modeling. That approach offers greater control over how the dummies are created than you would have when including factors directly in models.
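For comparison, here is a minimal base R sketch of creating dummy variables from a factor with model.matrix. I have not reproduced what the BlueSky dialog generates, and the variable names are made up.

```r
d <- data.frame(group = factor(c("a", "b", "c", "a")))

# One dummy per level except the first, which acts as the reference level
dummies_ref <- model.matrix(~ group, data = d)[, -1, drop = FALSE]

# One dummy per level, with no reference level dropped
dummies_all <- model.matrix(~ group - 1, data = d)

colnames(dummies_ref)
colnames(dummies_all)
```

Having the dummies as explicit columns lets you choose which level to leave out, or include only some of them, rather than accepting whatever a modeling function does with the factor by default.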

A new Factor Levels menu item leads to many of the functions from the forcats package. They allow you to reorder factor levels by count, by occurrence in the dataset, or by functions of another variable, to lump low-frequency levels into a single “Other” category, and so on. These are all helpful in setting the order and nature of, for example, bars in a plot or entries in a table.

The BlueSky Data Grid now has icons that show the type of variable (i.e., factor, ordered factor, string, numeric, date, or logical). The Output Viewer adds icons to let you add notes to the output (not full R Markdown yet), and a trash can icon lets you delete blocks of output.
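Here is a rough sketch of what those factor operations do, written in base R so it runs without installing anything. The forcats functions that cover the same ground include fct_infreq, fct_inorder, fct_reorder, and fct_lump; the data and the cutoff for “low-frequency” below are made up.

```r
x <- c("b", "a", "c", "a", "b", "a", "d")
y <- c(5, 1, 9, 2, 6, 3, 4)

# Reorder levels by count, most frequent first
by_count <- factor(x, levels = names(sort(table(x), decreasing = TRUE)))

# Reorder levels by first occurrence in the data
by_occurrence <- factor(x, levels = unique(x))

# Reorder levels by a function of another variable (median of y per level)
by_median <- reorder(factor(x), y, FUN = median)

# Lump levels seen fewer than twice into a single "Other" category
counts <- table(x)
rare   <- names(counts)[counts < 2]
lumped <- factor(ifelse(x %in% rare, "Other", x))

levels(by_count)
levels(by_occurrence)
levels(by_median)
levels(lumped)
```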

A comprehensive list of the changes to this release is located here and my updated review of it is here.


jamovi

New modules expand jamovi’s capabilities to include time-based survival analysis, Bland-Altman analysis & plots, behavioral change analysis, advanced mediation analysis, differential item analysis, and quantiles & probabilities from various continuous distributions.

jamovi’s new Flexplot module greatly expands the types of graphs it can create. It lets you take a single graph type and repeat it in rows and/or columns, making it easy to visualize how the data change across groups (called facet, panel, or lattice plots).

You can read more about Flexplot here, and my recently-updated review of jamovi is here.


JASP

The JASP package has added two major modules: machine learning and network analysis. The machine learning module includes boosting, K-nearest neighbors, and random forests for both regression and classification problems. For regression, it also adds regularized linear regression. For clustering, it covers hierarchical, K-means, random forest, density-based, and fuzzy C-means methods. It can generate models and add predictions to your dataset, but it still cannot save models for future use. The main method it is missing is a single decision tree model. Though a less accurate predictor, a simple tree model can often provide insight that is lacking from other methods.

Another major addition to JASP is Network Analysis. It helps you to study the strengths of interactions among people, cell phones, etc. With so many people working from home during the Coronavirus pandemic, it would be interesting to see what this would reveal about how our patterns of working together have changed.

A really useful feature in JASP is its Data Library. It greatly speeds your ability to try out a new feature by offering a completely worked-out example including data. When trying out the network analysis feature, all I had to do was open the prepared example to see what type of data it would use. With most other data science software, you’re left to dig about in a collection of datasets looking for a good one to test a particular analysis. Nicely done!

I’ve updated my full review of JASP, which you can read here.


RKWard

The main improvement to the RKWard GUI for R is adding support for R Markdown. That makes it the second GUI to support R Markdown after R Commander. Both the jamovi and BlueSky teams are headed that way. RKWard’s new live preview feature lets you see text, graphics, and markdown as you work. A comprehensive list of new features is available here, and my full review of it is here.


R GUIs are gaining features at a rapid pace, quickly closing in on the capabilities of commercial data science packages such as SAS, SPSS, and Stata. I encourage R GUI users to contribute their own additions to the menus and dialog boxes of their favorite(s). The development teams are always happy to help with such contributions. To follow the progress of these and other R GUIs, subscribe to my blog, or follow me on Twitter.

Biomedical Data Science Textbook Available

By Bob Hoyt & Bob Muenchen

Data science is being used in many ways to improve healthcare and reduce costs. We have written a textbook, Introduction to Biomedical Data Science, to help healthcare professionals understand the topic and to work more effectively with data scientists. The textbook content and data exercises do not require programming skills or higher math. We introduce open source tools such as R and Python, as well as easy-to-use interfaces to them such as BlueSky Statistics, jamovi, R Commander, and Orange. Chapter exercises are based on healthcare data, and supplemental YouTube videos are available in most chapters.

For instructors, we provide PowerPoint slides for each chapter, exercises, quiz questions, and solutions. Instructors can download an electronic copy of the book, the Instructor Manual, and PowerPoints after first registering on the instructor page.

The book is available in print and various electronic formats. Because it is self-published, we plan to update it more rapidly than would be possible through traditional publishers.

Below you will find a detailed table of contents and a list of the textbook authors.

Table of Contents


  1. Introduction
  2. Background and history
  3. Conflicting perspectives
    1. the statistician’s perspective
    2. the machine learner’s perspective
    3. the database administrator’s perspective
    4. the data visualizer’s perspective
  4. Data analytical processes
    1. raw data
    2. data pre-processing
    3. exploratory data analysis (EDA)
    4. predictive modeling approaches
    5. types of models
    6. types of software
  5. Major types of analytics
    1. descriptive analytics
    2. diagnostic analytics
    3. predictive analytics (modeling)
    4. prescriptive analytics
    5. putting it all together
  6. Biomedical data science tools
  7. Biomedical data science education
  8. Biomedical data science careers
  9. Importance of soft skills in data science
  10. Biomedical data science resources
  11. Biomedical data science challenges
  12. Future trends
  13. Conclusion
  14. References


  1. Introduction
    1. basic spreadsheet functions
    2. download the sample spreadsheet
  2. Navigating the worksheet
  3. Clinical application of spreadsheets
    1. formulas and functions
    2. filter
    3. sorting data
    4. freezing panes
    5. conditional formatting
    6. pivot tables
    7. visualization
    8. data analysis
  4. Tips and tricks
    1. Microsoft Excel shortcuts – Windows users
    2. Google Sheets tips and tricks
  5. Conclusions
  6. Exercises
  7. References


  1. Introduction
  2. Measures of central tendency & dispersion
    1. the normal and log-normal distributions
  3. Descriptive and inferential statistics
  4. Categorical data analysis
  5. Diagnostic tests
  6. Bayes’ theorem
  7. Types of research studies
    1. observational studies
    2. interventional studies
    3. meta-analysis
    4. correlation
  8. Linear regression
  9. Comparing two groups
    1. the independent-samples t-test
    2. the Wilcoxon-Mann-Whitney test
  10. Comparing more than two groups
  11. Other types of tests
    1. generalized tests
    2. exact or permutation tests
    3. bootstrap or resampling tests
  12. Stats packages and online calculators
    1. commercial packages
    2. non-commercial or open source packages
    3. online calculators
  13. Challenges
  14. Future trends
  15. Conclusion
  16. Exercises
  17. References


  1. Introduction
    1. historical data visualizations
    2. visualization frameworks
  2. Visualization basics
  3. Data visualization software
    1. Microsoft Excel
    2. Google Sheets
    3. Tableau
    4. R programming language
    5. other visualization programs
  4. Visualization options
    1. visualizing categorical data
    2. visualizing continuous data
  5. Dashboards
  6. Geographic maps
  7. Challenges
  8. Conclusion
  9. Exercises
  10. References


  1. Introduction
  2. Definitions
  3. A brief history of database models
    1. hierarchical model
    2. network model
    3. relational model
  4. Relational database structure
  5. Clinical data warehouses (CDWs)
  6. Structured query language (SQL)
  7. Learning SQL
  8. Conclusion
  9. Exercises
  10. References


  1. Introduction
  2. The seven v’s of big data related to health care data
  3. Technical background
  4. Application
  5. Challenges
    1. technical
    2. organizational
    3. legal
    4. translational
  6. Future trends
  7. Conclusion
  8. References


  1. Introduction
  2. History
  3. Definitions
  4. Biological data analysis – from data to discovery
  5. Biological data types
    1. genomics
    2. transcriptomics
    3. proteomics
    4. bioinformatics data in public repositories
    5. biomedical cancer data portals
  6. Tools for analyzing bioinformatics data
    1. command line tools
    2. web-based tools
  7. Genomic data analysis
  8. Genomic data analysis workflow
    1. variant calling pipeline for whole exome sequencing data
    2. quality check
    3. alignment
    4. variant calling
    5. variant filtering and annotation
    6. downstream analysis
    7. reporting and visualization
  9. Precision medicine – from big data to patient care
  10. Examples of precision medicine
  11. Challenges
  12. Future trends
  13. Useful resources
  14. Conclusion
  15. Exercises
  16. References


  1. Introduction
  2. History
  3. R language
    1. installing R & RStudio
    2. an example R program
    3. getting help in R
    4. user interfaces for R
    5. R’s default user interface: RGui
    6. RStudio
    7. menu & dialog GUIs
    8. some popular R GUIs
    9. R graphical user interface comparison
    10. R resources
  4. Python language
    1. installing Python
    2. an example Python program
    3. getting help in Python
    4. user interfaces for Python
  5. Reproducibility
  6. R vs. Python
  7. Future trends
  8. Conclusion
  9. Exercises
  10. References


  1. Brief history
  2. Introduction
    1. data refresher
    2. training vs test data
    3. bias and variance
    4. supervised and unsupervised learning
  3. Common machine learning algorithms
  4. Supervised learning
  5. Unsupervised learning
    1. dimensionality reduction
    2. reinforcement learning
    3. semi-supervised learning
  6. Evaluation of predictive analytical performance
    1. classification model evaluation
    2. regression model evaluation
  7. Machine learning software
    1. Weka
    2. Orange
    3. RapidMiner Studio
    4. KNIME
    5. Google TensorFlow
    6. honorable mention
    7. summary
  8. Programming languages and machine learning
  9. Machine learning challenges
  10. Machine learning examples
    1. example 1 classification
    2. example 2 regression
    3. example 3 clustering
    4. example 4 association rules
  11. Conclusion
  12. Exercises
  13. References


  1. Introduction
    1. definitions
  2. History
  3. AI architectures
  4. Deep learning
  5. Image analysis (computer vision)
    1. Radiology
    2. Ophthalmology
    3. Dermatology
    4. Pathology
    5. Cardiology
    6. Neurology
    7. Wearable devices
    8. Image libraries and packages
  6. Natural language processing
    1. NLP libraries and packages
    2. Text mining and medicine
    3. Speech recognition
  7. Electronic health record data and AI
  8. Genomic analysis
  9. AI platforms
    1. deep learning platforms and programs
  10. Artificial intelligence challenges
    1. General
    2. Data issues
    3. Technical
    4. Socio economic and legal
    5. Regulatory
    6. Adverse unintended consequences
    7. Need for more ML and AI education
  11. Future trends
  12. Conclusion
  13. Exercises
  14. References


Brenda Griffith
Technical Writer
Austin, TX

Associate Clinical Professor
Department of Internal Medicine
Virginia Commonwealth University
Richmond, VA

David Hurwitz MD, FACP, ABPM-CI
Associate CMIO
Allscripts Healthcare Solutions
Chicago, IL

Madhurima Kaushal MS
Washington University at St. Louis, School of Medicine
St. Louis, MO

Assistant Professor
New York Medical College
Department of Emergency Medicine
Valhalla, NY

Karen A. Monsen PhD, RN, FAMIA, FAAN
School of Nursing
University of Minnesota
Minneapolis, MN

Robert Muenchen MS, PSTAT
Manager, Research Computing Support
University of Tennessee
Knoxville, TN

Dallas Snider PhD
Chair, Department of Information Technology
University of West Florida
Pensacola, FL

A special thanks to Ann Yoshihashi MD for her help with the publication of this textbook.