A Comparative Review of the BlueSky Statistics GUI for R

by Robert A. Muenchen, updated 8/31/2021


BlueSky Statistics’ desktop version is a free and open source graphical user interface for the R software that focuses on beginners looking to point-and-click their way through analyses. Originally available only for Windows, it now has a Mac version in beta test. BlueSky includes many features not covered in this review; I’ll add those features by mid-September of 2021 (I’m writing this on 8/31/2021). A commercial version is also available, which adds technical support and a version for Windows Terminal Servers such as Remote Desktop or Citrix. Mac, Linux, or tablet users could run it via a terminal server.

This post is one of a series of reviews which aim to help non-programmers choose the Graphical User Interface (GUI) that is best for them. Additionally, these reviews include a cursory description of the programming support that each GUI offers.

Terminology

There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.

Installation

The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as jamovi or RKWard, install in a single step. Others, such as Deducer, install in multiple steps (up to seven, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The help desks at most universities are flooded with such calls at the beginning of each semester!

The main BlueSky installation is easily performed in a single step. The installer provides its own embedded copy of R, simplifying the installation and ensuring complete compatibility between BlueSky and the version of R it’s using. However, it also means that if you already have R installed, you’ll end up with a second copy. You can have BlueSky control any version of R you choose, but if that version differs too much from the embedded one, you may run into occasional problems.

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Deducer) through moderate (jamovi) to very active (R Commander).

BlueSky is a fairly new open source project, and at the moment all the add-on modules are provided by the company. However, BlueSky’s capabilities approach the comprehensiveness of R Commander, which currently has the most add-ons available. The BlueSky developers are working to create an Internet repository for module distribution.

Startup

Some user interfaces for R, such as jamovi, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.

You start BlueSky directly by double-clicking its icon from your desktop, or choosing it from your Start Menu (i.e. not from within R itself). It interacts with R in the background; you never need to be aware that R is running.

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.

BlueSky starts up by showing you its main Application screen (Figure 1) and prompts you to enter data with an empty spreadsheet-style data editor. You can start entering data immediately, though at first, the variables are simply named var1, var2…. You might think you can rename them by clicking on their names, but such changes are done in a different manner, one that will be very familiar to SPSS users. There are two tabs at the bottom left of the data editor screen, which are labeled “Data” and “Variables.” The “Data” tab is shown by default, but clicking on the “Variables” tab takes you to a screen (Figure 2) which displays the metadata: variable names, labels, types, classes, values, and measurement scale.

Figure 1. The main BlueSky Application screen.

The big advantage that SPSS offers is that you can change the settings of many variables at once. So if you had, say, 20 variables for which you needed to set the same factor labels (e.g. 1=Strongly Disagree…5=Strongly Agree) you could do it once and then paste them into the other 19 with just a click or two. Unfortunately, that’s not yet fully implemented in BlueSky. Some of the metadata fields can be edited directly. For the rest, you must instead follow the directions at the top of that screen and right-click on each variable, one at a time, to make the changes. Complete copy and paste of metadata is planned for a future version.

Figure 2. The Variables screen in the data editor. The “Variables” tab in the lower-left is selected, letting us see the metadata for the same variables as shown in Figure 1.
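
For comparison, here is how the same one-step labeling looks in R code. This is a minimal base R sketch, not code that BlueSky generates; the data frame `survey` and its items `q1` to `q3` are hypothetical, coded 1 to 5 as in the Likert example above.

```r
# Hypothetical survey data: three items coded 1 to 5
survey <- data.frame(q1 = c(1, 5, 3), q2 = c(2, 4, 4), q3 = c(5, 1, 2))

# Define the value labels once
likert_labels <- c("Strongly Disagree", "Disagree", "Neutral",
                   "Agree", "Strongly Agree")

# Apply the same levels and labels to every listed item in one statement
items <- c("q1", "q2", "q3")
survey[items] <- lapply(survey[items], factor,
                        levels = 1:5, labels = likert_labels)

str(survey)
```

Adding a twentieth item only means adding its name to `items`.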

You can enter numeric or character data in the editor right after starting BlueSky. The first time you enter character data, it will offer to convert the variable from numeric to character and wait for you to approve the change. This is very helpful as it’s all too easy to type the letter “O” when meaning to type a zero “0”, or the letter “I” instead of number one “1”.
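
Why that prompt matters: a numeric column cannot hold a stray letter, so R turns such an entry into a missing value, with only a warning. A short base R illustration with made-up values:

```r
typed <- c("10", "2O", "15")   # second value has the letter O, not a zero

# Forcing the text to numeric turns the typo into NA
as_numbers <- suppressWarnings(as.numeric(typed))
as_numbers                     # 10 NA 15

# Flag entries that were present but could not be converted
typed[is.na(as_numbers) & !is.na(typed)]
```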

To add rows, click the clearly labeled “Click here to add a new row” on the Data tab. It would be much faster if pressing the Enter key did that automatically.

To add variables you have to go to the Variables tab and right-click on the row of any variable (variable names are in rows on that screen), then choose “Insert new variable at end.”

To enter factor data, it’s best to enter the values as numbers, such as 1 and 2 for male and female, and then set the labels (which are called values in SPSS terminology) afterward. The reason is that once labels are set, you must enter them from drop-down menus. While that ensures no invalid values are entered, it slows down data entry. The developers’ future plans include automatically displaying labels upon entry of numeric values.

If you instead decide to make the variable a factor before entering numeric data, it’s best to enter the numbers as labels as well. It’s an oddity of R that factors are stored internally as integer codes (1, 2, 3…) while displaying labels that may or may not be the same as the numbers they represent.
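
In R code, the enter-numbers-then-label approach and the integer codes underneath a factor look like this. It is a base R sketch with made-up variables, not BlueSky’s own code.

```r
# Enter the data as numbers first (1 = Male, 2 = Female)...
gender <- c(1, 2, 2, 1)
# ...then attach the labels afterward
gender <- factor(gender, levels = c(1, 2), labels = c("Male", "Female"))
gender

# Internally a factor stores integer codes, not the numbers you typed
scores <- factor(c(10, 20, 20, 30))
as.integer(scores)                # codes 1 2 2 3, not 10 20 20 30
as.numeric(as.character(scores))  # recovers 10 20 20 30
```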

To enter dates, enter them as character data and use the “Data> Compute” menu to convert the character data to the date format. When I reported this limitation to the developers, they said they would add a date setting to the “Variables” metadata tab so you could set a variable to be a date before entering the data.
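
The conversion itself is a one-liner in R. This sketch assumes the dates were typed as month/day/year text; the format string would change for other layouts, and the code BlueSky generates may differ.

```r
# Dates typed as character data, month/day/year
entered <- c("08/31/2021", "09/15/2021")

# Convert to R's Date class by describing the layout with a format string
dates <- as.Date(entered, format = "%m/%d/%Y")
class(dates)               # "Date"
diff(dates)                # Time difference of 15 days
format(dates, "%Y-%m-%d")  # back to text in ISO layout
```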

If you have another data set to enter, you can start the process again by clicking “File> New”, and a new editor window will appear in a new tab. You can switch between data sets simply by clicking a data set’s tab, which brings its window to the front. When doing analyses or saving data, the data set that’s displayed in the editor is the one that will be used. That approach feels very natural; what you see is what you get.

Saving the data is done with the standard “File > Save As” menu. You must save each one to its own file. While R allows multiple data sets (and other objects such as models) to be saved to a single file, BlueSky does not. Its developers chose to simplify what their users have to learn by limiting each file to a single data set. That is a useful simplification for GUI users. If a more advanced R user sends you a compound file containing many objects, BlueSky will detect it and offer to open one data set (data frame) at a time.
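
For readers curious about the R side, the sketch below creates a compound workspace file and shows one way to pick out just its data frames. It only illustrates the mechanism; it is not BlueSky’s internal code, and all object names are made up. (By contrast, R’s saveRDS() and readRDS() store and restore exactly one object per file.)

```r
# Save several objects, including a model, into one R workspace file
scores   <- data.frame(id = 1:3, score = c(88, 92, 79))
students <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cy"))
fit      <- lm(score ~ id, data = scores)
path <- tempfile(fileext = ".RData")
save(scores, students, fit, file = path)

# Load into a separate environment so nothing in the session is overwritten
env <- new.env()
loaded <- load(path, envir = env)
loaded                                  # names of all objects in the file

# Keep only the data frames, the objects a data editor could display
frames <- Filter(function(nm) is.data.frame(env[[nm]]), loaded)
frames
one_dataset <- env[["students"]]
```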

Figure 3. Output window showing standard journal-style tables. The syntax editor has been opened and is shown on the right side.

Data Import

The open source version of BlueSky supports the following file formats, all located under “File> Open”:

  • Comma Separated Values (.csv)
  • Plain text files (.txt)
  • Excel (old .xls and new .xlsx file types)
  • dBase (.dbf)
  • SPSS (.sav)
  • SAS binary files (.sas7bdat)
  • Standard R workspace files (.RData) with individual data frame selection
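
The review doesn’t say which R functions BlueSky calls for each format, so the following is only a sketch of how such imports look in R code. The two text formats use base R; the commented lines name add-on package functions (haven, readxl, foreign) as one common choice, not necessarily what BlueSky uses.

```r
# Create two small example files so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id,group,score", "1,a,10", "2,b,12"), csv_path)
writeLines(c("id\tgroup\tscore", "1\ta\t10", "2\tb\t12"), txt_path)

# Comma separated and tab-delimited text use base R's utils functions
from_csv <- read.csv(csv_path)
from_txt <- read.delim(txt_path)
str(from_csv)

# Other formats need add-on packages, for example (not run here):
# haven::read_sav("file.sav")          SPSS
# haven::read_sas("file.sas7bdat")     SAS
# readxl::read_excel("file.xlsx")      Excel (.xls or .xlsx)
# foreign::read.dbf("file.dbf")        dBase
```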

SQL databases are reached through the “File> Import Data” menu. The supported databases include:

  • Microsoft Access
  • Microsoft SQL Server
  • MySQL
  • PostgreSQL
  • SQLite

Data Export

The ability to export data to a wide range of file types helps when you, or other members of your research team, have to use multiple tools to complete a task. Unfortunately, this is a very weak area for R GUIs. Deducer offers no data export at all, and R Commander and rattle can export only delimited text files (an earlier version of this review listed jamovi as having very limited data export; that has since been expanded).

BlueSky offers a relatively comprehensive set of export options. The main one missing is SAS’ sas7bdat format, and that’s due to be added in the next release. Here’s the complete list:

  • Comma Separated Values (.csv)
  • dBase (.dbf)
  • Excel (.xlsx)
  • IBM SPSS (.sav)
  • R Objects (.RData)
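
Exporting from R code is similarly brief. A minimal sketch with a made-up data frame; as with import, the add-on package functions in the comments are one common choice and an assumption, not a statement about BlueSky’s internals.

```r
df <- data.frame(id = 1:3, score = c(88.5, 92.25, 79))

# CSV: row.names = FALSE avoids writing an extra column of row numbers
csv_out <- tempfile(fileext = ".csv")
write.csv(df, csv_out, row.names = FALSE)

# Reading the file back checks the round trip
back <- read.csv(csv_out)
all.equal(df, back)   # TRUE

# R's own .RData format stores the object exactly as it is in memory
rdata_out <- tempfile(fileext = ".RData")
save(df, file = rdata_out)

# Other formats on the list typically need add-on packages (not run here):
# haven::write_sav(df, "file.sav")       SPSS
# foreign::write.dbf(df, "file.dbf")     dBase
# writexl::write_xlsx(df, "file.xlsx")   Excel
```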


Data Management

It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g. from wide to long and back). A critically important aspect of data management is the ability to transform many variables at once. For example, social scientists need to recode many survey items, while biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time can be tedious. Some GUIs, such as jamovi and RKWard, handle only a few of these functions. Others, such as the R Commander, can handle many, but not all, of them.
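
To make “many variables at once” concrete, here is how both examples look in base R, where one statement handles any number of columns. The data frames and column names are hypothetical.

```r
# Hypothetical biology data: one ID and three positive measurements
bio <- data.frame(id     = 1:4,
                  mass   = c(12, 150, 48, 3),
                  length = c(2.1, 9.8, 5.5, 1.2),
                  area   = c(40, 900, 310, 15))

# Take the log of several variables in one statement, keeping the originals
to_log <- c("mass", "length", "area")
bio[paste0("log_", to_log)] <- lapply(bio[to_log], log)

# Reverse-score many 1-5 survey items at once (1 becomes 5, 2 becomes 4, ...)
survey <- data.frame(q1 = c(1, 5, 2), q2 = c(4, 4, 1), q3 = c(3, 2, 5))
survey[] <- lapply(survey, function(x) 6 - x)
survey
```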

BlueSky offers one of the most comprehensive sets of data management tools of any R GUI. The “Data” menu offers the following set of tools. Not shown is an extensive set of character and date/time functions that appear under “Compute.”

  1. Bin Numeric Variable(s)
  2. Compute New Variable(s): Compute
  3. Compute New Variable(s): Compute, apply a function across all rows
  4. Compute New Variable(s): Compute Dummy Variables
  5. Compute New Variable(s): Conditional Compute (If-Then)
  6. Compute New Variable(s): Conditional Compute (If-Then-Else)
  7. Concatenate Multiple Variables (handling missing values)
  8. Convert Variable(s) to factors
  9. Dates: Convert dates to string
  10. Dates: Convert string to dates
  11. Dates: Date order check
  12. Delete Variable(s)
  13. Factor Levels: Add New Levels
  14. Factor Levels: Factor Variable(s), display levels
  15. Factor Levels: Factor Variable(s), drop unused levels
  16. Factor Levels: Factor Variable(s), label NA as missing
  17. Factor Levels: Factor(s), add new levels
  18. Factor Levels: Reorder Factor Levels by occurrence in dataset
  19. Factor Levels: Reorder Factor Levels, lumping into other
  20. Factor Levels: Reorder Factor Levels, by count
  21. Factor Levels: Reorder Factor Levels, specify levels to keep or replace by other
  22. Factor Levels: Reorder Factor(s), by one other variable
  23. Find Duplicates
  24. Missing Values: Remove NAs
  25. Missing Values: basic
  26. Missing Values: formula
  27. Missing Values: Replace all Missing Values, factor and string variables
  28. Missing Values, model imputation: Classification And Regression Tree (cart)
  29. Missing Values, model imputation: EM Algorithm (em)
  30. Missing Values, model imputation: K Nearest Neighbor (knn)
  31. Missing Values, model imputation: Linear Model (lm)
  32. Missing Values, model imputation: Lasso / Ridge / Elastic-Net (en)
  33. Missing Values, model imputation: Multivariate Random Forest (mf)
  34. Missing Values, model imputation: Predictive Mean Matching (pmm)
  35. Missing Values, model imputation: Robust Linear Model (rlm)
  36. Missing Values, model imputation: Random Forest (rf)
  37. Missing Values, model imputation: Random Hot Deck (rhd)
  38. Missing Values, model imputation: Sequential Hot Deck (shd)
  39. Rank Variable(s)
  40. Recode Variable(s)
  41. Standardize Variable(s)
  42. Transform Variable(s)
  43. Weight Variable(s)
  44. Aggregate to Dataset
  45. Aggregate to Output
  46. Merge Datasets
  47. Merge Datasets (Tidy)
  48. Refresh Data Grid
  49. Reload Dataset from File
  50. Re-order Variables in Dataset Alphabetically
  51. Reshape: Wide to Long
  52. Reshape: Long to Wide
  53. Sample Dataset
  54. Select First/Last Observation per Group
  55. Sort Dataset
  56. Sort to Output
  57. Split Dataset: For Group-by Analysis: Split
  58. Split Dataset: For Group-by Analysis: Remove Split
  59. Split Dataset: For Partitioning: Random Split
  60. Split Dataset: For Partitioning: Stratified Sample
  61. Stack Datasets
  62. Subset Data
  63. Subset Data to Output
  64. Transpose Dataset: Transpose, entire dataset
  65. Transpose Dataset: Transpose, select variables
  66. Legacy: Aggregate
  67. Legacy: Sort
  68. Legacy: Subset
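
As an example of what the reshape items do, here is a wide-to-long round trip with base R’s reshape() function. This is a sketch with made-up data; BlueSky’s dialogs may generate code based on other packages.

```r
# Wide format: one row per person, one column per time point
wide <- data.frame(id = 1:3,
                   score_pre  = c(10, 12, 9),
                   score_post = c(14, 15, 11))

# Wide to long: one row per person per time point
long <- reshape(wide, direction = "long",
                varying = c("score_pre", "score_post"),
                v.names = "score",
                timevar = "time",
                times = c("pre", "post"),
                idvar = "id")
long

# Long back to wide
wide_again <- reshape(long, direction = "wide",
                      idvar = "id", timevar = "time", v.names = "score")
names(wide_again)
```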

Menus & Dialog Boxes

The goal of pointing & clicking your way through an analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to it and hence require more learning.

BlueSky uses standard menu choices for running steps listed on the Graphics, Analysis, Model Fitting, or Model Tuning menus. Dialog boxes appear and you select variables to place into their various roles. This is accomplished by either dragging the variable names or by selecting them and clicking an arrow located next to the particular role box. You can then click either “OK” to run the step or “Syntax” to write the code for that step to the R program editor. When you want to run a variation on the same analysis, the dialog boxes make quick work of it by remembering their previous settings (within a session).

The output is saved not with the standard “File > Save As” menu, but with the “Output > Save Output” selection from the main window. Oddly enough, while most menus are duplicated in both the main screen and the Output/Syntax screen, the ability to open or save output only appears on the main screen. If you exit without saving, BlueSky will prompt you to save both output and syntax (if you’ve used any of the latter).

During GUI-driven analysis, the only indication you have that R is doing the work is the code that appears in the output window before each result. However, if you click the “Syntax” button instead of “OK”, the program editor will pop out the right side of the output window. The code will be added to the bottom of the program editor, and it will be highlighted so that a click on the “Run” icon will execute it.

Documentation & Training

At the moment, this review is probably one of the most thorough written descriptions of how to use BlueSky.

The BlueSkyStatistics.com site offers training videos on how to use it. YouTube.com also offers training videos that show how to use BlueSky.

Help

R GUIs provide simple task-by-task dialog boxes that generate much more complex code. So for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exception is the R Commander, which lacks help on the dialog boxes themselves.

The level of help that BlueSky provides varies depending on how much help the developers think you need. Each dialog box has a help button in the upper right corner, which pops a help window off to the right of the dialog box. For many dialog boxes, it provides a summary description, how to use the dialog box, all the GUI settings, and how the accompanying function works should you choose to write your own code. In the bottom right corner of each dialog box is a “Get R Help” button that takes you to the R help page for the standard R function that actually does the calculations (sometimes these functions are called directly; other times they’re used inside BlueSky’s functions).

For some dialog boxes that simply call an R function (e.g. independent samples t-test), BlueSky will display R’s built-in help file. While this varying approach to help has been done well, I would prefer a more consistent one. The R help files often describe things that are not implemented in BlueSky, and it would be less confusing to avoid those situations. For example, in the case of the t-test, the help file describes how “formula” works, but that concept is not addressable using BlueSky’s dialog box (nor is it needed).
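
The two ways of calling t.test() in R show why its help page discusses a formula that BlueSky’s dialog box never exposes. A sketch with made-up scores:

```r
# Hypothetical scores for two groups, one row per observation
df <- data.frame(score = c(82, 90, 77, 85, 70, 74, 68, 79),
                 group = rep(c("a", "b"), each = 4))

# Formula interface: response ~ grouping variable
by_formula <- t.test(score ~ group, data = df)

# Default interface: two numeric vectors, no formula needed
by_vectors <- t.test(df$score[df$group == "a"], df$score[df$group == "b"])

c(by_formula$statistic, by_vectors$statistic)
```

Both calls run the same Welch two-sample test; the formula is just another way of saying which values belong to which group.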

Graphics

The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as jamovi, use their own functions and integrate them into analysis steps. GUIs also differ quite a lot in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. That’s how jamovi works, but then jamovi is limited to its custom graph functions, as nice as they may be.

BlueSky does most of its plots using the popular ggplot2 package, so that’s the code it will create if you want to learn it. BlueSky’s dialogs for creating graphs are extremely easy to use. By comparison, learning ggplot2 code can be confusing at first. BlueSky also offers several of R’s traditional graphics functions, which it places under a “Legacy” menu. While these graphs are usually not as nice as the ones created by the rest of its menus (i.e. those created by ggplot2), having both gives you the opportunity to compare both their appearance and the code used to create them.

Here is the selection of plots BlueSky can create.

  1. Bar Chart
  2. Bar Chart (means, confidence intervals)
  3. Boxplot
  4. Bullseye
  5. Contour
  6. Density (continuous)
  7. Density (counts)
  8. Frequency charts (factors)
  9. Frequency charts (numeric)
  10. Heatmap
  11. Histogram
  12. Line Chart
  13. Line Chart, stair-step plot
  14. Line Chart, variable order
  15. Maps: U.S. County Map
  16. Maps: U.S. State Map
  17. Maps: World Map
  18. Pie Chart
  19. Plot of Means
  20. P-P Plots
  21. Q-Q Plots
  22. ROC Curve
  23. Scatterplot
  24. Scatterplot 3D
  25. Scatterplot (Binned hex)
  26. Scatterplot (Binned Square)
  27. Stem and Leaf Plot
  28. Strip Chart
  29. Violin Plot
  30. Legacy (repeats some of the above using R’s built-in graphics)

Let’s take a look at how BlueSky does scatterplots, using R’s ggplot2 package behind the scenes. Using the dialog box, I chose only the X variable, Y variable, X facet factor, Y facet factor, and the type of smoothing fit. Note that the initial “for” loop lets BlueSky repeat this entire plot for each variable named in varNames (only one is used here). That ability to do “large multiples” of plots is currently a feature unique to BlueSky.

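library(ggplot2)   # needed when running this code outside BlueSky
# Dataset2 is the active data set; varNames holds the name(s) of the
# Y variable(s) chosen in the dialog box
for (vars in varNames)
  print(ggplot(Dataset2, aes(x = pretest,
                             y = eval(parse(text = paste(vars))))) +
        geom_point() +
        labs(x = "pretest", y = vars) +
        facet_grid(workshop ~ gender) +
        geom_smooth(method = "lm"))

Figure 4. A faceted scatterplot created by BlueSky and the ggplot2 package.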

Modeling

The way statistical models (which R stores in “model objects”) are created and used is an area in which R GUIs differ the most. The simplest and least flexible approach is taken by jamovi and RKWard. They try to do everything you might need in a single dialog box. They either don’t save models, or they do nothing with them. To an R programmer, that sounds extreme, since R does a lot with model objects. However, neither SAS nor SPSS was able to save models for the first 35 years of their existence, so each approach has its merits.

BlueSky’s modeling approach balances flexibility and ease of use. All its “Model Fitting” dialogs save the resulting model as a model object. They contain a “Model Name” field, which is filled in with a useful default name such as “LinearRegModel1”. The analyses listed under “Model Statistics” automatically use the model you set in the upper right corner of the main control screen, which you choose from the “Pick a Model” drop-down menu. From then on, all the Model Statistics menu choices will use that model to calculate model measures such as AIC, or to perform additional analyses, such as stepwise variable selection. A nice future improvement would be to have the software automatically choose the most recently created model.

The steps BlueSky currently offers to further manipulate models include stepwise selection, AIC and BIC, confidence intervals, variance inflation factors, and the Bonferroni outlier test.
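
In R code, reusing a saved model object looks like the sketch below. It uses the built-in mtcars data and base R’s stats functions; it is not the code BlueSky writes, and the review does not say which package BlueSky uses for the last two statistics.

```r
# Fit once and keep the model object, much like BlueSky's "Model Name" field
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Reuse the same object for further model statistics
AIC(fit)
BIC(fit)
confint(fit)                       # 95% confidence intervals
reduced <- step(fit, trace = 0)    # stepwise selection by AIC
formula(reduced)

# Variance inflation factors and the Bonferroni outlier test are available
# in the car package (not run here): car::vif(fit), car::outlierTest(fit)
```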

Analysis Methods

All of the R GUIs offer a decent set of statistical analysis methods, and some also offer machine learning methods. As you can see in the list below, BlueSky offers an extensive set of analysis methods. It also offers two different routes into machine learning. Under its “Model Fitting” menu, it provides direct access to the most popular machine learning algorithms. If you are a beginner at machine learning, that’s where you would start. The menus call the various R functions directly, and if you display the commands, you’ll notice that each uses a slightly different syntax.

If you’re an advanced user of machine learning, you might skip directly to the “Model Tuning” menu. There you’ll find many of the same algorithms, this time controlled in a powerful and consistent way using R’s caret package. You begin by choosing one of four tuning methods and one of the nine machine learning algorithms. BlueSky then passes the work off to the caret package to find your optimal model.
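
To show what a tuning method such as k-fold cross-validation does, here is a small base R sketch for linear models. The helper `cv_rmse()` is hypothetical; caret automates the same idea (through functions such as train() and trainControl()) across many algorithms, and its results will differ in detail.

```r
# Hypothetical helper: k-fold cross-validated RMSE for a linear model
cv_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)                                  # reproducible fold assignment
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]                # name of the outcome variable
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    train <- data[folds != i, , drop = FALSE]     # fit on the other k - 1 folds
    test  <- data[folds == i, , drop = FALSE]     # predict the held-out fold
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sq_err <- c(sq_err, (test[[response]] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Compare two candidate models on the built-in mtcars data;
# the lower cross-validated error is preferred
cv_rmse(mpg ~ wt, data = mtcars, k = 5)
cv_rmse(mpg ~ wt + hp, data = mtcars, k = 5)
```

Setting `k = nrow(data)` turns the same helper into leave-one-out cross-validation, another of the resampling methods on the Model Tuning menu.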

Here is a comprehensive list of BlueSky’s methods of analysis:

  1. Agreement: Bland-Altman Plot
  2. Agreement: Cohen’s Kappa
  3. Agreement: Concordance Correlation Coefficient
  4. Agreement: Diagnostic Testing
  5. Agreement: Fleiss’ Kappa
  6. Agreement: Intraclass Correlation Coefficients
  7. Cluster Analysis: Hierarchical
  8. Cluster Analysis: KMeans
  9. Contingency Tables: Multiway
  10. Contingency Tables: Two-way
  11. Contingency Tables: (M by 2 Table)
  12. Contingency Tables: Relative Risks (M by 2 Table)
  13. Correlation: Correlation Matrix
  14. Correlation: Correlation Test, one pair
  15. Correlation: Correlation Test, multi-variable
  16. Distributions: Continuous: Beta Probabilities
  17. Distributions: Continuous: Beta Quantiles
  18. Distributions: Continuous: Plot Beta Distribution
  19. Distributions: Continuous: Sample from Beta Distribution
  20. Distributions: Continuous: Cauchy Probabilities
  21. Distributions: Continuous: Plot Cauchy Distribution
  22. Distributions: Continuous: Cauchy Quantiles
  23. Distributions: Continuous: Sample from Cauchy Distribution
  25. Distributions: Continuous: Chi-squared Probabilities
  26. Distributions: Continuous: Chi-squared Quantiles
  27. Distributions: Continuous: Plot Chi-squared Distribution
  28. Distributions: Continuous: Sample from Chi-squared Distribution
  29. Distributions: Continuous: Exponential Probabilities
  30. Distributions: Continuous: Exponential Quantiles
  31. Distributions: Continuous: Plot Exponential Distribution
  32. Distributions: Continuous: Sample from Exponential Distribution
  33. Distributions: Continuous: F Probabilities
  34. Distributions: Continuous: F Quantiles
  35. Distributions: Continuous: Plot F Distribution
  36. Distributions: Continuous: Sample from F Distribution
  37. Distributions: Continuous: Gamma Probabilities
  38. Distributions: Continuous: Gamma Quantiles
  39. Distributions: Continuous: Plot Gamma Distribution
  40. Distributions: Continuous: Sample from Gamma Distribution
  41. Distributions: Continuous: Gumbel Probabilities
  42. Distributions: Continuous: Gumbel Quantiles
  43. Distributions: Continuous: Plot Gumbel Distribution
  44. Distributions: Continuous: Sample from Gumbel Distribution
  45. Distributions: Continuous: Logistic Probabilities
  46. Distributions: Continuous: Logistic Quantiles
  47. Distributions: Continuous: Plot Logistic Distribution
  48. Distributions: Continuous: Sample from Logistic Distribution
  49. Distributions: Continuous: Lognormal Probabilities
  50. Distributions: Continuous: Lognormal Quantiles
  51. Distributions: Continuous: Plot Lognormal Distribution
  52. Distributions: Continuous: Sample from Lognormal Distribution
  53. Distributions: Continuous: Normal Probabilities
  54. Distributions: Continuous: Normal Quantiles
  55. Distributions: Continuous: Plot Normal Distribution
  56. Distributions: Continuous: Sample from Normal Distribution
  57. Distributions: Continuous: t Probabilities
  58. Distributions: Continuous: t Quantiles
  59. Distributions: Continuous: Plot t Distribution
  60. Distributions: Continuous: Sample from t Distribution
  61. Distributions: Continuous: Uniform Probabilities
  62. Distributions: Continuous: Uniform Quantiles
  63. Distributions: Continuous: Plot Uniform Distribution
  64. Distributions: Continuous: Sample from Uniform Distribution
  65. Distributions: Continuous: Weibull Probabilities
  66. Distributions: Continuous: Weibull Quantiles
  67. Distributions: Continuous: Plot Weibull Distribution
  68. Distributions: Continuous: Sample from Weibull Distribution
  69. Distributions: Discrete: Binomial Probabilities
  70. Distributions: Discrete: Binomial Quantiles
  71. Distributions: Discrete: Binomial Tail Probabilities
  72. Distributions: Discrete: Plot Binomial Distribution
  73. Distributions: Discrete: Sample from Binomial Distribution
  74. Distributions: Discrete: Geometric Probabilities
  75. Distributions: Discrete: Geometric Quantiles
  76. Distributions: Discrete: Geometric Tail Probabilities
  77. Distributions: Discrete: Plot Geometric Distribution
  78. Distributions: Discrete: Sample from Geometric Distribution
  79. Distributions: Discrete: Hypergeometric Probabilities
  80. Distributions: Discrete: Hypergeometric Quantiles
  81. Distributions: Discrete: Hypergeometric Tail Probabilities
  82. Distributions: Discrete: Plot Hypergeometric Distribution
  83. Distributions: Discrete: Sample from Hypergeometric Distribution
  84. Distributions: Discrete: Negative Binomial Probabilities
  85. Distributions: Discrete: Negative Binomial Quantiles
  86. Distributions: Discrete: Negative Binomial Tail Probabilities
  87. Distributions: Discrete: Plot Negative Binomial Distribution
  88. Distributions: Discrete: Sample from Negative Binomial Distribution
  89. Distributions: Discrete: Poisson Probabilities
  90. Distributions: Discrete: Poisson Quantiles
  91. Distributions: Discrete: Poisson Tail Probabilities
  92. Distributions: Discrete: Plot Poisson Distribution
  93. Distributions: Discrete: Sample from Poisson Distribution
  94. Factor Analysis: Factor Analysis
  95. Factor Analysis: Principal Components
  96. Market Basket: Basket Data Format: Generate Rules (for Basket Data Format)
  97. Market Basket: Basket Data Format: Item Frequency Plot (for Basket Data Format)
  98. Market Basket: Basket Data Format: Targeting Items (for Basket Data Format)
  99. Market Basket: Display Rules
  100. Market Basket: Multi-line transaction Format: Generate Rules
  101. Market Basket: Multi-line transaction Format: Item Frequency Plot
  102. Market Basket: Multiple Variable Format: Generate Rules
  103. Market Basket: Multiple Variable Format: Targeting Items
  104. Market Basket: Plot Rules
  105. Means: T-Test, Independent Samples
  106. Means: T-Test, One Sample
  107. Means: T-Test, Paired Samples
  108. Means: Legacy: Oneway ANOVA
  109. Means: ANCOVA
  110. Means: Multi-way ANOVA
  111. Means: One-way ANOVA
  112. Means: One-way ANOVA with Blocks
  113. Means: One-way ANOVA with Random Blocks
  114. Missing Values: Output Arranged in Columns
  115. Missing Values: Output Arranged in Rows
  116. Non-parametric Tests: Chisq Test
  117. Non-parametric Tests: Friedman Test
  118. Non-parametric Tests: Kruskal-Wallis Test
  119. Non-parametric Tests: Wilcoxon, Independent Samples
  120. Non-parametric Tests: Wilcoxon, Paired Samples
  121. Proportions: Binomial, Single Sample
  122. Proportions: Proportion Test, Independent Samples
  123. Proportions: Proportion Test, Single Sample
  124. Reliability Analysis: Cronbach’s Alpha
  125. Reliability Analysis: McDonald’s Omega
  126. Summary Analysis: Dataset Comparison
  127. Summary Analysis: Dataset Description
  128. Summary Analysis: Analysis of Missing Values
  129. Summary Analysis: Frequency Table
  130. Summary Analysis: Table Top N
  131. Summary Analysis: Multi-Way Frequency List
  132. Summary Analysis: Numerical Statistical Analysis
  133. Summary Analysis: Numerical Statistical Analysis using describe
  134. Summary Analysis: Shapiro-Wilk Normality Test
  135. Summary Analysis: Summary Statistics by Group
  136. Summary Analysis: Summary Statistics for All Variables
  137. Summary Analysis: Summary Statistics for Selected Variables
  138. Summary Analysis: Advanced: Summary Statistics, all variables, control levels
  139. Summary Analysis: Advanced: Summary Statistics, selected variables, control levels
  140. Summary Analysis: Variable Summaries Table: Optional Tests
  141. Summary Analysis: Variable Summaries Table: Optional Tests, Advanced
  142. Survival Analysis: Kaplan-Meier Estimation, compare groups
  143. Survival Analysis: Kaplan-Meier Estimation, one group
  144. Time Series: Automated ARIMA
  145. Time Series: Exponential Smoothing
  146. Time Series: Holt-Winters Seasonal
  147. Time Series: Holt-Winters Non-seasonal
  148. Time Series: Plot Time Series, separate or combined
  149. Time Series: Plot Time Series, with correlations
  150. Variance: Bartlett’s Test
  151. Variance: Levene’s Test
  152. Variance: Variance Test, Two Samples
  153. Model Fitting: Contrast Display
  154. Model Fitting: Contrast Set
  155. Model Fitting: Cox Proportional Hazards Model: Cox Single Model
  156. Model Fitting: Cox Multiple Models
  157. Model Fitting: Cox with Formula
  158. Model Fitting: Cox Stratified Model
  159. Model Fitting: Decision Trees
  160. Model Fitting: Display Contrasts
  161. Model Fitting: Extreme Gradient Boosting
  162. Model Fitting: GLZM
  163. Model Fitting: IRT: Simple Rasch Model
  164. Model Fitting: IRT: Simple Rasch Model, multi-faceted
  165. Model Fitting: IRT: Partial Credit Model
  166. Model Fitting: IRT: Partial Credit Model, multi-faceted
  167. Model Fitting: IRT: Rating Scale Model
  168. Model Fitting: IRT: Rating Scale Model, multi-faceted
  169. Model Fitting: KNN
  170. Model Fitting: Linear Modeling
  171. Model Fitting: Linear Regression: Linear Regression
  172. Model Fitting: Linear Regression: Linear Regression with Formula
  173. Model Fitting: Logistic Regression: Logistic Regression
  174. Model Fitting: Logistic Regression: Logistic Regression with Formula
  175. Model Fitting: Mixed Models, basic
  176. Model Fitting: Multinomial Logit
  177. Model Fitting: Naive Bayes
  178. Model Fitting: Neural Nets: Multi-layer Perceptron
  179. Model Fitting: NeuralNets (i.e., the neuralnet package)
  180. Model Fitting: Ordinal Regression
  181. Model Fitting: Quantile Regression
  182. Model Fitting: Random Forest: Random Forest
  183. Model Fitting: Random Forest: Tune Random Forest
  184. Model Fitting: Random Forest: Random Forest: Optimal Number of Trees
  185. Model Fitting: Summarizing Models for Each Group
  186. Model Tuning: Adaboost Classification Trees
  187. Model Tuning: Bagged Logic Regression
  188. Model Tuning: Bayesian Ridge Regression
  189. Model Tuning: Boosted trees: gbm
  190. Model Tuning: Boosted trees: xgbtree
  191. Model Tuning: Boosted trees: C5.0
  192. Model Tuning: Bootstrap Resample
  193. Model Tuning: Decision trees: C5.0tree
  194. Model Tuning: Decision trees: ctree
  195. Model Tuning: Decision trees: rpart (CART)
  196. Model Tuning: K-fold Cross-Validation
  197. Model Tuning: K Nearest Neighbors
  198. Model Tuning: Leave One Out Cross-Validation
  199. Model Tuning: Linear Regression: lm
  200. Model Tuning: Linear Regression: lmStepAIC
  201. Model Tuning: Logistic Regression: glm
  202. Model Tuning: Logistic Regression: glmnet
  203. Model Tuning: Multi-variate Adaptive Regression Splines (MARS via earth package)
  204. Model Tuning: Naive Bayes
  205. Model Tuning: Neural Network: nnet
  206. Model Tuning: Neural Network: neuralnet
  207. Model Tuning: Neural Network: dnn (Deep Neural Net)
  208. Model Tuning: Neural Network: rbf
  209. Model Tuning: Neural Network: mlp
  210. Model Tuning: Random Forest: rf
  211. Model Tuning: Random Forest: cforest (uses ctree algorithm)
  212. Model Tuning: Random Forest: ranger
  213. Model Tuning: Repeated K-fold Cross-Validation
  214. Model Tuning: Robust Linear Regression: rlm
  215. Model Tuning: Support Vector Machines: svmLinear
  216. Model Tuning: Support Vector Machines: svmRadial
  217. Model Tuning: Support Vector Machines: svmPoly
  218. Model Statistics: AIC
  219. Model Statistics: BIC
  220. Model Statistics: Bonferroni Outlier Test
  221. Model Statistics: Confidence Interval
  222. Model Statistics: Hosmer-Lemeshow Test
  223. Model Statistics: IRT: ICC Plots
  224. Model Statistics: IRT: Item Fit
  225. Model Statistics: IRT: Plot PI Map
  226. Model Statistics: IRT: Item and Test Information
  227. Model Statistics: IRT: Likelihood Ratio and Beta plots
  228. Model Statistics: IRT: Personfit
  229. Model Statistics: Pseudo R-Squared
  230. Model Statistics: Stepwise
  231. Model Statistics: Variance Inflation Factors

Generated R Code

One of the aspects that most differentiates the various GUIs for R is the code they generate. If you decide you want to save code, what type of code is best for you? The base R code provided by the R Commander, which can teach you “classic” R? The concise functions that jamovi provides, which mimic the simplicity of its one-step dialogs? The completely transparent (and complex) code provided by RKWard, which might be the best for budding R power users?

BlueSky writes what you might call modern R code. For data management, it uses tidyverse packages; for graphics, ggplot2; and for model tuning, the caret package.

Here’s an example of code BlueSky wrote to do a group-by aggregation:

library(dplyr)  # supplies %>%; added here so the snippet runs on its own
mySummarized <- mydata100 %>%
  dplyr::group_by(workshop, gender) %>%
  dplyr::summarize(mean_pretest = mean(pretest, na.rm = TRUE),
    mean_posttest = mean(posttest, na.rm = TRUE))
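For comparison, here is roughly how the same aggregation looks in base R, the “classic” style mentioned above. This is a sketch, not code any of the GUIs generated: mydata100 isn't reproduced in this review, so a tiny hypothetical stand-in with the same variable names (and made-up values) is built first.

```r
# Hypothetical stand-in for mydata100 (values invented for illustration)
mydata100 <- data.frame(
  workshop = factor(c("R", "R", "SAS", "SAS", "R", "SAS")),
  gender   = factor(c("f", "m", "f", "m", "f", "m")),
  pretest  = c(72, 70, 75, NA, 80, 68),
  posttest = c(80, 78, 82, 77, 86, 74)
)

# Mean pretest and posttest for each workshop-by-gender combination.
# na.action = na.pass keeps rows containing NA so that mean(na.rm = TRUE)
# drops missing values one variable at a time, as the dplyr version does.
mySummarized <- aggregate(cbind(pretest, posttest) ~ workshop + gender,
                          data = mydata100,
                          FUN = mean, na.rm = TRUE, na.action = na.pass)
names(mySummarized)[3:4] <- c("mean_pretest", "mean_posttest")
print(mySummarized)
```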

Here is an example of code BlueSky wrote to convert my repeated-measures style “long” data set to a “wide” one. The long one had three main variables: an ID variable, a factor Time, and a measure Y. The resulting wide data set had ID and four variables named Time1, Time2, Time3, and Time4. The values of Y were spread across the four time variables. Here’s the code:


library(tidyr)  # supplies spread(); added here so the line runs on its own
Bobs_Wide <- spread(Bobs_Long, Time, Y)
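A base R equivalent uses reshape(). The sketch below assumes, as described above, that Time holds the values Time1 through Time4; the small Bobs_Long data frame is invented for illustration. Note also that in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(); spread() still works, but pivot_wider(Bobs_Long, names_from = Time, values_from = Y) is the currently recommended form.

```r
# Hypothetical long data: 3 subjects, each measured at 4 times
Bobs_Long <- data.frame(
  ID   = rep(1:3, each = 4),
  Time = rep(paste0("Time", 1:4), times = 3),
  Y    = c(5, 6, 7, 8,  4, 5, 6, 7,  3, 4, 5, 6)
)

# One row per ID, one Y column per Time value (named Y.Time1, Y.Time2, ...)
Bobs_Wide <- reshape(Bobs_Long, idvar = "ID", timevar = "Time",
                     direction = "wide")

# Drop the "Y." prefix so the names match spread()'s output: Time1 ... Time4
names(Bobs_Wide) <- sub("^Y\\.", "", names(Bobs_Wide))
rownames(Bobs_Wide) <- NULL
print(Bobs_Wide)
```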


Below is an example of BlueSky’s code for a simple linear regression. BlueSky even provided the comments explaining each step, a nice touch! Note that it uses its own functions, such as BSkyRegression(), instead of R’s built-in lm() function. This one function performs both the modeling step and the output formatting step. The approach is very similar to jamovi’s, except that BlueSky creates its plots with R’s standard plot() function (one of the few times it uses it) rather than building them into the single regression function call.


#Builds a linear regression model. Returns an object called 
#BSkyLinearRegression which is an object of class lm. 
# Displays a summary of the model, coefficient table, 
# Anova table and sum of squares table.
LinearRegModel1= BSkyRegression(depVars ='posttest',
  indepVars =c('pretest'),dataset="Dataset2")

#Plots residuals vs. fitted, normal Q-Q, theoretical quantiles, 
#residuals vs. leverage
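For readers who want to see the classic R counterpart, here is a minimal base R sketch of the same workflow: fit with lm(), print the summary and ANOVA tables, then draw the standard diagnostic plots with plot(). It is not BlueSky’s code, and since Dataset2 isn’t available here, the pretest and posttest scores are simulated (hypothetical) values.

```r
set.seed(42)  # make the simulated data reproducible
# Hypothetical stand-in for Dataset2: posttest depends linearly on pretest
Dataset2 <- data.frame(pretest = round(rnorm(100, mean = 75, sd = 5)))
Dataset2$posttest <- round(10 + 0.9 * Dataset2$pretest + rnorm(100, sd = 4))

# One call estimates the model; separate calls produce each table
LinearRegModel1 <- lm(posttest ~ pretest, data = Dataset2)
print(summary(LinearRegModel1))  # coefficient table, R-squared, F test
print(anova(LinearRegModel1))    # ANOVA (sum of squares) table

# plot() on an lm object draws residuals vs. fitted, normal Q-Q,
# scale-location, and residuals vs. leverage; a 2 x 2 layout shows all four
op <- par(mfrow = c(2, 2))
plot(LinearRegModel1)
par(op)
```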

Support for Programmers

Some of the GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.

While BlueSky’s main mission is to make its point-and-click GUI comprehensive, it does include a basic program editor that supports the writing and debugging of code. The code editor is hidden at start-up, but an arrow at the upper right corner of the output window will pop open the code editor at any time (and pop it closed, if already open). A click on the Syntax button in any dialog box will also pop the code editor open.

The code editor supports syntax highlighting, and it can collapse and expand blocks of code. It also offers some hints on function name completion. For example, typing “m” will cause it to offer “min” and “max” functions, but oddly enough, it will not offer “mean” or “median.” It doesn’t provide hints on argument names or values, nor does it offer to complete object names. RStudio and RKWard both offer much more support for coders.

However, the lack of features for coders offers a benefit to GUI users: nearly all the menus and their entries are focused on GUI use. In this regard, BlueSky is the mirror image of RKWard, which has several menus full of features that only coders use.

Reproducibility & Sharing

One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written report so that others can check their work.

As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since the GUI user didn’t write the code, they also can’t read it or change it! Others such as jamovi, RKWard, and the newest version of SPSS save the dialog box entries and allow GUI users to have reproducibility in the form they prefer.

BlueSky offers only code-based reproducibility. There’s no way to get back to a filled-in dialog box when starting from the saved code.

If you wish to share your work with a colleague, you would send them the code and your data set. They could then install the appropriate version of BlueSky to run it. They could also install the “BlueSky Statistics R Package”, enabling them to run the code in any R environment. At the moment, that package is only available for download from the company web site. However, the developers plan on moving it to CRAN eventually.

Package Management

A topic related to reproducibility is package management. One of the major advantages of the R language is that it’s very easy to extend its capabilities through add-on packages. However, updates in these packages may break a previously functioning analysis. Years from now you may need to run a variation of an analysis, which would require you to find the version of R you used, plus the packages you used at the time. As a GUI user, you’d also need to find the version of the GUI that was compatible with that version of R.

Some GUIs, such as the R Commander and Deducer, depend on you to find and install R. For them, the problem of long-term stability is yours to solve. Others, such as jamovi, distribute their own version of R and all R packages, but not their add-on modules. This requires a bigger installation file, but it makes dealing with long-term stability simpler. Of course, this depends on all major versions being around for the long term, but for open-source software, there are usually multiple archives available to store software even if the original project is defunct.

BlueSky’s approach to package management is the most comprehensive of the R GUIs reviewed here. It provides everything you need in a single download. This includes the BlueSky interface, R itself, all R packages, and all BlueSky plug-ins. If you have a problem reproducing a BlueSky analysis in the future, all you need to do is download the version used when you created it.
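Whichever GUI you use, one low-effort habit that helps with long-term reproducibility is recording the R and package versions behind your results. The sketch below uses only base R functions; the package list (stats and utils) is just an example, and the output file name is hypothetical.

```r
# Packages to document: an example list; substitute the ones your analysis used
pkgs <- c("stats", "utils")
versions <- data.frame(
  package = c("R", pkgs),
  version = c(as.character(getRversion()),
              vapply(pkgs, function(p) as.character(packageVersion(p)),
                     character(1), USE.NAMES = FALSE))
)
print(versions)

# sessionInfo() captures a fuller snapshot: platform, locale, and every
# attached or loaded package. It is saved to a temporary folder here;
# in practice, save it next to your code and data.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)
```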

Output & Report Writing

Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into the realm of word processing through Sweave/knitr and Rmarkdown documents. At the moment, none of the GUIs covered in this series of reviews meets all of these requirements. See the separate reviews for how each of the other packages is doing on this topic.

Each BlueSky analysis is labeled with its menu title, e.g. Linear Regression. However, double-clicking the title in the output switches it into edit mode, where you can change it to anything you like. Unfortunately, there is no way to add comments or notes in the output, but of course, you can do so in the code that it generates in the program editor.

Output is organized in time order only, and you cannot delete any of the steps you take. This often results in a messy output file filled with unneeded results. A table of contents will pop out of the left side of the output window when you choose “Layout> Show Navigation Tree.” While such tables of contents are commonly used in GUIs to let you re-order, rename, or delete bits of output, those tasks are not possible here. In the tree, you can uncheck any output to hide it, but it is not deleted. You are better off keeping a word processing file open to paste in the results you want to keep.

BlueSky’s output quality is very high, with nice fonts of your choosing and true rich text tables (see Figure 5). To have tables display in the popular style of the American Psychological Association (see Table 1), save the setting “Options> Configuration Settings> Others> Show output tables in APA style.” From that point on, all your output tables will use APA format. You can right-click any table and choose “Export to Word (or Excel)”, and the formatting is retained. That really speeds your work, since R output defaults to mono-spaced fonts that require additional steps to get into publication form (e.g. using functions from packages such as xtable or texreg). You can also choose “Copy to Clipboard”, but pasting from there into Word loses the full formatting, although the result is still a true table. All the output is stored in a single file, which can be exported to PDF and from there edited in Microsoft Word.

A nice feature of BlueSky’s output tables is that they are all interactive. So if you have a complex model you’re studying, you can easily sort the output by p-value, parameter size, or any column you choose. That’s a useful and fairly rare capability.

Figure 5. Publication-quality output created by BlueSky.
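The interactive sorting described above has a simple counterpart in code: turn the coefficient table into a data frame and reorder its rows. A minimal base R sketch, using the built-in mtcars data purely as an example model:

```r
# Fit an example model with several predictors (mtcars ships with R)
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# coef(summary(fit)) is a matrix with columns Estimate, Std. Error,
# t value, and Pr(>|t|); a data frame makes the rows easy to reorder
coefs <- as.data.frame(coef(summary(fit)))

# Sort by p-value, smallest first
# (order(-abs(coefs$Estimate)) would sort by parameter size instead)
coefs_by_p <- coefs[order(coefs[["Pr(>|t|)"]]), ]
print(coefs_by_p)
```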

Group-By Analyses

Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide the ability to select a subset of one group to analyze, then another subset to compare it to. All the R GUIs reviewed in this series can do this task. BlueSky does single-group selections in “Data> Subset”. It generates a subset that you can analyze in the same way as the entire dataset.

Software also needs the ability to automate such selections so that you might generate dozens of analyses, one group at a time. While this has been available in commercial GUIs for decades (e.g. SPSS split-file), BlueSky is the only R GUI that includes this feature. BlueSky automates group-by analyses under “Split> For Analysis> Split”. All analyses that follow will be done repeatedly for each level of the factor(s) chosen. This feature is turned off via “Split> For Analysis> Remove Split.”
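In plain R, subset() covers the single-group case, and the closest analog to a split-file analysis is to split the data by a factor and apply the same analysis to each piece. A minimal base R sketch using the built-in mtcars data, with cyl (number of cylinders) standing in for the grouping factor; the particular statistics computed are only examples.

```r
# Single-group selection, analogous to Data> Subset
four_cyl <- subset(mtcars, cyl == 4)
print(summary(four_cyl$mpg))

# Automated group-by analysis: the same statistics for every level of cyl
by_group <- split(mtcars, mtcars$cyl)
results <- lapply(by_group, function(d) {
  c(n = nrow(d), mean_mpg = mean(d$mpg), r_mpg_wt = cor(d$mpg, d$wt))
})

# Stack the per-group results into one table, one row per group
results_table <- do.call(rbind, results)
print(results_table)
```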

Output Management

Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g. predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g. SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, preserving that flexibility while keeping the simplicity of a GUI is a challenge.

Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but now the predicted values (for example) would be created by individual models for each group, rather than by one model based on the entire original data set (perhaps with group included as a set of indicator variables).

Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)
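To make the three levels concrete, here is a minimal base R sketch that fits one regression per group and collects an observation-level, a model-level, and a parameter-level data set. It uses the built-in mtcars data with cyl as the grouping factor; it illustrates the idea only and is not how BlueSky implements it.

```r
# One model per group: mpg predicted from weight, separately for each cyl
groups <- split(mtcars, mtcars$cyl)
models <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# Observation level: one row per car, predicted by its own group's model
obs_level <- do.call(rbind, lapply(names(groups), function(g) {
  data.frame(car = rownames(groups[[g]]), cyl = g,
             predicted = unname(fitted(models[[g]])))
}))

# Model level: one row per group's model
model_level <- data.frame(
  cyl       = names(models),
  n         = vapply(models, nobs, numeric(1)),
  r_squared = vapply(models, function(m) summary(m)$r.squared, numeric(1))
)
rownames(model_level) <- NULL

# Parameter level: one row per coefficient in each group's model
param_level <- do.call(rbind, lapply(names(models), function(g) {
  ct <- coef(summary(models[[g]]))
  data.frame(cyl = g, term = rownames(ct), estimate = ct[, "Estimate"],
             p_value = ct[, "Pr(>|t|)"])
}))
rownames(param_level) <- NULL
print(model_level)
print(param_level)
```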

For example, in our organization, we have 250 departments and want to see if any of them have a gender bias on salary. We write all 250 regression models to a data set and then search to find those whose gender parameter is significant (hoping to find none, of course!)
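A minimal base R sketch of that search follows. The departments and salaries are simulated, with no real gender effect built in, and all variable names are hypothetical. With 250 tests at the 0.05 level, you would expect about a dozen departments to look “significant” by chance alone, so the sketch also adjusts the p-values with p.adjust().

```r
set.seed(1)  # reproducible simulated data
# Hypothetical data: 250 departments, 40 employees each (20 F, 20 M),
# salaries drawn from the same distribution regardless of gender
n_dept <- 250
staff <- data.frame(
  dept   = rep(sprintf("D%03d", 1:n_dept), each = 40),
  gender = factor(rep(c("F", "M"), times = 20 * n_dept)),
  salary = rnorm(40 * n_dept, mean = 60000, sd = 8000)
)

# Fit salary ~ gender in each department and keep the gender p-value
gender_p <- vapply(split(staff, staff$dept), function(d) {
  ct <- coef(summary(lm(salary ~ gender, data = d)))
  ct["genderM", "Pr(>|t|)"]
}, numeric(1))

# Parameter-level data set: one row per department
results <- data.frame(dept = names(gender_p), p_value = unname(gender_p))

# Holm's method controls the family-wise error rate across all 250 tests
results$p_holm <- p.adjust(results$p_value, method = "holm")
flagged <- results[results$p_holm < 0.05, ]
cat("Unadjusted p < .05:", sum(results$p_value < 0.05), "\n")
cat("Holm-adjusted p < .05:", nrow(flagged), "\n")
```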

BlueSky is the only R GUI reviewed here that does all three levels of output management. To use this function, choose “Model Fitting> Summarizing models for each group”, then specify the model and the grouping factor. It automatically creates three data sets, one at each level of analysis. This ability works only with regression, ANOVA, and multinomial logistic models. More are planned for future versions.

While BlueSky is ahead of the GUI pack in output management, the approach listed above still makes judgment calls about what output is useful for further analysis. What would you do to analyze an output table not covered by the above methods? Recall that all BlueSky output tables are true tables that can be exported to Word or Excel. Using that approach, you could save any table you like, export it and then open it as a data set to analyze. It’s not the most elegant approach, but it is quite comprehensive. 

Developer Issues

There are two ways developers can contribute to the open-source project:

  1. Developers who want to add to or modify the application (e.g. provide new right-click controls, or integration with big data platforms such as Hadoop and Spark) can download the source code from https://github.com/BlueSkyStatistics/BlueSkyRepository.
  2. Programmers who want to add new statistical analyses to BlueSky Statistics should watch the training videos on the dialog editor program.


BlueSky Statistics offers an extensive set of tools that are easy for a point-and-click user to use. If you’re looking for a GUI that lets you do the most using just menus and dialog boxes, BlueSky should be on your list of software to try. BlueSky and the R Commander are both way out in front of the R GUI competition when it comes to breadth of coverage in data management, graph types, and methods of analysis. I encourage you to read both reviews carefully when choosing between these two. Also keep in mind that while jamovi is newer and currently has fewer features, its developers are adding new ones at a rapid pace.

For a summary of all my R GUI software reviews, see the article, R Graphical User Interface Comparison.


Thanks to the BlueSky team who have done a lot of hard work and made all but the terminal server version of it free and open source. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.