A Comparative Review of the BlueSky Statistics GUI for R

by Robert A. Muenchen

Introduction

BlueSky Statistics’ desktop version is a free and open source graphical user interface for the R software that focuses on beginners looking to point-and-click their way through analyses. A commercial version is also available; it includes technical support and a version for Windows terminal servers such as Remote Desktop or Citrix. Mac, Linux, or tablet users can run BlueSky through such a terminal server.

This post is one of a series of reviews which aim to help non-programmers choose the Graphical User Interface (GUI) that is best for them. Additionally, these reviews include a cursory description of the programming support that each GUI offers.

 

Terminology

There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.

 

Installation

The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as jamovi or RKWard, install in a single step. Others, such as Deducer, install in multiple steps (up to seven steps, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The HelpDesks at most universities are flooded with such calls at the beginning of each semester!

The main BlueSky installation is easily performed in a single step. The installer provides its own embedded copy of R, simplifying the installation and ensuring complete compatibility between BlueSky and the version of R it’s using. However, it also means if you already have R installed, you’ll end up with a second copy. You can have BlueSky control any version of R you choose, but if the version differs too much, you may run into occasional problems.

 

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is, because its members contribute “plug-ins” which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Deducer) through moderate (jamovi) to very active (R Commander).

BlueSky is a fairly new open source project, and at the moment all the add-on modules are provided by the company. However, BlueSky’s capabilities approach the comprehensiveness of R Commander, which currently has the most add-ons available. The BlueSky developers are working to create an Internet repository for module distribution.

 

Startup

Some user interfaces for R, such as jamovi, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.

You start BlueSky directly by double-clicking its icon from your desktop, or choosing it from your Start Menu (i.e. not from within R itself). It interacts with R in the background; you never need to be aware that R is running.

 

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.

BlueSky starts up by showing you its main Application screen (Figure 1) and prompts you to enter data with an empty spreadsheet-style data editor. You can start entering data immediately, though at first, the variables are simply named var1, var2…. You might think you can rename them by clicking on their names, but such changes are done in a different manner, one that will be very familiar to SPSS users. There are two tabs at the bottom left of the data editor screen, which are labeled “Data” and “Variables.” The “Data” tab is shown by default, but clicking on the “Variables” tab takes you to a screen (Figure 2) which displays the metadata: variable names, labels, types, classes, values, and measurement scale.

Figure 1. The main BlueSky Application screen.

The big advantage that SPSS offers is that you can change the settings of many variables at once. So if you had, say, 20 variables for which you needed to set the same factor labels (e.g. 1=strongly disagree…5=Strongly Agree) you could do it once and then paste them into the other 19 with just a click or two. Unfortunately, that’s not yet fully implemented in BlueSky. Some of the metadata fields can be edited directly. For the rest, you must instead follow the directions at the top of that screen and right click on each variable, one at a time, to make the changes. Complete copy and paste of metadata is planned for a future version.
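
In R code, giving many variables the same factor labels takes only a few lines. The sketch below is a minimal base R illustration, not code generated by BlueSky. The data frame, the variable names q1 to q3, and the three middle labels are assumptions made up for the example (the text above names only the two endpoints).

```r
# Made-up survey with three Likert items coded 1-5.
survey <- data.frame(q1 = c(1, 5, 3), q2 = c(2, 4, 4), q3 = c(5, 1, 2))

# Only the two endpoint labels come from the text; the middle three are
# placeholders chosen for this sketch.
likert_labels <- c("Strongly Disagree", "Disagree", "Neutral",
                   "Agree", "Strongly Agree")

# Apply the same levels and labels to every item in one step.
items <- c("q1", "q2", "q3")
survey[items] <- lapply(survey[items], factor,
                        levels = 1:5, labels = likert_labels)

str(survey)
```

Because the item names live in a single character vector, extending this to 20 variables means changing only the `items` line.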

Figure 2. The Variables screen in the data editor. The “Variables” tab in the lower left is selected, letting us see the metadata for the same variables as shown in Figure 1.

You can enter numeric or character data in the editor right after starting BlueSky. The first time you enter character data, it will offer to convert the variable from numeric to character and wait for you to approve the change. This is very helpful as it’s all too easy to type the letter “O” when meaning to type a zero “0”, or the letter “I” instead of the number one “1”.
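
The reason one stray letter matters is that R stores each variable as a single type. The following base R sketch, with made-up values, shows what happens when a mistyped entry is forced back to numeric and how such entries could be located. It illustrates the underlying R behavior only and makes no claim about how BlueSky implements its prompt.

```r
# Three entries that were meant to be numbers; the second contains the
# letter "O" instead of a zero.
entered <- c("10", "2O", "30")

# Converting to numeric turns the mistyped value into NA (R also warns).
as_numbers <- suppressWarnings(as.numeric(entered))
print(as_numbers)

# Locate the entries that failed to convert so they can be fixed by hand.
bad_rows <- which(is.na(as_numbers) & !is.na(entered))
print(entered[bad_rows])
```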

To add rows, click the line on the Data tab that is clearly labeled “Click here to add a new row”. It would be much faster if the Enter key did that automatically.

To add variables, you have to go to the Variables tab and right-click on the row of any variable (variable names are in rows on that screen), then choose “Insert new variable at end.”

To enter factor data, it’s best to leave it numeric, such as 1 or 2 for male and female, then set the labels (which are called values using SPSS terminology) afterwards. The reason for this is that once labels are set, you must enter them from drop-down menus. While that ensures no invalid values are entered, it slows down data entry. The developers’ future plans include automatic display of labels upon entry of numeric values.

If you instead decide to make the variable a factor before entering numeric data, it’s best to enter the numbers as labels as well. It’s an oddity of R that factors are numeric inside, while displaying labels that may or may not be the same as the numbers they represent.
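
A short base R sketch may make this oddity concrete. The values and variable names are made up for illustration.

```r
# Hypothetical gender codes entered as numbers.
gender_codes <- c(1, 2, 2, 1)

# Attach labels: the factor displays "male" and "female" ...
gender <- factor(gender_codes, levels = c(1, 2), labels = c("male", "female"))
print(gender)

# ... but stores integer codes underneath.
print(as.integer(gender))

# Caution: when the labels are themselves numbers, as.integer() returns the
# internal codes, not the labels. Convert via character to recover them.
dose <- factor(c(10, 20, 20, 40))
print(as.integer(dose))                # internal codes
print(as.numeric(as.character(dose)))  # original values
```

The second half shows why it helps to keep the numbers and the labels identical when a factor is created before data entry: the displayed label and the stored code then never disagree.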

To enter dates, you must enter them as character data and then use the “Data> Compute” menu to convert the character data to a date. When I reported this limitation to the developers, they said they would add this to the “Variables” metadata tab so you could set it to be a date variable before entering the data.
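
For readers curious about what such a conversion looks like in R itself, here is a minimal base R sketch. The dates and the year-month-day format are assumptions for the example, and the code BlueSky generates for this step may differ.

```r
# Dates typed in as text (hypothetical values in year-month-day order).
visit_text <- c("2018-03-01", "2018-03-15")

# Convert the character strings to R's Date class.
visit_date <- as.Date(visit_text, format = "%Y-%m-%d")
print(class(visit_date))

# Once converted, date arithmetic works: days between the two visits.
days_between <- as.numeric(diff(visit_date))
print(days_between)
```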

If you have another data set to enter, you can start the process again by clicking “File> New”, and a new editor window will appear in a new tab. You can switch between data sets simply by clicking on a tab, and that data set’s window will pop to the front for you to see. When doing analyses, or saving data, the data set that’s displayed in the editor is the one that will be used. That approach feels very natural; what you see is what you get.

Saving the data is done with the standard “File > Save As” menu. You must save each one to its own file. While R allows multiple data sets (and other objects such as models) to be saved to a single file, BlueSky does not. Its developers chose to simplify what their users have to learn by limiting each file to a single data set. That is a useful simplification for GUI users. If a more advanced R user sends a compound file containing many objects, BlueSky will detect it and offer to open one data set (data frame) at a time.
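
To see what a compound file is, the base R sketch below saves two data frames and a model into one .RData file and then lists only the data frames inside it. All object names and values are made up, and this is only an illustration of the idea, not BlueSky’s own detection code.

```r
# Build a "compound" file holding two data frames and a model (made-up data).
scores <- data.frame(id = 1:3, posttest = c(80, 85, 90))
people <- data.frame(id = 1:3, gender = c("f", "m", "f"))
fit    <- lm(posttest ~ id, data = scores)
rdata_file <- tempfile(fileext = ".RData")
save(scores, people, fit, file = rdata_file)

# Load into a separate environment so nothing in the workspace is overwritten.
loaded <- new.env()
load(rdata_file, envir = loaded)

# Keep only the names of objects that are data frames.
is_df <- vapply(ls(loaded),
                function(nm) is.data.frame(get(nm, envir = loaded)),
                logical(1))
data_frame_names <- names(is_df)[is_df]
print(data_frame_names)

# By contrast, saveRDS() and readRDS() store exactly one object per file.
rds_file <- tempfile(fileext = ".rds")
saveRDS(scores, rds_file)
print(isTRUE(all.equal(readRDS(rds_file), scores)))
```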

Figure 3. Output window showing standard journal-style tables. The syntax editor has been opened and is shown on the right side.

 

Data Import

The open source version of BlueSky supports the following file formats, all located under “File> Open”:

  • Comma Separated Values (.csv)
  • Plain text files (.txt)
  • Excel (old and new xls file types)
  • Dbase’s DBF
  • SPSS (.sav)
  • SAS binary files (sas7bdat)
  • Standard R workspace files (RData) with individual data frame selection

The SQL database formats are found under the “File> Import Data” menu. The supported formats include:

  • Microsoft Access
  • Microsoft SQL Server
  • MySQL
  • PostgreSQL
  • SQLite

Data Management

It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g. from wide to long and back). A critically important aspect of data management is the ability to transform many variables at once. For example, social scientists need to recode many survey items, and biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time can be tedious. Some GUIs, such as jamovi and RKWard, handle only a few of these functions. Others, such as the R Commander, can handle many, but not all, of them.
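
To make “many variables at once” concrete, here is a minimal base R sketch of the two examples just mentioned. The data and variable names are made up, and base R is used only to keep the example free of add-on packages.

```r
# Made-up data: three survey items on a 1-5 scale plus two skewed measures.
study <- data.frame(
  q1 = c(1, 5, 3), q2 = c(2, 4, 4), q3 = c(5, 1, 2),
  income = c(30000, 45000, 120000), assets = c(1000, 50000, 900000)
)

# Recode several items in one step: reverse-score them (1 <-> 5, 2 <-> 4).
items <- c("q1", "q2", "q3")
study[items] <- lapply(study[items], function(x) 6 - x)

# Take logarithms of several variables in one step, keeping the originals.
skewed <- c("income", "assets")
study[paste0("log_", skewed)] <- lapply(study[skewed], log)

print(study)
```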

BlueSky offers one of the most comprehensive sets of data management tools of any R GUI. The “Data” menu offers the following set of tools. Not shown is an extensive set of character and date/time functions which appear under “Compute.”

  1. Missing Values
  2. Compute
  3. Bin Numeric Variables
  4. Recode (able to recode many at once)
  5. Make Factor Variable (able to convert many at once)
  6. Transpose
  7. Transform (able to transform many at once)
  8. Sample Dataset
  9. Delete Variables
  10. Standardize Variables (able to standardize many at once)
  11. Aggregate (outputs results to a new dataset)
  12. Aggregate (outputs results to a printed table)
  13. Subset (outputs to a new dataset)
  14. Subset (outputs results to a printed table)
  15. Merge Datasets
  16. Sort (outputs results to a new dataset)
  17. Sort (outputs results to a printed table)
  18. Reload Dataset from File
  19. Refresh Grid
  20. Concatenate Multiple Variables (handling missing values)
  21. Legacy (does same things but using base R code)
  22. Reshape (long to wide)
  23. Reshape (wide to long)

 

Menus & Dialog Boxes

The goal of pointing & clicking your way through an analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to them and hence require more learning.

BlueSky uses standard menu choices for running steps listed on the Graphics, Analysis, Model Fitting, or Model Tuning menus. Dialog boxes appear and you select variables to place into their various roles. This is accomplished by either dragging the variable names, or by selecting them and clicking an arrow located next to the particular role box. You then can click on either “OK” to run the step, or “Syntax” to write the code for that step to the R program editor. To run a variation on the same analysis, the dialog boxes make quick work of it by remembering their previous settings (within a session).

Output is saved not by using the standard “File > Save As” menu, but instead with the “Output > Save Output” selection from the main window. Oddly enough, while most menus are duplicated in both the main screen and the Output/Syntax screen, the ability to open or save output only appears on the main screen. If you exit without saving, BlueSky will prompt you to save both output and syntax (if you’ve used any of the latter).

During GUI-driven analysis, the only indication you have that R is doing the work is the code that appears in the output window before each result. However, if you click the “Syntax” button instead of “OK”, the program editor will pop out of the right side of the output window. The code will be added to the bottom of the program editor, and it will be highlighted so that a click on the “Run” icon will execute it.

 

Documentation & Training

At the moment, this review is probably one of the most thorough written descriptions of how to use BlueSky.

The BlueSkyStatistics.com site offers training videos on how to use it. YouTube.com also offers training videos that show how to use BlueSky.

 

Help

R GUIs provide simple task-by-task dialog boxes which generate much more complex code. So for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exception is the R Commander, which lacks help on the dialog boxes themselves (see that review for details, coming soon).

The level of help that BlueSky provides varies depending on how much help the developers think you need. Each dialog box has a help button in the upper right corner which pops a help window off to the right of the dialog box. For many dialog boxes, it provides a summary description, how to use the dialog box, all the GUI settings, and how the accompanying function works should you choose to write your own code. In the bottom right corner of each dialog box is a “Get R Help” button that takes you to the R help page for the standard R function that actually does the calculations inside BlueSky’s functions.

For some dialog boxes that simply call an R function (e.g. independent samples t-test), BlueSky will display R’s built-in help file. While this variable help approach has been done well, I would prefer a more consistent approach. There are often things in the R help files that are not implemented in BlueSky, so it would be less confusing to eliminate those situations. For example, in the case of the t-test, the help file describes how “formula” works, but that concept is not addressable using BlueSky’s dialog box (nor is it needed).
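
For GUI users wondering what the “formula” in that help file refers to, the base R sketch below runs the same independent samples t-test two ways, once with a formula and once with two separate vectors. The data are made up for illustration.

```r
# Made-up scores for two workshops.
scores <- data.frame(
  workshop = rep(c("R", "SAS"), each = 5),
  posttest = c(82, 85, 88, 90, 79, 75, 80, 78, 83, 77)
)

# Formula interface: outcome ~ grouping variable.
by_formula <- t.test(posttest ~ workshop, data = scores)

# Equivalent call that passes the two groups as separate vectors.
by_vectors <- t.test(scores$posttest[scores$workshop == "R"],
                     scores$posttest[scores$workshop == "SAS"])

print(by_formula$p.value)
print(all.equal(by_formula$statistic, by_vectors$statistic))
```

A dialog box that asks for an outcome variable and a grouping variable collects the same two pieces of information the formula expresses, which is why the concept never has to surface in the GUI.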

 

Graphics

The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as jamovi, use their own functions and integrate them into analysis steps. GUIs also differ quite a lot in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. That’s how jamovi works, but then jamovi is limited to its custom graph functions, as nice as they may be.

BlueSky does most of its plots using the popular ggplot2 package, so that’s the code it will create if you want to learn it. BlueSky’s dialogs for creating graphs are extremely easy to use. By comparison, learning ggplot2 code can be confusing at first. BlueSky also offers several of R’s traditional graphics functions, which it places under a “Legacy” menu. While these graphs are usually not as nice as the ones created by the rest of its menus (i.e. those created by ggplot2), having both gives you the opportunity to compare both their appearance and the code used to create them.

Here is the selection of plots BlueSky can create.

  1. Maps (US County, US State, World)
  2. Histogram
  3. BoxPlot
  4. Stem and Leaf Plot
  5. Plot of Means (commercial version only)
  6. 3D Scatterplot
  7. Heatmap
  8. Bar Graph
  9. Density Plot
  10. Density Plot (counts)
  11. Strip Chart
  12. Scatterplot
  13. Others – Boxplot lists outliers, Strip Plot
  14. Line Chart
  15. Frequency Charts (numeric)
  16. Frequency Charts (factor)
  17. Legacy – Histogram, Plot of Means, Scatterplot, Strip Plot, Generic Plot

Let’s take a look at how BlueSky does scatterplots, using R’s ggplot2 package behind the scenes. Using the dialog box, I chose only the X variable, Y variable, X facet factor, Y facet factor, and the type of smoothing fit. Note that the initial “for” loop runs over the names stored in varNames, which lets BlueSky repeat the plot for each Y variable chosen (only one here).

local(
{
  varNames = c('posttest')
  for (vars in varNames)
  {
    print(ggplot(Dataset2, aes(x = pretest,
                               y = eval(parse(text = paste(vars))))) +
      geom_point() + labs(x = "pretest", y = vars) +
      facet_grid(workshop ~ gender) + geom_smooth(method = "lm"))
  }
}
)

Figure 4. A faceted scatterplot created by BlueSky and the ggplot2 package.
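
The eval(parse(text=...)) construct in that code can look mysterious. The base R sketch below shows roughly what it does: it turns a variable name held in a string into an expression and evaluates it against the data. The tiny data frame is a made-up stand-in for Dataset2, and evaluating directly against a data frame is a simplification of what happens inside aes().

```r
# Made-up stand-in for the data set used above.
Dataset2 <- data.frame(pretest = c(70, 75, 80), posttest = c(78, 82, 91))

vars <- "posttest"

# What the generated code does, in simplified form: build an expression
# from the string, then evaluate it inside the data to get the column.
via_parse <- eval(parse(text = vars), envir = Dataset2)

# A more direct way to fetch a column whose name is stored in a string.
via_brackets <- Dataset2[[vars]]

print(identical(via_parse, via_brackets))
```

In hand-written ggplot2 code, the `.data[[vars]]` pronoun inside aes() is the usual way to reach the same result.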

 

Modeling

The way statistical models (which R stores in “model objects”) are created and used is one of the areas in which R GUIs differ the most. The simplest, and least flexible, approach is taken by jamovi and RKWard. They try to do everything you might need in a single dialog box. They either don’t save models, or they do nothing with them. To an R programmer, that sounds extreme, since R does a lot with model objects. However, neither SAS nor SPSS was able to save models for its first 35 years of existence, so each approach has its merits.

BlueSky’s modeling approach balances flexibility and ease of use. All its “Model Fitting” dialogs save the resulting model as a model object. They contain a “Model Name” field which is filled in with a useful default name such as “LinearRegModel1”. The analyses listed under “Model Statistics” automatically use the model you set in the upper right corner of the main control screen. You use the “Pick a Model” drop-down menu to choose your model. From then on, all the Model Statistics menu choices will use that model to calculate model measures such as AIC, or perform additional analyses, such as stepwise variable selection. A nice future improvement would be to have the software automatically choose your last model.

The steps BlueSky currently offers to further manipulate models are: Stepwise, AIC, BIC, Confidence Intervals, Variance Inflation Factors, and the Bonferroni Outlier Test.
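
The idea of fitting a model once and reusing the stored object can be sketched in base R as follows. This is an illustration using R’s built-in mtcars data and standard functions; the choice of predictors is arbitrary, and which functions BlueSky calls internally for these menu items is not something this sketch claims to know.

```r
# Fit a model once and keep the resulting object (built-in mtcars data).
LinearRegModel1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Later steps reuse the stored object instead of refitting.
print(AIC(LinearRegModel1))
print(BIC(LinearRegModel1))
print(confint(LinearRegModel1, level = 0.95))

# Stepwise selection by AIC, starting from the stored model.
reduced <- step(LinearRegModel1, trace = 0)
print(formula(reduced))
```

Variance inflation factors and the Bonferroni outlier test are not part of base R; the car package offers vif() and outlierTest() for those.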

 

Analysis Methods

All of the R GUIs offer a decent set of statistical analysis methods. Some offer machine learning methods as well. As you can see in the table below, BlueSky offers an extensive set of analysis methods. It also offers interesting variations on machine learning. Under its “Model Fitting” menu, it provides direct access to the most popular machine learning algorithms. If you are a beginner at machine learning, that’s where you would start. The menus call the various R functions directly, and if you display the commands, you’ll notice that each uses a slightly different syntax.

If you’re an advanced user of machine learning, you might skip directly to the “Model Tuning” menu. There you’ll find many of the same algorithms, this time controlled in a powerful and standard way using R’s caret package. You begin by choosing one of four tuning methods and one of the nine machine learning algorithms. BlueSky then passes the work off to the caret package to find your optimal model.
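
To give a feel for what one of those tuning methods, k-fold cross validation, does, here is a hand-written base R sketch. It deliberately avoids caret so that it runs without add-on packages; the model, the built-in mtcars data, and the choice of five folds are assumptions for illustration only.

```r
set.seed(42)                      # make the random fold assignment repeatable
k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of near-equal size.
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, then predict the held-out rows.
rmse_by_fold <- vapply(seq_len(k), function(i) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
}, numeric(1))

# The cross-validated error estimate is the average over the folds.
cv_rmse <- mean(rmse_by_fold)
print(cv_rmse)
```

With caret, the same idea is expressed by passing trainControl(method = "cv", number = 5) to train(), which also takes care of comparing candidate settings of the algorithm.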

 

  • Descriptive Statistics: Numerical summary analysis; Factor variable analysis; Frequencies; Summary by variable; Summary (group by multiple variables); Numerical statistical analysis
  • Statistical analysis: Correlation test; Shapiro-Wilk normality test; Q-Q plot
  • Compare means: One sample T-Test; Multi variable one sample T-Test; Independent sample T-Test; Paired T-Test; One way ANOVA; Multi-way ANOVA; One way ANOVA with blocks; One way ANOVA with random blocks
  • Proportion test: Single sample proportion test; Single sample exact binomial proportion test; Two sample proportion test
  • Variance: Two variance F-Test; Bartlett’s test; Levene’s test
  • Non-parametric test: Chisq test; Two sample Wilcoxon test; Paired Wilcoxon test; Friedman test; Kruskal Wallis test
  • Contingency tables: Multiway crosstab; Two-way crosstab
  • Factor analysis: Principal component analysis; Factor analysis
  • Split datasets for analysis: Split; Remove split
  • Split data sets for modeling: Simple split; Stratified sampling
  • Linear modeling: Linear regression; Generalized linear models; Linear modeling; Multinomial Logit; Ordinal regression
  • Tree algorithms: Decision trees; Random forest
  • Probabilistic classifiers: Naive Bayes
  • Clustering: KNN; KMeans cluster; Hierarchical cluster
  • Forecasting: Plot time series; Automated Arima; Exponential smoothing; Holt Winters seasonal; Non seasonal Holt Winters
  • Reliability Analysis
  • Model tuning: Bootstrap resampling; K-fold cross validation; Repeated k-fold cross validation; Leave one out cross validation
  • Model statistics: Stepwise; AIC; BIC; Confidence interval; Variance inflation factors; Bonferroni outlier test
  • Model scoring: Pick a model; Score the selected model; Save model; Load model
  • Market basket: Generate rules; Item frequency; Targeting items; Display rules; Plot rules

 

Generated R Code

One of the aspects that most differentiates the various GUIs for R is the code they generate. If you decide you want to save code, what type of code is best for you? The base R code as provided by the R Commander which can teach you “classic” R? The concise functions that mimic the simplicity of one-step dialogs such as jamovi provides? The completely transparent (and complex) code provided by RKWard, which might be the best for budding R power users?

BlueSky writes what you might call modern R code. For data management, it uses tidyverse packages; for graphics, it uses ggplot2; and for model tuning, it uses the caret package.

Here’s an example of code BlueSky wrote to do a group-by aggregation:

mySummarized <-mydata100 %>%
  dplyr::group_by(workshop,gender) %>%
  dplyr::summarize(mean_pretest=mean(pretest,na.rm =TRUE),
    mean_posttest=mean(posttest,na.rm =TRUE))
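
For comparison with the “classic” R style mentioned earlier, the same kind of group-by aggregation can be sketched with base R’s aggregate() function. The small data frame is a made-up stand-in for mydata100, and this is an alternative for illustration, not code that BlueSky produces.

```r
# Made-up stand-in for mydata100.
mydata100 <- data.frame(
  workshop = c("R", "R", "SAS", "SAS", "R", "SAS"),
  gender   = c("f", "m", "f", "m", "f", "m"),
  pretest  = c(70, 75, 72, NA, 80, 68),
  posttest = c(80, 82, 79, 85, 90, 75)
)

# Mean pretest and posttest for every workshop-by-gender combination.
# na.action = na.pass keeps rows containing NAs so that na.rm = TRUE can
# drop the missing values separately for each variable.
mySummarized <- aggregate(cbind(pretest, posttest) ~ workshop + gender,
                          data = mydata100, FUN = mean, na.rm = TRUE,
                          na.action = na.pass)
print(mySummarized)
```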

Here is an example of code BlueSky wrote to convert my repeated-measures style “long” data set to a “wide” one. The long one had three main variables: an ID variable, a factor Time, and a measure Y. The resulting wide data set had ID and four variables named Time1, Time2, Time3, and Time4. The values of Y were spread across the four time variables. Here’s the code:

require(tidyr);

Bobs_Wide <- spread(Bobs_Long,Time,Y)

BSkyLoadRefreshDataframe(Bobs_Wide,load.dataframe=TRUE)
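
To show what that reshaping step does to actual values, here is a minimal sketch using base R’s reshape() function on made-up data with the same three variables. It is an illustration only, and it omits the BlueSky-specific BSkyLoadRefreshDataframe() call. Note also that tidyr now documents spread() as superseded by pivot_wider(), although spread() continues to work.

```r
# Made-up long data: two subjects measured at four times.
Bobs_Long <- data.frame(
  ID   = rep(1:2, each = 4),
  Time = rep(paste0("Time", 1:4), times = 2),
  Y    = c(10, 12, 15, 18, 20, 21, 25, 30)
)

# One row per ID, one column per level of Time, filled with Y.
Bobs_Wide <- reshape(Bobs_Long, idvar = "ID", timevar = "Time",
                     v.names = "Y", direction = "wide")

# reshape() names the new columns "Y.Time1" and so on; strip the "Y."
# prefix so the names match those described above.
names(Bobs_Wide) <- sub("^Y\\.", "", names(Bobs_Wide))
rownames(Bobs_Wide) <- NULL
print(Bobs_Wide)
```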

Below is an example of BlueSky’s code for a simple linear regression. BlueSky even provided the comments explaining each step, a nice touch! Note that it uses its own set of functions, such as BSkyRegression() instead of R’s built-in lm() function. It’s this function that does both the modeling step and the text formatting step. This is very similar to the approach used by jamovi, except that BlueSky does its plotting with a separate call to R’s standard plot function (one of the few times it uses it) rather than integrating the plots into the single regression function call.

BSkyLoadRefreshDataframe(BobsAgg)

#Builds a linear regression model. Returns an object called 
#BSkyLinearRegression which is an object of class lm. 
# Displays a summary of the model, coefficient table, 
# Anova table and sum of squares table.
LinearRegModel1= BSkyRegression(depVars ='posttest',
  indepVars =c('pretest'),dataset="Dataset2")

#Plots residuals vs. fitted, normal Q-Q, theoretical quantiles, 
#residuals vs. leverage
if(TRUE)
{
plot(LinearRegModel1)
}
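
For readers who want to see the base R functions that the comments above refer to, here is a rough sketch of a comparable analysis using lm() directly. The data frame is a made-up stand-in for Dataset2, and the mapping to BSkyRegression() is an assumption based only on the generated comments, so the output will not match BlueSky’s formatted tables.

```r
# Made-up stand-in for Dataset2.
Dataset2 <- data.frame(
  pretest  = c(70, 75, 80, 72, 68, 85, 77, 90),
  posttest = c(78, 82, 91, 79, 75, 93, 84, 97)
)

# Fit the model with base R's lm(); the result is an object of class "lm".
LinearRegModel1 <- lm(posttest ~ pretest, data = Dataset2)

# Plain-text output: model summary with coefficient table, then ANOVA table.
print(summary(LinearRegModel1))
print(anova(LinearRegModel1))

# Diagnostic plots, written to a temporary PDF so the sketch also runs
# in a non-interactive session.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
par(mfrow = c(2, 2))
plot(LinearRegModel1)
invisible(dev.off())
```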

 

Support for Programmers

Some of the GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.

While BlueSky’s main mission is to make its point-and-click GUI comprehensive, it does include a basic program editor which supports the writing and debugging of code. The code editor is hidden at start-up, but an arrow at the upper right corner of the output window will pop open the code editor at any time (and pop it closed, if already open). A click on the Syntax button in any dialog box will also pop the code editor open.

The code editor supports syntax highlighting, and it can collapse and expand blocks of code. It also offers some hints on function name completion. For example, typing “m” will cause it to offer “min” and “max” functions, but oddly enough, it will not offer “mean” or “median.” It doesn’t provide hints on argument names or values, nor does it offer to complete object names. RStudio and RKWard both offer much more support for coders.

However, the lack of features for coders offers a benefit to GUI users: nearly all the menus and their entries are focused on GUI use. In this regard, BlueSky is the mirror image of RKWard, which has several menus full of features that only coders use.

 

Reproducibility & Sharing

One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written report so that others can check their work.

As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since the GUI user didn’t write the code, they also can’t read it or change it! Others such as jamovi, RKWard, and the newest version of SPSS save the dialog box entries and allow GUI users to have reproducibility in the form they prefer.

BlueSky offers only code-based reproducibility. There’s no way to get back to a filled-in dialog box when starting from the saved code.

If you wish to share your work with a colleague, you would send them the code and your data set. They could then install the appropriate version of BlueSky to run it. They could also install the “BlueSky Statistics R Package”, enabling them to run the code in any R environment. At the moment, that package is only available for download from the company web site. However, the developers plan on moving it to CRAN eventually.

 

Output & Report Writing

Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into the realm of word processing through Sweave/knitr and Rmarkdown documents. At the moment, none of the GUIs covered in this series of reviews meets all of these requirements. See the separate reviews to see how each of the other packages is doing on this topic.

The labels for each of BlueSky’s analyses are provided by its menu title, e.g. Linear Regression. However, double-clicking on the title in the output switches it into edit mode where you can change it to anything you like. Unfortunately, there is no way to add comments or notes in the output, but of course you can do so in the code that it generates in the program editor.

The organization of the output is in time-order only, and you cannot delete any of the steps you take. This often results in a messy output file filled with unneeded results. A table of contents will pop out of the left side of the output window when you choose “Layout> Show Navigation Tree.” While such tables of contents are commonly used in GUIs to let you re-order, rename, or delete bits of output, those tasks are not possible here. There you can un-check any output to hide it, but it’s not deleted. You are better off keeping a word processing file open to paste in the results you want to keep.

BlueSky’s output quality is very high, with nice fonts of your choosing and true rich text tables (see Figure 5). To have them display using the popular style of the American Psychological Association (see Table 1), save the setting: “Options> Configuration Settings> Others> Show output tables in APA style.” From that point on, all your output tables will use APA format. That means you can right-click on any table and choose “Export to Word (or Excel)” and the formatting is retained. That really helps speed your work as R output defaults to mono-spaced fonts that require additional steps to get into publication form (e.g. using functions from packages such as xtable or texreg). You can also choose “Copy to Clipboard”, but pasting from there into Word will lose the full formatting, while still remaining a true table. All the output is stored in a single file, which can be exported to PDF and from there edited in Microsoft Word.

A nice feature of BlueSky’s output tables is that they are all interactive. So if you have a complex model you’re studying, you can easily sort the output by p-value, or parameter size, or any column you choose. That’s a nice and fairly unique feature.

Figure 5. Publication-quality output created by BlueSky.

 

Group-By Analyses

Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide the ability to select a subset, one group to analyze, then another subset to compare it to. All the R GUIs reviewed in this series can do this task. BlueSky does single-group selections in “Data> Subset”. It generates a subset that you can analyze in the same way as the entire dataset.

Software also needs the ability to automate such selections so that you might generate dozens of analyses, one group at a time. While this has been available in commercial GUIs for decades (e.g. SPSS split-file), BlueSky is the only R GUI that includes this feature. BlueSky automates group-by analyses under “Split> For Analysis> Split”. All analyses that follow will be done repeatedly for each level of the factor(s) chosen. This feature is turned off via “Split> For Analysis> Remove Split.”
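
In R code, the same split-then-repeat idea can be sketched with base R’s split() and sapply() functions. The data and variable names below are made up, and this illustrates the concept rather than the mechanism BlueSky uses for its Split feature.

```r
# Made-up data with one grouping factor.
mydata <- data.frame(
  workshop = rep(c("R", "SAS", "SPSS"), each = 4),
  pretest  = c(70, 75, 80, 72, 68, 85, 77, 90, 65, 71, 79, 83),
  posttest = c(78, 82, 91, 79, 75, 93, 84, 97, 70, 75, 88, 90)
)

# Break the data frame into one piece per level of the factor ...
pieces <- split(mydata, mydata$workshop)

# ... and repeat the same analysis on each piece.
means_by_group <- sapply(pieces, function(d) mean(d$posttest))
print(means_by_group)

cor_by_group <- sapply(pieces, function(d) cor(d$pretest, d$posttest))
print(round(cor_by_group, 3))
```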

 

Output Management

Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g. predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g. SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, for the simplification that GUIs provide, that’s a challenge.

Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but now the (e.g.) predicted values would be created by individual models for each group, rather than one model based on the entire original data set (perhaps with group included as a set of indicator variables).

Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)

For example, in our organization, we have 250 departments and want to see if any of them have a gender bias on salary. We write all 250 regression models to a data set, and then search to find those whose gender parameter is significant (hoping to find none, of course!)

BlueSky is the only R GUI reviewed here that does all three levels of output management. To use this function, choose “Model Fitting> Summarizing models for each group”, then specify the model and the grouping factor. It automatically creates three data sets, one at each level of analysis. This ability works only for regression, ANOVA, and multinomial logistic models. More are planned for future versions.
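
The three levels can be illustrated with a small base R sketch built around the salary example. Everything here is made up for illustration: the three departments, the simulated salaries (which contain no real gender effect), and the object names. It shows the shape of the three resulting data sets, not the code BlueSky runs.

```r
set.seed(1)
# Simulated salaries for three hypothetical departments.
staff <- data.frame(
  dept   = rep(c("A", "B", "C"), each = 20),
  gender = rep(c("f", "m"), times = 30),
  years  = round(runif(60, 1, 20))
)
staff$salary <- 40000 + 1500 * staff$years + rnorm(60, sd = 3000)

# One regression model per department.
models <- lapply(split(staff, staff$dept),
                 function(d) lm(salary ~ gender + years, data = d))

# Model level: one row per department.
model_level <- data.frame(
  dept      = names(models),
  r_squared = vapply(models, function(m) summary(m)$r.squared, numeric(1)),
  row.names = NULL
)

# Parameter level: one row per coefficient per department.
parameter_level <- do.call(rbind, lapply(names(models), function(nm) {
  co <- coef(summary(models[[nm]]))
  data.frame(dept = nm, term = rownames(co),
             estimate = co[, "Estimate"], p_value = co[, "Pr(>|t|)"],
             row.names = NULL)
}))

# Observation level: each row's prediction from its own department's model.
staff$predicted <- unsplit(lapply(models, fitted), staff$dept)

# Departments whose gender coefficient is significant at the 5% level.
flagged <- subset(parameter_level, term == "genderm" & p_value < 0.05)
print(model_level)
print(flagged)
```

With 250 departments instead of three, only the simulated data would change; the search in the last step stays the same.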

While BlueSky is ahead of the GUI pack in output management, the approach listed above still makes judgment calls about what output is useful for further analysis. What would you do to analyze an output table not covered by the above methods? Recall that all BlueSky output tables are true tables that can be exported to Word or Excel. Using that approach, you could save any table you like, export it and then open it as a data set to analyze. It’s not the most elegant approach, but it is quite comprehensive.

Developer Issues

There are two ways developers can contribute to the open source project:

  1. Developers who want to add to or modify the application itself (e.g. to provide new right-click controls or integration with big data libraries like Hadoop and Spark) can download the source code from https://github.com/BlueSkyStatistics/BlueSkyRepository.
  2. Programmers who want to add new statistical analyses to BlueSky Statistics should watch the training videos on the dialog editor program.

 

Conclusion

BlueSky Statistics offers an extensive set of tools that are easy for a point-and-click user to use. If you’re looking for a GUI that lets you do the most using just menus and dialog boxes, BlueSky should be on your list of software to try. BlueSky and R Commander are both way out in front of the R GUI competition when it comes to breadth of coverage in data management, graph types, and methods of analysis. I encourage you to read both reviews carefully when choosing between these two. Also keep in mind that while jamovi is newer and currently has fewer features, its developers are adding new ones at a rapid pace.

 

Acknowledgements

Thanks to the BlueSky team who have done a lot of hard work and made all but the terminal server version of it free and open source. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.