BlueSky Statistics is a graphical user interface for the powerful R language. On July 10, 2024, the BlueskyStatistics.com website said:
“…As the BlueSky Statistics version 10 product evolves, we will continue to work on orchestrating the necessary logistics to make the BlueSky Statistics version 10.x application available as an open-source project. This will be done in phases, as we did for the BlueSky Statistics 7.x version. We are currently rearchitecting its key components to allow the broader community to make effective contributions. When this work is complete, we will open-source the components for broader community participation…”
In the current statement (September 5, 2024), the sentence regarding version 10.x becoming open source is gone. This line was added:
“…Revenue from the commercial (Pro) version plays a vital role in funding the R&D needed to continue to develop and support the open-source (BlueSky Statistics 7.x) version and the free version (BlueSky Statistics 10.x Base Edition)…”
I have verified with the founders that they no longer plan to release version 10 with an open-source license. I’m disappointed by this change as I have advocated for and written about open source for many years.
There are many advantages of open-source licensing over proprietary. If the company decides to stop making version 10 free, current users will still have the right to run the currently installed version, but they will only be able to get the next version if they pay. If it were open source, its users could move the code to another repository and base new versions on that. That scenario has certainly happened before, most notably with OpenOffice. BlueSky LLC has announced no plans to charge for future versions of BlueSky Base Edition, but they could.
I have already updated the references on my website to reflect that BlueSky v10 is not open source. I wish I had been notified of this change before telling many people at the JSM 2024 conference that I was demonstrating open-source software. I apologize to them.
BlueSky Statistics is a free and open-source graphical user interface for the powerful R language. There is also a commercial “Pro” version that offers tech support, priority feature requests, and many powerful additional features. The Pro version has been beefed up considerably with the new features below. These features apply to quality control, general statistics, team collaboration, project management, and scripting. Many are focused on quality control and Six Sigma as a result of requests from organizations migrating from Minitab and JMP. However, both versions of BlueSky Statistics offer a wide range of statistical, graphical, and machine-learning methods.
The free version saves every step of the analysis for full reproducibility. However, repeating the analysis is a step-by-step process. The Pro version can now rerun the entire set at once, substituting other datasets when needed.
Copy data from Excel and paste it into the BlueSky Statistics data grid. This is in addition to the existing mechanism of bringing data through file import for various file formats into BlueSky Statistics to perform data analysis.
Undo/Redo data grid edits
Single-item and multi-item data element edits can be discarded with undo and restored with redo.
Project save/open to save/open all work (all open datasets and output analysis)
Analysis performed can be saved into one or more projects. Each project contains all the datasets along with all the analyses and any R code from the editor. The projects can be exported and shared (sent as .bsp, which is a zip file; “bsp” is an abbreviation of BlueSky Statistics Project) with other BlueSky Statistics users. The users can import projects, see all the datasets and analyses stored in the projects, and subsequently add/modify/rerun all the analyses.
Enhanced cleaning/adjustment of copied/imported Excel/CSV data on the Datagrid
Dataset > Excel Cleanup
The existing Excel Cleanup dialog has gained additional options to clean up and adjust data (e.g., rows, columns, data types) on the BlueSky Statistics data grid, regardless of whether the data was loaded with the file open option or copied and pasted from an Excel/CSV file.
Renaming output tabs
Double-clicking on the output tab will open a dialog box asking for the new name. The user can type in a name to rename the output tab.
Enhanced Pie Chart and Bar Chart
Graphics > Pie Charts > Pie Chart
Graphics > Bar Chart
The pie chart and bar chart have been enhanced to show % and counts on the plot.
Scatterplot Matrix
Graphics > Scatterplot Matrix
The Scatter Plot Matrix dialog has been added.
Scatterplot with mean and confidence interval bar
Graphics > Scatterplot > Scatter Plot with Intervals
A new Scatter Plot dialog shows the mean and a confidence interval bar for a numeric Y-axis variable, grouped by an unlimited number of grouping variables on the X-axis.
Enhanced Scatterplot with both horizontal and vertical reference lines
Graphics > Scatterplot > Scatter Plot Ref Lines
The Scatterplot dialog has been enhanced so that users can add an unlimited number of reference lines (horizontal and vertical axis) to the plot.
Enhancements to BlueSky Statistics R Editor and Output Syntax/Code Editor
For R programmers, many enhancements have been made to the BlueSky R Editor and the output syntax/code editor to improve ease of use and productivity: tooltips, find and replace, undo/redo, comment/uncomment blocks, etc.
Enhanced Normal Distribution Plot
Distribution > Normal > Normal Distribution Plot with Labels
The normal distribution plot now labels the shaded area with the computed probability when x values are supplied, and with the computed x values when quantiles are supplied.
Plot one tail (left and right), two tails, and other ranges
Automatic randomization of generating normal sample distribution
Distribution > Normal > Sample from Normal Distribution
A seed value can still be set for reproducibility, but by default the sample data is now randomized automatically every time.
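The difference between a fixed seed and automatic randomization can be sketched outside BlueSky. The Python snippet below is only an illustration of the idea, not BlueSky's implementation; the function name `sample_normal` and its parameters are hypothetical.

```python
import random

def sample_normal(n, mean=0.0, sd=1.0, seed=None):
    """Draw n values from a normal distribution.

    seed=None mimics automatic randomization: the generator is seeded
    from system entropy, so each call gives different values.
    A fixed integer seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

# Same seed, same sample: this is what makes an analysis repeatable.
reproducible_a = sample_normal(5, seed=123)
reproducible_b = sample_normal(5, seed=123)

# No seed: a fresh sample on every run.
fresh = sample_normal(5)
```

Leaving the seed unset is convenient for exploration, while recording a seed is what lets someone else regenerate exactly the same sample later.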
Automatic randomization of design creations of all DoE designs
DOE > Create Design > ….
A seed value can still be set for reproducibility, but by default the creation of any DoE design is now randomized automatically every time.
Enhanced Distribution Fit analysis
Analysis > Distribution Analysis > Distribution Fit P-value
The distribution fit analysis has been enhanced to compute AD, KS, and CVM tests and show test statistics, as well as corresponding p-values. These assist users in determining the best fit in addition to the existing AIC and BIC values.
Moreover, a new option lets users see only the comparison of distributions and skip the individual distribution fit analyses.
Tolerance Intervals
Six Sigma > Tolerance Intervals
A new Tolerance Intervals analysis has been introduced. The tolerance interval describes the range of values for a distribution with confidence limits calculated to a particular percentile of the distribution. These tolerance limits, taken from the estimated interval, are limits within which a stated proportion of the population is expected to occur.
Equivalence (and Minimal Effect) test
Analysis > Means > Equivalence test
This new feature tests for mean equivalence and minimal effects.
Nonlinear Least Square – all-purpose Non-Linear Regression modeling
Model Fitting > Nonlinear Least Square
Performs non-linear regression with flexibility and many user options to model, test, and plot.
Polynomial Models with different degrees
Model Fitting > Polynomial
Computes and fits an orthogonal polynomial model with a specified degree. Also, optionally compares multiple Polynomial models of different degrees side by side.
Enhanced Pareto Chart
Six Sigma > Pareto Chart > Pareto Chart
A new option has been added for data that does not have a count column but only has the raw data. Automatically computes cumulative frequency from Raw Data for plotting.
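To make the raw-data option concrete, here is a minimal sketch of how counts and cumulative percentages for a Pareto chart can be derived from raw observations. It is written in Python purely for illustration and is not BlueSky's code; the function name `pareto_table` and the defect data are made up.

```python
from collections import Counter

def pareto_table(raw_values):
    """Return (category, count, cumulative percent) rows, largest count first."""
    counts = Counter(raw_values).most_common()
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for category, count in counts:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical raw defect log: one entry per observed defect, no count column.
defects = ["scratch", "dent", "scratch", "crack", "scratch", "dent"]
table = pareto_table(defects)
```

The bars of the chart come from the counts, and the cumulative line comes from the last column.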
Frequency analysis with an option to draw a Pareto chart
Analysis > Summary > Frequency Plot
A new dialog has been introduced to plot (optionally) the Pareto Chart from the frequency table and, if desired, display the frequency table on the Datagrid.
MSA (Measurement System Analysis) Enhancements
Gage Study Design Table
Six Sigma > MSA > Design MSA Study
Users can generate a randomized design table for any combination of the number of operators, parts, and replications. The table sets up a Gage study: run the experiments, collect the results, and then analyze the accuracy of the gage under study with analyses such as Gage R&R and Gage Bias.
Enhanced Gage R&R
Six Sigma > MSA > Gage R&R
Many enhancements and options have been introduced to the Gage R&R dialog and the underlying analysis:
Report header table
Enlarged graphs
Nested gage data analysis, in addition to crossed
Usage of historical process std dev to estimate Gage Evaluation values (%StudyVar table)
Show %Process
Enhanced Gage Attribute Analysis
Six Sigma > MSA > Attribute Analysis
Many enhancements and options have been introduced to the Attribute Analysis dialog and the underlying analysis
Report header table
Accuracy and classification rate calculations, in addition to agreement and disagreement
Optional Cohen’s Kappa stats (between each pair of raters) in addition to Fleiss Kappa (multi-raters)
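For readers unfamiliar with Cohen's Kappa, the sketch below shows the standard two-rater calculation: observed agreement corrected for the agreement expected by chance. It is a plain Python illustration with made-up ratings, not the code BlueSky runs; the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail calls by two appraisers on the same ten parts.
a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
kappa = cohen_kappa(a, b)
```

With more than two raters, this pairwise statistic is computed for each pair, which is why it complements rather than replaces Fleiss Kappa.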
Enhanced Gage Bias Analysis
Six Sigma > MSA > Gage Bias Analysis
Many enhancements and options have been introduced to the Gage Bias Analysis dialog and the underlying analysis
Efficient single dialog with options for linearity and type-1 tests for one or more References
A new option – “Method to use for estimating repeatability std dev”
Cg and Cgk – calculated for different Reference values in one go
Run charts for every reference value and an overall run chart for all reference values
Usage of historical std dev to calculate RF (Reference Figure)
%RE and %EV are introduced, and all tables show how the computed values compare with the required/cut-off values specified by users in the dialog
PCA (Process Capability Analysis) Enhancements
Enhanced Process Capability Analysis (for normal data)
Six Sigma > Process Capability > Process Capability
Ppl = Ppk or Ppu = Ppk is shown when a one-sided tolerance is used
Underscores have been removed from index names, so they now appear as Ppl, Ppk, Ppu, Cp, Cpk, etc.
A new option – “Do not use unbiasing constant to estimate std dev for overall process capability indices” to compute overall Ppk (Ppl)
Underlying charts (xbar.one) renamed to MR or I Chart based on SD or MR
Handling of missing values
Customizable number of decimals to show on the plot
Standard Deviation label on the plot marked as “Overall StdDev” and “Within StdDev”
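The one-sided behavior noted in the list above follows from the usual definitions of the overall indices: Ppl and Ppu measure the distance from the mean to each spec limit in units of three standard deviations, and Ppk is the smaller of the two. The Python sketch below illustrates this under stated assumptions and is not BlueSky's implementation: the function name `overall_capability` and the data are hypothetical, and it uses the plain sample standard deviation with no unbiasing constant.

```python
import statistics

def overall_capability(values, lsl=None, usl=None):
    """Overall capability indices from the sample standard deviation.

    Returns a dict with Ppl, Ppu and Ppk. An index is None when its
    spec limit is not given, so with one limit Ppk equals the index
    that can be computed.
    """
    if lsl is None and usl is None:
        raise ValueError("at least one spec limit is required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # overall std dev, no unbiasing constant
    ppl = (mean - lsl) / (3 * sd) if lsl is not None else None
    ppu = (usl - mean) / (3 * sd) if usl is not None else None
    ppk = min(x for x in (ppl, ppu) if x is not None)
    return {"Ppl": ppl, "Ppu": ppu, "Ppk": ppk}

# Hypothetical measurements with only an upper spec limit.
measurements = [9.8, 10.1, 10.0, 10.2, 9.9]
one_sided = overall_capability(measurements, usl=10.5)
two_sided = overall_capability(measurements, lsl=9.4, usl=10.5)
```

In the one-sided case only Ppu can be computed, so Ppk is equal to it.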
Process Capability Analysis for non-normal data
Six Sigma > Process Capability > Process Capability (Non-Normal)
A new dialog has been introduced to perform process capability analysis for non-normal data.
Multi-Vari graph
Six Sigma > Multi-Vari Chart
A new option has been added to adjust horizontal and vertical position offset to place/move the values for the data points on the plot.
Enhanced Shewhart Charts
Six Sigma > Shewhart Charts > …….
A new option has been added to all Shewhart Charts dialogs: the ability to add any number of spec/reference lines to the chart specified by the user.
I have just updated my detailed reviews of Graphical User Interfaces (GUIs) for R, so let’s compare them again. It’s not too difficult to rank them based on the number of features they offer, so let’s start there. I’m basing the counts on the number of dialog boxes in each of four categories:
Ease of Use
General Usability
Graphics
Analytics
This is trickier data to collect than you might think. Some software has fewer menu choices, depending instead on more detailed dialog boxes. Studying every menu and dialog box is very time-consuming, but that is what I’ve tried to do. I’m putting the details of each measure in the appendix so you can adjust the figures and create your own categories. If you decide to make your own graphs, I’d love to hear from you in the comments below.
Figure 1 shows how the various GUIs compare on the average rank of the four categories. R Commander is abbreviated Rcmdr, and R AnalyticFlow is abbreviated RAF. We see that BlueSky is in the lead with R-Instat close behind. As my detailed reviews of those two point out, they are extremely different pieces of software! Rather than spend more time on this summary plot, let’s examine the four categories separately.
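If you want to build your own version of this summary from the appendix figures, the average-rank calculation can be sketched as follows. This is a Python illustration under assumptions of my choosing: rank 1 goes to the GUI with the most features in a category, ties share the mean of the ranks they span, and the function name `average_ranks`, the GUI names, and the counts are all placeholders rather than the data behind Figure 1.

```python
def average_ranks(counts_by_category):
    """Average each GUI's rank across categories (rank 1 = most features).

    counts_by_category maps category -> {gui: feature count}.
    Tied GUIs share the mean of the ranks they span.
    """
    ranks = {}
    for counts in counts_by_category.values():
        ordered = sorted(counts, key=counts.get, reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j to cover every GUI tied with the one at position i.
            j = i
            while j + 1 < len(ordered) and counts[ordered[j + 1]] == counts[ordered[i]]:
                j += 1
            shared = (i + 1 + j + 1) / 2  # mean rank of the tied group
            for gui in ordered[i:j + 1]:
                ranks.setdefault(gui, []).append(shared)
            i = j + 1
    return {gui: sum(r) / len(r) for gui, r in ranks.items()}

# Made-up counts for three placeholder GUIs; not the figures from the reviews.
example = {
    "Ease of Use": {"gui_a": 10, "gui_b": 8, "gui_c": 8},
    "Graphics": {"gui_a": 20, "gui_b": 30, "gui_c": 25},
}
ranks = average_ranks(example)
```

A lower average rank means a GUI places nearer the top across the categories, regardless of how large the raw counts are in any one of them.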
I’ve defined the ease-of-use category mostly by how well each GUI does what GUI users are looking for: avoiding code. They get one point each for being able to install, start, and use the GUI to its maximum effect, including publication-quality output, without knowing anything about the R language itself. Figure 2 shows the result. JASP comes out on top here, with jamovi and BlueSky right behind.
Figure 3 shows the general usability features each GUI offers. This category is dominated by data-wrangling capabilities, where data scientists and statisticians spend most of their time. This category also includes various types of data input and output. BlueSky and R-Instat come out on top not just due to their excellent selection of data wrangling features but also due to their use of the rio package for importing and exporting files. The rio package combines the import/export capabilities of many other packages, and it is easy to use. I expect the other GUIs will eventually adopt it, raising their scores by around 40 points. JASP shows up at the bottom of this plot due to its philosophy of encouraging users to prepare the data elsewhere before importing it into JASP.
Figure 4 shows the number of graphics features offered by each GUI. R-Instat has a solid lead in this category. In fact, this underestimates R-Instat’s ability if you…
If you want to learn R, or improve your current R skills, join me for two workshops that I’m offering through Revolution Analytics in January and April.
If you already know another analytics package, the workshop Intro to R for SAS, SPSS and Stata Users may be for you. I’ll introduce each R concept using terminology that you already know, then translate it into R’s very different view of the world. You’ll be following along, with hands-on practice, so that by the end of the workshop R’s fundamentals should be crystal clear. The examples we’ll do come right out of my books, R for SAS and SPSS Users and R for Stata Users. That way if you need more explanation later, or want to dive in more deeply, the book of your choice will be very familiar. Plus, the table of contents and the index contain topics listed by SAS/SPSS/Stata terminology and R terminology so you can use either to find what you need. You can see a complete outline and register for the workshop starting January 13 (click here) or April 21 (click here).
If you already know R, but want to learn more about how you can use R to get your data ready to analyze, the workshop Managing Data with R will demonstrate how to perform the 15 most widely used data management tasks. The course outline and registration are available here for January and here for April.
If you have questions about any of these courses, drop me a line at muenchen.bob@gmail.com. I’m always available to answer questions regarding any of my books or workshops.
If you want to learn R, or improve your current R skills, join me for two workshops that I’m offering through Revolution Analytics in October.
If you already know another analytics package, the workshop Intro to R for SAS, SPSS and Stata Users may be for you. I’ll introduce each R concept using terminology that you already know, then translate it into R’s very different view of the world. You’ll be following along, with hands-on practice, so that by the end of the workshop R’s fundamentals should be crystal clear. The examples we’ll do come right out of my books, R for SAS and SPSS Users and R for Stata Users. That way if you need more explanation later, or want to dive in more deeply, the book of your choice will be very familiar. Plus, the table of contents and the index contain topics listed by SAS/SPSS/Stata terminology and R terminology so you can use either to find what you need. You can see a complete outline and register for the workshop here.
If you already know R, but want to learn more about data management, the workshop Managing Data with R will demonstrate how to perform the 15 most widely used data management tasks. That course outline and registration are here.
If you have questions about any of these courses, drop me a line at muenchen.bob@gmail.com. I’m always available to answer questions regarding any of my books or workshops.