Data Science Software Reviews: Forrester vs. Gartner

Update: an earlier version of this post included figures that I’ve removed at the request of Forrester, Inc.

In my previous post, I discussed Gartner’s reviews of data science software companies. In this post, I describe Forrester’s coverage and discuss how radically different it is. As usual, this post is already integrated into my regularly updated article, The Popularity of Data Science Software.

Forrester Research, Inc. is a leading global research and advisory firm that reviews data science software vendors. Studying their reports and comparing them to Gartner’s can provide a deeper understanding of the software these vendors provide.

Historically, Forrester conducted its analyses much as Gartner does, comparing software that emphasizes point-and-click interfaces, like KNIME, with software that emphasizes coding, such as Anaconda. To make apples-to-apples comparisons, Forrester decided to split the two types of software into separate reports.

The Forrester Wave: Multimodal Predictive Analytics and Machine Learning Solutions, Q3, 2018 covers software that is controllable by various means such as menus, workflows, wizards, or code (as of 3/22/2019 available free here). Forrester plans to cover tools for automated modeling in a separate report, due out in 2019. Given that automation is now a widely adopted feature of several of the companies covered in this report, that seems like an odd approach.

Forrester divides the vendors into four categories: Leaders, Strong Performers, Contenders, and Challengers.

In the Leaders category, they include IBM, while Gartner viewed them as a middle-of-the-pack Visionary. Forrester and Gartner both view SAS and RapidMiner as leaders.

The Strong Performers category includes KNIME, which Gartner considered a Leader. Datawatch and Tibco are tied in this segment, whereas Gartner placed them far apart, with Datawatch in very last place. Forrester also puts KNIME and SAP next to each other in this category, while Gartner rated KNIME a Leader and SAP a Niche Player. Dataiku is here too, with a rating similar to Gartner’s.

The Contenders segment contains Microsoft and Mathworks, in positions similar to Gartner’s. FICO is here too; Gartner did not evaluate it.

Forrester’s Challengers segment includes World Programming, which sells SAS-compatible software, and Minitab, which purchased Salford Systems. Gartner considered neither of them.

The Forrester Wave: Notebook-Based Solutions, Q3, 2018 reviews software controlled by notebooks, which blend programming code and output in the same window (as of 3/22/2019 available here).

Forrester rates some of the notebook-based vendors very differently than Gartner. Here Domino Data Labs is a Leader, while Gartner placed them at the extreme other end of their plot, in the Niche Players quadrant. Oracle is also shown as a Leader, though its strength in this market is minimal.

In the Strong Performers category are Databricks and H2O.ai, in positions very similar to Gartner’s. Civis Analytics, OpenText, and Cloudera are also in this category; Gartner reviewed none of them.

Forrester’s Contenders category contains Google, in a position similar to Gartner’s. Anaconda is here too, in a position quite a bit higher than in Gartner’s plot.

The only two companies rated by Gartner but ignored by Forrester are Alteryx and DataRobot. The latter will no doubt be covered in Forrester’s report on automated modelers, due out this summer.

As with my coverage of Gartner’s report, my summary here barely scratches the surface of the two Forrester reports. Both provide insightful analyses of the vendors and the software they create. I recommend reading both (and learning more about open source software) before making any purchasing decisions.

To see many other ways to estimate the market share of this type of software, see my ongoing article, The Popularity of Data Science Software. My next post will update the scholarly use of data science software, a leading indicator. You may also be interested in my in-depth reviews of point-and-click user interfaces to R. I invite you to subscribe to my blog or follow me on twitter where I announce new posts. Happy computing!

Gartner’s 2019 Take on Data Science Software

I’ve just updated The Popularity of Data Science Software to reflect my take on Gartner’s 2019 report, Magic Quadrant for Data Science and Machine Learning Platforms. To save you the trouble of digging through all 40+ pages of my report, here’s just the updated section:

IT Research Firms

IT research firms study software products and corporate strategies. They survey customers regarding their satisfaction with the products and services and provide their analysis in reports that they sell to their clients. Each research firm has its own criteria for rating companies, so they don’t always agree. However, I find the detailed analysis that these reports contain extremely interesting reading. The reports exclude open source software that has no specific company backing, such as R, Python, or jamovi. Even open source projects that do have company backing, such as BlueSky Statistics, are excluded if they have yet to achieve sufficient market adoption. However, they do cover how company products integrate open source software into their proprietary ones.

While these reports are expensive, the companies that receive good ratings usually purchase copies to give away to potential customers. An Internet search of the report title will often reveal companies that are distributing them. On the date of this post, DataRobot is offering free copies.

Gartner, Inc. is one of the research firms that write such reports. Out of the roughly 100 companies selling data science software, Gartner selected 17 which offered “cohesive software.” That software performs a wide range of tasks including data importation, preparation, exploration, visualization, modeling, and deployment.

Gartner analysts rated the companies on their “completeness of vision” and their “ability to execute” that vision. Figure 3a shows the resulting “Magic Quadrant” plot for 2019, and 3b shows the plot for the previous year. Here I provide some commentary on their choices, briefly summarize their take, and compare this year’s report to last year’s. The main reports from both years contain far more detail than I cover here.

Figure 3a. Gartner Magic Quadrant for Data Science and Machine Learning Platforms from their 2019 report (plot done in November 2018, report released in 2019).

The Leaders quadrant is the place for companies whose vision is aligned with their customers’ needs and who have the resources to execute that vision. The further toward the upper-right corner of the plot, the better the combined score.

RapidMiner and KNIME reside in the best part of the Leaders quadrant this year and last. This year RapidMiner has the edge in ability to execute, while KNIME offers more vision. Both offer free and open source versions, but the companies differ quite a lot on how committed they are to the open source concept. KNIME’s desktop version is free and open source, and the company says it will always be so. On the other hand, RapidMiner’s free version is limited by a cap on the amount of data that it can analyze (10,000 cases), and new features usually come only via a commercial license with “difficult-to-navigate pricing conditions.” These two offer very similar workflow-style user interfaces and have the ability to integrate many open source tools into their workflows, including R, Python, Spark, and H2O.
Tibco moved from the Challengers quadrant last year to the Leaders this year. This is due to a number of factors, including the successful integration of all the tools they’ve purchased over the years, including Jaspersoft, Spotfire, Alpine Data, Streambase Systems, and Statistica.
SAS declined from being solidly in the Leaders quadrant last year to barely being in it this year. This is due to a substantial decline in its ability to execute. Given SAS Institute’s billions in revenue, that certainly can’t be a financial limitation. It may be due to SAS’ more limited ability to integrate as wide a range of tools as other vendors have. The SAS language itself continues to be an important research tool among those doing complex mixed-effects linear models. Those models are among the very few that R often fails to solve.

The companies in the Visionaries Quadrant are those that have good future plans but which may not have the resources to execute that vision.

Mathworks moved forward substantially in this quadrant due to MATLAB’s ability to handle unconventional data sources such as images, video, and the Internet of Things (IoT). It has also opened up more to open source deep learning projects.
H2O.ai is also in the Visionaries quadrant. This is the company behind the open source H2O software, which is callable from many other packages or languages including R, Python, KNIME, and RapidMiner. While its own menu-based interface is primitive, its integration into KNIME and RapidMiner makes it easy to use for non-coders. H2O’s strength is in modeling but it is lacking in data access and preparation, as well as model management.
IBM dropped from the top of the Visionaries quadrant last year to the middle. The company has yet to fully integrate SPSS Statistics and SPSS Modeler into its Watson Studio. IBM has also had trouble getting Watson to deliver on its promises.
Databricks improved both its vision and its ability to execute, but not enough to move out of the Visionaries quadrant. It has done well with its integration of open-source tools into its Apache Spark-based system. However, it scored poorly in the predictability of costs.
DataRobot is new to the Gartner report this year. As its name indicates, its strength is in the automation of machine learning, which broadens its potential user base. The company’s policy of assigning a data scientist to each new client gets them up and running quickly.
Google’s position could be clarified by adding more dimensions to the plot. Its complex collection of a dozen products that work together is clearly aimed at software developers rather than data scientists or casual users. Simply figuring out what they all do and how they work together is a non-trivial task. In addition, the complete set runs only on Google’s cloud platform. Performance on big data is its forte, especially problems involving image or speech analysis/translation.
Microsoft offers several products, but only its cloud-only Azure Machine Learning (AML) was comprehensive enough to meet Gartner’s inclusion criteria. Gartner gives it high marks for ease-of-use, scalability, and strong partnerships. However, it is weak in automated modeling and AML’s relation to various other Microsoft components is overwhelming (same problem as Google’s toolset).

Figure 3b. Last year’s Gartner Magic Quadrant for Data Science and Machine Learning Platforms (January, 2018)

Those in the Challengers quadrant have ample resources but less customer confidence in their future plans, or vision.

Alteryx dropped slightly in vision from last year, just enough to drop it out of the Leaders quadrant. Its workflow-based user interface is very similar to that of KNIME and RapidMiner, and it too gets top marks in ease of use. It also offers very strong data management capabilities, especially those that involve geographic data, spatial modeling, and mapping. It comes with geo-coded datasets, saving its customers from having to buy such data elsewhere and figure out how to import it. However, it has fallen behind in cutting-edge modeling methods such as deep learning, auto-modeling, and the Internet of Things.
Dataiku strengthened its ability to execute significantly from last year. It added better scalability to its ease of use and teamwork collaboration. However, it is also perceived as expensive, with a “cumbersome pricing structure.”

Members of the Niche Players quadrant offer tools that are not as broadly applicable. These include Anaconda, Datawatch (includes the former Angoss), Domino, and SAP.

Anaconda provides a useful distribution of Python and various data science libraries. They provide support and model management tools. The vast army of Python developers is its strength, but lack of stability in such a rapidly improving world can be frustrating to production-oriented organizations. This is a tool exclusively for experts in both programming and data science.
Datawatch offers the tools it acquired recently by purchasing Angoss, and its set of “Knowledge” tools continues to get high marks on ease-of-use and customer support. However, it’s weak in advanced methods and has yet to integrate the data management tools that Datawatch had before buying Angoss.
Domino Data Labs offers tools aimed only at expert programmers and data scientists. It gets high marks for openness and ability to integrate open source and proprietary tools, but low marks for data access and prep, integrating models into day-to-day operations, and customer support.
SAP’s machine learning tools integrate into its main SAP Enterprise Resource Planning system, but its fragmented toolset is weak, and its customer satisfaction ratings are low.

To see many other ways to rate this type of software, see my ongoing article, The Popularity of Data Science Software. You may also be interested in my in-depth reviews of point-and-click user interfaces to R. I invite you to subscribe to my blog or follow me on twitter where I announce new posts. Happy computing!

Updated Review: jamovi User Interface to R

Last February I reviewed the jamovi menu-based front end to R. I’ve reviewed five more user interfaces since then, and have developed a more comprehensive template to make it easier to compare them all. Now I’m cycling back to jamovi, using that template to write a far more comprehensive review. I’ve added this review to the previous set, and I’m releasing it as a blog post so that it will be syndicated on R-Bloggers, StatsBlogs, et al.

Introduction

jamovi (spelled with a lower-case “j”) is a free and open source graphical user interface for the R software that targets beginners looking to point-and-click their way through analyses. It is available for Windows, Mac, Linux, and even ChromeOS. Versions are also planned for servers and tablets.

This post is one of a series of reviews which aim to help non-programmers choose the Graphical User Interface (GUI) for R that is best for them. Additionally, these reviews include cursory descriptions of the programming support that each GUI offers.

Terminology

There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.

Installation

The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as BlueSky Statistics or RKWard, install in a single step. Others install in multiple steps, such as R Commander (two steps) and Deducer (up to seven steps). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The help desks at most universities are flooded with such calls at the beginning of each semester!

jamovi’s single-step installation is extremely easy and includes its own copy of R. So if you already have a copy of R installed, you’ll have two after installing jamovi. That’s a good idea though, as it guarantees compatibility with the version of R that it uses, plus a standard R installation by itself is harder than jamovi’s. Python is also installed with jamovi, but it is used only for internal purposes. You can directly control only R through jamovi.

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Deducer) to very high (R Commander).

For jamovi, plug-ins are called “modules” and they are found in the “jamovi library” rather than on the Comprehensive R Archive Network (CRAN) where R and most of its packages are found. This makes locating and installing jamovi modules especially easy.

Although jamovi is one of the most recent GUIs to appear on the R scene, it has already attracted a respectable number of developers. The modules available at publication time are listed below. You can check on the latest ones on this web page.

Base R – converts jamovi analyses into standard R functions
blandr – Bland-Altman method comparison analysis (also available as an R package from CRAN)
Death Watch – survival analysis
Distraction – quantiles and probabilities of continuous and discrete distributions
GAMLj – general linear model, linear mixed model, generalized linear models, etc.
jpower – power analysis for common research designs
Learning Statistics with jamovi – example data sets to accompany the book learning statistics with jamovi
MAJOR – meta-analysis based on R’s metafor package
medmod – basic mediation and moderation analysis
jAMM – advanced mediation analysis (similar to the popular Process Macro for SAS and SPSS)
R Data Sets
RJ – editor to run R code inside jamovi
scatr – scatter plots with marginal density or box plots
Statkat – helps you choose a statistical test
TOSTER – tests of equivalence for t-tests and correlation
Walrus – robust descriptive stats & tests
jamovi Arcade – hangman & blackjack games

Startup

Some user interfaces for R, such as BlueSky and RKWard, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and then call a function to finally activate the GUI. That’s more appropriate for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.

You start jamovi directly by double-clicking its icon from your desktop, or choosing it from your Start Menu (i.e. not from within R itself). It interacts with R in the background; you never need to be aware that R is running.

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters most are the differences in simplicity. Some GUIs, including BlueSky, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and you choose a data set from within it.

jamovi’s data editor appears at start-up (Figure 1, left) and prompts you to enter data with an empty spreadsheet-style data editor. You can start entering data immediately, though at first, the variables are simply named A, B, C….

To change metadata, such as variable names, you double-click on a name, and a window (Figure 2) slides open from the top with settings for variable name, description, measurement level (continuous, ordinal, nominal, or ID), data type (integer, decimal, text), variable levels (labels), and a “retain unused levels” switch. Currently, jamovi has no date format, which is a serious limitation if you work with dates.

Figure 2. The jamovi data editor with the variable attributes window open, allowing you to make changes.

When choosing variable terminology, R GUI designers have two choices: follow what most statistics books use, or instead use R jargon. The jamovi designers have opted for the statistics book terminology. For example, what jamovi calls categorical, decimal, or text are called factor, numeric, or character in R. Both sets of terms are fairly easy to learn, but given that some jamovi users may wish to learn R code, I find that choice puzzling. Changing variable settings can be done to many variables at once, which is an important time saver.

You can enter integer, decimal, or character data in the editor right after starting jamovi. It will recognize those types and set their metadata accordingly.

To enter nominal/factor data, you are free to enter numbers, such as 1 and 2, and later set levels so that Male and Female appear. Or you can set up the levels in advance and enter the numbers, which instantly turn into labels. That feature saves time and helps ensure accuracy. All data editors should offer that choice!
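Conceptually, the column holds the numbers you typed, and a separate code-to-label table controls what you see, which is why the labels can be defined either before or after entry. Here’s a minimal Python sketch of that idea. It’s my illustration rather than jamovi’s code, and the names in it are hypothetical.

```python
# Hypothetical sketch of value labels: the data column stores numeric codes,
# and a separate code -> label mapping controls what the editor displays.
codes = [1, 2, 2, 1, 2]             # what was typed into the editor
labels = {1: "Male", 2: "Female"}   # levels, defined before or after entry


def displayed(values, label_map):
    """Show each code's label, or the raw code when no label is defined yet."""
    return [label_map.get(v, v) for v in values]


print(displayed(codes, {}))      # before labels are set: raw numbers
print(displayed(codes, labels))  # after labels are set: Male/Female
```

Because the stored codes never change, adding or editing labels is instant and can’t corrupt the data itself.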

Adding variables or observations is as simple as scrolling beyond the set’s current limits and entering additional data. jamovi does not require “add more” buttons as some of its competitors (e.g. BlueSky) do. Adding variables or observations in between existing ones is also easy. Under the “Data” tab, there are two sets of “Add” and “Delete” buttons. The first set deals with variables and the second with cases. You can use the first set to insert, compute, transform, or delete variables. The second set inserts, appends, or deletes cases. These two sets of buttons are labeled “Variables” and “Rows”, but the font is so small that I used jamovi for quite a while before noticing these labels.

Data Import

The ability to import data from a wide variety of formats is extremely important; you can’t analyze what you can’t access. Most of the GUIs evaluated in this series can open a wide range of file types and even pull data from relational databases. jamovi can’t read data from databases, but it can import the following file formats:

Comma Separated Values (.csv)
Plain text files (.txt)
SPSS (.sav, .zsav, .por)
SAS binary files (.sas7bdat, .xpt)
JASP (.jasp)

While jamovi doesn’t support true date/time variables, when you import a dataset that contains them, it will convert them to an integer value representing the number of days since 1970-01-01 and assign them labels in the YYYY-MM-DD format.
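Here’s a minimal Python sketch of that conversion. It’s my illustration, not jamovi’s code, and the function names are hypothetical. R’s own Date class counts days from the same 1970-01-01 origin.

```python
from datetime import date, timedelta

# Origin used for the stored integers (also the origin of R's Date class).
EPOCH = date(1970, 1, 1)


def to_day_number(d):
    """Integer stored for a date: days elapsed since 1970-01-01."""
    return (d - EPOCH).days


def to_label(day_number):
    """YYYY-MM-DD label attached to a stored day number."""
    return (EPOCH + timedelta(days=day_number)).isoformat()


stored = to_day_number(date(2019, 3, 22))  # hypothetical value from an imported file
print(stored, to_label(stored))
```

Since the value is just an integer, arithmetic such as the number of days between two dates still works, but anything calendar-aware (months, weekdays) has to be reconstructed from the label.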

Data Export

The ability to export data to a wide range of file types helps when you have to use multiple tools to complete a task. Research is commonly a team effort, and in my experience, it’s rare to have all team members prefer to use the same tool. For these reasons, GUIs such as BlueSky and Deducer offer many export formats. Others, such as R Commander and RKWard, can create only delimited text files.

An unusual feature of jamovi is that it doesn’t save just a dataset; instead, it saves the combination of a dataset plus its associated analyses. To save just the dataset, you use the main (a.k.a. hamburger) menu to select “Export” then “Data.” The export formats supported are the same as those provided for import, except for the more rarely used ones such as SAS xpt and SPSS por and zsav:

Comma Separated Values (.csv)
Plain text files (.txt)
SPSS (.sav)
SAS binary files (.sas7bdat)

Data Management

It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be sorted, stacked, merged, aggregated, transposed, or reshaped (e.g. from “wide” format to “long” and back).
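For readers who haven’t reshaped data before, here’s a minimal Python sketch of going from “wide” to “long” and back. It isn’t how any of these GUIs do it (they call R functions), and the function and column names are made up for illustration.

```python
def wide_to_long(rows, id_col, value_cols, names_to="variable", values_to="value"):
    """Stack the listed columns: one output row per (id, column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({id_col: row[id_col], names_to: col, values_to: row[col]})
    return long_rows


def long_to_wide(long_rows, id_col, names_from="variable", values_from="value"):
    """Undo wide_to_long: spread each variable name back into its own column."""
    wide = {}
    for r in long_rows:
        wide.setdefault(r[id_col], {id_col: r[id_col]})[r[names_from]] = r[values_from]
    return list(wide.values())


# Hypothetical repeated-measures data: one row per subject, one column per time point.
wide = [
    {"id": 1, "score_t1": 5, "score_t2": 6},
    {"id": 2, "score_t1": 7, "score_t2": 8},
]
long_form = wide_to_long(wide, "id", ["score_t1", "score_t2"])
for row in long_form:
    print(row)
```

The long form has one row per measurement, which is the layout most R modeling and plotting functions expect for repeated measures.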

A critically important aspect of data management is the ability to transform many variables at once. For example, social scientists need to recode many survey items, and biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time is tedious.
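To show what “many variables at once” means in practice, here’s a minimal Python sketch that applies one function across a list of columns. The dataset, column names, and helper are hypothetical; the GUIs discussed here do this through R, not Python.

```python
import math

# Hypothetical dataset stored column-wise: variable name -> list of values.
data = {
    "q1": [1, 2, 5],
    "q2": [4, 4, 3],
    "length": [1.0, 10.0, 100.0],
    "mass": [10.0, 1000.0, 100000.0],
}


def apply_to_columns(dataset, columns, func):
    """Return a copy of `dataset` with `func` applied to every value of each listed column."""
    out = dict(dataset)  # shallow copy: columns not listed are carried over unchanged
    for col in columns:
        out[col] = [func(v) for v in dataset[col]]
    return out


# Reverse-code 1-to-5 survey items (1 <-> 5, 2 <-> 4) in one call.
recoded = apply_to_columns(data, ["q1", "q2"], lambda v: 6 - v)

# Take base-10 logarithms of several measurements in one call.
logged = apply_to_columns(data, ["length", "mass"], math.log10)

print(recoded["q1"], recoded["q2"])
print(logged["length"], logged["mass"])
```

The point is that the variable list is a parameter: adding a tenth survey item means adding one name to a list, not repeating the whole operation.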

Some GUIs, such as BlueSky and R Commander, can handle nearly all of these tasks. Others, such as RKWard, handle only a few of them.

jamovi’s data management capabilities are minimal. You can transform or recode variables, and doing so across many variables is easy. The transformations are stored in the variable itself, so you can see how a variable was created by double-clicking its name. However, the R code for the transformation is not available, even with Syntax Mode turned on.

You can also filter cases to work on a subset of your data. However, jamovi can’t sort, stack, merge, aggregate, transpose, or reshape datasets. The lack of combining datasets may be a result of the fact that jamovi can only have one dataset open in a given session.

Menus & Dialog Boxes

The goal of pointing and clicking your way through an analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as BlueSky, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to it and hence require more learning.

jamovi uses standard menu choices for running steps listed on the Data and Analyses tabs. Dialog boxes appear and you select variables to place into their various roles. This is accomplished by either dragging the variable names or by selecting them and clicking an arrow located next to the particular role box. A unique feature of jamovi is that as soon as you fill in enough options to perform an analysis, its output appears instantly. Unlike the other GUIs reviewed here, it has no “OK” or “Run” button. Thereafter, every option you turn on adds to the output immediately, and every option you turn off is removed.

While nearly all GUIs keep your dialog box settings during your session, jamovi keeps those settings in its main “workspace” file. This allows you to return to a given analysis at a future date and try some model variations. You only need to click on the output of any analysis to have the dialog box appear to the right of it, complete with all settings intact.

Under the triple-dot menu on the upper right side of the screen, you can choose to run “Syntax Mode.” When you turn that on, the R syntax appears immediately, and when you turn it off, it vanishes just as quickly. Turning on syntax mode is the only way a jamovi user would be aware that R is doing the work in the background.

Output is saved by using the standard “Menu> Save” selection.

Documentation & Training

The jamovi User Guide covers the basics of using the software. The Resources by the Community web page provides links to a helpful array of documentation and tutorials in written and video form.

Help

R GUIs provide simple task-by-task dialog boxes which generate much more complex code. So for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exception is R Commander, which lacks help on the dialog boxes themselves.

jamovi doesn’t offer any integrated help files, only the documentation described in the Documentation & Training section above. The search for help can become very confusing. For example, after creating the scatterplot shown in the next section, I wondered if the scat() function offered a facet argument. Normally, this would be an easy question to answer. My initial attempt was to go to RStudio and load jamovi’s jmv package, since I routinely get help from it. However, the scat() function is not built into jamovi (or jmv); it comes in the scatr add-on module. So I returned to jamovi and installed the Rj Editor module, which lets you execute R code from within jamovi. However, running “help(scat)” yielded no result. After so much confusion, I never was able to find any help on that function. Hopefully, this situation will improve as jamovi matures.

Graphics

The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as BlueSky, focus on R’s popular ggplot graphics. GUIs also differ quite a lot in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it.

jamovi uses its own graphics functions to create plots. By default, they have the look of the popular ggplot2 package. jamovi is the only R GUI reviewed that lets you set the plot style in advance, and all future plots will use that style. It does this using four popular themes. jamovi also lets you choose color palettes in advance, from a set of eight.

[Continued…]

BlueSky Statistics 5.40 GUI for R Update

It has been just a few months since I reviewed five free and open-source point-and-click graphical user interfaces (GUIs) to the R language. I plan to keep those reviews up to date as new features are added. BlueSky’s interface would be immediately familiar to anyone who has used SPSS, as its developers modeled it on that popular software. The BlueSky developers’ goal is to help people use R without having to learn to write computer code.

While the previous version of BlueSky offered dozens of fairly advanced modeling methods such as generalized linear models, random forests, and support vector machines, it lacked some simpler features. Version 5.40 (correction: this was previously listed as version 6.04) adds a dialog for logistic regression, which is essentially its glm dialog simplified to do only logistic regression.

The Multi-Way ANOVA output has also been greatly enhanced, with the addition of a wide range of contrasts from the emmeans package, support for all three types of sums of squares, and plots for both post-hoc main-effects comparisons and interaction plots like this one:

Users of RStudio will be pleased that BlueSky’s program editor now submits code the way RStudio does, so you can step through a program line-by-line by clicking the Run button repeatedly.

The new version also has functions to convert strings to dates and vice versa, which led me to realize I had totally missed the string and date functions that it already had. In the “Data> Compute” dialog, the functions for Arithmetic, Logical, Math, and String(1) are visible. But if you click on the “>>” arrow on the right, you’ll also see String (2), Conversion, Statistical, Random Numbers, and four different menus of Date functions.
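R’s POSIXct class stores a date-time as the number of seconds since 1970-01-01 UTC, and R’s date-parsing functions use largely the same %-style format codes as Python’s. Here’s a rough Python analogue of the two conversions. It’s not BlueSky’s code, and the month/day/year input format and UTC time zone are my assumptions.

```python
from datetime import datetime, timezone


def string_to_date(text, fmt="%m/%d/%Y"):
    """Parse a string using an explicit format; the result is pinned to UTC."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def date_to_string(dt, fmt="%Y-%m-%d"):
    """Format a date-time back into text using strftime codes."""
    return dt.strftime(fmt)


dt = string_to_date("03/22/2019")
print(date_to_string(dt))    # ISO-style label
print(int(dt.timestamp()))   # seconds since 1970-01-01 UTC, the quantity POSIXct stores
```

Spelling out the format explicitly, rather than letting the software guess, avoids the classic mix-up between month/day and day/month order.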

The complete set of new features and bug fixes is below. You can read my full review of BlueSky here, and you can download the software for free here. I plan to write about new features in other R GUIs, so stay tuned to my blog or follow me on twitter where I announce new posts. Happy computing!

NEW FEATURES:
=============

1) Added support for weighted datasets. This option is available in Data -> Set Weights. Once you specify the weighting variable, we create a new dataset with rows replicated as defined in the weights. This is similar to what SPSS does internally. When you run frequencies, independent sample t-tests, graphics commands, statistical tests, etc. on the new dataset (in BlueSky Statistics) with the rows replicated, you will see results identical to those in SPSS.

2) Added the option to specify a weighting variable in Linear and Logistic Regression. This allows an optional vector of weights to be used during the fitting process (i.e. the weighted least-squares solution).
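Base R's `lm()` accepts the same kind of optional `weights` vector. The sketch below, on simulated data, checks that with integer weights the weighted least-squares coefficients match an ordinary fit on the row-replicated data. It illustrates the technique only and is not the code the dialog generates.

```r
set.seed(1)
# Simulated data with integer weights (illustrative only)
d <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10), w = rep(1:2, 5))

# Weighted least squares via the weights argument
fit_w <- lm(y ~ x, data = d, weights = w)

# Ordinary least squares on rows replicated w times
d_rep <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_rep <- lm(y ~ x, data = d_rep)

print(coef(fit_w))
print(coef(fit_rep))
```

Only the coefficients agree; the standard errors differ because the replicated data have more residual degrees of freedom.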

3) Added support for logistic regression under ‘Model Fitting’. Once the model is built, you can score the dataset and optionally obtain a confusion matrix, model statistics, and a ROC curve by selecting the model and clicking Score. The scoring option is available in the top right-hand corner of the main application window.
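The scoring steps can be approximated in base R with `glm()` and `predict()`. This sketch uses the built-in `mtcars` data and an assumed 0.5 probability cutoff; a ROC curve would need an extra package such as pROC, which is not used here.

```r
# Logistic regression: transmission type (am, 0/1) vs. car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Score the data: predicted probabilities, then 0/1 classes at a 0.5 cutoff
p <- predict(fit, type = "response")
pred <- ifelse(p >= 0.5, 1, 0)

# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(predicted = factor(pred, levels = c(0, 1)),
            actual = factor(mtcars$am, levels = c(0, 1)))
print(cm)
```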

4) We have updated the “Multi Way Anova” dialog with the following capabilities:

Display of contrasts.
Interaction plots.
Support for Type I, Type II, and Type III tests.
Pairwise comparisons.
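For readers who want to see comparable R, here is a minimal base-R sketch of a two-way ANOVA with pairwise comparisons and an interaction plot, using the built-in `warpbreaks` data. It is not the code the dialog generates: base R's `anova()` gives sequential (Type I) sums of squares, Type II and III tests are usually obtained with `car::Anova()`, and the dialog's contrasts come from emmeans, none of which is used here. `TukeyHSD()` stands in for the pairwise comparisons.

```r
# Two-way ANOVA with interaction on R's built-in warpbreaks data
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# anova() on an aov/lm fit reports sequential (Type I) sums of squares
tab <- anova(fit)
print(tab)

# Tukey pairwise comparisons among the three tension levels
tk <- TukeyHSD(fit, "tension")
print(tk)

# Interaction plot: mean breaks at each tension level, one line per wool type
with(warpbreaks, interaction.plot(tension, wool, breaks))
```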

5) A new reshape dialog, with simplified R syntax based on the tidyr package, has been added.
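The dialog builds on tidyr, whose wide-to-long function is `pivot_longer()`. To keep the example dependency-free, the sketch below swaps in base R's `reshape()` for the same wide-to-long operation; the column names are hypothetical.

```r
# Hypothetical wide data: one score column per year
wide <- data.frame(id = 1:2, score_2018 = c(5, 7), score_2019 = c(6, 9))

# Stack the year columns into one score column plus a year column
long <- reshape(wide, direction = "long",
                varying = c("score_2018", "score_2019"),
                v.names = "score", timevar = "year",
                times = c(2018, 2019), idvar = "id")
print(long)
```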

6) The “Multi variable one sample T-Test” and “Multi variable independent sample T-Test (with factor)” dialogs have been updated to let you specify the alternative hypothesis.
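In R's `t.test()`, the alternative hypothesis is set with the `alternative` argument ("two.sided", "less", or "greater"). A short sketch using the built-in `sleep` data, offered only as an illustration of that argument:

```r
# One-sided two-sample (Welch) test: is group 1's mean lower than group 2's?
res <- t.test(extra ~ group, data = sleep, alternative = "less")
print(res)

# One-sample test of H1: mean of extra > 0
res1 <- t.test(sleep$extra, mu = 0, alternative = "greater")
print(res1)
```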

7) Added capabilities to support date manipulations.

The “String to date” dialog lets you convert a string to the POSIXct date-time class.
The “Date to string” dialog lets you convert a date (POSIXct or Date class) to a string.
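In base R, the two conversions correspond roughly to `as.POSIXct()` with a `format` string and `format()` going the other way. The example timestamps and the UTC time zone below are assumptions chosen for illustration:

```r
# String -> POSIXct: the format string must match the text's layout
s <- c("2019-03-22 14:30:00", "2019-12-01 08:05:00")
dt <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXct -> string, in a different layout
out <- format(dt, "%d/%m/%Y %H:%M")
print(out)

# Date class -> string works the same way
d_str <- format(as.Date("2019-03-22"), "%Y%m%d")
print(d_str)
```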

8) Simplified the syntax for frequencies and factor analysis.

9) The message displayed when you try to launch a second instance of BlueSky Statistics has been improved. NOTE: You can only have one BlueSky Statistics instance running at a time.

10) We have improved the ability to browse the contents of the output window in BlueSky Statistics. This can be accessed from the menu (Layout > Show Navigation Tree).

11) Items in the output window can now be deleted. To delete an item from the output, just right-click the item (table, text, or graphic) and choose the “Delete” option.

12) To make the output more visually appealing, we have introduced an option to hide the R syntax that is displayed in the output window. This is controlled by an option in the configuration window (Tools > Configuration Settings > Output tab). By default, the R syntax is hidden in the output window.

13) When the option to show R syntax in the output is turned on (Tools > Configuration Settings > Output tab) and you resize the output window, the R syntax now wraps so that it is always visible.

14) To run a single line of R syntax, just place the cursor on that line and click the RUN button; you don't have to select the entire line.

15) To run your R script line by line, place the cursor on the first line and click RUN. The cursor automatically moves to the next line, so you can click the RUN button again.

This feature works only for simple R statements that do not span multiple lines.

Example 1: the three lines below work

a=10
b=20
c=a+b

Example 2: the four lines below will not work

if(TRUE)
{
print("Great")
}
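The reason multi-line statements can't be stepped through one line at a time shows up in R's own parser: each line of Example 1 is a complete expression, while `if(TRUE)` alone is not. The sketch below uses `parse()` to show this; it assumes, without knowing BlueSky's internals, that the editor submits each line to R on its own.

```r
# Each line of Example 1 parses on its own as one complete expression
ex1_ok <- sapply(c("a=10", "b=20", "c=a+b"),
                 function(line) length(parse(text = line)) == 1)
print(ex1_ok)

# The first line of Example 2 is incomplete by itself, so parse() errors
first_line_ok <- tryCatch({
  parse(text = "if(TRUE)")
  TRUE
}, error = function(e) FALSE)
print(first_line_ok)  # FALSE

# Submitted together, the four lines form a single expression
whole <- parse(text = "if(TRUE)\n{\nprint(\"Great\")\n}")
print(length(whole))  # 1
```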

16) Added helpful hints to indicate that you have reached the beginning or end of a dataset when scrolling wide datasets. These appear on the paging controls in the bottom right-hand corner of the screen.

17) Application launch is now faster.

18) In the BlueSky Statistics syntax editor, the pipe operator (%>%), just like a comma, can be used to break a long R statement across lines.
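This matches how R's parser treats an unfinished line: a trailing comma inside a call, or a trailing binary operator such as `%>%`, tells it the statement continues. The sketch below only parses the text, so magrittr does not need to be loaded:

```r
# A statement broken after a comma is still a single expression
e1 <- parse(text = "x <- c(1,\n  2, 3)")

# Likewise, a line ending in %>% continues onto the next line
e2 <- parse(text = "y <- mtcars %>%\n  head()")

print(length(e1))  # 1
print(length(e2))  # 1
```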

19) When the application launches, it opens a new blank dataset populated with zeros. You can right-click a row to delete it, or go to ‘Data -> Delete Variables’ to delete variables.

20) Clicking a variable name in the data grid sorts the data in ascending order by that variable. Clicking it again sorts in descending order.
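For comparison, the equivalent sort in R code uses `order()`. The data frame here is hypothetical:

```r
# Hypothetical data frame
df <- data.frame(name = c("b", "c", "a"), score = c(2, 3, 1),
                 stringsAsFactors = FALSE)

# Ascending, then descending, by the name column
asc  <- df[order(df$name), ]
desc <- df[order(df$name, decreasing = TRUE), ]
print(asc)
print(desc)
```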

BUG FIXES:
=============

1) Fixed an issue with factor analysis when saving scores using the regression method.

2) Fixed an issue when re-editing factor levels that were previously changed in the variable grid. This would result in the dialog not functioning correctly and incorrect levels being added.

3) Fixed an issue where, if you closed the empty dataset that is created when BlueSky Statistics launches and then attempted to save open datasets, those datasets would not be saved correctly.

4) Fixed an issue that was limiting the number of variables that Factor Analysis can be run across.

5) Within code blocks that use local(), cat("\n") can now be used to print blank lines in the output.

6) When you added a new factor variable and renamed the variable using the user interface and then tried to add new levels – this did not work and has been fixed.

7) Fixed a crash that occurred with these steps: add a new factor variable; click in the cell where the new variable name is shown (the cell goes into edit mode); add new factor levels to the new variable; switch to the data grid and select a different level; then switch back to the ‘Variables’ tab.

8) When any existing factor level name is modified using the user interface, a blank level automatically gets added. If you try to modify the level, it does not take effect. This has been addressed.

9) Data grid navigation buttons (on the lower right-hand side of the data grid) are now disabled if there are fewer than 16 columns in the dataset.

10) Fixed an issue where changing factor levels did not work when one or more level names contained a single quote.

11) Aggregate control fixed: the text above the drop-down (which contains mean, median, etc.) was getting chopped off. A similar issue affected the label text above a textbox near the bottom of the dialog.

12) Fixed significance codes for:

One Sample T Test and Independent Sample T Test
Multivariable one sample t-test and Multivariable independent sample t-test (with factor)

13) Fixed a defect where, after you selected some syntax and clicked RUN, the cursor jumped to the top of the script. It now moves to the next line.

14) Data grid navigation buttons are disabled when either end of the data grid is reached. If there are no more columns on the right, the right navigation button is disabled; if there are no more columns on the left, the left navigation button is disabled.

15) The look and feel of the left navigation tree in the output window has been improved, giving it a cleaner appearance. To access it, go to Layout -> Show navigation tree.

16) In the R syntax editor, we now ignore square, curly, and round brackets that appear inside single or double quotes. See the example below:

grepl("[", "a[b", fixed = TRUE)
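One common way to do this kind of check is a stack-based bracket matcher that skips over quoted text. The sketch below is a hypothetical helper, `brackets_balanced()`, not BlueSky's implementation; it handles single and double quotes with backslash escapes but, for simplicity, ignores comments and backtick-quoted names.

```r
# Hypothetical helper: are (), [] and {} balanced, ignoring quoted text?
brackets_balanced <- function(code) {
  chars <- strsplit(code, "")[[1]]
  pairs <- c(")" = "(", "]" = "[", "}" = "{")  # closer -> matching opener
  stack <- character(0)
  quote <- ""       # currently open quote character, "" when outside a string
  escaped <- FALSE  # TRUE if the previous character in a string was a backslash
  for (ch in chars) {
    if (nzchar(quote)) {
      # Inside a string: only look for the closing quote
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == quote) {
        quote <- ""
      }
      next
    }
    if (ch == "\"" || ch == "'") {
      quote <- ch
    } else if (ch %in% c("(", "[", "{")) {
      stack <- c(stack, ch)
    } else if (ch %in% names(pairs)) {
      # A closer must match the most recent unmatched opener
      if (length(stack) == 0 || stack[length(stack)] != pairs[[ch]]) return(FALSE)
      stack <- stack[-length(stack)]
    }
  }
  length(stack) == 0 && !nzchar(quote)
}

print(brackets_balanced('grepl("[", "a[b", fixed = TRUE)'))  # TRUE
print(brackets_balanced("f(x[1)"))                           # FALSE
```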