by Robert A. Muenchen
The ability to export data to a wide range of file types helps when you have to use multiple tools to complete a task. Research is commonly a team effort, and in my experience, it’s rare to have all team members prefer to use the same tool. For these reasons, GUIs such as BlueSky and Deducer offer many export formats. Others, such as R Commander and RKward can create only delimited text files.
A fairly unique feature of jamovi is that it doesn’t save just a dataset, but instead it saves the combination of a dataset plus its associated analyses. To save just the dataset, you use the menu (a.k.a. hamburger) menu to select “Export” then “Data.” The export formats supported are the same as those provided for import, except for the more rarely-used ones such as SAS xpt and SPSS por and zsav:
- Comma Separated Values (.csv)
- Plain text files (.txt)
- SPSS (.sav)
- SAS binary files (.sas7bdat)
It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be sorted, stacked, merged, aggregated, transposed, or reshaped (e.g. from “wide” format to “long” and back).
A R user who is familiar with analysis of variance would not recognize that code but would see that it was a single function call from one package rather than the usual half-dozen function calls from as many packages. It’s much simpler to learn, though less flexible. (Note that jmv functions do follow the tidyverse’s ability to accept piped input since the data argument is in the first position.)
Support for Programmers
Some of the GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.
jamovi’s main mission is to make their point-and-click GUI comprehensive. However, it does offer a plug-in module named Rj. It provides syntax highlighting (color coding), and basic code completion.
Reproducibility & Sharing
One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written report so that others can check their work.
As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since GUI users don’t write the code, they also can’t read it or change it! Others such as JASP and the newest version of SPSS, save the dialog box entries and allow GUI users to have reproducibility in the form they prefer.
jamovi records the steps of all analyses, providing exact reproducibility. In addition, if you update a data value, all the analyses that used that variable are recalculated instantly. That’s helpful especially for people coming from Excel who expect this to happen. In addition, jamovi allows you to create a “template” that can rerun an entire sets of analyses on a new dataset that has the same variables.
If you wish to share your work with a colleague, and they’re jamovi users, you need only your jamovi workspace file. It contains everything they need in a single file.
If your colleague is an R coder, you could export your dataset to whichever file format they need and save your jamovi code. Code export is a tedious, one-analysis-at-a-time copy-paste process. While the data management code is not exportable, the dataset itself will contain the results of those transformations. Your colleague could then install the jmv package from CRAN to run your code. Note that at the moment, the Bayesian analyses cannot display their R code, but the developers are aware of the issue and plan to fix it.
A topic related to reproducibility is package management. One of the major advantages to the R language is that it’s very easy to extend its capabilities through add-on packages. However, updates in these packages may break a previously functioning analysis. Years from now you may need to run a variation of an analysis, which would require you to find the version of R you used, plus the packages you used at the time. As a GUI user, you’d also need to find the version of the GUI that was compatible with that version of R.
Some GUIs, such as the R Commander and Deducer, depend on you to find and install R. For them, the problem is left for you to solve. Others, such as BlueSky, distribute their own version of R, all R packages, and all of its add-on modules. This requires a bigger installation file, but it makes dealing with long-term stability as simple as finding the version you used when you last performed a particular analysis. Of course, this depends on all major versions being around for long-term, but for open-source software, there are usually multiple archives available to store software even if the original project is defunct.
jamovi’s approach to package management is halfway between the above two approaches. It provides nearly everything you need in a single download. This includes the jamovi interface, the jmv package, and its dependencies, R itself, and even a version of Python which is used internally. So for the base package, you’re all set. What remains to be seen is how they manage the dates for the add-on modules. While this is a current problem with long-term reproducibility, the jamovi developers are working on solving that problem.
Output & Report Writing
Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into the realm of word processing through R Mardown, knitr or Sweave documents. At the moment, none of the GUIs covered in this series of reviews meets all of these requirements. See the separate reviews to see how each of the other packages is doing on this topic.
The labels for each of jamovi’s analyses are provided by its menu title. However, this title cannot be changed. So if you perform several of the same type of analysis in a row, it’s hard to tell them apart. Also, there is no way to add comments or notes in the output.
The organization of the output is in time-order only. You can delete an analysis, but you cannot move it into an order that may make more sense after you see it.
While such tables of contents are commonly used in GUIs to let you jump directly to a section, or to re-order, rename, or delete bits of output, that feature is not available in jamovi.
Those limitations aside, jamovi’s output quality is very high, with nice fonts and true rich text tables (Figure 4). Tabular output is displayed in the popular style of the American Psychological Association. That means you can right-click on any table and choose “Copy" and the formatting is retained. That really helps speed your work as R output defaults to mono-spaced fonts that require additional steps to get into publication form (e.g. using functions from packages such as xtable or texreg). You can also export an entire set of analyses to HTML, then open the nicely-formatted tables in Word.
Figure 4. Output cut from jamovi & pasted into Microsoft Word. The font was changed just to show that it’s a true table.
Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide an ability to select a subset one group to analyze, then another subset to compare it to. All the R GUIs reviewed in this series can do this task. jamovi does single-group selections using “Data> Filters". It generates a subset that you can analyze in the same way as the entire dataset.
Software also needs the ability to automate such selections so that you might generate dozens of analyses, one group at a time. While this has been available in commercial GUIs for decades (e.g. SPSS “split-file”, SAS “by” statement), BlueSky is the only R GUI reviewed here that includes this feature. The closest jamovi gets on this topic is to offer a “Split by” variable selection box in its Descriptives procedure.
Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g. predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g. SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, for the simplification that GUIs provide, that’s a challenge.
Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but now the (e.g.) predicted values would be created by individual models for each group, rather than one model based on the entire original data set (perhaps with group included as a set of indicator variables).
Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling" above.)
For example, in our organization, we have 250 departments and want to see if any of them have a gender bias on salary. We write all 250 regression models to a data set, and then search to find those whose gender parameter is significant (hoping to find none, of course!)
BlueSky is the only R GUI reviewed here that does all three levels of output management. jamovi not only lacks these three levels of output management, it even lacks the fundamental observation-level saving that SAS and SPSS offered in their first versions back in the early 1970’s. This entails saving predicted values or residuals from regression, scores from principal components analysis or factor analysis. I hope that these basics are added to a future version of jamovi soon.
The jamovi development team encourages people to develop add-on modules to expand jamovi’s capabilities. It does this through a Developer’s Hub which includes a Getting Started guide. The team also offers live workshops for developers and provides the jamovi Library which lets you publish your modules and make them easily installable by all jamovi users.
jamovi is a gem of a package, one that looks so good I asked the developers if they had an artist or user interface designer on the team. They don’t, but clearly, they have put a lot of thought into how to make the software beautiful and easy to use. They have also chosen their options carefully so that each analysis includes what a researcher would want to see. jamovi’s ability to save analysis templates for reuse is a feature that all GUI users will find helpful as it makes the work re-usable without having to save code. Another particular strength to note is jamovi’s support of Bayesian analysis (borrowed from JASP).
Their creation of the jmv package is a bold move, one that promises to greatly simplify the number of separate packages a coder would need to learn, though in so doing they challenge the existing way of working with R. Just as the tidyverse set of commands is controversial, the jmv package is also likely to ruffle some feathers.
As nice as jamovi is, it lacks important features, including: the ability to see and save data management syntax; the ability to handle date/time variables; the ability to perform many more fundamental data management tasks; the ability to save new variables such as predicted values or factor scores; the ability to save models so they can be tested on hold-out samples or new data sets.
For a summary of all my R GUI software reviews, see the article, R Graphical User Interface Comparison.
Thanks to Jonathan Love for his suggestions that improved this review. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.