A Comparative Review of the R-Instat GUI for R

by Robert A. Muenchen, updated September 2, 2024

Introduction

R-Instat is a free and open-source graphical user interface for the R language that focuses on people who want to point-and-click through data science analyses. Written in Visual Basic, it is currently only available for Microsoft Windows. However, a Linux version is in development using the cross-platform Mono implementation of the .NET framework.

This is one of a series of reviews that aim to help non-programmers choose the Graphical User Interface (GUI) that is best for them. I have joined the BlueSky Statistics development team and written the BlueSky User Guide (online here), but you can trust this review series, as described here. All my comments below are easily verifiable. There is no perfect user interface for everyone; each GUI for R has features that appeal to different people.

Terminology

There are various definitions of user interface types, so here’s how I’ll be using the following terms. Reviewing R GUIs keeps me quite busy, so I don’t have time also to review all the IDEs, though my favorite is RStudio.

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment, which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users prefer to write R code to perform their analyses.

Installation

The various user interfaces available for R differ greatly in how they’re installed. Some, such as jamovi or RKWard, are installed in a single step. Others, such as Deducer, install in multiple steps (up to seven, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The HelpDesks at most universities are flooded with such calls at the beginning of each semester!

R-Instat is easy to install and requires only a single step. It provides its own embedded copy of R. This simplifies the installation and ensures complete compatibility between R-Instat and the version of R it’s using. However, it also means you’ll end up with a second copy if you already have R installed. You can have R-Instat control any R version you choose, but if the version differs too much, you may encounter occasional problems.

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” that add new menus and dialog boxes to the GUI. While the R-Instat project welcomes contributions from anyone, there are no modules to add now. All of its capabilities are included in its initial installation.

Startup

Some user interfaces for R, such as jamovi or JASP, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as R commander and JGR, have you start R, then load a package from your library, and then finally call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn.

You start R-Instat directly by double-clicking its icon from your desktop or choosing it from your Start Menu (i.e., not from within R).

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner, what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create one or more data sets, you save a workspace, you open a workspace, and you choose a data set from within it.

R-Instat starts up by showing its screen (Fig. 1).

Figure 1. The R-Instat startup screen. The text is truncated because I shrank the window to improve the legibility of this screenshot.

Under Start, I chose “New Data Frame” and it showed me the dialog displayed in Fig. 2. I filled in some variable names, and chose their types from the drop-down menu. These were all character initially so if I had not known what a factor was, it would still have let me enter data until I figured that out.

Figure 2. The new data frame dialog.

Clicking “Ok” brought up a data editor window filled with NA values for Not Available, or missing. I was then able to start entering the data. Scientific notation is accepted, but dates are saved as character variables. Logical values (TRUE, FALSE) are recognized as such and are stored appropriately. Clicking Help brought up a window that explained how a previous version of R-Instat worked. It recommended choosing “Empty,” an option replaced by “New” but it was otherwise easy to understand.
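
Because dates typed into the editor arrive as character values, they have to be converted before any date arithmetic works. Here is a minimal base R sketch of that conversion, using a made-up data frame and assuming year-month-day text. It is plain R, not the code that R-Instat's own date dialogs generate.

```r
# Hypothetical data: dates entered as text, as the data editor stores them
visits <- data.frame(id = 1:3,
                     visit = c("2024-01-15", "2024-02-01", "2024-03-10"),
                     stringsAsFactors = FALSE)

class(visits$visit)          # "character"

# Convert the text to R's Date class; the format string is an assumption
visits$visit <- as.Date(visits$visit, format = "%Y-%m-%d")

class(visits$visit)          # "Date"
diff(visits$visit)           # gaps between visits, in days
```

Once the column is a true Date, sorting and computing intervals behave as expected.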

As you can see in Fig. 3, each data value I entered generated an R statement that made that happen. For users not looking to learn R, that’s an intimidating display. Most other R GUIs hide the R code unless you ask to see it. However, the fact that it is saving R commands for every change is a valuable feature that adds to R-Instat’s reproducibility. People working in large organizations often have read-only access to data that they cannot correct at the source. This enables them to read a data file and make corrections while tracking precisely what they did.
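
The value of logging each edit is easier to see in code. The following is a rough base R sketch with invented data, in which a correction is written as a statement rather than made by overtyping the source. The object and column names are hypothetical, and R-Instat's own logged commands use its custom functions rather than this plain assignment.

```r
# Hypothetical read-only source data containing an obvious entry error
raw <- data.frame(id = 1:4, age = c(34, 29, 310, 41))

# Work on a copy so the original stays untouched
clean <- raw

# The correction is a line of code, so it documents itself and can be re-run
clean$age[clean$id == 3] <- 31

# List exactly which rows differ from the source
raw[raw$age != clean$age, ]
```

Re-running the script against a fresh copy of the source file reapplies the same correction.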

Figure 3. A new dataset with some values entered on the left and their matching R commands on the right.

Right-clicking on any column allows you to convert a variable into a factor, ordered factor, numeric, logical, or character. These changes are recorded as function calls to a custom “convert_column_to_type” function for reproducibility. Other R GUIs do not usually record such interactive changes. Date/time conversion is unavailable on that menu, as that process is trickier. Those conversions are on the “Prepare> Column Date” menu item. The right-click menu also lets you rename, duplicate, reorder, set levels/labels, sort, and filter/remove filter.

The class of each variable is indicated by a character code that follows each variable name in parentheses: (C) for character, (F) for factor, (O.F) for ordered factor, (D) for date, (L) for logical. When no code follows a variable name, it is numeric.
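
For readers curious about what those conversions and class codes correspond to in plain R, here is a small sketch. It uses base R functions rather than R-Instat's convert_column_to_type, and the helper class_code() is a hypothetical function written only to mimic the labels described above.

```r
df <- data.frame(name   = c("Ann", "Bob", "Cy"),
                 group  = c("a", "b", "a"),
                 rating = c("low", "high", "mid"),
                 score  = c("1.5", "2", "3e2"),   # scientific notation parses as numeric
                 stringsAsFactors = FALSE)

df$group  <- factor(df$group)                                  # factor
df$rating <- factor(df$rating, levels = c("low", "mid", "high"),
                    ordered = TRUE)                            # ordered factor
df$score  <- as.numeric(df$score)                              # numeric
df$passed <- df$score > 1.8                                    # logical
df$seen   <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-03"))

# Hypothetical helper that mimics the codes shown after each column name
class_code <- function(x) {
  if (is.ordered(x)) {
    "(O.F)"
  } else if (is.factor(x)) {
    "(F)"
  } else if (inherits(x, "Date")) {
    "(D)"
  } else if (is.logical(x)) {
    "(L)"
  } else if (is.character(x)) {
    "(C)"
  } else {
    ""        # numeric columns carry no code
  }
}

codes <- vapply(df, class_code, character(1))
paste0(names(df), codes)
```

The ordered-factor test comes first because an ordered factor is also a factor in R.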

The name of the dataset appears on a tab at the bottom of the Data View window. This lets you easily manage multiple datasets, an ability popular among professional data analysts.

Once the dataset is saved, you can choose “Prepare > Data Frame > Insert rows/columns” to add new rows or columns at any position in the data frame. New columns can be added with a specified default value, which can be a big time-saver when entering blocks of related data.

There is a quicker method that works for inserting new rows. You right-click the row numbers, and a pop-up menu will allow you to insert rows above or below. The number of rows selected is the number of rows added, like in Excel.

If you have another data set to enter, you can restart the process by choosing “File> New Data Frame” again. You can switch data sets simply by clicking on a tab, and that data set’s window will pop to the front for you to see. When doing analyses or saving data, the data set displayed in the editor does not influence what appears in dialog boxes. That means that you can be looking at one dataset while analyzing another! Since each dialog allows you to choose the dataset to use, that is technically not a problem, but if you have several datasets that contain the same variable names, remember that what you see may not be what you get! That’s the opposite of BlueSky Statistics, which automatically analyzes the dataset you see. R-Instat’s ability to work with multiple datasets in a single instance of the software is not a feature found in all R GUIs. For example, jamovi and JASP can only work with a single dataset at a time.

Data is saved with a fairly standard “File> Save As> Save Dataset As” menu. By default, it will save all open datasets, filters, graphs, and models to a single file called a “data book.” That makes complex projects much easier to open and close.

Data Import

R-Instat supports the following file formats, most of which are automatically opened using “File> Import from File.” The ODK and NetCDF file formats have their own Import menus. R-Instat’s ability to open many formats related to climate science hints at what the software excels at. For details, see the Analysis Methods section below.

  1. Comma Separated Values (.csv)
  2. Plain text files (.txt)
  3. Excel (old and new xls file types)
  4. xBASE database files (dBase, etc.)
  5. SPSS (.sav)
  6. SAS binary files (sas7bdat and *.xpt)
  7. Standard R workspace files (RData, but it just opens one dataframe of its choosing)
  8. Open Data Kit (ODK)
  9. OpenRefine
  10. Network Common Data Form (NetCDF)
  11. SST Sea Surface Temperature formatted files
  12. IRI Data Library (API download)
  13. Climate Data Store (CDS) (API download)
  14. Shapefile
  15. Climsoft (Climatic database)
  16. .dly (ASCII files)
  17. .dat (ASCII files)
  18. Tab Separated Values (.tsv)
  19. Stata (.dta)
  20. JSON (.json)
  21. epiinfo (.rec)
  22. Minitab (.mtb)
  23. Systat (.syd)
  24. CSV with a YAML metadata header (.csvy)
  25. Feather R/Python interchange format (.feather)
  26. Pipe separated files (.psv)
  27. YAML (.yml)
  28. Weka Attribute-Relation File Format (.arff)
  29. Data Interchange Format (.dif)
  30. OpenDocument Spreadsheet (*.ods)
  31. Shallow XML documents (*.xml)
  32. Single-table HTML documents (*.html)
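
An RData workspace can hold several objects, which is why a tool that opens just one data frame of its choosing may surprise you. Here is a small base R sketch of checking what a workspace contains before importing it. The file is a temporary one created here for illustration, and the object names are made up.

```r
# Create a temporary workspace holding two data frames and a character value
path <- tempfile(fileext = ".RData")
survey <- data.frame(id = 1:3, q1 = c(4, 5, 3))
sites  <- data.frame(site = c("A", "B"), lat = c(1.2, 3.4))
note   <- "not a data frame"
save(survey, sites, note, file = path)

# Load into a separate environment so nothing in the session is overwritten
ws <- new.env()
load(path, envir = ws)

# Keep only the names of objects that are data frames
is_df <- vapply(ls(ws), function(nm) is.data.frame(get(nm, envir = ws)), logical(1))
df_names <- names(is_df)[is_df]
df_names
```

Knowing the names in advance lets you pick the data frame you actually want, for example with get("survey", envir = ws).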

Data Export

The ability to export data to a wide range of file types helps when you, or other members of your research team, have to use multiple tools to complete a task. Unfortunately, this is a very weak area for most R GUIs. Deducer offers no data export at all, and R Commander and rattle can export only delimited text files (an earlier version of this listed jamovi as having very limited data export; that has now been expanded).

R-Instat has extensive export facilities. Multiple data sets can be exported:

  • In a single step, as a set of files
  • As a single Excel file with one data set per sheet 
  • As a list of data frames stored as a single RDS or RData file
  • As a single HTML file with data sets end-to-end
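
The RDS option in the list above is straightforward to reproduce or read back in plain R. Here is a minimal sketch with invented data frames and a temporary file. The names are illustrative, and this is base R rather than the code R-Instat itself runs.

```r
# Two hypothetical data sets bundled into one named list
datasets <- list(
  patients = data.frame(id = 1:2, age = c(40, 52)),
  visits   = data.frame(id = c(1, 1, 2), week = c(1, 2, 1))
)

# Write the whole list to a single RDS file, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(datasets, path)
restored <- readRDS(path)

names(restored)                  # names of the data frames in the file
all.equal(restored, datasets)    # TRUE when the round trip preserved everything
```

A colleague working in R can then pull out one data set with restored$patients.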

The file formats it can export to include:

  1. Comma-separated file (*.csv)
  2. Excel files (*.xlsx)
  3. TAB-separated data (*.tsv)
  4. Pipe-separated data (*.psv)
  5. Feather R/Python interchange format (*.feather)
  6. Fixed-Width format data (*.fwf)
  7. Serialized R objects (*.rds)
  8. Saved R objects (*.RData)
  9. JSON (*.json)
  10. YAML (*.yml)
  11. Stata (*.dta)
  12. SPSS (*.sav)
  13. XBASE database files (*.dbf)
  14. Weka Attribute – Relation File Format (*.arff)
  15. R syntax object (*.R)
  16. Xml (*.xml)
  17. HTML (*.html)
  18. Matlab (*.mat)
  19. SAS (*.sas7bdat)
  20. SAS XPORT (*.xpt)

Data Management

It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g., from wide to long and back). A critically important aspect of data management is the ability to transform many variables simultaneously. For example, social scientists need to recode many survey items; biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time is tedious work. Some GUIs, such as JASP and RKWard, handle only a few of these functions. Others, such as BlueSky and R Commander, handle nearly all of them.
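
To make the point about transforming many variables at once concrete, here is a short base R sketch. The data and variable names are invented, and it shows only the general idea rather than the code any particular GUI writes.

```r
# Hypothetical data: three skewed measurements and three 1-5 survey items
set.seed(1)
d <- data.frame(mass = rlnorm(5), length = rlnorm(5), width = rlnorm(5),
                q1 = c(1, 5, 3, 2, 4), q2 = c(2, 2, 5, 1, 3), q3 = c(5, 4, 1, 3, 3))

# Take logs of several columns in one step, storing them under new names
num_vars <- c("mass", "length", "width")
d[paste0("log_", num_vars)] <- lapply(d[num_vars], log)

# Reverse-score several survey items at once (1 <-> 5, 2 <-> 4)
items <- c("q1", "q2", "q3")
d[items] <- lapply(d[items], function(x) 6 - x)

head(d)
```

The same two lines work whether the vectors of names hold three variables or three hundred, which is the time saving a good dialog should also deliver.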

R-Instat offers one of the most comprehensive sets of data management tools of any R GUI. Its dialogs do not require any knowledge of R code. A unique feature is found in “Prepare> Check Data> Non-Numeric Values.” That detects common data entry errors in which the letter “O” is entered in place of a zero or a lowercase letter “l” in place of the number one. It offers to mark where those errors occur and optionally create a copy of the dataset with those observations deleted. Here is the list of methods it offers (some are repeated under different menus; I’ve tried to eliminate duplicates):

  1. Prepare> Data Frame> Rename Column
  2. Prepare> Data Frame> Duplicate Column
  3. Prepare> Data Frame> Row Numbers/Names
  4. Prepare> Data Frame> Sort
  5. Prepare> Data Frame> Filter
  6. Prepare> Data Frame> Column Selection
  7. Prepare> Data Frame> Replace Values
  8. Prepare> Data Frame> Convert Columns
  9. Prepare> Data Frame> Reorder Columns
  10. Prepare> Data Frame> Insert Columns/Rows
  11. Prepare> Data Frame> Hide/Show Columns
  12. Prepare> Data Frame> Column Structure
  13. Prepare> Data Frame> Colour by Property
  14. Prepare> Check Data> Visualize Data
  15. Prepare> Check Data> Duplicates
  16. Prepare> Check Data> Compare Columns
  17. Prepare> Check Data> Non-Numeric Values
  18. Prepare> Check Data> Anonymise ID Column
  19. Prepare> Column Calculator
  20. Prepare> Column Numeric> Regular Sequence
  21. Prepare> Column Numeric> Enter
  22. Prepare> Column Numeric> Row Summaries
  23. Prepare> Column Numeric> Transform
  24. Prepare> Column Numeric> Polynomials
  25. Prepare> Column Numeric> Random Samples
  26. Prepare> Column Numeric> Permute Columns
  27. Prepare> Column Factor> Convert to Factor
  28. Prepare> Column Factor> Recode to Numeric
  29. Prepare> Column Factor> Count in Factor
  30. Prepare> Column Factor> Recode Factor
  31. Prepare> Column Factor> Combine Factors
  32. Prepare> Column Factor> Dummy Variables
  33. Prepare> Column Factor> Levels/Labels
  34. Prepare> Column Factor> View Labels
  35. Prepare> Column Factor> Reorder Levels
  36. Prepare> Column Factor> Reference Level
  37. Prepare> Column Factor> Unused Levels
  38. Prepare> Column Factor> Contrasts
  39. Prepare> Column Factor> Factor Data Frame
  40. Prepare> Column Text> Find/Replace
  41. Prepare> Column Text> Transform
  42. Prepare> Column Text> Split
  43. Prepare> Column Text> Combine
  44. Prepare> Column Text> Distance
  45. Prepare> Column Date> Generate Dates
  46. Prepare> Column Date> Make Date
  47. Prepare> Column Date> Fill Date Gaps
  48. Prepare> Column Date> Use Date
  49. Prepare> Column Define> Convert Columns
  50. Prepare> Column Define> Circular
  51. Prepare> Data Reshape> Column Summaries
  52. Prepare> Data Reshape> General Summaries
  53. Prepare> Data Reshape> Stack (Pivot Longer)
  54. Prepare> Data Reshape> Unstack (Pivot Wider)
  55. Prepare> Data Reshape> Merge
  56. Prepare> Data Reshape> Append (Bind Rows)
  57. Prepare> Data Reshape> Subset
  58. Prepare> Data Reshape> Random Subset
  59. Prepare> Data Reshape> Transpose
  60. Prepare> Data Reshape> Scale/Distance
  61. Prepare> Keys and Links> Add Key
  62. Prepare> Keys and Links> View and Remove Keys
  63. Prepare> Keys and Links> Add Link
  64. Prepare> Keys and Links> View and Remove Links
  65. Prepare> Keys and Links> Add Comment
  66. Prepare> Data Object> Rename Data Frame
  67. Prepare> Data Object> Reorder Data Frames
  68. Prepare> Data Object> Copy Data Frame
  69. Prepare> Data Object> Delete Data Frames
  70. Prepare> Data Object> Hide/Show Data Frames
  71. Prepare> Data Object> Metadata
  72. Prepare> R Objects> View
  73. Prepare> R Objects> Rename
  74. Prepare> R Objects> Reorder
  75. Prepare> R Objects> Delete
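
The Non-Numeric Values check mentioned before the list can be approximated in a few lines of base R. This is a rough sketch of the idea with made-up values, not R-Instat's implementation.

```r
# Hypothetical column typed in by hand: letter O for zero, lowercase l for one
weight <- c("10", "2O", "35", "l5", "42", NA)

# Values that are present but fail to parse as numbers
parsed  <- suppressWarnings(as.numeric(weight))
suspect <- !is.na(weight) & is.na(parsed)

which(suspect)        # positions of the suspicious entries
weight[suspect]       # the offending values

# Optionally drop the flagged entries (genuine missing values are retained)
clean <- parsed[!suspect]
```

Separating values that were missing all along from values that merely failed to parse is what keeps the check from flagging legitimate NAs.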

Menus & Dialog Boxes

The goal of pointing & clicking your way through analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to them and hence require more learning.

R-Instat uses standard menu choices for running steps listed on its menus. Dialog boxes appear, and you select variables for their various roles. Each dialog lets you choose the dataset on a drop-down menu. Variables appear in the usual variable list box, and clicking an Add button moves the selected variables to their roles. You cannot drag variables to the various role boxes. One role box is highlighted in blue when the dialog first appears, indicating where variables will be added. You can choose a different role box by clicking on it or using the tab key to sequence through the boxes. Once a box for a single variable is filled, it automatically highlights the next variable box.

The Describe and Model menus offer a helpful labeling system of “One Variable”, “Two Variables,” “Three Variables,” and “Four Variables.” These counts include the dependent variables, so the “Three Variables” menu would allow for a model with two predictors. The default settings vary depending on the class of the variable. For example, with a numeric variable and a 2-level factor, the “Two Variables” analysis would do either a t-test or logistic regression, depending on which variable was entered first. That’s nice, but people looking on menus for a t-test or logistic regression must catch on to this approach. While the “Four Variables” items appear, they have not yet been implemented.
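
In R terms, the order in which the two variables are entered decides which one is treated as the response. Here is a sketch using the built-in mtcars data, where mpg is numeric and am is a two-level variable. The mapping from the dialog to these particular calls is an assumption made for illustration, not code copied from R-Instat.

```r
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Numeric response, two-level factor as predictor: a two-sample t-test
tt <- t.test(mpg ~ am, data = cars)
tt$p.value

# Two-level factor as response, numeric predictor: logistic regression
lg <- glm(am ~ mpg, data = cars, family = binomial)
coef(lg)
```

Both calls use the same two columns; only the side of the formula each one sits on changes.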

Once assigned, you can remove individual variables from a role assignment box. For a single variable box, press the backspace key or right-click and choose Remove to clear the box. For a multiple-variable box, select one or more variables and press the backspace key, or right-click and choose Remove for one or Clear to remove all.

The output is saved using the standard “File > Save As” menu. The only supported output format is rich text format (RTF). However, R-Instat uses that only to set the font style and color. The output tables are stored in the monospaced Courier New font, not in true word processing tables. That is a significant failing as BlueSky, jamovi, and JASP all offer true word processing tables in the style many journals prefer. That saves you a great deal of report preparation time.

Documentation & Training

R-Instat’s documentation is listed on this web page: http://r-instat.org/ReleaseNotes.html. It includes several written and video tutorials.

Help

R GUIs provide simple task-by-task dialog boxes that generate much more complex code. So, for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exceptions are jamovi, which offers no help, and the R Commander, which lacks help on the dialog boxes themselves.

R-Instat’s help files are accessed by clicking the “Help” button on each dialog box. Unfortunately, most of these lead to empty placeholders to be filled in future versions. Those that are already filled, e.g., for renaming variables, are clear for how the dialog box works but provide no documentation on controlling the R functions it uses. The developers say all three levels of help are planned for a future release.

Graphics

The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as jamovi, use their own functions and integrate them into analysis steps. GUIs also differ greatly in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. That’s how jamovi works, but then jamovi is limited to its custom graph functions, as nice as they may be.

You can generate plots in R-Instat using two different perspectives: by focusing on the number of variables involved or instead by focusing on the type of plot you wish to create. Since limiting the number of variables also limits the types of graphs that are usually considered appropriate, some may view that as the easier approach. In the “Describe> One/Two/Three Variables” menus, each provides a Graph menu choice that matches each number of variables. By choosing one variable, you’re offered bar charts, histograms, and the like. When choosing two variables, then R-Instat offers others, such as scatterplots or line plots.

You can create the same graphs by forgetting about the number of variables and instead focusing on the type of graph first. You do that under the “Describe> Specific Tables/Graphs” menu. There you will find a list of common graph types directly, such as bar, line, or scatter.

Regardless of which approach you choose to create a graph, R-Instat does most of its plots using the popular ggplot2 package. If you wish to learn R code for graphics, that’s the code you’ll see. Each graph dialog offers a “Plot Options” menu that allows you to modify the graphs in flexible and powerful ways. However, to do so requires an understanding of the Grammar of Graphics concepts, upon which R’s ggplot2 package is built. A comprehensive description of that is beyond our current scope, but it includes such surprises as the fact that a pie chart is just a bar chart with Cartesian coordinates swapped out for polar ones!
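
The pie chart remark is easy to verify with a few lines of ggplot2. This sketch assumes the ggplot2 package is installed and uses invented counts. It is hand-written code, not output from an R-Instat dialog.

```r
library(ggplot2)

# Hypothetical counts for three groups
counts <- data.frame(group = c("A", "B", "C"), n = c(10, 20, 30))

# A single stacked bar in ordinary Cartesian coordinates
bar <- ggplot(counts, aes(x = "", y = n, fill = group)) +
  geom_col(width = 1)

# The same layers drawn in polar coordinates become a pie chart
pie <- bar + coord_polar(theta = "y")

pie
```

Nothing about the data or the bar layer changes between the two plots; only the coordinate system does.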

One of R-Instat’s unique and useful features is its ability to save any graph, and then combine several into a single image. That makes publishing multiple graphs much easier! Currently, no other R GUI offers this feature, though the code is quite easy so I expect others will add it eventually.
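
To give a sense of how little code is involved, here is one way to place two plots in a single image using only base R graphics. This is a swapped-in technique for illustration. Which function R-Instat uses to combine its ggplot2 graphs is not covered here.

```r
# Write both plots to one temporary PDF file
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)

# Split the page into one row and two columns
old <- par(mfrow = c(1, 2))

hist(mtcars$mpg, main = "Distribution of mpg", xlab = "mpg")
plot(mtcars$wt, mtcars$mpg, main = "mpg by weight", xlab = "wt", ylab = "mpg")

par(old)      # restore the previous layout
dev.off()     # close the file

file.exists(out)
```

For ggplot2 objects the layout step differs, since par(mfrow) applies only to base graphics, but the idea of arranging saved plots on one page is the same.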

Compiling a list of the plots R-Instat can do is challenging since it offers a nearly unlimited range. Here is an attempt to list the popular ones that were relatively easy to figure out how to do. Given its ability to combine plots in layers, these can be combined in many ways.

  1. Bar Chart of counts or pre-summarized values
  2. Boxplot
  3. Contour
  4. Density (continuous)
  5. Density (counts)
  6. Dumbbell
  7. Dot chart
  8. Frequency charts (factors)
  9. Frequency charts (numeric)
  10. Heatmap
  11. Histogram
  12. Line Chart
  13. Line Chart, stair-step plot
  14. Line Chart, variable order
  15. Maps: World Map
  16. Mosaic
  17. Parallel Coordinate Plot
  18. Pie Chart
  19. Plot of Means
  20. Polar Coordinate Plots
  21. P-P Plots
  22. Q-Q Plots
  23. Scatterplot
  24. Scatterplot matrix
  25. Strip Chart
  26. TreeMap
  27. Violin Plot
  28. Word Cloud
  29. Visualize dataset by variable type & missing values
  30. Stacked rating data
  31. Barchart of Likert variable percents
  32. Climatic> PICSA> General Graph
  33. Climatic> PICSA> Rainfall Graph
  34. Climatic> PICSA> Temperature Graph
  35. Climatic> PICSA> Cumulative/Exceedance Graph
  36. Climatic> PICSA> Crops
  37. Structured> Circular> Circular Plots
  38. Structured> Circular> Wind Rose
  39. Structured> Circular> Wind/Pollution Rose
  40. Structured> Circular> Other Rose Plots

Let’s take a look at how R-Instat does scatterplots. Using the dialog box “Describe> Graphs> Scatter Plot” I chose only the “X” variable and the “Single Variable” (an odd name for the Y variable on this dialog), row facet factor, column facet factor, the type of smoothing fit, and checked a box to plot standard errors. The facets created six “small multiples” of the plot, making comparisons easy. Other R GUIs include the ability to do “large multiples” of plots by any number of other factor variables, e.g. BlueSky’s “Datasets> Group-by” dialog. R-Instat lacks that useful feature. Here is the code that R-Instat wrote, followed by the plot (Fig. 4) it made:

require(ggplot2);
require(ggthemes);
require(stringr);

## [Scatterplot (Points)]
ggplot(data=mydata100, aes(x=pretest,y=posttest)) +
	geom_point() +
	geom_smooth( method ="lm", alpha=1, se=TRUE,) +  
	labs(x="pretest",y="posttest",
title= "Scatterplot for X axis: pretest ,Y axis: posttest ") +
	xlab("pretest") +
	ylab("posttest") + 
	theme_gray() + 
	theme(text=element_text(family="serif",
	face="plain",
	color="#000000",size=12,
	hjust=0.5,vjust=0.5))

Figure 4. A faceted scatterplot created by R-Instat and the ggplot2 package.

There is some naming confusion, in which the menu called this a scatter plot, but the title in the output labeled it a point plot. That could make it hard to recall which menu choice was responsible for making the plot.
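
The generated code shown above contains no facet term, even though the text describes row and column facets. As a sketch of how such facets are usually expressed in ggplot2, the following uses simulated data. The factor names workshop and gender and all the values are assumptions chosen for illustration, and the code assumes ggplot2 is installed.

```r
library(ggplot2)

# Simulated stand-in for the data set; all values are made up
set.seed(42)
scores <- data.frame(
  pretest  = rnorm(120, mean = 75, sd = 5),
  workshop = factor(rep(c("R", "SAS", "SPSS"), each = 40)),
  gender   = factor(rep(c("Female", "Male"), times = 60))
)
scores$posttest <- scores$pretest + rnorm(120, mean = 5, sd = 3)

p <- ggplot(scores, aes(x = pretest, y = posttest)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +                  # linear fit with standard-error band
  facet_grid(rows = vars(workshop), cols = vars(gender))   # 3 rows x 2 columns of panels

p
```

With three levels on the rows and two on the columns, the grid holds six small multiples that share axes, which is what makes them easy to compare.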

R-Instat exports graphs using the “File> Export> Export Graph as Image” menu item. It offers the following file formats, which include almost any format you could need:

  1. JPEG
  2. PNG
  3. BitMap
  4. EPS
  5. Postscript
  6. SVG
  7. WMF
  8. PDF

Modeling

The way statistical models (which R stores in “model objects”) are created and used is the area in which R GUIs differ the most. The simplest and least flexible approach is taken by jamovi and RKWard. They try to do everything you might need in a single dialog box. They either don’t save models, or they do nothing with them. To an R programmer, that sounds extreme since R does a lot with model objects. However, neither SAS nor SPSS could save models for the first 35 years of their existence, so each approach has its merits. For simple models like linear regression, standard compute statements can make predictions. Entering them manually is not much effort, and it saves you from learning what a model object is. However, some of the most powerful model types, such as neural networks, random forests, and gradient-boosting machines, are impossible to enter by hand.
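
The contrast between typing an equation by hand and using a model object can be shown in a few lines. This sketch uses the built-in mtcars data and base R only. The new weights are arbitrary values picked for the example.

```r
# Fit a simple linear regression on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
b <- coef(fit)

# "Compute statement" approach: type the equation by hand
new_wt <- c(2.5, 3.0, 3.5)
by_hand <- b[["(Intercept)"]] + b[["wt"]] * new_wt

# Model-object approach: let R apply the stored model
by_model <- predict(fit, newdata = data.frame(wt = new_wt))

all.equal(by_hand, unname(by_model))
```

For a two-coefficient model the hand-typed version is trivial. For a random forest there is no short equation to type, which is where saved model objects become essential.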

R-Instat’s “Model> Fit Model” menu offers menus named from One Variable to Four Variables. Those counts include the dependent variable, so a three-variable model would allow for two independent variables. By default, additive models are fit, but a Model Operator menu allows you to choose options such as the “:” to generate interactions. However, it does not explain what those operators do, so you must learn what they mean in R if you wish to do anything beyond a simple additive model. A Model Preview window shows you the model it is writing, and it also allows you to type your model in, assuming you know R model syntax.
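
Since the dialog does not explain the operators, here is a brief sketch of what the most common ones mean in an R model formula. It uses terms(), which expands a formula without needing any data. The helper name labels_for and the variable names y, a, and b are placeholders.

```r
# terms() expands a formula into the individual terms R will fit
labels_for <- function(f) attr(terms(f), "term.labels")

labels_for(y ~ a + b)      # additive: "a" "b"
labels_for(y ~ a:b)        # interaction only: "a:b"
labels_for(y ~ a * b)      # main effects plus interaction: "a" "b" "a:b"
labels_for(y ~ a + b - b)  # minus removes a term: "a"
```

In other words, “*” is shorthand for the main effects plus their “:” interaction.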

Each dialog also allows you to choose the distributions from: Normal, Poisson, Gamma, Inverse Gaussian, Quasi, and Quasi-Poisson. Additionally, each allows the choice of link functions: identity, log, logit, cloglog, sqrt, 1/mu^2, Cauchit, Probit, and Inverse. Such complex options are usually divided into separate focused dialogs in the other R GUIs. You make your selections and click the Return button to exit to the main dialog.
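
In R itself, a distribution and link pair is passed to glm() through its family argument. Here is a minimal sketch using the built-in warpbreaks data. Which code R-Instat writes for each menu choice is not shown here, so treat the pairing below as an illustration of the underlying R feature only.

```r
# Poisson distribution with a log link, fit to a built-in data set of counts
pois <- glm(breaks ~ wool + tension, data = warpbreaks,
            family = poisson(link = "log"))

# Same predictors, Gamma distribution with an inverse link
gam <- glm(breaks ~ wool + tension, data = warpbreaks,
           family = Gamma(link = "inverse"))

family(pois)$link
family(gam)$link
```

Not every link is valid for every distribution, and R reports an error when the combination is not supported.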

An alternative approach to model building uses the General menu. It simply offers an Explanatory Model field rather than prompts for first and second predictor variables. The model-building tools are basic, with the Add button moving a variable and other buttons adding operators such as “+” or “/”. Other software, such as jamovi and BlueSky, offer ways to enter a set of variables, each separated by any operator you choose. They also let you add all possible 2-, 3-, or N-way interactions to models, which speeds up your work with complex models.
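
If you type the formula yourself, R has a compact notation for requesting all interactions up to a given order. Here is a short sketch with placeholder variable names, again using terms() to show the expansion without any data.

```r
labels_for <- function(f) attr(terms(f), "term.labels")

# Main effects plus every two-way interaction among three predictors
two_way <- labels_for(y ~ (a + b + c)^2)
two_way

# Raising to the third power adds the three-way term as well
three_way <- labels_for(y ~ (a + b + c)^3)
three_way
```

The second-power form expands to three main effects and three two-way interactions. The third-power form adds a:b:c.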

The third approach to model building is the Hypothesis Test Keyboard, shown in Fig. 5. I’m using it to perform a t-test. This is the dialog used for the most common statistical tests. I opened it using “Model> Fit Model> Hypothesis Test Keyboards.” The “Stats1” R package was the default, and I checked the “Include Arguments” box. When I clicked on the “t” button to perform a t-test, the full R syntax for the “t.test” function appeared in the Test box. I applied my knowledge of R and replaced the “x=” argument that it had offered with “posttest ~ gender” to compare males and females on a posttest score. I used the Add and “~” buttons, but it would have been faster to simply type the formula without that assistance. For such widely-used tests, this dialog offers nothing like the easy-to-use dialogs of all of the other R GUIs reviewed in this series.

Figure 5. The Hypothesis Tests Keyboard set to perform a t-test.
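
For reference, this is roughly what a completed call of that kind looks like when run in R. The data are simulated stand-ins, and the arguments are written out at their default values. The exact text that R-Instat places in the Test box may differ.

```r
# Simulated stand-in data; the variable names follow the example in the text
set.seed(7)
scores <- data.frame(
  gender   = factor(rep(c("Female", "Male"), each = 25)),
  posttest = c(rnorm(25, mean = 82, sd = 6), rnorm(25, mean = 80, sd = 6))
)

# Formula interface with the main arguments written out at their defaults
result <- t.test(posttest ~ gender, data = scores,
                 alternative = "two.sided", mu = 0,
                 var.equal = FALSE, conf.level = 0.95)

result$estimate    # group means
result$conf.int    # confidence interval for the difference in means
```

Leaving var.equal at FALSE gives the Welch version of the test, which does not assume equal variances in the two groups.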

A fourth approach involves the use of a Model Keyboard, shown in Fig. 6. Here, I’m using it to perform a robust linear regression. Knowing that the MASS package offers it, I chose that from the Package menu. I then clicked the Include Arguments checkbox, then I clicked the rlm button. The full syntax is filled in as shown. I typed in “posttest ~ pretest” as my model and clicked OK to run it. However, the default settings of the other arguments generated an error, so I deleted all arguments except for the formula. I did it this way to demonstrate that while R-Instat provides valuable assistance, you must still know the basics of the R language to make the most of it.

Figure 6. The Modelling Keyboard set to perform a robust linear regression using the MASS package’s rlm function.
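
A formula-only call of the kind described above looks like this in plain R. The data are simulated, with a few outliers added so that the robust and ordinary fits differ. The MASS package ships with standard R installations.

```r
library(MASS)   # recommended package included with standard R installations

# Simulated stand-in data; all values are made up
set.seed(3)
scores <- data.frame(pretest = rnorm(60, mean = 75, sd = 5))
scores$posttest <- 5 + scores$pretest + rnorm(60, sd = 3)
scores$posttest[1:3] <- scores$posttest[1:3] + 30   # a few outliers

# Robust fit using only the formula and data, leaving other arguments at defaults
robust   <- rlm(posttest ~ pretest, data = scores)
ordinary <- lm(posttest ~ pretest, data = scores)

rbind(robust = coef(robust), ordinary = coef(ordinary))
```

Comparing the two rows of coefficients shows how much the outliers pull the ordinary least-squares fit.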

The models R-Instat creates are stored in a unique “data book” structure. This approach makes it easy to use models within R-Instat and easy to save them along with the dataset(s) in its databook format. There are also ways to export models from the data books into standard R objects for use outside R-Instat.

The menu item “Model> Compare Models> One Variable” lets you compare two models, but only those that compare a single variable to a distribution. The developers plan to expand this to all model types that R-Instat can create (where mathematically possible).

The “Model> Use Model” menu lets you use saved models to do things like making predictions on new datasets. This is often the main point of creating a model, so I find it surprising that only a few R GUIs provide this capability! This menu also includes the very useful ability to glance, tidy, or augment a model. This is the broom package’s terminology for summarizing a model at the model, parameter, or observation level. The addition of a “by” factor would be helpful for those.
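
The three levels are easier to tell apart with an example. The sketch below uses base R stand-ins rather than the broom functions themselves, so the layout differs from what broom returns. It ends with predictions on a small, invented set of new cases.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

# Model level (what glance summarizes): one row of fit statistics
model_level <- data.frame(r.squared = s$r.squared, sigma = s$sigma, nobs = nobs(fit))

# Parameter level (what tidy summarizes): one row per coefficient
param_level <- as.data.frame(coef(s))

# Observation level (what augment adds): fitted values and residuals per row
obs_level <- cbind(mtcars[c("mpg", "wt", "hp")],
                   .fitted = fitted(fit), .resid = resid(fit))

# Using the saved model on new data
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(100, 150))
preds <- predict(fit, newdata = new_cars)
preds
```

The row counts tell the story: one row for the model, one per coefficient, and one per observation.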

To summarize, R-Instat trades off ease of use for power. While most other GUIs spare you from needing to know any R code, they must provide a dialog for every situation. R-Instat’s approach is very general, allowing you to run a vast array of models from just a handful of dialogs. However, you must know the R packages and how their functions work even when performing basic statistical tests.

Analysis Methods

All of the R GUIs offer a set of statistical analysis methods. Some also offer machine learning methods. As you can see in the list below, R-Instat offers an extensive set of statistical analysis methods but no machine learning. As with R-Instat’s graphics, the same analysis can often be reached in multiple ways. Here is a comprehensive list of its analysis methods in which I list each technique once. Remember that many of these methods simply generate R code that you will need to edit to control options.

  1. Describe> One Variable> Summary stats
  2. Describe> One Variable> Frequencies
  3. Describe> One Variable> Rating data (frequencies & percents on many vars in one table)
  4. Describe> Two Variables> Frequencies does crosstabs but no tests
  5. Describe> Three Variables> Frequencies does crosstabs but no CMH test
  6. Describe> Three Variables> Pivot Table creates interactive tables
  7. Describe> Multivariate> Correlations – Pearson
  8. Describe> Multivariate> Correlations – Nonparametric Kendall, Spearman
  9. Describe> Multivariate> Principal Components
  10. Describe> Multivariate> Canonical Correlations
  11. Hypothesis Tests> Stats1> bartlett
  12. Hypothesis Tests> Stats1> binom
  13. Hypothesis Tests> Stats1> box
  14. Hypothesis Tests> Stats1> chisq
  15. Hypothesis Tests> Stats1> cor
  16. Hypothesis Tests> Stats1> fisher
  17. Hypothesis Tests> Stats1> friedman
  18. Hypothesis Tests> Stats1> kruskal
  19. Hypothesis Tests> Stats1> ks
  20. Hypothesis Tests> Stats1> oneway
  21. Hypothesis Tests> Stats1> poisson
  22. Hypothesis Tests> Stats1> prop
  23. Hypothesis Tests> Stats1> shapiro
  24. Hypothesis Tests> Stats1> t
  25. Hypothesis Tests> Stats1> var
  26. Hypothesis Tests> Stats1> wilcox
  27. Hypothesis Tests> Stats2> ansari
  28. Hypothesis Tests> Stats2> fligner
  29. Hypothesis Tests> Stats2> mantelhaen
  30. Hypothesis Tests> Stats2> mauchly
  31. Hypothesis Tests> Stats2> mcnemar
  32. Hypothesis Tests> Stats2> mood
  33. Hypothesis Tests> Stats2> pairwise.prop
  34. Hypothesis Tests> Stats2> pairwise.wilcox
  35. Hypothesis Tests> Stats2> pairwise.t
  36. Hypothesis Tests> Stats2> power.anova
  37. Hypothesis Tests> Stats2> power.prop
  38. Hypothesis Tests> Stats2> power.t
  39. Hypothesis Tests> Stats2> prop.trend
  40. Hypothesis Tests> Stats2> PP
  41. Hypothesis Tests> Stats2> quade
  42. Hypothesis Tests> Stats2> Clear
  43. Hypothesis Tests> Agricolae> BIB
  44. Hypothesis Tests> Agricolae> duncan
  45. Hypothesis Tests> Agricolae> durbin
  46. Hypothesis Tests> Agricolae> friedman
  47. Hypothesis Tests> Agricolae> kruskal
  48. Hypothesis Tests> Agricolae> LSD
  49. Hypothesis Tests> Agricolae> median
  50. Hypothesis Tests> Agricolae> nonadditivity
  51. Hypothesis Tests> Agricolae> PBIB
  52. Hypothesis Tests> Agricolae> REGW
  53. Hypothesis Tests> Agricolae> scheffe
  54. Hypothesis Tests> Agricolae> SNK
  55. Hypothesis Tests> Agricolae> waerden
  56. Hypothesis Tests> Agricolae> waller
  57. Hypothesis Tests> Verification> binary
  58. Hypothesis Tests> Verification> cat
  59. Hypothesis Tests> Verification> cont
  60. Hypothesis Tests> Coin> oneway
  61. Hypothesis Tests> Coin> wilcox
  62. Hypothesis Tests> Coin> kruskal
  63. Hypothesis Tests> Coin> normal
  64. Hypothesis Tests> Coin> median
  65. Hypothesis Tests> Coin> savage
  66. Hypothesis Tests> Coin> sign
  67. Hypothesis Tests> Coin> wilcoxsign
  68. Hypothesis Tests> Coin> friedman
  69. Hypothesis Tests> Coin> quade
  70. Hypothesis Tests> Coin> taha
  71. Hypothesis Tests> Coin> mood
  72. Hypothesis Tests> Coin> fligner
  73. Hypothesis Tests> Coin> klotz
  74. Hypothesis Tests> Coin> ansari
  75. Hypothesis Tests> Coin> conover
  76. Hypothesis Tests> Coin> spearman
  77. Hypothesis Tests> Coin> quadrant
  78. Hypothesis Tests> Coin> fisyat
  79. Hypothesis Tests> Coin> koziol
  80. Hypothesis Tests> Coin> chisq
  81. Hypothesis Tests> Coin> cmh
  82. Hypothesis Tests> Coin> lbl
  83. Hypothesis Tests> Trend> bartels
  84. Hypothesis Tests> Trend> br
  85. Hypothesis Tests> Trend> bu
  86. Hypothesis Tests> Trend> cs
  87. Hypothesis Tests> Trend> csmk
  88. Hypothesis Tests> Trend> lanzante
  89. Hypothesis Tests> Trend> mk
  90. Hypothesis Tests> Trend> mmk
  91. Hypothesis Tests> Trend> pcor
  92. Hypothesis Tests> Trend> pmk
  93. Hypothesis Tests> Trend> pettitt
  94. Hypothesis Tests> Trend> rrod
  95. Hypothesis Tests> Trend> ssens
  96. Hypothesis Tests> Trend> sens
  97. Hypothesis Tests> Trend> smk
  98. Hypothesis Tests> Trend> snh
  99. Hypothesis Tests> Trend> wm
  100. Hypothesis Tests> Trend> ww
  101. Model> Probability Distributions> Normal
  102. Model> Probability Distributions> Exponential
  103. Model> Probability Distributions> Geometric
  104. Model> Probability Distributions> Weibull
  105. Model> Probability Distributions> Uniform
  106. Model> Probability Distributions> Bernoulli
  107. Model> Probability Distributions> Binomial
  108. Model> Probability Distributions> Poisson
  109. Model> Probability Distributions> Beta
  110. Model> Probability Distributions> Negative Binomial
  111. Model> Probability Distributions> Student’s t
  112. Model> Probability Distributions> von Mises
  113. Model> Probability Distributions> Cauchy
  114. Model> Probability Distributions> Chi Square
  115. Model> Probability Distributions> F
  116. Model> Probability Distributions> Lognormal
  117. Model> Probability Distributions> Gamma
  118. Model> Probability Distributions> Extreme Value
  119. Model> Probability Distributions> Generalized Pareto
  120. Model> Probability Distributions> Gumbel
  121. Modeling> stats> aov
  122. Modeling> stats> ar
  123. Modeling> stats> arima
  124. Modeling> stats> glm
  125. Modeling> stats> lm
  126. Modeling> stats> loess
  127. Modeling> stats> loglin
  128. Modeling> stats> lowess
  129. Modeling> stats> spline
  130. Modeling> stats> nls
  131. Modeling> stats> ppr
  132. Modeling> stats> princomp
  133. Modeling> extRemes> fevd
  134. Modeling> extRemes> levd
  135. Modeling> lme4> glmer
  136. Modeling> lme4> lmer
  137. Modeling> lme4> nlmer
  138. Modeling> MASS> glm.nb
  139. Modeling> MASS> glmmPQL
  140. Modeling> MASS> loglm
  141. Modeling> MASS> polr
  142. Modeling> MASS> rlm
  143. Modeling> MASS> lda
  144. Modeling> MASS> mca
  145. Modeling> MASS> lqs
  146. Modeling> MASS> qda
  147. Structured> Circular> Define
  148. Structured> Circular> Calculator
  149. Structured> Circular> Summaries
  150. Structured> Low Flow> Define
  151. Structured> Survival> Define
  152. Structured> Time Series> Define
  153. Structured> Time Series> Describe> One Variable
  154. Structured> Time Series> Describe> General
  155. Structured> Time Series> Model> One Variable
  156. Structured> Time Series> Model> General
  157. Structured> Climatic
  158. Structured> Procurement
  159. Structured> Options by Context
  160. Climatic> Check Data> Inventory
  161. Climatic> Check Data> Display Daily
  162. Climatic> Check Data> Fill Missing Values
  163. Climatic> Check Data> QC Temperatures
  164. Climatic> Check Data> QC Rainfall
  165. Climatic> Check Data> Homogenization
  166. Climatic> Check Data> Check Station Locations
  167. Climatic> Prepare> Climatic Summaries
  168. Climatic> Prepare> Start of Rains
  169. Climatic> Prepare> End of Rains
  170. Climatic> Prepare> Length of Season
  171. Climatic> Prepare> Spells
  172. Climatic> Prepare> Extremes
  173. Climatic> Prepare> Climdex
  174. Climatic> Prepare> SPI/SPEI
  175. Climatic> Prepare> Evapotranspiration
  176. Climatic> Describe> Rainfall
  177. Climatic> Describe> Temperature
  178. Climatic> Describe> Wind Speed/Direction
  179. Climatic> Describe> Sunshine/Radiation
  180. Climatic> Describe> General
  181. Climatic> NCMP> Indices
  182. Climatic> NCMP> Variogram
  183. Climatic> NCMP> Region Average
  184. Climatic> NCMP> Trend Graphs
  185. Climatic> NCMP> Count Records
  186. Climatic> NCMP> Summary
  187. Climatic> Plot Region
  188. Climatic> Compare> Calculation
  189. Climatic> Compare> Summary
  190. Climatic> Compare> Correlations
  191. Climatic> Compare> Scatterplot
  192. Climatic> Compare> Time Series Plot
  193. Climatic> Compare> Seasonal Plot
  194. Climatic> Compare> Conditional Quantiles
  195. Climatic> Compare> Taylor Diagram
  196. Climatic> Mapping> Maps
  197. Climatic> Mapping> Check Station Locations
  198. Climatic> Model> Extremes
  199. Climatic> Model> Markov Modelling
  200. Climatic> Seasonal Forecast Support> Cumulative/Exceedance Graph
  201. Procurement> Prepare> Define Contract Value Categories
  202. Procurement> Prepare> Recode Numeric into Quantiles
  203. Procurement> Prepare> Use Award Date (or other)
  204. Procurement> Prepare> Summarise Red Flags by Country (or other)
  205. Procurement> Prepare> Summarise Red Flags by Country and Year (or other)
  206. Procurement> Corruption Risk Index (CRI)> Define Corruption Risk Index (CRI)
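To give a sense of the R knowledge the keyword-style entries in this list assume, here is a small sketch of the functions that a few of them appear to correspond to. The mapping from menu entry to function is my assumption based on the names (for example, “Stats1> t” to t.test and “MASS> rlm” to MASS::rlm), and the data are simulated purely for illustration.

```r
# Simulated data, purely for illustration
set.seed(42)
scores <- data.frame(
  group = factor(rep(c("A", "B"), each = 20)),
  x     = rnorm(40, mean = 50, sd = 10)
)
scores$y <- 5 + 0.8 * scores$x + rnorm(40, sd = 4)

# "Stats1> t" (assumed mapping): Welch two-sample t-test with the option spelled out
t_result <- stats::t.test(y ~ group, data = scores, var.equal = FALSE)

# "Stats1> shapiro" (assumed mapping): Shapiro-Wilk normality test
shapiro_result <- stats::shapiro.test(scores$y)

# "Modeling> MASS> rlm" (assumed mapping): robust linear regression
robust_fit <- MASS::rlm(y ~ x, data = scores)

print(t_result)
print(shapiro_result)
print(coef(robust_fit))
```

Options such as var.equal are the kind of detail you would type into the generated code yourself.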

Generated R Code

One of the aspects that most differentiates the various GUIs for R is the code they generate. If you decide to save code, what type is best for you? The base R code, as provided by the R Commander that can teach you “classic” R? The concise functions that mimic the simplicity of one-step dialogs, as jamovi provides? The completely transparent (and complex) code provided by RKWard, which might be the best for budding R power users?

R-Instat writes a blend of custom functions and functions from the popular tidyverse collection of packages. As mentioned previously, it uses ggplot2 for graphics.

Here’s an example of code R-Instat wrote to do a group-by aggregation:

data_book$calculate_summary(data_name="mydata1001", columns_to_summarise=c("pretest","posttest"), factors=c("workshop","gender"), j=1, summaries=c("summary_mean"), silent=TRUE)
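For comparison, here is roughly how the same group-by aggregation might look when written directly with dplyr. This is a sketch, not R-Instat output: it assumes dplyr 1.0 or later is installed, and the small data frame is a made-up stand-in for mydata1001.

```r
library(dplyr)  # assumed to be installed (version 1.0 or later, for across())

# Hypothetical stand-in for the mydata1001 dataset
mydata <- data.frame(
  workshop = factor(c("R", "R", "SAS", "SAS", "R", "SAS")),
  gender   = factor(c("Female", "Male", "Female", "Male", "Female", "Male")),
  pretest  = c(70, 75, 80, 72, 74, 78),
  posttest = c(82, 80, 85, 79, 86, 81)
)

# Mean of each test score within every workshop-by-gender combination
group_means <- mydata %>%
  group_by(workshop, gender) %>%
  summarise(across(c(pretest, posttest), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(group_means)
```

The custom calculate_summary call wraps this kind of operation while also keeping R-Instat’s internal data book up to date.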

Here is an example of code R-Instat wrote to convert my “wide” style dataset to a “long” one. The wide one had measurements at four times, stored in variables named q1 through q4. I wanted those stacked into a single variable named “Score” and the variable names written into a factor called “Time.” The code R-Instat generated is below. It used the tidyr package’s pivot_longer function, just as I would have. However, that is only one of four function calls used; the rest are for internal use of R-Instat. Beginners looking to learn R must sift through these to determine which performed the actual task. When I tried to unstack this new dataset back into its original form, the dialog would not accept the variable Time, since it was not a factor. Given that R-Instat had just created it, it should have made it one to ease a “round trip” conversion, which is a fairly common task to perform (variable selections along the way don’t necessarily result in a complete duplicate of the original dataset). The developers plan to make that a factor in a future release.

# Code generated by the dialog, Stack (Pivot Longer)

mydata1001 <- data_book$get_data_frame(data_name="mydata1001")

mydata1001_stacked <- tidyr::pivot_longer(data=mydata1001, cols=c("q1","q2","q3","q4"), names_to="Time",  values_to="Score")

data_book$import_data(data_tables=list(mydata1001_stacked=mydata1001_stacked))

rm(list=c("mydata1001_stacked", "mydata1001"))
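The “round trip” described above can be sketched in plain tidyr as follows. The data frame and the id column are made up for illustration, and converting Time to a factor by hand is my workaround rather than anything R-Instat generates.

```r
library(tidyr)  # assumed to be installed

# Hypothetical wide dataset: one row per person, four repeated measurements
wide <- data.frame(
  id = 1:3,
  q1 = c(1, 2, 3),
  q2 = c(2, 3, 4),
  q3 = c(3, 4, 5),
  q4 = c(4, 5, 5)
)

# Stack q1 through q4 into Score, recording the source column in Time
long <- pivot_longer(wide, cols = c("q1", "q2", "q3", "q4"),
                     names_to = "Time", values_to = "Score")

# pivot_longer returns Time as character; make it a factor explicitly
long$Time <- factor(long$Time, levels = c("q1", "q2", "q3", "q4"))

# Unstack back to the original layout
wide_again <- pivot_wider(long, names_from = Time, values_from = Score)

print(long)
print(wide_again)
```

In code, pivot_wider accepts Time whether or not it is a factor; the factor requirement is a restriction of the dialog, not of tidyr.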

Below is an example of R-Instat’s code and output for a simple linear regression. The computations use the same functions as any R programmer would choose (i.e., those included with R itself). As before, you need to know which those are to separate them from R-Instat’s internal function calls.

# Code generated by the dialog, Two Variable Fit Model
mydata1001_stacked1 <- data_book$get_data_frame(data_name="mydata1001_stacked1")

attach(what=mydata1001_stacked1)

two_var <- lm(data=mydata1001_stacked1, formula=posttest ~ pretest, na.action=na.exclude)

data_book$add_model(model_name="two_var", model=two_var, data_name="mydata1001_stacked1")
data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var")


Call:
lm(formula = posttest ~ pretest, data = mydata1001_stacked1, 
    na.action = na.exclude)

Coefficients:
(Intercept)      pretest  
     18.665        0.846  


stats::anova(object=data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var"))

Analysis of Variance Table

Response: posttest
           Df Sum Sq Mean Sq F value Pr(>F)
pretest     1   7943    7943     342 <2e-16
Residuals 398   9256      23               

summary(object=data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var"))


Call:
lm(formula = posttest ~ pretest, data = mydata1001_stacked1, 
    na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-9.313 -4.025 -0.435  3.780 11.297 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  18.6647     3.4389    5.43    1e-07
pretest       0.8456     0.0458   18.48   <2e-16

Residual standard error: 4.82 on 398 degrees of freedom
Multiple R-squared:  0.462,	Adjusted R-squared:  0.46 
F-statistic:  342 on 1 and 398 DF,  p-value: <2e-16


detach(name=mydata1001_stacked1, unload=TRUE)

rm(list=c("two_var", "mydata1001_stacked1"))
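Stripped of the data_book bookkeeping and the attach/detach calls, the statistical core of the listing above reduces to three base R functions. The sketch below uses simulated data, so its numbers will not match the output shown; the variable names simply mirror those in the example.

```r
# Simulated stand-in for the pretest/posttest data (illustration only)
set.seed(1001)
mydata <- data.frame(pretest = rnorm(100, mean = 75, sd = 5))
mydata$posttest <- 18 + 0.85 * mydata$pretest + rnorm(100, sd = 5)

# The model itself: the same lm() call R-Instat generated
two_var <- lm(posttest ~ pretest, data = mydata, na.action = na.exclude)

print(two_var)           # coefficients only
print(anova(two_var))    # analysis of variance table
print(summary(two_var))  # full coefficient table and fit statistics
```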

Support for Programmers

Some R GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.

R-Instat has a script window that lets you do basic programming. A “Run” button lets you step through a program one line at a time or click “Run All.” There are no additional features such as syntax color-coding, code-completion suggestions, or search or replace functions. While most GUI users are not likely to write extensive programs, a few more basics would be helpful.

The R-Instat developers view the current script window as primarily for “tweaking” the R command generated by each dialog rather than for writing code from scratch. Dialogs have a “To Script” button that populates the script window with working code, ready to be examined and possibly edited before execution. The developers also offer a short guide called “Reading, tweaking and using R commands” to help learn these steps.

Reproducibility & Sharing

One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or if the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) and written reports so that others can check their work.

As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since the GUI user didn’t write the code, they also can’t read it or change it! Others, such as BlueSky, jamovi, JASP, and RKWard, save the dialog box entries, allowing GUI users reproducibility in their preferred form.

R-Instat’s Output window contains all the code created by the GUI. As mentioned above, it goes above and beyond what most GUIs save by including every interactive change a user might make to the data via manual data entry or right-click menus to convert, say, numeric variables to factors. However, it remembers only the details of the last ten dialogs you run.

If you wish to share your work with a colleague who also uses R-Instat, you would save the contents of your log file (viewable under “View> Log Window”) and send them that script and your dataset. They would then edit the path in the R code to point to the location of the data file on their computer.

To share your work with a colleague who uses RStudio, or a similar IDE, you would send them your log and data files. Your colleague would install R-Instat to get its set of custom functions (these are not in an R package on CRAN, though that is the long-term plan). The script saved from the log window includes a pointer to the location of those functions on the person’s hard drive.

Package Management

A topic related to reproducibility is package management. One of the major advantages to the R language is that it’s very easy to extend its capabilities through add-on packages. However, updates in these packages may break a previously functioning analysis. Years from now, you may need to run a variation of an analysis, which would require you to find the version of R you used, plus the packages you used at the time. As a GUI user, you’d also need to find the version of the GUI that was compatible with that version of R.

Some GUIs, such as the R Commander and Deducer, depend on you to find and install R. For them, the long-term stability problem is yours to solve. Others, such as jamovi, distribute their version of R and all R packages but not their add-on modules. This requires a bigger installation file, but it makes dealing with long-term stability simpler. Of course, this depends on all major versions being around for the long term. Still, for open-source software, multiple archives are usually available to store software, even if the original project is defunct.

R-Instat’s approach to package management is one of the most comprehensive of the R GUIs reviewed here. It provides everything you need in a single download. This includes the R-Instat interface, R itself, and all the necessary R packages. If you have a problem reproducing an R-Instat analysis in the future, you only need to download the version used when you created it.
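If you also run scripts outside of a bundled installation, it can help to store a record of the R and package versions alongside your analysis. The following is a minimal base R sketch; the file location and the choice of packages to record are arbitrary assumptions for illustration.

```r
# Version of R itself
r_version <- R.version.string

# Versions of a few packages used in the analysis (an example selection)
pkg_versions <- vapply(c("stats", "MASS"),
                       function(p) as.character(packageVersion(p)),
                       character(1))

# Write a plain-text record; a temporary file is used here for illustration
session_file <- tempfile(fileext = ".txt")
writeLines(c(r_version,
             paste(names(pkg_versions), pkg_versions),
             capture.output(sessionInfo())),
           session_file)

print(r_version)
print(pkg_versions)
```

Years later, that file tells you which versions to look for in the archives.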

Output & Report Writing

Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into word processing through the use of Markdown or LaTeX. You can now get publication-quality output from BlueSky, Deducer, jamovi, and JASP. You can also get LaTeX output from BlueSky, jamovi, JASP, and RKWard.

Unfortunately, R-Instat’s tabular output is in R’s standard text tables. These must be displayed using a monospaced font to keep the columns lined up. While R packages such as gt, texreg, and xtable exist to convert these tables to publication quality, that step would require writing R code. The R-Instat developers say they plan to add publication-quality output in a future version.
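As a rough idea of the extra coding step involved, here is a sketch that uses the xtable package to turn a regression’s coefficient table into LaTeX or HTML. It assumes xtable is installed and uses the built-in mtcars data; the caption is made up.

```r
library(xtable)  # assumed to be installed

fit <- lm(mpg ~ wt, data = mtcars)

# xtable has a method for lm objects that formats the coefficient table
coef_table <- xtable(fit, caption = "Regression of mpg on weight", digits = 3)

# print.results = FALSE returns the markup as a string instead of printing it
latex_markup <- print(coef_table, type = "latex", print.results = FALSE)
html_markup  <- print(coef_table, type = "html",  print.results = FALSE)

cat(latex_markup)
```

The resulting markup can be pasted into a LaTeX document or opened in a word processor as HTML.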

Group-By Analyses

Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide the ability to select a subset of one group to analyze, then another subset to compare it to. All the R GUIs reviewed in this series can do this task. R-Instat does single-group selections by offering to filter rows using the “Data Options” button in every dialog. It generates a subset that you can analyze in the same way as the entire dataset.

Software also needs the ability to automate such selections so that you might generate dozens of analyses or graphs, one group at a time. This feature has been available in commercial GUIs for decades (e.g., SPSS split-file, SAS BY). R-Instat does not offer such a feature.
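In R code, the equivalent of a split-file or BY statement takes only a few lines, which is what you would need to write in the script window in the meantime. This base R sketch uses the built-in mtcars data, with the number of cylinders standing in for a grouping variable.

```r
# Split the dataset into one data frame per group
by_cyl <- split(mtcars, mtcars$cyl)

# Repeat a descriptive summary for each group
group_summaries <- lapply(by_cyl, function(d) summary(d$mpg))

# Repeat a model for each group
group_models <- lapply(by_cyl, function(d) lm(mpg ~ wt, data = d))

# Collect one number per group, here the slope for wt
slopes <- sapply(group_models, function(m) coef(m)[["wt"]])

print(group_summaries)
print(slopes)
```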

Output Management

Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g., predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g., SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, with the simplification that GUIs provide, that presents a challenge.

Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but the predicted values (for example) would then come from individual models for each group rather than one model based on the entire original data set (perhaps with group included as a set of indicator variables).

Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)

For example, say our organization has 250 departments, and we want to see if any show a gender bias in salary. We write all 250 regression models to a data set and then search for those whose gender parameter is significant (hoping to find none, of course!).

R-Instat does all three levels of output management. To use this function, choose “Model> Use Model> Glance/Tidy/Augment.” While the code to repeat this for the levels of one or more grouping variables is fairly easy to implement, the dialog doesn’t currently offer that feature.
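The department example can be sketched in code as follows. This is my illustration of the idea, not code produced by R-Instat: it assumes the dplyr and broom packages are installed, simulates the salary data, and deliberately plants a gap in one hypothetical department so the search has something to find.

```r
library(dplyr)  # assumed to be installed
library(broom)  # assumed to be installed

# Simulated salaries: 250 departments, 40 people each (illustration only)
set.seed(250)
n_dept <- 250
n_per  <- 40
salaries <- data.frame(
  department = rep(sprintf("dept_%03d", 1:n_dept), each = n_per),
  gender     = factor(rep(c("Female", "Male"), times = n_dept * n_per / 2)),
  salary     = rnorm(n_dept * n_per, mean = 60000, sd = 5000)
)

# Plant a gender gap in one department
biased <- salaries$department == "dept_007" & salaries$gender == "Male"
salaries$salary[biased] <- salaries$salary[biased] + 15000

# One regression per department, saved as a parameter-level data set
parameter_level <- salaries %>%
  group_by(department) %>%
  group_modify(~ tidy(lm(salary ~ gender, data = .x))) %>%
  ungroup()

# Search for departments whose gender parameter is significant
flagged <- parameter_level %>%
  filter(term == "genderMale", p.value < 0.05)

print(flagged)
```

With 250 tests, some departments will fall below 0.05 by chance alone, so an adjustment such as p.adjust() may be appropriate before drawing conclusions.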

Developer Issues

The R-Instat team welcomes people who are willing to contribute to the project. You can submit bug reports or even copy the entire set of source code at the project’s GitHub site: https://github.com/africanmathsinitiative/R-Instat/. Information and guides for developers and contributors are available on the GitHub Wiki: https://github.com/africanmathsinitiative/R-Instat/wiki

Conclusion

R-Instat offers one of the most extensive collections of data wrangling, graphics, and statistical analysis methods of any R GUI. Its data wrangling dialogs are simple to use and require no knowledge of R code. At a basic level, its graphics and modeling dialogs are also easy to use. However, its dependence on R code for basic statistical methods such as chi-squared or t-tests makes it far more difficult to use than other R GUIs. To use its full modeling capabilities, you need to know what R’s packages (e.g., MASS) are and what each one’s functions (e.g., rlm) do. For an R programmer, recognizing a known package::function combination is much easier than recalling it without assistance. Such a user would find R-Instat’s GUI extremely helpful. R-Instat’s ability to add ggplot2 layers allows you to create a graph of nearly unlimited flexibility. But you need to learn the difference between functions like geom_line and geom_smooth to take full advantage of it.

R-Instat’s offering in climate analysis is unique among R GUIs, and a quick search on Google Scholar shows that it is being widely used with such data. R-Instat focuses on frequentist statistics rather than Bayesian, and it does not yet offer any machine learning or artificial intelligence methods. R-Instat’s developers are currently working to include some machine learning methods using the caret package, particularly for teaching data science.

R-Instat’s output is in standard R text tables, rather than the journal-style word processing tables that are such a time-saver in other R GUIs.

If you have an R programming background or want to learn R code, R-Instat may be just what you need to get started.

The R-Instat team wrote a detailed response to this review, which is online here. For a summary of all my R GUI software reviews, see the article, R Graphical User Interface Comparison.

Acknowledgments

Thanks to the R-Instat team for all their hard work and for making it free and open source. Thanks to David Stern, Roger Stern, and Danny Parsons for clarifying many aspects of R-Instat. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.
