A Comparative Review of the R-Instat GUI for R

by Robert A. Muenchen

Introduction

R-Instat is a free and open source graphical user interface for the R software that focuses on people who want to point-and-click their way through data science analyses. Written in Visual Basic, it is currently only available for Microsoft Windows. However, a Linux version is in development using the cross-platform Mono implementation of the .NET framework.

This post is one of a series of reviews that aim to help non-programmers choose the Graphical User Interface (GUI) that is best for them. Although I wrote the BlueSky User’s Guide, I hope to remain objective in these reviews. There is no one perfect user interface for everyone; each GUI for R has features that appeal to a different set of people.

Terminology

There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.

Installation

The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as jamovi or RKWard, install in a single step. Others, such as Deducer, install in multiple steps (up to seven steps, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The help desks at most universities are flooded with such calls at the beginning of each semester!

R-Instat is easy to install, requiring only a single step. It provides its own embedded copy of R. This simplifies the installation and ensures complete compatibility between R-Instat and the version of R it’s using. However, it also means if you already have R installed, you’ll end up with a second copy. You can have R-Instat control any version of R you choose, but if the version differs too much, you may run into occasional problems.

Plug-in Modules

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” that add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Rattle, Deducer) through medium (JASP 15) to high (jamovi 43, R Commander 43).

While the R-Instat project welcomes contributions from anyone, there are not any modules to add at this time. All of its capabilities are included in its initial installation.

Startup

Some user interfaces for R, such as jamovi or JASP, start by double-clicking on a single icon, which is great for people who prefer to not write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and then finally call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.

You start R-Instat directly by double-clicking its icon from your desktop or choosing it from your Start Menu (i.e., not from within R).

Data Editor

A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.

R-Instat starts up by showing its screen (Fig. 1). Under Start, I chose “New Data Frame” and it showed me the rather perplexing dialog shown in Fig. 2.

As an R user, I know what expressions are, but what did the R-Instat designers mean by the term?

Clicking the “Construct Examples” button brought up the suggestions shown in Fig. 3. These are standard R expressions, which came as quite a surprise! It seems that the R-Instat designers want people to start using R programming code immediately.

Clicking the Help button brings up the advice, “the simplest option is Empty” (the developers say this will become the default in a future version). Clicking that button brings up a simple prompt for the number of rows and columns you would like to create. After that, you’re looking at a basic spreadsheet (Fig. 4) that easily lets you enter data. As you enter data, it determines if it is numeric or character. Scientific notation is accepted, but dates are saved as character variables. Logical values (TRUE, FALSE) are recognized as such and are stored appropriately.

Right-clicking on any column allows you to convert variables to be factor, ordered factor, numeric, logical, or character. These changes are recorded as function calls to a custom “convert_column_to_type” function for reproducibility. Such interactive changes are not usually recorded by other R GUIs. Date/time conversion is not available on that menu, as that process is trickier. Those conversions are on the “Prepare> Column Date” menu item. Other things you can do from the right-click menu are: rename, duplicate, reorder, set levels/labels, sort, and filter/remove filter.

The class of each variable is indicated by a character code that follows each variable name in parentheses: (C) for character, (F) for factor, (O.F) for ordered factor, (D) for date, (L) for logical. When no code follows a variable name, it is numeric.
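For readers curious about what such conversions look like in plain R, here is a minimal sketch using base R functions. It is not the code R-Instat generates (that goes through its own “convert_column_to_type” function), and the data frame and column names are made up for illustration.

```r
# Hypothetical example data; the column names are invented for illustration.
df <- data.frame(
  gender = c("f", "m", "f"),
  rating = c("low", "high", "medium"),
  score  = c("1.5", "2e3", "7"),
  stringsAsFactors = FALSE
)

# character -> factor
df$gender <- factor(df$gender)

# character -> ordered factor, with the level order stated explicitly
df$rating <- factor(df$rating,
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)

# character -> numeric (scientific notation such as "2e3" is accepted)
df$score <- as.numeric(df$score)

# Show the first class of each column
sapply(df, function(col) class(col)[1])
```

Because the conversions are written as code, rerunning the script reproduces them exactly, which is the same benefit R-Instat gets from logging its own function calls.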

Figure 4. The R-Instat Data View (left) and Output Window (right).

The name of the dataset appears on a tab at the bottom of the Data View window. This lets you easily manage multiple datasets, an ability that is popular among professionals, but which is rarely offered in R GUIs (BlueSky and R Commander are the only others that offer it).

Once the dataset is saved, you can add rows or columns at any position in the data frame by choosing “Prepare > Data Frame > Insert rows/columns.” New columns can be added with a specified default value, which can be a big time-saver when entering blocks of related data.

There is a quicker method that works for inserting new rows. You right-click the row numbers and a pop-up menu will allow you to insert rows above or below, and the number of rows selected is the number of rows added – like in Excel.

When editing data, R-Instat lets you type new values on top of the old. As soon as you press the Enter key, it generates R code to execute the change. For example, in a language variable, when changing the value “English” to “Spanish,” it wrote,

# Replace Value in Data
data_book$replace_value_in_data(data_name="wakefield", col_name="Language", rows="78", new_value="Spanish")

This is important for reproducibility, but R-Instat is the only GUI reviewed here that tracks such important manual changes. In fact, even among expensive proprietary software, Stata is the only one that I’m aware of that keeps track of such changes using code.
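For comparison, here is a rough sketch of how the same kind of single-cell edit could be scripted in base R. It is only an illustration of the idea, not what R-Instat runs; the small data frame below is a hypothetical stand-in for the “wakefield” data.

```r
# Hypothetical stand-in for the "wakefield" data frame named in the logged call.
wakefield <- data.frame(
  ID = 1:3,
  Language = c("English", "English", "Spanish"),
  stringsAsFactors = FALSE
)

# Change one cell by row position and column name. Keeping this line in a
# script records the edit instead of leaving it as an untracked manual change.
wakefield[2, "Language"] <- "Spanish"
wakefield
```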

If you have another data set to enter, you can restart the process by choosing “File> New Data…” again. You can switch between data sets simply by clicking on a data set’s tab, and its window will pop to the front for you to see. When doing analyses, or saving data, the data set that is displayed in the editor does not influence what appears in dialog boxes. That means that you can be looking at one dataset while analyzing another! Since each dialog allows you to choose the dataset to use, that is technically not a problem, but if you have several datasets that contain the same variable names, remember that what you see may not be what you get! That’s the opposite of BlueSky Statistics, which automatically analyzes the dataset you see. R-Instat’s ability to work with multiple datasets in a single instance of the software is not a feature found in all R GUIs. For example, jamovi and JASP can only work with a single dataset at a time.

Saving the data is done with a fairly standard “File> Save As> Save Dataset As” menu. By default it will save all open datasets, filters, graphs, and models to a single file called a “data book.” That makes complex projects much easier to open and close.

Data Import

R-Instat supports the following file formats, most of which are automatically opened using “File> Import from File”. The ODK and NetCDF file formats have their own Import menus. R-Instat’s ability to open many formats related to climate science hints at what the software excels at. For details, see the Analysis Methods section below.

  1. Comma Separated Values (.csv)
  2. Plain text files (.txt)
  3. Excel (old and new xls file types)
  4. xBASE database files (dBase, etc.)
  5. SPSS (.sav)
  6. SAS binary files (sas7bdat and *.xpt)
  7. Standard R workspace files (RData, but it just opens one dataframe of its choosing)
  8. Open Data Kit (ODK)
  9. OpenRefine
  10. Network Common Data Form (NetCDF)
  11. SST Sea Surface Temperature formatted files
  12. IRI Data Library (API download)
  13. Climate Data Store (CDS) (API download)
  14. Shapefile
  15. Climsoft (Climatic database)
  16. .dly (ASCII files)
  17. .dat (ASCII files)
  18. Tab Separated Values (.tsv)
  19. Stata (.dta)
  20. JSON (.json)
  21. epiinfo (.rec)
  22. Minitab (.mtb)
  23. Systat (.syd)
  24. CSV with a YAML metadata header (.csvy)
  25. Feather R/Python interchange format (.feather)
  26. Pipe separated files (.psv)
  27. YAML (.yml)
  28. Weka Attribute-Relation File Format (.arff)
  29. Data Interchange Format (.dif)
  30. OpenDocument Spreadsheet (*.ods)
  31. Shallow XML documents (*.xml)
  32. Single-table HTML documents (*.html)

Data Export

The ability to export data to a wide range of file types helps when you, or other members of your research team, have to use multiple tools to complete a task. Unfortunately, this is a very weak area for most R GUIs. Deducer offers no data export at all, and R Commander and Rattle can export only delimited text files (an earlier version of this listed jamovi as having very limited data export; that has now been expanded).

R-Instat has extensive export facilities. Multiple data sets can be exported:

  • In a single step, as a set of files
  • As a single Excel file with one data set per sheet 
  • As a list of data frames stored as a single RDS or RData file
  • As a single HTML file with data sets end to end

The file formats it can export to include:

  1. Comma separated file (*.csv)
  2. Excel files (*.xlsx)
  3. TAB-separated data (*.tsv)
  4. Pipe-separated data (*.psv)
  5. Feather R/Python interchange format (*.feather)
  6. Fixed-Width format data (*.fwf)
  7. Serialized R objects (*.rds)
  8. Saved R objects (*.RData)
  9. JSON (*.json)
  10. YAML (*.yml)
  11. Stata (*.dta)
  12. SPSS (*.sav)
  13. xBASE database files (*.dbf)
  14. Weka Attribute-Relation File Format (*.arff)
  15. R syntax object (*.R)
  16. XML (*.xml)
  17. HTML (*.html)
  18. Matlab (*.mat)
  19. SAS (*.sas7bdat)
  20. SAS XPORT (*.xpt)

Data Management

It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g. from wide to long and back). A critically important aspect of data management is the ability to transform many variables at once. For example, social scientists need to recode many survey items; biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time is tedious work. Some GUIs, such as jamovi and RKWard, handle only a few of these functions. Others, such as BlueSky and R Commander, handle nearly all of them.
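To make “transforming many variables at once” concrete, here is a small base R sketch of the biologist’s case: taking logarithms of several columns in one step. The data frame and variable names are hypothetical, and this is not the code any particular GUI generates.

```r
# Hypothetical data: an ID column plus three positive measurements.
bio <- data.frame(
  id     = 1:4,
  mass   = c(1, 10, 100, 1000),
  length = c(2, 4, 8, 16),
  width  = c(5, 25, 125, 625)
)

vars <- c("mass", "length", "width")

# Apply log10() to every listed column and store the results
# in new columns named log_mass, log_length, and log_width.
bio[paste0("log_", vars)] <- lapply(bio[vars], log10)
names(bio)
```

The same pattern works for any function applied column by column, so one line replaces what would otherwise be one dialog visit per variable.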

R-Instat offers one of the most comprehensive sets of data management tools of any R GUI. Its dialogs do not require any knowledge of R code. A unique feature is found in “Prepare> Check Data> Non-Numeric Values.” That detects common data entry errors in which the letter “O” is entered in place of a zero, or a lower case letter “l” in place of the number one. It offers to mark where those errors occur and optionally create a copy of the dataset with those observations deleted. Here is the list of methods it offers (some are repeated under different menus; I’ve tried to eliminate duplicates):

  1. Prepare> Data Frame> Rename Column
  2. Prepare> Data Frame> Duplicate Column
  3. Prepare> Data Frame> Row Numbers/Names
  4. Prepare> Data Frame> Sort
  5. Prepare> Data Frame> Filter
  6. Prepare> Data Frame> Column Selection
  7. Prepare> Data Frame> Replace Values
  8. Prepare> Data Frame> Convert Columns
  9. Prepare> Data Frame> Reorder Columns
  10. Prepare> Data Frame> Insert Columns/Rows
  11. Prepare> Data Frame> Hide/Show Columns
  12. Prepare> Data Frame> Column Structure
  13. Prepare> Data Frame> Colour by Property
  14. Prepare> Check Data> Visualize Data
  15. Prepare> Check Data> Duplicates
  16. Prepare> Check Data> Compare Columns
  17. Prepare> Check Data> Non-Numeric Values
  18. Prepare> Check Data> Anonymise ID Column
  19. Prepare> Column Calculator
  20. Prepare> Column Numeric> Regular Sequence
  21. Prepare> Column Numeric> Enter
  22. Prepare> Column Numeric> Row Summaries
  23. Prepare> Column Numeric> Transform
  24. Prepare> Column Numeric> Polynomials
  25. Prepare> Column Numeric> Random Samples
  26. Prepare> Column Numeric> Permute Columns
  27. Prepare> Column Factor> Convert to Factor
  28. Prepare> Column Factor> Recode to Numeric
  29. Prepare> Column Factor> Count in Factor
  30. Prepare> Column Factor> Recode Factor
  31. Prepare> Column Factor> Combine Factors
  32. Prepare> Column Factor> Dummy Variables
  33. Prepare> Column Factor> Levels/Labels
  34. Prepare> Column Factor> View Labels
  35. Prepare> Column Factor> Reorder Levels
  36. Prepare> Column Factor> Reference Level
  37. Prepare> Column Factor> Unused Levels
  38. Prepare> Column Factor> Contrasts
  39. Prepare> Column Factor> Factor Data Frame
  40. Prepare> Column Text> Find/Replace
  41. Prepare> Column Text> Transform
  42. Prepare> Column Text> Split
  43. Prepare> Column Text> Combine
  44. Prepare> Column Text> Distance
  45. Prepare> Column Date> Generate Dates
  46. Prepare> Column Date> Make Date
  47. Prepare> Column Date> Fill Date Gaps
  48. Prepare> Column Date> Use Date
  49. Prepare> Column Define> Convert Columns
  50. Prepare> Column Define> Circular
  51. Prepare> Data Reshape> Column Summaries
  52. Prepare> Data Reshape> General Summaries
  53. Prepare> Data Reshape> Stack (Pivot Longer)
  54. Prepare> Data Reshape> Unstack (Pivot Wider)
  55. Prepare> Data Reshape> Merge
  56. Prepare> Data Reshape> Append (Bind Rows)
  57. Prepare> Data Reshape> Subset
  58. Prepare> Data Reshape> Random Subset
  59. Prepare> Data Reshape> Transpose
  60. Prepare> Data Reshape> Scale/Distance
  61. Prepare> Keys and Links> Add Key
  62. Prepare> Keys and Links> View and Remove Keys
  63. Prepare> Keys and Links> Add Link
  64. Prepare> Keys and Links> View and Remove Links
  65. Prepare> Keys and Links> Add Comment
  66. Prepare> Data Object> Rename Data Frame
  67. Prepare> Data Object> Reorder Data Frames
  68. Prepare> Data Object> Copy Data Frame
  69. Prepare> Data Object> Delete Data Frames
  70. Prepare> Data Object> Hide/Show Data Frames
  71. Prepare> Data Object> Metadata
  72. Prepare> R Objects> View
  73. Prepare> R Objects> Rename
  74. Prepare> R Objects> Reorder
  75. Prepare> R Objects> Delete
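
The “Prepare> Check Data> Non-Numeric Values” item in the list above looks for entries such as the letter “O” typed for zero. As a rough illustration of the idea only (this is an assumption about the general approach, not R-Instat’s actual implementation), a base R check might look like this, using made-up values:

```r
# Hypothetical column in which "O" was typed for zero and "l" for one.
entered <- c("10", "2O", "35", "l7", "4.2")

# as.numeric() returns NA for anything that is not a valid number;
# the coercion warning is expected here, so it is suppressed.
as_num <- suppressWarnings(as.numeric(entered))

# Flag values that were present but could not be read as numbers.
bad <- is.na(as_num) & !is.na(entered)

which(bad)     # positions of the suspect entries
entered[bad]   # the offending values themselves
```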

Menus & Dialog Boxes

The goal of pointing & clicking your way through analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to it and hence require more learning.

R-Instat uses standard menu choices for running steps listed on its menus. Dialog boxes appear and you select variables to place into their various roles. Each dialog allows you to choose the dataset to use on a drop-down menu. Variables appear in the usual variable list box and clicking an Add button moves the selected variables to their roles. You cannot drag variables to the various role boxes. One role box is highlighted in blue when the dialog first appears, indicating where variables will be added. You can choose a different role box by clicking on it or using the tab key to sequence through the boxes. Once a box for a single variable is filled it automatically highlights the next variable box.

The Describe and Model menus offer a useful labeling system of “One Variable”, “Two Variables,” “Three Variables,” and “Four Variables.” These counts include the dependent variables, so the “Three Variables” menu would allow for a model with two predictors. The default settings vary depending on the class of the variable. For example, with a numeric variable and a 2-level factor, the “Two Variables” analysis would do either a t-test or logistic regression, depending on which variable was entered first. That’s nice, but people looking on menus for t-test or logistic regression will have to catch on to this approach. While the “Four Variables” items appear, they are not yet implemented.

Once assigned, you can remove individual variables from a role assignment box. For a single variable box, press the backspace key or right-click and choose Remove to clear the box. For a multiple-variable box, select one or more variables and press the backspace key, or right-click and choose Remove for one, or Clear to remove all.

The output is saved by using the standard “File > Save As” menu. The only supported output format is rich text format (RTF). However, R-Instat uses that only for setting the font style and color. The output tables are stored in the monospaced Courier New font, not in true word processing tables. That is a significant failing as BlueSky, jamovi, and JASP all offer true word processing tables in the style many journals prefer. That saves you a great deal of report preparation time.

Documentation & Training

R-Instat’s documentation is listed on this web page: http://r-instat.org/ReleaseNotes.html. It includes several written and video tutorials.

Help

R GUIs provide simple task-by-task dialog boxes that generate much more complex code. So for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exceptions are jamovi, which offers no help, and the R Commander, which lacks help on the dialog boxes themselves.

R-Instat’s help files are accessed by a “Help” button on each dialog box. Unfortunately, most of these lead to empty placeholders to be filled in future versions. Those that are already filled, e.g. for renaming variables, are clear about how the dialog box works but provide no documentation on how to control the R functions that it uses. The developers say all three levels of help are planned for a future release.

Graphics

The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as jamovi, use their own functions and integrate them into analysis steps. GUIs also differ quite a lot in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. That’s how jamovi works, but then jamovi is limited to its custom graph functions, as nice as they may be.

You can generate plots in R-Instat using two different perspectives: by focusing on the number of variables involved, or instead by focusing on the type of plot you wish to create. Since limiting the number of variables also limits the types of graphs that are usually considered appropriate, some may view that as the easier approach. In the “Describe> One/Two/Three Variables” menus, each provides a Graph menu choice that matches each number of variables. By choosing one variable, you’re offered bar charts, histograms, and the like. When choosing two variables, then R-Instat offers others, such as scatterplots or line plots.

You can create the same graphs by forgetting about the number of variables and instead focusing on the type of graph first. You do that under the “Describe> Specific Tables/Graphs” menu. There you will find a list of common graph types directly, such as bar, line, or scatter.

Regardless of which approach you choose to create a graph, R-Instat does most of its plots using the popular ggplot2 package. If you wish to learn R code for graphics, that’s the code you’ll see. Each graph dialog offers a “Plot Options” menu that allows you to modify the graphs in flexible and powerful ways. However, to do so requires an understanding of the Grammar of Graphics concepts, upon which R’s ggplot2 package is built. A comprehensive description of that is beyond our current scope, but it includes such complexities as the fact that a pie chart is just a bar chart with Cartesian coordinates swapped out for polar ones!
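The pie chart remark is easier to see in code. Here is a minimal ggplot2 sketch with made-up counts; it is written by hand for illustration and is not output from R-Instat.

```r
library(ggplot2)

# Hypothetical counts for illustration.
counts <- data.frame(group = c("A", "B", "C"), n = c(10, 20, 30))

# A single stacked bar drawn in Cartesian coordinates...
bar <- ggplot(counts, aes(x = "", y = n, fill = group)) +
  geom_col(width = 1)

# ...becomes a pie chart when the y axis is mapped to the angle.
pie <- bar + coord_polar(theta = "y")

# Printing `bar` or `pie` draws the corresponding plot.
```

The only difference between the two plots is the coordinate system, which is the kind of concept the “Plot Options” menu expects you to understand.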

One of R-Instat’s unique and useful features is its ability to save any graph, and then combine several into a single image. That makes publishing multiple graphs much easier! Currently, no other R GUI offers this feature, though the code is quite easy, so I expect others will add it eventually.

One oddity of R-Instat’s graphics is that it defines “first variable” as y and second as x. That’s the reverse of ggplot2’s aes function. In this case, having knowledge of R’s graphics code works against you in R-Instat. The developers say a future version will switch this notation to be “y” and “x.”

Compiling a list of the plots R-Instat can do is challenging since it offers a nearly unlimited range. Here is an attempt to list the popular ones that were relatively easy to figure out how to do. Given its ability to combine plots in layers, these can be combined in many ways.

  1. Bar Chart of counts or pre-summarized values
  2. Boxplot
  3. Contour
  4. Density (continuous)
  5. Density (counts)
  6. Dumbbell
  7. Dot chart
  8. Frequency charts (factors)
  9. Frequency charts (numeric)
  10. Heatmap
  11. Histogram
  12. Line Chart
  13. Line Chart, stair-step plot
  14. Line Chart, variable order
  15. Maps: World Map
  16. Mosaic
  17. Parallel Coordinate Plot
  18. Pie Chart
  19. Plot of Means
  20. Polar Coordinate Plots
  21. P-P Plots
  22. Q-Q Plots
  23. Scatterplot
  24. Scatterplot matrix
  25. Strip Chart
  26. TreeMap
  27. Violin Plot
  28. Word Cloud
  29. Visualize dataset by variable type & missing values
  30. Stacked rating data
  31. Barchart of Likert variable percents
  32. Structured> Circular> Circular Plots
  33. Structured> Circular> Wind Rose
  34. Structured> Circular> Wind/Pollution Rose
  35. Structured> Circular> Other Rose Plots

Let’s take a look at how R-Instat does scatterplots. Using the dialog box “Describe> Specific Tables/Graphs> Point (Scatter) Plot” I chose only the “X” variable and the “Single Variable” (an odd name for the Y variable on this dialog), row facet factor, column facet factor, the type of smoothing fit, and checked a box to plot standard errors. The facets created six “small multiples” of the plot, making comparisons easy. Other R GUIs include the ability to do “large multiples” of plots by any number of other factor variables, e.g. BlueSky’s “Datasets> Group-by” dialog. R-Instat lacks that useful feature. Here is the code that R-Instat wrote, followed by the plot (Fig. 5) it made:

require(ggplot2);
require(ggthemes);
require(stringr);

## [Scatterplot (Points)]
ggplot(data=mydata100, aes(x=pretest,y=posttest)) +
	geom_point() +
	geom_smooth( method ="lm", alpha=1, se=TRUE,) +  
	labs(x="pretest",y="posttest",
title= "Scatterplot for X axis: pretest ,Y axis: posttest ") +
	xlab("pretest") +
	ylab("posttest") + 
	theme_gray() + 
	theme(text=element_text(family="serif",
	face="plain",
	color="#000000",size=12,
	hjust=0.5,vjust=0.5))

Figure 5. A faceted scatterplot created by R-Instat and the ggplot2 package.

R-Instat exports graphs using “File> Export> Export Graph as Image.” It offers the following file formats, which include almost any format you could need:

  1. JPEG
  2. PNG
  3. BitMap
  4. EPS
  5. Postscript
  6. SVG
  7. WMF
  8. PDF

Modeling

The way statistical models (which R stores in “model objects”) are created and used is an area in which R GUIs differ the most. The simplest, and least flexible approach, is taken by jamovi, JASP, and RKWard. They try to do everything you might need in a single dialog box. They either don’t save models, or they do nothing with them. To an R programmer, that sounds extreme, since R does a lot with model objects. However, neither SAS nor SPSS was able to save models for the first 35 years of their existence, so each approach has its merits. For simple models like linear regression, standard compute statements can make predictions. Entering them manually is not much effort, and it saves you from having to learn what a model object is. However, some of the most powerful model types are impossible to enter by hand, such as neural networks, random forests, and gradient boosting machines.

R-Instat’s “Model> Fit Model” menu offers menus named from One Variable to Four Variables. Those counts include the dependent variable, so a three-variable model would allow for two independent variables. By default, additive models are fit, but a Model Operator menu allows you to choose options such as the “:” to generate interactions. However, it does not explain what those operators do, so you need to learn what they mean in R if you wish to do anything beyond a simple additive model. A Model Preview window not only shows you the model it is writing, but it also allows you to just type your model in, assuming you know R model syntax.
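Since the dialog does not explain the operators, here is a short sketch of what “+”, “:”, and “*” mean in standard R model formulas. It uses R’s built-in mtcars data, with variables chosen only for illustration; it is not code generated by R-Instat.

```r
# Main effects only: the default additive model.
additive <- lm(mpg ~ wt + hp, data = mtcars)

# ":" adds the interaction between wt and hp to the main effects.
with_inter <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# "*" is shorthand for main effects plus their interaction,
# so this fits the same model as the line above.
shorthand <- lm(mpg ~ wt * hp, data = mtcars)

names(coef(additive))
names(coef(shorthand))
```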

Each dialog also allows you to choose the distributions from: Normal, Poisson, Gamma, Inverse Gaussian, Quasi, and Quasi-Poisson. Additionally, each allows the choice of link functions: identity, log, logit, cloglog, sqrt, 1/mu^2, Cauchit, Probit, and Inverse. You make your selections and click the Return button to exit to the main dialog.

An alternative approach to model building uses the General menu. It simply offers an Explanatory Model field rather than prompts for first and second predictor variables. The model building tools are very basic, with the Add button moving a variable and other buttons adding operators such as “+” or “/”. Other software, such as jamovi and BlueSky, offer ways to enter a set of variables, each separated by any operator you choose. They also let you add all possible 2-, 3-, or N-way interactions to models, which really speeds your work with complex models.

The third approach to model building comes in the form of the Hypothesis Test Keyboard, shown in Fig. 6. I’m using it to perform a t-test. I opened it using “Model> Fit Model> Hypothesis Test Keyboards.” The “Stats1” R package was the default, and I checked the “Include Arguments” box. When I clicked on the “t” button, the full R syntax for the “t.test” function appeared in the Test box. I applied my knowledge of R and replaced the “x=” argument that it had offered with “posttest ~ gender” to compare males and females on a posttest score. I used the Add and “~” buttons, but it would have been faster to simply type the formula without that assistance.

Figure 6. The Hypothesis Tests Keyboard set to perform a t-test.
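
For reference, typing the formula version of this test directly in R would look roughly like the sketch below. The data frame is a small, made-up stand-in for the posttest and gender variables, so the numbers mean nothing; only the call pattern matters.

```r
# Hypothetical stand-in data for the posttest and gender variables.
scores <- data.frame(
  posttest = c(82, 75, 90, 68, 88, 79, 85, 71),
  gender   = factor(rep(c("Female", "Male"), each = 4))
)

# The formula interface: outcome ~ grouping factor.
# By default, t.test() runs Welch's two-sample test.
result <- t.test(posttest ~ gender, data = scores)

result$estimate   # the two group means
result$p.value
```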

A fourth approach involves the use of a Model Keyboard, shown in Fig. 7. Here I’m using it to perform a robust linear regression. Knowing that the MASS package offers it, I chose that from the Package menu. I then clicked the Include Arguments checkbox, then I clicked the rlm button. The full syntax is filled in as shown. I typed in “posttest ~ pretest” as my model and clicked OK to run it. However, the default settings of the other arguments generated an error, so I deleted all arguments except for the formula. I did it this way to demonstrate that while R-Instat is providing valuable assistance, you must still know the basics of the R language to make the most of it.

Figure 7. The Modelling Keyboard, set to perform a robust linear regression using the MASS package’s rlm function.
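
Here is a sketch of the stripped-down call described above, with only the formula (plus a data argument) supplied to rlm. The scores are made up, including one deliberately outlying posttest value, so this only shows the shape of the call and not the results from the review.

```r
library(MASS)

# Hypothetical pretest/posttest scores with one outlying posttest value (40).
scores <- data.frame(
  pretest  = c(60, 65, 70, 75, 80, 85, 90, 95),
  posttest = c(64, 70, 73, 81, 84, 40, 95, 99)
)

# Only the formula and data are given; every other argument keeps its
# default, which for rlm() means Huber M-estimation.
fit <- rlm(posttest ~ pretest, data = scores)
coef(fit)
```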

The models R-Instat creates are stored in a “data book” structure, which is unique. This approach makes it easy to use models within R-Instat, and easy to save them along with the dataset(s) in its data book format. There are also ways to export models from the data books into standard R objects for use outside of R-Instat.

The menu item, “Model> Compare Models> One Variable” lets you compare two models, but only those that model a single variable to a distribution. The developers plan to expand this to all model types that R-Instat can create (where mathematically possible, of course).

The “Model> Use Model” menu lets you use saved models to do things like making predictions on new datasets. This is often the main point of creating a model, so I find it surprising that only a few R GUIs provide this capability! This menu also includes the very useful ability to glance, tidy, or augment a model. This is the broom package’s terminology for summarizing a model at the model, parameter, or observation level. The addition of a “by” factor would be helpful on those.
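
As a plain R illustration of why saved models matter, the sketch below fits a model once and then scores new cases with it. It uses the built-in mtcars data and a hypothetical set of new observations; it is not the code behind R-Instat’s menu.

```r
# Fit a simple model on the built-in mtcars data (chosen only for illustration).
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new cases to score with the saved model object.
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))
new_cars$predicted_mpg <- predict(fit, newdata = new_cars)
new_cars
```

The broom package’s glance, tidy, and augment functions work on the same kind of model object, returning one-row, per-parameter, and per-observation summaries respectively.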

To summarize, R-Instat trades off ease-of-use for power. While most other GUIs prevent you from having to know any R code, they must provide a dialog for every situation. R-Instat’s approach is very general, allowing you to run a vast array of models from just a handful of dialogs. However, you must know the R packages and what their functions do to take full advantage of it.

Analysis Methods

All of the R GUIs offer a set of statistical analysis methods. Some also offer machine learning methods. As you can see in the list below, R-Instat offers an extensive set of statistical analysis methods, but no machine learning. As with R-Instat’s graphics, you can often reach the same analysis in multiple ways. Here is a comprehensive list of its analysis methods, in which I list each technique once:

  1. Describe> One Variable> Summary stats
  2. Describe> One Variable> Frequencies
  3. Describe> One Variable> Rating data (frequencies & percents on many vars in one table)
  4. Describe> Two Variables> Frequencies does crosstabs but no tests
  5. Describe> Three Variables> Frequencies does crosstabs but no CMH test
  6. Describe> Three Variables> Pivot Table creates interactive tables
  7. Describe> Multivariate> Correlations – Pearson
  8. Describe> Multivariate> Correlations – Nonparametric Kendall, Spearman
  9. Describe> Multivariate> Principal Components
  10. Describe> Multivariate> Canonical Correlations
  11. Hypothesis Tests> Stats1> bartlett
  12. Hypothesis Tests> Stats1> binom
  13. Hypothesis Tests> Stats1> box
  14. Hypothesis Tests> Stats1> chisq
  15. Hypothesis Tests> Stats1> cor
  16. Hypothesis Tests> Stats1> fisher
  17. Hypothesis Tests> Stats1> friedman
  18. Hypothesis Tests> Stats1> kruskal
  19. Hypothesis Tests> Stats1> ks
  20. Hypothesis Tests> Stats1> oneway
  21. Hypothesis Tests> Stats1> poisson
  22. Hypothesis Tests> Stats1> prop
  23. Hypothesis Tests> Stats1> shapiro
  24. Hypothesis Tests> Stats1> t
  25. Hypothesis Tests> Stats1> var
  26. Hypothesis Tests> Stats1> wilcox
  27. Hypothesis Tests> Stats2> ansari
  28. Hypothesis Tests> Stats2> fligner
  29. Hypothesis Tests> Stats2> mantelhaen
  30. Hypothesis Tests> Stats2> mauchly
  31. Hypothesis Tests> Stats2> mcnemar
  32. Hypothesis Tests> Stats2> mood
  33. Hypothesis Tests> Stats2> pairwise.prop
  34. Hypothesis Tests> Stats2> pairwise.wilcox
  35. Hypothesis Tests> Stats2> pairwise.t
  36. Hypothesis Tests> Stats2> power.anova
  37. Hypothesis Tests> Stats2> power.prop
  38. Hypothesis Tests> Stats2> power.t
  39. Hypothesis Tests> Stats2> prop.trend
  40. Hypothesis Tests> Stats2> PP
  41. Hypothesis Tests> Stats2> quade
  42. Hypothesis Tests> Stats2> Clear
  43. Hypothesis Tests> Agricolae> BIB
  44. Hypothesis Tests> Agricolae> duncan
  45. Hypothesis Tests> Agricolae> durbin
  46. Hypothesis Tests> Agricolae> friedman
  47. Hypothesis Tests> Agricolae> kruskal
  48. Hypothesis Tests> Agricolae> LSD
  49. Hypothesis Tests> Agricolae> median
  50. Hypothesis Tests> Agricolae> nonadditivity
  51. Hypothesis Tests> Agricolae> PBIB
  52. Hypothesis Tests> Agricolae> REGW
  53. Hypothesis Tests> Agricolae> scheffe
  54. Hypothesis Tests> Agricolae> SNK
  55. Hypothesis Tests> Agricolae> waerden
  56. Hypothesis Tests> Agricolae> waller
  57. Hypothesis Tests> Verification> binary
  58. Hypothesis Tests> Verification> cat
  59. Hypothesis Tests> Verification> cont
  60. Hypothesis Tests> Coin> oneway
  61. Hypothesis Tests> Coin> wilcox
  62. Hypothesis Tests> Coin> kruskal
  63. Hypothesis Tests> Coin> normal
  64. Hypothesis Tests> Coin> median
  65. Hypothesis Tests> Coin> savage
  66. Hypothesis Tests> Coin> sign
  67. Hypothesis Tests> Coin> wilcoxsign
  68. Hypothesis Tests> Coin> friedman
  69. Hypothesis Tests> Coin> quade
  70. Hypothesis Tests> Coin> taha
  71. Hypothesis Tests> Coin> mood
  72. Hypothesis Tests> Coin> fligner
  73. Hypothesis Tests> Coin> klotz
  74. Hypothesis Tests> Coin> ansari
  75. Hypothesis Tests> Coin> conover
  76. Hypothesis Tests> Coin> spearman
  77. Hypothesis Tests> Coin> quadrant
  78. Hypothesis Tests> Coin> fisyat
  79. Hypothesis Tests> Coin> koziol
  80. Hypothesis Tests> Coin> chisq
  81. Hypothesis Tests> Coin> cmh
  82. Hypothesis Tests> Coin> lbl
  83. Hypothesis Tests> Trend> bartels
  84. Hypothesis Tests> Trend> br
  85. Hypothesis Tests> Trend> bu
  86. Hypothesis Tests> Trend> cs
  87. Hypothesis Tests> Trend> csmk
  88. Hypothesis Tests> Trend> lanzante
  89. Hypothesis Tests> Trend> mk
  90. Hypothesis Tests> Trend> mmk
  91. Hypothesis Tests> Trend> pcor
  92. Hypothesis Tests> Trend> pmk
  93. Hypothesis Tests> Trend> pettitt
  94. Hypothesis Tests> Trend> rrod
  95. Hypothesis Tests> Trend> ssens
  96. Hypothesis Tests> Trend> sens
  97. Hypothesis Tests> Trend> smk
  98. Hypothesis Tests> Trend> snh
  99. Hypothesis Tests> Trend> wm
  100. Hypothesis Tests> Trend> ww
  101. Model> Probability Distributions> Normal
  102. Model> Probability Distributions> Exponential
  103. Model> Probability Distributions> Geometric
  104. Model> Probability Distributions> Weibull
  105. Model> Probability Distributions> Uniform
  106. Model> Probability Distributions> Bernoulli
  107. Model> Probability Distributions> Binomial
  108. Model> Probability Distributions> Poisson
  109. Model> Probability Distributions> Beta
  110. Model> Probability Distributions> Negative Binomial
  111. Model> Probability Distributions> Student’s t
  112. Model> Probability Distributions> von Mises
  113. Model> Probability Distributions> Cauchy
  114. Model> Probability Distributions> Chi Square
  115. Model> Probability Distributions> F
  116. Model> Probability Distributions> Lognormal
  117. Model> Probability Distributions> Gamma
  118. Model> Probability Distributions> Extreme Value
  119. Model> Probability Distributions> Generalized Pareto
  120. Model> Probability Distributions> Gumbel
  121. Modeling> stats> aov
  122. Modeling> stats> ar
  123. Modeling> stats> arima
  124. Modeling> stats> glm
  125. Modeling> stats> lm
  126. Modeling> stats> loess
  127. Modeling> stats> loglin
  128. Modeling> stats> lowess
  129. Modeling> stats> spline
  130. Modeling> stats> nls
  131. Modeling> stats> ppr
  132. Modeling> stats> princomp
  133. Modeling> extRemes> fevd
  134. Modeling> extRemes> levd
  135. Modeling> lme4> glmer
  136. Modeling> lme4> lmer
  137. Modeling> lme4> nlmer
  138. Modeling> MASS> glm.nb
  139. Modeling> MASS> glmmPQL
  140. Modeling> MASS> loglm
  141. Modeling> MASS> polr
  142. Modeling> MASS> rlm
  143. Modeling> MASS> lda
  144. Modeling> MASS> mca
  145. Modeling> MASS> lqs
  146. Modeling> MASS> qda
  147. Structured> Circular> Define
  148. Structured> Circular> Calculator
  149. Structured> Circular> Summaries
  150. Structured> Low Flow> Define
  151. Structured> Survival> Define
  152. Structured> Time Series> Define
  153. Structured> Time Series> Describe> One Variable
  154. Structured> Time Series> Describe> General
  155. Structured> Time Series> Model> One Variable
  156. Structured> Time Series> Model> General
  157. Structured> Climatic
  158. Structured> Procurement
  159. Structured> Options by Context
  160. Climatic> Check Data> Inventory
  161. Climatic> Check Data> Display Daily
  162. Climatic> Check Data> Fill Missing Values
  163. Climatic> Check Data> QC Temperatures
  164. Climatic> Check Data> QC Rainfall
  165. Climatic> Check Data> Homogenization
  166. Climatic> Check Data> Check Station Locations
  167. Climatic> Prepare> Climatic Summaries
  168. Climatic> Check Data> Start of Rains
  169. Climatic> Check Data> End of Rains
  170. Climatic> Check Data> Length of Season
  171. Climatic> Check Data> Spells
  172. Climatic> Check Data> Extremes
  173. Climatic> Check Data> Climdex
  174. Climatic> Check Data> SPI/SPEI
  175. Climatic> Check Data> Evapotranspiration
  176. Climatic> Describe> Rainfall
  177. Climatic> Describe> Temperature
  178. Climatic> Describe> Wind Speed/Direction
  179. Climatic> Describe> Sunshine/Radiation
  180. Climatic> Describe> General
  181. Climatic> NCMP> Indices
  182. Climatic> NCMP> Variogram
  183. Climatic> NCMP> Region Average
  184. Climatic> NCMP> Trend Graphs
  185. Climatic> NCMP> Count Records
  186. Climatic> NCMP> Summary
  187. Climatic> PICSA> Rainfall Graph
  188. Climatic> PICSA> Temperature
  189. Climatic> PICSA> Crops
  190. Climatic> Plot Region
  191. Climatic> Compare> Calculation
  192. Climatic> Compare> Summary
  193. Climatic> Compare> Correlations
  194. Climatic> Compare> Scatterplot
  195. Climatic> Compare> Time Series Plot
  196. Climatic> Compare> Seasonal Plot
  197. Climatic> Compare> Conditional Quantiles
  198. Climatic> Compare> Taylor Diagram
  199. Climatic> Mapping> Maps
  200. Climatic> Mapping> Check Station Locations
  201. Climatic> Model> Extremes
  202. Climatic> Model> Markov Modelling
  203. Climatic> Seasonal Forecast Support> Cumulative/Exceedance Graph
  204. Procurement> Prepare> Define Contract Value Categories
  205. Procurement> Prepare> Recode Numeric into Quantiles
  206. Procurement> Prepare> Use Award Date (or other)
  207. Procurement> Prepare> Summarise Red Flags by Country (or other)
  208. Procurement> Prepare> Summarise Red Flags by Country and Year (or other)
  209. Procurement> Corruption Risk Index (CRI)> Define Corruption Risk Index (CRI)

Generated R Code

One of the aspects that most differentiates the various GUIs for R is the code they generate. If you decide you want to save code, what type of code is best for you? The base R code as provided by the R Commander that can teach you “classic” R? The concise functions that mimic the simplicity of one-step dialogs such as jamovi provides? The completely transparent (and complex) code provided by RKWard, which might be the best for budding R power users?

R-Instat writes a blend of custom functions and functions from the popular tidyverse collection of packages. As mentioned previously, it uses ggplot2 for graphics.

Here’s an example of code R-Instat wrote to do a group-by aggregation:

data_book$calculate_summary(data_name="mydata1001", columns_to_summarise=c("pretest","posttest"), factors=c("workshop","gender"), j=1, summaries=c("summary_mean"), silent=TRUE)
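For comparison, a rough base R counterpart of that call might look like the sketch below. The data frame is simulated purely for illustration (only the variable names come from the call above), and the layout of the result is not guaranteed to match what R-Instat returns.

```r
# Sketch: group means of pretest and posttest by workshop and gender.
set.seed(1)
mydata <- data.frame(
  workshop = factor(rep(c("R", "SAS"), each = 4)),
  gender = factor(rep(c("Female", "Male"), times = 4)),
  pretest = round(rnorm(8, mean = 75, sd = 5)),
  posttest = round(rnorm(8, mean = 82, sd = 5))
)

# aggregate() computes one mean per column for every workshop-by-gender cell
group_means <- aggregate(
  cbind(pretest, posttest) ~ workshop + gender,
  data = mydata,
  FUN = mean
)
print(group_means)
```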

Here is an example of code R-Instat wrote to convert my “wide” style dataset to a “long” one. The wide one had measurements at four times, stored in variables named q1 through q4. I wanted those stacked into a single variable named “Score” and the variable names written into a factor called “Time.” The code R-Instat generated is below. It used the tidyr package’s pivot_longer function, just as I would have done. However, that is only one of four function calls used, the rest being for R-Instat’s internal use. Beginners looking to learn R will need to sift through these to figure out which one performed the actual task. When I tried to unstack this new dataset back into its original form, the dialog would not accept the variable Time, since it was not a factor. Given that R-Instat had just created it, it should have made it one to ease a “round trip” conversion, which is a fairly common task to perform (variable selections along the way don’t necessarily result in a complete duplicate of the original dataset). The developers plan to make that a factor in a future release.

# Code generated by the dialog, Stack (Pivot Longer)

mydata1001 <- data_book$get_data_frame(data_name="mydata1001")

mydata1001_stacked <- tidyr::pivot_longer(data=mydata1001, cols=c("q1","q2","q3","q4"), names_to="Time",  values_to="Score")

data_book$import_data(data_tables=list(mydata1001_stacked=mydata1001_stacked))

rm(list=c("mydata1001_stacked", "mydata1001"))
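Outside of R-Instat, the round trip described above can be sketched as follows. This assumes the tidyr package and a small made-up dataset. The id column and the values are hypothetical, while the q1 through q4, Time, and Score names follow the example above.

```r
# Sketch: stack wide data, turn Time into a factor, then unstack it again.
library(tidyr)

wide <- data.frame(
  id = 1:3,
  q1 = c(1, 2, 3),
  q2 = c(2, 3, 4),
  q3 = c(3, 4, 5),
  q4 = c(4, 5, 6)
)

long <- pivot_longer(wide, cols = c("q1", "q2", "q3", "q4"),
                     names_to = "Time", values_to = "Score")

# pivot_longer() stores the old column names as character, so convert them
long$Time <- factor(long$Time, levels = c("q1", "q2", "q3", "q4"))

# Unstack back to one column per time point
wide_again <- pivot_wider(long, names_from = "Time", values_from = "Score")
print(wide_again)
```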

Below is an example of R-Instat’s code and output for a simple linear regression. The computations are done using the same functions as any R programmer would choose (i.e. those included with R itself). As before, you need to know which those are to separate them from R-Instat’s internal function calls.

# Code generated by the dialog, Two Variable Fit Model
mydata1001_stacked1 <- data_book$get_data_frame(data_name="mydata1001_stacked1")

attach(what=mydata1001_stacked1)

two_var <- lm(data=mydata1001_stacked1, formula=posttest ~ pretest, na.action=na.exclude)

data_book$add_model(model_name="two_var", model=two_var, data_name="mydata1001_stacked1")
data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var")


Call:
lm(formula = posttest ~ pretest, data = mydata1001_stacked1, 
    na.action = na.exclude)

Coefficients:
(Intercept)      pretest  
     18.665        0.846  


stats::anova(object=data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var"))

Analysis of Variance Table

Response: posttest
           Df Sum Sq Mean Sq F value Pr(>F)
pretest     1   7943    7943     342 <2e-16
Residuals 398   9256      23               

summary(object=data_book$get_models(data_name="mydata1001_stacked1", model_name="two_var"))


Call:
lm(formula = posttest ~ pretest, data = mydata1001_stacked1, 
    na.action = na.exclude)

Residuals:
   Min     1Q Median     3Q    Max 
-9.313 -4.025 -0.435  3.780 11.297 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  18.6647     3.4389    5.43    1e-07
pretest       0.8456     0.0458   18.48   <2e-16

Residual standard error: 4.82 on 398 degrees of freedom
Multiple R-squared:  0.462,	Adjusted R-squared:  0.46 
F-statistic:  342 on 1 and 398 DF,  p-value: <2e-16


detach(name=mydata1001_stacked1, unload=TRUE)

rm(list=c("two_var", "mydata1001_stacked1"))
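Stripped of the data_book bookkeeping, the statistical work in that listing comes down to three standard calls. The sketch below runs them on simulated data, since the dataset used above is not included here. The data frame name and the simulated values are assumptions, so the numbers it prints will not match the output shown above.

```r
# Sketch: the same analysis steps without R-Instat's internal function calls.
set.seed(123)
scores <- data.frame(pretest = rnorm(50, mean = 75, sd = 5))
scores$posttest <- 18 + 0.85 * scores$pretest + rnorm(50, sd = 5)  # simulated

two_var <- lm(posttest ~ pretest, data = scores, na.action = na.exclude)

print(two_var)           # call and coefficients
print(anova(two_var))    # analysis of variance table
print(summary(two_var))  # residuals, coefficient tests, R-squared
```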

Support for Programmers

Some of the GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.

R-Instat has a script window that lets you do basic programming. A “Run” button lets you step through a program one line at a time, or you can click “Run All.” There are no additional features such as syntax color-coding, code-completion suggestions, or even search or replace functions. While most GUI users are not likely to write extensive programs, a few more basics would be helpful.

The R-Instat developers view the current script window as being largely for “tweaking” the R command that is generated by each dialog, rather than for writing code from scratch. Dialogs have a “To Script” button that populates the script window with working code, ready to be examined and possibly edited before execution. They also have a short guide called “Reading, tweaking and using R commands” to help learn these steps.

Reproducibility & Sharing

One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or if the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written reports so that others can check their work.

As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since the GUI user didn’t write the code, they also can’t read it or change it! Others such as BlueSky, jamovi, and RKWard save the dialog box entries and allow GUI users to have reproducibility in the form they prefer.

R-Instat’s Output window contains all the code created by the GUI. As mentioned earlier, it goes beyond what most GUIs save by including every interactive change a user might make to the data via manual data entry or right-click menus to convert, say, numeric variables to factors. However, it remembers only the details of the last 10 dialogs you run.

If you wish to share your work with a colleague who also uses R-Instat, you would save the contents of your log file (viewable under “View> Log Window”) and send them that script and your dataset. They would then edit the path in the R code to point to the location of the data file on their computer.

To share your work with a colleague who uses RStudio, or a similar IDE, you would send them your log and data files. Your colleague would install R-Instat to get its set of custom functions (these are not in an R package on CRAN, though that is the long-term plan). The script saved from the log window includes a pointer to the location of those functions on the person’s hard drive.

Package Management

A topic related to reproducibility is package management. One of the major advantages to the R language is that it’s very easy to extend its capabilities through add-on packages. However, updates in these packages may break a previously functioning analysis. Years from now you may need to run a variation of an analysis, which would require you to find the version of R you used, plus the packages you used at the time. As a GUI user, you’d also need to find the version of the GUI that was compatible with that version of R.

Some GUIs, such as the R Commander and Deducer, depend on you to find and install R. For them, the problem of long-term stability is yours to solve. Others, such as jamovi, distribute their own version of R, and all R packages, but not their add-on modules. This requires a bigger installation file, but it makes dealing with long-term stability simpler. Of course, this depends on all major versions being around for the long term, but for open-source software, there are usually multiple archives available to store software even if the original project is defunct.

R-Instat’s approach to package management is one of the most comprehensive of the R GUIs reviewed here. It provides everything you need in a single download. This includes the R-Instat interface, R itself, and all the necessary R packages. If you have a problem reproducing an R-Instat analysis in the future, all you need to do is download the version used when you created it.
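Whichever GUI you use, it helps to know which versions produced a result so that you can find the matching download later. A minimal sketch using only functions that come with R is shown below. The list and its field names are hypothetical, and this is not something R-Instat does for you as far as I know.

```r
# Sketch: record the R version and a package version alongside your results.
version_record <- list(
  r_version = R.version.string,
  stats_version = as.character(packageVersion("stats")),
  recorded_on = format(Sys.Date())
)
print(version_record)

# sessionInfo() reports the platform plus all attached packages and versions
info <- sessionInfo()
print(info)
```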

Output & Report Writing

Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into the realm of word processing through the use of Markdown or LaTeX. At the moment, you can get publication-quality output from BlueSky, Deducer, jamovi, and JASP. You can also get LaTeX output from BlueSky and jamovi.

Unfortunately, R-Instat’s tabular output is in R’s standard text tables. These must be displayed using a monospaced font to keep the columns lined up. While R packages such as gt, texreg, and xtable exist to convert these tables to publication-quality, that step would require writing R code. The R-Instat developers say they plan to add publication-quality output in a future version.
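To give a sense of how much code that extra step involves, here is a short sketch using xtable, one of the packages named above. It assumes xtable is installed and uses the mtcars dataset that ships with R. The caption text is made up for the example.

```r
# Sketch: turn a regression's coefficient table into LaTeX with xtable.
library(xtable)

fit <- lm(mpg ~ wt, data = mtcars)
coef_table <- xtable(fit, caption = "Regression of mpg on wt")

# print() on an xtable object emits LaTeX by default; capture it as text
latex_lines <- capture.output(print(coef_table))
cat(latex_lines, sep = "\n")
```

Passing type = "html" to print() produces an HTML table instead, which can be pasted into many word processors.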

Group-By Analyses

Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide an ability to select a subset of one group to analyze, then another subset to compare it to. All the R GUIs reviewed in this series can do this task. R-Instat does single-group selections by offering to filter rows using the “Data Options” button that appears in every dialog. It generates a subset that you can analyze in the same way as the entire dataset.

Software also needs the ability to automate such selections so that you might generate dozens of analyses or graphs, one group at a time. This feature has been available in commercial GUIs for decades (e.g. SPSS split-file, SAS BY). R-Instat does not offer such a feature.
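In R code, this kind of automation is usually a matter of splitting the data and applying the same function to each piece. The sketch below uses base R and simulated data. The variable names are borrowed from earlier examples, and the values are invented.

```r
# Sketch: repeat one analysis for every level of a grouping variable.
set.seed(7)
mydata <- data.frame(
  workshop = factor(rep(c("R", "SAS", "SPSS"), each = 10)),
  posttest = rnorm(30, mean = 80, sd = 6)
)

# split() makes one data frame per group; lapply() analyzes each in turn
per_group <- lapply(split(mydata, mydata$workshop), function(group_data) {
  summary(group_data$posttest)
})
print(per_group)
```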

Output Management

Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g. predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g. SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, with the simplification that GUIs provide, that presents a challenge.

Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but the predicted values (for example) would then come from individual models for each group, rather than from one model based on the entire original data set (perhaps with group included as a set of indicator variables).

Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)

For example, in our organization, we have 250 departments and want to see if any of them have a gender bias on salary. We write all 250 regression models to a data set and then search to find those whose gender parameter is significant (hoping to find none, of course!)

R-Instat does all three levels of output management. To use this function, choose “Model> Use Model> Glance/Tidy/Augment”. While the code to repeat this for the levels of one or more grouping variables is fairly easy to implement, the dialog doesn’t currently offer that feature.
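The department example above can be sketched in base R as follows. Everything here is simulated (five departments rather than 250, with no real gender effect built in), and the column names are hypothetical. It illustrates the parameter-level idea rather than any code R-Instat generates.

```r
# Sketch: one regression per department, saved as a parameter-level data set.
set.seed(2024)
n_per_dept <- 40
departments <- paste0("dept_", 1:5)

staff <- data.frame(
  department = rep(departments, each = n_per_dept),
  gender = factor(rep(c("Female", "Male"), times = n_per_dept * 5 / 2)),
  years = runif(n_per_dept * 5, min = 0, max = 20)
)
staff$salary <- 40000 + 1500 * staff$years + rnorm(nrow(staff), sd = 5000)

# Fit one model and return one row per regression parameter
fit_one <- function(dept_data) {
  coefs <- summary(lm(salary ~ years + gender, data = dept_data))$coefficients
  data.frame(
    department = dept_data$department[1],
    term = rownames(coefs),
    estimate = coefs[, "Estimate"],
    p_value = coefs[, "Pr(>|t|)"],
    row.names = NULL
  )
}

parameter_level <- do.call(rbind, lapply(split(staff, staff$department), fit_one))
rownames(parameter_level) <- NULL

# Search the saved parameters for departments with a significant gender term
gender_rows <- parameter_level[parameter_level$term == "genderMale", ]
flagged <- gender_rows[gender_rows$p_value < 0.05, ]
print(gender_rows)
```

With hundreds of departments you would normally adjust the p-values for multiple testing (for example with p.adjust) before flagging anyone.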

Developer Issues

The R-Instat team welcomes people who are willing to contribute to the project. You can submit bug reports or even copy the entire set of source code at the project’s GitHub site: https://github.com/africanmathsinitiative/R-Instat/. Information and guides for developers and contributors are available on the GitHub Wiki: https://github.com/africanmathsinitiative/R-Instat/wiki

Conclusion

R-Instat offers one of the most extensive collections of data wrangling, graphics, and statistical analysis methods of any R GUI. Its data wrangling dialogs are simple to use and require no knowledge of R code. At a basic level, its graphics and modeling dialogs are also easy to use. However, to use its full modeling capabilities, you need to know what R’s packages (e.g. MASS) are and what each one’s functions (e.g. rlm) do. For an R programmer, recognizing a known package::function combination is much easier than recalling it without assistance. Such a user would find R-Instat’s GUI extremely helpful. R-Instat’s ability to add ggplot2 layers allows you to create a graph of nearly unlimited flexibility. But you need to learn the difference between functions like geom_line and geom_smooth to take full advantage of it.

R-Instat’s offering in climate analysis is unique among R GUIs, and a quick search on Google Scholar shows that it is being widely used with such data. R-Instat’s focus is specifically on frequentist statistics rather than Bayesian, and it does not yet offer any machine learning or artificial intelligence methods. R-Instat’s developers are currently working to include some machine learning methods using the caret package, particularly for the teaching of data science.

R-Instat’s output is in standard R text tables, rather than the journal-style word processing tables that are such a time-saver in other R GUIs.

If you have some R programming background, or are looking to learn R code, R-Instat may be just what you need to get started.

For a summary of all my R GUI software reviews, see the article, R Graphical User Interface Comparison.

Acknowledgments

Thanks to the R-Instat team for all their hard work and for making R-Instat free and open source. Thanks to David Stern, Roger Stern, and Danny Parsons for clarifying many aspects of R-Instat. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.
