R AnalyticFlow (RAF) is a free and open source graphical user interface (GUI) for the R language that focuses on beginners looking to point-and-click their way through analyses. What sets it apart from the other half-dozen GUIs for R is that it uses a flowchart-like workflow diagram to control the analysis instead of only menus. In my first programming class back in the Pleistocene Era, my professor told us to never begin a program without doing a flowchart of what you were trying to accomplish. With workflow tools, you get the benefit of the diagram outlining the big picture, while the dialog box settings in each node control what happens at each step. In Figure 1 you can get a good idea of what is happening without any further information.
Another advantage you get with most workflow tools is the ability to reuse workflows very easily because the dataset is read in only once at the beginning. Unfortunately, most of that advantage is missing from R AnalyticFlow (hereafter, “RAF”) since you must specify which dataset is used in every node. The downside to workflow tools is that they’re slightly harder to learn than menu-based systems. This involves learning how to draw a diagram, what flows through it (e.g. datasets, models), and how to generate a single comprehensive reports for the entire analysis.
This post is one of a series of comparative reviews which aim to help non-programmers choose the GUI that is best for them. The reviews all follow a standard template to make comparisons across products easier. These reviews also include a cursory description of the programming support that each GUI offers.
There are various definitions of user interface types, so here’s how I’ll be using these terms:
GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users are people who prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.
IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.
The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as BlueSky Statistics, jamovi, and RKWard, install in a single step. Others, such as Deducer, install in multiple steps (up to seven steps, depending on your needs). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The Help Desks at most universities are flooded with such calls at the beginning of each semester!
RAF is available for Mac, and Linux. Its installation takes four steps:
- Install Java, if you don’t already have it installed. This can be tricky as you must match the type of Java to the type of R you use. Most computers these days have 64-bit operating systems. Whether 32-bit or 64-bit, you must use the same “bitness” on all of these steps, or it will not work.
- Next, install R if you haven’t already (available here).
- Install RAF itself after downloading it from here.
- Start RAF. It will prompt you to install some R packages, notably rJava. This step requires Internet access. To install if you don’t have such access, see the RAF website’s About R Packages section for important details on how to proceed (from another machine that does have Internet access, of course).
When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins" which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Deducer) through moderate (jamovi) to very active (R Commander).
RAF does not offer any plug-in modules, though its developers do provide instruction on how you can create your own.
Some user interfaces for R, such as BlueSky and jamovi, start by double-clicking on a single icon, which is great for people who prefer to not write code. Others, such as R Commander and JGR, have you start R, then load a package from your library, and then call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.
You start RAF directly by double-clicking its icon from your desktop or choosing it from your Start Menu (i.e. not from within R itself). On my system, I had to right-click the icon and choose, “Run as Administrator” or I would get the message, “Failed to Launch R. Confirm Settings?” If I responded “Yes”, it showed the path to my installation of R, which was already correct. I tried a second computer and it did start, but when it tried to install the JavaGD and rJava packages, it said, “Warning in install.packages (c(“JavaGD”,”rJava”)) : ‘lib = “C:/Program Files/R/R-3.6.1/library” ‘ is not writable. Would you like to use a personal library instead?”
Upon startup, it displays its startup screen, shown in Figure 2. Quick Start puts you into the software with a new Flow window open. New Project starts a new workflow, and Bookmarks give you quick access to existing workflows.
A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.
To start entering data, choose “Input> Enter Data” and drag the selection onto the workflow editor window. An empty spreadsheet will appear (Figure 3). You can enter variable names on the first line if you check the “Header: Use 1st Row” box at the bottom of the window. This is the first hint you’ll see that RAF leans on R terminology that can be somewhat esoteric. RAF’s developers could have labeled this choice as “Column Names” but went with the R terminology of “Header” instead. This approach may be confusing for beginners, but if their goal is to learn R, it will help in the long run.
To enter factors (R’s categorical variables), choose the “Options” tab and check, “Convert Characters to Factors”, then RAF will convert the character string variables you enter to factors. Otherwise, it will leave them as characters. Dates remain stored as characters; you have to use “Processing> Set Data Type” node to change them, and they must be entered in the form yyyy-mm-dd.
There is no limit to the number of rows and columns you can enter initially. However, once you choose “Run”, the data frame is created and can no longer be edited!
Saving the workflow is done with the standard “File > Save As" menu. You must save each one to its own file. To save the flow and the various objects that it uses such as data frames and models, use “Project > Export". When receiving a project from a colleague, use “Project> Import" to begin using it.
To analyze data, you must first read it. While many R GUIs can import a wide range of data formats such as files created by other statistics programs and databases, RAF can import only text and R objects.
RAF’s text import feature is well done. Once you select an Input File, it quickly scans the file and figures out if variable names are present, the delimiters it uses to separate the columns, and so on. It then displays a “preview” (Figure 4, bottom). It does this quickly since its preview is only on the first 100 rows of data. If the preview displays errors, you then manually change the settings and check the preview until it’s correct. When the preview looks good, you click, “Run”, it will then read all the data.
The ability to export data to a wide range of file types helps when you, or other members of your research team, have to use multiple tools to complete a task. Unfortunately, this is a very weak area for R GUIs. Deducer offers no data export at all, and R Commander, and rattle can export only delimited text files (an earlier version of this listed jamovi as having very limited data export; that has now been expanded). Only BlueSky offers a fairly comprehensive set of export options. Unfortunately, RAF falls into the former group, being able only to export data in text and R object files.
It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates need to be manipulated; missing values need to be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g. from wide to long and back). A critically important aspect of data management is the ability to transform many variables at once. For example, social scientists need to recode many survey items, biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time can be tedious. Some GUIs, such as jamovi and RKWard handle only a few of these functions. Others, such as BlueSky and the R Commander, can handle many, but not all, of them.
RAF handles a fairly basic set of data management tools:
- Add/Edit Columns
- Rename – Variables in a data frame)
- Set Data Type
- Select Rows
- Select Columns
- Missing Values – Sets values as missing, no imputation)
- Merge – Various joins
- Merge – Adds rows
- Manage Objects (copies, deletes, renames)
Workflows, Menus & Dialog Boxes
The goal of pointing & clicking your way through an analysis is to save time by recognizing dialog box settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as BlueSky and jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to it and hence require more learning.
RAF uses a unique interface. There are two ways to add build a workflow that guides your analysis. First, you can click on a toolbar icon, which drops down a menu. Click on a selection, and – without releasing the mouse button – drag your selection onto the flow window. In that case, the dialog box with its options opens below the flow area (Figure 3, bottom right).
The second way to use it is to click on a toolbar icon, drop down its menu, click on a selection and immediately release the mouse button. This causes the dialog box to appear floating in the middle of the screen (not shown). When you finish choosing your settings, there is a “Drag to Add” button at the top of the dialog. Clicking that button causes the dialog box to collapse into an icon which you can then drag onto the workflow surface.
Regardless of which method you choose, if you drop the new icon onto the top of one that is already in the workflow, it will move the new icon to the right and draw an arrow (called an “edge”) connecting the older one to the new. If you don’t drop it onto an icon that’s already in your workflow, you can add a connecting arrow later by clicking on the first icon, then choose “Draw Edge” and an arrow will appear aimed to the right (workflows go mostly left to right). The arrow will float around as you move your mouse, until you click on the second icon. A third way to connect the nodes in a flow is to click one icon, hold the Alt key down, then drag to the second icon.
Figure 3 shows the entire RAF window. On the top right is the workflow. Here are the steps I followed to create it:
- I chose “Input> Read Text File” and dragged it onto the workflow. The icon’s settings appeared in the bottom right window.
- I filled in the dialog box’s settings, then clicked “Run”. It named the icon after the file mydata.csv and a spreadsheet appeared in the upper-right.
- I chose “Statistics> Cross Tabulation”, and dragged its icon onto the data icon.
- I clicked the downward-facing arrow in the “Group By” box, and chose the variables. The first one I chose (workshop) formed the rows and the second (gender) formed the columns. Unlike most GUIs, there’s no indication of row and column roles.
- I clicked “Run Node” at the top of the cross tabulation dialog box. The cross tabulation output appeared in the upper left window (right half). The code that RAF wrote to perform the task appears in the R Console window in the lower left.
You can run an entire flow by clicking “Run Flow” at the top left of the Flow window. While describing the process of building a workflow is tedious, learning to build one is quite easy to learn.
The goal of using a GUI is to make analysis easy, so GUI dialog boxes are usually quite simple to use and include everything that’s relevant within a single box. I looked at all the options in this dialog but could not find one to do a very common test for such a cross-tabulation table: the chi-squared test. RAF uses an aspect of R objects that ends up essentially creating two different types of dialog boxes in separate parts of its interface. R objects contain multiple bits of output. You can display them using generic R functions such as summary() and print(). The output window has radio buttons for those functions (Figure 3, right above the cross-tabulation table). Clicking the “summary” button will call R’s summary() function to display the chi-squared results where the table is currently shown. To study the pattern in the table and the chi-squared results requires clicking back and forth on Table and summary; you can’t get them to both appear on your screen at the same time.
Correlations provide another example. The statistics are shown, but their p-values are not shown until you click on the “summary” button. This approach is confusing for beginners, but good for people wishing to learn R.
A common data analysis task is repeating the same analysis across many variables. For example, you might want to repeat the above cross tabulation (or t-tests, etc.) on many variables at once. This is usually quite easy to accomplish in most GUIs, but not in RAF. Since R’s functions may not offer that ability without using R’s “apply” family of functions (or loops), and RAF does not support such functions, such simple tasks become quite a lot of work when using RAF. You need to add an node to your flow for each and every variable!
Each dialog box has an “Advanced” tab which allows you to enter the name of any R argument(s) in one column, and any value(s) you would like to pass to that argument in another. That’s a nice way to offer graphical control over common tasks, while assuring that every task a function is capable of is still available.
In a complex analysis, workflows can become quite complex and hard to read. A solution to this problem is the concept of a “metanode”. Metanodes allow you t take an entire section of your workflow and collapse it into what appears to be a single node. For example, you might commonly use eight nodes to prepare a dataset for analysis. You could combine all eight into a new node you call “Data Prep”, greatly simplifying the workflow. Unfortunately, RAF does not offer metanodes, as do other workflow-driven data science tools such as KNIME and RapidMiner.
One of the most surprising aspects of RAF’s workflow style is that every node specifies its input and output objects. That means that you can run any analysis with no connecting arrows in your diagram! Rather than be a required feature as with many workflow-based tools, in RAF they offer only the convenience of re-running an entire flow at once.
During GUI-driven analysis, the fact that R is doing the work is quite obvious as the code and any resulting messages appear in the Console window.
Documentation & Training
R GUIs provide simple task-by-task dialog boxes that generate much more complex code. So for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exception is the R Commander, which lacks help on the dialog boxes themselves.
The level of help that RAF offers is only the built-in R help file for the particular function you’re using. However, I had problems with the help getting stuck and showing me the help file from previous tasks rather than the one I was currently using.
The various GUIs available for R handle graphics in several ways. Some, such as R Commander and RKWard, focus on R’s built-in graphics. Others, such as BlueSky Statistics use the popular ggplot2 package. Still others, such as jamovi, use their own functions and integrate them into analysis steps.
GUIs also differ quite a lot in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. That’s how BlueSky and jamovi work.
RAF uses the very flexible lattice package for all of its graphics. That makes it particularly easy to display “small multiples” of the same plot repeated by levels of another variable or two. There does not appear to be any way to control the style of the plots.