by Robert A. Muenchen, updated September 23, 2024
Rattle is being converted from an older interface that is no longer fully functional. The new version does much of what the older one did and, when finished, should closely match its ability. The review below is a blend of old and new, which I’ll fix as soon as the new one is complete.
Introduction
Rattle is a popular free and open-source Graphical User Interface (GUI) for the R software, one that focuses on beginners looking to point-and-click their way through machine learning tasks. Such tasks are also referred to as artificial intelligence or predictive analytics. Rattle’s name is an acronym for “R Analytical Tool To Learn Easily.” Rattle is available on Windows, Mac, and Linux systems.
This is one of a series of reviews that aim to help non-programmers choose the Graphical User Interface (GUI) for R that is best for them. Additionally, these reviews include cursory descriptions of each GUI’s programming support. I have joined the BlueSky Statistics development team and have written the BlueSky User Guide (online here), but you can trust this review series, as described here. All my comments below are easily verifiable. There is no perfect user interface for everyone; each GUI for R has features that appeal to different people.
Terminology
There are various definitions of user interface types, so here’s how I’ll be using the following terms. Reviewing R GUIs keeps me quite busy, so I don’t also have time to review all the IDEs, though my favorite is RStudio.
GUI = Graphical User Interface using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So, GUI users prefer using a GUI to perform their analyses. They don’t have the time or inclination to become good programmers.
IDE = Integrated Development Environment, which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.
Installation
The various user interfaces available for R differ greatly in how they’re installed. Some, such as jamovi or RKWard, install in a single step. Others install in multiple steps, such as R Commander (two steps) and Deducer (up to seven steps). Advanced computer users often don’t appreciate how lost beginners can become while attempting even a simple installation. The Help Desks at most universities are flooded with such calls at the beginning of each semester!
Rattle uses a two-step installation. First, you download and install R. Second, you download and install Rattle. Mac users have an additional step of installing XQuartz. These steps, and the ones for installing on Linux, are covered on the Togaware website: https://rattle.togaware.com/.
Plug-in Modules
When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling sections of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins,” which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, Deducer) through moderate (jamovi) to very active (R Commander).
Rattle’s complete capability was designed and programmed by Graham Williams of Togaware. As a result, it doesn’t have plug-ins, but it does include a comprehensive set of machine learning tools.
Startup
Some user interfaces for R, such as jamovi, start by double-clicking on a single icon, which is great for people who prefer not to write code. Others, such as the R Commander and JGR, have you start R, then load a package from your library and call a function. That’s better for people looking to learn R, as those are among the first tasks they’ll have to learn anyway.
Rattle starts in a single step. You double-click the rattle.app icon, and it handles its interface with R behind the scenes.
Data Editor
A data editor is a fundamental feature in data analysis software. It puts you in touch with your data and lets you get a feel for it, if only in a rough way. A data editor is such a simple concept that you might think there would be hardly any differences in how they work in different GUIs. While there are technical differences, to a beginner, what matters the most are the differences in simplicity. Some GUIs, including jamovi, let you create only what R calls a data frame. They use more common terminology and call it a data set: you create one, you save one, later you open one, then you use one. Others, such as RKWard, trade this simplicity for the full R language perspective: a data set is stored in a workspace. So the process goes: you create a data set, you save a workspace, you open a workspace, and choose a data set from within it.
Rattle is unique in that it does not offer a way to create or edit a data set. It offers only an icon in its upper right corner that opens a window for viewing the dataset (Fig. 2). Rattle automatically converts variables with fewer than 10 unique values into “categorical” ones, which R would call factors. You can always recode variables from numeric to categorical (or vice versa) in the “Transform” tab (see Data Management section).
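For readers who later move to R code, the automatic conversion described above can be approximated in a few lines of base R. This is a minimal sketch, assuming a simple rule of "fewer than 10 distinct non-missing values"; the helper name `as_categoric_if_few()` and the toy survey data are hypothetical, and Rattle's own rule may treat edge cases differently.

```r
# Hypothetical helper: turn numeric columns with fewer than `max_levels`
# distinct non-missing values into factors (R's categorical type)
as_categoric_if_few <- function(df, max_levels = 10) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && length(unique(na.omit(x))) < max_levels) {
      df[[col]] <- factor(x)
    }
  }
  df
}

# Toy data: q1 is a 1-to-5 survey item, pretest is a continuous score
survey <- data.frame(
  q1      = c(1, 2, 5, 4, 3, 2, 1, 5, 4, 3, 2, 4),
  pretest = c(72, 85, 90, 66, 78, 81, 59, 94, 88, 70, 75, 83)
)
survey <- as_categoric_if_few(survey)
str(survey)  # q1 becomes a factor; pretest stays numeric
```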
Data Import
Since R GUIs use R to do the work behind the scenes, they often include the ability to read a wide range of files, including SAS, SPSS, and Stata. Some, like BlueSky Statistics, also include the ability to read directly from SQL databases. Of course, you can always use R code to import data from any source and then continue to analyze it using any GUI, but the point of GUIs is to avoid programming.
Since Rattle Next Generation is still in development as of September 2024, it reads only comma-separated value (CSV) files. However, I wrote the following about the previous version, and I expect those other formats to appear in the coming weeks:
Rattle skips many standard statistical data formats, but it includes a couple of rarely supported ones, such as the Attribute-Relation File Format used by other data mining tools. It also includes “corpus,” which imports text documents, and it then performs the popular tf-idf calculation to prepare them for analysis using the other numerically-based analysis methods.
On its “Data” tab, Rattle offers several formats:
- File: CSV
- File: TXT
- File: Excel
- Attribute-Relation File Format (ARFF)
- Open Database Connectivity (ODBC)
- R Dataset
- RData File
- Library
- Corpus (for text analysis)
- Script
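To make the tf-idf calculation mentioned above concrete, here is a minimal base R sketch using one common definition: a term's frequency within a document times the log of (number of documents / number of documents containing the term). The three toy documents are invented, and this is not the code Rattle runs; text-mining packages offer several tf-idf variants with different weighting and normalization.

```r
# Three invented toy documents
docs <- c("rattle builds decision trees",
          "rattle explores data",
          "decision trees split data")

# Tokenize on whitespace and build the vocabulary
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term frequency: count of each term divided by the document's length
tf <- t(vapply(tokens,
               function(tk) as.vector(table(factor(tk, levels = vocab))) / length(tk),
               numeric(length(vocab))))
dimnames(tf) <- list(paste0("doc", seq_along(docs)), vocab)

# Inverse document frequency: log(number of documents / documents containing term)
doc_freq <- colSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# tf-idf weights: rows are documents, columns are terms
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Terms that appear in every document get an idf of zero, which is why tf-idf downweights common words before numeric methods see them.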
Data Management
It’s often said that 80% of data analysis time is spent preparing the data. Variables need to be transformed, recoded, or created; strings and dates must be manipulated; missing values must be handled; datasets need to be stacked or merged, aggregated, transposed, or reshaped (e.g., from wide to long and back). A critical aspect of data management is the ability to transform many variables simultaneously. For example, social scientists need to recode many survey items; biologists need to take the logarithms of many variables. Doing these types of tasks one variable at a time can be tedious. Some GUIs, such as jamovi and RKWard, handle only a few of these functions. Others, such as BlueSky Statistics or the R Commander, can handle all, or nearly all, of these tasks.
Rattle’s “Transform” tab cycles through various data management “types.” It offers a minimal set of data management tools. Its designer focused on reading a single data set and making transformations common in data mining projects quick and easy. More complex data management tasks are left to other tools.
When you transform a variable, Rattle creates a new variable and names it automatically to speed up the process. For example, when I transformed the Rainfall variable in the demo dataset, the result was named “RRC_Rainfall.” The RRC prefix stands for “Recoded, Re-Centered.” Other transformations use abbreviations that reflect the method applied.
Whenever a variable is transformed, its status in the “Data” tab switches from “Input” to “Ignore,” while the transformed version of the variable enters the data with an “Input” role.
As easy as some transformations are, others are impossible. For example, if you had a formula to calculate recommended daily allowances of vitamins, there’s no way to enter it. Conditional transformations, which apply different formulas to different subsets of the observations (e.g., daily allowances of vitamins calculated differently for men and women), are also not possible. Here are the available transformations:
Transform> Rescale> Normalize
- Recenter (Z-score)
- Scale 0 to 1
- (Var - Median)/Median Absolute Deviation (MAD)
- Natural Log
- Log 10
- Matrix (divide all by a constant)
- Rank
- Interval
- Groups
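The first few rescaling options correspond to simple formulas. Here is a base R sketch applying them to one invented rainfall vector and storing each result in a new column, in the spirit of Rattle's automatic naming. Only the RRC_ prefix comes from Rattle; the other prefixes are hypothetical. Base R's `mad()` multiplies the median absolute deviation by 1.4826 by default, so the sketch sets `constant = 1`; check Rattle's Script tab for the convention it actually uses.

```r
# Toy rainfall values (mm), invented for illustration
rain <- data.frame(Rainfall = c(0, 0.2, 1.4, 3.6, 0, 12.8, 0.6, 5.0))
x <- rain$Rainfall

# Recenter: z-score (subtract the mean, divide by the standard deviation)
rain$RRC_Rainfall <- (x - mean(x)) / sd(x)

# Scale 0 to 1: subtract the minimum, divide by the range
rain$R01_Rainfall <- (x - min(x)) / (max(x) - min(x))

# Robust recentering: (x - median) / raw median absolute deviation
rain$RMD_Rainfall <- (x - median(x)) / mad(x, constant = 1)

# log(0) is -Inf; log1p(x) = log(1 + x) is one common workaround for zeros
rain$RLG_Rainfall <- log1p(x)

# Rank (tied values share the average rank)
rain$RRK_Rainfall <- rank(x)

print(round(rain, 3))
```

Because the original column is kept, this mirrors the Input/Ignore role swap described above: the new column is what a model would use.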
Transform> Impute
- Replace missing with zeros (e.g., requesting nothing gets you nothing)
- Mean
- Median
- Mode
- Constant
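Each imputation option amounts to replacing missing values with a single number computed from the observed values. A minimal base R sketch, assuming a numeric variable; the helper `impute_with()` and the sunshine values are hypothetical. R has no built-in function for the statistical mode, so the tabulation approach below is also an illustrative choice.

```r
# Hypothetical helper: fill NA values with a chosen statistic
impute_with <- function(x, method = c("zero", "mean", "median", "mode", "constant"),
                        value = NULL) {
  method <- match.arg(method)
  observed <- x[!is.na(x)]
  fill <- switch(method,
    zero     = 0,
    mean     = mean(observed),
    median   = median(observed),
    # Most frequent observed value; ties go to the smallest value
    mode     = as.numeric(names(which.max(table(observed)))),
    constant = {
      if (is.null(value)) stop("supply `value` when method = \"constant\"")
      value
    }
  )
  x[is.na(x)] <- fill
  x
}

sunshine <- c(6.3, NA, 9.7, 6.3, NA, 12.1, 8.0)
impute_with(sunshine, "mean")
impute_with(sunshine, "mode")
impute_with(sunshine, "constant", value = -1)
```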
Transform> Recode
- Quantiles
- KMeans clusters
- Equal width intervals
- Indicator variables
- Join Categorics
- As Categoric
- As Numeric
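Several of the recoding options can be approximated with base R's `cut()`, `kmeans()`, and `model.matrix()`. This sketch uses simulated temperatures, and its choices of four bins and three cluster centers are assumptions for illustration, not Rattle's defaults.

```r
set.seed(7)
temp <- round(runif(20, min = 5, max = 35), 1)  # simulated temperatures

# Quantiles: bins holding roughly equal numbers of observations
q_breaks <- unique(quantile(temp, probs = seq(0, 1, by = 0.25)))
temp_q <- cut(temp, breaks = q_breaks, include.lowest = TRUE)

# Equal width intervals: four bins spanning equal ranges of values
temp_ew <- cut(temp, breaks = 4)

# KMeans clusters: group values around three cluster centers
temp_km <- factor(kmeans(temp, centers = 3)$cluster)

# Indicator variables: one 0/1 column per level of a categorical variable
wind_dir <- factor(c("N", "S", "E", "W", "N", "E"))
indicators <- model.matrix(~ wind_dir - 1)

table(temp_q)
table(temp_ew)
table(temp_km)
indicators
```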
Transform> Cleanup
- Delete ignored
- Delete selected
- Delete missing
- Delete obs with missing
Menus & Dialog Boxes
The goal of pointing & clicking your way through analysis is to save time by recognizing menu settings rather than performing the more difficult task of recalling programming commands. Some GUIs, such as jamovi, make this easy by sticking to menu standards and using simpler dialog boxes; others, such as RKWard, use non-standard menus that are unique to it and hence require more learning.
Rattle uses a unique user interface, one that differs from all the other GUIs covered in this series of reviews. Most of the time, I found this interface easy to use. However, for some tasks, the novelty of its style had me searching for the task I needed to complete for quite a while.
As shown in Figure 1, Rattle has a set of tabs down the left side. Their order from top to bottom mirrors the steps you follow when analyzing data. First, you read in the “Data” and define the modeling roles the variables will play, then “Explore” it, decide how to “Transform” it, choose how to “Model” it, and finally “Evaluate” the model. If you want to learn R programming, the “Script” window shows you how Rattle performed all its steps.
At each step, another set of menus appears, this time at the top. Those show all the options within each step. Choosing one usually opens a window showing an overview of what you can accomplish. In the upper left corner, an action button may say things like “Generate Dataset Summary” or “Build Decision Tree.”
The variable roles you set in the “Data” tab (Fig. 2) determine what’s possible in all the other tabs. The “target” variable (a.k.a. dependent variable), the “input” variables (independent variables), and the variables to “ignore” keep those roles in all the following steps of the analysis. If you change those roles, don’t forget to click the action button again at each step.
When the output appears, only the first page is displayed. The arrow icons shown in Fig. 1 let you page left or right through the output pages. The dots between the arrows indicate the number of pages, and the solid dot shows which page is currently displayed.
[Not yet implemented in Rattle Next Generation:] Statistical tests are all performed under the “Test” tab (see Figure 4). There, you choose a test with a radio button, choose one continuous variable in the “Sample 1” menu, and check the “Group By Target:” box. For paired samples, you can enter a second continuous variable in the “Sample 2” box. However, the most popular paired-samples test, the paired t-test, is not implemented. Rattle is so focused on its machine learning roots that performing a series of comparisons on different groups requires returning to the Data tab, choosing a new target, returning to the Test tab, and executing it again! All the other R GUIs let you choose the group variable and the continuous variables in the same dialog box, which is much quicker.
[Not yet in Rattle Next Generation:] A common form of analysis, cross-tabulation, is located under “Explore> Summary> Crosstab.” Running it cross-tabulates every categorical variable with the target variable. There is no way to directly request a cross-tabulation of just two variables, as there is in every other GUI covered in this series of reviews. To change which variables are used, you could tell Rattle to “ignore” more variables on the Data tab, or use “Transform> Recode> As Categoric” to control which variables are treated as categorical, since only that type is used. The “Log” tab shows that the CrossTable function from the descr package creates the cross-tabulations. That function can calculate the chi-squared test commonly done for such tables, but the GUI offers no way to request it. To get that test, you would have to learn how to modify the code, copy it to your R console, and submit it.
[Not yet in latest version:] Many of your dialog box settings are saved when you choose “Project> Save as,” which creates a file with the extension “.rattle”. Projects save all your settings, models, and graphs.
Documentation & Training
Rattle is documented extensively in Graham Williams’s book Data Mining with Rattle and R. I review that excellent book here.
Over 1,400 videos on YouTube show how to accomplish many different tasks using Rattle. However, those videos use the previous version, so applying them to the new one takes a bit of trial and error.
Help
R GUIs provide simple task-by-task dialog boxes, which generate much more complex code. So, for a particular task, you might want to get help on 1) the dialog box’s settings, 2) the custom functions it uses (if any), and 3) the R functions that the custom functions use. Nearly all R GUIs provide all three levels of help when needed. The notable exceptions are jamovi, which provides no help, and the R Commander, which lacks help on the dialog boxes.
Rattle offers an “Overview” at each combination of main step (left menu) and minor step (top menu). That help also points to the relevant section of the Data Science Survival Guide for more details.
Live links to the documentation for the R functions used in each step appear at the end of many overviews.
Graphics
The various GUIs available for R handle graphics in several ways. Some, such as RKWard, focus on R’s built-in graphics. Others, such as BlueSky Statistics, focus more on graphics from the popular ggplot2 package.
GUIs also differ greatly in how they control the style of the graphs they generate. Ideally, you could set the style once, and then all graphs would follow it. Jamovi and BlueSky Statistics both work that way.
Rattle uses the ggplot2 package to create graphics (e.g., Figure 5). The style is not adjustable from the GUI, but its white background and light gray reference lines are attractive and acceptable to most research journals. To change the appearance, you can edit the R code in the “Script” tab and submit it in the R console.
The types of graphs that Rattle provides follow its focus on data mining, and you create them in the data mining process flow:
- Boxplots
- Histogram / Density
- Cumulative
- Benford
- Pairs (correlation plot)
- Hierarchical (dendrogram of correlations)
- Missing Values
- Principal Components Importance Barplot
- Principal Components Biplot
- Cluster Data (scatterplots of clusters)
- Cluster Discriminant Coordinates
- Cluster Weights Heatmap
- Associate Frequency Plot
- Associate Rule Plot
- Decision Tree Plot
- R’s plot Function on Generalized Linear Models
- Risk Chart
- Cost Curve
- Hand Curve
- Lift Plot
- ROC Plot
- Precision Plot
- Sensitivity Plot
- Predicted vs. Observed
There is no “Graphics” menu, as most of the other GUIs offer. Rattle’s graphics are done within the analysis steps (jamovi and JASP use this approach, too). The exception comes from its unique integration with the ggraptR software. That application allows you to create complex plots interactively using the ggplot2 package. Choosing “Explore> Interactive> GGRaptR” displays the interface shown in Figure 7. This application is nearly as powerful as Deducer’s Plot Builder plug-in, missing only the ability to add additional layers to a plot (e.g., jittered points on top of a boxplot).
[Not yet in latest version:] A more interactive style of plot (not shown) is available under “Explore> Interactive> GGobi.” GGobi offers scatterplots, bar charts, parallel coordinate plots, and projection pursuit tours, all with multiple linked windows and brushing. This lets you select observations in any given plot and see those same observations highlighted in all plots.
Modeling
The way statistical models (which R stores in “model objects”) are created and used is an area where R GUIs differ the most. The simplest and least flexible approach is taken by jamovi and RKWard. They try to do everything you might need in a single dialog box. They either don’t save models, or they do nothing with them. To an R programmer, that sounds extreme since R can do many tasks using model objects. However, neither SAS nor SPSS could save models for their first 35 years of existence, so each approach has its merits. Other GUIs, such as the R Commander or BlueSky Statistics, let you save models and have other dialog boxes allowing you to do additional tasks with them.
Rattle creates models within an interactive session, and during that session, it can do various things with them. [Not yet in new version:] For example, “Evaluate> Lift” created the plot shown in Figure 8. You can also use “Evaluate> Score” to open a new data set and use all the models you chose (via checkboxes) to make predictions. The models are saved when you save a project.
An important limitation is that Rattle’s modeling does not allow you to control model formulas; they are strictly additive. For example, if you wanted to add an interaction term in a linear regression, you would have to copy the code from the “Script” tab to the R console, add the model formula you wanted, and then execute it. Once outside the Rattle environment, you would not be able to use the tools on its “Evaluate” tab to compare that model to others.
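Outside Rattle, adding an interaction is a one-character change to the model formula. A minimal sketch using R's built-in mtcars data, which is chosen here only for illustration and is not taken from Rattle:

```r
# mtcars ships with R; mpg is the target, wt and hp are inputs
additive <- lm(mpg ~ wt + hp, data = mtcars)  # strictly additive formula
interact <- lm(mpg ~ wt * hp, data = mtcars)  # expands to wt + hp + wt:hp

summary(interact)$coefficients

# F-test comparing the nested models in the console
anova(additive, interact)
```

`anova()` covers only one kind of comparison; it does not replace the lift, ROC, and other tools on Rattle's “Evaluate” tab.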
Analysis Methods
Most of the R GUIs offer a reasonable set of statistical analysis methods, and some offer machine learning methods too. Since Rattle focuses on data mining and machine learning, let’s look at those first. They include:
- Cluster> KMeans
- Cluster> Entropy-Weighted KMeans (Ewkm)
- Cluster> Hierarchical
- Cluster> BiCluster
- Association Rules
- Model> Decision Tree> Traditional (rpart)
- Model> Decision Tree> Conditional (ctree)
- Model> Random Forest> Traditional (randomForest)
- Model> Random Forest> Conditional (cforest)
- Model> Boosted Trees> Adaptive (adaboost)
- Model> Boosted Trees> Extreme (xgb)
- Model> Support Vector Machine (SVM)> Radial Basis (rbfdot)
- Model> Support Vector Machine (SVM)> Polynomial (polydot)
- Model> Support Vector Machine (SVM)> Linear (vanilladot)
- Model> Support Vector Machine (SVM)> Hyperbolic Tangent (tanhdot)
- Model> Support Vector Machine (SVM)> Laplacian (laplacedot)
- Model> Support Vector Machine (SVM)> Bessel (besseldot)
- Model> Support Vector Machine (SVM)> ANOVA RBF (anovadot)
- Model> Support Vector Machine (SVM)> Spline (splinedot)
- Model> Linear Models> Numeric
- Model> Linear Models> Generalized
- Model> Linear Models> Poisson
- Model> Linear Models> Logistic
- Model> Linear Models> Probit
- Model> Linear Models> Multinomial
- Neural Network
- Survival
Rattle’s list of statistical tests is relatively sparse, lacking such basics as the chi-squared test, proportion test, paired t-test, and even one-way analysis of variance. The tests included are [Not in new version yet]:
- Kolmogorov-Smirnov (only for two groups)
- Wilcoxon Rank-Sum
- T-test on independent samples
- F-test for variances
- Pearson Correlation
- Spearman Correlation
- Kendall Correlation
- Wilcoxon Signed Rank
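In the R console, each of the tests that Rattle lacks is a single base R call. A minimal sketch using datasets that ship with R (chosen for illustration); the counts in the proportion test are made up:

```r
# Chi-squared test of independence: hair color by eye color
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi <- chisq.test(hair_eye)

# Proportion test comparing two groups (made-up counts: 15 of 50 vs. 25 of 50)
prop <- prop.test(x = c(15, 25), n = c(50, 50))

# Paired t-test: the same 10 subjects measured under two drugs
paired <- t.test(sleep$extra[sleep$group == 1],
                 sleep$extra[sleep$group == 2],
                 paired = TRUE)

# One-way analysis of variance across three treatment groups
fit_aov <- aov(weight ~ group, data = PlantGrowth)

chi
prop
paired
summary(fit_aov)
```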
Generated R Code
One of the aspects that most differentiates the various GUIs for R is the code they generate. If you decide you want to save code, what type of code is best for you? The base R code provided by the R Commander, which can teach you “classic” R? The “tidyverse” code often used by BlueSky Statistics? The concise functions that mimic the simplicity of one-step dialogs, such as jamovi provides? The completely transparent (and complex) code provided by RKWard, which might be the best for budding R power users?
Rattle uses tidyverse code in combination with classic R code. Since Rattle is focused on machine learning, you might expect it to take advantage of newer packages in that area, such as tidymodels, caret, or mlr, but it doesn’t. That means Rattle is missing useful features such as model tuning or most cross-validation methods. But it also means its code is very simple to use and understand.
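For comparison, here is roughly what k-fold cross-validation looks like when written by hand with the rpart package, which ships with R. This is a minimal sketch that assumes five folds, the built-in iris data, and classification accuracy as the metric; packages such as caret or tidymodels wrap this pattern with many more options.

```r
library(rpart)

set.seed(123)
k <- 5
# Randomly assign each row of iris to one of k folds
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

accuracy <- numeric(k)
for (i in seq_len(k)) {
  train <- iris[folds != i, ]
  test  <- iris[folds == i, ]
  fit   <- rpart(Species ~ ., data = train, method = "class")
  pred  <- predict(fit, test, type = "class")
  accuracy[i] <- mean(pred == test$Species)
}

accuracy        # one accuracy per held-out fold
mean(accuracy)  # cross-validated estimate
```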
Rattle writes date- and time-stamped code that is generously annotated and made easy to read thanks to plenty of blank lines and indented continuation lines. The variables are clearly assigned their roles with code like this:
```r
# The following variable selections have been noted.
crs$input <- c("id", "workshop", "q1", "q2", "q3", "q4",
               "pretest", "posttest")
crs$numeric <- c("id", "q1", "q2", "q3", "q4",
                 "pretest", "posttest")
crs$categoric <- "workshop"
```
And then used in analyses like this decision tree:
```r
# Build the Decision Tree model.
crs$rpart <- rpart(gender ~ .,
                   data=crs$dataset[crs$train, c(crs$input, crs$target)],
                   method="class",
                   parms=list(split="information"),
                   control=rpart.control(usesurrogate=0,
                                         maxsurrogate=0))
```
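To try the same call pattern outside Rattle, the state it relies on can be built by hand. In this minimal sketch, `crs` is a plain list standing in for Rattle's state object, the kyphosis data from the rpart package replaces the workshop data, and the 70% training sample is an assumption; Rattle's own partitioning may differ.

```r
library(rpart)

# Hypothetical stand-in for the state object Rattle names crs
crs <- list()
crs$dataset <- kyphosis                     # example data shipped with rpart
crs$target  <- "Kyphosis"
crs$input   <- c("Age", "Number", "Start")

# Assumed 70% training sample
set.seed(42)
crs$train <- sample(nrow(crs$dataset), round(0.7 * nrow(crs$dataset)))

# Same call pattern as the generated decision tree code
crs$rpart <- rpart(Kyphosis ~ .,
                   data = crs$dataset[crs$train, c(crs$input, crs$target)],
                   method = "class",
                   parms = list(split = "information"),
                   control = rpart.control(usesurrogate = 0, maxsurrogate = 0))
print(crs$rpart)

# Score the held-out rows and cross-tabulate predicted vs. actual
holdout <- crs$dataset[-crs$train, ]
pred <- predict(crs$rpart, holdout, type = "class")
table(Predicted = pred, Actual = holdout$Kyphosis)
```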
Support for Programmers
Some GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.
Rattle includes only an R console window, so people wishing to embellish Rattle’s code will surely use a separate IDE.
The developer also supports programmers in another way, offering programming “templates” and “one-pagers” here to help them get started.
Reproducibility
One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written report so that others can check their work.
As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g., the R Commander, SPSS, Minitab) save only code, but since the GUI user didn’t write the code, they also can’t read or change it! Others, such as BlueSky, jamovi, JASP, and RKWard, save the dialog box entries and allow GUI users reproducibility in the preferred form.
[Not yet in new version:] While Rattle’s project files save which data set you’re working on and the roles of each variable, they don’t save anything else. So full reproducibility of your work in Rattle requires saving the R code that it creates and stores in its “Log” tab. There’s no way to repopulate all the tab settings from the saved code.
If you wish to share your work with a colleague, you can send them your project file and your data set. You could also save the contents of the “Script” tab and send them the complete R code, since running it is the only way they’ll see a cumulative output file (see Output & Report Writing below).
Since Rattle’s code relies on few custom functions, there’s a good chance your colleague could run it directly. However, it would be wise for them to install the rattle package to get the few custom functions it does use, such as the popular decision tree viewer, fancyRpartPlot.
Output & Report Writing
Ideally, output should be clearly labeled, well organized, and of publication quality. It might also extend into word processing through Sweave/knitr and R Markdown documents. Some GUIs, like BlueSky and jamovi, have those features; others do not. See the separate reviews for how each of the other packages handles this topic.
Rattle does not prepare a cumulative report. Each output graph and table appears in its own window. Tables are done in R’s standard monospaced output with no additional formatting. If you want a cumulative report, you have to copy each piece and paste it into a word processor as you go, or save the R code from the “Script” tab to execute in the R console of your choosing.
Group-By Analyses
Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide the ability to select a subset of one group to analyze, then another subset to compare it to. Of the GUIs for R reviewed in this series, Rattle was the only one that lacked this fundamental ability. You would have to use R code or another tool to break your data into subsets before reading each into Rattle one at a time.
Software also needs the ability to automate such selections so that you might generate dozens of graphs or analyses, one group at a time. While this has been available in commercial GUIs for decades (e.g., SPSS split-file, SAS BY), Rattle does not offer it. Of the GUIs reviewed, BlueSky Statistics is the only one that does.
Output Management
Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g., predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g., SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, with the simplification that GUIs provide, that’s a challenge.
Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but now the (e.g.) predicted values would be created by individual models for each group rather than one model based on the entire original data set (perhaps with the group included as a set of indicator variables).
Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)
For example, in our organization, we have 250 departments and want to see if any have a gender bias on salary. We write all 250 regression models to a data set and then search to find those whose gender parameter is significant (hoping to find none, of course!)
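In R code, that department-by-department workflow is a split-apply-combine pattern. This minimal sketch uses simulated data with five hypothetical departments instead of 250 and no real salaries. It produces one row per group that holds both model-level output (R-squared) and parameter-level output (the gender estimate and its p-value).

```r
set.seed(1)
n_dept <- 5   # a stand-in for the 250 departments
n_per  <- 40  # employees per department
salaries <- data.frame(
  dept   = rep(sprintf("D%02d", seq_len(n_dept)), each = n_per),
  gender = factor(rep(c("F", "M"), times = n_dept * n_per / 2)),
  salary = rnorm(n_dept * n_per, mean = 60000, sd = 8000)
)

# Split by department, fit one model per group, combine one row per group
results <- do.call(rbind, lapply(split(salaries, salaries$dept), function(d) {
  s  <- summary(lm(salary ~ gender, data = d))
  ct <- s$coefficients
  data.frame(dept      = d$dept[1],
             estimate  = ct["genderM", "Estimate"],  # parameter-level output
             p_value   = ct["genderM", "Pr(>|t|)"],
             r_squared = s$r.squared)                # model-level output
}))

print(results)
# Departments whose gender parameter is significant at the 5% level
results[results$p_value < 0.05, ]
```

With 250 departments and no bias at all, about 12 or 13 would still fall below 0.05 by chance, so an adjustment such as `p.adjust()` is worth considering.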
Rattle offers only observation-level output management, and even then, it is limited to predicted values or group membership.
Developer Issues
You can provide Rattle feedback and request new features on this website. However, there is no easy way to write your own plug-ins for Rattle, as there is for many of the other GUIs reviewed in this series.
Conclusion
Just a few years ago, Rattle was easy to recommend for people who wanted to focus on learning machine learning methods. However, several other R GUIs, such as JASP and BlueSky Statistics, now offer more ML methods and are just as easy to use. Since the new version is still in development, I hope it will catch up with the competition.
For a summary of all my R GUI software reviews, see the article, R Graphical User Interface Comparison.
Acknowledgments
Thanks to Graham Williams for the hard work of creating Rattle and for making it freely available to all. Graham also made many suggestions that improved this article. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.