by Robert A. Muenchen
Note that there is no “Graphics” menu as nearly all the other GUIs offer. Rattle’s graphics are done within the analysis steps (jamovi uses this approach too). The exception to this comes from its unique integration with the ggraptR software. That application allows you to create complex plots interactively using the ggplot2 package. Choosing “Explore> Interactive> GGRaptR” displays the interface shown in Figure 7. This application is nearly as powerful as Deducer’s Plot Builder plug-in, missing only the ability to add additional layers to a plot (e.g. jittered points on top of a boxplot).
# Build the Decision Tree model.
crs$rpart <- rpart(gender ~ .,
    data=crs$dataset[crs$train, c(crs$input, crs$target)],
    method="class",
    parms=list(split="information"),
    control=rpart.control(usesurrogate=0, maxsurrogate=0))
Support for Programmers
Some of the GUIs reviewed in this series of articles include extensive support for programmers. For example, RKWard offers much of the power of Integrated Development Environments (IDEs) such as RStudio or Eclipse StatET. Others, such as jamovi or the R Commander, offer little more than a simple text editor.
Rattle works as an adjunct to R IDEs, so it doesn’t include any support for submitting its own code. It will, however, allow you to edit its “Log” in a simple text editor before saving it or pasting it into an IDE.
The developer supports programmers in another way, offering programming “templates” and “one-pagers” to help them get started: https://essentials.togaware.com/
Reproducibility & Sharing
One of the biggest challenges that GUI users face is being able to reproduce their work. Reproducibility is useful for re-running everything on the same dataset if you find a data entry error. It’s also useful for applying your work to new datasets so long as they use the same variable names (or the software can handle name changes). Some scientific journals ask researchers to submit their files (usually code and data) along with their written report so that others can check their work.
As important a topic as it is, reproducibility is a problem for GUI users, a problem that has only recently been solved by some software developers. Most GUIs (e.g. the R Commander, Rattle) save only code, but since the GUI user didn’t write the code, they also can’t read it or change it! Others such as jamovi, RKWard, and the newest version of SPSS save the dialog box entries and allow GUI users to have reproducibility in the form they prefer.
While Rattle’s project files will save which data set you’re working on, and the roles of each variable, they don’t save anything else. So full reproducibility of your work in Rattle requires saving the R code that it creates and stores in its “Log” tab. There is no way to re-populate the tab settings when starting from the saved code.
If you wish to share your work with a colleague, you would send them your project file and your data set. You could also save the contents of the “Log” tab and send them the complete R code, since running that is the only way they’ll see a cumulative output file (See Output & Report Writing below).
Since Rattle contains few custom functions, there’s a good chance they could run your code directly. However, it would be wise for them to install the rattle package for the few custom functions, such as the popular decision tree viewer, fancyRpartPlot.
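As a rough sketch of what running saved “Log” code outside of Rattle might look like, the following builds a `crs` container by hand and fits a tree using the same style of rpart() call that Rattle generates. The simulated data, the variable names, and the 70% training split are assumptions made purely for illustration; a real Log would supply its own data-loading and partitioning steps.

```r
# Hypothetical stand-in for a saved Rattle log; the data and names are invented.
library(rpart)  # rpart ships with standard R distributions

set.seed(42)
crs <- new.env()  # assumed container for the objects the log refers to as crs$...

# Simulated data set with a two-level target named gender.
n <- 200
crs$dataset <- data.frame(
  height = rnorm(n, mean = 170, sd = 10),
  salary = rnorm(n, mean = 50000, sd = 8000)
)
crs$dataset$gender <- factor(
  ifelse(crs$dataset$height + rnorm(n, sd = 8) > 170, "Male", "Female")
)

# Variable roles and a 70% training sample (both assumed for this sketch).
crs$input  <- c("height", "salary")
crs$target <- "gender"
crs$train  <- sample(nrow(crs$dataset), 0.7 * nrow(crs$dataset))

# Fit the tree on the training rows, using only the input and target columns.
crs$rpart <- rpart(gender ~ .,
                   data = crs$dataset[crs$train, c(crs$input, crs$target)],
                   method = "class",
                   parms = list(split = "information"),
                   control = rpart.control(usesurrogate = 0, maxsurrogate = 0))

print(crs$rpart)

# With the rattle package installed, the tree could then be drawn with:
# rattle::fancyRpartPlot(crs$rpart)
```

Only the final, commented-out plotting line depends on the rattle package; everything else runs with R and rpart alone.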
Output & Report Writing
Ideally, output should be clearly labeled, well organized, and of publication quality. It might also delve into the realm of word processing through Sweave/knitr and Rmarkdown documents. At the moment, none of the GUIs covered in this series of reviews meets all of these requirements. See the separate reviews to see how each of the other packages is doing on this topic.
As soon as you click the Execute button (or F2), the output from your chosen analysis appears in the bottom of Rattle’s main control screen. It’s R’s standard monospaced output with no additional formatting. As you do each task, Rattle replaces the contents of the output window rather than appending it to the bottom of the previous output, as is usually the case in other R GUIs. If you want a cumulative report, you have to cut and paste it into a word processor as you go, or save the R code from the “Log” tab to execute in the R console of your choosing.
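When running saved code in a plain R console, one way to get a cumulative output file is base R’s sink() function. This is a minimal sketch, not something Rattle does for you; the file location and the two example analyses on the built-in mtcars data are assumptions chosen for illustration.

```r
# Collect the printed output of several analyses in a single text file.
report_file <- tempfile(fileext = ".txt")  # hypothetical report location

sink(report_file)   # start diverting console output to the file
cat("== Summary of mtcars mpg ==\n")
print(summary(mtcars$mpg))
cat("\n== Regression of mpg on wt ==\n")
print(summary(lm(mpg ~ wt, data = mtcars)))
sink()              # stop diverting; output returns to the console

# Read the accumulated report back in and display it.
report <- readLines(report_file)
cat(report, sep = "\n")
```

Each analysis is added below the previous one, which is the cumulative behavior that Rattle’s own output window lacks.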
Repeating an analysis on different groups of observations is a core task in data science. Software needs to provide the ability to select one group to analyze, then another group to compare it to. Of the six GUIs for R reviewed in this series, Rattle was the only one that lacked this fundamental ability. You would have to use R code or some other tool to break your data into subsets before reading each into Rattle one at a time.
Software also needs the ability to automate such selections so that you might generate dozens of analyses, one group at a time. While this has been available in commercial GUIs for decades (e.g. SPSS split-file), Rattle does not offer it. Of the GUIs reviewed, BlueSky Statistics offers it.
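For readers willing to step outside the GUI, the kind of group-by repetition described above can be sketched in a few lines of base R. The example below uses the built-in mtcars data and treats the cyl variable as the grouping factor; both choices are assumptions made for illustration only.

```r
# Repeat one analysis for each group, using only base R.
groups <- split(mtcars, mtcars$cyl)  # one data frame per cylinder count

# Run the same analysis on every subset; here, a summary of mpg.
by_group <- lapply(groups, function(d) summary(d$mpg))

# Print each group's result under a simple label.
for (g in names(by_group)) {
  cat("cyl =", g, "\n")
  print(by_group[[g]])
}
```

The same split() output could also be written to separate files, one per group, if you wanted to read each subset into Rattle in turn.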
Early in the development of statistical software, developers tried to guess what output would be important to save to a new dataset (e.g. predicted values, factor scores), and the ability to save such output was built into the analysis procedures themselves. However, researchers were far more creative than the developers anticipated. To better meet their needs, output management systems were created and tacked on to existing tools (e.g. SAS’ Output Delivery System, SPSS’ Output Management System). One of R’s greatest strengths is that every bit of output can be readily used as input. However, for the simplification that GUIs provide, that’s a challenge.
Output data can be observation-level, such as predicted values for each observation or case. When group-by analyses are run, the output data can also be observation-level, but now the (e.g.) predicted values would be created by individual models for each group, rather than one model based on the entire original data set (perhaps with group included as a set of indicator variables).
Group-by analyses can also create model-level data sets, such as one R-squared value for each group’s model. They can also create parameter-level data sets, such as the p-value for each regression parameter for each group’s model. (Saving and using single models is covered under “Modeling” above.)
For example, in our organization, we have 250 departments and want to see if any of them have a gender bias on salary. We write all 250 regression models to a data set, and then search to find those whose gender parameter is significant (hoping to find none, of course!).
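The departmental example can be sketched in base R as follows. This is an illustration under stated assumptions, not a feature of Rattle: the data are simulated, only five hypothetical departments are used instead of 250, and the variable names (dept, gender, years, salary) are invented for the example.

```r
# Simulated staff data; every name and number here is hypothetical.
set.seed(1)
n_dept <- 5
n_per  <- 40
staff <- data.frame(
  dept   = factor(rep(paste0("D", seq_len(n_dept)), each = n_per)),
  gender = factor(sample(c("F", "M"), n_dept * n_per, replace = TRUE)),
  years  = runif(n_dept * n_per, min = 0, max = 20)
)
staff$salary <- 40000 + 1500 * staff$years + rnorm(nrow(staff), sd = 5000)

# Fit one regression model per department.
fits <- lapply(split(staff, staff$dept),
               function(d) lm(salary ~ gender + years, data = d))

# Model-level data set: one R-squared value per department.
model_level <- data.frame(
  dept      = names(fits),
  r_squared = sapply(fits, function(f) summary(f)$r.squared),
  row.names = NULL
)

# Parameter-level data set: one row per coefficient per department.
param_level <- do.call(rbind, lapply(names(fits), function(nm) {
  co <- coef(summary(fits[[nm]]))
  data.frame(dept = nm, term = rownames(co),
             estimate = co[, "Estimate"],
             p_value  = co[, "Pr(>|t|)"],
             row.names = NULL)
}))

# Search for departments whose gender parameter is significant.
flagged <- subset(param_level, term == "genderM" & p_value < 0.05)
print(model_level)
print(flagged)
```

Because the simulated salaries do not depend on gender, any department flagged here would be a false positive, which is a reminder that screening hundreds of models calls for some adjustment for multiple comparisons.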
Rattle offers only observation-level output management, and even then it is limited to predicted values or predicted group membership.
You can provide feedback and request new features at this web site. However, there is not an easy way to write your own plug-ins to Rattle as there is for many of the other GUIs reviewed in this series.
As a tool to get users started with R and data mining / machine learning, Rattle is easy to learn and quick to use. For an introductory class on data mining or machine learning, using it would allow the class to focus on learning the field rather than learning how to use software. People wanting to move beyond pointing-and-clicking will find Rattle provides extremely well-documented code to learn from.
While there are several R GUIs that offer more statistical methods than Rattle, only two of them offer a similar range of data mining tools. The R Commander has a plug-in named RcmdrPlugin.OptimClassifier that offers many of Rattle’s methods. BlueSky Statistics also offers a similar set of methods, and it can optimize them via the caret package. Both of those packages offer much more comprehensive coverage of both data management and statistical methods. However, only Rattle includes interactive visualization via GGobi, and it comes with an extensive set of graphics focused on comparing multiple models.
Rattle is the second most popular R GUI overall, and its use has been growing steadily since its introduction. If data mining is your area, or you’re wanting to get started in it, I recommend giving Rattle a try!
Thanks to Graham Williams for the hard work that went into creating Rattle, and for making it freely available to all. Graham also made many suggestions that improved this article. Thanks also to Rachel Ladd, Ruben Ortiz, Christina Peterson, and Josh Price for their editorial suggestions.