R GUI Update: BlueSky User’s Guide, New Features

The BlueSky Statistics graphical user interface (GUI) for the R language has added quite a few new features (described below). I’m also working on a BlueSky User Guide, a draft of which you can read about and download here. [Update: don’t download that; get the full Intro Guide download instead.] Although I’m spending a lot of time on BlueSky, I still plan to be as obsessive as ever about reviewing all (or nearly all) of the R GUIs; those reviews are summarized here.

The new data management features in BlueSky are:

Date Order Check – checks the dates stored across many variables and reports any rows whose dates do not increase from left to right.
Find Duplicates – generates a report of duplicates and saves a copy of the dataset with the duplicates removed. Duplicates can be based on all variables, or on a set of just ID variables.
Select First/Last Observation per Group – selects the first or last observation in each group, which lets you build new datasets from the “best” or “worst” case in each group, find the most current record, and so on.
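
For readers who want to see what these three checks amount to in R, here is a minimal base R sketch. It is not the code BlueSky generates; the `visits` data frame, its column names, and its values are all made up for illustration, and "out of order" is assumed to mean a date that is earlier than the one to its left.

```r
# Hypothetical toy data: names and values are invented for illustration.
visits <- data.frame(
  id         = c(1, 1, 2, 2, 3),
  enrolled   = as.Date(c("2020-01-05", "2020-01-05", "2020-02-01", "2020-02-01", "2020-03-10")),
  treated    = as.Date(c("2020-01-20", "2020-01-20", "2020-01-15", "2020-02-20", "2020-03-15")),
  discharged = as.Date(c("2020-02-01", "2020-02-01", "2020-03-01", "2020-03-05", "2020-03-20"))
)

# Date order check: flag rows where any date is earlier than the date
# in the column to its left (assumed definition of "out of order").
date_cols    <- c("enrolled", "treated", "discharged")
date_num     <- sapply(visits[date_cols], as.numeric)  # rows x columns matrix
out_of_order <- apply(date_num, 1, function(d) any(diff(d) < 0))
visits[out_of_order, ]

# Find duplicates: based on all variables, or on the ID variable only.
dup_all <- duplicated(visits)      # repeats of an entire row
dup_id  <- duplicated(visits$id)   # repeats of an ID value
deduped <- visits[!dup_all, ]      # copy with full-row duplicates removed

# First/last observation per group: relies on the current row order,
# so sort the data first if "first" should mean earliest, best, etc.
first_per_id <- visits[!duplicated(visits$id), ]
last_per_id  <- visits[!duplicated(visits$id, fromLast = TRUE), ]
```

Because the first/last selection depends on row order, sorting by a score or a date beforehand is what turns "first" into "best" or "most current".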

Model Fitting / Tuning

One of the more interesting features in BlueSky is its offering of what its developers call Model Fitting and Model Tuning. Model Fitting gives you direct control over the R function that does the work. That provides precise control over every setting, and it can teach you the code that the menus create, but it also means that tuning the model is up to you. It does, however, standardize scoring, so you do not have to keep track of the wide range of parameters that each of those functions needs for scoring. Model Tuning controls models through the caret package, which lets you do things like K-fold cross-validation and model tuning. However, it does not allow control over every model setting.
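
To make the distinction concrete, here is a rough sketch in base R. It is not BlueSky's generated code, and it deliberately hand-rolls the K-fold loop rather than calling caret, so that it runs without extra packages. The choice of `mtcars`, the polynomial degree as the setting being tuned, and the helper name `cv_rmse` are all my own assumptions for illustration.

```r
# "Model Fitting" style: call the modeling function directly and pick
# every setting yourself (here, a quadratic term chosen by hand).
fit <- lm(mpg ~ poly(wt, 2), data = mtcars)
head(predict(fit, newdata = mtcars))   # scoring new data

# "Model Tuning" style: let K-fold cross-validation choose the setting.
set.seed(42)
k    <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(mtcars)))  # fold label per row

# Hypothetical helper: mean held-out RMSE for a given polynomial degree.
cv_rmse <- function(degree) {
  errs <- sapply(seq_len(k), function(i) {
    train_set <- mtcars[fold != i, ]
    test_set  <- mtcars[fold == i, ]
    m <- lm(mpg ~ poly(wt, degree), data = train_set)
    sqrt(mean((test_set$mpg - predict(m, newdata = test_set))^2))
  })
  mean(errs)
}

degrees     <- 1:3
results     <- data.frame(degree = degrees, rmse = sapply(degrees, cv_rmse))
best_degree <- results$degree[which.min(results$rmse)]
results
```

The caret package wraps this kind of resampling loop behind one interface for many model types, which is the convenience the Model Tuning menus trade against full control of each function's arguments.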

New Model Fitting menu items are:

Cox Proportional Hazards Model: Cox Single Model
Cox Proportional Hazards Model: Cox Multiple Models
Cox Proportional Hazards Model: Cox with Formula
Cox Proportional Hazards Model: Cox Stratified Model
Extreme Gradient Boosting
KNN
Mixed Models
Neural Nets: Multi-layer Perceptron
NeuralNets (i.e. the package of that name)
Quantile Regression
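
As an example of the kind of function call that sits behind a Model Fitting menu item, here is a small sketch of a single and a stratified Cox model using the survival package. I am assuming that is the package involved, since it is the standard one for Cox models in R; the `lung` dataset that comes with it and the choice of predictors are purely for illustration, and this is not the code BlueSky itself writes.

```r
library(survival)  # a recommended package included with standard R installations

# Single Cox proportional hazards model on the bundled lung data.
single <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Stratified model: a separate baseline hazard for each sex,
# with one shared coefficient for age.
stratified <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

summary(single)$coefficients

# Scoring: relative risk for the first five patients.
risk <- predict(single, newdata = lung[1:5, ], type = "risk")
risk
```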

There are so many Model Tuning entries that it’s easier to just paste in the list from the main BlueSky review, which I updated earlier this morning:

Model Tuning: Adaboost Classification Trees
Model Tuning: Bagged Logic Regression
Model Tuning: Bayesian Ridge Regression
Model Tuning: Boosted trees: gbm
Model Tuning: Boosted trees: xgbtree
Model Tuning: Boosted trees: C5.0
Model Tuning: Bootstrap Resample
Model Tuning: Decision trees: C5.0tree
Model Tuning: Decision trees: ctree
Model Tuning: Decision trees: rpart (CART)
Model Tuning: K-fold Cross-Validation
Model Tuning: K Nearest Neighbors
Model Tuning: Leave One Out Cross-Validation
Model Tuning: Linear Regression: lm
Model Tuning: Linear Regression: lmStepAIC
Model Tuning: Logistic Regression: glm
Model Tuning: Logistic Regression: glmnet
Model Tuning: Multivariate Adaptive Regression Splines (MARS via earth package)
Model Tuning: Naive Bayes
Model Tuning: Neural Network: nnet
Model Tuning: Neural Network: neuralnet
Model Tuning: Neural Network: dnn (Deep Neural Net)
Model Tuning: Neural Network: rbf
Model Tuning: Neural Network: mlp
Model Tuning: Random Forest: rf
Model Tuning: Random Forest: cforest (uses ctree algorithm)
Model Tuning: Random Forest: ranger
Model Tuning: Repeated K-fold Cross-Validation
Model Tuning: Robust Linear Regression: rlm
Model Tuning: Support Vector Machines: svmLinear
Model Tuning: Support Vector Machines: svmRadial
Model Tuning: Support Vector Machines: svmPoly

You can download the free open-source version from https://BlueSkyStatistics.com.

Author

Bob Muenchen


5 thoughts on “R GUI Update: BlueSky User’s Guide, New Features”

Holger von Jouanne-Diedrich says:

August 4, 2020 at 1:18 PM

Wow, impressive work! Would it make sense to integrate the OneR package as a baseline model for classification? See also: https://cran.r-project.org/web/packages/OneR/vignettes/OneR.html or for a quick intro: https://blog.ephorie.de/oner-machine-learning-in-under-one-minute.

1. Bob Muenchen says:
  
  August 4, 2020 at 3:13 PM
  
  Hi Holger,
  
  I’m familiar with OneR and I’m very much in favor of its goal. However, I’m under the impression that CORELS is almost certain to find a better model that is just as interpretable. Check out the video here: https://www.youtube.com/watch?v=ebJHnDLLTKA. This has only been added to R in the past month or so: https://github.com/eddelbuettel/tidycorels.
  
  Cheers,
  Bob
  
  1. Holger von Jouanne-Diedrich says:
    
    August 12, 2020 at 6:15 AM
    
    Hi Bob,
    
    I would respectfully disagree. Just take the example given there:
    
    OneR gives:
    —
    Rules:
    If wt = (1.51,2.99] then am = 1
    If wt = (2.99,5.43] then am = 0
    
    Accuracy:
    29 of 32 instances classified correctly (90.62%)
    —
    
    Very high accuracy and rules are simpler!
    
    best
    h
    
    1. Bob Muenchen says:
      
      August 12, 2020 at 8:50 AM
      
      Hi Holger,
      
      Thanks for the comparison. That CORELS example got 31/32 or 97% accuracy, though using a more complex rule set. One of the problems with CORELS is that all continuous variables must be dichotomized. Tidymodels makes that easy to do, but then that step is moved from the modeling step to the data preparation step. That made me wonder if combining CORELS with another method that chose better cut-points for the numeric variables wouldn’t improve its accuracy. Also, by manually setting labels for the dummy variables, the rules could be more easily interpreted, like “If wt=heavy…” instead of “if wt=bin1…”.
      
      Cheers,
      Bob
      
      1. Holger von Jouanne-Diedrich says:
        
        August 14, 2020 at 3:57 AM
        
        Yes, in OneR all of this is done in two lines of code and completely automatically:
        
        —
        data <- optbin(am ~., data = mtcars)
        OneR(data, verbose = TRUE)
        —
        
        Also, the exact cut-points are optimized on the fly.
