The BlueSky Statistics graphical user interface (GUI) for the R language has added quite a few new features (described below). I’m also working on a *BlueSky User’s Guide*, a draft of which you can read about and download here. Although I’m spending a lot of time on BlueSky, I still plan to be as obsessive as ever about reviewing all (or nearly all) of the R GUIs, an effort that is summarized here.

The new data management features in BlueSky are:

- Date Order Check – quickly checks the dates stored across many variables and reports any rows whose dates do not increase from left to right.
- Find Duplicates – generates a report of duplicates and saves a copy of the data set with the duplicates removed. Duplicates can be based on all variables or on just a set of ID variables.
- Select First/Last Observation per Group – selecting the first or last observation in each group lets you create new datasets from the “best” or “worst” case in each group, find the most current record, and so on.
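For readers curious what these three tasks look like in plain R, here is a minimal base-R sketch. The data frame and variable names are invented for illustration; this is not the code the BlueSky dialogs generate.

```r
# Toy data: one ID variable and two date variables (names are illustrative)
df <- data.frame(
  id     = c(1, 1, 2, 2, 2),
  visit1 = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01",
                     "2020-02-01", "2020-03-01")),
  visit2 = as.Date(c("2020-02-01", "2020-02-01", "2020-01-15",
                     "2020-03-01", "2020-04-01"))
)

# Date Order Check: flag rows whose dates do not increase left to right
bad_rows <- which(df$visit2 <= df$visit1)

# Find Duplicates: report the duplicates, then save a de-duplicated copy
dupes   <- df[duplicated(df), ]
deduped <- df[!duplicated(df), ]

# Select First/Last Observation per Group: keep the last record per id
last_per_id <- do.call(rbind, lapply(split(df, df$id), tail, n = 1))
```

Here row 3 fails the date-order check (its second date precedes its first), row 2 is a duplicate of row 1, and `last_per_id` keeps one row per `id`.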

**Model Fitting / Tuning**

One of the more interesting features in BlueSky is its pair of approaches that it calls Model Fitting and Model Tuning. Model Fitting gives you direct control over the R function that does the work. That provides precise control over every setting, and it can teach you the code that the menus create, but it also leaves model tuning up to you. However, it does standardize scoring, so you do not have to keep up with the wide range of parameters that each of those functions needs for scoring. Model Tuning controls models through the caret package, which handles tasks such as K-fold cross-validation and hyperparameter tuning. However, it does not allow control over *every* model setting.
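To make the distinction concrete, here is an illustrative direct fit of the kind a Model Fitting dialog drives: you call the R function yourself, with full control over its arguments, and scoring is the step you would otherwise have to work out per function (the formula and dataset here are my own example, not BlueSky’s).

```r
# Direct fit: every argument of glm() is under your control
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial())

# Scoring by hand: note the function-specific type = "response" argument,
# the kind of detail BlueSky's standardized scoring hides from you
scores <- predict(fit, newdata = mtcars, type = "response")
head(round(scores, 3))
```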

New Model Fitting menu items are:

- Cox Proportional Hazards Model: Cox Single Model
- Cox Multiple Models
- Cox with Formula
- Cox Stratified Model
- Extreme Gradient Boosting
- KNN
- Mixed Models
- Neural Nets: Multi-layer Perceptron
- NeuralNets (i.e. the package of that name)
- Quantile Regression
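As a taste of the first group of entries, here is a sketch of the R calls that a Cox single model and a Cox stratified model typically come down to, using the survival package (which ships with R) and its built-in `lung` dataset; the covariates are my choice for illustration.

```r
library(survival)

# "Cox Single Model": one proportional-hazards fit
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)$coefficients          # includes exp(coef), the hazard ratios

# "Cox Stratified Model": separate baseline hazard per stratum,
# shared covariate effects across strata
fit_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
```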

There are so many Model Tuning entries that it’s easier to paste in the list from the main BlueSky review, which I updated earlier this morning:

- Model Tuning: Adaboost Classification Trees
- Model Tuning: Bagged Logic Regression
- Model Tuning: Bayesian Ridge Regression
- Model Tuning: Boosted trees: gbm
- Model Tuning: Boosted trees: xgbtree
- Model Tuning: Boosted trees: C5.0
- Model Tuning: Bootstrap Resample
- Model Tuning: Decision trees: C5.0tree
- Model Tuning: Decision trees: ctree
- Model Tuning: Decision trees: rpart (CART)
- Model Tuning: K-fold Cross-Validation
- Model Tuning: K Nearest Neighbors
- Model Tuning: Leave One Out Cross-Validation
- Model Tuning: Linear Regression: lm
- Model Tuning: Linear Regression: lmStepAIC
- Model Tuning: Logistic Regression: glm
- Model Tuning: Logistic Regression: glmnet
- Model Tuning: Multi-variate Adaptive Regression Splines (MARS via earth package)
- Model Tuning: Naive Bayes
- Model Tuning: Neural Network: nnet
- Model Tuning: Neural Network: neuralnet
- Model Tuning: Neural Network: dnn (Deep Neural Net)
- Model Tuning: Neural Network: rbf
- Model Tuning: Neural Network: mlp
- Model Tuning: Random Forest: rf
- Model Tuning: Random Forest: cforest (uses ctree algorithm)
- Model Tuning: Random Forest: ranger
- Model Tuning: Repeated K-fold Cross-Validation
- Model Tuning: Robust Linear Regression: rlm
- Model Tuning: Support Vector Machines: svmLinear
- Model Tuning: Support Vector Machines: svmRadial
- Model Tuning: Support Vector Machines: svmPoly
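Under the hood, entries like these drive caret’s `train()` function with a resampling scheme from the list above. The following sketch pairs repeated K-fold cross-validation with the rpart (CART) method as one example; it assumes the caret package is installed (rpart ships with R), and the dataset and tuning length are my own choices.

```r
library(caret)

set.seed(42)  # make the resampling reproducible
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# "Model Tuning: Decision trees: rpart (CART)" with repeated 10-fold CV
tuned <- train(Species ~ ., data = iris,
               method     = "rpart",
               tuneLength = 5,      # evaluate 5 candidate cp values
               trControl  = ctrl)

tuned$bestTune                      # the complexity parameter caret chose
```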

You can download the free open-source version from https://BlueSkyStatistics.com.