It is indeed important to check the source of R algorithms, just as you would do for a SAS or SPSS macro that you got from someone’s web site. New algorithms are usually developed by university professors using R. They go through the peer review process to get published. Then SAS/SPSS/Stata programmers read the journal article and implement those algorithms. So who should you trust more, the algorithm’s inventor or the commercial programmer? I think they are of similar quality.

Cheers,

Bob

But SAS remained strong. Its ability to handle massive amounts of data quickly, and to do descriptive statistics, charts, and data prep so well, was important. SPSS offered point-and-click ease and nicely formatted output.

If R can handle vast amounts of data as fast as SAS, provide data prep and manipulation like SAS, and produce good output, then it could challenge SAS in the commercial space.

Before I start using a new algorithm available through R, it has to be tested and validated by experts and users. This reduces the number of useful algorithms available through R.

I agree. Sorting is yet another topic that is very confusing in R, and dplyr makes it simple with its arrange() function.
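To make the comparison concrete, here is a minimal sketch of the same descending sort done with base R's order() and with dplyr's arrange(). The data frame and its column names (id, score) are made up for illustration, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical example data, not from the original discussion
df <- data.frame(id = c(3, 1, 2), score = c(10, 30, 20))

# Base R: build an index vector with order(), then subset the rows
base_sorted <- df[order(df$score, decreasing = TRUE), ]

# dplyr: name the data frame and the sort column directly
dplyr_sorted <- arrange(df, desc(score))
```

Both calls return the rows ordered from the highest score to the lowest. The base R version keeps the original row names, while arrange() renumbers them.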

Cheers,

Bob

It just keeps getting better, thanks!

Bob

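mean.n <- function(x, n = 2, digits = 2) {
  # Column means, ignoring missing values
  mns <- apply(x, 2, mean, na.rm = TRUE)
  # Number of non-missing values in each column
  nv <- apply(x, 2, function(x) sum(!is.na(x)))
  # Keep a mean only when at least n values are present
  mnz <- ifelse(nv >= n, mns, NA)
  cat("Mean: ", sprintf(paste0("%9.", digits, "f"), mnz),
      "\n   N: ", sprintf("%9s", nv), "\n")
}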
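As I read the function above, the idea is to report a column mean only when at least n values are present. The sketch below is a hypothetical variant (the name mean_n_values is mine, not from the original) that returns the values instead of printing them, which makes the result easier to reuse. The example data frame is an assumption, chosen to match the counts of valid and missing values in the favstats output shown later in this thread.

```r
# Hypothetical variant: return the means rather than print them
mean_n_values <- function(x, n = 2) {
  nv  <- colSums(!is.na(x))          # non-missing values per column
  mns <- colMeans(x, na.rm = TRUE)   # column means, ignoring NAs
  ifelse(nv >= n, mns, NA)           # blank out means based on too few values
}

# Assumed example data: q1 has 3 valid values, q2 has 2, q3 has 1
df <- data.frame(q1 = c(1, 1, 1),
                 q2 = c(2, 2, NA),
                 q3 = c(3, NA, NA))

res <- mean_n_values(df, n = 2)
```

With n = 2, the means for q1 and q2 are kept and the mean for q3 becomes NA, because q3 has only one valid value.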

I really like that approach. One of the things that surprised me the most about R is that the data argument is not supported by all functions. I see mosaic also adds formulas and the data argument to mean, sd, etc. Nice! Too bad they didn’t make na.rm = TRUE the default. Having it set to FALSE by default makes R deal with missing values the reverse of all other stat packages I know.
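For readers who have not run into this, here is a small base R sketch of both points. The vector and data frame are made up for illustration. with() is one common way to work around functions such as mean() that have no data argument.

```r
x <- c(1, 2, NA)

# By default a single missing value makes the whole result NA
default_mean <- mean(x)

# na.rm = TRUE drops missing values first, as most stat packages do
clean_mean <- mean(x, na.rm = TRUE)

# mean() has no data argument, so with() supplies the data frame instead
df <- data.frame(q1 = c(1, 1, 1), q2 = c(2, 2, NA))
q2_mean <- with(df, mean(q2, na.rm = TRUE))
```

Here default_mean is NA, while clean_mean and q2_mean are computed from the two valid values only.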

Cheers,

Bob

> require(mosaic)
> favstats(~ q1, data=df)
 min Q1 median Q3 max mean sd n missing
   1  1      1  1   1    1  0 3       0
> favstats(~ q2, data=df)
 min Q1 median Q3 max mean sd n missing
   2  2      2  2   2    2  0 2       1
> favstats(~ q3, data=df)
 min Q1 median Q3 max mean sd n missing
   3  3      3  3   3    3 NA 1       2

It also supports calculations such as:

favstats(y ~ x, data=df)
