R and S-PLUS produced different classification trees for predicting patient mortality


Objective — There is growing interest in using classification and regression trees in biomedical research. R and S-PLUS are two statistical programming languages with similar syntax and functionality, and both allow users to fit classification and regression trees. The objective was to compare classification trees grown using R with those grown using S-PLUS.

Study Design and Setting — Using data on 9,484 patients hospitalized with an acute myocardial infarction, we compared the classification trees for predicting mortality that were grown using R and S-PLUS. We also used repeated split-sample validation to estimate the predictive accuracy of classification trees grown using R and S-PLUS.
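
For readers unfamiliar with the procedure, repeated split-sample validation can be sketched as follows. This is an illustration only, not the study's code: the data are simulated, the single-split "stump" stands in for a full classification tree, and every name here (simulate, fit_stump, repeated_split_sample, the 50/50 split, 20 repeats) is an assumption made for the example. Each repeat randomly divides the data into a derivation half and a validation half, fits the model on the first, and measures discrimination (the c-statistic) and accuracy on the second.

```python
import random


def auc(scores, labels):
    """c-statistic: chance that a random death outscores a random survivor (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stump(xs, ys):
    """Fit a one-split tree by minimising total Gini impurity.

    Returns (threshold, event_rate_below, event_rate_at_or_above).
    """
    best = None
    for t in sorted(set(xs))[1:]:  # skipping the minimum keeps both leaves non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        pl, pr = sum(left) / len(left), sum(right) / len(right)
        impurity = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
        if best is None or impurity < best[0]:
            best = (impurity, t, pl, pr)
    if best is None:
        raise ValueError("predictor has no variation")
    return best[1], best[2], best[3]


def simulate(n, rng):
    """Hypothetical data: one age-like predictor; death risk rises with it."""
    xs = [rng.uniform(40, 90) for _ in range(n)]
    ys = [1 if rng.random() < 0.05 + 0.4 * (x - 40) / 50 else 0 for x in xs]
    return xs, ys


def repeated_split_sample(xs, ys, repeats=20, train_frac=0.5, seed=0):
    """Return one (c-statistic, accuracy) pair per random derivation/validation split."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    results = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        t, pl, pr = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        scores = [pl if xs[i] < t else pr for i in test]  # predicted risk per patient
        labels = [ys[i] for i in test]
        preds = [1 if s >= 0.5 else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        results.append((auc(scores, labels), acc))
    return results


xs, ys = simulate(400, random.Random(1))
results = repeated_split_sample(xs, ys)
mean_auc = sum(a for a, _ in results) / len(results)
mean_acc = sum(b for _, b in results) / len(results)
print(f"mean c-statistic {mean_auc:.3f}, mean accuracy {mean_acc:.3f}")
```

Averaging over many random splits, rather than relying on one, reduces the influence of any single lucky or unlucky partition when two tree-growing implementations are compared.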

Results — The classification tree grown using R was substantially more parsimonious than the one grown using S-PLUS. The pruned classification tree grown using R was identical to the tree obtained by removing six subtrees from the pruned classification tree grown using S-PLUS. Repeated split-sample validation then showed that classification trees constructed using S-PLUS had greater discrimination and accuracy than classification trees grown using R.

Conclusions — Given the same data, R and S-PLUS can produce different classification trees.



Austin PC. J Clin Epidemiol. 2008; 61(12):1222-6. Epub 2008 Jul 10.
