Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data
Date
2021
Authors
Alaqeeli, O.
Xing, L.
Zhang, Xuekui
Publisher
Microbiology Research
Abstract
Classification trees are a widely used machine learning method, with multiple implementations
available as R packages: rpart, ctree, evtree, tree and C5.0. These implementations differ in their
details, so their performance varies from one application to another. We are interested in how
well they classify cells using single-cell RNA-sequencing (scRNA-seq) data. In this paper, we
conduct a benchmark study using 22 scRNA-seq data sets. Using cross-validation, we compare the
packages' prediction performance in terms of Precision, Recall, F1-score and Area Under the Curve
(AUC). We also compare the Complexity and Run-time of these R packages. Our study shows that
rpart and evtree have the best Precision; evtree is the best in Recall, F1-score and AUC; C5.0
tends to produce more complex trees; and tree is consistently much faster than the others,
although its trees are often more complex than theirs.
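The abstract names cross-validation and Precision, Recall and F1-score without defining how they are computed for multi-class cell-type labels. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes macro-averaging (an unweighted mean over cell types) and shuffled k-fold splits, and the function names `kfold_indices` and `macro_scores` are hypothetical. AUC is not covered here.

```python
import random


def kfold_indices(n, k, seed=0):
    """Hypothetical helper: shuffle cell indices 0..n-1 and yield (train, test)
    index lists for k folds. Assumes plain (non-stratified) k-fold splitting."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # deal shuffled indices into k folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def macro_scores(y_true, y_pred):
    """Hypothetical helper: macro-averaged Precision, Recall and F1-score,
    i.e. per-cell-type scores averaged with equal weight per cell type."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)   # cells predicted as c
        n_true = sum(1 for t in y_true if t == c)   # cells truly of type c
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy labels for five cells; a real benchmark would pool predictions per test fold.
    y_true = ["B", "B", "T", "T", "NK"]
    y_pred = ["B", "T", "T", "T", "NK"]
    print(macro_scores(y_true, y_pred))
    for train, test in kfold_indices(10, 3):
        print(len(train), len(test))
```

For the toy labels above, `macro_scores` returns approximately (0.889, 0.833, 0.822). Macro-averaging gives rare cell types the same weight as abundant ones; a weighted average, or stratified folds that preserve cell-type proportions, are equally plausible choices and may give different numbers.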
Keywords
classification tree, single-cell RNA-sequencing, benchmark, precision, recall, F1-score, complexity, area under the curve, run-time
Citation
Alaqeeli, O., Xing, L., Zhang, X. (2021). Software benchmark—Classification tree algorithms for cell atlases annotation using single-cell RNA-sequencing data. Microbiology Research, 12, 317-334. https://doi.org/10.3390/microbiolres12020022