scAnnotate: An Automated Cell Type Annotation Tool for Single-cell RNA-Sequencing Data

dc.contributor.author: Ji, Xiangling
dc.contributor.supervisor: Zhang, Xuekui
dc.contributor.supervisor: Tsao, Min
dc.date.accessioned: 2022-08-11T16:39:53Z
dc.date.copyright: 2022
dc.date.issued: 2022-08-11
dc.degree.department: Department of Mathematics and Statistics
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so that they can be investigated separately. Researchers have recently developed several automated cell type annotation tools based on supervised machine learning algorithms, which require neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data that is widely used in differential expression analysis but not by existing cell annotation methods. We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene's marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell type annotation. This combining approach can avoid estimating the numerous parameters of the high-dimensional joint distribution of all genes. Using fourteen real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods, and that it accurately annotates cells when training and test data are (1) similar, (2) cross-platform, and (3) cross-species. We also find that most of the cells scAnnotate annotates incorrectly differ from those misannotated by the other methods.
dc.description.scholarlevel: Graduate
dc.identifier.uri: http://hdl.handle.net/1828/14093
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: cell type annotation
dc.subject: single-cell RNA-sequencing
dc.subject: gene expression
dc.title: scAnnotate: An Automated Cell Type Annotation Tool for Single-cell RNA-Sequencing Data
dc.type: Thesis
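
The abstract's core idea (one mixture model per gene covering both the dropout proportion and the non-dropout expression level, with the per-gene models then combined into one classifier) can be illustrated with a small sketch. This is NOT the scAnnotate implementation. It rests on several assumptions that do not come from the record: expression is already log-normalized, an exact zero counts as a dropout, non-dropout values of a gene within a cell type follow a normal distribution, and the per-gene log-likelihoods are summed with equal weights. That last step is a naive-Bayes-style stand-in for the thesis's ensemble step, which this sketch does not reproduce. All function names (`fit_gene_mixtures`, `gene_log_likelihoods`, `annotate`, `simulate`) and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the scAnnotate implementation. Assumptions:
# expression is log-normalized, a zero counts as a dropout, non-dropout values
# of a gene within a cell type are normal, and per-gene log-likelihoods are
# summed with equal weights (a naive-Bayes-style stand-in for the ensemble).


def fit_gene_mixtures(X, labels, eps=1e-3):
    """Fit a dropout + normal mixture for every (cell type, gene) pair.

    X: (n_cells, n_genes) array; labels: (n_cells,) array of cell-type names.
    Returns {cell type: (dropout proportion, mean, std)}, each of shape (n_genes,).
    """
    params = {}
    for ct in np.unique(labels):
        Xc = X[labels == ct]
        nonzero = Xc > 0
        n_nz = nonzero.sum(axis=0)
        # Dropout proportion, clipped so that log(p0) and log(1 - p0) stay finite.
        p0 = np.clip(1.0 - n_nz / Xc.shape[0], eps, 1.0 - eps)
        # Mean and variance of the non-dropout values only (defaults if none).
        total = np.where(nonzero, Xc, 0.0).sum(axis=0)
        mu = np.divide(total, n_nz, out=np.zeros(X.shape[1]), where=n_nz > 0)
        sq = np.where(nonzero, (Xc - mu) ** 2, 0.0).sum(axis=0)
        var = np.divide(sq, n_nz, out=np.ones(X.shape[1]), where=n_nz > 0)
        params[ct] = (p0, mu, np.sqrt(np.maximum(var, eps)))
    return params


def gene_log_likelihoods(X, p0, mu, sd):
    """Log-likelihood of each entry of X under one cell type's per-gene mixtures."""
    log_normal = -0.5 * np.log(2 * np.pi * sd**2) - (X - mu) ** 2 / (2 * sd**2)
    # Zero entries use the dropout component; positive entries the normal one.
    return np.where(X > 0, np.log1p(-p0) + log_normal, np.log(p0))


def annotate(X, params):
    """Assign each cell the type whose summed per-gene log-likelihood is largest."""
    cell_types = list(params)
    scores = np.column_stack(
        [gene_log_likelihoods(X, *params[ct]).sum(axis=1) for ct in cell_types]
    )
    return np.array(cell_types)[scores.argmax(axis=1)]


def simulate(n, cell_type, rng, n_genes=20):
    """Toy data: each type expresses its own half of the genes with few dropouts."""
    dropout = np.full(n_genes, 0.9)
    mean = np.full(n_genes, 1.0)
    own = slice(0, n_genes // 2) if cell_type == "A" else slice(n_genes // 2, n_genes)
    dropout[own], mean[own] = 0.1, 3.0
    X = np.clip(rng.normal(mean, 0.5, size=(n, n_genes)), 0.01, None)
    X[rng.random((n, n_genes)) < dropout] = 0.0
    return X


rng = np.random.default_rng(0)
X_train = np.vstack([simulate(200, "A", rng), simulate(200, "B", rng)])
y_train = np.array(["A"] * 200 + ["B"] * 200)
X_test = np.vstack([simulate(50, "A", rng), simulate(50, "B", rng)])
y_test = np.array(["A"] * 50 + ["B"] * 50)

params = fit_gene_mixtures(X_train, y_train)
pred = annotate(X_test, params)
accuracy = (pred == y_test).mean()
print(f"toy accuracy: {accuracy:.2f}")
```

Scoring each gene separately and then adding the scores means only three numbers per gene and cell type are estimated, rather than a gene-by-gene covariance structure. This mirrors the abstract's point about avoiding the high-dimensional joint distribution. A real ensemble would typically also learn how much weight each gene's model deserves, which this equal-weight sketch does not attempt.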

Files

Original bundle
Name: Ji_Xiangling_MSc_2022.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission