From interpretable penalized LDA models to end-to-end deep learning for cell type annotation in single-cell RNA-seq

dc.contributor.author: Bai, Kailun
dc.contributor.supervisor: Zhang, Xuekui
dc.contributor.supervisor: Shao, Xiaojian
dc.date.accessioned: 2026-01-12T21:34:34Z
dc.date.available: 2026-01-12T21:34:34Z
dc.date.issued: 2025
dc.degree.department: Department of Mathematics and Statistics
dc.degree.level: Doctor of Philosophy (PhD)
dc.description.abstract: This dissertation presents a systematic exploration of scalable, interpretable, and high-accuracy computational frameworks for automated cell type classification in single-cell RNA sequencing (scRNA-seq) data. Motivated by the increasing scale, dimensionality, and heterogeneity of modern scRNA-seq datasets, this work focuses on methods that balance statistical interpretability, computational efficiency, and predictive performance across diverse biological and technical settings. The research spans three major contributions, each addressing different trade-offs between simplicity, interpretability, and predictive power: 1. PCLDA (Penalized Component-wise Linear Discriminant Analysis) introduces a highly interpretable and statistically grounded annotation tool. 2. scSorterDL expands on this foundation by combining penalized LDA with ensemble learning and deep neural networks. 3. CellAnnotatorNet represents the culmination of this research by integrating a categorical autoencoder with the Swarm-pLDA framework into a unified, fully differentiable architecture. Together, these three contributions provide a progressive development from classical interpretable statistical models to fully integrated deep learning pipelines for large-scale single-cell analysis, offering a coherent and extensible framework for automated cell type annotation in single-cell genomics.
dc.description.scholarlevel: Graduate
dc.identifier.bibliographicCitation: Bai, K., Moa, B., Shao, X., & Zhang, X. (2025). PCLDA: An interpretable cell annotation tool for single-cell RNA sequencing data based on simple statistical methods. Computational and Structural Biotechnology Journal, 27, 3264–3274. https://doi.org/10.1016/j.csbj.2025.07.019
dc.identifier.bibliographicCitation: Bai, K., Moa, B., Shao, X., & Zhang, X. (2025). scSorterDL: A deep neural network-enhanced ensemble LDA framework for single-cell classification. Briefings in Bioinformatics, 26(5), bbaf446. https://doi.org/10.1093/bib/bbaf446
dc.identifier.uri: https://hdl.handle.net/1828/23063
dc.language: English (eng)
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Single-cell RNA sequencing
dc.subject: Cell type annotation
dc.subject: Deep learning
dc.subject: Machine learning
dc.subject: Ensemble learning
dc.subject: Penalized linear discriminant analysis
dc.title: From interpretable penalized LDA models to end-to-end deep learning for cell type annotation in single-cell RNA-seq
dc.type: Thesis

Files

Original bundle
Name: Bai_Kailun_PhD_2025.pdf
Size: 16.45 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission