From interpretable penalized LDA models to end-to-end deep learning for cell type annotation in single-cell RNA-seq
| dc.contributor.author | Bai, Kailun | |
| dc.contributor.supervisor | Zhang, Xuekui | |
| dc.contributor.supervisor | Shao, Xiaojian | |
| dc.date.accessioned | 2026-01-12T21:34:34Z | |
| dc.date.available | 2026-01-12T21:34:34Z | |
| dc.date.issued | 2025 | |
| dc.degree.department | Department of Mathematics and Statistics | |
| dc.degree.level | Doctor of Philosophy (PhD) | |
| dc.description.abstract | This dissertation presents a systematic exploration of scalable, interpretable, and high-accuracy computational frameworks for automated cell type classification in single-cell RNA sequencing (scRNA-seq) data. Motivated by the increasing scale, dimensionality, and heterogeneity of modern scRNA-seq datasets, this work focuses on methods that balance statistical interpretability, computational efficiency, and predictive performance across diverse biological and technical settings. The research spans three major contributions, each addressing different trade-offs between simplicity, interpretability, and predictive power: (1) PCLDA (Penalized Component-wise Linear Discriminant Analysis) introduces a highly interpretable and statistically grounded annotation tool; (2) scSorterDL expands on this foundation by combining penalized LDA with ensemble learning and deep neural networks; and (3) CellAnnotatorNet represents the culmination of this research by integrating a categorical autoencoder with the Swarm-pLDA framework into a unified, fully differentiable architecture. Together, these three contributions provide a progressive development from classical interpretable statistical models to fully integrated deep learning pipelines for large-scale single-cell analysis, offering a coherent and extensible framework for automated cell type annotation in single-cell genomics. | |
| dc.description.scholarlevel | Graduate | |
| dc.identifier.bibliographicCitation | Bai, K., Moa, B., Shao, X., & Zhang, X. (2025). PCLDA: An interpretable cell annotation tool for single-cell RNA sequencing data based on simple statistical methods. Computational and Structural Biotechnology Journal, 27, 3264–3274. https://doi.org/10.1016/j.csbj.2025.07.019 | |
| dc.identifier.bibliographicCitation | Bai, K., Moa, B., Shao, X., & Zhang, X. (2025). scSorterDL: A deep neural network-enhanced ensemble LDA framework for single-cell classification. Briefings in Bioinformatics, 26(5), bbaf446. https://doi.org/10.1093/bib/bbaf446 | |
| dc.identifier.uri | https://hdl.handle.net/1828/23063 | |
| dc.language | English | eng |
| dc.language.iso | en | |
| dc.rights | Available to the World Wide Web | |
| dc.subject | Single-cell RNA sequencing | |
| dc.subject | Cell type annotation | |
| dc.subject | Deep learning | |
| dc.subject | Machine learning | |
| dc.subject | Ensemble learning | |
| dc.subject | Penalized linear discriminant analysis | |
| dc.title | From interpretable penalized LDA models to end-to-end deep learning for cell type annotation in single-cell RNA-seq | |
| dc.type | Thesis |