From interpretable penalized LDA models to end-to-end deep learning for cell type annotation in single-cell RNA-seq

Date

2025

Authors

Bai, Kailun

Abstract

This dissertation presents a systematic exploration of scalable, interpretable, and high-accuracy computational frameworks for automated cell type classification in single-cell RNA sequencing (scRNA-seq) data. Motivated by the increasing scale, dimensionality, and heterogeneity of modern scRNA-seq datasets, this work focuses on methods that balance statistical interpretability, computational efficiency, and predictive performance across diverse biological and technical settings.

The research spans three major contributions, each addressing different trade-offs between simplicity, interpretability, and predictive power:

1. PCLDA (Penalized Component-wise Linear Discriminant Analysis) introduces a highly interpretable and statistically grounded annotation tool.
2. scSorterDL expands on this foundation by combining penalized LDA with ensemble learning and deep neural networks.
3. CellAnnotatorNet represents the culmination of this research by integrating a categorical autoencoder with the Swarm-pLDA framework into a unified, fully differentiable architecture.

Together, these three contributions provide a progressive development from classical interpretable statistical models to fully integrated deep learning pipelines for large-scale single-cell analysis, offering a coherent and extensible framework for automated cell type annotation in single-cell genomics.

Keywords

Single-cell RNA sequencing, Cell type annotation, Deep learning, Machine learning, Ensemble learning, Penalized linear discriminant analysis
