Identification and annotation of full-length genes in Atlantic salmon (Salmo salar)




Leong, Jong S.




Large-scale expressed sequence tag (EST) data from Atlantic salmon (Salmo salar) are examined to answer questions about salmonid transcriptomes. ESTs are raw, incomplete gene sequences that must be read, assembled, and analyzed with computer software. The goal of this thesis was to develop an automatically curated, publicly accessible set of annotated full-length genes representing a near-complete transcript set for Salmo salar. In turn, these genes provide a framework for studies of gene expression, conservation, and molecular evolution. The work presented here also summarizes the results of a molecular evolution study as an example of how full-length gene identification can be used to answer biological questions. Prior to this study, only a limited number of Atlantic salmon cDNA libraries and ESTs were known. To advance the goal of determining complete gene sequences, highly enriched full-length cDNA libraries and full-length libraries were created and sequenced, making it possible to identify a large number of full-length reference genes. Together, these libraries represent a diverse pool of transcriptome sequences for Salmo salar. Producing an accurate, large-scale full-length gene set for a duplicated genome is not trivial, and complete systems for this objective do not readily exist. EST sequencing, EST assembly, and data storage are among the initial computational issues addressed. Once these issues are resolved, the multi-step workflow for full-length gene determination is described. The final challenge, the development of a concise and universally accessible visualization system, is then discussed. The resulting computational framework is shown to handle both the intricacies and the size of a duplicated salmonid genome. It is widely accepted that Atlantic salmon have undergone a recent genome duplication, and gene paralogs provide one source of evidence for this event.
Analysis of paralogs revealed signatures of asymmetric evolution, possibly due to relaxed selective pressure. This thesis provides a complete bioinformatics pipeline to analyze and visualize a set of full-length reference genes for Atlantic salmon. Using these full-length genes as a framework, molecular evolution was examined, and evidence of asymmetric evolution among gene duplicates was found. The full-length reference genes, along with ESTs and all putative transcripts, have been made publicly available. These results serve as a valuable genomic resource for next-generation sequencing and for other salmonid research endeavours.
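The abstract does not state how asymmetric evolution between paralogs was detected. One standard way to ask whether two gene duplicates have accumulated changes at unequal rates is Tajima's (1993) relative rate test, which uses an outgroup sequence to count substitutions unique to each paralog. The Python sketch below only illustrates that idea and is not necessarily the method used in this thesis; the function name `tajima_relative_rate`, the choice to skip gapped or `N` columns, and the example sequences are all hypothetical.

```python
import math


def tajima_relative_rate(a: str, b: str, outgroup: str) -> tuple:
    """Sketch of Tajima's 1D relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p): m1 counts sites where only paralog `a` differs
    from the outgroup, m2 counts sites where only paralog `b` does, and chi2
    is compared against a chi-square distribution with 1 degree of freedom.
    """
    if not (len(a) == len(b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for x, y, o in zip(a.upper(), b.upper(), outgroup.upper()):
        if {x, y, o} & {"-", "N"}:
            continue  # hypothetical choice: ignore gapped or ambiguous columns
        if y == o and x != o:
            m1 += 1  # change on the lineage leading to paralog a
        elif x == o and y != o:
            m2 += 1  # change on the lineage leading to paralog b
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


if __name__ == "__main__":
    # Toy alignment (hypothetical): two paralogs and one outgroup sequence.
    outgroup = "ACGTACGTAC"
    paralog_a = "ACGTTCGTAC"
    paralog_b = "ACGAACGTTT"
    m1, m2, chi2, p = tajima_relative_rate(paralog_a, paralog_b, outgroup)
    print(f"m1={m1} m2={m2} chi2={chi2:.2f} p={p:.3f}")
```

In this toy alignment paralog b carries three unique substitutions against one for paralog a, so chi2 = (1 - 3)^2 / 4 = 1.0 and p is about 0.32: no significant rate asymmetry. Real analyses of salmonid duplicates would typically work on codon alignments and may instead compare dN/dS ratios between paralogs.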



sequence tags, gene, cDNA