An analysis of salmonid RNA sequences and implications for salmonid evolution




Brown, Gordon David



This work addresses two areas of computational biology: the automation of sequence processing, and an assessment of the evidence for a hypothesized salmonid genome duplication, based on an analysis of a set of expressed sequence tags (ESTs).

The first half of the work addresses three problem areas in sequence processing. Chapter 3 describes an accurate technique for trimming vector, adapter and poly(A) sequence. Chapter 4 suggests methods for verifying the accuracy of assembled mRNA transcripts despite a large number of chimeras in the cDNA clone libraries. Chapter 5 is concerned with the problem of estimating the number of transcripts in a tissue or cDNA library, and concludes that computational and statistical techniques are inadequate to estimate that quantity accurately.

The hypothesized salmonid genome duplication has been widely accepted since 1984. If it occurred, it should have left evidence in the form of many paralogous pairs of genes, all at approximately the same degree of sequence divergence. To test this prediction, several hundred thousand ESTs were assembled into transcripts, the transcripts were compared to each other to find homologs, and the evolutionary distances between the homologs were plotted as a histogram. Evidence of a single evolutionary event was not seen. The same procedure was applied to Xenopus laevis, which has a well-established recent genome duplication, and Danio rerio, which is known not to have had one. In those cases, the evidence for or against a genome duplication appeared exactly as predicted. The conclusion is that if the salmonid genome duplication occurred, some force subsequently altered the genome's evolutionary development so as to mask the duplication, and also that a genome duplication is not necessary to explain the observed pattern of homolog distances.
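The histogram step described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the abstract does not say which distance measure or bin width was used, so the Jukes-Cantor correction, the 0.05 bin width, and the function names `p_distance`, `jukes_cantor` and `distance_histogram` are all assumptions made for illustration. The sketch also assumes the homolog pairs have already been aligned to equal length.

```python
import math
from collections import Counter


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (an assumed choice of model).

    Returns infinity when p >= 0.75, where the correction is undefined.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_histogram(distances, bin_width=0.05):
    """Count finite distances per bin; keys are bin indices (d // bin_width)."""
    counts = Counter(int(d / bin_width) for d in distances if math.isfinite(d))
    return dict(sorted(counts.items()))


# Hypothetical aligned homolog pairs, for illustration only.
pairs = [("ACGTACGT", "ACGTACGA"), ("ACGTACGT", "ACGTACGT")]
distances = [jukes_cantor(p_distance(a, b)) for a, b in pairs]
histogram = distance_histogram(distances)
```

Under the reasoning in the abstract, a single genome duplication would be expected to show up as a concentration of paralog pairs in a narrow range of bins, while a spread of distances with no such peak would not single out one event.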



salmonids, evolution, RNA analysis, genome duplication, gene duplication