Recent segmental and gene duplications in the mouse genome
Date
2003-07-09
Authors
Cheung, Joseph
Wilson, Michael D
Zhang, Junjun
Khaja, Razi
MacDonald, Jeffrey R
Heng, Henry H Q
Koop, Ben F
Scherer, Stephen W
Publisher
BioMed Central
Abstract
Background: The high-quality mouse genome draft sequence and its associated annotations
are an invaluable biological resource. Identifying recent duplications in the mouse genome,
especially in regions containing genes, may highlight important events in recent murine evolution.
In addition, detecting recent sequence duplications can reveal potentially problematic regions of the
genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and
recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we
present a database of recently duplicated regions of the mouse genome found in the mouse genome
sequencing consortium (MGSC) February 2002 and February 2003 assemblies.
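The two thresholds stated above (alignment length of at least 5 kb and sequence identity of at least 90%) can be illustrated with a minimal sketch. It assumes BLAST hits are available in the standard 12-column tab-separated tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The names `Hit`, `parse_tabular`, `is_self_hit`, and `candidate_duplications` are hypothetical, and the sketch is not the authors' pipeline, whose heuristics are not described in the abstract.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment taken from a BLAST tabular row (hypothetical type)."""
    query: str
    subject: str
    identity: float  # percent identity, 0-100
    length: int      # alignment length in bp
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated BLAST rows, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def is_self_hit(hit: Hit) -> bool:
    """A sequence aligned to itself at identical coordinates is not a duplication."""
    return (hit.query == hit.subject
            and hit.q_start == hit.s_start
            and hit.q_end == hit.s_end)


def candidate_duplications(hits: Iterable[Hit],
                           min_length: int = 5000,
                           min_identity: float = 90.0) -> List[Hit]:
    """Keep hits that meet both thresholds and are not trivial self-alignments."""
    return [h for h in hits
            if h.length >= min_length
            and h.identity >= min_identity
            and not is_self_hit(h)]
```

A real analysis would also need to merge overlapping hits and handle repeat-masked sequence; those steps are outside the scope of this sketch.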
Results: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003
mouse genome sequence assembly is involved in recent segmental duplications, which is less than
that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the
duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an
additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence
misassignment errors in this genome assembly. By searching for genes that are located within these
regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen
of these genes appear to have been duplicated independently in the human genome. From our
dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect
gene essential for embryogenesis in mice.
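The percentages reported in the Results follow directly from the stated sequence totals. The short check below recomputes them; the helper name `percent` is hypothetical and the figures are copied from the abstract.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100.0


duplicated_mb = 33.6  # sequence in recent segmental duplications (Mb)
assembly_mb = 2695.0  # February 2003 assembly size (Mb)
unmapped_mb = 8.9     # duplicated sequence on 'unmapped' chromosomes (Mb)

# Share of the assembly involved in recent segmental duplications.
print(f"{percent(duplicated_mb, assembly_mb):.1f}%")  # 1.2%

# Share of the duplication content that is 'unmapped' sequence.
print(f"{percent(unmapped_mb, duplicated_mb):.0f}%")  # 26%
```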
Conclusion: Our results provide an initial analysis of the recently duplicated sequence and gene
content of the mouse genome. Many of these duplicated loci, as well as regions identified to be
involved in potential sequence misassignment errors, will require further mapping and sequencing
to achieve accuracy. A Genome Browser database was set up to display the identified duplication
content presented in this work. These data will also be relevant to the growing number of
investigators who use the draft genome sequence for experimental design and analysis.
Description
BioMed Central
Citation
Cheung et al. Recent segmental and gene duplications in the mouse genome. Genome Biology 2003, 4: R47