About this item:

Author Notes:

E-mail Addresses: zpjohns@emory.edu; steven.bosinger@emory.edu

RBN conceived and organized the overall project. RBN, JAY, SLS, SEB, ZPJ, BF and HSF supervised the project; AVZ supervised the development of MaSuRCA, performed the contig and scaffold assemblies and submitted the assembly to NCBI; ASC contributed to the annotation and expression analysis; MDM and XZ contributed to the chromosome assembly and annotation; DTM and KW performed Ion Torrent sequencing; RMG contributed to the annotation analyses; SP contributed to the annotation; SEB performed RNA-seq analysis; ZPJ performed the acute stress experiment and analyzed the associated expression data; GKT performed RNA-seq analysis; GM and MR contributed to MaSuRCA development; BF supervised mRNA extraction and Illumina sequencing; HSF supervised mRNA extractions; TT performed the reference-guided transcriptome assembly; SLS provided advice on bioinformatics issues; JAY supervised contig and scaffold assembly; RBN supervised chromosome assembly, annotation, and RNA-seq analysis. All authors read and approved the manuscript.

We appreciate the helpful comments of Drs. Dave O’Connor and Roger Wiseman on the rhesus MHC. We thank Jerilyn Pecotte of the Southwest National Primate Research Center for genomic DNA from the reference rhesus macaque.

We appreciate the help of John Letaw, at the Oregon National Primate Research Center, and Tade Souaiaia, at the University of Southern California, for reviewing and suggesting improvements to the annotation file.

At UNMC, we thank Brenda Morsey and Katy Emanuel for technical assistance with RNA isolation and Alok Dhar at the DNA Sequencing Core facility for library preparation and Illumina sequencing.

The authors declare that they have no competing interests.

Research Funding:

This work was supported by National Institutes of Health grants R24RR017444 (RN), P51OD011092 (BF) and P51 OD011132 (ZJ).

Keywords:

  • Macaca mulatta
  • Rhesus macaque
  • Genome
  • Assembly
  • Annotation
  • Transcriptome
  • Next-generation sequencing

A new rhesus macaque assembly and annotation for next-generation sequencing analyses

Journal Title:

Biology Direct

Volume:

Volume 9, Number 1, Pages 20-20

Type of Work:

Article | Final Publisher PDF

Abstract:

Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.

Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
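
For readers unfamiliar with the N50 statistic quoted in the abstract, the following is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not taken from MacaM.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic weights each contig by its length, a few long contigs raise N50 far more than many short ones, which is why it is often used to compare how fragmented different assemblies are.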

Copyright information:

© 2014 Zimin et al.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits distribution, public display, public performance, distribution of derivative works, and making multiple copies, provided the original work is properly cited. This license requires that credit be given to the copyright holder and/or author and that copyright and license notices be kept intact.
