Author Notes:

Tel: +1 404 712 9576; Fax: +1 404 727 1370; Email: zhaohi.qin@emory.edu

We thank the executive editor, Dr Gary Benson, and two anonymous reviewers for their constructive comments and suggestions, which we found extremely helpful.

We thank Mr Yanjia Wang, Zhengyu Zhang, Qiushen Zhong, Min Wang and Ms Daojia Wu for coding and logistical help.

We thank Dr Jindan Yu for helpful discussion at the early stage of the project.

We thank the bioCADDIE team for support and valuable input.

Conflict of interest statement. None declared.

Research Funding:

Emory Integrated Computational Core (EICC), one of the Emory Integrated Core Facilities, which is subsidized by the Emory University School of Medicine and by the National Institutes of Health [UL1TR000454 to M.E.Z.]; Patient-Centered Outcomes Research Institute [ME-1310-07058 to X.J.]; National Institutes of Health [R01HG008802, R01GM114612, R01GM118574, R01GM118609, R21LM012060, U01EB023685 to X.J.]; National Science Foundation [ACI 1443054, IIS 1350885 to F.W.]; National Institutes of Health [P01GM085354 to Z.S.Q. and W.S.P.].

Funding for open access charge: Department fund.

Keywords:

  • Science & Technology
  • Life Sciences & Biomedicine
  • Biochemistry & Molecular Biology
  • BURROWS-WHEELER TRANSFORM
  • RNA-SEQ DATA
  • READ ALIGNMENT
  • GENOME
  • PROFILES
  • ARCHIVE
  • HUMANS

Omicseq: a web-based search engine for exploring omics datasets

Journal Title:

Nucleic Acids Research

Volume:

Volume 45, Number W1, Pages W445-W452

Publisher:

Oxford University Press

Type of Work:

Article | Final Publisher PDF

Abstract:

The development and application of high-throughput genomics technologies have resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets whose content is overwhelmingly numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant.

Copyright information:

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access work distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/).
