Harnessing community intelligence in collaborative curation of human long non-coding RNAs

Lina Ma, Ang Li, Dong Zou and Zhang Zhang

Beijing Institute of Genomics, China

Long non-coding RNAs (lncRNAs) perform diverse functions in a wide range of important biological processes and are closely associated with human diseases. Owing to rapid progress in sequencing technologies, tens of thousands of lncRNAs have been identified, and several databases have been built to manage them. However, the major human lncRNA databases, viz., GENCODE, NONCODE, and LNCipedia, each contain more than 20,000 human lncRNA transcripts yet share only about 5,500 human lncRNAs. More importantly, there is no standard for lncRNA naming or classification, which poses great difficulties for downstream bioinformatic analysis. In addition, traditional databases depend primarily on expert curation, which is laborious and time-consuming given the exponential accumulation of lncRNAs; keeping pace therefore requires the collective intelligence of community curation. To address these issues, we integrate human lncRNA sequences from GENCODE, NONCODE, LNCipedia, and lncRNAdb, obtaining ~53,000 non-redundant lncRNA transcripts. We then classify these lncRNAs into 9 groups based on their genomic location and context. Finally, we build a wiki-based database for human lncRNAs that harnesses collective intelligence to collect, edit, and annotate information about them. This database is a publicly editable, open-content platform for community curation of human lncRNAs, with the potential to become a comprehensive and up-to-date knowledge base.
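The integration step described above can be illustrated with a minimal sketch. The abstract does not state how redundancy was removed, so this example assumes the simplest possible criterion, exact sequence identity after normalization; the function names and the toy records are hypothetical and are not taken from any of the databases named.

```python
from collections import defaultdict


def normalize(seq: str) -> str:
    """Uppercase and map RNA 'U' to DNA 'T' so identical transcripts compare equal."""
    return seq.strip().upper().replace("U", "T")


def merge_non_redundant(sources):
    """Map each distinct sequence to the set of source names that contain it.

    ASSUMPTION: two transcripts are redundant only if their normalized
    sequences are identical. A real pipeline might also compare genomic
    coordinates or exon structure.
    """
    merged = defaultdict(set)
    for name, records in sources.items():
        for _transcript_id, seq in records:
            merged[normalize(seq)].add(name)
    return dict(merged)


def shared_by_all(merged, source_names):
    """Return the sequences present in every listed source."""
    wanted = set(source_names)
    return [seq for seq, names in merged.items() if wanted <= names]


# Toy records (hypothetical IDs and sequences, for illustration only).
sources = {
    "GENCODE": [("g1", "ACGT"), ("g2", "GGCC")],
    "NONCODE": [("n1", "acgu"), ("n2", "TTAA")],
    "LNCipedia": [("l1", "ACGT"), ("l2", "GGCC")],
}

merged = merge_non_redundant(sources)
common = shared_by_all(merged, sources)
print(len(merged), "non-redundant sequences;", len(common), "shared by all sources")
```

Recording which sources contribute each sequence keeps the overlap between databases directly queryable, which is the kind of comparison behind the "in common" figure quoted above.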