Harnessing community intelligence in collaborative curation of
human long non-coding RNAs
Lina Ma, Ang Li, Dong Zou and Zhang Zhang
Beijing Institute of Genomics, China
Long non-coding RNAs (lncRNAs) have been found to perform various functions
in a wide variety of important biological processes and are highly
correlated with human diseases. Thanks to the rapid progress
in sequencing technologies, tens of thousands of lncRNAs have been identified.
Several databases have accordingly been built to manage this large number
of human lncRNAs. However, human-related lncRNA databases, viz., GENCODE,
NONCODE, and LNCipedia, each containing more than 20,000 human lncRNA
transcripts, have only about 5,500 human lncRNAs in common. Most
importantly, there is no standard for lncRNA naming or classification,
which poses great difficulties for downstream bioinformatic analysis.
Additionally, traditional databases depend primarily on expert
curation, which makes it laborious and time-consuming to curate the
exponentially accumulating lncRNAs and calls for collective
intelligence in community curation of lncRNAs. To address these issues,
here we integrate human lncRNA sequences from GENCODE, NONCODE, LNCipedia,
and lncRNAdb, and obtain ~53,000 non-redundant lncRNA transcripts. We
then classify these lncRNAs into 9 groups based on their genomic location
and context. We finally build a wiki-based database for human lncRNAs,
aiming to harness collective intelligence to collect, edit, and annotate
information about human lncRNAs. This database is a publicly editable and
open-content platform for community curation of human lncRNAs, bearing
the potential to become a comprehensive and up-to-date knowledge base
for human lncRNAs.
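
The integration step described above (pooling transcripts from several databases into a non-redundant set) can be illustrated with a minimal sketch. The abstract does not state the redundancy criterion actually used, so this example assumes that two transcripts are redundant when their normalized sequences are identical. The function names (normalize, merge_non_redundant, shared_by_all) and the toy sequences are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def normalize(seq):
    """Strip whitespace, uppercase, and map RNA 'U' to DNA 'T'."""
    return "".join(seq.split()).upper().replace("U", "T")


def merge_non_redundant(sources):
    """Merge {source: {transcript_id: sequence}} into
    {normalized_sequence: {source: [transcript_ids]}}.

    Assumption: exact sequence identity defines redundancy.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source, transcripts in sources.items():
        for transcript_id, seq in transcripts.items():
            merged[normalize(seq)][source].append(transcript_id)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {seq: dict(by_source) for seq, by_source in merged.items()}


def shared_by_all(merged, source_names):
    """Return the sequences that occur in every named source."""
    required = set(source_names)
    return [seq for seq, by_source in merged.items()
            if required <= set(by_source)]


# Toy data only; these are not real lncRNA sequences or identifiers.
toy_sources = {
    "GENCODE": {"g1": "ACGUACGU", "g2": "GGGCCC"},
    "NONCODE": {"n1": "acgtacgt", "n2": "TTTTAAAA"},
    "LNCipedia": {"l1": "ACGTACGT", "l2": "GGGCCC"},
}

merged = merge_non_redundant(toy_sources)
common = shared_by_all(merged, toy_sources.keys())
```

Keeping the per-source identifiers alongside each unique sequence preserves provenance, so the same structure yields both the size of the non-redundant set and the number of entries shared by all sources.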