Graph-Based Protein Function Prediction
Participants: Hon Nian Chua, Zhihui Li,
Guimei Liu, Wing-Kin Sung, Limsoon Wong
Background
Although sequence similarity search has proven useful in many cases,
it has fundamental limitations. First, only a fraction of newly discovered
sequences have identifiable homologous genes in the current databases.
Second, the most prominent vertebrate organisms in GenBank have only a
fraction of their genomes present in finished sequences. New bioinformatics
methods allow inference of protein function using "associative analysis"
of functional properties to complement the traditional sequence
homology-based methods. Associative properties that have been used to
infer function not evident from sequence homology include: co-occurrence
of proteins in operons or genome context; proteins sharing common domains
in fusion proteins; proteins in the same pathway; proteins with correlated
gene expression patterns; etc.
In this project, we investigate and develop graph-based methods for
inferring protein functions without sequence homology. Most approaches
to predicting protein function from protein-protein interaction data
rely on the observation that a protein often shares functions with
the proteins that interact with it (its level-1 neighbors). However,
proteins that interact with the same proteins (i.e. level-2 neighbors)
may also have a greater likelihood of sharing similar physical or
biochemical characteristics. We want to find out how significant the
functional association between level-2 neighbors is and how it can be
exploited for protein function prediction. We will also
investigate how to integrate protein interaction information with other
types of information to improve the sensitivity and specificity of
protein function prediction, especially in the absence of sequence homology.
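The level-1 and level-2 neighbor sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interaction list, the protein names and the helper names (build_graph, level1_neighbors, level2_neighbors) are hypothetical and are not taken from the project's software. Here a level-2 neighbor is any protein that shares at least one interaction partner with the query protein, so a protein may be both a level-1 and a level-2 neighbor.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def level1_neighbors(graph, protein):
    """Proteins that interact directly with `protein`."""
    return set(graph.get(protein, ()))


def level2_neighbors(graph, protein):
    """Proteins that share at least one interaction partner with `protein`."""
    level2 = set()
    for partner in graph.get(protein, ()):
        level2 |= graph.get(partner, set())
    level2.discard(protein)  # a protein is not its own neighbor
    return level2


# Toy interaction data (hypothetical protein names).
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "C")]
graph = build_graph(interactions)
print(sorted(level1_neighbors(graph, "A")))
print(sorted(level2_neighbors(graph, "A")))
```

In this toy network, protein A has no direct interaction with C, but both of A's partners interact with C, which makes C a level-2 neighbor of A.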
Objectives
In this project, we investigate and develop graph-based methods for
inferring protein functions without sequence homology. In particular,
- We find out how significant functional association between
level-2 neighbors is. For example, what proportion of proteins
have no functional association with their immediate neighbors
but do have a functional association with their level-2 neighbors?
- We investigate how level-2 neighbors can be exploited for protein
function prediction in a graph-based framework. For example, how well
do level-2 neighbors perform for function prediction in simple
methods like majority voting? How much further improvement can
be made in more sophisticated methods that take into account
reliability information of protein interactions or protein
function annotations?
- We investigate how to integrate protein interaction
information with other types of information to improve the
sensitivity and specificity of protein function prediction,
in a graph-based framework, especially in the absence of sequence
homology. For example, how does reliability information of protein
interaction help? How does knowledge of proteins being co-localized
help? How does knowledge of frequency of co-occurrence of proteins
in scientific literature help? How can these types of information be
incorporated?
At the end of the project, we expect to have developed a robust and
powerful system to predict protein functions, even in the absence of
sequence homology.
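As a rough illustration of the questions above, the sketch below implements a simple neighbor-voting scheme in Python in which level-2 neighbors can be switched on or off and each vote is scaled by interaction reliability. It is a minimal sketch, not the project's FSWeight method: the function names, the toy network, the annotations and the reliability values are all hypothetical, and the choice to weight a level-2 vote by the product of the two edge reliabilities along its best path is an assumption made for this example.

```python
from collections import defaultdict


def edge_weight(reliability, a, b):
    """Reliability of the a-b interaction; 1.0 if none is given (assumption)."""
    return reliability.get(frozenset((a, b)), 1.0)


def neighbor_weights(graph, protein, reliability, use_level2=True):
    """Map each level-1 (and optionally level-2) neighbor to a vote weight."""
    weights = {}
    for v in graph.get(protein, ()):
        w_uv = edge_weight(reliability, protein, v)
        weights[v] = max(weights.get(v, 0.0), w_uv)
        if not use_level2:
            continue
        for w in graph.get(v, ()):
            if w == protein:
                continue
            # Assumed rule: a two-step path is weighted by the product of
            # its edge reliabilities; keep the best path to each neighbor.
            path = w_uv * edge_weight(reliability, v, w)
            weights[w] = max(weights.get(w, 0.0), path)
    return weights


def vote(graph, annotations, protein, reliability=None, use_level2=True):
    """Rank candidate functions for `protein` by summed neighbor votes."""
    reliability = reliability or {}
    scores = defaultdict(float)
    weights = neighbor_weights(graph, protein, reliability, use_level2)
    for neighbor, weight in weights.items():
        for function in annotations.get(neighbor, ()):
            scores[function] += weight
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): P is the unannotated query protein.
graph = {
    "P": {"A", "B"},
    "A": {"P", "C"},
    "B": {"P", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
annotations = {"A": {"transport"}, "B": {"kinase"}, "C": {"kinase"}}
reliability = {frozenset(("P", "A")): 0.9, frozenset(("P", "B")): 0.5}

print(vote(graph, annotations, "P", reliability, use_level2=False))
print(vote(graph, annotations, "P", reliability, use_level2=True))
```

With level-1 neighbors only, the more reliable interaction with A makes "transport" the top-ranked function. Once the shared partner C is allowed to vote, "kinase" ranks first, which shows how indirect neighbors can change a prediction in even the simplest voting scheme.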
Selected Publications
- Hon Nian Chua and Wing-Kin Sung.
A better gap penalty for pairwise SVM.
Proceedings of 3rd Asia-Pacific Bioinformatics Conference,
Singapore, pages 11-20, 17-21 January, 2005.
PDF
- Hon Nian Chua, Wing-Kin Sung, and Limsoon Wong.
Exploiting indirect neighbours and topological weight to
predict protein function from protein-protein interactions.
Bioinformatics, 22:1623-1630, 2006.
PDF,
FSWeight V1.0 Software
- Kang Ning, Hon Nian Chua.
Automated Identification of Protein Classification and
Detection of Annotation Errors in Protein Databases Using
Statistical Approaches.
LNBI 3886: Proceedings of PAKDD
2006 Workshop on Knowledge Discovery in Life Science
Literature (KDLL2006),
pages 123--138, Singapore, April 2006.
- Jin Chen, Hon Nian Chua, Wynne Hsu, Mong-Li Lee, See-Kiong Ng,
Rintaro Saito, Wing-Kin Sung, Limsoon Wong.
Increasing Confidence of Protein-Protein Interactomes.
Proceedings of 17th International Conference on Genome Informatics (GIW),
pages 284--297, Yokohama, Japan, 18-20 December 2006. (invited keynote paper)
PDF,
FSWeight V2.1 Software
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
Using Indirect Protein Interactions for the Prediction of
Gene Ontology Functions.
BMC Bioinformatics, 8(Suppl 4):S8, May 2007.
PDF,
FSWeight V2.1 Software
- Hon Nian Chua, Kang Ning, Wing-Kin Sung, Hon Wai Leong, Limsoon Wong.
Using Indirect Protein-Protein Interactions for Protein Complex
Prediction.
Proceedings of 6th Annual International Conference on
Computational Systems Bioinformatics (CSB),
pages 97--110, San Diego, California, August 2007.
PDF,
PCP V1.0 Software
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
An efficient strategy for extensive integration of diverse biological
data for protein function prediction.
Bioinformatics, 23(24):3364-3373, December 2007.
PDF,
Supplementary Info,
FSWeight V2.2 Software
- Hon Nian Chua, Kang Ning, Wing-Kin Sung, Hon Wai Leong, Limsoon Wong.
Using Indirect Protein-Protein Interactions
for Protein Complex Prediction.
Journal of Bioinformatics and Computational Biology,
6(3):435--466, June 2008.
PDF,
PCP V1.0 Software
- Hon Nian Chua, Limsoon Wong.
Increasing the Reliability of Protein Interactomes.
Drug Discovery Today, 13(15/16):652--658, August 2008.
PDF
- Guimei Liu, Jinyan Li, Limsoon Wong.
Assessing and Predicting Protein Interactions Using Both Local and
Global Network Topological Metrics.
Proceedings of 19th International Conference on Genome Informatics (GIW),
pages ???--???, Gold Coast, Australia, 3 December 2008.
PDF,
PPT
Dissertations
- Hon Nian Chua,
Graph-based methods for protein function prediction.
PhD thesis, Graduate School Integrative Sciences and Engineering,
National University of Singapore, Singapore, 2007.
- Zhihui Li,
Pubmed Abstract Processing for Protein Function Prediction.
Honours Year Project Report, Faculty of Science,
National University of Singapore, Singapore, 2008.
Selected Presentations
- Hon Nian Chua.
Function Prediction from Protein Interactions.
Invited talk at I2R-SOC Joint Lab Seminars.
NUS SOC, 16 August 2005.
- Limsoon Wong.
Protein Function Prediction From Protein Interactions.
Invited talk at 1st International Symposium on Languages
in Biology and Medicine,
KAIST, Daejon, Korea, 24-26 November 2005.
- Limsoon Wong.
Protein Function Prediction From Protein Interactions.
Invited talk at "Figuring Out Life: NUS-Karolinska
Joint Symposium on Application of Mathematics in Biomedicine",
Institute for Mathematical Sciences, Singapore, 28-29 November 2005.
PPT
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
Exploiting Indirect Neighbours and Topological Weight to
Predict Protein Function from Protein-Protein Interactions.
Invited keynote at BioDM2006, Singapore, 9 April 2006.
Proc. PAKDD 2006 Workshop on Data Mining for Biomedical
Applications (BioDM2006), Singapore,
9 April 2006, page 1.
PPT
- Limsoon Wong.
Guilt by Association of Common Interaction Partners.
Invited talk at IMS Workshop on BioAlgorithmics,
Institute for Mathematical Sciences, Singapore,
12-14 July 2006.
PPT
- Hon Nian Chua.
A Graph-Based Approach to Inferring Protein Function From
Heterogeneous Data Sources.
Invited talk at IMS Workshop on BioAlgorithmics,
Institute for Mathematical Sciences, Singapore,
12-14 July 2006.
- Hon Nian Chua.
Guilt by Indirect Functional Association.
Plenary talk at Annual Meeting on Automated Function Prediction
(AFP2006), San Diego, CA,
30 August - 1 September 2006.
- Limsoon Wong.
Guilt by Association: A Tutorial on Protein Function Inference.
Tutorial at 5th Asia-Pacific Bioinformatics Conference (APBC2007),
Hong Kong, 15-17 January 2007.
PPT
- Limsoon Wong.
Protein Function Inference Enhanced by Text Mining.
Invited talk at Forum on Advanced NLP and Text Mining (T-FaNT),
Tokyo, Japan, 11-13 March 2007.
PPT
- Limsoon Wong.
Guilt by Association: A Tutorial on Data Mining Techniques for
Protein Function Inference.
Tutorial at 11th Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD 2007),
Nanjing, China, 22-25 May 2007.
- Limsoon Wong.
Constructing More Reliable Protein-Protein Interaction Maps.
Invited talk at Bioinformatica Indica '08:
International Symposium on Computational Biology & Bioinformatics,
University of Kerala, 17-19 January 2008.
PDF,
PPT
- Limsoon Wong.
Two Applications of Text Mining in Bioinformatics:
Enhancing Protein Function Prediction and
Enhancing Drug Pathway Inference.
Invited talk at 7th Korea-Singapore Workshop on Bioinformatics & NLP,
Seoul, Korea, 15 February 2008.
PPT
- Limsoon Wong.
Guilt by Association.
Invited keynote at 1st Japan-Taiwan Young Researchers Conference on
Computational and Systems Biology,
Hsinchu, Taiwan, 9-11 March 2008.
- Limsoon Wong.
Guilt by Association as a Search Principle.
Invited keynote at 31st Annual International ACM SIGIR Conference,
Singapore, 20-24 July 2008.
PPT
- Limsoon Wong.
Guilt by Association: A Tutorial on Data Mining Techniques
for Protein Function Inference.
Invited tutorial at IPM-NUS Workshop on Analysis and Application of
Protein Interaction Networks,
Shahid Beheshti University, Tehran, 17-18 November 2008.
PPT
- Limsoon Wong.
Increasing Confidence of Protein-Protein Interactomes.
Invited talk at IPM-NUS Workshop on Analysis and Application
of Protein Interaction Networks,
Shahid Beheshti University,
Tehran, Iran, 17-18 November 2008.
PPT
- Limsoon Wong.
Identifying Protein Complexes from Protein Interactome Maps.
Invited talk at IPM-NUS Workshop on Analysis and Application
of Protein Interaction Networks,
Shahid Beheshti University,
Tehran, Iran, 17-18 November 2008.
PPT
- Limsoon Wong.
Guilt by Association of Common Interaction Partners.
Invited talk at IPM-NUS Workshop on Analysis and Application
of Protein Interaction Networks,
Shahid Beheshti University,
Tehran, Iran, 17-18 November 2008.
PPT
- Guimei Liu.
An Iterative Approach to Weighting and Expanding Protein Interaction
Networks and its Impact on Complex Discovery.
Invited talk at IMS Workshop on Computational Systems
Biology Approaches to Analysis of Genome Complexity and
Regulatory Gene Networks,
Institute for Mathematical Sciences, NUS, Singapore,
20-25 November 2008.
PPT
- Limsoon Wong.
"Guilt by Association" as a Search Principle.
Invited keynote at BioSearch08: HCSNet Next-Generation Search Workshop
on Search in Biomedical Information,
Queensland University of Technology, Brisbane, Australia, 30 November 2008.
- Limsoon Wong.
Identifying Protein Complexes from Protein Interactome Maps.
Invited talk at Joint 5th Structural Biology & Functional Genomics and
1st Biological Physics International Conference,
University Cultural Centre, NUS,
Singapore, 9-11 December 2008.
- Limsoon Wong.
Identifying Protein Complexes from Protein Interactome Maps.
Invited keynote at IEEE International Workshop on Data Mining
and Artificial Intelligence (DMAI 2008),
Khulna, Bangladesh, 25-27 December 2008.
PPT
Acknowledgements
This project is supported in part by
an A*STAR AGS scholarship (Chua: 8/03 - 7/07), the
I2R-SOC Joint Lab on Knowledge Discovery
from Clinical Data (Liu, Sung, Wong: 7/03 - 6/07), and a
URC grant R-252-000-274-112 (Liu, Sung, Wong: 10/06 - 9/09).
Last updated: 23/11/08, Limsoon Wong.