Other-Anaphora Resolution in the Biomedical Domain

This work started as my Honours Year Project (HYP) in SOC, under the joint supervision of A/P Tan Chew Lim and Dr. Su Jian.

This work aims to resolve anaphors that express part-whole or member-collection relations in biomedical texts. It draws on several kinds of knowledge: syntactic, semantic and web-based.

To illustrate, a typical case is:

The kappa B sequence (GGGACTTTCC) binds a factor, NF-kappa B, that is constitutively found in its functional, DNA binding form only in B lymphocytes. A factor with apparently indistinguishable sequence specificity can be induced in many other cell types, where it is used to regulate inducible gene expression.

In the two sentences above, the red text ("many other cell types") is the anaphor, while its correct antecedent is the orange text ("B lymphocytes"). This example illustrates the member-collection (or subset-set) relation. Potential applications include ontology building and exploring bridging anaphora resolution.

During the HYP period, I implemented a system using syntactic and manually selected patterns. The report is available here and the poster for the presentation is here.

In the first year of my graduate study, I extended the system with automatically mined patterns. This part was published at COLING 2008 as a full paper. The paper is here. Presentation slides are here.

After COLING, I further extended the system with pattern normalization and pattern pruning. This part of the work has been submitted to a special issue of the Journal of Biomedical Informatics. It is still under review.