BIO
Project
-
Kernel Engineering on Parse Trees (Current)
We verified the effectiveness of tree sequence structure in translation equivalence modeling in our ACL 2009 paper.
In this work, we examine tree sequence based features in further NLP applications.
We propose tree sequence based kernels, which can additionally capture the structure of a subtree sequence,
both contiguous and non-contiguous, beyond the single subtree features explored by traditional tree kernels.
This study aims to offer a new view of structural features in NLP.
For more details, see Tree Sequence Kernel for Natural Language
and my doctoral dissertation (coming soon).
-
Syntactic Structure Alignment for SMT (April 2009 - Dec. 2010)
Most current work in SMT obtains Translational Equivalences by first conducting word alignment on the plain parallel corpus
and then extracting the Translational Equivalences that are consistent with the word alignment. A decent word alignment is therefore a prerequisite.
Such a pipeline approach to obtaining Translational Equivalences is argued to be vulnerable to errors from the initial word alignment stage.
Currently, researchers address this problem mainly by improving word alignment.
Instead, we attempt to conduct syntactic structure alignment directly to obtain syntactic Translational Equivalences.
For more details, see Exploring Syntactic Structural Features for Sub-Tree Alignment using Bilingual Tree Kernels
and Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data.
-
Pisces decoder (August 2007 - March 2009)
We proposed a series of decoders based on synchronous grammars (STSG, STSSG, SncTSSG) and implemented them in the Pisces framework.
For more details about it, see A Tree-to-Tree Alignment-based Model for Statistical Machine Translation
and A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation.
-
IWSLT 2007 (May 2007 - July 2007)
During my internship at the Institute for Infocomm Research (I2R) in 2007, I contributed to I2R's effort in the IWSLT 2007 competition,
which took 1st place among 15 participants in the Chinese-English task, ahead of the second-place system by more than 3 BLEU points.
For more details about it, see I2R Chinese-English Translation System for IWSLT 2007.
Publication
-
Jun Sun, Min Zhang, Chew Lim Tan.
Tree Sequence Kernel for Natural Language.
In proceedings of AAAI 2011.
-
Hua Ye, Atreyi Kankanhalli, Jun Sun.
Investigating Value Co-Creation in Innovation of IT-enabled Services: An Empirical Study of Mobile Data Services.
(RIP). International Conference on Information Systems (ICIS) 2011.
-
Jun Sun, Min Zhang, Chew Lim Tan.
Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data.
In proceedings of COLING 2010.
-
Jun Sun, Min Zhang, Chew Lim Tan.
Exploring Syntactic Structural Features for Sub-Tree Alignment using Bilingual Tree Kernels.
In proceedings of ACL 2010.
-
Jun Sun, Min Zhang, Chew Lim Tan.
A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation.
In proceedings of ACL-IJCNLP 2009. [slides]
-
Boxing Chen, Jun Sun, Hongfei Jiang, Min Zhang, Ai Ti Aw.
I2R Chinese-English Translation System for IWSLT 2007.
IWSLT 2007.
-
Min Zhang, Hongfei Jiang, Ai Ti Aw, Jun Sun, Sheng Li, Chew Lim Tan.
A Tree-to-Tree Alignment-based Model for Statistical Machine Translation.
In proceedings of MT-Summit 2007.
Professional Activities
- PC member
- Secondary Reviewer
- Conferences
- 2009: ACL-IJCNLP, EMNLP, SIGIR
- 2010: ACL, NAACL-HLT
- Membership
Misc
- I am quite interested in the following topics:
- Nonparametric Bayesian Methods in NLP
- If you want to play with Nonparametric Bayesian methods,
you should not miss Nonparametric Bayesian Models of Lexical Acquisition by Sharon Goldwater,
which is the only doctoral thesis I could find that applies Nonparametric Bayesian methods to NLP tasks.
Chapters 2 and 3 cover the basic topics in Nonparametric Bayesian methods thoroughly.
However, the thesis is not easy to follow if you are not familiar with sampling and stochastic processes.
To get an intuitive idea of how an NLPer builds a Nonparametric Bayesian estimator from scratch, especially if you are new to the area,
I strongly recommend Bayesian Inference with Tears
by Prof. Knight.
This tutorial applies the Chinese Restaurant Process to STSG induction, POS tagging and Chinese word segmentation without any
frustrating integrals.
However, Prof. Knight's tutorial only helps us understand Nonparametric Bayesian methods with easy math.
When you want to build your own Nonparametric Bayesian estimators, you still need rigorous mathematical derivations.
Then you have to ask Prof. Resnik for help.
His tutorial on Gibbs Sampling will give you instructive ideas on how to use sampling to estimate expectations.
Finally, don't forget to come back to Sharon Goldwater's Ph.D. thesis
and her reading list on Nonparametric Bayesian methods for more applications.
- Hierarchical Bayesian Models
- For those who are new to Hierarchical Bayesian models, Latent Dirichlet Allocation is a decent starting point, as I think you will agree
after reading Parameter estimation for text analysis.
- Useful links:
- Friends
Last modified @ Jan 18 12:40 2012