Stanley Yong

M.S. in Statistics, University of Illinois at Urbana-Champaign, 2004.
B.S. and M.S. in Computer Science, University of Illinois at Urbana-Champaign, 2004.

Research Officer,
Natural Language Synergy Lab,
Institute for Infocomm Research (I2R), Singapore

Graduate Student (PhD track)
National University of Singapore, Singapore

Email: geekdom@gmail.com
Tel: (+65) 6874 8800 Fax: (+65) 6775 5014
Address: 21 Heng Mui Keng Terrace, Singapore, 119613

My C.V.

PDF resume


Research Interests:

Information extraction, Paraphrasing, Statistical learning

 

 

 


Past and Current Projects:

  • BOOTStrep - Bootstrapping Of Ontologies and Terminologies STrategic REsearch Project
    BOOTStrep is a Specific Targeted Research Project (STREP) of the European Union's 6th Framework Programme, Thematic Priority 2 (Information Society Technologies) within the fourth call of the programme. It addresses the strategic objective "Semantic-based Knowledge and Content Systems".

Utilities:

Text Summarization Tool:

  • About
    • This is an automatic text summarization application. It extracts the most pertinent sentences from a given article and attempts to minimize redundancy. There are two ways to use the application: through a simple cut-and-paste dialog, or by loading a text file with the ".txt" extension.
    • A typical use case is speed-reading a new technical article that has no abstract. After launching the application, you are presented with a window made up of four main elements. At the very top are three tabs, labeled "Ad hoc text", "Text from Files" and "Results", in that order. Below the tabs is a white text box, into which the article to be summarized may be pasted. Under the text box are three buttons: "Demo", "Summarize" and "Clear". "Summarize" and "Clear" are self-explanatory. At the bottom of the window is a slider that determines the number of sentences to extract from the article. Click on the little trapezoid, drag it to the required number, release the mouse button, and click "Summarize" to begin. Once the summary is created, the application automatically switches to the third tab, "Results", and displays the summary there.
    • The "Demo" button summarizes Lincoln's Gettysburg Address into two lines.
    • The summarization uses a matrix decomposition to determine the best clusters of sentences in the article, then chooses the best representative of each cluster to include in the summary. The choice is biased towards earlier sentences and sentences of intermediate length, but is otherwise determined solely by the authority of the sentence given the terms it contains (similar to the SALSA algorithm).
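
The scoring idea in the last point above can be illustrated with a small sketch. It is not the tool's actual code, and it skips the matrix decomposition and clustering step entirely. It only shows, under stated assumptions, how an authority score might be combined with position and length biases. The authority score here comes from a simplified HITS-style mutual reinforcement between sentences and terms (SALSA belongs to the same family of link-analysis methods). The class name, the method names, the tokenizer, and the bias formulas with their constants are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: authority-style sentence scoring with position and length bias.
class SentenceScoringSketch {

    // Naive tokenizer (an assumption): lowercase, split on non-alphanumeric characters.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<String>();
        for (String t : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (t.length() > 0) tokens.add(t);
        }
        return tokens;
    }

    // Mutual reinforcement: a term is important if it occurs in important
    // sentences, and a sentence is important if it contains important terms.
    static double[] authority(List<List<String>> docs, int iterations) {
        int n = docs.size();
        double[] score = new double[n];
        Arrays.fill(score, 1.0);
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> termScore = new HashMap<String, Double>();
            for (int i = 0; i < n; i++) {
                for (String t : docs.get(i)) {
                    Double old = termScore.get(t);
                    termScore.put(t, (old == null ? 0.0 : old) + score[i]);
                }
            }
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (String t : docs.get(i)) next[i] += termScore.get(t);
                norm += next[i] * next[i];
            }
            norm = Math.sqrt(norm);
            // Normalize to unit length so the scores stay bounded.
            for (int i = 0; i < n; i++) score[i] = norm == 0.0 ? 0.0 : next[i] / norm;
        }
        return score;
    }

    // Hypothetical bias: favors earlier sentences and lengths near the mean length.
    static double bias(int index, int length, double meanLength) {
        double position = 1.0 / (1.0 + 0.1 * index);
        if (meanLength == 0.0) return position;
        double lengthFit = 1.0 / (1.0 + Math.abs(length - meanLength) / meanLength);
        return position * lengthFit;
    }

    // Returns the indices of the k best-scoring sentences, in document order.
    static List<Integer> topSentences(String[] sentences, int k) {
        int n = sentences.length;
        List<List<String>> docs = new ArrayList<List<String>>();
        double totalLength = 0.0;
        for (String s : sentences) {
            List<String> tokens = tokenize(s);
            docs.add(tokens);
            totalLength += tokens.size();
        }
        double meanLength = n == 0 ? 0.0 : totalLength / n;
        double[] auth = authority(docs, 20);
        double[] score = new double[n];
        for (int i = 0; i < n; i++) {
            score[i] = auth[i] * bias(i, docs.get(i).size(), meanLength);
        }
        boolean[] taken = new boolean[n];
        List<Integer> picked = new ArrayList<Integer>();
        for (int c = 0; c < Math.min(k, n); c++) {
            int best = -1;
            for (int i = 0; i < n; i++) {
                if (!taken[i] && (best < 0 || score[i] > score[best])) best = i;
            }
            taken[best] = true;
            picked.add(best);
        }
        Collections.sort(picked);
        return picked;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "Cats chase mice.",
            "Cats and mice and dogs chase cats.",
            "Rain."
        };
        System.out.println(topSentences(sentences, 2));
    }
}
```

In this sketch a sentence that shares no terms with the rest of the text receives an authority score close to zero after a few iterations, so it is selected only when the requested summary length leaves no alternative. The requested number of sentences plays the role of the slider described above.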

Downloads:

Screenshot:

screenshot

Acknowledgments

  • This tool requires JVM 1.5 or later to run. The underlying summarization engine arose from a project done for the Text Processing on the Web class taught by Dr Kan Min Yen in the Spring of 2007. The GUI was implemented to make the tool easier to use.
  • The libraries used in the application are included in the distribution. We made use of the excellent Lucene Information Retrieval SDK from the Apache project, and a modified version of the Java Matrix Package (JAMA), available from NIST at Jama.

 


Personal:


 

 

 


The material published on this Web page is personal, and is neither endorsed by nor the responsibility of the Institute for Infocomm Research.
Last Updated on 12-Aug-2007.
