What goes around, comes around ....

Beng Chin OOI

Chair Professor
Department of Computer Science
School of Computing
National University of Singapore
Computing 1, Computing Drive, Singapore 117417

ooibc AT comp.nus.edu.sg
Tel: +65-6516 6465
Office: COM1, #03-22


Beng Chin is a Distinguished Professor of Computer Science, an NGS faculty member and Director of the Smart Systems Institute (SSI@NUS) at the National University of Singapore (NUS), and an adjunct Chang Jiang Professor at Zhejiang University, China. He obtained his BSc (1st Class Honors) and PhD from Monash University, Australia, in 1985 and 1989 respectively. He co-founded yzBigData in 2012 for Big Data management and analytics, and Shentilium Technologies in 2016 for AI- and data-driven financial data analytics, and is an advisory council member of a Fintech company, Cynopsis Solutions.

Beng Chin's research interests include databases, distributed processing, machine learning and large-scale analytics, with a focus on system architectures, performance issues, security, accuracy and correctness. He is also interested in exploiting IT in production and process reengineering (e.g., JIT fabric printing, healthcare innovation, food analysis, and smart cities).

Beng Chin is a fellow of the ACM, IEEE, and Singapore National Academy of Science (SNAS).

He was the recipient of the ACM SIGMOD 2009 Contributions Award, a co-winner of the 2011 Singapore President's Science Award, and the recipient of the 2012 IEEE Computer Society Kanai Award, the 2013 NUS Outstanding Researcher Award, the 2014 IEEE TCDE CSEE Impact Award, and the 2016 China Computer Federation (CCF) Overseas Outstanding Contributions Award. He was also a recipient of the VLDB'14 Best Paper Award.

Beng Chin has served as a PC member for international conferences such as ACM SIGMOD, VLDB, IEEE ICDE, WWW, and SIGKDD, and as Vice PC Chair for ICDE'00, '04 and '06, PC co-Chair for SSD'93 and DASFAA'05, PC Chair for ACM SIGMOD'07, Core DB PC Chair for VLDB'08, and PC co-Chair for IEEE ICDE'12 and IEEE Big Data'15. He is serving as a PC Chair for IEEE ICDE'18. He was an editor of VLDB Journal and IEEE Transactions on Knowledge and Data Engineering (TKDE), Editor-in-Chief of TKDE (2009-2012), co-Editor-in-Chief of Elsevier's Journal of Big Data Research (2013-2015), and a co-chair of the ACM SIGMOD Jim Gray Best Thesis Award committee. He is serving as an editor of IEEE Transactions on Cloud Computing and Springer's Distributed and Parallel Databases. He is also serving as a Trustee Board Member and President of the VLDB Endowment, and as an Advisory Board Member of ACM SIGMOD.

Beng Chin's ongoing large system projects include:

  1. UStore(2015-): a distributed data storage system with rich semantics and a set of properties that unify and add value to many classes of next-generation applications. By keeping the core properties within the storage layer, UStore is designed to support fast development of forking-enabled applications, such as Git-like versioning, blockchain, collaborative analytics and OLTP with versioning, offering reduced development effort, flexibility, high-level semantics and performance. It synthesizes various ideas from distributed systems and databases to support efficient forking and execution. Both macro and micro benchmarks show that it is faster by 2-3 orders of magnitude, and that a major component of some of the applications mentioned can be implemented in about 1000 lines of code. To benchmark blockchain systems and UStore, a blockchain benchmarking framework called BLOCKBENCH has been designed and released as open source.
  2. SINGA(2014-): a distributed deep learning platform (indirectly funded by an ASTAR grant and NRF CRP). Apache SINGA is an open-source Apache Incubator platform for distributed training of deep learning and machine learning models, designed around three principles, namely usability, scalability and extensibility. For usability, we make the programming model of SINGA easy to follow. Specifically, users construct their models from Layers and Tensors, and SINGA's runtime takes care of (and is optimized for) the distributed execution and communication between nodes. Scalability is achieved by partitioning both the training data and the model, and distributing the training over multiple nodes. We make the code of SINGA modular and extensible to support different types of deep learning models, optimization algorithms and training frameworks, on both CPU and GPU clusters. Apache SINGA (incubating) v1.0 has been released. It has a Healthcare model zoo, which contains deep learning models that have been used for healthcare research, and a facility for porting Caffe models onto SINGA. We are now working towards an AI-as-a-Service platform to enable exploration, feature selection, and model tuning and validation. Co-Space is an earlier system, designed for supporting cross-domain retrieval, that led to the development of SINGA.
  3. CIIDAA(2012-): a Comprehensive IT Infrastructure for Data-intensive Applications and Analysis, a CRP project funded by NRF (NRF-CRP8-2011-08) from 2013 to 2017. The main objective is to use cloud computing to address the Big Data problem. For specific applications, this approach has been shown to be effective, and systems such as Hadoop have become very popular. However, they have limitations (see the ACM Computing Surveys paper on MapReduce-based systems and the IEEE TKDE survey on in-memory systems), and are suitable only for a class of applications whose structure is amenable to fine-grain asynchronous parallelization. Furthermore, there remain many challenges in using cloud computing systems in practice, including resource contention across multiple jobs running concurrently. The aim of this project is to develop a platform for supporting real-time data integration and predictive real-time analytics in the areas of web consumers (collaborating with Starhub) and healthcare (collaborating with NUH, National University Health System).
  4. epiC(2009-2013): an Elastic, Power-aware, data-Intensive Cloud platform, funded by an MOE grant (2010-2012). The objectives are to design and implement an efficient multi-tenancy cloud system for supporting high-throughput, low-latency transactions and high-performance, reliable query processing, with online and interactive analytics capability. memepiC (2014-) is an extension of the epiC project, focusing on exploiting hardware features, multi-cores and large memory. Related earlier project: UTab.
  5. LogBase(2012-2016): a distributed log-structured data management system, funded by ASTAR (2013-2016). LogBase adopts log-only storage to handle high append and write loads, such as those in urban/sensor information processing. Indexing, transaction management and query processing are the key issues that have been investigated, and the source code has been released. LogBase is related to ongoing research on database support for Energy and Environmental Sustainability Solutions for Megacities.
  6. CDAS(2011-2015): a Crowdsourcing Data Analytics System designed to improve the quality of query results while effectively reducing the processing cost. It is being built as a crowdsourcing system that provides primitive operators to facilitate the composition of crowdsourcing tasks. Other key issues, such as privacy and applicability, and various applications are being investigated.
With the ubiquity of Big Data and the fusion of applications and technologies, the projects are related in many aspects. Beng Chin approaches research problems and system design with the philosophy that all algorithms and structures should be simple, elegant and yet efficient, so that they can be easily grafted into existing systems and are implementable, maintainable and scalable in actual applications, and that all systems must be efficient, scalable, extensible and easy to use. A good example is his approach to the design of new indexes: they are mainly B+-tree based, simple and elegant in design, and efficient, robust and scalable in performance (e.g., TP-index[ICDE1994], ST B-tree[DKE1995], iMinMax[PODS2000], iDistance[VLDB2001, TODS2005], B^x-tree[VLDB2004], GiMP[TODS2005], ST^2B-tree[SIGMOD2008, TODS2010], B^{ed}-tree[SIGMOD2010], String Indexing[TKDE2014]). Recently, due to changes in hardware architecture and capability, he has been working on an index that is more scalable and efficient for the new environment (PI[CoRR 2015]). Again, the index has to be simple, elegant and fast!