Date/Time : Wednesdays, 4.15pm
Venue: Video Conference Room (VC), COM1-02-13, School of Computing

The seminar series organised by SoC Graduate Studies’ Office involve research talks given by senior PhD students, faculty members and industry partners.


Calendar of Talks

The slides/materials used for the talks can be found here.

AY2017/2018 Semester 1


Title: Analysis of Source Code and  Binaries for Vulnerability Detection and Patching
Speaker: Abhik Roychoudhury, Professor, Department of Computer Science
Abstract: Due to the absence of source code for parts of a software system - analysis methods which work on both source code and binaries are of value. We have studied vulnerability detection techniques which work on both source code and binaries. Our detection techniques combine the essential ingredients of various aspects of fuzz testing - model-based black-box fuzzing, coverage based greybox fuzzing, and symbolic execution based whitebox fuzzing. Apart from detecting security vulnerabilities, these capabilities can also be used for reproducing crashes from crash reports or clustering "similar" crashes. Finally, we have also studied methods for automated program repair, where vulnerability patch suggestions can be generated automatically.
All of our fuzz testing and patching techniques have been evaluated on large scale and well-known systems such as detecting vulnerabilities in real-life applications such as the Adobe Acrobat reader or Windows Media Player. 
The talk will also provide a glimpse into the growing field of semantic program repair and its applications, which was started at NUS and has been gaining traction ever since.

6-Sep-2017 Title: Continuing Moore’s Law: Challenges and Opportunities in Computer Architecture
Speaker: Trevor Erik Carlson, Assistant Professor, Department of Computer Science
Abstract: Ever faster, cheaper mobile phones (as well as other computing devices) have been what consumers have come to expect from technology for many years. But, given two recent trends in technology scaling (todays chips are limited by power and costs because scaling has slowed significantly), it is widely expected that we will no longer receive significant help from scaling to help us build these faster devices. Does this spell out the end of computing as we know it? Will computers stop getting faster?

As silicon technology improvements have slowed, research into alternatives technologies has increased. Nevertheless, these technologies could still take decades to reach the performance and cost that current CMOS provides. One solution to the problem of slowing technology scaling is to adapt the computer’s architecture to more efficiently use the transistors that we have. This is the main focus for our research.

To enable a variety of new applications (AR, VR, machine-learning, etc.) while still providing longer-battery life and higher performance, we need to pursue innovative architectural directions. To do this, our research focuses on building general-purpose (programmable) processors and accelerators that are now a necessity to enable these new applications. In this talk, I will present some recent developments in computer architecture to move us closer to that goal, and present some critical challenges (and potential solutions) that we will need to address in the coming years.

13-Sep-2017 Title: Learning From Multiple Social Networks for Research And Business: A PhD Journey
Aleksandr Farseev, Dean’s Graduate Award winner (AY2016/2017 Sem2)
The drastic change in the Web was witnessed throughout the past decade, which saw an exponential growth in social networking services. The reason of such growth is that social media users concurrently produce and consume data. In this context, millions of users, who follow different lifestyles and belong to different demographic groups, regularly contribute multi-modal data on various online social networks, such as Twitter, Facebook, Foursquare, Instagram, and Endomondo. Traditionally, social media users are encouraged to complete their profiles by explicitly providing their personal attributes such as age, gender, interest, etc. (individual user profile). Additionally, users are likely to join interest-based groups that are devoted to various topics (group user profile). Such information is essential for different applications, but unfortunately, it is often not available publicly. This gives rise of automatic user profiling, which aims at automatic inference of users' hidden information based on observable information such as individual's behavior or utterances. The talk is focused on investigating user profiling across multiple social networks in different application domains.

Title: Adapting User Technologies: Bridging Designers, Machine Learning and Psychology through Collaborative, Dynamic, Personalized Experimentation
Joseph Jay Williams, Assistant Professor, Department of Information Systems and Analytics
Enhancing people's real-world learning and thinking is a challenge for HCI and psychology, while AI aims to build systems that can behave intelligently in the real-world. This talk presents a framework for redesigning the everyday websites people interact with to function as: (1) Intelligent adaptive agents that implement machine learning algorithms to dynamically discover how to optimize and personalize people’s learning and reasoning. (2) Micro-laboratories for psychological experimentation and data collection,

I present an example of how this framework is used to create “MOOClets” that embed randomized experiments into real-world online educational contexts – like learning to solve math problems. Explanations (and experimental conditions) are crowdsourced from learners, teachers and scientists. Dynamically changing randomized experiments compare the learning benefits of these explanations in vivo with users, continually adding new conditions as new explanations are contributed.

Algorithms (for multi-armed bandits, reinforcement learning, Bayesian Optimization) are used for real-time analysis (of the effect of explanations on users’ learning) and optimizing policies that provide the explanations that are best for different learners. The framework enables a broad range of algorithms to discover how to optimize and personalize users’ behavior, and dynamically adapt technology components to trade off experimentation (exploration) with helping users (exploitation).

Bio: Joseph Jay Williams is an Assistant Professor at the National University of Singapore's School of Computing, department of Information Systems & Analytics. He was previously a Research Fellow at Harvard's Office of the Vice Provost for Advances in Learning, and a member of the Intelligent Interactive Systems Group in Computer Science. He completed a postdoc at Stanford University in the Graduate School of Education in Summer 2014, working with the Office of the Vice Provost for Online Learning and the Open Learning Initiative. He received his PhD from UC Berkeley in Computational Cognitive Science, where he applied Bayesian statistics and machine learning to model how people learn and reason. He received his B.Sc. from University of Toronto in Cognitive Science, Artificial Intelligence and Mathematics, and is originally from Trinidad and Tobago. More information about his research and papers is at

4-Oct-2017 Title: Improving Medication Compliance: How CS Can Help
Ooi Wei Tsang, Associate Professor, Department of Computer Science
Medical compliance refers to the degree to which a patient accurately follows medical advice given by healthcare professionals, including whether they take medication as prescribed, are they taking the right dosage, and at the right timing.  It is challenging for children and young adults patients who need long-term medication to comply due to their lifestyle and the need to balance between their study, social activities, and possibly work.  This talk aims to (i) highlight the importance of the problem and the challenge that the patients face, (ii) review some existing work in computing literature that addresses this problem, and (iii) identify some open research challenges towards improving medical compliance that involve computer networking, sensors, multimedia-multimodal data, AI, and HCI research.

Title: Introduction to blockchain and cryptocurrency research
Luu The Loi, Dean’s Graduate Award winner (AY2016/2017 Sem2)
Abstract: Cryptocurrencies, such as Bitcoin, Ethereum and 250 similar alt-coins, embody at their core a blockchain protocol—a mechanism for a open and decentralized network with even malicious nodes to periodically agree on a set of new transactions. Two of the most popular cryptocurrencies, Bitcoin and Ethereum, support the feature to encode rules or scripts for processing transactions. This feature has evolved to give practical shape to the ideas of smart contracts, or full-fledged programs that are run on blockchains. Recently, Ethereum’s smart contract system has seen steady adoption, supporting millions of contracts, holding billions dollars worth of virtual coins.
In this talk I will give brief introduction about blockchain and smart contract research. I also discuss a few interesting applications and research papers in this space. The talk is concluded by presenting open and interesting research problems that the community is focusing on.


Title: Bounds on Distributed Information Spreading in Networks with Latencies
Suman Sourav, PhD Student, Department of Computer Science
Abstract: Consider the problem of disseminating information (broadcast) in a large-scale distributed system: one (or more) nodes in a network have information that they want to share/aggregate/reconcile with others. Classic examples include distributed database replication, sensor network data aggregation, and P2P publish-subscribe systems. We study the performance of these distributed systems under the gossip protocol, in which a node is restricted to communicate with only one other neighboring node per round and show both theoretical upper and lower bounds for the case where networks have arbitrary varying latencies. The network is modeled as a weighted graph, where the network nodes are represented by the vertices, network links by the graph edges and the link latencies by the edge weights. We define a parameter called the weighted conductance and choose a particular latency as the critical latency for the graph. The weighted conductance characterizes how well connected the graph is with respect to the critical latency. We show that this weighted conductance provides an accurate characterization of connectedness by showing that the time required for information spreading has a tight dependence on the weighted conductance. We view our results as a step towards a more accurate characterization of connectivity in networks with delays and we believe that the metric can prove useful in solving numerous other graph problems.
In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for the problem.


Title: Making Software Secure: Hardening & Analysis
Roland Yap, Associate Professor, Department of Computer Science
Abstract: Software plays a critical role in everyday life both from personal and enterprise/government standpoint. Unfortunately it is common than many critical software suffer from vulnerabilities, part of the reason being that such software usually is written in or has components in unsafe languages such as C and C++. An important question then is how to make protect ourselves from the inevitable bugs.
This talk looks at two important ingredients to address this critical problem.
Firstly, how to harden real-world low-level code in C/C++. This involves how to make C/C++ code safer while preserving their essential properties.
For example, finding/preventing memory errors, type confusion, undefined behaviors. Some of this research directions will build on extending existing work on low fat pointers which is a state-of-art defence mechanism for buffer overflows.
Another direction is how to find such errors. Symbolic execution is the main method use to analyse the behavior of programs without test cases because it can simulate program execution in a general fashion.
Symbolic execution brings the challenge of how to solve the constraints used to model programs effectively, e.g. string operations such as regular expression matching, how to deal with the heap, etc.
Such analysis can also hand in hand with optimizing and improving the code hardening.


Title: Interpretable Machine Learning for User Friendly, Healthy Interventions
Brian Lim, Assistant Professor, Department of Computer Science
Abstract: Advances in artificial intelligence, sensors and big data management have far-reaching societal impacts. These systems augment our everyday lives and can provide healthy interventions to improve our behaviors. These AI-driven systems can be directly helpful to consumers, such as by recognizing and recommending healthy foods, or indirectly by generating insights from data analytics to help to drive policy decisions for on urban populations. However, it is becoming increasingly important for people to understand them and remain in control. As we employ more sophisticated sensors and accurate machine learning models, how can we gain the users’ trust and understand in these applications?
In this talk, I will give an overview of my group’s research into building AI-based, user-centered, and explainable applications spanning healthcare disease risk prediction, mobile food recognition logging, public health fitness tracking, context-aware interruption management, and urban mobility. We employ methods from human-computer interaction and machine learning to (i) eliciting requirements from target users, (ii) develop deployable hardware prototypes and software interfaces, and (iii) evaluate impact on real users in lab and field studies.

1-Nov-2017 Title: Data Privacy in Machine Learning
Reza Shokri, Assistant Professor, Department of Computer Science
Abstract: I will talk about what machine learning privacy is, and will discuss how and why machine learning models leak information about the individual data records on which they were trained.  My quantitative analysis will be based on the fundamental membership inference attacks: given a data record and (black-box) access to a model, determine if a record was in the model's training set.  I will demonstrate how to build such inference attacks on different classification models e.g., trained by commercial "machine learning as a service" providers such as Google and Amazon. Website:
8-Nov-2017 Title: Analyzing Filamentary Structured Objects in Biomedical Images: Segmentation, Tracing, and Synthesis
Cheng Li, Adjunct Assistant Professor, Department of Computer Science
Abstract: Filamentary structured objects are abundant in biomedical images, such as neuronal images, retinal fundus images, and angiography, to name a few.
In this talk, we will discuss on our recent research efforts in addressing the tasks of segmentation, tracing, and synthesis for such images. More details can be found at our project websites


AY2016/2017 Semester 2


Title: Transparency & Discrimination in Big Data Systems
Speaker: Yair Zick, Assistant Professor, Department of Computer Science
Abstract: Big data and machine learning techniques are being increasingly used to make decisions about important, often sensitive, aspects of our lives; these include healthcare, finance and law enforcement. These algorithms often learn from data; for example, they might try to predict someone's income levels based on various features, such as their age, salary or marital status. These algorithms are often very, very good at their job (hence their popularity): they are able to process a huge amount of data and offer accurate predictions that would have otherwise been made by human decision makers with only very partial, biased data (and would certainly require much more time). It is often thought that algorithms are unbiased, in the sense that they do not hold any prior opinions that affect their decisions. In particular, we would not like our algorithms to base their predictions on sensitive features - such as ethnicity or gender.

So, did a big data algorithm base its decisions on "protected" user features? The problem is that in many cases it is very hard to tell: big data algorithms are often extremely complex, so we cannot be sure whether an algorithm used a protected feature (say, gender), or based its prediction on a correlated input.

Our research aims at developing formal methods that offer some transparency into the way that the algorithms use their inputs. Using tools from game theory, formal causality analysis and statistics, we offer influence measures that can indicate how important was a feature in making a decision about an individual, or a protected group. In this talk, I will review some of the latest developments on algorithmic transparency, and its potential impact on interpretable ML.


Title: The emerging security and privacy issues in the tangled web
Speaker: Jia Yaoqi, Dean’s Graduate Award winner (AY2016/2017 Sem1)
Abstract: World Wide Web gradually becomes an essential part of our daily life in the digital age. With the advent of cloud services and peer-to-peer techniques, new security and privacy issues are emerging in the tangled web. In this talk, I first illustrate how cloud services affect the web/local boundary provided by browsers, and then briefly present the privacy leakage in the P2P web overlays as well as the solutions using onion-routing and oblivious RAM.

First, browsers such as Chrome adopt process-based isolation design to protect “the local system” from “the web”. However, as billions of users now use web-based cloud services (e.g., Dropbox and Google Drive), which are integrated into the local system, the premise that browsers can effectively isolate the web from the local system has become questionable. We argue that if the process-based isolation disregards the same-origin policy as one of its goals, then its promise of maintaining the “web/local system (local)” separation is doubtful. Specifically, we show that existing memory vulnerabilities in Chrome’s renderer can be used as a stepping-stone to drop executables/scripts in the local file system, install unwanted applications and misuse system sensors. These attacks are purely data-oriented and do not alter any control flow or import foreign code. Thus, such attacks bypass binary-level protection mechanisms, including ASLR and in-memory partitioning. Finally, we discuss various full defenses and present a possible way to mitigate the attacks presented.

Second, the web infrastructure used to be a client-server model, in which clients (or browsers) request and fetch web contents such as HTML, JavaScript and CSS from web servers. Recently peer-to-peer (P2P) techniques (supported by real-time communications or RTC) have been introduced into the web infrastructure, which enables browsers to directly communicate with each other and form a P2P web overlay. This also brings the open and unsolved problems like privacy issues in P2P systems to the new web overlays. We investigate the security and privacy issues in web overlays, and propose solutions to address these issues using cryptographic and hardware primitives such as onion routing and oblivious RAM. First, we present inference attacks in peer-assisted CDNs on top on web overlays, which can infer user’s online activities such as browsing history. To thwart such attacks, we propose an anonymous peer-assisted CDN (APAC), which employs onion-routing techniques to conceal users’ identities and uses region-based circuit selection algorithm to reduce performance overhead. Second, to hide online activities (or access patterns) of users against long-term global analysis, we design an oblivious peer-to-peer content sharing system (OBLIVP2P), which uses new primitives such as distributed-ORAM in the P2P setting.


Title: From networked chips to cities
Peh Li Shiuan, Provost's Chair Professor, Department of Computer Science
Abstract: As a new faculty member of SoC, I am currently actively scouting for PhD students for my group. This talk is pitched at the students, providing an overview of the kind of research my group has done in the past, and briefly discussing our next steps.
This talk will give an overview of my group’s research, starting from our foray into networks-on-a-chip that enables scalable many-core processors. With many-core processors making their way into mobile devices, providing unprecedented compute power on such devices, we then explore how these powerful mobile devices can enable next-generation applications in smart cities.


Title: On Modeling the Time-Energy Performance of Data-Parallel Applications on Heterogeneous Systems
Speaker: Dumitrel Loghin, Dean’s Graduate Award winner (AY2016/2017 Sem1)
Abstract: The increasing volume of data to be processed leads to an energy usage issue in datacenter computing. Traditionally, datacenters employ homogeneous brawny servers based on x86/64 CPUs which are known to be power-hungry. In contrast, heterogeneous systems combining CPU and GPU cores represent a promising alternative for energy-efficient data-parallel processing. Moreover, the last few years have witnessed a significant performance improvement of low-power, wimpy systems, traditionally used in mobile devices. However, selecting the best configuration in terms of software parameters and system resources is a daunting task because of the very large configuration space exposed by data-parallel frameworks and heterogeneous systems. To alleviate this, we have developed measurement-driven analytic models to determine and analyze suitable system configurations for Hadoop MapReduce, which represents the most popular data-parallel framework.  Using baseline measurements on a single node with small inputs, our models determine the execution time and energy usage on scale-out clusters and workloads. To evaluate the models, we have used two types of systems and five representative MapReduce workloads covering domains such as financial analysis, data mining and simulations. The systems consist of both cloud-based Amazon EC2 instances with discrete GPUs and self-hosted Nvidia Jetson TK1 nodes with integrated GPUs representing brawny and wimpy heterogeneous systems, respectively. Our model-based analysis supports the following key results. Firstly, for both brawny and wimpy systems, we show that heterogeneous clusters consisting of nodes with CPUs and GPUs are almost always more time-energy-efficient than homogeneous clusters with CPU-only nodes. Secondly, we show that multiple wimpy nodes achieve the same time performance as a single brawny node while saving up to 90% of the energy used. In contrast with the related work, we are the first to design an energy usage model for MapReduce and to apply this model to analyze the performance of wimpy heterogeneous systems with GPU.


Title: Real world opportunities for NLP Research to Impact Global Education through MOOCs
Speaker: Kan Min Yen, Associate Professor, Department of Computer Science
Abstract: Massive Open Online Courses (MOOCs) have been heralded as a game-changer as they have the potential to disseminate the best lectures by top educations to the masses.  However, many students who enrol drop out, in part due to the difficulties in finding the motivation to complete the assignments.  Part of this is due to the (lack of) participation by instructor staff actively involved in deliberations in the course, especially in terms of dialogue and discussions with students through courses' discussion forums. 

We leverage natural language processing technologies to better analyse student conversations to identify opportunities for timely instructor intervention to produce better learning outcomes.   We discuss how diversity in MOOC offering has compromised the validity of previously published results, how automatic discourse parsing can improve prediction and the real problem of the bias presented by the user interface that affects the instructors' decision to intervene.

We are actively recruiting interested individuals to continue work on these and allied topics.


*Note: venue

at Seminar Room 3


Title: Power Papers -- Some Practical Pointers, Part 1
Speaker: Terence Sim, Associate Professor, Department of Computer Science
Abstract: If I write with the flowery flourish of Shakespeare, but my prose proves problematic, then my words become like a noisy gong or a clanging cymbal.  If I have the gift of mathematical genius and can fathom all theorems, but cannot articulate the arcane, my genius appears no different from madness.  If I achieve breakthrough research that can change the world, but cannot explain its significance, the world gains nothing and I labor in vain.

Writing a good research paper takes effort; more so if there is a page limit.  Yet this skill is required of every researcher, who, more often than not, fumbles his or her way through.  Good grammar is only a start; care and craft must be applied to turn a mediocre paper into a memorable one.  Writing skills can indeed be honed.

In this talk, I will highlight the common mistakes many authors make, and offer practical pointers to pack more punch into your paper. Needless to say, the talk will be biased: I will speak not from linguistic theories, but from personal experience, sharing what has, and has not, worked for me.  Students and staff are all welcome to participate: your views and insights will certainly benefit us all.


Title: Cache Miss Equation, and Synthetic Dataset Scaling
Zhang Jiangwei, Research Achievement Award winner (AY2016/2017 Sem1)

Cache Miss Equation: Science seeks to discover what is forever true of nature. For Computer Science, what can we discover that will be forever true about computation or, at least, immune to changes in technology?  Computation fundamentally requires cycles, memory, bandwidth and time. The memory in a computer system has innumerable caches, and our research on this resource focuses on developing an equation to describe cache misses for all levels of the memory hierarchy. It works for a disk cache, database buffers, garbage-collected heaps, nonvolatile memory and content-centric networking. For more details, please check:

Synthetic Dataset Scaling: Benchmarks are ubiquitous in the computing industry and academia. Developers use benchmarks to compare products, while researchers use them similarly in their research. For 20-odd years, the popular benchmarks for database management systems were the ones defined by the Transaction Processing Council (TPC). However, the small number of TPC benchmarks are increasingly irrelevant to the myriad of diverse applications, and the TPC standardization process is too slow. This led to a proposal for a paradigm shift, from a top-down design of domain-specific benchmarks by committee consensus, to a bottom-up collaboration to develop tools for application-specific benchmarking. A database benchmark must have a dataset. For the benchmark to be application-specific, it must start with an empirical dataset D.  This D may be too small or too large for the benchmarking experiment, so the first tool to develop would be for scaling D to a desired size. This motivates the Dataset Scaling Problem(DSP): Given a set of relational tables D and a scale factor s, generate a database state D' that is similar to D but s times its size. For more details, please check: 

In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for both problems.


Title: Computer Vision for Robotics Perception
Lee Gim Hee, Assistant Professor, Department of Computer Science
Camera is a good sensor for robotic perception over traditionally used Lidar because of low-cost and rich in information, but the algorithms are often computationally too expensive, and sensitive to noise and outliers.
In this talk, I will present my work on making some of the computer vision algorithms more efficient and robust for robots to percieve the world through cameras.


Title: Hardening Programs Against Software Vulnerabilities AND Constraints Solvers for Problems in Security
Roland Yap, Associate Professor, Department of Computer Science
The talk will be about two but partially related topics. The first is on preventing exploitation of software vulnerabilities and will be the main focus on the talk. Memory bugs are still the main route where software is attacked. In fact, one might regard that in most of today's complex software in low level languages such as C and C++ that such bugs are inevitable. As such, a strategy to harden the program such that these bugs cannot be exploited, e.g. to corrupt the stack, is perhaps the strategy which needs to be adopted in the long term. There are many kinds of memory errors, perhaps, the most well known are spatial and temporal errors. I will talk about a research direction which opens up the area from simple to complex kinds of program hardening. For students interested in knowing a bit more before hand, a recent paper at NDSS 2017 on protecting stack objects is

Stack Object Protection with Low Fat Pointers

The second topic which I will touch on more briefly is research on constraint solving. Constraint solving is of broad applicability to many domains ranging from theoretical computer science, to verification, to security. I will mention some problems in constraints with some links to verification and security.


Title: Analyzing the Behaviors of Articulated Objects in 3D : Applications to Human and Animals
Cheng Li, Adjunct Assistant Professor, Department of Computer Science
Recent advancement of depth cameras has opened door to many interesting applications. In this talk, I will discuss our research efforts toward addressing the related tasks of pose estimation, tracking, action and behavior analysis of a range of articulated objects (human upper-body, human hand, fish, mouse) from such 3D cameras. In particular, I will talk about our recent Lie group based approach that enables us to tackle these problems under a unified framework. Looking forward, the results could be applied to everyday life scenarios such as natural user interface, behavior analysis and surveillance, gaming, among others.