Date/Time : Wednesdays, 4.00pm
Venue: Video Conference Room (VC), COM1-02-13, School of Computing

The seminar series organised by SoC Graduate Studies’ Office involves research talks given by senior PhD students, faculty members and industry partners.


Calendar of Talks

The slides/materials used for the talks can be found here.

AY2017/2018 Semester 2


Title: 3 Projects on Computer System Performance
Tay Yong Chiang, Professor, Department of Computer Science
This talk describes 3 current projects on the performance of computer systems:

(1.Database) For 20-odd years, developers and researchers have used the TPC benchmarks to compare their products and algorithms. These benchmarks have fixed schemas that bear no relation to current applications. The target of the database project is to replace TPC benchmarks with synthetic versions of application datasets. The idea is to first scale the empirical dataset to the appropriate size, then tweak the data in the resulting dataset to enforce application-specific properties. The ambition is to have a repository of tweaking tools contributed by the developer community, and current work is on building a collaborative framework to facilitate tool interoperability.

(2.Memory) Most of the current hot topics in computer science will become cold within 10 years, but caching will remain an issue 50 years from now. Most caching algorithms try to strike a heuristic balance between recency (e.g. LRU) and frequency (i.e. popularity).  The target of the memory project is to use a Cache Miss Equation to do a scientific study of this balance.
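To make the recency/frequency trade-off concrete, here is a minimal sketch that counts misses for two textbook policies on the same access trace. It is an illustration only and is not the Cache Miss Equation from the talk; the function names, the tie-breaking rule in the frequency-based policy, and the choice to keep access counts for evicted keys are all assumptions.

```python
from collections import Counter, OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a least-recently-used cache of the given capacity."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)          # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used key
            cache[key] = True
    return misses


def lfu_misses(trace, capacity):
    """Count misses for a least-frequently-used cache.

    Assumption: access counts are kept for every key seen so far (even after
    eviction), and ties are broken by evicting the key inserted earliest.
    """
    counts = Counter()
    cache = {}                              # key -> step at which it was inserted
    misses = 0
    for step, key in enumerate(trace):
        counts[key] += 1
        if key not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = min(cache, key=lambda k: (counts[k], cache[k]))
                del cache[victim]
            cache[key] = step
    return misses
```

Running both functions on a trace with a short scan through one-off keys (frequency helps) and on a trace whose working set shifts (recency helps) shows that neither heuristic wins everywhere, which is the balance the abstract refers to.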

(3.Networking) Over the last 2 years, Google has moved their production traffic to a TCP variant called BBR.  This may start a paradigm shift for TCP congestion control, from one based on packet loss to one based on bandwidth-delay product.  BBR requires estimates for minimum round-trip time R and maximum bandwidth X.  BBR measures R and X by periodically changing its packet sending rate.  The target of the networking project is to show that the estimation can be done differently and passively.  The underlying idea works for any TCP version (CUBIC, Reno, etc.), and even for choosing between hardware/software architectures for video games.
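The two quantities named above combine into the bandwidth-delay product, R times X. The sketch below keeps windowed estimates of both from passively observed samples. It is a simplified illustration, not BBR's implementation and not the estimation method proposed in the talk; the class name `PathEstimator`, the window size, and the sample format are assumptions.

```python
from collections import deque


class PathEstimator:
    """Windowed estimates of minimum RTT (R) and maximum delivery rate (X)."""

    def __init__(self, window=10):
        self.rtts = deque(maxlen=window)    # recent RTT samples, in seconds
        self.rates = deque(maxlen=window)   # recent delivery rates, in bytes/second

    def observe(self, rtt_s, delivered_bytes, interval_s):
        """Record one RTT sample and one delivery-rate sample."""
        self.rtts.append(rtt_s)
        self.rates.append(delivered_bytes / interval_s)

    def min_rtt(self):
        return min(self.rtts)

    def max_bandwidth(self):
        return max(self.rates)

    def bdp_bytes(self):
        """Bandwidth-delay product: bytes in flight that just fill the path."""
        return self.min_rtt() * self.max_bandwidth()
```

A sender that keeps roughly `bdp_bytes()` in flight fills the path without building a queue, which is why a BDP-based controller needs both estimates rather than a loss signal.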

7-Feb-2018 Title: Privacy and Security in (Outsourced) Machine Learning
Reza Shokri, Assistant Professor, Department of Computer Science
I will talk about the security and privacy threats against machine learning, notably when its training is outsourced. I will discuss how and why machine learning models leak information about the individual data records on which they were trained, and how an attacker can train a deep neural network in such a way that it leaks even more information. I will also talk about security issues with respect to outsourced machine learning, and how we can evaluate such attacks.
14-Feb-2018 Title: Constrained Counting and Sampling: Bridging the Gap between Theory and Practice
Kuldeep Singh Meel, Assistant Professor, Department of Computer Science
Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. In this talk, I will introduce a novel algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in Boolean reasoning over the past two decades. This has allowed us to obtain breakthrough results in constrained sampling and counting, providing a new algorithmic toolbox in machine learning, probabilistic reasoning, privacy, and design verification. I will demonstrate the utility of the above techniques on various real applications including probabilistic inference, design verification and our ongoing collaboration in estimating the reliability of critical infrastructure networks during natural disasters.
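The two task definitions can be pinned down with a brute-force sketch over Boolean variables. This is only a specification by enumeration, exponential in the number of variables; it is not the hashing-based framework from the talk, and all function names here are hypothetical.

```python
import itertools
import random


def solutions(num_vars, constraint):
    """Enumerate every Boolean assignment that satisfies the constraint."""
    return [bits for bits in itertools.product([False, True], repeat=num_vars)
            if constraint(bits)]


def weighted_count(num_vars, constraint, weight):
    """Constrained counting: total weight of the solution set."""
    return sum(weight(bits) for bits in solutions(num_vars, constraint))


def weighted_sample(num_vars, constraint, weight, rng):
    """Constrained sampling: draw a solution with probability proportional to its weight."""
    sols = solutions(num_vars, constraint)
    return rng.choices(sols, weights=[weight(s) for s in sols], k=1)[0]


def example_constraint(bits):
    """Illustrative constraint: (x0 or x1) and not (x1 and x2)."""
    x0, x1, x2 = bits
    return (x0 or x1) and not (x1 and x2)
```

With a weight of 1 for every assignment, `weighted_count` reduces to plain model counting and `weighted_sample` to uniform sampling over solutions.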
21-Feb-2018 Title: Preparing for a Low-Latency Future Internet
Ben Leong, Associate Professor, Department of Computer Science
Google has deployed BBR, a new low-latency TCP variant. We show that to transition smoothly to a low-latency Internet of the future, we need a TCP variant that not only can contend effectively against CUBIC in the current Internet, but that is also able to reduce its level of aggressiveness in a low-latency environment. We present EvaRate, a rate-based congestion control algorithm that incorporates a new buffer estimation technique which allows an EvaRate flow to infer its own buffer occupancy as well as that of the competing flows sharing the same bottleneck buffer. With this mechanism, an EvaRate flow is able to determine its operating environment and, when in a low-latency (or benevolent) environment, collaboratively regulate the bottleneck buffer occupancy with other EvaRate flows. EvaRate highlights a new point in the congestion control design space that deserves further attention.
7-Mar-2018 Title: Super Speaking -- Tricks of the Trade
Speaker: Terence Sim, Associate Professor, Department of Computer Science
Abstract: Most of us in academia are engaged in this typical sequence of activities: (a) do research; (b) write a report/paper about it; (c) give an oral presentation. While many of us are good at research skills (a), and can write reasonably well (b), we are less confident in speaking about it (c). Indeed, presenting our work in front of an audience often causes knees to wobble and stomachs to cramp. It gets worse when we realize, halfway through the talk, that the audience is getting restless or bored because they are not understanding our message.

In this talk, I will share some techniques that will improve the intelligibility of our technical presentations. I learned many of these "tricks of the trade" in school -- the School of Hard Knocks. Others I picked up by observing the habits of good speakers; still others from the wise counsel of my seniors. While I cannot guarantee to take away the nervousness when you give a talk, I can certainly offer practical tips that will hopefully improve the clarity of your communication. At the very least, you can get a kick out of seeing whether I practice what I preach.

14-Mar-2018 Title: Information Theory and Machine Learning
Jonathan Scarlett, Assistant Professor, Department of Computer Science
The field of information theory was introduced as a means for understanding the fundamental limits of data compression and transmission, and has shaped the design of practical communication systems for decades.  In this talk, I will discuss the emerging viewpoint that information theory is not only a theory of communication, but a far-reaching theory of data that is applicable to seemingly unrelated learning problems such as estimation, prediction, and optimization.  This perspective leads to principled approaches for certifying the near-optimality of practical algorithms, as well as understanding where further improvements are possible.  I will provide a gentle introduction to some of the main ideas and insights offered by this perspective, and present examples in the problems of group testing, graphical model selection, sparse regression, and black-box function optimization.

Title: Correcting Language Errors using Machine Translation Techniques
Shamil Chollampatt Muhammed Ashraf, Dean’s Graduate Award winner (AY2017/2018 Sem1)
Grammatical error correction (GEC) tools play an important role in helping second language learning and providing assistance to non-native writers. Currently, the leading approach to GEC is the machine translation approach, in which potentially erroneous sentences are “translated” into fluent well-formed sentences. This talk will introduce various machine translation techniques that have been successfully applied and adapted to GEC, such as word and character-level statistical machine translation, neural network joint models, and neural encoder-decoder approaches.

Title: Linguistic Properties Matter for Implicit Discourse Relation Recognition: Combining Semantic Interaction, Topic Continuity and Attribution
Speaker: Lei Wenqiang, PhD Student, Department of Computer Science
Modern solutions for implicit discourse relation recognition largely build universal models to classify all of the different types of discourse relations. In contrast to such learning models, we build our model from first principles, analyzing the linguistic properties of the individual top-level Penn Discourse Treebank (PDTB) styled implicit discourse relations: Comparison, Contingency and Expansion.
We find that semantic characteristics of each relation type and two cohesion devices – topic continuity and attribution – work together to contribute to such linguistic properties. We encode those properties as complex features and feed them into a Naïve Bayes classifier, bettering baselines (including deep neural network ones) to achieve a new state-of-the-art performance level. Over a strong, feature-based baseline, our system outperforms one versus other binary classification by 4.83% for Comparison relation, 3.94% for Contingency and 2.22% for four-way classification.

28-Mar-2018 Title: (Gap/S)-ETH Hardness of SVP
Speaker: Divesh Aggarwal, Assistant Professor, Department of Computer Science
Abstract: There has been a lot of research in the last two decades on constructing cryptosystems whose security relies on the hardness of the shortest vector problem (SVP) on integer lattices. The SVP is well known to be NP-hard. However, such hardness proofs tell us very little about the quantitative or fine-grained complexity of SVP. E.g., does the fastest possible algorithm for SVP still run in time at least, say, 2^{n/5} , or is there an algorithm that runs in time 2^{n/100} or even 2^{\sqrt{n}}? The above hardness results cannot distinguish between these cases, but we certainly need to be confident in our answers to such questions if we plan to base the security of widespread cryptosystems on these answers.

In this talk, I will give a partial answer to this question by showing the following quantitative hardness results for the Shortest Vector Problem in the \ell_p norm (SVP_p), where n is the rank of the input lattice. 1) For "almost all" p > 2.14, there is no 2^{n/C_p}-time algorithm for SVP_p for some explicit constant C_p > 0 unless the (randomized) Strong Exponential Time Hypothesis (SETH) is false. 2) For any p > 2, there is no 2^{o(n)}-time algorithm for SVP_p unless the (randomized) Gap-Exponential Time Hypothesis (Gap-ETH) is false. 3) There is no 2^{o(n)}-time algorithm for SVP_2 unless either (1) (non-uniform) Gap-ETH is false; or (2) there is no family of lattices with exponential kissing number in the \ell_2 norm.

This is joint work with Noah Stephens-Davidowitz.
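For readers new to SVP_p, the problem statement can be illustrated with a naive search over small integer combinations of the basis vectors. This sketch is an assumption-laden toy: the coefficient bound is arbitrary, so it only finds the shortest vector inside a finite box, and it has nothing to do with the hardness proofs in the talk. Its cost of (2 * bound + 1)^n does show why running time as a function of the rank n is the quantity of interest.

```python
import itertools


def lp_norm(vec, p):
    """The l_p norm of a vector, for finite p >= 1."""
    return sum(abs(x) ** p for x in vec) ** (1.0 / p)


def shortest_vector_bruteforce(basis, p=2, bound=3):
    """Search integer combinations with coefficients in [-bound, bound].

    Returns (norm, vector) for the shortest nonzero lattice vector found.
    Assumption: the bound is illustrative; a true shortest vector may need
    larger coefficients for badly conditioned bases.
    """
    n = len(basis)
    dim = len(basis[0])
    best = None
    for coeffs in itertools.product(range(-bound, bound + 1), repeat=n):
        if not any(coeffs):
            continue                        # skip the zero vector
        vec = tuple(sum(c * basis[i][j] for i, c in enumerate(coeffs))
                    for j in range(dim))
        norm = lp_norm(vec, p)
        if best is None or norm < best[0]:
            best = (norm, vec)
    return best
```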


Title: Your Toolbox for Privacy in the Cloud
Tople Shruti Shrikant, Dean’s Graduate Award winner (AY2017/2018 Sem1)
Use of cloud services is becoming popular among users, with terabytes of data uploaded every day. The state-of-the-practice method to secure this data is using encryption. But encryption alone is not enough. As cloud services offer complex functionalities at scale, my research raises several fundamental questions that are important to ensure practical privacy in the cloud. Concretely, 1) Can we compute on encrypted data in real-time? 2) What are the limits of defenses that hide side-channels appearing in encrypted computation techniques? 3) Can we design an ideally efficient side-channel defense for hiding specific data access patterns that are exhibited by a large class of applications?
In this talk, I will present various tools that I have developed in my research that answer the above questions and enable practical privacy in the cloud. My first work enables practical arbitrary computation on encrypted data by switching between efficient cryptographic schemes with minimum trust in software. This work forks a new direction in the area of encrypted computation by bridging the gap between two independent lines of approach --- cryptographic primitives and trusted computing. Next, I will present an intractability result for hiding side-channels that leak information in encrypted computation. Lastly, I will show a construction that achieves ideal efficiency (constant latency) for hiding data access patterns in the read-only class of applications.


Title: Quantum Communication Using Coherent Rejection Sampling
Anurag Anshu, Dean’s Graduate Award winner (AY2017/2018 Sem1)
Compression of a message up to the information it carries is key to many tasks involved in classical and quantum information theory. Schumacher [B. Schumacher, Phys. Rev. A 51, 2738 (1995)] provided one of the first quantum compression schemes and several more general schemes have been developed ever since [M. Horodecki, J. Oppenheim, and A. Winter, Commun. Math. Phys. 269, 107 (2007); I. Devetak and J. Yard, Phys. Rev. Lett. 100, 230501 (2008); A. Abeyesinghe, I. Devetak, P. Hayden, and A. Winter, Proc. R. Soc. A 465, 2537 (2009)]. However, the one-shot characterization of these quantum tasks is still under development, and often lacks a direct connection with analogous classical tasks. Here we show a new technique for the compression of quantum messages with the aid of entanglement. We devise a new tool that we call the convex split lemma, which is a coherent quantum analogue of the widely used rejection sampling procedure in classical communication protocols. As a consequence, we exhibit new explicit protocols with tight communication cost for quantum state merging, quantum state splitting, and quantum state redistribution (up to a certain optimization in the latter case). We also present a port-based teleportation scheme which uses fewer ports in the presence of information about the input.

Based on joint work with Vamsi Krishna Devabathini and Rahul Jain.
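The classical rejection sampling procedure that the convex split lemma is said to generalise can be sketched for finite distributions as follows. This is the standard classical technique only, not the quantum construction from the talk; the dictionary representation and the function name are assumptions.

```python
import random


def rejection_sample(target, proposal, rng):
    """Draw one sample from `target` using only draws from `proposal`.

    Both arguments are dicts mapping outcomes to probabilities. Every outcome
    listed in `proposal` must have nonzero probability there, and `target`
    must put no mass outside those outcomes.
    """
    outcomes = list(proposal)
    weights = [proposal[o] for o in outcomes]
    # Smallest M with target[o] <= M * proposal[o] for every outcome.
    m = max(target.get(o, 0.0) / proposal[o] for o in outcomes)
    while True:
        candidate = rng.choices(outcomes, weights=weights, k=1)[0]
        accept_prob = target.get(candidate, 0.0) / (m * proposal[candidate])
        if rng.random() < accept_prob:
            return candidate
```

Each proposal draw is accepted with probability 1/M overall, so the expected number of draws per sample is M, which is why the ratio between target and proposal governs the cost of protocols built on this idea.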

11-Apr-2018 Title: Mining Clinical Data
Vaibhav Rajan, Assistant Professor, Department of Information Systems and Analytics
Clinical data analysis poses several modeling challenges that arise due to data heterogeneity, temporality, sparsity, bias and noise. I will outline these challenges in the context of identifying patients at risk of developing complications in hospitals, and present two projects.

Nursing notes contain regular and valuable assessments of patients' condition but often have inconsistent abbreviations and lack the grammatical structure of formal documents, thereby making automated analysis difficult. We design a new approach that effectively utilizes the structure of the notes, is robust to inconsistencies in the text and surpasses the accuracy of previous methods.

Healthcare data often contains heterogeneous datatypes that exhibit complex feature dependencies. Our algorithm for dependency clustering uses copulas to effectively model a wide range of dependencies and can fit mixed -- continuous and ordinal -- data. It scales linearly with size and quadratically with dimensions of input data, which is significantly faster than state-of-the-art correlation clustering methods for mixed data.

I'll conclude with a summary of my current research.



AY2017/2018 Semester 1


Title: Analysis of Source Code and  Binaries for Vulnerability Detection and Patching
Speaker: Abhik Roychoudhury, Professor, Department of Computer Science
Abstract: Because source code is often unavailable for parts of a software system, analysis methods which work on both source code and binaries are of value. We have studied vulnerability detection techniques which work on both source code and binaries. Our detection techniques combine the essential ingredients of various aspects of fuzz testing - model-based black-box fuzzing, coverage based greybox fuzzing, and symbolic execution based whitebox fuzzing. Apart from detecting security vulnerabilities, these capabilities can also be used for reproducing crashes from crash reports or clustering "similar" crashes. Finally, we have also studied methods for automated program repair, where vulnerability patch suggestions can be generated automatically.
All of our fuzz testing and patching techniques have been evaluated on large-scale, well-known systems, for example by detecting vulnerabilities in real-life applications such as the Adobe Acrobat reader or Windows Media Player.
The talk will also provide a glimpse into the growing field of semantic program repair and its applications, which was started at NUS and has been gaining traction ever since.

6-Sep-2017 Title: Continuing Moore’s Law: Challenges and Opportunities in Computer Architecture
Speaker: Trevor Erik Carlson, Assistant Professor, Department of Computer Science
Abstract: Ever faster, cheaper mobile phones (as well as other computing devices) have been what consumers have come to expect from technology for many years. But, given two recent trends in technology scaling (today’s chips are limited by power and costs because scaling has slowed significantly), it is widely expected that we will no longer receive significant help from scaling to help us build these faster devices. Does this spell the end of computing as we know it? Will computers stop getting faster?

As silicon technology improvements have slowed, research into alternative technologies has increased. Nevertheless, these technologies could still take decades to reach the performance and cost that current CMOS provides. One solution to the problem of slowing technology scaling is to adapt the computer’s architecture to more efficiently use the transistors that we have. This is the main focus for our research.

To enable a variety of new applications (AR, VR, machine-learning, etc.) while still providing longer-battery life and higher performance, we need to pursue innovative architectural directions. To do this, our research focuses on building general-purpose (programmable) processors and accelerators that are now a necessity to enable these new applications. In this talk, I will present some recent developments in computer architecture to move us closer to that goal, and present some critical challenges (and potential solutions) that we will need to address in the coming years.

13-Sep-2017 Title: Learning From Multiple Social Networks for Research And Business: A PhD Journey
Aleksandr Farseev, Dean’s Graduate Award winner (AY2016/2017 Sem2)
A drastic change in the Web was witnessed throughout the past decade, which saw an exponential growth in social networking services. The reason for such growth is that social media users concurrently produce and consume data. In this context, millions of users, who follow different lifestyles and belong to different demographic groups, regularly contribute multi-modal data on various online social networks, such as Twitter, Facebook, Foursquare, Instagram, and Endomondo. Traditionally, social media users are encouraged to complete their profiles by explicitly providing their personal attributes such as age, gender, interest, etc. (individual user profile). Additionally, users are likely to join interest-based groups that are devoted to various topics (group user profile). Such information is essential for different applications, but unfortunately, it is often not available publicly. This gives rise to automatic user profiling, which aims at automatic inference of users' hidden information based on observable information such as an individual's behavior or utterances. The talk focuses on investigating user profiling across multiple social networks in different application domains.

Title: Adapting User Technologies: Bridging Designers, Machine Learning and Psychology through Collaborative, Dynamic, Personalized Experimentation
Joseph Jay Williams, Assistant Professor, Department of Information Systems and Analytics
Enhancing people's real-world learning and thinking is a challenge for HCI and psychology, while AI aims to build systems that can behave intelligently in the real world. This talk presents a framework for redesigning the everyday websites people interact with to function as: (1) Intelligent adaptive agents that implement machine learning algorithms to dynamically discover how to optimize and personalize people’s learning and reasoning. (2) Micro-laboratories for psychological experimentation and data collection.

I present an example of how this framework is used to create “MOOClets” that embed randomized experiments into real-world online educational contexts – like learning to solve math problems. Explanations (and experimental conditions) are crowdsourced from learners, teachers and scientists. Dynamically changing randomized experiments compare the learning benefits of these explanations in vivo with users, continually adding new conditions as new explanations are contributed.

Algorithms (for multi-armed bandits, reinforcement learning, Bayesian Optimization) are used for real-time analysis (of the effect of explanations on users’ learning) and optimizing policies that provide the explanations that are best for different learners. The framework enables a broad range of algorithms to discover how to optimize and personalize users’ behavior, and dynamically adapt technology components to trade off experimentation (exploration) with helping users (exploitation).
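The exploration/exploitation trade-off described above can be illustrated with Thompson sampling, one standard multi-armed bandit algorithm. This is a generic sketch, not the MOOClet implementation: the class name, the binary reward model (for instance, whether a learner solved the next problem), and the `add_arm` method for conditions contributed later are all assumptions.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a growing set of arms."""

    def __init__(self, arms, rng):
        self.rng = rng
        # Each arm has a Beta(successes + 1, failures + 1) posterior.
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}

    def add_arm(self, arm):
        """Register a newly contributed condition with a uniform prior."""
        self.successes.setdefault(arm, 0)
        self.failures.setdefault(arm, 0)

    def choose(self):
        """Sample a plausible success rate per arm and pick the largest."""
        draws = {arm: self.rng.betavariate(self.successes[arm] + 1,
                                           self.failures[arm] + 1)
                 for arm in self.successes}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        """Record a binary outcome for the arm that was shown."""
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
```

Arms with little data have wide posteriors and so are still chosen occasionally (exploration), while arms with strong evidence of benefit are chosen most of the time (exploitation).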

Bio: Joseph Jay Williams is an Assistant Professor at the National University of Singapore's School of Computing, department of Information Systems & Analytics. He was previously a Research Fellow at Harvard's Office of the Vice Provost for Advances in Learning, and a member of the Intelligent Interactive Systems Group in Computer Science. He completed a postdoc at Stanford University in the Graduate School of Education in Summer 2014, working with the Office of the Vice Provost for Online Learning and the Open Learning Initiative. He received his PhD from UC Berkeley in Computational Cognitive Science, where he applied Bayesian statistics and machine learning to model how people learn and reason. He received his B.Sc. from University of Toronto in Cognitive Science, Artificial Intelligence and Mathematics, and is originally from Trinidad and Tobago. More information about his research and papers is at

4-Oct-2017 Title: Improving Medication Compliance: How CS Can Help
Ooi Wei Tsang, Associate Professor, Department of Computer Science
Medical compliance refers to the degree to which a patient accurately follows medical advice given by healthcare professionals, including whether they take medication as prescribed, at the right dosage, and at the right time. It is challenging for child and young adult patients who need long-term medication to comply, due to their lifestyle and the need to balance their studies, social activities, and possibly work. This talk aims to (i) highlight the importance of the problem and the challenges that the patients face, (ii) review some existing work in computing literature that addresses this problem, and (iii) identify some open research challenges towards improving medical compliance that involve computer networking, sensors, multimedia-multimodal data, AI, and HCI research.

Title: Introduction to blockchain and cryptocurrency research
Luu The Loi, Dean’s Graduate Award winner (AY2016/2017 Sem2)
Abstract: Cryptocurrencies, such as Bitcoin, Ethereum and 250 similar alt-coins, embody at their core a blockchain protocol: a mechanism for an open and decentralized network, which may even include malicious nodes, to periodically agree on a set of new transactions. Two of the most popular cryptocurrencies, Bitcoin and Ethereum, support the feature to encode rules or scripts for processing transactions. This feature has evolved to give practical shape to the ideas of smart contracts, or full-fledged programs that are run on blockchains. Recently, Ethereum’s smart contract system has seen steady adoption, supporting millions of contracts, holding billions of dollars’ worth of virtual coins.
In this talk, I will give a brief introduction to blockchain and smart contract research. I will also discuss a few interesting applications and research papers in this space. The talk concludes by presenting open and interesting research problems that the community is focusing on.


Title: Bounds on Distributed Information Spreading in Networks with Latencies
Suman Sourav, PhD Student, Department of Computer Science
Abstract: Consider the problem of disseminating information (broadcast) in a large-scale distributed system: one (or more) nodes in a network have information that they want to share/aggregate/reconcile with others. Classic examples include distributed database replication, sensor network data aggregation, and P2P publish-subscribe systems. We study the performance of these distributed systems under the gossip protocol, in which a node is restricted to communicate with only one other neighboring node per round, and show both theoretical upper and lower bounds for the case where networks have arbitrarily varying latencies. The network is modeled as a weighted graph, where the network nodes are represented by the vertices, network links by the graph edges and the link latencies by the edge weights. We define a parameter called the weighted conductance and choose a particular latency as the critical latency for the graph. The weighted conductance characterizes how well connected the graph is with respect to the critical latency. We show that this weighted conductance provides an accurate characterization of connectedness by showing that the time required for information spreading has a tight dependence on the weighted conductance. We view our results as a step towards a more accurate characterization of connectivity in networks with delays and we believe that the metric can prove useful in solving numerous other graph problems.
In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for the problem.
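The model in the abstract (one contact per node per round, edge weights as latencies) can be explored with a small simulation. The sketch below is a push-only variant chosen for simplicity; the exact protocol analysed in the talk may differ, and the function name, the integer latencies, and the graph representation are assumptions.

```python
import heapq
import random


def gossip_spread_time(graph, source, rng, max_rounds=10_000):
    """Simulate push gossip on a graph with integer edge latencies.

    `graph` maps each node to a dict {neighbor: latency_in_rounds}. In every
    round each informed node contacts one uniformly random neighbor, and the
    message arrives `latency` rounds later. Returns the round in which the
    last node becomes informed, or None if max_rounds is reached first.
    """
    informed = {source}
    in_flight = []                          # min-heap of (arrival_round, counter, destination)
    counter = 0                             # tie-breaker so destinations are never compared
    for rnd in range(max_rounds):
        # Deliver every message whose arrival round has come.
        while in_flight and in_flight[0][0] <= rnd:
            _, _, dest = heapq.heappop(in_flight)
            informed.add(dest)
        if len(informed) == len(graph):
            return rnd
        # Each informed node initiates exactly one contact this round.
        for node in sorted(informed):
            neighbor = rng.choice(sorted(graph[node]))
            heapq.heappush(in_flight, (rnd + graph[node][neighbor], counter, neighbor))
            counter += 1
    return None
```

Comparing spread times on graphs that have the same topology but different latency assignments gives an empirical feel for why a latency-aware notion of connectivity is needed.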


Title: Making Software Secure: Hardening & Analysis
Roland Yap, Associate Professor, Department of Computer Science
Abstract: Software plays a critical role in everyday life both from personal and enterprise/government standpoint. Unfortunately it is common that much critical software suffers from vulnerabilities, part of the reason being that such software usually is written in or has components in unsafe languages such as C and C++. An important question then is how to protect ourselves from the inevitable bugs.
This talk looks at two important ingredients to address this critical problem.
Firstly, how to harden real-world low-level code in C/C++. This involves how to make C/C++ code safer while preserving its essential properties.
For example, finding/preventing memory errors, type confusion, undefined behaviors. Some of these research directions will build on extending existing work on low fat pointers, which is a state-of-the-art defence mechanism for buffer overflows.
Another direction is how to find such errors. Symbolic execution is the main method used to analyse the behavior of programs without test cases because it can simulate program execution in a general fashion.
Symbolic execution brings the challenge of how to solve the constraints used to model programs effectively, e.g. string operations such as regular expression matching, how to deal with the heap, etc.
Such analysis can also go hand in hand with optimizing and improving the code hardening.


Title: Interpretable Machine Learning for User Friendly, Healthy Interventions
Brian Lim, Assistant Professor, Department of Computer Science
Abstract: Advances in artificial intelligence, sensors and big data management have far-reaching societal impacts. These systems augment our everyday lives and can provide healthy interventions to improve our behaviors. These AI-driven systems can be directly helpful to consumers, such as by recognizing and recommending healthy foods, or indirectly by generating insights from data analytics to help to drive policy decisions for urban populations. However, it is becoming increasingly important for people to understand them and remain in control. As we employ more sophisticated sensors and accurate machine learning models, how can we gain the users’ trust and understanding in these applications?
In this talk, I will give an overview of my group’s research into building AI-based, user-centered, and explainable applications spanning healthcare disease risk prediction, mobile food recognition logging, public health fitness tracking, context-aware interruption management, and urban mobility. We employ methods from human-computer interaction and machine learning to (i) elicit requirements from target users, (ii) develop deployable hardware prototypes and software interfaces, and (iii) evaluate impact on real users in lab and field studies.

1-Nov-2017 Title: Data Privacy in Machine Learning
Reza Shokri, Assistant Professor, Department of Computer Science
Abstract: I will talk about what machine learning privacy is, and will discuss how and why machine learning models leak information about the individual data records on which they were trained. My quantitative analysis will be based on the fundamental membership inference attacks: given a data record and (black-box) access to a model, determine if the record was in the model's training set. I will demonstrate how to build such inference attacks on different classification models, e.g., those trained by commercial "machine learning as a service" providers such as Google and Amazon. Website:
8-Nov-2017 Title: Analyzing Filamentary Structured Objects in Biomedical Images: Segmentation, Tracing, and Synthesis
Cheng Li, Adjunct Assistant Professor, Department of Computer Science
Abstract: Filamentary structured objects are abundant in biomedical images, such as neuronal images, retinal fundus images, and angiography, to name a few.
In this talk, we will discuss our recent research efforts in addressing the tasks of segmentation, tracing, and synthesis for such images. More details can be found at our project websites


AY2016/2017 Semester 2


Title: Transparency & Discrimination in Big Data Systems
Speaker: Yair Zick, Assistant Professor, Department of Computer Science
Abstract: Big data and machine learning techniques are being increasingly used to make decisions about important, often sensitive, aspects of our lives; these include healthcare, finance and law enforcement. These algorithms often learn from data; for example, they might try to predict someone's income levels based on various features, such as their age, salary or marital status. These algorithms are often very, very good at their job (hence their popularity): they are able to process a huge amount of data and offer accurate predictions that would have otherwise been made by human decision makers with only very partial, biased data (and would certainly require much more time). It is often thought that algorithms are unbiased, in the sense that they do not hold any prior opinions that affect their decisions. In particular, we would not like our algorithms to base their predictions on sensitive features - such as ethnicity or gender.

So, did a big data algorithm base its decisions on "protected" user features? The problem is that in many cases it is very hard to tell: big data algorithms are often extremely complex, so we cannot be sure whether an algorithm used a protected feature (say, gender), or based its prediction on a correlated input.

Our research aims at developing formal methods that offer some transparency into the way that the algorithms use their inputs. Using tools from game theory, formal causality analysis and statistics, we offer influence measures that can indicate how important a feature was in making a decision about an individual, or a protected group. In this talk, I will review some of the latest developments on algorithmic transparency, and its potential impact on interpretable ML.
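One simple way to probe whether a classifier actually uses a feature, rather than a correlated input, is to intervene on that feature alone and see how often the output changes. The sketch below is a deliberately simplified intervention-based measure written for illustration; it is not the formal influence measures from the talk, and the function name, record format, and resampling scheme are assumptions.

```python
import random


def feature_influence(classifier, dataset, feature, rng, trials=200):
    """Estimate how often resampling one feature changes the classifier's output.

    `dataset` is a list of dicts. In each trial, a random record has `feature`
    replaced by the value taken from another random record; the influence is
    the fraction of trials in which the prediction flips.
    """
    flips = 0
    for _ in range(trials):
        record = rng.choice(dataset)
        donor = rng.choice(dataset)
        altered = dict(record)              # copy so the dataset is not modified
        altered[feature] = donor[feature]
        if classifier(altered) != classifier(record):
            flips += 1
    return flips / trials
```

Because only the chosen feature is altered while everything else is held fixed, a feature that is merely correlated with the decision, but never read by the classifier, receives an influence of zero.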


Title: The emerging security and privacy issues in the tangled web
Speaker: Jia Yaoqi, Dean’s Graduate Award winner (AY2016/2017 Sem1)
Abstract: The World Wide Web has gradually become an essential part of our daily life in the digital age. With the advent of cloud services and peer-to-peer techniques, new security and privacy issues are emerging in the tangled web. In this talk, I first illustrate how cloud services affect the web/local boundary provided by browsers, and then briefly present the privacy leakage in the P2P web overlays as well as the solutions using onion-routing and oblivious RAM.

First, browsers such as Chrome adopt a process-based isolation design to protect “the local system” from “the web”. However, as billions of users now use web-based cloud services (e.g., Dropbox and Google Drive), which are integrated into the local system, the premise that browsers can effectively isolate the web from the local system has become questionable. We argue that if the process-based isolation disregards the same-origin policy as one of its goals, then its promise of maintaining the “web/local system (local)” separation is doubtful. Specifically, we show that existing memory vulnerabilities in Chrome’s renderer can be used as a stepping-stone to drop executables/scripts in the local file system, install unwanted applications and misuse system sensors. These attacks are purely data-oriented and do not alter any control flow or import foreign code. Thus, such attacks bypass binary-level protection mechanisms, including ASLR and in-memory partitioning. Finally, we discuss various full defenses and present a possible way to mitigate the attacks presented.

Second, the web infrastructure used to be a client-server model, in which clients (or browsers) request and fetch web contents such as HTML, JavaScript and CSS from web servers. Recently, peer-to-peer (P2P) techniques (supported by real-time communications or RTC) have been introduced into the web infrastructure, enabling browsers to communicate directly with each other and form a P2P web overlay. This also brings the open, unsolved problems of P2P systems, such as privacy issues, to the new web overlays. We investigate the security and privacy issues in web overlays, and propose solutions to address these issues using cryptographic and hardware primitives such as onion routing and oblivious RAM. First, we present inference attacks in peer-assisted CDNs on top of web overlays, which can infer users’ online activities such as browsing history. To thwart such attacks, we propose an anonymous peer-assisted CDN (APAC), which employs onion-routing techniques to conceal users’ identities and uses a region-based circuit selection algorithm to reduce performance overhead. Second, to hide online activities (or access patterns) of users against long-term global analysis, we design an oblivious peer-to-peer content sharing system (OBLIVP2P), which uses new primitives such as distributed-ORAM in the P2P setting.


Title: From networked chips to cities
Peh Li Shiuan, Provost's Chair Professor, Department of Computer Science
Abstract: As a new faculty member of SoC, I am currently actively scouting for PhD students for my group. This talk is pitched at the students, providing an overview of the kind of research my group has done in the past, and briefly discussing our next steps.
This talk will give an overview of my group’s research, starting from our foray into networks-on-a-chip that enable scalable many-core processors. With many-core processors making their way into mobile devices, providing unprecedented compute power on such devices, we then explore how these powerful mobile devices can enable next-generation applications in smart cities.


Title: On Modeling the Time-Energy Performance of Data-Parallel Applications on Heterogeneous Systems
Speaker: Dumitrel Loghin, Dean’s Graduate Award winner (AY2016/2017 Sem1)
Abstract: The increasing volume of data to be processed leads to an energy usage issue in datacenter computing. Traditionally, datacenters employ homogeneous brawny servers based on x86/64 CPUs which are known to be power-hungry. In contrast, heterogeneous systems combining CPU and GPU cores represent a promising alternative for energy-efficient data-parallel processing. Moreover, the last few years have witnessed a significant performance improvement of low-power, wimpy systems, traditionally used in mobile devices. However, selecting the best configuration in terms of software parameters and system resources is a daunting task because of the very large configuration space exposed by data-parallel frameworks and heterogeneous systems. To alleviate this, we have developed measurement-driven analytic models to determine and analyze suitable system configurations for Hadoop MapReduce, which represents the most popular data-parallel framework.  Using baseline measurements on a single node with small inputs, our models determine the execution time and energy usage on scale-out clusters and workloads. To evaluate the models, we have used two types of systems and five representative MapReduce workloads covering domains such as financial analysis, data mining and simulations. The systems consist of both cloud-based Amazon EC2 instances with discrete GPUs and self-hosted Nvidia Jetson TK1 nodes with integrated GPUs representing brawny and wimpy heterogeneous systems, respectively. Our model-based analysis supports the following key results. Firstly, for both brawny and wimpy systems, we show that heterogeneous clusters consisting of nodes with CPUs and GPUs are almost always more time-energy-efficient than homogeneous clusters with CPU-only nodes. Secondly, we show that multiple wimpy nodes achieve the same time performance as a single brawny node while saving up to 90% of the energy used. 
In contrast with the related work, we are the first to design an energy usage model for MapReduce and to apply this model to analyze the performance of wimpy heterogeneous systems with GPU.


Title: Real world opportunities for NLP Research to Impact Global Education through MOOCs
Speaker: Kan Min Yen, Associate Professor, Department of Computer Science
Abstract: Massive Open Online Courses (MOOCs) have been heralded as a game-changer as they have the potential to disseminate the best lectures by top educators to the masses. However, many students who enrol drop out, in part due to the difficulties in finding the motivation to complete the assignments. Part of this is due to the lack of participation by instructor staff in the course's deliberations, especially in terms of dialogue and discussions with students through the course's discussion forums.

We leverage natural language processing technologies to better analyse student conversations and identify opportunities for timely instructor intervention to produce better learning outcomes. We discuss how diversity in MOOC offerings has compromised the validity of previously published results, how automatic discourse parsing can improve prediction, and the real problem of the bias presented by the user interface that affects the instructors' decision to intervene.

We are actively recruiting interested individuals to continue work on these and allied topics.


*Note: venue at Seminar Room 3


Title: Power Papers -- Some Practical Pointers, Part 1
Speaker: Terence Sim, Associate Professor, Department of Computer Science
Abstract: If I write with the flowery flourish of Shakespeare, but my prose proves problematic, then my words become like a noisy gong or a clanging cymbal.  If I have the gift of mathematical genius and can fathom all theorems, but cannot articulate the arcane, my genius appears no different from madness.  If I achieve breakthrough research that can change the world, but cannot explain its significance, the world gains nothing and I labor in vain.

Writing a good research paper takes effort; more so if there is a page limit.  Yet this skill is required of every researcher, who, more often than not, fumbles his or her way through.  Good grammar is only a start; care and craft must be applied to turn a mediocre paper into a memorable one.  Writing skills can indeed be honed.

In this talk, I will highlight the common mistakes many authors make, and offer practical pointers to pack more punch into your paper. Needless to say, the talk will be biased: I will speak not from linguistic theories, but from personal experience, sharing what has, and has not, worked for me.  Students and staff are all welcome to participate: your views and insights will certainly benefit us all.


Title: Cache Miss Equation, and Synthetic Dataset Scaling
Zhang Jiangwei, Research Achievement Award winner (AY2016/2017 Sem1)

Cache Miss Equation: Science seeks to discover what is forever true of nature. For Computer Science, what can we discover that will be forever true about computation or, at least, immune to changes in technology?  Computation fundamentally requires cycles, memory, bandwidth and time. The memory in a computer system has innumerable caches, and our research on this resource focuses on developing an equation to describe cache misses for all levels of the memory hierarchy. It works for a disk cache, database buffers, garbage-collected heaps, nonvolatile memory and content-centric networking. For more details, please check:

Synthetic Dataset Scaling: Benchmarks are ubiquitous in the computing industry and academia. Developers use benchmarks to compare products, while researchers use them similarly in their research. For 20-odd years, the popular benchmarks for database management systems were the ones defined by the Transaction Processing Performance Council (TPC). However, the small number of TPC benchmarks are increasingly irrelevant to the myriad of diverse applications, and the TPC standardization process is too slow. This led to a proposal for a paradigm shift, from a top-down design of domain-specific benchmarks by committee consensus, to a bottom-up collaboration to develop tools for application-specific benchmarking. A database benchmark must have a dataset. For the benchmark to be application-specific, it must start with an empirical dataset D. This D may be too small or too large for the benchmarking experiment, so the first tool to develop would be for scaling D to a desired size. This motivates the Dataset Scaling Problem (DSP): Given a set of relational tables D and a scale factor s, generate a database state D' that is similar to D but s times its size. For more details, please check:

In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for both problems.
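As a concrete reading of the Dataset Scaling Problem statement, the sketch below shows a naive replication baseline for a two-table database and an integer scale factor s. It is an illustration under stated assumptions, not the speaker's solution: the `scale_dataset` function, the `customers`/`orders` schema, and the notion of "similar" used here (referential integrity and the orders-per-customer distribution are preserved) are all hypothetical, and primary keys are assumed to be non-negative integers.

```python
from collections import Counter


def scale_dataset(customers, orders, s):
    """Return (customers', orders') that are s times larger, for integer s >= 1.

    Each of the s copies gets fresh primary keys, and foreign keys are
    remapped within the same copy so that referential integrity still holds.
    Assumes "id" values are non-negative integers.
    """
    if not isinstance(s, int) or s < 1:
        raise ValueError("this naive sketch only supports integer s >= 1")
    # Offsets large enough that keys from different copies never collide.
    c_offset = max((c["id"] for c in customers), default=-1) + 1
    o_offset = max((o["id"] for o in orders), default=-1) + 1
    new_customers, new_orders = [], []
    for k in range(s):
        for c in customers:
            new_customers.append({**c, "id": c["id"] + k * c_offset})
        for o in orders:
            new_orders.append({
                **o,
                "id": o["id"] + k * o_offset,
                "customer_id": o["customer_id"] + k * c_offset,
            })
    return new_customers, new_orders


# Hypothetical empirical dataset D.
customers = [{"id": 0, "name": "Ann"}, {"id": 1, "name": "Bob"}, {"id": 2, "name": "Cy"}]
orders = [
    {"id": 0, "customer_id": 0, "amount": 10},
    {"id": 1, "customer_id": 0, "amount": 25},
    {"id": 2, "customer_id": 1, "amount": 7},
    {"id": 3, "customer_id": 1, "amount": 12},
]

big_customers, big_orders = scale_dataset(customers, orders, 3)
print(len(big_customers), len(big_orders))  # prints "9 12"
print(Counter(o["customer_id"] for o in big_orders))
```

Plain replication keeps every per-copy statistic intact but cannot handle fractional scale factors or scaling down, and it duplicates values verbatim, which is one way to see why the problem as stated is harder than it first looks.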


Title: Computer Vision for Robotics Perception
Lee Gim Hee, Assistant Professor, Department of Computer Science
A camera is a good alternative to the traditionally used Lidar for robotic perception because it is low-cost and rich in information, but vision algorithms are often computationally too expensive and sensitive to noise and outliers.
In this talk, I will present my work on making some computer vision algorithms more efficient and robust, so that robots can perceive the world through cameras.


Title: Hardening Programs Against Software Vulnerabilities AND Constraints Solvers for Problems in Security
Roland Yap, Associate Professor, Department of Computer Science
The talk will be about two distinct but partially related topics. The first is on preventing exploitation of software vulnerabilities and will be the main focus of the talk. Memory bugs are still the main route by which software is attacked. In fact, one might regard such bugs as inevitable in most of today's complex software written in low-level languages such as C and C++. As such, hardening the program so that these bugs cannot be exploited, e.g. to corrupt the stack, is perhaps the strategy that needs to be adopted in the long term. There are many kinds of memory errors; perhaps the most well known are spatial and temporal errors. I will talk about a research direction which opens up the area from simple to complex kinds of program hardening. For students interested in knowing a bit more beforehand, a recent paper at NDSS 2017 on protecting stack objects is:

Stack Object Protection with Low Fat Pointers

The second topic which I will touch on more briefly is research on constraint solving. Constraint solving is of broad applicability to many domains ranging from theoretical computer science, to verification, to security. I will mention some problems in constraints with some links to verification and security.


Title: Analyzing the Behaviors of Articulated Objects in 3D : Applications to Human and Animals
Cheng Li, Adjunct Assistant Professor, Department of Computer Science
Recent advances in depth cameras have opened the door to many interesting applications. In this talk, I will discuss our research efforts toward addressing the related tasks of pose estimation, tracking, and action and behavior analysis of a range of articulated objects (human upper-body, human hand, fish, mouse) from such 3D cameras. In particular, I will talk about our recent Lie group based approach that enables us to tackle these problems under a unified framework. Looking forward, the results could be applied to everyday life scenarios such as natural user interfaces, behavior analysis and surveillance, and gaming, among others.