SoC Research Talks

Date/Time : Wednesdays, 4.00pm
Venue: Via Zoom

The seminar series organised by the SoC Graduate Studies’ Office involves research talks given by senior PhD students, faculty members and industry partners.

Calendar of Talks

The slides/materials used for the talks can be found here.

AY2020/2021 Semester 2

27-Jan-2021	Title: Fast and Secure Processing on the Edge: Efficiently Protecting Your Personal Data Footprint Speaker: Trevor E. Carlson, Assistant Professor, Department of Computer Science Abstract: Edge computing is a key technology that aims to enable both fast and efficient local data processing, where your requests are serviced by local, efficient endpoints instead of distant cloud services. But, maintaining both physically secure systems that are also efficient can be extremely difficult, as modern solutions tend to degrade performance with additional security. To address these concerns, the era of the end of Dennard Scaling and Moore’s Law requires moving beyond technology-only solutions to provide cross-stack solutions from the compiler, architecture and circuit designs. In this presentation, we present our recent work in physical security, and side-channel security, as well as privacy protections to bring efficient AI and data processing to the edge. Our recently released and open-source LABS hardware protection framework, and new initiatives in security and high-performance processing, aim to provide the foundation for a larger secure processor design that incorporates efficient privacy and security protections: physical, timing side-channel, software, and others.
3-Feb-2021	Title: Hardening and Defences to Make Software Secure Speaker: Roland Yap, Associate Professor, Department of Computer Science Abstract: A large fraction of software in common use is insecure. By this, what is meant is that any sufficiently complex software is likely to have security bugs. This is exacerbated by a large percentage of software which is part of the critical software stack being written in low level languages. In this talk, we will discuss why the state of software security is poor and the challenges as well as the tradeoffs faced by real-world software. We will then discuss approaches to help address this situation ranging from analysis, hardening defences (and sanitizers) and safer programming languages.
17-Feb-2021	Title: Towards Generating Human-like Deep Questions Speaker: Pan Liangming, Dean’s Graduate Award Winner Abstract: Question Generation (QG) concerns the task of automatically generating questions from various inputs such as raw text, database, or semantic representation. People have the ability to ask deep questions about events, evaluations, opinions, synthesis, or reasons, usually in the form of Why, Why-not, How, What-if, which requires an in-depth understanding of the input source and the ability to reason over disjoint relevant contexts. Learning to ask such deep questions has broad application in future intelligent systems, such as dialog systems, online education, intelligent search, among others. This talk will introduce our recent research on generating deep questions that demand high cognitive skills, including questions that require multi-hop reasoning and questions that exhibit certain human-desired properties, such as being answerable by the passage. We will also introduce one practical application of deep QG: how to generate synthetic multi-hop questions in an unsupervised way to improve the performance of multi-hop question answering.
3-Mar-2021	Title: GPU-accelerated Graph Processing Speaker: Sha Mo, Dean’s Graduate Award Winner Abstract: Graph processing is of vital significance for investigating complex relationships and mining underlying knowledge in different fields. The rapidly increasing scale of problems and the strict requirement for real-time solutions have drastically raised interest in the area of high-performance graph processing. Specifically, graph processing on graphics processing units (GPUs) has recently attracted a great deal of attention in both industry and academia due to the GPU’s enormous potential for boosting the graph processing efficiency as an accelerator. Although prior studies migrate various graph applications to GPUs and demonstrate a significantly boosted graph processing performance achieved on GPUs, many challenges still impede the popularization of GPUs as a graph processing accelerator in broader application scenarios. Therefore, in this talk, we aim to propose a systematic solution to make GPU-accelerated graph processing more practical, i.e., to develop a hardware-agnostic graph processing system that handles large-scale and dynamic graphs.

AY2020/2021 Semester 1

26-Aug-2020	Title: New algorithms for efficient statistical inference Speaker: Arnab Bhattacharyya, Assistant Professor, Department of Computer Science Abstract: I will describe some of our recent work on computational problems arising in statistics and causal inference. The talk consists of three parts. (1) In the first part, we discuss efficient distance approximation algorithms for several popular classes of structured high-dimensional distributions, such as Bayes networks, Ising models, and multivariate Gaussians. Our results are the first efficient distance approximation algorithms for these well-studied problems. They are derived using a simple and general connection to distribution learning algorithms. [Joint work with Sutanu Gayen, Kuldeep Meel, and N.V. Vinodchandran] (2) In the second part, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. Our algorithms have sample complexity that scales with the model sparsity instead of the dimension. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and a Lasso penalty term. [Joint work with Rathin Desai, Sai Nagarajan, and Ioannis Panageas] (3) In the third part, we consider testing independence among a set of variables where the samples arrive as a stream. Improving upon past work by Indyk and McGregor (SODA ’08) and Braverman et al. (STOC ’10, STACS ’10), we give new algorithms with improved space complexity bounds for approximating the distance between the input joint distribution on n variables and the product distribution of its marginals. [Joint work with Rathin Desai, Yi Li, and David P. Woodruff]
2-Sep-2020	Title: Fast and Accurate Deep Neural Network Training Speaker: Yang You, Assistant Professor, Department of Computer Science Abstract: In the last three years, supercomputers have become increasingly popular in leading AI companies. Amazon built a High Performance Computing (HPC) cloud. Google released its first 100-petaFlop supercomputer (TPU Pod). Facebook made a submission on the Top500 supercomputer list. Why do they like supercomputers? Because the computation of deep learning is very expensive. For example, even with 16 TPUs, BERT training takes more than 3 days. On the other hand, supercomputers can process 10^17 floating point operations per second. So why don’t we just use supercomputers and finish the training of deep neural networks in a very short time? The reason is that deep learning does not have enough parallelism to make full use of thousands or even millions of processors in a typical modern supercomputer. There are two directions for parallelizing deep learning: model parallelism and data parallelism. Model parallelism is very limited. For data parallelism, current optimizers cannot scale to thousands of processors because large-batch training suffers from the sharp minimum problem. In this talk, I will introduce the LARS (Layer-wise Adaptive Rate Scaling) and LAMB (Layer-wise Adaptive Moments for Batch training) optimizers, which can find more parallelism for deep learning. They not only make deep learning systems scale well, but also help real-world applications achieve higher accuracy. Since 2017, all the ImageNet training speed world records have been achieved using LARS. LARS was added to MLPerf, which is the industry benchmark for fast deep learning. Google used LAMB to reduce BERT training time from 3 days to 76 minutes and achieve new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks. The approaches introduced in this talk have been used by state-of-the-art distributed systems at Google, Intel, NVIDIA, Sony, Tencent, and so on.
9-Sep-2020	Title: End-to-End Advanced Data Analytics Speaker: Ooi Beng Chin, Distinguished Professor, Department of Computer Science Abstract: Big Data, Data Science and Data-driven insights have drawn many into data management and analytics. Instead of focusing on particular modules or functionalities, leaving it to others to integrate these into useful systems, it is important to build end-to-end solutions, from data cleaning, through data curation with human-in-loop (crowdsourcing) and big data processing, all the way to complex (machine learning and deep learning based) data analytics. As a system researcher, we have to work on both algorithmic research and system development. In this talk, I shall briefly step through some of my works over the years, and highlight some common techniques shared in various data processing systems. I shall also share my experience as a database system researcher.
16-Sep-2020	Title: Voice-based Interactions for Editing Text On the Go Speaker: Debjyoti Ghosh, Dean’s Graduate Award winner Abstract: Towards an envisioned interaction paradigm where computing is more seamlessly integrated with the users’ everyday mobility, voice-based interaction is likely to play a pivotal role as speaking is a natural form of human communication and also, offers an untethered and device-independent channel of communication with a computing interface. Additionally, the interaction is hands- and eyes-free, leaving the users free to engage in other tasks. Yet, to lower the interaction burden when the interactions are embedded into the users’ everyday mobility and activities, the traditional (existing) interaction vocabulary for on-the-go (mobile) interactions needs to be redesigned under the new paradigm. To this end, the talk presents voice-based and multimodal interaction techniques for text input/editing as an everyday mobile computing task for both eyes-free interfaces and heads-up computing-based interfaces like Augmented Reality Smart Glasses (ARSG), that better support flexible and natural interactions consistent with the users’ mobility needs.
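
As a rough illustration of the layer-wise idea behind LARS, named in the 2-Sep-2020 abstract above, the sketch below scales each layer's step by the ratio of its weight norm to its gradient norm. It is a minimal, momentum-free version written for this listing, not the speaker's implementation; the function names, the `trust_coef` default and the zero-norm fallback are assumptions chosen for the example.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust_coef=0.001, weight_decay=0.0):
    """Layer-wise rate: trust_coef * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        # Assumption for this sketch: fall back to the global rate alone.
        return 1.0
    return trust_coef * w_norm / denom


def lars_step(layers, global_lr, trust_coef=0.001, weight_decay=0.0):
    """Apply one update to every layer.

    layers is a list of (weights, grads) pairs, each a flat list of floats.
    Returns the new weights, one list per layer.
    """
    updated = []
    for weights, grads in layers:
        step = global_lr * lars_local_lr(weights, grads, trust_coef, weight_decay)
        updated.append(
            [w - step * (g + weight_decay * w) for w, g in zip(weights, grads)]
        )
    return updated
```

With `weight_decay=0.0`, two layers that hold the same weights but whose gradients differ by a factor of 100 receive the same update, which is the scale-invariance that lets the global learning rate be raised for large batches.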

AY2019/2020 Semester 2

29-Jan-2020	Title: Advantages and Risks of Sensing for Cyber-Physical Security Speaker: Han Jun, Assistant Professor, Department of Computer Science Abstract: With the emergence of the Internet-of-Things (IoT) and Cyber-Physical Systems (CPS), we are witnessing a wealth of exciting applications that enable computational devices to interact with the physical world via an overwhelming number of sensors and actuators. However, such interactions pose new challenges to traditional approaches of security and privacy. In this talk, I will present how I utilize sensor data to provide security and privacy protections for IoT/CPS scenarios, and further introduce novel security threats arising from similar sensor data. Specifically, I will highlight three of my recent projects that leverage sensor data for defense and attack scenarios in applications such as smart homes and semi-autonomous vehicles. Furthermore, I will introduce my future research directions such as identifying and defending against unforeseen security challenges from newer application domains such as smart vehicles, buildings, and cities.
5-Feb-2020 Note: Venue at Seminar Room 2* (COM1-02-04)	Title: Power Papers — Some Practical Pointers (Part 1) Speaker: Terence Sim, Associate Professor, Department of Computer Science Abstract: If I write with the flowery flourish of Shakespeare, but my prose proves problematic, then my words become like a noisy gong or a clanging cymbal. If I have the gift of mathematical genius and can fathom all theorems, but cannot articulate the arcane, my genius appears no different from madness. If I achieve breakthrough research that can change the world, but cannot explain its significance, the world gains nothing and I labor in vain. Writing a good research paper takes effort; more so if there is a page limit. Yet this skill is required of every researcher, who, more often than not, fumbles his or her way through. Good grammar is only a start; care and craft must be applied to turn a mediocre paper into a memorable one. Writing skills can indeed be honed. In this reprise talk, I will highlight the common mistakes many researchers make, and offer practical pointers to pack more punch into your paper. Needless to say, the talk will be biased: I will speak not from linguistic theories, but from personal experience, sharing what has, and has not, worked for me. Students and staff are all welcome ; your views and insights will certainly benefit us all.
12-Feb-2020 Note: Venue at Seminar Room 2* (COM1-02-04)	Title: Power Papers — Some Practical Pointers (Part 2) Speaker: Terence Sim, Associate Professor, Department of Computer Science Abstract: If I write with the flowery flourish of Shakespeare, but my prose proves problematic, then my words become like a noisy gong or a clanging cymbal. If I have the gift of mathematical genius and can fathom all theorems, but cannot articulate the arcane, my genius appears no different from madness. If I achieve breakthrough research that can change the world, but cannot explain its significance, the world gains nothing and I labor in vain. Writing a good research paper takes effort; more so if there is a page limit. Yet this skill is required of every researcher, who, more often than not, fumbles his or her way through. Good grammar is only a start; care and craft must be applied to turn a mediocre paper into a memorable one. Writing skills can indeed be honed. In this reprise talk, I will highlight the common mistakes many researchers make, and offer practical pointers to pack more punch into your paper. Needless to say, the talk will be biased: I will speak not from linguistic theories, but from personal experience, sharing what has, and has not, worked for me. Students and staff are all welcome ; your views and insights will certainly benefit us all.
19-Feb-2020	Title: Privacy at the intersection of trustworthy machine learning (robustness and interpretability) Speaker: Reza Shokri, Assistant Professor, Department of Computer Science Abstract: Machine learning algorithms have shown an unprecedented predictive power for many complex learning tasks. As they are increasingly being deployed in large scale critical applications for processing various types of data, new questions related to their trustworthiness would arise. Can machine learning algorithms be trusted to have access to individuals’ sensitive data? Can they be robust against noisy or adversarially perturbed data? Can we reliably interpret their learning process, and explain their predictions? In this talk, I will go over the challenges of building trustworthy yet privacy-preserving machine learning algorithms in centralized and distributed (federated) settings, and will discuss the inter-relation between privacy, robustness, and interpretability.
4-Mar-2020	Title: Learning Visual Attributes for Discovery of Actionable Media Speaker: Francesco Gelli, Dean’s Graduate Award winner (AY2019/2020 Sem1) Abstract: Since the advent of social media, it became common practice for marketers to browse user generated content in social networks websites to discover actionable media that resonate with the brand identity and is likely to engage a target audience. Since there is still no concrete understanding of what are the visual attributes of actionable media, we formalize the task of discovery of actionable media and investigate the role of three different classes of attributes. By learning generic visual attributes, brand attributes and user attributes, we achieve higher performance and provide a better understanding of the properties of the actionable media. We design a discovery framework that integrates the three different classes of attributes and addresses the challenges derived from the subjective nature of visual actionability. Our comprehensive set of experiments and visualizations confirms that this work is a valuable concrete step toward using AI to discover actionable media for brands.
11-Mar-2020	Title: Part 1: Rigorous Verification of Neural Networks, Part 2: How to be a PhD student Speaker: Kuldeep S. Meel, Assistant Professor, Department of Computer Science Abstract: The first part of the talk will focus on a rigorous verification approach for neural networks. Relevant paper: https://teobaluta.github.io/NPAQ/. Last semester, I spent 15 minutes of my talk distilling my observations on what it takes to succeed in a PhD program. Several students found it very helpful, so I plan to do the same again.
18-Mar-2020	Title: Corpus-Level End-to-End Exploration for Interactive Systems Speaker: Grace Hui Yang, Visiting Associate Professor, Georgetown University Abstract: A core interest in building Artificial Intelligence (AI) agents is to let them interact with and assist humans. One example is Dynamic Search (DS), which models the process that a human works with a search engine agent to accomplish a complex and goal-oriented task. Early DS agents using Reinforcement Learning (RL) have only achieved limited success for (1) their lack of direct control over which documents to return and (2) the difficulty to recover from wrong search trajectories. In this paper, we present a novel corpus-level end-to-end exploration (CE3) method to address these issues. In our method, an entire text corpus is compressed into a global low-dimensional representation, which enables the agent to gain access to the full state and action spaces, including the under-explored areas. We also propose a new form of retrieval function, whose linear approximation allows end-to-end manipulation of documents. Experiments on the Text REtrieval Conference (TREC) Dynamic Domain (DD) Track show that CE3 outperforms the state-of-the-art DS systems.
25-Mar-2020	Title: Finding Fair and Efficient Allocations When Valuations Don’t Add Up Speaker: Yair Zick, Assistant Professor, Department of Computer Science Abstract: In this paper, we present new results on the fair and efficient allocation of indivisible goods to agents that have monotone, submodular, non-additive valuation functions over bundles. Despite their simple structure, these agent valuations are a natural model for several real-world domains. We show that, if such a valuation function has binary marginal gains, a socially optimal (i.e. utilitarian social welfare-maximizing) allocation that achieves envy-freeness up to one item (EF1) exists and is computationally tractable. We also prove that the Nash welfare-maximizing and the leximin allocations both exhibit this fairness-efficiency combination, by showing that they can be achieved by minimizing any symmetric strictly convex function over utilitarian optimal outcomes. To the best of our knowledge, this is the first valuation function class not subsumed by additive valuations for which it has been established that an allocation maximizing Nash welfare is EF1. Moreover, for a subclass of these valuation functions based on maximum (unweighted) bipartite matching, we show that a leximin allocation can be computed in polynomial time.
1-Apr-2020	Title: Systems Design in the Post-Moore’s Law Era Speaker: Li Jialin, Assistant Professor, Department of Computer Science Abstract: With the end of Dennard scaling and Moore’s Law, performance improvement of general purpose processors has been stagnant for the past decade. This is in contrast to the continuous growth in network speed in data centers and telecommunication networks, and the increasing demand of modern applications. Not surprisingly, CPU processing, particularly network packet processing, has become the performance bottleneck of many large scale systems deployed in data centers. In this talk, I will first introduce a new approach to designing distributed systems in data centers that tackles the aforementioned challenge: co-designing distributed systems with the data center network. Specifically, my work has taken advantage of new-generation programmable switches in data centers to build novel network-level primitives with near-zero processing overhead. We then leverage these primitives to enable more efficient protocol and systems designs. I will describe three systems I built that demonstrate the benefit of this approach. The first two, Network-Ordered Paxos and Eris, virtually eliminate the coordination overhead in state machine replication and fault-tolerant distributed transactions, by relying on network sequencing primitives to consistently order user requests. The third, Pegasus, substantially improves the load balancing of a distributed storage system (up to a 9x throughput improvement over existing solutions) by implementing an in-network coherence directory in the switch data plane. I will end the talk with some future work directions in this space. In particular, I will propose a portable hardware acceleration framework for Network Function Virtualization (NFV).
8-Apr-2020	Title: Human-imperceptible Privacy Protection Against Machines Speaker: Shen Zhiqi, Research Engineer, Department of Computer Science Abstract: Privacy concerns with social media have recently been under the spotlight, due to a few incidents of user data leakage on social networking platforms. With the current advances in machine learning and big data, computer algorithms often act as a first-step filter for privacy breaches, by automatically selecting content with sensitive information, such as photos that contain faces or vehicle license plates. In this paper we propose a novel algorithm to protect the sensitive attributes against machines, while keeping the changes imperceptible to humans. In particular, we first conducted a series of human studies to investigate multiple factors that influence human sensitivity to the visual changes. We discover that human sensitivity is influenced by multiple factors, from low-level features such as illumination and texture, to high-level attributes like object sentiment and semantics. Based on our human data, we propose for the first time the concept of a human sensitivity map. With the sensitivity map, we design a human-sensitivity-aware image perturbation model, which is able to modify the computational classification results of sensitive attributes while preserving the remaining attributes. Experiments on real world data demonstrate the superior performance of the proposed model on human-imperceptible privacy protection.
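
To make the fairness notion in the 25-Mar-2020 abstract above concrete, the sketch below checks envy-freeness up to one item (EF1) for a given allocation. It only illustrates the definition and is not the algorithm from the talk; the helper names and the capped-count valuation, one simple example of a monotone submodular function with binary marginal gains, are assumptions made for this example.

```python
def capped_valuation(liked, cap):
    """v(S) = min(|S intersect liked|, cap).

    Monotone and submodular, and adding one item changes the value by 0 or 1.
    """
    liked = frozenset(liked)

    def value(bundle):
        return min(len(liked & set(bundle)), cap)

    return value


def utilitarian_welfare(allocation, valuations):
    """Sum of each agent's value for its own bundle."""
    return sum(v(bundle) for v, bundle in zip(valuations, allocation))


def is_ef1(allocation, valuations):
    """True if every agent's envy vanishes after dropping one item.

    allocation is a list of sets (one bundle per agent) and valuations is a
    list of callables mapping a bundle to a number.
    """
    agents = range(len(allocation))
    for i in agents:
        own = valuations[i](allocation[i])
        for j in agents:
            if i == j or not allocation[j]:
                continue
            # Agent i must weakly prefer its own bundle to j's bundle
            # with some single item removed.
            if not any(
                own >= valuations[i](allocation[j] - {item})
                for item in allocation[j]
            ):
                return False
    return True


# Hypothetical instance: agent 0 likes every good, agent 1 only "a" and "b".
valuations = [capped_valuation("abcd", 4), capped_valuation("ab", 2)]
balanced = [{"c", "d"}, {"a", "b"}]
lopsided = [{"a", "b", "c", "d"}, set()]
```

Both allocations reach the same utilitarian welfare in this instance, but only `balanced` passes `is_ef1`, which matches the abstract's point that a welfare-maximizing allocation that is also EF1 exists for this class of valuations.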

AY2019/2020 Semester 1

21-Aug-2019	Title: Quantum Monte Carlo Speaker: Frédéric Hébert, Visiting Professor, Université Côte d’Azur Abstract: We give an introduction to the Monte Carlo methods and algorithms used in both classical and quantum statistical Physics. We first introduce these methods in the framework of classical Physics, taking as an example the simple Ising model. We then introduce quantum versions of this model, explain how they differ from their classical counterparts, and present how Monte Carlo methods can still be used to explore their properties.
28-Aug-2019	Title: Benefits and Risks of Sensing for Emerging Internet-of-Things Applications Speaker: Han Jun, Assistant Professor, Department of Computer Science Abstract: With the emergence of the Internet-of-Things (IoT) and Cyber-Physical Systems (CPS), we are witnessing a wealth of exciting applications that enable computational devices to interact with the physical world via an overwhelming number of sensors and actuators. However, such interactions pose new challenges to traditional approaches of security and privacy. In this talk, I will present how we utilize sensor data to provide security and privacy protections for IoT/CPS scenarios, and further introduce novel security threats arising from similar sensor data. Specifically, I will highlight a few of our recent projects that leverage sensor data for defense and attack scenarios in applications such as smart homes, semi-autonomous vehicles, and drone delivery. I will also briefly introduce interesting research problems that I am working on in newer application domains such as smart vehicles, buildings, and cities.
4-Sep-2019	Title: Dialog Systems Go Multimodal Speaker: Liao Lizi, Dean’s Graduate Award winner (AY2018/2019 Sem2) Abstract: The next generation of user interfaces aims at intelligent systems that are able to adapt to common forms of human dialogs and hence provide more intuitive and natural ways of interaction. This ambitious goal, however, poses new challenges for the design and implementation of the systems. First of all, as visual perception is one of the major means of perceiving the environment in addition to text (through speech), it motivates the development of dialog systems with multimodal understanding ability. Second, to make the system “smart” in generating substantive responses, knowledge should be incorporated as a foundation to achieve human-like abilities. In this talk, we aim to discuss how task-oriented dialog systems could go multimodal. Specifically, we investigate the critical issues in multimodal dialog system design and propose a novel multimodal dialog system framework which can be realised as fully-fledged prototype systems.
11-Sep-2019	Title: Formal Methods and AI: Yet Another Entanglement Speaker: Kuldeep Singh Meel, Assistant Professor, Department of Computer Science
18-Sep-2019	Title: DDoS and Bitcoin Attacks Exploiting Internet Routing Speaker: Kang Minsuk, Assistant Professor, Department of Computer Science Abstract: The knowledge of Internet architecture and inter-domain routing can be extremely useful for strong and stealthy attacks. In this talk, I will present two such recent examples. First, I will discuss a new adaptive link-flooding attack strategy (IEEE S&P 2019), called a detour-learning attack, that can detect any adaptive rerouting defense attempts by victim networks that are under link-flooding attacks, such as Crossfire or Coremelt. We show that in the current BGP routing any adaptive defense is defeated by our adaptive link-flooding attack because the defense, unfortunately, is inherently slower than attacks. In the second part of the talk, I will present our recent, powerful Bitcoin partitioning attack (IEEE S&P 2020), called an Erebus attack. A previous attack by Apostolaki et al. has shown that network adversaries (e.g., ISPs) can perform a BGP prefix hijacking attack against Bitcoin nodes. However, due to the nature of BGP operation, such a hijacking is globally observable and thus enables immediate detection of the attack and the identification of the perpetrator. Our Erebus attack partitions the Bitcoin network without any routing manipulations, making the attack undetectable to control-plane and even to data-plane detectors. We show that the Erebus attack is readily available for large ISPs against the vast majority of public Bitcoin nodes with negligible attack traffic rate and a modest (e.g., 5–6 weeks) attack execution period. As the attack exploits the topological advantage of being a network adversary but not the specific vulnerabilities of Bitcoin core, no quick patches seem to be available. I will discuss some suggested modifications to the Bitcoin core.
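
For readers new to the classical starting point of the 21-Aug-2019 talk above, the sketch below applies single-spin-flip Metropolis sampling to a two-dimensional Ising model on a periodic square lattice. It is a textbook-style illustration, not material from the talk; the function names, the lattice representation and the choice of units with Boltzmann's constant set to 1 are assumptions made for this example.

```python
import math
import random


def total_energy(spins, coupling=1.0):
    """Energy -J * sum of s_i * s_j over nearest-neighbour bonds (periodic)."""
    size = len(spins)
    energy = 0.0
    for r in range(size):
        for c in range(size):
            # Count each bond once via the right and down neighbours.
            right = spins[r][(c + 1) % size]
            down = spins[(r + 1) % size][c]
            energy -= coupling * spins[r][c] * (right + down)
    return energy


def flip_cost(spins, r, c, coupling=1.0):
    """Energy change if the spin at (r, c) were flipped."""
    size = len(spins)
    neighbours = (
        spins[(r - 1) % size][c]
        + spins[(r + 1) % size][c]
        + spins[r][(c - 1) % size]
        + spins[r][(c + 1) % size]
    )
    return 2.0 * coupling * spins[r][c] * neighbours


def metropolis_sweep(spins, temperature, rng, coupling=1.0):
    """Attempt size * size single-spin flips in place (k_B = 1)."""
    size = len(spins)
    for _ in range(size * size):
        r, c = rng.randrange(size), rng.randrange(size)
        delta = flip_cost(spins, r, c, coupling)
        # Always accept moves that lower the energy; otherwise accept
        # with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            spins[r][c] = -spins[r][c]
```

A typical use would start from `[[1] * 8 for _ in range(8)]`, call `metropolis_sweep` repeatedly with `random.Random(seed)`, and average the magnetization over sweeps at each temperature.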

AY2018/2019 Semester 1

29-Aug-2018	Title: Overview of research in mobile sensing and wireless sensor network protocols Speaker: Associate Professor Chan Mun Choon, Department of Computer Science Abstract: In this talk, I will cover recent research work done by my research group on mobile computing and wireless sensor network protocols. For mobile sensing, I will touch on the use of sensors available on wearables/smartphones for inference of user interactions, indoor localization and context detection. For wireless sensor network protocols, I will present research that exploits synchronous transmissions to mitigate wireless contention to design some of the fastest multiple-hop network protocols for data dissemination and sharing. Finally, I will also briefly cover recent work on edge computing and software defined networking with a focus on data plane programmability.
5-Sep-2018	Title: Internet-of-Things Security: Benefits and Risks of Sensing Speaker: Assistant Professor Han Jun, Department of Computer Science Abstract: With the emergence of the Internet-of-Things (IoT) and Cyber-Physical Systems (CPS), we are witnessing a wealth of exciting applications that enable computational devices to interact with the physical world via an overwhelming number of sensors and actuators. However, such interactions pose new challenges to traditional approaches of security and privacy. In this talk, I will present how I utilize sensor data to provide security and privacy protections for IoT/CPS scenarios, and further introduce novel security threats arising from similar sensor data. Specifically, I will highlight a few of my recent projects that leverage sensor data for defense and attack scenarios in applications such as smart homes and semi-autonomous vehicles. I will also briefly introduce interesting research problems that I am working on in newer application domains such as smart vehicles, buildings, and cities.
12-Sep-2018	Title: Overview of research in next generation low latency TCP and software defined networking Speaker: Associate Professor Ben Leong, Department of Computer Science Abstract: In this talk, I will describe recent research work done by my research group on next generation low latency TCP (Transmission Control Protocol) and software defined networking using the new P4 language (https://p4.org/). We have seen in recent times the emergence of a large number of low-latency TCP variants. Surprisingly, these modern low-latency TCP variants can match the performance of TCP CUBIC and even outperform CUBIC for large RTT flows. We found that the likely reason is that the bottleneck buffers are relatively shallow and so these variants are likely throttling CUBIC by inflicting significant losses on the network. Our new rate-based congestion control algorithm incorporates a new buffer estimation technique that allows a flow to infer its own buffer occupancy as well as that of the competing flows sharing the same bottleneck buffer. With this mechanism, the flow is able to determine its operating environment and, when in a low-latency environment, to collaboratively regulate the bottleneck buffer occupancy with other flows. We believe that the current Internet is facing a transition into another phase with new low latency TCP variants but the transition will not be easy. Our approach will allow the Internet to transition smoothly to a low-latency future. For P4-based work, we recently developed a new system using the P4 programming language, called BurstRadar, that monitors microbursts in the dataplane. BurstRadar incurs 10 times less data collection and processing overhead than existing solutions. Furthermore, BurstRadar can handle simultaneous microburst traffic at gigabit line rates on multiple egress ports while consuming very little resources in the switching ASIC.
19-Sep-2018	Title: Enabling New Applications through Efficient, High-Performance Acceleration Speaker: Assistant Professor Trevor E. Carlson, Department of Computer Science Abstract: The development of faster computing devices each year, like what we have seen with mobile phones, has been what consumers have come to expect from the rapid pace of technology development. But, given two significant trends in technology scaling, this progress might hit a brick wall: even more expensive transistors going forward, while using fewer active transistors at a time to get more work done. Does this spell out the end of computing as we know it? Will computers stop getting faster? As silicon technology improvements have slowed, research into alternative technologies has increased. Nevertheless, these alternative technologies could still take decades to reach the performance and cost that current CMOS technology provides. One near-term solution is to adapt the computer’s architecture to more efficiently use the transistors that we have. By working smarter, our aim is to continue to provide more functionality in the face of these technological headwinds. To enable new applications, from mobile-based AR and VR to new machine learning approaches, we need to pursue innovative architectural directions. To accomplish these goals, our research focuses on building flexible processors that can enable these next-generation applications. In this talk, I will present some of our recent work as well as future research directions that propose one direction to move us closer to that goal. In addition, I will also present some critical challenges and potential next steps that we will need to address in the coming years.
3-Oct-2018	Title: Towards Boosting Performance of Healthcare Analytics: Resolving Challenges in Electronic Medical Records Speaker: Ms Zheng Kaiping, Dean’s Graduate Award winner (AY2017/2018 Sem2) Abstract: In recent years, the increasing availability of Electronic Medical Records (EMR) has brought more promising opportunities to automate healthcare data analytics. However, some challenges in EMR data pose a negative effect on healthcare analytic performance if not well handled, and lead to a gap between the potential of EMR data for analytics and its usability in practice. Therefore, it is vitally necessary and important to resolve the challenges in EMR data in order to boost the performance, and further help derive more medical insights, contributing to better patient management and faster medical research advancement. In this talk, I will focus on two representative challenges in EMR data, namely irregularity and bias, and then present our devised solutions for resolving them. First, I will justify that the irregularity challenge should be resolved at the feature level to reduce time information loss. Then I will demonstrate our proposal to incorporate the fine-grained feature-level time span information and show the analytic performance improvement. Second, I will explain that irregularity is a phenomenon, while bias should be the underlying reason. I will present our solution to transform the biased EMR time series into unbiased data and illustrate the improvement brought in terms of missing data imputation accuracy and prediction accuracy of data analytic applications.
10-Oct-2018	Title: Beyond SAT Revolution Speaker: Assistant Professor Kuldeep Singh Meel, Department of Computer Science Abstract: The paradigmatic NP-complete problem of Boolean satisfiability (SAT) solving is a central problem in Computer Science. While the mention of SAT can be traced to the early 19th century, efforts to develop practically successful SAT solvers go back to the 1950s. The past 20 years have witnessed a “SAT revolution” with the development of conflict-driven clause-learning (CDCL) solvers. Such solvers combine a classical backtracking search with a rich set of effective heuristics. While 20 years ago SAT solvers were able to solve instances with at most a few hundred variables, modern SAT solvers solve instances with up to millions of variables in a reasonable time. The “SAT revolution” opens up opportunities to design practical algorithms with rigorous guarantees for problems in complexity classes beyond NP by replacing an NP oracle with a SAT solver. In this talk, we will discuss how we use the SAT revolution to design practical algorithms for two fundamental problems in artificial intelligence and formal methods: Constrained Sampling and Counting.
17-Oct-2018	Title: Exploiting Knowledge Graph for Personalized Recommendation Speaker: Mr Wang Xiang, Dean’s Graduate Award winner (AY2017/2018 Sem2) Abstract: In the era of information overload, recommender systems have gained widespread adoption across industry to drive various online customer-oriented services. They help users discover a small set of relevant items, which meet their personalized interests, from overwhelming choices. Generally, the modeling of user-item interactions is at the heart of personalized recommendation. Nowadays, diverse kinds of auxiliary information on users and items are becoming increasingly available in online platforms, such as user demographics, social relations, and item knowledge. To date, incorporating knowledge-aware channels, especially knowledge graphs, into recommender systems is attracting increasing interest, since it can provide deep factual knowledge and rich semantics on items. The usage of such knowledge can better capture the underlying and complex user-item relationships, and further achieve higher recommendation quality. Furthermore, knowledge graphs enable us to uncover valuable evidence as well as reasons on why a recommendation is made. Title: Securing Applications from Untrusted Operating Systems using Enclaves Speaker: Ms Shweta Shinde, Dean’s Graduate Award winner (AY2017/2018 Sem2) Abstract: For decades, we have been building software with the default assumption of a trusted underlying stack such as the operating system. From a security standpoint, the status quo has been a hierarchical trust model, where trusting one layer implies trusting all the layers underneath it. However, with new usage models such as outsourced computing and analytics on third-party cloud services, trusting the operating system is no longer an option. As a result, modern CPUs have started supporting new abstractions which address the threats of an untrusted operating system. Intel SGX is one such new security capability available in commodity CPUs shipping from 2015. It allows user-level application code to execute in enclaves which are isolated from all other software on the system, even from the privileged OS or hypervisor. However, these architectural solutions offer a trade-off between security, ease of usability, and compatibility with legacy software (both OS and applications). In this talk, I will present a low-TCB, POSIX-compatible, side-channel resistant, and formally verified solution which allows users to securely execute their applications on an untrusted operating system.
24-Oct-2018	Title: Adversarial Machine Learning Speaker: Assistant Professor Reza Shokri, Department of Computer Science Abstract: Machine learning models are used in many critical systems and applications. This makes them very attractive targets for a number of security and privacy attacks, including data poisoning, evasion attacks, and inference attacks. In this talk, I will present all these attacks, and a systematic way of mitigating their risks. The solution is simple: know your enemy and anticipate their attacks. This is known as adversarial machine learning.
31-Oct-2018	Title: 3 Projects on Computer System Performance Speaker: Professor Tay Yong Chiang, Department of Computer Science Abstract: This talk describes 3 current projects on the performance of computer systems: (1.Database) For 20-odd years, developers and researchers have used the TPC benchmarks to compare their products and algorithms. These benchmarks have fixed schemas that bear no relation to current applications. The target of the database project is to replace TPC benchmarks with synthetic versions of application datasets. The idea is to first scale the empirical dataset to the appropriate size, then tweak the data in the resulting dataset to enforce application-specific properties. The ambition is to have a repository of tweaking tools contributed by the developer community, and current work is on building a collaborative framework to facilitate tool interoperability. (2.Memory) Most of the current hot topics in computer science will become cold within 10 years, but caching will remain an issue 50 years from now. Most caching algorithms try to strike a heuristic balance between recency (e.g. LRU) and frequency (i.e. popularity). The target of the memory project is to use a Cache Miss Equation to do a scientific study of this balance. (3.Networking) Over the last 2 years, Google has moved their production traffic to a TCP variant called BBR. This may start a paradigm shift for TCP congestion control, from one based on packet loss to one based on bandwidth-delay product. BBR requires estimates for minimum round-trip time R and maximum bandwidth X. BBR measures R and X by periodically changing its packet sending rate. The target of the networking project is to show that the estimation can be done differently and passively. The underlying idea works for any TCP version (CUBIC, Reno, etc.), and even for choosing between hardware/software architectures for video games.

AY2017/2018 Semester 2

31-Jan-2018	Title: 3 Projects on Computer System Performance Speaker: Tay Yong Chiang, Professor, Department of Computer Science Abstract: This talk describes 3 current projects on the performance of computer systems: (1.Database) For 20-odd years, developers and researchers have used the TPC benchmarks to compare their products and algorithms. These benchmarks have fixed schemas that bear no relation to current applications. The target of the database project is to replace TPC benchmarks with synthetic versions of application datasets. The idea is to first scale the empirical dataset to the appropriate size, then tweak the data in the resulting dataset to enforce application-specific properties. The ambition is to have a repository of tweaking tools contributed by the developer community, and current work is on building a collaborative framework to facilitate tool interoperability. (2.Memory) Most of the current hot topics in computer science will become cold within 10 years, but caching will remain an issue 50 years from now. Most caching algorithms try to strike a heuristic balance between recency (e.g. LRU) and frequency (i.e. popularity). The target of the memory project is to use a Cache Miss Equation to do a scientific study of this balance. (3.Networking) Over the last 2 years, Google has moved their production traffic to a TCP variant called BBR. This may start a paradigm shift for TCP congestion control, from one based on packet loss to one based on bandwidth-delay product. BBR requires estimates for minimum round-trip time R and maximum bandwidth X. BBR measures R and X by periodically changing its packet sending rate. The target of the networking project is to show that the estimation can be done differently and passively. The underlying idea works for any TCP version (CUBIC, Reno, etc.), and even for choosing between hardware/software architectures for video games.
7-Feb-2018	Title: Privacy and Security in (Outsourced) Machine Learning Speaker: Reza Shokri, Assistant Professor, Department of Computer Science Abstract: I will talk about the security and privacy threats against machine learning, notably when its training is outsourced. I will discuss how and why machine learning models leak information about the individual data records on which they were trained, and how an attacker can train a deep neural network in such a way that it leaks even more information. I will also talk about security issues with respect to outsourced machine learning, and how we can evaluate such attacks.
14-Feb-2018	Title: Constrained Counting and Sampling: Bridging the Gap between Theory and Practice Speaker: Kuldeep Singh Meel, Assistant Professor, Department of Computer Science Abstract: Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. In this talk, I will introduce a novel algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in Boolean reasoning over the past two decades. This has allowed us to obtain breakthrough results in constrained sampling and counting, providing a new algorithmic toolbox in machine learning, probabilistic reasoning, privacy, and design verification. I will demonstrate the utility of the above techniques on various real applications including probabilistic inference, design verification and our ongoing collaboration in estimating the reliability of critical infrastructure networks during natural disasters.
21-Feb-2018	Title: Preparing for a Low-Latency Future Internet Speaker: Ben Leong, Associate Professor, Department of Computer Science Abstract: Google has deployed BBR, a new low-latency TCP variant. We show that to transition smoothly to a low-latency Internet of the future, we need a TCP variant that not only can contend effectively against CUBIC in the current Internet, but that is also able to reduce its level of aggressiveness in a low-latency environment. We present EvaRate, a rate-based congestion control algorithm that incorporates a new buffer estimation technique which allows an EvaRate flow to infer its own buffer occupancy as well as that of the competing flows sharing the same bottleneck buffer. With this mechanism, an EvaRate flow is able to determine its operating environment and, when in a low-latency (or benevolent) environment, collaboratively regulate the bottleneck buffer occupancy with other EvaRate flows. EvaRate highlights a new point in the congestion control design space that deserves further attention.
7-Mar-2018	Title: Super Speaking — Tricks of the Trade Speaker: Terence Sim, Associate Professor, Department of Computer Science Abstract: Most of us in academia are engaged in this typical sequence of activities: (a) do research; (b) write a report/paper about it; (c) give an oral presentation. While many of us are good at research skills (a), and can write reasonably well (b), we are less confident in speaking about it (c). Indeed, presenting our work in front of an audience often causes knees to wobble and stomachs to cramp. It gets worse when we realize, halfway through the talk, that the audience is getting restless or bored because they are not understanding our message. In this talk, I will share some techniques that will improve the intelligibility of our technical presentations. I learned many of these “tricks of the trade” in school: the School of Hard Knocks. Others I picked up by observing the habits of good speakers; still others from the wise counsel of my seniors. While I cannot guarantee to take away the nervousness when you give a talk, I can certainly offer practical tips that will hopefully improve the clarity of your communication. At the very least, you can get a kick out of seeing whether I practice what I preach.
14-Mar-2018	Title: Information Theory and Machine Learning Speaker: Jonathan Scarlett, Assistant Professor, Department of Computer Science Abstract: The field of information theory was introduced as a means for understanding the fundamental limits of data compression and transmission, and has shaped the design of practical communication systems for decades. In this talk, I will discuss the emerging viewpoint that information theory is not only a theory of communication, but a far-reaching theory of data that is applicable to seemingly unrelated learning problems such as estimation, prediction, and optimization. This perspective leads to principled approaches for certifying the near-optimality of practical algorithms, as well as understanding where further improvements are possible. I will provide a gentle introduction to some of the main ideas and insights offered by this perspective, and present examples in the problems of group testing, graphical model selection, sparse regression, and black-box function optimization.
21-Mar-2018	Title: Correcting Language Errors using Machine Translation Techniques Speaker: Shamil Chollampatt Muhammed Ashraf, Dean’s Graduate Award winner (AY2017/2018 Sem1) Abstract: Grammatical error correction (GEC) tools play an important role in helping second language learning and providing assistance to non-native writers. Currently, the leading approach to GEC is the machine translation approach, in which potentially erroneous sentences are “translated” into fluent well-formed sentences. This talk will introduce various machine translation techniques that have been successfully applied and adapted to GEC, such as word and character-level statistical machine translation, neural network joint models, and neural encoder-decoder approaches. Title: Linguistic Properties Matter for Implicit Discourse Relation Recognition: Combining Semantic Interaction, Topic Continuity and Attribution Speaker: Lei Wenqiang, PhD Student, Department of Computer Science Abstract: Modern solutions for implicit discourse relation recognition largely build universal models to classify all of the different types of discourse relations. In contrast to such learning models, we build our model from first principles, analyzing the linguistic properties of the individual top-level Penn Discourse Treebank (PDTB) styled implicit discourse relations: Comparison, Contingency and Expansion. We find that semantic characteristics of each relation type and two cohesion devices – topic continuity and attribution – work together to contribute such linguistic properties. We encode those properties as complex features and feed them into a Naïve Bayes classifier, bettering baselines (including deep neural network ones) to achieve a new state-of-the-art performance level. Over a strong, feature-based baseline, our system outperforms one-versus-other binary classification by 4.83% for the Comparison relation, 3.94% for Contingency and 2.22% for four-way classification.
28-Mar-2018	Title: (Gap/S)-ETH Hardness of SVP Speaker: Divesh Aggarwal, Assistant Professor, Department of Computer Science Abstract: There has been a lot of research in the last two decades on constructing cryptosystems whose security relies on the hardness of the shortest vector problem (SVP) on integer lattices. The SVP is well known to be NP-hard. However, such hardness proofs tell us very little about the quantitative or fine-grained complexity of SVP. E.g., does the fastest possible algorithm for SVP still run in time at least, say, 2^{n/5}, or is there an algorithm that runs in time 2^{n/100} or even 2^{\sqrt{n}}? The above hardness results cannot distinguish between these cases, but we certainly need to be confident in our answers to such questions if we plan to base the security of widespread cryptosystems on these answers. In this talk, I will give a partial answer to this question by showing the following quantitative hardness results for the Shortest Vector Problem in the \ell_p norm (SVP_p) where n is the rank of the input lattice. 1) For “almost all” p > 2.14, there is no 2^{n/C_p}-time algorithm for SVP_p for some explicit constant C_p > 0 unless the (randomized) Strong Exponential Time Hypothesis (SETH) is false. 2) For any p > 2, there is no 2^{o(n)}-time algorithm for SVP_p unless the (randomized) Gap-Exponential Time Hypothesis (Gap-ETH) is false. 3) There is no 2^{o(n)}-time algorithm for SVP_2 unless either (1) (non-uniform) Gap-ETH is false; or (2) there is no family of lattices with exponential kissing number in the \ell_2 norm. This is joint work with Noah Stephens-Davidowitz.
4-Apr-2018	Title: Your Toolbox for Privacy in the Cloud Speaker: Tople Shruti Shrikant, Dean’s Graduate Award winner (AY2017/2018 Sem1) Abstract: Use of cloud services is becoming popular among users with terabytes of data uploaded every day. The state-of-the-practice method to secure this data is using encryption. But encryption alone is not enough. As cloud services offer complex functionalities at scale, my research raises several fundamental questions that are important to ensure practical privacy in the cloud. Concretely, 1) Can we compute on encrypted data in real-time? 2) What are the limits of defenses that hide side-channels appearing in encrypted computation techniques? 3) Can we design an ideally efficient side-channel defense for hiding specific data access patterns that appear in a large class of applications? In this talk, I will present various tools that I have developed in my research that answer the above questions and enable practical privacy in the cloud. My first work enables practical arbitrary computation on encrypted data by switching between efficient cryptographic schemes with minimum trust in software. This work forks a new direction in the area of encrypted computation by bridging the gap between two independent lines of approach: cryptographic primitives and trusted computing. Next, I will present an intractability result for hiding side-channels that leak information in encrypted computation. Lastly, I will show a construction that achieves ideal efficiency (constant latency) for hiding data access patterns in the read-only class of applications. Title: Quantum Communication Using Coherent Rejection Sampling Speaker: Anurag Anshu, Dean’s Graduate Award winner (AY2017/2018 Sem1) Abstract: Compression of a message up to the information it carries is key to many tasks involved in classical and quantum information theory. Schumacher [B. Schumacher, Phys. Rev. A 51, 2738 (1995)] provided one of the first quantum compression schemes and several more general schemes have been developed ever since [M. Horodecki, J. Oppenheim, and A. Winter, Commun. Math. Phys. 269, 107 (2007); I. Devetak and J. Yard, Phys. Rev. Lett. 100, 230501 (2008); A. Abeyesinghe, I. Devetak, P. Hayden, and A. Winter, Proc. R. Soc. A 465, 2537 (2009)]. However, the one-shot characterization of these quantum tasks is still under development, and often lacks a direct connection with analogous classical tasks. Here we show a new technique for the compression of quantum messages with the aid of entanglement. We devise a new tool that we call the convex split lemma, which is a coherent quantum analogue of the widely used rejection sampling procedure in classical communication protocols. As a consequence, we exhibit new explicit protocols with tight communication cost for quantum state merging, quantum state splitting, and quantum state redistribution (up to a certain optimization in the latter case). We also present a port-based teleportation scheme which uses fewer ports in the presence of information about the input. Based on joint work with Vamsi Krishna Devabathini and Rahul Jain. https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.120506
11-Apr-2018	Title: Mining Clinical Data Speaker: Vaibhav Rajan, Assistant Professor, Department of Information Systems and Analytics Abstract: Clinical data analysis poses several modeling challenges that arise due to data heterogeneity, temporality, sparsity, bias and noise. I will outline these challenges in the context of identifying patients at risk of developing complications in hospitals, and present two projects. Nursing notes contain regular and valuable assessments of patients’ condition but often have inconsistent abbreviations and lack the grammatical structure of formal documents, thereby making automated analysis difficult. We design a new approach that effectively utilizes the structure of the notes, is robust to inconsistencies in the text and surpasses the accuracy of previous methods. Healthcare data often contains heterogeneous datatypes that exhibit complex feature dependencies. Our algorithm for dependency clustering uses copulas to effectively model a wide range of dependencies and can fit mixed — continuous and ordinal — data. It scales linearly with size and quadratically with dimensions of input data, which is significantly faster than state-of-the-art correlation clustering methods for mixed data. I’ll conclude with a summary of my current research.

AY2017/2018 Semester 1

30-Aug-2017	Title: Analysis of Source Code and Binaries for Vulnerability Detection and Patching Speaker: Abhik Roychoudhury, Professor, Department of Computer Science Abstract: Due to the absence of source code for parts of a software system, analysis methods which work on both source code and binaries are of value. We have studied vulnerability detection techniques which work on both source code and binaries. Our detection techniques combine the essential ingredients of various aspects of fuzz testing: model-based black-box fuzzing, coverage-based greybox fuzzing, and symbolic-execution-based whitebox fuzzing. Apart from detecting security vulnerabilities, these capabilities can also be used for reproducing crashes from crash reports or clustering “similar” crashes. Finally, we have also studied methods for automated program repair, where vulnerability patch suggestions can be generated automatically. All of our fuzz testing and patching techniques have been evaluated on large-scale, well-known systems, for example by detecting vulnerabilities in real-life applications such as the Adobe Acrobat reader or Windows Media Player. The talk will also provide a glimpse into the growing field of semantic program repair and its applications, which was started at NUS and has been gaining traction ever since.
6-Sep-2017	Title: Continuing Moore’s Law: Challenges and Opportunities in Computer Architecture Speaker: Trevor Erik Carlson, Assistant Professor, Department of Computer Science Abstract: Ever faster, cheaper mobile phones (as well as other computing devices) have been what consumers have come to expect from technology for many years. But, given two recent trends in technology scaling (today’s chips are limited by power and costs because scaling has slowed significantly), it is widely expected that we will no longer receive significant help from scaling to help us build these faster devices. Does this spell out the end of computing as we know it? Will computers stop getting faster? As silicon technology improvements have slowed, research into alternative technologies has increased. Nevertheless, these technologies could still take decades to reach the performance and cost that current CMOS provides. One solution to the problem of slowing technology scaling is to adapt the computer’s architecture to more efficiently use the transistors that we have. This is the main focus for our research. To enable a variety of new applications (AR, VR, machine-learning, etc.) while still providing longer battery life and higher performance, we need to pursue innovative architectural directions. To do this, our research focuses on building general-purpose (programmable) processors and accelerators that are now a necessity to enable these new applications. In this talk, I will present some recent developments in computer architecture to move us closer to that goal, and present some critical challenges (and potential solutions) that we will need to address in the coming years.
13-Sep-2017	Title: Learning From Multiple Social Networks for Research And Business: A PhD Journey Speaker: Aleksandr Farseev, Dean’s Graduate Award winner (AY2016/2017 Sem2) Abstract: A drastic change in the Web was witnessed throughout the past decade, which saw an exponential growth in social networking services. The reason for such growth is that social media users concurrently produce and consume data. In this context, millions of users, who follow different lifestyles and belong to different demographic groups, regularly contribute multi-modal data on various online social networks, such as Twitter, Facebook, Foursquare, Instagram, and Endomondo. Traditionally, social media users are encouraged to complete their profiles by explicitly providing their personal attributes such as age, gender, interest, etc. (individual user profile). Additionally, users are likely to join interest-based groups that are devoted to various topics (group user profile). Such information is essential for different applications, but unfortunately, it is often not available publicly. This gives rise to automatic user profiling, which aims at automatic inference of users’ hidden information based on observable information such as an individual’s behavior or utterances. The talk is focused on investigating user profiling across multiple social networks in different application domains.
20-Sep-2017	Title: Adapting User Technologies: Bridging Designers, Machine Learning and Psychology through Collaborative, Dynamic, Personalized Experimentation Speaker: Joseph Jay Williams, Assistant Professor, Department of Information Systems and Analytics Abstract: Enhancing people’s real-world learning and thinking is a challenge for HCI and psychology, while AI aims to build systems that can behave intelligently in the real world. This talk presents a framework for redesigning the everyday websites people interact with to function as: (1) Intelligent adaptive agents that implement machine learning algorithms to dynamically discover how to optimize and personalize people’s learning and reasoning. (2) Micro-laboratories for psychological experimentation and data collection. I present an example of how this framework is used to create “MOOClets” that embed randomized experiments into real-world online educational contexts – like learning to solve math problems. Explanations (and experimental conditions) are crowdsourced from learners, teachers and scientists. Dynamically changing randomized experiments compare the learning benefits of these explanations in vivo with users, continually adding new conditions as new explanations are contributed. Algorithms (for multi-armed bandits, reinforcement learning, Bayesian Optimization) are used for real-time analysis (of the effect of explanations on users’ learning) and optimizing policies that provide the explanations that are best for different learners. The framework enables a broad range of algorithms to discover how to optimize and personalize users’ behavior, and dynamically adapt technology components to trade off experimentation (exploration) with helping users (exploitation). Bio: Joseph Jay Williams is an Assistant Professor at the National University of Singapore’s School of Computing, Department of Information Systems & Analytics.
He was previously a Research Fellow at Harvard’s Office of the Vice Provost for Advances in Learning, and a member of the Intelligent Interactive Systems Group in Computer Science. He completed a postdoc at Stanford University in the Graduate School of Education in Summer 2014, working with the Office of the Vice Provost for Online Learning and the Open Learning Initiative. He received his PhD from UC Berkeley in Computational Cognitive Science, where he applied Bayesian statistics and machine learning to model how people learn and reason. He received his B.Sc. from University of Toronto in Cognitive Science, Artificial Intelligence and Mathematics, and is originally from Trinidad and Tobago. More information about his research and papers is at www.josephjaywilliams.com.
4-Oct-2017	Title: Improving Medication Compliance: How CS Can Help Speaker: Ooi Wei Tsang, Associate Professor, Department of Computer Science Abstract: Medical compliance refers to the degree to which a patient accurately follows medical advice given by healthcare professionals, including whether they take medication as prescribed, at the right dosage, and at the right time. It is challenging for children and young adult patients who need long-term medication to comply, due to their lifestyle and the need to balance their studies, social activities, and possibly work. This talk aims to (i) highlight the importance of the problem and the challenges that the patients face, (ii) review some existing work in computing literature that addresses this problem, and (iii) identify some open research challenges towards improving medical compliance that involve computer networking, sensors, multimedia-multimodal data, AI, and HCI research.
11-Oct-2017	Title: Introduction to blockchain and cryptocurrency research Speaker: Luu The Loi, Dean’s Graduate Award winner (AY2016/2017 Sem2) Abstract: Cryptocurrencies, such as Bitcoin, Ethereum and 250 similar alt-coins, embody at their core a blockchain protocol, a mechanism for an open and decentralized network, even one with malicious nodes, to periodically agree on a set of new transactions. Two of the most popular cryptocurrencies, Bitcoin and Ethereum, support the feature to encode rules or scripts for processing transactions. This feature has evolved to give practical shape to the ideas of smart contracts, or full-fledged programs that are run on blockchains. Recently, Ethereum’s smart contract system has seen steady adoption, supporting millions of contracts, holding billions of dollars’ worth of virtual coins. In this talk I will give a brief introduction to blockchain and smart contract research. I will also discuss a few interesting applications and research papers in this space. The talk concludes by presenting open and interesting research problems that the community is focusing on. Title: Bounds on Distributed Information Spreading in Networks with Latencies Speaker: Suman Sourav, PhD Student, Department of Computer Science Abstract: Consider the problem of disseminating information (broadcast) in a large-scale distributed system: one (or more) nodes in a network have information that they want to share/aggregate/reconcile with others. Classic examples include distributed database replication, sensor network data aggregation, and P2P publish-subscribe systems. We study the performance of these distributed systems under the gossip protocol, in which a node is restricted to communicate with only one other neighboring node per round, and show both theoretical upper and lower bounds for the case where networks have arbitrarily varying latencies.
The network is modeled as a weighted graph, where the network nodes are represented by the vertices, network links by the graph edges and the link latencies by the edge weights. We define a parameter called the weighted conductance and choose a particular latency as the critical latency for the graph. The weighted conductance characterizes how well connected the graph is with respect to the critical latency. We show that this weighted conductance provides an accurate characterization of connectedness by showing that the time required for information spreading has a tight dependence on the weighted conductance. We view our results as a step towards a more accurate characterization of connectivity in networks with delays and we believe that the metric can prove useful in solving numerous other graph problems. In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for the problem.
25-Oct-2017	Title: Making Software Secure: Hardening & Analysis Speaker: Roland Yap, Associate Professor, Department of Computer Science Abstract: Software plays a critical role in everyday life from both personal and enterprise/government standpoints. Unfortunately, it is common that much critical software suffers from vulnerabilities, part of the reason being that such software usually is written in or has components in unsafe languages such as C and C++. An important question then is how to protect ourselves from the inevitable bugs. This talk looks at two important ingredients to address this critical problem. Firstly, how to harden real-world low-level code in C/C++. This involves how to make C/C++ code safer while preserving its essential properties. For example, finding/preventing memory errors, type confusion, undefined behaviors. Some of these research directions will build on extending existing work on low fat pointers, which is a state-of-the-art defence mechanism for buffer overflows. Another direction is how to find such errors. Symbolic execution is the main method used to analyse the behavior of programs without test cases because it can simulate program execution in a general fashion. Symbolic execution brings the challenge of how to solve the constraints used to model programs effectively, e.g. string operations such as regular expression matching, how to deal with the heap, etc. Such analysis can also go hand in hand with optimizing and improving the code hardening. Title: Interpretable Machine Learning for User Friendly, Healthy Interventions Speaker: Brian Lim, Assistant Professor, Department of Computer Science Abstract: Advances in artificial intelligence, sensors and big data management have far-reaching societal impacts. These systems augment our everyday lives and can provide healthy interventions to improve our behaviors.
These AI-driven systems can be directly helpful to consumers, such as by recognizing and recommending healthy foods, or indirectly by generating insights from data analytics to help drive policy decisions for urban populations. However, it is becoming increasingly important for people to understand them and remain in control. As we employ more sophisticated sensors and accurate machine learning models, how can we gain users’ trust in and understanding of these applications? In this talk, I will give an overview of my group’s research into building AI-based, user-centered, and explainable applications spanning healthcare disease risk prediction, mobile food recognition logging, public health fitness tracking, context-aware interruption management, and urban mobility. We employ methods from human-computer interaction and machine learning to (i) elicit requirements from target users, (ii) develop deployable hardware prototypes and software interfaces, and (iii) evaluate impact on real users in lab and field studies.
1-Nov-2017	Title: Data Privacy in Machine Learning Speaker: Reza Shokri, Assistant Professor, Department of Computer Science Abstract: I will talk about what machine learning privacy is, and will discuss how and why machine learning models leak information about the individual data records on which they were trained. My quantitative analysis will be based on the fundamental membership inference attacks: given a data record and (black-box) access to a model, determine if the record was in the model’s training set. I will demonstrate how to build such inference attacks on different classification models, e.g., those trained by commercial “machine learning as a service” providers such as Google and Amazon. Website: http://www.shokri.org
8-Nov-2017	Title: Analyzing Filamentary Structured Objects in Biomedical Images: Segmentation, Tracing, and Synthesis Speaker: Cheng Li, Adjunct Assistant Professor, Department of Computer Science Abstract: Filamentary structured objects are abundant in biomedical images, such as neuronal images, retinal fundus images, and angiography, to name a few. In this talk, we will discuss our recent research efforts in addressing the tasks of segmentation, tracing, and synthesis for such images. More details can be found at our project website: https://web.bii.a-star.edu.sg/archive/machine_learning/Projects/filaStructObjs/project.htm.

AY2016/2017 Semester 2

25-Jan-2017	Title: Transparency & Discrimination in Big Data Systems Speaker: Yair Zick, Assistant Professor, Department of Computer Science Abstract: Big data and machine learning techniques are being increasingly used to make decisions about important, often sensitive, aspects of our lives; these include healthcare, finance and law enforcement. These algorithms often learn from data; for example, they might try to predict someone’s income levels based on various features, such as their age, salary or marital status. These algorithms are often very, very good at their job (hence their popularity): they are able to process a huge amount of data and offer accurate predictions that would have otherwise been made by human decision makers with only very partial, biased data (and would certainly require much more time). It is often thought that algorithms are unbiased, in the sense that they do not hold any prior opinions that affect their decisions. In particular, we would not like our algorithms to base their predictions on sensitive features – such as ethnicity or gender. So, did a big data algorithm base its decisions on “protected” user features? The problem is that in many cases it is very hard to tell: big data algorithms are often extremely complex, so we cannot be sure whether an algorithm used a protected feature (say, gender), or based its prediction on a correlated input. Our research aims at developing formal methods that offer some transparency into the way that the algorithms use their inputs. Using tools from game theory, formal causality analysis and statistics, we offer influence measures that can indicate how important a feature was in making a decision about an individual, or a protected group. In this talk, I will review some of the latest developments in algorithmic transparency, and its potential impact on interpretable ML.
1-Feb-2017	Title: The emerging security and privacy issues in the tangled web Speaker: Jia Yaoqi, Dean’s Graduate Award winner (AY2016/2017 Sem1) Abstract: The World Wide Web has gradually become an essential part of our daily life in the digital age. With the advent of cloud services and peer-to-peer techniques, new security and privacy issues are emerging in the tangled web. In this talk, I first illustrate how cloud services affect the web/local boundary provided by browsers, and then briefly present the privacy leakage in the P2P web overlays as well as the solutions using onion-routing and oblivious RAM. First, browsers such as Chrome adopt a process-based isolation design to protect “the local system” from “the web”. However, as billions of users now use web-based cloud services (e.g., Dropbox and Google Drive), which are integrated into the local system, the premise that browsers can effectively isolate the web from the local system has become questionable. We argue that if the process-based isolation disregards the same-origin policy as one of its goals, then its promise of maintaining the “web/local system (local)” separation is doubtful. Specifically, we show that existing memory vulnerabilities in Chrome’s renderer can be used as a stepping-stone to drop executables/scripts in the local file system, install unwanted applications and misuse system sensors. These attacks are purely data-oriented and do not alter any control flow or import foreign code. Thus, such attacks bypass binary-level protection mechanisms, including ASLR and in-memory partitioning. Finally, we discuss various full defenses and present a possible way to mitigate the attacks presented. Second, the web infrastructure used to be a client-server model, in which clients (or browsers) request and fetch web contents such as HTML, JavaScript and CSS from web servers.
Recently, peer-to-peer (P2P) techniques (supported by real-time communications or RTC) have been introduced into the web infrastructure, which enable browsers to directly communicate with each other and form a P2P web overlay. This also brings open and unsolved problems, such as privacy issues in P2P systems, to the new web overlays. We investigate the security and privacy issues in web overlays, and propose solutions to address these issues using cryptographic and hardware primitives such as onion routing and oblivious RAM. First, we present inference attacks in peer-assisted CDNs on top of web overlays, which can infer users’ online activities such as browsing history. To thwart such attacks, we propose an anonymous peer-assisted CDN (APAC), which employs onion-routing techniques to conceal users’ identities and uses a region-based circuit selection algorithm to reduce performance overhead. Second, to hide online activities (or access patterns) of users against long-term global analysis, we design an oblivious peer-to-peer content sharing system (OBLIVP2P), which uses new primitives such as distributed-ORAM in the P2P setting.
8-Feb-2017	Title: From networked chips to cities Speaker: Peh Li Shiuan, Provost’s Chair Professor, Department of Computer Science Abstract: As a new faculty member of SoC, I am currently actively scouting for PhD students for my group. This talk is pitched at the students, providing an overview of the kind of research my group has done in the past, and briefly discussing our next steps. This talk will give an overview of my group’s research, starting from our foray into networks-on-a-chip that enable scalable many-core processors. With many-core processors making their way into mobile devices, providing unprecedented compute power on such devices, we then explore how these powerful mobile devices can enable next-generation applications in smart cities.
15-Feb-2017	Title: On Modeling the Time-Energy Performance of Data-Parallel Applications on Heterogeneous Systems Speaker: Dumitrel Loghin, Dean’s Graduate Award winner (AY2016/2017 Sem1) Abstract: The increasing volume of data to be processed leads to an energy usage issue in datacenter computing. Traditionally, datacenters employ homogeneous brawny servers based on x86/64 CPUs which are known to be power-hungry. In contrast, heterogeneous systems combining CPU and GPU cores represent a promising alternative for energy-efficient data-parallel processing. Moreover, the last few years have witnessed a significant performance improvement of low-power, wimpy systems, traditionally used in mobile devices. However, selecting the best configuration in terms of software parameters and system resources is a daunting task because of the very large configuration space exposed by data-parallel frameworks and heterogeneous systems. To alleviate this, we have developed measurement-driven analytic models to determine and analyze suitable system configurations for Hadoop MapReduce, which represents the most popular data-parallel framework. Using baseline measurements on a single node with small inputs, our models determine the execution time and energy usage on scale-out clusters and workloads. To evaluate the models, we have used two types of systems and five representative MapReduce workloads covering domains such as financial analysis, data mining and simulations. The systems consist of both cloud-based Amazon EC2 instances with discrete GPUs and self-hosted Nvidia Jetson TK1 nodes with integrated GPUs representing brawny and wimpy heterogeneous systems, respectively. Our model-based analysis supports the following key results. Firstly, for both brawny and wimpy systems, we show that heterogeneous clusters consisting of nodes with CPUs and GPUs are almost always more time-energy-efficient than homogeneous clusters with CPU-only nodes. 
Secondly, we show that multiple wimpy nodes achieve the same time performance as a single brawny node while saving up to 90% of the energy used. In contrast with the related work, we are the first to design an energy usage model for MapReduce and to apply this model to analyze the performance of wimpy heterogeneous systems with GPU.
1-Mar-2017	Title: Real world opportunities for NLP Research to Impact Global Education through MOOCs Speaker: Kan Min Yen, Associate Professor, Department of Computer Science Abstract: Massive Open Online Courses (MOOCs) have been heralded as a game-changer as they have the potential to disseminate the best lectures by top educators to the masses. However, many students who enrol drop out, in part due to the difficulties in finding the motivation to complete the assignments. Part of this is due to the (lack of) participation by instructor staff actively involved in deliberations in the course, especially in terms of dialogue and discussions with students through courses’ discussion forums. We leverage natural language processing technologies to better analyse student conversations to identify opportunities for timely instructor intervention to produce better learning outcomes. We discuss how diversity in MOOC offerings has compromised the validity of previously published results, how automatic discourse parsing can improve prediction, and the real problem of the bias presented by the user interface that affects the instructors’ decision to intervene. We are actively recruiting interested individuals to continue work on these and allied topics.
8-Mar-2017 Note: venue at Seminar Room 3* (COM1-02-12)	Title: Power Papers — Some Practical Pointers, Part 1 Speaker: Terence Sim, Associate Professor, Department of Computer Science Abstract: If I write with the flowery flourish of Shakespeare, but my prose proves problematic, then my words become like a noisy gong or a clanging cymbal. If I have the gift of mathematical genius and can fathom all theorems, but cannot articulate the arcane, my genius appears no different from madness. If I achieve breakthrough research that can change the world, but cannot explain its significance, the world gains nothing and I labor in vain. Writing a good research paper takes effort; more so if there is a page limit. Yet this skill is required of every researcher, who, more often than not, fumbles his or her way through. Good grammar is only a start; care and craft must be applied to turn a mediocre paper into a memorable one. Writing skills can indeed be honed. In this talk, I will highlight the common mistakes many authors make, and offer practical pointers to pack more punch into your paper. Needless to say, the talk will be biased: I will speak not from linguistic theories, but from personal experience, sharing what has, and has not, worked for me. Students and staff are all welcome to participate: your views and insights will certainly benefit us all.
15-Mar-2017	Title: Cache Miss Equation, and Synthetic Dataset Scaling Speaker: Zhang Jiangwei, Research Achievement Award winner (AY2016/2017 Sem1) Abstract: Cache Miss Equation: Science seeks to discover what is forever true of nature. For Computer Science, what can we discover that will be forever true about computation or, at least, immune to changes in technology? Computation fundamentally requires cycles, memory, bandwidth and time. The memory in a computer system has innumerable caches, and our research on this resource focuses on developing an equation to describe cache misses for all levels of the memory hierarchy. It works for a disk cache, database buffers, garbage-collected heaps, nonvolatile memory and content-centric networking. For more details, please check: http://www.math.nus.edu.sg/~mattyc/CME.html Synthetic Dataset Scaling: Benchmarks are ubiquitous in the computing industry and academia. Developers use benchmarks to compare products, while researchers use them similarly in their research. For 20-odd years, the popular benchmarks for database management systems were the ones defined by the Transaction Processing Performance Council (TPC). However, the small number of TPC benchmarks are increasingly irrelevant to the myriad of diverse applications, and the TPC standardization process is too slow. This led to a proposal for a paradigm shift, from a top-down design of domain-specific benchmarks by committee consensus, to a bottom-up collaboration to develop tools for application-specific benchmarking. A database benchmark must have a dataset. For the benchmark to be application-specific, it must start with an empirical dataset D. This D may be too small or too large for the benchmarking experiment, so the first tool to develop would be for scaling D to a desired size. This motivates the Dataset Scaling Problem (DSP): Given a set of relational tables D and a scale factor s, generate a database state D’ that is similar to D but s times its size.
For more details, please check: http://www.comp.nus.edu.sg/~upsizer/ In this talk, I will briefly share the motivation, the possible impact, the current solutions we have, and the research opportunities for both problems.
22-Mar-2017	Title: Computer Vision for Robotics Perception Speaker: Lee Gim Hee, Assistant Professor, Department of Computer Science Abstract: Compared with the traditionally used Lidar, a camera is a good sensor for robotic perception because it is low-cost and rich in information, but the algorithms are often computationally too expensive, and sensitive to noise and outliers. In this talk, I will present my work on making some of the computer vision algorithms more efficient and robust for robots to perceive the world through cameras.
29-Mar-2017	Title: Hardening Programs Against Software Vulnerabilities AND Constraints Solvers for Problems in Security Speaker: Roland Yap, Associate Professor, Department of Computer Science Abstract: The talk will be about two separate but partially related topics. The first is on preventing exploitation of software vulnerabilities and will be the main focus of the talk. Memory bugs are still the main route by which software is attacked. In fact, one might regard such bugs as inevitable in most of today’s complex software written in low-level languages such as C and C++. As such, a strategy to harden the program such that these bugs cannot be exploited, e.g. to corrupt the stack, is perhaps the strategy which needs to be adopted in the long term. There are many kinds of memory errors; perhaps the most well known are spatial and temporal errors. I will talk about a research direction which opens up the area from simple to complex kinds of program hardening. For students interested in knowing a bit more beforehand, a recent paper at NDSS 2017 on protecting stack objects is Stack Object Protection with Low Fat Pointers https://www.internetsociety.org/events/ndss-symposium/ndss-symposium-2017/ndss-2017-programme/ndss-2017-session-10-software-and The second topic, which I will touch on more briefly, is research on constraint solving. Constraint solving is of broad applicability to many domains ranging from theoretical computer science, to verification, to security. I will mention some problems in constraints with some links to verification and security.
5-Apr-2017	Title: Analyzing the Behaviors of Articulated Objects in 3D : Applications to Human and Animals Speaker: Cheng Li, Adjunct Assistant Professor, Department of Computer Science Abstract: Recent advances in depth cameras have opened the door to many interesting applications. In this talk, I will discuss our research efforts toward addressing the related tasks of pose estimation, tracking, action and behavior analysis of a range of articulated objects (human upper-body, human hand, fish, mouse) from such 3D cameras. In particular, I will talk about our recent Lie group based approach that enables us to tackle these problems under a unified framework. Looking forward, the results could be applied to everyday life scenarios such as natural user interfaces, behavior analysis and surveillance, and gaming, among others.