Publications

Scroll to papers in year: 2024 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001

Accepted Papers and Preprints

GLOW.AI members     Bryan's advisors+     Bryan's collaborators*

Localized Zeroth-Order Prompt Optimization

Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, See-Kiong Ng*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate (spotlight presentation); also appeared in ICML 2024 Workshop on In-Context Learning

Abstract. The efficacy of large language models (LLMs) in understanding and generating natural language has aroused a wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This thus motivates us to re-think the necessity of finding a global optimum in prompt optimization. To answer this, we conduct a thorough empirical study on prompt optimization and draw two major insights. Contrasting with the rarity of global optimum, local optima are usually prevalent and well-performed, which can be more worthwhile for efficient prompt optimization (Insight I). The choice of the input domain, covering both the generation and the representation of prompts, affects the identification of well-performing local optima (Insight II). Inspired by these insights, we propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO), which incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments.

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning

Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash*, Daniela Rus*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate; also appeared in ICML 2024 Workshop on In-Context Learning

Abstract. In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task with a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL possesses many distinct characteristics from conventional machine learning, thereby requiring new approaches to interpret this learning paradigm. Taking the viewpoint of recent works showing that transformers learn in context by formulating an internal optimizer, we propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We empirically verify the effectiveness of our approach for demonstration attribution while being computationally efficient. Leveraging the results, we then show how DETAIL can help improve model performance in real-world scenarios through demonstration reordering and curation. Finally, we experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.

Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng*, Patrick Jaillet*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate; also appeared in ICML 2024 Workshop on In-Context Learning

Abstract. Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest.

Data Distribution Valuation

Xinyi Xu, Shuaiqi Wang, Chuan Sheng Foo*, Bryan Kian Hsiang Low, and Giulia Fanti*

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate

Abstract. Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces. Existing data valuation methods define a value for a discrete dataset. However, in many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled. For example, consider a buyer trying to evaluate whether to purchase data from different vendors. The buyer may observe (and compare) only a small preview sample from each vendor, to decide which vendor's data distribution is most useful to the buyer and purchase. The core question is how should we compare the values of data distributions from their samples? Under a Huber characterization of the data heterogeneity across vendors, we propose a maximum-mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies for comparing data distributions from samples. We empirically demonstrate that our method is sample-efficient and effective in identifying valuable data distributions against several existing baselines, on multiple real-world datasets (e.g., network intrusion detection, credit card fraud detection) and downstream applications (classification, regression).

Gradient-Free Methods for Nonconvex Nonsmooth Stochastic Compositional Optimization

Zhuanghua Liu, Luo Luo, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate

Abstract. The stochastic compositional optimization (SCO) is popular in many real-world applications, including risk management, reinforcement learning, and meta-learning. However, most of the previous methods for SCO require the smoothness assumption on both the outer and inner functions, which limits their applications to a wider range of problems. In this paper, we study the SCO problem in that both the outer and inner functions are Lipschitz continuous but possibly nonconvex and nonsmooth. In particular, we propose gradient-free stochastic methods for finding the (_,_)-Goldstein stationary points of such problems with non-asymptotic convergence rates. Our results also lead to an improved convergence rate for the convex nonsmooth SCO problem. Furthermore, we conduct numerical experiments to demonstrate the effectiveness of the proposed methods.

Active Set Ordering

Quoc Phong Nguyen, Sunil Gupta, Svetha Venkatesh, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
25.8% acceptance rate

Abstract. In this paper, we formalize the active set ordering problem, which involves actively discovering a set of inputs based on their orderings determined by expensive evaluations of a blackbox function. We then propose the mean prediction (MP) algorithm and theoretically analyze it in terms of the regret of predicted pairwise orderings between inputs. Notably, as a special case of this framework, we can cast Bayesian optimization as an active set ordering problem by recognizing that maximizers can be identified solely by comparison rather than by precisely estimating the function evaluations. As a result, we are able to construct the popular Gaussian process upper confidence bound (GP-UCB) algorithm through the lens of ordering with several nuanced insights. We empirically validate the performance of our proposed solution using various synthetic functions and real-world datasets.

Position Paper: Data-Centric AI in the Age of Large Language Models

Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh*, and Bryan Kian Hsiang Low

Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 11895-11913, 2024
16.86% acceptance rate (findings)

Abstract. This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.

Waterfall: Scalable Framework for Robust Text Watermarking and Provenance for LLMs

Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 20432-20466, 2024
20.8% acceptance rate (main conference); also appeared in ICML 2024 Workshop on Generative AI and Law; also appeared in ICML 2024 Workshop on Foundation Models in the Wild

Abstract. Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of users for practical implementation. In this paper, we propose Waterfall, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. Waterfall comprises several key innovations, such as being the first to use LLM as paraphrasers for watermarking along with a novel combination of techniques that are surprisingly effective in achieving robust verifiability and scalability. We empirically demonstrate that Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods, and also showed how it could be directly applied to the watermarking of code.

Dependency Structure Search Bayesian Optimization for Decision Making Models

Mohit Rajpal, Gia Lac Tran, Yehong Zhang, and Bryan Kian Hsiang Low

Transactions on Machine Learning Research, 2024

Abstract. Many approaches for optimizing decision making models rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making models. This problem is exacerbated if the model requires interactions between several agents cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of agent interactions through the concept of role. We introduce Dependency Structure Search Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters, and give the first improved regret bound in additive high-dimensional Bayesian Optimization since Mutny & Krause (2018). Our approach shows strong empirical results under malformed or sparse reward.

PIED: Physics-Informed Experimental Design For Inverse Problems

Apivich Hemachandra, Gregory Kang Ruey Lau, See-Kiong Ng*, and Bryan Kian Hsiang Low

ICML Workshop on AI for Science: Scaling in AI for Scientific Discovery, 2024

Abstract. In many inverse problems (IPs) in science and engineering, optimization of design parameters (e.g., sensor placement) with experimental design (ED) methods is performed due to high data acquisition costs when conducting physical experiments, and often has to be done up front due to practical constraints on sensor deployments. However, existing ED methods are often challenging to use in practical PDE-based inverse problems due to significant computational bottlenecks during forward simulation and inverse parameter estimation. This paper presents Physics-Informed Experimental Design (PIED), the first ED framework that makes use of PINNs in a fully differentiable architecture to perform continuous optimization of design parameters for IPs. PIED utilizes techniques such as learning of a shared NN parameter initialization, and approximation of PINN training dynamics during the ED process, for better estimation of the inverse parameters. PIED selects the optimal design parameters for one-shot deployment, allows exploitation of parallel computation unlike existing methods, and is empirically shown to significantly outperform existing ED benchmarks in IPs for both finite-dimensional and function-valued inverse parameters given limited budget for observations.

Neural Dueling Bandits

Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet*, and Bryan Kian Hsiang Low

ICML Workshop on Foundations of Reinforcement Learning and Control: Connections and Perspectives, 2024

Abstract. Contextual dueling bandit is used to model the bandit problems, where a learner's goal is to find the best arm for a given context using observed noisy preference feedback over the selected arms for the past contexts. However, existing algorithms assume the reward function is linear, which can be complex and non-linear in many real-life applications like online recommendations or ranking web search results. To overcome this challenge, we use a neural network to estimate the reward function using preference feedback for the previously selected arms. We propose upper confidence bound- and Thompson sampling-based algorithms with sub-linear regret guarantees that efficiently select arms in each round. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution. Experimental results on the problem instances derived from synthetic datasets corroborate our theoretical results.

Heterogeneous Federated Zeroth-Order Optimization using Gradient Surrogates

Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, and Bryan Kian Hsiang Low

ICML Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators, 2024

Abstract. Federated optimization, an emerging paradigm that finds wide applications, e.g., federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function by sharing their local gradients. However, the gradient information is not available in many applications, giving rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms typically suffer from the limitations of query and communication round inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates caused by client heterogeneity. To this end, we (a) introduce trajectory-informed gradient surrogates which are capable of using the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these surrogates to mitigate the aforementioned disparity. With these, we propose the federated zeroth-order optimization using gradient surrogates (FZooS) algorithm for query- and communication round-efficient heterogeneous federated ZOO, which is supported by our theoretical analyses and extensive experiments.

TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs

Cheng Wang, Xinyang Lu, See-Kiong Ng*, and Bryan Kian Hsiang Low

arXiv:2407.04981, 2024

Abstract. The rapid evolution of large language models (LLMs) represents a substantial leap forward in natural language understanding and generation. However, alongside these advancements come significant challenges related to the accountability and transparency of LLM responses. Reliable source attribution is essential to adhering to stringent legal and regulatory standards, including those set forth by the General Data Protection Regulation. Despite the well-established methods in source attribution within the computer vision domain, the application of robust attribution frameworks to natural language processing remains underexplored. To bridge this gap, we propose a novel and versatile TRansformer-based Attribution framework using Contrastive Embeddings called TRACE that, in particular, exploits contrastive learning for source attribution. We perform an extensive empirical evaluation to demonstrate the performance and efficiency of TRACE in various settings and show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of LLMs.

Prompt Optimization with Human Feedback

Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng*, Patrick Jaillet*, and Bryan Kian Hsiang Low

ICML Workshop on Models of Human Feedback for AI Alignment, 2024
(oral presentation)

Abstract. Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest.

Deletion-Anticipative Data Selection with a Limited Budget

Rachael Hwee Ling Sim, Jue Fan, Xiao Tian, Patrick Jaillet*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 45468-45507, 2024
27.5% acceptance rate

Abstract. Supervised data subset selection and active learning have often been used in data acquisition to select a smaller training set and reduce the time and cost of training machine learning (ML) models. These methods assume the selected training set to remain available throughout the ML model deployment. Such an assumption is increasingly being challenged as data owners, enabled by the GDPR's right to erasure, request deletions of their data. This raises an important question: During data acquisition of a training set of size k, how can a learner proactively maximize the data utility after future unknown deletions? We propose that the learner anticipates/estimates the probability that (i) each data owner in the feasible set will independently delete its data or (ii) a number of deletions occur out of k, and justify our proposal with concrete real-world use cases. Then, instead of directly maximizing the data utility function, the learner should maximize the expected or risk-averse utility based on the anticipated probabilities. We further propose how to construct these deletion-anticipative data selection (DADS) maximization objectives to preserve properties like monotone submodularity and near-optimality of greedy solutions, optimize the objectives efficiently, and empirically evaluate DADS' performance on real-world datasets.

Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions

Jingtan Wang, Xiaoqiang Lin, Rui Qiao, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 50960-50991, 2024
27.5% acceptance rate; also appeared in ICML 2024 Workshop on Data-Centric Machine Learning Research

Abstract. The increasing complexity of foundational models underscores the necessity for explainability, particularly for fine-tuning, the most widely used training method for adapting models to downstream tasks. Instance attribution, one type of explanation, attributes the model prediction to each training example by an instance score. However, the robustness of instance scores, specifically towards dataset resampling, has been overlooked. To bridge this gap, we propose a notion of robustness on the sign of the instance score. We theoretically and empirically demonstrate that the popular leave-one-out-based methods lack robustness, while the Shapley value behaves significantly better, but at a higher computational cost. Accordingly, we introduce an efficient fine-tuning-free approximation of the Shapley value (FreeShap) for instance attribution based on the neural tangent kernel. We empirically demonstrate that FreeShap outperforms other methods for instance attribution and other data-centric applications such as data removal, data selection, and wrong label detection, and further generalize our scale to large language models (LLMs).

Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components

Zhiliang Chen, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 6699-6727, 2024
27.5% acceptance rate

Abstract. Machine learning (ML) models in the real world typically do not exist in isolation. They are usually part of a complex system (e.g., healthcare systems, self-driving cars) containing multiple ML and black-box components. The problem of optimizing such systems, which we refer to as automated AI (AutoAI), requires us to jointly train all ML components together and presents a significant challenge because the number of system parameters is extremely high and the system has no analytical form. To circumvent this, we introduce a novel algorithm called A-BAD-BO which uses each ML component's local loss as an auxiliary indicator for system performance. A-BAD-BO uses Bayesian optimization (BO) to optimize the local loss configuration of a system in a smaller dimensional space and exploits the differentiable structure of ML components to recover optimal system parameters from the optimized configuration. We show A-BAD-BO converges to optimal system parameters by showing that it is asymptotically no regret. We use A-BAD-BO to optimize several synthetic and real-world complex systems, including a prompt engineering pipeline for large language models containing millions of system parameters. Our results demonstrate that A-BAD-BO yields better system optimality than gradient-driven baselines and is more sample-efficient than pure BO algorithms.

Zeroth-Order Methods for Constrained Nonconvex Nonsmooth Stochastic Optimization

Zhuanghua Liu, Cheng Chen, Luo Luo, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 30842-30872, 2024
1.52% acceptance rate (oral presentation)

Abstract. This paper studies the problem of solving nonconvex nonsmooth optimization over a closed convex set. Most previous works tackle such problems by transforming the constrained problem into an unconstrained problem that can be solved by the techniques developed in the unconstrained setting. However, they only provide asymptotic convergence analysis for their methods. In this work, we provide the non-asymptotic analysis for solving constrained nonconvex nonsmooth optimization. We first generalize classical gradient mapping and the Frank-Wolfe gap in the nonsmooth setting. Then we introduce novel notions of approximate stationarity concerning such generalized quantities. We also propose several stochastic zeroth-order algorithms for the problem, along with their non-asymptotic convergence guarantees of obtaining the proposed approximate stationarity. Finally, we conduct numerical experiments that demonstrate the effectiveness of our algorithms.

Distributionally Robust Data Valuation

Xiaoqiang Lin, Xinyi Xu, Zhaoxuan Wu, See-Kiong Ng*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 30362-30391, 2024
27.5% acceptance rate

Abstract. Data valuation quantifies the contribution of each data point to the performance of a machine learning model. Existing works typically define the value of data by its improvement of the validation performance of the trained model. However, this approach can be impractical to apply in collaborative machine learning and data marketplace since it is difficult for the parties/buyers to agree on a common validation dataset or determine the exact validation distribution a priori. To address this, we propose a distributionally robust data valuation approach to perform data valuation without known/fixed validation distributions. Our approach defines the value of data by its improvement of the distributionally robust generalization error (DRGE), thus providing a worst-case performance guarantee without a known/fixed validation distribution. However, since computing DRGE directly is infeasible, we propose using model deviation as a proxy for the marginal improvement of DRGE (for kernel regression and neural networks) to compute data values. Furthermore, we identify a notion of uniqueness where low uniqueness characterizes low-value data. We empirically demonstrate that our approach outperforms existing data valuation approaches in data selection and data removal tasks on real-world datasets (e.g., housing price prediction, diabetes hospitalization prediction).

Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng*, Patrick Jaillet*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 30317-30345, 2024
27.5% acceptance rate; also appeared in NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

Abstract. Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e.g., various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions.

On Newton's Method to Unlearn Neural Networks

Nhung Bui, Xinyang Lu, See-Kiong Ng*, and Bryan Kian Hsiang Low

arXiv:2406.14507, 2024

Abstract. Machine unlearning facilitates personal data ownership, including the "right to be forgotten". The proliferation of applications of neural networks (NNs) trained on users' personal data calls for the need to develop algorithms to unlearn an NN. Since retraining is costly, efficiency is often achieved through approximate unlearning which aims to unlearn a trained NN to be close to the retrained one (in distribution). Though the Newton's method has been used by previous works to approximately unlearn linear models, adapting it for unlearning an NN often encounters degenerate Hessians that make computing the Newton's update impossible. In this paper, we will first show that when coupled with naive yet often effective solutions to mitigate the degeneracy issue for unlearning, the Newton's method surprisingly suffers from catastrophic forgetting. To overcome this difficulty, we revise the Newton's method to include a theoretically justified regularizer and propose a cubic-regularized Newton's method for unlearning an NN. The cubic regularizer comes with the benefits of not requiring manual finetuning and affording a natural interpretation. Empirical evaluation on several models and real-world datasets shows that our method is more resilient to catastrophic forgetting and performs better than the baselines, especially in sequential unlearning.

PINNACLE: PINN Adaptive ColLocation and Experimental points selection

Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng*, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2024
5% acceptance rate (spotlight presentation); also appeared in ICML 2024 Workshop on AI for Science: Scaling in AI for Scientific Discovery (best paper award, oral presentation); also appeared in NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

Abstract. Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this loss function is challenging as it typically requires selecting large numbers of points of different types, each with different training dynamics. Unlike past works that focused on the selection of either collocation or experimental points, this work introduces PINN Adaptive ColLocation and Experimental points selection (PINNACLE), the first algorithm that jointly optimizes the selection of all training point types, while automatically adjusting the proportion of collocation point types as training progresses. PINNACLE uses information on the interactions among training point types, which had not been considered before, based on an analysis of PINN training dynamics via the Neural Tangent Kernel (NTK). We theoretically show that the criterion used by PINNACLE is related to the PINN generalization error, and empirically demonstrate that PINNACLE is able to outperform existing point selection methods for forward, inverse, and transfer learning problems.

Robustifying and Boosting Training-Free Neural Architecture Search

Zhenfeng He, Yao Shu, Zhongxiang Dai, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics typically varies across different tasks, making it challenging to achieve robust and consistently good search performance on diverse tasks with only a single training-free metric. Meanwhile, the estimation gap between training-free metrics and the true architecture performances limits training-free NAS to achieve superior performance. To address these challenges, we propose the robustifying and boosting training-free NAS (RoBoT) algorithm which (a) employs the optimized combination of existing training-free metrics explored from Bayesian optimization to develop a robust and consistently better-performing metric on diverse tasks, and (b) applies greedy search, i.e., the exploitation, on the newly developed metric to bridge the aforementioned gap and consequently to boost the search performance of standard training-free NAS further. Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, which improves over the existing training-free NAS under mild conditions with additional interesting insights. Our extensive experiments on various NAS benchmark tasks yield substantial empirical evidence to support our theoretical results.

Incentive-Aware Federated Learning with Training-Time Model Rewards

Zhaoxuan Wu, Mohammad Mohammadi Amiri, Ramesh Raskar, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. In federated learning (FL), incentivizing contributions of training resources (e.g., data, compute) from potentially competitive clients is crucial. Existing incentive mechanisms often distribute post-training monetary rewards, which suffer from practical challenges of timeliness and feasibility of the rewards. Rewarding the clients after the completion of training may incentivize them to abort the collaboration, and monetizing the contribution is challenging in practice. To address these problems, we propose an incentive-aware algorithm that offers differentiated training-time model rewards for each client at each FL iteration. We theoretically prove that such a local design ensures the global objective of client incentivization. Through theoretical analyses, we further identify the issue of error propagation in model rewards and thus propose a stochastic reference-model recovery strategy to ensure theoretically that all the clients eventually obtain the optimal model in the limit. We perform extensive experiments to demonstrate the superior incentivizing performance of our method compared to existing baselines.

Understanding Domain Generalization: A Noise Robustness Perspective

Rui Qiao, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. Despite the rapid development of machine learning algorithms for domain generalization (DG), there is no clear empirical evidence that the existing DG algorithms outperform the classic empirical risk minimization (ERM) across standard benchmarks. To better understand this phenomenon, we investigate whether there are benefits of DG algorithms over ERM through the lens of label noise. Specifically, our finite-sample analysis reveals that label noise exacerbates the effect of spurious correlations for ERM, undermining generalization. Conversely, we illustrate that DG algorithms exhibit implicit label-noise robustness during finite-sample training even when spurious correlation is present. Such desirable property helps mitigate spurious correlations and improve generalization in synthetic experiments. However, additional comprehensive experiments on real-world benchmark datasets indicate that label-noise robustness does not necessarily translate to better performance compared to ERM. We conjecture that the failure mode of ERM arising from spurious correlations may be less prevalent in practice.

A Unified Framework for Bayesian Optimization under Contextual Uncertainty

Sebastian Shenghong Tay, Chuan Sheng Foo*, Daisuke Urano*, Richalynn Leong*, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. Bayesian optimization under contextual uncertainty (BOCU) is a family of BO problems in which the learner makes a decision prior to observing the context and must manage the risks involved. Distributionally robust BO (DRBO) is a subset of BOCU that affords robustness against context distribution shift, and includes the optimization of expected values and worst-case values as special cases. By considering the first derivatives of the DRBO objective, we generalize DRBO to one that includes several other uncertainty objectives studied in the BOCU literature such as worst-case sensitivity (and thus notions of risk such as variance, range, and conditional value-at-risk) and mean-risk tradeoffs. We develop a general Thompson sampling algorithm that is able to optimize any objective within the BOCU framework, analyze its theoretical properties, and compare it to suitable baselines across different experimental settings and uncertainty objectives.

Optimistic Bayesian Optimization with Unknown Constraints

Quoc Phong Nguyen, Wan Theng Ruth Chew, Le Song, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. Though some research efforts have been dedicated to constrained Bayesian optimization (BO), there remains a notable absence of a principled approach with a theoretical performance guarantee in the decoupled setting. Such a setting involves independent evaluations of the objective function and constraints at different inputs, and is hence a relaxation of the commonly-studied coupled setting where functions must be evaluated together. As a result, the decoupled setting requires an adaptive selection between evaluating either the objective function or a constraint, in addition to selecting an input (in the coupled setting). This paper presents a novel constrained BO algorithm with a provable performance guarantee that can address the above relaxed setting. Specifically, it considers the fundamental trade-off between exploration and exploitation in constrained BO, and, interestingly, affords a noteworthy connection to active learning. The performance of our proposed algorithms is also empirically evaluated using several synthetic and real-world optimization problems.

Meta-VBO: Utilizing Prior Tasks in Optimizing Risk Measures with Gaussian Processes

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Learning Representations (ICLR), 2024
31% acceptance rate

Abstract. Research on optimizing the risk measure of a blackbox function using Gaussian processes, especially Bayesian optimization (BO) of risk measures, has become increasingly important due to the inevitable presence of uncontrollable variables in real-world applications. Nevertheless, existing works on BO of risk measures start the optimization from scratch for every new task without considering the results of previous tasks. In contrast, its vanilla BO counterpart has received a thorough investigation on utilizing previous tasks to speed up the current task through the body of works on meta-BO which, however, have not considered risk measures. To bridge this gap, this paper presents the first algorithm for meta-BO of risk measures (i.e., value-at-risk (VaR) and the conditional VaR) by introducing a novel adjustment to the upper confidence bound acquisition function. Our proposed algorithm exhibits two desirable properties: (i) invariance to scaling and vertical shifting of the blackbox function and (ii) robustness to previous harmful tasks. We provide a theoretical performance guarantee for our algorithm and empirically demonstrate its performance using several synthetic function benchmarks and real-world objective functions.

Fairness in Federated Learning

Xiaoqiang Lin, Xinyi Xu, Zhaoxuan Wu, Rachael Hwee Ling Sim, See-Kiong Ng*, Chuan Sheng Foo*, Patrick Jaillet*, Trong Nghia Hoang, and Bryan Kian Hsiang Low

L. M. Nguyen, T. N. Hoang, P.-Y. Chen, editors, Federated Learning: Theory and Practice, chapter 8, Academic Press, pages 143-160, 2024

Abstract. Federated learning (FL) enables a form of collaboration among multiple clients in jointly learning a machine learning (ML) model without centralizing their local datasets. Like in any collaboration, it is imperative to guarantee fairness so that the clients are willing to participate. For instance, it is unfair if one client benefits significantly more than others, or if some client benefits disproportionately to its contribution in the collaboration. Additionally, it is also unfair if the ML model makes biased predictions against certain groups of clients. This chapter discusses three specific notions of fairness by highlighting their motivations from real-world use-cases, examining several specific definitions for each notion and lastly describing the corresponding algorithms proposed to achieve each notion of fairness. At the end, this chapter will also summarize the identified gaps in current research efforts into open problems.

Federated Sequential Decision Making: Bayesian Optimization, Reinforcement Learning, and Beyond

Zhongxiang Dai, Flint Xiaofeng Fan, Cheston Tan*, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Patrick Jaillet*

L. M. Nguyen, T. N. Hoang, P.-Y. Chen, editors, Federated Learning: Theory and Practice, chapter 14, Academic Press, pages 257-279, 2024

Abstract. Federated learning (FL) in its classic form involves the collaborative training of supervised learning models (e.g., neural networks) among multiple agents/clients. However, in addition to supervised learning, many other machine learning tasks which are inherently sequential decision-making problems, such as Bayesian optimization (BO) and reinforcement learning (RL), also find important applications in the federated setting. For example, the crucial problem of hyperparameter tuning of neural networks in the federated setting calls for algorithms for federated BO; collaborative clinical treatment recommendation among multiple hospitals is a natural application for federated RL. However, the extension of these classic sequential decision-making algorithms into the federated setting is faced with immense challenges. Firstly, these algorithms (e.g., BO and RL) have to be adapted to satisfy the core principles of FL. For example, consistent with the requirement of FL, the raw data (e.g., the history of observations in BO and the trajectories in RL) of every agent can never be shared with the other agents. Next, it is challenging to preserve the rigorous theoretical guarantees of these classic sequential decision-making algorithms (e.g., the sub-linear regret upper bound of classic BO algorithms and the sample complexity of classic policy gradient algorithms for RL) and at the same time consistently improve their empirical performances by leveraging the federation of multiple agents. In this regard, a number of recent works have tackled these challenges and hence introduced federated versions of classic sequential decision-making algorithms (e.g., federated BO and federated RL algorithms) which satisfy the core principles of FL and are both theoretically grounded and practically effective. In light of these recent advances, this chapter discusses federated sequential decision-making problems with a focus on recent representative works on federated BO and federated RL, and describes open problems and potential future directions in these areas.

Data Valuation in Federated Learning

Zhaoxuan Wu, Xinyi Xu, Rachael Hwee Ling Sim, Yao Shu, Xiaoqiang Lin, Lucas Agussurja, Zhongxiang Dai, See-Kiong Ng*, Chuan Sheng Foo*, Patrick Jaillet*, Trong Nghia Hoang, and Bryan Kian Hsiang Low

L. M. Nguyen, T. N. Hoang, P.-Y. Chen, editors, Federated Learning: Theory and Practice, chapter 15, Academic Press, pages 281-296, 2024

Abstract. Federated learning (FL) has become an increasingly popular solution paradigm for enabling collaborative machine learning (CML) in which multiple clients can collaboratively train a common model without sharing their private training data with others. However, broad adoption of FL in practice is still limited, as clients can often be reluctant to participate in such federated effort unless their contributions are accurately recognized and fairly compensated. Data valuation is thus extensively required to measure the relative contributions among clients. In this chapter, we review data valuation methods in the conventional supervised CML setting, followed by extensions to the FL paradigm. To better address the challenge that the private data from local clients cannot be made available to the server, we further discuss many specialized data valuation methods developed for both horizontal and vertical FL in detail. Overall, this chapter aims to provide a comprehensive suite of data valuation tools to empower FL practitioners in various practical scenarios.

Incentives in Federated Learning

Rachael Hwee Ling Sim, Sebastian Shenghong Tay, Xinyi Xu, Yehong Zhang, Zhaoxuan Wu, Xiaoqiang Lin, See-Kiong Ng*, Chuan Sheng Foo*, Patrick Jaillet*, Trong Nghia Hoang, and Bryan Kian Hsiang Low

L. M. Nguyen, T. N. Hoang, P.-Y. Chen, editors, Federated Learning: Theory and Practice, chapter 16, Academic Press, pages 299-309, 2024

Abstract. This chapter explores incentive schemes that encourage clients to participate in federated learning (FL) and contribute more valuable data. Such schemes are important to enable collaboration in competitive situations where clients need justifiable incentives to participate and benefit others with information acquired at significant costs and resources, such as collecting and processing data, computing and communicating model updates, risking the privacy of data via shared model updates. Incentivization addresses these concerns through three key components: (1) fair contribution evaluation of each client's data, (2) client selection to maximize the utility of the global model, and (3) reward allocation to clients. Intuitively, clients desire higher valued rewards which should at least outweigh their costs. These and other requirements will be formally described as incentives. The chapter will also discuss some recent solutions and open problems to achieve these incentives in various settings, which includes settings where the contribution evaluation is declared or measured while the rewards can be monetary- or model-based.

DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 15373-15381, 2024
23.75% acceptance rate

Abstract. Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions. With the rising interest in personal data ownership and data protection regulations, model owners will likely have to fulfil more data deletion requests. This raises issues that have not been addressed by existing works: Are the data valuation scores still fair with deletions? Must the scores be expensively recomputed? The answer is no. To avoid recomputations, we propose using our data valuation framework DeRDaVa upfront for valuing each data source's contribution to preserving robust model performance after anticipated data deletions. DeRDaVa can be efficiently approximated and will assign higher value to data that are more useful or less likely to be deleted. We further generalize DeRDaVa to Risk-DeRDaVa to cater to risk-averse/seeking model owners who are concerned with the worst/best-cases model utility. We also empirically demonstrate the practicality of our solutions.

Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates

Zhuanghua Liu, Luo Luo, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 14097-14105, 2024
23.75% acceptance rate

Abstract. We consider the finite-sum optimization problem of the form minx ∈ Rd f(x) = (1/n) (f1(x)+ ... + fi(x) + ... + fn(x)) where each fi(.) is strongly convex and has Lipschitz continuous gradient and Hessian. The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate of O((1 − (dκ)-1)⌈t/n⌉2) where κ is the condition number and t is the number of iterations. This paper proposes a more efficient quasi-Newton method by incorporating the Symmetric Rank-1 update into the incremental framework, which results in the better local convergence rate of O((1 − d-1)⌈t/n⌉2). Furthermore, we can boost our method by applying the block update on the Hessian approximation, which leads to the even faster convergence rate of O((1 − (kd)-1)⌈t/n⌉2) where k < d is the rank of update matrix. The numerical experiments show the proposed methods significantly outperform the baseline methods.

Decentralized Sum-of-Nonconvex Optimization

Zhuanghua Liu, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 14088-14096, 2024
23.75% acceptance rate

Abstract. We consider the optimization problem of minimizing the sum-of-nonconvex function, i.e., a convex function that is the average of nonconvex components. The existing stochastic algorithms for such a problem only focus on a single machine and the centralized scenario. In this paper, we study the sum-of-nonconvex optimization in the decentralized setting. We present a new theoretical analysis of the PMGT-SVRG algorithm to this problem and prove the linear convergence of their approach. However, the convergence rate of the PMGT-SVRG algorithm has a linear dependency on the condition number, which is undesirable for the ill-conditioned problem. To remedy this issue, we propose an accelerated stochastic decentralized first-order algorithm by incorporating the techniques of acceleration, gradient tracking, and multi-consensus mixing into the SVRG algorithm. The convergence rate of the proposed method has a square-root dependency on the condition number. The numerical experiments validate the theoretical guarantee of our proposed algorithms on both synthetic and real-world datasets.

Quantum Bayesian Optimization

Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 20179-20207, 2023
26.1% acceptance rate

Abstract. Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(poly log T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.

Batch Bayesian Optimization for Replicable Experimental Design

Zhongxiang Dai, Quoc Phong Nguyen, Sebastian Shenghong Tay, Daisuke Urano*, Richalynn Leong*, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 36476-36506, 2023
26.1% acceptance rate

Abstract. Many real-world experimental design problems (a) evaluate multiple experimental conditions in parallel and (b) replicate each condition multiple times due to large and heteroscedastic observation noise. Given a fixed total budget, this naturally induces a trade-off between evaluating more unique conditions while replicating each of them fewer times vs. evaluating fewer unique conditions and replicating each more times. Moreover, in these problems, practitioners may be risk-averse and hence prefer an input with both good average performance and small variability. To tackle both challenges, we propose the Batch Thompson Sampling for Replicable Experimental Design (BTS-RED) framework, which encompasses three algorithms. Our BTS-RED-Known and BTS-RED-Unknown algorithms, for, respectively, known and unknown noise variance, choose the number of replications adaptively rather than deterministically such that an input with a larger noise variance is replicated more times. As a result, despite the noise heteroscedasticity, both algorithms enjoy a theoretical guarantee and are asymptotically no-regret. Our Mean-Var-BTS-RED algorithm aims at risk-averse optimization and is also asymptotically no-regret. We also show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.

Exploiting Correlated Auxiliary Feedback in Parameterized Bandits

Arun Verma, Zhongxiang Dai, Yao Shu, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4430-4451, 2023
26.1% acceptance rate

Abstract. We study a novel variant of the parameterized bandits problem in which the learner can observe auxiliary feedback that is correlated with the observed reward. The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect additional information like service delivery time (auxiliary feedback). We first develop a method that exploits auxiliary feedback to build a reward estimator with tight confidence bounds, leading to a smaller regret. We then characterize the regret reduction in terms of the correlation coefficient between reward and auxiliary feedback. Experimental results in different settings also verify the performance gain achieved by our proposed method.

Incentives in Private Collaborative Machine Learning

Rachael Hwee Ling Sim, Yehong Zhang, Trong Nghia Hoang, Xinyi Xu, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 7555-7593, 2023
26.1% acceptance rate

Abstract. Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.

Model Shapley: Equitable Model Valuation with Black-box Access

Xinyi Xu, Chi Thanh Lam, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 43254-43283, 2023
26.1% acceptance rate

Abstract. Valuation methods of data and machine learning (ML) models are essential to the establishment of AI marketplaces. Importantly, certain practical considerations (e.g., operational constraints, legal restrictions) favor the use of model valuation over data valuation. Also, existing marketplaces that involve trading of pre-trained ML models call for an equitable model valuation method to price them. In particular, we investigate the black-box access setting which allows querying a model (to observe predictions) without disclosing model-specific information (e.g., architecture and parameters). By exploiting a Dirichlet abstraction of a model's predictions, we propose a novel and equitable model valuation method called model Shapley. We also leverage a formal connection between the similarity in models and that in model Shapley values (MSVs) to devise a learning approach for predicting MSVs of many vendors' models (e.g., 150) in a large-scale marketplace. We perform extensive empirical validation on the effectiveness of model Shapley using various real-world datasets and heterogeneous model types.

Bayesian Optimization with Cost-varying Variable Subsets

Sebastian Shenghong Tay, Chuan Sheng Foo*, Daisuke Urano*, Richalynn Leong*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 3008-3031, 2023
26.1% acceptance rate

Abstract. We introduce the problem of Bayesian optimization with cost-varying variable subsets (BOCVS) where in each iteration, the learner chooses a subset of query variables and specifies their values while the rest are randomly sampled. Each chosen subset has an associated cost. This presents the learner with the novel challenge of balancing between choosing more informative subsets for more directed learning versus leaving some variables to be randomly sampled to reduce incurred costs. This paper presents a novel Gaussian process upper confidence bound-based algorithm for solving the BOCVS problem that is provably no-regret. We analyze how the availability of cheaper control sets helps in exploration and reduces overall regret. We empirically show that our proposed algorithm can find significantly better solutions than comparable baselines with the same budget.

WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data

Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan Sheng Foo*, See-Kiong Ng*, and Bryan Kian Hsiang Low

arXiv:2310.00646, 2023

Abstract. The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.

Training-Free Neural Active Learning with Initialization-Robustness Guarantees

Apivich Hemachandra, Zhongxiang Dai, Jasraj Singh, See-Kiong Ng*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 12931-12971, 2023
27.9% acceptance rate

Abstract. Existing neural active learning algorithms have aimed to optimize the predictive performance of neural networks (NNs) by selecting data for labeling. However, other than a good predictive performance, being robust against random parameter initializations is also a crucial requirement in safety-critical applications. To this end, we introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points that lead to trained NNs with both good predictive performances and initialization robustness. Importantly, our EV-GP criterion is training-free, i.e., it does not require any training of the NN during data selection, which makes it computationally efficient. We empirically demonstrate that our EV-GP criterion is highly correlated with both initialization robustness and generalization performance, and show that it consistently outperforms baseline methods in terms of both desiderata, especially in situations with limited initial data or large batch sizes.

Fair yet Asymptotically Equal Collaborative Learning

Xiaoqiang Lin, Xinyi Xu, See-Kiong Ng*, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 21223-21259, 2023
27.9% acceptance rate

Abstract. In collaborative learning with streaming data, nodes (e.g., organizations) jointly and continuously learn a machine learning model by sharing the latest model updates computed from their latest streaming data. For the more resourceful nodes to be willing to share their model updates, they need to be fairly incentivized. This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions. Our approach leverages an explore-then-exploit formulation to estimate the nodes' contributions (i.e., exploration) for realizing our theoretically guaranteed fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from the existing approaches to guarantee fairness and it discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality, i.e., less resourceful nodes achieve equal performance eventually to the more resourceful/"rich" nodes. We empirically demonstrate in two settings with real-world streaming data: federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality.

Collaborative Causal Inference with Fair Incentives

Rui Qiao, Xinyi Xu, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 28300-28320, 2023
27.9% acceptance rate

Abstract. Collaborative causal inference (CCI) aims to improve the estimation of the causal effect of treatment variables by utilizing data aggregated from multiple self-interested parties. However, since their source data are valuable proprietary assets that can be costly or tedious to obtain, every party has to be incentivized to be willing to contribute to the collaboration, for example, with a guaranteed fair and sufficiently valuable reward (than performing causal inference on its own). This paper presents a reward scheme designed using the unique statistical properties required by causal inference to guarantee certain desirable incentive criteria (such as fairness and benefit) for the parties based on their contributions. To achieve this, we first propose a data valuation method to value parties' data for CCI based on the distributional closeness of its resulting treatment effect estimate to that utilizing the aggregated data from all parties. Then, we show how to value the parties' rewards fairly based on a modified variant of the Shapley value arising from our proposed data valuation for CCI. Finally, the Shapley fair rewards are realized in the form of improved and stochastically perturbed treatment effect estimates to be returned to the parties. We empirically demonstrate the effectiveness of our reward scheme using simulated and real-world datasets.

FedHQL: Federated Heterogeneous Q-Learning

Flint Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Cheston Tan*, and Bryan Kian Hsiang Low

International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 2810-2812, 2023

Abstract. Federated Reinforcement Learning (FedRL) encourages distributed agents to learn collectively from each other's experience to improve their performance without exchanging their raw trajectories. The existing work on FedRL assumes that all participating agents are homogeneous, which requires all agents to share the same policy parameterization (e.g., network architectures and training configurations). However, in real-world applications, agents are often in disagreement about the architecture and the parameters, possibly also because of disparate computational budgets. Because homogeneity is not given in practice, we introduce the problem setting of Federated Reinforcement Learning with Heterogeneous And bLack-box agEnts (FedRL-HALE). We present the unique challenges this new setting poses and propose the Federated Heterogeneous Q-Learning (FedHQL) algorithm that principally addresses these challenges. We empirically demonstrate the efficacy of FedHQL in boosting the sample efficiency of heterogeneous agents with distinct policy parameterization using standard RL tasks.

Federated Neural Bandits

Zhongxiang Dai, Yao Shu, Arun Verma, Flint Xiaofeng Fan, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Learning Representations (ICLR), 2023
31.8% acceptance rate

Abstract. Recent works on neural contextual bandits have achieved compelling performances due to their ability to leverage the strong representation power of neural networks (NNs) for reward prediction. Many applications of contextual bandits involve multiple agents who collaborate without sharing raw observations, thus giving rise to the setting of federated contextual bandits}. Existing works on federated contextual bandits rely on linear or kernelized bandits, which may fall short when modeling complex real-world reward functions. So, this paper introduces the federated neural-upper confidence bound (FN-UCB) algorithm. To better exploit the federated setting, FN-UCB adopts a weighted combination of two UCBs: UCBa allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations), while UCBb uses an NN with aggregated parameters for reward prediction in a similar way to federated averaging for supervised learning. Notably, the weight between the two UCBs required by our theoretical analysis is amenable to an interesting interpretation, which emphasizes UCBa initially for accelerated exploration and relies more on UCBb later after enough observations have been collected to train the NNs for accurate reward prediction (i.e., reliable exploitation). We prove sub-linear upper bounds on both the cumulative regret and the number of communication rounds of FN-UCB, and empirically demonstrate its competitive performance.

Risk-Aware Reinforcement Learning with Coherent Risk Measures and Non-linear Function Approximation

Thanh Lam, Arun Verma, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Learning Representations (ICLR), 2023
31.8% acceptance rate

Abstract. We study the risk-aware reinforcement learning (RL) problem in the episodic finite-horizon Markov decision process with unknown transition and reward functions. In contrast to the risk-neutral RL problem, we consider minimizing the risk of having low rewards, which arise due to the intrinsic randomness of the MDPs and imperfect knowledge of the model. Our work provides a unified framework to analyze the regret of risk-aware RL policy with coherent risk measures in conjunction with non-linear function approximation, which gives the first sub-linear regret bounds in the setting. Finally, we validate our theoretical results via empirical experiments on synthetic and real-world data.

Zeroth-Order Optimization with Trajectory-Informed Derivative Estimation

Yao Shu, Zhongxiang Dai, Weicong Sng, Arun Verma, Patrick Jaillet*, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2023
31.8% acceptance rate

Abstract. Zeroth-order (ZO) optimization, in which the derivative is unavailable, has recently succeeded in many important machine learning applications. Existing algorithms rely on finite difference (FD) methods for derivative estimation and gradient descent (GD)-based approaches for optimization. However, these algorithms suffer from query inefficiency because additional function queries are required for derivative estimation in their every GD update, which typically hinders their deployment in applications where every function query is expensive. To this end, we propose a trajectory-informed derivative estimation method which only uses the optimization trajectory (i.e., the history of function queries during optimization) and hence eliminates the need for additional function queries to estimate a derivative. Moreover, based on our derivative estimation, we propose the technique of dynamic virtual updates, which allows us to reliably perform multiple steps of GD updates without reapplying derivative estimation. Based on these two contributions, we introduce the zeroth-order optimization with trajectory-informed derivative estimation (ZoRD) algorithm for query-efficient ZO optimization. We theoretically demonstrate that our trajectory-informed derivative estimation and our ZoRD algorithm improve over existing approaches, which is then supported by our real-world experiments such as black-box adversarial attack, non-differentiable metric optimization and derivative-free reinforcement learning.

Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

Tiedong Liu, and Bryan Kian Hsiang Low

arXiv:2305.14201, 2023

Abstract. We introduce Goat, a fine-tuned LLaMA model that significantly outperforms GPT-4 on a range of arithmetic tasks. Fine-tuned on a synthetically generated dataset, Goat achieves state-of-the-art performance on BIG-bench arithmetic sub-task. In particular, the zero-shot Goat-7B matches or even surpasses the accuracy achieved by the few-shot PaLM-540B. Surprisingly, Goat can achieve near-perfect accuracy on large-number addition and subtraction through supervised fine-tuning only, which is almost impossible with previous pretrained language models, such as Bloom, OPT, GPT-NeoX, etc. We attribute Goat's exceptional performance to LLaMA's consistent tokenization of numbers. To tackle more challenging tasks like large-number multiplication and division, we propose an approach that classifies tasks based on their learnability, and subsequently decomposes unlearnable tasks, such as multi-digit multiplication and division, into a series of learnable tasks by leveraging basic arithmetic principles. We thoroughly examine the performance of our model, offering a comprehensive evaluation of the effectiveness of our proposed decomposition steps. Additionally, Goat-7B can be easily trained using LoRA on a 24GB VRAM GPU, facilitating reproducibility for other researchers. We release our model, dataset, and the Python script for dataset generation.

No-regret Sample-efficient Bayesian Optimization for Finding Nash Equilibria with Unknown Utilities

Sebastian Shenghong Tay, Quoc Phong Nguyen, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

International Conference on Artificial Intelligence and Statistics (AISTATS), pages 3591-3619, 2023
29% acceptance rate

Abstract. The Nash equilibrium (NE) is a classic solution concept for normal-form games that is stable under potential unilateral deviations by self-interested agents. Bayesian optimization (BO) has been used to find NE in continuous general-sum games with unknown costly-to-sample utility functions in a sample-efficient manner. This paper presents the first no-regret BO algorithm that is sample-efficient in finding pure NE by leveraging theory on high probability confidence bounds with Gaussian processes and the maximum information gain of kernel functions. Unlike previous works, our algorithm is theoretically guaranteed to converge to the optimal solution (i.e., NE). We also introduce the novel setting of applying BO to finding mixed NE in unknown discrete general-sum games and show that our theoretical framework is general enough to be extended naturally to this setting by developing a no-regret BO algorithm that is sample-efficient in finding mixed NE. We empirically show that our algorithms are competitive w.r.t. suitable baselines in finding NE.

FAIR: Fair Collaborative Active Learning with Individual Rationality for Scientific Discovery

Xinyi Xu, Zhaoxuan Wu, Arun Verma, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

International Conference on Artificial Intelligence and Statistics (AISTATS), pages 4033-4057, 2023
29% acceptance rate

Abstract. Scientific discovery aims to find new patterns and test specific hypotheses by analysing large-scale experimental data. However, various practical limitations, like high experimental costs or the inability to do some experiments, make it challenging for researchers to collect sufficient experimental data for successful scientific discovery. To this end, we propose a framework named collaborative active learning (CAL) that enables researchers to share their experimental data for mutual benefit. Specifically, our proposed coordinated acquisition function sets out to achieve individual rationality and fairness so that everyone can equitably benefit from collaboration. Finally, we empirically demonstrate that our method outperforms existing batch active learning methods adapted to the CAL setting in terms of both learning performance and fairness on various real-world scientific discovery datasets (biochemistry, material science, and physics).

Pruning during Training by Network Efficacy Modeling

Mohit Rajpal, Yehong Zhang, and Bryan Kian Hsiang Low

Machine Learning (Special Issue on ECML-PKDD 2022 Journal Track), volume 112, issue 7, pages 2653-2684, 2023

Abstract. Deep neural networks (DNNs) are costly to train. Pruning, an approach to alleviate model complexity by zeroing out or pruning DNN elements, has shown promise in reducing training costs for DNNs with little to no efficacy at a given task. This paper presents a novel method to perform early pruning of DNN elements (e.g., neurons or convolutional filters) during the training process while minimizing losses to model performance. To achieve this, we model the efficacy of DNN elements in a Bayesian manner conditioned upon efficacy data collected during the training and prune DNN elements with low predictive efficacy after training completion. Empirical evaluations show that the proposed Bayesian early pruning improves the computational efficiency of DNN training while better preserving model performance compared to other tested pruning approaches.

Recursive Reasoning-based Training-time Adversarial Machine Learning

Yizhou Chen, Zhongxiang Dai, Haibin Yu, Bryan Kian Hsiang Low, and Teck-Hua Ho

Artificial Intelligence (Special Issue on Risk-Aware Autonomous Systems: Theory and Practice), volume 315, pages 103837, 2023

Abstract. The training process of a machine learning (ML) model may be subject to adversarial attacks from an attacker who attempts to undermine the test performance of the ML model by perturbing the training minibatches, and thus needs to be protected by a defender. Such a problem setting is referred to as training-time adversarial ML. We formulate it as a two-player game and propose a principled Recursive Reasoning-based Training-Time adversarial ML (R2T2) framework to model this game. R2T2 models the reasoning process between the attacker and the defender and captures their bounded reasoning capabilities (due to bounded computational resources) through the recursive reasoning formalism. In particular, we associate a deeper level of recursive reasoning with the use of a higher-order gradient to derive the attack (defense) strategy, which naturally improves its performance while requiring greater computational resources. Interestingly, our R2T2 framework encompasses a variety of existing adversarial ML methods which correspond to attackers (defenders) with different recursive reasoning capabilities. We show how an R2T2 attacker (defender) can utilize our proposed nested projected gradient descent-based method to approximate the optimal attack (defense) strategy at an arbitrary level of reasoning. R2T2 can empirically achieve state-of-the-art attack and defense performances on benchmark image datasets.

Probably Approximate Shapley Fairness with Applications in Machine Learning

Zijian Zhou, Xinyi Xu, Rachael Hwee Ling Sim, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 5910-5918, 2023
19.6% acceptance rate (oral presentation)

Abstract. The Shapley value (SV) is adopted in various scenarios in machine learning including data valuation, agent valuation, feature attribution etc. because it guarantees fairness (i.e., satisfying several fairness axioms) and that fairness is important in such valuations (e.g., for pricing in a marketplace). However, the exact calculation of SVs (with exponential time complexity) is infeasible in practice, so estimations are necessary. Consequently, it raises the following important question: The exact SVs guarantee fairness, but do the SV estimates also guarantee the same fairness? We show the fairness guaranteed by exact SVs is too restrictive (for SV estimates) and generalise it to a probably approximate fairness. We propose fidelity score, a metric to measure the variation of SV estimates and establish theoretically the relationship between fidelity score and fairness guarantee. We exploit the relationship between fidelity score and fairness guarantee to propose a novel greedy active estimation (GAE) with theoretical guarantees. We theoretically show GAE achieves better fairness guarantee than the de facto Monte-Carlo estimation and empirically verify GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various specific ML scenarios using real-world datasets.

Sample-Then-Optimize Batch Neural Thompson Sampling

Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 23331-23344, 2022
25.6% acceptance rate

Abstract. Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and derived theoretical guarantees using the theory of neural tangent kernel (NTK). However, these works suffer from the limitations of the requirement to invert an extremely large parameter matrix and the restriction to the sequential (rather than batch) setting. To overcome these limitations, we introduce two algorithms based on the Thompson sampling (TS) policy named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear. To choose an input query, we only need to train an NN (resp. a linear model) and then choose the query by maximizing the trained NN (resp. linear model), which is equivalently sampled from the GP posterior with the NTK as the kernel function. As a result, our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy. Next, we derive regret upper bounds for our algorithms with batch evaluations, and use insights from batch BO and NTK to show that they are asymptotically no-regret under certain conditions. Finally, we verify their empirical effectiveness using practical AutoML and reinforcement learning experiments.

Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 30542-30553, 2022
25.6% acceptance rate

Abstract. This paper investigates the problem of fairly trading off between payoff and model rewards in collaborative machine learning (ML) where parties aggregate their datasets together to obtain improved ML models over that of each party. Supposing parties can afford the optimal model trained on the aggregated dataset, we propose an allocation scheme that distributes the payoff fairly. Notably, the same scheme can be derived from two different approaches based on (1) desirable properties of the parties' payoffs or (2) that of the underlying payoff flows from one party to another. While the former is conceptually simpler, the latter can be used to handle the practical constraint on the budgets of parties. In particular, we propose desirable properties for achieving a fair adjustment of the payoff flows that can trade off between the model reward's performance and the payoff reward. We empirically demonstrate that our proposed scheme is a sensible solution in several scenarios of collaborative ML with different budget constraints.

Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search

Yao Shu, Zhongxiang Dai, Zhaoxuan Wu, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 33001-33015, 2022
25.6% acceptance rate

Abstract. Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics are recently proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performances, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical interpretation for their empirical performances, and (c) there may exist untapped potential in existing training-free NAS, which probably can be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study their relationships, (b) theoretically guarantee their generalization performances, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) which consistently boosts training-free NAS in a principled way. Remarkably, HNAS can enjoy the advantages of both training-free (i.e., superior search efficiency) and training-based (i.e., remarkable search effectiveness) NAS, which we have demonstrated through extensive experiments.

On Provably Robust Meta-Bayesian Optimization

Zhongxiang Dai, Yizhou Chen, Haibin Yu, Bryan Kian Hsiang Low, and Patrick Jaillet*

Conference on Uncertainty in Artificial Intelligence (UAI), pages 475-485, 2022
32.3% acceptance rate

Abstract. Bayesian optimization (BO) has become popular for sequential optimization of black-box functions. When BO is used to optimize a target function, we often have access to previous evaluations of potentially related functions. This begs the question as to whether we can leverage these previous experiences to accelerate the current BO task through meta-learning (meta-BO), while ensuring robustness against potentially harmful dissimilar tasks that could sabotage the convergence of BO. This paper introduces two scalable and provably robust meta-BO algorithms: robust meta-Gaussian process-upper confidence bound (RM-GP-UCB) and RM-GP-Thompson sampling (RM-GP-TS). We prove that both algorithms are asymptotically no-regret even when some or all previous tasks are dissimilar to the current task, and show that RM-GP-UCB enjoys a better theoretical robustness than RM-GP-TS. We also exploit the theoretical guarantees to optimize the weights assigned to individual previous tasks through regret minimization via online learning, which diminishes the impact of dissimilar tasks and hence further enhances the robustness. Empirical evaluations show that (a) RM-GP-UCB performs effectively and consistently across various applications, and (b) RM-GP-TS, despite being less robust than RM-GP-UCB both in theory and in practice, performs competitively in some scenarios with less dissimilar tasks and is more computationally efficient.

Neural Ensemble Search via Bayesian Sampling

Yao Shu, Yizhou Chen, Zhongxiang Dai, and Bryan Kian Hsiang Low

Conference on Uncertainty in Artificial Intelligence (UAI), pages 1803-1812, 2022
32.3% acceptance rate

Abstract. Recently, neural architecture search (NAS) has been applied to automate the design of neural networks in real-world applications. A large number of algorithms have been developed to improve the search cost or the performance of the final selected architectures in NAS. Unfortunately, these NAS algorithms aim to select only one single well-performing architecture from their search spaces and thus have overlooked the capability of neural network ensemble (i.e., an ensemble of neural networks with diverse architectures) in achieving improved performance over a single final selected architecture. To this end, we introduce a novel neural ensemble search algorithm, called neural ensemble search via Bayesian sampling (NESBS), to effectively and efficiently select well-performing neural network ensembles from a NAS search space. In our extensive experiments, NESBS algorithm is shown to be able to achieve improved performance over state-of-the-art NAS algorithms while incurring a comparable search cost, indicating the superior performance of our NESBS algorithm over these conventional NAS algorithms in practice.

On the Convergence of the Shapley Value in Parametric Bayesian Learning Games

Lucas Agussurja, Xinyi Xu, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 180-196, 2022
21.9% acceptance rate

Abstract. Measuring contributions is a classical problem in cooperative game theory where the Shapley value is the most well-known solution concept. In this paper, we establish the convergence property of the Shapley value in parametric Bayesian learning games where players perform a Bayesian inference using their combined data, and the posterior-prior KL divergence is used as the characteristic function. We show that for any two players, under some regularity conditions, their difference in Shapley value converges in probability to the difference in Shapley value of a limiting game whose characteristic function is proportional to the log-determinant of the joint Fisher information. As an application, we present an online collaborative learning framework that is asymptotically Shapley-fair. Our result enables this to be achieved without any costly computations of posterior-prior KL divergences. Only a consistent estimator of the Fisher information is needed. The framework's effectiveness is demonstrated with experiments using real-world data.

Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity

Sebastian Shenghong Tay, Chuan Sheng Foo*, Daisuke Urano*, Richalynn Leong*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 21180-21204, 2022
21.9% acceptance rate

Abstract. In distributionally robust Bayesian optimization (DRBO), an exact computation of the worst-case expected value requires solving an expensive convex optimization problem. We develop a fast approximation of the worst-case expected value based on the notion of worst-case sensitivity that caters to arbitrary convex distribution distances. We provide a regret bound for our novel DRBO algorithm with the fast approximation, and empirically show it is competitive with that using the exact worst-case expected value while incurring significantly less computation time. In order to guide the choice of distribution distance to be used with DRBO, we show that our approximation implicitly optimizes an objective close to an interpretable risk-sensitive value.

Bayesian Optimization under Stochastic Delayed Feedback

Arun Verma, Zhongxiang Dai, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 22145-22167, 2022
21.9% acceptance rate

Abstract. Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization of complex and expensive-to-compute black-box functions. The existing BO methods assume that the function evaluation (feedback) is available to the learner immediately or after a fixed delay. Such assumptions may not be practical in many real-life problems like clinical trials, online recommendations, and hyper-parameters tuning, where feedback is available after a random delay. To benefit from the experimental parallelization in these problems, the learner needs to start new function evaluations without waiting for delayed feedback. In this paper, we consider the BO under stochastic delayed feedback problem. We propose algorithms with sub-linear regret guarantees that efficiently address the dilemma of selecting new function queries while waiting for randomly delayed feedback. Building on our results, we also make novel contributions to batch BO and contextual Gaussian process bandits. Our experiments on synthetic and real-life datasets verify the performance of proposed algorithms.

DAVINZ: Data Valuation using Deep Neural Networks at Initialization

Zhaoxuan Wu, Yao Shu, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 24150-24176, 2022
21.9% acceptance rate

Abstract. Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications, e.g., collaborative machine learning, data marketplaces, etc. Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, making it more trustworthy in practice.

Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges

Rachael Hwee Ling Sim, Xinyi Xu, and Bryan Kian Hsiang Low

International Joint Conference on Artificial Intelligence (IJCAI), pages 5607-5614, 2022
18.2% acceptance rate

Abstract. Data valuation in machine learning (ML) is an emerging research area that studies the worth of data in ML. Data valuation is used in collaborative ML to determine a fair compensation for every data owner and in interpretable ML to identify the most responsible, noisy, or misleading training examples. This paper presents a comprehensive technical survey that provides a new formal study of data valuation in ML through its "ingredients" and the corresponding properties, grounds the discussion of common desiderata satisfied by existing data valuation strategies on our proposed ingredients, and identifies open research challenges for designing new ingredients, data valuation strategies, and cost reduction techniques.

Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten

Quoc Phong Nguyen, Ryutaro Oikawa, Dinil Mon Divakaran*, Mun Choon Chan*, and Bryan Kian Hsiang Low

ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS), pages 351-363, 2022
18.4% acceptance rate

Abstract. As the use of machine learning (ML) models is becoming increasingly popular in many real-world applications, there are practical challenges that need to be addressed for model maintenance. One such challenge is to 'undo' the effect of a specific subset of dataset used for training a model. This specific subset may contain malicious or adversarial data injected by an attacker, which affects the model performance. Another reason may be the need for a service provider to remove data pertaining to a specific user to respect the user's privacy. In both cases, the problem is to 'unlearn' a specific subset of the training data from a trained model without incurring the costly procedure of retraining the whole model from scratch. Towards this goal, this paper presents a Markov chain Monte Carlo-based machine unlearning (MCU) algorithm. MCU helps to effectively and efficiently unlearn a trained model from subsets of training dataset. Furthermore, we show that with MCU, we are able to explain the effect of a subset of a training dataset on the model prediction. Thus, MCU is useful for examining subsets of data to identify the adversarial data to be removed. Similarly, MCU can be used to erase the lineage of a user's personal data from trained ML models, thus upholding a user's "right to be forgotten". We empirically evaluate the performance of our proposed MCU algorithm on real-world phishing and diabetes datasets. Results show that MCU can achieve a desirable performance by efficiently removing the effect of a subset of training dataset and outperform an existing algorithm that utilizes the remaining dataset.

Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization

Shouri Hu, Haowei Wang, Zhongxiang Dai, Bryan Kian Hsiang Low, and Szu Hui Ng

arXiv:2205.04901, 2022

Abstract. The expected improvement (EI) is one of the most popular acquisition functions for Bayesian optimization (BO) and has demonstrated good empirical performances in many applications for the minimization of simple regret. However, under the evaluation metric of cumulative regret, the performance of EI may not be competitive, and its existing theoretical regret upper bound still has room for improvement. To adapt the EI for better performance under cumulative regret, we introduce a novel quantity called the evaluation cost which is compared against the acquisition function, and with this, develop the expected improvement-cost (EIC) algorithm. In each iteration of EIC, a new point with the largest acquisition function value is sampled, only if that value exceeds its evaluation cost. If none meets this criteria, the current best point is resampled. This evaluation cost quantifies the potential downside of sampling a point, which is important under the cumulative regret metric as the objective function value in every iteration affects the performance measure. We further establish in theory a tight regret upper bound of EIC for the squared-exponential covariance kernel under mild regularity conditions, and perform experiments to illustrate the improvement of EIC over several popular BO algorithms.

NASI: Label- and Data-agnostic Neural Architecture Search at Initialization

Yao Shu, Shaofeng Cai, Zhongxiang Dai, Beng Chin Ooi*, and Bryan Kian Hsiang Low

International Conference on Learning Representations (ICLR), 2022
32.29% acceptance rate

Abstract. Recent years have witnessed a surging interest in Neural Architecture Search (NAS). Various algorithms have been proposed to improve the search efficiency and effectiveness of NAS, i.e., to reduce the search cost and improve the generalization performance of the selected architectures, respectively. However, the search efficiency of these algorithms is severely limited by the need for model training during the search process. To overcome this limitation, we propose a novel NAS algorithm called NAS at Initialization (NASI) that exploits the capability of a Neural Tangent Kernel in being able to characterize the performance of candidate architectures at initialization, hence allowing model training to be completely avoided to boost the search efficiency. Besides the improved search efficiency, NASI also achieves competitive search effectiveness on various datasets like CIFAR-10/100 and ImageNet. Further, NASI is shown to be label- and data-agnostic under mild conditions, which guarantees the transferability of architectures selected by our NASI over different datasets.

Near-Optimal Task Selection for Meta-Learning with Mutual Information and Online Variational Bayesian Unlearning

Yizhou Chen, Shizhuo Zhang, and Bryan Kian Hsiang Low

International Conference on Artificial Intelligence and Statistics (AISTATS), pages 9091-9113, 2022
29.2% acceptance rate

Abstract. This paper addresses the problem of active task selection which involves selecting the most informative tasks for meta-learning. We propose a novel active task selection criterion based on the mutual information between latent task vectors. Unfortunately, such a criterion scales poorly in the number of candidate tasks when optimized. To resolve this issue, we exploit the submodularity property of our new criterion for devising the first active task selection algorithm for meta-learning with a near-optimal performance guarantee. To further improve the efficiency of our algorithm, we propose an online variant of the Stein variational gradient descent to perform fast belief updates of the meta-parameters via maintaining a set of forward (and backward) particles when learning (or unlearning) from each selected task. We empirically demonstrate the performance of our proposed algorithm on real-world datasets.

Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards

Sebastian Shenghong Tay, Xinyi Xu, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 9448-9456, 2022
4.26% acceptance rate (oral presentation)

Abstract. This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data to a pool for training a generative model (e.g., GAN), from which synthetic data are drawn and distributed to the parties as rewards commensurate to their contributions. Distributing synthetic data as rewards (instead of trained models or money) offers task- and model-agnostic benefits for downstream learning tasks and is less likely to violate data privacy regulation. To realize the framework, we firstly propose a data valuation function using maximum mean discrepancy (MMD) that values data based on its quantity and quality in terms of its closeness to the true data distribution and provide theoretical results guiding the kernel choice in our MMD-based data valuation function. Then, we formulate the reward scheme as a linear optimization problem that when solved, guarantees certain incentives such as fairness in the CGM framework. We devise a weighted sampling algorithm for generating synthetic data to be distributed to each party as reward such that the value of its data and the synthetic data combined matches its assigned reward value by the reward scheme. We empirically show using simulated and real-world datasets that the parties' synthetic data rewards are commensurate to their contributions.

Rectified Max-Value Entropy Search for Bayesian Optimization

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

arXiv:2202.13597, 2022

Abstract. Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they encourage an accurate measure of the mutual information such as the rectified MES (RMES) acquisition function we develop in this work. Unlike the evaluation of MES, we derive a closed-form probability density for the observation conditioned on the max-value and employ stochastic gradient ascent with reparameterization to efficiently optimize RMES. As a result of a more principled acquisition function, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.

Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee

Flint Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 1007-1021, 2021
25.6% acceptance rate

Abstract. The growing literature of Federated Learning (FL) has recently inspired Federated Reinforcement Learning (FRL) to encourage multiple agents to federatively build a better decision-making policy without sharing raw trajectories. Despite its promising applications, existing FRL work fails to I) provide theoretical analysis on its convergence; II) account for random system failures and adversarial attacks. Towards this end, we propose the first FRL framework, the convergence of which is tolerant to less than half of participating agents being random system failures or adversarial attackers. We prove that the sample efficiency of the proposed framework is guaranteed to scale with the number of agents, accounting for such potential failures or attacks. We empirically verify all theoretical results on various RL benchmarking tasks.

Optimizing Conditional Value-At-Risk of Black-Box Functions

Quoc Phong Nguyen, Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4170-4180, 2021
25.6% acceptance rate

Abstract. This paper presents two Bayesian optimization (BO) algorithms with theoretical performance guarantee to maximize the conditional value-at-risk (CVaR) of a black-box function: CV-UCB and CV-TS which are based on the well-established principle of optimism in the face of uncertainty and Thompson sampling, respectively. To achieve this, we develop an upper confidence bound of CVaR and prove the no-regret guarantee of CV-UCB by utilizing an interesting connection between CVaR and value-at-risk (VaR). For CV-TS, though it is straightforwardly performed with Thompson sampling, bounding its Bayesian regret is non-trivial because it requires a tail expectation bound for the distribution of CVaR of a black-box function, which has not been shown in the literature. The performances of both CV-UCB and CV-TS are empirically evaluated in optimizing CVaR of synthetic benchmark functions and simulated real-world optimization problems.

Differentially Private Federated Bayesian Optimization with Distributed Exploration

Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 9125-9139, 2021
25.6% acceptance rate

Abstract. Bayesian optimization (BO) has recently been extended to the federated learning (FL) setting by the federated Thompson sampling (FTS) algorithm. However, FTS is not equipped with a rigorous privacy guarantee which is an important consideration in FL. Recent works have incorporated differential privacy (DP) into the training of deep neural networks through a general framework for adding DP to iterative algorithms. Following this general DP framework, our work here integrates DP into FTS to preserve user-level privacy. We also leverage the ability of this general DP framework to handle different parameter vectors, as well as the technique of local modeling for BO, to further improve the utility of our algorithm through distributed exploration (DE). The resulting differentially private FTS with DE (DP-FTS-DE) algorithm is endowed with theoretical guarantees for both the privacy and utility and is amenable to interesting theoretical insights about the privacy-utility trade-off. We also use real-world experiments to show that DP-FTS-DE achieves high utility (competitive performance) with a strong privacy guarantee (small privacy loss) and induces a practical trade-off between privacy and utility.

Validation Free and Replication Robust Volume-based Data Valuation

Xinyi Xu, Zhaoxuan Wu, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 10837-10848, 2021
25.6% acceptance rate

Abstract. Data valuation arises as a non-trivial challenge in use cases such as collaborative data sharing, data markets and etc. The value of data is often related to the learning performance, e.g. validation accuracy, of the model trained on the data. While intuitive, this methodology introduces a high coupling between data valuation and validation, which may not be desirable in practice. For instance, data providers may disagree on the choice of the validation set, or the validation set may be (statistically) different from the actual application. A separate but practical issue is data replication. If some data points are valuable, a dishonest data provider may offer a dataset containing replications of these data points, trying to exploit the valuation to get a higher reward/payment. Based on the ordinary least squares framework, our data valuation method does not require validation, and still provides a useful connection between the value of data and learning performance. In particular, we utilize the volume of the data matrix (determinant of its left Gram), thus able to provide an intuitive interpretation of the value of data via the diversity in the data. Furthermore, we formalize the robustness to data replication, and propose a robust volume valuation with robustness guarantees. We conduct extensive experiments to demonstrate its consistency and practical advantages over existing baselines.

Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning

Xinyi Xu, Lingjuan Lyu, Xingjun Ma, Chenglin Miao, Chuan Sheng Foo*, and Bryan Kian Hsiang Low

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 16104-16117, 2021
25.6% acceptance rate

Abstract. In collaborative machine learning (CML), multiple agents pool their resources (e.g., data) together for a common learning task. In realistic CML settings where the agents are self-interested and not altruistic, they may be unwilling to share data or model information without adequate rewards. Furthermore, as the data/model information shared by the agents may differ in quality, designing rewards which are fair to them is important so that they would not feel exploited nor discouraged from sharing. In this paper, we adopt federated learning as the CML paradigm, propose a novel cosine gradient Shapley value (CGSV) to fairly evaluate the expected marginal contribution of each agent's uploaded model parameter update/gradient without needing an auxiliary validation dataset, and based on the CGSV, design a novel training-time gradient reward mechanism with a fairness guarantee by sparsifying the aggregated parameter update/gradient downloaded from the server as reward to each agent such that its resulting quality is commensurate to that of the agent’s uploaded parameter update/gradient. We empirically demonstrate the effectiveness of our fair gradient reward mechanism on multiple benchmark datasets in terms of fairness, predictive performance, and time overhead.

Model Fusion for Personalized Learning

Thanh Chi Lam, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Machine Learning (ICML), pages 5948-5958, 2021
21.5% acceptance rate

Abstract. Production systems operating on a growing domain of analytic services often require generating warm-start solution models for emerging tasks with limited data. One potential approach to address this challenge is to adopt meta learning to generate a base model that can be adapted to solve unseen tasks with minimal fine-tuning. This however requires the training processes of previous solution models of existing tasks to be synchronized. This is not possible if these models were pre-trained separately on private data owned by different entities and cannot be synchronously re-trained. To accommodate for such scenarios, we develop a new personalized learning framework that synthesizes customized models for unseen tasks via fusion of independently pre-trained models of related tasks. We establish performance guarantee for the proposed framework and demonstrate its effectiveness on both synthetic and real datasets.

Value-at-Risk Optimization with Gaussian Processes

Quoc Phong Nguyen, Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Machine Learning (ICML), pages 8063-8072, 2021
21.5% acceptance rate

Abstract. Value-at-risk (VaR) is an established measure to assess risks in critical real-world applications with random environmental factors. This paper presents a novel VaR upper confidence bound (V-UCB) algorithm for maximizing the VaR of a black-box objective function with the first no-regret guarantee. To realize this, we first derive a confidence bound of VaR and then prove the existence of values of the environmental random variable (to be selected to achieve no regret) such that the confidence bound of VaR lies within that of the objective function evaluated at such values. Our V-UCB algorithm empirically demonstrates state-of-the-art performance in optimizing synthetic benchmark functions, a portfolio optimization problem, and a simulated robot task.

Collaborative Bayesian Optimization with Fair Regret

Rachael Hwee Ling Sim, Yehong Zhang, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Machine Learning (ICML), pages 9691-9701, 2021
21.5% acceptance rate

Abstract. Bayesian optimization (BO) is a popular tool for optimizing complex and costly-to-evaluate black-box objective functions. To further reduce the number of function evaluations, any party performing BO may be interested to collaborate with others to optimize the same objective function concurrently. To do this, existing BO algorithms have considered optimizing a batch of input queries in parallel and provided theoretical bounds on their cumulative regret reflecting inefficiency. However, when the objective function values are correlated with real-world rewards (e.g., money), parties may be hesitant to collaborate if they risk incurring larger cumulative regret (i.e., smaller real-world reward) than others. This paper shows that fairness and efficiency are both necessary for the collaborative BO setting. Inspired by social welfare concepts from economics, we propose a new notion of regret capturing these properties and a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding the new regret, both of which share an adjustable parameter for trading off between fairness vs. efficiency. We empirically demonstrate the benefits (e.g., increased fairness) of our algorithm using synthetic and real-world datasets.

Convolutional Normalizing Flows for Deep Gaussian Processes

Haibin Yu, Dapeng Liu, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Joint Conference on Neural Networks (IJCNN), 2021

Abstract. Deep Gaussian processes (DGPs), a hierarchical composition of GP models, have successfully boosted the expressive power of their single-layer counterpart. However, it is impossible to perform exact inference in DGPs, which has motivated the recent development of variational inference-based methods. Unfortunately, either these methods yield a biased posterior belief or it is difficult to evaluate their convergence. This paper introduces a new approach for specifying flexible, arbitrarily complex, and scalable approximate posterior distributions. The posterior distribution is constructed through a normalizing flow (NF) which transforms a simple initial probability into a more complex one through a sequence of invertible transformations. Moreover, a novel convolutional normalizing flow (CNF) is developed to improve the time efficiency and capture dependency between layers. Empirical evaluation shows that CNF DGP outperforms the state-of-the-art approximation methods for DGPs.

Learning to Learn with Gaussian Processes

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

Conference on Uncertainty in Artificial Intelligence (UAI), pages 1466-1475, 2021
26.5% acceptance rate

Abstract. This paper presents Gaussian process meta-learning (GPML) for few-shot regression, which explicitly exploits the distance between regression problems/tasks using a novel task kernel. It contrasts sharply with the popular metric-based meta-learning approach which is based on the distance between data inputs or their embeddings in the few-shot learning literature. Apart from the superior predictive performance by capturing the diversity of different tasks, GPML offers a set of representative tasks that are useful for understanding the task distribution. We empirically demonstrate the performance and interpretability of GPML in several few-shot regression problems involving a multimodal task distribution and real-world datasets.

Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization

Quoc Phong Nguyen, Zhaoxuan Wu, Bryan Kian Hsiang Low, and Patrick Jaillet*

Conference on Uncertainty in Artificial Intelligence (UAI), pages 1486-1495, 2021
26.5% acceptance rate

Abstract. Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch size. To alleviate these issues, this paper presents a novel trusted-maximizers entropy search (TES) acquisition function: It measures how much an input query contributes to the information gain on the maximizer over a finite set of trusted maximizers, i.e., inputs optimizing functions that are sampled from the Gaussian process posterior belief of the objective function. Evaluating TES requires either only a stochastic approximation with sampling or a deterministic approximation with expectation propagation, both of which are investigated and empirically evaluated using synthetic benchmark objective functions and real-world optimization problems, e.g., hyperparameter tuning of a convolutional neural network and synthesizing ‘physically realizable’ faces to fool a black-box face recognition system. Though TES can naturally be generalized to a batch variant with either approximation, the latter is amenable to be scaled to a much larger batch size in our experiments.

AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings

Trong Nghia Hoang, Shenda Hong, Cao Xiao, Bryan Kian Hsiang Low, and Jimeng Sun

The Web Conference (WWW), pages 3569-3581, 2021
20.6% acceptance rate

Abstract. This paper presents an active distillation method for a local institution (e.g., hospital) to find the best queries within its given budget to distill an on-server black-box model's predictive knowledge into a local surrogate with transparent parameterization. This allows local institutions to understand better the predictive reasoning of the black-box model in its own local context or to further customize the distilled knowledge with its private dataset that cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning in many industrial settings (e.g., healthcare analytics) with strong proprietary constraints. These include: (1) the opaqueness of the server model’s architecture which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data on the cloud for analysis; and (3) the need to customize the server model with private onsite data. We evaluated the proposed method on both benchmark and real-world healthcare data where significant improvements over existing local distillation methods were observed. A theoretical analysis of the proposed method is also presented.

An Information-Theoretic Framework for Unifying Active Learning Problems

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

AAAI Conference on Artificial Intelligence, pages 9126-9134, 2021
21.4% acceptance rate

Abstract. This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship between LSE and BO, we design a competitive information-theoretic acquisition function for BO that has interesting connections to upper confidence bound and max-value entropy search (MES). The latter connection reveals a drawback of MES which has important implications on not only MES but also on other MES-based acquisition functions. Finally, our unifying information-theoretic framework can be applied to solve a generalized problem of LSE and BO involving multiple level sets in a data-efficient manner. We empirically evaluate the performance of our proposed algorithms using synthetic benchmark functions, a real-world dataset, and in hyperparameter tuning of machine learning models.

Top-k Ranking Bayesian Optimization

Quoc Phong Nguyen, Sebastian Tay, Bryan Kian Hsiang Low, and Patrick Jaillet*

AAAI Conference on Artificial Intelligence, pages 9135-9143, 2021
21.4% acceptance rate

Abstract. This paper presents a novel approach to top-k ranking Bayesian optimization (top-k ranking BO) which is a practical and significant generalization of preferential BO to handle top-k ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equally important contribution is the introduction of the first information-theoretic acquisition function in BO with preferential observation called multinomial predictive entropy search (MPES) which is flexible in handling these observations and optimized for all inputs of a query jointly. MPES possesses superior performance compared with existing acquisition functions that select the inputs of a query one at a time greedily. We empirically evaluate the performance of MPES using several synthetic benchmark functions, CIFAR-10 dataset, and SUSHI preference dataset.

FCM-Sketch: Generic Network Measurements with Data Plane Support

Cha Hwan Song, Pravein Govindan Kannan, Bryan Kian Hsiang Low, and Mun Choon Chan*

International Conference on emerging Networking EXperiments and Technologies (CoNEXT), pages 78-92, 2020
24% acceptance rate

Abstract. Sketches have successfully provided accurate and fine-grained measurements (e.g., flow size and heavy hitters) which are imperative for network management. In particular, Count-Min (CM) sketch is widely utilized in many applications due to its simple design and ease of implementation. There have been many efforts to build monitoring frameworks based on Count-Min sketch. However, these frameworks either support very specific measurement tasks or they cannot be implemented on high-speed programmable hardware (PISA). In this work, we propose FCM, a framework that is designed to support generic network measurement with high accuracy. Our key contribution is FCM-Sketch, a data structure that has a lightweight implementation on the emerging PISA programmable switches. FCM-Sketch can also be used as a substitute for CM-Sketch in applications that use CM-Sketch. We have implemented FCM-Sketch on a commodity programmable switch (Barefoot Tofino) using the P4 language. Our evaluation shows that FCM-Sketch can reduce the errors in many measurement tasks by 50% to 80% compared to CM-Sketch and other state-of-the-art approaches.

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Harold Soh*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4187-4198, 2020
20.1% acceptance rate

Abstract. In this paper, we focus on the problem of Inverse Reinforcement Learning (IRL), which is relevant for a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior, and the actual reward function is not identifiable without prior knowledge or supplementary information. Here, we propose Bayesian Optimization-IRL (BO-IRL), an IRL framework that identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian Optimization along with our newly proposed kernel that (a) projects the parameters of policy invariant reward functions to a single point in a latent space, and (b) ensures that nearby points in the latent space correspond to reward functions that yield similar likelihoods. This projection allows for the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.

Federated Bayesian Optimization via Thompson Sampling

Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 9687-9699, 2020
20.1% acceptance rate

Abstract. Bayesian optimization (BO) is a prominent method for optimizing expensive-to-compute black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to immense recent interest in federated learning (FL), which focuses on collaborative training of deep neural networks (DNN) via first-order optimization techniques. However, some common machine learning tasks such as hyperparameter tuning of DNN lack access to gradients and thus require zeroth-order optimization (black-box optimization). This hints at the considerable potential of extending BO to the FL setting (FBO), to allow agents to collaborate in these black-box optimization tasks. Here, we introduce federated Thompson sampling (FTS), which overcomes a number of key challenges of FBO and FL in a principled way: We (a) use random Fourier features to approximate the Gaussian process surrogate model used in BO which naturally produces the parameters to be exchanged between agents, (b) design FTS based on Thompson sampling which significantly reduces the number of parameters to be exchanged, and (c) provide a theoretical convergence guarantee that is robust against heterogeneous agents which is a major challenge in FL and FBO. We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency and practical performance.

Variational Bayesian Unlearning

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 16025-16036, 2020
20.1% acceptance rate

Abstract. This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler distance between the approximate posterior belief of model parameters after directly unlearning from the erased data vs. the exact posterior belief from retraining with remaining data. Using the variational inference (VI) framework, we show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief given the full data (i.e., including the remaining data); the latter prevents catastrophic unlearning that can render the model useless. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging. We propose two novel tricks to tackle this challenge. We empirically demonstrate our unlearning methods on Bayesian models such as sparse Gaussian process and logistic regression using synthetic and real-world datasets.

Nonmyopic Gaussian Process Optimization with Macro-Actions

Dmitrii Kharkovskii, Chun Kai Ling, and Bryan Kian Hsiang Low

International Conference on Artificial Intelligence and Statistics (AISTATS), pages 4593-4604, 2020
28.7% acceptance rate

Abstract. This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we generalize GP upper confidence bound to a new acquisition function defined w.r.t. a nonmyopic adaptive macro-action policy, which is intractable to be optimized exactly due to an uncountable set of candidate outputs. The contribution of our work here is thus to derive a nonmyopic adaptive ϵ-Bayes-optimal macro-action GPO (ϵ-Macro-GPO) policy. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our ϵ-Macro-GPO policy with a performance guarantee. We empirically evaluate the performance of our ϵ-Macro-GPO policy and its anytime variant in BO with synthetic and real-world datasets.

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Zhongxiang Dai, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet*, and Teck-Hua Ho

International Conference on Machine Learning (ICML), pages 2291-2301, 2020
21.8% acceptance rate

Abstract. This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.

Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion

Trong Nghia Hoang, Thanh Lam, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Machine Learning (ICML), pages 4282-4292, 2020
21.8% acceptance rate

Abstract. Model fusion is an emerging study in collective learning where heterogeneous experts with private data and learning architectures need to combine their black-box knowledge for better performance. Existing literature achieves this via a local knowledge distillation scheme that transfuses the predictive patterns of each pre-trained expert onto a white-box imitator model, which can be incorporated efficiently into a global model. This scheme however does not extend to multi-task scenarios where different experts were trained to solve different tasks and only part of their distilled knowledge is relevant to a new task. To address this multi-task challenge, we develop a new fusion paradigm that represents each expert as a distribution over a spectrum of predictive prototypes, which are isolated from task-specific information encoded within the prototype distribution. The task-agnostic prototypes can then be reintegrated to generate a new model that solves a new task encoded with a different prototype distribution. The fusion and adaptation performance of the proposed framework is demonstrated empirically on several real-world benchmark datasets.

Private Outsourced Bayesian Optimization

Dmitrii Kharkovskii, Zhongxiang Dai, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 5231-5242, 2020
21.8% acceptance rate

Abstract. This paper presents the private-outsourced-Gaussian process-upper confidence bound (PO-GP-UCB) algorithm, which is the first algorithm for privacy-preserving Bayesian optimization (BO) in the outsourced setting with a provable performance guarantee. We consider the outsourced setting where the entity holding the dataset and the entity performing BO are represented by different parties, and the dataset cannot be released non-privately. For example, a hospital holds a dataset of sensitive medical records and outsources the BO task on this dataset to an industrial AI company. The key idea of our approach is to make the BO performance of our algorithm similar to that of non-private GP-UCB run using the original dataset, which is achieved by using a random projection-based transformation that preserves both privacy and the pairwise distances between inputs. Our main theoretical contribution is to show that a regret bound similar to that of the standard GP-UCB algorithm can be established for our PO-GP-UCB algorithm. We empirically evaluate the performance of our PO-GP-UCB algorithm with synthetic and real-world datasets.

Collaborative Machine Learning with Incentive-Aware Model Rewards

Rachael Hwee Ling Sim, Yehong Zhang, Mun Choon Chan*, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 8927-8936, 2020
21.8% acceptance rate

Abstract. Collaborative machine learning (ML) is an appealing paradigm to build high-quality ML models by training on the aggregated data from many parties. However, these parties are only willing to share their data when given enough incentives, such as a guaranteed fair reward based on their contributions. This motivates the need for measuring a party's contribution and designing an incentive-aware reward scheme accordingly. This paper proposes to value a party's reward based on Shapley value and information gain on model parameters given its data. Subsequently, we give each party a model as a reward. To formally incentivize the collaboration, we define some desirable properties (e.g., fairness and stability) which are inspired by cooperative game theory but adapted for our model reward that is uniquely freely replicable. Then, we propose a novel model reward scheme to satisfy fairness and trade off between the desirable properties via an adjustable parameter. The value of each party's model reward determined by our scheme is attained by injecting Gaussian noise to the aggregated training data with an optimized noise variance. We empirically demonstrate interesting properties of our scheme and evaluate its performance using synthetic and real-world datasets.

Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception

Ruofei Ouyang, and Bryan Kian Hsiang Low

Autonomous Robots (Special Issue on Multi-Robot and Multi-Agent Systems), volume 44, issue 3, pages 359-376, 2020
Extended version of our AAAI 2018 paper

Abstract. This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to utilize a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on three real-world datasets for up to 128 agents show that our algorithms outperform the state-of-the-art methods.

Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression

Tong Teng, Jie Chen, Yehong Zhang, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 5997-6004, 2020
20.6% acceptance rate

Abstract. This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our proposed VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due to many kernels being evaluated simultaneously by our VBKS algorithm, we show that the variational lower bound of the log-marginal likelihood can be decomposed into an additive form such that each additive term depends only on a disjoint subset of the variational variables and can thus be optimized independently. Stochastic optimization is then used to maximize the variational lower bound by iteratively improving the variational approximation of the exact posterior belief via stochastic gradient ascent, which incurs constant time per iteration and hence scales to big data. We empirically evaluate the performance of our VBKS algorithm on synthetic and massive real-world datasets.

Implicit Posterior Variational Inference for Deep Gaussian Processes

Haibin Yu, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet*, and Zhongxiang Dai

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 14475-14486, 2019
3% acceptance rate (spotlight presentation)

Abstract. A multi-layer deep Gaussian process (DGP) model is a hierarchical composition of GP models with a greater expressive power. Exact DGP inference is intractable, which has motivated the recent development of deterministic and stochastic approximation methods. Unfortunately, the deterministic approximation methods yield a biased posterior belief while the stochastic one is computationally costly. This paper presents an implicit posterior variational inference (IPVI) framework for DGPs that can ideally recover an unbiased posterior belief and still preserve time efficiency. Inspired by generative adversarial networks, our IPVI framework achieves this by casting the DGP inference problem as a two-player game in which a Nash equilibrium, interestingly, coincides with an unbiased posterior belief. This consequently inspires us to devise a best-response dynamics algorithm to search for a Nash equilibrium (i.e., an unbiased posterior belief). Empirical evaluation shows that IPVI outperforms the state-of-the-art approximation methods for DGPs.

Inverse Reinforcement Learning with Missing Data

Tien Mai, Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

arXiv:1911.06930, 2019

Abstract. We consider the problem of recovering an expert's reward function with inverse reinforcement learning (IRL) when there are missing/incomplete state-action pairs or observations in the demonstrated trajectories. This issue of missing trajectory data or information occurs in many situations, e.g., GPS signals from vehicles moving on a road network are intermittent. In this paper, we propose a tractable approach to directly compute the log-likelihood of demonstrated trajectories with incomplete/missing data. Our algorithm is efficient in handling a large number of missing segments in the demonstrated trajectories, as it performs the training with incomplete data by solving a sequence of systems of linear equations, and the number of such systems to be solved does not depend on the number of missing segments. Empirical evaluation on a real-world dataset shows that our training algorithm outperforms other conventional techniques.

Towards Robust ResNet: A Small Step but a Giant Leap

Jingfeng Zhang, Bo Han, Laura Wynter, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

International Joint Conference on Artificial Intelligence (IJCAI), pages 4285-4291, 2019
17.9% acceptance rate

Abstract. This paper presents a simple yet principled approach to boosting the robustness of the residual network (ResNet) that is motivated by the dynamical system perspective. Namely, a deep neural network can be interpreted using a partial differential equation, which naturally inspires us to characterize ResNet by an explicit Euler method. Our analytical studies reveal that the step factor h in the Euler method is able to control the robustness of ResNet in both its training and generalization. Specifically, we prove that a small step factor h can benefit the training robustness for back-propagation; from the view of forward-propagation, a small h can aid in the robustness of the model generalization. A comprehensive empirical evaluation on both vision CIFAR-10 and text AG-NEWS datasets confirms that a small h aids both the training and generalization robustness.

Stochastic Variational Inference for Bayesian Sparse Gaussian Process Regression

Haibin Yu, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Joint Conference on Neural Networks (IJCNN), 2019
52.4% acceptance rate

Abstract. This paper presents a novel variational inference framework for deriving a family of Bayesian sparse Gaussian process regression (SGPR) models whose approximations are variationally optimal with respect to the full-rank GPR model enriched with various corresponding correlation structures of the observation noises. Our variational Bayesian SGPR (VBSGPR) models jointly treat both the distributions of the inducing variables and hyperparameters as variational parameters, which enables the decomposability of the variational lower bound that in turn can be exploited for stochastic optimization. Such a stochastic optimization involves iteratively following the stochastic gradient of the variational lower bound to improve its estimates of the optimal variational distributions of the inducing variables and hyperparameters (and hence the predictive distribution) of our VBSGPR models and is guaranteed to achieve asymptotic convergence to them. We show that the stochastic gradient is an unbiased estimator of the exact gradient and can be computed in constant time per iteration, hence achieving scalability to big data. We empirically evaluate the performance of our proposed framework on two real-world, massive datasets.

Bayesian Optimization with Binary Auxiliary Information

Yehong Zhang, Zhongxiang Dai, and Bryan Kian Hsiang Low

Conference on Uncertainty in Artificial Intelligence (UAI), pages 1222-1232, 2019
26.2% acceptance rate (plenary talk); subsumes our work on Information-Based Multi-Fidelity Bayesian Optimization presented in NeurIPS 2017 Workshop on Bayesian Optimization

Abstract. This paper presents novel mixed-type Bayesian optimization (BO) algorithms to accelerate the optimization of a target objective function by exploiting correlated auxiliary information of binary type that can be more cheaply obtained, such as in policy search for reinforcement learning and hyperparameter tuning of machine learning models with early stopping. To achieve this, we first propose a mixed-type multi-output Gaussian process (MOGP) to jointly model the continuous target function and binary auxiliary functions. Then, we propose information-based acquisition functions such as mixed-type entropy search (MT-ES) and mixed-type predictive ES (MT-PES) for mixed-type BO based on the MOGP predictive belief of the target and auxiliary functions. The exact acquisition functions of MT-ES and MT-PES cannot be computed in closed form and need to be approximated. We derive an efficient approximation of MT-PES via a novel mixed-type random features approximation of the MOGP model whose cross-correlation structure between the target and auxiliary functions can be exploited for improving the belief of the global target maximizer using the observations from evaluating these functions. We also propose new practical constraints to relate the global target maximizer to the binary auxiliary functions. We empirically evaluate the performance of MT-ES and MT-PES with synthetic and real-world experiments.

GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

Quoc Phong Nguyen, Kar Wai Lim, Dinil Mon Divakaran*, Bryan Kian Hsiang Low, and Mun Choon Chan*

IEEE Conference on Communications and Network Security (CNS), pages 91-99, 2019
27.8% acceptance rate

Abstract. This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions have the limitations that they require large amount of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide an easy way to explain or identify attacks in the anomalous traffic. To address these limitations, we develop and present GEE, a framework for detecting and explaining anomalies in network traffic. GEE comprises of two components: (i) Variational Autoencoder (VAE) —- an unsupervised deep-learning technique for detecting anomalies, and (ii) a gradient-based fingerprinting technique for explaining anomalies. Evaluation of GEE on UGR dataset demonstrates that our approach is effective in detecting different anomalies as well as identifying fingerprints that are good representations of these various attacks.

Bayesian Optimization Meets Bayesian Optimal Stopping

Zhongxiang Dai, Haibin Yu, Bryan Kian Hsiang Low, and Patrick Jaillet*

International Conference on Machine Learning (ICML), pages 1496-1506, 2019
22.6% acceptance rate

Abstract. Bayesian optimization (BO) is a popular paradigm for optimizing the hyperparameters of machine learning (ML) models due to its sample efficiency. Many ML models require running an iterative training procedure (e.g., stochastic gradient descent). This motivates the question whether information available during the training process (e.g., validation accuracy after each epoch) can be exploited for improving the epoch efficiency of BO algorithms by early-stopping model training under hyperparameter settings that will end up under-performing and hence eliminating unnecessary training epochs. This paper proposes to unify BO (specifically, Gaussian process-upper confidence bound (GP-UCB)) with Bayesian optimal stopping (BO-BOS) to boost the epoch efficiency of BO. To achieve this, while GP-UCB is sample-efficient in the number of function evaluations, BOS complements it with epoch efficiency for each function evaluation by providing a principled optimal stopping mechanism for early stopping. BO-BOS preserves the (asymptotic) no-regret performance of GP-UCB using our specified choice of BOS parameters that is amenable to an elegant interpretation in terms of the exploration-exploitation trade-off. We empirically evaluate the performance of BO-BOS and demonstrate its generality in hyperparameter optimization of ML models and two other interesting applications.

Collective Model Fusion for Multiple Black-Box Experts

Quang Minh Hoang, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Carl Kingsford

International Conference on Machine Learning (ICML), pages 2742-2750, 2019
22.6% acceptance rate

Abstract. Model fusion is a fundamental problem in collective machine learning (ML) where independent experts with heterogeneous learning architectures are required to combine expertise to improve predictive performance. This is particularly challenging in information-sensitive domains (e.g., medical records in health-care analytics) where experts do not have access to each other's internal architecture and local data. To address this challenge, this paper presents the first collective model fusion framework for multiple experts with heterogeneous black-box architectures. The proposed method will enable this by addressing the following key issues of how black-box experts interact to understand the predictive behaviors of one another; how these understandings can be represented and shared efficiently among themselves; and how the shared understandings can be combined to generate high-quality consensus prediction. The performance of the resulting framework is analyzed theoretically and demonstrated empirically on several datasets.

Collective Online Learning of Gaussian Processes in Massive Multi-Agent Systems

Trong Nghia Hoang, Quang Minh Hoang, Bryan Kian Hsiang Low, and Jonathan P. How

AAAI Conference on Artificial Intelligence, pages 7850-7857, 2019
16.2% acceptance rate (oral presentation)

Abstract. This paper presents a novel Collective Online Learning of Gaussian Processes (COOL-GP) framework for enabling a massive number of GP inference agents to simultaneously perform (a) efficient online updates of their GP models using their local streaming data with varying correlation structures and (b) decentralized fusion of their resulting online GP models with different learned hyperparameter settings and inducing inputs. To realize this, we exploit the notion of a common encoding structure to encapsulate the local streaming data gathered by any GP inference agent into summary statistics based on our proposed representation, which is amenable to both an efficient online update via an importance sampling trick as well as multi-agent model fusion via decentralized message passing that can exploit sparse connectivity among agents for improving efficiency and enhance the robustness of our framework against transmission loss. We provide a rigorous theoretical analysis of the approximation loss arising from our proposed representation to achieve efficient online updates and model fusion. Empirical evaluations show that COOL-GP is highly effective in model fusion, resilient to information disparity between agents, robust to transmission loss, and can scale to thousands of agents.

Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception

Ruofei Ouyang, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 3876-3883, 2018
24.55% acceptance rate

Abstract. This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to choose a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanismfor a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on real-world datasets show that our algorithms outperform the state-of-the-art methods.

Decentralized High-Dimensional Bayesian Optimization with Factor Graphs

Trong Nghia Hoang, Quang Minh Hoang, Ruofei Ouyang, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 3231-3238, 2018
24.55% acceptance rate

Abstract. This paper presents a novel decentralized high-dimensional Bayesian optimization (DEC-HBO) algorithm that, in contrast to existing HBO algorithms, can exploit the interdependent effects of various input components on the output of the unknown objective function f for boosting the BO performance and still preserve scalability in the number of input dimensions without requiring prior knowledge or the existence of a low (effective) dimension of the input space. To realize this, we propose a sparse yet rich factor graph representation of f to be exploited for designing an acquisition function that can be similarly represented by a sparse factor graph and hence be efficiently optimized in a decentralized manner using distributed message passing. Despite richly characterizing the interdependent effects of the input components on the output of f with a factor graph, DEC-HBO can still guarantee (asymptotic) no-regret performance. Empirical evaluation on synthetic and real-world experiments shows that DEC-HBO outperforms the state-of-the-art HBO algorithms.

Information-Based Multi-Fidelity Bayesian Optimization

Yehong Zhang, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan Kankanhalli

NeurIPS Workshop on Bayesian Optimization, 2017

Abstract. This paper presents a novel generalization of predictive entropy search (PES) for multi-fidelity Bayesian optimization (BO) called multi-fidelity PES (MF-PES). In contrast to existing multi-fidelity BO algorithms, our proposed MF-PES algorithm can naturally trade off between exploitation vs. exploration over the target and auxiliary functions with varying fidelities without needing to manually tune any such parameters or input discretization. To achieve this, we first model the unknown target and auxiliary functions jointly as a convolved multi-output Gaussian process (CMOGP) whose convolutional structure is then exploited for deriving an efficient approximation of MF-PES. Empirical evaluation on synthetic and real-world experiments shows that MF-PES outperforms the state-of-the-art multi-fidelity BO algorithms.

Distributed Batch Gaussian Process Optimization

Erik A. Daxberger, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 951-960, 2017
25.9% acceptance rate

Abstract. This paper presents a novel distributed batch Gaussian process upper confidence bound (DB-GP-UCB) algorithm for performing batch Bayesian optimization (BO) of highly complex, costly-to-evaluate black-box objective functions. In contrast to existing batch BO algorithms, DB-GP-UCB can jointly optimize a batch of inputs (as opposed to selecting the inputs of a batch one at a time) while still preserving scalability in the batch size. To realize this, we generalize GP-UCB to a new batch variant amenable to a Markov approximation, which can then be naturally formulated as a multi-agent distributed constraint optimization problem in order to fully exploit the efficiency of its state-of-the-art solvers for achieving linear time in the batch size. Our DB-GP-UCB algorithm offers practitioners the flexibility to trade off between the approximation quality and time efficiency by varying the Markov order. We provide a theoretical guarantee for the convergence rate of DB-GP-UCB via bounds on its cumulative regret. Empirical evaluation on synthetic benchmark objective functions and a real-world optimization problem shows that DB-GP-UCB outperforms the state-of-the-art batch BO algorithms.

A Generalized Stochastic Variational Bayesian Hyperparameter Learning Framework for Sparse Spectrum Gaussian Process Regression

Quang Minh Hoang, Trong Nghia Hoang, and Bryan Kian Hsiang Low

AAAI Conference on Artificial Intelligence, pages 2007-2014, 2017
24.6% acceptance rate (oral presentation)

Abstract. While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of low-rank GP approximations that exploit the sparse spectral representation of a GP kernel. This paper presents such an effort to advance the state of the art of sparse spectrum GP models to achieve competitive predictive performance for massive datasets. Our generalized framework of stochastic variational Bayesian sparse spectrum GP (sVBSSGP) models addresses their shortcomings by adopting a Bayesian treatment of the spectral frequencies to avoid overfitting, modeling these frequencies jointly in its variational distribution to enable their interaction a posteriori, and exploiting local data for boosting the predictive performance. However, such structural improvements result in a variational lower bound that is intractable to be optimized. To resolve this, we exploit a variational parameterization trick to make it amenable to stochastic optimization. Interestingly, the resulting stochastic gradient has a linearly decomposable structure that can be exploited to refine our stochastic optimization method to incur constant time per iteration while preserving its property of being an unbiased estimator of the exact gradient of the variational lower bound. Empirical evaluation on real-world datasets shows that sVBSSGP outperforms state-of-the-art stochastic implementations of sparse GP models.

Concept Based Hybrid Fusion of Multimodal Event Signals

Yuhui Wang, Christian von der Weth, Yehong Zhang, Bryan Kian Hsiang Low, Vivek K. Singh, and Mohan S. Kankanhalli*

IEEE International Symposium on Multimedia (ISM), pages 14-19, 2016
26.1% acceptance rate

Abstract. Recent years have seen a significant increase in the number of sensors and resulting event related sensor data, allowing for a better monitoring and understanding of real-world events and situations. Event-related data come from not only physical sensors (e.g., CCTV cameras, webcams) but also social or microblogging platforms (e.g., Twitter). Given the wide-spread availability of sensors, we observe that sensors of different modalities often independently observe the same events. We argue that fusing multimodal data about an event can be helpful for more accurate detection, localization and detailed description of events of interest. However, multimodal data often include noisy observations, varying information densities and heterogeneous representations, which makes the fusion a challenging task. In this paper, we propose a hybrid fusion approach that takes the spatial and semantic characteristics of sensor signals about events into account. For this, we first adopt the concept of an image-based representation that expresses the situation of particular visual concepts (e.g., "crowdedness", "people marching") called Cmage for both physical and social sensor data. Based on this Cmage representation, we model sparse sensor information using a Gaussian process, fuses multimodal event signals with a Bayesian approach, and incorporates spatial relations between the sensor and social observations. We demonstrate the effectiveness of our approach as a proof-of-concept over real-world data. Our early results show that the proposed approach can reliably reduce the sensor-related noise, localize event place, improve event detection reliability, and add semantic context so that the fused data provide a better picture of the observed events or situations.

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

Jie Fu, Hongyin Luo, Jiashi Feng, Bryan Kian Hsiang Low, and Tat-Seng Chua

International Joint Conference on Artificial Intelligence (IJCAI), pages 1469-1475, 2016
<25% acceptance rate

Abstract. The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually needs to consume unaffordable memory to store all the intermediate variables to exactly reverse the forward training procedure. In this work, we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on two image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks.

A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models

Trong Nghia Hoang, Quang Minh Hoang, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 382-391, 2016
24.3% acceptance rate

Abstract. This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.

Multi-Agent Continuous Transportation with Online Balanced Partitioning

Chao Wang, Somchaya Liemhetcharat*, and Bryan Kian Hsiang Low

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1303-1304, 2016

Abstract. We introduce the concept of continuous transportation task to the context of multi-agent systems. A continuous transportation task is one in which a multi-agent team visits a number of fixed locations, picks up objects, and delivers them to a transportation hub. The goal is to maximize the rate of transportation while the objects are replenished over time. In this extended abstract, we present a hybrid of centralized and distributed approaches that minimize communications in the multi-agent team. We contribute a novel online partitioning-transportation algorithm with information gathering in the multi-agent team.

Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond

Chun Kai Ling, Bryan Kian Hsiang Low, and Patrick Jaillet*

AAAI Conference on Artificial Intelligence, pages 1860-1866, 2016
25.75% acceptance rate

Abstract. This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive ϵ-optimal GPP (ϵ-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of ϵ-GPP with performance guarantee. We empirically demonstrate the effectiveness of our ϵ-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.

Near-Optimal Active Learning of Multi-Output Gaussian Processes

Yehong Zhang, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

AAAI Conference on Artificial Intelligence, pages 2351-2357, 2016
25.75% acceptance rate

Abstract. This paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models for deriving a novel active learning criterion. Then, we exploit a relaxed form of sub-modularity property of our new criterion for devising a polynomial-time approximation algorithm that guarantees a constant-factor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and single-output GP models.

Inverse Reinforcement Learning with Locally Consistent Reward Functions

Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet*

Annual Conference on Neural Information Processing Systems (NeurIPS), pages 1747-1755, 2015
21.9% acceptance rate

Abstract. Existing inverse reinforcement learning (IRL) algorithms have assumed each expert's demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts' behaviors. Solving our generalized IRL problem thus involves not only learning these reward functions but also the stochastic transitions between them at any state (including unvisited states). By representing our IRL problem with a probabilistic graphical model, an expectation-maximization (EM) algorithm can be devised to iteratively learn the reward functions and stochastic transitions between them in order to jointly improve the likelihood of the expert's demonstrated trajectories. As a result, the most likely partition of a trajectory into segments that are generated from different locally consistent reward functions selected by EM can be derived. Empirical evaluation on synthetic and real-world datasets shows that our IRL algorithm outperforms the state-of-the-art EM clustering with maximum likelihood IRL, which is, interestingly, a reduced variant of our approach.

Gaussian Process Decentralized Data Fusion and Active Sensing for Spatiotemporal Traffic Modeling and Prediction in Mobility-on-Demand Systems

Jie Chen, Bryan Kian Hsiang Low, Yujian Yao, and Patrick Jaillet*

IEEE Transactions on Automation Science and Engineering (Special Issue on Networked Cooperative Autonomous Systems), volume 12, issue 3, pages 901-921, 2015
Extended version of our UAI 2012 and RSS 2013 papers

Abstract. Mobility-on-demand (MoD) systems have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility in densely populated cities. We assume the capability of a MoD system to be enhanced by deploying robotic shared vehicles that can autonomously cruise the streets to be hailed by users. A key challenge of the MoD system is that of real-time, fine-grained mobility demand and traffic flow sensing and prediction. This paper presents novel Gaussian process (GP) decentralized data fusion and active sensing algorithms for real-time, fine-grained traffic modeling and prediction with a fleet of MoD vehicles. The predictive performance of our decentralized data fusion algorithms are theoretically guaranteed to be equivalent to that of sophisticated centralized sparse GP approximations. We derive consensus filtering variants requiring only local communication between neighboring vehicles. We theoretically guarantee the performance of our decentralized active sensing algorithms. When they are used to gather informative data for mobility demand prediction, they can achieve a dual effect of fleet rebalancing to service mobility demands. Empirical evaluation on real-world datasets shows that our algorithms are significantly more time-efficient and scalable in the size of data and fleet while achieving predictive performance comparable to that of state-of-the-art algorithms.

A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data

Trong Nghia Hoang, Quang Minh Hoang, and Bryan Kian Hsiang Low

International Conference on Machine Learning (ICML), pages 569-578, 2015
26.0% acceptance rate

Abstract. This paper presents a novel unifying framework of anytime sparse Gaussian process regression (SGPR) models that can produce good predictive performance fast and improve their predictive performance over time. Our proposed unifying framework reverses the variational inference procedure to theoretically construct a non-trivial, concave functional that is maximized at the predictive distribution of any SGPR model of our choice. As a result, a stochastic natural gradient ascent method can be derived that involves iteratively following the stochastic natural gradient of the functional to improve its estimate of the predictive distribution of the chosen SGPR model and is guaranteed to achieve asymptotic convergence to it. Interestingly, we show that if the predictive distribution of the chosen SGPR model satisfies certain decomposability conditions, then the stochastic natural gradient is an unbiased estimator of the exact natural gradient and can be computed in constant time (i.e., independent of data size) at each iteration. We empirically evaluate the trade-off between the predictive performance vs. time efficiency of the anytime SGPR models on two real-world million-sized datasets.

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

Bryan Kian Hsiang Low, Jiangbo Yu, Jie Chen, and Patrick Jaillet*

AAAI Conference on Artificial Intelligence, pages 2821-2827, 2015
26.67% acceptance rate

Abstract. The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.

Multi-agent Ad Hoc Team Partitioning by Observing and Modeling Single-Agent Performance

Etkin Baris Ozgul, Somchaya Liemhetcharat*, and Bryan Kian Hsiang Low

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2014

Abstract. Multi-agent research has focused on finding the optimal team for a task. Many approaches assume that the performance of the agents are known a priori. We are interested in ad hoc teams, where the agents' algorithms and performance are initially unknown. We focus on the task of modeling the performance of single agents through observation in training environments, and using the learned models to partition a new environment for a multi-agent team. The goal is to minimize the number of agents used, while maintaining a performance threshold of the multi-agent team. We contribute a novel model to learn the agent's performance through observations, and a partitioning algorithm that minimizes the team size. We evaluate our algorithms in simulation, and show the efficacy of our learn model and partitioning algorithm.

Recent Advances in Scaling Up Gaussian Process Predictive Models for Large Spatiotemporal Data

Bryan Kian Hsiang Low, Jie Chen, Trong Nghia Hoang, Nuo Xu, and Patrick Jaillet*

In S. Ravela, A. Sandu, editors, Dynamic Data-Driven Environmental Systems Science - First International Conference, DyDESS, LNCS 8964, Springer International Publishing, pages 167-181, 2014
Oral presentation

Abstract. The expressive power of Gaussian process (GP) models comes at a cost of poor scalability in the size of the data. To improve their scalability, this paper presents an overview of our recent progress in scaling up GP models for large spatiotemporally correlated data through parallelization on clusters of machines, online learning, and nonmyopic active sensing/learning.

Scalable Decision-Theoretic Coordination and Control for Real-time Active Multi-Camera Surveillance

Prabhu Natarajan, Trong Nghia Hoang, Yongkang Wong, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC) (Invited Paper to Special Session on Smart Cameras for Smart Environments), pages 115-120, 2014

Abstract. This paper presents an overview of our novel decision-theoretic multi-agent approach for controlling and coordinating multiple active cameras in surveillance. In this approach, a surveillance task is modeled as a stochastic optimization problem, where the active cameras are controlled and coordinated to achieve the desired surveillance goal in presence of uncertainties. We enumerate the practical issues in active camera surveillance and discuss how these issues are addressed in our decision-theoretic approach. We focus on two novel surveillance tasks: maximize the number of targets observed in active cameras with guaranteed image resolution and to improve the fairness in observation of multiple targets. We discuss the overview of our novel decision-theoretic frameworks: Markov decision process and partially observable Markov decision process frameworks for coordinating active cameras in uncertain and partially occluded environments.

Active Learning Is Planning: Nonmyopic ε-Bayes-Optimal Active Learning of Gaussian Processes

Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet*, and Mohan S. Kankanhalli*

In T. Calders, F. Esposito, E. Hullermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML/PKDD Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, Springer Berlin Heidelberg, pages 494-498, 2014

Abstract. A fundamental issue in active learning of Gaussian processes is that of the exploration-exploitation trade-off. This paper presents a novel nonmyopic ϵ-Bayes-optimal active learning (ϵ-BAL) approach that jointly optimizes the trade-off. In contrast, existing works have primarily developed greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ϵ-BAL with performance guarantee and empirically demonstrate using a real-world dataset that, with limited budget, it outperforms the state-of-the-art algorithms.

Generalized Online Sparse Gaussian Processes with Application to Persistent Mobile Robot Localization

Bryan Kian Hsiang Low, Nuo Xu, Jie Chen, Keng Kiat Lim, and Etkin Baris Özgül

In T. Calders, F. Esposito, E. Hullermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML/PKDD Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, Springer Berlin Heidelberg, pages 499-503, 2014

Abstract. This paper presents a novel online sparse Gaussian process (GP) approximation method that is capable of achieving constant time and memory (i.e., independent of the size of the data) per time step. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated offline sparse GP approximation method. We empirically demonstrate the practical feasibility of using our online sparse GP approximation method through a real-world persistent mobile robot localization experiment.

No One is Left "Unwatched": Fairness in Observation of Crowds of Mobile Targets in Active Camera Surveillance

Prabhu Natarajan, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

European Conference on Artificial Intelligence (ECAI), including Prestigious Applications of Intelligent Systems (PAIS), pages 1155-1160, 2014

Abstract. Central to the problem of active multi-camera surveillance is the fundamental issue of fairness in the observation of crowds of targets such that no target is "starved" of observation by the cameras for a long time. This paper presents a principled decision-theoretic multi-camera coordination and control (MC2) algorithm called fair-MC2 that can coordinate and control the active cameras to achieve max-min fairness in the observation of crowds of targets moving stochastically. Our fair-MC2 algorithm is novel in demonstrating how (a) the uncertainty in the locations, directions, speeds, and observation times of the targets arising from the stochasticity of their motion can be modeled probabilistically, (b) the notion of fairness in observing targets can be formally realized in the domain of multi-camera surveillance for the first time by exploiting the max-min fairness metric to formalize our surveillance objective, that is, to maximize the expected minimum observation time over all targets while guaranteeing a predefined image resolution of observing them, and (c) a structural assumption in the state transition dynamics of a surveillance environment can be exploited to improve its scalability to linear time in the number of targets to be observed during surveillance. Empirical evaluation through extensive simulations in realistic surveillance environments shows that fair-MC2 outperforms the state-of-the-art and baseline MC2 algorithms. We have also demonstrated the feasibility of deploying our fair-MC2 algorithm on real AXIS 214 PTZ cameras.

GP-Localize: Persistent Mobile Robot Localization Using Online Sparse Gaussian Process Observation Model

Nuo Xu, Bryan Kian Hsiang Low, Jie Chen, Keng Kiat Lim, and Etkin Baris Ozgul

AAAI Conference on Artificial Intelligence, pages 2585-2592, 2014
16.6% acceptance rate (oral presentation); also appeared in RSS 2014 Workshop on Non-Parametric Learning in Robotics

Abstract. Central to robot exploration and mapping is the task of persistent localization in environmental fields characterized by spatially correlated measurements. This paper presents a Gaussian process localization (GP-Localize) algorithm that, in contrast to existing works, can exploit the spatially correlated field measurements taken during a robot's exploration (instead of relying on prior training data) for efficiently and scalably learning the GP observation model online through our proposed novel online sparse GP. As a result, GP-Localize is capable of achieving constant time and memory (i.e., independent of the size of the data) per filtering step, which demonstrates the practical feasibility of using GPs for persistent robot localization and autonomy. Empirical evaluation via simulated experiments with real-world datasets and a real robot experiment shows that GP-Localize outperforms existing GP localization algorithms.

Nonmyopic ϵ-Bayes-Optimal Active Learning of Gaussian Processes

Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet*, and Mohan S. Kankanhalli*

International Conference on Machine Learning (ICML), pages 739-747, 2014
22.4% acceptance rate (cycle 2); also appeared in RSS 2014 Workshop on Non-Parametric Learning in Robotics

Abstract. A fundamental issue in active learning of Gaussian processes is that of the exploration-exploitation trade-off. This paper presents a novel nonmyopic ϵ-Bayes-optimal active learning (ϵ-BAL) approach that jointly and naturally optimizes the trade-off. In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ϵ-BAL with performance guarantee and empirically demonstrate using synthetic and real-world datasets that, with limited budget, it outperforms the state-of-the-art algorithms.

Multi-Robot Active Sensing of Non-Stationary Gaussian Process-based Environmental Phenomena

Ruofei Ouyang, Bryan Kian Hsiang Low, Jie Chen, and Patrick Jaillet*

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 573-580, 2014
23.8% acceptance rate; also appeared in RSS 2014 Workshop on Non-Parametric Learning in Robotics

Abstract. A key challenge of environmental sensing and monitoring is that of sensing, modeling, and predicting large-scale, spatially correlated environmental phenomena, especially when they are unknown and non-stationary. This paper presents a decentralized multi-robot active sensing (DEC-MAS) algorithm that can efficiently coordinate the exploration of multiple robots to gather the most informative observations for predicting an unknown, non-stationary phenomenon. By modeling the phenomenon using a Dirichlet process mixture of Gaussian processes (DPM-GPs), our work here is novel in demonstrating how DPM-GPs and its structural properties can be exploited to (a) formalize an active sensing criterion that trades off between gathering the most informative observations for estimating the unknown, non-stationary spatial correlation structure vs. that for predicting the phenomenon given the current, imprecise estimate of the correlation structure, and (b) support efficient decentralized coordination. We also provide a theoretical performance guarantee for DEC-MAS and analyze its time complexity. We empirically demonstrate using two real-world datasets that DEC-MAS outperforms state-of-the-art MAS algorithms.

Decision-Theoretic Approach to Maximizing Fairness in Multi-Target Observation in Multi-Camera Surveillance

Prabhu Natarajan, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1521-1522, 2014

Abstract. Central to the problem of active multi-camera surveillance is the fundamental issue of fairness in the observation of multiple targets such that no target is left unobserved by the cameras for a long time. To address this important issue, we propose a novel principled decision-theoretic approach to control and coordinate multiple active cameras to achieve fairness in the observation of multiple moving targets.

A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

Trong Nghia Hoang, and Bryan Kian Hsiang Low

International Joint Conference on Artificial Intelligence (IJCAI), pages 1394-1400, 2013
28.0% acceptance rate

Abstract. Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states nor specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.

Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with Self-Interested Agents

Trong Nghia Hoang, and Bryan Kian Hsiang Low

International Joint Conference on Artificial Intelligence (IJCAI), pages 2298-2305, 2013
13.2% acceptance rate (oral presentation)

Abstract. A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, self-interested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the state-of-the-art algorithms.

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

Jie Chen, Nannan Cao, Bryan Kian Hsiang Low, Ruofei Ouyang, Colin Keng-Yan Tan, and Patrick Jaillet*

Conference on Uncertainty in Artificial Intelligence (UAI), pages 152-161, 2013
31.3% acceptance rate

Abstract. Gaussian processes (GP) are Bayesian non-parametric models that are widely used for probabilistic regression. Unfortunately, it cannot scale well with large data nor perform real-time predictions due to its cubic time cost in the data size. This paper presents two parallel GP regression methods that exploit low-rank covariance matrix approximations for distributing the computational load among parallel machines to achieve time efficiency and scalability. We theoretically guarantee the predictive performances of our proposed parallel GPs to be equivalent to that of some centralized approximate GP regression methods: The computation of their centralized counterparts can be distributed among parallel machines, hence achieving greater time efficiency and scalability. We analytically compare the properties of our parallel GPs such as time, space, and communication complexity. Empirical evaluation on two real-world datasets in a cluster of 20 computing nodes shows that our parallel GPs are significantly more time-efficient and scalable than their centralized counterparts and exact/full GP while achieving predictive performances comparable to full GP.

Gaussian Process-based Decentralized Data Fusion and Active Sensing for Mobility-on-Demand System

Jie Chen, Bryan Kian Hsiang Low, and Colin Keng-Yan Tan

Robotics: Science and Systems Conference (RSS), 2013
30.1% acceptance rate

Abstract. Mobility-on-demand (MoD) systems have recently emerged as a promising paradigm of one-way vehicle sharing for sustainable personal urban mobility in densely populated cities. In this paper, we enhance the capability of a MoD system by deploying robotic shared vehicles that can autonomously cruise the streets to be hailed by users. A key challenge to managing the MoD system effectively is that of real-time, fine-grained mobility demand sensing and prediction. This paper presents a novel decentralized data fusion and active sensing algorithm for real-time, fine-grained mobility demand sensing and prediction with a fleet of autonomous robotic vehicles in a MoD system. Our Gaussian process (GP)-based decentralized data fusion algorithm can achieve a fine balance between predictive power and time efficiency. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the GP model: The computation of such a sparse approximate GP model can thus be distributed among the MoD vehicles, hence achieving efficient and scalable demand prediction. Though our decentralized active sensing strategy is devised to gather the most informative demand data for demand prediction, it can achieve a dual effect of fleet rebalancing to service the mobility demands. Empirical evaluation on real-world mobility demand data shows that our proposed algorithm can achieve a better balance between predictive accuracy and time efficiency than state-of-the-art algorithms.

Multi-Robot Informative Path Planning for Active Sensing of Environmental Phenomena: A Tale of Two Algorithms

Nannan Cao, Bryan Kian Hsiang Low, and John M. Dolan+

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 007-014, 2013
22.9% acceptance rate

Abstract. A key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient information-theoretic path planning based on entropy and mutual information criteria for in situ active sensing of an important broad class of widely-occurring environmental phenomena called anisotropic fields. Our proposed algorithms are novel in addressing a trade-off between active sensing performance and time efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of Gaussian process-based anisotropic fields to improve time efficiency while preserving near-optimal active sensing performance. We analyze the time complexity of our algorithms and prove analytically that they scale better than state-of-the-art algorithms with increasing planning horizon length. We provide theoretical guarantees on the active sensing performance of our algorithms for a class of exploration tasks called transect sampling, which, in particular, can be improved with longer planning time and/or lower spatial correlation along the transect. Empirical evaluation on real-world anisotropic field data shows that our algorithms can perform better or at least as well as the state-of-the-art algorithms while often incurring a few orders of magnitude less computational time, even when the field conditions are less favorable.

Adaptive Sensing of Time Series with Application to Remote Exploration

David R. Thompson*, Nathalie Cabrol, P. Michael Furlong, Craig Hardgrove, Bryan Kian Hsiang Low, Jeffrey Moersch, and David Wettergreen

IEEE International Conference on Robotics and Automation (ICRA), pages 3463-3468, 2013

Abstract. We address the problem of adaptive information-optimal data collection in time series. Here a remote sensor or explorer agent throttles its sampling rate in order to track anomalous events while obeying constraints on time and power. This problem is challenging because the agent has limited visibility - all collected datapoints lie in the past, but its resource allocation decisions require predicting far into the future. Our solution is to continually fit a Gaussian process model to the latest data and optimize the sampling plan on line to maximize information gain. We compare the performance characteristics of stationary and nonstationary Gaussian process models. We also describe an application based on geologic analysis during planetary rover exploration. Here adaptive sampling can improve coverage of localized anomalies and potentially benefit mission science yield of long autonomous traverses.

Hierarchical Bayesian Nonparametric Approach to Modeling and Learning the Wisdom of Crowds of Urban Traffic Route Planning Agents

Jiangbo Yu, Bryan Kian Hsiang Low, Ali Oran*, and Patrick Jaillet*

IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT) (Invited Paper to Special Session on Large-Scale Application-Focused Multi-Agent Systems), pages 478-485, 2012

Abstract. Route prediction is important to analyzing and understanding the route patterns and behavior of traffic crowds. Its objective is to predict the most likely or "popular" route of road segments from a given point in a road network. This paper presents a hierarchical Bayesian non-parametric approach to efficient and scalable route prediction that can harness the wisdom of crowds of route planning agents by aggregating their sequential routes of possibly varying lengths and origin-destination pairs. In particular, our approach has the advantages of (a) not requiring a Markov assumption to be imposed and (b) generalizing well with sparse data, thus resulting in significantly improved prediction accuracy, as demonstrated empirically using real-world taxi route data. We also show two practical applications of our route prediction algorithm: predictive taxi ranking and route recommendation.

Decision-Theoretic Coordination and Control for Active Multi-Camera Surveillance in Uncertain, Partially Observable Environments

Prabhu Natarajan, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pages 001-006, 2012

Abstract. A central problem of surveillance is to monitor multiple targets moving in a large-scale, obstacle-ridden environment with occlusions. This paper presents a novel principled Partially Observable Markov Decision Process-based approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in such surveillance environments. Our proposed approach is capable of (a) maintaining a belief over the targets' states (i.e., locations, directions, and velocities) to track them, even when they may not be observed directly by the cameras at all times, (b) coordinating the cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed resolution, and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed. Quantitative comparisons with state-of-the-art multi-camera coordination and control techniques show that our approach can achieve higher surveillance quality in real time. The practical feasibility of our approach is also demonstrated using real AXIS 214 PTZ cameras.

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

Jie Chen, Bryan Kian Hsiang Low, Colin Keng-Yan Tan, Ali Oran*, Patrick Jaillet*, John M. Dolan+, and Gaurav S. Sukhatme*

Conference on Uncertainty in Artificial Intelligence (UAI), pages 163-173, 2012
31.6% acceptance rate; also appeared in AAMAS 2012 Workshop on Agents in Traffic and Transportation (ATT)

Abstract. The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.

Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing

Bryan Kian Hsiang Low, Jie Chen, John M. Dolan+, Steve A. Chien*, and David R. Thompson*

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 105-112, 2012
20.4% acceptance rate; also appeared in IROS 2011 Workshop on Robotics for Environmental Monitoring (WREM)

Abstract. A central problem in environmental sensing and monitoring is to classify/label the hotspots in a large-scale environmental field. This paper presents a novel decentralized active robotic exploration (DARE) strategy for probabilistic classification/labeling of hotspots in a Gaussian process (GP)-based field. In contrast to existing state-of-the-art exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Its exploration behavior exhibits an interesting formal trade-off between that of boundary tracking until the hotspot region boundary can be accurately predicted and wide-area coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under reasonable conditional independence assumption, we prove that it can optimally achieve two formal cost-minimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GP-based field is greatest at or close to the hotspot region boundaries. Empirical evaluation on real-world plankton density and temperature field data shows that, subject to limited observations, DARE strategy can achieve more superior classification of hotspots and time efficiency than state-of-the-art active exploration strategies.

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance

Prabhu Natarajan, Trong Nghia Hoang, Bryan Kian Hsiang Low, and Mohan S. Kankanhalli*

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 155-162, 2012
20.4% acceptance rate

Abstract. This paper presents a novel decision-theoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the trade-off between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multi-camera surveillance is that of scalability with increasing number of targets. We show how its scalability can be improved by exploiting the problem structure: as proven analytically, our decision-theoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve high-quality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decision-theoretic approach in controlling and coordinating the active cameras in real surveillance system.

Intention-Aware Planning under Uncertainty for Interacting with Self-Interested, Boundedly Rational Agents

Trong Nghia Hoang, and Bryan Kian Hsiang Low

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1233-1234, 2012

Abstract. A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to perform effectively among boundedly rational, self-interested (i.e., non-cooperative) agents (e.g., humans). To address this challenge, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions.

Autonomous Personal Vehicle for the First- and Last-Mile Transportation Services

Zhuang Jie Chong, Baoxing Qin, Tirthankar Bandyopadhyay, Tichakorn Wongpiromsarn, Edward Samuel Rankin, Marcelo H. Ang, Jr.+, Emilio Frazzoli*, Daniela Rus*, David Hsu*, and Bryan Kian Hsiang Low

IEEE International Conference on Cybernetics and Intelligent Systems and IEEE International Conference on Robotics, Automation and Mechatronics (CIS-RAM), pages 253-260, 2011
Also appeared in IROS 2011 Workshop on Perception and Navigation for Autonomous Vehicles in Human Environment

Abstract. This paper describes an autonomous vehicle testbed that aims at providing the first- and last- mile transportation services. The vehicle mainly operates in a crowded urban environment whose features can be extracted a priori. To ensure that the system is economically feasible, we take a minimalistic approach and exploit prior knowledge of the environment and the availability of the existing infrastructure such as cellular networks and traffic cameras. We present three main components of the system: pedestrian detection, localization (even in the presence of tall buildings) and navigation. The performance of each component is evaluated. Finally, we describe the role of the existing infrastructural sensors and show the improved performance of the system when they are utilized.

Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing

Bryan Kian Hsiang Low, John M. Dolan+, and Pradeep K. Khosla+

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 753-760, 2011
22.1% acceptance rate

Abstract. Recent research in multi-robot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, real-time active sampling. To ease this computational burden, this paper presents a Markov-based approach to efficient information-theoretic path planning for active sampling of GP-based fields. We analyze the time complexity of solving the Markov-based path planning problem, and demonstrate analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markov-based policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on real-world temperature and plankton density field data shows that our Markov-based policy can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.

Telesupervised Remote Surface Water Quality Sensing

Gregg Podnar, John M. Dolan+, Bryan Kian Hsiang Low, and Alberto Elfes

IEEE Aerospace Conference, 2010

Abstract. We present a fleet of autonomous Robot Sensor Boats (RSBs) developed for lake and river fresh water quality assessment and controlled by our Multilevel Autonomy Robot Telesupervision Architecture (MARTA). The RSBs are low cost, highly maneuverable, shallow draft sensor boats, developed as part of the Sensor Web program supported under the Advanced Information Systems Technology program of NASA's Earth Systems Technology Office. They can scan large areas of lakes, and navigate up tributaries to measure water quality near outfalls that larger research vessels cannot reach. The MARTA telesupervision architecture has been applied to a number of domains from multi-platform autonomous wide area planetary mineral prospecting, to multi-platform ocean monitoring. The RSBs are a complementary expansion of a fleet of NOAA/NASA-developed extended-deployment surface autonomous vehicles that enable in-situ study of meteorological factors of the ocean/atmosphere interface, and which have been adapted to investigate harmful algal blooms under this program. The flexibility of the MARTA telesupervision architecture was proven as it supported simultaneous operation of these heterogenous autonomous sensor platforms while geographically widely separated. Results and analysis are presented of multiple tests carried out over three months using a multi-sensor water sonde to assess water quality in a small recreational lake. Inference Grids were used to produce maps representing temperature, pH, and dissolved oxygen. The tests were performed under various water conditions (clear vs. hair algae-laden) and both before and after heavy rains. Data from each RSB was relayed to a data server in our lab in Pittsburgh, Pennsylvania, and made available over the World Wide Web where it was acquired by team members at the Jet Propulsion Laboratory of NASA in Pasadena, California who monitored the boats and their sensor readings in real time, as well as using these data to model the water quality by producing Inference Grid-based maps.

Information-Theoretic Approach to Efficient Adaptive Path Planning for Mobile Robotic Environmental Sensing

Bryan Kian Hsiang Low, John M. Dolan+, and Pradeep K. Khosla+

International Conference on Automated Planning and Scheduling (ICAPS), pages 233-240, 2009
33.9% acceptance rate; also appeared in IPSN 2009 Workshop on Sensor Networks for Earth and Space Science Applications (ESSA); also orally presented in RSS 2009 Workshop on Aquatic Robots and Ocean Sampling

Abstract. Recent research in robot exploration and mapping has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decision-theoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in large-scale, high-resolution exploration and mapping. To alleviate this computational difficulty, this paper presents an information-theoretic approach to MASP (iMASP) for efficient adaptive path planning; by reformulating the cost-minimizing iMASP as a reward-maximizing problem, its time complexity becomes independent of map resolution and is less sensitive to increasing robot team size as demonstrated both theoretically and empirically. Using the reward-maximizing dual, we derive a novel adaptive variant of maximum entropy sampling, thus improving the induced exploration policy performance. It also allows us to establish theoretical bounds quantifying the performance advantage of optimal adaptive over non-adaptive policies and the performance quality of approximately optimal vs. optimal adaptive policies. We show analytically and empirically the superior performance of iMASP-based policies for sampling the log-Gaussian process to that of policies for the widely-used Gaussian process in mapping the hotspot field. Lastly, we provide sufficient conditions that, when met, guarantee adaptivity has no benefit under an assumed environment model.

Cooperative Aquatic Sensing using the Telesupervised Adaptive Ocean Sensor Fleet

John M. Dolan+, Gregg W. Podnar, Stephen Stancliff, Bryan Kian Hsiang Low, Alberto Elfes, John Higinbotham, Jeffrey C. Hosler, Tiffany A. Moisan, and John Moisan

SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, 2009

Abstract. Earth science research must bridge the gap between the atmosphere and the ocean to foster understanding of Earth's climate and ecology. Typical ocean sensing is done with satellites or in situ buoys and research ships which are slow to reposition. Cloud cover inhibits study of localized transient phenomena such as Harmful Algal Blooms (HAB). A fleet of extended-deployment surface autonomous vehicles will enable in situ study of characteristics of HAB, coastal pollutants, and related phenomena. We have developed a multiplatform telesupervision architecture that supports adaptive reconfiguration based on environmental sensor inputs. Our system allows the autonomous repositioning of smart sensors for HAB study by networking a fleet of NOAA OASIS (Ocean Atmosphere Sensor Integration System) surface autonomous vehicles. In situ measurements intelligently modify the search for areas of high concentration. Inference Grid and complementary information-theoretic techniques support sensor fusion and analysis. Telesupervision supports sliding autonomy from high-level mission tasking, through vehicle and data monitoring, to teleoperation when direct human interaction is appropriate. This paper reports on experimental results from multi-platform tests conducted in the Chesapeake Bay and in Pittsburgh, Pennsylvania waters using OASIS platforms, autonomous kayaks, and multiple simulated platforms to conduct cooperative sensing of chlorophyll-a and water quality.

Robot Boats as a Mobile Aquatic Sensor Network

Bryan Kian Hsiang Low, Gregg Podnar, Stephen Stancliff, John M. Dolan+, and Alberto Elfes

IPSN Workshop on Sensor Networks for Earth and Space Science Applications (ESSA), 2009

Abstract. This paper describes the Multilevel Autonomy Robot Telesupervision Architecture (MARTA), an architecture for supervisory control of a heterogeneous fleet of net-worked unmanned autonomous aquatic surface vessels carrying a payload of environmental science sensors. This architecture allows a land-based human scientist to effectively supervise data gathering by multiple robotic assets that implement a web of widely dispersed mobile sensors for in situ study of physical, chemical or bio-logical processes in water or in the water/atmosphere interface.

Adaptive Multi-Robot Wide-Area Exploration and Mapping

Bryan Kian Hsiang Low, John M. Dolan+, and Pradeep K. Khosla+

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 23-30, 2008
22.2% acceptance rate; also presented as a poster in RSS 2009 Workshop on Aquatic Robots and Ocean Sampling

Abstract. The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log-Gaussian processes, and analyze if the resulting strategies are adaptive and maximize wide-area coverage and hotspot sampling. Solving MASP is non-trivial as it comprises continuous state components. So, it is reformulated for convex analysis, which allows discrete-state monotone-bounding approximation to be developed. We provide a theoretical guarantee on the policy quality of the approximate MASP (aMASP) for using in MASP. Although aMASP can be solved exactly, its state size grows exponentially with the number of stages. To alleviate this computational difficulty, anytime algorithms are proposed based on aMASP, one of which can guarantee its policy quality for MASP in real time.

Adaptive Sampling for Multi-Robot Wide-Area Exploration

Bryan Kian Hsiang Low, Geoffrey J. Gordon, John M. Dolan+, and Pradeep K. Khosla+

IEEE International Conference on Robotics and Automation (ICRA), pages 755-760, 2007

Abstract. The exploration problem is a central issue in mobile robotics. A complete coverage is not practical if the environment is large with a few small hotspots, and the sampling cost is high. So, it is desirable to build robot teams that can coordinate to maximize sampling at these hotspots while minimizing resource costs, and consequently learn more accurately about properties of such environmental phenomena. An important issue in designing such teams is the exploration strategy. The contribution of this paper is in the evaluation of an adaptive exploration strategy called adaptive cluster sampling (ACS), which is demonstrated to reduce the resource costs (i.e., mission time and energy consumption) of a robot team, and yield more information about the environment by directing robot exploration towards hotspots. Due to the adaptive nature of the strategy, it is not obvious how the sampled data can be used to provide unbiased, low-variance estimates of the properties. This paper therefore discusses how estimators that are Rao-Blackwellized can be used to achieve low error. This paper also presents the first analysis of the characteristics of the environmental phenomena that favor the ACS strategy and estimators. Quantitative experimental results in a mineral prospecting task simulation show that our approach is more efficient in exploration by yielding more minerals and information with fewer resources and providing more precise mineral density estimates than previous methods.

Autonomic Mobile Sensor Network with Self-Coordinated Task Allocation and Execution

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews (Special Issue on Engineering Autonomic Systems), volume 36, issue 3, pages 315-327, 2006
Extended version of our IJCAI 2003, ICRA 2004, and AAAI 2004 papers; Andrew P. Sage Best Transactions Paper Award for the best paper published in IEEE Trans. SMC - Part A, B, and C in 2006

Abstract. This paper describes a distributed layered architecture for resource-constrained multirobot cooperation, which is utilized in autonomic mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme self-organizes the robot coalitions to track efficiently across regions. It uses concepts of ant behavior to self-regulate the regional distributions of robots in proportion to that of the moving targets to be tracked in a nonstationary environment. As a result, the adverse effects of task interference between robots are minimized and network coverage is improved. In the lower task execution layer, the robots use self-organizing neural networks to coordinate their target tracking within a region. Both layers employ self-organization techniques, which exhibit autonomic properties such as self-configuring, self-optimizing, self-healing, and self-protecting. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auction-based negotiation show that our layered approach can provide better coverage, greater robustness to sensor failures, and greater flexibility to respond to environmental changes.

An Ensemble of Cooperative Extended Kohonen Maps for Complex Robot Motion Tasks

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

Neural Computation, volume 17, issue 6, pages 1411-1445, 2005
Extended version of our AAMAS 2002, ICRA 2003, and IJCAI 2003 papers

Abstract. Self-organizing feature maps such as extended Kohonen maps (EKMs) have been very successful at learning sensorimotor control for mobile robot tasks. This letter presents a new ensemble approach, cooperative EKMs with indirect mapping, to achieve complex robot motion. An indirect-mapping EKM self-organizes to map from the sensory input space to the motor control space indirectly via a control parameter space. Quantitative evaluation reveals that indirect mapping can provide finer, smoother, and more efficient motion control than does direct mapping by operating in a continuous, rather than discrete, motor control space. It is also shown to outperform basis function neural networks. Furthermore, training its control parameters with recursive least squares enables faster convergence and better performance compared to gradient descent. The cooperation and competition of multiple self-organized EKMs allow a nonholonomic mobile robot to negotiate unforeseen, concave, closely spaced, and dynamic obstacles. Qualitative and quantitative comparisons with neural network ensembles employing weighted sum reveal that our method can achieve more sophisticated motion tasks even though the weighted-sum ensemble approach also operates in continuous motor control space.

Task Allocation via Self-Organizing Swarm Coalitions in Distributed Mobile Sensor Network

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

AAAI Conference on Artificial Intelligence, pages 28-33, 2004
26.7% acceptance rate

Abstract. This paper presents a task allocation scheme via self-organizing swarm coalitions for distributed mobile sensor network coverage. Our approach uses the concepts of ant behavior to self-regulate the regional distributions of sensors in proportion to that of the moving targets to be tracked in a non-stationary environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. Quantitative comparisons with other tracking strategies such as static sensor placement, potential fields, and auction-based negotiation show that our approach can provide better coverage and greater flexibility to respond to environmental changes.

Reactive, Distributed Layered Architecture for Resource-bounded Multi-robot Cooperation: Application to Mobile Sensor Network Coverage

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

IEEE International Conference on Robotics and Automation (ICRA), pages 3747-3752, 2004

Abstract. This paper describes a reactive, distributed layered architecture for cooperation of multiple resource-bounded robots, which is utilized in mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme self-organizes the robot coalitions to track efficiently in separate regions. It uses the concepts of ant behavior to self-regulate the regional distributions of robots in proportion to that of the targets to be tracked in the changing environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. In the lower layer, the robots use self-organizing neural networks to coordinate their target tracking within a region. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auction-based negotiation show that our approach can provide better coverage and greater flexibility in responding to environmental changes.

Continuous-Spaced Action Selection for Single- and Multi-Robot Tasks Using Cooperative Extended Kohonen Maps

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

IEEE International Conference on Networking, Sensing and Control (ICNSC) (Invited Paper to Special Session on Visual Surveillance), pages 198-203, 2004

Abstract. Action selection is a central issue in the design of behavior-based control architectures for autonomous mobile robots. This paper presents an action selection framework based on an assemblage of self-organizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: self-organization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps so that more complex motion tasks can be achieved. Qualitative and quantitative comparisons for both single- and multi-robot motion tasks show that our framework can provide better action selection than do action superposition methods.

Action Selection for Single- and Multi-Robot Tasks Using Cooperative Extended Kohonen Maps

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

International Joint Conference on Artificial Intelligence (IJCAI), pages 1505-1506, 2003
27.6% acceptance rate

Abstract. This paper presents an action selection framework based on an assemblage of self-organizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: self-organization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative and quantitative comparisons for single- and multi-robot tasks show our framework can provide better action selection than do potential fields method.

Action Selection in Continuous State and Action Spaces by Cooperation and Competition of Extended Kohonen Maps

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1056-1057, 2003

Abstract. This paper presents an action selection framework based on an assemblage of self-organizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: self-organization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative tests demonstrate the capability of our action selection method for both single- and multi-robot motion tasks.

Enhancing the Reactive Capabilities of Integrated Planning and Control with Cooperative Extended Kohonen Maps

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

IEEE International Conference on Robotics and Automation (ICRA), pages 3428-3433, 2003

Abstract. Despite the many significant advances made in robot motion research, few works have focused on the tight integration of high-level deliberative planning with reactive control at the lowest level. In particular, the real-time performance of existing integrated planning and control architectures is still not optimal because the reactive control capabilities have not been fully realized. This paper aims to enhance the low-level reactive capabilities of integrated planning and control with Cooperative Extended Kohonen Maps for handling complex, unpredictable environments so that the work-load of the high-level planner can be consequently eased. The enhancements include fine, smooth motion control, execution of more complex motion tasks such as overcoming unforeseen concave obstacles and traversing between closely spaced obstacles, and asynchronous execution of behaviors.

A Hybrid Mobile Robot Architecture with Integrated Planning and Control

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 219-226, 2002
26% acceptance rate

Abstract. Research in the planning and control of mobile robots has received much attention in the past two decades. Two basic approaches have emerged from these research efforts: deliberative vs. reactive. These two approaches can be distinguished by their different usage of sensed data and global knowledge, speed of response, reasoning capability, and complexity of computation. Their strengths are complementary and their weaknesses can be mitigated by combining the two approaches in a hybrid architecture. This paper describes a method for goal-directed, collision-free navigation in unpredictable environments that employs a behavior-based hybrid architecture with asynchronously operating behavioral modules. It differs from existing hybrid architectures in two important ways: (1) the planning module produces a sequence of checkpoints instead of a conventional complete path, and (2) in addition to obstacle avoidance, the reactive module also performs target reaching under the control of a self-organizing neural network. The neural network is trained to perform fine, smooth motor control that moves the robot through the checkpoints. These two aspects facilitate a tight integration between high-level planning and low-level control, which permits real-time performance and easy path modification even when the robot is en route to the goal position.

Integrated Planning and Control of Mobile Robot with Self-Organizing Neural Network

Bryan Kian Hsiang Low, Wee Kheng Leow+, and Marcelo H. Ang, Jr.+

IEEE International Conference on Robotics and Automation (ICRA), pages 3870-3875, 2002

Abstract. Despite the many significant advances made in robotics research, few works have focused on the tight integration of task planning and motion control. Most integration works involve the task planner providing discrete commands to the low-level controller, which performs kinematics and control computations to command the motor and joint actuators. This paper presents a framework of the integrated planning and control for mobile robot navigation. Unlike existing integrated approaches, it produces a sequence of checkpoints instead of a complete path at the planning level. At the motion control level, a neural network is trained to perform motor control that moves the robot from one checkpoint to the next. This method allows for a tight integration between high-level planning and low-level control, which permits real-time performance and easy modification of motion path while the robot is enroute to the goal position.

Technical Reports

Data-Centric AI: Through the Lens of Data Valuation and Beyond

Zhaoxuan Wu

Ph.D. Thesis, Institute of Data Science, National University of Singapore, 2024

Abstract. As a part of artificial intelligence (AI), modern machine learning (ML) models have achieved remarkable performances across various applications. Conventional methodologies in AI/ML have predominantly focused on model-centric approaches, which prioritize the design of specialized architectures for improved performance. In today's rapidly evolving landscape of AI and ML, there is increasing recognition that data-centric approaches should complement model-centric ones in the AI/ML development cycle. The data-centric approach acknowledges that the quantity, quality and relevance constitute the value of data, which can significantly influence the performance and reliability of AI systems. This thesis discusses novel data-centric methodologies through the lens of data valuation, which we identify as one of the key components in data-centric AI. The works in this thesis aim to address the following research question: "Can data-centric approaches be designed to identify and utilize data of greater quantity, higher quality and closer relevance, thereby enhancing the performance of machine learning models in an effective and efficient manner?" Firstly, data valuation methods typically associate the value of data with the learning performance of a trained model measured on the validation data. However, finding a suitable validation dataset is practically challenging, and achieving consensus among data providers on the selection of the validation dataset is even more difficult, given the differing interests of data providers. Another practical issue is that of data replication: Given the value of some data points, a dishonest data provider may replicate these data points to exploit the valuation for a larger reward or payment. To address these challenges, we introduce a robust volume data valuation method that leverages the inherent diversity of data without requiring a validation set. This method has a theoretical guarantee of robustness against data replication, based on the intuition that replicating the same data points does not increase the diversity of data. Secondly, existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To address this challenge, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DaVinz) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. This training-free approach not only retains the properties of conventional methods but also enhances practical efficiency, making data valuation feasible for applications involving large and complex DNNs. Then, since collecting high-quality data incurs nontrivial costs, it is important to motivate potentially competing clients to contribute more through an incentive mechanism. The existing incentive mechanisms typically reward clients with external resources (e.g., money) post-training, which could be impractical under budget constraints. To address these problems, we have introduced an incentive-aware mechanism that offers differentiated training-time model rewards for each client at each federated training iteration. We theoretically show that this mechanism incentivizes greater data contribution from clients and ensures the global objective of incentivization through a local design. Through theoretical analyses, we further identify the issue of error propagation in model rewards and thus propose a stochastic reference-model recovery strategy to ensure that all the clients eventually obtain the optimal model in the limit. This mechanism not only motivates continual data contribution but also ensures model convergence, addressing both practical and theoretical aspects of incentive mechanisms in federated learning. Lastly, we go beyond data valuation and investigate data-centric methods in the era of large language models (LLMs). LLMs have the capability to learn from a handful of input-label demonstrations (i.e., data exemplars) included in the prompt to perform downstream tasks. However, the quality of these data exemplars in the prompt greatly impacts performance. From a data-centric perspective, it is hence crucial to study how the quality and relevance of exemplars influence LLM performance and to develop data selection methods to find the optimal ordered set of exemplars for a given downstream task. We have proposed a novel method named EASE, which leverages a neural bandit algorithm to optimize ordered sets of data exemplars. Thanks to the general formulation of the data selection problem, we further extend EASE to jointly select the instruction, which is another essential component in the prompt given to the LLM. Hence, our EASE maximizes LLM performance by integrating both exemplar and instruction optimization, providing a holistic solution to automated data selection challenges in LLMs.

Algorithms for Collaborative Machine Learning: Data Sharing and Model Sharing Perspectives

Chi Thanh Lam

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2023

Abstract. The last decade has witnessed unprecedented advances in machine learning (ML), especially in deep learning. Deep neural networks possess remarkable generalization power, surpassing human-level performance on many tasks. These advances in ML crucially rely on the availability of massive and high-quality datasets. Since more data often leads to better generalization performance, it would be beneficial for multiple parties to aggregate their data and jointly train an ML model. This is known as the collaborative machine learning (CML) framework. Despite its early success, various social, economic, and legal considerations associated with sharing personal or proprietary data hinder collaborative machine learning. The recent commoditization of private data has led to the emergence of personal data ownership, which advocates for data owners to have control over the use of their data and receive compensation in return. The first work studies CML in relation to personal data ownership and addresses the challenge of data procurement from self-interested and strategic data owners. Monetary compensation is provided to data owners in exchange for their contributed data. We consider a scenario where data owners cannot fabricate data but may misreport their costs to maximize their profit. Our objective is to develop mechanisms that incentivize truthful reporting of costs while optimizing predictive performance. The proposed mechanisms draw inspiration from economic principles, game theory, and mechanism design to create a mutually beneficial situation where data owners are motivated to share their data in exchange for fair compensation. In addition to personal data ownership, growing concerns over privacy and security have hindered different parties from sharing their data. As an alternative, parties may be more willing to share machine learning models that contain distilled, useful information from the data, without revealing the architecture or parameters of their pre-trained "black-box" model. This line of research is known as model fusion (MF), which explores how parties should share pre-trained models and how to collectively learn a single ML model using these shared models. Despite recent advances, there are still important learning scenarios in which existing algorithms cannot be readily applied. We identify such important scenarios and propose a novel MF framework and algorithm for each of them. The second work investigates model fusion in a black-box and multi-task setting, in which each party's black-box model is pre-trained to solve a different yet related task. To address this multi-task challenge, we develop a new fusion paradigm that represents each expert as a distribution over a spectrum of predictive prototypes, which are isolated from task-specific information encoded within the prototype distribution. The task-agnostic prototypes can then be reintegrated to generate a new model that solves a new task encoded with a different prototype distribution. The fusion and generalization performance of the proposed framework is demonstrated empirically on several real-world benchmark datasets. The third work examines model fusion in the context of personalized learning. Production systems operating on a growing domain of analytic services often require generating warm-start solution models for emerging tasks with limited data. One potential approach to address this warm-start challenge is to adopt meta-learning to generate a base model that can be adapted to solve unseen tasks with minimal fine-tuning. This however requires the training processes of previous solution models of existing tasks to be synchronized. This is not possible if these models were pre-trained separately on private data owned by different entities at different timeframes and cannot be synchronously re-trained. To accommodate such scenarios, we develop a new personalized learning framework that synthesizes customized models for unseen tasks via the fusion of independently pre-trained models of related tasks. We establish PAC-Bayes style performance guarantees for the proposed framework and demonstrate its effectiveness on both synthetic and real datasets. In summary, this thesis contributes to the field of collaborative machine learning from two distinct perspectives. The first perspective focuses on data-centric CML, where the thesis proposes novel mechanisms that incentivize data sharing among participating parties. The second perspective of this thesis revolves around model- centric CML, where a principled collaborative intelligence framework is developed for a group of heterogeneous learning parties. Together, the results and findings of this thesis address many contemporary challenges of collaborative machine learning, including data ownership, privacy, and security concerns, and have the potential to facilitate the mainstream adoption of collaborative approaches in machine learning.

Towards Human-Centric AI: Inverse Reinforcement Learning Meets Algorithmic Fairness

Sreejith Balakrishnan

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2023

Abstract. AI decision-making systems have been widely implemented in many real-world domains where their outcomes substantially impact society. This has led to a growing emphasis on Human-centric AI, a perspective that AI should be designed with the goals and ethical principles of the end user in mind. This thesis focuses on two critical aspects of human-centric AI - value alignment and fairness. Value alignment ensures that the goals and values of the AI systems align with those of the human stakeholders. Fairness in algorithmic decisions helps prevent bias and discrimination from propagating in society. Inverse reinforcement learning (IRL) offers a potential solution to the challenge of value alignment of goals in Reinforcement Learning. Despite recent advancements, IRL suffers from an unidentifiability problem - multiple reward functions can lead to the observed expert behaviour, and the actual reward function is not identifiable without additional domain knowledge or supplementary information. To address this challenge, we adopt the perspective that an IRL algorithm should return a characterization of the reward function space instead of a single solution. The first work presented in our thesis proposes an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian optimization along with our newly proposed _-projection kernel that (a) projects the parameters of policy-invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations. In our second work, we explore algorithmic fairness, which focuses on incorporating fairness in AI-generated decisions. Our work is motivated by the realization that there is no universal definition of fairness. Furthermore, most stakeholders and policy-makers often disagree on the suitability of a fairness principle to a given scenario and are unable to anticipate the outcomes under a given fairness principle. To this end, we propose Scales, a general framework that translates various fairness principles into optimal decisions by representing them as elements of Scales-CMDP, a new variant of the Constrained Markov Decision Process (CMDP). With the help of causal language, our framework can place constraints on both the procedure of decision-making (procedural fairness) as well as the outcomes resulting from decisions (outcome fairness). Specifically, we show that fairness principles can be translated to a combination of a utility component, a non-causal component, or a causal component which can, in turn, be mapped to a Scales-CMDP. Scales is a powerful impact analysis tool that provides useful insights into decisions made under various fairness principles. We illustrate SCALES using a set of case studies involving a simulated healthcare scenario and the real-world COMPAS dataset. Our experiments in single-step and sequential decision-making scenarios demonstrate that our framework produces fair policies that abide by constraints under various fairness principles. Our final work unifies the concepts of value alignment of goals and fairness by extending the IRL problem to scenarios where the expert agent follows one of K given fairness principles hidden from the learner. The unidentifiability problem is exacerbated in this setting due to the presence of multiple valid combinations of reward functions and fairness principles that can make the expert's trajectories optimal. To tackle this challenge, we propose FAIR-BOIRL, a new BO-based IRL algorithm inspired by BO-IRL that sequentially searches for valid solutions across both the reward function space and fairness principles. To efficiently allocate the evaluation budget, FAIR-BOIRL uses IM-GPTS, a novel acquisition function that prioritizes fairness principles that are more likely to explain the observed behaviour while disregarding those that are less promising. IM-GPTS uses an implicit multi-arm bandit strategy to identify the most promising reward function and the corresponding fairness principle at each iteration of FAIR-BOIRL. IM-GPTS utilizes the popular Gaussian Process Thompson Sampling (GP-TS) and naturally trades-off exploration and exploitation across both the reward function space and fairness principles. Our experimental results indicate a significant improvement in sample efficiency under FAIR-BOIRL, with the acquisition function allocating most of the evaluation budgets to fairness principles that are more likely to generate the given expert trajectories.

Understanding and Improving Neural Architecture Search

Yao Shu

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2022

Abstract. Over the past decade, various famous deep neural network (DNN) architectures have been devised and have achieved superhuman performance for a wide range of tasks. Designing these neural networks, however, typically incurs substantial efforts from domain experts by trials and errors. Such human efforts gradually become unaffordable with an increasing demand for customizing DNNs for different tasks. To this end, neural architecture search (NAS) has been widely applied to automate the design of neural networks in recent years. In the literature, a number of NAS algorithms have been proposed, aiming to further improve the search efficiency and effectiveness of NAS, i.e., to reduce the search cost and improve the generalization performance of the selected architectures, respectively. Despite these advances, there are still certain essential aspects in NAS that have not been well investigated in the literature, which however may help us to understand and even further improve popular NAS algorithms. Firstly, only a few efforts have been devoted to understanding the neural architectures selected by popular NAS algorithms in the literature. In the first work of this thesis, we take the first step of understanding popular NAS algorithms by answering the following questions: What types of architectures are selected by popular NAS algorithms and why they are selected? In particular, we reveal that existing NAS algorithms (e.g., DARTS, ENAS) tend to favor architectures with wide and shallow cell structures. These favorable architectures consistently achieve fast convergence and are consequently selected by NAS algorithms. Our empirical and theoretical studies further confirm that their fast convergence derives from their smooth loss landscape and accurate gradient information. Nonetheless, these architectures may not necessarily lead to better generalization performance than other candidate architectures in the same search space, and therefore further improvement is possible by revising existing NAS algorithms. Secondly, standard NAS algorithms typically aim to select only a single neural architecture from the search spaces and thus have overlooked the capability of other candidate architectures in helping improve the performance of their final selected architecture. To this end, we present two novel sampling algorithms under our Neural Ensemble Search via Bayesian Sampling (NESBS) framework that can effectively and efficiently select a well-performing ensemble of neural architectures from NAS search space. Compared with state-of-the-art NAS algorithms and other well-known ensemble search baselines, our NESBS algorithms are shown to be able to achieve improved performance in both classification and adversarial defense tasks on various benchmark datasets while incurring a comparable search cost to these NAS algorithms. Thirdly, the search efficiency of popular NAS algorithms in the literature is severely limited by the need for model training during the search process. To overcome this limitation, we propose a novel NAS algorithm called NAS at Initialization (NASI) that exploits the capability of Neural Tangent Kernel (NTK) in being able to characterize the converged performance of candidate architectures at initialization, hence allowing model training to be completely avoided to boost the search efficiency. Besides the improved search efficiency, NASI also achieves competitive search effectiveness on various datasets like CIFAR-10/100 and ImageNet. Further, NASI can guarantee the benefits of being label- and data-agnostic under mild conditions, i.e., the provable transferability of architectures selected by our NASI over different datasets. Finally, though recent NAS algorithms using training-free metrics are able to select well-performing architectures in practice, the reason why training-free NAS using these metrics performs well and the answer to the question of how training-free NAS can be further boosted still have not been fully understood. To this end, we provide a unified theoretical analysis for gradient-based training-free NAS in this paper to understand why training-free metrics work well in practice. By exploiting these theoretical understandings, we then develop a novel NAS framework called Hybrid Neural Architecture Search (HNAS) that consistently improves training-free NAS in a principled way. Remarkably, HNAS can enjoy the advantages of both training-free (i.e., the superior search efficiency) and training-based NAS (i.e., the remarkable search effectiveness), which we have demonstrated through extensive experiments."

Exploiting Gradient Information for Modern Machine Learning Problems

Yizhou Chen

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2022

Abstract. Many deep learning achievements are attributed to the back-propagation (BP) algorithm, which exploits gradient information of the deep neural network (DNN) models: BP efficiently computes the gradient of the loss function with respect to the weights of a DNN for a batch of examples, and such gradient can be used by stochastic gradient descent to perform learning / optimization of the DNN model. Despite recent advances in deep learning like DNN training, there are still important scenarios where we can also use gradient to tackle optimization difficulty. In a broader aspect of deep learning rather than DNN training, a significant challenge faced by ML practitioners is thus whether we can design efficient algorithms to use the model gradient in the training / optimization in various deep learning scenarios. This thesis identifies four important scenarios and, for each of them, proposes a novel algorithm to utilize the gradient information for effective optimization that is both theoretically grounded and practically effective. Firstly, the training process of a machine learning (ML) model may be subject to adversarial attacks from an attacker who attempts to undermine the test performance of the ML model by perturbing the training minibatches, and thus needs to be protected by a defender. Such a problem setting is referred to as training-time adversarial ML. We formulate it as a two-player game and propose a principled Recursive Reasoning-based Training-Time adversarial ML (R2T2) framework to model this game. R2T2 models the reasoning process between the attacker and the defender and captures their bounded reasoning capabilities (due to bounded computational resources) through the recursive reasoning formalism. In particular, we associate a deeper level of recursive reasoning with the use of a higher-order gradient to derive the attack (defense) strategy, which naturally improves its performance while requiring greater computational resources. R2T2 can empirically achieve state-of-the-art attack and defense performances on benchmark image datasets. Secondly, probabilistic modeling with neural network architectures constitute a well-studied and popular area of deep learning. In contrast to a frequentist approach which is easy to overfit to the available dataset and risk learning unwanted biasing in the dataset, Gaussian process (GP) models were introduced as a fully probabilistic substitute and is one of the dominant approaches in Bayesian learning. A multi-layer deep Gaussian process (DGP) model is a hierarchical composition of GP models with a greater expressive power, and is more useful when dealing with complicated dataset. Exact DGP inference is intractable, and the approximation methods either yields a biased posterior belief (deterministic approximation by variational inference) or is computationally costly (stochastic approximation by Monte Carlo sampling). These difficulties have motivated our recent development of an implicit posterior variational inference (IPVI) framework for DGPs that can ideally recover an unbiased posterior belief and still preserve time efficiency. However, as a generator and a discriminator are integrated in each layer of the DGP, the training becomes unstable and is prone to optimization difficulties. To resolve such issues, we propose a novel gradient-bridging architecture of the generator and discriminator for the DGP model, which uses the inducing inputs as the context, thus leads to faster training and more accurate predictions. Empirical evaluation shows that IPVI with our proposed architecture outperforms the state-of-the-art methods for DGPs. Thirdly, many widely adopted Bayesian meta-learning frameworks model the uncertainty in the predictions with a set of particles or a variational distribution (of the meta-parameters), which does not allow latent task modeling.1 We present a novel implicit process-based meta-learning (IPML) algorithm that, in contrast to existing works, explicitly represents each task as a continuous latent vector and models its probabilistic belief within the highly expressive implicit processes (IP) framework. IP is a stochastic process with highly flexible implicit priors over functions, and is suitable as a Bayesian (meta) learning model for complicated datasets (e.g., when the priors are non-Gaussian unlike in GP). We tackle the meta-training in IPML with a novel expectation-maximization algorithm based on the stochastic gradient Hamiltonian Monte Carlo sampling method. Our delicate design of the neural network architecture for meta-training in IPML allows competitive meta-learning performance to be achieved. Unlike existing works, IPML offers the benefits of being amenable to the characterization of a principled distance measure between tasks using the maximum mean discrepancy, active task selection without needing the assumption of known task contexts, and synthetic task generation by modeling task-dependent input distributions. Empirical evaluation on benchmark datasets shows that IPML outperforms existing Bayesian meta-learning algorithms. Last but not least, in the problem of active task selection which involves selecting the most informative tasks for meta-learning, we propose a novel active task selection criterion based on the mutual information between latent task vectors. Unfortunately, such a criterion scales poorly in the number of candidate tasks when optimized. To resolve this issue, we exploit the submodularity property of our new criterion for devising the first active task selection algorithm for meta-learning with a near-optimal performance guarantee. To further improve our efficiency, we propose an online variant of the Stein variational gradient descent to perform fast belief updates of the meta-parameters via maintaining a set of forward (and backward) particles when learning (or unlearning) from each selected task. We empirically demonstrate the superior performance of our proposed algorithm on real-world datasets.

Automated Kernel Selection for Gaussian Processes on Large Datasets

Tong Teng

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2021

Abstract. In the machine learning community, how to choose an appropriate model to capture the underlying pattern of a dataset has always been of great concern. In particular, kernel selection is an important sub-domain of model selection for kernel-based models such as Gaussian processes (GPs). However, the complex data patterns and the vast space of candidate kernels render manually selecting an appropriate kernel infeasible. To this end, automated kernel selection (AKS) algorithms have been developed over the past few years. In spite of their appealing performance in discovering better kernels than manually designed ones, selecting an appropriate kernel for large (e.g., million-sized) datasets is still challenging. Moreover, the kernel uncertainty referring to how confident we are in the selected kernel was not considered, which may result in overconfident predictions. This thesis aims at developing scalable and efficient AKS algorithms for GPs that consider kernel uncertainty, and improving the predictive performance with a limited time budget. As an initial step, a scalable variational Bayesian kernel selection (VBKS) algorithm is developed to take into account the uncertainty in kernel selection for large datasets. Instead of selecting a single kernel and ignoring the uncertainty, VBKS considers the kernel as a random variable and learns its posterior belief from data with variational inference (VI). Consequently, with Bayesian model averaging (BMA), we take advantage of using multiple kernels for better predictive performance instead of relying on a single kernel. VBKS requires jointly optimizing a large number of hyperparameters of multiple kernels, which is computationally challenging. To address this challenge, the objective function is decomposed into an additive form depending on individual kernels. Stochastic gradient method is then applied to scale the algorithm up to large datasets. Evaluating a kernel on large datasets still takes a long time even with the stochastic gradient method. To further improve time efficiency, early pruning for AKS (EP-AKS) is designed. Starting with training a large set of GP models specified by different kernels, this algorithm is able to terminate the training of under-performing models in an early stage of training. To trade off between pruning effectiveness and time efficiency, the partially completed learning curves during the iterative training procedure are exploited to make prediction of the kernel performance upon convergence. As the kernels are inherently correlated, a multi-output Gaussian process (MOGP) model is adopted to simultaneously model the learning curves of multiple kernels and similarities among kernels. The EP-AKS algorithm significantly reduces the time incurred while preserving comparable predictive performance on test data to that of not applying pruning. To explore the vast kernel space more efficiently, we designed the F-Vec algorithm for figuring out potentially good kernels at a low cost. Different from the existing algorithm based on the approximated distances among kernels, F-Vec transfers the kernels originally in functional forms into feature vectors. When making comparison, we investigate various metrics for approximating kernel distance besides the one used in the literature. We empirically show that the F-Vec method leads to a more efficient exploration than the algorithm based on the approximated distances.

Sample-Efficient Automated Machine Learning with Bayesian Optimization

Zhongxiang Dai

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2021

Abstract. Automated hyperparameter optimization of machine learning (ML) models, referred to as AutoML, has been a challenging problem for practitioners, mainly due to the high computational cost of training modern ML models and the lack of gradient information with respect to the model hyperparameters. To this end, the black-box optimization method of Bayesian optimization (BO) has become a prominent method for optimizing the hyperparameters of ML models, which can be attributed to its impressive sample efficiency and theoretical convergence guarantee. Despite recent advances, there are still important scenarios where we can further improve the sample efficiency of BO for AutoML by exploiting naturally available auxiliary information, or extend the applicability of BO to other ML tasks. This thesis identifies five such important scenarios and, for each of them, proposes a novel BO algorithm that is both theoretically grounded and practically effective. Firstly, many ML models require an iterative training process, which requires every hyperparameter evaluation during BO to run for a certain number of training epochs. As a result, the auxiliary observations from intermediate training epochs can be exploited to early-stop the evaluations of those unpromising hyperparameter configurations to save resource. We propose the BO with Bayesian optimal stopping (BO-BOS) algorithm, which incorporates BOS into BO in order to improve the epoch efficiency of BO using a principled optimal stopping mechanism. BO-BOS preserves the asymptotic no-regret property of BO with our specified setting of BOS parameters which is amenable to an elegant interpretation in terms of the exploration-exploitation trade-off, and performs competitively in a number of AutoML experiments. Secondly, the widely celebrated federated learning (FL) setting requires first-order optimization techniques, and is hence unable to handle zeroth-order optimization tasks such as hyperparameter optimization. We extend BO into the FL setting (FBO) and derive the federated Thompson sampling (FTS) algorithm, to improve the efficiency of BO in the FL setting by employing auxiliary information from other agents. FTS tackles a number of major challenges faced by FBO in a principled way: FTS uses random Fourier features approximation to derive the parameters to be communicated in order to avoid sharing the raw data, adopts the Thompson sampling algorithm which reduces the number of parameters to be exchanged, and is robust against heterogeneous agents due to a robust theoretical convergence guarantee. Thirdly, the above-mentioned FTS algorithm, unfortunately, is not equipped with a rigorous privacy guarantee, which is an important consideration in FL. To this end, we integrate differential privacy (DP) into FTS through a general framework for adding DP to iterative algorithms. Moreover, we leverage the ability of this general DP framework to handle different parameter vectors, as well as the technique of local modeling for BO, to further improve the utility of our algorithm through distributed exploration (DE). The resulting DP-FTS-DE algorithm is able to improve an agent’s sample efficiency by exploiting auxiliary information from other agents, while rigorously hiding its participation in the algorithm. DP-FTS-DE is amenable to a number of interesting theoretical insights regarding the privacy-utility trade-off, and achieves competitive utilities with strong privacy guarantees in real-world experiments. Fourthly, when BO is used for hyperparameter optimization using a dataset, we often have access to previous completed hyperparameter optimization tasks using other potentially related datasets. This prompts the question as to whether we can leverage these previous completed tasks to improve the efficiency of the current BO task through meta-learning, while ensuring its robustness against dissimilar tasks. We introduce a scalable, principled and robust meta-BO algorithm called robust meta-Gaussian process-upper confidence bound (RM-GP-UCB). We show that RM-GP-UCB is asymptotically no-regret even when all previous tasks are dissimilar to the current task, and is amenable to a principled method to learn the weights assigned to the individual previous tasks through regret minimization via online learning. RM-GP-UCB achieves effective performances in a wide range of real-world experiments. Lastly, many ML tasks such as adversarial ML can be modeled as repeated games between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions. We introduce a recursive reasoning formalism of BO, called Recursive Reasoning-Based BO (R2-B2), which extends the applicability of BO to provide efficient strategies for players in this type of game. Under certain conditions, using R2-B2 to reason at one level higher than the other agents achieves faster asymptotic convergence to no regret than without using recursive reasoning. R2-B2 performs effectively in practice in adversarial ML and multi-agent reinforcement learning experiments.

Automated Machine Learning: New Advances on Bayesian Optimization

Dmitrii Kharkovskii

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2020

Abstract. Recent advances in Bayesian optimization (BO) have delivered a promising suite of tools for optimizing an unknown expensive to evaluate black-box objective function with a finite budget of evaluations. A significant advantage of BO is its general formulation: BO can be utilized to optimize any black-box objective function. As a result, BO has been applied in a wide range of applications such as automated machine learning, robotics or environmental monitoring, among others. Furthermore, its general formulation makes BO attractive for deployment in new applications. However, potential new applications can have additional requirements not satisfied by the classical BO setting. In this thesis, we aim to address some of these requirements in order to scale up BO technology for the practical use in new real-world applications. Firstly, this thesis tackles the problem of data privacy, which is not addressed by the standard setting of BO. Specifically, we consider the outsourced setting where the entity holding the dataset and the entity performing BO are represented by different parties, and the dataset cannot be released non-privately. For example, a hospital holds a dataset of sensitive medical records and outsources the BO task on this dataset to an industrial AI company. We present the private-outsourced-Gaussian process-upper confidence bound (PO-GP-UCB) algorithm, which is the first algorithm for privacy-preserving BO in the outsourced setting with a provable performance guarantee. The key idea of our approach is to make the BO performance of our algorithm similar to that of non-private GP-UCB run using the original dataset, which is achieved by using a random projection-based transformation that preserves both privacy and the pairwise distances between inputs. Our main theoretical contribution is to show that a regret bound similar to that of the standard GP-UCB algorithm can be established for our PO-GP-UCB algorithm. We empirically evaluate the performance of our algorithm with synthetic and real-world datasets. Secondly, we consider applications of BO for hotspot sampling in spatially varying phenomena. For such applications, we exploit the structure of the spatially varying phenomenon in order to increase the BO lookahead and, as a result, improve the performance of the algorithm and make it more suitable for practical use in real-world scenarios. To do this, we present a principled multi-staged Bayesian sequential decision algorithm for nonmyopic adaptive BO that, in particular, exploits macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we first generalize GP-UCB to a new acquisition function defined with respect to a nonmyopic adaptive macro-action policy, which, unfortunately, is intractable to be optimized exactly due to an uncountable set of candidate outputs. The key novel contribution of our work here is to show that it is in fact possible to solve for a nonmyopic adaptive ε-Bayes-optimal macro-action BO (ε-Macro-BO) policy given an arbitrary user-specified loss bound ε via stochastic sampling in each planning stage which requires only a polynomial number of samples in the length of macro-actions. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our ε-Macro-BO algorithm with a performance guarantee. Empirical evaluation on synthetic and real-world datasets shows that our proposed approach outperforms existing state-of-the-art algorithms. Finally, this thesis proposes a black-box attack for adversarial machine learning based on BO. Since the dimension of the inputs in adversarial learning is usually too high for applying BO directly, our proposed attack applies dimensionality reduction and searches for an adversarial perturbation in a low-dimensional latent space. The key idea of our approach is to automate both the selection of the latent space dimen- sion and the search of the adversarial perturbation in the selected latent space by using BO. Additionally, we use Bayesian optimal stopping to boost the query efficiency of our attack. Performance evaluation using image classification datasets shows that our proposed method outperforms the state-of-the-art black-box adversarial attacks.

New Advances in Bayesian Inference for Gaussian Process and Deep Gaussian Process Models

Haibin Yu

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2020

Abstract. Machine learning is the study of letting computers learn to perform a specific task in a data-driven manner. In particular, Bayesian machine learning has attracted enor- mous attention mainly due to their ability to provide uncertainty estimates following Bayesian inference. This thesis focuses on Gaussian processes (GPs), a rich class of Bayesian nonparametric models for performing Bayesian machine learning with formal measures of predictive uncertainty. However, the applicability of GP in large datasets and in hierarchical composition of GPs is severely limited by computational issues and intractabilities. Therefore, it is crucial to develop accurate and efficient inference algorithms to address these challenges. To this end, this thesis aims at proposing a series of novel approximate Bayesian inference methods for a wide variety of GP models, which unifies the previous literatures, significantly extends them and hopefully lays the foundation for future inference methods. To start with, this thesis presents a unifying perspective of existing inducing variables-based GP models, sparse GP (SGP) models and variational inference for SGP models (VSGP). Then, to further mitigate the issue of overfitting during optimization, we present a novel variational inference framework for deriving a family of Bayesian SGP regression models, referred to as variational Bayesian SGP (VBSGP) regression models. Next, taking into account the fact that the expressiveness of GP and SGP depends heavily on the design of the kernel function, we further extend the expressive power of GP by introducing Deep GP (DGP), which is a hierarchical composition of GP models. Unfortunately, exact inference in DGP is intractable, which has motivated the recent development of deterministic and stochastic approximation methods. However, the deterministic approximation methods yield a biased posterior belief while the stochastic one is computationally costly. In this regard, we present the implicit posterior variational inference (IPVI) framework for DGPs that can ideally recover an unbiased posterior belief and still preserve time efficiency. Inspired by generative adversarial networks, our IPVI framework casts the DGP inference problem as a two- player game in which a Nash equilibrium, interestingly, coincides with an unbiased posterior belief. We hope this thesis at least provides additional confidence and clarity for researchers who are devoting themselves to Bayesian nonparametric models, Gaussian process models in particular. Moreover, we also wish this thesis to offer inspirations for future works, and some thoughts that could be useful for future solutions.

An Alternative Information-Theoretic Criterion for Active Learning

Quoc Phong Nguyen

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2018

Abstract. Mutual information (MI) is a highly celebrated measure/criterion in the machine learning (ML) community, especially for active learning of ML models. However, when it cannot be evaluated exactly and needs to be approximated, the resulting active learning performance may be compromised. This thesis presents an alternative information-theoretic criterion called MI+ that, interestingly, can be related to MI in its formulation but, unlike MI, evaluated exactly and efficiently for active learning of several ML models. We provide various rigorous insights and perspectives to relate and differentiate MI vs. MI+ and discuss the practical implications of their differences to motivate the use of MI+ over MI for active learning. The advantage of using MI+ over MI is investigated for active learning of 3 different machine learning problems: Bayesian neural networks for regression, inverse reinforcement learning (IRL), and structure discovery for Gaussian processes (GP). In the Bayesian neural networks for regression, our empirical results show that MI+ can achieve the performance of a simple uncertainty sampling criterion which outperforms that of the approximate MI. In the inverse reinforcement learning, where there seems to be only a single work on active learning using the Bayesian IRL and the mean entropy (ME) criterion, our contributions include: (1) devising a Bayesian approach for nonlinear IRL with Gaussian processes (BGPIRL), that is naturally amenable to active learning, and is flexible to model nonlinear reward functions; (2) defining a general active learning problem for IRL that caters to varying realistic experts' preferences such as batch queries and demonstrated trajectories; (3) identifying the disadvantages of the current ME criterion in both computation and interpretation; then (4) introducing both MI, and MI+ criteria for active learning with BGPIRL, in which MI+ can leverage the conditional independence to achieve exact and efficient evaluation. In the structure discovery for Gaussian processes, MI+'s exact and efficient evaluation facilitates a gradient-based optimization for the batch mode active learning that outperforms the approximate MI as the batch size increases. Through the above diverse applications of MI+, together with the theoretical results, we hope that they serve as a guideline for the use of MI+ in other ML problems, and possibly, inspire the design of new active learning criteria.

Data-Efficient Machine Learning with Multiple Output Types and High Input Dimensions

Yehong Zhang

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2017

Abstract. Recent research works in machine learning (ML) have focused on learning some target variables of interest to achieve competitive (or state-of-the-art) predictive performance in less time but without requiring large quantities of data, which is known as data-efficient ML. This thesis focuses on two important data-efficient ML approaches: active learning (AL) and Bayesian optimization (BO) which, instead of learning passively from a given small set of data, need to select and gather the most informative observations for learning the target variables of interest more accurately given some budget constraints. To advance the state-of-the-art of data-efficient ML, novel generalizations of AL and BO algorithms are proposed in this thesis for addressing the issues arising from multiple output types and high input dimensions which are the practical settings in many real-world applications. In particular, this thesis aims to (a) exploit the auxiliary types of outputs which usually coexist and correlate well with the target output types, and more importantly, are less noisy and/or less tedious to sample for improving the learning performance of the target output type in both AL and BO algorithms and (b) scale up the state-of-the-art BO algorithm to high input dimensions. To achieve this, the specific data with multiple output types or high input dimensions is represented using some form of Gaussian process (GP)-based probabilistic regression models which allow the predictive uncertainty of the outputs to be formally quantified and consequently exploited for developing efficient AL and BO algorithms. To achieve above objectives, an AL algorithm of multi-output GP (MOGP) is first developed for minimizing the predictive uncertainty (i.e., posterior joint entropy) of the target output type. In contrast to existing works, our AL problems involve selecting not just the most informative sampling inputs to be observed but also the types of outputs at each selected input for improving the learning performance of only the target output type given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling inputs and selected observations when optimized. To resolve this issue, we exploit a structure common to sparse MOGP models for deriving a novel AL criterion. Furthermore, we exploit a relaxed form of submodularity property of our new criterion for devising a polynomial-time approximation algorithm that guarantees a constant-factor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for AL of MOGP and single-output GP models. Secondly, to boost the BO performance by exploiting the cheaper and/or less noisy observations of some auxiliary functions with varying fidelities, we proposed a novel generalization of predictive entropy search (PES) for multi-fidelity BO called multi-fidelity PES (MF-PES). In contrast to existing multi-fidelity BO algorithms, our proposed MF-PES algorithm can naturally trade off between exploitation vs. exploration over the target and auxiliary functions with varying fidelities without needing to manually tune any such parameters. To achieve this, we model the unknown target and auxiliary functions jointly as a convolved MOGP (CMOGP) whose convolutional structure is exploited to formally characterize the fidelity of each auxiliary function through its cross-correlation with the target function. Although the exact acquisition function of MF-PES cannot be computed in closed form, we show that it is in fact possible to derive an efficient approximation of MF-PES via a novel multi-output random features approximation of the CMOGP model whose cross-correlation (i.e., multi-fidelity) structure between the target and auxiliary functions can be exploited for improving the belief of the global target maximizer using the observations from evaluating these functions. Practical constraints are proposed to relate the global target maximizer to that of auxiliary functions. Empirical evaluation on synthetic and real-world experiments shows that MF-PES outperforms the state-of-the-art multi-fidelity BO algorithms. Lastly, to improve the BO performance in real-world applications with high input dimensions (e.g., computer vision, biology), we generalize PES for high-dimensional BO by exploiting an additive structure of the target function. New practical constraints are proposed and approximated efficiently such that the proposed acquisition function of additive PES (add-PES) can be optimized independently for each local and low-dimensional input component. The empirical results show that our add-PES considerably improves the performance of the state-of-the-art high-dimensional BO algorithms by using a simple and common setting for optimizing different tested functions with varying input dimensions, which makes it a superior alternative to existing high-dimensional BO algorithms.

Exploiting Decentralized Multi-Agent Coordination for Large-Scale Machine Learning Problems

Ruofei Ouyang

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2016

Abstract. Nowadays, the scale of machine learning problems becomes much larger than before. It raises a huge demand in distributed perception and distributed computation. A multi-agent system provides exceptional scalability for problems like active sensing and data fusion. However, many rich characteristics of large-scale machine learning problems have not been addressed yet such as large input domain, nonstationarity, and high dimensionality. This thesis identifies the challenges related to these characteristics from multi-agent perspective. By exploiting the correlation structure of data in large-scale problems, we propose multi-agent coordination schemes that can improve the scalability of the machine learning models while preserving the computation accuracy. To elaborate, the machine learning problems we are solving with multi-agent coordination techniques are: 1. Gaussian process regression. To perform distributed regression on a large-scale environmental phenomenon, data compression is often required due to the communication costs. Currently, decentralized data fusion methods encapsulate the data into local summaries based on a fixed support set. However in a large-scale field, this fixed support set, acting as a centralized component in the decentralized system, cannot approximate the correlation structure of the entire phenomenon well. It leads to evident losses in data summarization. Consequently, the regression performance will be significantly reduced. In order to approximate the correlation structure accurately, we propose an agent-centric support set to allow every agent in the data fusion system to choose a possibly different support set and dynamically switch to another one during execution for encapsulating its own data into a local summary which, perhaps surprisingly, can still be assimilated with the other agents’ local summaries into a globally consistent summary. Together with an information sharing mechanism we designed, the new decentralized data fusion methods with agent-centric support set can be applied to regression problems on a much larger environmental phenomenon with high performance. 2. Active learning. In the context of environmental sensing, active learning/active sensing is a process of taking observations to minimize the uncertainty in an environmental field. The uncertainty is quantified based on the correlation structure of the phenomenon which is traditionally assumed to be stationary for computational sake. In a large-scale environmental field, this stationary assumption is often violated. Therefore, existing active sensing algorithms perform sub-optimally for a nonstationary environmental phenomenon. To the best of our knowledge, our decentralized multi-robot active sensing (DEC-MAS) algorithm is the first work to address nonstationarity issue in the context of active sensing. The uncertainty in the phenomenon is quantified based on the nonstationary correlation structure estimated by Dirichlet process mixture of Gaussian processes. Further, our DEC-MAS algorithm can efficiently coordinate the exploration of multiple robots to automatically trade-off between learning the unknown, nonstationary correlation structure and minimizing the uncertainty of the environmental phenomenon. It enables multi-agent active sensing techniques to be applied to a large-scale nonstationary environmental phenomenon. 3. Bayesian optimization. Optimizing an unknown objective function is challenging for traditional optimization methods. Alternatively, in this situation, people use Bayesian optimization which is a modern optimization technique that can optimize a function by only utilizing the observation information (input and output values) collected through simulations. When the input dimension of the function is low, a few simulated observations can generate good result already. However, for high dimensional function, a huge number of observations are required which is impractical when the simulation consumes lots of time and resources. Fortunately, many high dimensional problems have sparse correlation structure. Our ANOVA-DCOP work can decompose the correlation structure in the original high-dimensional problem into many correlation structures of subsets of dimensions based on ANOVA kernel function. It significantly reduces the size of input space into a collection of lower-dimensional subspaces. Additionally, we reformulate the Bayesian optimization problem as a decentralized constrained optimization problem (DCOP) that can be efficiently solved by multi-agent coordination techniques so that it can scale up to problems with hundreds of dimensions.

New Advances on Bayesian and Decision-Theoretic Approaches for Interactive Machine Learning

Trong Nghia Hoang

Ph.D. Thesis, Department of Computer Science, National University of Singapore, 2015

Abstract. The exploration-exploitation trade-off is a fundamental dilemma in many interactive learning scenarios which include both aspects of reinforcement learning (RL) and active learning (AL): An autonomous agent, situated in an unknown environment, has to actively extract knowledge from the environment by taking actions (or conducting experiments) based on its previously collected information to make accurate predictions or to optimize some utility functions. Thus, to make the most effective use of their resource-constrained budget (e.g., processing time, experimentation cost), the agent must choose carefully between (a) exploiting options (e.g., actions, experiments) which are recommended by its current, possibly incomplete model of the environment, and (b) exploring the other ostensibly sub-optimal choices to gather more information. For example, an RL agent has to face a dilemma between (a) exploiting the most-rewarding action according to the current statistical model of the environment at the risk of running into catastrophic situations if the model is not accurate, and (b) exploring a sub-optimal action to gather more information so as to improve the model's accuracy at the potential price of losing the short-term reward. Similarly, an AL algorithm/agent has to consider between (a) conducting the most informative experiments according to its current estimation of the environment model's parameters (i.e., exploitation), and (b) running experiments that help improving the estimation accuracy of these parameters (i.e., exploration). More often, learning strategies that ignore exploration will likely exhibit sub-optimal performance due to their imperfect knowledge while, conversely, those that entirely focus on exploration might suffer the cost of learning without benefitting from it. Therefore, a good exploration-exploitation trade-off is critical to the success of those interactive learning agents: In order to perform well, they must strike the right balance between these two conflicting objectives. Unfortunately, while this trade-off has been well-recognized since the early days of RL, the studies of exploration-exploitation have mostly been developed for theoretical settings in the respective field of RL and, perhaps surprisingly, glossed over in the existing AL literature. From a practical point of view, we see three limiting factors: 1. Previous works addressing the exploration-exploitation trade-off in RL have largely focused on simple choices of the environment model and consequently, are not practical enough to accommodate real-world applications that have far more complicated environment structures. In fact, we find that most recent advances in Bayesian reinforcement learning (BRL) have only been able to analytically trade off between exploration and exploitation under a simple choice of models such as Flat-Dirichlet-Multinomial (FDM) whose independence and modeling assumptions do not hold for many real-world applications. 2. Nearly all of the notable works in the AL literature primarily advocate the use of greedy/myopic algorithms whose rates of convergence (i.e., the number of experiments required by the learning algorithm to achieve a desired performance in the worst case) are provably minimax optimal for simple classes of learning tasks (e.g., threshold learning). While these results have greatly ad- vanced our understanding about the limit of myopic AL in worst-case scenarios, significantly less is presently known about whether it is possible to devise nonmyopic AL strategies which optimize the exploration-exploitation trade-off to achieve the best expected performance in budgeted learning scenarios. 3. The issue of scalability of the existing predictive models (e.g., Gaussian processes) used in AL has generally been underrated since the majority of literature considers small-scale environments which only consist of a few thousand candidate experiments to be selected by single-mode AL algorithms one at a time prior to retraining the model. In contrast, large-scale environments usually have a massive set of million candidate experiments among which tens or hundreds of thousands should be actively selected for learning. For such data-intensive problems, it is often more cost-effective to consider batch-mode AL algorithms which select and conduct multiple experiments in parallel at each stage to collect observations in batch. Retraining the predictive model after incorporating each batch of observations then becomes a computational bottleneck as the collected dataset at each stage quickly grows up to tens or even hundreds of thousand data points. This thesis outlines some recent progresses that we have been able to make while working toward satisfactory answers to the above challenges, along with practical algorithms that achieve them: 1. In particular, in order to put BRL into practice for more complicated and practical problems, we propose a novel framework called Interactive Bayesian Reinforcement Learning (I-BRL) to integrate the general class of parametric models and model priors, thus allowing the practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the environment as often required in many real-world applications. Interestingly, we show how the nonmyopic Bayes-optimal policy can be derived analytically by solving I-BRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Our empirical studies show that the proposed approach performs competitively with the existing state-of-the-art algorithms. 2. Then, to establish a theoretical foundation for the exploration-exploitation trade-off in single-mode active learning scenarios with resource-constrained budgets, we present a novel ϵ-Bayes-optimal Decision-Theoretic Active Learning (ϵ-BAL) framework which advocates the use of differential entropy as a performance measure and consequently, derives a learning policy that can approximate the optimal expected performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). To meet the real-time requirement in time-critical applications, we then propose an asymptotically ϵ-optimal, branch-and-bound anytime algorithm based on ϵ-BAL with performance guarantees. In practice, we empirically demonstrate with both synthetic and real-world datasets that the proposed approach outperforms the state-of-the-art algorithms in budgeted scenarios. 3. Lastly, to facilitate the future developments of large-scale, nonmyopic AL applications, we further introduce a highly scalable family of anytime predictive models for AL which provably converge toward a well-known class of sparse Gaussian processes (SGPs). Unlike the existing predictive models of AL which cannot be updated incrementally and are only capable of processing middle-sized datasets (i.e., a few thousands of data points), our proposed models can process massive datasets in an anytime fashion, thus providing a principled trade-off between the processing time and the predictive accuracy. The efficiency of our framework is then demonstrated empirically on a variety of large-scale real-world datasets which contains hundreds of thousand data points.