My Conception of a Swarm of Agents. Courtesy of Cocoa Yeo (2008).
what's new
Invited to serve as a World Economic Forum’s Global Future Councils Fellow for the Council on the Future of Artificial Intelligence and Robotics, Sep 2016  Jun 2018


Invited to participate in Global AI Dialogue Series in Seoul, South Korea on 23 Jun 2017
hosted by Berkman Klein Center for Internet & Society at Harvard University
in collaboration with MIT Media Lab & K Governance & Media Lab


Our ICML 2017, AAAI 2017, ICML 2016, & IJCAI 2016 submissions are accepted!


Source codes for our parallel/distributed Gaussian processes (ICML 2016,
AAAI 2015, UAI 2013) are now available!


Source codes for our online/anytime Gaussian processes (ICML 2015,
AAAI 2014) are now available!


MOE AcRF Tier 2 Grant : Scaling up Gaussian Process Predictive Models for Big Data, SGD $737,461, Jul 2017  Jul 2020


SMART Subaward Agreement  FM IRG :
Automatic Probabilistic Machine Learning for Traffic Modeling and Prediction, SGD $45,000, Apr 2017  Mar 2018


Invited to serve as senior program committee members of AAMAS 2018, IJCAI 2015,
associate editors of IROS 2012 & ICRA 2011,
program committee members of AAAI 2010, 20162018, IJCAI 2011, 2015, 2017, AAMAS 20112014, 2016, RSS 2014,
ICAPS 20102012, 2018, and
reviewer of NIPS 20132016


Students interested to join MapleCG, click here for more info


members
I am looking for talented undergraduate and graduate students in NUS to join my MapleCG research group.
If you are really excited and motivated to be involved in novel research in the fields of
artificial intelligence, planning under uncertainty (i.e., decisiontheoretic, informationtheoretic),
robotics, multiagent systems (i.e., multiagent coordination, planning, and learning), game theory,
statistical machine learning, optimization, and/or swarm intelligence, please email me and we can set
up a time for discussion. Please also take some time to view our research projects.
I am currently advising the following students and research staff:

Dai, Zhongxiang
Ph.D. (Coadvised by Patrick Jaillet, MIT)
Recipient of SMART SMA3 Graduate Fellowship & ST Electronics Prizes for being the top Year 1 and Year 2 student in Electrical Engineering
B.Eng. in Electrical Engineering (Hons. 1st Class) > National University of Singapore
Research Interests: machine learning


Ryutaro Oikawa 及川竜太郎
Ph.D.
M.Sc. in Statistics > Imperial College London
B.Eng in Social Engineering > Tokyo Institute of Technology
Research Interests: reinforcement learning


Sreejith Balakrishnan
Ph.D.
M.Eng. in Electrical Engineering with Specialization in Computer Engineering > National University of Singapore
B.Eng. in Electrical and Electronics Engineering (Hons. 1st Class) with Minor in Computing > Nanyang Technological University
Research Interests: robotics, machine learning


Zhang, Jingfeng 张景锋
Ph.D. (Coadvised by Mohan Kankanhalli)
B.Eng. in Computer Science and Technology > Taishan College > Shandong University
Research Interests: privacypreserving machine learning


Teng, Tong 滕茼
Ph.D.
B.Eng. in Computer Science > Shandong University
Research Interests: probabilistic machine learning


Dmitrii Kharkovskii Дмитрий Алексеевич Харьковский
Ph.D.
Specialist degree in Math > St. Petersburg State University
Research Interests: privacypreserving machine learning, Bayesian optimization


Yu, Haibin 于海斌
Ph.D. (Coadvised by Patrick Jaillet, MIT)
Recipient of SMART SMA3 Graduate Fellowship
B.Eng. in Mechanical Engineering and Automation > Beihang University
Research Interests: Bayesian deep learning


Nguyễn, Quốc Phong
Ph.D. (Coadvised by Patrick Jaillet, MIT)
Recipient of Research Achievement Award, SMART SMA3 Graduate Fellowship, Lee Kuan Yew Gold Medal for best performing graduate in B.Eng. (Computing Engineering) programme, & IES Gold Medal for top graduating student in B.Eng. in Computer Engineering
B.Eng. in Computer Engineering > National University of Singapore, 2013
Research Interests: probabilistic machine learning, inverse reinforcement learning


Zhang, Yehong 张叶红
Ph.D. (Coadvised by Mohan Kankanhalli)
Recipient of AAAI 2016 Scholarship & Research Achievement Award
B.Eng. in Computer Science > Harbin Institute of Technology
Research Interests: probabilistic machine learning, active learning, Bayesian optimization


Chen, Jie 陈杰
Postdoctoral Associate, SMART FM (Coadvised by Patrick Jaillet, MIT)
Recipient of Dean's Graduate Research Excellence Award, Research Achievement Award, & UAI 2012 Scholarship
B.Eng. in Electrical Engineering > Taizhou University
M.Eng. in Computer Science > Zhejiang University
Ph.D. in Computer Science > National University of Singapore, Dec 2013
Ph.D. Thesis: Gaussian ProcessBased Decentralized Data Fusion & Active Sensing Agents: Towards LargeScale Modeling & Prediction of Spatiotemporal Traffic Phenomena
Research Interests: robotics, multiagent planning, machine learning


Lim, Kar Wai
Research Fellow, NUSSingtel Cyber Security Research and Development Laboratory (Coadvised by Mun Choon Chan)
Recipient of ACML 2016 Best Student Paper Award & AMP Prize for Honours Thesis in Actuarial Studies (Best Thesis Award)
Bachelor of Actuarial Studies (Hons. 1st Class) > Australian National University
Ph.D. in Computer Science > Australian National University, Dec 2016
Ph.D. Thesis: Nonparametric Bayesian Topic Modelling with Auxiliary Data
Research Interests: machine learning, Bayesian nonparametric methods, stochastic processes, point processes, Hawkes processes, Bayesian inference, Markov chain Monte Carlo methods


Hoang, Quang Minh
Research Assistant
Recipient of Lee Kuan Yew Gold Medal for best performing graduate in B.Comp. (Computational Biology) programme, Outstanding Undergraduate Researcher Award in National University of Singapore, & ICML 2015 Scholarship
B.Comp. in Computational Biology (Hons. 1st Class) > National University of Singapore, 2016
FYP Dissertation: A Probabilistic Approach for Protein Function Prediction with Hierarchical Structured Outputs
Research Interests: machine learning

Former Members

Hoàng, Trọng Nghĩa
Postdoctoral Fellow, MIT (Advised by Jonathan How)
Recipient of Dean's Graduate Research Excellence Award, President's Graduate Fellowship, Research Achievement Awards x 2, IJCAI 2013 Travel Grant Award, & AAMAS 2012 Scholarship
B.Sc. in Computer Science (Hons) > University of Science > Vietnam National University
Ph.D. in Computer Science > National University of Singapore, Feb 2015
Ph.D. Thesis: New Advances on Bayesian and DecisionTheoretic Approaches for Interactive Machine Learning
Research Interests: machine learning, multiagent planning
+ oral defense ◊ jan 21 ◊ 2K+15


Ouyang, Ruofei 欧阳若飞
Data Scientist, Wecash (Southeast Asia) Pte. Ltd.
Recipient of AAMAS 2014 Scholarship
B.Sc. in Computer Science > East China Normal University
Ph.D. in Computer Science > National University of Singapore, Dec 2016
Ph.D. Thesis: Exploiting Decentralized Multiagent Coordination for LargeScale Machine Learning Problems
Research Interests: Gaussian process, active sensing/learning, data fusion, Bayesian optimization
+ oral defense ◊ nov 7 ◊ 2K+16


Xu, Nuo 许诺
Data Scientist, Grab
Recipient of AAAI 2014 Scholarship & Research Achievement Award
Bachelor of Software Engineering > Harbin Institute of Technology
Ph.D. in Computer Science > National University of Singapore, Jan 2017
Ph.D. Thesis: Online Gaussian Process Filtering for Persistent Robot Localization With Arbitrary Sensor Modalities
Research Interests: machine learning, robotics
+ oral defense ◊ jan 11 ◊ 2K+17


Prabhu Natarajan
Assistant Professor, DigiPen Institute of Technology Singapore, Jun 2016
Recipient of ICDSC 2012 Best PhD Forum Paper Award, Research Achievement Award, & AAMAS 2012 Scholarship
B.Tech. in Information Technology > Anna University
M.Eng. in Computer Science & Engineering > Anna University
Ph.D. in Computer Science > National University of Singapore, Dec 2013 (Coadvised by Mohan Kankanhalli)
Ph.D. Thesis: A DecisionTheoretic Approach for Controlling & Coordinating Multiple Active Cameras in Surveillance
Research Interests: multicamera surveillance, decisiontheoretic planning & control for sensor networks
+ commencement ceremony ◊ aug 13 ◊ 2K+14


Tan, Wesley
Ph.D. in Computer Science, Nanyang Technological University, Aug 2017
Recipient of President's Graduate Fellowship in Nanyang Technological University
B.Sc. (Honors) in Economics > Purdue University
M.Sc. in Risk Management & Financial Engineering > Imperial College London, Oct 2015
M.Comp. in Computer Science > National University of Singapore, Jun 2017
M.Sc. Thesis: Variational Bayesian Actor Critic
Research Interests: reinforcement learning


Son, Jaemin 손재민
Bachelor of Comp Sci & Eng > Osaka University
M.Sc. in Computer Science > National University of Singapore, Nov 2016 (Coadvised by Gary Tan)
M.Sc. Thesis: HighDimensional Bayesian Optimization with Application to Traffic Simulation
Research Interests: machine learning


Cao, Nannan 曹楠楠
Former Research Assistant
Bachelor of Software Engineering > East China Normal University
M.Sc. in Computer Science > National University of Singapore, Sep 2012
M.Sc. Thesis: InformationTheoretic MultiRobot Path Planning
Research Interests: machine learning, Gaussian process, environmental sensing
+ commencement ceremony ◊ sep 19 ◊ 2K+13


Etkin Barış Özgül
B.Sc. in Computer Science > Bilkent University
M.Sc. in Computer Science > National University of Singapore, Jan 2017
M.Sc. Thesis: Shuttleline Routing for MobilityonDemand Systems with Ridesharing
Research Interests: AI, multiagent systems


Ling, Chun Kai
Ph.D. in Computer Science, CMU, Aug 2017
Recipient of Lee Kuan Yew Gold Medal for best performing graduate in B.Eng. (Computing Engineering) programme, IES Gold Medal for top graduating student in B.Eng. in Computing Engineering, Defence Science Technology Agency Gold Medal for best local final year student for the degree of B.Eng. (Computer Engineering), Micron Prize for being one of the top two local Year 2 Computer Engineering students, & AlcatelLucent Telecommunications Prize for best performance in a module in the area of Communications and Networks in BEng (EE) or BEng (CEG) examinations
B.Eng. in Computer Engineering > National University of Singapore, 2015
FYP Dissertation: Planning and Learning in Spatiotemporal Environmental Phenomena
Research Interests: planning under uncertainty, machine learning


Erik Alexander Daxberger
M.Sc. in Computer Science, ETH Zurich, Sep 2017
Recipient of LMUexchange and PROSA scholarships for a student exchange program at NUS & ICML 2017 Travel Award
B.Sc. in Computer Science > LudwigMaximiliansUniversität, Munich, 2017
B.Sc. Thesis: Distributed Batch Bayesian Optimization
Research Interests: machine learning, AI


Khor, ShiJie
Software Engineer, Facebook Asia Pacific HQ
Recipient of Lee Kuan Yew Gold Medal for best performing graduate in B.Comp. (Computer Science) programme, IEEE Singapore Computer Society Book Prize for the best student in the Honours Year term project, and Tata Consultancy Services Asia Pacific Prize
B.Comp. in Computer Science (Honours Highest Distinction) > National University of Singapore, 2016
FYP Dissertation: Kernel Search for Gaussian Processes
Research Interests: machine learning


Nathan Azaria
Software Engineer, Traveloka
Recipient of National Computer Systems Medal And Prize for the top student in B.Comp. (Computer Science) programme
B.Comp. in Computer Science (Honours Highest Distinction) > National University of Singapore, 2016
FYP Dissertation: Stochastic Variational Inference on MultiOutput Gaussian Process
Research Interests: machine learning


Lim, Keng Kiat
Software Engineer, Facebook HQ
B.Comp. in Computer Science > National University of Singapore, 2016
FYP Dissertation: Learning with HighDimensional Data
Research Interests: machine learning


Akshay Viswanathan
Software Engineer, Visa Inc.
B.Eng. in Computer Engineering (Hons. 1st Class) > National University of Singapore, 2015
FYP Dissertation: Scaling up Machine Learning Techniques via Parallelization for Large Data
Research Interests: machine learning


Shailendra Khemka
Business Solutions: Software Engineer, Deutsche Bank AG  SG Branch
Recipient of Tata Consultancy Services Asia Pacific Medal and Prize for 2nd best graduate throughout the course of study for B.Comp, Defence Science & Technology Agency Prize for top UROP student in B.Comp., & Sung Kah Kay Memorial Prize Winner in NUS University Scholars Programme (USP)
University Scholars Programme & von Neumann Programme for B.Comp in Computer Science > National University of Singapore, 2013
FYP Dissertation: Autonomous Search for Victims in a Disaster Situation
Research Interests: multiagent planning


Yu, Jiangbo 余江波
KAI Square
B.Sc. in Computer Science > Peking University
Research Interests: statistical machine learning

publications
MOST FREQUENTLY USED WORDS ^{}extracted from paper titles and abstracts^{}
ACCEPTED PAPERS
Coauthors : My students and postdocs Former thesis advisors Collaborators
 Artificial Intelligence Research in Singapore: Assisting the Development of a Smart Nation.
Pradeep Varakantham, Bo An, Kian Hsiang Low & Jie Zhang.
In AI Magazine, volume 38, issue 3, pages 102105, Fall 2017.
Abstract. Artificial Intelligence (AI) research in Singapore is focused on accelerating the country’s development into a Smart Nation. Specifically, AI has been employed extensively in either augmenting the intelligence of humans or in developing automated methods and systems to improve quality of life in Singapore.

REFEREED PUBLICATIONS
dblp
Sorted by year 2K + 17 
16 
15 
14 
13 
12 
11 
10 
9 
8 
7 
6 
5 
4 
3 
2
 Distributed Batch Gaussian Process Optimization.
Erik Daxberger^{} & Kian Hsiang Low.
In Proceedings of the 34th International Conference on Machine Learning (ICML17), pages 951960, Sydney, Australia, Aug 611, 2017.
25.9% acceptance rate
Abstract. This paper presents a novel distributed batch Gaussian process upper confidence bound (DBGPUCB) algorithm for performing batch Bayesian optimization (BO) of highly complex, costlytoevaluate blackbox objective functions. In contrast to existing batch BO algorithms, DBGPUCB can jointly optimize a batch of inputs (as opposed to selecting the inputs of a batch one at a time) while still preserving scalability in the batch size. To realize this, we generalize GPUCB to a new batch variant amenable to a Markov approximation, which can then be naturally formulated as a multiagent distributed constraint optimization problem in order to fully exploit the efficiency of its stateoftheart solvers for achieving linear time in the batch size. Our DBGPUCB algorithm offers practitioners the flexibility to trade off between the approximation quality and time efficiency by varying the Markov order. We provide a theoretical guarantee for the convergence rate of DBGPUCB via bounds on its cumulative regret. Empirical evaluation on synthetic benchmark objective functions and a realworld optimization problem shows that DBGPUCB outperforms the stateoftheart batch BO algorithms.
 A Generalized Stochastic Variational Bayesian Hyperparameter Learning Framework for Sparse Spectrum Gaussian Process Regression.
Quang Minh Hoang^{}, Trong Nghia Hoang^{} & Kian Hsiang Low.
In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI17), pages 20072014, San Francisco, CA, Feb 49, 2017.
24.6% acceptance rate (oral presentation)
Abstract. While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of lowrank GP approximations that exploit the sparse spectral representation of a GP kernel. This paper presents such an effort to advance the state of the art of sparse spectrum GP models to achieve competitive predictive performance for massive datasets. Our generalized framework of stochastic variational Bayesian sparse spectrum GP (sVBSSGP) models addresses their shortcomings by adopting a Bayesian treatment of the spectral frequencies to avoid overfitting, modeling these frequencies jointly in its variational distribution to enable their interaction a posteriori, and exploiting local data for boosting the predictive performance. However, such structural improvements result in a variational lower bound that is intractable to be optimized. To resolve this, we exploit a variational parameterization trick to make it amenable to stochastic optimization. Interestingly, the resulting stochastic gradient has a linearly decomposable structure that can be exploited to refine our stochastic optimization method to incur constant time per iteration while preserving its property of being an unbiased estimator of the exact gradient of the variational lower bound. Empirical evaluation on realworld datasets shows that sVBSSGP outperforms stateoftheart stochastic implementations of sparse GP models.
 A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models.
Trong Nghia Hoang^{}, Quang Minh Hoang^{} & Kian Hsiang Low.
In Proceedings of the 33rd International Conference on Machine Learning (ICML16), pages 382391, New York City, NY, Jun 1924, 2016.
24.3% acceptance rate
Abstract. This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a highorder Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two realworld datasets.
 NearOptimal Active Learning of MultiOutput Gaussian Processes.
Yehong Zhang^{}, Trong Nghia Hoang^{}, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI16), pages 23512357, Phoenix, AZ, Feb 1217, 2016.
25.75% acceptance rate
Abstract. This paper addresses the problem of active learning of a multioutput Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models for deriving a novel active learning criterion. Then, we exploit a relaxed form of submodularity property of our new criterion for devising a polynomialtime approximation algorithm that guarantees a constantfactor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on realworld datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and singleoutput GP models.
 Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond.
Chun Kai Ling^{}, Kian Hsiang Low & Patrick Jaillet^{}.
In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI16), pages 18601866, Phoenix, AZ, Feb 1217, 2016.
25.75% acceptance rate
Abstract. This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the explorationexploitation tradeoff. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive ϵoptimal GPP (ϵGPP) policy. To plan in real time, we further propose an asymptotically optimal, branchandbound anytime variant of ϵGPP with performance guarantee. We empirically demonstrate the effectiveness of our ϵGPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.
 DrMAD: Distilling ReverseMode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks.
Jie Fu, Hongyin Luo, Jiashi Feng, Kian Hsiang Low & TatSeng Chua.
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI16), pages 14691475, New York City, NY, Jul 915, 2016.
<25% acceptance rate
Abstract.
The performance of deep neural networks is wellknown to be sensitive to the setting of their hyperparameters. Recent advances in reversemode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually needs to consume unaffordable memory to store all the intermediate variables to exactly reverse the forward training procedure. In this work, we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on two image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to stateoftheart methods for optimizing hyperparameters with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks.
 MultiAgent Continuous Transportation with Online Balanced Partitioning.
Chao Wang^{}, Somchaya Liemhetcharat^{} & Kian Hsiang Low.
In Proceedings of the
15th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS16), pages 13031304, Singapore, May 913, 2016.
Abstract. We introduce the concept of continuous transportation task to the context of multiagent systems. A continuous transportation task is one in which a multiagent team visits a number of fixed locations, picks up objects, and delivers them to a transportation hub. The goal is to maximize the rate of transportation while the objects are replenished over time. In this extended abstract, we present a hybrid of centralized and distributed approaches that minimize communications in the multiagent team. We contribute a novel online partitioningtransportation algorithm with information gathering in the multiagent team.
 Conceptbased Hybrid Fusion of Multimodal Event Signals.
Yuhui Wang, Christian von der Weth, Yehong Zhang^{}, Kian Hsiang Low, Vivek Singh & Mohan Kankanhalli^{}.
In Proceedings of the
IEEE International Symposium on Multimedia (ISM'16), pages 1419, San Jose, CA, Dec 1113, 2016.
26.1% acceptance rate
Abstract. Recent years have seen a significant increase in the number of sensors and resulting event related sensor data, allowing for a better monitoring and understanding of realworld events and situations. Eventrelated data come from not only physical sensors (e.g., CCTV cameras, webcams) but also social or microblogging platforms (e.g., Twitter). Given the widespread availability of sensors, we observe that sensors of different modalities often independently observe the same events. We argue that fusing multimodal data about an event can be helpful for more accurate detection, localization and detailed description of events of interest. However, multimodal data often include noisy observations, varying information densities and heterogeneous representations, which makes the fusion a challenging task. In this paper, we propose a hybrid fusion approach that takes the spatial and semantic characteristics of sensor signals about events into account. For this, we first adopt the concept of an imagebased representation that expresses the situation of particular visual concepts (e.g., "crowdedness", "people marching") called Cmage for both physical and social sensor data. Based on this Cmage representation, we model sparse sensor information using a Gaussian process, fuses multimodal event signals with a Bayesian approach, and incorporates spatial relations between the sensor and social observations. We demonstrate the effectiveness of our approach as a proofofconcept over realworld data. Our early results show that the proposed approach can reliably reduce the sensorrelated noise, localize event place, improve event detection reliability, and add semantic context so that the fused data provide a better picture of the observed events or situations.
 Inverse Reinforcement Learning with Locally Consistent Reward Functions.
Quoc Phong Nguyen^{}, Kian Hsiang Low & Patrick Jaillet^{}.
In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, editors, Advances in Neural Information Processing Systems 28: 29th Annual Conference on Neural Information Processing Systems (NIPS15), pages 17471755, Curran Associates, Inc., Montreal, Canada, Dec 712, 2015.
21.9% acceptance rate
Abstract. Existing inverse reinforcement learning (IRL) algorithms have assumed each expert's demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts' behaviors.
Solving our generalized IRL problem thus involves not only learning these reward functions but also the stochastic transitions between them at any state (including unvisited states).
By representing our IRL problem with a probabilistic graphical model, an expectationmaximization (EM) algorithm can be devised to iteratively learn the reward functions and stochastic transitions between them in order to jointly improve the likelihood of the expert's demonstrated trajectories.
As a result, the most likely partition of a trajectory into segments that are generated from different locally consistent reward functions selected by EM can be derived.
Empirical evaluation on synthetic and realworld datasets shows that our IRL algorithm outperforms the stateoftheart EM clustering with maximum likelihood IRL, which is, interestingly, a reduced variant of our approach.
 Gaussian Process Decentralized Data Fusion and Active Sensing for Spatiotemporal Traffic Modeling and Prediction in MobilityonDemand Systems.
Jie Chen^{}, Kian Hsiang Low, Patrick Jaillet^{} & Yujian Yao^{}.
In IEEE Transactions on Automation Science and Engineering
(Special Issue on Networked Cooperative Autonomous Systems), volume 12, issue 3, pages 901921, Jul 2015.
Abstract. Mobilityondemand (MoD) systems have recently emerged as a
promising paradigm of oneway vehicle sharing for sustainable personal
urban mobility in densely populated cities. We assume the capability of
a MoD system to be enhanced by deploying robotic shared vehicles that
can autonomously cruise the streets to be hailed by users. A key
challenge of the MoD system is that of realtime, finegrained mobility
demand and traffic flow sensing and prediction. This paper presents
novel Gaussian process (GP) decentralized data fusion and active
sensing algorithms for realtime, finegrained traffic modeling and
prediction with a fleet of MoD vehicles. The predictive performance of
our decentralized data fusion algorithms are theoretically guaranteed to
be equivalent to that of sophisticated centralized sparse GP
approximations. We derive consensus filtering variants requiring only
local communication between neighboring vehicles. We theoretically
guarantee the performance of our decentralized active sensing
algorithms. When they are used to gather informative data for mobility
demand prediction, they can achieve a dual effect of fleet rebalancing
to service mobility demands. Empirical evaluation on realworld datasets
shows that our algorithms are significantly more timeefficient and
scalable in the size of data and fleet while achieving predictive
performance comparable to that of stateoftheart algorithms.
 A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data.
Trong Nghia Hoang^{}, Quang Minh Hoang^{} & Kian Hsiang Low.
In Proceedings of the 32nd International Conference on Machine Learning (ICML15), pages 569578, Lille, France, Jul 611, 2015.
26.0% acceptance rate
Abstract. This paper presents a novel unifying framework of anytime sparse Gaussian process regression (SGPR) models that can produce good predictive performance fast and improve their predictive performance over time. Our proposed unifying framework reverses the variational inference procedure to theoretically construct a nontrivial, concave functional that is maximized at the predictive distribution of any SGPR model of our choice.
As a result, a stochastic natural gradient ascent method can be derived that involves iteratively following the stochastic natural gradient of the functional to improve its estimate of the predictive distribution of the chosen SGPR model
and is guaranteed to achieve asymptotic convergence to it. Interestingly, we show that if the predictive distribution of the chosen SGPR model
satisfies certain decomposability conditions, then the stochastic natural gradient is an unbiased estimator of the exact natural gradient and can be computed in constant time (i.e., independent of data size) at each iteration. We empirically evaluate the tradeoff between the predictive performance vs. time efficiency of the anytime SGPR models on two realworld millionsized datasets.
 Parallel Gaussian Process Regression for Big Data: LowRank Representation Meets Markov Approximation.
Kian Hsiang Low, Jiangbo Yu^{}, Jie Chen^{} & Patrick Jaillet^{}.
In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI15), pages 28212827, Austin, TX, Jan 2529, 2015.
26.67% acceptance rate
Abstract. The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size.
To improve its scalability, this paper presents a lowrankcumMarkov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a lowrank approximate representation of the fullrank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the KullbackLeibler distance criterion subject to some constraint
and is considerably more refined than that of existing sparse GP models utilizing lowrank representations due to its more relaxed conditional independence assumption (especially with larger data).
As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale.
Interestingly, varying the Markov order produces a spectrum of LMAs
with PIC approximation and fullrank GP at the two extremes.
An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability.
Empirical evaluation on three realworld datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more timeefficient and scalable than stateoftheart sparse and fullrank GP regression methods
while achieving comparable predictive performances.
 Recent Advances in Scaling up Gaussian Process Predictive Models for Large Spatiotemporal Data.
Kian Hsiang Low, Jie Chen^{}, Trong Nghia Hoang^{}, Nuo Xu^{} & Patrick Jaillet^{}.
In S. Ravela, A. Sandu, editors,
Dynamic DataDriven Environmental Systems Science  First International Conference, DyDESS'14, LNCS 8964, pages 167181, Springer International Publishing, MIT, Cambridge, MA, Nov 57, 2014.
Oral presentation
Abstract. The expressive power of Gaussian process (GP) models comes at a cost of poor scalability in the size of the data. To improve their scalability, this paper presents an overview of our recent progress in scaling up GP models for large spatiotemporally correlated data through parallelization on clusters of machines, online learning, and nonmyopic active sensing/learning.
 MultiAgent Ad Hoc Team Partitioning by Observing and Modeling SingleAgent Performance.
Etkin Baris Ozgul^{}, Somchaya Liemhetcharat^{} & Kian Hsiang Low.
In Proceedings of the
AsiaPacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC'14), pages 17, Siem Reap, city of Angkor Wat, Cambodia, Dec 912, 2014.
Abstract. Multiagent research has focused on finding the optimal team for a task. Many approaches assume that the performance of the agents are known a priori. We are interested in ad hoc teams, where the agents' algorithms and performance are initially unknown. We focus on the task of modeling the performance of single agents through observation in training environments, and using the learned models to partition a new environment for a multiagent team. The goal is to minimize the number of agents used, while maintaining a performance threshold of the multiagent team. We contribute a novel model to learn the agent's performance through observations, and a partitioning algorithm that minimizes the team size. We evaluate our algorithms in simulation, and show the efficacy of our learn model and partitioning algorithm.
 Scalable DecisionTheoretic Coordination and Control for Realtime Active MultiCamera Surveillance.
Prabhu Natarajan^{}, Trong Nghia Hoang^{}, Yongkang Wong, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the
8th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC'14) (Invited Paper to Special Session on Smart Cameras for Smart Environments), pages 115120, Venezia, Italy, Nov 47, 2014.
Abstract. This paper presents an overview of our novel decisiontheoretic multiagent approach for controlling and coordinating multiple active cameras in surveillance. In this approach, a surveillance task is modeled as a stochastic optimization problem, where the active cameras are controlled and coordinated to achieve the desired surveillance goal in presence of uncertainties. We enumerate the practical issues in active camera surveillance and discuss how these issues are addressed in our decisiontheoretic approach. We focus on two novel surveillance tasks: maximize the number of targets observed in active cameras with guaranteed image resolution and to improve the fairness in observation of multiple targets. We discuss the overview of our novel decisiontheoretic frameworks: Markov decision process and partially observable Markov decision process frameworks for coordinating active cameras in uncertain and partially occluded environments.
 Active Learning is Planning: Nonmyopic ϵBayesOptimal Active Learning of Gaussian Processes.
Trong Nghia Hoang^{}, Kian Hsiang Low, Patrick Jaillet^{} and Mohan Kankanhalli^{}.
In T. Calders, F. Esposito, E. Hüllermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases  European Conference, ECML/PKDD14 Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, pages 494498, Springer Berlin Heidelberg, Nancy, France, Sep 1519, 2014.
Abstract. A fundamental issue in active learning of Gaussian processes is that of the explorationexploitation tradeoff. This paper presents a novel nonmyopic ϵBayesoptimal active learning (ϵBAL) approach that jointly optimizes the tradeoff. In contrast, existing works have primarily developed greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ϵBAL with performance guarantee and empirically demonstrate using a realworld dataset that, with limited budget, it outperforms the stateoftheart algorithms.
 Generalized Online Sparse Gaussian Processes with Application to Persistent Mobile Robot Localization.
Kian Hsiang Low, Nuo Xu^{}, Jie Chen^{}, Keng Kiat Lim^{} & Etkin Baris Ozgul^{}.
In T. Calders, F. Esposito, E. Hüllermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases  European Conference, ECML/PKDD14 Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, pages 499503, Springer Berlin Heidelberg, Nancy, France, Sep 1519, 2014.
Abstract. This paper presents a novel online sparse Gaussian process (GP) approximation method that is capable of achieving constant time and memory (i.e., independent of the size of the data) per time step. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated offline sparse GP approximation method. We empirically demonstrate the practical feasibility of using our online sparse GP approximation method through a realworld persistent mobile robot localization experiment.
 No One is Left "Unwatched": Fairness in Observation of Crowds of Mobile Targets in Active Camera Surveillance.
Prabhu Natarajan^{}, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the
21st European Conference on Artificial Intelligence (ECAI14), including Prestigious Applications of Intelligent Systems (PAIS14), pages 11551160, Prague, Czech Republic, Aug 1822, 2014.
Abstract. Central to the problem of active multicamera surveillance is the fundamental issue of fairness in the observation of crowds of targets such that no target is "starved" of observation by the cameras for a long time. This paper presents a principled decisiontheoretic multicamera coordination and control (MC^{2}) algorithm called fairMC^{2} that can coordinate and control the active cameras to achieve maxmin fairness in the observation of crowds of targets moving stochastically. Our fairMC^{2} algorithm is novel in demonstrating how (a) the uncertainty in the locations, directions, speeds, and observation times of the targets arising from the stochasticity of their motion can be modeled probabilistically, (b) the notion of fairness in observing targets can be formally realized in the domain of multicamera surveillance for the first time by exploiting the maxmin fairness metric to formalize our surveillance objective, that is, to maximize the expected minimum observation time over all targets while guaranteeing a predefined image resolution of observing them, and (c) a structural assumption in the state transition dynamics of a surveillance environment can be exploited to improve its scalability to linear time in the number of targets to be observed during surveillance. Empirical evaluation through extensive simulations in realistic surveillance environments shows that fairMC^{2} outperforms the stateoftheart and baseline MC^{2} algorithms. We have also demonstrated the feasibility of deploying our fairMC^{2} algorithm on real AXIS 214 PTZ cameras.
 GPLocalize: Persistent Mobile Robot Localization using Online Sparse Gaussian Process Observation Model.
Nuo Xu^{}, Kian Hsiang Low, Jie Chen^{}, Keng Kiat Lim^{} & Etkin Baris Ozgul^{}.
In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI14), pages 25852592, Quebec City, Canada, Jul 2731, 2014.
16.6% acceptance rate (oral presentation)
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. Central to robot exploration and mapping is the task of persistent localization in environmental fields characterized by spatially correlated measurements. This paper presents a Gaussian process localization (GPLocalize) algorithm that, in contrast to existing works, can exploit the spatially correlated field measurements taken during a robot's exploration (instead of relying on prior training data) for efficiently and scalably learning the GP observation model online through our proposed novel online sparse GP. As a result, GPLocalize is capable of achieving constant time and memory (i.e., independent of the size of the data) per filtering step, which demonstrates the practical feasibility of using GPs for persistent robot localization and autonomy. Empirical evaluation via simulated experiments with realworld datasets and a real robot experiment shows that GPLocalize outperforms existing GP localization algorithms.
 Nonmyopic ϵBayesOptimal Active Learning of Gaussian Processes.
Trong Nghia Hoang^{}, Kian Hsiang Low, Patrick Jaillet^{} and Mohan Kankanhalli^{}.
In Proceedings of the 31st International Conference on Machine Learning (ICML14), pages 739747, Beijing, China, Jun 2126, 2014.
22.4% acceptance rate (cycle 2)
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. A fundamental issue in active learning of Gaussian processes is that of the explorationexploitation tradeoff.
This paper presents a novel nonmyopic ϵBayesoptimal active learning (ϵBAL) approach that jointly and naturally optimizes the tradeoff.
In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately.
To perform active learning in real time, we then propose an anytime algorithm based on ϵBAL with performance guarantee and empirically demonstrate using synthetic and realworld datasets that, with limited budget, it outperforms the stateoftheart algorithms.
 MultiRobot Active Sensing of NonStationary Gaussian ProcessBased Environmental Phenomena.
Ruofei Ouyang^{}, Kian Hsiang Low, Jie Chen^{} & Patrick Jaillet^{}.
In Proceedings of the
13th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS14), pages 573580, Paris, France, May 59, 2014.
23.8% acceptance rate
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. A key challenge of environmental sensing and monitoring is that of sensing, modeling, and predicting largescale, spatially correlated environmental phenomena, especially when they are unknown and nonstationary.
This paper presents a decentralized multirobot active sensing (DECMAS) algorithm that can efficiently coordinate the exploration of multiple robots to gather the most informative observations for predicting an unknown, nonstationary phenomenon.
By modeling the phenomenon using a Dirichlet process mixture of Gaussian processes (DPMGPs), our work here is novel in demonstrating how DPMGPs and its structural properties can be exploited to (a) formalize an active sensing criterion that trades off between gathering the most informative observations for estimating the unknown, nonstationary spatial correlation structure vs. that for predicting the phenomenon given the current, imprecise estimate of the correlation structure, and (b) support efficient decentralized coordination.
We also provide a theoretical performance guarantee for DECMAS and analyze its time complexity.
We empirically demonstrate using two realworld datasets that DECMAS outperforms stateoftheart MAS algorithms.
 DecisionTheoretic Approach to Maximizing Fairness in MultiTarget Observation in MultiCamera Surveillance.
Prabhu Natarajan^{}, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the
13th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS14), pages 15211522, Paris, France, May 59, 2014.
Abstract. Central to the problem of active multicamera surveillance is the fundamental issue of fairness in the observation of multiple targets such that no target is left unobserved by the cameras for a long time. To address this important issue, we propose a novel principled decisiontheoretic approach to control and coordinate multiple active cameras to achieve fairness in the observation of multiple moving targets.
 Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with SelfInterested Agents.
Trong Nghia Hoang^{} & Kian Hsiang Low.
In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI13), pages 22982305, Beijing, China, Aug 39, 2013.
13.2% acceptance rate (oral presentation)
Abstract. A key challenge in noncooperative multiagent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, selfinterested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intentionaware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the stateoftheart algorithms.
 A General Framework for Interacting BayesOptimally with SelfInterested Agents using Arbitrary Parametric Model and Model Prior.
Trong Nghia Hoang^{} & Kian Hsiang Low.
In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI13), pages 13941400, Beijing, China, Aug 39, 2013.
28.0% acceptance rate
Abstract. Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayesoptimality is theoretically achievable by modeling the environment's latent dynamics using FlatDirichletMultinomial (FDM) prior. In selfinterested multiagent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states nor specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multiagent reinforcement learning algorithms.
 Parallel Gaussian Process Regression with LowRank Covariance Matrix Approximations.
Jie Chen^{}, Nannan Cao^{}, Kian Hsiang Low, Ruofei Ouyang^{}, Colin KengYan Tan & Patrick Jaillet^{}.
In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI13), pages 152161, Bellevue, WA, Jul 1115, 2013.
31.3% acceptance rate
Abstract. Gaussian processes (GP) are Bayesian nonparametric models that are widely used for probabilistic regression. Unfortunately, it cannot scale well with large data nor perform realtime predictions due to its cubic time cost in the data size. This paper presents two parallel GP regression methods that exploit lowrank covariance matrix approximations for distributing the computational load among parallel machines to achieve time efficiency and scalability. We theoretically guarantee the predictive performances of our proposed parallel GPs to be equivalent to that of some centralized approximate GP regression methods: The computation of their centralized counterparts can be distributed among parallel machines, hence achieving greater time efficiency and scalability. We analytically compare the properties of our parallel GPs such as time, space, and communication complexity. Empirical evaluation on two realworld datasets in a cluster of 20 computing nodes shows that our parallel GPs are significantly more timeefficient and scalable than their centralized counterparts and exact/full GP while achieving predictive performances comparable to full GP.
 Gaussian ProcessBased Decentralized Data Fusion and Active Sensing for MobilityonDemand System.
Jie Chen^{}, Kian Hsiang Low & Colin KengYan Tan.
In Proceedings of the
Robotics: Science and Systems Conference (RSS13), Berlin, Germany, Jun 2428, 2013.
30.1% acceptance rate
 MultiRobot Informative Path Planning for Active Sensing of Environmental Phenomena: A Tale of Two Algorithms.
Nannan Cao^{}, Kian Hsiang Low & John M. Dolan^{}.
In Proceedings of the
12th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS13), pages 714, Saint Paul, MN, May 610, 2013.
22.9% acceptance rate
Abstract. A key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient informationtheoretic path planning based on entropy and mutual information criteria for in situ active sensing of an important broad class of widelyoccurring environmental phenomena called anisotropic fields. Our proposed algorithms are novel in addressing a tradeoff between active sensing performance and time efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of Gaussian processbased anisotropic fields to improve time efficiency while preserving nearoptimal active sensing performance. We analyze the time complexity of our algorithms and prove analytically that they scale better than stateoftheart algorithms with increasing planning horizon length. We provide theoretical guarantees on the active sensing performance of our algorithms for a class of exploration tasks called transect sampling, which, in particular, can be improved with longer planning time and/or lower spatial correlation along the transect. Empirical evaluation on realworld anisotropic field data shows that our algorithms can perform better or at least as well as the stateoftheart algorithms while often incurring a few orders of magnitude less computational time, even when the field conditions are less favorable.
 Adaptive Sampling of Time Series with Application to Remote Exploration.
David R. Thompson^{}, Nathalie Cabrol, Michael Furlong, Craig Hardgrove, Kian Hsiang Low, Jeffrey Moersch & David Wettergreen.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'13), pages 34633468, Karlsruhe, Germany, May 610, 2013.
Abstract. We address the problem of adaptive informationoptimal data collection in time series. Here a remote sensor or explorer agent throttles its sampling rate in order to track anomalous events while obeying constraints on time and power. This problem is challenging because the agent has limited visibility  all collected datapoints lie in the past, but its resource allocation decisions require predicting far into the future. Our solution is to continually fit a Gaussian process model to the latest data and optimize the sampling plan on line to maximize information gain. We compare the performance characteristics of stationary and nonstationary Gaussian process models. We also describe an application based on geologic analysis during planetary rover exploration. Here adaptive sampling can improve coverage of localized anomalies and potentially benefit mission science yield of long autonomous traverses.
 Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena.
Jie Chen^{}, Kian Hsiang Low, Colin KengYan Tan, Ali Oran^{}, Patrick Jaillet^{}, John M. Dolan^{} & Gaurav S. Sukhatme^{}.
In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI12), pages 163173, Catalina Island, CA, Aug 1517, 2012.
31.6% acceptance rate
Also appeared in AAMAS12 Workshop on Agents in Traffic and Transportation (ATT12), Valencia, Spain, June 48, 2012.
Abstract. The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Googlelike MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on realworld urban road network data shows that our D2FAS algorithm is significantly more timeefficient and scalable than stateoftheart centralized algorithms while achieving comparable predictive performance.
 Hierarchical Bayesian Nonparametric Approach to Modeling and Learning the Wisdom of Crowds of Urban Traffic Route Planning Agents.
Jiangbo Yu^{}, Kian Hsiang Low, Ali Oran^{} & Patrick Jaillet^{}.
In Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT'12)
(Invited Paper to Special Session on LargeScale ApplicationFocused MultiAgent Systems), pages 478485, Macau, Dec 47, 2012.
Abstract. Route prediction is important to analyzing and understanding the route patterns and behavior of traffic crowds. Its objective is to predict the most likely or "popular" route of road segments from a given point in a road network. This paper presents a hierarchical Bayesian nonparametric approach to efficient and scalable route prediction that can harness the wisdom of crowds of route planning agents by aggregating their sequential routes of possibly varying lengths and origindestination pairs. In particular, our approach has the advantages of (a) not requiring a Markov assumption to be imposed and (b) generalizing well with sparse data, thus resulting in significantly improved prediction accuracy, as demonstrated empirically using realworld taxi route data. We also show two practical applications of our route prediction algorithm: predictive taxi ranking and route recommendation.
 DecisionTheoretic Coordination and Control for Active MultiCamera Surveillance in Uncertain, Partially Observable Environments.
Prabhu Natarajan^{}, Trong Nghia Hoang^{}, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the
6th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC'12), pages 16, Hong Kong, Oct 30  Nov 2, 2012.
Abstract. A central problem of surveillance is to monitor multiple targets moving in a largescale, obstacleridden environment with occlusions. This paper presents a novel principled Partially Observable Markov Decision Processbased approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in such surveillance environments. Our proposed approach is capable of (a) maintaining a belief over the targets' states (i.e., locations, directions, and velocities) to track them, even when they may not be observed directly by the cameras at all times, (b) coordinating the cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed resolution, and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed. Quantitative comparisons with stateoftheart multicamera coordination and control techniques show that our approach can achieve higher surveillance quality in real time. The practical feasibility of our approach is also demonstrated using real AXIS 214 PTZ cameras.
 Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing.
Kian Hsiang Low, Jie Chen^{}, John M. Dolan^{}, Steve Chien^{} & David R. Thompson^{}.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 105112, Valencia, Spain, June 48, 2012.
20.4% acceptance rate
Also appeared in
IROS'11 Workshop on Robotics for Environmental Monitoring (WREM11), San Francisco, CA, Sep 30, 2011.
Abstract. A central problem in environmental sensing and monitoring is to classify/label the hotspots in a largescale environmental field. This paper presents a novel decentralized active robotic exploration (DARE) strategy for probabilistic classification/labeling of hotspots in a Gaussian process (GP)based field. In contrast to existing stateoftheart exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, realtime active sampling. Its exploration behavior exhibits an interesting formal tradeoff between that of boundary tracking until the hotspot region boundary can be accurately predicted and widearea coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under reasonable conditional independence assumption, we prove that it can optimally achieve two formal costminimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GPbased field is greatest at or close to the hotspot region boundaries. Empirical evaluation on realworld plankton density and temperature field data shows that, subject to limited observations, DARE strategy can achieve more superior classification of hotspots and time efficiency than stateoftheart active exploration strategies.
 DecisionTheoretic Approach to Maximizing Observation of Multiple Targets in MultiCamera Surveillance.
Prabhu Natarajan^{}, Trong Nghia Hoang^{}, Kian Hsiang Low & Mohan Kankanhalli^{}.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 155162, Valencia, Spain, June 48, 2012.
20.4% acceptance rate
Abstract. This paper presents a novel decisiontheoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the tradeoff between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multicamera surveillance is that of scalability with increasing number of targets. We show how its scalability can be improved by exploiting the problem structure: as proven analytically, our decisiontheoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve highquality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decisiontheoretic approach in controlling and coordinating the active cameras in real surveillance system.
 IntentionAware Planning under Uncertainty for Interacting with SelfInterested, Boundedly Rational Agents.
Trong Nghia Hoang^{} & Kian Hsiang Low.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 12331234, Valencia, Spain, June 48, 2012.
Abstract. A key challenge in noncooperative multiagent systems is that of developing efficient planning algorithms for intelligent agents to perform effectively among boundedly rational, selfinterested (i.e., noncooperative) agents (e.g., humans). To address this challenge, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intentionaware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions.
 Active Markov InformationTheoretic Path Planning for Robotic Environmental Sensing.
Kian Hsiang Low, John M. Dolan^{} & Pradeep Khosla^{}.
In Proceedings of the
10th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS11), pages 753760, Taipei, Taiwan, May 26, 2011.
22.1% acceptance rate
Abstract. Recent research in multirobot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing informationtheoretic exploration strategies for learning GPbased environmental field maps adopt the nonMarkovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, realtime active sampling. To ease this computational burden, this paper presents a Markovbased approach to efficient informationtheoretic path planning for active sampling of GPbased fields. We analyze the time complexity of solving the Markovbased path planning problem, and demonstrate analytically that it scales better than that of deriving the nonMarkovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markovbased policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on realworld temperature and plankton density field data shows that our Markovbased policy can generally achieve active sampling performance comparable to that of the widelyused nonMarkovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.
 Autonomous Personal Vehicle for the First and LastMile Transportation Services.
Zhuang Jie Chong, Baoxing Qin, Tirthankar Bandyopadhyay, Tichakorn Wongpiromsarn, Edward Samuel Rankin, Marcelo H. Ang, Jr.^{}, Emilio Frazzoli^{}, Daniela Rus^{}, David Hsu^{} & Kian Hsiang Low.
In Proceedings of the
5th IEEE International Conference on Cybernetics and
Intelligent Systems and 5th IEEE International
Conference on Robotics, Automation and Mechatronics (CISRAM'11), pages 253260, Qingdao, China, Sep 1719, 2011.
Also appeared in IROS'11 Workshop on Perception and Navigation for Autonomous Vehicles in Human Environment, San Francisco, CA, Sep 30, 2011.
Abstract. This paper describes an autonomous vehicle testbed that aims at providing the first and last mile transportation services. The vehicle mainly operates in a crowded urban environment whose features can be extracted a priori. To ensure that the system is economically feasible, we take a minimalistic approach and exploit prior knowledge of the environment and the availability of the existing infrastructure such as cellular networks and traffic cameras. We present three main components of the system: pedestrian detection, localization (even in the presence of tall buildings) and navigation. The performance of each component is evaluated. Finally, we describe the role of the existing infrastructural sensors and show the improved performance of the system when they are utilized.
 Telesupervised Remote Surface
Water Quality Sensing.
Gregg Podnar, John M. Dolan^{}, Kian Hsiang Low & Alberto Elfes.
In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, Mar 613, 2010.
Abstract. We present a fleet of autonomous Robot Sensor Boats (RSBs) developed for lake and river fresh water quality assessment and controlled by our Multilevel Autonomy Robot Telesupervision Architecture (MARTA). The RSBs are low cost, highly maneuverable, shallow draft sensor boats, developed as part of the Sensor Web program supported under the Advanced Information Systems Technology program of NASA's Earth Systems Technology Office. They can scan large areas of lakes, and navigate up tributaries to measure water quality near outfalls that larger research vessels cannot reach. The MARTA telesupervision architecture has been applied to a number of domains from multiplatform autonomous wide area planetary mineral prospecting, to multiplatform ocean monitoring. The RSBs are a complementary expansion of a fleet of NOAA/NASAdeveloped extendeddeployment surface autonomous vehicles that enable insitu study of meteorological factors of the ocean/atmosphere interface, and which have been adapted to investigate harmful algal blooms under this program. The flexibility of the MARTA telesupervision architecture was proven as it supported simultaneous operation of these heterogenous autonomous sensor platforms while geographically widely separated. Results and analysis are presented of multiple tests carried out over three months using a multisensor water sonde to assess water quality in a small recreational lake. Inference Grids were used to produce maps representing temperature, pH, and dissolved oxygen. The tests were performed under various water conditions (clear vs. hair algaeladen) and both before and after heavy rains. Data from each RSB was relayed to a data server in our lab in Pittsburgh, Pennsylvania, and made available over the World Wide Web where it was acquired by team members at the Jet Propulsion Laboratory of NASA in Pasadena, California who monitored the boats and their sensor readings in real time, as well as using these data to model the water quality by producing Inference Gridbased maps.
 InformationTheoretic Approach to Efficient Adaptive Path Planning for Mobile Robotic Environmental Sensing.
Kian Hsiang Low, John M. Dolan^{} & Pradeep Khosla^{}.
In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS09), pages 233240, Thessaloniki, Greece, Sep 1923, 2009.
33.9% acceptance rate
Also appeared in IPSN09 Workshop on Sensor Networks for Earth and Space Science Applications (ESSA09), San Francisco, CA, Apr 16, 2009.
Also orally presented in RSS09 Workshop on Aquatic Robots and Ocean Sampling, Seattle, WA, Jun 29, 2009.
Abstract. Recent research in robot exploration and mapping has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decisiontheoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in largescale, highresolution exploration and mapping. To alleviate this computational difficulty, this paper presents an informationtheoretic approach to MASP (iMASP) for efficient adaptive path planning; by reformulating the costminimizing iMASP as a rewardmaximizing problem, its time complexity becomes independent of map resolution and is less sensitive to increasing robot team size as demonstrated both theoretically and empirically. Using the rewardmaximizing dual, we derive a novel adaptive variant of maximum entropy sampling, thus improving the induced exploration policy performance. It also allows us to establish theoretical bounds quantifying the performance advantage of optimal adaptive over nonadaptive policies and the performance quality of approximately optimal vs. optimal adaptive policies. We show analytically and empirically the superior performance of iMASPbased policies for sampling the logGaussian process to that of policies for the widelyused Gaussian process in mapping the hotspot field. Lastly, we provide sufficient conditions that, when met, guarantee adaptivity has no benefit under an assumed environment model.
 Cooperative Aquatic Sensing using the Telesupervised Adaptive Ocean Sensor Fleet.
John M. Dolan^{}, Gregg W. Podnar, Stephen Stancliff, Kian Hsiang Low, Alberto Elfes, John Higinbotham, Jeffrey C. Hosler, Tiffany A. Moisan & John Moisan.
In Proceedings of the SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, Berlin, Germany, Aug 31  Sep 3, 2009.
Abstract. Earth science research must bridge the gap between the atmosphere and the ocean to foster understanding of Earth's climate and ecology. Typical ocean sensing is done with satellites or in situ buoys and research ships which are slow to reposition. Cloud cover inhibits study of localized transient phenomena such as Harmful Algal Blooms (HAB). A fleet of extendeddeployment surface autonomous vehicles will enable in situ study of characteristics of HAB, coastal pollutants, and related phenomena. We have developed a multiplatform telesupervision architecture that supports adaptive reconfiguration based on environmental sensor inputs. Our system allows the autonomous repositioning of smart sensors for HAB study by networking a fleet of NOAA OASIS (Ocean Atmosphere Sensor Integration System) surface autonomous vehicles. In situ measurements intelligently modify the search for areas of high concentration. Inference Grid and complementary informationtheoretic techniques support sensor fusion and analysis. Telesupervision supports sliding autonomy from highlevel mission tasking, through vehicle and data monitoring, to teleoperation when direct human interaction is appropriate. This paper reports on experimental results from multiplatform tests conducted in the Chesapeake Bay and in Pittsburgh, Pennsylvania waters using OASIS platforms, autonomous kayaks, and multiple simulated platforms to conduct cooperative sensing of chlorophylla and water quality.
 Robot Boats as a Mobile Aquatic Sensor Network.
Kian Hsiang Low, Gregg Podnar, Stephen Stancliff, John M. Dolan^{} & Alberto Elfes.
In Proceedings of the IPSN09 Workshop on Sensor Networks for Earth and Space Science Applications (ESSA09), San Francisco, CA, Apr 16, 2009.
Abstract. This paper describes the Multilevel Autonomy Robot Telesupervision Architecture (MARTA), an architecture for supervisory control of a heterogeneous fleet of networked unmanned autonomous aquatic surface vessels carrying a payload of environmental science sensors. This architecture allows a landbased human scientist to effectively supervise data gathering by multiple robotic assets that implement a web of widely dispersed mobile sensors for in situ study of physical, chemical or biological processes in water or in the water/atmosphere interface.
 Adaptive MultiRobot WideArea Exploration And Mapping.
Kian Hsiang Low, John M. Dolan^{} & Pradeep Khosla^{}.
In Proceedings of the
7th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS08), pages 2330, Estoril, Portugal, May 1216, 2008.
22.2% acceptance rate
Also presented as a poster in RSS09 Workshop on Aquatic Robots and Ocean Sampling, Seattle, WA, Jun 29, 2009.
Abstract. The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multirobot exploration strategy that is novel in performing both widearea coverage and hotspot sampling using nonmyopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multirobot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and logGaussian processes, and analyze if the resulting strategies are adaptive and maximize widearea coverage and hotspot sampling. Solving MASP is nontrivial as it comprises continuous state components. So, it is reformulated for convex analysis, which allows discretestate monotonebounding approximation to be developed. We provide a theoretical guarantee on the policy quality of the approximate MASP (aMASP) for using in MASP. Although aMASP can be solved exactly, its state size grows exponentially with the number of stages. To alleviate this computational difficulty, anytime algorithms are proposed based on aMASP, one of which can guarantee its policy quality for MASP in real time.
 Adaptive Sampling for MultiRobot WideArea Exploration.
Kian Hsiang Low, Geoffrey J. Gordon, John M. Dolan^{} & Pradeep Khosla^{}.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'07), pages 755760, Rome, Italy, Apr 1014, 2007.
Abstract. The exploration problem is a central issue in mobile robotics. A complete coverage is not practical if the environment is large with a few small hotspots, and the sampling cost is high. So, it is desirable to build robot teams that can coordinate to maximize sampling at these hotspots while minimizing resource costs, and consequently learn more accurately about properties of such environmental phenomena. An important issue in designing such teams is the exploration strategy. The contribution of this paper is in the evaluation of an adaptive exploration strategy called adaptive cluster sampling (ACS), which is demonstrated to reduce the resource costs (i.e., mission time and energy consumption) of a robot team, and yield more information about the environment by directing robot exploration towards hotspots. Due to the adaptive nature of the strategy, it is not obvious how the sampled data can be used to provide unbiased, lowvariance estimates of the properties. This paper therefore discusses how estimators that are RaoBlackwellized can be used to achieve low error. This paper also presents the first analysis of the characteristics of the environmental phenomena that favor the ACS strategy and estimators. Quantitative experimental results in a mineral prospecting task simulation show that our approach is more efficient in exploration by yielding more minerals and information with fewer resources and providing more precise mineral density estimates than previous methods.
 Autonomic Mobile Sensor Network with SelfCoordinated Task Allocation and Execution.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In IEEE Transactions on Systems, Man, and Cybernetics  Part C: Applications and Reviews
(Special Issue on Engineering Autonomic Systems), volume 36, issue 3, pages 315327, May 2006.
Andrew P. Sage Best Transactions Paper Award for the best paper published in IEEE Trans. SMC  Part A, B, and C in 2006
Abstract. This paper describes a distributed layered architecture for resourceconstrained multirobot cooperation, which is utilized in autonomic mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme selforganizes the robot coalitions to track efficiently across regions. It uses concepts of ant behavior to selfregulate the regional distributions of robots in proportion to that of the moving targets to be tracked in a nonstationary environment. As a result, the adverse effects of task interference between robots are minimized and network coverage is improved. In the lower task execution layer, the robots use selforganizing neural networks to coordinate their target tracking within a region. Both layers employ selforganization techniques, which exhibit autonomic properties such as selfconfiguring, selfoptimizing, selfhealing, and selfprotecting. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auctionbased negotiation show that our layered approach can provide better coverage, greater robustness to sensor failures, and greater flexibility to respond to environmental changes.
 An Ensemble of Cooperative Extended Kohonen Maps for Complex Robot Motion Tasks.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Neural Computation, volume 17, issue 6, pages 14111445, Jun 2005.
Abstract. Selforganizing feature maps such as extended Kohonen maps (EKMs) have been very successful at learning sensorimotor control for mobile robot tasks. This letter presents a new ensemble approach, cooperative EKMs with indirect mapping, to achieve complex robot motion. An indirectmapping EKM selforganizes to map from the sensory input space to the motor control space indirectly via a control parameter space. Quantitative evaluation reveals that indirect mapping can provide finer, smoother, and more efficient motion control than does direct mapping by operating in a continuous, rather than discrete, motor control space. It is also shown to outperform basis function neural networks. Furthermore, training its control parameters with recursive least squares enables faster convergence and better performance compared to gradient descent. The cooperation and competition of multiple selforganized EKMs allow a nonholonomic mobile robot to negotiate unforeseen, concave, closely spaced, and dynamic obstacles. Qualitative and quantitative comparisons with neural network ensembles employing weighted sum reveal that our method can achieve more sophisticated motion tasks even though the weightedsum ensemble approach also operates in continuous motor control space.
 Task Allocation via SelfOrganizing Swarm Coalitions in Distributed Mobile Sensor Network.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI04), pages 2833, San Jose, CA, Jul 2529, 2004.
26.7% acceptance rate
Abstract. This paper presents a task allocation scheme via selforganizing swarm coalitions for distributed mobile sensor network coverage. Our approach uses the concepts of ant behavior to selfregulate the regional distributions of sensors in proportion to that of the moving targets to be tracked in a nonstationary environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. Quantitative comparisons with other tracking strategies such as static sensor placement, potential fields, and auctionbased negotiation show that our approach can provide better coverage and greater flexibility to respond to environmental changes.
 Reactive, Distributed Layered Architecture for ResourceBounded MultiRobot Cooperation: Application to Mobile Sensor Network Coverage.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'04), pages 37473752, New Orleans, LA, Apr 26  May 1, 2004.
Abstract. This paper describes a reactive, distributed layered architecture for cooperation of multiple resourcebounded robots, which is utilized in mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme selforganizes the robot coalitions to track efficiently in separate regions. It uses the concepts of ant behavior to selfregulate the regional distributions of robots in proportion to that of the targets to be tracked in the changing environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. In the lower layer, the robots use selforganizing neural networks to coordinate their target tracking within a region. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auctionbased negotiation show that our approach can provide better coverage and greater flexibility in responding to environmental changes.
 ContinuousSpaced Action Selection for Single and MultiRobot Tasks Using Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the IEEE International Conference on Networking, Sensing and Control (ICNSC'04)
(Invited Paper to Special Session on Visual Surveillance), pages 198203, Taipei, Taiwan, Mar 2123, 2004.
Abstract. Action selection is a central issue in the design of behaviorbased control architectures for autonomous mobile robots. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps so that more complex motion tasks can be achieved. Qualitative and quantitative comparisons for both single and multirobot motion tasks show that our framework can provide better action selection than do action superposition methods.
 Action Selection for Single and MultiRobot Tasks Using Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI03), pages 15051506, Acapulco, Mexico, Aug 915, 2003.
27.6% acceptance rate
Abstract. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative and quantitative comparisons for single and multirobot tasks show our framework can provide better action selection than do potential fields method.
 Action Selection in Continuous State and Action Spaces by Cooperation and Competition of Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the
2nd International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS03), pages 10561057, Melbourne, Australia, Jul 1418, 2003.
Abstract. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative tests demonstrate the capability of our action selection method for both single and multirobot motion tasks.
 Enhancing the Reactive Capabilities of Integrated Planning and Control with Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA'03), pages 34283433, Taipei, Taiwan, May 1217, 2003.
Abstract. Despite the many significant advances made in robot motion research, few works have focused on the tight integration of highlevel deliberative planning with reactive control at the lowest level. In particular, the realtime performance of existing integrated planning and control architectures is still not optimal because the reactive control capabilities have not been fully realized. This paper aims to enhance the lowlevel reactive capabilities of integrated planning and control with Cooperative Extended Kohonen Maps for handling complex, unpredictable environments so that the workload of the highlevel planner can be consequently eased. The enhancements include fine, smooth motion control, execution of more complex motion tasks such as overcoming unforeseen concave obstacles and traversing between closely spaced obstacles, and asynchronous execution of behaviors.
 A Hybrid Mobile Robot Architecture with Integrated Planning and Control.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the
1st International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS02), pages 219226, Bologna, Italy, Jul 1519, 2002.
26% acceptance rate
Abstract. Research in the planning and control of mobile robots has received much attention in the past two decades. Two basic approaches have emerged from these research efforts: deliberative vs. reactive. These two approaches can be distinguished by their different usage of sensed data and global knowledge, speed of response, reasoning capability, and complexity of computation. Their strengths are complementary and their weaknesses can be mitigated by combining the two approaches in a hybrid architecture. This paper describes a method for goaldirected, collisionfree navigation in unpredictable environments that employs a behaviorbased hybrid architecture with asynchronously operating behavioral modules. It differs from existing hybrid architectures in two important ways: (1) the planning module produces a sequence of checkpoints instead of a conventional complete path, and (2) in addition to obstacle avoidance, the reactive module also performs target reaching under the control of a selforganizing neural network. The neural network is trained to perform fine, smooth motor control that moves the robot through the checkpoints. These two aspects facilitate a tight integration between highlevel planning and lowlevel control, which permits realtime performance and easy path modification even when the robot is en route to the goal position.
 Integrated Planning and Control of Mobile Robot with SelfOrganizing Neural Network.
Kian Hsiang Low, Wee Kheng Leow^{} & Marcelo H. Ang, Jr.^{}
In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA'02), pages 38703875, Washington, DC, May 1115, 2002.
Abstract. Despite the many significant advances made in robotics research, few works have focused on the tight integration of task planning and motion control. Most integration works involve the task planner providing discrete commands to the lowlevel controller, which performs kinematics and control computations to command the motor and joint actuators. This paper presents a framework of the integrated planning and control for mobile robot navigation. Unlike existing integrated approaches, it produces a sequence of checkpoints instead of a complete path at the planning level. At the motion control level, a neural network is trained to perform motor control that moves the robot from one checkpoint to the next. This method allows for a tight integration between highlevel planning and lowlevel control, which permits realtime performance and easy modification of motion path while the robot is enroute to the goal position.

TECHNICAL REPORTS
 New Advances on Bayesian and DecisionTheoretic Approaches for Interactive Machine Learning.
Trong Nghia Hoang^{}.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Feb 2015.
Abstract.
The explorationexploitation tradeoff is a fundamental dilemma in many interactive learning scenarios which include both aspects of reinforcement learning (RL) and active learning (AL): An autonomous agent, situated in an unknown environment, has to actively extract knowledge from the environment by taking actions (or conducting experiments) based on its previously collected information to make accurate predictions or to optimize some utility functions. Thus, to make the most effective use of their resourceconstrained budget (e.g., processing time, experimentation cost), the agent must choose carefully between (a) exploiting options (e.g., actions, experiments) which are recommended by its current, possibly incomplete model of the environment, and (b) exploring the other ostensibly suboptimal choices to gather more information.
For example, an RL agent has to face a dilemma between (a) exploiting the mostrewarding action according to the current statistical model of the environment at the risk of running into catastrophic situations if the model is not accurate, and (b) exploring a suboptimal action to gather more information so as to improve the models accuracy at the potential price of losing the shortterm reward. Similarly, an AL algorithm/agent has to consider between (a) conducting the most informative experiments according to its current estimation of the environment models parameters (i.e., exploitation), and (b) running experiments that help improving the estimation accuracy of these parameters (i.e., exploration).
More often, learning strategies that ignore exploration will likely exhibit suboptimal performance due to their imperfect knowledge while, conversely, those that entirely focus on exploration might suffer the cost of learning without benefitting from it. Therefore, a good explorationexploitation tradeoff is critical to the success of those interactive learning agents: In order to perform well, they must strike the right balance between these two conflicting objectives. Unfortunately, while this tradeoff has been wellrecognized since the early days of RL, the studies of explorationexploitation have mostly been developed for theoretical settings in the respective field of RL and, perhaps surprisingly, glossed over in the existing AL literature. From a practical point of view, we see three limiting factors:
 Previous works addressing the explorationexploitation tradeoff in RL have largely focused on simple choices of the environment model and consequently, are not practical enough to accommodate realworld applications that have far more complicated environment structures. In fact, we find that most recent advances in Bayesian reinforcement learning (BRL) have only been able to analytically trade off between exploration and exploitation under a simple choice of models such as FlatDirichletMultinomial (FDM) whose independence and modeling assumptions do not hold for many realworld applications.
 Nearly all of the notable works in the AL literature primarily advocate the use of greedy/myopic algorithms whose rates of convergence (i.e., the number of experiments required by the learning algorithm to achieve a desired performance in the worst case) are provably minimax optimal for simple classes of learning tasks (e.g., threshold learning). While these results have greatly ad vanced our understanding about the limit of myopic AL in worstcase scenarios, significantly less is presently known about whether it is possible to devise nonmyopic AL strategies which optimize the explorationexploitation tradeoff to achieve the best expected performance in budgeted learning scenarios.
 The issue of scalability of the existing predictive models (e.g., Gaussian processes) used in AL has generally been underrated since the majority of literature considers smallscale environments which only consist of a few thousand candidate experiments to be selected by singlemode AL algorithms one at a time prior to retraining the model. In contrast, largescale environments usually have a massive set of million candidate experiments among which tens or hundreds of thousands should be actively selected for learning. For such dataintensive problems, it is often more costeffective to consider batchmode AL algorithms which select and conduct multiple experiments in parallel at each stage to collect observations in batch. Retraining the predictive model after incorporating each batch of observations then becomes a computational bottleneck as the collected dataset at each stage quickly grows up to tens or even hundreds of thousand data points.
This thesis outlines some recent progresses that we have been able to make while working toward satisfactory answers to the above challenges, along with practical algorithms that achieve them:
 In particular, in order to put BRL into practice for more complicated and practical problems, we propose a novel framework called Interactive Bayesian Reinforcement Learning (IBRL) to integrate the general class of parametric models and model priors, thus allowing the practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the environment as often required in many realworld applications. Interestingly, we show how the nonmyopic Bayesoptimal policy can be derived analytically by solving IBRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Our empirical studies show that the proposed approach performs competitively with the existing stateoftheart algorithms.
 Then, to establish a theoretical foundation for the explorationexploitation tradeoff in singlemode active learning scenarios with resourceconstrained budgets, we present a novel ϵBayesoptimal DecisionTheoretic Active Learning (ϵBAL) framework which advocates the use of differential entropy as a performance measure and consequently, derives a learning policy that can approximate the optimal expected performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). To meet the realtime requirement in timecritical applications, we then propose an asymptotically ϵoptimal, branchandbound anytime algorithm based on ϵBAL with performance guarantees. In practice, we empirically demonstrate with both synthetic and realworld datasets that the proposed approach outperforms the stateoftheart algorithms in budgeted scenarios.
 Lastly, to facilitate the future developments of largescale, nonmyopic AL applications, we further introduce a highly scalable family of anytime predictive models for AL which provably converge toward a wellknown class of sparse Gaussian processes (SGPs). Unlike the existing predictive models of AL which cannot be updated incrementally and are only capable of processing middlesized datasets (i.e., a few thousands of data points), our proposed models can process massive datasets in an anytime fashion, thus providing a principled tradeoff between the processing time and the predictive accuracy. The efficiency of our framework is then demonstrated empirically on a variety of largescale realworld datasets which contains hundreds of thousand data points.
 Gaussian ProcessBased Decentralized Data Fusion and Active Sensing Agents: Towards LargeScale Modeling and Prediction of Spatiotemporal Traffic Phenomena.
Jie Chen^{}.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Dec 2013.
Abstract.
Knowing and understanding the environmental phenomena is important to many real world applications. This thesis is devoted to study largescale modeling and prediction of spatiotemporal environmental phenomena (i.e., urban traffic phenomena). Towards this goal, our proposed approaches rely on a class of Bayesian nonparametric models: Gaussian processes (GP).
To accurately model spatiotemporal urban traffic phenomena in real world situation, a novel relational GP taking into account both the road segment features and road network topology information is proposed to model real world traffic conditions over road network. Additionally, a GP variant called logGaussian process (lGP) is exploited to model an urban mobility demand pattern which contains skewness and extremity in demand measurements.
To achieve efficient and scalable urban traffic phenomenon prediction given a large phenomenon data, we propose three novel parallel GPs: parallel partially independent training conditional (pPITC), parallel partially independent conditional(pPIC) and parallel incomplete Cholesky factorization (pICF)based approximations of GP model, which can distribute their computational load into a cluster of parallel/multicore machines, thereby achieving time efficiency. The predictive performances of such parallel GPs are theoretically guaranteed to be equivalent to that of some centralized approaches to approximate full/exact GP regression. The proposed parallel GPs are implemented using the message passing interface (MPI) framework and tested on two large real world datasets. The theoretical and empirical results show that our parallel GPs achieve significantly better time efficiency and scalability than that of full GP, while achieving comparable accuracy. They also achieve fine speedup performance that is the ratio of time required by the parallel algorithms and their centralized counterparts.
To exploit active mobile sensors to perform decentralized perception of the spatiotemporal urban traffic phenomenon, we propose a decentralized algorithm framework: Gaussian processbased decentralized data fusion and active sensing (D2FAS) which is composed of a decentralized data fusion (DDF) component and a decentralized active sensing (DAS) component. The DDF component includes a novel Gaussian processbased decentralized data fusion (GPDDF) algorithm that can achieve remarkably efficient and scalable prediction of phenomenon and a novel Gaussian processbased decentralized data fusion with local augmentation (GPDDF+) algorithm that can achieve better predictive accuracy while preserving time efficiency of GPDDF. The predictive performances of both GPDDF and GPDDF+ are theoretically guaranteed to be equivalent to that of some sophisticated centralized sparse approximations of exact/full GP. For the DAS component, we propose a novel partially decentralized active sensing (PDAS) algorithm that exploits property in correlation structure of GPDDF to enable mobile sensors cooperatively gathering traffic phenomenon data along a nearoptimal joint walk with theoretical guarantee, and a fully decentralized active sensing (FDAS) algorithm that guides each mobile sensor gather phenomenon data along its locally optimal walk.
Lastly, to justify the practicality of the D2FAS framework, we develop and test D2FAS algorithms running with active mobile sensors on real world datasets for monitoring traffic conditions and sensing/servicing urban mobility demands. Theoretical and empirical results show that the proposed algorithms are significantly more timeefficient, more scalable in the size of data and in the number of sensors than the stateoftheart centralized approaches, while achieving comparable predictive accuracy.
 A DecisionTheoretic Approach for Controlling and Coordinating Multiple Active Cameras in Surveillance.
Prabhu Natarajan^{}.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Dec 2013.
Abstract.
The use of active cameras in surveillance is becoming increasingly popular as they try to meet the demands of capturing highresolution images/videos of targets in surveillance for face recognition, target identification, forensic video analysis, etc. These active cameras are endowed with pan, tilt, and zoom capabilities, which can be exploited to provide highquality surveillance. In order to achieve effective, realtime surveillance, an efficient collaborative mechanism is needed to control and coordinate these cameras' actions, which is the focus of this thesis. The central problem in surveillance is to monitor a set of targets with guaranteed image resolution. Controlling and coordinating multiple active cameras to achieve this surveillance task is nontrivial and challenging because: (a) presence of inherent uncertainties in the surveillance environment (targets motion, location, and noisy camera observation); (b) there exists a nontrivial tradeoff between number of targets and the resolution of observing these targets; and (c) more importantly, the coordination framework should be scalable with increasing number of targets and cameras.
In this thesis, we formulate a novel decisiontheoretic multiagent planning approach for controlling and coordinating multiple active cameras in surveillance. Our decisiontheoretic approach offers advantages of (a) accounting the uncertainties using probabilistic models; (b) the nontrivial tradeoff is addressed by coordinating the active cameras' actions to maximize the number of targets with guaranteed resolution; and (c) the scalability in number of targets and cameras is achieved by exploiting the structures and properties that are present in our surveillance problem. We focus on two novel problems in active camera surveillance: (a) maximizing observations of multiple targets (MOMT), i.e., maximizing the number of targets observed in active cameras with guaranteed image resolution; and (b) improving fairness in observation of multiple targets (FOMT), i.e., no target is "starved" of observation by active cameras for long duration of time.
We propose two formal decisiontheoretic frameworks (a) Markov Decision Process (MDP) and (b) Partially Observable Markov Decision Process (POMDP) frameworks for coordinating active cameras in surveillance. MDP framework controls active cameras in fully observable surveillance environments where the active cameras are supported by one or more wideview static/fixed cameras to observe the entire surveillance environment at lowresolution. POMDP framework controls active cameras in partially observable surveillance environments where it is impractical to observe the entire surveillance environment using static/fixed cameras due to occlusions caused by physical infrastructures. Hence the POMDP framework do not have a complete view of the surveillance environment.
Specifically, we propose (a) MDP frameworks to solve MOMT problem and FOMT problem in fully observable surveillance environment; and (b) POMDP framework to solve MOMT problem in partially observable surveillance environment. As proven analytically, our MDP and POMDP frameworks incurs time that is linear in number of targets to be observed during surveillance. We have used maxplus algorithm with our MDP framework to improve its scalability in number of cameras for MOMT problem. Empirical evaluation through simulations in realistic surveillance environment reveals that our proposed approach can achieve highquality surveillance in real time. We also demonstrate our pro posed approach with real Axis 214 PTZ cameras to show the practicality of our approach in real world surveillance. Both the simulations and real camera experiments show that our decisiontheoretic approach can control and coordinate active cameras efficiently and hence contributes significantly towards improving the active camera surveillance research.
 InformationTheoretic MultiRobot Path Planning.
Nannan Cao^{}.
M.Sc. Thesis, Department of Computer Science, National University of Singapore, Sep 2012.
Abstract.
Research in environmental sensing and monitoring is especially important in supporting environmental sustainability efforts worldwide, and has recently attracted significant attention and interest. A key direction of this research lies in modeling and predicting the spatiotemporally varying environmental phenomena. One approach is to use a team of robots to sample the area and model the measurement values at unobserved points. For smoothly varying and hotspot fields, there is some work which has been done to model the fields well. However, there is still a class of common environmental fields called anisotropic fields in which the spatial phenomena are highly correlated along one direction and less correlated along the perpendicular direction. We exploit the environmental structure to improve the sampling performance and time efficiency of planning for anisotropic fields.
In this thesis, we cast the planning problem into a stagewise decisiontheoretic problem. we adopt Gaussian Process to model spatial phenomena. Maximum entropy criterion and maximum mutual information criterion are used to measure the informativeness of the observation paths. It is found that for many GPs, correlation of two points exponentially decreases with the distance between the two points. With this property, for maximum entropy criterion, we propose a polynomialtime approximation algorithm, MEPP, to find the maximum entropy paths. We also provide a theoretical performance guarantee for this algorithm. For maximum mutual information criterion, we propose another polynomialtime approximation algorithm, M2IPP. Similar to the MEPP, a performance guarantee is also provided for this algorithm. We demonstrate the performance advantages of our algorithms on two real data sets. To get lower prediction error, three priciples have also been proposed to select the criterion for different environmental fields.
 MultiRobot Adaptive Exploration and Mapping for Environmental Sensing Applications.
Kian Hsiang Low.
Ph.D. Thesis, Technical Report CMUECE2009024, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, Aug 2009.
Abstract.
Recent research in robot exploration and mapping has focused on sampling hotspot fields, which often arise in environmental and ecological sensing applications. Such a hotspot field is characterized by continuous, positively skewed, spatially correlated measurements with the hotspots exhibiting extreme measurements and much higher spatial variability than the rest of the field.
To map a hotspot field of the above characterization, we assume that it is realized from nonparametric probabilistic models such as the Gaussian and logGaussian processes (respectively, GP and lGP), which can provide formal measures of map uncertainty. To learn a hotspot field map, the exploration strategy of a robot team then has to plan resourceconstrained observation paths that minimize the uncertainty of a spatial model of the hotspot field. This exploration problem is formalized in a sequential decisiontheoretic planning under uncertainty framework called the multirobot adaptive sampling problem (MASP). So, MASP can be viewed as a sequential, nonmyopic version of active learning. In contrast to finitestate Markov decision problems, MASP adopts a more complex but realistic continuousstate, nonMarkovian problem structure so that its induced exploration policy can be informed by the complete history of continuous, spatially correlated observations for selecting paths. It is unique in unifying formulations of nonmyopic exploration problems along the entire adaptivity spectrum, thus subsuming existing nonadaptive formulations and allowing the performance advantage of a more adaptive policy to be theoretically realized. Through MASP, it is demonstrated that a more adaptive strategy can exploit clustering phenomena in a hotspot field to produce lower expected map uncertainty. By measuring map uncertainty using the meansquared error criterion, a MASPbased exploration strategy consequently plans adaptive observation paths that minimize the expected posterior map error or equivalently, maximize the expected map error reduction.
The time complexity of solving MASP (approximately) depends on the map resolution, which limits its practical use in largescale, highresolution exploration and mapping. This computational difficulty is alleviated through an informationtheoretic approach to MASP (iMASP), which measures map uncertainty based on the entropy criterion instead. As a result, an iMASPbased exploration strategy plans adaptive observation paths that minimize the expected posterior map entropy or equivalently, maximize the expected entropy of observation paths. Unlike MASP, reformulating the costminimizing iMASP as a rewardmaximizing dual problem causes its time complexity of being solved approximately to be independent of the map resolution and less sensitive to larger robot team size as demonstrated both analytically and empirically. Furthermore, this rewardmaximizing dual transforms the widelyused nonadaptive maximum entropy sampling problem into a novel adaptive variant, thus improving the performance of the induced exploration policy.
One advantage stemming from the rewardmaximizing dual formulations of MASP and iMASP is that they allow observation selection properties of the induced exploration policies to be realized for sampling the hotspot field. These properties include adaptivity, hotspot sampling, and widearea coverage. We show that existing GPbased exploration strategies may not explore and map the hotspot field well with the selected observations because they are nonadaptive and perform only widearea coverage. In contrast, the lGPbased exploration policies can learn a highquality hotspot field map because they are adaptive and perform both widearea coverage and hotspot sampling.
The other advantage is that even though MASP and iMASP are nontrivial to solve due to their continuous state components, the convexity of their rewardmaximizing duals can be exploited to derive, in a computationally tractable manner, discretestate monotonebounding approximations and subsequently, approximately optimal exploration policies with theoretical performance guarantees. Anytime algorithms based on approximate MASP and iMASP are then proposed to alleviate the computational difficulty that arises from their nonMarkovian structure.
It is of practical interest to be able to quantitatively characterize the "hotspotness" of an environmental field. We propose a novel "hotspotness" index, which is defined in terms of the spatial correlation properties of the hotspot field. As a result, this index can be related to the intensity, size, and diffuseness of the hotspots in the field.
We also investigate how the spatial correlation properties of the hotspot field affect the performance advantage of adaptivity. In particular, we derive sufficient and necessary conditions of the spatial correlation properties for adaptive exploration to yield no performance advantage.
Lastly, we develop computationally efficient approximately optimal exploration strategies for sampling the GP by assuming the Markov property in iMASP planning. We provide theoretical guarantees on the performance of the Markovbased policies, which improve with decreasing spatial correlation. We evaluate empirically the effects of varying spatial correlations on the mapping performance of the Markovbased policies as well as whether these Markovbased path planners are timeefficient for the transect sampling task.
Through the abovementioned work, this thesis establishes the following two claims: (1) adaptive, nonmyopic exploration strategies can exploit clustering phenomena to plan observation paths that produce lower map uncertainty than nonadaptive, greedy methods; and (2) Markovbased exploration strategies can exploit small spatial correlation to plan observation paths which achieve map uncertainty comparable to that of nonMarkovian policies using significantly less planning time.
 Adaptive Sampling for MultiRobot Wide Area Prospecting.
Kian Hsiang Low, Geoffrey J. Gordon, John M. Dolan, and Pradeep Khosla.
In Technical Report CMURITR0551, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Oct 2005.
Abstract. Prospecting for in situ mineral resources is essential for establishing settlements on the Moon and Mars. To reduce human effort and risk, it is desirable to build robotic systems to perform this prospecting. An important issue in designing such systems is the sampling strategy: how do the robots choose where to prospect next? This paper argues that a strategy called Adaptive Cluster Sampling (ACS) has a number of desirable properties: compared to conventional strategies, (1) it reduces the total mission time and energy consumption of a team of robots, and (2) returns a higher mineral yield and more information about the prospected region by directing exploration towards areas of high mineral density, thus providing detailed maps of the boundaries of such areas. Due to the adaptive nature of the sampling scheme, it is not immediately obvious how the resulting sampled data can be used to provide an unbiased, lowvariance estimate of the regional mineral density. This paper therefore investigates new mineral density estimators, which have lower error than previouslydeveloped estimators; they are derived from the older estimators via a process called RaoBlackwellization. Since the efficiency of estimators depends on the type of mineralogical population sampled, the population characteristics that favor ACS estimators are also analyzed. The ACS scheme and our new estimators are evaluated empirically in a detailed simulation of the prospecting task, and the quantitative results show that our approach can yield more minerals with less resources and provide more accurate mineral density estimates than previous methods.
 Integrated Robot Planning and Control with Extended Kohonen Maps.
Kian Hsiang Low.
M.Sc. Thesis, Department of Computer Science, School of Computing, National University of Singapore, Jul 2002.
Singapore Computer Society Prize for best M.Sc. Thesis 20022003
Abstract. The problem of goaldirected, collisionfree motion in a complex, unpredictable environment can be solved by tightly integrating highlevel deliberative planning with lowlevel reactive control. This thesis presents two such architectures for a nonholonomic mobile robot. To achieve realtime performance, reactive control capabilities have to be fully realized so that the deliberative planner can be simplified. These architectures are enriched with reactive target reaching and obstacle avoidance modules. Their target reaching modules use indirectmapping Extended Kohonen Map to provide finer and smoother motion control than directmapping methods. While one architecture fuses these modules indirectly via command fusion, the other one couples them directly using cooperative Extended Kohonen Maps, enabling the robot to negotiate unforeseen concave obstacles. The planner for both architectures use a slippery cells technique to decompose the free workspace into fewer cells, thus reducing search time. Any two points in the cell can still be traversed by reactive motion.
 Mobile Robots That Learn to Navigate.
Kian Hsiang Low.
Honors Thesis, Department of Computer Science,
School of Computing, National University of Singapore, Apr 2001.
Abstract. A sensorimotor controller has been implemented to enable a mobile robot to learn its motion control autonomously and perform simple targetreaching movements. This controller is able to perform fine motion by reducing its selfpositioning error and also, reach a designated target location with minimum delay. The control architecture is in the form of a neural network known as the SelfOrganizing Map. Besides implementing the motor control and the online learning algorithms, the essentiality of a prelearning phase is also evaluated. Then, we explore the possibility of incorporating a novel concept known as Local Linear Smoothing into our batch training algorithm; this notion allows the elimination of the boundary bias phenomenon. Lastly, we suggest a simple approach to learning in an obstacleridden environment.
This document, research.html, has been accessed 34761 times since 22Mar15 23:49:40 SGT.
This is the 7th time it has been accessed today.
A total of 12629 different hosts have accessed this document in the
last 947 days; your host, 54.161.73.123, has accessed it 1 times.
If you're interested, complete statistics for
this document are also available, including breakdowns by toplevel
domain, host name, and date.

AUTOMATIC MACHINE LEARNING : BAYESIAN OPTIMIZATION
PROJECT DURATION : Feb 2016  Present
PROJECT AFFILIATION

SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborator: Patrick Jaillet, MIT)

SensorEnhanced Social Media (SeSaMe) Centre (Collaborator: Mohan Kankanhalli)
PROJECT FUNDING
 MOE AcRF Tier 2 Grant : Scaling up Gaussian Process Predictive Models for Big Data, SGD $637,884, Jul 2017  Jul 2020
 SMART Subaward Agreement  FM IRG :
Automatic Probabilistic Machine Learning for Traffic Modeling and Prediction,
SGD $45,000, Apr 2017  Mar 2018
 Research Collaboration Agreement with Panasonic R&D Center Singapore : Hyperparameters Tuning using Bayesian Optimization, SGD $69,336, Mar 2016  Mar 2017
PROBLEM MOTIVATION
 Batch Bayesian Optimization. Bayesian optimization (BO) has recently gained considerable traction due to its capability of finding the global maximum of a highly complex (e.g., nonconvex, no closedform expression nor derivative), noisy blackbox objective function with a limited budget of (often costly) function evaluations, consequently witnessing its use in an increasing diversity of application domains such as robotics, environmental sensing/monitoring, automatic machine learning, among others.
A number of acquisition functions (e.g., probability of improvement or expected improvement over the currently found maximum, entropybased, and upper confidence bound (UCB)) have been devised to perform BO: They repeatedly select an input for evaluating/querying the blackbox function (i.e., until the budget is depleted) that intuitively trades off between sampling where the maximum is likely to be given the current, possibly imprecise belief of the function modeled by a Gaussian process (GP) (i.e., exploitation) vs. improving the GP belief of the function over the entire input domain (i.e., exploration) to guarantee finding the global maximum.
The rapidly growing affordability and availability of hardware resources (e.g., computer clusters, sensor networks, robot teams/swarms) have motivated the recent development of BO algorithms that can repeatedly select a batch of inputs for querying the blackbox function in parallel instead. Such batch/parallel BO algorithms can be classified into two types: On one extreme, batch BO algorithms like multipoints expected improvement, parallel predictive entropy search, and the parallel knowledge gradient method jointly optimize the batch of inputs and hence scale poorly in the batch size.
On the other extreme, greedy batch BO algorithms boost the scalability by selecting the inputs of the batch one at a time. We argue that such a highly suboptimal approach to gain scalability is an overkill: In practice, each function evaluation is often much more computationally and/or economically costly (e.g., hyperparameter tuning for deep learning, drug testing on human subjects), which justifies dedicating more time to obtain better BO performance.
 Nonmyopic Bayesian Optimization. The fundamental challenge of integrated planning and learning is to design an autonomous agent that can plan its actions to maximize its expected total rewards while interacting with an unknown task environment. Recent research efforts tackling this challenge have progressed from the use of simple Markov models assuming discretevalued, independent observations to that of a rich class of Bayesian nonparametric Gaussian process (GP) models characterizing continuousvalued, correlated observations in order to represent the latent structure of complex, possibly noisy task environments with higher fidelity. Such a challenge is posed by the problem of Bayesian optimization (BO).
Its objective is to select and gather the most informative (possibly noisy) observations for finding the global maximum of an unknown, highly complex (e.g., nonconvex, no closedform expression nor derivative) objective function (i.e., task environment) modeled by a GP given a sampling budget (e.g., number of costly function evaluations). The rewards of a BO agent are defined using an improvementbased (e.g., probability of improvement or expected improvement over currently found maximum), entropybased, or upper confidence bound (UCB) acquisition function. A limitation of most BO algorithms is that they are myopic. To overcome this limitation, approximation algorithms for nonmyopic adaptive BO have been proposed, but their performances are not theoretically guaranteed.
PROPOSED METHODOLOGY
 Batch Bayesian Optimization. To tackle the first problem, we show that it is in fact possible to jointly optimize the batch of inputs and still preserve scalability in the batch size by giving practitioners the flexibility to trade off BO performance for time efficiency.
To achieve this, we first observe that, interestingly, batch BO can be perceived as a cooperative multiagent decision making problem whereby each agent optimizes a separate input of the batch while coordinating with the other agents doing likewise.
To the best of our knowledge, this has not been considered in the BO literature.
In particular, if batch BO can be framed as some known class of multiagent decision making problems, then it can be solved efficiently and scalably by the latter's stateoftheart solvers.
The key technical challenge would therefore be to investigate how batch BO can be cast as one of such to exploit its advantage of scalability in the number of agents (hence, batch size) while at the same time theoretically guaranteeing the resulting BO performance.
To tackle the above challenge, our first work presents a novel distributed batch BO algorithm that, in contrast to greedy batch BO algorithms,
can jointly optimize a batch of inputs and, unlike the batch BO algorithms, still preserve scalability in the batch size.
To realize this, we generalize GPUCB to a new batch variant amenable to a Markov approximation, which can then be naturally formulated as a multiagent distributed constraint optimization problem (DCOP) in order to fully exploit the efficiency of its stateoftheart solvers for achieving linear time in the batch size.
Our proposed distributed batch GPUCB (DBGPUCB) algorithm offers practitioners the flexibility to trade off between the approximation quality and time efficiency by varying the Markov order. We provide a theoretical guarantee for the convergence rate of our DBGPUCB algorithm via bounds on its cumulative regret. We empirically evaluate the cumulative regret incurred by our DBGPUCB algorithm and its scalability in the batch size on synthetic benchmark objective functions and a realworld optimization problem.
 Nonmyopic Bayesian Optimization. To address the second problem, our second work presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning and BO criteria (e.g., UCB) and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the explorationexploitation tradeoff, consequently allowing planning and learning to be integrated seamlessly and performed simultaneously instead of separately. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive εoptimal GPP (εGPP) policy given an arbitrarily userspecified loss bound ε. To plan in real time, we further propose an asymptotically optimal, branchandbound anytime variant of εGPP with performance guarantee. Finally, we empirically evaluate the performances of our εGPP policy and its anytime variant in BO and an energy harvesting task on simulated and realworld environmental fields.
PUBLICATIONS
 Distributed Batch Gaussian Process Optimization.
Erik Daxberger & Kian Hsiang Low.
In Proceedings of the 34th International Conference on Machine Learning (ICML17), pages 951960, Sydney, Australia, Aug 611, 2017.
25.9% acceptance rate
Abstract. This paper presents a novel distributed batch Gaussian process upper confidence bound (DBGPUCB) algorithm for performing batch Bayesian optimization (BO) of highly complex, costlytoevaluate blackbox objective functions. In contrast to existing batch BO algorithms, DBGPUCB can jointly optimize a batch of inputs (as opposed to selecting the inputs of a batch one at a time) while still preserving scalability in the batch size. To realize this, we generalize GPUCB to a new batch variant amenable to a Markov approximation, which can then be naturally formulated as a multiagent distributed constraint optimization problem in order to fully exploit the efficiency of its stateoftheart solvers for achieving linear time in the batch size. Our DBGPUCB algorithm offers practitioners the flexibility to trade off between the approximation quality and time efficiency by varying the Markov order. We provide a theoretical guarantee for the convergence rate of DBGPUCB via bounds on its cumulative regret. Empirical evaluation on synthetic benchmark objective functions and a realworld optimization problem shows that DBGPUCB outperforms the stateoftheart batch BO algorithms.
 Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond.
Chun Kai Ling, Kian Hsiang Low & Patrick Jaillet^{}.
In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI16), pages 18601866, Phoenix, AZ, Feb 1217, 2016.
25.75% acceptance rate
Abstract. This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the explorationexploitation tradeoff. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive ϵoptimal GPP (ϵGPP) policy. To plan in real time, we further propose an asymptotically optimal, branchandbound anytime variant of ϵGPP with performance guarantee. We empirically demonstrate the effectiveness of our ϵGPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.
EXPLORATIONEXPLOITATION DILEMMA IN ACTIVE LEARNING OF GAUSSIAN PROCESSES
PROJECT DURATION : Aug 2013  Present
PROJECT AFFILIATION

SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborator: Patrick Jaillet, MIT)

SensorEnhanced Social Media (SeSaMe) Centre (Collaborator: Mohan Kankanhalli)
PROJECT FUNDING
 MOE AcRF Tier 2 Grant : Scaling up Gaussian Process Predictive Models for Big Data, SGD $637,884, Jul 2017  Jul 2020
 SMART Subaward Agreements  FM IRG : Spatiotemporal Modeling and Prediction of Traffic Patterns,
SGD $361,456.17, Oct 2011  Mar 2017
PROBLEM MOTIVATION
The explorationexploitation dilemma arises in the following three problems of active learning of Gaussian processes:
 NonStationary Gaussian Processes. A key challenge of environmental sensing and monitoring is that of sensing, modeling, and predicting complex urban and natural environmental phenomena, which are typically characterized by spatially correlated measurements. To tackle this challenge, recent research efforts in the robotics community have focused on developing multirobot active sensing (MAS) algorithms: Their objective is to coordinate the exploration of a team of mobile robots to actively
gather the most informative observations for predicting a spatially varying phenomenon of interest while being subject to resource cost constraints (e.g., number of deployed robots, energy consumption, mission time). To achieve this, a number of MAS algorithms have modeled the phenomenon as a Gaussian process (GP), which allows its spatial correlation structure to be formally characterized and its predictive uncertainty to be formally quantified (e.g., based on meansquared error, entropy, or mutual information criterion) and subsequently exploited for directing the robots to explore its highly uncertain areas. In order not to incur high computational expense, these algorithms have assumed the spatial correlation structure to be known (or estimated crudely using sparse prior data) and stationary (i.e., degree of smoothness in the spatial variation of the measurements is the same across the entire phenomenon), properties of which are often violated in realworld environmental sensing applications and limited to smallscale phenomena.
In practice, the spatial correlation structure of possibly largescale environmental phenomena is usually not known and nonstationary (i.e., separate areas of a phenomenon exhibit different local degrees of smoothness in the spatial variation of the measurements). For example, in some ocean phenomena (e.g., temperature, salinity, sea surface height), their measurements far offshore are more smoothly varying (i.e., more spatially correlated) in the crossshore direction than nearshore. Urban traffic networks also display nonstationary phenomena (e.g., traffic speeds, taxi demands), which pose important considerations to traffic routing and signal control.
Existing MAS algorithms can still be used for sampling a nonstationary phenomenon by assuming, albeit incorrectly, its spatial correlation structure to be known and stationary in order to preserve time efficiency. So, though they can gather the most informative observations under an assumed stationary correlation structure, they will perform suboptimally with respect to the true nonstationary correlation structure.
A more desirable MAS algorithm should instead be designed to consider the informativeness of its selected observations for both estimating the unknown spatial correlation structure of a phenomenon (i.e., exploration) as well as predicting the phenomenon given the true correlation structure (i.e., exploitation). According to previous geostatistical studies, the most informative observations that are gathered for achieving the former active sensing criterion are not necessarily as informative for satisfying the latter. This raises a fundamental issue faced by active sensing: How can a MAS algorithm trade off between these two possibly conflicting criteria?
 Nonmyopic Active Learning. Active learning has become an increasingly important focal theme in many environmental sensing and monitoring applications (e.g., precision agriculture, mineral prospecting, monitoring of ocean and freshwater phenomena like harmful algal blooms, forest ecosystems, or pollution) where a highresolution in situ sampling of the spatial phenomenon of interest is impractical due to prohibitively costly sampling budget requirements (e.g., number of deployed sensors, energy consumption, mission time): For such applications, it is thus desirable to select and gather the most informative observations/data for modeling and predicting the spatially varying phenomenon subject to some budget constraints, which is the goal of active learning and also known as the active sensing problem.
To elaborate, solving the active sensing problem amounts to deriving an optimal sequential policy that plans/decides the most informative locations to be observed for minimizing the predictive uncertainty of the unobserved areas of a phenomenon given a sampling budget. To achieve this, many existing active sensing algorithms have modeled the phenomenon as a Gaussian process (GP), which allows its spatial correlation structure to be formally characterized and its predictive uncertainty to be formally quantified (e.g., based on meansquared error, entropy, or mutual information criterion). However, they have assumed the spatial correlation structure (specifically, the parameters defining it) to be known, which is often violated in realworld applications, or estimated crudely using sparse prior data. So, though they aim to select sampling locations that are optimal with respect to the assumed or estimated parameters, these locations tend to be suboptimal with respect to the true parameters, thus degrading the predictive performance of the learned GP model.
In practice, the spatial correlation structure of a phenomenon is usually not known. Then, the predictive performance of the GP modeling the phenomenon depends on how informative the gathered observations/data are for both parameter estimation as well as spatial prediction given the true parameters. Interestingly, as revealed in previous geostatistical studies, policies that are efficient for parameter estimation are not necessarily efficient for spatial prediction with respect to the true model. Thus, the active sensing problem involves a potential tradeoff between sampling the most informative locations for spatial prediction given the current, possibly incomplete knowledge of the model parameters (i.e., exploitation) vs. observing locations that gain more information about the parameters (i.e., exploration): How then does an active sensing algorithm trade off between these two possibly conflicting sampling objectives?
To tackle this question, one principled approach is to frame active sensing as a sequential decision problem that jointly and naturally optimizes the above explorationexploitation tradeoff while maintaining a Bayesian belief over the model parameters. This intuitively means a policy that biases towards observing informative locations for spatial prediction given the current model prior may be penalized if it entails a highly dispersed posterior over the model parameters. So, the resulting induced policy is guaranteed to be optimal in the expected active sensing performance. Unfortunately, such a nonmyopic Bayesoptimal policy cannot be derived exactly due to an uncountable set of candidate observations and unknown model parameters. As a result, most existing works have circumvented the tradeoff by resorting to the use of myopic/greedy (hence, suboptimal) policies. To the best of our knowledge, the only notable nonmyopic active sensing algorithm for GPs advocates tackling exploration and exploitation separately, instead of jointly and naturally optimizing their tradeoff, to sidestep the difficulty of solving the Bayesian sequential decision problem. Specifically, it performs a probably approximately correct (PAC)style exploration until it can verify that the performance loss of greedy exploitation lies within a userspecified threshold. But, such an algorithm is suboptimal in the presence of budget con straints due to the following limitations: (a) It is unclear how an optimal threshold for exploration can be determined given a sampling budget, and (b) even if such a threshold is available, the PACstyle exploration is typically designed to satisfy a worstcase sample complexity rather than to be optimal in the expected active sensing performance, thus resulting in an overlyaggressive exploration.
 MultiOutput Gaussian Processes. For many budgetconstrained environmental sensing and monitoring applications in the real world, active learning/sensing is an attractive, frugal alternative to passive highresolution (hence, prohibitively costly) sampling of the spatially varying target phenomenon of interest. Different from the latter, active learning aims to select and gather the most informative observations for modeling and predicting the spatially varying phenomenon given some sampling budget constraints (e.g., quantity of deployed sensors, energy consumption, mission time).
In practice, the target phenomenon often coexists and correlates well with some auxiliary type(s) of phenomena whose measurements may be more spatially correlated, less noisy (e.g., due to higherquality sensors), and/or less tedious to sample (e.g., due to greater availability/quantity, higher sampling rate, and/or lower sampling cost of deployed sensors of these type(s)) and can consequently be exploited for improving its prediction. For example, to monitor soil pollution by some heavy metal (e.g., Cadmium), its complex and timeconsuming extraction from soil samples can be alleviated by supplementing its prediction with correlated auxiliary types of soil measurements (e.g., pH) that are easier to sample. Similarly, to monitor algal bloom in the coastal ocean, plankton abundance correlates well with auxiliary types of ocean measurements (e.g., chlorophyll a, temperature, and salinity) that can be sampled more readily. Other examples of realworld applications include remote sensing, traffic monitoring, monitoring of groundwater and indoor environmental quality, and precision agriculture, among others. All of the above practical examples motivate the need to design and develop an active learning algorithm that selects not just the most informative sampling locations to be observed but also the types of measurements (i.e., target and/or auxiliary) at each selected location for minimizing the predictive uncertainty of unobserved areas of a target phenomenon given a sampling budget, which is the focus of our work here.
To achieve this, we model all types of coexisting phenomena (i.e., target and auxiliary) jointly as a multioutput Gaussian process (MOGP), which allows the spatial correlation structure of each type of phenomenon and the crosscorrelation structure between different types of phenomena to be formally characterized. More importantly, unlike the nonprobabilistic multivariate regression methods, the probabilistic MOGP regression model allows the predictive uncertainty of the target phenomenon (as well as the auxiliary phenomena) to be formally quantified (e.g., based on entropy or mutual information criterion) and consequently exploited for deriving the active learning criterion.
PROPOSED METHODOLOGY
 NonStationary Gaussian Processes. To address the first problem, our first work presents a decentralized multirobot active sensing (DECMAS) algorithm that can efficiently coordinate the exploration of multiple robots to jointly optimize the above tradeoff for sampling unknown, nonstationary environmental phenomena. Our DECMAS algorithm models a nonstationary phenomenon as a Dirichlet process mixture of Gaussian processes (DPMGPs): Using the gathered observations, DPMGPs can learn to automatically partition the phenomenon into separate local areas, each of which comprises measurements that vary according to a stationary spatial correlation structure and can thus be modeled by a locally stationary GP. The main contributions of our work here are novel in demonstrating how DPMGPs and its structural properties can be exploited to (a) formalize an active sensing criterion that trades off between gathering the most informative observations for estimating the unknown partition (i.e., a key component of the nonstationary correlation structure) vs. that for predicting the phenomenon given the current, possibly imprecise estimate of the partition, and (b) support effective and efficient decentralized coordination. We also provide a theoretical performance guarantee for DECMAS and analyze its time complexity. Finally, we empirically demonstrate using two realworld datasets that DECMAS outperforms the stateoftheart MAS algorithms.
 Nonmyopic Active Learning. To tackle the second problem, our second work presents an efficient decisiontheoretic planning approach to nonmyopic active sensing/learning that can still preserve and exploit the principled Bayesian sequential decision problem framework for jointly and naturally optimizing the explorationexploitation tradeoff and consequently does not incur the limitations of the algorithm of Krause & Guestrin (2007). In particular, although the exact Bayesoptimal policy to the active sensing problem cannot be derived, we show that it is in fact possible to solve for a nonmyopic ϵBayesoptimal active learning (ϵBAL) policy given a userdefined bound ϵ, which is the main contribution of our work here. In other words, our proposed ϵBAL policy can approximate the optimal expected active sensing performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). In contrast, the algorithm of Krause & Guestrin (2007) can only yield a suboptimal performance bound. To meet the realtime requirement in timecritical applications, we then propose an asymptotically ϵoptimal, branchandbound anytime algorithm based on ϵBAL with performance guarantee. We empirically demonstrate using both synthetic and realworld datasets that, with limited budget, our proposed approach outperforms stateoftheart algorithms.
 MultiOutput Gaussian Processes. To solve the third problem,
our third work is the first to present an efficient algorithm for active learning of a MOGP model. We consider utilizing the entropy criterion to measure the predictive uncertainty of a target phenomenon, which is widely used for active learning of a singleoutput GP model. Unfortunately, for the MOGP model, such a criterion scales poorly in the number of candidate sampling locations of the target phenomenon and even more so in the number of selected observations (i.e., sampling budget) when optimized. To resolve this scalability issue, we first exploit a structure common to a unifying framework of sparse MOGP models for deriving a novel active learning criterion.
Our novel active learning criterion exhibits an interesting explorationexploitation tradeoff between
selecting locations with the most uncertain measurements of the target phenomenon to be observed given the latent structure of the sparse MOGP model (i.e., exploitation) vs. selecting locations to be observed (i.e., possibly of auxiliary types of phenomena) so as to rely less on measurements at the remaining unobserved locations (i.e., won't be sampled) of the target phenomenon to infer the latent model structure (i.e., exploration).
Then, we define a relaxed notion of submodularity called ϵsubmodularity and exploit the ϵsubmodularity property of our new criterion for devising a polynomialtime approximation algorithm that guarantees a constantfactor approximation of that achieved by the optimal set of selected observations. We empirically evaluate the performance of our proposed algorithm using three realworld datasets.
PUBLICATIONS
 NearOptimal Active Learning of MultiOutput Gaussian Processes.
Yehong Zhang, Trong Nghia Hoang, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI16), pages 23512357, Phoenix, AZ, Feb 1217, 2016.
25.75% acceptance rate
Abstract. This paper addresses the problem of active learning of a multioutput Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models for deriving a novel active learning criterion. Then, we exploit a relaxed form of submodularity property of our new criterion for devising a polynomialtime approximation algorithm that guarantees a constantfactor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on realworld datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and singleoutput GP models.
 New Advances on Bayesian and DecisionTheoretic Approaches for Interactive Machine Learning.
Trong Nghia Hoang.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Feb 2015.
Abstract.
The explorationexploitation tradeoff is a fundamental dilemma in many interactive learning scenarios which include both aspects of reinforcement learning (RL) and active learning (AL): An autonomous agent, situated in an unknown environment, has to actively extract knowledge from the environment by taking actions (or conducting experiments) based on its previously collected information to make accurate predictions or to optimize some utility functions. Thus, to make the most effective use of their resourceconstrained budget (e.g., processing time, experimentation cost), the agent must choose carefully between (a) exploiting options (e.g., actions, experiments) which are recommended by its current, possibly incomplete model of the environment, and (b) exploring the other ostensibly suboptimal choices to gather more information.
For example, an RL agent has to face a dilemma between (a) exploiting the mostrewarding action according to the current statistical model of the environment at the risk of running into catastrophic situations if the model is not accurate, and (b) exploring a suboptimal action to gather more information so as to improve the models accuracy at the potential price of losing the shortterm reward. Similarly, an AL algorithm/agent has to consider between (a) conducting the most informative experiments according to its current estimation of the environment models parameters (i.e., exploitation), and (b) running experiments that help improving the estimation accuracy of these parameters (i.e., exploration).
More often, learning strategies that ignore exploration will likely exhibit suboptimal performance due to their imperfect knowledge while, conversely, those that entirely focus on exploration might suffer the cost of learning without benefitting from it. Therefore, a good explorationexploitation tradeoff is critical to the success of those interactive learning agents: In order to perform well, they must strike the right balance between these two conflicting objectives. Unfortunately, while this tradeoff has been wellrecognized since the early days of RL, the studies of explorationexploitation have mostly been developed for theoretical settings in the respective field of RL and, perhaps surprisingly, glossed over in the existing AL literature. From a practical point of view, we see three limiting factors:
 Previous works addressing the explorationexploitation tradeoff in RL have largely focused on simple choices of the environment model and consequently, are not practical enough to accommodate realworld applications that have far more complicated environment structures. In fact, we find that most recent advances in Bayesian reinforcement learning (BRL) have only been able to analytically trade off between exploration and exploitation under a simple choice of models such as FlatDirichletMultinomial (FDM) whose independence and modeling assumptions do not hold for many realworld applications.
 Nearly all of the notable works in the AL literature primarily advocate the use of greedy/myopic algorithms whose rates of convergence (i.e., the number of experiments required by the learning algorithm to achieve a desired performance in the worst case) are provably minimax optimal for simple classes of learning tasks (e.g., threshold learning). While these results have greatly ad vanced our understanding about the limit of myopic AL in worstcase scenarios, significantly less is presently known about whether it is possible to devise nonmyopic AL strategies which optimize the explorationexploitation tradeoff to achieve the best expected performance in budgeted learning scenarios.
 The issue of scalability of the existing predictive models (e.g., Gaussian processes) used in AL has generally been underrated since the majority of literature considers smallscale environments which only consist of a few thousand candidate experiments to be selected by singlemode AL algorithms one at a time prior to retraining the model. In contrast, largescale environments usually have a massive set of million candidate experiments among which tens or hundreds of thousands should be actively selected for learning. For such dataintensive problems, it is often more costeffective to consider batchmode AL algorithms which select and conduct multiple experiments in parallel at each stage to collect observations in batch. Retraining the predictive model after incorporating each batch of observations then becomes a computational bottleneck as the collected dataset at each stage quickly grows up to tens or even hundreds of thousand data points.
This thesis outlines some recent progresses that we have been able to make while working toward satisfactory answers to the above challenges, along with practical algorithms that achieve them:
 In particular, in order to put BRL into practice for more complicated and practical problems, we propose a novel framework called Interactive Bayesian Reinforcement Learning (IBRL) to integrate the general class of parametric models and model priors, thus allowing the practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the environment as often required in many realworld applications. Interestingly, we show how the nonmyopic Bayesoptimal policy can be derived analytically by solving IBRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Our empirical studies show that the proposed approach performs competitively with the existing stateoftheart algorithms.
 Then, to establish a theoretical foundation for the explorationexploitation tradeoff in singlemode active learning scenarios with resourceconstrained budgets, we present a novel ϵBayesoptimal DecisionTheoretic Active Learning (ϵBAL) framework which advocates the use of differential entropy as a performance measure and consequently, derives a learning policy that can approximate the optimal expected performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). To meet the realtime requirement in timecritical applications, we then propose an asymptotically ϵoptimal, branchandbound anytime algorithm based on ϵBAL with performance guarantees. In practice, we empirically demonstrate with both synthetic and realworld datasets that the proposed approach outperforms the stateoftheart algorithms in budgeted scenarios.
 Lastly, to facilitate the future developments of largescale, nonmyopic AL applications, we further introduce a highly scalable family of anytime predictive models for AL which provably converge toward a wellknown class of sparse Gaussian processes (SGPs). Unlike the existing predictive models of AL which cannot be updated incrementally and are only capable of processing middlesized datasets (i.e., a few thousands of data points), our proposed models can process massive datasets in an anytime fashion, thus providing a principled tradeoff between the processing time and the predictive accuracy. The efficiency of our framework is then demonstrated empirically on a variety of largescale realworld datasets which contains hundreds of thousand data points.
 Nonmyopic ϵBayesOptimal Active Learning of Gaussian Processes.
Trong Nghia Hoang, Kian Hsiang Low, Patrick Jaillet and Mohan Kankanhalli.
In Proceedings of the 31st International Conference on Machine Learning (ICML14), pages 739747, Beijing, China, Jun 2126, 2014.
22.4% acceptance rate (cycle 2)
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. A fundamental issue in active learning of Gaussian processes is that of the explorationexploitation tradeoff.
This paper presents a novel nonmyopic ϵBayesoptimal active learning (ϵBAL) approach that jointly and naturally optimizes the tradeoff.
In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately.
To perform active learning in real time, we then propose an anytime algorithm based on ϵBAL with performance guarantee and empirically demonstrate using synthetic and realworld datasets that, with limited budget, it outperforms the stateoftheart algorithms.
 Active Learning is Planning: Nonmyopic ϵBayesOptimal Active Learning of Gaussian Processes.
Trong Nghia Hoang, Kian Hsiang Low, Patrick Jaillet and Mohan Kankanhalli.
In T. Calders, F. Esposito, E. Hüllermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases  European Conference, ECML/PKDD14 Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, pages 494498, Springer Berlin Heidelberg, Nancy, France, Sep 1519, 2014.
Abstract. A fundamental issue in active learning of Gaussian processes is that of the explorationexploitation tradeoff. This paper presents a novel nonmyopic ϵBayesoptimal active learning (ϵBAL) approach that jointly optimizes the tradeoff. In contrast, existing works have primarily developed greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ϵBAL with performance guarantee and empirically demonstrate using a realworld dataset that, with limited budget, it outperforms the stateoftheart algorithms.
 MultiRobot Active Sensing of NonStationary Gaussian ProcessBased Environmental Phenomena.
Ruofei Ouyang, Kian Hsiang Low, Jie Chen & Patrick Jaillet.
In Proceedings of the
13th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS14), pages 573580, Paris, France, May 59, 2014.
23.8% acceptance rate
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. A key challenge of environmental sensing and monitoring is that of sensing, modeling, and predicting largescale, spatially correlated environmental phenomena, especially when they are unknown and nonstationary.
This paper presents a decentralized multirobot active sensing (DECMAS) algorithm that can efficiently coordinate the exploration of multiple robots to gather the most informative observations for predicting an unknown, nonstationary phenomenon.
By modeling the phenomenon using a Dirichlet process mixture of Gaussian processes (DPMGPs), our work here is novel in demonstrating how DPMGPs and its structural properties can be exploited to (a) formalize an active sensing criterion that trades off between gathering the most informative observations for estimating the unknown, nonstationary spatial correlation structure vs. that for predicting the phenomenon given the current, imprecise estimate of the correlation structure, and (b) support efficient decentralized coordination.
We also provide a theoretical performance guarantee for DECMAS and analyze its time complexity.
We empirically demonstrate using two realworld datasets that DECMAS outperforms stateoftheart MAS algorithms.
ONLINE AND ANYTIME SPARSE GAUSSIAN PROCESSES FOR BIG DATA
PROJECT DURATION : Aug 2013  Present
PROJECT AFFILIATION

SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborators: Emilio Frazzoli, MIT; Daniela Rus, MIT)

SensorEnhanced Social Media (SeSaMe) Centre (Collaborator: Mohan Kankanhalli)
PROJECT FUNDING
 MOE AcRF Tier 2 Grant : Scaling up Gaussian Process Predictive Models for Big Data, SGD $637,884, Jul 2017  Jul 2020
 SMART Subaward Agreements  FM IRG :
Autonomy in MobilityOnDemand Systems,
SGD $1,348,638.22, Aug 2010  Dec 2015
 Research Collaboration Agreements with Panasonic R&D Center Singapore : Sonar Data Fusion Algorithm for Object Distance Estimation, SGD $84,230.40, Feb 2016  Jul 2016, Dec 2016  Jul 2017
PROBLEM MOTIVATION
A Gaussian process regression (GPR) model is a Bayesian nonparametric model for performing nonlinear regression that provides a Gaussian predictive distribution with formal measures of predictive uncertainty. The expressivity of a fullrank GPR (FGPR) model, however, comes at a cost of cubic time in the size of the data, thus rendering it computationally impractical for training with massive datasets. To improve its scalability, a number of sparse GPR (SGPR) models
exploiting lowrank approximate representations have been proposed, many of which share a similar structural assumption of conditional independence (albeit of varying degrees) based on the notion of inducing variables and consequently incur only linear time in the data size. The work of QuinoneroCandela & Rasmussen (2005) has in fact presented a unifying view of such SGPR models, which include the subset of regressors (SoR), deterministic training conditional (DTC), fully independent training conditional (FITC), fully independent conditional (FIC), partially independent training conditional (PITC), and partially independent conditional (PIC) approximations.
To scale up these SGPR models further for performing realtime predictions necessary in many timecritical applications and decision support systems (e.g., ocean sensing, traffic monitoring), the work of Gal et al. (2014) has parallelized DTC while that of Chen et al. (2013) has parallelized FITC, FIC, PITC, and PIC to be run on multiple machines. The recent work of Low et al. (2015) has produced a spectrum of SGPR models with PIC and FGPR at the two extremes that are also amenable to parallelization on multiple machines. Ideally, these parallel SGPR models can reduce the incurred time of their centralized counterparts by a factor close to the number of machines. In practice, since the number of machines is limited due to budget constraints, their incurred time will still grow with an increasing size of data. Like their centralized counterparts, they can be trained using all the data.
When the data is expected to stream in over a (possibly indefinitely) long time, it is also computationally impractical to repeatedly use these existing offline sparse GP approximation methods or even the online GP model (i.e., quadratic time in the data size) for training at each time step.
PROPOSED METHODOLOGY
A more affordable alternative is to instead train a SGPR model in either an (1) online or (2) anytime fashion with a small, randomly sampled subset of the data at each iteration, which requires only a single machine:
 Our first work presents a novel online sparse GP approximation method that, in contrast to existing works mentioned above, is capable of achieving constant time and memory (i.e., independent of the size of the data/observations) per time step. We provide a theoretical guarantee on its predictive performance to be equivalent to that of the offline sparse PITC approximation method. Our proposed method generalizes the sparse online GP model of Csato & Opper (2002) by relaxing its conditional independence assumption significantly, hence potentially improving the predictive performance. We empirically demonstrate the practical feasibility of using our generalized online sparse GP approximation method through a realworld persistent mobile robot localization experiment.
 To the best of our knowledge, the only notable anytime SGPR model exploits a result of Titsias (2009) that DTC can alternatively be obtained using variational inference by minimizing the KullbackLeibler (KL) distance between the variational approximation and the GP posterior distribution of some latent variables given the data, from which a stochastic natural gradient ascent (SNGA) method can be derived to achieve an asymptotic convergence of its predictive performance to that of DTC while incurring constant time per iteration.
This anytime variant of DTC promises a huge speedup if the number of sampled subsets of data needed for convergence is much smaller than the total number of possible disjoint subsets that can be formed and sampled from all the data. But, it can be observed in our experiments that DTC often does not predict as well as the other SGPR models (except SoR) encompassed by the unifying view of QuinoneroCandela & Rasmussen (2005) because it imposes the most restrictive structural assumption. This motivates us to consider the possibility of constructing an anytime variant of any SGPR model of our choice whose derived SNGA method can achieve an asymptotic convergence of its predictive performance to that of the chosen SGPR model while preserving constant time per iteration.
However, no alternative formulation based on variational inference exists for any SGPR model other than DTC in order to derive such a SNGA method.
To address the above challenge, our second work presents a novel unifying framework of anytime SGPR models that can produce good predictive performance fast and improve their predictive performance over time. Our proposed unifying framework, perhaps surprisingly, reverses the variational inference procedure to theoretically construct a nontrivial, concave functional (i.e., of distributions) that is maximized at the predictive distribution of any SGPR model of our choice. Consequently, a SNGA method can be derived that involves iteratively following the stochastic natural gradient of the functional to improve its estimate of the predictive distribution of the chosen SGPR model and is guaranteed to achieve asymptotic convergence to it.
Interestingly, we show that if the predictive distribution of the chosen SGPR model satisfies certain decomposability conditions (e.g., DTC, FITC, PIC), then the stochastic natural gradient is an unbiased estimator of the exact natural gradient and can be computed in constant time (i.e., independent of data size) at each iteration. We empirically evaluate the tradeoff between the predictive performance vs. time efficiency of the anytime SGPR models spanned by our unifying framework (i.e., including stateoftheart anytime variant of DTC) on two realworld millionsized datasets.
PUBLICATIONS
 A Generalized Stochastic Variational Bayesian Hyperparameter Learning Framework for Sparse Spectrum Gaussian Process Regression.
Quang Minh Hoang, Trong Nghia Hoang & Kian Hsiang Low.
In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI17), pages 20072014, San Francisco, CA, Feb 49, 2017.
24.6% acceptance rate (oral presentation)
Abstract. While much research effort has been dedicated to scaling up sparse Gaussian process (GP) models based on inducing variables for big data, little attention is afforded to the other less explored class of lowrank GP approximations that exploit the sparse spectral representation of a GP kernel. This paper presents such an effort to advance the state of the art of sparse spectrum GP models to achieve competitive predictive performance for massive datasets. Our generalized framework of stochastic variational Bayesian sparse spectrum GP (sVBSSGP) models addresses their shortcomings by adopting a Bayesian treatment of the spectral frequencies to avoid overfitting, modeling these frequencies jointly in its variational distribution to enable their interaction a posteriori, and exploiting local data for boosting the predictive performance. However, such structural improvements result in a variational lower bound that is intractable to be optimized. To resolve this, we exploit a variational parameterization trick to make it amenable to stochastic optimization. Interestingly, the resulting stochastic gradient has a linearly decomposable structure that can be exploited to refine our stochastic optimization method to incur constant time per iteration while preserving its property of being an unbiased estimator of the exact gradient of the variational lower bound. Empirical evaluation on realworld datasets shows that sVBSSGP outperforms stateoftheart stochastic implementations of sparse GP models.
 A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data.
Trong Nghia Hoang, Quang Minh Hoang & Kian Hsiang Low.
In Proceedings of the 32nd International Conference on Machine Learning (ICML15), pages 569578, Lille, France, Jul 611, 2015.
26.0% acceptance rate
Abstract. This paper presents a novel unifying framework of anytime sparse Gaussian process regression (SGPR) models that can produce good predictive performance fast and improve their predictive performance over time. Our proposed unifying framework reverses the variational inference procedure to theoretically construct a nontrivial, concave functional that is maximized at the predictive distribution of any SGPR model of our choice.
As a result, a stochastic natural gradient ascent method can be derived that involves iteratively following the stochastic natural gradient of the functional to improve its estimate of the predictive distribution of the chosen SGPR model
and is guaranteed to achieve asymptotic convergence to it. Interestingly, we show that if the predictive distribution of the chosen SGPR model
satisfies certain decomposability conditions, then the stochastic natural gradient is an unbiased estimator of the exact natural gradient and can be computed in constant time (i.e., independent of data size) at each iteration. We empirically evaluate the tradeoff between the predictive performance vs. time efficiency of the anytime SGPR models on two realworld millionsized datasets.
 New Advances on Bayesian and DecisionTheoretic Approaches for Interactive Machine Learning.
Trong Nghia Hoang.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Feb 2015.
Abstract.
The explorationexploitation tradeoff is a fundamental dilemma in many interactive learning scenarios which include both aspects of reinforcement learning (RL) and active learning (AL): An autonomous agent, situated in an unknown environment, has to actively extract knowledge from the environment by taking actions (or conducting experiments) based on its previously collected information to make accurate predictions or to optimize some utility functions. Thus, to make the most effective use of their resourceconstrained budget (e.g., processing time, experimentation cost), the agent must choose carefully between (a) exploiting options (e.g., actions, experiments) which are recommended by its current, possibly incomplete model of the environment, and (b) exploring the other ostensibly suboptimal choices to gather more information.
For example, an RL agent has to face a dilemma between (a) exploiting the mostrewarding action according to the current statistical model of the environment at the risk of running into catastrophic situations if the model is not accurate, and (b) exploring a suboptimal action to gather more information so as to improve the models accuracy at the potential price of losing the shortterm reward. Similarly, an AL algorithm/agent has to consider between (a) conducting the most informative experiments according to its current estimation of the environment models parameters (i.e., exploitation), and (b) running experiments that help improving the estimation accuracy of these parameters (i.e., exploration).
More often, learning strategies that ignore exploration will likely exhibit suboptimal performance due to their imperfect knowledge while, conversely, those that entirely focus on exploration might suffer the cost of learning without benefitting from it. Therefore, a good explorationexploitation tradeoff is critical to the success of those interactive learning agents: In order to perform well, they must strike the right balance between these two conflicting objectives. Unfortunately, while this tradeoff has been wellrecognized since the early days of RL, the studies of explorationexploitation have mostly been developed for theoretical settings in the respective field of RL and, perhaps surprisingly, glossed over in the existing AL literature. From a practical point of view, we see three limiting factors:
 Previous works addressing the explorationexploitation tradeoff in RL have largely focused on simple choices of the environment model and consequently, are not practical enough to accommodate realworld applications that have far more complicated environment structures. In fact, we find that most recent advances in Bayesian reinforcement learning (BRL) have only been able to analytically trade off between exploration and exploitation under a simple choice of models such as FlatDirichletMultinomial (FDM) whose independence and modeling assumptions do not hold for many realworld applications.
 Nearly all of the notable works in the AL literature primarily advocate the use of greedy/myopic algorithms whose rates of convergence (i.e., the number of experiments required by the learning algorithm to achieve a desired performance in the worst case) are provably minimax optimal for simple classes of learning tasks (e.g., threshold learning). While these results have greatly ad vanced our understanding about the limit of myopic AL in worstcase scenarios, significantly less is presently known about whether it is possible to devise nonmyopic AL strategies which optimize the explorationexploitation tradeoff to achieve the best expected performance in budgeted learning scenarios.
 The issue of scalability of the existing predictive models (e.g., Gaussian processes) used in AL has generally been underrated since the majority of literature considers smallscale environments which only consist of a few thousand candidate experiments to be selected by singlemode AL algorithms one at a time prior to retraining the model. In contrast, largescale environments usually have a massive set of million candidate experiments among which tens or hundreds of thousands should be actively selected for learning. For such dataintensive problems, it is often more costeffective to consider batchmode AL algorithms which select and conduct multiple experiments in parallel at each stage to collect observations in batch. Retraining the predictive model after incorporating each batch of observations then becomes a computational bottleneck as the collected dataset at each stage quickly grows up to tens or even hundreds of thousand data points.
This thesis outlines some recent progresses that we have been able to make while working toward satisfactory answers to the above challenges, along with practical algorithms that achieve them:
 In particular, in order to put BRL into practice for more complicated and practical problems, we propose a novel framework called Interactive Bayesian Reinforcement Learning (IBRL) to integrate the general class of parametric models and model priors, thus allowing the practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the environment as often required in many realworld applications. Interestingly, we show how the nonmyopic Bayesoptimal policy can be derived analytically by solving IBRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Our empirical studies show that the proposed approach performs competitively with the existing stateoftheart algorithms.
 Then, to establish a theoretical foundation for the explorationexploitation tradeoff in singlemode active learning scenarios with resourceconstrained budgets, we present a novel ϵBayesoptimal DecisionTheoretic Active Learning (ϵBAL) framework which advocates the use of differential entropy as a performance measure and consequently, derives a learning policy that can approximate the optimal expected performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). To meet the realtime requirement in timecritical applications, we then propose an asymptotically ϵoptimal, branchandbound anytime algorithm based on ϵBAL with performance guarantees. In practice, we empirically demonstrate with both synthetic and realworld datasets that the proposed approach outperforms the stateoftheart algorithms in budgeted scenarios.
 Lastly, to facilitate the future developments of largescale, nonmyopic AL applications, we further introduce a highly scalable family of anytime predictive models for AL which provably converge toward a wellknown class of sparse Gaussian processes (SGPs). Unlike the existing predictive models of AL which cannot be updated incrementally and are only capable of processing middlesized datasets (i.e., a few thousands of data points), our proposed models can process massive datasets in an anytime fashion, thus providing a principled tradeoff between the processing time and the predictive accuracy. The efficiency of our framework is then demonstrated empirically on a variety of largescale realworld datasets which contains hundreds of thousand data points.
 GPLocalize: Persistent Mobile Robot Localization using Online Sparse Gaussian Process Observation Model.
Nuo Xu, Kian Hsiang Low, Jie Chen, Keng Kiat Lim & Etkin Baris Ozgul.
In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI14), pages 25852592, Quebec City, Canada, Jul 2731, 2014.
16.6% acceptance rate (oral presentation)
Also appeared in
RSS14 Workshop on NonParametric Learning in Robotics, Berkeley, CA, Jul 12, 2014.
Abstract. Central to robot exploration and mapping is the task of persistent localization in environmental fields characterized by spatially correlated measurements. This paper presents a Gaussian process localization (GPLocalize) algorithm that, in contrast to existing works, can exploit the spatially correlated field measurements taken during a robot's exploration (instead of relying on prior training data) for efficiently and scalably learning the GP observation model online through our proposed novel online sparse GP. As a result, GPLocalize is capable of achieving constant time and memory (i.e., independent of the size of the data) per filtering step, which demonstrates the practical feasibility of using GPs for persistent robot localization and autonomy. Empirical evaluation via simulated experiments with realworld datasets and a real robot experiment shows that GPLocalize outperforms existing GP localization algorithms.
 Generalized Online Sparse Gaussian Processes with Application to Persistent Mobile Robot Localization.
Kian Hsiang Low, Nuo Xu, Jie Chen, Keng Kiat Lim & Etkin Baris Ozgul.
In T. Calders, F. Esposito, E. Hüllermeier, R. Meo, editors, Machine Learning and Knowledge Discovery in Databases  European Conference, ECML/PKDD14 Nectar (New Scientific and Technical Advances in Research) Track, Part III, LNCS 8726, pages 499503, Springer Berlin Heidelberg, Nancy, France, Sep 1519, 2014.
Abstract. This paper presents a novel online sparse Gaussian process (GP) approximation method that is capable of achieving constant time and memory (i.e., independent of the size of the data) per time step. We theoretically guarantee its predictive performance to be equivalent to that of a sophisticated offline sparse GP approximation method. We empirically demonstrate the practical feasibility of using our online sparse GP approximation method through a realworld persistent mobile robot localization experiment.
PARALLEL AND DISTRIBUTED SPARSE GAUSSIAN PROCESSES FOR BIG DATA
PROJECT DURATION : Aug 2010  Present
PROJECT AFFILIATION

SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborator: Patrick Jaillet, MIT)

SensorEnhanced Social Media (SeSaMe) Centre (Collaborator: Mohan Kankanhalli)
PROJECT FUNDING
 MOE AcRF Tier 2 Grant : Scaling up Gaussian Process Predictive Models for Big Data, SGD $637,884, Jul 2017  Jul 2020
 SMART Subaward Agreements  FM IRG :
Spatiotemporal Modeling and Prediction of Traffic Patterns,
SGD $361,456.17, Oct 2011  Mar 2017
 Research Collaboration Agreement with Sumitomo Electric Industries, Ltd. : Estimation/Prediction Algorithm for Traffic Volume without Rich Installation of Detectors, JPY $3,000,000, Sep 2013  Nov 2014
PROBLEM MOTIVATION
Gaussian process (GP) models are a rich class of Bayesian nonparametric models that can perform probabilistic regression by providing Gaussian predictive distributions with formal measures of the predictive uncertainty.
Unfortunately, a GP model is handicapped by its poor scalability in the size of the data, hence limiting its practical use to small data. To improve its scalability, two families of sparse GP regression methods have been proposed: (a) Lowrank approximate representations
of the fullrank GP (FGP) model are wellsuited for modeling slowlyvarying functions with large correlation and can use all the data for predictions. But, they require a relatively high rank to capture smallscale features/patterns (i.e., of small correlation) with high fidelity, thus losing their computational advantage. (b) In contrast, localized regression and covariance tapering methods (e.g., local GPs and compactly supported covariance functions) are particularly useful for modeling rapidlyvarying functions with small correlation. However, they can only utilize local data for predictions, thereby performing poorly in input regions with little/no data. Furthermore, to accurately represent largescale features/patterns (i.e., of large correlation), the locality/tapering range has to be increased considerably, thus sacrificing their time efficiency.
Recent sparse GP regression methods have unified approaches from the two families described above to harness their complementary modeling and predictive capabilities (hence, eliminating their deficiencies) while retaining their computational advantages. Specifically, after approximating the FGP (in particular, its covariance matrix) with a lowrank representation based on the notion of inducing variables, a sparse covariance matrix approximation of the resulting residual process is made. However, this sparse residual covariance matrix approximation imposes a fairly strong conditional independence assumption given the inducing variables since the number of inducing variables cannot be too large to preserve time efficiency. We argue in this work that such a strong assumption is an overkill: It is in fact possible to construct a more refined, dense residual covariance matrix approximation by exploiting a Markov assumption and, perhaps surprisingly, still achieve scalability, which distinguishes our work here from existing sparse GP regression methods utilizing lowrank representations (i.e., including the unified approaches) described earlier.
As a result, our proposed residual covariance matrix approximation can significantly relax the conditional independence assumption (especially with larger data), hence potentially improving the predictive performance.
PROPOSED METHODOLOGY
This work presents a lowrankcumMarkov approximation (LMA) of the FGP model that is novel in leveraging the dual computational advantages stemming from complementing the reducedrank covariance matrix approximation based on the inducing variables with the residual covariance matrix approximation due to the Markov assumption;
the latter approximation is guaranteed to be closest in the KullbackLeibler distance criterion subject to some constraint. Consequently, our proposed LMA method can trade off between the number of inducing variables and the order of the Markov property to (a) incur lower computational cost than sparse GP regression methods utilizing lowrank representations with only the number of inducing variables or spectral points as the varying parameter while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale.
Interestingly, varying the Markov order produces a spectrum of LMAs with the partially independent conditional (PIC) approximation and FGP at the two extremes. An important advantage of LMA over most existing sparse GP regression methods is that it is amenable to parallelization on multiple machines/cores, thus gaining greater scalability for performing realtime predictions necessary in many timecritical applications and decision support systems (e.g., ocean sensing, traffic monitoring). Our parallel LMA method is implemented using the message passing interface (MPI) framework to run in clusters of up to 32 computing nodes and its predictive performance, scalability, and speedup are empirically evaluated on three realworld datasets (i.e., including a millionsized dataset).
PUBLICATIONS
 A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models.
Trong Nghia Hoang, Quang Minh Hoang & Kian Hsiang Low.
In Proceedings of the 33rd International Conference on Machine Learning (ICML16), pages 382391, New York City, NY, Jun 1924, 2016.
24.3% acceptance rate
Abstract. This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a highorder Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two realworld datasets.
 Parallel Gaussian Process Regression for Big Data: LowRank Representation Meets Markov Approximation.
Kian Hsiang Low, Jiangbo Yu, Jie Chen & Patrick Jaillet.
In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI15), pages 28212827, Austin, TX, Jan 2529, 2015.
26.67% acceptance rate
Abstract. The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size.
To improve its scalability, this paper presents a lowrankcumMarkov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a lowrank approximate representation of the fullrank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the KullbackLeibler distance criterion subject to some constraint
and is considerably more refined than that of existing sparse GP models utilizing lowrank representations due to its more relaxed conditional independence assumption (especially with larger data).
As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale.
Interestingly, varying the Markov order produces a spectrum of LMAs
with PIC approximation and fullrank GP at the two extremes.
An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability.
Empirical evaluation on three realworld datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more timeefficient and scalable than stateoftheart sparse and fullrank GP regression methods
while achieving comparable predictive performances.
 Parallel Gaussian Process Regression with LowRank Covariance Matrix Approximations.
Jie Chen, Nannan Cao, Kian Hsiang Low, Ruofei Ouyang, Colin KengYan Tan & Patrick Jaillet.
In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI13), pages 152161, Bellevue, WA, Jul 1115, 2013.
31.3% acceptance rate
Abstract. Gaussian processes (GP) are Bayesian nonparametric models that are widely used for probabilistic regression. Unfortunately, it cannot scale well with large data nor perform realtime predictions due to its cubic time cost in the data size. This paper presents two parallel GP regression methods that exploit lowrank covariance matrix approximations for distributing the computational load among parallel machines to achieve time efficiency and scalability. We theoretically guarantee the predictive performances of our proposed parallel GPs to be equivalent to that of some centralized approximate GP regression methods: The computation of their centralized counterparts can be distributed among parallel machines, hence achieving greater time efficiency and scalability. We analytically compare the properties of our parallel GPs such as time, space, and communication complexity. Empirical evaluation on two realworld datasets in a cluster of 20 computing nodes shows that our parallel GPs are significantly more timeefficient and scalable than their centralized counterparts and exact/full GP while achieving predictive performances comparable to full GP.
 Gaussian ProcessBased Decentralized Data Fusion and Active Sensing Agents: Towards LargeScale Modeling and Prediction of Spatiotemporal Traffic Phenomena.
Jie Chen.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Dec 2013.
Abstract.
Knowing and understanding the environmental phenomena is important to many real world applications. This thesis is devoted to study largescale modeling and prediction of spatiotemporal environmental phenomena (i.e., urban traffic phenomena). Towards this goal, our proposed approaches rely on a class of Bayesian nonparametric models: Gaussian processes (GP).
To accurately model spatiotemporal urban traffic phenomena in real world situation, a novel relational GP taking into account both the road segment features and road network topology information is proposed to model real world traffic conditions over road network. Additionally, a GP variant called logGaussian process (lGP) is exploited to model an urban mobility demand pattern which contains skewness and extremity in demand measurements.
To achieve efficient and scalable urban traffic phenomenon prediction given a large phenomenon data, we propose three novel parallel GPs: parallel partially independent training conditional (pPITC), parallel partially independent conditional(pPIC) and parallel incomplete Cholesky factorization (pICF)based approximations of GP model, which can distribute their computational load into a cluster of parallel/multicore machines, thereby achieving time efficiency. The predictive performances of such parallel GPs are theoretically guaranteed to be equivalent to that of some centralized approaches to approximate full/exact GP regression. The proposed parallel GPs are implemented using the message passing interface (MPI) framework and tested on two large real world datasets. The theoretical and empirical results show that our parallel GPs achieve significantly better time efficiency and scalability than that of full GP, while achieving comparable accuracy. They also achieve fine speedup performance that is the ratio of time required by the parallel algorithms and their centralized counterparts.
To exploit active mobile sensors to perform decentralized perception of the spatiotemporal urban traffic phenomenon, we propose a decentralized algorithm framework: Gaussian processbased decentralized data fusion and active sensing (D2FAS) which is composed of a decentralized data fusion (DDF) component and a decentralized active sensing (DAS) component. The DDF component includes a novel Gaussian processbased decentralized data fusion (GPDDF) algorithm that can achieve remarkably efficient and scalable prediction of phenomenon and a novel Gaussian processbased decentralized data fusion with local augmentation (GPDDF+) algorithm that can achieve better predictive accuracy while preserving time efficiency of GPDDF. The predictive performances of both GPDDF and GPDDF+ are theoretically guaranteed to be equivalent to that of some sophisticated centralized sparse approximations of exact/full GP. For the DAS component, we propose a novel partially decentralized active sensing (PDAS) algorithm that exploits property in correlation structure of GPDDF to enable mobile sensors cooperatively gathering traffic phenomenon data along a nearoptimal joint walk with theoretical guarantee, and a fully decentralized active sensing (FDAS) algorithm that guides each mobile sensor gather phenomenon data along its locally optimal walk.
Lastly, to justify the practicality of the D2FAS framework, we develop and test D2FAS algorithms running with active mobile sensors on real world datasets for monitoring traffic conditions and sensing/servicing urban mobility demands. Theoretical and empirical results show that the proposed algorithms are significantly more timeefficient, more scalable in the size of data and in the number of sensors than the stateoftheart centralized approaches, while achieving comparable predictive accuracy.
PRESENTATIONS
 Gaussian ProcessBased Decentralized Data Fusion
and Active Sensing Agents:
Towards LargeScale Modeling & Prediction of Spatiotemporal Traffic Phenomena.
Kian Hsiang Low.
Invited speaker at the RSS13 Workshop on Robotic Exploration, Monitoring, and Information Collection: Nonparametric Modeling, Informationbased Control, and Planning under Uncertainty, Berlin, Germany, Jun 2728, 2013.
INTENTIONAWARE PLANNING UNDER UNCERTAINTY FOR INTERACTING OPTIMALLY WITH SELFINTERESTED AGENTS
PROJECT DURATION : May 2011  Present
PROJECT AFFILIATION :
SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborators: Emilio Frazzoli, MIT; Daniela Rus, MIT)
PROJECT FUNDING : SMART Subaward Agreements  FM IRG :
Autonomy in MobilityOnDemand Systems,
SGD $1,348,638.22, Aug 2010  Dec 2015
PROBLEM MOTIVATION
Designing and developing efficient planning algorithms for intelligent agents to interact and perform effectively among other selfinterested agents has recently emerged as a grand challenge in noncooperative multiagent systems. Such a challenge is posed by many realworld applications, which include automated electronic trading markets where software agents interact, and traffic intersections where autonomous cars have to negotiate with humandriven vehicles to cross them, among others. Modeling, predicting, and learning the other agents' intentions efficiently is therefore critical to overcoming this challenge.
In practice, it is highly nontrivial to model and predict the other agents' intentions efficiently. Existing works addressing this challenge are often undermined due to either the restrictive assumptions on the agents' behaviors or the prohibitively expensive cost of modeling and predicting their intentions:
 Gametheoretic approaches tend to assume the agents' behaviors to be perfectly rational using the wellfounded solution concepts of classical game theory such as Nash equilibrium that suffers from the following drawbacks: (a) Multiple equilibria may exist, (b) only the optimal actions corresponding to the equilibria are specified, and (c) they assume that the agents do not collaborate to beneficially deviate from the equilibrium, which is often violated by human agents.
 In contrast, decisiontheoretic approaches propose to extend singleagent sequential decision making frameworks under partial observability such as POMDP to explicitly characterize the bounded rationality of selfinterested agents. In particular, Interactive POMDP (IPOMDP) replaces a POMDP's flat belief over physical states with an interactive belief of k levels of hierarchy over both the physical state space and the other agent's beliefs, the latter of which are recursively defined as interactive beliefs of k1 levels of hierarchy.
As a consequence, the agent's "optimal" behavior computed at hierarchical level k is expected to be the best response to the other agent's expected behavior at hierarchical level k1. This surprisingly coincides with the wellfounded cognitive hierarchy model of games where k is referred to as the reasoning depth. Here, the bounded rationality of the agents is explicitly accounted for by making k finite and defining their expected behaviors at level 0 as uniformly random. Empowered by such enriched and highly expressive belief space, IPOMDP can explicitly model and predict the other agent's intention. Unfortunately, solving IPOMDP (e.g., solving for the agent's expected behavior at level k) is fraught with computational curses of dimensionality, history, and nested reasoning due to its highly sophisticated structure.
Furthermore, there is another practical concern for these decisiontheoretic approaches: They often require the behavioral model's parameters (e.g., hierarchical level k) to be completely specified by the practitioners, which can be very impractical in many situations where it is nontrivial to do so or the prior knowledge is insufficient to reliably derive these parameters. This essentially boils down to the need of learning while interacting with the other selfinterested agents and, interestingly, an exploitationexploration tradeoff to be made while doing so: Should an agent exploit the "best" action based on its (possibly misleading) knowledge to maximize the payoff or explore a "suboptimal" action to refine its knowledge?
Naively, one may attempt to solve it by directly tapping into the huge body of existing works in Bayesian Reinforcement Learning (BRL), which offers a broad range of principled treatments of this issue under singleagent contexts. However, most of these works often assume very simple and specific parameterizations of the unknown environments, thus rendering them inapplicable to the context where the other agent's behavior has a far more complicated parameterization. More importantly, the other agent's behavior often needs to be modeled differently depending on the specific context. Grounding in the context of existing BRL frameworks, either the domain expert struggles to best fit his prior knowledge to the supported set of parameterizations or the agent developer has to redesign the framework to incorporate a new modeling scheme. Arguably, there is no free lunch when it comes to modeling the agent's behavior across various applications.
The main focus of our work here is thus to investigate and address the following questions:
 How can intention prediction be efficiently exploited and made practical in planning under partial observability? In particular, how can the bounded rationality of the other agents be explicitly modeled without incurring prohibitive computational cost?
 How can existing BRL frameworks be refined to allow a domain expert to freely incorporate his choice of design in modeling the other agents' behaviors?
This question is signficant in putting theory into practice and, when answered, can potentially bridge the gap between learning in single and (selfinterested) multiagent systems.
PROPOSED METHODOLOGY
 To answer the first question, we first develop a novel intentionaware nested MDP framework for planning in fully observable multiagent environments. Inspired by the cognitive hierarchy model of games, nested MDP constitutes a recursive reasoning formalism to predict the other agent's intention and then exploit it to plan our agent's optimal interaction policy. We show that nested MDP incurs linear time in the planning horizon length and reasoning depth. Then, we propose an intentionaware IPOMDP Lite framework for planning in partially observable multiagent environments that, in particular, exploits a practical structural assumption: The intention of the other agent is driven by nested MDP, which is demonstrated theoretically to be an effective surrogate of its true intention when the agents have fine sensing and actuation capabilities. This assumption will allow the other agent's intention to be predicted efficiently and, consequently, IPOMDP Lite to be solved effectively, as demonstrated theoretically and empirically in our work.
 To tackle the second question, we present a novel generalization of BRL called Interactive BRL (IBRL) to integrate any parametric model and model prior of the other agent's behavior specified by domain experts, thus effectively allowing the other agent's sophisticated behavior to be represented in a finegrained manner based on the practitioners' prior knowledge. In particular, we show how the nonmyopic Bayesoptimal policy can be derived analytically by solving IBRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Empirically, we demonstrate IBRL's performance using an interesting traffic problem modeled after a realworld situation.
PUBLICATIONS
 New Advances on Bayesian and DecisionTheoretic Approaches for Interactive Machine Learning.
Trong Nghia Hoang.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Feb 2015.
Abstract.
The explorationexploitation tradeoff is a fundamental dilemma in many interactive learning scenarios which include both aspects of reinforcement learning (RL) and active learning (AL): An autonomous agent, situated in an unknown environment, has to actively extract knowledge from the environment by taking actions (or conducting experiments) based on its previously collected information to make accurate predictions or to optimize some utility functions. Thus, to make the most effective use of their resourceconstrained budget (e.g., processing time, experimentation cost), the agent must choose carefully between (a) exploiting options (e.g., actions, experiments) which are recommended by its current, possibly incomplete model of the environment, and (b) exploring the other ostensibly suboptimal choices to gather more information.
For example, an RL agent has to face a dilemma between (a) exploiting the mostrewarding action according to the current statistical model of the environment at the risk of running into catastrophic situations if the model is not accurate, and (b) exploring a suboptimal action to gather more information so as to improve the models accuracy at the potential price of losing the shortterm reward. Similarly, an AL algorithm/agent has to consider between (a) conducting the most informative experiments according to its current estimation of the environment models parameters (i.e., exploitation), and (b) running experiments that help improving the estimation accuracy of these parameters (i.e., exploration).
More often, learning strategies that ignore exploration will likely exhibit suboptimal performance due to their imperfect knowledge while, conversely, those that entirely focus on exploration might suffer the cost of learning without benefitting from it. Therefore, a good explorationexploitation tradeoff is critical to the success of those interactive learning agents: In order to perform well, they must strike the right balance between these two conflicting objectives. Unfortunately, while this tradeoff has been wellrecognized since the early days of RL, the studies of explorationexploitation have mostly been developed for theoretical settings in the respective field of RL and, perhaps surprisingly, glossed over in the existing AL literature. From a practical point of view, we see three limiting factors:
 Previous works addressing the explorationexploitation tradeoff in RL have largely focused on simple choices of the environment model and consequently, are not practical enough to accommodate realworld applications that have far more complicated environment structures. In fact, we find that most recent advances in Bayesian reinforcement learning (BRL) have only been able to analytically trade off between exploration and exploitation under a simple choice of models such as FlatDirichletMultinomial (FDM) whose independence and modeling assumptions do not hold for many realworld applications.
 Nearly all of the notable works in the AL literature primarily advocate the use of greedy/myopic algorithms whose rates of convergence (i.e., the number of experiments required by the learning algorithm to achieve a desired performance in the worst case) are provably minimax optimal for simple classes of learning tasks (e.g., threshold learning). While these results have greatly advanced our understanding about the limit of myopic AL in worstcase scenarios, significantly less is presently known about whether it is possible to devise nonmyopic AL strategies which optimize the explorationexploitation tradeoff to achieve the best expected performance in budgeted learning scenarios.
 The issue of scalability of the existing predictive models (e.g., Gaussian processes) used in AL has generally been underrated since the majority of literature considers smallscale environments which only consist of a few thousand candidate experiments to be selected by singlemode AL algorithms one at a time prior to retraining the model. In contrast, largescale environments usually have a massive set of million candidate experiments among which tens or hundreds of thousands should be actively selected for learning. For such dataintensive problems, it is often more costeffective to consider batchmode AL algorithms which select and conduct multiple experiments in parallel at each stage to collect observations in batch. Retraining the predictive model after incorporating each batch of observations then becomes a computational bottleneck as the collected dataset at each stage quickly grows up to tens or even hundreds of thousand data points.
This thesis outlines some recent progresses that we have been able to make while working toward satisfactory answers to the above challenges, along with practical algorithms that achieve them:
 In particular, in order to put BRL into practice for more complicated and practical problems, we propose a novel framework called Interactive Bayesian Reinforcement Learning (IBRL) to integrate the general class of parametric models and model priors, thus allowing the practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the environment as often required in many realworld applications. Interestingly, we show how the nonmyopic Bayesoptimal policy can be derived analytically by solving IBRL exactly and propose an approximation algorithm to compute it efficiently in polynomial time. Our empirical studies show that the proposed approach performs competitively with the existing stateoftheart algorithms.
 Then, to establish a theoretical foundation for the explorationexploitation tradeoff in singlemode active learning scenarios with resourceconstrained budgets, we present a novel ϵBayesoptimal DecisionTheoretic Active Learning (ϵBAL) framework which advocates the use of differential entropy as a performance measure and consequently, derives a learning policy that can approximate the optimal expected performance arbitrarily closely (i.e., within an arbitrary loss bound ϵ). To meet the realtime requirement in timecritical applications, we then propose an asymptotically ϵoptimal, branchandbound anytime algorithm based on ϵBAL with performance guarantees. In practice, we empirically demonstrate with both synthetic and realworld datasets that the proposed approach outperforms the stateoftheart algorithms in budgeted scenarios.
 Lastly, to facilitate the future developments of largescale, nonmyopic AL applications, we further introduce a highly scalable family of anytime predictive models for AL which provably converge toward a wellknown class of sparse Gaussian processes (SGPs). Unlike the existing predictive models of AL which cannot be updated incrementally and are only capable of processing middlesized datasets (i.e., a few thousands of data points), our proposed models can process massive datasets in an anytime fashion, thus providing a principled tradeoff between the processing time and the predictive accuracy. The efficiency of our framework is then demonstrated empirically on a variety of largescale realworld datasets which contains hundreds of thousand data points.
 Interactive POMDP Lite: Towards Practical Planning to Predict and Exploit Intentions for Interacting with SelfInterested Agents.
Trong Nghia Hoang & Kian Hsiang Low.
In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI13), pages 22982305, Beijing, China, Aug 39, 2013.
13.2% acceptance rate (oral presentation)
Abstract. A key challenge in noncooperative multiagent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, selfinterested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intentionaware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the stateoftheart algorithms.
 A General Framework for Interacting BayesOptimally with SelfInterested Agents using Arbitrary Parametric Model and Model Prior.
Trong Nghia Hoang & Kian Hsiang Low.
In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI13), pages 13941400, Beijing, China, Aug 39, 2013.
28.0% acceptance rate
Abstract. Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayesoptimality is theoretically achievable by modeling the environment's latent dynamics using FlatDirichletMultinomial (FDM) prior. In selfinterested multiagent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states nor specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a finegrained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multiagent reinforcement learning algorithms.
 IntentionAware Planning under Uncertainty for Interacting with SelfInterested, Boundedly Rational Agents.
Trong Nghia Hoang & Kian Hsiang Low.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 12331234, Valencia, Spain, June 48, 2012.
Abstract. A key challenge in noncooperative multiagent systems is that of developing efficient planning algorithms for intelligent agents to perform effectively among boundedly rational, selfinterested (i.e., noncooperative) agents (e.g., humans). To address this challenge, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intentionaware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions.
GAUSSIAN PROCESS DECENTRALIZED DATA FUSION & ACTIVE SENSING AGENTS FOR MOBILITYONDEMAND SYSTEMSTowards LargeScale Spatiotemporal Traffic Modeling and Prediction
PROJECT DURATION : Aug 2010  Present
PROJECT AFFILIATION :
SingaporeMIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG (Collaborator: Patrick Jaillet, MIT)
PROJECT FUNDING
 SMART Subaward Agreements  FM IRG :
Spatiotemporal Modeling and Prediction of Traffic Patterns,
SGD $361,456.17, Oct 2011  Mar 2017
 Research Collaboration Agreement with Sumitomo Electric Industries, Ltd. : Estimation/Prediction Algorithm for Traffic Volume without Rich Installation of Detectors, JPY $3,000,000, Sep 2013  Nov 2014
PROBLEM MOTIVATION
PRIVATE automobiles are becoming unsustainable personal mobility solutions in densely populated urban cities because the addition of parking and road spaces cannot keep pace with their escalating numbers due to limited urban land. For example, Hong Kong and Singapore have, respectively, experienced 27.6% and 37% increase in private vehicles from 2003 to 2011. However, their road networks have only expanded less than 10% in size. Without implementing sustainable measures, traffic congestions and delays will grow more severe and frequent, especially during peak hours.
Mobilityondemand (MoD) systems (e.g., Velib system of over 20000 shared bicycles in Paris, experimental carsharing systems) have recently emerged as a promising paradigm of oneway vehicle sharing for sustainable personal urban mobility, specifically, to tackle the problems of low vehicle utilization rate and parking space caused by private automobiles. Conventionally, a MoD system provides stacks and racks of light electric vehicles distributed throughout a city: When a user wants to go somewhere, he simply walks to the nearest rack, swipes a card to pick up a vehicle, drives it to the rack nearest to his destination, and drops it off. In this work, we assume the capability of a MoD system to be enhanced by deploying robotic shared vehicles (e.g., General Motors Chevrolet ENV 2.0 prototype) that can autonomously drive and cruise the streets of a densely populated urban city to be hailed by users (like taxis) instead of just waiting at the racks to be picked up. Compared to the conventional MoD system, the fleet of autonomous robotic vehicles provides greater accessibility to users who can be picked up and dropped off at any location in the road network. As a result, it can service regions of high mobility demand but with poor coverage of stacks and racks due to limited space for their installation.
The key factors in the success of a MoD system are the costs to the users and system latencies, which can be minimized by managing the MoD system effectively. To achieve this, two main technical challenges need to be addressed: (a) Realtime, finegrained mobility demand sensing and prediction, and (b) realtime active fleet management to balance vehicle supply and demand and satisfy latency requirements at sustainable operating costs. Existing works on load balancing in MoD systems, dynamic traffic assignment problems, dynamic onetoone pickup and delivery problems, and location recommendation and dispatch for cruising taxis have tackled variants of the second challenge by assuming the necessary inputs of mobility demand and traffic flow information to be perfectly or accurately known using prior knowledge or offline processing of historic data. Such an assumption does not hold for densely populated urban cities because their mobility demand patterns and traffic flow are often subject to shortterm random fluctuations and perturbations due to frequent special events (e.g., storewide sales, exhibitions), unpredictable weather conditions, unforeseen emergencies (e.g., breakdowns in public transport services), or traffic incidents (e.g., accidents, vehicle breakdowns, roadworks). So, in order for the active fleet management strategies to perform well in fleet rebalancing and route planning to service the mobility demands, they require accurate, finegrained predictive information of the spatiotemporally varying mobility demand patterns and traffic flow in real time, the former of which is the desired outcome of addressing the first challenge. To the best of our knowledge, there is little progress in the algorithm design and development to take on the first challenge, which will be a focus of our work here.
In practice, it is nontrivial to achieve realtime, accurate prediction of spatiotemporally varying traffic phenomena such as mobility demand patterns and traffic flow because the quantity of sensors that can be deployed to observe an entire road network is costconstrained. For example, static sensors such as loop detectors are traditionally placed at designated locations in a road network to collect data for predicting the traffic flow. However, they provide sparse coverage (i.e., many road segments are not observed, thus leading to data sparsity), incur high installation and maintenance costs, and cannot reposition by themselves in response to changes in the traffic flow. Lowcost GPS technology allows the collection of traffic flow data using passive mobile probes (e.g., taxis/cabs). Unlike static sensors, they can directly measure the travel times along road segments. But, they provide fairly sparse coverage due to low GPS sampling frequency (i.e., often imposed by taxi/cab companies) and no control over their routes, incur high initial implementation cost, pose privacy issues, and produce highlyvarying speeds and travel times while traversing the same road segment due to inconsistent driving behaviors. A critical mass of probes is needed on each road segment to ease the severity of the last drawback but is often hard to achieve on nonhighway segments due to sparse coverage. In contrast, we propose using the autonomous robotic vehicles as active mobile probes to overcome the limitations of static and passive mobile probes. In particular, they can be directed to explore any segments of a road network to gather realtime mobility demand data (e.g., pickup counts of different regions) and traffic flow data (e.g., speeds and travel times along road segments) at a desired GPS sampling rate while enforcing consistent driving behavior.
How then can the vacant autonomous robotic vehicles in a MoD system actively cruise a road network to gather and assimilate the most informative data for predicting a spatiotemporally varying traffic phenomenon like a mobility demand pattern or traffic flow? To solve this problem, a centralized approach to data fusion and active sensing is poorly suited because it suffers from a single point of failure and incurs huge communication, space, and time overheads with large data and fleet.
PROPOSED METHODOLOGY
Our work proposes novel decentralized data fusion and active sensing algorithms for realtime, finegrained traffic sensing, modeling, and prediction with a fleet of autonomous robotic vehicles in a MoD system. Note that the decentralized data fusion component of our proposed algorithms can also be used for static sensors and passive mobile probes.
The specific contributions of our work here include:
 Modeling and predicting a mobility demand pattern and traffic flow using, respectively, rich classes of Bayesian nonparametric models called a logGaussian process (lGP) model and a relational Gaussian process model, the latter of whose spatiotemporal correlation structure can exploit both the road segment features and road network topology information;
 Developing novel Gaussian process decentralized data fusion algorithms for cooperative perception of traffic phenomena called GPDDF and GPDDF+ whose predictive performance are theoretically guaranteed to be equivalent to that of sophisticated centralized sparse approximations of the fullrank Gaussian process (full GP) model: The computation of such sparse approximate GP models can thus be distributed among the MoD vehicles, thereby achieving efficient and scalable probabilistic prediction;
 Deriving consensus filtering variants of GPDDF and GPDDF+ that require only local communication between neighboring MoD vehicles instead of assuming alltoall communication between MoD vehicles;
 Devising decentralized active sensing algorithms (a) whose performance, when coupled with GPDDF, can be theoretically guaranteed to realize the effect of the spatiotemporal correlation structure of the traffic phenomenon and various parameter settings of the MoD system, and (b) that, when used for sampling a mobility demand pattern, can be analytically shown to exhibit, interestingly, a cruising behavior of simultaneously exploring demand hotspots and sparsely sampled regions that have higher likelihood of picking up users, hence achieving a dual effect of fleet rebalancing to service the mobility demands;
 Analyzing the time and communication overheads of our proposed algorithms: We prove that our algorithms can scale better than existing stateoftheart centralized algorithms in the size of the data and fleet;
 Empirically evaluating the predictive accuracy, time efficiency, scalability, and performance of servicing mobility demands (i.e., average cruising length of vehicles, average waiting time of users, total number of pickups) of our proposed algorithms on two datasets featuring realworld traffic phenomena such as a mobility demand pattern over the central business district of Singapore and speeds of road segments over an urban road network in Singapore.
PUBLICATIONS
 Gaussian Process Decentralized Data Fusion and Active Sensing for Spatiotemporal Traffic Modeling and Prediction in MobilityonDemand Systems.
Jie Chen, Kian Hsiang Low, Patrick Jaillet & Yujian Yao.
In IEEE Transactions on Automation Science and Engineering
(Special Issue on Networked Cooperative Autonomous Systems), volume 12, issue 3, pages 901921, Jul 2015.
Abstract. Mobilityondemand (MoD) systems have recently emerged as a
promising paradigm of oneway vehicle sharing for sustainable personal
urban mobility in densely populated cities. We assume the capability of
a MoD system to be enhanced by deploying robotic shared vehicles that
can autonomously cruise the streets to be hailed by users. A key
challenge of the MoD system is that of realtime, finegrained mobility
demand and traffic flow sensing and prediction. This paper presents
novel Gaussian process (GP) decentralized data fusion and active
sensing algorithms for realtime, finegrained traffic modeling and
prediction with a fleet of MoD vehicles. The predictive performance of
our decentralized data fusion algorithms are theoretically guaranteed to
be equivalent to that of sophisticated centralized sparse GP
approximations. We derive consensus filtering variants requiring only
local communication between neighboring vehicles. We theoretically
guarantee the performance of our decentralized active sensing
algorithms. When they are used to gather informative data for mobility
demand prediction, they can achieve a dual effect of fleet rebalancing
to service mobility demands. Empirical evaluation on realworld datasets
shows that our algorithms are significantly more timeefficient and
scalable in the size of data and fleet while achieving predictive
performance comparable to that of stateoftheart algorithms.
 Gaussian ProcessBased Decentralized Data Fusion and Active Sensing Agents: Towards LargeScale Modeling and Prediction of Spatiotemporal Traffic Phenomena.
Jie Chen.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Dec 2013.
Abstract.
Knowing and understanding the environmental phenomena is important to many real world applications. This thesis is devoted to study largescale modeling and prediction of spatiotemporal environmental phenomena (i.e., urban traffic phenomena). Towards this goal, our proposed approaches rely on a class of Bayesian nonparametric models: Gaussian processes (GP).
To accurately model spatiotemporal urban traffic phenomena in real world situation, a novel relational GP taking into account both the road segment features and road network topology information is proposed to model real world traffic conditions over road network. Additionally, a GP variant called logGaussian process (lGP) is exploited to model an urban mobility demand pattern which contains skewness and extremity in demand measurements.
To achieve efficient and scalable urban traffic phenomenon prediction given a large phenomenon data, we propose three novel parallel GPs: parallel partially independent training conditional (pPITC), parallel partially independent conditional(pPIC) and parallel incomplete Cholesky factorization (pICF)based approximations of GP model, which can distribute their computational load into a cluster of parallel/multicore machines, thereby achieving time efficiency. The predictive performances of such parallel GPs are theoretically guaranteed to be equivalent to that of some centralized approaches to approximate full/exact GP regression. The proposed parallel GPs are implemented using the message passing interface (MPI) framework and tested on two large real world datasets. The theoretical and empirical results show that our parallel GPs achieve significantly better time efficiency and scalability than that of full GP, while achieving comparable accuracy. They also achieve fine speedup performance that is the ratio of time required by the parallel algorithms and their centralized counterparts.
To exploit active mobile sensors to perform decentralized perception of the spatiotemporal urban traffic phenomenon, we propose a decentralized algorithm framework: Gaussian processbased decentralized data fusion and active sensing (D2FAS) which is composed of a decentralized data fusion (DDF) component and a decentralized active sensing (DAS) component. The DDF component includes a novel Gaussian processbased decentralized data fusion (GPDDF) algorithm that can achieve remarkably efficient and scalable prediction of phenomenon and a novel Gaussian processbased decentralized data fusion with local augmentation (GPDDF+) algorithm that can achieve better predictive accuracy while preserving time efficiency of GPDDF. The predictive performances of both GPDDF and GPDDF+ are theoretically guaranteed to be equivalent to that of some sophisticated centralized sparse approximations of exact/full GP. For the DAS component, we propose a novel partially decentralized active sensing (PDAS) algorithm that exploits property in correlation structure of GPDDF to enable mobile sensors cooperatively gathering traffic phenomenon data along a nearoptimal joint walk with theoretical guarantee, and a fully decentralized active sensing (FDAS) algorithm that guides each mobile sensor gather phenomenon data along its locally optimal walk.
Lastly, to justify the practicality of the D2FAS framework, we develop and test D2FAS algorithms running with active mobile sensors on real world datasets for monitoring traffic conditions and sensing/servicing urban mobility demands. Theoretical and empirical results show that the proposed algorithms are significantly more timeefficient, more scalable in the size of data and in the number of sensors than the stateoftheart centralized approaches, while achieving comparable predictive accuracy.
 Gaussian ProcessBased Decentralized Data Fusion and Active Sensing for MobilityonDemand System.
Jie Chen, Kian Hsiang Low & Colin KengYan Tan.
In Proceedings of the
Robotics: Science and Systems Conference (RSS13), Berlin, Germany, Jun 2428, 2013.
30.1% acceptance rate
 Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena.
Jie Chen, Kian Hsiang Low, Colin KengYan Tan, Ali Oran, Patrick Jaillet, John M. Dolan & Gaurav S. Sukhatme.
In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI12), pages 163173, Catalina Island, CA, Aug 1517, 2012.
31.6% acceptance rate
Also appeared in AAMAS12 Workshop on Agents in Traffic and Transportation (ATT12), Valencia, Spain, June 48, 2012.
Abstract. The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Googlelike MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on realworld urban road network data shows that our D2FAS algorithm is significantly more timeefficient and scalable than stateoftheart centralized algorithms while achieving comparable predictive performance.
PRESENTATIONS
 Gaussian ProcessBased Decentralized Data Fusion
and Active Sensing Agents:
Towards LargeScale Modeling & Prediction of Spatiotemporal Traffic Phenomena.
Kian Hsiang Low.
Invited speaker at the RSS13 Workshop on Robotic Exploration, Monitoring, and Information Collection: Nonparametric Modeling, Informationbased Control, and Planning under Uncertainty, Berlin, Germany, Jun 2728, 2013.
PLANNING UNDER UNCERTAINTY FOR LARGESCALE ACTIVE MULTICAMERA SURVEILLANCE
PROJECT DURATION : Mar 2010  Nov 2014
PROJECT AFFILIATION :
SensorEnhanced Social Media (SeSaMe) Centre (Collaborator: Mohan Kankanhalli)
MEDIA NEWS :
TODAY's Science section (15 May 2015)  'When CCTV cameras work together as one'
PROBLEM MOTIVATION
The problem of surveillance has grown to be a critical concern in many urban cities worldwide following a recent series of security threats like Mumbai terrorist attacks and London bomb blasts. Central to the problem of surveillance is that of monitoring, tracking, and observing multiple mobile targets of interest distributed over a largescale obstacleridden environment (e.g., airport terminals, railway and subway stations, bus depots, shopping malls, school campuses, military bases). It is often necessary to acquire highresolution videos/images of these targets for supporting realworld surveillance applications like activity/intention tracking and recognition, biometric analysis like target identification and face recognition, surveillance video mining, forensic video analysis/retrieval, among others.
Traditional surveillance systems consist of a large network of fixed/static CCTV (Closed Circuit Television) cameras that are placed to constantly focus at selected important locations in the buildings like entrance/exit, lobby, etc. Unfortunately, the maximum resolution of these cameras is limited to 720 x 480 pixels. So, they cannot capture highresolution images/videos of the targets, especially when the targets are far away from the cameras. As a result, they perform poorly in acquiring the closeup views of the targets and their activities. HDTV/Megapixel cameras have recently been introduced to overcome this resolution issue. Similar to CCTV cameras, these fixed/static HDTV/megapixel cameras are placed to constantly focus at specific locations in the environment. A relatively large network of such cameras has to be installed in order to observe the targets in any region of the environment at high resolution, which is impractical in terms of equipment, installation, and maintenance costs.
The use of active PTZ (Pan/Tilt/Zoom) cameras is becoming an increasingly popular alternative to that of fixed/static cameras for surveillance because the active cameras are endowed with pantiltzoom capabilities that can be exploited to focus on and observe the targets at high image/video resolution. Hence, fewer active cameras need to be deployed to be able to capture highresolution images/videos of the targets in any region of the environment. In order to achieve effective realtime surveillance, an efficient automated mechanism is required to autonomously coordinate and control these cameras' actions.
The objective of this work is thus to address the following central surveillance problem: "How can a network of active cameras be coordinated and controlled to maximize the number of targets observed with a guaranteed image resolution?"
PROPOSED METHODOLOGY
This work presents a novel principled decisiontheoretic planning under uncertainty approach to coordinating and controlling a largescale network of active cameras for performing highquality surveillance of large crowds of moving targets. In particular, our approach addresses the following practical issues affecting the surveillance problem:
(a) Multiple sources of uncertainty. A typical surveillance environment is fraught with multiple sources of uncertainty such as noisy cameras' observations, stochastic targets' motion, and unknown targets' locations, etc. These uncertainties make it difficult for the active cameras to know where to observe in order to keep the targets within their fields of view (fov). Consequently, they may lose track of the observed targets. To resolve this, our approach models a belief over the targets' states (i.e., locations, directions, and velocities) and updates the belief in a Bayesian paradigm based on probabilistic models of the targets' motion and the active cameras' observations;
(b) Cameratarget ratio. In crowded environments, the number of targets to be observed is usually much greater than the number of available cameras. When the number of targets increases, a surveillance system, if poorly designed, tends to incur exponentially increasing computation time, which degrades the realtime performance of the entire surveillance system;
(c) Tradeoff between maximizing the expected number of observed targets and the image resolution of observing them. Increasing the resolution of observing some targets through panning, tilting, or zooming may result in the loss of other targets being tracked. To address this tradeoff, the cameras' actions are coordinated to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed predefined resolution;
(d) Scalability. By exploiting the inherent structure of the surveillance problem, our approach can scale linearly in the number of targets to be observed;
(e) Realtime requirement. The cameras' actions are computed in real time;
(f) Occlusions. Many realworld surveillance environments contain obstacles that occlude the fov of some or perhaps even all of the cameras, thus preventing the cameras from persistently tracking their observed targets. The regions where the targets cannot be observed by any of the cameras due to obstacles are said to be occluded. When the targets reside in these occluded regions or are not within the fov of any camera, the surveillance system loses track of them, thus degrading the surveillance performance. Such environments are called partially observable in the sense that the exact locations of the targets may not be observed directly by the cameras at all times.
As demonstrated empirically through simulations, our approach can achieve highquality surveillance of a large number of targets in real time and its surveillance performance degrades gracefully with an increasing number of targets. The realworld experiments show the practicality of our decisiontheoretic approach to coordinate and control cameras in surveillance systems.
PUBLICATIONS
 Scalable DecisionTheoretic Coordination and Control for Realtime Active MultiCamera Surveillance.
Prabhu Natarajan, Trong Nghia Hoang, Yongkang Wong, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the
8th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC'14) (Invited Paper to Special Session on Smart Cameras for Smart Environments), pages 115120, Venezia, Italy, Nov 47, 2014.
Abstract. This paper presents an overview of our novel decisiontheoretic multiagent approach for controlling and coordinating multiple active cameras in surveillance. In this approach, a surveillance task is modeled as a stochastic optimization problem, where the active cameras are controlled and coordinated to achieve the desired surveillance goal in presence of uncertainties. We enumerate the practical issues in active camera surveillance and discuss how these issues are addressed in our decisiontheoretic approach. We focus on two novel surveillance tasks: maximize the number of targets observed in active cameras with guaranteed image resolution and to improve the fairness in observation of multiple targets. We discuss the overview of our novel decisiontheoretic frameworks: Markov decision process and partially observable Markov decision process frameworks for coordinating active cameras in uncertain and partially occluded environments.
 No One is Left "Unwatched": Fairness in Observation of Crowds of Mobile Targets in Active Camera Surveillance.
Prabhu Natarajan, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the
21st European Conference on Artificial Intelligence (ECAI14), including Prestigious Applications of Intelligent Systems (PAIS14), pages 11551160, Prague, Czech Republic, Aug 1822, 2014.
Abstract. Central to the problem of active multicamera surveillance is the fundamental issue of fairness in the observation of crowds of targets such that no target is "starved" of observation by the cameras for a long time. This paper presents a principled decisiontheoretic multicamera coordination and control (MC^{2}) algorithm called fairMC^{2} that can coordinate and control the active cameras to achieve maxmin fairness in the observation of crowds of targets moving stochastically. Our fairMC^{2} algorithm is novel in demonstrating how (a) the uncertainty in the locations, directions, speeds, and observation times of the targets arising from the stochasticity of their motion can be modeled probabilistically, (b) the notion of fairness in observing targets can be formally realized in the domain of multicamera surveillance for the first time by exploiting the maxmin fairness metric to formalize our surveillance objective, that is, to maximize the expected minimum observation time over all targets while guaranteeing a predefined image resolution of observing them, and (c) a structural assumption in the state transition dynamics of a surveillance environment can be exploited to improve its scalability to linear time in the number of targets to be observed during surveillance. Empirical evaluation through extensive simulations in realistic surveillance environments shows that fairMC^{2} outperforms the stateoftheart and baseline MC^{2} algorithms. We have also demonstrated the feasibility of deploying our fairMC^{2} algorithm on real AXIS 214 PTZ cameras.
 DecisionTheoretic Approach to Maximizing Fairness in MultiTarget Observation in MultiCamera Surveillance.
Prabhu Natarajan, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the
13th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS14), pages 15211522, Paris, France, May 59, 2014.
Abstract. Central to the problem of active multicamera surveillance is the fundamental issue of fairness in the observation of multiple targets such that no target is left unobserved by the cameras for a long time. To address this important issue, we propose a novel principled decisiontheoretic approach to control and coordinate multiple active cameras to achieve fairness in the observation of multiple moving targets.
 A DecisionTheoretic Approach for Controlling and Coordinating Multiple Active Cameras in Surveillance.
Prabhu Natarajan.
Ph.D. Thesis, Department of Computer Science, National University of Singapore, Dec 2013.
Abstract.
The use of active cameras in surveillance is becoming increasingly popular as they try to meet the demands of capturing highresolution images/videos of targets in surveillance for face recognition, target identification, forensic video analysis, etc. These active cameras are endowed with pan, tilt, and zoom capabilities, which can be exploited to provide highquality surveillance. In order to achieve effective, realtime surveillance, an efficient collaborative mechanism is needed to control and coordinate these cameras' actions, which is the focus of this thesis. The central problem in surveillance is to monitor a set of targets with guaranteed image resolution. Controlling and coordinating multiple active cameras to achieve this surveillance task is nontrivial and challenging because: (a) presence of inherent uncertainties in the surveillance environment (targets motion, location, and noisy camera observation); (b) there exists a nontrivial tradeoff between number of targets and the resolution of observing these targets; and (c) more importantly, the coordination framework should be scalable with increasing number of targets and cameras.
In this thesis, we formulate a novel decisiontheoretic multiagent planning approach for controlling and coordinating multiple active cameras in surveillance. Our decisiontheoretic approach offers advantages of (a) accounting the uncertainties using probabilistic models; (b) the nontrivial tradeoff is addressed by coordinating the active cameras' actions to maximize the number of targets with guaranteed resolution; and (c) the scalability in number of targets and cameras is achieved by exploiting the structures and properties that are present in our surveillance problem. We focus on two novel problems in active camera surveillance: (a) maximizing observations of multiple targets (MOMT), i.e., maximizing the number of targets observed in active cameras with guaranteed image resolution; and (b) improving fairness in observation of multiple targets (FOMT), i.e., no target is "starved" of observation by active cameras for long duration of time.
We propose two formal decisiontheoretic frameworks (a) Markov Decision Process (MDP) and (b) Partially Observable Markov Decision Process (POMDP) frameworks for coordinating active cameras in surveillance. MDP framework controls active cameras in fully observable surveillance environments where the active cameras are supported by one or more wideview static/fixed cameras to observe the entire surveillance environment at lowresolution. POMDP framework controls active cameras in partially observable surveillance environments where it is impractical to observe the entire surveillance environment using static/fixed cameras due to occlusions caused by physical infrastructures. Hence the POMDP framework do not have a complete view of the surveillance environment.
Specifically, we propose (a) MDP frameworks to solve MOMT problem and FOMT problem in fully observable surveillance environment; and (b) POMDP framework to solve MOMT problem in partially observable surveillance environment. As proven analytically, our MDP and POMDP frameworks incurs time that is linear in number of targets to be observed during surveillance. We have used maxplus algorithm with our MDP framework to improve its scalability in number of cameras for MOMT problem. Empirical evaluation through simulations in realistic surveillance environment reveals that our proposed approach can achieve highquality surveillance in real time. We also demonstrate our pro posed approach with real Axis 214 PTZ cameras to show the practicality of our approach in real world surveillance. Both the simulations and real camera experiments show that our decisiontheoretic approach can control and coordinate active cameras efficiently and hence contributes significantly towards improving the active camera surveillance research.
 DecisionTheoretic Coordination and Control for Active MultiCamera Surveillance in Uncertain, Partially Observable Environments.
Prabhu Natarajan, Trong Nghia Hoang, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the
6th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC'12), pages 16, Hong Kong, Oct 30  Nov 2, 2012.
Abstract. A central problem of surveillance is to monitor multiple targets moving in a largescale, obstacleridden environment with occlusions. This paper presents a novel principled Partially Observable Markov Decision Processbased approach to coordinating and controlling a network of active cameras for tracking and observing multiple mobile targets at high resolution in such surveillance environments. Our proposed approach is capable of (a) maintaining a belief over the targets' states (i.e., locations, directions, and velocities) to track them, even when they may not be observed directly by the cameras at all times, (b) coordinating the cameras' actions to simultaneously improve the belief over the targets' states and maximize the expected number of targets observed with a guaranteed resolution, and (c) exploiting the inherent structure of our surveillance problem to improve its scalability (i.e., linear time) in the number of targets to be observed. Quantitative comparisons with stateoftheart multicamera coordination and control techniques show that our approach can achieve higher surveillance quality in real time. The practical feasibility of our approach is also demonstrated using real AXIS 214 PTZ cameras.
 PhD Forum: DecisionTheoretic Coordination and Control for Active MultiCamera Surveillance.
Prabhu Natarajan.
In Proceedings of the
6th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC'12), pages 12, Hong Kong, Oct 30  Nov 2, 2012.
Best PhD Forum Paper Award
Abstract. In this thesis, we present novel decisiontheoretic
multiagent approaches for controlling and coordinating multiple
active cameras in surveillance. Decisiontheoretic approaches
models the interaction between active camera network and the
uncertain surveillance environment effectively. The goal of the
surveillance is to maximize the number of targets observed in
active cameras with guaranteed image resolution. We enumerate
the practical issues in active camera surveillance and discuss how
these issues are addressed in our decisiontheoretic approaches.
The existing camera control approaches have serious limitations
in terms of scalability in number of targets. Where as in
our approaches, the scalability in number of targets has been
improved by exploiting the structure and properties that are
present in our surveillance problem. We proposed two novel
decisiontheoretic frameworks: Markov Decision Process (MDP)
and Partially Observable Markov Decision Process (POMDP)
frameworks for coordinating active cameras in fully observable
and partially observable surveillance settings.
 DecisionTheoretic Approach to Maximizing Observation of Multiple Targets in MultiCamera Surveillance.
Prabhu Natarajan, Trong Nghia Hoang, Kian Hsiang Low & Mohan Kankanhalli.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 155162, Valencia, Spain, June 48, 2012.
20.4% acceptance rate
Abstract. This paper presents a novel decisiontheoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the tradeoff between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multicamera surveillance is that of scalability with increasing number of targets. We show how its scalability can be improved by exploiting the problem structure: as proven analytically, our decisiontheoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve highquality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decisiontheoretic approach in controlling and coordinating the active cameras in real surveillance system.
 DecisionTheoretic Approach for Controlling and
Coordinating Multiple Active Cameras in Surveillance.
Prabhu Natarajan.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), Valencia, Spain, June 48, 2012.
Doctoral consortium abstract
Abstract. This paper presents a novel decisiontheoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the tradeoff between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multicamera surveillance is that of scalability with increasing number of targets. We show how its scalability can be improved by exploiting the problem structure: as proven analytically, our decisiontheoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve highquality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decisiontheoretic approach in controlling and coordinating the active cameras in real surveillance system.
MULTIROBOT INFORMATIVE PATH PLANNING FOR ACTIVE SENSING OF SPATIOTEMPORAL ENVIRONMENTAL PHENOMENA
PROJECT DURATION : Jan 2010  May 2013
PROJECT AFFILIATION :
Collaborative MultiRobot Exploration of the Coastal Ocean (Collaborators: John M. Dolan, CMU; Gaurav S. Sukhatme, USC; Kanna Rajan, MBARI)
PROJECT FUNDING : MOE AcRF Tier 1 Grant :
Active Robotic Exploration and Mapping for Environmental Sensing Applications,
SGD $165,377, Apr 2010  Mar 2013
PROBLEM MOTIVATION
Research in environmental sensing and monitoring has recently gained significant attention and practical interest, especially in supporting environmental sustainability efforts worldwide. A key direction of this research aims at sensing, modeling, and predicting the various types of environmental phenomena spatially distributed over our natural and builtup habitats so as to improve our knowledge and understanding of their economic, environmental, and health impacts and implications. This is nontrivial to achieve due to a tradeoff between the quantity of sensing resources (e.g., number of deployed sensors, energy consumption, mission time) and the uncertainty in predictive modeling. In the case of deploying a limited number of mobile robotic sensing assets, such a tradeoff motivates the need to plan the most informative resourceconstrained observation paths to minimize the uncertainty in modeling and predicting a spatially varying environmental phenomenon, which constitutes the active sensing problem to be addressed in this work.
A wide multitude of natural and urban environmental phenomena is characterized by spatially correlated field measurements, which raises the following fundamental issue faced by the active sensing problem:
How can the spatial correlation structure of an environmental phenomenon be exploited to improve the active sensing performance and computational efficiency of robotic path planning?
In this work, we will investigate the above issue for an important broad class of environmental phenomena called anisotropic fields that exhibit a (often much) higher spatial correlation along one direction than along its per pendicular direction. Such fields occur widely in natural and builtup environments and some of them include (a) ocean and freshwater phenomena like plankton density, fish abundance, temperature and salinity; (b) soil and atmospheric phenomena like peat thickness, surface soil moisture, rainfall; (c) mineral deposits like radioactive ore; (d) pollutant and contaminant concentration like air, heavy metals; and (e) ecological abundance like vegetation density.
PROPOSED METHODOLOGY
This work presents two principled approaches to efficient informationtheoretic path planning based on entropy and mutual information criteria for in situ active sensing of environmental phenomena. In contrast to the existing methods described above, our proposed path planning algorithms are novel in addressing a tradeoff between active sensing performance and computational efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of anisotropic fields to improve time efficiency while preserving nearoptimal active sensing performance.
PUBLICATIONS
 MultiRobot Informative Path Planning for Active Sensing of Environmental Phenomena: A Tale of Two Algorithms.
Nannan Cao, Kian Hsiang Low & John M. Dolan.
In Proceedings of the
12th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS13), pages 714, Saint Paul, MN, May 610, 2013.
22.9% acceptance rate
Abstract. A key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient informationtheoretic path planning based on entropy and mutual information criteria for in situ active sensing of an important broad class of widelyoccurring environmental phenomena called anisotropic fields. Our proposed algorithms are novel in addressing a tradeoff between active sensing performance and time efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of Gaussian processbased anisotropic fields to improve time efficiency while preserving nearoptimal active sensing performance. We analyze the time complexity of our algorithms and prove analytically that they scale better than stateoftheart algorithms with increasing planning horizon length. We provide theoretical guarantees on the active sensing performance of our algorithms for a class of exploration tasks called transect sampling, which, in particular, can be improved with longer planning time and/or lower spatial correlation along the transect. Empirical evaluation on realworld anisotropic field data shows that our algorithms can perform better or at least as well as the stateoftheart algorithms while often incurring a few orders of magnitude less computational time, even when the field conditions are less favorable.
 InformationTheoretic MultiRobot Path Planning.
Nannan Cao.
M.Sc. Thesis, Department of Computer Science, National University of Singapore, Sep 2012.
Abstract.
Research in environmental sensing and monitoring is especially important in supporting environmental sustainability efforts worldwide, and has recently attracted significant attention and interest. A key direction of this research lies in modeling and predicting the spatiotemporally varying environmental phenomena. One approach is to use a team of robots to sample the area and model the measurement values at unobserved points. For smoothly varying and hotspot fields, there is some work which has been done to model the fields well. However, there is still a class of common environmental fields called anisotropic fields in which the spatial phenomena are highly correlated along one direction and less correlated along the perpendicular direction. We exploit the environmental structure to improve the sampling performance and time efficiency of planning for anisotropic fields.
In this thesis, we cast the planning problem into a stagewise decisiontheoretic problem. we adopt Gaussian Process to model spatial phenomena. Maximum entropy criterion and maximum mutual information criterion are used to measure the informativeness of the observation paths. It is found that for many GPs, correlation of two points exponentially decreases with the distance between the two points. With this property, for maximum entropy criterion, we propose a polynomialtime approximation algorithm, MEPP, to find the maximum entropy paths. We also provide a theoretical performance guarantee for this algorithm. For maximum mutual information criterion, we propose another polynomialtime approximation algorithm, M2IPP. Similar to the MEPP, a performance guarantee is also provided for this algorithm. We demonstrate the performance advantages of our algorithms on two real data sets. To get lower prediction error, three priciples have also been proposed to select the criterion for different environmental fields.
 Active Markov InformationTheoretic Path Planning for Robotic Environmental Sensing.
Kian Hsiang Low, John M. Dolan & Pradeep Khosla.
In Proceedings of the
10th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS11), pages 753760, Taipei, Taiwan, May 26, 2011.
22.1% acceptance rate
Abstract. Recent research in multirobot exploration and mapping has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing informationtheoretic exploration strategies for learning GPbased environmental field maps adopt the nonMarkovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, realtime active sampling. To ease this computational burden, this paper presents a Markovbased approach to efficient informationtheoretic path planning for active sampling of GPbased fields. We analyze the time complexity of solving the Markovbased path planning problem, and demonstrate analytically that it scales better than that of deriving the nonMarkovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markovbased policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on realworld temperature and plankton density field data shows that our Markovbased policy can generally achieve active sampling performance comparable to that of the widelyused nonMarkovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.
ENVIRONMENTAL BOUNDARY TRACKING & ESTIMATION WITH MULTIPLE ROBOTS
PROJECT DURATION : Jan 2010  May 2012
PROJECT COLLABORATORS : John M. Dolan, CMU; Steve Chien, JPL, Caltech
PROJECT FUNDING : MOE AcRF Tier 1 Grant :
Active Robotic Exploration and Mapping for Environmental Sensing Applications,
SGD $165,377, Apr 2010  Mar 2013
PROBLEM MOTIVATION
A fundamental problem in environmental sensing and monitoring is to identify and delineate the hotspot regions in a largescale environmental field. It involves partitioning the area spanned by the field into one class of regions called the hotspot regions in which the field measurements exceed a predefined threshold, and the other class of regions where they do not. Such a problem arises in many realworld applications such as precision agriculture, monitoring of ocean and freshwater phenomena (e.g., plankton bloom), forest ecosystems, rare species, pollution (e.g., oil spill), or contamination (e.g., radiation leak). In these applications, it is necessary to assess the spatial extent and shape of the hotspot regions accurately due to severe economic, environmental, and health implications. In practice, this aim is nontrivial to achieve because the constraints on the sampling assets' resources (e.g., energy consumption, mission time, sensing range) limit the number and coverage of in situ observations over the large field that can be used to infer the hotspot regions. Subject to limited observations, the most informative ones should therefore be selected in order to minimize the uncertainty of estimating the hotspot regions (or, equivalently, classifying/labeling the hotspots) in the large field, which motivates our adaptive sampling work in this work.
Mobile robot teams are particularly desirable for performing the above environmental sensing task because they can actively explore to map the hotspot regions at high resolution. On the other hand, static sensors lack mobility and are therefore not capable of doing this well unless a large quantity is deployed. While research in multirobot exploration and mapping have largely focused on the conventional task of building occupancy grids, some recent efforts are put into the more complex, general task of sampling spatially distributed environmental fields. In contrast to occupancy grids that assume discrete, independent cell occupancies, environmental fields are characterized by continuousvalued, spatially correlated measurements, properties of which cannot be exploited by occupancy grid mapping strategies to select the most informative observation paths. To exploit such properties, exploration strategies for learning environmental field maps have recently been developed and can be classified into two regimes: (a) widearea coverage strategies consider sparsely sampled (i.e., largely unexplored) areas to be of high uncertainty and consequently spread observations evenly across the field; (b) hotspot sampling strategies assume areas of high uncertainty and interest to contain extreme, highlyvarying measurements and hence produce clustered observations.
Formal, principled approaches of exploration have also been devised to simultaneously perform hotspot sampling when a hotspot region is found as well as widearea coverage to search for new hotspot regions in sparsely sampled areas. These strategies optimize their observation paths to minimize the uncertainty (e.g., in terms of meansquared error or entropy) of mapping the entire continuousvalued field. They are, however, suboptimal for classifying/labeling the hotspots in the field, which we will discuss and demonstrate theoretically and empirically in this work.
PROPOSED METHODOLOGY
This work presents a novel decentralized active robotic exploration (DARE) strategy for probabilistic classification/labeling of hotspots in a largescale environmental field. The environmental field is assumed to be realized from a rich class of probabilistic spatial models called Gaussian process (GP) that can formally characterize its spatial correlation structure. More importantly, it can provide formal measures of classification/labeling uncertainty (i.e., in the form of cost functions) such as the misclassification and entropy criteria for directing the robots to explore highly uncertain areas of the field. The chief impediment to using these formal criteria is that they result in costminimizing exploration strategies, which cannot be solved in closed form. To resolve this, they are reformulated as rewardmaximizing dual strategies, from which we can then derive the approximate DARE strategy to be solved in closed form efficiently. The specific contributions of our work include:
 Analyzing the time complexity of solving the DARE strategy: We prove that its incurred time is independent of the map resolution and the number of robots, thus making it practical for in situ, realtime active sampling. In contrast, existing stateoftheart exploration strategies for learning environmental field maps scale poorly with increasing map resolution and/or number of robots;
 Analyzing the exploration behavior of the DARE strategy through its formulation: It exhibits an interesting formal tradeoff between that of boundary tracking until the hotspot region boundary can be accurately predicted and widearea coverage to find new boundaries in sparsely sampled areas to be tracked. In contrast, ad hoc, reactive boundary tracking strategies typically require a hotspot region boundary to be located manually or via random exploration and are not driven by the need to maximize the fidelity of estimating multiple hotspot regions given limited observations;
 Providing theoretical guarantee on the active exploration performance of the DARE strategy: We prove that, under a reasonable conditional independence assumption, it produces the same optimal observation paths as that of the centralized costminimizing strategies, the latter of which otherwise cannot be solved in closed form. This result has a simple but important implication: The uncertainty of labeling the hotspots in a GPbased field is greatest at or close to the hotspot region boundaries;
 Empirically evaluating the active exploration performance and time efficiency of the DARE strategy on realworld plankton density and temperature field data: Subject to limited observations, the DARE strategy can achieve better classification of the hotspots than stateoftheart active exploration strategies while being significantly more timeefficient than those performing widearea coverage and hotspot sampling.
PUBLICATIONS
 Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing.
Kian Hsiang Low, Jie Chen, John M. Dolan, Steve Chien & David R. Thompson.
In Proceedings of the
11th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS12), pages 105112, Valencia, Spain, June 48, 2012.
20.4% acceptance rate
Also appeared in
IROS'11 Workshop on Robotics for Environmental Monitoring (WREM11), San Francisco, CA, Sep 30, 2011.
Abstract. A central problem in environmental sensing and monitoring is to classify/label the hotspots in a largescale environmental field. This paper presents a novel decentralized active robotic exploration (DARE) strategy for probabilistic classification/labeling of hotspots in a Gaussian process (GP)based field. In contrast to existing stateoftheart exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, realtime active sampling. Its exploration behavior exhibits an interesting formal tradeoff between that of boundary tracking until the hotspot region boundary can be accurately predicted and widearea coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under reasonable conditional independence assumption, we prove that it can optimally achieve two formal costminimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GPbased field is greatest at or close to the hotspot region boundaries. Empirical evaluation on realworld plankton density and temperature field data shows that, subject to limited observations, DARE strategy can achieve more superior classification of hotspots and time efficiency than stateoftheart active exploration strategies.
MULTIROBOT ADAPTIVE SAMPLING FOR ENVIRONMENTAL SENSING & MONITORING
PROJECT DURATION : Jul 2005  Mar 2010
PROJECT LAB : TSAR
PROBLEM MOTIVATION
Recent research in multirobot exploration and mapping has focused on sampling environmental fields, some of which typically feature a few small hotspots in a large region.
Such a hotspot field often arises in two realworld applications:
(1) planetary exploration such as geologic reconnaissance and prospecting for mineral deposits or natural gases, and
(2) environment and ecological sensing such as precision agriculture, and monitoring of ocean phenomena (e.g., plankton bloom, anoxic zones), forest ecosystems, rare species, pollution (e.g., oil spill), or contamination (e.g., radiation leak).
In particular, the hotspot field is characterized by continuousvalued, spatially correlated measurements with the hotspots exhibiting extreme measurements and much higher spatial variability than the rest of the field.
With limited (e.g., pointbased) robot sensing range, a complete coverage becomes impractical in terms of resource costs (e.g., resource consumption).
So, to accurately map the field, the hotspots have to be sampled at a higher resolution.
The hotspot field discourages static sensor placement because a large number of sensors has to be positioned to detect and refine the sampling of hotspots. If these static sensors are not placed in any hotspot initially, they cannot reposition by themselves to locate one. In contrast, a robot team is capable of performing highresolution hotspot sampling due to its mobility. Hence, it is desirable to build a mobile robot team that can actively explore to map a hotspot field.
PROPOSED METHODOLOGY
To learn a hotspot field map, the exploration strategy of the robot team has to plan the most informative resourceconstrained observation paths that minimize the uncertainty of mapping the hotspot field.
By representing the hotspot field using rich classes of Bayesian nonparametric models such as the Gaussian process or logGaussian process,
formal measures of mapping uncertainty (e.g., based on meansquared error [ AAMAS08] or entropy [ ICAPS09] criterion) can be defined and subsequently exploited by our proposed adaptive sampling algorithms for directing the robot team to explore highly uncertain areas of the field.
In contrast to nonadaptive sampling strategies that only perform well with smoothlyvarying fields,
our nonmyopic adaptive sampling algorithms can exploit clustering phenomena (i.e., hotspots) to plan observation paths that produce lower mapping uncertainty.
PUBLICATIONS
 Adaptive Sampling of Time Series with Application to Remote Exploration.
David R. Thompson, Nathalie Cabrol, Michael Furlong, Craig Hardgrove, Kian Hsiang Low, Jeffrey Moersch & David Wettergreen.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'13), pages 34633468, Karlsruhe, Germany, May 610, 2013.
Abstract. We address the problem of adaptive informationoptimal data collection in time series. Here a remote sensor or explorer agent throttles its sampling rate in order to track anomalous events while obeying constraints on time and power. This problem is challenging because the agent has limited visibility  all collected datapoints lie in the past, but its resource allocation decisions require predicting far into the future. Our solution is to continually fit a Gaussian process model to the latest data and optimize the sampling plan on line to maximize information gain. We compare the performance characteristics of stationary and nonstationary Gaussian process models. We also describe an application based on geologic analysis during planetary rover exploration. Here adaptive sampling can improve coverage of localized anomalies and potentially benefit mission science yield of long autonomous traverses.
 Telesupervised Remote Surface
Water Quality Sensing.
Gregg Podnar, John M. Dolan, Kian Hsiang Low & Alberto Elfes.
In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, Mar 613, 2010.
Abstract. We present a fleet of autonomous Robot Sensor Boats (RSBs) developed for lake and river fresh water quality assessment and controlled by our Multilevel Autonomy Robot Telesupervision Architecture (MARTA). The RSBs are low cost, highly maneuverable, shallow draft sensor boats, developed as part of the Sensor Web program supported under the Advanced Information Systems Technology program of NASA's Earth Systems Technology Office. They can scan large areas of lakes, and navigate up tributaries to measure water quality near outfalls that larger research vessels cannot reach. The MARTA telesupervision architecture has been applied to a number of domains from multiplatform autonomous wide area planetary mineral prospecting, to multiplatform ocean monitoring. The RSBs are a complementary expansion of a fleet of NOAA/NASAdeveloped extendeddeployment surface autonomous vehicles that enable insitu study of meteorological factors of the ocean/atmosphere interface, and which have been adapted to investigate harmful algal blooms under this program. The flexibility of the MARTA telesupervision architecture was proven as it supported simultaneous operation of these heterogenous autonomous sensor platforms while geographically widely separated. Results and analysis are presented of multiple tests carried out over three months using a multisensor water sonde to assess water quality in a small recreational lake. Inference Grids were used to produce maps representing temperature, pH, and dissolved oxygen. The tests were performed under various water conditions (clear vs. hair algaeladen) and both before and after heavy rains. Data from each RSB was relayed to a data server in our lab in Pittsburgh, Pennsylvania, and made available over the World Wide Web where it was acquired by team members at the Jet Propulsion Laboratory of NASA in Pasadena, California who monitored the boats and their sensor readings in real time, as well as using these data to model the water quality by producing Inference Gridbased maps.
 MultiRobot Adaptive Exploration and Mapping for Environmental Sensing Applications.
Kian Hsiang Low.
Ph.D. Thesis, Technical Report CMUECE2009024, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, Aug 2009.
Abstract.
Recent research in robot exploration and mapping has focused on sampling hotspot fields, which often arise in environmental and ecological sensing applications. Such a hotspot field is characterized by continuous, positively skewed, spatially correlated measurements with the hotspots exhibiting extreme measurements and much higher spatial variability than the rest of the field.
To map a hotspot field of the above characterization, we assume that it is realized from nonparametric probabilistic models such as the Gaussian and logGaussian processes (respectively, GP and lGP), which can provide formal measures of map uncertainty. To learn a hotspot field map, the exploration strategy of a robot team then has to plan resourceconstrained observation paths that minimize the uncertainty of a spatial model of the hotspot field. This exploration problem is formalized in a sequential decisiontheoretic planning under uncertainty framework called the multirobot adaptive sampling problem (MASP). So, MASP can be viewed as a sequential, nonmyopic version of active learning. In contrast to finitestate Markov decision problems, MASP adopts a more complex but realistic continuousstate, nonMarkovian problem structure so that its induced exploration policy can be informed by the complete history of continuous, spatially correlated observations for selecting paths. It is unique in unifying formulations of nonmyopic exploration problems along the entire adaptivity spectrum, thus subsuming existing nonadaptive formulations and allowing the performance advantage of a more adaptive policy to be theoretically realized. Through MASP, it is demonstrated that a more adaptive strategy can exploit clustering phenomena in a hotspot field to produce lower expected map uncertainty. By measuring map uncertainty using the meansquared error criterion, a MASPbased exploration strategy consequently plans adaptive observation paths that minimize the expected posterior map error or equivalently, maximize the expected map error reduction.
The time complexity of solving MASP (approximately) depends on the map resolution, which limits its practical use in largescale, highresolution exploration and mapping. This computational difficulty is alleviated through an informationtheoretic approach to MASP (iMASP), which measures map uncertainty based on the entropy criterion instead. As a result, an iMASPbased exploration strategy plans adaptive observation paths that minimize the expected posterior map entropy or equivalently, maximize the expected entropy of observation paths. Unlike MASP, reformulating the costminimizing iMASP as a rewardmaximizing dual problem causes its time complexity of being solved approximately to be independent of the map resolution and less sensitive to larger robot team size as demonstrated both analytically and empirically. Furthermore, this rewardmaximizing dual transforms the widelyused nonadaptive maximum entropy sampling problem into a novel adaptive variant, thus improving the performance of the induced exploration policy.
One advantage stemming from the rewardmaximizing dual formulations of MASP and iMASP is that they allow observation selection properties of the induced exploration policies to be realized for sampling the hotspot field. These properties include adaptivity, hotspot sampling, and widearea coverage. We show that existing GPbased exploration strategies may not explore and map the hotspot field well with the selected observations because they are nonadaptive and perform only widearea coverage. In contrast, the lGPbased exploration policies can learn a highquality hotspot field map because they are adaptive and perform both widearea coverage and hotspot sampling.
The other advantage is that even though MASP and iMASP are nontrivial to solve due to their continuous state components, the convexity of their rewardmaximizing duals can be exploited to derive, in a computationally tractable manner, discretestate monotonebounding approximations and subsequently, approximately optimal exploration policies with theoretical performance guarantees. Anytime algorithms based on approximate MASP and iMASP are then proposed to alleviate the computational difficulty that arises from their nonMarkovian structure.
It is of practical interest to be able to quantitatively characterize the "hotspotness" of an environmental field. We propose a novel "hotspotness" index, which is defined in terms of the spatial correlation properties of the hotspot field. As a result, this index can be related to the intensity, size, and diffuseness of the hotspots in the field.
We also investigate how the spatial correlation properties of the hotspot field affect the performance advantage of adaptivity. In particular, we derive sufficient and necessary conditions of the spatial correlation properties for adaptive exploration to yield no performance advantage.
Lastly, we develop computationally efficient approximately optimal exploration strategies for sampling the GP by assuming the Markov property in iMASP planning. We provide theoretical guarantees on the performance of the Markovbased policies, which improve with decreasing spatial correlation. We evaluate empirically the effects of varying spatial correlations on the mapping performance of the Markovbased policies as well as whether these Markovbased path planners are timeefficient for the transect sampling task.
Through the abovementioned work, this thesis establishes the following two claims: (1) adaptive, nonmyopic exploration strategies can exploit clustering phenomena to plan observation paths that produce lower map uncertainty than nonadaptive, greedy methods; and (2) Markovbased exploration strategies can exploit small spatial correlation to plan observation paths which achieve map uncertainty comparable to that of nonMarkovian policies using significantly less planning time.
 InformationTheoretic Approach to Efficient Adaptive Path Planning for Mobile Robotic Environmental Sensing.
Kian Hsiang Low, John M. Dolan & Pradeep Khosla.
In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS09), pages 233240, Thessaloniki, Greece, Sep 1923, 2009.
33.9% acceptance rate
Also appeared in IPSN09 Workshop on Sensor Networks for Earth and Space Science Applications (ESSA09), San Francisco, CA, Apr 16, 2009.
Also orally presented in RSS09 Workshop on Aquatic Robots and Ocean Sampling, Seattle, WA, Jun 29, 2009.
Abstract. Recent research in robot exploration and mapping has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decisiontheoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in largescale, highresolution exploration and mapping. To alleviate this computational difficulty, this paper presents an informationtheoretic approach to MASP (iMASP) for efficient adaptive path planning; by reformulating the costminimizing iMASP as a rewardmaximizing problem, its time complexity becomes independent of map resolution and is less sensitive to increasing robot team size as demonstrated both theoretically and empirically. Using the rewardmaximizing dual, we derive a novel adaptive variant of maximum entropy sampling, thus improving the induced exploration policy performance. It also allows us to establish theoretical bounds quantifying the performance advantage of optimal adaptive over nonadaptive policies and the performance quality of approximately optimal vs. optimal adaptive policies. We show analytically and empirically the superior performance of iMASPbased policies for sampling the logGaussian process to that of policies for the widelyused Gaussian process in mapping the hotspot field. Lastly, we provide sufficient conditions that, when met, guarantee adaptivity has no benefit under an assumed environment model.
 Cooperative Aquatic Sensing using the Telesupervised Adaptive Ocean Sensor Fleet.
John M. Dolan, Gregg W. Podnar, Stephen Stancliff, Kian Hsiang Low, Alberto Elfes, John Higinbotham, Jeffrey C. Hosler, Tiffany A. Moisan & John Moisan.
In Proceedings of the SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, Berlin, Germany, Aug 31  Sep 3, 2009.
Abstract. Earth science research must bridge the gap between the atmosphere and the ocean to foster understanding of Earth's climate and ecology. Typical ocean sensing is done with satellites or in situ buoys and research ships which are slow to reposition. Cloud cover inhibits study of localized transient phenomena such as Harmful Algal Blooms (HAB). A fleet of extendeddeployment surface autonomous vehicles will enable in situ study of characteristics of HAB, coastal pollutants, and related phenomena. We have developed a multiplatform telesupervision architecture that supports adaptive reconfiguration based on environmental sensor inputs. Our system allows the autonomous repositioning of smart sensors for HAB study by networking a fleet of NOAA OASIS (Ocean Atmosphere Sensor Integration System) surface autonomous vehicles. In situ measurements intelligently modify the search for areas of high concentration. Inference Grid and complementary informationtheoretic techniques support sensor fusion and analysis. Telesupervision supports sliding autonomy from highlevel mission tasking, through vehicle and data monitoring, to teleoperation when direct human interaction is appropriate. This paper reports on experimental results from multiplatform tests conducted in the Chesapeake Bay and in Pittsburgh, Pennsylvania waters using OASIS platforms, autonomous kayaks, and multiple simulated platforms to conduct cooperative sensing of chlorophylla and water quality.
 Robot Boats as a Mobile Aquatic Sensor Network.
Kian Hsiang Low, Gregg Podnar, Stephen Stancliff, John M. Dolan & Alberto Elfes.
In Proceedings of the IPSN09 Workshop on Sensor Networks for Earth and Space Science Applications (ESSA09), San Francisco, CA, Apr 16, 2009.
Abstract. This paper describes the Multilevel Autonomy Robot Telesupervision Architecture (MARTA), an architecture for supervisory control of a heterogeneous fleet of networked unmanned autonomous aquatic surface vessels carrying a payload of environmental science sensors. This architecture allows a landbased human scientist to effectively supervise data gathering by multiple robotic assets that implement a web of widely dispersed mobile sensors for in situ study of physical, chemical or biological processes in water or in the water/atmosphere interface.
 Adaptive MultiRobot WideArea Exploration And Mapping.
Kian Hsiang Low, John M. Dolan & Pradeep Khosla.
In Proceedings of the
7th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS08), pages 2330, Estoril, Portugal, May 1216, 2008.
22.2% acceptance rate
Also presented as a poster in RSS09 Workshop on Aquatic Robots and Ocean Sampling, Seattle, WA, Jun 29, 2009.
Abstract. The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multirobot exploration strategy that is novel in performing both widearea coverage and hotspot sampling using nonmyopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multirobot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and logGaussian processes, and analyze if the resulting strategies are adaptive and maximize widearea coverage and hotspot sampling. Solving MASP is nontrivial as it comprises continuous state components. So, it is reformulated for convex analysis, which allows discretestate monotonebounding approximation to be developed. We provide a theoretical guarantee on the policy quality of the approximate MASP (aMASP) for using in MASP. Although aMASP can be solved exactly, its state size grows exponentially with the number of stages. To alleviate this computational difficulty, anytime algorithms are proposed based on aMASP, one of which can guarantee its policy quality for MASP in real time.
 Adaptive Sampling for MultiRobot WideArea Exploration.
Kian Hsiang Low, Geoffrey J. Gordon, John M. Dolan & Pradeep Khosla.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'07), pages 755760, Rome, Italy, Apr 1014, 2007.
Abstract. The exploration problem is a central issue in mobile robotics. A complete coverage is not practical if the environment is large with a few small hotspots, and the sampling cost is high. So, it is desirable to build robot teams that can coordinate to maximize sampling at these hotspots while minimizing resource costs, and consequently learn more accurately about properties of such environmental phenomena. An important issue in designing such teams is the exploration strategy. The contribution of this paper is in the evaluation of an adaptive exploration strategy called adaptive cluster sampling (ACS), which is demonstrated to reduce the resource costs (i.e., mission time and energy consumption) of a robot team, and yield more information about the environment by directing robot exploration towards hotspots. Due to the adaptive nature of the strategy, it is not obvious how the sampled data can be used to provide unbiased, lowvariance estimates of the properties. This paper therefore discusses how estimators that are RaoBlackwellized can be used to achieve low error. This paper also presents the first analysis of the characteristics of the environmental phenomena that favor the ACS strategy and estimators. Quantitative experimental results in a mineral prospecting task simulation show that our approach is more efficient in exploration by yielding more minerals and information with fewer resources and providing more precise mineral density estimates than previous methods.
 Adaptive Sampling for MultiRobot Wide Area Prospecting.
Kian Hsiang Low, Geoffrey J. Gordon, John M. Dolan, and Pradeep Khosla.
In Technical Report CMURITR0551, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Oct 2005.
Abstract. Prospecting for in situ mineral resources is essential for establishing settlements on the Moon and Mars. To reduce human effort and risk, it is desirable to build robotic systems to perform this prospecting. An important issue in designing such systems is the sampling strategy: how do the robots choose where to prospect next? This paper argues that a strategy called Adaptive Cluster Sampling (ACS) has a number of desirable properties: compared to conventional strategies, (1) it reduces the total mission time and energy consumption of a team of robots, and (2) returns a higher mineral yield and more information about the prospected region by directing exploration towards areas of high mineral density, thus providing detailed maps of the boundaries of such areas. Due to the adaptive nature of the sampling scheme, it is not immediately obvious how the resulting sampled data can be used to provide an unbiased, lowvariance estimate of the regional mineral density. This paper therefore investigates new mineral density estimators, which have lower error than previouslydeveloped estimators; they are derived from the older estimators via a process called RaoBlackwellization. Since the efficiency of estimators depends on the type of mineralogical population sampled, the population characteristics that favor ACS estimators are also analyzed. The ACS scheme and our new estimators are evaluated empirically in a detailed simulation of the prospecting task, and the quantitative results show that our approach can yield more minerals with less resources and provide more accurate mineral density estimates than previous methods.
DISTRIBUTED LAYERED ARCHITECTURE FOR SELFORGANIZING MOBILE SENSOR NETWORKS
PROJECT DURATION : Nov 2002  Jun 2005
PROBLEM MOTIVATION
One of the fundamental issues that arises in a sensor network is coverage.
Traditionally, network coverage is maximized by determining the optimal placement of static sensors in a centralized manner,
which can be related to the class of art gallery problems.
However, recent investigations in sensor network mobility reveal that
mobile sensors can selforganize to provide better coverage than static placement.
Existing applications have only utilized uninformed mobility (i.e., random motion or patrol).
In contrast, our work here focuses on informed, intelligent mobility to further improve coverage.
Our network coverage problem is motivated by the following constraints
that discourage static sensor placement or uninformed mobility:
(a) no prior information about the exact target locations, population densities or motion pattern,
(b) limited sensory range, and
(c) very large area to be observed.
All these conditions may cause the sensors to be unable to cover the entire region of interest.
Hence, fixed sensor locations or uninformed mobility will not be adequate in general.
Rather, the sensors have to move dynamically in response to the motion and distribution of targets and other sensors
to maximize coverage.
Inspired by robotics, the above problem may be regarded as that of lowlevel motion control to coordinate the sensors'
target tracking movements in the continuous workspace.
Alternatively, it can be cast as a highlevel task allocation problem by segmenting the workspace into discrete regions
such that each region is assigned a group or coalition of sensors to track the targets within.
PROPOSED METHODOLOGY
This work presents a reactive layered multirobot architecture
for distributed mobile sensor network coverage in complex, dynamic environments.
At the lower layer, each robot uses a reactive motion control strategy
known as Cooperative Extended Kohonen Maps
to coordinate their target tracking within a region without the need of communication.
This strategy is also responsible for obstacle avoidance, robot separation to minimize task interference,
and navigation between regions via beacons or checkpoints plotted by a motion planner.
At the higher layer, the robots use a dynamic antbased task allocation scheme to cooperatively selforganize
their coalitions in a decentralized manner according to the target distributions across the regions.
This scheme addresses the following issues, which distinguish it from the other task allocation mechanisms:
Task Allocation for MultiRobot Tasks: Existing algorithms (e.g., auctionand behaviorbased) assume a multirobot
task can be partitioned into singlerobot tasks. But this may not be always possible or the multirobot task
can be more efficiently performed by coalitions of robots.
Coalition Formation for Minimalist Robots: Existing coalition formation schemes require complex planning, explicit
negotiation, and precise estimation of coalitional cost. Hence, they do not perform well in dynamic, realtime scenarios.
Cooperation of ResourceLimited Robots: Robots with limited communication and sensing capabilities (i.e., partial
observability) can only obtain local, uncertain information of the dynamic environment. With limited computational power,
their cooperative strategies cannot involve complex planning or negotiations.
PUBLICATIONS
 Autonomic Mobile Sensor Network with SelfCoordinated Task Allocation and Execution.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In IEEE Transactions on Systems, Man, and Cybernetics  Part C: Applications and Reviews
(Special Issue on Engineering Autonomic Systems), volume 36, issue 3, pages 315327, May 2006.
Andrew P. Sage Best Transactions Paper Award for the best paper published in IEEE Trans. SMC  Part A, B, and C in 2006
Abstract. This paper describes a distributed layered architecture for resourceconstrained multirobot cooperation, which is utilized in autonomic mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme selforganizes the robot coalitions to track efficiently across regions. It uses concepts of ant behavior to selfregulate the regional distributions of robots in proportion to that of the moving targets to be tracked in a nonstationary environment. As a result, the adverse effects of task interference between robots are minimized and network coverage is improved. In the lower task execution layer, the robots use selforganizing neural networks to coordinate their target tracking within a region. Both layers employ selforganization techniques, which exhibit autonomic properties such as selfconfiguring, selfoptimizing, selfhealing, and selfprotecting. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auctionbased negotiation show that our layered approach can provide better coverage, greater robustness to sensor failures, and greater flexibility to respond to environmental changes.
 Task Allocation via SelfOrganizing Swarm Coalitions in Distributed Mobile Sensor Network.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI04), pages 2833, San Jose, CA, Jul 2529, 2004.
26.7% acceptance rate
Abstract. This paper presents a task allocation scheme via selforganizing swarm coalitions for distributed mobile sensor network coverage. Our approach uses the concepts of ant behavior to selfregulate the regional distributions of sensors in proportion to that of the moving targets to be tracked in a nonstationary environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. Quantitative comparisons with other tracking strategies such as static sensor placement, potential fields, and auctionbased negotiation show that our approach can provide better coverage and greater flexibility to respond to environmental changes.
 Reactive, Distributed Layered Architecture for ResourceBounded MultiRobot Cooperation: Application to Mobile Sensor Network Coverage.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'04), pages 37473752, New Orleans, LA, Apr 26  May 1, 2004.
Abstract. This paper describes a reactive, distributed layered architecture for cooperation of multiple resourcebounded robots, which is utilized in mobile sensor network coverage. In the upper layer, a dynamic task allocation scheme selforganizes the robot coalitions to track efficiently in separate regions. It uses the concepts of ant behavior to selfregulate the regional distributions of robots in proportion to that of the targets to be tracked in the changing environment. As a result, the adverse effects of task interference between robots are minimized and sensor network coverage is improved. In the lower layer, the robots use selforganizing neural networks to coordinate their target tracking within a region. Quantitative comparisons with other tracking strategies such as static sensor placements, potential fields, and auctionbased negotiation show that our approach can provide better coverage and greater flexibility in responding to environmental changes.
PRESENTATIONS
 Task Allocation via SelfOrganizing Swarm Coalitions in Distributed Mobile Sensor Network.
Kian Hsiang Low.
Presented in 8th National IT Awareness Project Competition (NITA04), National University of Singapore, Mar 13, 2004
(Overall Best Project, Postgraduate Category).
VIDEO DEMOS
Coverage of 30 targets (green) with 15 ant robots (white)
 Selforganization of swarm coalitions to unknown, timevarying target distribution after
 Robot switching to region of higher task demand (i.e., targets to robots ratio)
ACTION SELECTION MECHANISM FOR MULTIROBOT TASKS
PROJECT DURATION : Sep 2002  Nov 2002
PROBLEM MOTIVATION
A central issue in the design of behaviorbased control architectures for autonomous mobile robots
is the formulation of effective mechanisms to coordinate the behaviors.
These mechanisms determine the policy of conflict resolution between behaviors,
which involves behavioral cooperation and competition to select the most appropriate action.
The actions are selected so as to optimize the achievement of the goals or behavioral objectives.
Developing such an action selection methodology is nontrivial
due to realistic constraints such as environmental complexity and unpredictability,
and resource limitations, which include computational and cognitive capabilities of the robot,
incomplete knowledge of the environment, and time constraints.
As a result, action selection can never be absolutely optimal.
Given these constraints, the action selection scheme should be able to choose actions
that are good enough to satisfy multiple concurrent, possibly conflicting, behavioral objectives.
PROPOSED METHODOLOGY
Our motivation of the action selection mechanism is to develop a motion control strategy for autonomous nonholonomic mobile robots
that can perform distributed multirobot surveillance in unknown, dynamic, complex, and unpredictable environments.
By implementing the action selection framework using an assemblage of selforganizing neural networks,
it induces the following key features that significantly enhance the agent's action selection capability:
selforganization of continuous state and action spaces to provide smooth, efficient and fine motion control,
and action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks:
(1) negotiation of unforeseen concave and narrowly spaced obstacles, and
(2) cooperative tracking of multiple mobile targets by a team of robots.
Qualitative and quantitative comparisons for single and multirobot tasks show that
our framework can provide better action selection than do potential fields method.
PUBLICATIONS
 An Ensemble of Cooperative Extended Kohonen Maps for Complex Robot Motion Tasks.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Neural Computation, volume 17, issue 6, pages 14111445, Jun 2005.
Abstract. Selforganizing feature maps such as extended Kohonen maps (EKMs) have been very successful at learning sensorimotor control for mobile robot tasks. This letter presents a new ensemble approach, cooperative EKMs with indirect mapping, to achieve complex robot motion. An indirectmapping EKM selforganizes to map from the sensory input space to the motor control space indirectly via a control parameter space. Quantitative evaluation reveals that indirect mapping can provide finer, smoother, and more efficient motion control than does direct mapping by operating in a continuous, rather than discrete, motor control space. It is also shown to outperform basis function neural networks. Furthermore, training its control parameters with recursive least squares enables faster convergence and better performance compared to gradient descent. The cooperation and competition of multiple selforganized EKMs allow a nonholonomic mobile robot to negotiate unforeseen, concave, closely spaced, and dynamic obstacles. Qualitative and quantitative comparisons with neural network ensembles employing weighted sum reveal that our method can achieve more sophisticated motion tasks even though the weightedsum ensemble approach also operates in continuous motor control space.
 ContinuousSpaced Action Selection for Single and MultiRobot Tasks Using Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the IEEE International Conference on Networking, Sensing and Control (ICNSC'04)
(Invited Paper to Special Session on Visual Surveillance), pages 198203, Taipei, Taiwan, Mar 2123, 2004.
Abstract. Action selection is a central issue in the design of behaviorbased control architectures for autonomous mobile robots. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps so that more complex motion tasks can be achieved. Qualitative and quantitative comparisons for both single and multirobot motion tasks show that our framework can provide better action selection than do action superposition methods.
 Action Selection for Single and MultiRobot Tasks Using Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI03), pages 15051506, Acapulco, Mexico, Aug 915, 2003.
27.6% acceptance rate
Abstract. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative and quantitative comparisons for single and multirobot tasks show our framework can provide better action selection than do potential fields method.
 Action Selection in Continuous State and Action Spaces by Cooperation and Competition of Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the
2nd International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS03), pages 10561057, Melbourne, Australia, Jul 1418, 2003.
Abstract. This paper presents an action selection framework based on an assemblage of selforganizing neural networks called Cooperative Extended Kohonen Maps. This framework encapsulates two features that significantly enhance a robot's action selection capability: selforganization in the continuous state and action spaces to provide smooth, efficient and fine motion control; action selection via the cooperation and competition of Extended Kohonen Maps to achieve more complex motion tasks. Qualitative tests demonstrate the capability of our action selection method for both single and multirobot motion tasks.
VIDEO DEMO
 Cooperative tracking of moving targets by robots
using cooperative Extended Kohonen Maps
INTEGRATED ROBOT PLANNING AND CONTROL
PROJECT DURATION : Jul 2001  Sep 2002
PROBLEM MOTIVATION
Robot motion research has proceeded along two separate directions:
highlevel deliberative planning and lowlevel reactive control.
Deliberative planning uses a world model to generate an optimal sequence of collisionfree actions
that can achieve a globally specified goal in a complex static environment.
However, in a dynamic environment, unforeseen obstacles may obstruct the action sequence,
and replanning to react to these situations can be too computationally expensive.
On the other hand, reactive control directly couples sensed data to appropriate actions.
It allows the robot to respond robustly and timely to unexpected obstacles and environmental changes
but may be trapped by them.
PROPOSED METHODOLOGY
The problem of goaldirected, collisionfree motion in a complex, unpredictable environment can be solved
by tightly integrating highlevel deliberative planning with lowlevel reactive control.
This work presents two such architectures for a nonholonomic mobile robot.
To achieve realtime performance, reactive control capabilities have to be fully realized so that
the deliberative planner can be simplified.
These architectures are enriched with reactive target reaching and obstacle avoidance modules.
Their target reaching modules use indirectmapping Extended Kohonen Map to provide finer and smoother motion control
than directmapping methods.
While one architecture fuses these modules indirectly via command fusion,
the other one couples them directly using cooperative Extended Kohonen Maps,
enabling the robot to negotiate unforeseen concave obstacles.
The planner for both architectures use a slippery cells technique to
decompose the free workspace into fewer cells, thus reducing search time.
Any two points in the cell can still be traversed by reactive motion.
PUBLICATIONS
 Enhancing the Reactive Capabilities of Integrated Planning and Control with Cooperative Extended Kohonen Maps.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA'03), pages 34283433, Taipei, Taiwan, May 1217, 2003.
Abstract. Despite the many significant advances made in robot motion research, few works have focused on the tight integration of highlevel deliberative planning with reactive control at the lowest level. In particular, the realtime performance of existing integrated planning and control architectures is still not optimal because the reactive control capabilities have not been fully realized. This paper aims to enhance the lowlevel reactive capabilities of integrated planning and control with Cooperative Extended Kohonen Maps for handling complex, unpredictable environments so that the workload of the highlevel planner can be consequently eased. The enhancements include fine, smooth motion control, execution of more complex motion tasks such as overcoming unforeseen concave obstacles and traversing between closely spaced obstacles, and asynchronous execution of behaviors.
 A Hybrid Mobile Robot Architecture with Integrated Planning and Control.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the
1st International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS02), pages 219226, Bologna, Italy, Jul 1519, 2002.
26% acceptance rate
Abstract. Research in the planning and control of mobile robots has received much attention in the past two decades. Two basic approaches have emerged from these research efforts: deliberative vs. reactive. These two approaches can be distinguished by their different usage of sensed data and global knowledge, speed of response, reasoning capability, and complexity of computation. Their strengths are complementary and their weaknesses can be mitigated by combining the two approaches in a hybrid architecture. This paper describes a method for goaldirected, collisionfree navigation in unpredictable environments that employs a behaviorbased hybrid architecture with asynchronously operating behavioral modules. It differs from existing hybrid architectures in two important ways: (1) the planning module produces a sequence of checkpoints instead of a conventional complete path, and (2) in addition to obstacle avoidance, the reactive module also performs target reaching under the control of a selforganizing neural network. The neural network is trained to perform fine, smooth motor control that moves the robot through the checkpoints. These two aspects facilitate a tight integration between highlevel planning and lowlevel control, which permits realtime performance and easy path modification even when the robot is en route to the goal position.
 Integrated Planning and Control of Mobile Robot with SelfOrganizing Neural Network.
Kian Hsiang Low, Wee Kheng Leow & Marcelo H. Ang, Jr.
In Proceedings of the
IEEE International Conference on Robotics and Automation (ICRA'02), pages 38703875, Washington, DC, May 1115, 2002.
Abstract. Despite the many significant advances made in robotics research, few works have focused on the tight integration of task planning and motion control. Most integration works involve the task planner providing discrete commands to the lowlevel controller, which performs kinematics and control computations to command the motor and joint actuators. This paper presents a framework of the integrated planning and control for mobile robot navigation. Unlike existing integrated approaches, it produces a sequence of checkpoints instead of a complete path at the planning level. At the motion control level, a neural network is trained to perform motor control that moves the robot from one checkpoint to the next. This method allows for a tight integration between highlevel planning and lowlevel control, which permits realtime performance and easy modification of motion path while the robot is enroute to the goal position.
 Integrated Robot Planning and Control with Extended Kohonen Maps.
Kian Hsiang Low.
Master's Thesis, Department of Computer Science, School of Computing, National University of Singapore, Jul 2002.
Singapore Computer Society Prize for best M.Sc. Thesis 20022003
Abstract. The problem of goaldirected, collisionfree motion in a complex, unpredictable environment can be solved by tightly integrating highlevel deliberative planning with lowlevel reactive control. This thesis presents two such architectures for a nonholonomic mobile robot. To achieve realtime performance, reactive control capabilities have to be fully realized so that the deliberative planner can be simplified. These architectures are enriched with reactive target reaching and obstacle avoidance modules. Their target reaching modules use indirectmapping Extended Kohonen Map to provide finer and smoother motion control than directmapping methods. While one architecture fuses these modules indirectly via command fusion, the other one couples them directly using cooperative Extended Kohonen Maps, enabling the robot to negotiate unforeseen concave obstacles. The planner for both architectures use a slippery cells technique to decompose the free workspace into fewer cells, thus reducing search time. Any two points in the cell can still be traversed by reactive motion.
VIDEO DEMOS
 Robot motion in an environment with unforeseen stationary obstacle
using command fusion
 Robot motion in an environment with unforeseen moving obstacle
using command fusion
 Robot motion in an environment that changes using command
fusion
 Robot motion in an environment with
unforeseen stationary concave and narrowly spaced convex obstacles using cooperative Extended Kohonen Maps
 Robot motion in an environment with
unforeseen moving obstacles using cooperative Extended Kohonen Maps
