Mention “Bayesian Optimisation” to Professor Bryan Low Kian Hsiang and he begins to talk about baking cookies. That’s because to the uninitiated, concepts such as “distributed batch Gaussian process optimisation” and “decentralised high-dimensional Bayesian optimisation” can sound either downright intimidating or like pure gobbledygook.
So instead, picture an elderly grandmother baking cookies in a warm kitchen, says Low, an Assistant Professor at the NUS School of Computing. Grandma is on a quest to bake the tastiest cookies ever, and to do so, she plays around with various parameters. She tries different ingredients and toppings, changes cookie size and thickness, and experiments with varying oven temperatures and baking times. She bakes one batch, feeds them to her family and friends, and gathers their feedback. Then Grandma heads back into the kitchen to repeat the process all over again.
The cookies get closer to perfection, but over time, Grandma grows more tired, her savings deplete, and her family and friends inch towards Type 2 diabetes. Grandma’s quest for the ultimate cookie recipe might seem trivial, but it’s emblematic of a problem many — be they manufacturers, engineers, AI developers, biologists and so on — face: how to find the optimal settings or conditions for their products or research. The process of try, test, tweak, and repeat is no doubt a long and labourious one.
The mysterious black box
“The key is that you don’t want to run so many experiments because every experiment you run is going to be costly either economically or computationally speaking — it either takes a long time or requires a lot of money,” explains Low. “So what we want to run is as few experiments as possible.”
“Basically, you want to figure out the optimum settings with as few data points as possible,” he says.
In industry, such testing is referred to as black box or functional testing. It’s asking the question: “If I tell you what the input is, can you tell me what the output will be?” says Low. “Or in other words, it’s like asking Grandma, ‘If I give you the ingredients, the mixture, the quantity, can you tell me how tasty your cookies will be?’”
“It’s highly complex...nobody knows how to write such black box functions,” he says. “Plus it’s costly to evaluate. You need to bake and feed a lot of cookies to a lot of people.”
Reducing the search space
So in 2015, Low and his research group decided to address this problem. Their solution: use novel Bayesian Optimisation algorithms to simplify and quicken the search for optimal settings. One trick, Low says, is to identify which parameters are dependent on one another, and which are not.
“This is called sparse dependencies,” he says. “In the world of computer science, we are always trying to exploit some structure in the problem so we can do things in a more efficient way.”
In the case of Grandma’s cookies, for example, it would be asking questions such as: “Is the amount of sugar required to bake a cookie related to the heat you use?” An experienced baker would tell you the two parameters aren’t co-dependent. So if Grandma had to test ten different amounts of sugar and ten different temperatures in her bid to make the tastiest cookie, then she could test the two parameters separately. This means that instead of having to experiment with 100 different combinations (10x10), Grandma would only need to run 20 tests (10x2).
“If we can make these kinds of independence assumptions, we are able to reduce the search space by quite a bit,” says Low. “So we can quickly nail down the optimal settings or parameters.”
These findings — that of a novel algorithm called decentralized high-dimensional Bayesian Optimisation — were presented, and well received, at last year’s AAAI Conference on Artificial Intelligence by Low’s former PhD student Trong Nghia Hoang. Hoang is now based at the MIT-IBM Watson AI Lab in Cambridge, Massachusetts.
Multiple kitchens, multiple experiments
Low’s other work on Bayesian Optimisation, an algorithm called distributed batch Gaussian process optimisation, makes use of the same idea — sparse dependencies — to solve a slightly different optimisation problem.
“Very often we’re in a hurry, we don’t just want to run one experiment at a time and then wait for the feedback,” he explains. “Instead, we want to run 10 experiments in parallel, observe the outcome and then use all 10 in feedback.”
“If you do this, you will be able to converge to the optimal settings or parameters faster than if you run them one by one,” says Low. Just imagine how much quicker the perfect cookie recipe could be attained if Grandma was rich enough to have 10 different kitchens with staff to delegate the testing to.
“But how do you do these parallel experiments so that they are not wasted effort, that they actually give you novel insights or outcomes?”
The answer, again, lies in determining which experiments are related, he says. “Then I can improve the search for the optimal settings.”
From algae blooms to AlphaGo
In recent years, such Bayesian Optimisation algorithms have gained considerable traction. Their ability to reduce the time, cost, and effort of searching for the optimal conditions have seen their application in broad fields of research.
The algorithms can be used in precision agriculture to maximise crop yield by looking at nutrient level, temperature, amount of lighting, and other parameters. They can be used by environmental scientists to gather informative data to train models that can predict where algal bloom hotspots might appear. They can also be used by battery manufacturers to figure out what the right mixture of materials might be in order to fabricate long-lasting batteries. They’ve even been used by Google DeepMind to help develop AlphaGo, the first computer program to beat a professional human Go player.
In Low’s paper last year, the authors asserted they could optimise up to 1,800 parameters. That figure has now been raised to more than 9,000, he says. “But in the real-world, if you really want to use Bayesian Optimisation, you should be able to tweak tens of thousands, even millions, of parameters.”
“We’re looking forward to scaling up to that number,” Low says. “Because that’s what the industry needs.” And if all goes well, we’ll be able to grow crops more efficiently, have more accurate prediction models, among other things...and better cookies!
Decentralized High-Dimensional Bayesian Optimization with Factor Graphs