07 July 2023 Department of Computer Science, Faculty, Research, Feature
Featured Faculty

Just as we hurry about the day paying little attention to the sky above us and its fluffy white inhabitants, we give scarcely a thought to the clouds of computers spread across data centres around the globe.

Yet our lives wouldn’t be the same without them: our weather system would be wonky without the former, and if the latter didn’t exist, there would be no Gmail, Slack, or Google Docs to enable our work; no Facebook, Skype, or WhatsApp to help us stay connected; no Netflix or Disney+ to unwind with at the end of the day.

But some two decades ago, things weren’t quite as smooth-sailing for the digital cloud (the software and services that run somewhere at the other end of an Internet connection, rather than on one’s local computer), says Djordje Jevdjic, an assistant professor at the NUS School of Computing.

“At that time, cloud computing was just getting started,” recalls Jevdjic, then a PhD student at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. Salesforce, Amazon, and other bigwigs were jumping on the bandwagon to offer their services to anyone with an Internet connection. A big challenge they faced, however, was how to ramp up their computing power to serve billions of users and handle data in unprecedented volumes.

“My colleagues and I had this idea to look at the most common applications and software stacks that are running in the cloud to see what happens when they run on existing hardware,” explains Jevdjic, whose research at NUS centres on how to create more efficient server systems.

The study took the better part of two years to complete and drew on the combined efforts of a team of ten. “We invested a lot of time and work into it,” he says. But the end result, a paper called “Clearing the Clouds”, was well worth it: it won the Best Paper Award at ASPLOS, the prestigious annual International Conference on Architectural Support for Programming Languages and Operating Systems.

Now, more than a decade on, the paper continues to make ripples in the computer science world. In March, it garnered another accolade from ASPLOS: the Influential Paper Award. “It’s also known as the Test of Time award,” explains Jevdjic. “The community looks at papers that have been around for at least a decade and asks: Which paper had the most impact in the field? Are the paper’s conclusions still valid? Have they stood the test of time?”

“It’s the most important recognition we can get for a paper,” he adds proudly. Best paper awards, he notes, often go to papers that fit the scientific “fashion” of the moment, whereas awards like this one recognise the papers that actually move the needle in practice. It is rare for the same paper to receive both.


Creating the benchmarks

Contrary to what the name suggests, data stored in the cloud has a terrestrial home: cavernous rooms, called data centres, containing row upon row of servers. Because these facilities are limited by physical space and power supply, cloud providers cannot simply keep building bigger ones and must find other ways to grow. A key approach is to use more efficient processors, which is why Jevdjic and his co-authors decided to focus their study on the hardware used in cloud computing.

The researchers’ first task was to come up with a set of benchmark applications that mimic the proprietary software stacks running in the cloud, and against which hardware designers could measure their new products. “The old benchmarks were so outdated and disconnected from the reality of cloud computing,” says Jevdjic. “What was missing was some kind of standardised measure whereby if you propose new hardware, here’s a suite of benchmark applications you can run to see how your system works and pinpoint any problems.”

The team named the benchmarks they created CloudSuite, and made them open source and freely available for anyone to download. The response since their release has been tremendous. “We’ve had people from Google and Facebook, drawing on their experience running their data centres, tell us that our benchmark suite is an appropriate way to mimic the behaviour of their proprietary applications,” says Jevdjic. “So that was a good confirmation that we’re actually doing the right thing and that the benchmarks we proposed are actually meaningful.”


Comparing demand and supply

Once CloudSuite was complete, the team turned their attention to the next task: studying the hardware used in cloud computing. They analysed the processor’s performance and efficiency while it ran CloudSuite, examining how its computing and memory resources were provisioned, organised, and utilised, how many instructions it could execute per cycle, and how much data per second it could process, among other metrics. They then compared these measurements with what the new cloud applications actually demand.

The findings were dismal: across various metrics, there appeared to be a large mismatch, often as much as tenfold, between what the processor architecture provides and what the applications demand. This implied that data centres were highly inefficient.

The news, however, wasn’t surprising. “That’s because the hardware going into the cloud was largely the same hardware going into my laptop, which isn’t quite appropriate,” says Jevdjic. “Cloud applications have to tackle huge amounts of data compared to conventional desktop applications such as PowerPoint or a web browser, so they need different hardware mechanisms to run efficiently.”

He adds: “This may sound like an obvious conclusion, but there are a lot of business decisions behind what goes into a processor.” Companies, for instance, usually prefer to work with, and make money from, what they already have for as long as they can. “It’s a huge cost and risk to start something totally new, so instead they take an existing processor architecture and maybe add some more cache memory, which, as we showed in this paper, does not contribute much to either performance or efficiency.”

Practically speaking, their conclusions suggest that the processing units in general-purpose processors are unnecessarily complex for the task at hand. To illustrate this point, Jevdjic invokes an animal analogy: “Imagine using a horse on a farm to do what a chicken could do. The price of keeping a horse, however, is huge, given the space it occupies, etc.”

Similarly, processing units are “too few and too bulky, each taking a lot of physical space on the chip, and full of features that the cloud applications don’t actually use,” he says. “And that’s because they are built for general-purpose things, with the assumption that one size should fit all. It may work for some business models, but it doesn’t give us the energy efficiency we need in a data centre.”

Instead, a better approach would be to use many smaller, more efficient processing units specialised for the task at hand, and this is indeed how things have evolved in the decade since the paper’s publication. “Now we see much more specialised hardware everywhere. One type of processor is running machine learning, another is running computer games, some are even specialised to support internal data centre needs, and so on,” says Jevdjic. “These days, all the big companies like Google and Microsoft have their own divisions that design hardware.”

“If you look at data centres today, they look nothing like they did at the time of the paper,” he says.

Reflecting on his award-winning paper, which has been cited more than 1,000 times, Jevdjic says he is proudest of the real-world impact the work has had. “Today, more than 10 years later, CloudSuite is widely used by hundreds of research labs around the world. And, importantly, our conclusions have been validated by the biggest cloud computing providers and have served as design guidelines for multiple important and successful systems, both in industry and in academia,” he concludes.


Paper: Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware