When cloud providers pool and throttle to win the race

15 June 2020

When Yingda Zhai was working on his PhD in Austin, Texas, he used to stroll through his neighbourhood, not far from campus. On these walks, he saw something that puzzled him, something that would set the course of his research for the next few years.

What Zhai noticed was this: his neighbourhood wasn’t that well-to-do, and lining the streets were shops like MetroPCS, Cricket Wireless, and FamilyMobile. These small cell phone companies, also known as mobile virtual network operators (MVNOs), offered cheap plans without any contracts or credit checks. But the snag was that they came with slower connection speeds, smaller network coverage, and without features such as phone tethering.

The MVNOs are brands in their own right, but each belongs to one of the four major cell phone companies in the U.S., which allow the smaller companies to use their cell towers to power their services.

“It’s interesting because why do the main carriers buy back these secondary virtual operators?” asks Zhai, now an assistant professor at NUS Computing. Curious, he wanted to find out more about how large network operators allocate their limited resources — in this case, minutes, radio frequency bandwidth, etc. — and design their pricing plans accordingly.

Cloud provider woes

Zhai decided to focus his efforts on cloud computing. The industry, he reasoned, had boomed over the past decade, and it features large network operators, much like the telecommunications sector does.

For many of us, the ‘cloud’ conjures notions of an abstract, non-physical place on the internet: somewhere to offload photos from our phones and back up files, contacts, and so on. But the cloud actually offers services beyond storage.

Because it comprises a vast network of remote servers, or “server farms”, around the world that are linked together to form a single computing infrastructure, the cloud provides individuals and organisations with the benefits of flexible resources and economies of scale. It is used for a range of purposes: streaming videos, delivering online video games, creating virtual desktops, enabling webmail, and testing and developing software, to name a few.

But cloud providers, such as Microsoft Azure and Amazon Web Services, face a major challenge when it comes to operating: randomness. It is hard to predict when users might require their services, and to estimate the size and scope of their demands, says Zhai. Providers also need to reserve some system capacity to accommodate sudden surges in demand.

“As a result, some capacity remains idle most of the time,” he says. This, combined with varying utilisation levels, can lead to inefficiencies and lost profit.

“So we wanted to see if we could find an optimal strategy for doing business,” says Zhai. He teamed up with Maxwell Stinchcombe and Andrew Whinston from the University of Texas at Austin, and the trio developed an analytical framework to study the economics of cloud computing.

Making the queue predictable

The first thing the team studied using the framework was the queuing system — the virtual line of customers a cloud provider has waiting to use its services at any one time. Just as a takeaway store sees varying numbers of customers depending on the weather, the time of day, or the day of the week, the number of customers in the queue can be hard to predict.

Minimising this randomness is much easier when you’re a large cloud provider, Zhai and his collaborators realised. “The bigger your network, the bigger the market you control, and the lower aggregate uncertainty you observe,” explains Zhai.

That’s because the law of large numbers comes into play. A takeaway store that is popular and has a good reputation knows it can always count on customers coming in to buy its food, and thus can order sufficient ingredients and prepare enough dishes accordingly without too much food going to waste. Likewise, a network provider that commands a large market share knows it will always have customers waiting to use its services, no matter the time of day.

“If you’re an operator that provides computing services to only one company from 8am to 8pm, it means your utilisation level is really low outside of these hours,” says Zhai. “But if you provide your services to 10,000 companies across the world, your network will always be on 24/7.”

“So even though each user’s demand is uncertain, when we pool it all together, the aggregate uncertainty is gone,” he says. Being able to predict the queue enables a provider to plan its service features, better manage its capacity, and maximise its profits.
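The law-of-large-numbers effect Zhai describes can be sketched in a few lines of Python. The per-user demand distribution and the user counts below are illustrative assumptions, not figures from the paper:

```python
import random
import statistics

random.seed(0)

def coefficient_of_variation(n_users, n_periods=1000):
    """Simulate total demand from n_users independent users over many
    periods and return std dev / mean of the aggregate -- a unitless
    measure of how unpredictable the provider's queue is."""
    totals = []
    for _ in range(n_periods):
        # Hypothetical per-user demand: uniform between 0 and 10 units.
        totals.append(sum(random.uniform(0, 10) for _ in range(n_users)))
    return statistics.stdev(totals) / statistics.mean(totals)

# Relative uncertainty shrinks roughly as 1 / sqrt(n_users).
for n in (1, 25, 400):
    print(f"{n:>4} users -> CV ~ {coefficient_of_variation(n):.3f}")
```

With a single user, the aggregate is as noisy as the individual; with hundreds of pooled users, the relative fluctuation collapses, which is what lets a large provider size its capacity with little idle slack.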

Segmenting the market

The team then came to a second realisation with their framework, one that helped Zhai answer the question that first struck him all those years ago when walking around his Austin neighbourhood.

To maximise their earnings, large network providers tend to adopt what Zhai refers to as a throttling strategy. This involves segmenting customers into different tranches, according to their ability and willingness to pay for varying qualities of service. Similar to mobile phone users, customers of the cloud have differing tolerances for connection losses, speed slowdowns, and other disruptions.

“You have some serious users of the network who want little or no interruption. For example, if Netflix has a delay of service to its users, they’ll easily get crowded out by their competitors. So they need to make sure their service is of the highest quality and that’s why they pay higher amounts,” says Zhai.

“But people like myself who use the cloud on and off using a free service, we’re fine with more interruptions because we have lower delay costs,” he says.

Service providers have the capacity to offer better service across the board, but they deliberately hold it back. Such a throttling strategy attracts the low end of the market while preventing high-end users from abandoning their premium contracts, helping firms maximise profit margins even further.
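The segmentation logic can be written down as a toy model. The segment sizes and valuations below are made-up numbers for illustration; the mechanism — price the throttled tier at the low segment's valuation, then charge the premium tier as much as incentive compatibility allows — is the standard versioning argument, not the paper's exact model:

```python
# Toy model of the "throttling" strategy: a provider with two customer
# segments deliberately degrades a cheap tier so high-value users stay
# on the premium plan. All numbers are illustrative assumptions.

# Per segment: (number of users, value of premium, value of throttled tier)
SEGMENTS = {
    "high_end": (200, 20.0, 5.0),  # e.g. latency-sensitive firms
    "low_end":  (800, 4.0, 3.0),   # casual, interruption-tolerant users
}

def premium_only(price):
    """Revenue if only the premium tier is sold at `price`: each segment
    buys only when its valuation covers the price."""
    return sum(n * price for n, v_prem, _ in SEGMENTS.values() if v_prem >= price)

def two_tier():
    """Revenue with a throttled tier priced at the low segment's valuation,
    and the premium price set as high as incentive compatibility allows:
    high-end users must still weakly prefer premium, i.e.
    v_prem_hi - p_hi >= v_throt_hi - p_lo."""
    n_hi, v_prem_hi, v_throt_hi = SEGMENTS["high_end"]
    n_lo, _, v_throt_lo = SEGMENTS["low_end"]
    p_lo = v_throt_lo                       # extract the low segment's surplus
    p_hi = v_prem_hi - (v_throt_hi - p_lo)  # highest price that keeps high end on premium
    return n_hi * p_hi + n_lo * p_lo

print("premium only, high price:", premium_only(20.0))  # serves high end only
print("premium only, low price: ", premium_only(4.0))   # serves everyone
print("two tiers (throttling):  ", two_tier())
```

Under these assumed numbers, either single-tier strategy leaves money on the table, while the deliberately degraded cheap tier captures the low end without eroding premium revenue.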

The team’s findings, published in this paper, are interesting because they can be applied to a number of other industries with large network providers, including online storage services (such as Dropbox and Google Drive) and even travel platforms (like Expedia and Airbnb).

The team is now studying how cloud pricing is affected by aggregate uncertainty when Internet traffic is correlated, that is, when one individual’s or company’s demand for cloud resources affects the demand of others.

“This will help cloud providers deal with sudden surges in demand in light of unforeseen events, such as the coronavirus pandemic which is shifting businesses, schools, etc. from offline to online, dramatically increasing cloud demand in a short period of time,” says Zhai.

Paper: Mechanism Design in Large Cloud Computing Systems
