Roger Zimmermann has been in the business for a long time — nearly 25 years to be precise. He first started studying media streaming in the late 1990s, as a young, earnest PhD student at the University of Southern California.
Since then, the industry has undergone colossal changes, says Zimmermann, now a professor at NUS Computing. “Previously, people had these specialised connections from their cable providers. But now most video streaming, whether it’s via YouTube, Netflix, or Amazon Prime, happens over the internet.”
Digitalisation has made videos much more accessible and the variety of entertainment available today is simply astounding, he says. But there has been a major drawback: the latency, or the time delay of something appearing on your screen, has gotten longer and longer.
Poor latency is annoying enough when you’re trying to stream a show and it stalls, but it is especially frustrating when you’re viewing a live event. We want, for instance, to watch a football match and witness the goals being scored in real time as the ball flies from the striker’s foot into the net...and not have to wait on tenterhooks for the stream to catch up.
“Recently there has been a push to try and reduce that latency so that it becomes much closer to what we call the Live Edge, which is what’s happening right now, at the ‘edge’ of the data that is available,” says Zimmermann.
It’s a fairly niche area of research, he admits, but one with widespread implications for everyone. “If you look at traffic on the internet, roughly 80% is video streaming. Many people don’t realise what a tremendous part of the internet it is.”
All in a second
The holy grail that researchers seek is a one-second latency when delivering live content. Attaining this would mean coming as close as possible to “glass-to-glass” latency, where the delay between an image being caught on the glass lens of the camera and appearing on the display glass of a viewing screen is zero, says Zimmermann.
Presently, what’s displayed during video playback typically lags behind the live action by five to 30 seconds, depending on the streaming conditions.
But achieving low latency is challenging for a number of reasons. For one, a video has to be captured, encoded, packed and transferred from a server to the end user in a matter of mere seconds. For another, data transmission is subject to a number of network conditions, such as how many people might be trying to watch Squid Game, conduct a Zoom call, or watch an e-sports match at the same time as you.
Most video platforms today, including Netflix and YouTube, stream their content using a technology called HAS. Short for ‘HTTP Adaptive Streaming’, HAS breaks down a video into small chunks of data and encodes these chunks at different quality levels. At the receiving end, a video player uses an algorithm to decide which quality level to download for each chunk as it plays the video back.
“The player basically pulls data from the servers and makes decisions of ‘When do I need the next piece of data?’, ‘What quality do I need?’”, explains Zimmermann. “So if your connection is very good, the player is going to get a high-definition video. But if the connection is not that great, it will download a lower quality.”
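As a rough illustration, that quality decision can be sketched in a few lines of Python. The bitrate ladder and the 80% safety margin below are hypothetical values chosen for the example, not those of any real player:

```python
# Minimal sketch of HAS-style quality selection (illustrative only;
# the bitrate ladder and safety margin are hypothetical).

BITRATE_LADDER_KBPS = [500, 1500, 3000, 6000]  # low -> high-definition renditions

def pick_quality(estimated_bandwidth_kbps: float, safety_factor: float = 0.8) -> int:
    """Return the index of the highest rendition the connection can sustain.

    The player leaves some headroom (safety_factor) so a small dip in
    throughput does not immediately stall playback.
    """
    budget = estimated_bandwidth_kbps * safety_factor
    chosen = 0
    for i, bitrate in enumerate(BITRATE_LADDER_KBPS):
        if bitrate <= budget:
            chosen = i
    return chosen

# A fast connection gets the high-definition rendition...
print(pick_quality(10000))  # -> 3
# ...while a weak one falls back to a lower quality.
print(pick_quality(1000))   # -> 0
```

In a real player this decision runs every few seconds, which is why an accurate bandwidth estimate matters so much.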
A video player makes these decisions every few seconds, with the help of low-latency algorithms. There are many algorithms freely available, says Zimmermann, and it’s up to a streaming platform to decide which one to use. “The problem is there is no perfect algorithm. Every one involves some tradeoffs.”
If you make the latency too short, for instance, the video player might run out of data too quickly and the video being streamed is more likely to stall. Or its quality might fluctuate, with the image clear one minute but fuzzy the next.
“There are a number of factors that you want to carefully balance in order to get the best possible quality of experience for the end user,” says Zimmermann. “But there isn’t a perfect solution for this and that’s why there are a number of different algorithms and it’s why people still do research in this area.”
Filling a bucket
For Zimmermann and his lab, low latency is something they’ve been pursuing for the past five years. They have since invented a number of award-winning algorithms for video players on DASH, the HAS protocol used by most of the world.
“We’re on the cutting edge of where things are, we’re really pushing the envelope,” says Zimmermann. “There are only two or three algorithms that can get below three seconds of latency — and we’re one of them.”
One successful algorithm Zimmermann and his team — comprising students May Lim and Mehmet Akcay, postdoc Abdelhak Bentaleb, and collaborator Associate Professor Ali C. Begen at Turkey’s Özyeğin University — have come up with is called Low-on-Latency (LoL), which they presented in a 2020 paper.
“LoL is basically three modules working together to ensure video streaming goes smoothly,” he explains.
The job of the first module is to figure out how good the network connection is. “It’s not that simple because the connection is constantly changing due to other traffic on the network,” says Zimmermann.
“When we looked into this, we found that many of the measuring modules currently available actually give fairly inaccurate measurements,” he adds. “And if the measurement is inaccurate, then your decision module isn’t going to do a great job because it relies on those figures to decide how much data it has in its buffers, how long it can still play before it runs out of data, which data chunk it’s going to get next, and so on.”
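One simple way to build such a throughput estimate, sketched below in Python, is a sliding window over recent chunk downloads. This is a generic illustration of the measurement problem, not the specific method developed by Zimmermann’s lab:

```python
# A simple sliding-window throughput estimator, one common approach to the
# measurement problem described above (not LoL's actual method).
from collections import deque

class ThroughputEstimator:
    """Estimate network throughput from recent chunk downloads."""

    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)  # keep only the most recent samples

    def record_download(self, bytes_received: int, seconds: float) -> None:
        self.samples.append((bytes_received * 8) / seconds)  # bits per second

    def estimate_bps(self) -> float:
        if not self.samples:
            return 0.0
        # Harmonic mean is deliberately conservative: one slow download pulls
        # the estimate down more than one fast download pulls it up.
        return len(self.samples) / sum(1.0 / s for s in self.samples)

est = ThroughputEstimator()
est.record_download(1_000_000, 2.0)  # 4 Mbps sample
est.record_download(1_000_000, 4.0)  # 2 Mbps sample
print(round(est.estimate_bps()))     # -> 2666667 (harmonic mean, ~2.7 Mbps)
```

The conservatism matters because, as Zimmermann notes, an over-optimistic estimate leads the decision module to request chunks it cannot download in time.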
So the researchers’ first task was to build a better measuring module. Once that was accomplished, they then turned their attention to the second module, a decision-making one more formally known as the adaptive bitrate (ABR) selection module.
The video streaming process is akin to that of filling a bucket, says Zimmermann. The video player is like a bucket you’re trying to fill with data. “On one side, you want to get new data into the bucket. And on the other side, the data goes out of the bucket to your screen and speakers.”
Managing this data flow is key to preventing the bucket, or video player, from getting too full or too empty. The ABR selection module is the “logic centre” that processes various measurements taken by the first module and makes decisions on the data input and output rate, thus keeping the entire process in check.
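The bucket analogy can be made concrete with a tiny Python simulation; the numbers here are purely illustrative:

```python
# The "bucket" as a tiny simulation: data flows in from the network and
# drains out at the playback rate. All figures are illustrative.

def simulate_buffer(inflow_kbps: float, playback_kbps: float,
                    start_seconds_buffered: float, duration_s: int) -> float:
    """Return how many seconds of video remain buffered after duration_s."""
    buffered = start_seconds_buffered
    for _ in range(duration_s):
        buffered += inflow_kbps / playback_kbps  # seconds of video downloaded
        buffered -= 1.0                          # one second played out
        buffered = max(buffered, 0.0)            # an empty bucket means a stall
    return buffered

# Downloading at 2 Mbps a stream encoded at 3 Mbps: the bucket drains,
# and after 6 seconds only ~2 of the original 4 buffered seconds remain.
left = simulate_buffer(2000, 3000, 4.0, 6)
print(f"{left:.1f}s of video buffered")
```

When the inflow rate persistently falls below the playback rate, the only remedies are switching to a cheaper rendition or, as LoL’s third module does, slowing the drain itself.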
LoL’s third module — the Playback Control module — is one of the algorithm’s distinguishing features. “For most playback modules, there’s normally not much you can do with the data — it just goes out at a rate of 30 frames per second,” explains Zimmermann. “But with the adaptive playback module we have, we can slow that down a little.”
Low latency, or streaming content quickly, comes with the risk of the video player running out of data and stalling — a scenario that most viewers loathe, says Zimmermann. Reducing the playback rate to 26 or 28 frames a second, a lag the viewer would scarcely notice, prevents this from happening. “So this module is a little knob we can play with to adapt the playback rate accordingly,” he says. “It’s a neat little trick.”
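A sketch of such a playback-rate “knob” might look like the following; the linear scaling and the watermark threshold are assumptions made for illustration, not LoL’s actual control law:

```python
# Illustrative playback-rate "knob": when the buffer runs low, drop the
# frame rate slightly (barely perceptible) to buy the downloader time.
# The thresholds and scaling here are hypothetical.

NORMAL_FPS = 30
MIN_FPS = 26  # roughly the slowdown range mentioned above

def playback_fps(buffer_seconds: float, low_watermark: float = 2.0) -> float:
    if buffer_seconds >= low_watermark:
        return NORMAL_FPS
    # Scale linearly between MIN_FPS (empty buffer) and NORMAL_FPS (full).
    fraction = max(buffer_seconds, 0.0) / low_watermark
    return MIN_FPS + (NORMAL_FPS - MIN_FPS) * fraction

print(playback_fps(3.0))  # -> 30   (healthy buffer: normal speed)
print(playback_fps(1.0))  # -> 28.0 (buffer low: slow down slightly)
print(playback_fps(0.0))  # -> 26.0 (near-stall: slowest allowed)
```

Slowing output by a couple of frames per second stretches the remaining buffer, trading an imperceptible lag for a much better chance of avoiding a stall.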
The result? An algorithm that offers high-quality streaming at one of the lowest latency rates in town.
Zimmermann and his team have since improved on LoL, launching a new version called LoL+ last December. They are now working on adapting the algorithm for use with Apple’s video streaming player (unlike other platforms, Apple doesn’t use DASH).
“There are exciting directions and challenges ahead — things such as 3D streaming environments, as exemplified by the metaverse from Meta (previously Facebook),” says Zimmermann. “Combine this with fast 5G networking and beyond, and it’s going to be an exhilarating future.”