24 September 2021 Department of Computer Science , Faculty , Feature , Systems & Networking

Choice is good, but sometimes having too much choice can be a bad thing. Just ask anyone who’s ever tried to delve into a new film on Netflix, discover new songs on Spotify, or search for a suitable toy to buy their niece on Amazon — the options seem endless and often paralysing.

Thank goodness then for the AI that comes to our rescue, offering recommendations on what to watch, listen, and buy, helping us to narrow our options and ease the decision-making process.

The fuel that feeds these clever machine learning algorithms? Data.

YouTube, for instance, gathers information from your search history, the channels you subscribe to, and what you’ve previously watched, to suggest new videos for you to view. “We also consider your context, such as your country and time of day...this helps us show you locally relevant news,” says YouTube on their website.

“Analysing the relationship between all these different data is very important,” says He Bingsheng, an associate professor at NUS Computing who specialises in big data management systems. “And doing it quickly so people can make a timely decision is important too — when you recommend an advertisement or video to your users, it doesn’t make sense to do it one year later.”

Sorting through the mess of data
Computers achieve this task through a process called graph processing. In computer science parlance, a graph refers to the way in which data is represented — entities are denoted by nodes (or vertices), while the relationships between them are depicted as edges (or connections). With the explosion of big data in recent years, graph processing has become immensely popular and important, with applications ranging from product recommendation to scientific research and fraud detection.

Picture of woman holding a smartphoneGraph processing, which is crucial for modern applications such as product recommendations, scientific research, and fraud detection, are enabled via accelerators. Associate Professor He Bingsheng and his research team set out to develop a new graph processing framework that would make graph processing ‘thunder fast’.

Computer scientists rely on a handful of ways to process graphs, most of which use accelerators to speed up computing power. One device in particular — the Field-Programmable Gate Array (FPGA), roughly the size of a small keyboard — has emerged as a favourite, lauded for its efficiency, low power consumption, and capacity for customisation, among other advantages. FPGAs are currently used to power Microsoft’s Bing search engine and Baidu’s machine learning platform.

But both graph processing and the devices that enable them, FPGAs, come with inherent problems. For one, graphs can contain a lot of information, says He. “Think of how big our population is and how many movies we watch everyday — the number is huge, in the tera or even peta scale,” he says, referring to the prefixes of one trillion and one quadrillion, respectively. (To appreciate just how much data that is, imagine this: the capacity of a human’s functional memory is estimated to be roughly 1.25 terabyte; playing 1 petabyte of songs would take over 2,000 years.)

“You want to have speed when you’re doing graph processing,” He explains.

“Ideally you would chop a graph into 10 pieces, for example, and then assign them to 10 different processors, and they would finish the work at the same time,” he says.

But because graphs are inherently irregular and unstructured — some nodes have more connections than others — it’s “difficult to achieve parallelism or assign the work equally,” says He. This, in turn, slows things down.

And when it comes to using FPGAs, software engineers face other challenges: they need to have a deep understanding and expertise of the hardware involved and use a tedious and time-consuming programming language.

Making graph processing “thunder fast”
So in 2018, He and his PhD students Chen Xinyu and Tan Hongshi set about trying to create a new graph processing framework that would help overcome these challenges. Together with collaborators from Singapore’s Advanced Digital Sciences Center, the University of Illinois, and China’s Alibaba Group, He devised a novel technique called ‘data shuffling,’ which helped dispatch the data in a graph to different processing elements more evenly and efficiently.

“Essentially, we tailored the hardware for the graph layout,” says He. “Because graphs are so irregular, the first task we wanted to do was improve the random access to data.”

Next, the researchers designed a caching mechanism that could “bring frequently accessed data closer to the processing element.” The result: processing times that were nearly three times quicker than existing methods.

“My students worked really hard to create this advanced system,” says He. “And because of all this parallelism, data access optimisation, and caching, our system performed much better than state-of-the-art systems.”

He’s team then turned their attention to integrating the technique into an easy-to-use framework. To do so, they developed a library comprising a number of software interfaces, or Application Programming Interfaces (APIs). All users have to do is pick their preferred APIs, write a few high-level functions, and let ThunderGP automatically generate the rest of the code.

“Developers can enjoy the performance of FPGA-accelerated graph processing by writing only a few lines of code using a basic language like C++, which typically takes a day,” explains He. “They don’t need to have any knowledge of the hardware or care about so many things.”

To test how well their new graph processing framework — which they called “ThunderGP” — performed in the real world, the researchers applied it to a case study that sought to estimate the prevalence of Covid-19 in a given population. The results were stunning: ThunderGP arrived at its predictions 419 times quicker than the standard CPU-based, graph processing approach.

Image of various surgical face masksAssociate Professor He Bingsheng and his research team developed a new graph processing framework called ThunderGP. To test how well it worked, ThunderGP was used to estimate the prevalence of Covid-19 in a given population. It arrived at its predictions 419 times quicker than the standard CPU-based, graph processing approach.

“Overall, ThunderGP is a very reasonable, ready-to-use system,” says He.

ThunderGP placed third at the 2020 Xilinx Adaptive Computing Developer Contest out of 79 teams. The framework is open-sourced and is freely available to download from Xilinx’s app store, the company that invented FPGAs.

Next up, He and his collaborators are looking into creating a hardware accelerator that is customisable. They are also exploring whether ThunderGP can be applied to support graph processing in neural networks, a subset of machine learning that’s employed in financial services, medical diagnostics, facial identification, and other applications.

The ultimate aim the end of the day, He says, is “to make our computing systems faster, greener, and cheaper.”


Paper: On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs