Spotting concurrency bugs in software with sampling

13 April 2023

In the summer of 1983, the government organisation Atomic Energy of Canada Limited launched its newest radiation therapy machine. The Therac-25 was highly anticipated — it boasted a revolutionary dual treatment mode (employing either a powerful electron beam or X-rays to kill cancer cells), was more compact than its predecessors, and could be controlled entirely by a computer.

But the machine would soon become synonymous with some of the worst accidents in the history of radiation therapy. In the first reported incident in June 1985, a 61-year-old woman receiving treatment for breast cancer in Georgia said she felt a “tremendous force of heat…this red-hot sensation” when the machine turned on. The radiation sloughed layers of skin off her back and arms, eventually leaving her left arm paralysed. A year later, a different patient in Texas described seeing a bright light in his treatment room, hearing a sound like eggs frying, and feeling his face on fire. The man, who had skin cancer, died three weeks later. The cause of death? Radiation burns to his right temporal lobe and brainstem.

In all, Therac-25 machines would seriously injure or kill six people — the result of the machines delivering massive overdoses of radiation to patients, sometimes up to 100 times the safe limit. The machines were decommissioned in February 1987, with investigations tracing their malfunction to two main causes: inadequate safety checks and bugs in the software.

The Therac-25 tragedy illustrates how software errors can be devastating, says Umang Mathur, an assistant professor at NUS Computing. “In the past, they’ve led to loss of lives. They are also a leading cause of issues such as security vulnerabilities, data corruption, crashes in software, poor performance, blackouts, and so on.”

Today, nearly four decades on, the problem has only intensified. “The real issue is that software applications are becoming increasingly complex, and ensuring software correctness is becoming more and more challenging,” says Mathur, who specialises in detecting such software bugs.

Out of order

The particular bug in question is what’s known as a data race. These bugs occur in software applications composed of many different components running concurrently — in other words, the way nearly all our devices function today. “Most computing devices, including our mobile phones, run on multicore processors, meaning they comprise small computers that run different parts of a software application in parallel,” explains Mathur. “These different parts, often termed threads or processes, run at the same time, frequently interacting with each other to achieve a shared high-level task.”

Programmers typically use various forms of synchronisation to carefully choreograph the interaction between threads by enforcing an order in which they execute. The precise order in which these threads carry out their programmed instructions is critical to the success of the larger task at hand. For instance, changing the order in which two different components of a web page load can lead to very different outcomes, he says. “If two threads do not communicate or synchronise properly, then a data race arises.”
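To see what this looks like in code, here is a minimal sketch in Java (an illustration, not an example from the article): two threads increment a shared counter with no synchronisation, so their read-modify-write steps can interleave and updates are silently lost.

```java
// A minimal data race: two threads update shared state without synchronisation.
public class DataRaceDemo {
    static int counter = 0; // shared, unsynchronised state

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write is not atomic, so the threads race
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but lost updates often yield a smaller total.
        System.out.println("counter = " + counter);
    }
}
```

Run repeatedly, this typically prints different totals below 200,000. A dynamic race detector’s job is to flag the two unsynchronised accesses to counter — for example, guarding the increment with a lock would remove the race.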

Over the years, computer scientists have developed ways to help programmers automatically detect concurrency bugs like data races by analysing the dependencies between threads. “Basically, when your software is running, such methods try to observe the execution and make an inference about whether there is a bug or not,” says Mathur.

There’s a high price to pay for this observation, however. “It slows down the performance of your software very much,” he says. “As a result, if I have to use these tools to detect data races, I have to think a lot to determine if it’s really worthwhile to trade the performance of my software for the level of assurance given by these tools.”

That is partly why most large software firms today only test for concurrency bugs (such as data races) in-house, during the initial phases of software development, before the deployment phase. But the real world can vary vastly from the controlled in-house environment. Google, for instance, must deal with heavy user traffic directed towards its search engine — a scenario that is extremely hard to simulate during the testing phase. “What often happens is you might miss bugs that get triggered only when the software is actually running under heavier and more realistic workloads,” says Mathur.

Sampling offers a solution

The computer scientist began tackling the issue of data races at the end of 2021, around the time he joined NUS, after working as a researcher at Meta (then Facebook Inc.) and the University of Illinois Urbana-Champaign (UIUC), where he obtained his PhD. Last year, he and his colleagues in Denmark and the U.S. announced they had found a new way to track causality in concurrent systems — and thus detect data race bugs — in a manner that is much more efficient than existing techniques. They did this by employing a novel data structure called ‘tree clocks’ to implement timestamping, which is a fundamental operation employed in many distributed and concurrent applications.
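To give a rough sense of what “timestamping” involves, here is a sketch of the classic vector clock — the generic structure that such analyses traditionally use, not the tree-clock data structure from the paper itself. Each thread keeps one counter per thread, and whether one event causally precedes another is a componentwise comparison.

```java
import java.util.Arrays;

// A generic vector clock: the classic timestamping structure for tracking
// causality between threads. Tree clocks are an optimised alternative.
class VectorClock {
    final int[] clock;

    VectorClock(int numThreads) {
        clock = new int[numThreads];
    }

    // Local step of thread `tid`: advance its own component.
    void tick(int tid) {
        clock[tid]++;
    }

    // Synchronisation (e.g. acquiring a lock another thread released):
    // take the pointwise maximum, merging the two histories.
    void join(VectorClock other) {
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], other.clock[i]);
        }
    }

    // True iff this timestamp is causally ordered before (or equal to) the other.
    boolean happenedBefore(VectorClock other) {
        for (int i = 0; i < clock.length; i++) {
            if (clock[i] > other.clock[i]) return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return Arrays.toString(clock);
    }
}
```

The expense here is that every join touches every entry. Broadly speaking, tree clocks organise the same information hierarchically so that a join only needs to traverse the entries it would actually change — which is what makes the tracking more efficient.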

This time, Mathur has designed another method that can detect data races in real-world software, without the performance degradation that existing techniques engender.

His new data race detection technique, called Race Property Tester, or RPT for short, uses sampling to reduce the analysis cost of detecting data race bugs in multi-threaded software. It works by examining only a small proportion of the events generated during a software application’s execution, yet accurately determines whether the underlying application contains data races.

“The insight underpinning RPT is to treat the data race detection problem as if it was a big data problem. With this view, if you’re looking at a stream of events generated during the execution of a multi-threaded software, instead of studying the entire stream, you only look at parts of it, which would naturally reduce your total analysis time,” explains Mathur. “The best part is: the number of events that RPT needs to sample does not grow even when the execution’s size keeps on growing!”
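In that spirit, here is a toy sketch of the sampling idea — an illustration only, with a hypothetical race-checking oracle standing in for RPT’s actual analysis, which is specified in the paper. A fixed budget of random events is drawn from the trace, and only those are checked, so the work stays constant as the trace grows.

```java
import java.util.List;
import java.util.Random;

// Toy sketch of sampling-based race detection: instead of analysing all
// n events in a trace, draw a fixed number of random samples and run a
// race check only on those. The budget does not grow with the trace.
class SamplingSketch {
    static final int SAMPLE_BUDGET = 64; // constant, independent of trace size

    // Hypothetical oracle: does the event at this index race with some
    // other event in the trace? (A stand-in for RPT's actual check.)
    interface RaceCheck {
        boolean racesAt(int eventIndex);
    }

    static boolean probablyHasRace(List<?> trace, RaceCheck check, Random rng) {
        for (int s = 0; s < SAMPLE_BUDGET; s++) {
            int i = rng.nextInt(trace.size()); // sample a random event
            if (check.racesAt(i)) {
                return true; // evidence of a race found in the sample
            }
        }
        return false; // no sampled event raced: report "probably race-free"
    }
}
```

With the sample budget fixed, the analysis does the same amount of work whether the trace holds a thousand events or a billion — the property Mathur highlights above.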

To assess RPT’s performance, Mathur and his UIUC collaborators evaluated it on more than 140 benchmark software applications, thoroughly studying factors such as run time, the likelihood with which it discovers a race, and whether a minimum number of bugs must be present for that likelihood to be high. “Typically, researchers don’t perform an evaluation on such a large scale, but we opted for this scale because we wanted to be really sure that whatever we’re doing makes sense and is indeed useful for practitioners,” says Mathur.

The empirical evaluation of RPT, he says, was in line with the theoretical assessment. “We found that even when software applications generate more than a billion events, RPT can detect data races with very high probability, by sampling only a very small number of events.” When compared with two state-of-the-art data race detection techniques, FastTrack and Pacer, RPT demonstrated the fastest run time. Additionally, it never reported any false positives.

The researchers presented their findings at the Symposium on Principles of Programming Languages (POPL), a premier computer science conference, held in Boston, Massachusetts this January. Their work received a Distinguished Paper Award — an honour bestowed on the top 10% of conference papers.

Mathur and his team are now working to see if they can integrate RPT into fuzz testing (an emerging first-line approach used by software firms to detect bugs) and whether they can use a similar sampling-style approach for detecting concurrency bugs other than data races. “In general, a key theme in my research is to make software more reliable and robust,” says Mathur. “I think about how mathematical and algorithmic reasoning can help eliminate critical errors that can otherwise cause undesired behaviour in software applications.”

Paper: Dynamic Race Detection with O(1) Samples
