Spotting concurrency bugs in software with sampling

13 April 2023

In the summer of 1983, the government organisation Atomic Energy of Canada Limited launched its newest radiation therapy machine. The Therac-25 was highly anticipated — it boasted a revolutionary dual treatment mode (employing either a powerful electron beam or X-rays to kill cancer cells), was more compact than its predecessors, and could be controlled entirely by a computer.

But the machine would soon become synonymous with some of the worst accidents in the history of radiation therapy. In the first reported incident in June 1985, a 61-year-old woman receiving treatment for breast cancer in Georgia said she felt a “tremendous force of heat…this red-hot sensation” when the machine turned on. The radiation sloughed layers of skin off her back and arms, eventually leaving her left arm paralysed. A year later, a different patient in Texas described seeing a bright light in his treatment room, hearing a sound like eggs frying, and feeling his face on fire. The man, who had skin cancer, died three weeks later. The cause of death? Radiation burns to his right temporal lobe and brainstem.

In all, Therac-25 machines would seriously injure or kill six people — the result of the machines delivering massive overdoses of radiation to patients, sometimes up to 100 times the safe limit. The machines were decommissioned in February 1987, with investigations tracing their malfunction to two main causes: inadequate safety checks and bugs in the software.

The Therac-25 tragedy illustrates how software errors can be devastating, says Umang Mathur, an assistant professor at NUS Computing. “In the past, they’ve led to loss of lives. They are also a leading cause of issues such as security vulnerabilities, data corruption, crashes in software, poor performance, blackouts, and so on.”

Today, nearly four decades on, the problem has only intensified. “The real issue is that software applications are becoming increasingly complex, and ensuring software correctness is becoming more and more challenging,” says Mathur, who specialises in detecting such software bugs.

Out of order

The particular bug in question is what’s known as a data race. These bugs occur in software applications composed of many different components running concurrently — in other words, the way nearly all our devices function today. “Most computing devices, including our mobile phones, run on multicore processors, meaning they comprise small computers that run different parts of a software application in parallel,” explains Mathur. “These different parts, often termed threads or processes, run at the same time, frequently interacting with each other to achieve a shared high-level task.”

Programmers typically use various forms of synchronisation to carefully choreograph the interaction between threads by enforcing an order in which they execute. The precise order in which these threads carry out their programmed instructions is critical to the success of the larger task at hand. For instance, changing the order in which two different components of a web page load can lead to very different outcomes, he says. “If two threads do not communicate or synchronise properly, then a data race arises.”
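To see what this looks like in code, here is a minimal sketch in Java (an illustration, not an example from the article): two threads increment a shared counter with no synchronisation, so their read-modify-write steps can interleave and updates are silently lost.

```java
// A minimal data race: two threads update shared state without synchronisation.
public class DataRaceDemo {
    static int counter = 0; // shared, unsynchronised state

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write is not atomic, so the threads race
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but lost updates often yield a smaller total.
        System.out.println("counter = " + counter);
    }
}
```

Run repeatedly, this typically prints different totals below 200,000. A dynamic race detector’s job is to flag the two unsynchronised accesses to counter — for example, guarding the increment with a lock would remove the race.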

Over the years, computer scientists have developed ways to help programmers automatically detect concurrency bugs like data races by analysing the dependencies between threads. “Basically, when your software is running, such methods try to observe the execution and make an inference about whether there is a bug or not,” says Mathur.

There’s a high price to pay for this observation, however. “It slows down the performance of your software very much,” he says. “As a result, if I have to use these tools to detect data races, I have to think a lot to determine if it’s really worthwhile to trade the performance of my software for the level of assurance given by these tools.”

That is partly why most large software firms today only test for concurrency bugs (such as data races) in-house, during the initial phases of software development, before the deployment phase. But the real world can vary vastly from the controlled in-house environment. Google, for instance, must deal with heavy user traffic directed towards its search engine — a scenario that is extremely hard to simulate during the testing phase. “What often happens is you might miss bugs that get triggered only when the software is actually running under heavier and more realistic workloads,” says Mathur.

Sampling offers a solution

The computer scientist began tackling the issue of data races at the end of 2021, around the time he joined NUS, after working as a researcher at Meta (then Facebook Inc.) and the University of Illinois Urbana-Champaign (UIUC), where he obtained his PhD. Last year, he and his colleagues in Denmark and the U.S. announced they had found a new way to track causality in concurrent systems — and thus detect data race bugs — in a manner that is much more efficient than existing techniques. They did this by employing a novel data structure called ‘tree clocks’ to implement timestamping, which is a fundamental operation employed in many distributed and concurrent applications.
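To give a rough sense of what “timestamping” involves, here is a sketch of the classic vector clock — the generic structure that such analyses traditionally use, not the tree-clock data structure from the paper itself. Each thread keeps one counter per thread, and whether one event causally precedes another is a componentwise comparison.

```java
import java.util.Arrays;

// A generic vector clock: the classic timestamping structure for tracking
// causality between threads. Tree clocks are an optimised alternative.
class VectorClock {
    final int[] clock;

    VectorClock(int numThreads) {
        clock = new int[numThreads];
    }

    // Local step of thread `tid`: advance its own component.
    void tick(int tid) {
        clock[tid]++;
    }

    // Synchronisation (e.g. acquiring a lock another thread released):
    // take the pointwise maximum, merging the two histories.
    void join(VectorClock other) {
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], other.clock[i]);
        }
    }

    // True iff this timestamp is causally ordered before (or equal to) the other.
    boolean happenedBefore(VectorClock other) {
        for (int i = 0; i < clock.length; i++) {
            if (clock[i] > other.clock[i]) return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return Arrays.toString(clock);
    }
}
```

The expense here is that every join touches every entry. Broadly speaking, tree clocks organise the same information hierarchically so that a join only needs to traverse the entries it would actually change — which is what makes the tracking more efficient.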

This time, Mathur has designed another method that can detect data races in real-world software, without the performance degradation that existing techniques engender.

His new data race detection technique, called Race Property Tester, or RPT for short, uses sampling to reduce the analysis cost of detecting data race bugs in multi-threaded software. It works by examining only a small proportion of the events generated during a software application’s execution, yet accurately determines whether the underlying application contains data races.

“The insight underpinning RPT is to treat the data race detection problem as if it was a big data problem. With this view, if you’re looking at a stream of events generated during the execution of a multi-threaded software, instead of studying the entire stream, you only look at parts of it, which would naturally reduce your total analysis time,” explains Mathur. “The best part is: the number of events that RPT needs to sample does not grow even when the execution’s size keeps on growing!”
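In that spirit, here is a toy sketch of the sampling idea — an illustration only, with a hypothetical race-checking oracle standing in for RPT’s actual analysis, which is specified in the paper. A fixed budget of random events is drawn from the trace, and only those are checked, so the work stays constant as the trace grows.

```java
import java.util.List;
import java.util.Random;

// Toy sketch of sampling-based race detection: instead of analysing all
// n events in a trace, draw a fixed number of random samples and run a
// race check only on those. The budget does not grow with the trace.
class SamplingSketch {
    static final int SAMPLE_BUDGET = 64; // constant, independent of trace size

    // Hypothetical oracle: does the event at this index race with some
    // other event in the trace? (A stand-in for RPT's actual check.)
    interface RaceCheck {
        boolean racesAt(int eventIndex);
    }

    static boolean probablyHasRace(List<?> trace, RaceCheck check, Random rng) {
        for (int s = 0; s < SAMPLE_BUDGET; s++) {
            int i = rng.nextInt(trace.size()); // sample a random event
            if (check.racesAt(i)) {
                return true; // evidence of a race found in the sample
            }
        }
        return false; // no sampled event raced: report "probably race-free"
    }
}
```

With the sample budget fixed, the analysis does the same amount of work whether the trace holds a thousand events or a billion — the property Mathur highlights above.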

To assess RPT’s performance, Mathur and his UIUC collaborators evaluated it on more than 140 benchmark software applications, thoroughly studying factors such as run time, the likelihood with which it discovers a race, and whether a minimum number of bugs must be present for that likelihood to be high. “Typically, researchers don’t perform an evaluation on such a large scale, but we opted for this scale because we wanted to be really sure that whatever we’re doing makes sense and is indeed useful for practitioners,” says Mathur.

The empirical evaluation of RPT, he says, was in line with the theoretical assessment. “We found that even when software applications generate more than a billion events, RPT can detect data races with very high probability, by sampling only a very small number of events.” When compared with two state-of-the-art data race detection techniques, FastTrack and Pacer, RPT demonstrated the fastest run time. Additionally, it never reported any false positives.

The researchers presented their findings at the Symposium on Principles of Programming Languages (POPL), a premier computer science conference, held in Boston, Massachusetts this January. Their work received a Distinguished Paper Award — an honour bestowed on the top 10% of conference papers.

Mathur and his team are now working to see if they can integrate RPT into fuzz testing (an emerging first-line approach used by software firms to detect bugs) and whether they can use a similar sampling-style approach for detecting concurrency bugs other than data races. “In general, a key theme in my research is to make software more reliable and robust,” says Mathur. “I think about how mathematical and algorithmic reasoning can help eliminate critical errors that can otherwise cause undesired behaviour in software applications.”

Paper: Dynamic Race Detection with O(1) Samples
