Detecting logic bugs in a way that’s quicker and more effective

5 January 2024

Sometime between 2019 and 2020, a curious phenomenon began surfacing on Signal, FaceTime, and four other mobile messaging applications: someone could ring a person up and listen in to the audio of their surroundings — all without the receiving party picking up the call.

It was “an attack scenario I had never seen considered on any platform,” wrote security researcher Natalie Silvanovich on Twitter (now X) in January 2021, when she first discovered the software flaws. Although the app developers moved quickly to fix the bugs, the incident represented a “serious” vulnerability that was “unusual and possibly unprecedented,” said Silvanovich, who works at Google Project Zero, a group that studies “the most insidious flaws” in hardware and software systems around the world.

Investigations would later trace the security lapse to logic bugs. Logic (or logical) bugs are flaws in computer code, such as assigning the wrong value to a variable, that disrupt the intended workflow of software. Unlike bugs that cause a system to crash or a particular process to terminate, logic bugs silently compute and execute an incorrect result, making their effects difficult to notice. Malicious actors can exploit these flaws to manipulate software programs into behaving in unintended, unexpected, or undesired ways.
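A toy sketch (the function is hypothetical, invented for illustration) shows how such a bug behaves: the program runs without error, but one comparison checks the wrong variable, so certain inputs silently yield a wrong answer.

```python
def max_of_three(a, b, c):
    """Return the largest of three numbers -- but with a logic bug."""
    biggest = a
    if b > biggest:
        biggest = b
    if c > b:          # BUG: compares against b instead of biggest
        biggest = c
    return biggest

print(max_of_three(1, 2, 3))  # correct for this input
print(max_of_three(5, 1, 2))  # silently wrong: returns 2, not 5
```

Nothing crashes and many inputs produce the right result, which is exactly why such flaws can linger undetected.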

Apart from the phone hacking Silvanovich identified, another well-publicised incident occurred in 2018 when hackers exploited Facebook’s “View As” function, gaining access to the personal information of 29 million users.

Computing errors aside, logic bugs “can have very severe consequences in the real world,” says Jinsheng Ba, a PhD student who studies the topic under the guidance of Manuel Rigger, an assistant professor at NUS Computing who heads the Trustworthy Engineering of Software Technologies (TEST) lab.

Elusive bugs

Data Management Systems act as an interface between an end-user and a database, helping to extract, store, organise, manage, and utilise data. Almost all smartphones, web browsers, television sets, and set-top cable boxes, as well as countless other applications like Skype, iTunes, and Dropbox, run on Data Management Systems such as SQLite.

Finding logic bugs is tough, especially in database systems, which have many moving parts and maintain persistent state. The metrics commonly used to uncover issues, such as code coverage, do not catch logic bugs well. Code coverage measures how much of the code has been executed during testing, but it says little about whether the executed code computed correct results. And because tracking the changing state of a database is more complex than testing stateless software, subtle logic bugs can easily go unnoticed.
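To see why coverage can mislead, consider a hypothetical sketch: a single test executes every line of a buggy function, achieving 100% line coverage, and still passes.

```python
def apply_discount(price, is_member):
    """Members get 10% off; BUG: the discount is added, not subtracted."""
    discount = 0.10 * price if is_member else 0.0
    return price + discount   # should be: price - discount

# This one test executes every line of the function, yet it passes --
# full line coverage, and the logic bug survives undetected.
assert apply_discount(100.0, False) == 100.0
```

Only a test that checks the member path's *result* (a member paying 110.0 instead of 90.0) would expose the bug, which is why executing code is not the same as validating it.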

Catching logic bugs in a timely manner, therefore, is critical — not only in light of their potentially devastating impacts, but also because they have the ability to affect the fundamental software systems that underpin nearly every computing device.

To prise out logic bugs requires two things. “First, we need some kind of input — a test case — that we can pass to the system,” explains Rigger. “Then we need to validate that the system computes the correct result from this input.”

But for test cases to work well, they have to be sufficiently “interesting”. In other words, they have to stress various aspects of the Data Management System in order to increase the chances of picking up bugs. Herein lies a Catch-22: because testers do not know in advance which logic bugs affect a system, there is no clear definition or metric for what constitutes an “interesting” test case.
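One way to validate a result without knowing the correct answer in advance is a metamorphic test oracle. The sketch below is loosely modelled on Ternary Logic Partitioning (TLP) from Rigger's earlier work: any predicate splits a table's rows into three disjoint partitions (true, false, and NULL), so the three partitioned queries must together return exactly the rows of the unpartitioned one. The table and helper names here are invented for the example.

```python
import sqlite3

def tlp_check(conn, table, predicate):
    """Compare a full scan against its three-way partition by `predicate`.

    Every row satisfies exactly one of: p, NOT p, or p IS NULL,
    so the partitions must add up to the whole table. A mismatch
    signals a logic bug in the database engine's evaluation.
    """
    cur = conn.cursor()
    whole = sorted(cur.execute(f"SELECT * FROM {table}").fetchall(), key=repr)
    parts = []
    for clause in (predicate, f"NOT ({predicate})", f"({predicate}) IS NULL"):
        parts += cur.execute(f"SELECT * FROM {table} WHERE {clause}").fetchall()
    return whole == sorted(parts, key=repr)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])
print(tlp_check(conn, "t", "x > 1"))  # True unless the engine is buggy
```

The check needs no ground-truth answer for the query itself, only the logical relationship between the four queries, which is what makes it usable for automated testing.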

Intrigued by this problem, Rigger and Ba sought to find a solution. “Existing techniques generate test cases somewhat haphazardly,” says Rigger. So he and Ba aimed to devise a method that would help guide automatic testing towards interesting test cases and thus, bugs that would “make our databases more reliable,” says Ba. “That’s what motivated our research.”

Towards complexity

When Rigger and Ba began their work early last year, they decided to focus their efforts on query plans, the sequence of steps that a database management system uses to execute a command (an instruction given to a system to execute a specific action) or query (a request to get information from the system).

To understand how query plans work, Rigger says to imagine asking a chef for a quiche to satisfy your craving for a savoury tart. The chef has to prepare the dish via a number of steps: kneading the buttery pastry, preparing the cheese and egg filling, before baking it to perfection. “This sequence of steps corresponds to the query plan,” he says.
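In SQLite, the “recipe” can be inspected directly with the EXPLAIN QUERY PLAN statement. A minimal sketch (the table and index names are invented): adding an index changes the plan from a full scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, note TEXT)")

# Without an index on `total`, the only available recipe is a full scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE total > 100"
).fetchall()
print(plan_before[0][-1])  # a SCAN step over the whole table

# Adding an index changes the recipe to an index search.
conn.execute("CREATE INDEX idx_total ON orders (total)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE total > 100"
).fetchall()
print(plan_after[0][-1])  # a SEARCH step using idx_total
```

The exact wording of the plan text varies between SQLite versions, but the shift from scanning to searching is the kind of plan-level difference the researchers care about.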

When he and Ba examined the query plans of previously found logic bugs, they realised that they were all very simple in nature. “We speculated that by covering more complex query plans, we could increase the likelihood of discovering logic bugs,” Ba says.

To test this hypothesis, the pair devised a special algorithm called the Query Plan Guidance (QPG) that gradually applies promising mutations to the database state, thus guiding subsequent queries towards more unique and complex query plans.
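In highly simplified form, such a feedback loop might look like the sketch below. This is a hypothetical illustration, not the authors' implementation, and the mutation pool is invented: the loop mutates the database state (here, by inserting rows and adding indexes) and records every previously unseen plan that a fixed query is steered toward.

```python
import random
import sqlite3

def plan_of(conn, query):
    """Canonical string form of SQLite's query plan for `query`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(row[-1] for row in rows)

def qpg_loop(conn, query, mutations, seed=0):
    """Toy sketch of Query Plan Guidance: mutate the database state and
    record each previously unseen plan the query is steered toward.
    (The real algorithm scores mutations and keeps only promising ones;
    this sketch simply applies them in random order.)"""
    rng = random.Random(seed)
    order = mutations[:]
    rng.shuffle(order)
    seen = {plan_of(conn, query)}
    for stmt in order:
        try:
            conn.execute(stmt)
        except sqlite3.OperationalError:
            continue  # mutation not applicable to the current state
        seen.add(plan_of(conn, query))
    return seen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
mutations = [
    "CREATE INDEX idx_a ON t (a)",
    "CREATE INDEX idx_b ON t (b)",
    "INSERT INTO t VALUES (1, 2)",
]
plans = qpg_loop(conn, "SELECT * FROM t WHERE a = 1 AND b = 2", mutations)
print(len(plans))  # grows as mutations enable new plans
```

Each database state reached this way can then be handed to an oracle for checking, so the plan diversity directly feeds the bug hunt.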

More bugs, less time

When the researchers applied QPG to three widely used database systems, including SQLite, they detected 53 unique and previously undiscovered logic bugs (35 of which have since been fixed by developers).

“Three bugs in SQLite had been hidden for more than six years before we found them,” says Rigger.

Impressively, the new approach proved much more efficient than existing state-of-the-art bug-detection methods, covering nearly 410 times more query plans within a span of 24 hours.

“One key advantage is our method can work with any existing test oracles,” says Rigger, referring to the mechanism that determines whether a program has passed or failed a test.

“The biggest advantage of our method over previous ones is that it can generate diverse and interesting test cases to automatically find logic bugs without needing to acquire the source code of the target program,” adds Ba. “So it can be applied to proprietary databases too.”

Their findings, published in May 2023 and presented the same month at the esteemed Institute of Electrical and Electronics Engineers (IEEE) / Association for Computing Machinery (ACM) 45th International Conference on Software Engineering, have made waves both within academia and industry. For a start, their submission received a Distinguished Paper Award — a recognition bestowed upon the top 5% of papers at the conference. A handful of universities and software companies have also invited Ba to speak about the pair’s work.

Additionally, the researchers have released their work as open source, and since May, a joint research group spanning academia and industry has independently reproduced their findings.

Rigger and Ba are now working on detecting other types of bugs that afflict software systems, including performance bugs, and recently published a paper on the topic. At the end of the day, Rigger hopes his lab will continue to propose new and interesting bug-detection approaches, and to create impactful open-source tools that will help improve the software we all rely on. “In this way, we hope to make these systems more reliable and increase people’s trust in them,” he says.

Paper: Finding Performance Issues in Database Engines via Cardinality Estimation Testing
