Fusing the Flows: How a New Testing Method Unearthed Decades-Old Flaws in PHP’s Core

20 August 2025

The Invisible Engine of the Web

Every time you post on a blog, browse an online store, or check a university portal, there’s a good chance you’re interacting with PHP. This server-side scripting language quietly powers more than 70% of websites whose server-side language is known, from personal pages to major platforms.

Its ubiquity is a double-edged sword. On one hand, PHP’s maturity and widespread use make it a trusted workhorse. On the other, any deep-seated flaw in the PHP interpreter – the program that executes PHP code – has enormous reach. Vulnerabilities at this level can threaten the confidentiality, integrity, and availability of millions of sites simultaneously.

Most security attention focuses higher up the stack, at the application layer: plugging SQL injection holes, patching cross-site scripting, and fixing logic errors. But the interpreter is a sprawling, million-line C codebase, susceptible to low-level memory errors such as buffer overflows, use-after-free bugs, and null pointer dereferences, all of which can enable severe exploits.

Finding those flaws is not trivial. They’re often buried in rarely traversed paths of code, triggered only by complex interactions that no one thought to test.

 

Why the Best Test Suites Still Miss the Worst Bugs

The PHP community isn’t careless about testing. Its official test suite, known as the “golden test bed,” contains more than 19,000 cases covering core features and modules. These tests are valid, thorough, and widely respected.

Yet they share a common limitation: they tend to be simple, linear, and isolated. Each test probes a specific feature with straightforward inputs. Running two tests back-to-back doesn’t create the kind of intricate state interactions that can trip up deep memory management.

It’s in these untested interaction spaces, where one feature’s internal state bleeds unexpectedly into another, that the most stubborn bugs hide.

 

Enter FlowFusion

A team of security researchers at NUS Computing tackled this challenge with FlowFusion, the first automated fuzzing framework designed specifically to root out memory errors in the PHP interpreter. The team was led by CS PhD student Jiang Yuancheng, working with fellow PhD students Zhang Chuqi, Ruan Bonan, and Liu Jiahao, and advised by NUS Computing faculty members Assistant Professor Manuel Rigger, Associate Professor Roland Yap, and Associate Professor Liang Zhenkai. Their paper, “Fuzzing the PHP Interpreter via Dataflow Fusion,” recently won the Distinguished Paper Award at the 34th USENIX Security Symposium in Seattle, Washington, USA.

FlowFusion’s signature move is dataflow fusion, a method for creating rich, interaction-heavy tests by intelligently merging existing ones. Instead of simply concatenating scripts, FlowFusion analyzes the way variables and data structures move through one test and interleaves them with those of another. By weaving together two unrelated flows, it generates entirely new execution paths that were never explicitly tested before – paths that can reveal latent defects.
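To make the idea concrete, here is a minimal Python sketch of fusing two tests. It is an illustration under stated assumptions, not FlowFusion’s actual implementation: the two seed tests, the helper names `defined_vars` and `fuse`, and the `$fusion` variable name are all hypothetical, and a regular expression stands in for real dataflow analysis. It also assumes the two tests use disjoint variable names.

```python
import re

# Hypothetical seed tests, each written as a list of PHP statements.
TEST_A = [
    '$doc = new DOMDocument();',
    '$node = $doc->createElement("item");',
]
TEST_B = [
    '$text = "hello";',
    '$encoded = base64_encode($text);',
]

# Matches a statement of the form "$name = ..." (but not "$name == ...").
ASSIGN = re.compile(r'\s*(\$[A-Za-z_]\w*)\s*=(?!=)')


def defined_vars(statements):
    """Return the variables assigned by the statements, in first-seen order."""
    found = []
    for stmt in statements:
        m = ASSIGN.match(stmt)
        if m and m.group(1) not in found:
            found.append(m.group(1))
    return found


def fuse(test_a, test_b):
    """Join two tests so that a value produced by A flows into B.

    Crude stand-in for dataflow analysis: the last variable defined in A
    is the source, the first variable defined in B is the sink.
    """
    source = defined_vars(test_a)[-1]
    sink = defined_vars(test_b)[0]

    fused = list(test_a)
    fused.append(f'$fusion = {source};')      # capture A's data
    linked = False
    for stmt in test_b:
        fused.append(stmt)
        m = ASSIGN.match(stmt)
        if not linked and m and m.group(1) == sink:
            fused.append(f'{sink} = $fusion;')  # redirect it into B's flow
            linked = True
    return fused


if __name__ == "__main__":
    print("<?php")
    print("\n".join(fuse(TEST_A, TEST_B)))
```

Plain concatenation would leave `$node` and `$text` unrelated. The two extra assignments are what make a value produced by the first test flow into code taken from the second.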

But here’s where FlowFusion moves from clever to transformative:

  • It is now incorporated directly into PHP’s official repository, making it a permanent part of the language’s quality-assurance pipeline.
  • It may be the top bug reporter for PHP today, rivaling or surpassing human submissions and other automated tools.
  • It continues to find new bugs every week on average, proving it isn’t just a one-off sweep but an ongoing engine of discovery.

Together, these points mean that FlowFusion isn’t just a research artifact; it has become one of the most important active contributors to the security of the web itself.

 

How Fusing Flows Finds Fossil Bugs

One discovery illustrates the power of the approach.

  • Test A checked DOM object behavior; variables flowed through reference and node manipulation.
  • Test B checked base64 encoding; values moved through string handling and encoding functions.

Individually, they’re worlds apart. But FlowFusion linked them, feeding a DOM object into base64 encoding routines via a newly created fusion variable. This forced the interpreter into an unusual sequence of operations, including a foreach loop in a context it had never been tested for.

The result? A heap use-after-free vulnerability that had been lurking in PHP’s core since version 5.0.0 in 2004, a flaw old enough to vote.

No conventional unit test or random fuzzing run had ever triggered this chain of events.

 

More Than Just Merging

FlowFusion layers multiple strategies to maximize bug discovery:

  1. Test Mutation tweaks seed tests before fusion by replacing expressions or constants and injecting special values, while keeping the syntax valid.
  2. Interface Fuzzing feeds complex, fused variables into PHP’s internal C-level functions to stress-test hidden code paths.
  3. Environment Crossover runs tests under varied PHP configurations (modules loaded, memory limits, JIT compilation, opcache settings) to expose environment-specific faults.

This multi-pronged approach targets the three big variables in bug manifestation: input complexity, execution context, and environmental conditions.
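Two of these strategies, test mutation and environment crossover, can be sketched in a few lines of Python. This is a rough illustration, not FlowFusion’s own code: the function names, the list of special values, and the particular INI values are assumptions chosen for the example. The INI directives themselves (`opcache.enable_cli`, `opcache.jit`, `opcache.jit_buffer_size`, `memory_limit`) and the `php -d` flag are standard PHP, but the sketch only builds command lines and does not run them.

```python
import itertools
import random
import re

# Assumed boundary values; "(-1)" is parenthesized so that an expression
# such as "$a-5" does not turn into the invalid "$a--1".
SPECIAL_INTS = ["0", "(-1)", "PHP_INT_MAX", "PHP_INT_MIN"]

# Standalone decimal integer literals: not part of an identifier, a
# variable name, a float, or a hex/exponent literal.
INT_LITERAL = re.compile(r'(?<![\w$.])\d+(?![\w.])')


def mutate_constants(php_source, rng):
    """Swap each integer literal for a randomly chosen special value.

    Heuristic only: there is no real parser here, so digits inside
    string literals would be replaced as well.
    """
    return INT_LITERAL.sub(lambda m: rng.choice(SPECIAL_INTS), php_source)


# Assumed configuration axes for the crossover.
INI_CHOICES = {
    "opcache.enable_cli": ["0", "1"],
    "opcache.jit": ["disable", "tracing", "function"],
    "opcache.jit_buffer_size": ["64M"],
    "memory_limit": ["128M", "-1"],
}


def config_matrix(script_path):
    """Yield one `php -d key=value ... script` command per INI combination."""
    keys = list(INI_CHOICES)
    for values in itertools.product(*(INI_CHOICES[k] for k in keys)):
        cmd = ["php"]
        for key, value in zip(keys, values):
            cmd += ["-d", f"{key}={value}"]
        cmd.append(script_path)
        yield cmd


if __name__ == "__main__":
    rng = random.Random(0)
    print(mutate_constants('$x = str_repeat("a", 3);', rng))
    for cmd in config_matrix("fused.php"):
        print(" ".join(cmd))
```

Each command could then be handed to a process runner and watched for crashes or sanitizer reports. The same mutated test may behave differently with and without the JIT, which is the point of crossing environments.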

 

Results: A Sweeping Code Cleanup

Deployed against PHP’s latest interpreter, FlowFusion uncovered:

  • 158 unique, previously unknown bugs
  • 125 fixed and 11 confirmed in the official PHP repository
  • 39 severe enough to crash the interpreter without specialized debugging tools

The defects spanned 10 distinct Common Weakness Enumeration (CWE) categories and touched over 80 source files, prompting changes to more than 5,000 lines of core code.

Examples include:

  • A heap overflow in the SQLite module due to improper buffer size checks.
  • A null pointer dereference in the Zend engine, caused by stale data structures generating invalid instructions.
  • Segmentation faults in the JIT compiler triggered by opcache mismanagement.

FlowFusion didn’t just find more bugs; it covered more ground. Compared under identical conditions, it achieved 24% higher code coverage than state-of-the-art general-purpose fuzzers like AFL++ and Polyglot after 24 hours.

 

Adoption by PHP’s Core Team

Perhaps the clearest vote of confidence came from PHP’s maintainers themselves, who integrated FlowFusion into their official toolchain. That means its methods are now part of the ongoing quality assurance process for one of the world’s most widely used software interpreters.

 

Why This Matters Beyond PHP

FlowFusion’s principles aren’t tied to PHP alone. Many large, mature software systems, especially those written in C or C++, share the same risk profile: complex, performance-critical codebases maintained over decades, with extensive but siloed test suites.

Think:

  • Database engines (MySQL, PostgreSQL, MongoDB)
  • Language interpreters and VMs (Python, Ruby, Node.js, JVM)
  • Operating system kernels
  • Network protocol stacks

In all these cases, intelligent fusion of existing tests could surface bugs that standard fuzzers miss—particularly those triggered by feature interactions no one anticipated.

 

From Patchwork Testing to Interaction-Aware Testing

FlowFusion’s contribution is as much about mindset as it is about method. Traditional testing often assumes features can be validated in isolation. But real-world software rarely runs in isolation; it’s the interplay of components, under diverse configurations, that exposes fragility.

By mining existing tests for their embedded knowledge and recombining them in semantically meaningful ways, FlowFusion automates what skilled human testers might attempt—at a scale and speed no team could match manually.

 

Future Uses and Impacts

  1. Proactive Vulnerability Discovery
     Embedding data flow fusion into CI/CD pipelines for critical infrastructure software could catch severe memory issues before they ship, protecting end-users from zero-day exploits.
  2. Security Audits of Legacy Systems
     Government and enterprise systems often run on outdated but critical platforms. Applying FlowFusion-like techniques could surface vulnerabilities that have lain dormant for decades, allowing safe remediation without disruptive rewrites.
  3. Hardening AI/ML Frameworks
     Popular machine learning libraries (TensorFlow, PyTorch) have complex C/C++ backends. Their massive test suites could be fused to uncover edge-case crashes or data corruption bugs that impact reproducibility and safety.
  4. Interoperability Testing
     In ecosystems with multiple interlinked components (e.g., browser engines combining HTML parsers, JavaScript engines, and rendering pipelines), fusing tests from different modules could reveal interaction bugs that manifest only in full-stack scenarios.

 

A Safer Internet Starts in the Engine Room

The lesson from FlowFusion’s success is clear: even the most trusted, thoroughly tested software can harbor deep, long-lived vulnerabilities. Finding them requires moving beyond conventional test case thinking and embracing methods that probe the unpredictable ways in which features interact.

For PHP, this has meant purging dangerous flaws, some as old as the interpreter’s early releases, and fortifying the security of millions of websites in the process. For the rest of the software world, it’s a proof of concept: if you can intelligently fuse the flows, you can force even the most mature systems to reveal their hidden weaknesses—before an attacker does.

 

Further Reading: Jiang, Y., Zhang, C., Ruan, B., Liu, J., Rigger, M., Yap, R.H.C. and Liang, Z. (2025) “Fuzzing the PHP Interpreter via Dataflow Fusion,” 34th USENIX Security Symposium, Seattle, WA, August 13-15, https://www.usenix.org/conference/usenixsecurity25/presentation/jiang-yuancheng

 
