Targeted dictionary strategy boosts fuzzing coverage and vulnerability discovery

Jiahong XIANG | 05/12/2025

As software systems grow more complex, traditional fuzzing methods struggle to discover vulnerabilities. Existing grey-box fuzzing techniques are often inefficient at reaching program states guarded by complex conditions, such as comparisons against constants (constant constraints).

Assistant Professor Yuqun Zhang’s research group from the Department of Computer Science and Engineering at the Southern University of Science and Technology (SUSTech) has made significant progress in this area. In a recent study, the group became the first to systematically characterize the performance limits of assisting exploration strategies in grey-box fuzzing. Building on these findings, they propose CDFuzz, a lightweight technique that generates customized targeted dictionaries and improves both code coverage and vulnerability discovery.

In recognition of this contribution, their paper, “Tumbling Down the Rabbit Hole: How do Assisting Exploration Strategies Facilitate Grey-box Fuzzing?”, received an ACM SIGSOFT Distinguished Paper Award at the 2025 International Conference on Software Engineering (ICSE), a top-tier international software engineering conference rated CCF-A.

CDFuzz offers three key advantages over existing strategies. First, it needs no extra instrumentation: it automatically extracts key constants from the program’s Control Flow Graph (CFG), removing the reliance of traditional strategies on additional instrumentation and on costly symbolic execution or gradient solving. Second, it dynamically generates a customized targeted dictionary for each input seed that precisely covers the constant conditions in that seed’s execution path constraints, contributing to an average coverage gain of 16.1% over the best existing strategy. Finally, the approach is fully lightweight: its targeted dictionaries integrate seamlessly into the fuzzing process and incur no additional overhead during compilation or execution.
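A dictionary, in the AFL sense, is simply a list of byte tokens that the fuzzer splices into inputs during mutation, which helps explain why dictionary-based assistance adds so little overhead. The sketch below renders a list of extracted constants in the plain-text dictionary format that AFL-family fuzzers accept via `-x`: one `name="value"` entry per line, with `\xNN` escapes for non-printable bytes. The helper names `escape_token` and `to_afl_dictionary` and the `cdf_` entry prefix are hypothetical; CDFuzz builds its per-seed dictionaries inside the fuzzer, and this is not its code.

```python
# Hypothetical helpers: serialize byte tokens as an AFL-style dictionary file.
# This illustrates the dictionary text format only; it is not CDFuzz's implementation.


def escape_token(token: bytes) -> str:
    r"""Escape one token for a dictionary line: printable ASCII is kept,
    backslash and double quote get a leading backslash, other bytes become \xNN."""
    parts = []
    for byte in token:
        char = chr(byte)
        if char in ("\\", '"'):
            parts.append("\\" + char)
        elif 0x20 <= byte < 0x7F:
            parts.append(char)
        else:
            parts.append(f"\\x{byte:02x}")
    return "".join(parts)


def to_afl_dictionary(tokens: list[bytes], prefix: str = "cdf") -> str:
    """Render tokens as name="value" lines; the naming scheme is an arbitrary choice."""
    lines = [f'{prefix}_{i}="{escape_token(tok)}"' for i, tok in enumerate(tokens)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Constants mentioned in the article: "8BIM" and 0xdeadbeef (big-endian bytes here).
    print(to_afl_dictionary([b"8BIM", b"\xde\xad\xbe\xef"]), end="")
    # Expected output:
    # cdf_0="8BIM"
    # cdf_1="\xde\xad\xbe\xef"
```

Note that a 32-bit constant such as 0xdeadbeef appears in memory in the target’s byte order, so on a little-endian machine the matching token would be `b"\xef\xbe\xad\xde"`; a real extractor has to account for this.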

Performance overlap of complex strategies and dictionary mechanisms

To better understand how exploration strategies support grey-box fuzzing, the team conducted large-scale experiments involving nine fuzzing techniques and multiple mainstream assisting strategies—including symbolic execution, gradient solving, and input-state tracking—across 21 real-world projects. The study revealed several critical insights.

Over 90% of the constraints that were successfully solved involved comparisons against constants, particularly input == CONSTANT constraints, indicating that most assisting strategies are, in practice, effective only at addressing constant constraints. Surprisingly, under identical experimental settings, traditional dictionary mechanisms such as AFLDict achieved higher coverage than symbolic execution strategies such as QSYM. Moreover, when constraint depth exceeded 20, the success rate of symbolic execution dropped sharply to 15%, whereas dictionary strategies were unaffected by this limitation.
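A toy model helps show why constant constraints favor dictionaries. A branch guarded by `input == "8BIM"` is reached only if four specific bytes appear at the right offset. If a mutator overwrites a random 4-byte window with random bytes, each attempt succeeds with probability (1 / number of windows) × 2^-32; if it overwrites the window with a dictionary token instead, the 2^-32 factor becomes (1 / number of tokens). The mutators below are hypothetical simplifications (real fuzzers such as AFL stack many mutations per input and also try dictionary tokens deterministically at every offset), so only the orders of magnitude carry over.

```python
import random
from fractions import Fraction
from typing import Callable, List, Optional

MAGIC = b"8BIM"   # constant from the article; the guarded branch needs an exact match
SEED = bytes(8)   # hypothetical 8-byte seed input (all zeros)


def reaches_guarded_branch(data: bytes) -> bool:
    """Toy stand-in for an `input == CONSTANT` check on the first four bytes."""
    return data[:4] == MAGIC


def random_window_mutation(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite a random 4-byte window with random bytes (no knowledge of constants)."""
    pos = rng.randrange(len(seed) - 3)
    noise = bytes(rng.randrange(256) for _ in range(4))
    return seed[:pos] + noise + seed[pos + 4:]


def dictionary_mutation(seed: bytes, rng: random.Random, tokens: List[bytes]) -> bytes:
    """Overwrite a random window with one whole dictionary token."""
    token = rng.choice(tokens)
    pos = rng.randrange(len(seed) - len(token) + 1)
    return seed[:pos] + token + seed[pos + len(token):]


def hit_probability_random(seed_len: int, width: int = 4) -> Fraction:
    # The window must start at offset 0 AND all `width` random bytes must match.
    return Fraction(1, seed_len - width + 1) * Fraction(1, 256 ** width)


def hit_probability_dictionary(seed_len: int, tokens: List[bytes]) -> Fraction:
    # The MAGIC token must be chosen AND its window must start at offset 0.
    picks = Fraction(sum(1 for t in tokens if t == MAGIC), len(tokens))
    return picks * Fraction(1, seed_len - len(MAGIC) + 1)


def attempts_until_hit(mutate: Callable[[bytes, random.Random], bytes],
                       budget: int = 10_000) -> Optional[int]:
    """Count mutation attempts until the guarded branch is reached (None if never)."""
    rng = random.Random(0)  # fixed RNG seed so the run is reproducible
    for attempt in range(1, budget + 1):
        if reaches_guarded_branch(mutate(SEED, rng)):
            return attempt
    return None


if __name__ == "__main__":
    tokens = [MAGIC, b"\xde\xad\xbe\xef"]
    print(hit_probability_random(len(SEED)))              # 1/21474836480
    print(hit_probability_dictionary(len(SEED), tokens))  # 1/10
    print(attempts_until_hit(lambda s, r: dictionary_mutation(s, r, tokens)))
    # Per-attempt odds are about 4.7e-11, so this almost surely prints None.
    print(attempts_until_hit(random_window_mutation))
```

The gap does not shrink with constraint depth in this model: each additional constant check multiplies the random mutator’s odds by roughly another 2^-32, while a dictionary that already holds the right token pays only the cost of picking it.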

This finding challenges the common assumption that more complex strategies are necessarily more efficient, and reveals the significant potential of lightweight optimizations.

CDFuzz’s targeted dictionary technique: Combining static analysis and dynamic optimization

CDFuzz generates targeted dictionaries in two stages: static analysis at compile time and dynamic optimization at runtime. In the static constant-extraction stage, it analyzes the program’s Control Flow Graph at the LLVM IR level and extracts the constant values embedded in branch conditions, such as 0xdeadbeef or "8BIM". In the dynamic path-feedback stage, it analyzes the execution path of each input seed and selects the subset of constants relevant to that path’s constraints. The result is a focused dictionary tailored to the specific conditions each seed encounters at runtime.
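The two stages can be illustrated with a toy model. Here a hand-written CFG stands in for what CDFuzz extracts from LLVM IR: each block that ends in an `input == CONSTANT` comparison records the constant and the successor taken when the check passes. The per-seed filter, which keeps only the constants of comparisons a seed reached but never passed, is an assumption about what "relevant to the current path constraints" means; CDFuzz’s actual selection rule may differ. The CFG, block names, and helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Toy CFG node. `cmp_const` is set when the block ends in `input == CONSTANT`;
    for such blocks, succs[0] is the successor taken when the comparison holds."""
    name: str
    succs: Tuple[str, ...] = ()
    cmp_const: Optional[bytes] = None


# Hypothetical CFG standing in for the result of analysing LLVM IR.
CFG: Dict[str, Block] = {
    "entry": Block("entry", ("check_magic",)),
    "check_magic": Block("check_magic", ("parse_body", "reject"), b"\xde\xad\xbe\xef"),
    "parse_body": Block("parse_body", ("check_8bim",)),
    "check_8bim": Block("check_8bim", ("parse_resource", "reject"), b"8BIM"),
    "parse_resource": Block("parse_resource", ("exit",)),
    "reject": Block("reject", ("exit",)),
    "exit": Block("exit"),
}


def extract_constant_branches(cfg: Dict[str, Block]) -> Dict[str, Tuple[bytes, str]]:
    """Static stage: map each constant-comparison block to (constant, pass successor)."""
    return {b.name: (b.cmp_const, b.succs[0])
            for b in cfg.values() if b.cmp_const is not None}


def targeted_dictionary(trace: List[str],
                        branches: Dict[str, Tuple[bytes, str]]) -> List[bytes]:
    """Dynamic stage: from one seed's execution trace, keep the constants of
    comparisons the seed reached but never passed (pass successor not visited)."""
    visited = set(trace)
    tokens: List[bytes] = []
    for block in trace:  # iterate in execution order so tokens follow the path
        if block in branches:
            const, pass_succ = branches[block]
            if pass_succ not in visited and const not in tokens:
                tokens.append(const)
    return tokens


if __name__ == "__main__":
    branches = extract_constant_branches(CFG)
    # Seed A fails the first check; seed B passes it but fails the "8BIM" check.
    seed_a = ["entry", "check_magic", "reject", "exit"]
    seed_b = ["entry", "check_magic", "parse_body", "check_8bim", "reject", "exit"]
    print(targeted_dictionary(seed_a, branches))  # [b'\xde\xad\xbe\xef']
    print(targeted_dictionary(seed_b, branches))  # [b'8BIM']
```

In this sketch each seed receives only the few constants still blocking its own path, so mutations are spent on tokens likely to matter for that seed rather than on a single large, program-wide dictionary.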

Figure 1. The workflow of CDFuzz

Figure 2. Strategically building a dictionary for exploring jhead iptc.c

Enhanced coverage capability and vulnerability mining

CDFuzz shows clear advantages in both code coverage and vulnerability detection. In 24-hour benchmark runs, it achieved an average coverage increase of 16.1% over the best-performing existing strategy, AFL++Dict. On individual programs such as strip, the coverage gain reached 26.2%.

Beyond improving coverage, CDFuzz uncovered 37 previously unknown vulnerabilities in real-world software, including high-risk bugs such as heap overflows and uninitialized memory accesses. Nine of these have been officially confirmed, and seven have been fixed. The team also validated CDFuzz’s generality across ten input file formats, including ELF, JPEG, and SQL.

By replacing heavyweight constraint solvers with lightweight targeted dictionaries, CDFuzz demonstrates the potential of lightweight assisting strategies in fuzzing. The work suggests a direction for optimizing assisting strategies and pushes industrial-grade testing tools toward higher efficiency and lower overhead. Going forward, Assistant Professor Zhang’s group will continue to explore lightweight fuzzing optimizations that are easy to adopt in industrial settings. The CDFuzz paper (arXiv) and source code (GitHub) are publicly available at the links below.

Ph.D. graduate Mingyuan Wu and master’s student Jiahong Xiang are the co-first authors of the paper. Assistant Professor Yuqun Zhang is the corresponding author, and SUSTech is the first corresponding institution. This work was completed in collaboration with Concordia University, The University of Hong Kong, and Ant Group.

ICSE is recognized as the flagship international conference in software engineering. It is a Category A conference recommended by the China Computer Federation (CCF) and is highly regarded in academia. ICSE 2025, the 47th edition of the conference, coincided with its 50th anniversary and was held in Ottawa, Canada, from April 27 to May 3. This year, ICSE received 1,150 submissions and accepted 245 papers, an acceptance rate of 21.3%. Only 23 papers, about 2% of all submissions, received the ACM SIGSOFT Distinguished Paper Award.

Paper link: https://arxiv.org/pdf/2409.14541

Source code: https://github.com/GhabiX/CDFuzz
