DOI: 10.1145/3808096 ISSN: 2994-970X

Understanding the Limitations of C/C++ Binary Third-Party Library Detection Tool: An Empirical Study at Scale

Chengyue Liu, Zhengzi Xu, Kaixuan Li, Jiahui Wu, Sihao Qiu, Siyuan Li, Siyang Xiong, Yang Xiao, Yang Liu

Detecting third-party libraries (TPLs) in C/C++ binaries is essential for ensuring software security and compliance, particularly in safety- and performance-critical domains. While numerous academic and commercial Software Composition Analysis (SCA) tools have been proposed, their true capabilities remain unclear due to the absence of large-scale benchmarks and systematic evaluation. Equally lacking is a deeper understanding of why these tools often underperform, which limits both research progress and practical adoption.

We address this gap with a large-scale study of binary SCA tools. We construct the largest publicly available benchmark to date, encompassing 38,228 test cases across 1,873 libraries drawn from a defined scope of 13,675 libraries. Using this benchmark, we systematically evaluate 11 representative tools, covering all open-source research prototypes and widely adopted commercial solutions, across versions, architectures, and feature-database scales. Beyond aggregate performance metrics, we perform the first fine-grained, feature-level analysis to identify the intrinsic challenges of binary TPL detection. Our results show that existing tools perform unsatisfactorily, with average recall below 60% and precision around 75%. Feature-level analysis reveals two fundamental obstacles: binaries lose most source-code features during compilation, and libraries exhibit high feature overlap due to functional similarity and dependency propagation. These findings explain current shortcomings, and we build on them to provide design recommendations, research directions, and practical guidance for managing open-source risks in binary software.
