Large Language Models for Opaque Predicate Resolution: A Universal Control Flow Deobfuscation Framework
Xiao Chen, Qiuyun Wang, Shuwei Wang, Weize Zhang, Yuling Liu, Baoxu Liu, Zhengwei Jiang

Code obfuscation complicates reverse engineering; it is used both to protect intellectual property and to conceal malware. Existing deobfuscation approaches often lack generality or struggle with complex, mixed, or unknown transformations. To address this issue, this paper proposes LUCID, a Large Language Model (LLM) based Universal Control-flow Integrated Deobfuscation framework. We first formalize the control-flow deobfuscation task and introduce the Topologically Feasible Path Set (TFPS) as a new evaluation metric. Building on this foundation, LUCID uses an LLM to infer Predicate Mapping Rules between basic blocks in linear time. These rules then guide the precise expansion of the Runtime Feasible Path Set, which identifies and eliminates spurious control flows. Finally, semantically equivalent paths are merged to reconstruct a clean, compilable, and behaviorally faithful control-flow graph, from which analyst-friendly C-like pseudocode is generated. A comprehensive evaluation on 780 binaries covering 13 distinct obfuscation techniques shows that our method reduces average cyclomatic complexity by 52.4%, achieves full deobfuscation in 53.8% of cases, and suppresses the TFPS inflation caused by bogus control flow by over 99%. LUCID outperforms existing state-of-the-art tools in both generality and semantic consistency, highlighting the potential of LLMs for scalable malware reverse engineering.
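
To make the two reported measures concrete, the following minimal Python sketch shows how bogus branches guarded by opaque predicates inflate both the number of entry-to-exit paths and the cyclomatic complexity of a toy control-flow graph, and how removing the dead edges restores the original values. It is not LUCID's algorithm: the graph, the node names, the list of dead edges, and the helpers `count_paths`, `cyclomatic_complexity`, and `prune` are all hypothetical. The path count is only an assumed stand-in for the size of a topologically feasible path set on an acyclic graph; the abstract does not give the formal TFPS definition.

```python
from functools import lru_cache


def count_paths(cfg, entry, exit_node):
    """Count distinct entry-to-exit paths in an ACYCLIC control-flow graph.

    Assumption: used here as a simple proxy for path-set size; the paper's
    formal TFPS definition may differ.
    """
    @lru_cache(maxsize=None)
    def walk(node):
        if node == exit_node:
            return 1
        return sum(walk(succ) for succ in cfg.get(node, ()))

    return walk(entry)


def cyclomatic_complexity(cfg):
    """McCabe's metric E - N + 2 for a single connected CFG."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    edges = sum(len(succs) for succs in cfg.values())
    return edges - len(nodes) + 2


def prune(cfg, entry, dead_edges):
    """Drop edges known to be infeasible, then drop unreachable blocks."""
    kept = {
        node: [s for s in succs if (node, s) not in dead_edges]
        for node, succs in cfg.items()
    }
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(kept.get(node, ()))
    return {node: succs for node, succs in kept.items() if node in reachable}


# Hypothetical obfuscated CFG: the real program is entry -> a -> exit, and
# two opaque predicates add branches to bogus blocks that never execute.
obfuscated = {
    "entry": ["a", "bogus1"],
    "bogus1": ["a"],
    "a": ["exit", "bogus2"],
    "bogus2": ["exit"],
    "exit": [],
}

# Assumed output of predicate resolution: these branch edges are never taken.
dead_edges = {("entry", "bogus1"), ("a", "bogus2")}

cleaned = prune(obfuscated, "entry", dead_edges)

print(count_paths(obfuscated, "entry", "exit"), cyclomatic_complexity(obfuscated))
print(count_paths(cleaned, "entry", "exit"), cyclomatic_complexity(cleaned))
```

In this toy graph each opaque predicate doubles the number of topological paths, so path-based measures grow much faster under bogus control flow than cyclomatic complexity does. How the dead edges are identified is the hard part, and it is exactly what the sketch assumes away.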