Unfulfilled Promises: LLM-Based Detection of OS Compatibility Issues in Infrastructure as Code
Georgios-Petros Drosos, Georgios Alexopoulos, Thodoris Sotiropoulos, Dimitris Mitropoulos, Zhendong Su
Modern infrastructures rely on Infrastructure as Code (IaC) systems to keep complex deployments consistent, reproducible, and scalable in production. The reliability of these infrastructures, however, depends on the correctness of their building blocks: reusable components (modules), each of which performs a dedicated task, such as installing a package, managing an OS user, or configuring a service, and reconciles the managed resource's state with the desired specification. A central promise of these components is portability: a specification written once should correctly manage the targeted resource on every OS the IaC component supports. When this property is violated, defects can propagate across entire infrastructures, causing outages, security vulnerabilities, and costly misconfigurations.
In this work, we introduce crOSsible, the first automated framework for cross-OS testing of IaC modules. crOSsible leverages large language models (LLMs) to synthesize and repair integration tests from structured module documentation, and executes them across 13 versions of 8 major Linux distributions. While our techniques are generally applicable to different IaC systems, we instantiate and evaluate them on Ansible, the most widely used IaC framework for managing individual servers. Our evaluation on 259 popular Ansible modules demonstrates both effectiveness and real-world impact. In just 12 hours of testing, crOSsible uncovered 36 previously unknown bugs, including 22 portability violations. In total, 27 issues have been confirmed by maintainers, and 17 of them have already been fixed. The discovered issues range from crashes to dangerous soundness defects, in which modules reported success despite leaving systems misconfigured. Beyond bug discovery, crOSsible improved the code coverage of Ansible modules by 12.3% on average, systematically exercising OS-specific code paths that existing tests missed.
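To make the notion of a portability violation concrete, the following is a minimal sketch, not crOSsible's actual oracle. It assumes that each integration test has already been run on several OS images and that each run yields a simple outcome string; the function name, the record layout, the test names, and the OS labels are all hypothetical and chosen only for illustration. A test is flagged when its outcome differs across the OSes on which it ran.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (test name, OS label, outcome string).
Result = Tuple[str, str, str]


def find_portability_violations(
    results: List[Result],
) -> Dict[str, Dict[str, List[str]]]:
    """Return the tests whose outcome is not the same on every OS.

    The returned mapping has the shape {test: {outcome: [os labels]}}.
    """
    by_test: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for test, os_label, outcome in results:
        by_test[test][outcome].append(os_label)

    # A test is suspicious only if it produced more than one distinct outcome.
    return {
        test: {outcome: sorted(oses) for outcome, oses in outcomes.items()}
        for test, outcomes in by_test.items()
        if len(outcomes) > 1
    }


# Illustrative data only; these are not results from the paper.
results = [
    ("user_create", "debian-12", "pass"),
    ("user_create", "alpine-3.19", "fail"),
    ("pkg_install", "debian-12", "pass"),
    ("pkg_install", "alpine-3.19", "pass"),
]
violations = find_portability_violations(results)
```

Here only `user_create` is reported, since `pkg_install` behaves identically on both images. Comparing outcomes in this way would not catch a module that silently misconfigures every OS in the same manner, so a check of this kind complements, rather than replaces, assertions on the resulting system state.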