DOI: 10.1145/3797120 ISSN: 2994-970X

CertiCoder: Towards MISRA-Compliant C Code Generation with LLMs

Min Gou, Zhiyu Yao, Hualong Ma, Ende Zhang, Jian Zhou, Fei He

Large language models (LLMs) are increasingly applied to code generation in IDEs, CI pipelines, and automated workflows. Existing evaluations, however, have focused largely on functionality, with comparatively little attention to compliance with established safety standards. This gap is particularly critical for C, where programs may compile and pass unit tests yet still violate MISRA C:2012, a guideline widely adopted in safety-critical domains. We present CertiCoder, a post-training framework with rule-aware optimization that transforms tool-verified outcomes into per-rule contrasts and trains models in three stages: rule tuning, cold-start supervised fine-tuning (SFT), and rule-aware preference optimization. This design helps models not only distinguish compliant from violating outputs but also associate each violation with the specific rule it breaks. To support reproducible assessment, we construct a Codeforces-derived C benchmark with frozen splits, multi-level decontamination, and metrics that jointly measure MISRA compliance (S₁), functional correctness (F₁), and their conjunction (J₁). On Qwen2.5-Coder backbones (3B–14B), CertiCoder substantially improves compliance, raising J₁ from near zero to measurable levels, while generally preserving functional correctness, and it outperforms non-rule-aware baselines such as SFT and SafeCoder. To our knowledge, CertiCoder is among the first post-training frameworks to explicitly optimize both compliance and correctness, offering a practical step toward more auditable and extensible use of LLMs in safety-critical software systems.
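The abstract describes the metrics and the per-rule contrasts only informally. The following is a minimal illustrative sketch, not CertiCoder's actual implementation. It assumes one generated program per problem, scored by a MISRA checker (yielding the set of violated rule IDs) and by the problem's unit tests. Under that assumption, S₁, F₁, and J₁ are read as the fraction of problems whose single sample is compliant, correct, or both. The record type `SampleResult` and the helpers `single_sample_metrics` and `per_rule_contrasts` are hypothetical names. The rule IDs used below are only examples.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical record for one generated C program."""
    # MISRA C:2012 rule IDs reported by a checker, e.g. {"10.4", "17.7"}.
    violated_rules: frozenset = field(default_factory=frozenset)
    # Whether the program passed the problem's unit tests.
    tests_passed: bool = False

    @property
    def compliant(self) -> bool:
        # Compliant means the checker reported no violations at all.
        return not self.violated_rules


def single_sample_metrics(results):
    """Assumed reading of S1/F1/J1: per-problem rates over one sample each."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one result")
    s1 = sum(r.compliant for r in results) / n                      # compliance
    f1 = sum(r.tests_passed for r in results) / n                   # correctness
    j1 = sum(r.compliant and r.tests_passed for r in results) / n   # both
    return {"S1": s1, "F1": f1, "J1": j1}


def per_rule_contrasts(a, b):
    """Hypothetical per-rule contrast between two outputs for one prompt.

    For every rule violated by exactly one of the two outputs, record which
    output ("a" or "b") is preferred with respect to that rule.
    """
    contrasts = {}
    for rule in a.violated_rules ^ b.violated_rules:
        contrasts[rule] = "b" if rule in a.violated_rules else "a"
    return contrasts


if __name__ == "__main__":
    results = [
        SampleResult(frozenset(), True),           # compliant and correct
        SampleResult(frozenset({"10.4"}), True),   # correct, not compliant
        SampleResult(frozenset(), False),          # compliant, not correct
        SampleResult(frozenset({"15.5"}), False),  # neither
    ]
    print(single_sample_metrics(results))  # {'S1': 0.5, 'F1': 0.5, 'J1': 0.25}
```

Because J₁ requires both properties on the same sample, it can never exceed either S₁ or F₁. Computing the per-rule contrast from the symmetric difference of violation sets ignores rules that both outputs violate or both satisfy, which carry no preference signal for that rule.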
