Automating Dockerfile Refactoring to Multi-stage Builds
Dongjin Chen, Wenhua Yang, Minxue Pan, Yu Zhou

Containerization has become a cornerstone of modern software deployment, yet many projects still ship single-stage Dockerfiles that bundle compilers, build tools, and temporary files into production images, thereby hurting performance and security. Multi-stage builds are recommended, yet uptake appears uneven, plausibly because refactoring legacy Dockerfiles demands nontrivial reasoning about build lifecycles and dependency separation. This paper presents StageCraft, an automated refactoring approach that converts single-stage Dockerfiles into optimized multi-stage builds. StageCraft first performs static analysis to identify the technology stack and to infer build-time and runtime dependencies. It then applies a lightweight gate that estimates the refactoring benefit from a composite of image bloat, structural inefficiency, and security risk, proceeding only when refactoring is warranted. Finally, it synthesizes a multi-stage Dockerfile that isolates build tooling, copies only the artifacts needed at runtime, and applies production hardening. Evaluated on 521 real-world single-stage projects, StageCraft successfully produced working multi-stage Dockerfiles for 60.3% of targets. The resulting images were, on average, 52.2% smaller and contained 50.0% fewer high-risk vulnerabilities than the originals, outperforming baselines. StageCraft lowers the barrier to multi-stage adoption at scale, yielding leaner images with a reduced attack surface and improved maintainability. We release the tool, its knowledge assets, and the evaluation dataset to support reproducibility and future research.
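
To make the target transformation concrete, the following is a minimal hand-written sketch of the kind of single-stage to multi-stage rewrite described above. It is not StageCraft output. The Go project layout, the stage name `build`, the paths `/src`, `/out/server`, and `/server`, and the choice of base images are all assumptions made for illustration.

```dockerfile
# Before (single-stage, shown commented out for comparison):
# the Go toolchain and the full source tree ship in the production image.
#
#   FROM golang:1.22
#   WORKDIR /app
#   COPY . .
#   RUN go build -o /app/server .
#   CMD ["/app/server"]

# After (multi-stage): the build stage holds the compiler and sources.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# Static binary so it can run on a minimal base image (assumes no cgo use).
RUN CGO_ENABLED=0 go build -o /out/server .

# Runtime stage: copy only the artifact needed at runtime.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/server /server
# Basic hardening: run as an unprivileged user.
USER nonroot:nonroot
ENTRYPOINT ["/server"]
```

In this sketch the final image contains only the compiled binary and the minimal base layer, which is the mechanism behind the smaller images and reduced attack surface the abstract reports. The actual savings for any given project depend on its technology stack and dependencies.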