Skeleton-Aware Deformable Alignment for Few-Shot Font Generation
Songshui Wu, Guangyong Zheng, Tao Jiang, Jinke Yang

Few-shot font generation can be viewed as a challenging conditional image generation task, where the goal is to synthesize target glyphs from only a few reference samples while preserving structural fidelity and style consistency. The problem is particularly difficult for characters with complex spatial layouts and fine-grained stroke topology: with only sparse references, existing methods often struggle to maintain structural integrity, local continuity, and stylistic coherence at the same time. To address this issue, we propose a skeleton-aware deformable alignment framework for few-shot font generation. Specifically, we introduce explicit skeleton priors into the diffusion-based generation process to provide structural supervision during denoising. In addition, we design a structure-constrained deformable content alignment module that improves local feature correspondence while suppressing implausible geometric deformation. We further develop a multi-module content aggregation strategy that jointly models global layout patterns and local stroke details through complementary multi-level representations. Extensive experiments demonstrate that the proposed method consistently outperforms state-of-the-art approaches in both quantitative and qualitative evaluations. The results show that our method provides stronger structural preservation, better perceptual quality, and improved generalization in structurally complex glyph generation and cross-lingual style transfer.