DOI: 10.3390/ai7070234 ISSN: 2673-2688

Efficiency-Aware Group Size Optimization for GRPO via Multi-Fidelity Bayesian Optimization

Taehyeon Kim, Kyung-Taek Lee

Group Relative Policy Optimization (GRPO) streamlines the alignment of Large Language Models (LLMs) and Vision–Language Models (VLMs) by eliminating the critic model. However, its efficiency depends heavily on the group size, G. A larger G improves reward estimation and stabilizes the advantage, A_i, but it sharply increases VRAM usage and reduces throughput. Common heuristics such as a fixed G = 64 create significant bottlenecks in resource-constrained settings. This paper introduces an efficiency-aware optimization framework that uses multi-fidelity Bayesian Optimization and Hyperband (BOHB) to identify the optimal group size, G*, dynamically. The method uses a multi-objective function that balances reward accuracy, A_i variance, and hardware utilization under z-score normalization. By employing Successive Halving to evaluate candidates quickly at low fidelity, the framework reduces search costs by up to 74% compared with random search. Tested on text-only LLMs (Qwen2.5-7B/1.5B) and a multimodal VLM (Qwen2.5-VL-3B), the discovered G* saves up to 72.5% of VRAM compared with the G = 64 baseline while maintaining reward accuracy within 5.8%. Sensitivity analyses on the hyperparameters λ, α, and β confirm the framework's robustness. Rather than treating group size as an engineering heuristic, this study formalizes the trade-off between statistical estimation stability and hardware constraints in a unified optimization framework for resource-efficient RLHF.
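The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the objective's exact form, so the weights `w_reward`, `w_var`, and `w_vram`, the sign convention, and the function names are all assumptions, and their relation to the paper's λ, α, and β is not stated here. The sketch also covers only the Successive Halving part over a fixed candidate list; BOHB additionally uses a model-based sampler to propose candidates, which is omitted. The `toy_evaluate` function is a synthetic stand-in for a short training run and does not reproduce any reported result.

```python
import statistics


def zscores(values):
    """Z-score normalize a list; return all zeros if the values are constant."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def scalarize(metrics, w_reward=1.0, w_var=1.0, w_vram=1.0):
    """Combine per-candidate metrics into one score (higher is better).

    metrics maps G -> (reward, advantage_variance, vram_gb).
    Assumed form: reward is rewarded, variance and VRAM are penalized,
    each after z-scoring across the current candidates.
    """
    gs = list(metrics)
    z_reward = zscores([metrics[g][0] for g in gs])
    z_var = zscores([metrics[g][1] for g in gs])
    z_vram = zscores([metrics[g][2] for g in gs])
    return {
        g: w_reward * r - w_var * v - w_vram * m
        for g, r, v, m in zip(gs, z_reward, z_var, z_vram)
    }


def successive_halving(candidates, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of candidates each round, multiplying the budget by eta."""
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Low budgets give cheap, noisy estimates; survivors get more budget.
        metrics = {g: evaluate(g, budget) for g in survivors}
        scores = scalarize(metrics)
        survivors.sort(key=lambda g: scores[g], reverse=True)
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_evaluate(g, budget):
    """Synthetic stand-in for a short GRPO run (illustrative only)."""
    reward = 1.0 - 1.0 / g            # larger groups estimate reward better
    adv_variance = 1.0 / (g * budget)  # variance shrinks with G and budget
    vram_gb = 0.5 * g                  # memory grows with group size
    return reward, adv_variance, vram_gb


best_g = successive_halving([4, 8, 16, 32, 64], toy_evaluate)
```

Z-scoring each metric across the surviving candidates puts reward, variance, and memory on a comparable scale before weighting, which is one plausible reading of the normalization step mentioned above. In practice `evaluate` would launch a real training run at the given fidelity and return measured values.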
