Hybrid Compression Method for Trained 3D Gaussian Splatting Models Based on VQ and HEVC
Dong-Ha Kim, Byung-Yoon Choi, Kwan-Jung Oh, Gwangsoon Lee, Jae-Gon Kim
3D Gaussian Splatting (3DGS) has recently emerged as an effective representation for immersive 3D scene rendering, providing high visual fidelity and real-time rendering efficiency. To support interoperable compression of trained 3DGS content, the Moving Picture Experts Group (MPEG) is exploring Gaussian Splat Coding (GSC), which mainly targets already trained 3DGS models following the INRIA reference format. The current video-based GSC anchor reorders 3DGS attributes into 2D attribute maps using Parallel Linear Assignment Sorting (PLAS) and compresses the resulting maps using High Efficiency Video Coding (HEVC). However, higher-order spherical harmonic coefficients (SH-AC) often remain irregular and exhibit low local spatial correlation even after PLAS reordering, limiting the coding efficiency of conventional video codecs. This paper proposes a VQ-HEVC hybrid compression framework that is structurally compatible with the video-based GSC anchor framework, in which SH-AC coefficients are represented by vector quantization (VQ) indices, while the remaining attributes are encoded using the same HEVC-based procedure as the GSC anchor. The proposed method adopts a two-stage VQ scheme that combines coarse VQ and product-quantization-based residual quantization, together with zero-masked residual VQ and flexible PQ grouping, to improve index-map coding efficiency across rate points. The generated VQ indices are packed into YUV400 index-map sequences and encoded using HEVC lossless coding, while the corresponding codebooks are transmitted as metadata. Experimental results on the Bartender and Cinema sequences of the MPEG GSC CTC demonstrate consistent rate–distortion improvements over the video-based GSC anchor across multiple objective quality metrics within the evaluated setting.
In terms of RGB-PSNR, the proposed method achieves BD-rate reductions of 22.3% and 18.5% for the Bartender and Cinema datasets, respectively. These results suggest that, for the evaluated GSC CTC sequences, VQ-based SH-AC representation can effectively complement PLAS-based video coding while maintaining consistency with the existing GSC coding structure.
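As an illustration only, the two-stage quantization and index-map packing summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the codebooks are taken as given (codebook training is omitted), the residual is split into equal-length PQ groups, zero masking and flexible grouping are not modeled, and the function names (`nearest`, `encode`, `decode`, `pack_index_map`) and the row-major packing order are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Return the index of the codeword closest to v (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(c, v))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def encode(v: Vector, coarse_cb: Codebook,
           pq_cbs: Sequence[Codebook]) -> Tuple[int, List[int]]:
    """Stage 1: coarse VQ. Stage 2: per-group VQ of the residual (product quantization)."""
    ci = nearest(coarse_cb, v)
    residual = [a - b for a, b in zip(v, coarse_cb[ci])]
    group_len = len(v) // len(pq_cbs)  # assumption: equal-length groups
    pq_idx = []
    for g, cb in enumerate(pq_cbs):
        sub = residual[g * group_len:(g + 1) * group_len]
        pq_idx.append(nearest(cb, sub))
    return ci, pq_idx


def decode(ci: int, pq_idx: Sequence[int], coarse_cb: Codebook,
           pq_cbs: Sequence[Codebook]) -> List[float]:
    """Reconstruct a vector as the coarse codeword plus the per-group residual codewords."""
    out = list(coarse_cb[ci])
    group_len = len(out) // len(pq_cbs)
    for g, (cb, i) in enumerate(zip(pq_cbs, pq_idx)):
        for k, x in enumerate(cb[i]):
            out[g * group_len + k] += x
    return out


def pack_index_map(indices: Sequence[int], width: int) -> List[List[int]]:
    """Pack indices row by row into a single-plane (luma-only) 2D map, zero-padded."""
    height = -(-len(indices) // width)  # ceiling division
    padded = list(indices) + [0] * (height * width - len(indices))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


# Toy example: one 4-dimensional vector, 2 coarse codewords, 2 PQ groups of length 2.
coarse_cb = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
pq_cbs = [
    [[0.0, 0.0], [0.5, -0.5]],
    [[0.0, 0.0], [-0.25, 0.25]],
]
ci, pq_idx = encode([1.4, 0.6, 1.0, 1.0], coarse_cb, pq_cbs)
recon = decode(ci, pq_idx, coarse_cb, pq_cbs)
index_map = pack_index_map([1, 0, 1, 1, 0], width=2)
```

In this toy setting each vector is described by one coarse index plus one index per PQ group, so the decoder needs only the index maps and the codebooks. In the proposed framework the index maps would then be passed to an HEVC lossless encoder as monochrome (YUV400) frames, a step that is outside the scope of this sketch.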