Memory-Efficient Bounding Volume Hierarchies with Merged Nodes for Hardware Ray Tracing
Jacob Haydel, Andrew Kensler, Erik Brunvand, Cem Yuksel

The performance of hardware-accelerated ray tracing on modern GPUs, even with advances in traversal hardware and BVH compression, is fundamentally limited by memory bandwidth. This makes memory traffic, not compute, the primary bottleneck in ray tracing, and it calls for data structures and construction algorithms that are explicitly optimized for bandwidth efficiency. We introduce a new block-based representation for wide bounding volume hierarchies (BVHs) that directly targets this bottleneck. Our approach organizes multiple primitives and internal nodes into compact, bandwidth-efficient blocks, reducing the number and cost of memory transactions during traversal. Unlike conventional layouts, our representation enables the merging of both internal and leaf nodes, forming composite bounding volumes that amortize memory accesses across larger portions of the hierarchy. To further align BVH construction with hardware realities, we introduce a memory-centric reformulation of the surface area heuristic (SAH). Rather than modeling traversal cost in terms of compute, our formulation estimates the cost of data movement, yielding a metric that more accurately predicts performance on modern GPUs. Under this model, our merged-node representation achieves substantial reductions in memory traffic. We describe both an optimal construction algorithm and an efficient greedy variant that leverage our representation and bandwidth-driven cost model. Across a range of scenes, our approach reduces acceleration structure size and significantly lowers data movement per ray, resulting in consistently faster rendering. These results demonstrate that treating memory bandwidth as a first-class design constraint leads to more efficient ray tracing acceleration structures.
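
To make the idea of a memory-centric SAH concrete, the following is a minimal sketch and not the formulation used in this work. It relies on the standard SAH assumption that a ray hitting the root visits a nested node with probability proportional to that node's surface area. The per-node compute cost is then replaced with a byte count. The names `Node`, `bytes_fetched`, and `expected_bytes` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                      # minimum corner of the bounding box
    hi: Vec3                      # maximum corner of the bounding box
    bytes_fetched: int            # ASSUMPTION: bytes read from memory when this node is visited
    children: List["Node"] = field(default_factory=list)


def surface_area(lo: Vec3, hi: Vec3) -> float:
    """Surface area of an axis-aligned box."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def expected_bytes(root: Node) -> float:
    """Expected bytes moved per ray that hits the root.

    Standard SAH weighting: a node is visited with probability
    SA(node) / SA(root). The cost term is bytes rather than compute,
    which is the illustrative assumption of this sketch.
    The root box must have non-zero surface area.
    """
    root_area = surface_area(root.lo, root.hi)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        visit_probability = surface_area(node.lo, node.hi) / root_area
        total += visit_probability * node.bytes_fetched
        stack.extend(node.children)
    return total


if __name__ == "__main__":
    # Two children that each cover half of a 2x2x2 root box.
    left = Node((0.0, 0.0, 0.0), (1.0, 2.0, 2.0), bytes_fetched=64)
    right = Node((1.0, 0.0, 0.0), (2.0, 2.0, 2.0), bytes_fetched=64)
    root = Node((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), bytes_fetched=64, children=[left, right])
    print(expected_bytes(root))
```

A builder driven by such a metric would compare candidate layouts by their expected bytes per ray instead of by intersection-test counts. How the actual cost model accounts for block sizes and merged nodes is not specified in the abstract and is not modeled here.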