DOI: 10.3390/electronics15132750 ISSN: 2079-9292

ParaChromo: Scalable and Seam-Coherent Inference for 3D Genome Diffusion

Xialin Su, Mingxiang Zhu, Wei Shang, Zhixin Ou

Diffusion models for 3D genome structures make inference an ensemble-generation and tiling problem. In the released ChromoGen workflow, millions of independent denoising trajectories are executed through a single-GPU path, while overlapping genomic windows are sampled without enforcing consistency of their shared physical interval. We introduce ParaChromo, a parallel inference framework for conditioned, tiled 3D genome diffusion workloads built around the trained diffusion U-Net and distance-map interface. ParaChromo organizes the workload into three inference-layer modules: a workload-dispatch module schedules region, guidance, and sample chunks across worker groups; an encoder-aware sharded-conditioning module scales and shards the EPCOT front end with FSDP while keeping the inner-loop U-Net replicated; and a seam-coherent tiled-synchronization module projects the shared 12-bead overlap of adjacent reverse chains in distance-map space. On eight A6000 GPUs, the combined reduced-step and task-parallel systems path raises throughput from 2.356±0.003 to 235.71±1.120 samples/s, a 100.04±0.486-fold gain over the released single-GPU baseline. The reduced-step setting is supported by a sweep from 50 to 1000 DDIM steps, where distance-distribution and Hi-C-based metrics remain stable across four chromosomes. For the synchronization module, the chr22 seam discrepancy falls from 150.9 pm to 7.9 pm, while matched internal and Hi-C-based quality metrics are preserved. The synchronized chr22 run also gives a chromosome-scale coordinate rendering over 32 paper-aligned tiles. Together, these results show that conditioned, tiled 3D genome diffusion can be executed as a scalable workload when throughput parallelism, sampler length, encoder placement, and spatial consistency are treated as separate but compatible constraints.
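As an illustration of what projecting a shared overlap in distance-map space could look like, the following minimal sketch averages the two copies of the 12-bead overlap block held by a pair of adjacent tiles. It is not the ParaChromo implementation. The 64-bead tile size, the mean-absolute-difference seam metric, the use of random coordinates as stand-in samples, and all function names are assumptions made for this example. Only the 12-bead overlap comes from the abstract. Averaging is chosen because it is the Euclidean projection of the two blocks onto the set where they agree.

```python
import numpy as np

OVERLAP = 12   # shared beads between adjacent tiles (stated in the abstract)
TILE_BEADS = 64  # hypothetical tile size, chosen only for this example


def seam_discrepancy(d_left, d_right, overlap=OVERLAP):
    """Mean absolute difference between the two copies of the overlap block.

    The metric itself is an assumption; the abstract does not define it.
    """
    a = d_left[-overlap:, -overlap:]   # last beads of the left tile
    b = d_right[:overlap, :overlap]    # first beads of the right tile
    return float(np.mean(np.abs(a - b)))


def project_overlap(d_left, d_right, overlap=OVERLAP):
    """Return copies of both maps with the shared block set to its average."""
    left, right = d_left.copy(), d_right.copy()
    consensus = 0.5 * (left[-overlap:, -overlap:] + right[:overlap, :overlap])
    left[-overlap:, -overlap:] = consensus
    right[:overlap, :overlap] = consensus
    return left, right


def random_distance_map(n_beads, rng):
    """Pairwise Euclidean distances of random 3D points (stand-in for a sample)."""
    coords = rng.normal(size=(n_beads, 3))
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


rng = np.random.default_rng(0)
d_a = random_distance_map(TILE_BEADS, rng)
d_b = random_distance_map(TILE_BEADS, rng)

before = seam_discrepancy(d_a, d_b)
p_a, p_b = project_overlap(d_a, d_b)
after = seam_discrepancy(p_a, p_b)
print(f"seam discrepancy before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the projection drives the discrepancy to exactly zero because it is applied once to finished maps. A sampler that applies a comparable step during the reverse chain would generally leave a small residual, since each subsequent denoising update can move the two tiles apart again.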
