DOI: 10.3390/pr14132131 ISSN: 2227-9717

Bulk CO2 Diffusivity in Brine and Porous Media: A Machine Learning Approach for Deep Saline Aquifer Conditions

Jose A. Benavides, Birol Dindoruk

Deep saline aquifers are among the most promising formations for long-term geological CO2 storage due to their extensive distribution and large storage capacity. Accurate estimation of the CO2 diffusion coefficient in brine is essential for modeling dissolution trapping, one of the most reliable long-term CO2 sequestration mechanisms. However, laboratory measurements under reservoir conditions are costly and time-intensive, motivating the development of efficient predictive tools. This study develops machine learning (ML) frameworks for predicting bulk and porous media CO2 diffusivity by combining two data augmentation strategies, Conditional Tabular Generative Adversarial Networks (CTGAN) and pseudo-labeling (PL), with four ML algorithms: Random Forest Regression (RFR), XGBoost Regression (XGBR), Natural Gradient Boosting (NGBoost), and Gene Expression Programming (GEP). The database consists of 186 bulk diffusivity and 47 porous media diffusivity observations compiled from experimental and molecular dynamics studies, covering pressures of 0.1–30 MPa, temperatures of 286–673 K, salinities up to 300,000 ppm, and permeabilities of 0.05–2500 Darcy. Data augmentation increased dataset density by approximately 40%, resulting in hybrid datasets of 260 and 68 samples for bulk and porous media diffusivity, respectively. Results show that PL consistently outperforms CTGAN augmentation by preserving physically meaningful relationships and improving predictive accuracy. NGBoost achieved the best performance, with RMSE values of 0.33 and 0.61 for bulk and porous media diffusivity, respectively. Feature-importance analysis identified temperature as the dominant control on diffusivity, followed by pressure and salinity, while permeability exhibited limited influence.
The developed framework provides a computationally efficient alternative to extensive laboratory measurements and offers a reliable tool for reservoir simulation, CO2-EOR studies, and geological carbon storage design under data-limited conditions.
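
The pseudo-labeling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how candidate points are generated or which model assigns the labels, so the sketch assumes uniform sampling inside the observed feature ranges and swaps in a simple k-nearest-neighbour regressor as the labeling model. The function names (`knn_predict`, `pseudo_label_augment`), the `ratio` parameter, and the toy data are all hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def pseudo_label_augment(X, y, ratio=0.4, k=3, seed=0):
    """Return the original samples plus about ratio * len(X) pseudo-labeled ones.

    Assumption: features are already scaled to comparable ranges, since the
    stand-in k-NN model relies on Euclidean distance.
    """
    rng = random.Random(seed)
    n_new = round(ratio * len(X))
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    X_aug, y_aug = list(X), list(y)
    for _ in range(n_new):
        # Draw an unlabeled point inside the observed feature ranges.
        x_new = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        # Label it with the model fitted on the ORIGINAL data only, so
        # pseudo-labels never feed back into later pseudo-labels.
        X_aug.append(x_new)
        y_aug.append(knn_predict(X, y, x_new, k))
    return X_aug, y_aug


# Toy, made-up data (NOT from the study): two scaled features standing in
# for pressure and temperature, and an arbitrary linear response.
X = [(p / 4, t / 4) for p in range(5) for t in range(5)]
y = [0.5 + 2.0 * t - 0.3 * p for p, t in X]

X_aug, y_aug = pseudo_label_augment(X, y, ratio=0.4)
print(len(X), "->", len(X_aug))
```

Because each pseudo-label here is an average of measured targets, the added points stay inside the range of the observed responses, which is one simple way an augmentation step can avoid introducing physically implausible values. A stronger base model, such as one of the tree ensembles named in the abstract, could take the place of `knn_predict` without changing the surrounding loop.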
