DOI: 10.2118/0726-0023-jpt ISSN: 0149-2136

Predictive Analytics Reduce Exploration Risks for High-Potential Geologic Hydrogen Discoveries

Chris Carpenter

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 224210, “Predictive Analytics for High-Potential Geologic Hydrogen Discoveries: Reducing Exploration Risks Across Diverse Geological Settings,” by Semaa Alessa, SPE, and Lori A. Hathon, SPE, University of Houston, and Hanaa M. Ali, University of Basrah, et al. The paper has not been peer-reviewed.

Geological hydrogen is emerging as a promising clean energy resource, but finding commercial quantities is challenging because of complex hydrogen production, migration, and accumulation dynamics in the subsurface. This study applies Monte Carlo simulation and an XGBoost regression model to assess the influence of various formations, geologic provinces, tectonic plate types, and boundary conditions on hydrogen concentrations. Key predictors identified include formation type, geological province, and proximity to province boundaries, highlighting the role of spatial relationships in hydrogen retention and potential lateral migration.

Data Set and Feature Engineering

Data Collection.

The data set used in this study comprises 128 data points and focuses primarily on free hydrogen occurrences in geological settings. Each record includes the hydrogen percentage, the associated formation, and the observation location.

Feature Engineering.

First, the geological province type associated with each hydrogen discovery was identified, and the distance to the nearest geological province boundary was calculated using geodesic distance methods based on Vincenty’s formulae. Similarly, the distances from each hydrogen discovery to the nearest tectonic plate boundary (convergent, divergent, or transform) were determined using the same geodesic principles. Additionally, the tectonic plate type (rigid, microplate, or deformed) corresponding to each discovery was identified to provide further context for spatial relationships.
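The distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration of Vincenty's inverse formula, not the authors' code. The WGS-84 ellipsoid, the function names `vincenty_inverse_km` and `nearest_boundary_km`, and the representation of a boundary as a list of sampled vertices are all assumptions, because the paper summary does not specify them.

```python
import math

# WGS-84 ellipsoid constants (assumed; the article does not name the ellipsoid).
WGS84_A = 6378137.0                 # semi-major axis, metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_B = (1 - WGS84_F) * WGS84_A   # semi-minor axis, metres


def vincenty_inverse_km(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Geodesic distance in km between two (lat, lon) points given in degrees."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    # Reduced latitudes and longitude difference.
    U1 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat2)))
    L = math.radians(lon2 - lon1)
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam,
                               cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2 * sigma_m); taken as 0 for equatorial lines, where cos2_alpha == 0.
        cos_2sm = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha else 0.0
        C = WGS84_F / 16 * cos2_alpha * (4 + WGS84_F * (4 - 3 * cos2_alpha))
        lam_new = L + (1 - C) * WGS84_F * sin_alpha * (
            sigma + C * sin_sigma * (
                cos_2sm + C * cos_sigma * (2 * cos_2sm ** 2 - 1)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        # Vincenty's iteration can fail for nearly antipodal points.
        raise ValueError("Vincenty iteration did not converge")

    u2 = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    d_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (2 * cos_2sm ** 2 - 1)
        - B / 6 * cos_2sm * (4 * sin_sigma ** 2 - 3) * (4 * cos_2sm ** 2 - 3)))
    return WGS84_B * A * (sigma - d_sigma) / 1000.0


def nearest_boundary_km(lat, lon, boundary_vertices):
    """Distance to the closest sampled vertex of a boundary (an approximation)."""
    return min(vincenty_inverse_km(lat, lon, vlat, vlon)
               for vlat, vlon in boundary_vertices)


# Hypothetical discovery location and boundary vertices, for illustration only.
print(nearest_boundary_km(0.0, 0.0, [(0.0, 1.0), (0.0, 2.0)]))
```

Measuring to sampled vertices only approximates the distance to a boundary line. A denser vertex sampling, or a point-to-segment geodesic routine, would tighten the estimate.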

To clarify the relationship between geological formations and the provinces hosting hydrogen occurrences, the data must be organized such that machine-learning algorithms can interpret the geological sequence effectively. New feature combinations were engineered to capture higher-order relationships between existing variables: formation was paired with province type, and province type was paired with nearest province type. Similarly, tectonic plate classifications were paired with tectonic boundary types to highlight their combined influence on hydrogen occurrences. These engineered features were designed to uncover complex correlations that might remain hidden when individual variables are analyzed in isolation. The data set was then sorted by formation, followed by province, and finally by hydrogen percentage to keep the data organized and facilitate analysis.

The final data set prepared for the XGBoost training included the following columns: H2 (%), formation, province type, nearest province type, distance to nearest province type, plate type, nearest tectonic plate boundary type, distance to nearest tectonic boundary, formation and province type (combined), province and nearest province type (combined), and tectonic plate and nearest boundary type (combined). These features were selected and engineered to encapsulate both individual and higher-order relationships, providing the model with a comprehensive data set to enhance predictive performance.
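As a rough illustration of the combined columns and the sort order described in this section, the sketch below builds the three paired features from plain Python dictionaries. The key names, the `|` separator, the ascending sort direction, and the placeholder category values are all assumptions made for the example; none of the records come from the paper's data set.

```python
# Hypothetical records with placeholder category names (not real observations).
records = [
    {"h2_pct": 12.0, "formation": "formation_B", "province_type": "province_Y",
     "nearest_province_type": "province_X", "plate_type": "rigid",
     "nearest_boundary_type": "convergent"},
    {"h2_pct": 40.0, "formation": "formation_A", "province_type": "province_X",
     "nearest_province_type": "province_Y", "plate_type": "rigid",
     "nearest_boundary_type": "transform"},
    {"h2_pct": 3.5, "formation": "formation_A", "province_type": "province_X",
     "nearest_province_type": "province_Z", "plate_type": "microplate",
     "nearest_boundary_type": "divergent"},
]


def add_combined_features(row):
    """Return a copy of row with the three paired categorical features added."""
    out = dict(row)
    out["formation_province"] = f'{row["formation"]}|{row["province_type"]}'
    out["province_nearest_province"] = (
        f'{row["province_type"]}|{row["nearest_province_type"]}')
    out["plate_boundary"] = f'{row["plate_type"]}|{row["nearest_boundary_type"]}'
    return out


rows = [add_combined_features(r) for r in records]

# Sort by formation, then province type, then hydrogen percentage
# (ascending order is assumed; the article does not state the direction).
rows.sort(key=lambda r: (r["formation"], r["province_type"], r["h2_pct"]))

for r in rows:
    print(r["formation_province"], r["plate_boundary"], r["h2_pct"])
```

Joining two categories into one string gives a tree-based model a single feature on which to split for a specific pairing, which is one plausible reading of the "higher-order relationships" the authors describe.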

Model Establishment

XGBoost Regression.

This study used an XGBoost regression model to explore the relationship between geological and tectonic features and subsurface hydrogen concentrations. The data set included categorical features (e.g., tectonic plate and geological province types) and numerical features (e.g., distances to boundaries and provinces); 20% of the data was reserved for validation. The model was trained using optimized parameters and achieved R² scores of 0.87 on training data and 0.76 on validation data, indicating strong predictive performance and generalizability.
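The evaluation protocol can be sketched independently of the model itself. The example below shows a 20% holdout split and the standard R² calculation using only the Python standard library. It does not fit an XGBoost model, and the shuffling, the seed, and the rounding of the validation-set size are assumptions, because the article does not describe how the split was drawn.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and hold out a fraction for validation (assumed random split)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# With 128 records, a 20% holdout leaves roughly 102 for training
# and 26 for validation under the rounding assumed here.
train, validation = train_validation_split(range(128))
print(len(train), len(validation))
```

With a validation set this small, each held-out point carries substantial weight in the R² value, which is worth keeping in mind when comparing the training and validation scores.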
