From Raw EO Data to AI-Ready Datasets: Lowering the Barrier to Geospatial Foundation Model Fine-Tuning
Mattia Santoro, Enrico Boldrini, Stefano Nativi, Paolo Mazzetti

Geospatial foundation models (GFMs) are a new frontier in artificial intelligence, designed to understand and analyze spatial data at scale. Trained on large collections of Earth Observation (EO) data, these models can support a wide range of applications, from monitoring natural disasters to guiding urban development and tracking climate change. To adapt a foundation model to a specific task, researchers and practitioners fine-tune it using a relatively small amount of additional data. Through this combination of broad pre-training and task-specific adaptation, GFMs are reshaping how we observe, manage, and protect our planet. Fine-tuning a GFM requires carefully curated training datasets that reflect specific regions, time periods, or tasks, such as detecting deforestation or mapping urban growth. Yet preparing these datasets is often labor-intensive, involving steps such as selecting relevant imagery, harmonizing spatial formats, and generating accurate labels. In practice, the effectiveness of GFMs therefore hinges on the availability of AI-ready data, and this bottleneck limits their accessibility and scalability for scientific and operational applications. In this work, we introduce a software library that automates these preparatory steps, streamlining the transformation of geospatial datasets into consistent, high-quality inputs for GFM fine-tuning. By reducing technical overhead and ensuring data readiness, the library enables faster, more reliable, and more inclusive adaptation of foundation models to local environmental challenges and specialized domain needs.