DOI: 10.1785/0220250362 ISSN: 0895-0695

The Digital Archivist: Automating Legacy Macroseismic Data Processing Using Large Language Models

Aarnav Agrawal, Susan E. Hough, S. Mostafa Mousavi, Margaret Hellweg, William L. Ellsworth, Clara E. Yoon, Salvador Blanco

Abstract

Macroseismic data are a key resource to investigate shaking and damage from preinstrumental and early instrumental eras. However, data are often stored as inconsistently formatted reports describing observed shaking and damage, making manual parsing and interpretation of the accounts labor-intensive. We introduce a novel workflow using Google’s Gemini 2.5 Pro large language model (LLM) to automate the extraction and structuring of macroseismic observations from summary reports. We apply this workflow to the 22 March 1957 M 5.3 Daly City, California, earthquake as a case study. We used Gemini to extract addresses, originally assigned modified Mercalli intensity values, and descriptions from each report. To address coordinate precision limits, addresses were geocoded via Google’s Geocoding application programming interface. This workflow yielded over 2300 geocoded intensity reports for the Daly City earthquake. We use the geocoded accounts, with the original report intensity assignments, to develop a shaking intensity map that in some respects rivals modern Did You Feel It? maps. We also extract and present data for the 9 February 1971 ML 6.7 Sylmar, California, earthquake. Our results demonstrate the potential of LLMs for reliably extracting and analyzing large, unstructured macroseismic datasets. LLMs offer a scalable solution for rapidly digitizing macroseismic archives, enabling their broader use to constrain ground-motion models in modern seismic hazard analysis and to improve our understanding of site effects in urban areas. The concepts explored here may also be applied to the handling of other legacy seismological and earth science data.
