Small Models, Big Cities: A Low-Cost AI Pipeline for Urban Regulatory Document Analysis in Metropolitan Planning
Francisco Vergara-Perucich

Background: Analyzing urban planning documents at metropolitan scale typically requires large, cloud-hosted language models, which limits their adoption in Global South contexts. This study deploys Moondream, a 1.7-billion-parameter vision-language model (VLM) that runs locally via Ollama, to extract geographic knowledge from the Planes Reguladores Comunales (PRCs; municipal regulatory plans) of 29 Gran Santiago municipalities. The pipeline combines native PDF text extraction, keyword-based multi-label classification across six thematic axes, and VLM-based optical character recognition and cartographic interpretation.

Results: The pipeline processes 2,289 PRC articles in 4.3 min at an estimated energy cost of 0.000866 kWh and zero marginal monetary cost. Zoning (53.3%) and land use (43.1%) dominate PRC content, while social housing provisions appear in only 4.0% of articles. A normative gap analysis identifies five municipalities whose regulatory text contains no social housing provisions at all. A comparative evaluation of Moondream against the keyword baseline on an 88-article validation sample yields macro-F1 = 0.355 and mean Cohen's κ = 0.004, confirming that generalist VLMs require domain fine-tuning for specialized legal text. The study argues that the cost asymmetry between industrial-scale and small-model approaches constitutes an epistemic asymmetry, with direct consequences for the geographic distribution of urban data infrastructure.
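The classification, evaluation, and gap-analysis steps summarized above can be illustrated with a minimal Python sketch. It covers keyword-based multi-label tagging of articles, per-axis F1 and Cohen's κ averaged across axes (macro-F1 and mean κ), and a check for municipalities that have no article tagged with a given axis. Several parts are assumptions and should not be read as the study's implementation. Only three of the six thematic axes are named in the abstract, so the sketch uses just those three. The keyword lists, the municipality names, and the "VLM" labels are hypothetical. Treating the keyword labels as the reference in the comparison is one reading of "against the keyword baseline".

```python
import re
import unicodedata

# Hypothetical keyword lists for the three axes named in the abstract; the
# study's actual six axes and keyword lists are not reproduced here.
AXES: dict[str, list[str]] = {
    "zoning": ["zona", "zonificacion"],
    "land_use": ["uso de suelo", "usos de suelo", "uso permitido"],
    "social_housing": ["vivienda social", "integracion social"],
}


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Zonificación' matches 'zonificacion'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def classify(article: str) -> set[str]:
    """Multi-label tagging: an article receives every axis whose keywords occur.

    Keywords must start at a word boundary, so 'zona' matches 'zonas'
    but not 'razonable'.
    """
    text = normalize(article)
    return {
        axis
        for axis, keywords in AXES.items()
        if any(re.search(r"\b" + re.escape(k), text) for k in keywords)
    }


def f1_and_kappa(ref: list[bool], pred: list[bool]) -> tuple[float, float]:
    """Binary F1 and Cohen's kappa for one axis, treating `ref` as the reference."""
    n = len(ref)
    tp = sum(r and p for r, p in zip(ref, pred))
    fp = sum(p and not r for r, p in zip(ref, pred))
    fn = sum(r and not p for r, p in zip(ref, pred))
    tn = n - tp - fp - fn
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from each rater's marginal positive and negative rates.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Kappa is undefined when p_e == 1; reported as 0.0 here by assumption.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return f1, kappa


def evaluate(ref_labels: list[set[str]], pred_labels: list[set[str]]) -> tuple[float, float]:
    """Macro-F1 and mean kappa: unweighted averages over the axes."""
    f1s, kappas = [], []
    for axis in AXES:
        ref = [axis in labels for labels in ref_labels]
        pred = [axis in labels for labels in pred_labels]
        f1, kappa = f1_and_kappa(ref, pred)
        f1s.append(f1)
        kappas.append(kappa)
    return sum(f1s) / len(f1s), sum(kappas) / len(kappas)


def gap_municipalities(labels_by_muni: dict[str, list[set[str]]], axis: str) -> list[str]:
    """Municipalities where no article is tagged with `axis` (a normative gap)."""
    return sorted(
        muni
        for muni, article_labels in labels_by_muni.items()
        if not any(axis in labels for labels in article_labels)
    )


# Usage with hypothetical articles and hypothetical VLM labels.
articles = [
    "Artículo 12: En la Zona ZH-1 se permiten los usos de suelo residencial.",
    "Artículo 30: Los proyectos de vivienda social tendrán incentivos de densidad.",
    "Artículo 45: Rasantes y distanciamientos.",
]
baseline = [classify(a) for a in articles]
for labels in baseline:
    print(sorted(labels))  # ['land_use', 'zoning'], then ['social_housing'], then []

vlm = [{"zoning"}, {"social_housing"}, {"land_use"}]
macro_f1, mean_kappa = evaluate(baseline, vlm)
print(f"{macro_f1:.3f} {mean_kappa:.3f}")  # 0.667 0.500

print(gap_municipalities({"muni_a": baseline, "muni_b": [{"zoning"}]}, "social_housing"))
# ['muni_b']
```

Substring matching keeps the baseline cheap and transparent. However, it misses paraphrases and negated provisions, which is one reason a keyword reference and a VLM can disagree as sharply as the reported κ indicates.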