DOI: 10.1111/tgis.70335 ISSN: 1361-1682

Mapping Rural Housing Modernization From Rural Street‐View Images Using Vision‐Language Models

Wenjun Jiang, Yu Gu, Hailong Zhao, Weihuan Deng, Xun Li

ABSTRACT

Rural housing inequality remains a major challenge for sustainable development in the Global South, yet its assessment has long been constrained by data poverty. Macro‐level statistics fail to capture village‐level variation, field surveys are costly, and conventional computer vision approaches depend heavily on large labeled datasets. To address this problem, this study proposes a Geo‐AI framework that combines VLM‐assisted pairwise labeling, PAIR‐CNN inference, TrueSkill aggregation, and spatial analysis. Using 4065 rural street‐view images from Huaiji County, Guangdong Province, we derived perceived modernization scores for 283 villages. GPT‐4o achieved substantial agreement with a human‐consensus benchmark (Cohen's kappa = 0.68), and the PAIR‐CNN reached an average accuracy of 78.76%. Spatial analysis further reveals a nested pattern of spatial differentiation, with overall inequality driven mainly by within‐township imbalance. The framework demonstrates how VLMs and lightweight vision models can support scalable and interpretable assessment of rural built environments.
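
As an illustration of the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two raters who label the same image pairs. The abstract does not describe the authors' computation, so the function name `cohens_kappa`, the `"left"`/`"right"` label encoding, and the sample judgments are all hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical pairwise judgments (not data from the study):
# which image in each pair looks more modern.
human = ["left", "left", "right", "right", "left", "right", "left", "right"]
model = ["left", "left", "right", "left", "left", "right", "right", "right"]
kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for pairwise labels because two raters choosing between two options would already agree about half the time at random.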
