Street-level monitoring of urban tactile paving obstructions through visual-language models and street view imagery
Hanbei Chen, Jin Rui

Tactile paving is vital infrastructure for safe mobility among 2.2 billion visually impaired individuals worldwide, but in complex urban environments it faces both static damage and dynamic encroachment. This study develops an intelligent evaluation framework that integrates visual-language models (VLMs) with pedestrian-view street imagery to assess tactile paving usability around urban metro stations. Using GPT-4o and GoPro-collected imagery, we built a three-tier risk detection system covering the tactile paving body, a 250 mm proximity zone, and the surrounding environment. The framework includes 26 structural and 24 situational indicators with differentiated risk-scoring thresholds. Based on 110 metro stations within Beijing’s Third Ring Road, we analyzed the spatial distribution of tactile paving obstructions. The 250 mm proximity zone showed the highest obstruction rate (34.46%), exceeding that of the tactile paving body (33.03%) and the surrounding environment (19.79%), mainly due to spatial pressure from wall attachments, poles, and adjacent facilities. Structural obstacles reflected persistent damage and encroachment, whereas situational obstacles showed greater temporality, peak intensity, and spatial variability, especially within the proximity zone. AI evaluations closely matched expert ratings (Pearson r = 0.943), and iterative scoring reduced false positives from 54% to 11%, supporting the reliability of VLMs in complex urban contexts. Fengtai District scored worst in both indicator categories, with Majiapu Station as a key case. We recommend introducing a “proximity buffer zone” and improving fine-scale maintenance in high-density areas. The resulting intelligent platform is scalable and transferable for nationwide monitoring and governance of accessible infrastructure.
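
As an illustration of the two summary statistics reported above, the sketch below computes a per-zone obstruction rate and the Pearson correlation between AI and expert scores. It is a minimal example under stated assumptions, not the study's pipeline: the function names, the record layout (a zone label paired with an obstructed flag), and the zone labels are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def obstruction_rate_by_zone(records):
    """Return the percentage of obstructed observations per zone.

    `records` is an iterable of (zone, obstructed) pairs. This layout is
    an assumption made for illustration, not the study's data format.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for zone, obstructed in records:
        totals[zone] += 1
        if obstructed:
            hits[zone] += 1
    return {zone: 100.0 * hits[zone] / totals[zone] for zone in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

With real data, `records` would hold one entry per assessed image and zone, and `pearson_r` would be called on the paired AI and expert risk scores for the same images.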