DOI: 10.3390/systems14070763 ISSN: 2079-8954

English-Normalized Text and Topic Analytics for FixMyStreet Brussels: Spatio-Temporal Hotspot Detection and Decision Support from Citizen Reports

Marian Pompiliu Cristescu

Citizen-reporting platforms generate high-volume, multilingual streams of service requests, yet operational triage often relies on coarse category labels and manual inspection. This study develops an explainable analytics pipeline with probability calibration for FixMyStreet Brussels reports, combining text-based urgency modeling, topic discovery, and spatio-temporal hotspot scoring to inform municipal analytic review. From 522,132 raw reports, we build an English-normalized text field for modeling, derive resolution-time outcomes from closed cases, and curate a 1000-item gold standard with an explicit high-urgency class. A TF-IDF logistic regression baseline achieves reasonable classification performance on the labeled split and, after probability calibration, yields confidence estimates that are more suitable for risk-aware prioritization than uncalibrated scores. Topic-level analyses reveal dominant themes related to sidewalks, road damage, and bulky waste, and hotspot scores highlight persistent, high-impact issue clusters. Event detection on aggregated signals did not identify events above the predefined z-score threshold during the analysis window, suggesting that the observed dynamics are more visible as chronic, recurring problems than as abrupt threshold-level anomalies. Explainability audits via Shapley Additive Explanations (SHAP) expose linguistically intuitive drivers for urgent cases (e.g., dangerous, risk, and accident) and complaint-oriented terms (e.g., abandoned, illegal, and dirty), providing transparent hooks for governance review. The analysis is therefore presented as an open-data, English-normalized decision-support prototype rather than as a validated native multilingual triage system. The labeled evidence base contains 2200 distinct human-reviewed reports. It comprises the 1000-report gold standard, a 200-report model-ranked high-urgency candidate set, and a 1000-report expanded validation subset. 
The expanded validation subset contains 439 high-urgency cases, including 42 cases from a random 500-report corpus sample. To avoid overstating language evidence, this study makes no native multilingual claim. The empirical claim is limited to English-normalized text analytics over reports that were originally submitted in a multilingual civic-reporting setting.
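
The z-score event-detection step mentioned above can be illustrated with a minimal sketch. It assumes daily report counts, a trailing 7-day baseline, and a threshold of 3.0; the abstract does not state the window, the threshold, or the aggregation level, so all three are hypothetical values, and the function name `zscore_events` is likewise illustrative rather than taken from the study.

```python
from statistics import mean, stdev

def zscore_events(counts, window=7, threshold=3.0):
    """Flag indices whose count exceeds a trailing-window z-score threshold.

    `window` and `threshold` are illustrative assumptions, not values
    reported by the study.
    """
    events = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]   # trailing baseline, excludes day i
        mu = mean(history)
        sigma = stdev(history)           # sample standard deviation
        if sigma == 0:
            continue                     # flat baseline: z-score undefined
        z = (counts[i] - mu) / sigma
        if z > threshold:
            events.append((i, z))
    return events

# Synthetic daily counts: a steady series, and the same series with one spike.
steady = [10, 12, 11, 9, 10, 11, 12, 10, 11, 12, 10, 9]
spiky = steady + [40]

steady_events = zscore_events(steady)
spiky_events = zscore_events(spiky)
```

On the steady series no day crosses the threshold, which mirrors the kind of result the abstract reports: persistently elevated but stable volumes do not register as threshold-level anomalies, even when they represent chronic problems. Only the abrupt spike in the second series is flagged.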
