A Comparative Framework for Political Violence Event Classification Using Machine Learning, Deep Learning, and Zero-Shot Language Models
Ujala Beenish, Saadia Ishtiaq Nauman, Sadaf Abdul Rauf, Fatima Mumtaz, Muhammad Ghulam Abbas Abbas Malik, Muhammad Imran, Muddesar Iqbal
Political violence poses a significant challenge to global stability, underscoring the need for comparative models that support the interpretation of structured conflict data. This paper presents a comparative evaluation of 12 machine learning approaches, spanning traditional supervised models, deep learning architectures, and zero-shot large language models (LLMs), for the classification of political violence events using the Armed Conflict Location and Event Data Project (ACLED) dataset (2010–2020, over 40,000 events). The results show that, on short structured event text represented via TF-IDF, fine-tuned traditional machine learning models outperform both zero-shot LLM approaches and deep learning models. We further introduce a multilingual classification framework for English and Urdu news content and illustrate cross-lingual transfer robustness using machine-translated Urdu data; these results reflect translation-based evaluation conditions and should not be interpreted as performance on naturally occurring low-resource Urdu political-event text. As an exploratory extension, the framework is applied to 57,700 tweets related to the Article 370 crisis in Kashmir to illustrate applicability to unstructured social media text. Because the best Twitter model (55% accuracy) falls below the 69% majority-class baseline, these results should be interpreted solely as coarse discourse indicators and not as a validated classification component. Unlike prior work, this study systematically combines multilingual evaluation with zero-shot LLM analysis for political event classification.
Geographic out-of-sample validation (leave-one-country-out or leave-one-region-out) was not conducted; the reported performance should therefore not be interpreted as evidence of cross-regional generalizability without further experimentation. The findings highlight practical considerations for designing data-driven frameworks for conflict monitoring and analytical decision support.
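The two evaluation ideas named above, a majority-class baseline and leave-one-country-out validation, can be illustrated with a minimal sketch. This is not the authors' pipeline: the records are made-up toy data rather than ACLED events, and the helper names `leave_one_group_out` and `majority_baseline_accuracy` are hypothetical. For each held-out country, the sketch computes the accuracy obtained by always predicting the most frequent label in the remaining countries. That figure is the floor a trained classifier would need to exceed on the fold.

```python
from collections import Counter, defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for group in sorted(by_group):
        test_idx = by_group[group]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield group, train_idx, test_idx


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Toy, made-up (country, event_type) records; assumption for illustration only.
events = [
    ("A", "battle"), ("A", "battle"), ("A", "protest"),
    ("B", "protest"), ("B", "protest"), ("B", "battle"), ("B", "protest"),
    ("C", "battle"), ("C", "riot"),
]
countries = [country for country, _ in events]
labels = [label for _, label in events]

# Majority-class floor for each held-out country.
floors = {}
for country, train_idx, test_idx in leave_one_group_out(countries):
    floors[country] = majority_baseline_accuracy(
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )
    print(f"held out {country}: majority-class floor = {floors[country]:.2f}")
```

In a fuller experiment, a classifier would be fit on the training indices of each fold and its held-out accuracy compared against the corresponding floor, in the same way the 55% Twitter accuracy is compared against the 69% majority-class baseline above.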