DOI: 10.1145/3769298 ISSN: 2375-4699

Enhancing Arabic Offensive Tweet Classification Using an Ensemble Approach of AraBERT, Neural Networks, and LSTM Models

Ahlam Wahdan, Mostafa Al-Emran, Khaled Shaalan

The Arabic language presents unique modeling challenges, including morphological complexity, orthographic ambiguity and noise, and dialectal variation. The scarcity of linguistic resources dedicated to Arabic and the limited availability of detection tools further compound the difficulty of identifying and mitigating offensive language in Arabic text. Given Arabic's linguistic richness and diversity, robust and accurate methods for detecting and addressing offensive language are essential to the safety and well-being of Arabic-speaking online communities. This study aims to improve Arabic offensive tweet classification by proposing a novel framework that combines advanced preprocessing techniques with state-of-the-art classification models in an ensemble methodology. The framework comprises two primary modules: a preprocessing module and a classification module. The preprocessing module incorporates AraBERT preprocessing, emoji-to-word interpretation, and punctuation removal. The classification module encompasses neural networks (NN), LSTM, and AraBERT. The proposed ensemble model combines a fine-tuned AraBERT with two NN layers, an LSTM, and ReLU and Sigmoid activation functions. The model achieved state-of-the-art performance, surpassing all other approaches in terms of accuracy and F1-score. These results highlight the effectiveness and potential of advanced models such as AraBERT for detecting and classifying offensive language in Arabic text.
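As an illustration of two of the preprocessing steps named above (emoji-to-word interpretation and punctuation removal), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the emoji table `EMOJI_WORDS`, the function names, and the choice to replace punctuation with spaces are all assumptions made for illustration, and the AraBERT preprocessing step is omitted.

```python
import re
import unicodedata

# Hypothetical emoji-to-word table (an assumption for illustration);
# the abstract does not specify the mapping the authors used.
EMOJI_WORDS = {
    "😂": "ضحك",   # laughter
    "😡": "غضب",   # anger
    "💔": "حزن",   # sadness
}


def emoji_to_words(text: str) -> str:
    """Replace each known emoji with a space-padded Arabic word."""
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")
    return text


def remove_punctuation(text: str) -> str:
    """Replace every Unicode punctuation character (category P*) with a space.

    This covers Arabic marks such as the comma (U+060C) and the
    question mark (U+061F) as well as ASCII punctuation.
    """
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )


def preprocess(text: str) -> str:
    """Apply both steps, then collapse runs of whitespace."""
    text = emoji_to_words(text)
    text = remove_punctuation(text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(preprocess("مرحبا، كيف الحال؟ 😂"))
```

Converting emojis before removing punctuation keeps the two steps independent: the emoji words are plain Arabic tokens by the time punctuation is stripped, so the order of the remaining steps does not affect them.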
