When Emotions Conflict: A Reliability-Aware Framework for Arabic Multi-Label Emotion Detection
Mashary N. Alrasheedy, Sabrina Tiun, Fariza Fauzi

Arabic multi-label emotion detection (MLED) in social media remains challenging because dialectal variation, implicit affective cues, and polarity-opposed emotions may occur within the same post. Existing Arabic MLED studies have mainly emphasized thresholded predictive performance, with limited attention to whether model confidence remains reliable under emotionally conflicting conditions. In this study, we propose CONCORD-Emo (CONflict-aware Compositional Representation for Emotion Detection), a reliability-aware framework for Arabic MLED. The framework adopts established label-wise attention, mixture-of-experts routing, Monte Carlo (MC) dropout, and post hoc temperature scaling as supporting mechanisms, while its architecture-level contribution is the conflict-conditioned integration of a residual global anchor with a conflict-aware fusion gate supervised by an automatically derived polarity-conflict target. We evaluated the framework on three Arabic benchmarks (SemEval-2018-Ar, ExaAEC, and SemEval-2025-Arq) using predictive and reliability-oriented criteria. CONCORD-Emo remains competitive with strong MARBERT-based baselines. On SemEval-2025-Arq, it attains point estimates of 0.471 for Jaccard, 0.606 for micro-F1, and 0.582 for macro-F1. Paired bootstrap confidence intervals show that most predictive differences include zero, whereas the lower Expected Calibration Error and Brier scores on SemEval-2018-Ar and ExaAEC are consistently supported relative to the controlled baselines. Conflict-conditioned analysis shows that polarity-conflict instances yield lower predictive performance and higher Brier scores than blended-emotion instances. Taken together, these results support a reliability-aware evaluation of Arabic MLED in which polarity conflict, calibration, uncertainty estimation, and selective prediction are examined alongside predictive performance.
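For readers unfamiliar with the reliability criteria named above, the following is a minimal sketch of how the Brier score, Expected Calibration Error, and temperature scaling could be computed for multi-label sigmoid outputs. It is an illustration under stated assumptions, not the authors' implementation: the function names, the pooling of all (instance, label) pairs into one set of binary predictions, and the choice of 10 equal-width bins are hypothetical, since the abstract does not specify these details.

```python
import numpy as np


def brier_score(probs, labels):
    """Mean squared gap between predicted probabilities and 0/1 labels.

    Assumption: all (instance, label) pairs are pooled and weighted equally.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((probs - labels) ** 2))


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over pooled per-label probabilities with equal-width bins.

    For each bin, compare the mean predicted probability with the observed
    fraction of positive labels, weighted by the share of pairs in the bin.
    """
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    # Map each probability to a bin index; p == 1.0 goes into the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


def apply_temperature(logits, temperature):
    """Post hoc temperature scaling for sigmoid outputs: sigmoid(z / T).

    The temperature T would normally be fitted on held-out data; here it is
    simply passed in as a given value.
    """
    logits = np.asarray(logits, dtype=float)
    return 1.0 / (1.0 + np.exp(-logits / temperature))
```

In this sketch, a temperature above 1 pulls probabilities toward 0.5, which can lower both scores when a model is overconfident. Whether the paper computes these metrics per label, per instance, or pooled is not stated in the abstract, so the pooled variant here is only one plausible reading.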