DOI: 10.3390/fi18070346 ISSN: 1999-5903

Credibility Context Improves Political Claim Classification on LIAR and Transfers to LIAR2 NEW

Bushra Alkomah, Frederick Sheldon

Automated screening of political claims is challenging because many statements are short, underspecified, and labeled with fine-grained truthfulness categories that are difficult to separate using claim text alone. This study examines whether lightweight credibility context, represented by pre-existing speaker fact-checking history counts, can improve transformer-based political claim classification without external evidence retrieval. We evaluate the LIAR benchmark under both its native six-class formulation and a coarser three-class mapping using two strong pretrained encoders, DeBERTa-v3-large and RoBERTa-large. To isolate the effect of credibility context, we vary only one input factor: whether the five LIAR speaker-history counts (pants-fire, false, barely-true, half-true, mostly-true) are appended to the claim and standard metadata as structured text (Hist ON) or omitted (Hist OFF), while keeping the data split, model architecture, training pipeline, and evaluation protocol fixed. All experiments are repeated across five random seeds (7, 13, 42, 123, 2024) and reported as mean ± standard deviation. On LIAR, Hist ON improves macro-F1 and accuracy across both backbones and both label granularities, with the largest gains in the six-class setting where label ambiguity is highest. In the six-class task, macro-F1 increases from 0.3096 to 0.4773 for DeBERTa-v3-large and from 0.3127 to 0.4855 for RoBERTa-large. In the three-class task, the best Hist ON model reaches 0.6241 macro-F1. Because only five seeds are available, the minimum achievable two-sided paired Wilcoxon p-value is 0.0625; therefore, we do not claim conventional α=0.05 statistical significance and instead report paired mean differences, seed-level gains, and complementary prediction-level reliability analyses from saved test predictions. 
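The 0.0625 floor on the paired Wilcoxon p-value follows from counting sign patterns: with five non-zero, untied paired differences there are 2^5 = 32 equally likely sign assignments under the null hypothesis, and only one of them yields the most extreme statistic in each tail. The following is a minimal sketch of that arithmetic by exhaustive enumeration. The function name `min_two_sided_wilcoxon_p` is illustrative and not taken from the paper, and the sketch assumes no zero or tied differences.

```python
from itertools import product


def min_two_sided_wilcoxon_p(n: int) -> float:
    """Smallest exact two-sided p-value of a paired Wilcoxon signed-rank
    test with n non-zero, untied differences (illustrative helper)."""
    ranks = range(1, n + 1)
    max_stat = sum(ranks)  # W+ when every paired difference is positive

    # Under the null hypothesis each rank carries a + or - sign with equal
    # probability, so all 2**n sign patterns are equally likely.
    stats = [
        sum(r for r, positive in zip(ranks, signs) if positive)
        for signs in product((False, True), repeat=n)
    ]

    # One-sided tail: share of sign patterns at least as extreme as max_stat.
    p_one_sided = sum(1 for w in stats if w >= max_stat) / len(stats)
    return min(1.0, 2 * p_one_sided)


print(min_two_sided_wilcoxon_p(5))  # 0.0625
```

With five seeds, even a gain that is positive on every seed cannot fall below 2/32 = 0.0625, which is why the abstract reports paired differences rather than claiming significance at α=0.05.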
To assess whether the benefit is limited to the original LIAR test split, we further evaluate the LIAR-trained checkpoints directly on the full non-overlapping LIAR2 NEW test split without additional fine-tuning. This LIAR-to-LIAR2 NEW transfer evaluation shows that Hist ON improves macro-F1 over Hist OFF in all four backbone/granularity settings. The best absolute transferred macro-F1 is achieved by the three-class DeBERTa-v3-large setting (0.6462), whereas the largest Hist ON minus Hist OFF gain occurs in the six-class RoBERTa-large setting (+0.1433 macro-F1). These two values describe different quantities: absolute performance and improvement over the corresponding Hist OFF baseline. We frame the task as political claim classification, or screening, rather than evidence-grounded fact verification: the method uses speaker-level credibility priors and does not retrieve external evidence. The results support speaker-history credibility context as a low-cost, model-agnostic signal for improving claim screening, while the LIAR2 NEW findings should be interpreted as related-benchmark robustness rather than universal out-of-domain generalization.
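The Hist ON versus Hist OFF contrast can be pictured as a single switch in how the encoder input string is assembled. The sketch below is an assumption-laden illustration, not the authors' serialization: the function `build_input`, the `key: value` template, the ` | ` separator, and the example values are all hypothetical. Only the five history field names come from the abstract.

```python
# The five LIAR speaker-history counts named in the abstract.
HISTORY_FIELDS = ("pants-fire", "false", "barely-true", "half-true", "mostly-true")


def build_input(claim, metadata, history=None):
    """Serialize a claim for the encoder (hypothetical template).

    history=None corresponds to Hist OFF; passing a dict of the five
    counts corresponds to Hist ON. Everything else stays identical.
    """
    parts = [f"claim: {claim}"]
    parts += [f"{key}: {value}" for key, value in metadata.items()]
    if history is not None:
        counts = ", ".join(f"{name}={history[name]}" for name in HISTORY_FIELDS)
        parts.append(f"history: {counts}")
    return " | ".join(parts)


# Made-up example values, for illustration only.
claim = "The state budget doubled in two years."
metadata = {"speaker": "jane-doe", "party": "independent"}
history = {"pants-fire": 1, "false": 4, "barely-true": 3, "half-true": 6, "mostly-true": 5}

hist_off = build_input(claim, metadata)
hist_on = build_input(claim, metadata, history)
print(hist_off)
print(hist_on)
```

Because the Hist ON string is the Hist OFF string plus one appended field, any difference in downstream metrics can be attributed to the history counts alone, which mirrors the single-factor design described in the abstract.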
