A Comparative Study of PEGASUS, BART, and T5 for Text Summarization Across Diverse Datasets
Eman Daraghmi, Lour Atwe, Areej JaberThis study aims to conduct a comprehensive comparative evaluation of three transformer-based models, PEGASUS, BART, and T5 variants (SMALL and BASE), for the task of abstractive text summarization. The evaluation spans across three benchmark datasets: CNN/DailyMail (long-form news articles), Xsum (extreme single-sentence summaries of BBC articles), and Samsum (conversational dialogues). Each dataset presents unique challenges in terms of length, style, and domain, enabling a robust assessment of the models’ capabilities. All models were fine-tuned under controlled experimental settings using filtered and preprocessed subsets, with token length limits applied to maintain consistency and prevent truncation. The evaluation leveraged ROUGE-1, ROUGE-2, and ROUGE-L scores to measure summary quality, while efficiency metrics such as training time were also considered. An additional qualitative assessment was conducted through expert human evaluation of fluency, relevance, and conciseness. Results indicate that PEGASUS achieved the highest ROUGE scores on CNN/DailyMail, BART excelled in Xsum and Samsum, while T5 models, particularly T5-Base, narrowed the performance gap with larger models while still offering efficiency advantages compared to PEGASUS and BART. These findings highlight the trade-offs between model performance and computational efficiency, offering practical insights into model scaling—where T5-Small favors lightweight efficiency and T5-Base provides stronger accuracy without excessive resource demands.