Automatic Paraphrase Generation at Phrasal, and Sentence Level for Urdu Language: Data and Methods
Zara Khan, Iqra Muneer, Rao Muhammad Adeel Nawab, Ahmad MahmoodParaphrasing involves rewording a text to maintain its meaning while using different language. In recent years, there has been growing interest among researchers in automatic paraphrase generation (APG). Previous studies have primarily focused on developing corpora and methods for APG tasks in English and other languages. However, there is a lack of comprehensive benchmark corpora and standardized methods specifically designed for APG in Urdu. To address this gap, this study introduces two extensive benchmark corpora: the Urdu Phrasal Paraphrase corpus (UPP-22 corpus) at the phrasal level and the Urdu Sentential Paraphrase corpus (USP-22 corpus) at the sentence level. The UPP-22 corpus contains 3,50,000 paraphrased pairs, while the USP-22 corpus consists of 11,000 paraphrased pairs, both specifically curated for the Urdu APG task. As a second major contribution, this research developed and applied various state-of-the-art deep learning models, including sequence-to-sequence models (using long short-term memory [LSTM], gated recurrent unit [GRU], bidirectional LSTM, and bidirectional GRU) and sequence-to-sequence models with attention mechanisms as baseline methods for the proposed Urdu Paraphrase corpora. Additionally, the study introduced a bidirectional and auto-regressive transformer-based model specifically tailored to these corpora as a third contribution. A fourth significant contribution was the development of a test corpus at the phrasal level, consisting of 10,000 instances, created through automatic translation, followed by manual inspection and correction, to evaluate the performance of APG tasks for the Urdu language. As a fifth major contribution, the study applied and fine-tuned a large language model (GPT-4-Mini) for APG in Urdu, resulting in significant performance improvements. The evaluation was carried out using the standard bilingual evaluation understudy (BLEU) metric. The best results were achieved with BLEU-1 = 75.89 for the UPP-22 corpus and BLEU-1 = 75.16 for the USP-22 corpus using the GPT-4-Mini model, representing a significant improvement over all other models including baseline. These corpora and methodologies will be made publicly available to encourage and promote further research in APG for under-resourced languages such as Urdu.