Redefining AI for Caribbean Creole: a framework for text and speech recognition
Christon Nunes, Matthew Mohammed, Keston Smith, Akash Pooransingh, Daniel Joseph Ringis
Purpose
Large language models handle most high-resource languages well. However, low-resource languages, such as Trinidadian English Creole, are not recognized as reliably, whether as input or as output. This typically stems from a lack of available labelled data. This study discusses how local data combined with augmented data can improve the recognition and translation of Trinidadian English Creole by AI systems.
Design/methodology/approach
We propose a unique framework in which a small set of localized sample data was used to generate a larger dataset suitable for training. This helped address the data availability problem often seen with languages such as Trinidadian English Creole. The generated dataset was then manually verified or fine-tuned before being used to retrain chatbots.
Findings
Using the fine-tuned AI system, translation accuracy on the evaluation set rose from a BLEU score of 59% to 75%, with a response time of 1 s. Subjective testing further supports this result. These results were obtained with 3000 sentences of training data, and more data is expected to improve them further.
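For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on a 0 to 100 scale can be computed. The abstract does not state which BLEU implementation, tokenization or smoothing was used, so everything here is an assumption: whitespace tokenization, one reference per sentence, n-grams up to length 4, no smoothing, and a hypothetical function name `corpus_bleu`. The example sentences are placeholders, not data from the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis.

    Assumes whitespace tokenization; this is an illustrative sketch,
    not the evaluation code used in the study.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Placeholder sentences for illustration only.
reference = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], reference))
print(corpus_bleu(["the cat sat on the mat today"], reference))
```

An exact match scores 100 and a hypothesis with no overlapping 4-gram scores 0 under this unsmoothed variant. Reported scores depend on tokenization and smoothing choices, so figures from different implementations are not directly comparable.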
Originality/value
The framework presented is transferable and can be applied to any low-resource language. This is of great benefit to the wide range of Creole languages in the Caribbean region and underscores the importance of creating natural language processing tools that serve linguistic minority communities. Future work can expand this approach to include speech-based systems, code-switched data and regional Creole variants, while continuing to engage native speaker communities in a participatory design process.