The Labeling Dilemma: Study of Clickbait Datasets and their Methodologies

DOI: 10.2478/fcds-2026-0008 ISSN: 2300-3405

Avinash Shrivastava, Anamika Gupta, Aayush Arora, Anjali Tomar

Abstract

In today’s digital ecosystem, attention-driven content platforms dominate and promote sensationalism over informational quality. These platforms use various means to manipulate users, and clickbait is among them. Clickbait typically uses misleading or exaggerated headlines to lure people into clicking a link, leaving them dissatisfied when the promised content is not delivered. The aim is to drive user engagement, either by routing users to pages filled with advertisements that generate revenue or simply by spreading misinformation. This necessitates the development of automatic clickbait detection models. This article serves as a systematic review of the work done in this domain, focusing on two important areas: existing datasets and their labeling techniques, and the evolution of clickbait detection models from machine learning (ML) to deep learning (DL) to more recent pre-trained language models such as BERT and RoBERTa. This paper aims to serve academic researchers and industry professionals seeking an overview of clickbait detection methods, with particular emphasis on ground-truth dataset generation and labeling strategies.
