DOI: 10.1145/3828660 ISSN: 0360-0300

A Comprehensive Survey of Deep Learning for Entity Resolution

Dimitrios Karapiperis, Christos Tjortjis, Vassilios Verykios

Entity Resolution (ER) is a fundamental data integration task aimed at identifying records that refer to the same real-world entity. While traditional methods relied on brittle, handcrafted features, the recent shift to Deep Learning (DL) enables automatic feature learning via semantic embeddings, dramatically improving performance, particularly on noisy and unstructured textual data. This survey provides a structured overview of this rapidly advancing field. We chart the evolution of the embedding models that underpin modern ER, from static word vectors to context-aware Transformers. Following the canonical blocking and matching pipeline, we analyze state-of-the-art DL techniques for both stages, categorizing them by their learning paradigms and architectures. We also examine the emerging frontier of using Large Language Models (LLMs) in few-shot and chain-of-thought settings. Finally, we synthesize our findings, evaluate the limitations and inherent difficulty of existing benchmark datasets, discuss critical challenges like fairness and explainability, and outline key directions for future research.
