DOI: 10.3390/math14122227 ISSN: 2227-7390

From Wikidata to Smart Tourism: A Reproducible Pipeline Based on AI and Fuzzy Logic for Interpretable Multi-Category Classification of Points of Interest

Aristea Kontogianni, Konstantina Chrysafiadi, Maria Virvou, Efthimios Alepis

Wikidata provides extensive coverage of tourism-related Points of Interest (POIs), yet its heterogeneous type system and uneven metadata limit its direct use in smart tourism applications. This paper presents an end-to-end pipeline that transforms Wikidata POIs into a compact and interpretable tourism-oriented representation supporting multi-category assignments. We collect POIs from six countries—Greece, Italy, Spain, Norway, Sweden, and Denmark—and construct a dataset that integrates core identifiers with textual descriptions, type information, heritage indicators, geographic coordinates, and Wikipedia sitelinks. We introduce an eight-category tourism taxonomy capturing key themes, including cultural venues, archaeological and historic sites, monuments, fortifications, religious sites, protected areas, natural features, and coastal or water locations. As a reproducible baseline, category likelihoods are estimated using sentence embeddings and similarity to category anchor descriptions, producing a probability vector for each POI. Building on this baseline, we propose a fuzzy inference layer that integrates embedding-based probabilities with structured Wikidata signals to generate interpretable membership degrees across categories and enable principled multi-category classification. This fusion is particularly valuable for smart tourism applications, as it supports robust faceted exploration and personalized recommendations (e.g., “historic + coastal”), while providing evidence-based explanations that enhance user trust and facilitate curator oversight when POI metadata is sparse or ambiguous. The resulting pipeline produces ranked POI catalogs by country and category, country-level tourism profiles, and diagnostic views for examining uncertain cases. The approach is fully reproducible and readily adaptable to other geographic regions or domain taxonomies.
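As a rough illustration of the two stages described in the abstract, the sketch below scores a POI against category anchors by cosine similarity, converts the scores to a probability vector with a softmax, and then combines each probability with a structured signal to obtain a membership degree per category. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy three-dimensional vectors stand in for real sentence embeddings, only three of the eight categories are shown, and the softmax temperature, the max-based fusion rule, the signal weight, and the assignment threshold are all hypothetical choices made for this example.

```python
import math

# Three of the eight taxonomy categories, for brevity.
CATEGORIES = ["cultural venue", "historic site", "coastal or water location"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=0.1):
    """Turn similarity scores into a probability vector (temperature is assumed)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_probabilities(poi_vec, anchor_vecs, temperature=0.1):
    """Baseline step: similarity of the POI to each category anchor, then softmax."""
    return softmax([cosine(poi_vec, a) for a in anchor_vecs], temperature)


def fuse(probs, signals, weight=0.8):
    """Hypothetical fuzzy fusion: a fuzzy OR (max) of the embedding probability
    and a down-weighted structured signal in [0, 1]."""
    return [max(p, weight * s) for p, s in zip(probs, signals)]


def assign(memberships, threshold=0.5):
    """Multi-category assignment: keep every category at or above the threshold."""
    return [c for c, m in zip(CATEGORIES, memberships) if m >= threshold]


# Toy stand-ins for sentence embeddings of the anchor descriptions and one POI.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poi = [0.1, 0.9, 0.2]

# Assumed structured signal: the POI's type information marks it as coastal.
signals = [0.0, 0.0, 1.0]

probs = category_probabilities(poi, anchors)
memberships = fuse(probs, signals)
assigned = assign(memberships)
print(assigned)
```

In this toy case the text-based probabilities alone favor "historic site", and the structured signal raises the coastal membership above the threshold, so the POI receives both categories. This mirrors the "historic + coastal" style of assignment mentioned in the abstract.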
