DOI: 10.35377/saucis...1829206 ISSN: 2636-8129

Feature Extraction for Real Estate Images and Titles with LLMs

Afra Arslan, Tan Doruk Yetki, Arda Yücel, Hacer Turgut, Ömür Bali, Gülfem Işıklar Alptekin, Günce Keziban Orman
Images and titles often carry rich latent information about the objects they describe, particularly on web-based platforms. Real estate websites are a clear example: listing images and titles convey important details that help users make decisions. However, these unstructured elements cannot be used directly in downstream machine learning tasks, since their contextual meaning is not readily interpretable. This work aims to transform listing images and titles into structured, tabular representations suitable for analytical and predictive modeling. To this end, we propose a modular framework based on state-of-the-art large language models. The framework incorporates ReAct, LLM-as-a-Judge, and few-shot prompting techniques. Its performance is evaluated on a real-world real estate dataset and compared with BERT- and CLIP-based baselines. Experimental results demonstrate that our framework achieves up to a 44.26% improvement in recall for listing attributes such as the presence of a balcony or the furnishing status of a property.
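
To make the idea of turning a listing title into a tabular row concrete, the following is a minimal sketch of few-shot prompting followed by parsing of a JSON reply. It is an illustration under stated assumptions, not the authors' implementation: the attribute names `has_balcony` and `is_furnished`, the example titles, the prompt wording, and the helper functions `build_prompt` and `parse_row` are all hypothetical. No model is called; the reply string is supplied by the caller.

```python
import json

# Hypothetical schema: the abstract names only balcony presence and
# furnishing status as example attributes.
ATTRIBUTES = ("has_balcony", "is_furnished")

# Hypothetical few-shot examples, invented for illustration only.
FEW_SHOT = [
    ("Furnished 2+1 flat with balcony near metro",
     {"has_balcony": True, "is_furnished": True}),
    ("Empty studio apartment, city center",
     {"has_balcony": None, "is_furnished": False}),
]


def build_prompt(title: str) -> str:
    """Assemble a few-shot prompt that ends where the model should answer."""
    lines = [
        "Extract the attributes " + ", ".join(ATTRIBUTES)
        + " from the listing title.",
        "Answer with one JSON object; use null when the title does not say.",
    ]
    for example_title, labels in FEW_SHOT:
        lines.append(f"Title: {example_title}")
        lines.append(f"JSON: {json.dumps(labels)}")
    lines.append(f"Title: {title}")
    lines.append("JSON:")
    return "\n".join(lines)


def parse_row(title: str, model_reply: str) -> dict:
    """Turn a JSON reply into one flat row; unusable values become None."""
    try:
        parsed = json.loads(model_reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    row = {"title": title}
    for name in ATTRIBUTES:
        value = parsed.get(name)
        # Keep only strict booleans so every column has a consistent type.
        row[name] = value if isinstance(value, bool) else None
    return row
```

In this sketch, the string returned by `build_prompt` would be sent to a language model of the reader's choice, and the reply passed to `parse_row`. Rows produced this way can be collected into a table for downstream modeling. A judging step, such as the LLM-as-a-Judge technique the abstract mentions, would sit between the two calls and is not shown here.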
