DOI: 10.1145/3716855 ISSN: 2157-6904

Do Large Language Models have Spatial Cognitive Abilities?

Ruoling Wu, Danhuai Guo

Since the emergence of large language models, they have attracted significant attention from scholars and industry professionals alike. An important question that arises alongside this phenomenon is whether these models possess cognitive capabilities akin to those of humans. Spatial cognition, a crucial aspect of human cognition, serves as a fundamental metric in this evaluation. This study explores two central themes: first, whether large language models possess spatial cognitive capabilities; and second, which prompt methods elicit better responses concerning spatial cognition, in terms of both result stability and accuracy. We design a series of experiments over 24 typical spatial scenes to assess whether eight popular large language models exhibit spatial cognition and to measure their level of spatial cognitive proficiency. We then discuss strategies for enhancing the spatial cognition performance of large language models to bring them closer to human cognitive levels. Without additional prompts, the average accuracy of the eight models in judging the three basic spatial relations (topological, direction, and distance relations) is 33.25%. After prompt optimization, the accuracy improves significantly, reaching 53.90%. Our methodological approach enables us to systematically assess and compare these models, shedding light on their diverse capabilities in this domain. The benchmark is available at https://github.com/LLING000/SCABenchmark.
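The abstract does not reproduce the evaluation harness itself; the following is a minimal sketch of the kind of loop it describes, posing a spatial-relation question per scene and scoring accuracy with and without an extra guiding prompt. The scene questions, the prompt prefix, and the `query_model` stub are hypothetical placeholders, not the released SCABenchmark code.

```python
# Illustrative sketch only: example spatial-relation questions with ground-truth
# answers, a placeholder model call, and an accuracy computation.

SCENES = [
    # (question, ground-truth answer) -- invented examples for illustration
    ("A lake lies entirely inside a national park. "
     "What is the topological relation of the lake to the park?", "within"),
    ("City B is due north of City A. "
     "In which direction is City A from City B?", "south"),
    ("Point P is 3 km from Q and 12 km from R. "
     "Which point is nearer to P: Q or R?", "q"),
]

def query_model(prompt: str) -> str:
    """Placeholder for a call to a hosted LLM; returns the model's text reply."""
    raise NotImplementedError("plug in an actual LLM client here")

def evaluate(prompt_prefix: str = "") -> float:
    """Ask every scene question (optionally prefixed by a prompt) and score it."""
    correct = 0
    for question, answer in SCENES:
        reply = query_model(prompt_prefix + question)
        correct += int(answer.lower() in reply.lower())
    return correct / len(SCENES)

if __name__ == "__main__":
    baseline = evaluate()
    guided = evaluate("Reason step by step about the spatial relation, then answer. ")
    print(f"baseline accuracy: {baseline:.2%}, with prompt: {guided:.2%}")
```

A fuller version would repeat each query to gauge result stability and parse answers more robustly than simple substring matching; the released benchmark repository should be treated as the authoritative reference.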
