Mapping the Movie-Watching Brain with AI-Derived Semantics
Muwei Li
Abstract
Naturalistic paradigms offer a powerful tool to investigate human brain function, but it remains difficult to link rich, continuous movie content to distributed brain activity in an interpretable way. In this study, I use a multimodal large language model (Gemini) as an automated “semantic annotator” to bridge naturalistic movie stimuli, brain responses, and cognitive performance. Using the Human Connectome Project movie-watching dataset, I segmented the movies into 293 overlapping clips and prompted Gemini to rate each clip on 11 psychologically interpretable dimensions. For the same clips, I extracted clip-wise BOLD activation patterns from the fMRI images in 360 cortical ROIs. In this way, the AI and the brain effectively “watch” the same movies in parallel. For each brain ROI, I then fit linear regression models to predict clip-to-clip variation in movie-evoked responses from the 11 Gemini-derived ratings. Gemini-derived features robustly predicted movie-evoked responses in temporal, medial parietal, and lateral frontal association cortex, but explained little variance in unimodal somatosensory, dorsal parietal, insular, and piriform regions. Feature-weight maps reflected known functional specializations, and features with the largest global influence overlapped with the most explainable ROIs. Partial least squares analysis revealed that individual differences in resting-state connectivity strength and semantic explainability covaried along an asymmetric intrinsic axis: strongly integrated sensory-opercular systems at rest were associated with poorer AI predictability, whereas a smaller set of dorsal and medial association regions showed enhanced alignment. Finally, regional AI explainability in medial parietal and left perisylvian association areas was positively related to specific cognitive abilities.
Together, these findings demonstrate that interpretable features from AI models provide a simple and scalable way to quantify AI-derived semantic predictability in naturalistic settings, offering a practical framework for using artificial models as semantic references to probe human neural processing and individual differences.
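The ROI-wise regression step described in the abstract can be illustrated with a minimal sketch. It uses synthetic data with the stated dimensions (293 clips, 11 features, 360 ROIs); the function name `roi_explainability`, the 5-fold contiguous cross-validation, and the use of out-of-sample R² as the "explainability" score are assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np

def roi_explainability(features, responses, n_folds=5):
    """Cross-validated R^2 per ROI from ordinary least squares (hypothetical helper)."""
    n_clips = features.shape[0]
    # Contiguous folds are an assumption: overlapping clips make neighbours
    # dependent, so random folds would leak information between train and test.
    folds = np.array_split(np.arange(n_clips), n_folds)
    predictions = np.zeros_like(responses, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n_clips, dtype=bool)
        train_mask[test_idx] = False
        # Prepend a column of ones so each ROI gets its own intercept.
        X_train = np.column_stack([np.ones(train_mask.sum()), features[train_mask]])
        X_test = np.column_stack([np.ones(len(test_idx)), features[test_idx]])
        # A single lstsq call fits all ROIs at once (one response column per ROI).
        weights, *_ = np.linalg.lstsq(X_train, responses[train_mask], rcond=None)
        predictions[test_idx] = X_test @ weights
    ss_res = ((responses - predictions) ** 2).sum(axis=0)
    ss_tot = ((responses - responses.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for the real data (assumed shapes only).
rng = np.random.default_rng(0)
n_clips, n_features, n_rois = 293, 11, 360
features = rng.normal(size=(n_clips, n_features))       # clip x feature ratings
true_weights = rng.normal(size=(n_features, n_rois))
true_weights[:, n_rois // 2:] = 0.0                      # half the ROIs carry no feature signal
responses = features @ true_weights + rng.normal(size=(n_clips, n_rois))

r2 = roi_explainability(features, responses)             # one score per ROI
```

In this toy setup, ROIs whose responses depend on the features receive high scores, while ROIs with no feature signal score near or below zero, mirroring the contrast the abstract draws between association cortex and unimodal regions.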