DOI: 10.1162/imag.a.1299 ISSN: 2837-6056

A Modular Semantic-Structural Pipeline for Visual Decoding from Primate Spiking Data via Selective Temporal Integration

Matteo Ciferri, Matteo Ferrante, Nicola Toschi

Abstract

Characterizing the information content of intracortical signals during visual processing is a central challenge in systems neuroscience. We address the problem of decoding visual information from high-density intracortical recordings in primates, using the THINGS Ventral Stream Spiking Dataset. We systematically evaluate the effects of model architecture, training objectives, and data scaling on decoding performance. Results show that decoding accuracy is jointly driven by non-linearity and selective temporal aggregation, rather than heavier sequence modelling in this data regime. A simple model combining temporal attention with a shallow MLP achieves up to 70% top-1 image retrieval accuracy, outperforming linear baselines as well as recurrent and convolutional approaches. Scaling analyses reveal predictable diminishing returns with increasing input dimensionality and dataset size. Building on these findings, we design a modular generative decoding pipeline that combines low-resolution latent reconstruction with semantically conditioned diffusion, generating plausible images from 200 ms of brain activity. This framework provides principles for brain-computer interfaces and semantic neural decoding.
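As an illustration of the "temporal attention with a shallow MLP" decoder and the top-1 retrieval metric mentioned above, the following is a minimal NumPy sketch and not the authors' implementation. The function names, the linear per-time-bin scoring rule, the ReLU hidden layer, the cosine-similarity retrieval, and all array sizes (20 time bins, 64 channels, 32-dimensional embeddings) are assumptions chosen for illustration; the weights are random and untrained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_pool(spikes, w_score):
    # spikes: (batch, time, channels) binned activity; w_score: (channels,).
    # Assumed scoring rule: one linear score per time bin.
    scores = spikes @ w_score                # (batch, time)
    weights = softmax(scores, axis=1)        # attention over time bins
    pooled = np.einsum("bt,btc->bc", weights, spikes)  # weighted sum over time
    return pooled, weights

def shallow_mlp(x, w1, b1, w2, b2):
    # One hidden layer with ReLU (assumed non-linearity).
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def top1_retrieval_accuracy(pred, target):
    # Row i of pred is counted as correct if its most similar target
    # (by cosine similarity) is row i of target.
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targ_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = pred_n @ targ_n.T                  # (batch, batch)
    return float((sim.argmax(axis=1) == np.arange(len(pred))).mean())

# Hypothetical sizes: 16 trials, 20 time bins, 64 channels, 32-dim embeddings.
rng = np.random.default_rng(0)
spikes = rng.normal(size=(16, 20, 64))
image_embeddings = rng.normal(size=(16, 32))

pooled, attn = temporal_attention_pool(spikes, rng.normal(size=64))
pred = shallow_mlp(
    pooled,
    rng.normal(size=(64, 128)) * 0.1, np.zeros(128),
    rng.normal(size=(128, 32)) * 0.1, np.zeros(32),
)
print("top-1 retrieval accuracy (untrained):", top1_retrieval_accuracy(pred, image_embeddings))
```

With untrained weights the printed accuracy is expected to sit near chance; the sketch only shows how selective temporal weighting, a non-linear readout, and retrieval scoring fit together.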
