Reconstructing visual stimuli and captions from brain activity offers a unique window into how perception represents the external world in neural dynamics. Although deep generative models have made notable progress in this field in recent years, generating images and captions that are both detailed and semantically consistent remains a major challenge. In this paper, we propose BrainImager, a novel framework for reconstructing visual stimuli and captions from functional magnetic resonance imaging (fMRI). We are the first to introduce panoptic segmentation and generative semantics into this setting, providing richer multi-level data support and a new perspective on brain-signal decoding. Through multi-scale fusion, we effectively combine the pixel-level features of natural images with the structural features of panoptic segmentation to construct a state-of-the-art "initial guess". At the same time, we examine, from a neuroscientific perspective, how textual and visual semantics relate to the human visual pathway. Building on this neural paradigm, we propose a new semantic-connection strategy to guide image reconstruction. In addition, by carefully aligning visual semantics to the compressed embedding space of a language model, and further leveraging the comprehension abilities of our retrieval module and a large language model (LLM), we generate high-quality brain captions. Qualitative and quantitative experiments show that BrainImager outperforms current methods on both image reconstruction and brain captioning tasks.
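To make the multi-scale fusion step concrete, the sketch below (PyTorch) shows one plausible way to combine per-scale pixel features of a natural image with the structural features of a panoptic segmentation into a single fused map that could serve as an "initial guess". All module names, channel counts, and the fusion design here are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of multi-scale fusion between pixel features and panoptic
# segmentation features. Shapes, names, and the fusion design are assumed
# for illustration; the paper's architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuses pixel and segmentation feature pyramids into one feature map."""

    def __init__(self, channels=(64, 128, 256), out_channels=64):
        super().__init__()
        # One 1x1 conv per scale to merge the two concatenated streams.
        self.merge = nn.ModuleList(
            nn.Conv2d(2 * c, out_channels, kernel_size=1) for c in channels
        )
        # Final 3x3 conv over all upsampled scales stacked channel-wise.
        self.head = nn.Conv2d(out_channels * len(channels), out_channels,
                              kernel_size=3, padding=1)

    def forward(self, pixel_feats, seg_feats):
        # pixel_feats / seg_feats: lists of tensors, one per scale,
        # each of shape (B, C_i, H_i, W_i), ordered coarse to fine.
        target_size = pixel_feats[-1].shape[-2:]  # finest resolution
        fused = []
        for merge, p, s in zip(self.merge, pixel_feats, seg_feats):
            x = merge(torch.cat([p, s], dim=1))   # channel-wise fusion
            x = F.interpolate(x, size=target_size,
                              mode="bilinear", align_corners=False)
            fused.append(x)
        return self.head(torch.cat(fused, dim=1))  # (B, out_channels, H, W)

if __name__ == "__main__":
    scales = [(64, 32), (128, 64), (256, 128)]  # (channels, spatial size)
    px = [torch.randn(1, c, h, h) for c, h in scales]
    sg = [torch.randn(1, c, h, h) for c, h in scales]
    print(MultiScaleFusion()(px, sg).shape)  # torch.Size([1, 64, 128, 128])
```

The design choice sketched here, merging each scale with a 1x1 convolution before upsampling everything to the finest resolution, is only one standard pyramid-fusion pattern; the key idea from the abstract is simply that pixel appearance and panoptic structure are combined at multiple scales before reconstruction.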