Human speech perception is multisensory, integrating auditory information from the talker's voice with visual information from the talker's face. BOLD fMRI studies have implicated the superior temporal gyrus (STG) in processing auditory speech and the superior temporal sulcus (STS) in integrating auditory and visual speech, but as an indirect hemodynamic measure, fMRI is limited in its ability to track the rapid neural computations underlying speech perception. Using stereoelectroencephalography (sEEG) electrodes, we directly recorded from the STG and STS in 42 epilepsy patients (25 F, 17 M). Participants identified single words presented in auditory, visual, and audiovisual formats, with and without added auditory noise. Seeing the talker's face provided a strong perceptual benefit, improving perception of noisy speech in every participant. Neurally, a subpopulation of electrodes concentrated in mid-posterior STG and STS responded to both auditory speech (latency 71 ms) and visual speech (109 ms). Significant multisensory enhancement was observed, especially in the upper bank of the STS: compared with auditory-only speech, the response latency for audiovisual speech was 40% faster and the response amplitude was 18% larger. In contrast, the STG showed neither faster nor larger multisensory responses. Surprisingly, STS response latencies for audiovisual speech were significantly faster than those in the STG (50 ms vs. 64 ms), suggesting a parallel-pathway model in which the STG plays the primary role in auditory-only speech perception, while the STS takes the lead in audiovisual speech perception. Together with fMRI, sEEG provides converging evidence that the STS plays a key role in multisensory integration.

One of the most important functions of the human brain is to communicate with others. During conversation, humans take advantage of visual information from the face of the talker as well as auditory information from the voice of the talker. We directly recorded activity from the brains of epilepsy patients implanted with electrodes in the superior temporal sulcus (STS), a key brain region for speech perception. These recordings showed that hearing the voice and seeing the face of the talker evoked larger and faster neural responses in the STS than the talker's voice alone. Multisensory enhancement in the STS may be the neural basis for our ability to better understand noisy speech when we can see the face of the talker.
DOI: http://dx.doi.org/10.1523/JNEUROSCI.1037-25.2025
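The enhancement percentages quoted in the abstract are simple relative comparisons between audiovisual and auditory-only responses. A minimal sketch of that arithmetic in Python, using hypothetical response values chosen only to reproduce the percentages reported above (not the study's data):

```python
# Toy illustration of the enhancement metrics quoted in the abstract.
# All input values below are hypothetical placeholders, not study data.

def latency_speedup(lat_auditory_ms: float, lat_audiovisual_ms: float) -> float:
    """Percent reduction in response latency, AV relative to auditory-only."""
    return 100.0 * (lat_auditory_ms - lat_audiovisual_ms) / lat_auditory_ms

def amplitude_enhancement(amp_auditory: float, amp_audiovisual: float) -> float:
    """Percent increase in response amplitude, AV relative to auditory-only."""
    return 100.0 * (amp_audiovisual - amp_auditory) / amp_auditory

# Hypothetical STS electrode: AV latency ~40% faster, AV amplitude ~18% larger.
print(f"{latency_speedup(83.0, 50.0):.0f}% faster")        # -> 40% faster
print(f"{amplitude_enhancement(1.00, 1.18):.0f}% larger")  # -> 18% larger
```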
J Child Lang
September 2025
Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada.
A growing literature explores the representational detail of infants' early lexical representations, but no study has investigated how exposure to real-life acoustic-phonetic variation impacts these representations. Indeed, previous experimental work with young infants has largely ignored the impact of accent exposure on lexical development. We ask how routine exposure to accent variation affects 6-month-olds' ability to detect mispronunciations.
Prog Neurobiol
September 2025
The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States; Elmezzi Graduate School of Molecular Medicine at Northwell Health, Manhasset, NY, United States; Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, United States; Tr
Humans live in an environment that contains rich auditory stimuli, which must be processed efficiently. The entrainment of neural oscillations to acoustic inputs may support the processing of simple and complex sounds. However, the characteristics of this entrainment process have been shown to be inconsistent across species and experimental paradigms.
Ann N Y Acad Sci
September 2025
BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain.
Neural tracking, the alignment of brain activity with the temporal dynamics of sensory input, is a crucial mechanism underlying perception, attention, and cognition. While this concept has gained prominence in research on speech, music, and visual processing, its definition and methodological approaches remain heterogeneous. This paper critically examines neural tracking from both theoretical and methodological perspectives, highlighting how its interpretation varies across studies.
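As an entirely illustrative example of one common operationalization the term covers, neural tracking is often quantified as the lagged correlation between a stimulus envelope and the recorded neural signal. A minimal simulated sketch, not taken from the paper; the signals, sampling rate, delay, and lag range are all made up:

```python
import numpy as np

# Toy sketch of neural tracking as the lagged correlation between a
# stimulus envelope and a neural response. Everything here is simulated.

rng = np.random.default_rng(0)
fs = 100                                   # sampling rate in Hz
n = 10 * fs                                # 10 s of data

# Crude stand-in for a speech envelope: smoothed white noise.
envelope = np.convolve(rng.standard_normal(n), np.ones(10) / 10, mode="same")

# Simulated neural signal: the envelope delayed by 120 ms, plus noise.
delay = int(0.12 * fs)
neural = np.roll(envelope, delay) + 0.5 * rng.standard_normal(n)

# Correlate envelope and response across lags; the peak lag estimates
# how strongly (and with what latency) the brain tracks the stimulus.
lags = range(0, int(0.3 * fs))
r = [np.corrcoef(envelope[: n - L], neural[L:])[0, 1] for L in lags]
best = max(lags, key=lambda L: r[L])
print(f"peak tracking r={r[best]:.2f} at lag {1000 * best / fs:.0f} ms")
```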
Cereb Cortex
August 2025
Department of Psychology, University of Lübeck, Ratzeburger Allee 160, Lübeck 23562, Germany.
The human auditory system must distinguish relevant sounds from noise. Severe hearing loss can be treated with cochlear implants (CIs), but how the brain adapts to electrical hearing remains unclear. This study examined adaptation to unilateral CI use in the first and seventh months after CI activation using speech comprehension measures and electroencephalography recordings, both during passive listening and an active spatial listening task.