Neuroprosthetics breakthrough: brain implant enables real-time speech synthesis for ALS patient

Researchers at UC Davis have demonstrated the first brain-computer interface that directly translates neural activity into synthesised speech in real time, allowing a man with severe ALS to communicate naturally, with immediate audio feedback, rather than relying on slower text-based approaches.

The BCI study participant, a 45-year-old man with amyotrophic lateral sclerosis (ALS) and severe dysarthria, is connected to the BCI system by one of the researchers, C Iacobacci of the Department of Neurological Surgery, University of California, Davis. © UC Regents

Revolutionary approach transforms communication prospects

Scientists at the University of California, Davis, have achieved a breakthrough in speech neuroprosthetics by developing the first system capable of instantaneously translating brain signals into synthesised voice. The research, published in Nature on 12 June 2025, represents a significant advance over existing brain-computer interfaces that convert neural activity to text, offering the potential for more natural, real-time conversation.

The study participant, a 45-year-old man with amyotrophic lateral sclerosis (ALS) and severe dysarthria, successfully used the system to communicate with his family in real time, modulate his intonation to ask questions, and even sing simple melodies. The technology enabled him to interrupt conversations naturally and express emphasis through vocal modulation – capabilities that text-based systems cannot provide.

“Translating neural activity into text, which is how our previous speech brain-computer interface works, is akin to text messaging. It’s a big improvement compared to standard assistive technologies, but it still leads to delayed conversation. By comparison, this new real-time voice synthesis is more like a voice call,” said Sergey Stavisky, senior author and assistant professor in the UC Davis Department of Neurological Surgery.

Researcher Maitreyee Wairagkar operates the BCI system © UC Regents

Overcoming technical challenges

The research team faced substantial technical hurdles, particularly the absence of ground-truth speech data from the participant, who could no longer produce intelligible speech. To address this challenge, the researchers developed an innovative approach that generated synthetic target speech waveforms from text cues and time-aligned these with neural activity to estimate intended speech patterns.
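For readers who want a concrete picture of that alignment step, the sketch below shows one way such targets could be constructed in Python. The `tts` callable, the function name and the crude uniform resampling are placeholders for illustration, not the team's actual pipeline.

```python
# Illustrative sketch only: building time-aligned acoustic targets when no
# ground-truth speech exists. The `tts` call and the uniform time-warping are
# assumptions made for this example, not the study's published method.
import numpy as np

def make_training_targets(cue_text, neural_features, tts, frame_ms=10, sr=22050):
    """Synthesise a reference waveform for the cued sentence and stretch it to
    span the same duration as the recorded neural activity, so every neural
    frame receives an acoustic target for supervised training."""
    wav = tts(cue_text, sr)                     # hypothetical text-to-speech call
    n_frames = len(neural_features)             # neural frames, one per 10 ms
    samples_per_frame = sr * frame_ms // 1000   # e.g. 220 samples at 22.05 kHz
    target_len = n_frames * samples_per_frame
    idx = np.linspace(0, len(wav) - 1, target_len).astype(int)
    aligned = wav[idx]                          # uniformly time-warped waveform
    # One audio chunk per neural frame; in practice spectral features would be
    # computed from each chunk and used as the decoder's regression target.
    return aligned.reshape(n_frames, samples_per_frame)
```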

Four microelectrode arrays with a total of 256 electrodes were implanted into the participant’s ventral precentral gyrus, capturing neural activity that was processed through a sophisticated pipeline. The system extracted neural features within one millisecond and decoded these signals using a multilayer Transformer-based model to predict acoustic speech features every 10 milliseconds.
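A minimal PyTorch sketch of a causal, frame-by-frame decoder of this kind is shown below. The layer sizes, attention heads and output dimensionality are illustrative guesses rather than the published architecture.

```python
# Conceptual sketch, not the authors' model: a causal Transformer mapping
# 256-channel neural features to acoustic features, one frame per 10 ms.
import torch
import torch.nn as nn

class BrainToVoiceDecoder(nn.Module):
    def __init__(self, n_channels=256, n_acoustic=80, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.readout = nn.Linear(d_model, n_acoustic)

    def forward(self, x):                        # x: (batch, time, channels)
        causal = nn.Transformer.generate_square_subsequent_mask(x.shape[1])
        h = self.encoder(self.embed(x), mask=causal)  # no access to future frames
        return self.readout(h)                   # (batch, time, acoustic features)

decoder = BrainToVoiceDecoder()
frames = torch.randn(1, 50, 256)                 # 50 frames = 0.5 s of neural data
acoustics = decoder(frames)                      # one acoustic frame per 10 ms
```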

The entire processing chain, from neural signal acquisition to speech synthesis, occurred within 10 milliseconds – comparable to the natural delay between speaking and hearing one’s own voice. This near-instantaneous response enabled continuous closed-loop audio feedback, allowing the participant to hear his synthesised voice as he attempted to speak.
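Conceptually, one step of that closed loop looks like the short sketch below, where `acquire`, `decode`, `vocoder` and `play` are hypothetical stand-ins for the real acquisition, decoding and audio hardware.

```python
# Conceptual sketch of one streaming step of a closed-loop voice neuroprosthesis.
import time

FRAME_MS = 10  # one acoustic frame is produced for every 10 ms of neural data

def closed_loop_step(acquire, decode, vocoder, play):
    t0 = time.perf_counter()
    features = acquire()          # latest window of neural features
    acoustic = decode(features)   # predicted acoustic frame
    audio = vocoder(acoustic)     # ~10 ms of synthesised waveform
    play(audio)                   # immediate auditory feedback to the participant
    return (time.perf_counter() - t0) * 1000  # must stay below FRAME_MS to keep up
```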

David Brandman, Department of Neurological Surgery, University of California, Davis © UC Regents

Maitreyee Wairagkar, Department of Neurological Surgery, University of California, Davis © UC Regents

Sergey Stavisky, Department of Neurological Surgery, University of California, Davis © UC Regents

Remarkable performance outcomes

The system’s effectiveness was demonstrated in comprehensive evaluation studies with human listeners. In transcript-matching tests, listeners achieved a mean accuracy of 94.34% when identifying synthesised sentences from multiple-choice options. More challenging open transcription tests revealed median error rates of 34% for phonemes and 43.75% for words – a dramatic improvement over the participant’s residual dysarthric speech, which had error rates of 83.87% and 96.43% respectively.
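Those phoneme and word error rates are standard edit-distance metrics. The short function below illustrates how a word error rate of this kind is typically computed; it is an illustrative example, not the authors' evaluation code.

```python
# Word error rate via Levenshtein edit distance (illustrative, not the study's script).
def error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(error_rate("hello how are you today", "hello who are you"))  # 0.4
```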

The flexibility of the direct voice synthesis approach enabled the participant to produce various vocalisations beyond the training vocabulary, including pseudo-words, interjections, and letter-by-letter spelling. Remarkably, the system could be personalised to approximate the participant’s pre-ALS voice, using voice-cloning technology trained on recordings made before his illness.

Capturing prosodic elements of speech

A particularly innovative aspect of the research involved decoding paralinguistic features – the prosodic elements that convey meaning beyond words. The team discovered that neural activity in the precentral gyrus encoded information about pitch modulation and speech emphasis, enabling real-time control of vocal expression.

The participant successfully demonstrated the ability to modulate his synthesised voice to ask questions (with 90.5% accuracy) and emphasise specific words in sentences (with 95.7% accuracy). In a particularly compelling demonstration, he sang three-pitch melodies by controlling different pitch levels through his neural signals, with human listeners achieving 73% accuracy in identifying pitch differences.
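As a toy illustration of how a decoded pitch contour can signal a question, the snippet below flags an utterance whose final frames rise well above its average pitch. The threshold and windowing are invented for the example and are not drawn from the study.

```python
# Toy example: rising final pitch as a cue that an utterance is a question.
import numpy as np

def classify_intonation(f0_contour, rise_threshold=1.1):
    """Invented heuristic for illustration: compare the pitch of the final
    quarter of the utterance against the rest of it."""
    f0 = np.asarray(f0_contour, dtype=float)
    voiced = f0[f0 > 0]                       # ignore unvoiced (zero-pitch) frames
    tail = voiced[-len(voiced) // 4:]         # final quarter of the utterance
    body = voiced[: -len(voiced) // 4]
    return "question" if tail.mean() > rise_threshold * body.mean() else "statement"

print(classify_intonation([110, 112, 108, 111, 109, 125, 140, 150]))  # question
```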

“Our voice is part of what makes us who we are. Losing the ability to speak is devastating for people living with neurological conditions,” said David Brandman, co-director of the UC Davis Neuroprosthetics Lab and the neurosurgeon who performed the implant procedure. “The results of this research provide hope for people who want to talk but can’t. We showed how a paralysed man was empowered to speak with a synthesised version of his voice.”

Neural dynamics reveal planning mechanisms

The research provided unique insights into speech motor cortical activity through analysis of output-null and output-potent neural dimensions. The team observed that putatively output-null neural activity increased substantially before each spoken word and gradually decreased throughout sentence production, suggesting the presence of a neural buffer for entire sentences that emptied as speech progressed.

This preparatory activity proved particularly valuable for the causal decoding approach, as the authors noted: “This output-null activity seems to decrease over the course of a sentence. This may indicate that the speech motor cortex has a buffer for the whole sentence, which is gradually emptied as the sentence approaches completion.”
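For a linear readout, output-potent and output-null components can be separated by projecting neural activity onto the row space and null space of the decoder matrix, as in the simplified sketch below. The study’s decoder is nonlinear, so this is purely a conceptual illustration of the idea, with invented dimensions.

```python
# Simplified sketch of an output-null / output-potent decomposition for a
# linear readout (acoustic = W @ neural); conceptual only, not the study's analysis.
import numpy as np

def null_potent_split(W, neural):
    U, S, Vt = np.linalg.svd(W, full_matrices=True)
    rank = int(np.sum(S > 1e-10))
    potent_basis = Vt[:rank].T      # directions that change the decoded output
    null_basis = Vt[rank:].T        # directions invisible to the readout
    potent = potent_basis @ (potent_basis.T @ neural)
    null = null_basis @ (null_basis.T @ neural)
    return potent, null             # potent + null reconstructs the original activity

W = np.random.randn(10, 256)        # 10 acoustic features read out from 256 channels
x = np.random.randn(256, 500)       # 500 time steps of neural activity
potent, null = null_potent_split(W, x)
```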

Clinical implications and limitations

While the results represent a significant advance, the researchers acknowledge important limitations. The study involved a single participant with ALS who retained some articulatory movement and vocalisation ability. Replication across additional participants with various aetiologies of speech loss will be crucial for establishing broader applicability.

The authors emphasise that “brain-to-voice neuroprostheses remain in an early phase” and note that synthesised words were not consistently intelligible. However, they predict that accuracy improvements are achievable through algorithm refinement and increased electrode numbers, which have previously enhanced brain-to-text decoding performance.

The BrainGate2 clinical trial continues to enrol participants, offering hope for individuals with speech paralysis caused by various neurological conditions. This breakthrough represents a critical step towards restoring the full range of human vocal expression through brain-computer interfaces.

Reference

Wairagkar, M., Card, N. S., Singer-Clark, T., Hou, X., et al. (2025). An instantaneous voice-synthesis neuroprosthesis. Nature. https://doi.org/10.1038/s41586-025-09127-3