AI-Based Model Streams Intelligible Speech from Thoughts

April 2nd, 2025

Via: Berkeley Engineering:

Marking a breakthrough in the field of brain-computer interfaces (BCIs), a team of researchers from UC Berkeley and UC San Francisco has unlocked a way to restore naturalistic speech for people with severe paralysis.

This work solves the long-standing challenge of latency in speech neuroprostheses, the time lag between when a subject attempts to speak and when sound is produced. Using recent advances in artificial intelligence-based modeling, the researchers developed a streaming method that synthesizes brain signals into audible speech in near-real time.

“Our streaming approach brings the same rapid speech decoding capacity of devices like Alexa and Siri to neuroprostheses,” said Gopala Anumanchipalli, Robert E. and Beverly A. Brooks Assistant Professor of Electrical Engineering and Computer Sciences at UC Berkeley and co-principal investigator of the study. “Using a similar type of algorithm, we found that we could decode neural data and, for the first time, enable near-synchronous voice streaming. The result is more naturalistic, fluent speech synthesis.”

According to study co-lead author Cheol Jun Cho, who is also a UC Berkeley Ph.D. student in electrical engineering and computer sciences, the neuroprosthesis works by sampling neural data from the motor cortex, the part of the brain that controls speech production, and then using AI to decode that brain activity into speech.

“We are essentially intercepting signals where the thought is translated into articulation and in the middle of that motor control,” he said. “So what we’re decoding is after a thought has happened, after we’ve decided what to say, after we’ve decided what words to use and how to move our vocal-tract muscles.”

To collect the data needed to train their algorithm, the researchers first had Ann, their subject, look at a prompt on the screen — like the phrase: “Hey, how are you?” — and then silently attempt to speak that sentence.

“This gave us a mapping between the chunked windows of neural activity that she generates and the target sentence that she’s trying to say, without her needing to vocalize at any point,” said Kaylo Littlejohn, a co-lead author of the study.
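
As a rough illustration of the mapping described above, the Python sketch below cuts one trial's recording into fixed-size windows and pairs each window with the prompted sentence. It is not the researchers' code: the function names, the single-channel toy signal, and the window size of four samples are all assumptions made for the example, and the article does not specify how the real windows were sized or aligned.

```python
from typing import Iterator, List, Sequence, Tuple


def chunk_windows(samples: Sequence[float], window: int) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping windows of `window` samples.

    Trailing samples that do not fill a whole window are dropped.
    """
    for start in range(0, len(samples) - window + 1, window):
        yield samples[start:start + window]


def build_training_pairs(
    samples: Sequence[float], sentence: str, window: int
) -> List[Tuple[List[float], str]]:
    """Pair every window from one trial with the sentence shown on screen."""
    return [(list(w), sentence) for w in chunk_windows(samples, window)]


# Hypothetical trial: 10 samples from a single channel (toy values only).
trial = [0.1 * i for i in range(10)]
pairs = build_training_pairs(trial, "Hey, how are you?", window=4)
```

Here `pairs` holds two windows of four samples each, both labeled with the prompt; the last two samples are discarded because they do not fill a window. A real system would work with many channels and would need a model that learns which part of the sentence each window corresponds to.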

Because Ann does not have any residual vocalization, the researchers did not have target audio, or output, to which they could map the neural data, the input. They solved this challenge by using AI to fill in the missing details.

“We used a pretrained text-to-speech model to generate audio and simulate a target,” said Cho. “And we also used Ann’s pre-injury voice, so when we decode the output, it sounds more like her.”
