Big data projects presented at the Convention Research Symposium show how AI could revolutionize clinical diagnosis, monitoring, and treatment.
Editor's note: This article is part of a feature package on AI applications in communication sciences and disorders.
The spotlight may be trained on ChatGPT, but many other artificial intelligence (AI) applications are poised to shake up communication sciences and disorders (CSD). The Research Symposium at ASHA’s 2023 Convention will delve into some of these high-profile projects, which hold the transformative power to:
These AI applications, being developed by multidisciplinary research teams of clinical practitioners, bioengineers, computer scientists, and data scientists, are just some of those to be discussed at the symposium. Their AI power comes from huge datasets of speech, language, hearing, and health information, like those fueling Google’s portfolio of AI projects to help adults with speech-language disorders communicate. During the symposium, which will be livestreamed, keynote speaker Philip Nelson, Google’s director of software engineering, will explore that portfolio’s implications for augmentative and alternative communication (AAC).
Symposium chair Jordan Green will add his perspective from working with Nelson on Google’s Project Euphonia, which functions as a kind of speech interpreter or clarifier, deployable by smartphone. It’s one of several such projects funded by big tech companies. Another is the Speech Accessibility Project—funded by Microsoft, Google, Apple, Meta, and Amazon, and the subject of a talk by project lead Mark Hasegawa-Johnson of the University of Illinois. That project is collecting samples from millions of people with speech-language challenges.
“There is a wide variety of AI applications for these data, including communication access, screening, and clinical monitoring,” says Green, chief scientific advisor at the MGH Institute of Health Professions. “That’s why all these companies are so interested in speech technology. Because speech is such a sensitive outcome measure. If you think about it, your cognition, motor skills, language skills, and emotional states are all conveyed through speech and communication.”
One major AI push is helping adults with speech disorders and degenerative disease be heard and understood. Take Google’s Project Euphonia. “This team is working on models that can recognize a wide range of non-standard* speech patterns, even when speech deteriorates over time,” Green says. “They are also tackling the formidable task of designing systems people can use in real-world settings, with natural conversation.” (*Note that terms used in the AI industry may differ from those used clinically in CSD.)
This is where an evolving AI technology—automatic speech recognition (ASR)—comes in. ASR systems trained on millions of speakers can be tuned to recognize unique speech patterns, which boosts accuracy. How is this tuning done? By training the models on a relatively small number of recordings from one person.
The promise of personalized ASR for helping people with speech challenges is evident in Project Euphonia: “They’ve shown, for example, that somebody who’s only 20% intelligible to a listener can be 80% intelligible to the computer using personalized ASR,” Green says. Google has already collected data on thousands of adult speakers with conditions including Down syndrome, post-stroke dysarthria, traumatic brain injury, and progressive neurologic diseases such as Parkinson’s and amyotrophic lateral sclerosis (ALS). And the company’s data-gathering efforts continue.
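The article does not say how the 20% and 80% figures were measured. As a rough illustration only, one common way to score how well a recognizer "understood" a speaker is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the reference length. The sketch below assumes plain-text transcripts; the function names and the example sentences are hypothetical and are not drawn from Project Euphonia.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, floored at zero (WER can exceed 1 when there are many insertions)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    # Made-up example: one substituted word out of six.
    reference = "please turn on the kitchen light"
    hypothesis = "please turn on the chicken light"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
    print(f"Word accuracy: {word_accuracy(reference, hypothesis):.2f}")
```

A score like this reflects only transcript accuracy. It is not the same thing as a listener-based intelligibility rating, so the two should not be compared directly.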
Although ASR of varying speech profiles has improved significantly over the past few years, these systems still struggle when used for functional communication, such as conversational speech. This is because, to date, most ASR models have been trained on speech samples recorded while people read aloud, which are easy to obtain and transcribe. Overcoming this challenge, Green says, will require speech samples recorded during real-world interactions, such as conversations with multiple partners at a restaurant.
A longitudinal study to be presented by Project Euphonia collaborator Richard Cave, of the Motor Neurone Disease Association, indicates that the more personalized the ASR is, the more effective it is. Other presenters will highlight AI applications for people who’ve lost all ability to speak. Rupal Patel, for example, of Northeastern University and VocalID, will describe work to make synthetic voices sound more natural through enhancing the diversity of voices that can be generated, and improving the efficiency of voice-cloning technologies.
And speakers Leigh Hochberg of Harvard Medical School and BrainGate, and Jun Wang, of the University of Texas, will discuss advancements in brain-computer interfaces to help patients with lost motor control express themselves and converse.
The above projects focus on recognizing or producing speech, but another line of research—called speech diagnostics or speech analytics—is concerned with analyzing it. Projects in this category use speech as a diagnostic pathway to detect neurological or mental health problems. So far, they’re not distinguishing between types of speech disorders like dysarthria and apraxia, but “we’ll get there,” says Green.
In one of these lines of inquiry, presenter Emily Provost and her University of Michigan team are tracking speech disfluencies, pauses, and word-finding delays to gauge cognitive function and possible impairment. In another, speakers Julie Liss and Visar Berisha, of Arizona State University, are using a new approach to clinical speech analytics to track speech-motor and cognitive outcomes in patients with neurodegenerative disorders. The technology has been used in many clinical trials and was recently recognized as a breakthrough device by the U.S. Food and Drug Administration.
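To make the idea of tracking pauses and disfluencies concrete, here is a minimal sketch of how such measures could be computed from a transcript with word-level timestamps. It is not the method used by either research team. The `Word` structure, the filler list, and the 0.25-second pause threshold are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


# Hypothetical filler list; a real system would need a validated inventory.
FILLERS = {"um", "uh", "er"}


def pause_features(words, min_pause=0.25):
    """Summarize silent gaps between consecutive words and count filler words.

    min_pause is an assumed threshold (in seconds) below which a gap is
    treated as ordinary articulation rather than a pause.
    """
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [gap for gap in gaps if gap >= min_pause]
    return {
        "pause_count": len(pauses),
        "total_pause_s": sum(pauses),
        "longest_pause_s": max(pauses, default=0.0),
        "filler_count": sum(1 for w in words if w.text.lower() in FILLERS),
    }


if __name__ == "__main__":
    # Made-up sample: a short gap, then a one-second silence before "dog".
    sample = [
        Word("the", 0.0, 0.2),
        Word("um", 0.3, 0.5),
        Word("dog", 1.5, 1.9),
    ]
    print(pause_features(sample))
```

Numbers like these describe a single recording. Any clinical interpretation would depend on validated norms and on comparing the same speaker over time.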
Also using speech as a diagnostic marker (along with language, facial, cognitive, and motoric markers) is researcher Vikram Ramanarayanan of the University of California, San Francisco. He’ll describe his work with the company Modality.ai to build a chatbot-based assessor of neurological and mental health, looking for conditions such as ALS, Parkinson’s disease, and schizophrenia.
Describing how a clinician might use the tool, Green says, “I would have the patient perform the chatbot assessment, then send me the results. Then I would look at the results to inform their next clinical visit.”
But could this type of screening application potentially threaten clinicians’ expertise? Green has heard this concern raised but sees it as misplaced. “It’s not about replacement. The idea is for these tools to support and enhance clinicians’ diagnostics—and finding that compatibility will take time,” he says. “These technologies will hopefully also aid accuracy and alleviate some of clinicians’ workload on the assessment side, and free them up to actually spend time with patients—to work at the top of the license.”
Ideally, they’ll also aid accessibility of services, he says, “because you can do a lot of the diagnostics remotely and more cheaply. And that’s a game-changer.”
Another AI application is evaluating speech improvement in drug trials for degenerative diseases like ALS, which has spurred a big hiring push for people to develop speech analytics platforms, Green says. Using AI to monitor clients’ speech for improvement over time could also be useful for clinicians, he says, though he notes that this application, too, can be controversial.
“It’s an interesting time because there are many strong opinions,” Green says. “People are very divided about is this a tool or is it a crutch? Or the beginning of the demise of human intellect? But that’s a little like saying an audiometer is a crutch for an audiologist. They’re just tools, and hopefully they provide more reliability and precision in our assessment.”
To work properly for communication, he says, AI applications will need to be human-verified by the diverse range of clinicians, patients, clients, families, and conversational partners who will use them; the stakeholders need to be involved in the research. “A system could be very accurate, but if it can’t be integrated into naturalistic conversation and environments, it’ll be of limited use,” says Green.
The symposium speakers are keenly aware of this critical need, he notes, after pioneering this research and working in the space for a decade. “Anyone can throw an algorithm at data, but these researchers have been moving the science forward since these were just crude tools,” Green says. “With this explosion of advances, they are well positioned to be AI thought leaders—and to see beyond the hype and understand the limitations, as well as the promise, of these technologies.”
Bridget Murray Law is editor-in-chief of The ASHA Leader.
Learn more about AI and audiology on the ASHA Voices podcast, which features a conversation with the two speakers from the 2023 Research Symposium on Hearing, Fan-Gang Zeng (UC Irvine) and Devin McCaslin (University of Michigan). Visit on.asha.org/podcast.
The Convention Research Symposium in CSD (funded in part by the National Institute on Deafness and Other Communication Disorders) includes two tracks. The first, on AI in audiology and hearing, will take place Friday morning. The second, on AI in CSD, will be held all day Saturday. Both will be livestreamed as “Virtual Extras”* for those unable to attend in person. Visit on.asha.org/convention-pp for session locations.
Harnessing the Power of Artificial Intelligence to Improve Audiological Research and Care
Use of Machine Learning Techniques to Manage and Assess Dizzy Patients
Breaking Barriers with AI: Google’s Programs for Advancing Accessibility, Communication, and Social Inclusion (keynote)
Speech Disorder Research in the Big-Data Era: Large-Scale Databases and Speech Analytics
Advances in Speech Biomarkers for Monitoring Neurological and Mental Health
Personalized Speech Recognition and Vocal Synthesis—Advancing Clinical Care for Individuals with Speech Impairments
Update on Brain-Computer Interfaces
*Note that Virtual Extra sessions are open for everyone to view, but only those registered for the full ASHA Convention or virtual-only component may claim continuing education credit (one-day in-person registrants are not eligible). Learn more about convention live-streaming at on.asha.org/virtual-extra.