Someone’s asked you a question, and halfway through it, you already know the answer. While you think you’re politely waiting for your chance to respond, new research shows that you’re actually more impatient than you realize.
In the vocal equivalent of sitting on the edge of your seat, speakers position their vocal organs (tongue, jaw and lips) for the sounds they’re planning to produce in the near future, rather than passively waiting for their turn to speak.
The path to this discovery required real-time magnetic resonance imaging (MRI) of the vocal tract – no easy technique.
Lead researcher Sam Tilsen, assistant professor of linguistics in the College of Arts & Sciences, worked with Yi Wang, the Faculty Distinguished Professor in Radiology and professor of biomedical engineering, and Pascal Spincemaille, assistant professor of radiology, both at Weill Cornell Medical College, to develop the MRI protocol, including the image reconstruction procedure.
They were assisted by doctoral and master’s degree engineering students, who helped develop algorithms to process the raw MRI data and extract useful images from it.
The structural MRI captured images of tissue in the vocal tract, much like an X-ray, very quickly – about 200 times a second, or one frame every five milliseconds – providing high “temporal resolution” of what was happening as subjects moved their tongues and jaws. This allowed the researchers to measure rapid changes in the positions of the vocal organs before, during and after the subjects spoke.
“It surprised us how some speakers positioned their vocal tracts to anticipate upcoming responses,” says Tilsen, “but also that there was a great variation in which vocal organs speakers used for this positioning. People don’t all behave in a unified, coherent way.”
Questions remain, says Tilsen. Why do some people anticipate vocal responses while others do not? Why do people use different vocal organs to anticipate different sounds? Structural MRI limits researchers to studying movements and other outward behavior, says Tilsen, which reveals only part of the relation between cognition and speech. His goal is to overcome that limitation through neurotechnology.
“We want to develop the ability to simultaneously record brain signals and anatomical detail,” says Tilsen, “which could involve simultaneous MRI and fMRI scanning. It’s possible, but it’s challenging.”
Tilsen also wants to develop software that would let a patient in the scanner, along with a clinician, watch a video of the words the patient has just produced. This could help stroke patients, and people who have suffered other forms of brain trauma, relearn how to control their vocal organs.
“There’s only so much you can do with what you see and hear externally,” Tilsen says. “People don’t have a good sense of what’s going on with their tongue. Even I was surprised when I saw these MRI images of what the tongue is doing.”
Because speech is such a complex behavior, it is often where abnormalities are first noticed when neurological disorders develop. Understanding how speech is produced should enable earlier and better treatment of those disorders, as well as of neurological traumas that disrupt speech. Tilsen also hopes his research will aid in developing voice synthesis and speech recognition technologies that operate more naturally.
Initially interested in cognitive linguistics (how concepts are structured and how meaning is constructed), Tilsen says, “Speaking seems to produce similar sorts of patterns as those we see in other biological, chemical and physical systems. I want to figure out whether the cognitive systems that give rise to speech can be understood in the same ways that we understand biological and physical systems.”
Linda B. Glaser is a staff writer for the College of Arts & Sciences.