Uncovering the sound of 'motherese,' baby talk across languages

Around the world, mothers speak differently to their children than they do to other adults — and Princeton researchers have found a new way to quantify that vocal shift.

Mothers interact with their babies in the Princeton Baby Lab, where researchers identified consistent shifts in vocal timbre between mothers speaking or reading to their children and speaking to other adults.

With their kids, mothers switch into a special communicative mode known as “motherese” or “baby talk” — an exaggerated and somewhat musical form of speech. While it may sound silly to adults, research has shown that it plays an important role in language learning, engaging infants’ emotions and highlighting the structure of language to help babies decode the puzzle of syllables and sentences.

Child-directed motherese

A mother speaks to her baby.

And now, Princeton researchers have identified “a new cue that mothers implicitly use to support babies’ language learning,” said Elise Piazza, a postdoctoral research associate with the Princeton Neuroscience Institute (PNI). “We found for the first time that mothers shift their vocal timbre.”

Adult-directed speech

The same mother speaks to a researcher in the Princeton Baby Lab, illustrating the shifts in pitch, cadence and timbre between regular speech and "motherese."

“Timbre is best defined as the unique quality of a sound,” explained Piazza. “Barry White’s silky voice sounds different from Tom Waits’ gravelly one — even if they’re both singing the same note.”

She and her colleagues found that the timbre shift was consistent across mothers speaking 10 different languages, including English, and that the differences were strong enough for a machine learning algorithm to pick them out reliably. Their study appears today in the journal Current Biology.

To investigate the timbre of baby talk, Piazza and her colleagues, Marius Cătălin Iordan, also a PNI postdoctoral research associate, and Casey Lew-Williams, an assistant professor of psychology, invited 12 English-speaking women into the Princeton Baby Lab, where researchers study how babies learn to see, talk and understand the world. The researchers recorded the mothers while they played with or read to their 7- to 12-month-old infants and while they spoke to an adult experimenter.

From left: Casey Lew-Williams, Elise Piazza and Marius Cătălin Iordan research child development in the Princeton Baby Lab.

Quantifying baby talk

The scientists then quantified each mother’s vocal fingerprint — the overall statistical profile of her timbre — using a measure called the mel-frequency cepstrum. They found that adult-directed and infant-directed speech had significantly different fingerprints.
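The mel-frequency cepstrum is a standard timbre representation in audio analysis. As a rough illustration of the idea, and not the authors’ exact pipeline, the sketch below uses the open-source librosa library to summarize a recording as its average mel-frequency cepstral coefficients; the file names are hypothetical.

```python
# Minimal sketch: summarizing a recording's timbre as the average of its
# mel-frequency cepstral coefficients (MFCCs). Requires librosa
# (pip install librosa); the .wav file names below are hypothetical.
import numpy as np
import librosa

def vocal_fingerprint(path, n_mfcc=13):
    """Return the mean MFCC vector of a recording, a crude summary
    of its overall spectral (timbral) profile."""
    y, sr = librosa.load(path, sr=None)                      # samples, sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # average over time

adult = vocal_fingerprint("mother1_adult_directed.wav")
infant = vocal_fingerprint("mother1_infant_directed.wav")
print(np.linalg.norm(adult - infant))  # distance between the two speech modes
```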

“It’s so consistent across mothers,” said Piazza. “They all use the same kind of shift to go between those modes.”

She and her colleagues found that the mothers’ speech timbre differed enough that a computer algorithm could learn to reliably classify infant- and adult-directed speech, even using just one second of recorded speech.
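As an illustration of how such a classifier might be set up, and with no claim to match the study’s actual method, the sketch below trains a support-vector classifier on mean MFCC features of one-second clips; the recordings and file names are hypothetical.

```python
# Illustrative sketch: classifying one-second clips as adult- or
# infant-directed speech from MFCC features. Not the authors' pipeline;
# file names are hypothetical. Requires librosa and scikit-learn.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def clip_features(path, sr=16000, n_mfcc=13):
    """Split a recording into one-second clips, each summarized
    by its mean MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    clips = [y[i:i + sr] for i in range(0, len(y) - sr + 1, sr)]
    return np.array([librosa.feature.mfcc(y=c, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
                     for c in clips])

# One recording per mother per speech mode (hypothetical files).
X_adult = np.vstack([clip_features(f) for f in ["m1_adult.wav", "m2_adult.wav"]])
X_infant = np.vstack([clip_features(f) for f in ["m1_infant.wav", "m2_infant.wav"]])
X = np.vstack([X_adult, X_infant])
y = np.array([0] * len(X_adult) + [1] * len(X_infant))  # 0 = adult, 1 = infant

# Cross-validated accuracy of the classifier on single-second clips.
print(cross_val_score(SVC(), X, y, cv=5).mean())
```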

The researchers did not investigate fathers or other caregivers. “We used mothers to keep overall pitch range fairly consistent across participants,” said Piazza. “However, I’d predict that our findings would generalize quite well to fathers.”

Baby talk is not a new discovery, of course.

“We’ve known for a long time that adults change the way they speak when they are addressing babies,” said Jenny Saffran, a professor of psychology at the University of Wisconsin-Madison who was not involved in this research. “They speak more slowly, use shorter sentences, talk at a higher pitch and swoop their pitch up and down more often than when they are speaking to other adults.”

What sets this work apart, Saffran explained, is that “this is the first study to ask whether [mothers] also change the timbre of their voice, manipulating the kinds of features that differentiate musical instruments from one another. This is fascinating because clearly speakers are not aware of changing their timbre, and this new study shows that it is a highly reliable feature of the way we speak to babies.”

Once the Princeton team had established that all 12 mothers showed measurable shifts in their vocal timbre, they began thinking about how to expand the study, said Piazza.

“We wondered if this might generalize to mothers who aren’t speaking English,” she said. “So we took a second set of 12 mothers, who did not speak English as their native language, and asked them only to speak in their native, non-English language in all of the recordings. So now we have this new, rich dataset of recordings from Mandarin, Polish, Russian — nine different languages in all.”

When they looked at the data, the researchers found that this timbre shift between adult- and child-directed speech was “highly consistent” across languages from around the world: Cantonese, French, German, Hebrew, Hungarian, Mandarin, Polish, Russian and Spanish.

These shifts in timbre may represent a universal form of communication with infants, said Piazza.

Is timbre the same as pitch?

“Imagine the entire orchestra simultaneously playing the exact same pitch as they tune up,” said Piazza. “You hear the different rich timbres that separate the different instrument families.”

Vocal descriptors like raspy, gravelly, hoarse, nasal, or velvety apply to timbre, not pitch, she added. “We use it all the time to distinguish people, animals and other sounds,” she said.
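To make the pitch/timbre distinction concrete, here is a toy example, unrelated to the study’s data: two synthetic tones share the same fundamental frequency, so they have the same pitch, but weight their harmonics differently, so they have different timbres. All parameter values are arbitrary.

```python
# Toy demonstration: same pitch, different timbre. Two tones built on
# the same 220 Hz fundamental, with different harmonic weightings.
import numpy as np

sr = 22050                                      # sample rate (Hz)
t = np.linspace(0, 1.0, sr, endpoint=False)     # one second of time samples
f0 = 220.0                                      # shared fundamental: pitch A3

def tone(harmonic_weights):
    """Sum of harmonics of f0; the weights shape the timbre."""
    return sum(w * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, w in enumerate(harmonic_weights))

flute_like = tone([1.0, 0.2, 0.05])             # energy mostly in the fundamental
brass_like = tone([1.0, 0.8, 0.6])              # strong upper harmonics

# Same pitch, different spectra: compare harmonic magnitudes via FFT
# (with a 1-second signal, FFT bin k corresponds to k Hz).
for name, sig in [("flute-like", flute_like), ("brass-like", brass_like)]:
    spectrum = np.abs(np.fft.rfft(sig))
    harmonics = spectrum[[int(f0 * k) for k in (1, 2, 3)]]
    print(name, np.round(harmonics / harmonics.max(), 2))
```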

Piazza and her colleagues isolated a shift in the vocal fingerprint of baby talk “through a combination of clever methods of measuring timbre and machine learning algorithms,” said Patrick Shafto, a data scientist and associate professor of mathematics and computer science at Rutgers University. The result is “the first successful quantitative formalization of vocal timbre which has been validated through modeling and an automatic method for classifying infant-directed versus adult-directed speech across languages.”

Their technique for quantifying timbre could also open doors to other types of speech analysis, noted Piazza.

“Our findings could enable speech recognition software to rapidly identify this speech mode across languages. Our work also invites future explorations of how speakers adjust their timbre to accommodate a wide variety of audiences, such as political constituents, students and romantic partners.”

This research was supported by a PNI C. V. Starr Postdoctoral Fellowship and grant HD079779 from the National Institute of Child Health and Human Development.