Foreigners sure talk fast, don’t they?

When you listen to someone speaking a foreign language, you’re sure they’re speaking very quickly, far faster than you speak in your more refined native language.

It turns out they think exactly the same thing about the way you talk.

It’s the “Gabbling Foreigner Illusion.” Speech scientists have studied it for decades. Everyone feels that foreign languages are spoken faster than their own language.

There are several reasons we need extra time to process words spoken in a language that we’re learning. I’ll tell you about them below.

It’s one of the reasons that everyone, absolutely everyone, is looking forward to technology that can translate conversations in real time. Maybe high-tech glasses can display text from a conversation in front of our eyes, or maybe our phones can deliver a translation to our earbuds.

Advances in AI are bringing this closer all the time. Google has teased glasses that can do real-time translation for years. At a conference two weeks ago, Google said AI-powered glasses that do real-time translation will be available to the public within the next twelve months. Meta just rolled out live translation on its AI-based Ray-Ban glasses, but reports suggest the experience is frustrating and slow.

That’s because translation is an incredibly hard technology problem for exactly the same reasons that foreign languages sound fast when we listen to them. It may be a while before magic glasses transform our vacations and turn into Babel fish.

(The Babel fish is a memorable creation in The Hitchhiker’s Guide to the Galaxy. In the story, if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. It is worth recalling what Douglas Adams said about the results of giving everyone a translator: “The poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”)

There are four factors that make foreign language speech sound fast – and that make it challenging to train AI to be a translator.

Connected speech and reductions  I was gonna describe this but you understand already, doncha?

We connect words when we talk. We leave out vowels and we blend words together. The first step for translation – for people or for machines – is to accurately identify word boundaries and figure out what words are intended. We don’t make that easy when we talk naturally.

Manual processing  Our brain works harder when we’re listening to a foreign language – we have to process each word, retrieve its meaning, and understand the grammar. Native speakers do that automatically and quickly. Speech in a foreign language sounds fast because we’re literally working harder to keep up.

AI has an equivalent challenge: real-time translation requires immense processing power and efficient algorithms. Imagine that someone speaks in a foreign language to your fancy new AI glasses. The audio from the microphones in your glasses is sent to your phone; it’s uploaded to the AI servers in the cloud; the algorithms do speech recognition, translation, and convert it to speech in your language; the new audio is downloaded to your phone; and finally it’s sent to your glasses so you can hear it through the speakers in the stems.

Any latency in that process is noticeable. And there are a lot of places for delays, not the least of which is the time required for the online processing by the AI.
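To see why the delays add up, here’s a back-of-the-envelope sketch of that pipeline. The stage names follow the description above; the millisecond numbers are invented placeholders, not measurements of any real product:

```python
# Rough, illustrative latency budget for cloud-based glasses translation.
# The stages follow the pipeline described above; the numbers are
# made-up placeholders, not measurements of any real product.

LATENCY_MS = {
    "glasses_mic_to_phone": 30,       # Bluetooth audio from glasses to phone
    "phone_upload_to_cloud": 80,      # network trip up to the AI servers
    "speech_recognition": 300,        # transcribe the foreign-language audio
    "translation": 250,               # translate the text
    "text_to_speech": 200,            # synthesize audio in your language
    "cloud_download_to_phone": 80,    # send the audio back down
    "phone_to_glasses_speakers": 30,  # Bluetooth back out to the stems
}

total = sum(LATENCY_MS.values())
print(f"Total added delay: {total} ms")  # about 970 ms with these guesses
```

Nearly a second of lag, even with guesses that generous, is enough to make a conversation feel stilted.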

Unfamiliarity with rhythm  Each language has its own unique rhythm, intonation, and stress patterns. (The official term is “prosody” but research shows that 80% of people stop reading articles when they see the word “prosody.”) Differences in rhythm between languages can make unfamiliar speech sound rushed or unclear.

In your native language, you can recognize emotion and sarcasm and distinguish questions and statements from tone of voice and rhythm. Think about how easy it is to misunderstand text messages because they arrive without a tone of voice to put them in context.

AI models struggle to capture rhythm and tone and carry them over into another language so that the result sounds natural and expressive. Researchers are working on it, but it’s a particularly hard part of AI translation.

Reduced predictability  When you’re learning a foreign language, you have fewer established chunks of language to rely on. You can’t as easily predict upcoming words or phrases. It slows down your comprehension and makes the actual speed feel overwhelming.

That’s a harder problem than you might think, because languages have different sentence structures. In German, the verb often lands at the end of the sentence. The German sentence “Gestern Abend haben wir Herrn X ein Bier serviert” translates word-for-word as “Yesterday evening have we Mr X a beer served.” Our brain has to spin extra cycles to turn that into the word order we expect in English: “Last night we served Mr X a beer.”

That causes havoc for AI translation. If an AI translator has to wait for an entire sentence before it begins its work, there is guaranteed latency that slows down responses. And it is more likely to be overwhelmed by the speed of the conversation – exactly the way you feel when you begin to fall behind.

Researchers are working on “simultaneous translation” models that translate incrementally, chunk by chunk, rather than waiting for full sentences, but this adds complexity and risk of errors.
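To give a flavor of what “incrementally” means, one published family of approaches uses a “wait-k” policy: the translator stays k words behind the speaker and emits one translated word for each new word it hears. Here’s a toy sketch of just that scheduling logic; the translation call is a placeholder, not a real model:

```python
# Toy sketch of a wait-k simultaneous translation schedule.
# The system waits until it has heard k source words, then alternates:
# read one new source word, write one translated word.
# translate_next_word() is a placeholder for a real incremental model.

def translate_next_word(source_so_far, target_so_far):
    # Placeholder: a real system would run a neural translation model here.
    return f"<word {len(target_so_far) + 1} of the translation>"

def wait_k_translate(source_words, k=3):
    target = []
    for i, _word in enumerate(source_words):
        # READ: we have now heard source_words[: i + 1]
        if i + 1 >= k:
            # WRITE: emit one target word once we are k words behind
            target.append(translate_next_word(source_words[: i + 1], target))
    # Flush the rest once the speaker has finished the sentence
    while len(target) < len(source_words):
        target.append(translate_next_word(source_words, target))
    return target

wait_k_translate("Gestern Abend haben wir Herrn X ein Bier serviert".split())
```

A small k keeps the lag short but forces the model to guess; a large k is more accurate but makes the listener wait. That tradeoff is exactly what the German verb-at-the-end example exposes.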

Want an example of why AI translation is a hard computer problem? This paper proposes Translation by Anticipating Future (TAF).

  • The AI gets a portion of a sentence and predicts multiple possible continuations.

  • The system generates a translation of each predicted continuation.

  • TAF uses a majority-voting mechanism to select the most likely output.

  • But then it does another vote to decide whether to send out that chunk of the translation (WRITE) or wait a bit longer (READ), based on how confident it is.

  • The AI then receives more of the sentence and gains context, possibly releasing chunks it held back, and keeps refining its predictions, majority votes, and READ/WRITE decisions as the speech continues.
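Here’s a heavily simplified sketch of that loop, just to show its shape. The continuation predictor, the translator, and the agreement threshold below are placeholders standing in for the paper’s actual models and voting rules:

```python
# Heavily simplified sketch of a TAF-style READ/WRITE decision.
# predict_continuations(), translate(), and the agreement threshold are
# placeholders standing in for the paper's neural models and voting rules.

from collections import Counter

def predict_continuations(source_prefix, n=5):
    # Placeholder: a language model would propose n plausible ways
    # the speaker might finish the sentence.
    return [source_prefix + f" <guess {i}>" for i in range(n)]

def translate(text):
    # Placeholder: a translation model would run here.
    return text.split()

def next_chunk_by_vote(source_prefix, target_so_far, agreement=0.6):
    """Return the next translated word (WRITE) or None (READ: wait for more speech)."""
    candidates = []
    for continuation in predict_continuations(source_prefix):
        hypothesis = translate(continuation)
        if len(hypothesis) > len(target_so_far):
            candidates.append(hypothesis[len(target_so_far)])
    if not candidates:
        return None
    word, votes = Counter(candidates).most_common(1)[0]
    # WRITE only if enough of the predicted futures agree on the same next word
    return word if votes / len(candidates) >= agreement else None

chunk = next_chunk_by_vote("Gestern Abend haben wir", target_so_far=[])
print("WRITE" if chunk else "READ", chunk)
```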

All that happens while you’re waiting for the translation to appear in your glasses or played in your earbuds.

The effect is that people seem to speak too fast when you’re learning a foreign language, and they speak too fast for AI to keep up with translation. We will get glasses that do real-time translation before long, but it will likely be several years before we can simply relax and talk to people in other countries without saying, “Please slow down and enunciate” – and having them say it back to us. We’re all gabbling foreigners to AI.
