How Skype’s Star Trek Translator Works
Skype has been breaking down geographical barriers since its inception, but the new Skype Translator is set to break down language barriers, and fundamentally change the way people communicate across national borders. Imagine having a real-time conversation with someone on the other side of the planet who doesn’t speak a word of your language. For the first time ever, Skype Translator makes that possible.
What Is It?
Skype Translator does something that has long been a dream of science fiction: it allows people who speak different languages to communicate verbally. Each person’s words are translated in real time and repeated in the other’s native language, making for a seamless, natural conversation. It’s not a perfect system, of course, but it has a lot of potential and will likely give Skype an edge amid growing competition from Apple’s FaceTime and smaller video chat startups.
How It Works
What makes Skype Translator interesting, to me, is that none of its components is necessarily revolutionary on its own. It’s a set of preexisting technologies that Skype, with help from its parent company Microsoft, has skillfully combined to build a truly innovative product. This, you could argue, is how the best products are made: not by inventing something totally new, but by combining the resources you already have in a way no one has done before.
When you travel to a foreign country where you don’t speak the language, a translator is essential. A translator acts as a middleman, allowing communication between people who don’t speak a common language. Skype Translator works similarly: it adds a bot to your conversation — essentially a third participant in the call — and the bot performs the function of a human translator. It translates what you say when you’ve finished talking, and it translates what the other person says when they’ve finished talking. Then, using text-to-speech technology, it reads the translation aloud.
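The middleman arrangement described above can be sketched as a three-stage relay: recognize the speech, translate the text, then speak the result. This is only an illustration under loud assumptions. The class `TranslatorBot`, its `relay` method, and the three stand-in functions are hypothetical names invented here, and the stubs merely fake each stage so the flow can be run. None of it reflects Skype’s actual code.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslatorBot:
    """Hypothetical third participant that relays one finished utterance."""
    recognize: Callable[[bytes], str]          # speech -> text
    translate: Callable[[str, str, str], str]  # text, source, target -> text
    synthesize: Callable[[str, str], bytes]    # text, language -> speech

    def relay(self, audio: bytes, src: str, dst: str) -> Tuple[str, str, bytes]:
        # Runs only once the speaker has finished talking.
        transcript = self.recognize(audio)
        translation = self.translate(transcript, src, dst)
        speech = self.synthesize(translation, dst)
        return transcript, translation, speech


# Stand-in stages (assumptions): real systems would call trained models here.
PHRASEBOOK = {("en", "es", "hello"): "hola"}


def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the words


def fake_translate(text: str, src: str, dst: str) -> str:
    return PHRASEBOOK.get((src, dst, text.lower()), text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"[{lang}] {text}".encode("utf-8")  # pretend this is audio


bot = TranslatorBot(fake_recognize, fake_translate, fake_synthesize)
transcript, translation, speech = bot.relay(b"Hello", "en", "es")
```

Keeping the three stages as separate, swappable pieces mirrors the point made earlier: each component already existed, and the product is the combination.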
On the surface, it looks pretty simple — but there’s more to it than meets the eye.
Speech recognition, a critical component of Skype Translator, has been terrible for a long time. The range and ambiguity of human speech are a nightmare for speech recognition engineers, especially when filtered through the noisy, tin-can speakers of mobile devices. Recently, however, you may have noticed a major improvement in the quality of speech recognition in services like Siri, Cortana, and Google Now.
This is due to the influence of a technology called “deep neural networks” (DNNs), a method shown to produce much more accurate, robust results than conventional speech-to-text systems. After training a DNN-based speech recognition system on more than 300 hours of speech, Microsoft researchers achieved a word-error rate of just 18.5 percent, a 33 percent reduction compared with the results from a leading conventional speech-to-text system.
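For readers wondering what a word-error rate actually measures: the usual definition is the number of word substitutions, deletions, and insertions needed to turn the system’s transcript into the correct one, divided by the number of words in the correct one. Here is a minimal sketch of that standard calculation. The function name `word_error_rate` is made up for this example, and this is not the evaluation code Microsoft’s researchers used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words seen so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

By this measure, a rate of 18.5 percent works out to roughly one error for every five or six words spoken.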
After transcribing spoken words into text, Skype Translator translates that text from one language to another using the same technology that powers Bing Translator. The system is specifically trained for conversational language, which sets it apart from translation models designed for formal written text. It combines the broad language knowledge of Bing Translator with an additional layer of words and phrases commonly used in spoken conversations. On top of that, it automatically removes filler speech like ‘ahs’ and ‘umms’, as well as dead air.
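To give a feel for the filler-removal step, here is a toy version that drops ‘ah’ and ‘um’ style tokens from a transcript before it would be handed to translation. It is a sketch under assumptions: the pattern `FILLER` and the function `strip_fillers` are invented for illustration, and the real system presumably relies on trained models rather than a hand-written pattern.

```python
import re

# Assumed filler shapes: "um", "umm", "uhm", "uh", "ah", "ahh", and so on.
FILLER = re.compile(r"(?:u+h*m+|u+h+|a+h+)", re.IGNORECASE)


def strip_fillers(transcript: str) -> str:
    """Drop filler tokens, ignoring punctuation attached to them."""
    kept = []
    for token in transcript.split():
        bare = token.strip(",.!?;:")
        if FILLER.fullmatch(bare):
            continue  # skip the filler, along with its punctuation
        kept.append(token)
    return " ".join(kept)


cleaned = strip_fillers("Umm, I think, ah, we should go")
```

Dead air, being silence in the audio rather than a word in the transcript, would have to be handled earlier, at the speech recognition stage.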
The secret sauce behind each of the components of the Skype Translator is Microsoft’s robust machine learning platform. The software learns from a variety of sources — including translated Web pages, videos with captions, and actual one-to-one training data — to better understand and translate the wide variety of topics, accents, and dialects of Skype’s users. All of this data is entered into the machine learning system and used to build a statistical model of the words and their contexts — so when you say something in a Skype Translator conversation, the software can search for something similar within that statistical model and provide an appropriate translation accordingly.
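As a very rough picture of “search for something similar within that statistical model,” imagine a table that counts how often each source phrase was seen with each translation, and that falls back to the closest known phrase when there is no exact match. The sketch below is a deliberately tiny stand-in. `PhraseTable` and its methods are hypothetical, the training pairs are made up, and Microsoft’s actual models are far more sophisticated than a lookup table.

```python
import difflib
from collections import Counter, defaultdict
from typing import Optional


class PhraseTable:
    """Toy statistical model: translation counts per source phrase."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)

    def observe(self, source: str, target: str) -> None:
        # Each training example nudges the statistics.
        self.counts[source.lower()][target] += 1

    def translate(self, phrase: str) -> Optional[str]:
        key = phrase.lower()
        if key not in self.counts:
            # No exact match: look for the most similar known phrase.
            close = difflib.get_close_matches(key, list(self.counts), n=1, cutoff=0.6)
            if not close:
                return None
            key = close[0]
        # Return the translation seen most often for that phrase.
        return self.counts[key].most_common(1)[0][0]


table = PhraseTable()
table.observe("good morning", "buenos días")
table.observe("good morning", "buenos días")
table.observe("good morning", "buen día")
table.observe("thank you", "gracias")

exact = table.translate("Good morning")
fuzzy = table.translate("good mornin")
```

The more examples the table sees, the better its most-common answer becomes, which is the basic reason the system is fed so much training data.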
I don’t speak a word of Spanish—I took German at school instead—but with Skype Translator I was able to have a spoken conversation with a Spanish speaker as if I were in an episode of Star Trek (as long as that episode isn’t Darmok, amirite?). I spoke English. A moment later, an English language transcription would appear, along with a Spanish translation. Then a Spanish voice would read that translation.
It took a moment to get used to the pacing of the conversation—the brief delay for the translation means that if you understand the language of the other person, there’s a temptation to respond immediately, without waiting for the voice to read the translation—but once this rhythm was learned, the conversation was fluent and continuous.
By all accounts, the experience is incredible.
Is It Practical?
As excited as I am about Skype Translator, I’m not sure I’ll use it to talk to my foreign-language friends. Why? Because I don’t have any foreign-language friends. Most people don’t: language is a pretty big barrier to starting any relationship. Will this technology change that? Perhaps, but it will take some time.
Skype Translator could become a valuable asset for international business, so long as it’s accurate. Beyond that, as Peter Bright points out, it could find value in mixed language families where grandparents and grandchildren do not share a common tongue.
There are a lot of applications for this technology, and as other companies begin to do similar things, as they almost certainly will, it will open up relationships that never would have been possible before. Who knows? Maybe in a few years I will have some foreign-language friends.
If you’d like to register to be a Skype Translator preview user, you can do so here.
In the meantime, we want to hear your thoughts! Will you use Skype Translator? Are you excited about it? Let us know in the comments below!
Image Credit: Skype, TechCrunch