All three of the major phone platforms now have their own voice. Apple has Siri, Microsoft has Cortana, and Google has the somewhat less sassy Google Now.
These systems let you handle basic tasks through voice control. Each is a sort of virtual secretary that can answer simple questions, open apps, make notes, and relay messages. They’re useful, but they’re also frustratingly limited. They can’t do anything they weren’t explicitly programmed for, and many tasks are simply beyond their abilities.
However, a number of technologies now in development are going to dramatically improve these systems, and they’ll be commercially available in just a few years. Here are the top four ways your phone is about to get a whole lot smarter.
It’ll See What You See
Speech recognition has made huge strides over the last five years, thanks to the development of powerful neural networks. Modern smartphones can identify speech with surprising accuracy (it’s been a while since Google Now has misunderstood me), and can even do stuff like identify songs and television shows based on their audio.
This is great — but it’s only the start. Humans don’t interact with the world primarily through sound. We use vision for practically everything — and soon, our machines will too. We’re starting to see the debut of the first wearable headset displays like Google Glass and Microsoft’s HoloLens, which can stream information from their cameras to your smartphone, providing a rich and always-on supply of visual information. Many observers, myself included, expect these to become common over the next five years or so.
So what can your phone do with all this data?
Plenty. Google has already demonstrated with their Tango tablet that a depth camera can determine the spatial location of physical objects with extremely high accuracy. Likewise, there have been some amazing advances in machine vision — like Microsoft’s neural network that can identify individual dog breeds, and Google’s neural network that can accurately describe the contents of photographs. Together, these technologies open up a whole world of applications:
What is this bolt? The machine vision algorithm knows, and can order a replacement on Amazon in five seconds.

What was the name of the woman you met at the bar last night? You may have lost her card, but your glasses caught her face, and can find her on Facebook.

You have a weird mole. Should you see your doctor? Your phone can take a look and let you know.

You’re grocery shopping: what do you need? Your glasses remember the last time you looked in your fridge, and know what’s missing. They can even give you aisle-by-aisle directions to find the individual items.
The privacy implications of this technology are terrifying, but this sort of thing also sounds incredibly useful.
It’ll Know What You Like
Taste can be a hard thing to pin down. It’s very personal, and can be very arbitrary. Spotify and Netflix get by making recommendations based on usage patterns (a technique that ignores content entirely), but there are real limits to this approach.
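To make the usage-pattern approach concrete, here’s a minimal sketch of collaborative filtering: recommend whatever the most similar user has played. The data and names are invented for illustration, and real services use far more sophisticated models, but notice that the content of the songs is never inspected.

```python
import math

# Hypothetical play counts: user -> {song: times played}.
listens = {
    "alice": {"song_a": 5, "song_b": 3},
    "bob":   {"song_a": 4, "song_b": 2, "song_c": 5},
    "carol": {"song_d": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[s] * v[s] for s in shared)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user):
    """Suggest songs heard by the most similar other user."""
    others = [(cosine(listens[user], listens[o]), o)
              for o in listens if o != user]
    _, nearest = max(others)
    return [s for s in listens[nearest] if s not in listens[user]]

print(recommend("alice"))  # ['song_c'] -- bob's tastes overlap with alice's
```

Alice gets song_c purely because Bob, who plays the same songs she does, also plays it. Whether the song actually sounds like anything she enjoys is invisible to the algorithm, which is exactly the limitation the next paragraph is about.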
The future lies in algorithms that can understand art on a deep level: complex neural networks that learn to pick out the subtleties of music and movies. That yields a much richer understanding of what people do and don’t like about a piece of media, a concrete grasp of their taste.
The first hints of this are starting to show up. Take Spotify’s “Discover Weekly” playlist, which uses these sorts of content-analysis tools to recommend songs, including songs you might not normally hear because they’re too obscure to rank well on usage patterns alone. These techniques are only going to get more powerful, and the smartphone is a great platform for putting them to work.
Looking for something to do with friends? What if apps on your phones could collaborate, pool information on your respective tastes, and pick a movie or concert you’ll all like? What about dating sites that match you based on deep features of your respective tastes? What about apps that can recommend you books, because they’ve actually read them?
The possibilities are endless, and somebody is going to make a lot of money with these.
It’ll Be Chatty
Human: where are you now?
Machine: i’m in the middle of nowhere.
Human: what is the purpose of dying?
Machine: to have a life.
Human: what is the purpose of being intelligent?
Machine: to find out what it is.
Human: what is the purpose of emotions?
Machine: i don’t know.
This is a conversation with a neural network developed by Google, one that learns from large databases of text without human supervision. This version was trained on a collection of old movie scripts, learning to predict the next line in each conversation it encountered. As it turns out, if you train an AI on movie scripts, all of its conversations come out super dramatic.
If you’re not impressed, that’s understandable. Chatbots have existed for a long time.
But take a closer look. Google’s system is special in two key ways.
First, it’s architecturally very simple: a single huge neural network that had to learn the dynamics of human language by itself. Ordinary chatbots rely on thousands of hand-coded rules for how to respond to different situations. When they recite a response, it’s because that response was hand-coded, or memorized from a specific example. Google’s network is different. It learned its behavior organically, by trying to understand the deep structure of human writing.
Second, it captures some surprisingly subtle dynamics that other chatbots do not. It can answer common-sense questions about the world, despite never being explicitly programmed to do so.
Human: who is skywalker ?
Machine: he is a hero .
Human: who is bill clinton ?
Machine: he ’s a billionaire .
Human: is sky blue or black ?
Machine: blue .
Human: does a cat have a tail ?
Machine: yes .
Human: does a cat have a wing ?
Machine: no .
Human: can a cat fly ?
Machine: no .
Human: how many legs does a cat have ?
Machine: four , i think .
A different version of the chatbot, trained on tech-support logs, can muddle through providing real tech support surprisingly well. There are stumbles (it seems to have trouble remembering which operating system you’re using), but I’ve certainly received worse tech support from actual human beings.
All of this is a good indication that we’re starting to get to the point where artificial intelligence can be truly conversational. The network above can carry on a tolerable conversation, even though its training goals only require it to follow the superficial structure of a back-and-forth dialog.
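To see what “predicting what comes next” means as a training objective, here’s a drastically simplified sketch. A bigram lookup table stands in for Google’s huge neural network, and the tiny corpus is invented for illustration; the point is only that the model’s answers come entirely from the text it has seen, not from hand-coded rules.

```python
from collections import defaultdict, Counter

# Toy "training data" -- a real system ingests millions of lines.
corpus = [
    "the cat has four legs",
    "the cat has a tail",
    "the sky is blue",
]

# Count which word follows which: the crudest possible
# "predict the next word" model.
counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def predict(word):
    """Return the word most often seen after `word`, or None."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

print(predict("cat"))  # 'has' -- seen twice after "cat" in the corpus
print(predict("sky"))  # 'is'
```

A bigram table can only parrot local word pairs; the neural network version learns a compressed representation of whole conversations, which is why it can field questions it has never seen verbatim.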
Given better training data and a reward function that emphasizes information exchange, the possibilities are limitless. Imagine a version of Siri that can engage you in a real conversation, provide answers and advice in response to questions, and perform tasks without needing to be specifically programmed to do so. It’s not far off.
It’ll Be Well-Read
Another technology Google has been working on concerns reading comprehension. It’s called “thought vectors,” and the concept is surprisingly simple. You can extract a “thought vector” from the activity of a neural network that has processed a piece of text, like a sentence or an article. What you get is an opaque bundle of numbers that means nothing to anyone except the network that generated it. That bundle, in some sense, stores the “meaning” of the text, separate from how it was originally phrased.
This has some useful properties. For starters, these vectors resemble each other for sentences with similar meanings. If you digest two sentences in this way, you can determine whether or not they mean the same thing. You can also manipulate them. By using two neural networks to generate “thought vectors” from text in different languages and then training a third network to learn to map between them, you can create an extremely powerful machine translation method that captures the meaning of the text, and not just the words in it.
Another potential application is summarization: use this technology to collect large amounts of information, digest it into a compact representation, then generate a summary from the result. This could be hugely powerful for mobile applications.
Imagine being able to ask your phone to go read everything available on Google about a given topic, then report its findings back to you succinctly, in natural language, and answer questions about the results. This is going to be reality really, really soon, and it’s going to be incredibly useful.
The Phone of the Future
Phones in the future will probably look very different from phones today. They may be curved. They may be modular. You might interact with them through augmented-reality glasses. The most important difference, though, will be intelligence. The features described here will transform our devices into powerful tutors and helpers.
There’s currently a heated arms race in deep learning technology. The side effect is that these techniques are advancing incredibly rapidly, and they’ll be on the market sooner than you might think.
Are you excited by smarter smartphones? Concerned about the privacy implications? Let us know in the comments!
Image Credits: Human brain by Mopic via Shutterstock