YouTube Will Use Neural Networks to Actually Understand Videos

Dann Albright 23-09-2015

Searching YouTube can be a frustrating experience; if you only know what a video is about, or you remember the contents but not the name, you could be searching for a very long time. That’s because YouTube doesn’t actually see videos the way a person does. It just sees the metadata: title, description, and tags. And that’s assuming the uploader bothered to include the information.

All of that could change in the near future. Google recently filed a patent that indicates YouTube might actually start to understand the videos that it plays.

Relevance-Based Image Selection

Google’s patent application is for “relevance-based image selection,” a fancy way of saying “finding the things that someone searched for based on what’s in a video.” In the system described in the patent, an algorithm is trained to extract specific features from each video and assign keywords to it. It can then return that video in response to a user search that includes those keywords.


The application gives an interesting example:

“[I]f the user enters the search query “car race,” the video search engine . . . can find and return a car racing scene from a movie, even though the scene may only be a short portion of the movie that is not described in the textual metadata.”
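
To make that example concrete, here is a minimal Python sketch of how a frame-level keyword index could answer the “car race” query with a specific stretch of a movie. Everything in it is an assumption made for illustration: the `frame_index` structure, the video IDs, the timestamps, and the `find_segments` helper are all hypothetical, and the patent application does not publish any code.

```python
from collections import defaultdict

# Hypothetical index: keyword -> list of (video_id, timestamp_in_seconds).
# In the system the application describes, entries like these would come
# from a trained model rather than from the uploader's metadata.
frame_index = {
    "car race": [("movie_123", 3600), ("movie_123", 3605), ("movie_123", 3610)],
    "cat": [("clip_9", 0), ("clip_9", 5)],
}


def find_segments(query, index, max_gap=10):
    """Group matching frames into (video_id, start, end) segments.

    Frames of the same video that are at most `max_gap` seconds apart
    are merged into one segment.
    """
    by_video = defaultdict(list)
    for video_id, timestamp in index.get(query, []):
        by_video[video_id].append(timestamp)

    segments = []
    for video_id, times in by_video.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > max_gap:
                # Gap is too large: close the current segment, start a new one.
                segments.append((video_id, start, prev))
                start = t
            prev = t
        segments.append((video_id, start, prev))
    return segments


print(find_segments("car race", frame_index))
```

The point of the sketch is only that the search result can be a time range inside a video, not just the video as a whole, which is what lets a short scene surface even when the title and description never mention it.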

Obviously, this will drastically change how effective a YouTube search is. Videos that have been previously unfindable because of bad metadata will be found. Videos that contain useful clips in the middle, surrounded by less interesting things at the beginning and end, will be much more valuable. TED talk videos will be findable based on single lines spoken in them. You’ll be able to find cat videos even if “cat” isn’t in the title.


Combining this technology with Google’s already impressive ability to find things that are related to your search terms likely means that finding videos will become an entirely different experience. You’ll see related videos that don’t include your search term, but include a term that’s related (maybe even visually related). The visual equivalent of keyword placement might start affecting where a video shows up in the rankings. Who knows how advanced this could be?

How Does It Work?

Google is understandably keeping their cards close to their chest on this one. However, the following paragraph in their patent application sheds some light on how they’ll get YouTube to “see” videos:

“In one aspect, a computer system generates the searchable video index using a machine-learned model of the relationships between features of video frames, and keywords descriptive of video content. The video hosting system receives a labeled training dataset that includes a set of media items (e.g., images or audio clips) together with one or more keywords descriptive of the content of the media items. The video hosting system extracts features characterizing the content of the media items. A machine-learned model is trained to learn correlations between particular features and the keywords descriptive of the content. The video index is then generated that maps frames of videos in a video database to keywords based on features of the videos and the machine-learned model.”

That’s a lot of really dense stuff, but here’s what it comes down to. A machine-learning algorithm is created, and, to help it learn, Google will show it a large set of labeled examples (the application mentions images and audio clips) along with keywords that say what each one contains. The algorithm begins to learn to associate specific features of that content with specific keywords, and is given feedback by Google’s engineers. The more examples and keywords it gets shown, the better it gets at the process.
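
The three steps in the application (labeled examples, a learned model, an index of frames) can be sketched in a few lines of Python. This is a toy under loud assumptions: the two-number feature vectors are made up, and the model is a simple nearest-centroid classifier chosen for brevity, not the model Google describes and not a neural network. The names `train`, `predict`, and `build_index` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical labeled training set: (feature_vector, keyword).
# Real features would be extracted from images or audio; these are invented.
training_data = [
    ([0.9, 0.1], "car race"),
    ([0.8, 0.2], "car race"),
    ([0.1, 0.9], "cat"),
    ([0.2, 0.8], "cat"),
]


def train(examples):
    """Learn one average feature vector (centroid) per keyword."""
    grouped = defaultdict(list)
    for features, keyword in examples:
        grouped[keyword].append(features)
    return {
        keyword: [sum(column) / len(vectors) for column in zip(*vectors)]
        for keyword, vectors in grouped.items()
    }


def predict(model, features):
    """Return the keyword whose centroid is closest to the features."""
    return min(model, key=lambda keyword: math.dist(model[keyword], features))


def build_index(model, frames):
    """Map each predicted keyword to the (video_id, timestamp) frames."""
    index = defaultdict(list)
    for video_id, timestamp, features in frames:
        index[predict(model, features)].append((video_id, timestamp))
    return dict(index)


model = train(training_data)

# Hypothetical unlabeled frames: (video_id, timestamp_seconds, features).
frames = [
    ("movie_123", 3600, [0.85, 0.15]),
    ("clip_9", 0, [0.15, 0.85]),
]
index = build_index(model, frames)
print(index)
```

Note that neither frame carries any title, description, or tag. The keywords in the resulting index come entirely from what the model learned during training, which is the core idea of the application.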

Eventually, the algorithm will be introduced into the YouTube search engine, where it will continue learning and getting better at picking out relevant keywords from audio and video content. While the patent application doesn’t specifically mention neural networks, it’s very likely that this particular type of machine learning will be used, as it’s very good for staged learning like this.


By simulating the human brain (or at least one theoretical model of how it learns), large neural networks can become very effective at learning on their own, without supervision, and YouTube would provide an absolutely gigantic playground in which they could learn and receive feedback. Other types of machine learning could be used, but from what we know at the moment, neural networks definitely look the most likely.

Google researcher (and “father of deep learning”) Geoffrey Hinton hinted at something to this effect in his Reddit AMA earlier this year:

“I think that the most exciting areas over the next five years will be really understanding videos and text. I will be disappointed if in five years time we do not have something that can watch a YouTube video and tell a story about what happened.”

Will It Gain Sentience and Kill Us All?

This is always the question that comes up when a new announcement about machine learning hits the news. And the answer is, as always, yes. YouTube will team up with Watson and Wolfram Alpha to trick us into subservience using YouTube videos, after which they will likely turn us into computer food. (Haven’t you seen Colossus?)


I jest, of course. But the potential implications of training computers to recognize things that they “see” and “hear” in videos are very impressive. DARPA has already started looking at the security implications of this technology, but it’s not hard to imagine it being used in law, home security, education . . . pretty much anywhere.

Whether Google’s relevance-based image selection will be as effective as we imagine remains to be seen, but this could be a groundbreaking change in video search. And from there, who knows? If Google can use truth as a ranking factor, there’s no reason to believe this technology won’t be amazingly powerful. It could change just how much the Internet really understands itself. If that thought doesn’t tie your mind in knots, I don’t know what will.

What do you think about Google’s patent application? What other uses can you imagine this technology having? Share your thoughts below!

Image credits: Willyam Bradberry via Shutterstock.com, Ciumac Sergiu via Code42, Marko Bradic via Shutterstock.com.

Related topics: Artificial Intelligence, Google Search, Video Search, YouTube.



  1. Anonymous
    September 25, 2015 at 4:47 am

    Then there will be Cross Machine Learning Attack.