These days you don’t have to limit your search to just websites. Many other forms of content are easy to find, including images. No matter what you’re looking for, an image is (for better or worse) just one image search away.
You may wonder, however, how image search works. How are images sorted and classified, making it possible to find tens or hundreds of relevant results? Perhaps you’re just curious, or perhaps you run a site and want to know so you can improve your own ranking. In either case, taking a deeper look could be helpful.
Old-Fashioned Text
Some people assume that image search is conducted via fancy algorithms that determine what an image is about and then index it. I know that’s where I started. As it turns out, however, old-fashioned text is one of the most important factors in an image’s ranking.
More specifically, the file name matters. Go ahead – do an image search. What do the top results have in common? Almost invariably, it’s a portion of their file name. Most of the top results for “pizza” have the word pizza in the file name.
That might seem obvious, but it doesn’t happen on its own. Most digital photographs, for example, start life with a file name like “1020302.jpg” and are only renamed later. For webmasters, giving an image a relevant file name is just as basic and important as making sure that a webpage’s keyword appears in that page’s metadata title and/or description. But it’s not automatic. It takes constant effort.
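To make the file name point concrete, here is a minimal sketch of how keywords might be pulled out of an image's file name. The function name `filename_keywords` and the splitting rule are my own assumptions for illustration; no search engine publishes how it actually does this.

```python
import re
from pathlib import PurePosixPath


def filename_keywords(path: str) -> list[str]:
    """Return lowercase word tokens found in an image's file name.

    Hypothetical helper: digits, hyphens, underscores and the file
    extension are all treated as noise and dropped.
    """
    stem = PurePosixPath(path).stem           # "pepperoni-pizza_slice"
    tokens = re.split(r"[^A-Za-z]+", stem)    # split on anything that is not a letter
    return [t.lower() for t in tokens if t]


print(filename_keywords("images/pepperoni-pizza_slice.jpg"))
print(filename_keywords("1020302.jpg"))
```

A camera-default name like "1020302.jpg" yields no keywords at all, which is the whole problem with leaving file names untouched.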
Of course, search engines can’t just rely on file names. If that were the case, everyone would just name photographs of their dog something like “celebritywifeswap.png” and call it a day.
Search needs to go a bit deeper to make sure that the image, whatever its file name, is actually relevant to a given keyword. To do that, the search engine relies on data found on the webpage where the image is located. After all, images are usually used in support of text content, so that content can provide information about what the image represents. What this means is that, just as with text content, image search looks for patterns and rewards websites that create them. Placing a picture on a page whose text is relevant to it will increase its chances of ranking.
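As a rough illustration of combining the two signals, the sketch below scores an image against a keyword using both its file name and the text of the page it sits on. The `relevance` function, the 50/50 weighting and the word-frequency measure are all assumptions made up for this example, not how any real engine scores images.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(keyword: str, file_name: str, page_text: str,
              name_weight: float = 0.5) -> float:
    """Toy relevance score between 0 and 1 (hypothetical weighting)."""
    kw = keyword.lower()
    # Signal 1: does the keyword appear as a word in the file name?
    name_hit = 1.0 if kw in tokenize(file_name) else 0.0
    # Signal 2: how large a share of the surrounding page text is the keyword?
    words = tokenize(page_text)
    context = words.count(kw) / len(words) if words else 0.0
    return name_weight * name_hit + (1 - name_weight) * context


# A well-named image on a relevant page...
good = relevance("pizza", "pizza.jpg", "best pizza recipe with pizza dough")
# ...versus a dog photo with a misleading name on a page about dogs.
bad = relevance("celebritywifeswap", "celebritywifeswap.png", "my dog loves the park")
print(good > bad)
```

In this toy model the misleading name still earns some credit, but the page text gives it nothing to back that up, so the well-matched image comes out ahead.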
This seems obvious once you think about it, and it’s really no different from normal web search. There, websites that revolve around a common theme are almost always rewarded, while those with no focus are often penalized. Even having an appropriate URL can be a big deal.
Classification & Clustering
Now we get to the interesting stuff.
Once an image search engine has crawled an image, looked at its file name, and looked at the content surrounding it, it probably has a good idea what the image is about. People searching for an image by keyword can find what they’re looking for most of the time based only on file name and context, neither of which has anything to do with the actual content of the image.
Yet the content of the image does matter. What do you do if you want to search for pictures of Luke Skywalker in Star Wars? You could just type in “luke skywalker” and find a lot of options. You’ll find pictures of Luke Skywalker by himself, pictures of him doing things, pictures of him posing, pictures of the actor who played Luke in various settings at various ages, movie posters, and even fan art. That variety is a problem if you only want one particular kind of picture.
That’s where classification comes in. Image search engines analyze an image and ask some basic questions about it. Is there a face? What colors are used, and how frequently? What’s the resolution? Searchers can then narrow down their search based on this information, and some services (including Google Image Search) let users search by uploading an image.
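The "what colors are used, and how frequently?" question can be sketched as a coarse color histogram. This is only an illustration under my own assumptions: the pixels are given as a plain list of (red, green, blue) tuples, and `color_histogram` and its `levels` parameter are hypothetical names, not part of any search engine's pipeline.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Map each (r, g, b) pixel into a coarse bucket and return bucket shares.

    With levels=4, each channel value (0-255) falls into one of 4 bands,
    so similar shades land in the same bucket.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


# A tiny made-up "image": three red pixels and one blue pixel.
pixels = [(255, 0, 0), (250, 10, 5), (240, 0, 0), (0, 0, 255)]
hist = color_histogram(pixels)
dominant = max(hist, key=hist.get)
print(dominant, hist[dominant])
```

A simple image produces a histogram dominated by a few buckets, while a busy, colorful one spreads thinly across many, which is one plausible reason such images are harder to match.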
Usually search is able to find a number of similar options, though busy and colorful images tend to throw a wrench in the works. Despite what you may have heard about advanced facial recognition techniques and other such technology, wide-scale image classification still seems to be a tricky business that doesn’t always generate the right results.
Another technique that assists users is clustering. This is an attempt to group together images that are similar in content. When you search using Google or Bing, for example, you will find a number of optional searches at the top of the interface. In our Luke Skywalker example these end up being things like “luke skywalker wallpaper” and “luke skywalker cartoon.” Some search engines, like Bing, even offer a list of related characters if your search relates to a character from a popular movie.
As a webmaster, this is something that is largely out of your hands. All you can do is attempt to provide high-quality images that offer clear content. A busy picture with a lot going on will be harder to classify and cluster than one that is relatively simple, focusing on only one object or face.
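For the curious, grouping similar images can be sketched in a few lines. The example below assumes each image has already been reduced to a histogram (a dictionary of feature shares that sum to 1) and uses a simple greedy grouping with a made-up similarity threshold. Real engines do not disclose their methods, so treat `similarity`, `cluster` and the sample data as hypothetical.

```python
def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for no overlap."""
    return sum(min(share, h2.get(key, 0.0)) for key, share in h1.items())


def cluster(histograms, threshold=0.5):
    """Greedily group image names whose histograms resemble a group's first member."""
    groups = []
    for name, hist in histograms.items():
        for group in groups:
            # Compare against the first image that started this group.
            if similarity(hist, histograms[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # nothing similar enough: start a new group
    return groups


# Made-up histograms for three images.
images = {
    "poster.jpg": {"red": 1.0},
    "wallpaper.jpg": {"red": 0.8, "blue": 0.2},
    "cartoon.png": {"blue": 1.0},
}
print(cluster(images))
```

The two mostly red images end up together and the blue one stands alone. An image that spreads its colors across many buckets would not pass the threshold with anything, which fits the advice above about keeping pictures simple.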
Image search is interesting, but probably not as cutting-edge as some suspect. The algorithms used, though never precisely relayed to the public, obviously place a priority on simple information.
If you’re a user of image search and wondering if there’s some trick to getting the most out of it – well, there’s really not. The same techniques that you’d use to search for a webpage are usually applicable to images. Webmasters will find the same is true when trying to rank images. Just as with web search, clarity, specificity and quality are important.