How Do Search Engines Work?

Article updated by Joel Lee on 10/10/2017

For many, Google is the internet. It’s the starting point for finding new sites, and is arguably the most important invention since the internet itself. Without search engines, new web content would be inaccessible to the masses.

But do you know how search engines work? Every search engine has three main functions: crawling (to discover content), indexing (to track and store content), and retrieval (to fetch relevant content when users query the search engine).

Crawling

Crawling is where it all begins: the acquisition of data about a website.

This involves scanning sites and collecting details about each page: titles, images, keywords, other linked pages, etc. Different crawlers may also look for different details, like page layouts, where advertisements are placed, whether links are crammed in, etc.

But how is a website crawled? An automated bot (called a “spider”) visits page after page as quickly as possible, using page links to find where to go next. Even in the earliest days, Google’s spiders could read several hundred pages per second. Nowadays, it’s in the thousands.

When a web crawler visits a page, it collects every link on the page and adds them to its list of next pages to visit. It goes to the next page in its list, collects the links on that page, and repeats. Web crawlers also revisit past pages once in a while to see if any changes happened.
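The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine's crawler is built: the `FAKE_WEB` dictionary, the `.example` addresses, and the `crawl` and `LinkCollector` names are all made up for this sketch, and pages are read from memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: address -> HTML of that page.
FAKE_WEB = {
    "a.example": '<a href="b.example">B</a> <a href="c.example">C</a>',
    "b.example": '<a href="c.example">C</a> <a href="a.example">A</a>',
    "c.example": '<a href="d.example">D</a>',
    "d.example": "no links here",
}


def crawl(start, fetch, max_pages=100):
    """Visit pages in the order they were discovered; return that order."""
    to_visit = deque([start])  # the crawler's list of next pages to visit
    seen = {start}             # pages already queued, so loops are avoided
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:       # page could not be fetched, so skip it
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited


order = crawl("a.example", FAKE_WEB.get)
print(order)  # ['a.example', 'b.example', 'c.example', 'd.example']
```

A real crawler would also need to fetch pages over HTTP, resolve relative links, and schedule return visits, all of which this sketch leaves out.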

This means any site that’s linked from an indexed site will eventually be crawled. Some sites are crawled more frequently, and some are crawled to greater depths, but sometimes a crawler may give up if a site’s page hierarchy is too complex.

One way to understand how a web crawler works is to build one yourself. We’ve written a tutorial on creating a basic web crawler in PHP, so check that out if you have any programming experience.

Note that pages can be marked as “noindex,” which asks search engines not to index them. Non-indexed parts of the internet are known as the “deep web,” and some sites, like those hosted on the Tor network, can’t be indexed by search engines.
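One common way to mark a page as “noindex” is a robots meta tag in its HTML. As a rough sketch of how a crawler might notice it, the snippet below looks for that tag. The `is_noindex` and `RobotsMetaParser` names are invented for this example, and it only covers the meta tag, not other ways a site can send the same request.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Sets .noindex to True if a robots meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def is_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


print(is_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))  # True
print(is_noindex('<head><meta name="description" content="A cookie blog"></head>'))  # False
```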

Indexing

Indexing is when the data from a crawl is processed and placed in a database.

Imagine making a list of all the books you own, their publishers, their authors, their genres, their page counts, etc. Crawling is when you comb through each book while indexing is when you log them to your list.

Now imagine it’s not just a room full of books, but every library in the world. That’s a small-scale version of what Google does. It stores all of this data in vast data centers with thousands of petabytes’ worth of drives.
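To make the idea of an index concrete, here is a small sketch of an “inverted index,” a standard structure that maps each word to the pages containing it. We are not claiming this is how Google stores its data. The `PAGES` sample data and the `build_index` and `search` names are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical crawl results: address -> text found on the page.
PAGES = {
    "cookies.example": "Easy chocolate chip cookies recipe",
    "bread.example": "Gluten-free bread recipe",
    "cats.example": "Pictures of cats",
}

index = build_index(PAGES)
print(sorted(index["recipe"]))               # ['bread.example', 'cookies.example']
print(search(index, "gluten-free recipe"))   # {'bread.example'}
```

Looking a word up in the index is fast because the work of reading every page was done once, at indexing time, rather than on every search.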

[Image: inside one of Google’s search data centers. Image credit: Google]

Retrieval and Ranking

Retrieval is when the search engine processes your search query and returns the most relevant pages that match your query.

Most search engines differentiate themselves through their retrieval methods: they use different criteria to pick and choose which pages fit best with what you want to find. That’s why search results vary between Google and Bing, and why Wolfram Alpha is so uniquely useful.

Ranking algorithms check your search query against billions of pages to determine each one’s relevance. Companies guard their ranking algorithms as closely held trade secrets, because a better algorithm translates to a better search experience and a competitive edge.

They also don’t want web creators to game the system and unfairly climb to the tops of search results. If the internal methodology of a search engine ever got out, all kinds of people would surely exploit that knowledge to the detriment of searchers like you and me.

Search engine exploitation is possible, of course, but isn’t so easy anymore.

Originally, search engines ranked sites by how often keywords appeared on a page, which led to “keyword stuffing” — filling pages with keyword-heavy nonsense.
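A toy version of that early approach shows why it was so easy to abuse. The sketch below scores each page purely by how many times the query words appear on it. The sample pages and the `rank_by_keyword_count` name are invented for this example, and no real search engine is claimed to have worked exactly this way.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_keyword_count(query, pages):
    """Order pages by how often the query words appear on them."""
    terms = tokenize(query)
    scores = {}
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[url] = score
    # Highest score first; ties are broken alphabetically by address.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages: one honest, one stuffed with keywords, one unrelated.
PAGES = {
    "honest.example": "A tested recipe for oatmeal cookies with baking tips",
    "stuffed.example": "cookies cookies cookies best cookies cheap cookies",
    "other.example": "How to repair a bicycle tire",
}

print(rank_by_keyword_count("cookies", PAGES))
# ['stuffed.example', 'honest.example']
```

The stuffed page wins even though it says nothing useful, which is exactly the weakness that pushed search engines toward other signals.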

Then came the concept of link importance: search engines valued sites with lots of incoming links because they interpreted site popularity as relevance. But this led to link spamming all over the web. Nowadays, search engines weight links depending on the “authority” of the linking site. Search engines put more value on links from a government agency than links from a link directory.
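The best-known published form of link authority is PageRank, and the sketch below is a simplified PageRank-style calculation. It is not the algorithm any search engine runs today. The link graph, the `.example` addresses, the `link_scores` name, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Give each page a score based on the scores of pages linking to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * score[page] / n
                for other in pages:
                    new[other] += share
        score = new
    return score


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "blog1.example": ["agency.example"],
    "blog2.example": ["agency.example"],
    "blog3.example": ["agency.example"],
    "agency.example": ["endorsed.example"],      # well-linked page links here
    "directory.example": ["listed.example"],     # page nobody links to links here
}

scores = link_scores(LINKS)
print(scores["endorsed.example"] > scores["listed.example"])  # True
```

Both `endorsed.example` and `listed.example` have exactly one incoming link, but the link from the well-connected agency page is worth more than the link from the directory that nobody links to.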

Today, ranking algorithms are shrouded in more mystery than ever before, and “search engine optimization” isn’t as important as it once was. Good search engine rankings now come from high-quality content and great user experiences.

What’s Next for Search Engines?

Ah, now there’s an interesting question. The answer is “semantics”: the meaning of the page’s content. You can read more about it in our overview of semantic markup and its future impact.

But here’s the gist of it.

Right now, you can search for “gluten-free cookies” but the results may not return recipes for gluten-free cookies. Instead, you might find regular cookie recipes that say “This recipe is not gluten-free.” Those pages have the right keywords, but the wrong meaning.

With semantics, you can search for cookie recipes and then remove certain ingredients: flour, nuts, etc. You can also narrow down results to only recipes with prep times less than 30 minutes and review scores of 4/5 or greater. That would be cool, right? That’s where we’re heading!
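Here is a rough sketch of what that kind of search could look like once recipes are described as structured data instead of plain text. The `Recipe` fields, the sample recipes, and the `find_recipes` name are all hypothetical and are not taken from any real markup standard or search engine.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    name: str
    ingredients: frozenset
    prep_minutes: int
    rating: float  # review score out of 5


def find_recipes(recipes, exclude=(), max_prep_minutes=None, min_rating=None):
    """Filter recipes by what they mean, not by which words they mention."""
    excluded = set(exclude)
    results = []
    for recipe in recipes:
        if excluded & recipe.ingredients:
            continue  # contains an ingredient the searcher wants removed
        if max_prep_minutes is not None and recipe.prep_minutes >= max_prep_minutes:
            continue  # prep time must be less than the limit
        if min_rating is not None and recipe.rating < min_rating:
            continue  # review score must be at least the minimum
        results.append(recipe.name)
    return results


# Hypothetical structured data that pages might publish about their recipes.
RECIPES = [
    Recipe("Classic chocolate chip", frozenset({"flour", "butter", "sugar"}), 25, 4.6),
    Recipe("Peanut butter cookies", frozenset({"nuts", "sugar", "egg"}), 20, 4.5),
    Recipe("Oat banana cookies", frozenset({"oats", "banana"}), 15, 4.2),
    Recipe("Coconut macaroons", frozenset({"coconut", "egg", "sugar"}), 45, 4.8),
    Recipe("Rice crisps", frozenset({"rice", "sugar"}), 10, 3.1),
]

print(find_recipes(RECIPES, exclude={"flour", "nuts"}, max_prep_minutes=30, min_rating=4.0))
# ['Oat banana cookies']
```

Because the search works on ingredients, times, and scores rather than on matching words, a page that merely says “not gluten-free” can no longer slip into the results.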

If you found this interesting, you might also like to learn about how image search engines work.

  1. Shilongo Ya-Shalongo
    September 27, 2017 at 10:18 pm

    this was very clear and straight to the point. thank you very much!

  2. tech.nayapicture.in
    September 21, 2016 at 5:59 am

    very useful article

  3. Asad
    September 10, 2016 at 12:34 pm

    wonderfull Article..Thanks

  4. M.Barry
    February 7, 2016 at 3:57 am

    wow this well explained thank you for this heads up

  5. Andrew Turner
    June 1, 2015 at 2:17 pm

    Bit stupid to answer a question about how search engines work because like most things on the internet they do not work. They give you a load of dross and totally ignore what you are asking. I expect money and the paymaster has something to do with it. But when my search contains a specific word I expect the results to relate to that word rather than to things totally unrelated to that word. As usual internet professionals pander to the high and mighty rather than telling them to get their act together and return relevant responses. After all they are called search engines.

  6. Richard Eaves
    May 23, 2013 at 8:40 am

    Thanks for the refresher. Landing a coveted spot in Google’s SERPs is more complicated than it looks. But it’s safe to say that content still reigns supreme on the internet.

  7. Phuc Ngoc
    April 18, 2013 at 7:53 am

    Or else, see this amazing and easy-to-understand explanation from Google itself: http://www.google.com/insidesearch/howsearchworks/thestory/

    • unkerjay
      April 21, 2013 at 3:48 pm

      Watching Google explain how Search Engines work is a lot like listening to
      Microsoft explaining Operating Systems.

      Just a wee bit self serving.

  8. Mike
    April 18, 2013 at 5:18 am

    As long as search engines have a for profit model as their base, there will be misinformation and stacked decks in the results. Google Adwords give preference to those who pay well for given search results topping the list. Then there are those who "hack the algorithms" further stacking the results deck. There are misguided cultural / business decisions that either exclude results (China / Iran / North Korea / France etc) as well as decisions made on religious, social weightings of decency, morality, legality. Not to mention the extent to which politics and government affect accuracy.

    Most results will fall into the black and white area. Those that have no implicit or explicit bias or benefit - snow, cat, dog, etc. The rest, those for which there is a bias either implied or explicit, or some preferential benefit - democrat, republican, tennis shoes, gun control, abortion, homosexuality, porn, deviant sexuality will find results vary from one search engine to the next.

    Last, the internet is largely a medium of words, not numbers, not sound, not images and search engines reflect that. You can find music based on the musician, not on the melody. You can find an image if you can accurately describe it or its associated details. Google is one step above most if you can upload an image and have it do a lookup based on that image. It's better than tineye or other similar addons in that regard.

    As with "news" with search engines, rather than any one search engine, try the same search across multiple search engines either there will be a degree of sameness in the results - the microsoft rule of no better no worse than the competition or results will be favored from one search engine to the next depending on the search.

    Crowd searching, human intermediary (libraries, etc) can help, but, is no guarantee of better or more accurate results, just a glide path to comprehension relative to cultural / demographic references: e.g. history, entertainment, linguistic, regional, biographical, geographical references.

    The web is largely a for profit enterprise that is predicated upon fee for service entry. That in and of itself involves a certain degree of inclusion / exclusion to the process. No matter how well it does (or doesn't provide) results. Without access, without sufficient literacy of the means or process, results are secondary. That is changing, but there's still a LOT of room for improvement.

  9. Zhong J
    April 18, 2013 at 4:18 am

    Based on my computer science class, there are also meta search engines, which combine different search engines to return multiple sets of results. When you search for something, the top of the list will be the most popular links, the ones that are shared the most. So technically, if you manage to get millions of people to share an article you've published, your keywords will appear at the top of Google's rankings.

    • James Bruce
      April 18, 2013 at 7:51 am

      Hmm, sounds like a pretty bad way to search - popularity doesn't always indicate truth , reliability or quality.

      • Zhong J
        April 18, 2013 at 3:58 pm

        True, most search engines check for relevance, which is usually combined with the most hits. And on the Internet these days, you cannot truly trust what you read or hear, since we're the tertiary source for this information, not the primary.