
To many people, Google IS the internet. It’s the default homepage and the first port of call before accessing any site. It’s arguably the most important invention since the Internet itself. Without search engines, content would all be hand-picked – just like newspapers and magazines. And while search engines have changed a lot since those first humble beginnings – and Google certainly isn’t the only search engine out there – the underlying principles are the same as they always were.

Do you know how search engines work? There are three basic stages for a search engine: crawling, where content is discovered; indexing, where it is analysed and stored in huge databases; and retrieval, where a user query fetches a list of relevant pages.


Crawling

Crawling is where it all begins – the acquisition of data about a website. This involves scanning the site and getting a complete list of everything on there: the page title, images, the keywords it contains, and any other pages it links to – at a bare minimum. Modern crawlers may cache a copy of the whole page, as well as look for additional information such as the page layout, where the advertising units are, and where the links sit on the page (featured prominently in the article text, or hidden in the footer?).

How is a website crawled, exactly? An automated bot – a spider – visits each page, just like you or I would, only very quickly. Even in the earliest days, Google reported reading a few hundred pages a second. If you’d like to learn how to make your own basic web crawler in PHP, see How To Build A Basic Web Crawler To Pull Information From A Website – it was one of the first articles I wrote here and well worth having a go at (just don’t expect to make the next Google).

The crawler then adds all the new links it found to a list of places to crawl next – in addition to re-crawling sites again to see if anything has changed. It’s a never-ending process, really.
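The crawl loop described above is simple at heart: fetch a page, pull out its links, and queue any you haven’t seen yet. Here’s a minimal sketch in Python (the tutorial mentioned earlier used PHP); fetching is stubbed out with a tiny made-up “web” so no network is involved:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, queue its links, repeat.

    `fetch` is a callable returning the HTML for a URL; in a real
    crawler it would issue an HTTP request.
    """
    frontier = [start_url]   # pages waiting to be crawled
    seen = {start_url}       # avoid visiting the same URL twice
    index = {}               # url -> list of outgoing links
    while frontier and len(index) < max_pages:
        url = frontier.pop(0)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        index[url] = parser.links
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

# A tiny fake web for demonstration
pages = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/about">About</a>',
}
result = crawl("/home", fetch=lambda url: pages.get(url, ""))
```

The `seen` set is what keeps a real crawler from looping forever – the web is full of pages that link back to each other.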



Any site that is linked to from another site already indexed, or any site whose owner has manually asked for it to be indexed, will eventually be crawled – some sites more frequently than others, and some to a greater depth. If a site is huge and its content hidden many clicks away from the homepage, the crawler bots may actually give up. There are ways to ask search engines NOT to index a site, though this is rarely used to block an entire website.
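The standard mechanism for asking crawlers to stay away is a robots.txt file placed at the site root, which well-behaved crawlers check before visiting; a meta tag handles keeping an individual page out of the index. A minimal example:

```text
# robots.txt – placed at the site root; polite crawlers read this first
User-agent: *          # applies to every crawler
Disallow: /private/    # please do not crawl anything under /private/

# To keep a single page out of the index, a meta tag in its <head> works:
# <meta name="robots" content="noindex">
```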

There was even a time when large parts of the Internet were essentially invisible to search engines – the so-called “deep web” – but this is rare now. Tor-hosted websites, for example, remain unindexed by Google, and are only accessible by connecting to the Tor network and knowing the address.



Indexing

You’d be forgiven for thinking this is an easy step – indexing is the process of taking all of the data gathered in a crawl and placing it in a big database. Imagine trying to make a list of all the books you own, their authors, and their page counts. Going through each book is the crawl; writing the list is the index. Now imagine it’s not just a room full of books, but every library in the world. That’s pretty much a small-scale version of what Google does.
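The workhorse data structure behind that database is the inverted index: for every word, a list of the documents containing it, so a query never has to rescan every page. A minimal Python sketch of the books analogy:

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

books = {
    1: "the art of computer programming",
    2: "programming pearls",
    3: "the c programming language",
}
index = build_index(books)

# Looking up a word returns every matching book instantly,
# without walking the shelves again.
```

With this in place, answering “which documents mention *programming*?” is a single dictionary lookup, which is why search can feel instant even over billions of pages.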

All of this data is stored in vast data centres with thousands of petabytes worth of drives. Here’s a sneak peek inside one of Google’s:


Ranking & Retrieval

The last step is what you actually see – you type in a search query, and the search engine attempts to display the most relevant documents it finds that match it. This is the most complicated step, but also the most relevant to you or me, as web developers and users. It is also the area in which search engines differentiate themselves (though there has been some evidence of Bing copying Google results). Some work with keywords, some – like Wolfram Alpha – allow you to ask a question, and some include advanced features like keyword proximity or filtering by age of content.

The ranking algorithm checks your search query against billions of pages to determine how relevant each one is. This operation is so complex that companies guard their ranking algorithms as closely held trade secrets. Why? Competitive advantage, for a start – as long as they give you the best search results, they can stay on top of the market. Secondly, to prevent gaming of the system, which would give an unfair advantage to one site over another.

Once the internal methodology of any system is fully understood, there will always be those who try to “hack” it – discover the ranking factors and exploit them for monetary gain.


Exploiting the ranking algorithm has in fact been commonplace since search engines began, but in the last three years or so Google has made that much more difficult. Originally, sites were ranked based on how many times a particular keyword was mentioned. This led to “keyword stuffing”, where pages were filled with mostly nonsense, so long as they included the keyword everywhere.

Then the concept of importance based on linking was introduced – more popular sites would be linked to more, obviously – but this led to a proliferation of spammed links all over the web. Now each link is judged to have a different value, depending on the “authority” of the site in question. A link from a high-level government agency is worth far more than a link found in a free-for-all “link directory”.
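Ranking pages by the authority of their inbound links is what Google’s original PageRank algorithm formalised: a page’s score is fed by the scores of the pages linking to it. A simplified Python sketch of the iteration (the 0.85 damping factor follows the published description; the tiny four-page “web” is invented for illustration):

```python
def pagerank(links, iterations=50, d=0.85):
    """links: page -> list of pages it links to. Returns page -> rank."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}  # start everyone equal
    for _ in range(iterations):
        # Base rank every page gets regardless of links
        new = {p: (1 - d) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Each page shares its rank equally among its links
                share = d * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
        rank = new
    return rank

# "gov" is linked to by everyone; "directory" links out but nobody cites it
web = {
    "gov": ["news"],
    "news": ["gov"],
    "blog": ["gov", "news"],
    "directory": ["blog", "gov", "news"],
}
ranks = pagerank(web)
```

After a few dozen iterations the much-cited “gov” page ends up with a far higher score than the uncited “directory” – spamming links *out* of a page earns it nothing.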

Check out for more examples of SEO gone wild.


Today, the understanding of the exact algorithm is even more shrouded in mystery than ever, and the dark art of “Search Engine Optimization” has largely been crippled – the advice now is to focus on providing the best content, with a great user experience (how crazy, right?!). Considering that almost 60% of all searches end up clicking the first result, it’s easy to see why ranking your page well is so important.

What’s Next For Search Engines?

Ah, now there’s an interesting question. The answer is semantics – the meaning and type of content a page contains. For more information on that, read my article on What Semantic Markup Is & How It Will Change The Internet Forever.

Here’s the easiest example – right now, you could search for gluten-free cookies, but the pages you find might not actually contain a recipe for gluten-free cookies; they might feature a regular cookie with a bit of text that says “this recipe is not gluten free”. In a world with semantics, you could search for cookie recipes, then remove regular flour from your list of acceptable ingredients. Then you could remove any with nuts, because you’re not particularly keen on nuts. Then you could narrow it down to only recipes with a review score of 4/5 or greater and a total preparation time of less than half an hour. That would be cool, right?

Well, you can. Just head over to (international versions may not work), search for a recipe, and use the search tools to narrow it down to only results that are recipes. Then you’ll find an ingredients filter, and more!
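Under the hood, semantic filtering like that is just a query over structured, machine-readable fields rather than keyword matching. A Python sketch using hypothetical recipe records (the field names and recipes are invented for illustration):

```python
# Hypothetical structured recipe records, as a semantic index might store them
recipes = [
    {"name": "Oat cookies",
     "ingredients": {"oat flour", "sugar", "eggs"},
     "rating": 4.5, "prep_minutes": 25},
    {"name": "Classic cookies",
     "ingredients": {"wheat flour", "sugar", "butter"},
     "rating": 4.8, "prep_minutes": 20},
    {"name": "Nut brittle bites",
     "ingredients": {"peanuts", "sugar"},
     "rating": 4.2, "prep_minutes": 15},
]

banned = {"wheat flour", "peanuts"}  # no regular flour, no nuts

# Apply every constraint from the example: ingredients, rating, prep time
matches = [
    r for r in recipes
    if not (r["ingredients"] & banned)
    and r["rating"] >= 4.0
    and r["prep_minutes"] < 30
]
```

Because the ingredients, rating, and preparation time are fields rather than free text, a page saying “this recipe is not gluten free” can never sneak through the filter.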


And that, dear readers, is how search engines work. Still confused? Here’s how Google themselves explain the process:

If you found this interesting, you might also like to learn about how image search engines work.

Image Credit: ShutterStock – SEO


    September 21, 2016 at 5:59 am

    very useful article

  2. Asad
    September 10, 2016 at 12:34 pm

    wonderfull Article..Thanks

  3. M.Barry
    February 7, 2016 at 3:57 am

    wow this well explained thank you for this heads up

  4. Andrew Turner
    June 1, 2015 at 2:17 pm

Bit stupid to answer a question about how search engines work because, like most things on the internet, they do not work. They give you a load of dross and totally ignore what you are asking. I expect money and the paymaster have something to do with it. But when my search contains a specific word, I expect the results to relate to that word rather than to things totally unrelated to it. As usual, internet professionals pander to the high and mighty rather than telling them to get their act together and return relevant responses. After all, they are called search engines.

  5. Richard Eaves
    May 23, 2013 at 8:40 am

    Thanks for the refresher. Landing a coveted spot in Google’s SERPs is more complicated than it looks. But it’s safe to say that content still reigns supreme on the internet.

  6. Phuc Ngoc
    April 18, 2013 at 7:53 am

    Or else, see this amazing and easy-to-understand explanation from Google itself:

    • unkerjay
      April 21, 2013 at 3:48 pm

      Watching Google explain how Search Engines work is a lot like listening to
      Microsoft explaining Operating Systems.

      Just a wee bit self serving.

  7. Mike
    April 18, 2013 at 5:18 am

    As long as search engines have a for profit model as their base, there will be misinformation and stacked decks in the results. Google Adwords give preference to those who pay well for given search results topping the list. Then there are those who "hack the algorithms" further stacking the results deck. There are misguided cultural / business decisions that either exclude results (China / Iran / North Korea / France etc) as well as decisions made on religious, social weightings of decency, morality, legality. Not to mention the extent to which politics and government affect accuracy.

    Most results will fall into the black and white area. Those that have no implicit or explicit bias or benefit - snow, cat, dog, etc. The rest, those for which there is a bias either implied or explicit, or some preferential benefit - democrat, republican, tennis shoes, gun control, abortion, homosexuality, porn, deviant sexuality will find results vary from one search engine to the next.

    Last, the internet is largely a medium of words, not numbers, not sound, not images and search engines reflect that. You can find music based on the musician, not on the melody. You can find an image if you can accurately describe it or its associated details. Google is one step above most if you can upload an image and have it do a lookup based on that image. It's better than tineye or other similar addons in that regard.

    As with "news" with search engines, rather than any one search engine, try the same search across multiple search engines either there will be a degree of sameness in the results - the microsoft rule of no better no worse than the competition or results will be favored from one search engine to the next depending on the search.

    Crowd searching, human intermediary (libraries, etc) can help, but, is no guarantee of better or more accurate results, just a glide path to comprehension relative to cultural / demographic references: e.g. history, entertainment, linguistic, regional, biographical, geographical references.

    The web is largely a for profit enterprise that is predicated upon fee for service entry. That in and of itself involves a certain degree of inclusion / exclusion to the process. No matter how well it does (or doesn't provide) results. Without access, without sufficient literacy of the means or process, results are secondary. That is changing, but there's still a LOT of room for improvement.

  8. Zhong J
    April 18, 2013 at 4:18 am

Based on my computer science class, there are also meta search engines: a combination of different web engines used to achieve multiple results. When you're searching for something, the top of the list will be the links which are shared the most. So technically, if you manage to get millions of people to share an article you've published, your keywords will appear at the top rank of Google.

    • James Bruce
      April 18, 2013 at 7:51 am

      Hmm, sounds like a pretty bad way to search - popularity doesn't always indicate truth , reliability or quality.

      • Zhong J
        April 18, 2013 at 3:58 pm

True, most search engines check for relevance, which is usually combined with the most hits. And on the Internet these days, you cannot truly trust what you read or hear, since we're getting this information from a tertiary source, not the primary.