3 Most Useful Discovery Engines: Find Similar Pages

We all know how Web search works: you provide relevant terms that describe the concept or topic you are interested in, click "Search", and the search engine generates a list of results ranked by popularity.

But what if you don't exactly know how to describe the concept or the topic you are interested in? What if you just want "something of the kind"?

In this case, you need to try discovery search engines: these tools rank the Web by similarity (not by popularity). They allow you to discover more pages based on the one you found most relevant.

Here are three useful, advanced discovery search tools.

I have mentioned this neat search operator (Google's related:) when listing Google tricks for when you don't know what to search for. I also reviewed TouchGraph, a visualization tool based on this operator, which can be used as a discovery tool as well.

The index:

Obviously, the operator uses Google's own database (which is huge). However, for most searches you run, you are likely to see no more than 30-50 results (which suggests that Google shows only a fraction of the possible results).

The algorithm:

The basic algorithm behind the search operator is co-citation, which, in simple words, works like this: if a web page A links to both page B and page C, the latter pages (B and C) are likely to be related. Of course, not everything is that easy and straightforward (it never is with Google) but the basic algorithm is like that.
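To make the co-citation idea concrete, here is a minimal sketch in Python. It only illustrates the basic principle described above and is not how Google actually implements the operator. The link graph and the function names (cocitation_counts, related) are made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(outlinks):
    """Count, for every pair of pages, how many pages link to both."""
    counts = defaultdict(int)
    for targets in outlinks.values():
        # Every pair of pages linked from the same source is co-cited once.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


def related(page, outlinks):
    """Rank other pages by how often they are co-cited with `page`."""
    scores = defaultdict(int)
    for (a, b), n in cocitation_counts(outlinks).items():
        if a == page:
            scores[b] += n
        elif b == page:
            scores[a] += n
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link graph: each key links to the pages in its list.
links = {
    "A": ["B", "C"],
    "D": ["B", "C", "E"],
    "F": ["C", "E"],
}

print(related("B", links))  # [('C', 2), ('E', 1)]
```

Here pages A and D both link to B and C, so C comes out as the page most related to B, while E is co-cited with B only once (by D).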

Google Search Results Support?

Yes, you can access the list of related sites right from Google search results page by clicking the "Similar" link:

Drawbacks:

It is hard to find fault with the mighty Google. The only drawback that comes to mind is that it is still Google: if you want to try an alternative user experience and get alternative results (not ranked by Google), you should try out other tools as well.

Similar Pages is a standalone tool that uses its own technology and claims to let users dig into the "hidden" parts of the web, those that you wouldn't be able to find using Google alone. Whereas "ordinary" search engines rank results by popularity (and thus keep us from seeing less popular pages), SimilarPages ranks pages by similarity.

The index:

The tool uses its own database, which is claimed to contain more than 3.2 billion pages. The Firefox add-on is said to access 200 million sites.

The algorithm:

The tool uses "PageAffinity", which takes into account both the content of pages and the linking structure of the web to determine the level of similarity between web pages.
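PageAffinity itself is proprietary, so the following Python sketch is only a guess at the general idea of mixing a content signal with a link signal. Everything in it is an assumption made for illustration: word-count cosine similarity for content, Jaccard overlap of inbound links for link structure, the 50/50 weight, and the sample pages.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between simple word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard(links_a, links_b):
    """Share of inbound links that two pages have in common."""
    a, b = set(links_a), set(links_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity(page_a, page_b, content_weight=0.5):
    """Weighted blend of content and link similarity (weight is an arbitrary assumption)."""
    content = cosine(page_a["text"], page_b["text"])
    links = jaccard(page_a["inlinks"], page_b["inlinks"])
    return content_weight * content + (1 - content_weight) * links


# Hypothetical pages: their text plus the set of pages linking to them.
page_1 = {"text": "search engine tips", "inlinks": {"x", "y"}}
page_2 = {"text": "search engine tricks", "inlinks": {"y", "z"}}

print(round(similarity(page_1, page_2), 3))
```

Raising content_weight makes the score depend more on shared wording, while lowering it favors pages that are linked from the same places.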

Google Search Results Support?

Yes, with the add-on installed, you can view similar pages right within search results:

Drawbacks?

The tool worked surprisingly well and suggested really good matches, but it seems to be somewhat biased toward home pages.

Similar Sites

SimilarSites (and its Firefox add-on, Similar Web) works similarly to the tool above.

The index:

The developers seem to be very secretive about which technology they use and how many sites they crawl. All that I have been able to find out from external sources is that they have mapped "millions" of sites and are "adding tens of thousands daily".

The algorithm:

Like the two tools above, this one seems to use page content and linking structure, but the unique part is that it also analyzes user input (votes) as well as user browsing trends.

Google Search Results Support?

Yes, with the Firefox add-on you can access similar sites right from the Google search results (it works only for site home pages that you come across when searching Google):

Drawbacks?

As the name suggests (and as we have seen from the above screenshot), the tool works only on the domain level. So no matter what the current page is about, the tool will only find sites similar to the current site (home pages). In other words, if you were testing it on this page (which is about search discovery), the tool would list sites about generic web tools, desktop tools and hacks (which is what MUO is generally about).

Besides that, the tool inserts "sponsored" results throughout its search results (they are marked as sponsored but may still be distracting).

Any other great discovery search engines you are aware of? Please share them in the comments!

Image credit: VJ_fliks