How to Search Amazon for Millions of Public Documents, Images, and More

Amazon Web Services (AWS) is the blue whale of cloud computing. You may not realize it, but many of the websites and web services you use run on this platform. In fact, AWS’s public cloud is bigger than those of Microsoft, Google, and IBM combined.

Just like any other massive cloud platform, AWS hosts a variety of publicly accessible data. For instance, you can find a huge dataset of 100 million Creative Commons images and videos from Flickr. Access it with the help of the YFCC100m Browser.

Try a search with Google. You will be surprised by the sheer number of public documents you can find on AWS. One of the quickest ways to search AWS for PDF files is to use good old Google and two of its advanced search operators, filetype: and site:.

[Keyword] filetype:PDF site:amazonaws.com
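
If you run this kind of search often, you can put the pattern into a few lines of Python. This is only a rough sketch: the function names build_aws_query and google_search_url are made up for illustration, and it assumes Google's usual https://www.google.com/search?q=... address format. It builds the query string and a link you can paste into your browser; it does not fetch or scrape any results.

```python
from urllib.parse import urlencode


def build_aws_query(keyword: str, filetype: str = "pdf") -> str:
    """Combine a keyword with the filetype: and site: operators (hypothetical helper)."""
    return f"{keyword} filetype:{filetype} site:amazonaws.com"


def google_search_url(query: str) -> str:
    """URL-encode the query for Google's search page (assumed URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_aws_query("annual report")
    print(query)                     # the text you would type into Google
    print(google_search_url(query))  # a link that runs the same search
```

Swap the filetype argument for doc, docx, or rtf to look for other kinds of documents.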

Of course, the files are open to the public and may be available from the search portals of the sites that host them on AWS. But this keyword approach is an “experimental search” that helps you dig into the huge haystack all at once instead of going to each site.

You can also use Google’s Advanced Search page to build your query more precisely and then execute it to search Amazon’s cloud. I prefer a little search tool called Advangle, which helps you build search queries in a visual way.
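
To see what a more precise query looks like under the hood, here is a small sketch that assembles one from separate pieces, much as a form-based builder would. The function build_advanced_query and its parameter names are hypothetical and are not taken from Google or Advangle. It relies only on well-known Google operators: quotes for an exact phrase, a minus sign to exclude a word, OR between alternatives, plus filetype: and site:.

```python
def build_advanced_query(all_words="", exact_phrase="", none_of=(),
                         filetypes=(), site="amazonaws.com"):
    """Assemble a Google query string from form-style fields (illustrative only)."""
    parts = []
    if all_words:
        parts.append(all_words)
    if exact_phrase:
        # Quotes ask Google to match the phrase as written
        parts.append(f'"{exact_phrase}"')
    # A leading minus sign excludes a word from the results
    parts.extend(f"-{word}" for word in none_of)
    if filetypes:
        # OR between filetype: terms accepts any of the listed formats
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_advanced_query(exact_phrase="user manual",
                               none_of=["draft"],
                               filetypes=["pdf", "docx"]))
```

Paste the printed string into Google as it is. How Google ranks and filters the results is up to Google, so treat the output as a starting point and adjust the terms as you go.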

An “Invisible” Place for Web Research

A search engine is a front door to the web. But there are many ways to search for deep data and add to your research skills. Of course, don’t use the information (especially images) blindly. Find the site that owns the information and check its copyright restrictions.

I think Google’s advanced search operators should be part of our research habits. What do you think? Mention a few open directories where you like to do your online research. 

  1. huggybear
    August 24, 2017 at 11:27 pm

    The support for PDF, DOC(X) and RTF is very strong on AWS. Searches for filetypes such as epub, chm, mobi, cbr/cbz, djvu, pdb, fb2, etc... zilch on the search term "manual" or other generic terms. There appears to be a conformist conservatism (probably business-related) to those who use AWS as a backend. This is not the case with regular 'filetype' searches on google.

    • Saikat Basu
      August 25, 2017 at 2:08 am

      Yeah, I noticed that too. Nice internet moniker by the way :)

  2. Keith
    August 23, 2017 at 5:11 pm

    I did not know AWS could be searched publicly. Very useful indeed!