
Yes, you can download websites for offline browsing and it can be a life saver. Maybe you need to showcase a website to a customer at their location or review resources while commuting to work. When you backup websites you can do all this and more.

Having access to a full website backup gives you a lot more freedom than limiting yourself to a few select pages. While browser extensions for offline reading, like ScrapBook for Firefox, can save single pages, HTTrack is a standalone application that can download whole websites, including media files and outside links.

In this article you will learn how to set up HTTrack to download full websites for offline browsing. Note that while the application has not been updated since 2015, we tested it on the latest version of Windows 10 and found no problems.

What Is HTTrack?

HTTrack can download websites for offline browsing. You can copy an entire website from the internet to a local directory, including the full HTML code, images, and other files stored on the server. Once you have mirrored a website to your computer, you can launch it in your browser and navigate through the pages, as though you were looking at the original version. You can also update downloaded pages to capture recently added information.
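
Under the hood, the graphical interface described below drives HTTrack’s command-line engine, which you can also call directly. The sketch below builds the equivalent terminal command as a string and prints it for review rather than running it; the URL and output path are example values, and the flag names (--mirror to copy a site, -O to set the base path) follow the httrack manual.

```shell
# Assemble a basic httrack mirroring command; printed for review, not executed.
url="https://en.wikipedia.org/wiki/Portal:Science"   # example target
dest="$HOME/mirrors/science"                         # example base path
cmd="httrack \"$url\" --mirror -O \"$dest\""
echo "$cmd"
```

Running the printed command would create the base path with the mirrored pages plus HTTrack’s cache and log files.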

Here are a few things HTTrack can do:

  • Download an entire website
  • Authenticate with a username and password
  • Mirror external files and websites
  • Exclude specific files from the project, e.g. ZIP or GIF files
  • Mirror or test your bookmarks using your bookmark.html file

Advanced users can apply elaborate commands and filters to download exactly what they need. This guide by Fred Cohen will give you an overview of commands and how to use them. It also contains a troubleshooter, in case your website mirrors don’t work as expected.
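
As a small, hedged illustration of those filters (scan-rule syntax per the httrack manual: a leading + includes a pattern, a leading - excludes one), the command below, shown as a printed string with placeholder values, would skip ZIP and GIF files while keeping the crawl on one domain:

```shell
# Scan rules: keep pages on the (placeholder) example.com domain,
# drop archives and GIF images from the mirror.
cmd='httrack "https://example.com/" -O "$HOME/mirrors/example" "+*.example.com/*" "-*.zip" "-*.gif"'
echo "$cmd"
```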

Note that HTTrack does not support capturing real-time audio/video streams. Likewise, JavaScript and Java applets may fail to download. Moreover, the program can crash if you tax it with a complex project.

Set Up HTTrack to Download Your First Page

HTTrack is simple to use, although it can become a little tricky when the default settings won’t work.

Download: HTTrack for Windows, Linux, and Android

New Project

From the start page, click Next > to set up your first project. Enter a Project name and set a Category if you like. Also choose a Base path, which is the local directory where HTTrack will save your project. For the purpose of this article, I’m backing up the science portal at Wikipedia. Click Next > when you’re done.

Download Modes

For a basic mirroring project, you can simply paste the URL(s) of the websites you’d like to back up into the Web Addresses field. You can also add a list of URLs using a TXT file. If the website you want to copy requires authentication, select Add URL… and enter your Login (username or email address) and Password along with the URL; click OK to confirm.
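
On the command line, the same inputs have direct equivalents: credentials can be embedded in the URL in user:password@host form, and a text file of URLs can be passed with --list (both per the httrack documentation). The values below are placeholders, printed for review rather than executed:

```shell
# Password-protected mirror; 'alice' and 'secret' are placeholder credentials.
auth_cmd='httrack "https://alice:secret@members.example.com/" -O "$HOME/mirrors/members"'
echo "$auth_cmd"

# Batch mirror from a plain-text list of URLs, one per line.
list_cmd='httrack --list urls.txt -O "$HOME/mirrors/batch"'
echo "$list_cmd"
```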

Don’t forget to choose an Action for your project. The action depends on your objective. For this project, I’ll proceed with Download web site(s).

Here’s what the different actions will do:

  • Download web site(s) will download the desired pages with default options.
  • Download web site(s) + questions will transfer the desired sites with default options and ask questions if any links are potentially downloadable.
  • Get separated files will only get the files you specify within the options, but will not spider through HTML files.
  • Download all sites in pages (multiple mirror) will download the sites linked to from the selected page(s). If you drag and drop your bookmark.html file into the Web Addresses field, this option lets you mirror all your bookmarks.
  • Test links in pages (bookmark test) will test all indicated links.
  • * Continue interrupted download will complete an aborted download.
  • * Update existing download will update an existing project. The engine will go through the complete structure, checking each downloaded file for any updates on the website.

Preferences and Mirror Options

Let’s have a look at the options you have for your project. Click the Set options… link in the bottom right of the window.

This is where it gets a little more complicated. As you can see, HTTrack supports Proxy settings; you can Configure the address, port, and authentication. Within Scan Rules you can use wildcards to define files your project should include in or exclude from its backup. Limits is probably the most important tab, because here you can set the depth for internal and external mirroring. In addition, you can limit the size of HTML files, the time and transfer rate, the number of connections per second, and the number of links.
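
For reference, the options on the Limits tab map to command-line flags (per the httrack manual: -rN sets the mirror depth, %eN the external depth, -AN caps the transfer rate in bytes per second, and -%cN the connections per second). A hedged sketch with placeholder values, printed rather than run:

```shell
# Depth 3, no external links, roughly 100 KB/s cap, 4 connections per second.
cmd='httrack "https://example.com/" -O "$HOME/mirrors/example" -r3 %e0 -A102400 -%c4'
echo "$cmd"
```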

If you’re running into issues, for example projects that are aborted immediately, you can try to change your Browser ID or play with the settings in the Spider tab. Consult the FAQ & Troubleshooting section on the HTTrack homepage if you encounter barriers you can’t overcome yourself. Click OK to confirm your changes. Then click Next > to move on to the final step in setting up your project.
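
The Browser ID setting corresponds to httrack’s -F flag (the user-agent string), and the robots.txt behavior on the Spider tab to -sN, where -s0 ignores robots.txt entirely (flag names per the httrack manual). Shown as a printed string with placeholder values:

```shell
# Spoof a mainstream browser ID and ignore robots.txt; use -s0 sparingly,
# since it overrides the site's stated crawling preferences.
cmd='httrack "https://example.com/" -O "$HOME/mirrors/example" -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" -s0'
echo "$cmd"
```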

Final Adjustments

This last step lets you adjust minor settings. For example, you can let HTTrack Shutdown PC when finished, put the project On hold for a set amount of time, or Save settings only, do not launch download now.

And Action!

Once you hit Finish, the tool will immediately start saving files. As HTTrack is humming away, you can track its progress.

To test your project, head to the directory you selected, open the project folder, and click the index.html file to launch the mirrored website in your default browser.

If your project doesn’t work out of the gate, start over and play with the options. It can take some trial and error. And even if certain settings worked perfectly in a previous run, they may not work the next time around. As mentioned before, your best bet is to change the Browser ID or refer to the official FAQ & Troubleshooting page.

You can cancel a run at any time. After you hit the Cancel button once, the program will complete all running processes. If you want to abort the project immediately, hit the button again. To resume a backup, start the project again and pick * Continue interrupted download in the Action menu during the setup step described previously.
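
The two resume actions also exist as command-line shortcuts (--continue for an interrupted mirror, --update to refresh a completed one, per the httrack manual); both are run from the project’s base path, shown here as printed strings with a placeholder path:

```shell
# Resume an aborted mirror, or refresh a finished one, from the project folder.
resume_cmd='cd "$HOME/mirrors/science" && httrack --continue'
update_cmd='cd "$HOME/mirrors/science" && httrack --update'
echo "$resume_cmd"
echo "$update_cmd"
```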

Ready for Offline Browsing?

Isn’t it a liberating feeling to be able to take the web, or at least parts of it, anywhere, without needing a constant connection? Maybe that is taking it a bit too far. In any case, it’s a great option. What do you think?

Which websites do you always have to have with you? How else do you use the tool? Have you tried testing your Bookmarks with HTTrack?

Image Credit: ValentinT via Shutterstock.com


  1. vinyll
    September 16, 2016 at 5:10 am

    "So it puzzles me how it can be called an offline browser"
    It's a "browser" in a sense of "content browser" – as in offline browsable webpages – not a browser app.

    • Tina Sieber
      September 16, 2016 at 3:48 pm

      That's exactly right. Thank you for the comment, Vinyll. Are you using HTTrack a lot?

  2. Slavik
    April 27, 2015 at 1:07 pm

    This is also cool.
    Linux terminal:

    $ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains pyqt.sourceforge.net --no-parent "http://pyqt.sourceforge.net/Docs/PyQt4/classes.html"
    ======================================================
    This command downloads the PyQt4 class reference under http://pyqt.sourceforge.net/Docs/PyQt4/.

    The options are:

    --recursive: download the entire Web site.

    --domains pyqt.sourceforge.net: don't follow links outside pyqt.sourceforge.net.

    --no-parent: don't follow links above the starting directory Docs/PyQt4/.

    --page-requisites: get all the elements that compose the page (images, CSS and so on).

    --html-extension: save files with the .html extension.

    --convert-links: convert links so that they work locally, off-line.

    --restrict-file-names=windows: modify filenames so that they will work in Windows as well.

    --no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).

    • Lew
      May 3, 2015 at 9:55 am

      Why doesn't this work?
      $ wget –recursive –no-clobber –page-requisites –html-extension –convert-links –restrict-file-names=windows –domains pyqt.sourceforge.net –no-parent “http://whitneygrammar.com"

      The result is ">" and nothing else.

      • Anonymous
        July 28, 2015 at 8:55 pm

        I believe it is because of the --domains "pyqt.sourceforge.net" parameter; remove this and I think it will work.

        • Anonymous
          July 28, 2015 at 9:27 pm

          It gives a lot of "unable to resolve" messages.
          I also retried the complete command again though (I'm on LinuxMint this time) and now it does give a result, but not of the site requested.
          Error messages are:
          --2015-07-28 23:19:23-- http://xn--no-parent-p89d/
          Resolving –no-parent (xn--no-parent-p89d)... failed: Name or service not known.
          wget: unable to resolve host address ‘xn--no-parent-p89d’
          --2015-07-28 23:19:23-- http://xn--no-parent-p89d/
          Resolving –no-parent (xn--no-parent-p89d)... failed: Name or service not known.
          wget: unable to resolve host address ‘xn--no-parent-p89d’
          “http://whitneygrammar.com”: Scheme missing.

          I haven't looked at all this since my original post in early May, so tomorrow I'll freshen up some reading and retry.

  3. Reelix
    July 28, 2008 at 1:16 am

    Been using HTTrack for a while now (Mainly to download Web-Comics :p)

    IT ROCKS!!!!!!!!

  4. Adam P.
    July 25, 2008 at 5:27 pm

    Nice you are reviewing this product, but I've been using it for at least a couple years already.

    • Aibek
      July 26, 2008 at 12:32 am

      Same here, I had been using HTTrack for a couple of years and had no complaints about it. The only thing to keep in mind is that it's not a tool that can be used to back up websites :-)

  5. Rarst
    July 25, 2008 at 3:43 pm

    For me browser-integrated "Save as web archive (*.mht)" is more than sufficient for saving stuff I want to read when I am not connected. Downloading whole sites may serve some purpose if they are something like encyclopedia... But I had no need for that for years.