
This is the second part of the Offline Browsing Anywhere Anytime mini-series. If you missed it, part one is here: Offline Browsing Anywhere Anytime - Part One.

Downloading a website can come in handy in many situations. You may want to demonstrate a website to a customer at their house, browse the latest headlines while commuting to work or take your laptop to the wifi-less park to enjoy the weather and read a blog at the same time. Having access to a full website backup gives you a lot more freedom than being limited to a few select pages.

While ScrapBook is a Firefox extension, HTTrack is a standalone application, designed to download whole websites, including media files and outside links. The program is available for Linux and Windows.

HTTrack is said to be an offline browser, which doesn’t seem to make much sense. When I try to browse backed up pages, HTTrack opens Firefox to display the projects there. Correct me if I’m wrong, but apparently it needs a “real” browser. Of course to mirror websites it cannot be offline. So it puzzles me how it can be called an offline browser. Am I missing the point?

Anyway, HTTrack is simple to use, although it can become a little tricky when the default settings don’t work. In the first window after starting the program, click ‘next’ to create a project, then type a project name and category into the respective fields. In this example I’m backing up the Science front page of Digg.

On the Mirroring Mode page, pick an action and add the URL of the website you would like to back up. The action depends on what you want to do. For your first project you should pick Download web site(s).

Download web site(s) will download the desired pages with default options.
Download web site(s) + questions will transfer the desired sites with default options, and ask questions if any links are considered as potentially downloadable.
Get individual files will only get the files you specify within options, but will not spider through HTML files.
Download all sites in pages (multiple mirror) will download only the sites linked to from the selected site(s). If you drag & drop your bookmark.html file into the Web Addresses field, this option lets you mirror all your bookmarks.
Test links in pages (bookmark test) will test all indicated links.
* Continue interrupted download should be used to continue a download that was interrupted or aborted.
* Update existing download should be used to update an existing project. The engine will go through the complete structure, checking each downloaded file for any updates on the web site.
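
HTTrack also comes with a command-line program, httrack, and the mirroring modes above appear to correspond to shortcut options listed in its manual. The following is only a sketch under that assumption: the helper name mode_flag, the mode keywords, the example URL and the output folder are all made up for illustration, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: map a WinHTTrack mirroring mode to the httrack
# command-line shortcut that seems to match it (assumed mapping).
mode_flag() {
  case "$1" in
    download)  echo "--mirror" ;;         # Download web site(s)
    questions) echo "--mirror-wizard" ;;  # Download web site(s) + questions
    files)     echo "--get" ;;            # Get individual files
    multiple)  echo "--mirrorlinks" ;;    # Download all sites in pages
    testlinks) echo "--testlinks" ;;      # Test links in pages
    resume)    echo "--continue" ;;       # * Continue interrupted download
    update)    echo "--update" ;;         # * Update existing download
    *)         echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the command rather than execute it, so nothing is downloaded.
# The URL and the output folder (-O) are placeholders.
echo httrack "$(mode_flag download)" "http://www.example.com/" -O "./my-mirror"
```

To try it for real, remove the leading echo from the last line and replace the placeholders with your own URL and folder.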

Let’s have a look at the options. This is where it gets a little more complicated. HTTrack supports proxy servers. Within Scan Rules you can specify which files it should include in or exclude from its backup. Limits is probably the most important tab because here you define how deep HTTrack will mirror the targeted page and how far it will follow external links.
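
For readers who prefer the command line, the depth limits and scan rules described here seem to map onto httrack's -rN (mirror depth), -%eN (external link depth) and +/- filter patterns. The sketch below assumes that mapping; the helper name limits_command, the URL, the folder and the two example filters are placeholders, and the command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: print an httrack command with depth limits and scan rules.
#   $1 = mirror depth, used as -rN
#   $2 = external link depth, used as -%eN
limits_command() {
  # "+*.png" asks to include PNG files, "-*.zip" asks to exclude ZIP archives.
  printf 'httrack "%s" -O "%s" -r%s -%%e%s "%s" "%s"\n' \
    "http://www.example.com/" "./my-mirror" "$1" "$2" "+*.png" "-*.zip"
}

# Two levels deep on the site itself, no levels into external sites.
limits_command 2 0
```

The filters are printed inside double quotes on purpose: pasted into a shell without quotes, patterns such as *.png could be expanded against local file names before httrack ever sees them.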

If you’re running into issues, for example projects that are aborted immediately, you can try to change your Browser ID or play with the settings in the Spider tab. There is also a great FAQ & Troubleshooting section on the HTTrack homepage that will hopefully solve any issues you may run into.
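
On the command line, the Browser ID appears to correspond to httrack's -F option, which sets the user-agent string sent to the server. This is a sketch under that assumption: browser_id_command is a made-up helper, the user-agent string is an invented example, and the command is printed instead of executed.

```shell
#!/bin/sh
# Hypothetical helper: print an httrack command that sends a custom Browser ID.
#   $1 = user-agent string handed to -F
browser_id_command() {
  printf 'httrack "%s" -O "%s" -F "%s"\n' \
    "http://www.example.com/" "./my-mirror" "$1"
}

# The user-agent value below is a placeholder, not a recommended setting.
browser_id_command "Mozilla/5.0 (compatible; MyMirror/1.0)"
```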

We’re moving on to the next page, which offers a few minor settings, such as the option to shut down the PC when finished or to schedule the start of the project. Unfortunately, the scheduling option is very basic, although it may be sufficient if you want a project to complete before you leave for work in the morning.

Once you hit Finish it will begin saving files. If it doesn’t work, just start over and play with the options. It can take some time and even if certain settings worked perfectly in a previous run, they may not work the next time around. As mentioned before, your best bet is to try again and change the Browser ID or refer to the official FAQ & Troubleshooting.

You can cancel a run at any time. After you hit the cancel button once, the program will complete all running processes. If you want to abort the project immediately, just hit the button again. To resume a backup, start the project again and pick * Continue interrupted download from the menu on the Mirroring Mode page described previously.

Isn’t it a liberating feeling to be able to take the web – or at least parts of it – anywhere, independent of constantly being connected? Maybe that is taking it a bit too far. At any rate it’s a great option. What do you think?

  1. vinyll
    September 16, 2016 at 5:10 am

    "So it puzzles me how it can be called an offline browser"
    It's a "browser" in a sense of "content browser" – as in offline browsable webpages – not a browser app.

    • Tina Sieber
      September 16, 2016 at 3:48 pm

      That's exactly right. Thank you for the comment, Vinyll. Are you using HTTrack a lot?

  2. Slavik
    April 27, 2015 at 1:07 pm

    This is also cool.
    Linux terminal:

    $ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains pyqt.sourceforge.net --no-parent "http://pyqt.sourceforge.net/Docs/PyQt4/classes.html"
    ======================================================
    This command downloads a Web site. The explanations below use http://www.website.org/tutorials/html/ as the example site.

    The options are:

    --recursive: download the entire Web site.

    --domains website.org: don't follow links outside website.org.

    --no-parent: don't follow links outside the directory tutorials/html/.

    --page-requisites: get all the elements that compose the page (images, CSS and so on).

    --html-extension: save files with the .html extension.

    --convert-links: convert links so that they work locally, off-line.

    --restrict-file-names=windows: modify filenames so that they will work in Windows as well.

    --no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).

    • Lew
      May 3, 2015 at 9:55 am

      Why doesn't this work?
      $ wget –recursive –no-clobber –page-requisites –html-extension –convert-links –restrict-file-names=windows –domains pyqt.sourceforge.net –no-parent “http://whitneygrammar.com"

      The result is ">" and nothing else.

      • Amir El-hamdy
        July 28, 2015 at 8:55 pm

        I believe it is because of the --domains pyqt.sourceforge.net parameter. Remove it and I think it will work.

        • Ludo Beckers
          July 28, 2015 at 9:27 pm

          It gives a lot of "unable to resolve" messages.
          I also retried the complete command again though (I'm on LinuxMint this time) and now it does give a result, but not of the site requested.
          Error messages are:
          --2015-07-28 23:19:23-- http://xn--no-parent-p89d/
          Resolving –no-parent (xn--no-parent-p89d)... failed: Name or service not known.
          wget: unable to resolve host address ‘xn--no-parent-p89d’
          --2015-07-28 23:19:23-- http://xn--no-parent-p89d/
          Resolving –no-parent (xn--no-parent-p89d)... failed: Name or service not known.
          wget: unable to resolve host address ‘xn--no-parent-p89d’
          “http://whitneygrammar.com”: Scheme missing.

          I haven't looked at all this since my original post in early May, so tomorrow I'll freshen up some reading and retry.

  3. Reelix
    July 28, 2008 at 1:16 am

    Been using HTTrack for awhile now (Mainly to download Web-Comics :p)

    IT ROCKS!!!!!!!!

  4. Adam P.
    July 25, 2008 at 5:27 pm

    Nice you are reviewing this product, but I've been using it for at least a couple years already.

    • Aibek
      July 26, 2008 at 12:32 am

      Same here, I had been using HTTrack for a couple of years, had no complaints about it. The only thing to keep in mind is that it's not a tool that can be used to back up websites :-)

  5. Rarst
    July 25, 2008 at 3:43 pm

    For me browser-integrated "Save as web archive (*.mht)" is more than sufficient for saving stuff I want to read when I am not connected. Downloading whole sites may serve some purpose if they are something like encyclopedia... But I had no need for that for years.
