Save and Back Up Websites with HTTrack

This is the second part of the Offline Browsing Anywhere Anytime mini series. If you missed it, part one is here.

Downloading a website can come in handy in many situations. You may want to demonstrate a website to a customer at their house, browse the latest headlines while commuting to work or take your laptop to the wifi-less park to enjoy the weather and read a blog at the same time. Having access to a full website backup gives you a lot more freedom than being limited to a few select pages.

While ScrapBook is a Firefox extension, HTTrack is a standalone application, designed to download whole websites, including media files and outside links. The program is available for Linux and Windows.

HTTrack is described as an offline browser, which doesn’t seem to make much sense to me. When I try to browse backed-up pages, HTTrack opens Firefox to display the projects there. Correct me if I’m wrong, but apparently it needs a “real” browser. And of course it cannot be offline while it mirrors websites. So it puzzles me how it can be called an offline browser. Am I missing the point?

Anyway, HTTrack is simple to use, although it can become a little tricky when the default settings don’t work. In the first window after starting the program, click ‘Next’ to open a project. You can simply type a project name and a category into the respective fields. In this example I’m backing up the Science front page of Digg.

On the Mirroring Mode page, pick an action and add the URL of the website you would like to back up. The action depends on what you want to do. For your first project, you should pick Download web site(s).

Download web site(s) will download the desired pages with default options.
Download web site(s) + questions will transfer the desired sites with default options, and ask questions if any links are considered as potentially downloadable.
Get individual files will only get the files you specify within options, but will not spider through HTML files.
Download all sites in pages (multiple mirror) will download only the sites linked to from the selected site(s). If you drag & drop your bookmark.html file into the Web Addresses field, this option lets you mirror all your bookmarks.
Test links in pages (bookmark test) will test all indicated links.
* Continue interrupted download should be used to continue a download that was interrupted or aborted.
* Update existing download should be used to update an existing project. The engine will go through the complete structure, checking each downloaded file for any updates on the web site.
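
For readers who prefer the command line, the actions above roughly correspond to shortcuts of the httrack command-line program. The mapping below is only a sketch: it is not taken from this article but from the HTTrack command-line manual as I understand it, so every option name is an assumption to check against "httrack --help" on your installation. The helper build_command and the example URL are made up for illustration, and the code only builds an argument list without running anything.

```python
# Assumed mapping from the GUI actions to httrack command-line shortcuts.
# Verify each option with "httrack --help" before relying on it.
ACTION_OPTIONS = {
    "Download web site(s)": ["--mirror"],
    "Download web site(s) + questions": ["--mirror-wizard"],
    "Get individual files": ["--get"],
    "Download all sites in pages (multiple mirror)": ["--mirrorlinks"],
    "Test links in pages (bookmark test)": ["--testlinks"],
    "* Continue interrupted download": ["--continue"],
    "* Update existing download": ["--update"],
}


def build_command(action, urls, project_dir):
    """Return an argument list for the given action (hypothetical helper)."""
    if action not in ACTION_OPTIONS:
        raise ValueError(f"unknown action: {action}")
    # -O sets the directory where the mirror, logs and cache are stored.
    return ["httrack", *urls, "-O", project_dir, *ACTION_OPTIONS[action]]


if __name__ == "__main__":
    # http://example.com/ is a placeholder, not the site used in the article.
    cmd = build_command("Download web site(s)",
                        ["http://example.com/"], "my-project")
    print(" ".join(cmd))
```

If httrack is installed, the resulting list could be handed to subprocess.run to start the mirror. For the continue and update actions, an empty URL list should be enough, since the project directory already holds the cache.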

Let’s have a look at the options. This is where it gets a little more complicated. HTTrack supports proxy servers. Within Scan Rules you can specify which files it should include in or exclude from its backup. Limits is probably the most important tab, because here you define how deep HTTrack will mirror the targeted site and how far it will follow external links.
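
To make the Limits and Scan Rules settings more concrete, here is a hedged sketch of how the same choices might look as httrack command-line arguments. It assumes the options -r (mirror depth), -%e (external links depth) and -P (proxy), plus the +pattern and -pattern filter syntax, as I understand them from the HTTrack manual. The function mirror_args, the URL and all values are invented for illustration, and nothing is actually executed.

```python
def mirror_args(url, project_dir, depth=None, external_depth=None,
                include=(), exclude=(), proxy=None):
    """Build an httrack argument list (hypothetical helper; nothing is run)."""
    args = ["httrack", url, "-O", project_dir]
    if depth is not None:
        args.append(f"-r{depth}")            # how many link levels to follow
    if external_depth is not None:
        args.append(f"-%e{external_depth}")  # how far to follow external links
    if proxy is not None:
        args += ["-P", proxy]                # proxy given as host:port
    # Scan rules: "+pattern" includes matching links, "-pattern" excludes them.
    args += [f"+{pattern}" for pattern in include]
    args += [f"-{pattern}" for pattern in exclude]
    return args


if __name__ == "__main__":
    # Placeholder URL and values: three levels deep, no external links,
    # keep PNG images, skip ZIP archives.
    print(mirror_args("http://example.com/", "my-project", depth=3,
                      external_depth=0, include=["*.png"], exclude=["*.zip"]))
```

Keeping the depth small and the external depth at zero is a cautious starting point for a first project; both can be raised once a test run shows how much data the site produces.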

If you’re running into issues, for example projects that are aborted immediately, you can try to change your Browser ID or play with the settings in the Spider tab. There is also a great FAQ & Troubleshooting section on the HTTrack homepage that will hopefully solve any issues you may run into.

We’re moving on to the next page, which offers a few minor settings, such as the option to shut down the PC when finished or to schedule the start of the project. Unfortunately, the scheduling option is very basic, although it may be sufficient if you want a project to complete before you leave for work in the morning.

Once you hit Finish it will begin saving files. If it doesn’t work, just start over and play with the options. It can take some time and even if certain settings worked perfectly in a previous run, they may not work the next time around. As mentioned before, your best bet is to try again and change the Browser ID or refer to the official FAQ & Troubleshooting.

You can cancel a run anytime. After you hit the cancel button once, the program will still complete all running processes. If you want to abort the project immediately, just hit the cancel button again. To resume a backup, start the project again and pick * Continue interrupted download from the menu on the Mirroring Mode page described previously.

Isn’t it a liberating feeling to be able to take the web – or at least parts of it – anywhere, independent of constantly being connected? Maybe that is taking it a bit too far. At any rate it’s a great option. What do you think?


Tina Sieber

Tina is a freelance writer, editor, natural scientist, and cosmopolitan with a strong interest in sustainability. She has been writing for MakeUseOf since late 2007 and also is the Editor for MakeUseOf Answers.

Comments

  • Rarst July 25, 2008

    For me browser-integrated “Save as web archive (*.mht)” is more than sufficient for saving stuff I want to read when I am not connected. Downloading whole sites may serve some purpose if they are something like encyclopedia… But I had no need for that for years.

  • Adam P. July 25, 2008

    Nice you are reviewing this product, but I’ve been using it for at least a couple years already.

    • Aibek July 26, 2008

      Same here, I had been using HTTrack for a couple of years, had no complaints about it. The only thing to keep in mind: it’s not a tool that can be used to back up websites :-)

  • Reelix July 28, 2008

    Been using HTTrack for a while now (Mainly to download Web-Comics :p)

    IT ROCKS!!!!!!!!