This is the second part of the Offline Browsing Anywhere Anytime mini-series. If you missed it, part one is here.
Downloading a website can come in handy in many situations. You may want to demonstrate a website to a customer at their house, browse the latest headlines while commuting to work or take your laptop to the wifi-less park to enjoy the weather and read a blog at the same time. Having access to a full website backup gives you a lot more freedom than being limited to a few select pages.
While ScrapBook is a Firefox extension, HTTrack is a standalone application, designed to download whole websites, including media files and outside links. The program is available for Linux and Windows.
HTTrack is said to be an offline browser, which doesn’t seem to make much sense. When I try to browse backed-up pages, HTTrack opens Firefox to display the projects there. Correct me if I’m wrong, but apparently it needs a “real” browser. And of course it cannot be offline while it mirrors websites. So it puzzles me how it can be called an offline browser. Am I missing the point?
Anyway, HTTrack is simple to use, although it can become a little tricky when the default settings don’t work. In the first window after starting the program, click ‘next’ to actually open a project. Then simply type a project name and a category into the respective fields. In this example I’m backing up the Science frontpage of Digg.
On the Mirroring Mode page, pick an action and add the URL of the website you would like to back up. The action depends on what you want to do. For your first project you should pick Download web site(s).
Download web site(s) will download the desired pages with default options.
Download web site(s) + questions will transfer the desired sites with default options, and ask questions if any links are considered as potentially downloadable.
Get individual files will only get the files you specify within options, but will not spider through HTML files.
Download all sites in pages (multiple mirror) will download only the sites linked to from the selected site(s). If you drag & drop your bookmark.html file into the Web Addresses field, this option lets you mirror all your bookmarks.
Test links in pages (bookmark test) will test all indicated links.
* Continue interrupted download should be used to continue a download that was interrupted or aborted.
* Update existing download should be used to update an existing project. The engine will go through the complete structure, checking each downloaded file for any updates on the web site.
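If you prefer the command line, the actions above seem to correspond roughly to shortcuts of the httrack command-line tool. The Python sketch below only builds the argument list and does not run anything. The flag names are taken from my reading of the httrack manual and are an assumption you should check against `httrack --help` on your installation. The function name build_mirror_command, the example URL and the output folder are hypothetical.

```python
import shlex

# Assumed mapping from the GUI actions to httrack command-line shortcuts.
# Verify each flag with `httrack --help` before relying on it.
ACTION_FLAGS = {
    "Download web site(s)": ["--mirror"],
    "Download web site(s) + questions": ["--mirror-wizard"],
    "Get individual files": ["--get"],
    "Download all sites in pages (multiple mirror)": ["--mirrorlinks"],
    "Test links in pages (bookmark test)": ["--testlinks"],
    "* Continue interrupted download": ["--continue"],
    "* Update existing download": ["--update"],
}


def build_mirror_command(action, urls, output_dir):
    """Return the argument list for one httrack run (nothing is executed)."""
    flags = ACTION_FLAGS[action]
    # Continue and update reuse the URLs stored in the existing project,
    # so URLs are only passed for a fresh run.
    resuming = action.startswith("*")
    targets = [] if resuming else list(urls)
    return ["httrack", *targets, "-O", output_dir, *flags]


if __name__ == "__main__":
    # Hypothetical URL and project folder, for illustration only.
    cmd = build_mirror_command(
        "Download web site(s)", ["https://example.com/"], "./my-project"
    )
    print(shlex.join(cmd))
```

To actually start the download you could hand the list to subprocess.run, provided httrack is installed and on your PATH.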
Let’s have a look at the options. This is where it gets a little more complicated. HTTrack supports proxy servers. Within Scan Rules you can add the files it should include in or exclude from its backup. Limits is probably the most important tab because here you define how deep HTTrack will mirror the targeted page and how deep it will go into external links.
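The same options can be expressed as command-line arguments. Here is a minimal sketch, assuming the flags I remember from the httrack manual: -rN for the mirror depth, -%eN for the depth into external links, -P for a proxy, and +pattern / -pattern for scan rules. Please treat these as assumptions and confirm them with `httrack --help`. The helper name build_option_args and the sample values are hypothetical.

```python
def build_option_args(depth=None, external_depth=None,
                      include=(), exclude=(), proxy=None):
    """Translate a few GUI options into httrack-style arguments.

    Flags are assumptions based on the httrack manual; nothing is executed.
    """
    args = []
    if depth is not None:
        args.append(f"-r{depth}")            # Limits: maximum mirror depth
    if external_depth is not None:
        args.append(f"-%e{external_depth}")  # Limits: depth into external links
    if proxy:
        args += ["-P", proxy]                # Proxy, given as host:port
    # Scan Rules: "+" includes matching files, "-" excludes them.
    args += [f"+{pattern}" for pattern in include]
    args += [f"-{pattern}" for pattern in exclude]
    return args


if __name__ == "__main__":
    # Hypothetical values: three levels deep, no external links,
    # keep PNG images, skip ZIP archives.
    print(build_option_args(depth=3, external_depth=0,
                            include=["*.png"], exclude=["*.zip"]))
```

The returned list is meant to be appended to an httrack command that already contains the URL and the output folder.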
If you’re running into issues, for example projects that are aborted immediately, you can try to change your Browser ID or play with the settings in the Spider tab. There is also a great FAQ & Troubleshooting section on the HTTrack homepage that will hopefully solve any issues you may run into.
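On the command line, the Browser ID appears to correspond to the user-agent string, which I believe is set with -F. The short sketch below adds it to an existing argument list. The flag is an assumption to verify with `httrack --help`, and the function name with_browser_id and the sample user-agent string are only examples.

```python
def with_browser_id(command, user_agent):
    """Return a copy of an httrack argument list with a custom Browser ID.

    Assumes -F sets the user-agent header; the input list is not modified.
    """
    return [*command, "-F", user_agent]


if __name__ == "__main__":
    # Hypothetical base command and an example user-agent string.
    base = ["httrack", "https://example.com/", "-O", "./my-project"]
    agent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
    print(with_browser_id(base, agent))
```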
We’re moving on to the next page, which offers a few minor settings, such as the option to shut down the PC when finished or to schedule the start of the project. Unfortunately, the scheduling option is very basic, although it may be sufficient if you just want a project to complete before you leave for work in the morning.
Once you hit Finish it will begin saving files. If it doesn’t work, just start over and play with the options. It can take some time and even if certain settings worked perfectly in a previous run, they may not work the next time around. As mentioned before, your best bet is to try again and change the Browser ID or refer to the official FAQ & Troubleshooting.
You can cancel a run anytime. After you hit the cancel button once, the program will complete all running processes. If you want to abort the project immediately, just hit the cancel button again. To resume a backup, start the project again and pick * Continue interrupted download from the menu on the Mirroring Mode page described previously.
Isn’t it a liberating feeling to be able to take the web – or at least parts of it – anywhere, independent of constantly being connected? Maybe that is taking it a bit too far. At any rate it’s a great option. What do you think?