Yes, you can download websites for offline browsing, and it can be a lifesaver. Maybe you need to showcase a website to a customer at their location or review resources while commuting to work. When you back up websites, you can do all this and more.
Having access to a full website backup gives you a lot more freedom than limiting yourself to a few select pages. While browser extensions for offline reading, like ScrapBook for Firefox, can save single pages, HTTrack is a standalone application which can download whole websites, including media files and outside links.
In this article you will learn how to set up HTTrack to download full websites for offline browsing. Note that while the application has not been updated since 2015, we tested it on the latest version of Windows 10 and found no problems.
What Is HTTrack?
HTTrack can download websites for offline browsing. You can copy an entire website from the internet to a local directory, including its full HTML code, images, and other files stored on the server. Once you have mirrored a website to your computer, you can launch it in your browser and navigate through the pages as though you were looking at the original version. You can also update downloaded pages to capture recently added information.
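HTTrack also comes with a command-line client, `httrack`, which is handy if you would rather script a mirror than click through the wizard. The Python sketch below only assembles the argument list and, optionally, runs it. It assumes `httrack` is installed and on your PATH. `-O` sets the output folder and `--update` refreshes an existing mirror, as listed by `httrack --help`, but check both against your installed version. The folder and project names are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def mirror_command(urls, base_path, project, update=False):
    """Build an httrack argument list that mirrors `urls` into base_path/project."""
    output_dir = Path(base_path) / project
    args = ["httrack", *urls, "-O", str(output_dir)]
    if update:
        # Re-check the files of an existing mirror against the live site.
        args.append("--update")
    return args


def run_mirror(args):
    """Run httrack if it is on PATH and return its exit code."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("httrack is not installed or not on PATH")
    return subprocess.run(args).returncode


if __name__ == "__main__":
    # Placeholder URL, base path, and project name.
    cmd = mirror_command(["https://www.example.com/"], "websites", "demo-project")
    print(" ".join(cmd))
```

Passing `update=True` on a later run corresponds to updating a downloaded project instead of starting from scratch.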
Here are a few things HTTrack can do:
- download an entire website
- authenticate with a username and password
- mirror external files and websites
- exclude specific files from the project, e.g. ZIP or GIF files
- mirror or test your bookmarks using your bookmark.html file
Advanced users can apply elaborate commands and filters to download exactly what they need. This guide by Fred Cohen will give you an overview of commands and how to use them. It also contains a troubleshooter, in case your website mirrors don’t work as expected.
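HTTrack's filters, called scan rules, are wildcard patterns prefixed with `+` to include or `-` to exclude, such as `-*.zip`. The sketch below builds such rules from two lists and offers a rough local preview of which URLs they would keep. The preview uses Python's `fnmatch` as a stand-in for HTTrack's own matcher and assumes that a later matching rule overrides an earlier one, so treat it as an approximation rather than HTTrack's exact behavior. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def scan_rules(include=(), exclude=()):
    """Turn wildcard patterns into scan rules: +pattern includes, -pattern excludes."""
    return [f"+{p}" for p in include] + [f"-{p}" for p in exclude]


def preview(url, rules):
    """Approximate which way the rules decide for `url`.

    Returns True (include), False (exclude), or None (no rule matched, so
    HTTrack's defaults would apply). Assumes later matching rules win and
    uses fnmatch, not HTTrack's matcher, so edge cases may differ.
    """
    decision = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            decision = sign == "+"
    return decision


# Keep everything on example.com, but skip ZIP and GIF files.
rules = scan_rules(include=["*.example.com/*"], exclude=["*.zip", "*.gif"])
print(rules)
print(preview("https://www.example.com/docs/index.html", rules))
print(preview("https://www.example.com/files/archive.zip", rules))
```

Listing the exclusions after the broad include is what lets `-*.zip` override `+*.example.com/*` in this preview.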
Note that HTTrack does not support capturing real-time audio/video streams. Likewise, JavaScript and Java applets may fail to download. Moreover, the program can crash if you tax it with a complex project.
Set Up HTTrack to Download Your First Page
HTTrack is simple to use, although it can become a little tricky when the default settings don’t work.
From the start page, click Next > to set up your first project. Enter a Project name and set a Category if you like. Also choose a Base path, which is the local directory where HTTrack will save your project. For the purpose of this article, I’m backing up the science portal at Wikipedia. Click Next > when you’re done.
For a basic mirroring project, you can simply paste the URL(s) of the websites you’d like to back up into the Web Addresses field. You can also add a list of URLs using a TXT file. If the website you want to copy requires authentication, select Add URL…, enter the URL along with your Login (username or email address) and Password, and click OK to confirm.
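If you keep your start URLs in a TXT file and script your mirrors, the sketch below reads one URL per line and, for a site that needs a login, embeds the credentials in the URL as `login:password@host`, the form HTTrack accepts for basic authentication. Skipping blank lines and `#` comments is this sketch's own convention. The login and password are percent-encoded so that an email address's `@` does not break the URL; that HTTrack decodes them the way your site expects is an assumption to test. The function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote, urlsplit, urlunsplit


def read_url_list(txt_file):
    """Return the URLs in a TXT file, one per line (blank lines and # comments skipped)."""
    urls = []
    for line in Path(txt_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def with_login(url, login, password):
    """Embed credentials as scheme://login:password@host/path."""
    parts = urlsplit(url)
    # Percent-encode so an email login's "@" or a ":" in the password stays unambiguous.
    creds = f"{quote(login, safe='')}:{quote(password, safe='')}"
    return urlunsplit(
        (parts.scheme, f"{creds}@{parts.netloc}", parts.path, parts.query, parts.fragment)
    )
```

For example, `with_login("https://example.com/members/", "me@example.com", "p@ss")` produces `https://me%40example.com:p%40ss@example.com/members/`. Keep in mind that credentials embedded this way end up in plain text wherever the command line or URL list is stored.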
Don’t forget to choose an Action for your project. The action depends on your objective. For this project, I’ll proceed with Download web site(s).
Here’s what the different actions will do:
- Download web site(s) will download the desired pages with default options.
- Download web site(s) + questions will transfer the desired sites with default options and ask questions if any links are considered potentially downloadable.
- Get separated files will only get the files you specify within options, but will not spider through HTML files.
- Download all sites in pages (multiple mirror) will download only the sites linked to from the selected site(s). If you drag & drop your bookmark.html file into the Web Addresses field, this option lets you mirror all your bookmarks.
- Test links in pages (bookmark test) will test all indicated links.
- * Continue interrupted download will complete an aborted download.
- * Update existing download will update an existing project. The engine will go through the complete structure, checking each downloaded file for any updates on the website.
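If you drive HTTrack from a script instead of the wizard, each of these actions roughly corresponds to a command-line switch. The mapping below is an assumption based on the long options that `httrack --help` lists (`--mirror`, `--mirror-wizard`, `--get-files`, `--mirrorlinks`, `--testlinks`, `--continue`, `--update`). WinHTTrack may pass extra switches behind the scenes, so verify the switches against your installed version before relying on them. The names `ACTION_FLAGS` and `action_command` are hypothetical.

```python
# Hypothetical mapping from the wizard's Action menu to httrack switches.
ACTION_FLAGS = {
    "Download web site(s)": ["--mirror"],
    "Download web site(s) + questions": ["--mirror-wizard"],
    "Get separated files": ["--get-files"],
    "Download all sites in pages (multiple mirror)": ["--mirrorlinks"],
    "Test links in pages (bookmark test)": ["--testlinks"],
    "* Continue interrupted download": ["--continue"],
    "* Update existing download": ["--update"],
}


def action_command(action, urls, output_dir):
    """Build an httrack argument list for one of the wizard's actions."""
    try:
        flags = ACTION_FLAGS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return ["httrack", *urls, "-O", str(output_dir), *flags]
```

Resuming an aborted run from a script would then mean calling `action_command("* Continue interrupted download", urls, output_dir)` with the same output folder as the original run.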
Preferences and Mirror Options
Let’s have a look at the options you have for your project. Click the Set options… link in the bottom right of the window.
This is where it gets a little more complicated. As you can see, HTTrack supports Proxy settings; you can Configure the address, port, and authentication. Within Scan Rules, you can use wildcards to define which files your project should include in or exclude from its backup. Limits is probably the most important tab, because here you can set the maximum mirroring depth for both internal and external links. In addition, you can limit the size of HTML files, the mirroring time, the transfer rate, the number of connections per second, and the number of links.
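Several of these settings also exist as `httrack` switches. The dataclass below turns a subset of them into arguments: `-P` for the proxy, `-rN` and `-%eN` for internal and external depth, `-EN` for the maximum mirroring time in seconds, `-AN` for the transfer rate in bytes per second, and `-%cN` for connections per second, followed by any scan rules. The class and field names are hypothetical, the HTML size and link limits are deliberately left out, and the switches should be double-checked with `httrack --help`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MirrorOptions:
    """Hypothetical subset of the Set options… dialog, expressed as httrack switches."""
    proxy: Optional[str] = None         # "host:port" or "user:pass@host:port" -> -P
    depth: Optional[int] = None         # internal mirroring depth             -> -rN
    ext_depth: Optional[int] = None     # external links depth                 -> -%eN
    max_time: Optional[int] = None      # maximum mirroring time, seconds      -> -EN
    max_rate: Optional[int] = None      # maximum transfer rate, bytes/second  -> -AN
    conn_per_sec: Optional[int] = None  # maximum connections per second       -> -%cN
    scan_rules: list = field(default_factory=list)  # e.g. ["-*.zip", "-*.gif"]

    def to_args(self):
        """Return the switches for every option that was set, then the scan rules."""
        args = []
        if self.proxy:
            args += ["-P", self.proxy]
        for prefix, value in (
            ("-r", self.depth),
            ("-%e", self.ext_depth),
            ("-E", self.max_time),
            ("-A", self.max_rate),
            ("-%c", self.conn_per_sec),
        ):
            if value is not None:
                args.append(f"{prefix}{value}")
        return args + list(self.scan_rules)
```

Options left at `None` are simply omitted, so HTTrack's own defaults apply to them.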
If you’re running into issues, for example projects that are aborted immediately, you can try to change your Browser ID or play with the settings in the Spider tab. Consult the FAQ & Troubleshooting section on the HTTrack homepage if you encounter barriers you can’t overcome yourself. Click OK to confirm your changes. Then click Next > to move on to the final step in setting up your project.
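Because changing the Browser ID is often the first thing to try, you could automate that trial and error. This sketch reruns a mirror with each user-agent string in turn, passed through `httrack`'s `-F` switch, and stops at the first run that leaves an `index.html` in the project folder. The candidate strings, the function names, and that success check are all assumptions; a partial or leftover mirror can also contain an `index.html`, so the check is crude.

```python
import subprocess
from pathlib import Path

# Hypothetical Browser IDs to try; swap in strings suited to the site.
CANDIDATE_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def default_runner(args, output_dir):
    """Run httrack and treat an index.html in the project folder as success."""
    subprocess.run(args, check=False)
    return (Path(output_dir) / "index.html").exists()


def try_user_agents(urls, output_dir, agents, runner=default_runner):
    """Rerun the mirror with each Browser ID (-F) until one run succeeds.

    Returns the user-agent string that worked, or None if none did.
    """
    for agent in agents:
        args = ["httrack", *urls, "-O", str(output_dir), "-F", agent]
        if runner(args, output_dir):
            return agent
    return None
```

The `runner` parameter lets you plug in a different success check, or a dry run that only prints the commands.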
This last step lets you adjust minor settings. For example, you can let HTTrack Shutdown PC when finished, put the project On hold for a set amount of time, or Save settings only, do not launch download now.
Once you hit Finish, the tool will immediately start saving files. As HTTrack is humming away, you can track its progress.
To test your project, head to the directory you selected, open the project folder, and click the index.html file to launch the mirrored website in your default browser.
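To open a mirror from a script rather than a file manager, a small helper can locate the project's `index.html` and hand a `file://` URI to Python's `webbrowser` module, which opens it in your default browser. The function names are hypothetical, and the base path and project name you pass are whatever you chose in the wizard.

```python
import webbrowser
from pathlib import Path


def mirror_index_uri(base_path, project):
    """Return a file:// URI for <base_path>/<project>/index.html."""
    index = (Path(base_path) / project / "index.html").resolve()
    if not index.is_file():
        raise FileNotFoundError(f"no index.html found in {index.parent}")
    return index.as_uri()


def open_mirror(base_path, project):
    """Open the mirrored site in the default browser."""
    return webbrowser.open(mirror_index_uri(base_path, project))
```

For instance, `open_mirror("websites", "demo-project")` would open a project saved under a placeholder `websites` folder, and raises `FileNotFoundError` if HTTrack never wrote an `index.html` there.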
If your project doesn’t work out of the gate, start over and play with the options. It can take some trial and error. And even if certain settings worked perfectly in a previous run, they may not work the next time around. As mentioned before, your best bet is to change the Browser ID or refer to the official FAQ & Troubleshooting page.
You can cancel a run at any time. If you hit the cancel button once, the program completes all running processes before it stops; to abort the project immediately, hit the cancel button again. To resume a backup, start the project again and pick * Continue interrupted download from the Action menu in the setup step described earlier.
Ready for Offline Browsing?
Isn’t it a liberating feeling to be able to take the web (or at least parts of it) anywhere, without needing a constant connection? Maybe that is taking it a bit too far. In any case, it’s a great option. What do you think?
Which websites do you always have to have with you? How else do you use the tool? Have you tried testing your Bookmarks with HTTrack?
Image Credit: ValentinT via Shutterstock.com