
Sometimes it’s just not enough to save a website locally from your browser. Sometimes you need a little bit more power. For this, there’s a neat little command-line tool known as Wget. Wget is a simple program that downloads files from the Internet. You may or may not know much about Wget already, but after reading this article you’ll be prepared to use it for all sorts of tricks.

Wget is readily available on Linux and other UNIX-like systems and can be installed on Windows, and with a bit of coaxing you can get it running on Mac OS X too. So once you know the sorts of things you can use Wget for, you can take those tricks to whichever OS you’re using, and that’s handy. What’s even better is that wget can be used in batch files and cron jobs. This is where we start seeing the real power behind wget.
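Because wget reports how a run went through its exit status, a batch script can react to a failed download instead of silently moving on. Here is a minimal POSIX shell sketch of a cron-friendly wrapper, assuming GNU Wget's documented exit codes (0 for success, 4 for a network failure, 8 for a server error response, and so on). The function names, the script path ~/bin/nightly-wget.sh and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical cron-friendly wrapper around wget.

# Map GNU Wget's exit codes to a short human-readable reason.
wget_status_message() {
  case "$1" in
    0) echo "no problems" ;;
    1) echo "generic error" ;;
    2) echo "parse error (command-line options or .wgetrc)" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "username/password authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server issued an error response" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Download one URL quietly (-q keeps cron's mail short), and only if the
# server's copy is newer than the local one (-N). On failure, print the
# reason to stderr and hand wget's exit code back to the caller.
fetch_or_report() {
  rc=0
  wget -q -N "$1" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "wget failed for $1: $(wget_status_message "$rc")" >&2
  fi
  return "$rc"
}

# Fetch every URL given on the command line; keep going if one fails
# (its reason has already been printed to stderr).
for url in "$@"; do
  fetch_or_report "$url" || true
done

# Example crontab entry (add it with `crontab -e`) that runs this script
# every day at 03:00:
# 0 3 * * * cd "$HOME/downloads" && sh "$HOME/bin/nightly-wget.sh" https://example.com/report.csv
```

Keeping the reason on stderr means cron only mails you when something actually went wrong.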

Basic Wget

The basic usage is wget URL.

 wget http://makeuseof.com/


The simplest options most people need to know are background (wget -b), continue a partial download (wget -c), number of tries (wget --tries=NUMBER) and, of course, help (wget -h) to remind yourself of all the options.

wget -b -c --tries=NUMBER URL
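As a concrete sketch, here is a tiny helper (the name `fetch_big` and the example URL are made up) that starts a resumable download in the background. One detail worth knowing: with `-b`, wget writes its progress to a file called wget-log unless you name a log file with `-o`, so the helper picks a separate log name for each download.

```shell
#!/bin/sh
# fetch_big URL [TRIES]  (hypothetical helper)
# -b       go to the background immediately after starting
# -c       continue a partially downloaded file instead of starting over
# --tries  give up after this many attempts (default here: 5)
# -o LOG   write progress to LOG instead of the default ./wget-log
fetch_big() {
  url=$1
  tries=${2:-5}
  log="$(basename "$url").log"
  wget -b -c --tries="$tries" -o "$log" "$url"
}

# Usage:
#   fetch_big https://example.com/big-file.iso 10
#   tail -f big-file.iso.log    # Ctrl-C stops tail; wget keeps going
```

If the connection drops, running the same command again lets `-c` pick up from the partial file.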

Moderately Advanced Wget Options

Beyond running in the background (wget -b), Wget can limit the download speed (wget --limit-rate=SPEED), refuse to climb to the parent directory so you only download a sub-directory (wget -np), download a file only if the server’s copy is newer than yours (wget -N), mirror a site (wget -m), avoid creating any new directories (wget -nd), accept only certain extensions (wget --accept=LIST) and wait between requests (wget --wait=SECONDS).

wget -b --limit-rate=SPEED -np -N -m -nd --accept=LIST --wait=SECONDS URL
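You rarely want all of those switches at once: according to the wget manual, `-m` already turns on recursion and time-stamping (so `-N` is redundant next to it), and pairing it with `-nd` flattens the whole mirror into one folder. This sketch splits the ideas into two hypothetical helpers, one for a polite mirror of a single sub-directory and one for grabbing only certain file types. The rate and wait defaults are arbitrary picks, not recommendations from the wget docs. Keep the trailing slash on directory URLs; the manual notes that `-np` depends on it over HTTP.

```shell
#!/bin/sh
# polite_mirror URL [RATE] [WAIT]  (hypothetical helper)
# Mirror one directory without climbing above it (-np), cap the bandwidth
# (--limit-rate accepts suffixes such as k or m) and pause between requests.
polite_mirror() {
  wget -m -np --limit-rate="${2:-200k}" --wait="${3:-2}" "$1"
}

# fetch_types URL EXTENSIONS  (hypothetical helper)
# Recursively fetch only files ending in the comma-separated EXTENSIONS
# (e.g. "pdf,zip"), skip files that haven't changed (-N) and save them all
# in the current directory (-nd) instead of recreating the site's tree.
fetch_types() {
  wget -r -np -nd -N --accept="$2" --wait=1 "$1"
}

# Usage (placeholder URLs):
#   polite_mirror https://example.com/docs/ 100k 3
#   fetch_types https://example.com/papers/ pdf,ps
```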

Download With Wget Recursively

You can download recursively (wget -r), span hosts to follow links onto other domains (wget -H), convert links so they point at your local copies (wget --convert-links) and set the recursion depth (wget --level=NUMBER, using inf or 0 for infinite).

But some sites don’t want to let you download recursively and will check which browser you are using in an attempt to block the bot. To get around this, declare a user agent such as Mozilla (wget --user-agent=AGENT).

wget -r -H --convert-links --level=NUMBER --user-agent=AGENT URL
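Here is that recursive fetch wrapped in a hypothetical `crawl_as_browser` function. The user-agent value is just an example string, not something wget requires; because it contains spaces it has to stay quoted, or the shell will split it into several arguments. `-H` is left out on purpose: combined with recursion it lets wget wander onto any other host, so add it (ideally alongside `-D`, which limits the domains wget may follow) only when you need it.

```shell
#!/bin/sh
# crawl_as_browser URL [DEPTH]  (hypothetical helper)
# -r               recurse into links
# --level          how deep to go (inf or 0 = no limit; default here: 2)
# --convert-links  rewrite links so the local copy browses offline
# --user-agent     identify as a browser instead of "Wget/<version>"
crawl_as_browser() {
  wget -r --level="${2:-2}" --convert-links \
       --user-agent="Mozilla/5.0 (X11; Linux x86_64)" "$1"
}

# Usage (placeholder URL):
#   crawl_as_browser https://example.com/ 3
```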


Password Protected Wget

It’s possible to declare the username and password for a particular URL on the command line (wget --http-user=USER --http-password=PASS). This isn’t recommended on shared machines, as anyone viewing the process list will be able to see the password in plain text.

wget --http-user=USER --http-password=PASS URL
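If other people can see the processes on your machine, one way to keep the password off the command line is wget’s `--ask-password` option, which prompts for it on the terminal when the download starts. A minimal sketch, paired with `--user`; the helper name and URL are placeholders, not part of wget.

```shell
#!/bin/sh
# fetch_protected USER URL  (hypothetical helper)
# --user          account name to log in with
# --ask-password  prompt for the password on the terminal, so it never
#                 appears in the process list or your shell history
fetch_protected() {
  wget --user="$1" --ask-password "$2"
}

# Usage (placeholder values):
#   fetch_protected alice https://example.com/private/report.pdf
```

Because it prompts interactively, this suits manual runs; an unattended cron job needs some other way to supply credentials.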

An example of this in action is using wget to back up your tasks from Remember The Milk.


Wget Bulk Download

First, create a text file listing all the URLs you want to download, one per line, and call it wget_downloads.txt. Then, to download them all in one go, type this command:

wget -i wget_downloads.txt
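When the files follow a naming pattern, you don’t have to type the list by hand. This sketch (hypothetical helper names, placeholder URL) writes one URL per line into wget_downloads.txt and then hands the file to `wget -i`; adding `-c` means that re-running it after an interruption resumes partial files instead of starting over.

```shell
#!/bin/sh
# make_url_list PREFIX PART...  (hypothetical helper)
# Writes PREFIX followed by each PART, one URL per line, to wget_downloads.txt.
make_url_list() {
  prefix=$1
  shift
  for part in "$@"; do
    printf '%s%s\n' "$prefix" "$part"
  done > wget_downloads.txt
}

# bulk_download  (hypothetical helper)
# Download everything listed; -c resumes partial files on a re-run.
bulk_download() {
  wget -c -i wget_downloads.txt
}

# Usage (placeholder URL):
#   make_url_list https://example.com/files/report- 01.pdf 02.pdf 03.pdf
#   bulk_download
```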


Cool Uses For Wget

This will crawl a website without saving the pages and write a log file (wget.log) in which any broken links are flagged:

wget --spider -o wget.log -e robots=off --wait 1 -r -p http://www.mysite.com/
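That log gets long. Newer releases of wget end a recursive `--spider` run with a summary: a line such as `Found 2 broken links.`, a blank line, then the broken URLs one per line. Assuming your version writes that summary (open wget.log and check before relying on it), the hypothetical helper below prints just those URLs.

```shell
#!/bin/sh
# list_broken_links [LOGFILE]  (hypothetical helper; LOGFILE defaults to wget.log)
# Prints the URLs listed after the "Found N broken link(s)." line and stops
# at the first blank line after the list. Prints nothing if that line is
# absent (for example when the log says "Found no broken links.").
list_broken_links() {
  awk '
    /^Found [0-9]+ broken link/ { in_list = 1; next }
    in_list && NF == 0 && seen   { exit }
    in_list && NF > 0            { print; seen = 1 }
  ' "${1:-wget.log}"
}

# Usage:
#   list_broken_links wget.log > broken-links.txt
```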

This will take a text file of your favourite music blogs (musicblogs.txt, one URL per line) and download any new MP3 files:

wget -r --level=1 -H --timeout=1 -nd -N -np --accept=mp3 -e robots=off -i musicblogs.txt
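To run that regularly, it helps to send the files to one fixed folder. This sketch wraps the same command in a hypothetical `fetch_new_mp3s` function and adds `-P`, wget’s directory-prefix option; because `-N` compares against the files already saved there, repeat runs only fetch new or changed tracks. The default folder is an assumption. The very short `--timeout=1` comes from the command above, and slow sites may need a larger value.

```shell
#!/bin/sh
# fetch_new_mp3s [DEST]  (hypothetical helper; DEST defaults to ~/Music/blogs)
# Same options as the command above, plus -P DEST to choose where the
# MP3s are saved. Reads the blog URLs from musicblogs.txt.
fetch_new_mp3s() {
  dest=${1:-$HOME/Music/blogs}
  wget -r --level=1 -H --timeout=1 -nd -N -np --accept=mp3 \
       -e robots=off -P "$dest" -i musicblogs.txt
}

# Usage:
#   fetch_new_mp3s "$HOME/Music/new-from-blogs"
```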

What else do you use wget for?

Image Credit: Social Media Connection via ShutterStock, Young Man Watching TV via Shutterstock, Globe via Shutterstock

  1. steve
    August 26, 2012 at 12:15 am

    Cool stuff. I wrote a Windows front end, but it's very buggy. It's at http://sites.google.com/site/venvirupa (look for funnelwebget).

    • Angela Alcorn
      August 27, 2012 at 10:36 am

      Thanks - might be useful!

  2. Santosh Kumar
    August 13, 2012 at 7:23 am

    For our convenience we can save all the arguments in a separate .wgetrc file. See for more info.

  3. Trevor
    July 16, 2012 at 6:06 pm

    So I'm trying to download all .war files from a Hudson (a CI tool) web interface site here at work.

    The URL to one specific .war file is as follows, "http://hostname:port/hudson/job/ida/ws/target/filename.war"

    I've tried,
    'wget -A .war http://hostname:port/hudson/job/ida/ws/target/'

    and..

    'wget -r -l1 --no-parent -A .war http://hostname:port/hudson/job/ida/ws/target/'

    and tons of other things. All of them only download an "index.html" file.

    Any suggestions??

    • Angela Alcorn
      July 17, 2012 at 8:54 am

      Hmm, is this a password protected site or something? That could be it.

      Oh, hang on. The format for the hostname:port is different to what you've listed. Try this format with the brackets:

      http://host[:port]/directory/file

      Good luck!

  4. Eddy
    July 13, 2012 at 1:31 pm

    I want to mirror a site but only download a percentage or a set number of KB of each file. Any ideas? :)

    • Angela Alcorn
      July 15, 2012 at 1:44 pm

      Ooh, that's interesting. I don't think Wget can do it. Maybe you should ask on MakeUseOf Answers and see if there's a similar program that can do it.

      http://makeuseof.com/answers/

  5. Jatin
    April 13, 2012 at 7:40 am

    Is there any way to download all the videos on a site with wget just by giving it the home URL?

    • Angela Alcorn
      April 16, 2012 at 6:19 pm

      You can designate which file types to download and tell Wget to get all of them. Something like this would probably do it.

      wget -r --level=1 -H --timeout=1 -nd -N -np --accept=mp4 -e robots=off URL

      The full manual is here if that helps:
      http://www.gnu.org/software/wget/manual/wget.html

      Hope this helps!

      • Jatin
        April 20, 2012 at 7:42 am

        I tried using it and all it gives is just the index.html file.

        • Angela Alcorn
          April 24, 2012 at 12:59 pm

          Maybe you should try using a user agent, like "--user-agent=Firefox"

          If this doesn't work, I suggest you try asking at MakeUseOf Answers (because I can't think of anything else it might be).
          http://www.makeuseof.com/answers/

  6. Sean
    March 7, 2012 at 12:54 am

    I always use httrack for windows from the command line. I did use wget through Cygwin, but it can be kind of a pain to get set up. http://www.httrack.com/html/fcguide.html

    • Angela Alcorn
      April 16, 2012 at 6:20 pm

      HTTrack is also good value. :)

  7. Lee
    March 6, 2012 at 11:56 pm

    Does anyone have a link to get wget for Windows? I googled it but it looks like it has a lot of dependencies that need to be installed too. I remember downloading it some time ago and I had to install Cygwin for it, which I don't want to do this time.
