Want to know how you can capture a web page and save it as a PDF document or an image using the terminal? Luckily, Linux has a plethora of utilities that you can use to automate the task of converting HTML documents to PDF files and images.

This article introduces wkhtmltopdf and wkhtmltoimage, two command-line utilities that handle both conversions for you.

How to Convert HTML to PDF

If you're looking to capture web pages and convert them into a PDF file, the wkhtmltopdf utility will help you out. Wkhtmltopdf is an open-source command-line tool used to render web pages into PDF documents.

Since the tool works headlessly inside the Linux terminal, you won't require any web driver or a browser automation framework like Selenium.

Install wkhtmltopdf on Linux

Wkhtmltopdf is not one of the standard packages that come pre-installed on Linux. You'll have to manually install it using your system's package manager.

To install wkhtmltopdf on Ubuntu and Debian-based distributions:

        sudo apt install wkhtmltopdf
    

On Arch-based distros like Manjaro Linux:

        sudo pacman -S wkhtmltopdf
    

On Fedora and RHEL-based distros like CentOS:

        sudo dnf install wkhtmltopdf
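
If you manage several machines, you can let a script pick the right command for you. The snippet below is a minimal sketch: install_cmd_for is a hypothetical helper name, and the mapping from distro IDs (as found in /etc/os-release) to package managers is an assumption that simply mirrors the three commands above.

```shell
# Print the install command for a given distro ID (the ID field of /etc/os-release).
# install_cmd_for is a hypothetical helper; the ID list is an illustrative assumption.
install_cmd_for() {
    case "$1" in
        ubuntu|debian)      echo "sudo apt install wkhtmltopdf" ;;
        arch|manjaro)       echo "sudo pacman -S wkhtmltopdf" ;;
        fedora|rhel|centos) echo "sudo dnf install wkhtmltopdf" ;;
        *)
            # Unknown ID: report it on stderr and signal failure
            echo "unknown distro: $1" >&2
            return 1
            ;;
    esac
}

# Example usage (left commented out so that pasting the snippet changes nothing):
# . /etc/os-release && install_cmd_for "$ID"
```

The function only prints the command instead of running it, so you can review it before installing anything.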
    

Basic Syntax

The basic syntax of the command is:

        wkhtmltopdf webpage filename
    

...where webpage is the URL of the web page that you want to convert and filename is the name of the output PDF file.

To convert the Google homepage into a PDF document:

        wkhtmltopdf https://google.com google.pdf
    

Output:

[Image: converting the Google homepage to PDF]

On opening the PDF file, you will notice that wkhtmltopdf has precisely rendered the web page into a document.

[Image: the converted PDF file of the Google homepage]
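Since the command takes one URL and one output name, it is easy to wrap in a loop when you have many pages to save. The following is a rough sketch rather than a finished tool: urls_to_pdf and the page-N.pdf naming scheme are hypothetical, and the input is assumed to be a plain text file with one URL per line.

```shell
# Convert every URL listed in a text file (one per line) into its own PDF.
# urls_to_pdf and the page-N.pdf naming scheme are hypothetical choices.
urls_to_pdf() {
    list=$1
    outdir=${2:-.}
    n=0
    # The extra test keeps a last line that has no trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'
        case "$url" in ''|'#'*) continue ;; esac
        n=$((n + 1))
        # Detach stdin so the converter cannot consume the rest of the list
        wkhtmltopdf "$url" "$outdir/page-$n.pdf" < /dev/null \
            || echo "failed: $url" >&2
    done < "$list"
}

# Example usage:
# urls_to_pdf urls.txt ./pdfs
```

A failed conversion is reported on stderr and the loop moves on to the next URL, so one unreachable page does not stop the whole batch.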

The --copies flag is a lifesaver if you want your output file to have multiple copies of the webpage. Note that when printing multiple copies, wkhtmltopdf won't generate multiple PDF files, but will add additional pages to a single document instead.

To create three copies of the Google homepage:

        wkhtmltopdf --copies 3 https://google.com google.pdf
    

Since the Google homepage fits on a single page, the output PDF file will contain three pages, one for each copy.

[Image: printing multiple copies with wkhtmltopdf]

Add a Grayscale Filter to the Output

To add a grayscale filter to the PDF file, use the -g or --grayscale flag with the command:

        wkhtmltopdf -g https://google.com google.pdf
        wkhtmltopdf --grayscale https://google.com google.pdf

Output file:

[Image: the output changed to grayscale]

Change the Orientation of the PDF

By default, wkhtmltopdf generates the PDF file in portrait (vertical) orientation. To capture web pages in landscape instead, use the --orientation flag with the command:

        wkhtmltopdf --orientation landscape https://google.com google.pdf
    

Output:

[Image: using landscape orientation in wkhtmltopdf]

Note that the landscape version of the document has more white space than the portrait one.

Don’t Include Images While Converting

While generating the output, if you don't want wkhtmltopdf to render images present in a web page, use the --no-images flag:

        wkhtmltopdf --no-images https://google.com google.pdf
    

Output:

[Image: images not rendered by wkhtmltopdf]
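The grayscale, orientation, and image flags shown in this article can be combined in a single command. Here is a small sketch of a wrapper that switches them on and off; render_pdf and the GRAY, LANDSCAPE, and NO_IMAGES variables are hypothetical names made up for this example, while the flags themselves are the ones covered above.

```shell
# Build a wkhtmltopdf command line from simple on/off settings.
# render_pdf and the GRAY / LANDSCAPE / NO_IMAGES variables are hypothetical names.
render_pdf() {
    url=$1
    out=$2
    set --    # reuse the positional parameters as an empty option list
    [ "${GRAY:-0}" = 1 ]      && set -- "$@" --grayscale
    [ "${LANDSCAPE:-0}" = 1 ] && set -- "$@" --orientation landscape
    [ "${NO_IMAGES:-0}" = 1 ] && set -- "$@" --no-images
    # Options first, then the page URL and the output file
    wkhtmltopdf "$@" "$url" "$out"
}

# Example usage: a grayscale, landscape PDF of the Google homepage
# GRAY=1; LANDSCAPE=1
# render_pdf https://google.com google.pdf
```

Any setting left unset or set to 0 is simply skipped, so calling the function without them behaves like the basic command.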


How to Convert a Web Page to Images

The wkhtmltoimage utility is a part of the wkhtmltopdf package. If you're working on a report and want to include images of a website, then this tool will work in your favor. The Linux terminal not only makes it easier for you to capture the images but also gives you a range of options that allow you to customize your output.

Basic Syntax

Wkhtmltoimage has a syntax similar to wkhtmltopdf:

        wkhtmltoimage webpage filename
    

...where webpage is the URL of a website and filename is the name of the output image.

Convert a Web Page to an Image

Continuing with the earlier example, let's convert the Google homepage into an image:

        wkhtmltoimage https://google.com google.png
    

Output:

[Image: capturing a web page as an image on Linux]

You can also choose the file format of the output image by changing the file extension. Wkhtmltoimage supports the following formats:

  • JPEG/JPG
  • PNG
  • SVG

For example, if you want to generate a JPG image, simply replace the file extension with JPG in the command:

        wkhtmltoimage https://google.com google.jpg
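
If you need the same page in more than one format, you can loop over the extensions instead of typing each command by hand. This is only a sketch: capture_formats is a hypothetical helper name, and it relies on the extension-based format selection shown above.

```shell
# Capture one page in several image formats by changing only the file extension.
# capture_formats is a hypothetical helper name.
capture_formats() {
    url=$1
    base=$2
    shift 2    # the remaining arguments are the extensions to produce
    for ext in "$@"; do
        # Stop at the first failed capture
        wkhtmltoimage "$url" "$base.$ext" || return 1
    done
}

# Example usage: writes google.png and google.jpg
# capture_formats https://google.com google png jpg
```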
    


Capturing Web Pages Using the Linux Terminal

You must have a PDF viewer installed on your Linux system if you want to view the PDF files generated by wkhtmltopdf. While most Linux distributions come with a PDF viewer preinstalled, you can also install a different one that better suits your needs.
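To open a generated file straight from the terminal, you can hand it to your desktop's default application. The sketch below makes a few assumptions: view_pdf and the PDF_VIEWER variable are hypothetical names, and xdg-open (used as the fallback) is assumed to be installed, which is common on desktop distributions but not guaranteed.

```shell
# Open a generated PDF in a viewer after a basic sanity check.
# view_pdf and PDF_VIEWER are hypothetical names; xdg-open is assumed to exist.
view_pdf() {
    file=$1
    # The file must exist and must not be empty
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # PDF files start with the four bytes "%PDF"
    [ "$(head -c 4 "$file")" = "%PDF" ] || { echo "not a PDF: $file" >&2; return 1; }
    # Use the viewer named in PDF_VIEWER, or fall back to the desktop default
    "${PDF_VIEWER:-xdg-open}" "$file"
}

# Example usage:
# view_pdf google.pdf
```

Checking the first bytes catches the case where a failed conversion left behind an empty or invalid file.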