Web pages are ephemeral: they live on someone else's computer, under someone else's control. Information you rely on may survive for decades, only to disappear overnight just when you need it most. With Archivy you can easily save webpages as Markdown, then organize and edit them on your own system. Yours for eternity. Here's how.

Why Would You Want to Build Your Own Archive?

Almost all the world's information is available online: Wikipedia is the largest encyclopedia ever created, and MakeUseOf.com hosts excellent technical articles which show you how to do cool and interesting things. If you like an article, it's easy enough to bookmark it in your browser to visit later, and if you have a connected account with Google or another service, you can access your bookmarks on any device.

But web pages disappear, sites reorganize their linking structures, and often pages are updated to reflect the latest news, technology, and data. You may bookmark a set of instructions for a particular software version, only to return months later and discover that the steps have changed to suit the latest version. If you want to be able to rely on and return to the information you find online, it's best to keep your own copy offline.

What Is Archivy?

Archivy is one of several offline archiving solutions which you can run on your Raspberry Pi. Some, such as ArchiveBox, will scrape websites and save the output in a variety of formats, including HTML, PDF, and screenshots.

Archivy is a personal archive built around a tree structure of Markdown documents. You can create branching folders, and when you add a bookmark, Archivy scrapes the webpage and converts the text to Markdown for you. It also turns the page's headings into a clickable table of contents and, in some cases, automatically downloads the images and stores them on your Pi.
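Archivy's table-of-contents generation is its own code, but the idea is simple enough to sketch. The shell function below (`md_toc` is a hypothetical name, and its anchor rule of lowercasing, dropping punctuation, and turning spaces into hyphens is an assumption modelled on GitHub-style slugs) turns the `#` headings in a Markdown file into a nested, linked list:

```shell
# md_toc FILE
# Print a nested Markdown table of contents built from the ATX headings
# ("# Title", "## Subtitle", ...) in FILE. Illustrative only: this is not
# Archivy's implementation, and headings inside code blocks are not skipped.
md_toc() {
  awk '
    /^#+ / {
      match($0, /^#+/)                     # RLENGTH = number of leading #s
      level = RLENGTH
      title = substr($0, level + 2)        # text after the #s and one space
      slug = tolower(title)
      gsub(/[^a-z0-9 -]/, "", slug)        # assumed rule: drop punctuation
      gsub(/ /, "-", slug)                 # and turn spaces into hyphens
      indent = ""
      for (i = 1; i < level; i++) indent = indent "  "
      printf "%s- [%s](#%s)\n", indent, title, slug
    }
  ' "$1"
}
```

Running `md_toc note.md` on a file with a `# Install Archivy` heading followed by `## Step 1: Run the wizard` prints `- [Install Archivy](#install-archivy)` and then an indented `- [Step 1: Run the wizard](#step-1-run-the-wizard)`.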

You can edit the Markdown, add notes and tags to make the archive work for you, and even add standalone notes of your own thoughts and musings. It's more than a web archive: it's a personal archive you can access from anywhere.

How to Install Archivy on Your Raspberry Pi

Archivy is a Python app designed to be accessed through a browser. It serves its own pages on your local network, but if you want to reach it from outside your home, you will also need to set your Raspberry Pi up as a web server. If you don't already have Python 3 and pip installed on your Raspberry Pi, install them now with sudo apt install python3 python3-pip.

While Archivy can use Elasticsearch to help you search and manage your archive, it also works well with the much lighter ripgrep. Install ripgrep with:

        sudo apt install ripgrep
    

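Now you can install Archivy. On Raspberry Pi OS Bookworm and later, pip won't install packages into the system Python, so first create and activate a virtual environment with python3 -m venv ~/archivy-env followed by source ~/archivy-env/bin/activate. Then run: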

        pip install archivy
    

Create a new directory where Archivy will store its data:

        mkdir ~/Archivy_data
    

Next, configure your system and create an admin user. Running:

        archivy init
    

...starts the setup wizard.


The wizard will ask you for the full path of your data directory and whether you want to enable search. When asked which search engine to use, type "ripgrep" at the prompt. When asked whether you want to create an admin user, enter "y".
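It's safest to give the wizard an absolute path rather than relying on it to expand `~`. This short snippet prints the absolute path of the directory created earlier; it assumes you used `~/Archivy_data` as shown above.

```shell
# Print the absolute path of the data directory created earlier, ready to
# paste into the wizard. Assumes you used ~/Archivy_data as shown above.
DATA_DIR="$HOME/Archivy_data"
mkdir -p "$DATA_DIR"    # no-op if the directory already exists
realpath "$DATA_DIR"
```

For a user named pi, this typically prints /home/pi/Archivy_data.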

You can start Archivy running with:

        archivy run
    

Archivy runs on port 5000, and you can access it by entering:

        your.local.pi.address:5000
    

...into a browser on your local network.

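If you want to access your Archivy archive from outside your house, you'll need a domain or subdomain pointing at your home connection, with ports 80 and 443 forwarded to your Pi. Then create a new Apache configuration file: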

        cd /etc/apache2/sites-available
        sudo nano archivy.conf

In this new file, enter:

        <VirtualHost *:80>
            ServerName your.domain.tld
            ProxyPass / http://127.0.0.1:5000/
            ProxyPassReverse / http://127.0.0.1:5000/
            ProxyPreserveHost On
        </VirtualHost>

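Save and exit with Ctrl + O then Ctrl + X. Apache's proxy modules and the new site aren't enabled by default, so enable them, then restart Apache:

        sudo a2enmod proxy proxy_http
        sudo a2ensite archivy
        sudo service apache2 restart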
    

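If Certbot isn't installed yet, install it along with its Apache plugin using sudo apt install certbot python3-certbot-apache. Then obtain a new security certificate from Let's Encrypt with:

        sudo certbot --apache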
    

Certbot will present you with a list and ask you to select which site you want a security certificate for. Enter the appropriate number and hit Return, and Certbot will check that everything is in order and create a certificate and key file on your system. Choose "redirect" when asked, then restart Apache once again.

Now when you visit your domain or subdomain, Archivy will be served over an encrypted connection.
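If you later put other self-hosted apps behind Apache the same way, retyping that VirtualHost block gets tedious. Here is a sketch of a small shell function that writes an equivalent file for any domain and local port; `write_proxy_vhost` is a hypothetical name, not part of Apache or Archivy.

```shell
# write_proxy_vhost DOMAIN PORT FILE
# Write an Apache reverse-proxy VirtualHost, equivalent to the archivy.conf
# above, that forwards DOMAIN to a service listening on 127.0.0.1:PORT.
# Hypothetical helper for illustration; not part of Apache or Archivy.
write_proxy_vhost() {
  local domain=$1 port=$2 file=$3
  cat > "$file" <<EOF
<VirtualHost *:80>
    ServerName $domain
    ProxyPass / http://127.0.0.1:$port/
    ProxyPassReverse / http://127.0.0.1:$port/
    ProxyPreserveHost On
</VirtualHost>
EOF
}
```

For example, write_proxy_vhost notes.example.com 5000 /tmp/archivy.conf (with your own domain in place of notes.example.com), then sudo cp /tmp/archivy.conf /etc/apache2/sites-available/, enable it with sudo a2ensite archivy, and restart Apache. Writing to a temporary file first keeps sudo out of the function itself.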

Use Archivy to Archive the Internet and Your Ideas


Log into Archivy with the admin username and password, and you'll see there's only one folder: root. You can create a new sub-folder by typing a name into the field next to Create sub directory, then clicking the button. Subdirectories are nested, and you can carry on as deep as you like. A tree diagram is generated on the left of the screen to help you navigate the structure.


To add a webpage to your archive, click on the New Bookmark button. You'll be asked for the URL, and to specify tags. You don't have to add tags, but it helps for navigation. When you're ready, hit Save, and Archivy will scrape the page and generate a formatted Markdown document, complete with tags and ToC.


You can change the layout of the document by clicking the edit button and using standard Markdown formatting to tailor it precisely. You can add extra tags anywhere within the document by wrapping the tag in "#" characters, like #raspberrypi#. If you click on any of the tags, you will see a list of other archived articles with the same tag. To add a file or note of your own, click New Note and enter the Markdown directly.
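Because the archive lives on your Pi, you aren't limited to the web interface for finding tagged notes. A minimal sketch, assuming your notes are stored as .md files somewhere under the data directory you gave the wizard and that embedded tags appear in the text as #tag# (`notes_with_tag` is a hypothetical helper name):

```shell
# notes_with_tag TAG DIR
# List Markdown files under DIR whose text contains the embedded tag #TAG#.
# Assumes notes are plain .md files inside the Archivy data directory.
# Uses grep so it works even where ripgrep is not installed.
notes_with_tag() {
  local tag=$1 dir=$2
  # -r: recurse, -l: print file names only, -F: match the tag literally
  grep -rlF --include='*.md' -- "#${tag}#" "$dir" | sort
}
```

For example, notes_with_tag raspberrypi ~/Archivy_data lists every note tagged #raspberrypi#. With ripgrep installed, rg -l -F -g '*.md' '#raspberrypi#' ~/Archivy_data does the same job.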

Archivy is still a work in progress, so you can expect new features to be added in the future, and as it's an open source project, you can even contribute to the code yourself.

Use Your Raspberry Pi for More!

The Raspberry Pi is an extraordinarily versatile machine, and performs extremely well as a server. The Raspberry Pi 4 in particular can handle an exceptional workload, and is able to run dozens of sites and services at the same time. Whatever your interests, from cooking to coding, archiving to audiobooks, there's a self-hosted solution which will run on your Raspberry Pi.