Project Gutenberg: More Than Just Free Books

By Mark O'Neill, http://www.markoneill.org/

Table Of Contents

§1 – Introduction

§2 – Project Gutenberg – The Principle Of Public Domain Explained

§3 – Using The Project Gutenberg Site

§4 – What Else Do They Offer?

§5 – Project Gutenberg Self-Publishing

§6 – Distributed Proofreaders

§7 – Recommended Books To Get You Started On Gutenberg

1. Introduction

The Internet has brought the world many things, but one that really stands out is how it has made the world’s culture more accessible to the population. Projects that would have seemed impossible 20 years ago are now being accomplished in the blink of an eye. Google Books is legally scanning every book it can get its hands on, and the Internet Archive is digitizing every public domain movie, song, book, and webpage.

But when it comes to books, one of the major players is Project Gutenberg. With an army of volunteers scanning, proofreading, and editing public domain works, discovering obscure works of literature is now easier than ever. Sites like Project Gutenberg, which has over 45,000 books on offer (at the time of writing), will ensure that no book ever truly disappears. Anyone, anywhere in the world, who wants a copy of something will always be able to find it.

1.1 Johannes Gutenberg – Who Was He & Why Is He Important?

Gutenberg 2 image

Until the 15th century, making books and other texts was a very labor-intensive affair. The books were mostly Bibles, individually hand-written by monks and therefore slow to finish. No other form of book production was possible.

A book before the 15th century was therefore considered a true work of art, and any form of mass production was impossible. Many people in this period were also illiterate, and the handwritten Bibles remained in the possession of the Church.

Gutenberg 1 image

The possibility of printed, mass-produced books finally came about in 1450 in Mainz, Germany. A businessman (some say a conman) called Johannes Gutenberg invented a printing press that used a process called “movable type”. This meant that individual letters could be arranged into a page, inked, and pressed onto the paper in seconds.

One of the first things Gutenberg produced? Bibles. The so-called “Gutenberg Bibles” are worth a fortune today, but only a few survive. One is in the US Library of Congress.

Gutenberg 1b image

How the printing press actually worked is outside the scope of this manual, but if you are interested, there is a great article about the printing press on Wikipedia. And if you are in Mainz, Germany, anytime, there is a Gutenberg museum with the original printing presses. I highly recommend a visit.

Gutenberg 1c image

Fast forward 500 years to the end of the 20th century and the beginning of the 21st century. eBooks are the hot new thing in reading, and the rise of the Internet makes distribution easy. So it makes almost poetic sense to name such a distribution site “Project Gutenberg”, in honour of the man who gave us the world’s first printing press, and in the process, gave us the printed book.

1.2 The Rise Of The eBook – Why eBooks Are Now Popular

eBooks have had a bit of a troubled birth. People have read books as printed pages for 500 years, so the idea of reading them electronically on a screen was a big jump to make, especially when people realised that they didn’t technically own the content, and that companies like Amazon could wipe the material at any time (or update it without informing you). But when Amazon brought out the Kindle and made eBook reading look cool, everyone jumped on the bandwagon. Now we have Kindles, Nooks, and lots of “no-name” brands.

Gutenberg 3 image

So why do some people prefer to read an eBook instead of a printed book? Well for a start, you can carry an entire library around with you in a device which is the size of a notepad, and which is extremely light. This also saves on space, so you don’t have lots of print books taking up space and collecting dust.

Secondly, people like the speed of obtaining a book. If you order a print book from Amazon, you typically have to wait 48 hours for it to arrive (or 24 hours if you are a Prime customer). Or if you want to buy a book from your local high street store, you have to go out and get there. But what if it is raining? What if the shop is far away? What if you’re just plain lazy?

On the other hand, eBooks can be bought in mere seconds at the click of a button, from the comfort of your home. It’s fast, and it’s convenient.

Thirdly, people have anonymity when reading an eBook. If you are on the bus or train reading a print book, everyone can see what you are reading by looking at the cover. If you happen to be reading something embarrassing (erotica), or controversial (Hitler’s Mein Kampf), then that makes for some very awkward situations, with people either smirking or showing disapproval. If you are reading the eBook version, however, nobody knows what you are reading. You may as well be reading the phone book for all they know.

Speaking of “Mein Kampf”, you may be interested to know that due to the anonymity of eBooks, “Mein Kampf” is experiencing a surge of popularity. People no longer feel ashamed or embarrassed reading the horrendous text. Make of that what you will.

2. Project Gutenberg – The Principle Of Public Domain Explained

Gutenberg 4 image

There are many eBook downloading sites on the Internet, but Project Gutenberg is the largest and best known. But before we examine the nuts and bolts of the site, we need to take a moment to examine the concept of “public domain”, which is the whole bedrock of the Gutenberg site.

Every book that is published is protected by copyright. This stops you and me from stealing the hard work of another author, and quite rightly so. The copyright lasts for the author’s entire life, and then for a period of time after the author has died. The copyright period after death varies depending on the country, but in the European Union and the United States it is generally 70 years.

After that 70 years is up, the book then passes into what is called the “public domain”. This means basically that the book is up for grabs. People can print it out and sell their own versions, and more importantly, for the purposes of this manual, the book can be put on the Internet for anyone to download for free. This is where sites like Gutenberg come into the picture.
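As a rough illustration of the arithmetic, here is a minimal Python sketch of the life-plus-70 rule described above. The function name is made up for this example, and it assumes that protection runs to the end of the 70th calendar year after the author's death. Real copyright law has many exceptions, so treat it as a sketch and not as legal guidance.

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """Return the first calendar year in which a work is free to use.

    Assumption: protection lasts until the end of the calendar year
    that falls `term_years` after the author's death.
    """
    return death_year + term_years + 1


# Arthur Conan Doyle died in 1930.
print(public_domain_year(1930))  # 2001
```

Changing `term_years` lets you try the same sum for a country with a different term.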

2.1 What Kind Of Books Do They Have?

So, due to copyright laws, you are not going to get any recently released books on Project Gutenberg, or books where the author is still alive (so forget Harry Potter or John Grisham). And if the author is dead, remember the 70 year rule: there will be no books by those authors during that seven-decade period either.

So that leaves the public domain books, where the 70 year rule has come and gone. Project Gutenberg aims to digitise as many of these books as possible and make them available online for anyone to download freely. Considering the vast amount of material printed over the last several hundred years, this is a monumentally ambitious undertaking. Just think about it – novels, manuals, pamphlets, reference works on every conceivable subject. All of it will be individually scanned, read, and checked for the site. Is your head already spinning? Mine sure is.

2.2 How Do They Transfer A Book Online? Do They Type It Word-For-Word?

Project Gutenberg depends on an army of volunteers, and I will go more into the process later in the manual. But I briefly want to look at the technology behind transferring a book online.

They don’t type each book word-for-word. That really would make the process long and tedious, and they would never make any meaningful progress. Instead they use a technology called OCR (Optical Character Recognition).

Big services like Evernote use OCR so that their users can instantly find text inside images and scanned notes by entering keywords.

Gutenberg 5 image

So what is OCR? Each page of the book is scanned, and the OCR software then looks over the scanned page line-by-line (or “reads” it, if you want to look at it like that). It then turns the words into an editable text file.

Obviously this is not a perfect technology (yet). If the book has a unique font, or if the print is faded or damaged, then the OCR will have a hard time converting the text. This leaves errors, and that is where Project Gutenberg’s volunteers come in. But as I said, more on that later.

3. Using The Project Gutenberg Site

Gutenberg 5B image

Maybe it’s just me but the design of the site is rather depressing and uninspiring. It’s rather plain, drab, and unappealing. It could do with a fresh lick of paint, a nice new font, that sort of thing. But until they do a radical redesign, we are stuck with what we have. But the site works and everything is totally functional. That’s what counts in the end.

3.1 The Most Downloaded Books – What Are People Reading?

The front page contains a link to the most downloaded books on Project Gutenberg. This list is constantly updated as new books come out. The page also gives you download figures for up to the last 30 days. According to the page right now, there have been just under 4.6 million downloads in the past 30 days.

Gutenberg 6 image

3.2 How To Look For A Book

First, you obviously need to input either the title or the author into the search engine. Use the “search book catalog” search engine, not the “search website” one.

Let’s say you are looking for a Sherlock Holmes story. You can either type in “Sherlock Holmes”, or the author’s surname “Conan Doyle”. Both will bring up the Sherlock Holmes books, and typing in the surname will also bring up his other books. But just to keep things simple, let’s type in “Sherlock Holmes” and see what comes up.

Gutenberg 7 image

As you can see, there are many titles available. Typing in “Sherlock Holmes” brings up 48 entries. With each search result, you can see how many people before you downloaded it. This is useful if, for example, you are not sure you have the right title. If you see that a large number of people downloaded it before you, then it is a good indicator that you have the right book.

Gutenberg is now also hosting audio versions of the books. A logo will appear next to a title, indicating if it is a text version (book icon) or an audiobook (loudspeaker icon).

You have looked at the search results and decided that you want to download the text version of “The Hound of the Baskervilles”. Let’s take a look at how to download it so you can read it.

3.4 How To Download A Book – The Different File Formats Explained

Gutenberg 8 image

Let’s take a look at the different options on offer, and how you would go about getting them to your favourite reader.

First, briefly, there are two types of ePub and Kindle files normally on offer – “images” and “no images”. As the words suggest, “images” is a book version with illustrations in it. Obviously these files are bigger in size (but usually not too much bigger).

There is also a QR code in the bottom right-hand corner, if you want to quickly download the book to your smartphone. Just use a QR code scanner (free from the iOS App Store, and Google Play).

Now let’s examine the file formats.

HTML

This is a webpage version of the book, which you can use to read in your browser. Plus, since the books are public domain, you could also host the HTML version on your website. Just download the HTML file, and double-click it to open in your browser.

ePub

Short for “electronic publication”, this is one of the most common reading formats. ePubs work in a variety of readers, including Apple’s iBooks. Just download it and double-click on a Mac computer to open it on iBooks. Windows users could use the excellent Calibre as a possible reader.

Read more about Calibre for eBook management.

Gutenberg 9 image

Wikipedia has a list of possible ePub readers for various platforms.

Kindle

Amazon’s Kindle is probably the most popular and most widely used eBook reader ever. It’s the one that the others try to emulate. Getting eBooks from outside Amazon onto the Kindle requires jumping through a few more hoops, but nothing major.

Download the mobi file, and then go into the Kindle section of your account on Amazon. There, you will find a special secret address for sending files to your Kindle reader.

Gutenberg 10 image

These addresses can be changed if you don’t like the automatic ones provided to you by Amazon. When you have the email address you want, simply attach the mobi file to an email, add the secret Amazon email address, and send. Bear in mind that you must send it from an email address that you have listed as an authorised email address in the Kindle account settings.
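If you would rather script this step than attach the file by hand, the email itself can be put together in a few lines of Python. This is only a sketch: the function name, both addresses, and the file contents are placeholders, and it builds the message without sending it.

```python
from email.message import EmailMessage


def build_kindle_email(sender: str, kindle_address: str,
                       filename: str, data: bytes) -> EmailMessage:
    """Build an email carrying one eBook file as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender          # must be one of your authorised addresses
    msg["To"] = kindle_address    # the secret address from your Kindle settings
    msg["Subject"] = "eBook"
    msg.set_content("eBook attached.")
    msg.add_attachment(data, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


# Hypothetical addresses and dummy file contents, for illustration only.
msg = build_kindle_email("me@example.com", "my-kindle@example.com",
                         "hound.mobi", b"placeholder bytes")
print(msg["To"], [part.get_filename() for part in msg.iter_attachments()])
```

To actually send the message you would hand it to your own mail provider, for example with `smtplib.SMTP.send_message`, using whatever server and login details that provider gives you.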

You should also bear in mind that you only have 5GB of free space in which to send documents. Each eBook normally is very small in nature, but if you start downloading whole libraries, and / or lots of eBooks with images attached, then that 5GB is going to disappear pretty fast. So use this function conservatively.

After emailing the file to Amazon, it varies as to the time it takes to show up on your Kindle. With me, some have arrived within minutes, others took up to an hour, and some didn’t arrive at all. If it doesn’t arrive within a couple of hours, try again.

Plain Text

Gutenberg 11 image

The last text option is the simplest – plain text. No images, no formatted text, just the plain text. These files are also the smallest, which is good if you are pushed for space on your computer. Plain text files are good for reading on your smartphone. In the past, I have thrown all the text files into Dropbox, synced them to my phone, and then read them from Dropbox.
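If you like to keep your own list of download links, the text formats above can be mapped to links with a small helper. Treat the following as a sketch built on assumptions: the URL suffixes are my understanding of how the site names its files and they may change, the function name is made up, and the book number 12345 is a placeholder. Always check the links shown on the book's own download page.

```python
# Assumed URL suffixes; check the book's download page for the real links.
FORMAT_SUFFIXES = {
    "epub_images": ".epub.images",
    "epub_noimages": ".epub.noimages",
    "kindle_images": ".kindle.images",
    "kindle_noimages": ".kindle.noimages",
    "plain_text": ".txt.utf-8",
}


def download_url(book_id: int, fmt: str) -> str:
    """Build a download link for one book in one format."""
    if fmt not in FORMAT_SUFFIXES:
        raise ValueError(f"unknown format: {fmt}")
    return f"https://www.gutenberg.org/ebooks/{book_id}{FORMAT_SUFFIXES[fmt]}"


# 12345 is a placeholder book number, not a recommendation.
print(download_url(12345, "plain_text"))
```

This only builds the link as text. It does not download anything, and the site asks visitors not to hammer it with automated downloads.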

Audio (Ogg Vorbis, MP3, Apple iTunes, Speex)

Gutenberg 12 image

As I said, Gutenberg is now hosting the audio files of books. Probably not the rarely read books, but most definitely the popular titles. All audiobooks are read by volunteers for Librivox (more on that later).

For audio, it is a matter of just downloading the files and listening to them in the relevant audio player.

Two things to bear in mind though. First, these audiobooks are read by volunteers, so the voices and standards of recordings may vary. Second, if it is a big book, that will result in a LOT of MP3 files (normally one per chapter). There is currently no way to mass-download the files in one go unless you head to Librivox itself, so each file has to be downloaded individually. That could be a lengthy and tedious process.

3.5 Downloading Your Book To Dropbox, Google Drive, or OneDrive

Gutenberg 13 image

When you go to download your book, you will notice that some (not all) formats have Dropbox, Google Drive, and OneDrive logos next to them. This means that if you give Gutenberg the necessary authorisation to go into your account, it will download the book directly there. This is extremely convenient if that is where your eBooks end up anyway, or if you are in a rush and need to download something quickly for, say, a car journey.

If you are availing yourself of this function for the first time, then choose your favourite cloud service, and when Gutenberg requests access, then naturally grant it. This access can be revoked at any time by going to the cloud storage website and removing the Gutenberg site.

Once Gutenberg has access, it will create a special Gutenberg folder in your cloud storage and then it will drop the eBook you want in there. Keep the Gutenberg folder – all future eBooks will also be put there.

4. What Else Do They Offer?

Books are not the only thing that Gutenberg offers. Let’s take a look at what else is on offer on the site.

4.1 Audiobooks (Librivox)

Gutenberg 14 image

If you have impaired sight, then audiobooks can be a lifeline and a link to the outside world. But it’s not only the blind and visually impaired who benefit from audiobooks. People with normal sight enjoy them too. They can be listened to in the car, while doing the housework, while exercising on the cross-trainer, while out walking, or quite simply while stretched out relaxing in bed or in your favourite chair.

Audio recordings are also good for people learning a new language, as they can be listened to while reading the text. It’s a great way to learn correct pronunciation.

4.2 About Librivox

Librivox takes the idea of crowdsourcing to audiobooks. It works like this – a moderator assigns a book and then volunteers sign up to read specific chapters.

Gutenberg 15 image

Once all the chapters have been read by the various volunteers, it is all checked and put together by the site. Then you can download it from Project Gutenberg.

4.3 Volunteer To Read For Librivox

If you would like to volunteer for Librivox and be immortalised in an audiobook, then just go to the Librivox site, and click on the green “volunteer” button.

Recordings have to be of very high quality, and the voice has to be clear and easy to understand. The best recording software for the job is normally Audacity. Just fire up the software, put on your headset, and then go for it. Read more about recording audio with Audacity.

It is actually quite difficult and nerve-wracking to do something like this. So it will take a few attempts for you to get it right.

4.4 Sheet Music

Gutenberg 16 image

This section is a bit of an enigma. The section is marked as dormant and the files (once unzipped) have the file format .mus. According to the site, this requires Finale. The XML files require SharpEye. Neither of these programs is free (both offer a free trial and are then paid). For a site like Gutenberg that professes to be public domain-focused, having file formats that require paid software is a bit strange.

Gutenberg 17 image

But if you have Finale or SharpEye (or you’re willing to pay for it), then there is quite a lot of sheet music available from composers such as Beethoven, Bach, Brahms, and Mozart.

4.5 Downloading Parts Of Gutenberg To Your Computer

Gutenberg 18 image

If you are a REAL hardcore book person, and you have globs of spare storage space on your computer, then you may want to consider downloading part of the Gutenberg library. The files are in ISO format (which can be opened by a program such as Virtual CloneDrive).

Gutenberg 18b image

Since the size of the Gutenberg library is immense, you will need a DVD disk to save it all, and many of the files are in ZIP format, to keep the sizes as small as possible.

At the time of writing (March 2014), this is what is available:

  • The August 2003 CD – 600 eBooks.
  • The December 2003 DVD – 9,400 of the first 10,000 books.
  • The July 2006 DVD – 17,000 books from the first 19,000 titles.
  • The March 2007 Science Fiction Bookshelf CD – most of the science fiction titles.
  • The April 2010 (Dual Layer) DVD – 29,500+ books.

Just go to this Gutenberg page to find out the various ways to download the disks. Choices include BitTorrent, FTP, and more. The page also provides you with a disk label, both in high-definition PNG format, and in Photoshop format if you want to alter it in any way.

5. Project Gutenberg Self-Publishing

Gutenberg 26 image

Project Gutenberg also has a free self-publishing portal area. What is this service? Well, no-one explains it better than the Gutenberg page itself.

From Project Gutenberg, the first producer of free eBooks, now comes the free Authors Community Cloud Library, a social network Self-Publishing Portal. This Portal allows authors to share their works with readers as well as allows readers to provide comments, reviews and feedback to the authors. Every eBook has its own Details Page, Star Ratings, and Reader Comment area.

Gutenberg 27 image

So if you have created an eBook and would like to donate it to Gutenberg, then you can, via the Self-Publishing Portal.

Check out MakeUseOf’s guide to self-publishing for a great introduction to the subject.

6. Distributed Proofreaders

Gutenberg 19 image

The Gutenberg project obviously needs volunteers to proofread and edit the books that are waiting to go on the site. With so many books entering the public domain, there is always a need for volunteers, so your services will never be refused.

And what is good is that there is no minimum time commitment. You can do a spare 5 minutes here and there whenever you have a moment. If one week you have no time, then no worries. No-one is going to chastise you for letting the side down.

If you decide to volunteer, you will get the chance to see a variety of different books in different genres. The entire project is utterly fascinating. One moment you will be checking a page from a detective novel, the next you will be checking a page from an explosives text.

6.1 Volunteering To Become A Project Gutenberg Proofreader – How To Apply

Gutenberg 20 image

The application process is extremely easy – there’s no interview involved. Simply sign up for an account and you are in. It’s that simple.

This is the page where all of the proofreading takes place. To get started, click the “register” link in the top right hand corner, and fill in the form. Click the link in the activation email that is sent to you, and that is it. Congratulations. You have officially joined Distributed Proofreaders.

6.2 The Different Levels of Proofreading & The Rules

Gutenberg 21 image

When you start volunteering for DP for the first time, you will be restricted to P1, which is the first level. This is designed to give you a taste of the proofreading process, and to let you dip your toes into the water to see if it is something for you.

Once you have done enough to familiarize yourself with the procedures, and you like the work, then other levels will eventually open up to you. But you need to start off slowly, build your way up, and learn the proofreading rules.

As the Project Gutenberg page says, the main rules to remember are:

  1. Don’t rewrap lines. Leave the ends of lines where they are in the image (except, please put words that are broken across lines back together).
  2. Use a blank line before each paragraph and don’t indent at the beginning of a paragraph.
  3. Remove extra spaces around punctuation mistakenly inserted by the OCR software.
  4. Don’t correct the original spelling.
  5. When in doubt, make it look like the original and use [** Notes for the next proofreader or PM would go here] to flag the spot.

It all sounds complicated with lots of rules but once you get started, it will become much easier.
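To make the first and third rules more concrete, here is a small Python sketch of the kind of tidying they describe. It is not the tool that Distributed Proofreaders uses, and the function name is made up for this example. It simply assumes that a hyphen at the end of a line marks a broken word, which a human proofreader would still need to check against the scanned page (some words really are hyphenated).

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Apply two of the mechanical proofreading rules to OCR output."""
    # Remove spaces that the OCR inserted before punctuation marks.
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    # Rejoin a word broken across two lines: the second half moves up to
    # the first line, and the rest of the second line stays where it is.
    text = re.sub(r"(\w)-\n(\S+)[ \t]*\n?", r"\1\2\n", text)
    return text


sample = "The hound was enor-\nmous , and black .\n"
print(tidy_ocr_text(sample))
```

The line breaks are otherwise left alone, in keeping with the “don’t rewrap lines” rule, and the spelling is never touched.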

6.4 Choosing A Book & Getting Started

To get started, just go to the P1 page (Proofreading Round One), and scroll down to the “Projects Currently Available” section. There, you will see the current books being proofread.

Gutenberg 22 image

Each title will specify what language it is in, so you don’t get something in double-Dutch. Some will also specify BEGINNERS ONLY. If you are a beginner, then these titles are for you to practice on and find your feet. But once you get the hang of things, leave these titles alone for other new beginners.

So, assuming you have practiced on one of the beginner titles, and you’re now feeling pretty confident, choose a title that looks interesting to you, and click the link. You should preferably choose something you have an interest in. You are less likely to overlook mistakes in the text if the subject matter has your complete and undivided attention.

Once you have opened one of the books up, you will then be randomly assigned a page. Let’s take a look at the set-up.

Gutenberg 23 image

This first window is full of stats about the book you have chosen. You don’t need to really pay attention to this (unless you really want to, of course).

Further down the screen, you can choose to be notified by email when the book in question passes through the following rounds, and finally makes its way into the Gutenberg website.

Gutenberg 24 image

Slightly above that is a link that says “Start Proofreading”. Click that to be taken to your page.

6.5 How To Successfully Complete & Submit A Page

The scanned page takes up most of the window, and underneath is a text box of what the OCR managed to understand. In this initial round, you merely have to compare the scanned text with the editable text in the box underneath.

Remember the rules we went over earlier:

  • Don’t rewrap lines. Leave the ends of lines where they are in the image (except, please put words that are broken across lines back together).
  • Use a blank line before each paragraph and don’t indent at the beginning of a paragraph.
  • Remove extra spaces around punctuation mistakenly inserted by the OCR software.
  • Don’t correct the original spelling.
  • When in doubt, make it look like the original and use [** Notes for the next proofreader or PM would go here] to flag the spot.

Just read the scanned text, and compare it with the editable text. If you see any errors in the editable text, then correct them.

Once the page has been done, either click “Save as ‘done’ and proofread next page” (if you want to do another page), or “Save as ‘done’” (if you want to finish and come out of the editing area).

7. Recommended Books To Get You Started On Gutenberg

To finish up, here’s a great selection of books you can download for free from Project Gutenberg. Happy reading!

Sherlock Holmes

Gutenberg holmes image

Alice’s Adventures In Wonderland

Gutenberg alice image

Grimm’s Fairy Tales

Gutenberg grimm image

Dracula

Gutenberg dracula image

Peter Pan

Gutenberg peterpan image

War & Peace

Gutenberg warpeace image

Legend of Sleepy Hollow

Gutenberg sleepyhollow image

Treasure Island

Gutenberg treasure image

Count Of Monte Cristo

Gutenberg countmontecristo image

Dr Jekyll & Mr Hyde

Gutenberg jekyll image

Guide Published: April 2014 | Cover Art: Azamat Bohed

This manual is the intellectual property of MakeUseOf. It must only be published in its original form. Using parts or republishing altered parts of this guide is prohibited without permission from MakeUseOf.com