Believe it or not, some people still print documents to physical pieces of paper. Optical Character Recognition (OCR) software takes those printed documents and converts them right back into machine-readable text. We’ve found some of the best free OCR tools and compared them for you here.
No OCR program is perfect, so you’ll have to double-check the results and fix a few problems. Still, it’s a lot faster than typing the entire document back into the computer. Each of these free OCR software tools has its own strengths, and all of them will get the job done.
To compare these tools, I printed out MakeUseOf’s About page and scanned it back into the computer. With multiple columns and imperfections, this is not an ideal document to OCR. But that’s a good thing – real-life OCR jobs often involve imperfect documents.
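If you want to put a number on a comparison like this, one rough approach is to score each tool's output against the original text. The sketch below is not how the results in this article were judged; it is a minimal illustration using Python's standard `difflib` module. The function names and the sample strings are made up for the example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line-wrapping differences are ignored."""
    return " ".join(text.split())


def similarity(reference: str, ocr_output: str) -> float:
    """Return a 0.0-1.0 similarity ratio between the original and the OCR text."""
    matcher = SequenceMatcher(
        None,
        normalize(reference),
        normalize(ocr_output),
        autojunk=False,  # keep difflib's long-input heuristic from skewing the score
    )
    return matcher.ratio()


if __name__ == "__main__":
    # Invented sample strings, for illustration only.
    reference = "Scan the page and compare the text."
    results = {
        "tool_a": "Scan the page and compare the text.",
        "tool_b": "Scan the page and c0mpare the tcxt.",
    }
    for name, text in results.items():
        print(f"{name}: {similarity(reference, text):.2%}")
```

A ratio of 1.0 means the texts match exactly after whitespace is normalized. The score says nothing about layout, so a tool that scrambles the column order can still be penalized heavily even if every word is correct.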
Google Docs has integrated OCR support. It uses the same OCR engine that Google uses to scan books and understand text in PDF files.
To get started, open the Google Docs website and start uploading a file. You can’t scan directly from your scanner into Google Docs; you’ll have to scan the document as an image or PDF file first. If you don’t have a scanner, you can try scanning a document with your smartphone’s camera.
Enable the “Convert text from PDF and image files to Google documents” check box when you upload the file.
After you upload the file, it will appear as a new text document in Google Docs.
Google Docs did a pretty good job here. It struggled to understand the web addresses, but so did all of the other tools.
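Since web addresses are a common trouble spot, it can help to pull them out of the OCR output and check them by hand first. Here is a small sketch of that idea in Python. The pattern is a loose heuristic, the list of top-level domains is an assumption and far from complete, and `find_url_candidates` is a name invented for this example.

```python
import re

# Loose hints that a token is meant to be a web address: a scheme, a "www."
# prefix, or a dot followed by a common top-level domain (illustrative list).
URL_HINT = re.compile(r"(?:https?://|www\.|\.(?:com|org|net)\b)", re.IGNORECASE)


def find_url_candidates(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs that look like web addresses."""
    candidates = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if URL_HINT.search(token):
                # Drop sentence punctuation that clings to the end of the token.
                candidates.append((line_number, token.rstrip(".,;:!?)")))
    return candidates
```

Each hit is only a candidate for manual review. An address that OCR has mangled badly enough (for example by inserting a space) will not be caught, so this complements proofreading rather than replacing it.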
FreeOCR is a simple, easy-to-use frontend for the open-source Tesseract OCR engine, originally developed by HP Labs. FreeOCR comes bundled with Tesseract, so you don’t have to install anything extra. We’ve mentioned FreeOCR in the past.
Use the Scan button to scan a page directly from your scanner or use the Open option to open an image or PDF file.
Click the red X button to clear the pre-filled text before continuing. After opening the file, click the OCR button on the toolbar and FreeOCR will start crunching away.
FreeOCR gave good results on the main block of text, but it was confused by the columns. To fix this, I selected the block of text I wanted and used the “Crop image to selected area” option.
The results were much better after running the OCR operation on a specific section of the document.
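The same crop-then-recognize idea can be scripted against Tesseract directly. The sketch below is not what FreeOCR does internally. It assumes the third-party Pillow and pytesseract packages plus a local Tesseract install, and it assumes the page splits into equal-width columns, which real scans often do not. `column_boxes` and `ocr_columns` are hypothetical helper names.

```python
def column_boxes(width: int, height: int, columns: int) -> list[tuple[int, int, int, int]]:
    """Split a page into equal-width strips as (left, upper, right, lower) boxes."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    edges = [round(i * width / columns) for i in range(columns + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(columns)]


def ocr_columns(image_path: str, columns: int) -> list[str]:
    """Run Tesseract on each column strip separately and return the text per column."""
    # Third-party imports live here so column_boxes works without them installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        return [
            pytesseract.image_to_string(page.crop(box))
            for box in column_boxes(page.width, page.height, columns)
        ]
```

For a two-column scan you would call something like `ocr_columns("scan.png", 2)`, where the file name is a placeholder. If the columns are uneven, pass hand-picked crop boxes to `page.crop` instead, which is closer to selecting the area by hand as described above.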
Cognitive Technologies developed Cuneiform as a commercial OCR solution, but eventually released it as freeware. Cuneiform OpenOCR has an unpolished interface, but there’s an excellent OCR engine underneath.
The download page is in Russian. Scroll down and click the “english version” setup link to download and install Cuneiform.
After you install it, you’ll find that it doesn’t create properly named Start menu entries. The shortcut named “NewShortcut6” launches OpenOCR.
From the File menu, use the Open option to open a file or the Scan option to scan a document.
Once the document is open in OpenOCR, use the Recognize option in the Recognition menu to OCR the text. Cuneiform OpenOCR will save the recognized text as a file.
Cuneiform OpenOCR provided good text recognition. It also preserved the formatting and text size differences – unlike the other programs here.
Each of these programs has its strengths:
- Google Docs can OCR documents without requiring you to download or install anything on your computer.
- FreeOCR offers a very easy-to-use interface, but requires some fiddling if a document contains columns.
- Cuneiform OpenOCR preserves much of the original document’s formatting, but its interface lacks polish.
Once you’re done with the OCR process, you may want to spell-check your document. Depending on your use, you may not even have to OCR documents at all – you can convert a paper book to an ebook without OCRing it.
Which OCR software works best for you? Do you have a different favorite OCR program that we didn’t mention here? Leave a comment and let us know.
Image Credit: Woman’s Hand With Working Copier via Shutterstock