
Javier Asks:

I’m a writer of short stories and tales. I’m looking for a free Optical Character Recognition (OCR) or Intelligent Character Recognition (ICR) program to scan my old manuscripts from images or photographs so I can convert them into Microsoft Word files.

Are there any free and accurate programs able to do this? Unfortunately, I don’t have a scanner, but I do have access to a digital camera with a 20 megapixel resolution.

manuscript

Kannon’s Reply:

As you’ve already mentioned, there are several kinds of character recognition technology that can automatically convert handwritten or typed writing into digital characters. Accuracy varies greatly between implementations. Some convert on a letter-for-letter basis and others can convert entire words. There are three general categories of this software:

  • Optical Character Recognition (OCR)
  • Intelligent Character Recognition (ICR)
  • Intelligent Word Recognition (IWR)

Optical Character Recognition

In truth, OCR is a generic term, and all the methods outlined in this article are often referred to as OCR. Wikipedia gives OCR its own classification, but modern implementations tend to lump multiple methods together. So what does it do? OCR converts individual letters, typed or handwritten, into digital characters: the software looks at a document and attempts to convert it into plain text by guessing what each character is.

The software isn’t perfect. OCR software can mistake one character for another that looks similar, resulting in misspelled words and inaccurate output. For example, the letter “d” might be represented as “cl”. Most of the time, users can copy the text generated by an OCR program into a word processor and let the spell checker fix these errors automatically.
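
To make the clean-up step concrete, here is a minimal Python sketch of how a correction pass might undo lookalike errors such as “cl” for “d”. The confusion pairs, the function names, and the tiny word list are all assumptions made up for illustration; a real spell checker like the one in Word uses a full dictionary and far smarter rules.

```python
import difflib

# Hypothetical confusion pairs: (what the OCR produced, what was probably on the page).
CONFUSIONS = [("cl", "d"), ("rn", "m"), ("1", "l"), ("0", "o")]


def candidates(word):
    """Yield variants of `word` with one confusable sequence swapped back."""
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def correct(word, vocabulary):
    """Return the best guess for `word`, or `word` itself if nothing fits."""
    if word in vocabulary:
        return word
    # First try undoing a known lookalike substitution.
    for variant in candidates(word):
        if variant in vocabulary:
            return variant
    # Otherwise fall back to the closest dictionary word, if one is close enough.
    close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return close[0] if close else word


# Illustrative word list only; a real one would hold thousands of entries.
vocabulary = ["dog", "modern", "word"]
print([correct(w, vocabulary) for w in ["clog", "moclern", "word"]])
```

Either way, the output still needs proofreading by hand, because a substitution that happens to produce a real word slips through unnoticed.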

But when it comes to handwritten texts, OCR doesn’t do very well. At least, the majority of the free implementations are tragically bad. There are some commercial products that can really nail handwritten transcription, but their pricing places them completely out of reach for the general public. For example, there’s Lexmark’s ReadSoft OCR software. This enterprise-only software costs thousands of dollars.

annotated-text


Intelligent Character Recognition

ICR is a subset of OCR that specializes in converting handwritten text into individual digital characters. Given that your notes and manuscripts are handwritten, an ICR program is the most useful. However, I’m not sure how accurately they can convert texts written in foreign languages, such as Spanish. As with OCR, users can improve the quality of the outputted texts by copying them into a word processor with spelling correction turned on, and then proofreading by hand.


Intelligent Word Recognition

The latest evolution of OCR and ICR is Intelligent Word Recognition software. Rather than recognizing individual characters, it attempts to translate entire handwritten words. Like OCR and ICR, Intelligent Word Recognition often mistranslates words and requires the user to manually correct any mistakes.

What’s the Best Free OCR Software?

Tesseract

There are lots of options available. Tesseract is probably the best open source (and free) OCR software out there. To my knowledge, it only looks at individual characters and not entire words.

Because you’re using Microsoft Word (which has the best, most customizable spell check in the business), you can just copy the entire text into Word and then run a spell-check to clean up misspellings.

Tesseract is actually an OCR engine which runs from the command line. Unless you’re willing to deal with the difficulty of wielding a command line tool, you’ll likely want to install something more user-friendly. There’s a downloadable “front-end” (or a Graphical User Interface) that allows you to use Tesseract as a drag-and-drop tool: PDF OCR X. First, install the software package, then run it. You’ll then see a window:

PDF OCR interface

Then you just drag-and-drop the image file onto the window. Once the image loads, run the OCR transcription software. It may take a minute or so.

Unfortunately, it proved entirely inadequate for handling your text. Here’s what it looks like after extracting text from the document:

OCR image extraction using tesseract
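
If you would rather skip the front-end, Tesseract’s command line can be scripted. The Python sketch below is one way to do that, assuming Tesseract is installed and on your PATH and that the Spanish language data (`spa`) is available. The function names and the file name in the usage note are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, language="spa"):
    """Return the argument list for a Tesseract run that prints text to stdout."""
    # Passing "stdout" as the output base makes Tesseract print the text
    # instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", language]


def run_ocr(image_path, language="spa"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise an error if Tesseract exits with a failure code
    )
    return result.stdout
```

Calling `print(run_ocr("manuscript_page.jpg"))` would then print whatever Tesseract makes of the photo. Given the results above, don’t expect it to do any better on handwriting than the drag-and-drop tool did.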

Microsoft OneNote

Since it appears you’re already using Microsoft Office, which includes OneNote, the best option is probably also from Microsoft. OneNote comes equipped with fairly advanced OCR technology.

On both iOS and Android, there’s also the completely free Microsoft Office Lens, which can convert JPEG (and other image formats) directly into text. What makes the mobile versions so wonderful is that you can shoot an image, upload it to Microsoft’s cloud computing system, and then run the text extraction from OneNote on a desktop.

The process is fairly simple. First, take a photo of your text. If you’ve decided to use the OneNote app, then you’ll only need to save the file to your OneDrive account. Otherwise, transfer the image to your computer and drop it onto OneNote.

Then right-click on the image and select Copy Text from Picture from the context menu.

onenote extraction of text

Then right-click on a blank portion of OneNote (or in a text reading application) and paste the text in. The outputted text from your document looks like this:

onenote OCR text extraction

Unfortunately, the results from OneNote are nowhere near good enough: the output is utter nonsense. This could be caused by a combination of factors, such as a distorted image or writing that is not done in a straight line, or simply because the software isn’t good enough.

Google Keep

Right now, the best solution for OCR on handwritten documents relies on machine learning, specifically deep learning. Deep learning is a sophisticated method of training a computer to perform tasks that previously only humans excelled at, such as facial recognition (Picasa does facial recognition, believe it or not). Google recently purchased DeepMind, which develops deep-learning technology. This key acquisition had a big effect: Microsoft loses to Google in OCR. Right now, Google offers one of the most advanced (and free) methods: Google Keep.

Google Keep (which we first reviewed in 2013) also offers a mobile version of its app for Android. As with OneNote, you can shoot the image and transfer it directly into Google’s cloud. Just drag the image onto the Google Keep window. Then click on the menu button (three vertical dots) and select Grab image text from the context menu.

google keep

Here’s what it looks like after extracting the text:

google keep OCR text extraction

Google Keep Wins

As you can see, Google Keep dominates the competition. The results can be improved even further by using an image editing tool to increase the contrast and straighten the image.
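
For a sense of what “increase the contrast” means under the hood, here is a small Python sketch of a linear contrast stretch on a grid of grayscale values (0 is black, 255 is white). It only illustrates the arithmetic, with a made-up function name and sample values. An image editor does this for you and also handles straightening, which this sketch does not attempt.

```python
def stretch_contrast(pixels):
    """Rescale grayscale values so the darkest becomes 0 and the brightest 255."""
    flat = [value for row in pixels for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A perfectly flat image has no contrast to stretch; return a copy.
        return [row[:] for row in pixels]
    span = high - low
    # Integer arithmetic keeps every result a whole number between 0 and 255.
    return [[(value - low) * 255 // span for value in row] for row in pixels]


# A washed-out 2x2 "photo": faint ink (100) on greyish paper (200).
print(stretch_contrast([[100, 150], [125, 200]]))  # [[0, 127], [63, 255]]
```

After stretching, the ink is pure black and the paper pure white, which gives the character recognition a cleaner edge to work with.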

Hopefully those options help. In case you need more OCR options, please check out the 5 best OCR tools for more information.

  1. Rachit
    June 21, 2016 at 10:15 am

    Hi Javier,
    I have been trying to do something similar but with typewritten documents and I have found something that works really well, again using Google's tech but in a slightly different way.

    If you save all your photos on Google Drive it will be faster for you to convert multiple documents together. Go to your Google Drive, select the photos you want converted into text >> Right click and select the option >> "Open with Google Docs". Google will create a new document with the text extracted from the image.

    This works with image as well as PDF files. If you use a scanner app on your phone and save all your work as a multi-page PDF file then all the docs will be converted into one single Google Doc.

    Hope this helps.
    Rachit

    • Kannon Yamada
      July 1, 2016 at 9:46 pm

      Thanks for the tip Rachit! Your option is the best I've seen so far!

  2. Kristen Mathew
    June 18, 2016 at 11:40 am

    Nice Article!!! Try handwrittenocr.com that can convert handwritten and cursive writings to text for free. It can also email you the text and the image.

    • Kannon Yamada
      June 19, 2016 at 12:26 am

      Thanks for the tip!

  3. fero
    May 16, 2016 at 7:55 am

    awesome......

  4. Abhilash Salvaji
    April 28, 2016 at 8:37 am

    Wonderful work Kannon.. Great information for me. I was looking for a open source OCR, ICR, IWR tool that can integrate with SharePoint or Microsoft applications..

    Thanks..

  5. killer
    April 28, 2016 at 6:27 am

    You save my time its great.

    • Kannon Yamada
      April 28, 2016 at 6:47 am

      Thanks for the kind words!
