If you want to convert any printed text into digital text that you can copy, paste, edit, and search, you’ll need to use Optical Character Recognition (OCR) scanners.
When you choose to scan or take a photo of a document, this will be saved in a format such as JPEG or PDF. OCR software can then recognize the letters and numbers within these documents, and convert them into a searchable PDF, or into a file that you can edit in programs like Microsoft Word.
The problem is, some OCR scanners work a lot better than others, with the very best being pretty heavy on the wallet.
OmniPage 18, for instance, costs $150 but is especially good at recognizing different languages. Adobe Acrobat Pro DC costs an eye-watering $400 but has incredible accuracy. ABBYY FineReader costs $150 but is fantastic at converting documents such as magazines and brochures into searchable text. We will be testing ABBYY’s online offering later on in this article.
However, if you’re after free alternatives that you can download and use on Windows or OS X, you should try out these OCR Tools. But if you’d rather use a free, online OCR tool, keep reading, as we’ve tried out the top few, with the results below.
Since most people now use their smartphones to do their scanning, I decided to use Evernote’s Scannable app (Free on iOS and Android). I scanned the first page of Richard Dawkins’ Climbing Mount Improbable to see what results we could get with very basic formatting. I also scanned a page of Tim Ferriss’ The 4-Hour Chef to try out the scanners with some slightly more complicated formatting. I saved each of these files as a PDF.
These documents were then run through some of the supposedly best online OCR tools to see how well they fared.
Happily, there’s no registration required to use Free Online OCR. And I was doubly impressed when I saw their claim to keep the formatting and layout of my document.
The site claims to support PDF, GIF, BMP, JPEG, TIFF, and PNG as input. Outputs can be DOC, a PDF text document, RTF, or TXT. Unfortunately, I couldn’t find out if they had a file size limit.
Basic Document to PDF
Converted absolutely perfectly. There’s not much more to say! We’re off to a very good start.
Basic Document to DOC
The actual words seem to have converted flawlessly, apart from the “ount” from “Mount Rushmore” somehow going AWOL. The formatting is a different story, though. Many commas were replaced with underscores, and random spaces were inserted throughout the document. When you later see how the premium software fared in this test, though, this isn’t a bad effort at all.
Complex Document to PDF
Converting the document took a whopping 120 seconds! Once complete, all of the text had been converted with about 95% accuracy, though the text in the separate box at the top right of the page was unsearchable. A few other characters throughout the PDF were incorrect, too.
Complex Document to DOC
This time, the conversion only took 10 seconds, with the text again converted with around 95% accuracy. There were some strange spacing issues, and the software had trouble converting the font at the top right of the document, and missed out a few characters here and there.
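If you’d rather put a number on accuracy than go by eye, one possible approach is to type out a short passage by hand and compare it with the OCR output character by character. The snippet below is only a minimal sketch of that idea using Python’s standard difflib module. The function name char_accuracy and the sample strings are made up for illustration, and this is not how the percentages in this article were arrived at.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Share of the reference characters that also appear, in order,
    in the OCR output (0.0 to 1.0). Hypothetical helper for illustration."""
    if not reference:
        # An empty reference is only "matched" by an empty output.
        return 1.0 if not ocr_output else 0.0
    # autojunk=False stops difflib from ignoring very common characters
    # in long texts, which would distort the count.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Hand-typed reference text versus what an OCR tool returned (sample data).
reference = "Facing Mount Rushmore"
ocr_output = "Facing M Rushmore"
print(f"{char_accuracy(reference, ocr_output):.0%}")
```

A score like this only measures the characters. It says nothing about layout problems such as stray spacing or jumbled columns, so it is best read alongside a visual check.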
If you want to convert simply formatted documents to PDF, this is a fantastic tool. When it comes to converting to DOC, though, the results weren’t anything to write home about.
i2OCR makes some impressive claims. The tool recognizes over 60 languages, can handle multi-column layouts (by removing the formatting), has no file-size limits, and can convert both uploaded files and files from URLs. And you don’t need to register to use this tool either.
The service works by simply extracting the text from your image, then outputting unformatted text. You can quickly correct any mistakes in the side-by-side view, before copying the text to other programs, or downloading as DOC, PDF, or HTML.
Note: i2OCR rejected my PDF documents when I tried to upload them, so I converted them to JPEG by taking a screenshot of each one and uploading those files instead.
Basic Document to Plain Text
Due to the way this tool works, all formatting is lost, though the conversion from image to text was almost perfect. There were some small errors such as paragraph spacing, and some commas were replaced with periods, but these are small niggles.
Complex Document to Plain Text
The majority of the text was converted without too many mistakes, apart from the title and the recipe at the top right, which were unreadable for this tool. The way the columns were converted to plain text was far from ideal. If you wanted to make this conversion workable, you would need to spend a lot of time rearranging the lines into coherent sentences.
For basic documents, i2OCR works great. The ability to edit the text before downloading is also a very nice touch. For more complex documents, the conversion is still pretty accurate, but the way the text is output will not make your life much easier.
Online OCR currently supports 46 different languages, and can convert PDF, JPG, BMP, TIFF, and GIF into Word, Excel, or Plain Text format. The site claims “converted documents look exactly like the original — tables, columns and graphics”.
The version you can use without registering allows you to convert up to 15 images per hour (5 MB limit). If you sign up for an account, you can purchase more pages on top of this limit, and you can also convert multi-page documents and ZIP archives.
Basic Document to DOC
The basic document converted flawlessly apart from the Roman numeral I not being picked up. As the site promised, the formatting was exactly as it was in the book. Kudos to this tool.
Complex Document to DOC
After being disappointed by the previous OCR tools in converting the complex document, I was massively impressed by Online OCR. The layout was near perfect. Once again though, the recipe wasn’t picked up too well, but any other minor mistakes were negligible.
Absolutely fantastic results from Online OCR. The only downside I see is that there’s no way to download the converted documents as PDFs, as the only output formats listed are DOCX, XLSX, and TXT.
As mentioned earlier, ABBYY is one of the market leaders in OCR software, costing around $150 for their full, downloadable program. They do offer a 10-page free trial for their online tool, though (registration required). For a $5 subscription, their online tool will allow you to convert 200 pages every month.
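To put that pricing in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above ($5 per month, 200 pages per month, $150 for the desktop program). The function names are invented for this sketch, and it assumes you would use the full monthly allowance.

```python
def cost_per_page(monthly_fee: float, pages_per_month: int) -> float:
    """Price of one page if the whole monthly allowance is used."""
    return monthly_fee / pages_per_month


def months_to_match(one_off_price: float, monthly_fee: float) -> float:
    """Months of subscription that add up to the one-off purchase price."""
    return one_off_price / monthly_fee


# Figures quoted in this article for ABBYY's online and desktop offerings.
print(cost_per_page(5, 200))     # dollars per page
print(months_to_match(150, 5))   # months of subscription
```

That works out to 2.5 cents per page, and you would need to subscribe for 30 months before you had spent as much as the desktop program costs.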
Files accepted can be up to 100 MB, in any of these formats: PDF, JPG, JPEG, TIF, TIFF, PCX, DCX, BMP, and PNG. ABBYY also recognizes almost 200 languages. The output options are especially impressive, with a choice between DOCX, XLSX, RTF, TXT, PPTX, ODT, PDF, FB2, and EPUB.
You can even try out a couple of beta features during your trial. The first is the option to translate your document into another language. The other is the option to export your converted document to your cloud storage account, whether that be Dropbox, Google Drive, Evernote, Microsoft OneDrive, or Box.
Basic Document to DOCX
The overall results were good, but not amazing considering this is a premium product. Multiple commas and periods were swapped around, several inverted commas were replaced with asterisks, a couple of capital letters were missing, and one word (literalist) was spelt incorrectly.
Complex Document to DOCX
Once converted, the document had very few faults in the text (apart from the OCR struggling with that recipe’s font again!), but the formatting left a lot to be desired.
The three columns somehow took up two pages, with the central column only appearing on the second page. If you wanted to actually do anything with this converted document, you’d end up pulling your hair out.
Basic Document to PDF
While reviewing the converted PDF, I couldn’t find any fault at all. Perhaps we’ve found where ABBYY excels. Fantastic results.
Complex Document to PDF
Again, I couldn’t find any errors in this converted file. ABBYY obviously knows how to convert to PDF exceptionally well.
If you’re happy paying a few dollars, converting to PDF seems to work phenomenally well with this service, and being able to sync converted files to your cloud storage is especially useful if you’re scanning a large volume of documents. As with the other options though, ABBYY still hasn’t figured out how to flawlessly convert documents to DOC for easy editing.
The Final Outcome
If, like most people, you’re just looking to scan a few magazine articles and some household bills, you won’t need to edit these documents. Converting directly to PDF will therefore suit you, because you’ll still be able to search those documents. For this, Free Online OCR was definitely the best free tool we tested. That being said, if you’re willing to pay $5 per month for near-perfection, ABBYY’s FineReader Online was slightly more accurate.
When it comes to converting documents to DOC, we didn’t manage to find any solution that was perfect, but by far the best results came from Online OCR. The conversion wasn’t perfect, but the integrity of the formatting was largely kept intact, and mistakes were negligible. When we compare these results to the “premium” offering from ABBYY, you can’t help but be massively impressed.
We didn’t include Google Drive’s OCR capabilities in this post, partly because Google is already everywhere, but mostly because we wanted to test a few of the other free online OCR services out there.
Over to you: Which other online OCR tools would you recommend to our readers? And which have you tried that you’d never use again?