OCR'ed PDF - Side by Side Review?

Good morning!

I’m in a situation where I have some family history documents that I’d like to digitize. My plan is to scan and OCR to PDF using the Doxie scanner/software, which is what I typically use.

A few of these documents have been photocopied a number of times, and for them to be useful I’d really need to be able to proofread the OCR results and edit as necessary.

Is there an application out there that takes a PDF with both an image layer and a text layer and displays the two side by side, so the text layer can be proofread, reviewed, and corrected?

I think (but can’t swear to it) that the complete version of Acrobat Pro could do it – but this is the kind of project I’m apt to do in dribs and drabs, and while I’m not inherently opposed to software licensing I think $20/month is pretty steep for how I’ll end up using it.

My instinct is that this should be a pretty obvious use case – “I have a scanned/OCRed PDF that I want to proofread and correct” – but I don’t have any leads on software that’s designed to make this easy/straightforward.

Thanks for any insight anyone might have.

Dave Scocca

I use OwlOCR to run the OCR and then display the recognized text in a second pane. The product page, https://www.owlocr.com, shows how this looks.

OwlOCR uses Apple's built-in OCR engine which, in my usage, produces much better results than the OCR included with scanner software.
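
If you end up with text from two different OCR engines for the same page, one rough way to cut down the proofreading is to diff the two outputs and only look closely at the places where they disagree. Below is a minimal sketch of that idea using Python's standard `difflib` module. The function name `flag_disagreements` and the sample sentences are made up for illustration, and it assumes you have already copied the text out of each PDF yourself.

```python
import difflib


def flag_disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (words_from_a, words_from_b) pairs wherever two OCR passes differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    # Compare word by word; autojunk=False keeps common words from being ignored.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flagged.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample text standing in for two OCR results of one page.
    scanner_text = "Bom in 1887 in Ohio, he rnarried Mary Ellen"
    apple_text = "Born in 1887 in Ohio, he married Mary Ellen"
    for a, b in flag_disagreements(scanner_text, apple_text):
        print(f"{a!r} vs {b!r}")
```

This only tells you where the two engines disagree, not which one is right, and it will miss anything both engines misread the same way, so it narrows the proofreading rather than replacing it.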

To correct an already-OCR'd image, you need more professional-level software, like Nitro: https://www.gonitro.com

But OCR is just a tool towards an end purpose. For example: 1) enabling search with Spotlight, 2) searching within a PDF, or 3) extracting the text into another application or format. What is your end goal?

Setapp includes Nitro Pro, so I use it through that. But I find it a bit fiddly, TBH. I wish there were better options for editing the OCR'd text and keeping the corrections searchable inside the original document (I typically don't bother because it's too fiddly), other than the full Acrobat Pro, which costs a LOT.

I’m undertaking the same task you are, with family history documents.

Hamrick’s VueScan is what I use. After each scan, VueScan automatically opens the PDF. The text in every PDF it generates is searchable with the Find function in Apple’s Preview.

In my workflow I select the text, copy it, then paste it into the document of my choice (usually Nisus Writer). It never needs proofing.

NOTE: Even though every word is searchable within Preview, Spotlight fails to index the words in a scanned PDF. This is a puzzling stumbling block to my project.
