ChatGPT Atlas Digitized Book Tables That Stymied Other OCR Tools

Originally published at: ChatGPT Atlas Digitized Book Tables That Stymied Other OCR Tools - TidBITS

I got sucked down another geeky rabbit hole. While I doubt that wanting to perform optical character recognition on large tables in a book is a common desire, it was enough of a learning experience with modern AI tools that I can’t resist sharing.

Enabling Calculations in a Workout App

I coach weekly indoor track workouts for the Finger Lakes Runners Club from November through April. Somewhere between 40 and 70 people show up every Tuesday to follow my instructions while running in ovals. (The power!) To account for the fact that my runners range in speed from a 4:19 mile to a 10:30 mile, I base my workouts on the system developed by running coach Jack Daniels. I had the pleasure of learning from Jack in person before he passed away recently—he lived nearby and was a super nice guy. Jack defined specific paces for different types of training: Easy, Marathon, Threshold, Interval, and Repetition.

Briefly, the system works in two steps. First, you use a recent race time to look up a number called VDOT in one table—Jack’s shorthand for aerobic fitness. Then you use that VDOT to look up your prescribed training paces in a second table. For instance, if you want to know how fast to run a 400m rep at Interval pace, you’d find your VDOT from your recent 5K time, then look up the corresponding 400m time. These tables are in his book, Daniels’ Running Formula, Fourth Edition, and they’re big—the training pace table covers nearly four pages.

Jack Daniels training pace tables

With the help of the Beyond Better AI development app, which I’ve used for building an iPhone app to help with backup race timing and will be writing more about soon, I wanted to write a Web app that would let my runners enter a race time to find their VDOT and then see how fast they should run for different distances (some of which aren’t even in the tables) at different specified paces. Various online calculators already do this, but I wanted mine to let me build a workout like a 200m-400m-600m-800m-600m-400m-200m ladder with the shorter distances (200m and 400m) at Repetition pace and the longer ones (600m and 800m) at Interval pace. It’s a fun workout, but it requires runners to keep a lot of numbers in their heads. Tonya writes them on her hand. Once I have built a workout, I want to share a URL to the calculator in the workout announcement, so that when any runner clicks it, it shows them their specific times for each rep.

As has been my experience with AI development, the hard stuff was easy and the easy stuff was hard. After a lengthy discussion with Beyond Better to nail down exactly what I wanted, it took only a few prompts to build the app and get it working roughly the way I wanted. Before I put the effort into the final layout and making sure it worked on multiple platforms and all that, I decided to spot-check its numbers against the book. AI chatbots are magic, but you can’t trust them, particularly with numbers.

That’s when I started to go down the rabbit hole. You see, the tables of numbers in Daniels’ Running Formula are derived from equations that Jack Daniels and his friend Jimmy Gilbert (whom Jack coached and who later became a programmer for NASA) developed based on years of testing many runners of different ability levels.

Initially, the app’s training pace numbers were utterly ridiculous, so I directed Beyond Better to research and implement the Daniels–Gilbert equations. It did so and got much closer, but the times were still 2–4 seconds too fast per lap. I gave it a few examples of the correct numbers, which caused it to cheat by hard-coding the correct numbers for my examples and interpolating from them, which brought it closer, but still not right.
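For anyone curious about the underlying model, here is a minimal Python sketch using the commonly published Daniels–Gilbert coefficients from their "Oxygen Power" work; the function names are my own, and the book's tables may round or adjust values slightly, so treat this as an approximation rather than the app's actual implementation:

```python
import math

def vo2_at_velocity(v):
    """Oxygen cost (ml/kg/min) of running at velocity v, in metres per minute."""
    return -4.60 + 0.182258 * v + 0.000104 * v ** 2

def fraction_vo2max(t):
    """Fraction of VO2max a runner can sustain for a race lasting t minutes."""
    return (0.8
            + 0.1894393 * math.exp(-0.012778 * t)
            + 0.2989558 * math.exp(-0.1932605 * t))

def vdot(distance_m, time_min):
    """Estimate VDOT from a race result: oxygen cost divided by sustainable fraction."""
    velocity = distance_m / time_min
    return vo2_at_velocity(velocity) / fraction_vo2max(time_min)

# A 20:00 5K works out to a VDOT of roughly 50, matching the published tables.
print(round(vdot(5000, 20.0), 1))
```

Training paces then come from running this in reverse: given a VDOT, solve for the velocity at which a given fraction of VO2max is reached, which is where small implementation differences produce the few-seconds-per-lap discrepancies described above.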

OCR Options

After some additional back-and-forth, Beyond Better suggested that if I wanted the training paces to match the book exactly, I should provide that data. It wasn’t readily available online in its entirety, and I didn’t relish the idea of manually entering 700-plus numbers. “Surely there must be a better way,” I said as I turned the first corner in the rabbit hole, “there are lots of ways to extract text from an image now.” I took photos of the tables and went looking for a solution.

  • ChatGPT: I’ll admit it. Once an AI chatbot has worked magic for you once, it’s hard not to go back to the well. I fed my photos to ChatGPT, which is typically pretty good at extracting text. However, ChatGPT informed me that the table was too large and the numbers too small and too close together for accurate recognition. I took five photos and cropped them to focus on specific columns, which it liked better, but it still wanted to process ten rows at a time and have me verify all the numbers. The very first check revealed some errors, and I didn’t want to have to verify 700-plus numbers manually either.
    OCR in ChatGPT
  • Live Text: Several years ago, Apple introduced Live Text, which lets you select text in images in various macOS apps, including Photos and Preview. When I first selected some text and pasted it into BBEdit, the result was a mess, with every number on its own line. The results were somewhat better when I tried Numbers and Excel, both of which somehow extracted the fact that the data was tabular from the Live Text clipboard contents. Unfortunately, because I wasn’t willing to damage my signed copy of the book, I couldn’t get a perfectly straight photo—there was always some page curl that I couldn’t eliminate, which caused the data to become offset in tricky ways.
    OCR with Live Text
  • Microsoft Excel: Who knew that Excel had the option to insert data from a picture? I didn’t until ChatGPT told me, and even then I had to ask for more help to find the From Picture button on the Insert ribbon—not the Insert menu. Unfortunately, although it offered a nice interface for correcting data it had trouble with, it wasn’t nearly accurate enough.
    OCR in Excel
  • ChatGPT Atlas: In an effort to automate comparing my app’s results with an online calculator, I tried using ChatGPT Atlas in agentic mode. On a whim, I fed it my five photos and told it to make me a CSV. Unlike the regular ChatGPT, it took on the challenge of cutting the photos into smaller pieces and running OCR on those manageable chunks. Astonishingly, 13 minutes later, it had gotten everything right, even slotting the data from one photo into the proper spot in the CSV despite that photo lacking an explicit column of VDOT values (it was the inside of the page spread). Curiously, where the book reports all times under 100 seconds as just seconds, ChatGPT Atlas chose to rewrite some of them in minutes and seconds, so 85 seconds became 1:25. But only some! Strange, but not problematic.
    OCR with ChatGPT Atlas
  • A friend with Shottr: While I was working on this, I was scheduled to do a podcast about agentic browsers with Allison Sheridan, who thoroughly appreciates the lure of a good rabbit hole. When I explained what I was working on, she asked me to send her the photos because she thought the Shottr app, which she likes for screenshots, could digitize the data. It could, though she desaturated the images in Preview and increased the exposure to make the text stand out better. Then, in Shottr, she captured with breaks, which preserved the tabular data. She did have to capture relatively small chunks at a time and fix some mistakes, but it sounded like many fewer than I saw when using Live Text on the entire image. Allison’s CSV matched the one ChatGPT Atlas generated, and both passed all my spot checks.
    OCR with Shottr
  • Gemini: After the initial publication of this article, TidBITS reader John Holland suggested I try Google Gemini, which he found to be even better at OCR. He was right—I fed Gemini the same five photos and prompt, and it quickly returned a CSV file with no mistakes. As a bonus, it didn’t reformat any times in seconds to minutes and seconds.
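Mixed time formats like Atlas’s occasional 85 → 1:25 rewrites are harmless if you normalize everything when parsing the CSV. A minimal Python sketch of such a normalizer (the helper name is mine, not something from the app):

```python
def to_seconds(value):
    """Convert a time written as plain seconds ('85'), m:ss ('1:25'),
    or h:mm:ss ('1:01:05') into an integer number of seconds."""
    seconds = 0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Both of these spellings of the same time normalize identically.
print(to_seconds("85"), to_seconds("1:25"))
```

Because each colon-separated field just multiplies the running total by 60, the same loop handles any of the three formats without special cases.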

There are probably other techniques I could have used. Dedicated OCR software might have done a better job or known how to format tabular data correctly. I also could have improved the results if I had been willing to break the book’s binding or even cut the pages out to get better scans. But I didn’t want to put a ton of effort into either image preparation or digitization because I still don’t know whether I’ll use this data—I may discover a reason to revert to equations.

Moving Beyond Autocomplete

For a quick digitization task from quickly snapped photos, ChatGPT Atlas surprised me, especially with how it leveraged its agentic capabilities to go beyond what ChatGPT could do on its own.

Many AI skeptics dismiss large language models as nothing more than “autocomplete on steroids.” And look, I understand the impulse. Given the AI investment bubble and the industry’s inflated promises, reducing the technology to its basic mechanism feels like cutting through the hype.

But here’s the thing: I handed ChatGPT Atlas an ill-formed problem. I gave it five poorly composed photos with page curl and minimal guidance on handling the spatial relationships between images. I didn’t even tell it that one photo was from the inside of a page spread and lacked the VDOT column that would simplify matching the rows with the other photos. Nevertheless, ChatGPT Atlas figured all that out. It autonomously decided to break the images into manageable chunks, ran OCR on each, reconstructed the table structure, and placed everything in the correct order. It made decisions I would have had to make myself with traditional tools, and it made them correctly.

If we’re going to criticize AI when the results are problematic—and we should (see “Can Agentic Web Browsers Count?,” 30 October 2025)—we also have to acknowledge when it does something impressively well. Even though ChatGPT Atlas uses the same underlying large language model technology as the standalone chatbots, turning five photos into a clean table of digital data is well beyond what anyone would consider autocomplete.


That’s a big wow.

Dave

That is impressive. I would not have expected the agentic variant to go beyond the OCR capabilities of the underlying model. Kudos for attempting it, and thanks for reporting the success.

That’s a useful article. I have used Perplexity to analyse student grades spanning several years, some scanned and some in multipage PDFs. They’re typically tables of grades against classes for students, output by the nightmare database that seems to be ubiquitous in colleges here. My checks have found the results to be accurate, despite the multipage structure of the documents, and it used the header pages, which mapped class names to codes, to answer my queries. I agree: when it does things I would have done to figure something out, but which I didn’t tell it to do, that’s when it impresses.

First of all, I’d checksum each column to assess accuracy.

Second, this is the perfect task for the Amazon Mechanical Turk. Would you trust AI with 7,000 numbers or 70,000?
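(A minimal Python sketch of that column-checksum idea, assuming both digitizations keep the same column order; non-numeric cells such as headers or colon-formatted times are simply skipped, which is a simplification.)

```python
import csv
import io

def column_sums(csv_text):
    """Sum each numeric column of a CSV so two independent
    digitizations can be compared at a glance."""
    sums = {}
    for row in csv.reader(io.StringIO(csv_text)):
        for i, cell in enumerate(row):
            try:
                sums[i] = sums.get(i, 0.0) + float(cell)
            except ValueError:
                pass  # skip headers and non-numeric cells like "1:25"
    return sums

sample = "VDOT,400m\n30,140\n31,137\n"
print(column_sums(sample))  # {0: 61.0, 1: 277.0}
```

If the per-column sums from two independent digitizations match, any remaining errors would have to cancel out exactly, which is unlikely.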

Very cool, @ace. Also interesting given your recent caution about using agentic browsers! :wink:


My understanding with Mechanical Turk is that if accuracy matters, the best approach is to have three sets of people do exactly the same task so you can compare the results and identify the inevitable human mistakes.

Yeah, I was thinking that too. As I said in Be Cautious with Agentic Web Browsers, though, I think it’s entirely reasonable to use agentic browsers for the occasional task where they might be useful, if not as a daily driver.

In this case, ChatGPT Atlas was useful for comparing numbers calculated by my script against those from an online calculator. They didn’t match, but that was the script’s issue, and it saved me from having to calculate the same VDOT in both calculators and then compare paces manually. Using it as a fancy OCR system wasn’t something I’d previously considered, but it worked surprisingly well.

I’ve just updated the article with another bullet point about Google Gemini, which reader John Holland suggested I try. It worked perfectly and didn’t reformat things like 98 seconds to 1:38 like ChatGPT Atlas had done. So if you’re in a situation that calls for OCR on dodgy originals, give Gemini a try as well!

Could you not simply instruct ChatGPT to format the numbers as you wish?

I found that Copilot gave me the best results in extracting tables from a 240-year-old digitized book!

Several of the tables weren’t very readable, but Copilot was able to read all of them, extract the relevant data, produce a CSV, and even generate several plots showing the data.


I could, but it’s immaterial because they’re being used in calculations and are functionally identical.

Interesting that you say this.

In fact, there are many thousands (probably millions) of old datasets in books that need converting to digital format. I particularly know of efforts to convert meteorological data, but the Victorians were especially avid recorders of data, and most of it is useful today for providing historical context.

An added layer of complexity is that many, but by no means all, of these datasets are handwritten. An example is Irish Weather Rescue, which is quietly digitising 19th-century Irish weather records using crowdsourcing. At the moment, this work is done manually, more than once, to ensure as few errors as possible. If AI could help automate this process, it would be a massive help in understanding climate change over the past few centuries.

There are projects like this for weather records all over the world.


I’m about five years retired from the data analysis world and stopped keeping up with analysis tools cold turkey.

I checked out Beyond Better and was just blown away with its capabilities.

Thanks for the article. Examples using real problems are the best teachers.



Really cool to read these real world examples.
