Spelunking a PDF

nello · October 12, 2022, 2:39am

I’m using Hazel to rename and file PDFs based on their content. Unfortunately, my “Contents contain match” rules are not firing, and I need to step through exactly what Hazel is “seeing” in order to debug them.

How do I view the content of a PDF in the same way that Hazel sees it?

Yes, I already posted this question on Hazel’s support board.

But people here might have some ideas for spelunking through the nether regions of a PDF.
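One rough way to spelunk is to look at the text-drawing operators inside a PDF's page content streams. This will not show what Hazel itself sees, because Hazel's internal extraction isn't documented here. The Python sketch below uses only the standard library and makes several assumptions: streams are either uncompressed or FlateDecode (zlib) compressed, and text is drawn with literal strings via the `Tj` and `TJ` operators. It ignores hex strings, the `'` and `"` operators, compressed object streams and font encodings. Text set in fonts with custom or CID encodings will therefore come out as gibberish, which can itself be a hint about why a content rule never matches.

```python
import re
import sys
import zlib
from typing import Iterator, List

# Stream bodies sit between the "stream" keyword (plus its end-of-line) and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# A literal string "(...)" with backslash escapes; unescaped nested parentheses are not handled.
LITERAL = rb"\(((?:\\.|[^\\()])*)\)"
LITERAL_RE = re.compile(LITERAL, re.DOTALL)
# Either "(string) Tj" (group 1) or "[(str) kern (str) ...] TJ" (group 2).
SHOW_RE = re.compile(LITERAL + rb"\s*Tj|\[([^\]]*)\]\s*TJ", re.DOTALL)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"(": b"(", b")": b")", b"\\": b"\\", b"\n": b""}


def unescape(raw: bytes) -> bytes:
    """Undo literal-string escapes: backslash sequences and octal codes."""
    def repl(m):
        esc = m.group(1)
        if esc[:1] in b"01234567":
            return bytes([int(esc, 8) & 0xFF])
        return ESCAPES.get(esc, esc)  # unknown escape: just drop the backslash
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, raw, flags=re.DOTALL)


def content_streams(pdf: bytes) -> Iterator[bytes]:
    """Yield every stream body, inflating it when it is zlib (Flate) compressed."""
    for m in STREAM_RE.finditer(pdf):
        data = m.group(1)
        try:
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # uncompressed, or a filter this sketch does not handle
        yield data


def shown_strings(pdf: bytes) -> List[str]:
    """Return the strings drawn by Tj / TJ operators, in stream order."""
    found = []
    for stream in content_streams(pdf):
        for m in SHOW_RE.finditer(stream):
            if m.group(1) is not None:  # (string) Tj
                pieces = [m.group(1)]
            else:  # [ ... ] TJ: kerning numbers are ignored, pieces are joined directly
                pieces = LITERAL_RE.findall(m.group(2))
            # latin-1 is only an approximation; the real mapping depends on the font.
            found.append("".join(unescape(p).decode("latin-1") for p in pieces))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for text in shown_strings(fh.read()):
            print(text)
```

Saved as a script under any name, it prints one drawn string per line for the PDF given as its argument. Words split across `TJ` pieces are glued together without spaces, so "Total Due" may appear as "TotalDue"; that is a limitation of this sketch, not a claim about Hazel.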

Thank you for reading.

pcarrington · October 12, 2022, 12:04pm

Tough one. Years ago I tried over and over to develop a script in Automator to name PDF files I’d downloaded by their title, and it turned out that unless the title is in the metadata, it can’t be done. How does one know? Use Get Info and look at ‘Name and Extension’ to see if there’s a title. If there is, you are in luck. If not, I couldn’t find a way. If you find a better way, I can’t wait to use it.

Best, Patrick
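One rough way to check whether a file carries a title in its metadata, without opening Get Info, is to scan the raw bytes for a `/Title` entry. That is the key the PDF document information dictionary uses for the title. The Python sketch below assumes that dictionary is stored uncompressed. Files that keep it in a compressed object stream, which is common since PDF 1.5, will return None. It also ignores XMP metadata, and it may pick up a bookmark's title instead, because outline entries use the same `/Title` key.

```python
import re
from typing import Optional

# "/Title" followed by a literal string (group 1) or a hex string (group 2).
TITLE_RE = re.compile(
    rb"/Title\s*(?:\(((?:\\.|[^\\()])*)\)|<([0-9A-Fa-f\s]*)>)", re.DOTALL
)


def decode_text_string(raw: bytes) -> str:
    """Text strings starting with the FE FF byte-order mark are UTF-16BE."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")  # rough stand-in for PDFDocEncoding


def pdf_title(pdf: bytes) -> Optional[str]:
    """Return the first /Title value found in the raw bytes, or None."""
    m = TITLE_RE.search(pdf)
    if m is None:
        return None
    if m.group(1) is not None:
        # Minimal unescaping: only \( \) and \\ are handled here.
        raw = re.sub(rb"\\([()\\])", rb"\1", m.group(1))
    else:
        digits = re.sub(rb"\s", b"", m.group(2))
        if len(digits) % 2:
            digits += b"0"  # a missing final hex digit counts as 0
        raw = bytes.fromhex(digits.decode("ascii"))
    return decode_text_string(raw)
```

For example, `pdf_title(open("report.pdf", "rb").read())` returns the title string, or None when no uncompressed `/Title` entry is found. The file name is only a placeholder.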

nello · October 12, 2022, 3:12pm

I’ve never used Automator, so I can’t comment on how to use it to peer into PDF files and rename or move them based on their content.

However, a few searches turned up Automator’s “Extract PDF Text” action. Perhaps after extracting the text, you can search it and then rename and move the PDF based on what you find.
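The search-then-rename-and-move idea can be sketched in Python. The sketch assumes the text has already been extracted, for example by reading back the file the Extract PDF Text action produced. The rule format, folder names and name templates are hypothetical. The patterns are Python regular expressions, not Hazel's match syntax.

```python
import re
import shutil
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical rule format: (regex to search for, destination folder, file-name template).
# The regexes are Python syntax, not Hazel's "Contents contain match" pattern syntax.
Rule = Tuple[str, str, str]


def pick_destination(text: str, rules: List[Rule]) -> Optional[Tuple[str, str]]:
    """Return (folder, new name) for the first rule whose pattern occurs in the text."""
    for pattern, folder, template in rules:
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            # Named groups captured by the pattern fill the {placeholders} in the template.
            return folder, template.format(**m.groupdict()) + ".pdf"
    return None


def file_pdf(pdf_path: Path, text: str, rules: List[Rule]) -> Optional[Path]:
    """Rename and move the PDF per the first matching rule; return None if nothing matched."""
    choice = pick_destination(text, rules)
    if choice is None:
        return None
    folder, name = choice
    dest_dir = Path(folder).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    if dest.exists():
        raise FileExistsError(dest)  # refuse to overwrite an existing file
    shutil.move(str(pdf_path), str(dest))
    return dest
```

Consider a hypothetical rule such as `(r"invoice\s+no\.?\s*(?P<num>\d+)", "~/Documents/Invoices", "Invoice {num}")`. With it, a PDF whose extracted text contains "Invoice No. 42" would be moved to `Invoice 42.pdf` in `~/Documents/Invoices`. Rules are checked in order and the first match wins, so more specific rules should come first.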

Here are links to some web pages that demonstrate uses of the “Extract PDF Text” action:

As for Hazel …

Hazel can examine the attributes as well as content of files, including PDFs, and take a variety of actions based on what it finds.

You can find the answer to my question on Hazel’s support board.

Thank you for taking the time to reply to my question.