What's the best option for working with large PDFs?

I need help to accelerate a workflow. I’m doing research for a project that requires me to have numerous PDFs open simultaneously. Because the PDFs display Japanese characters, they can be quite large, from 200MB to larger than 1GB. I’m currently using a 2021 MacBook Pro with an Apple M1 Max processor, and it’s struggling once I have more than 5 or so PDFs open. I’m opening the PDFs using Adobe Acrobat Pro DC 2022. It’s not a problem to switch between Acrobat and MS Office apps quickly, but scrolling and searching through the PDFs can get really slow.

Does anyone have a similar experience? Any recommendations for making my workflow go quickly and smoothly? Use other software to open, search and read the PDFs? Buy a new Apple desktop with an M2 Ultra processor and more than 64GB of memory? I can’t downsize the PDFs, since it would degrade the quality of the Japanese characters.

I don’t think Japanese text as such should cause file sizes to be so huge. Is there a url where we can look at one?

Sorry, can’t do that. It’s why I’m looking for options involving either different software or hardware.

Can you do some system performance analysis while this is happening to help us identify the bottleneck? Can you tell if CPU is maxed? Or I/O? Or RAM? Otherwise we’ll just be guessing about how to fix it.

You could start by opening Activity Monitor, enabling the CPU history window (we have to see individual cores, not just the aggregate), and sharing what else is visible under the CPU, Disk, and Memory tabs.

Just try some other software and see what works best? Preview (of course), Skim (lightweight, fast, free), PDF Expert, DEVONthink. I never use Adobe for PDFs as it always seems slower and less useful than the other options.

Another thought: perhaps you could split the PDFs into smaller sections, like by chapter or whatever. Presumably any app you choose would handle them better in that form.
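If you want to try splitting without doing it by hand, here is a rough sketch. It assumes the third-party pypdf library is installed, and it splits by a fixed page count rather than by chapter, since chapter boundaries would have to come from you. The names `plan_chunks`, `split_pdf` and the output file naming are made up for illustration.

```python
from pathlib import Path


def plan_chunks(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges that cover total_pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


def split_pdf(source: Path, out_dir: Path, pages_per_chunk: int = 50) -> list[Path]:
    """Write source out as several smaller PDFs and return their paths."""
    # Assumption: pypdf is installed (pip install pypdf); it is not part of Python.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start, end in plan_chunks(len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for index in range(start, end):
            # Pages are copied as-is; the images are not recompressed here.
            writer.add_page(reader.pages[index])
        target = out_dir / f"{source.stem}_p{start + 1:04d}-{end:04d}.pdf"
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

For a 120-page file, `plan_chunks(120, 50)` gives three ranges: pages 1-50, 51-100 and 101-120. Try it on a copy of one file first to see whether the smaller pieces actually scroll faster.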

A very common reason that PDF files become especially large is graphics. Given how consistently the problem appears with the files that use Japanese characters, I suspect those characters are in fact graphics. The size problem can be appreciably worse if the document is not saved at a recent PDF compatibility level (Acrobat 10 and later, I believe).

There is a function in Adobe that can fix this without adversely impacting appearance and readability.
– Open the document in Acrobat Pro
– Select File > Save as Other > PDF Optimizer
– The next screen has a complex set of parameters.
I will try to attach a screenshot. In case I can’t, here are the settings I use routinely:

  • top right - Make Compatible with - select Acrobat 10 and later
  • right sidebar checklist – uncheck only Fonts. You may need to leave it checked, though, if the characters are images that Adobe treats as fonts rather than images. The size may also be related to the document carrying the full definition of each font used. But if these files are being read by people who expect to open Japanese fonts, they likely have them on their computers already.
    Image Settings
    • Color - 150ppi if above 225; Compress JPEG
    • Grayscale – same
    • Monochrome – 300ppi if above 450; Compress CCITT Group 4 – I do not know whether a different compression would be better for Japanese characters

Optimize images only if reduction in size – I have it checked, but if there is no effect it may be reasonable to try again unchecked. I have no idea of Adobe’s percent definition of “reduction” – it could be trivial for a single character and have huge impact with thousands of characters.
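To give a feel for what the ppi settings do, here is a small back-of-the-envelope calculation. It assumes a hypothetical 8.5 x 11 inch page and looks only at uncompressed pixel data, so real files will be smaller after JPEG or CCITT compression. The function name is mine, not anything from Acrobat.

```python
def raster_megabytes(width_in: float, height_in: float, ppi: int,
                     bytes_per_pixel: float) -> float:
    """Uncompressed size, in MB (10**6 bytes), of one scanned page image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


# Assumed page size: 8.5 x 11 inches (hypothetical; use your own scans' size).
PAGE = (8.5, 11)

gray_300 = raster_megabytes(*PAGE, ppi=300, bytes_per_pixel=1)      # about 8.4 MB
gray_150 = raster_megabytes(*PAGE, ppi=150, bytes_per_pixel=1)      # about 2.1 MB
mono_300 = raster_megabytes(*PAGE, ppi=300, bytes_per_pixel=1 / 8)  # about 1.05 MB

print(f"grayscale 300 ppi: {gray_300:.2f} MB per page")
print(f"grayscale 150 ppi: {gray_150:.2f} MB per page")
print(f"monochrome 300 ppi: {mono_300:.2f} MB per page")
```

Halving the resolution keeps only a quarter of the pixels. That is where the size saving comes from, and it is also why fine strokes in small characters can suffer, so check legibility on a test copy before converting everything.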

I hope this is helpful – would like to hear if it works ;-))

thanks

Bob

I regularly use a Mac app called PDF Shrink to reduce the size of PDF files. You can choose the degree of “shrinkage”, for example print quality or email quality.

Sometimes it has reduced file size by a factor of 10.
You may need to experiment to ensure that Japanese characters are still legible.

Note also that Preview has a Reduce File Size function that does similar things.

Sounds like the source of the documents (whoever is generating them) is not using a very good PDF-creation tool.

Japanese text, like text in any other language, should be representable as a series of Unicode characters, usually encoded with either UTF-8 or UTF-16, along with the fonts used by that text.
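As a quick illustration of how little space encoded text needs, here is a short Python check. The sample string and the figure of 1,000 characters per page are my own assumptions. A PDF stores glyph codes in (usually compressed) content streams rather than raw UTF-8 or UTF-16, so treat this as an order-of-magnitude sketch only.

```python
sample = "日本語のテキスト"  # 8 characters, all in the Basic Multilingual Plane

utf8_bytes = len(sample.encode("utf-8"))       # 3 bytes per character here
utf16_bytes = len(sample.encode("utf-16-le"))  # 2 bytes per character here


def text_bytes_per_page(chars_per_page: int, bytes_per_char: int) -> int:
    """Rough size of one page of encoded text, ignoring fonts and layout."""
    return chars_per_page * bytes_per_char


# Hypothetical dense page of 1,000 characters.
page_utf8 = text_bytes_per_page(1000, 3)   # 3000 bytes
page_utf16 = text_bytes_per_page(1000, 2)  # 2000 bytes

print(utf8_bytes, utf16_bytes, page_utf8, page_utf16)
```

A few kilobytes per page of real text cannot add up to hundreds of megabytes unless the document runs to tens of thousands of pages.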

Full Unicode fonts can be big, but not that big. For instance, “Arial Unicode MS” is about 23MB. “Hiragino Sans” (a font designed for Japanese) consists of 10 fonts (10 different weights), which all together consume about 59MB, with each individual weight’s font consuming between 4 and 8 MB.

I’m also pretty sure PDF supports partial embedding of fonts, so only the characters used by the document are present in the file, which would make the file much smaller, since there would no longer be a need to embed the full content of these large Unicode fonts.

So I wonder what’s going on in these documents. Are they using a lot of different Japanese fonts? Do they consist of bitmaps instead of text? Are they using ancient bitmap-based fonts instead of modern TrueType fonts?

If the source of these documents is scans of printed documents, then maybe someone can see if there’s an OCR solution that can be used. That would reduce the document size by orders of magnitude.

That suggests that the characters are images rather than Unicode, which explains the huge size of the files and the slowness. Text display in PDFs is relatively quick, but graphics rendering can be sluggish when there are lots of high-resolution images. Reducing image quality would have zero effect on the quality of properly encoded text, even with large character sets like Japanese.

Ultimately, the best solution is going to be to find a way to get those graphics converted to text. I’m going to presume that asking the source(s) of these files to provide them as text isn’t an option. I genuinely don’t know what the state of OCR of Japanese text is, but if it’s good enough, that may be your overall best option.

Otherwise, I’d echo the advice of others to find a minimal PDF reader. If all you’re doing with these is displaying them, Acrobat is overkill, full of features that add to bloat without contributing to your project in the least. If the text is actually images, text-handling isn’t going to be a concern; you just need something with fast image display. I don’t have any specific recommendations, but others in this thread have made some good ones to start with.

Agree with other TidBITSers that it very much sounds like the text has been converted to outlines (vector graphic shapes) or rasterised (turned into pixels). I guess the first step is to confirm whether that is the case.

Assuming the PDFs aren’t just scans of documents (in which case there was never any live text), in Acrobat go to File > Properties > Fonts. If that window’s completely blank or only non-Japanese fonts are listed, then indeed the text has been converted to outlines or rasterised. The options, as others have suggested, are to run OCR, to split the PDFs into multiple low page-count files, or to play with the Optimise options to see whether you can strike a balance between size reduction and image degradation.
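If you have a lot of files to check, a crude script can give a first impression before you open each one in Acrobat. The sketch below uses only the Python standard library and simply counts image and font dictionary markers in the raw file bytes. It is a heuristic of my own: fonts stored inside compressed object streams will not be counted, so a font count of zero is a hint, not proof, and the Fonts tab in Acrobat remains the reliable check.

```python
import mmap
import re
from pathlib import Path

# Raw markers for image XObjects and font dictionaries. The lookahead keeps
# "/Font" from matching longer names such as "/FontDescriptor".
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image(?![A-Za-z0-9])")
FONT_DICT = re.compile(rb"/Type\s*/Font(?![A-Za-z0-9])")


def count_markers(pdf_path: Path) -> dict[str, int]:
    """Count raw image and font dictionary markers in a PDF file."""
    with pdf_path.open("rb") as handle:
        # Memory-map the file so a 1GB PDF is not read into RAM in one piece.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return {
                "images": len(IMAGE_XOBJECT.findall(data)),
                "fonts": len(FONT_DICT.findall(data)),
            }
```

Many images and few or no fonts would be consistent with pages stored as pictures. Roughly one image per page, in particular, would point to a scan.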

While in that Properties dropdown click on the Description tab (to the left of Fonts). That will tell you when the PDF was created and by which software; both the application used and the PDF Production engine. That might direct you to an application which can read the file natively…

Sounds like they need to be optimised.

The best thing to do would be to preflight them and see if anything stands out. Check the resolution of the images, as they may be way over what’s required. Preflight will also tell you the originating app, and you might be able to glean something from this.

Also be aware of things like vignettes and multiple layers which can add considerably to the weight of a PDF.

Thanks, I’ll try those out!

I appreciate everyone’s comments and suggestions. As some have surmised, I’m working with scans of documents. These documents are quite old and the text quality is fair to poor, so they regrettably had to be output as graphics to maximize character legibility.
FYI, I have experience with the optimize function in Acrobat, and it actually does degrade the legibility of the fonts in these types of files.