Split Out Portraits from Yearbook Page

nello · March 18, 2023, 1:38pm

I need to split scans of yearbook pages into the individual senior portraits each contains.

Can anyone recommend an automated way of doing this? Ideally, it would use edge detection to isolate the portraits and name the resulting image files with the person’s name shown on the page. (I can dream, can’t I?)

Someone from LMUG referred me to these techniques:

Any other ideas?

Thank you.

Matt_McCaffrey · March 18, 2023, 3:06pm

I recall flatbed scanner drivers that can be set up to do this automatically, but haven’t needed to do this in a while. As for the file naming, I think any batch scan routine like that could be set up to add a prefix to each file name. After the preview scan, I’d set the prefix to be the page number from the yearbook page, or perhaps even scan each page into a new folder so that you have some organization.
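If the scanner software can only number files sequentially, the prefix idea can also be applied afterwards with a small rename script. Below is a minimal Python sketch. It assumes the page number is known (for example, from a one-folder-per-page layout) and that any name is supplied separately, either typed in by hand or taken from OCR output. `portrait_filename` is a hypothetical helper, not a feature of any scanner driver.

```python
import re
import unicodedata
from typing import Optional


def portrait_filename(page: int, index: int, name: Optional[str] = None, ext: str = "jpg") -> str:
    """Build a sortable file name such as 'p012-03-jane-doe.jpg' (hypothetical helper).

    `page` is the yearbook page number and `index` the portrait's position on it.
    `name` is optional: without it (no OCR, or OCR failed) the file is named by
    page and position only.
    """
    # Zero-padding keeps files in page/position order when sorted by name.
    stem = f"p{page:03d}-{index:02d}"
    if name:
        # Reduce accented letters to plain ASCII (e.g. 'é' -> 'e').
        ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
        # Keep only lowercase letters and digits, joined by single hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
        if slug:
            stem += f"-{slug}"
    return f"{stem}.{ext}"
```

For example, `portrait_filename(12, 3, "Jane Doe")` gives `p012-03-jane-doe.jpg`, and `portrait_filename(12, 3)` gives `p012-03.jpg`.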

It’s good to dream about including the person’s name, and maybe our AI overlords will hallucinate a solution for that!

Shamino · March 18, 2023, 10:56pm

I know that many flatbed scanner apps can auto-detect negatives or slides. They do it by identifying rectangular image regions surrounded by black (the usual color for the film/slide holders).

But I haven’t seen something that can do this for prints. And nothing bundled is going to extract text for automatic labeling.
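For prints, the same idea of finding rectangular blobs that differ from the background can be approximated in a few lines. Below is a minimal Python sketch. It assumes the page is already loaded as a list of rows of grayscale values from 0 to 255, and that a single brightness threshold separates photos from the page background. `find_regions` and its parameters are hypothetical. The technique is plain connected-component labeling, not a description of what any particular scanner app does.

```python
from collections import deque


def find_regions(gray, background_is_dark=False, threshold=128, min_area=100):
    """Return bounding boxes (left, top, right, bottom) of blobs that differ from the background.

    `gray` is a list of rows of 0-255 values. right/bottom are exclusive.
    Blobs with fewer than `min_area` pixels (dust, specks) are dropped.
    """
    h, w = len(gray), len(gray[0])

    def is_content(v):
        # Dark film holder: bright pixels are content. White page: dark pixels are content.
        return v >= threshold if background_is_dark else v < threshold

    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or not is_content(gray[y][x]):
                continue
            # Breadth-first flood fill over 4-connected content pixels.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and is_content(gray[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if count >= min_area:
                boxes.append((left, top, right + 1, bottom + 1))
    return boxes
```

The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects. A real portrait contains light areas (a white collar, a pale backdrop), so it may break into several blobs. In practice you would smooth the image first or merge overlapping boxes, and a library such as OpenCV (`findContours`, `boundingRect`) does this kind of labeling much faster.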

gastropod · March 19, 2023, 3:41am

Image Capture on Sierra with the Epson V600 does pretty well with batches of prints. IC doesn’t give enough control over the scan to get the best color results, especially in batches, but it’s adequate for backing up and triaging some family albums. Once the preview scan is done, you can change the area scanned for each photo, which has been great for prints that have a caption taped to the bottom. I’ve been scanning each batch twice, once with ‘color restore’ and once without, to improve the odds a bit.

For yearbook or other photos that sit on a regular enough grid, it might be possible to use ImageMagick (an open-source command-line tool) to crop them out of full-page scans using predefined values that include the caption. I’ve never actually used IM, so I have no idea what the learning curve is like. There might be scripts or GUIs for IM that do what you want. There are other image manipulation libraries, too, but the ones I know of are either more specialized or fairly low-level, so you have to wrap them in your own program (ImageJ, the Python Imaging Library).

https://www.imagemagick.org/Usage/crop/#crop_tile
https://www.imagemagick.org/Usage/crop/#crop_equal
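For a fixed grid, the crop rectangles come down to simple arithmetic. Below is a minimal Python sketch. It assumes equal-size cells laid out in rows and columns, with a uniform outer margin and gutter (all in pixels, measured once from a sample scan). `grid_boxes` is a hypothetical helper. With `margin=0` and `gutter=0` it divides the page into equal tiles, similar in spirit to the equal-size tile crop described at the second link.

```python
def grid_boxes(width, height, cols, rows, margin=0, gutter=0):
    """Return crop boxes (left, top, right, bottom) for a cols x rows grid, row by row.

    `margin` is the blank border around the whole grid and `gutter` the space
    between neighboring cells, both in pixels.
    """
    usable_w = width - 2 * margin - (cols - 1) * gutter
    usable_h = height - 2 * margin - (rows - 1) * gutter
    if usable_w <= 0 or usable_h <= 0:
        raise ValueError("margins and gutters leave no room for cells")
    cell_w = usable_w / cols
    cell_h = usable_h / rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Step one cell plus one gutter per column/row, then round to whole pixels.
            left = margin + c * (cell_w + gutter)
            top = margin + r * (cell_h + gutter)
            boxes.append((round(left), round(top), round(left + cell_w), round(top + cell_h)))
    return boxes
```

Each box follows the (left, upper, right, lower) order that Pillow's `Image.crop` takes, so one list can be reused for every page that shares the same layout. Making each cell tall enough to include the name under the photo keeps the caption with its portrait.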

Shamino · March 19, 2023, 3:03pm

Image Capture is a very bare-bones app. It will let you scan images, but for any kind of real control, you should use a third-party application. My experience with EpsonScan (back when I had a Mac that could run it) was that it is also a bare-bones app, but one that supports Epson’s product features (e.g. digital ICE). The last time I looked, however, it did not support batch scanning.

SilverFast has a batch-scan feature. You do a preview scan to see what’s on the glass and then you can drag multiple rectangles around the different regions you want to scan. Then when you begin the batch scan, each region will be scanned into a separate file (or a separate editor window if you use the Photoshop plugin). You can, if you like, change the scan settings (resolution, format, color adjustments, etc.) on a per-region basis if you’re so inclined.

The locations of the regions are remembered from one scan to the next, so if each page you’re scanning has the images in the same location, you can just place it on the scanner, do a pre-scan to double-check the region locations and then do another batch scan, without having to manually draw them again.

I believe VueScan also supports batch scanning (via its crop/multi-crop feature). I haven’t used it, so I can’t comment beyond stating that the feature exists.

mpainesyd · March 19, 2023, 8:28pm

Sounds like a challenge for an AI on the web?
But then maybe all those people whose names and pictures become data on the web would not appreciate this.

But then again, maybe the pictures are already online!

nello · March 19, 2023, 10:06pm

Here’s what ChatGPT says:

There are several software applications that can be used to cluster and crop individuals from a group picture. Here are some examples:

1. Adobe Photoshop: Adobe Photoshop is a popular photo editing software that has tools for automatic segmentation, clustering, and cropping of images. It has features like the Magic Wand tool, which can be used for automatic selection of regions, and the Crop tool, which can be used to crop the selected regions.

2. GIMP: GIMP is a free and open-source image editing software that has tools for automatic segmentation and cropping of images. It has features like the Foreground Select tool, which can be used for automatic selection of regions, and the Crop tool, which can be used to crop the selected regions.

3. OpenCV: OpenCV is a popular computer vision library that has tools for image segmentation, clustering, and cropping. It has a wide range of functions and algorithms for various image processing tasks, including segmentation and clustering.

4. MATLAB: MATLAB is a programming language and environment that has tools for image processing and computer vision. It has functions and algorithms for image segmentation, clustering, and cropping.

5. Python libraries: There are several Python libraries that can be used for image processing and computer vision, including scikit-image, OpenCV, and Pillow. These libraries have functions and algorithms for image segmentation, clustering, and cropping.

Overall, the choice of software will depend on the specific requirements of the task and the user’s familiarity with the software.

nello · March 19, 2023, 10:26pm

Yes, the Crop tab appears to have an option to use edge detection for its suggested cropping. But it appears to apply to the entire image, as opposed to breaking out the individual portraits on a page:

Auto

Analyzes the image and uses built-in rules to find the edges automatically. This setting works well most of the time if you want to capture the whole image.

nello · March 19, 2023, 11:42pm

gastropod:

For year book or other photos which are on a regular enough grid, it might be possible to use ImageMagick (open source command line) to crop them out from full page scans using pre-defined values which would include the caption.
https://www.imagemagick.org/Usage/crop/#crop_tile
https://www.imagemagick.org/Usage/crop/#crop_equal

The Split option (described on page 20 of the version 12 manual) in GraphicConverter’s File → Save As… dialog also seems to do this.

rick2 · March 20, 2023, 1:45am

We already have the yearbook album pages; we got them from Classmates.com. The pictures are very closely spaced, with maybe 1/16 inch of whitespace between them, so an automated crop will need careful aim to get the crops right.
So, the data is already “out there.”
Thanks!
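With gutters that narrow, looking for runs of blank rows and columns (a projection profile) may be more reliable than fixed coordinates. At 300 dpi, 1/16 inch is about 19 pixels (300 / 16 = 18.75). So an assumed minimum gap somewhat below that, say 10 pixels, would separate portraits without splitting on small light patches. Below is a minimal Python sketch. It assumes the resolution of the downloaded pages is known or can be estimated, and that the page is a list of rows of grayscale values from 0 to 255 (Pillow's `Image.convert("L")` can provide one). The function names and the `white=230` cutoff are hypothetical.

```python
def blank_lines(gray, axis, white=230):
    """One flag per row (axis=0) or per column (axis=1): True if every pixel is near white."""
    if axis == 0:
        return [all(v >= white for v in row) for row in gray]
    return [all(row[x] >= white for row in gray) for x in range(len(gray[0]))]


def split_on_gaps(blank, min_gap):
    """Return [start, end) spans of content separated by blank runs at least min_gap long."""
    spans = []
    start = None  # index where the current content span began
    gap = 0       # length of the current run of blank lines
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            if start is not None and gap >= min_gap:
                # The span ends at the first blank line of this run.
                spans.append((start, i - gap + 1))
                start = None
        else:
            if start is None:
                start = i
            gap = 0  # short blank runs inside a photo are ignored
    if start is not None:
        spans.append((start, len(blank) - gap))  # trim trailing short blank run
    return spans


def find_cells(gray, min_gap, white=230):
    """Split the page into row bands first, then each band into cells (left, top, right, bottom)."""
    boxes = []
    for top, bottom in split_on_gaps(blank_lines(gray, 0, white), min_gap):
        band = gray[top:bottom]
        for left, right in split_on_gaps(blank_lines(band, 1, white), min_gap):
            boxes.append((left, top, right, bottom))
    return boxes
```

If each name sits below its photo with a gap as wide as the gutters, the names will come out as their own row bands, so `min_gap` may need tuning for each yearbook.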

rick2 · March 20, 2023, 1:47am

We’ve already gotten the public pages of those yearbook scans.
And we’d rather not have to re-scan… The binding of my 50-year-old copy of the yearbook is starting to wear out, and flattening it further for scanning would probably cause real damage.
Thanks!