AppBITS: Hyperspace Reclaims Space Used by Identical Files

ddmiller · March 25, 2025, 11:17am

Files stored in iCloud (and other online sync services) are ignored. From the documentation:

Any file that is backed by a cloud storage service (e.g., iCloud Drive, OneDrive, and recent versions of Dropbox) in a way that the app can detect is ignored.

Most likely this is because of optimized storage: there is no way to determine if a file located in iCloud is going to be optimized while the app is running.

bit_of_a_tid · March 25, 2025, 9:10pm

It seems that, unlike Gemini2, Hyperspace doesn’t scan a whole system of drives at one time. Is this true?

ace · March 26, 2025, 2:01pm

Yes, you have to select a folder. The way Hyperspace works, duplicates can only be found on the same APFS volume, so it makes some sense to limit it to one drive at a time.
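
To illustrate the same-volume restriction: APFS clones share storage blocks, and that sharing only works within a single volume. A rough way to check whether two files live on the same mounted volume is to compare their device IDs. This is only a sketch; `on_same_volume` is a hypothetical helper name, and nothing here reflects how Hyperspace itself performs the check.

```python
import os


def on_same_volume(path_a, path_b):
    """Return True if both paths are on the same mounted filesystem.

    st_dev is the ID of the device (volume) that holds the file, so two
    files with different st_dev values cannot share storage as clones.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Two photos on different external drives would return False here, which is why a clone-based tool cannot reclaim space across them.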

bit_of_a_tid · March 26, 2025, 7:48pm

Thanks for the reply. That’s quite a shortcoming, being limited to one drive – at least if you’re like me and have 10s of thousands of photos on 7 external drives. 'Tis a challenge!

ace · March 26, 2025, 8:54pm

Indeed—I’m not sure what the best way would be to find duplicates across a set of seven drives. I’m not sure Gemini can do that either.

josehill · March 26, 2025, 9:32pm

It’s an interesting tool, for sure, but it seems to me that it is best viewed as a useful complement to traditional de-duplicating tools, which can work across drives and perform other functions.

FWIW, I have 1.5 TB of files on a 2 TB drive, and Hyperspace found around 50 GB of identical files in my home directory.

ace · April 7, 2025, 6:03pm

More on APFS clones from Howard Oakley.

jeff1 · April 7, 2025, 9:29pm

I’m intrigued but still puzzled by Hyperspace, how it works, and how it would meet my needs. When I write an article, I usually store a series of copies so I can go back later to retrieve information that I had edited out earlier. I also save copies of documents I use as references, and may mark up one copy of the document and keep a second copy clean if I want to ask an outside source what they think of the research. How does it decide whether documents are identical? Does it do a bit-by-bit comparison or use titles or some internal code? Most of the documents I work with are text or documents (e.g., copies of published scientific papers), but I also have photographs that might be duplicated accidentally.

ddmiller · April 7, 2025, 9:51pm

That’s all in the FAQ (linked in Adam’s article, and also here: Hypercritical: Hyperspace).

tl;dr: First it compares file sizes. If they are the same, it then calculates three different cryptographic hashes of the file data (and three more for the resource fork, if there is one). If all of those match as well, the files are considered identical.
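
For anyone curious what that size-then-hash approach looks like in practice, here is a minimal Python sketch. It is not Hyperspace's code: the three algorithms (SHA-256, SHA-512, BLAKE2b) are stand-ins I picked because the FAQ summary above doesn't name the hashes, the function names are made up, and resource forks are ignored entirely.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: placeholder algorithms; the hashes Hyperspace uses are not named here.
HASH_NAMES = ("sha256", "sha512", "blake2b")


def file_digests(path, chunk_size=1 << 20):
    """Read the file once, in chunks, and return one hex digest per algorithm."""
    hashers = [hashlib.new(name) for name in HASH_NAMES]
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for h in hashers:
                h.update(chunk)
    return tuple(h.hexdigest() for h in hashers)


def find_identical_files(root):
    """Return lists of paths under root whose size and all digests match."""
    # Step 1: bucket by size, so files with a unique size are never hashed.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Step 2: within each size bucket, group by the tuple of digests.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digests(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Note that file names play no part in the comparison. Two drafts of an article that differ by even one byte end up with different digests (or different sizes) and are never grouped, while two byte-for-byte copies are grouped no matter what they are called.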

gingerbeardman · April 7, 2025, 11:38pm

These would not be considered identical.