How much cache space do your backup clients burn?

Hi guys!

Question for you.

How much cache space does your backup software burn on your local drive to keep track of what has been backed up?

I’m using Arq, and mine is >50GB right now. The cache can be found in:

/Library/Application Support/ArqAgent/cache.noindex

According to the developer, the size depends on the number of files and the number of backup records. Arq appears to log one backup record each time it backs up; it retains historical files and thins aging backups over time.

He also confirmed that Arq treats each file enclosed in a bundle, like a Photos Library, as an individual file. That’s preferable for incremental backup efficiency, but it clearly makes this cache much larger, especially for someone like me with 115,000 photos and videos in the library.

So I did some math to see if 50GB was a reasonable size by counting the number of files in the two directories I back up:

sh-3.2# pwd
/Users
sh-3.2# find . | wc -l
1278154 (x44 backup records)

sh-3.2# pwd
/Volumes/Photos Image 2
sh-3.2# find . | wc -l
402696 (x41 backup records)

Multiplying:

files x backup records, summed over both directories = 72749312

Dividing 50GB by this gives me about 690 bytes per entry. I don’t know what goes in there, but that seems a little bloated to map an inode, a backup record ID, and possibly a date/time. But I’ll grant there’s plenty I don’t know here.
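
For anyone who wants to check the arithmetic above, here is a quick sketch in shell. It assumes "50GB" means decimal gigabytes (10^9 bytes); the variable names are just mine for illustration.

```shell
# Entries = files x backup records, summed over both backed-up directories.
users_files=1278154;  users_records=44
photos_files=402696;  photos_records=41

entries=$(( users_files * users_records + photos_files * photos_records ))

# Assumption: "50GB" is 50 x 10^9 bytes (decimal gigabytes).
cache_bytes=$(( 50 * 1000 * 1000 * 1000 ))

# Integer division, so the result is rounded down.
bytes_per_entry=$(( cache_bytes / entries ))

echo "entries: $entries"
echo "bytes per entry: $bytes_per_entry"
```

With these inputs it prints 72749312 entries and 687 bytes per entry, which is where the "about 690" comes from.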

But I’m curious what size cache you guys are harboring. The much more challenging question is whether you can normalize that size by the number of files you have (and the number of backup records, if your software works the same way). My little shell commands could help you arrive at the former, if you feel adventurous!
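
If you want to try the normalization, here is a rough sketch of how it could be scripted. The function names (count_entries, cache_bytes, per_entry_bytes) are made up for this post, and the backup record count still has to be read off your backup software by hand. It only uses find, wc, du, and awk, and it measures disk usage as du reports it, which may differ a little from what Finder shows.

```shell
# Count every file and directory under the given paths,
# the same thing `find . | wc -l` reports.
count_entries() {
    find "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Disk usage of a cache directory in bytes; du -sk reports 1024-byte units.
cache_bytes() {
    kb=$(du -sk "$1" 2>/dev/null | awk '{ print $1 }')
    echo $(( ${kb:-0} * 1024 ))
}

# per_entry_bytes CACHE_BYTES ENTRIES RECORDS
# Divides the cache size by (entries x backup records), rounding down.
per_entry_bytes() {
    total=$(( $2 * $3 ))
    [ "$total" -gt 0 ] || return 1
    echo $(( $1 / total ))
}

# Example (run as root so find and du can read everything;
# 44 is the backup record count taken from the backup software):
#   cache="/Library/Application Support/ArqAgent/cache.noindex"
#   per_entry_bytes "$(cache_bytes "$cache")" "$(count_entries /Users)" 44
```

If your software has no notion of backup records, pass 1 as the last argument to get plain bytes of cache per file.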

I’m also using Arq.

On my MBA, 22.3 MB. (I don’t back up much from this machine, obviously.)

On my Mac mini, which backs up almost everything I need (synced from my two MacBooks and my iMac) - Desktop files, stuff in iCloud Drive, and Sync.com (Dropbox substitute for me), and Desktop, Documents, Downloads, etc., plus all of my Music Library: 1.01 GB.

On my iMac, which does back up my Photo Library (and my wife’s, plus her home directory folders): 10 GB.

I ran across this issue when I found my /Library/Backblaze.bzpkg was over 30GB. I talked to Backblaze support about it and was able to reduce it by deleting some of the contents of the package folder, but there was no full solution except to “repush the data,” which is the Backblaze term for a fresh upload of your data. I did that, and immediately afterward it was about 2GB.

As you say it contains a complete record of transactions, so its size will depend on how much data you have and how much you move it around, rename it etc. I have 3TB and I have a bad habit of frequently renaming, moving, reconfiguring!

The Backblaze.bzpkg file on my M2 MBA is currently 11GB.

Another Arq user. The size of my cache.noindex directory is 38GB. I’m backing up an MBP M1 Pro, primarily the /Users directory plus an external SSD that holds mostly movies and my Photo Library. My /Users directory has 2,691,574 items in it, and my external SSD has 42,073.

Thanks for the replies.

Mine is up to 56GB.

A few weeks back, the dev said:

In Arq 7.12 (shipped Dec 17, 2021) we changed the way Arq combines small objects. Before 7.12 it combined all smaller objects into “pack files”, and it cached those pack files locally to reduce the overhead in reading items from within the pack files.
As of 7.12, Arq stores very small objects in “pack files” and caches those locally; it stores slightly larger objects in “large pack files” and does not cache those files locally. This usually results in much smaller cache sizes while still making things like browsing backup records pretty efficient.
If you have backup records created by Arq prior to 7.12, the size of the cache might be larger than if you started the backup plan with 7.12 or later.
If you’re not happy with the cache size, you could try creating a new backup plan (and eventually deleting the old one); the cache size resulting from that new backup plan will probably be smaller.

But I’ve been on 7.18 for a while, and my current backup to Dropbox just started this past October. 7.12 is a few years old. So I’m apparently already on the “much smaller size” cache files.

Oh well, I guess it’s the cost of doing business.

One wonders whether, if 1) I were doing a full backup, including /Library, and 2) the cache were a tad less efficient still, Arq would end up in a positive feedback loop and never finish backing up :sweat_smile:

Worth pointing out that, after he told me that one of the factors in the cache size was the number of “backup records”, I lowered Monthly Backup retention to 24 (before, it was something higher, maybe 36 or even 60).

Those older ones should have gotten lopped off. But I have not seen any reduction in the size of my cache.

Update: Okay, it just hit me (the way it only hits me after I’ve published something online) that since this round of backups just started in October, lowering monthly backups to 24 months would not discard anything. And in fact, if growth is linear, I may be looking at 4x the cache growth over what I have already!
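
To put a number on that "if growth is linear" guess, here is a back-of-the-envelope sketch. The function name is made up, and the 6 months elapsed is a hypothetical value I picked because it gives the 4x factor; plug in your own figures. It assumes the cache grows in direct proportion to the number of retained monthly records, which is only a guess.

```shell
# project_cache_gb CURRENT_GB MONTHS_ELAPSED RETENTION_MONTHS
# Linear projection: cache size once the retention window is full.
project_cache_gb() {
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 * $3 / $2 ))
}

# Hypothetical inputs: 56 GB after 6 months, 24 monthly records retained.
project_cache_gb 56 6 24
```

With those inputs it prints 224, i.e. 4x the current 56GB.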

With Arq 7 on Windows, the cache routinely grew to over 100 GB – over 30% of the used space on the drive was Arq cache. I would then have to delete the cache to keep it from using up the drive space.

I think it was because Arq would keep flipping from backing up changed files to backing up every file.

And I suspected that had to do with either OneDrive Files On Demand (which uses weird NTFS attributes for state tracking) or Arq running when the user was logged out (or both).

But Arq support couldn’t figure it out; they wanted me to tell them why Arq didn’t work correctly. So after 4 months of fruitless emails, I gave up and reverted to Arq 5.

I don’t have these problems with Arq 7 on macOS.

Interesting field notes!

I’m not convinced it couldn’t be cached more efficiently. Believe me, Google doesn’t have a data-to-index ratio like that. I have often offered to help him technically (for free) but he seems to have no interest in benefitting from me.

Oh well. I don’t feel Arq is a long term solution but it has been working for me.
