Performance of Backblaze vs. Arq Backup

Hi folks,

I’ve been using Arq for a number of years, since CrashPlan’s demise. Generally it has worked fine, but it seems to have a performance / optimization problem. A typical incremental run can take 10 hours, though it varies widely. Depending on when the runs take place, the iMac can effectively be backing up ALL THE TIME. And in spite of the options to Throttle Disk I/O and other resources, I almost always have to pause it to keep my iMac from being unusably slow.

Here is a sample run with I/O throttling DISABLED:

20-Sep-2021 08:00:05 EDT Backup activity started
20-Sep-2021 08:00:05 EDT Arq version 7.7.2
20-Sep-2021 08:00:05 EDT macOS 11.6
20-Sep-2021 08:00:05 EDT Storage location: AWS (S3:/akiauenjtrxfwvdeqwuf-arq)
20-Sep-2021 08:00:05 EDT Backup plan: Back up to AWS (D2758EBB-E649-4832-AE6D-XXX)
20-Sep-2021 08:00:05 EDT Preventing computer sleep while backing up
20-Sep-2021 08:00:31 EDT Creating APFS snapshot for Macintosh HD - Data (/System/Volumes/Data)
20-Sep-2021 08:00:31 EDT Created APFS snapshot for Macintosh HD - Data (/System/Volumes/Data)
20-Sep-2021 10:18:43 EDT /Users: Created a new backup record.
20-Sep-2021 10:18:43 EDT /Users: 482.871 GB, 1,067,779 files backed up
20-Sep-2021 10:46:46 EDT /Volumes/Photos Image 2: Created a new backup record.
20-Sep-2021 10:46:46 EDT /Volumes/Photos Image 2: 1,391.454 GB, 369,503 files backed up
20-Sep-2021 10:46:46 EDT Total scanned: 1,874.325 GB, 1,437,298 files
20-Sep-2021 10:46:46 EDT Total uploaded (compressed): 2.371 GB, 2054 files
20-Sep-2021 10:46:46 EDT Thinning backup records according to backup plan settings.
20-Sep-2021 10:46:47 EDT Thinning backup records: Deleting backup record 2021-08-19 09:21:10 +0000
20-Sep-2021 10:46:48 EDT Thinning backup records: Deleting backup record 2021-08-19 06:08:51 +0000
20-Sep-2021 10:46:48 EDT Removing unreferenced data
20-Sep-2021 10:47:39 EDT Total stored size before cleanup: 2,029.852 GB
20-Sep-2021 18:08:55 EDT Total stored size after cleanup: 2,027.527 GB
20-Sep-2021 18:08:55 EDT Removing APFS snapshot for /System/Volumes/Data
20-Sep-2021 18:08:55 EDT Removed APFS snapshot for /System/Volumes/Data
20-Sep-2021 18:08:56 EDT Backup activity ended

It has a “thinning” process that keeps hourly backups for 24 hours, daily backups for 30 days, etc. But when it actually goes to purge the freed data blocks (the process I believe they call “cleanup”), it spends an inordinate amount of time, even though CPU, network, and memory do not appear to be busy during this operation.
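
To make concrete what thinning means here, this is a minimal Python sketch of that style of retention rule. It’s purely illustrative: the function and the exact bucketing are my own invention, not Arq’s actual code.

```python
from datetime import timedelta

def records_to_keep(record_times, now):
    """Illustrative retention rule: keep one backup record per hour for the
    last 24 hours, one per day for the last 30 days, one per month beyond.
    Everything thinned away leaves data blocks behind to be cleaned up."""
    keep, seen = set(), set()
    for ts in sorted(record_times, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            bucket = ("hour", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(days=30):
            bucket = ("day", ts.strftime("%Y-%m-%d"))
        else:
            bucket = ("month", ts.strftime("%Y-%m"))
        if bucket not in seen:   # first (newest) record in each bucket wins
            seen.add(bucket)
            keep.add(ts)
    return keep
```

The thinning decision itself is cheap; the expensive part, as my log shows, is the cleanup pass that follows it.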

I have been working with the developer, but I’m not optimistic he will be able to fix it.

So now I’m considering alternatives, and Backblaze appears to be the favorite child these days. Yes?

As such, can anyone provide performance data about how BB handles, say, 1–2 TB of data in the cloud? (E.g., how many hours a day is it running, how does it affect system performance, etc.?)

Thanks in advance!

Dave

I started using Backblaze again a few months ago. My full backup is about 7.5 TB. The initial backup took about three weeks, but that wasn’t continuous and was mostly throttled. I helped keep the frequently-changing data current by excluding paths with large, unchanging files initially and then slowly adding them into the backup as each one completed.

I haven’t noticed any performance hit on my 2019 16-inch MBP during backups. During the initial backup, I had to throttle the backup during the day in order to preserve bandwidth for my wife, who’s working from home and spends a significant amount of time on video conferences, but once that completed, I haven’t had to throttle it at all.

Based on the performance I get now, with continuous backup and no throttling, I’d guess that had I let the initial backup run unimpeded, it probably would have completed in about ten days.

Thanks, this is helpful feedback, especially from someone with a larger backup!

Yea, so I really don’t care too much about the initial full backup, since I expect those to take a long time. My real focus here is on the incremental backups, which should be fairly speedy and which Arq is slogging through.

So you say you don’t have to throttle, which is great. But your Mac is newer and other variables may be at play. So, do you have logs that show how LONG each incremental backup takes? And maybe how many times a day they run? That will help me understand better how BB compares to Arq.

Also, since I’m asking, which storage service do you use? I recently ditched Wasabi in favor of AWS. I’m not sure that this would matter, but it could.

Thanks!

I do 4x daily incremental backups using Arq, to AWS, and don’t experience any performance problems. My feeling is that there are a lot of moving parts in any backup strategy, so it’d take a lot of digging to pin down where the bottleneck actually is.

Thanks for sharing that.

Are you using Arq 7? Do you mind sharing a typical log so I can compare? You will find it under the Activity pane on the left sidebar.

Also, which AWS storage class do you use? I’m using Glacier Deep Archive because it’s cheap and I’m not worried about slow restoration time. AWS hasn’t said that there is any throttling of uploads, but I can’t help but wonder.

Dave

Yes, Arq 7 to AWS Virginia. I’m using Glacier storage class (not Deep Archive).

Thanks for the feedback.

Yea, it’s tempting to set up a separate storage class that’s not Deep Archive and see if it behaves differently. It’s cheap enough to experiment. But my time, alas…

I’m still interested in seeing your log if you’re willing to share it. For starters, how much data are you backing up? Do you have thinning enabled?

Backblaze doesn’t do incremental updates; it does continuous backup. So you can’t really compare the two. I use Backblaze as well, and I’ve never noticed any performance hits.

But you do have to be careful because you could end up with a short window of time (up to 2 hours) before a file is backed up. See Glenn’s just-published article:

Thanks for the reply and the heads up!

Is this a real technical difference or mainly a marketing term? Does “continuous” perhaps just mean “incremental backups done as frequently as we can manage without affecting your system performance, but at least once every 2 hours”?

In that regard, we could probably compare with Arq by choosing a comparable interval.

More likely, I suspect the thinning process of Arq (see my log). I recall that BB is stricter about retention of old files. Perhaps that allows BB to lop off aged files more efficiently than Arq, which burns hours doing so. But I’d have to dig further under the hood of BB’s architecture to know…

Given that Arq seems to be taking so long as to be running all the time, I’m not sure there is much difference. Except that we never notice Backblaze working, whereas you say Arq makes your Mac unusably slow.

I presume that Backblaze’s thinning happens on the server side, so it shouldn’t affect local processing at all. Perhaps @yevp can clarify.

I use both Backblaze and Arq on several Macs, and have never experienced any performance problems with either.

But if I am not mistaken, there is a fundamental difference between the two apps.

Backblaze operates with what I would call a client app on the Mac side, which communicates with a Backblaze server app that takes care of things on the storage side, in their data center.

Arq is a Mac app that does everything, using the storage location you assign.

So imho, when it comes to performance and run times, it is really apples and oranges.

Thanks for touching on this, which is the next area I wanted to cover! That is, whether there is a client/server architecture to Backblaze.

I never studied BB at any length, save the articles here on TidBITS over the years. But I THOUGHT BB used to have an option to let you bring your own storage, and that some time later they offered their own B2 storage to form an all-in-one offering. Is my memory wrong about that? So…

  1. does that mean you can no longer bring your own storage to BB?

  2. and does BB, in fact, use a client/server architecture, where some processing is done on the server side? If so, that has the potential to be a huge performance benefit. You can see from the Arq logs that Arq is spending countless hours clearing blocks that have been freed by the thinning process. I have a hypothesis that each of these blocks may be incurring a round trip from Mac to AWS to be freed. Having tens of thousands of operations execute locally on the server, rather than over a WAN, should perform much better (see the back-of-the-envelope sketch below).
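
To put numbers on that hypothesis, here’s the back-of-the-envelope math. The block count and latency are made-up illustrative figures, not measurements from Arq or AWS:

```python
# If freeing each unreferenced block costs one HTTPS round trip to S3,
# latency alone can account for hours of "cleanup" with CPU, network,
# and memory all nearly idle. Both figures below are hypothetical.
blocks_to_free = 100_000   # assumed number of blocks freed by thinning
round_trip_secs = 0.25     # assumed per-request latency over a WAN
print(f"{blocks_to_free * round_trip_secs / 3600:.1f} hours")  # ~6.9 hours
```

Which, if the assumption is anywhere near right, is eerily close to the 7+ hours my log shows between “Removing unreferenced data” and the final size report.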

Do you use the thinning options in Arq? People who don’t use the thinning options may not realize why their performance is satisfactory. But thinning is an essential feature to me.

I have been using both for quite a few years. I do not remember that you were ever able to bring your own storage to Backblaze. That was possible with CrashPlan, though.
I do use thinning, and sometimes it takes a long time, but it never slows my Macs down in any noticeable way.
With BB I do continuous backups; with Arq I do daily backups at 4am. I do not throttle either in any way.

Thanks for that info!

Curious, are you ever up at 4am to observe it running, or is it usually done before you wake up?

Arq runs at 4am purely as a precaution, to avoid any interference with other tasks.

Here is some real data from one of my laptops running Arq this morning:

Arq version: 7.8.2
Start Time: 28.09.21, 04:04
End Time: 28.09.21, 04:11
Errors: 0
Scanned Bytes: 0
Scanned Files: 0
Uploaded Bytes (compressed): 431.2 MB
Uploaded Files: 295

That should give you a general idea. It usually takes minutes rather than hours, but even if it takes the whole day, which happens, I do not notice my laptop slowing down.

What type of problems are you trying to anticipate?

Backblaze’s consumer backup service is unlimited storage used exclusively by their backup client software. Backblaze B2 cloud storage is separate.

B2 is pay-for-what-you-use cloud storage that’s not just for backups but also for hosting files; it’s comparable to Amazon S3 Standard or Wasabi. B2 has a command-line client, but it also provides an API so anyone can write their own client; Arq can use it as a backup destination.
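
For example, B2 also exposes an S3-compatible endpoint, so a bare-bones client can be a few lines of boto3 in Python. Everything here (region endpoint, bucket, key names, credentials) is a placeholder, not a working configuration:

```python
import boto3

# Sketch only: B2 speaks an S3-compatible API, so generic S3 tooling works.
# The endpoint region, bucket name, and credentials below are placeholders.
b2 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",
    aws_access_key_id="B2_KEY_ID",
    aws_secret_access_key="B2_APPLICATION_KEY",
)
b2.upload_file("chunk-0001.bin", "my-backup-bucket", "chunks/chunk-0001.bin")
```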

Looking at your Arq logs, it takes over two hours to create a backup of /Users, but your Photos drive takes only half an hour, even though it holds almost three times as much data. That’s likely because that drive has about one-third as many files, 369K vs. 1,067K, and probably far fewer that have changed between runs.

I see Arq creates an APFS snapshot of /System/Volumes/Data, but the logs don’t show it being used. Going by the logs alone, it appears to be backing up the active /Users directory, not the snapshot, meaning the backup client also has to deal with files that are locked for use by other processes. Regardless, the backup of data from your boot drive likely involves the most disk I/O and most noticeably affects performance.
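
If you want to check whether the snapshot actually exists while a backup is running, macOS’s stock tmutil command lists local APFS snapshots; here’s a quick Python wrapper. (This only tells you the snapshot is present, not whether Arq reads from it rather than the live volume.)

```python
import subprocess

# List local APFS snapshots on the data volume. `tmutil listlocalsnapshots`
# is a stock macOS command; run this during a backup to confirm the
# snapshot Arq logged is actually present.
result = subprocess.run(
    ["tmutil", "listlocalsnapshots", "/System/Volumes/Data"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```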

I don’t really know the details of how Arq works. If it locally stores a catalog of everything in each backup, most of the 7+ hours for “cleanup” is probably spent sending Delete API calls for each file in Amazon S3 Glacier Deep Archive and waiting for a confirmation response. Since a user profile contains lots of files that change or are ephemeral, there are probably many such deletions.
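
If that’s right, batching would change the picture dramatically: S3’s DeleteObjects call accepts up to 1,000 keys per request. I have no idea whether Arq uses it; this sketch just contrasts the two approaches (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-arq-bucket"                          # placeholder name
keys = [f"blocks/{i:06d}" for i in range(5_000)]  # stand-in for freed blocks

# Naive approach: one round trip per object, 5,000 requests total.
# for key in keys:
#     s3.delete_object(Bucket=bucket, Key=key)

# Batched approach: DeleteObjects takes up to 1,000 keys per request,
# so 5,000 objects need only 5 round trips.
for i in range(0, len(keys), 1000):
    s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in keys[i:i + 1000]]},
    )
```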

Something to note about selecting backup storage is that Glacier and Glacier Deep Archive are meant for files that are uploaded and untouched for months or years, and both have a minimum storage duration charge. Glacier is $0.004/GB/month with a 3-month minimum: delete a 1 GB file 1 minute after upload or exactly 3 months after, and the storage cost is the same, 0.004 × 3 = $0.012. Deep Archive works the same way, but the minimum is 6 months, so for 1 GB it’s 0.00099 × 6 = $0.00594.

Compared to storing 1 GB on standard S3 for 3 months (0.023 × 3 = $0.069) or just 1 month ($0.023), both Glacier tiers are still substantially cheaper, even if you’re mostly paying for storage time you didn’t use.

But with standard S3 there’s no minimum time (or if there is, it’s like 1 minute), so a file that’s kept for only 1 week is cheaper than a file on Deep Archive kept anywhere from 0 to 6 months (0.023 / 30 × 7 = $0.00537); a file that’s kept for only 1 day is much cheaper still (0.023 / 30 = $0.00077).
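
Here’s the same arithmetic as a small script, using the list prices quoted above (check current AWS pricing before relying on these numbers):

```python
def cost_per_gb(rate_per_month, days, min_months=0):
    """Storage cost for 1 GB kept for `days`, honoring any minimum-duration
    charge. Rates are the per-GB monthly list prices quoted above."""
    return rate_per_month * max(days / 30, min_months)

for days in (1, 7, 90, 180):
    std  = cost_per_gb(0.023,   days)                # S3 Standard
    gla  = cost_per_gb(0.004,   days, min_months=3)  # Glacier
    deep = cost_per_gb(0.00099, days, min_months=6)  # Glacier Deep Archive
    print(f"{days:>3} days: std=${std:.5f} glacier=${gla:.5f} deep=${deep:.5f}")
```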

In a user profile, I’m sure the large majority of data changes infrequently but it’s harder to say what percentage of files change on a weekly, daily, hourly basis.

Thanks for your analysis! It’s very close to my thinking.

Yes, and if it’s a block-level backup, then there may be many times more round-trip API calls than the number of files.

I’m tempted to try a storage class other than Deep Archive in order to see if that’s the variable that makes mine run slower than apparently everyone else’s. But I still don’t get why a slow run would result in poor system performance, given that no one else has the issue.

I use Arq7 and have since it came out (Arq6 was not a very good app). I am not noticing performance issues. After a restart my MBA will run some tasks slowly (namely, video playback), and Arq is usually high on the CPU list, but once things settle down, Arq is fine.

I have four backup sets that run hourly on my MBA: one to my Mac mini using FTP within the network; a small one to OneDrive (since I am paying for 1 TB of storage that I don’t come close to using otherwise) for critical files; and two to B2: one with a set that matches the one that goes to the Mac mini, and one just for my photo library. I do use thinning for all of the backup sets.

The Mac mini has a set that backs up some critical files to B2.

And the family iMac (that rarely gets used these days) backs up all user home folders to B2 as well.

I haven’t used Glacier or AWS with Arq for years now. I switched first to Google Cloud Storage and then to B2 a few years ago. Actually, backing up my music library and photo library to Glacier would probably be a good idea.

Yes thanks. It’s complicated!

The AWS bill has been super tiny. It’s crazy.

Archive retrieval for S3 Glacier Deep Archive is “within 12 hours”. (48 hours for bulk). I wonder if the backup operation requires several or many small retrievals that are throttled, making this process take forever.

Maybe I shouldn’t have cancelled my Wasabi account…

Maybe I’ll try regular Glacier and see if that speeds things up.

Because really, I have nothing better to do with my time! :sweat_smile: