Roll Your Own Cloud Backups with Arq and B2

Originally published at:

Until now, Internet backups required subscribing to a cloud-based backup service and using proprietary software to handle archiving and retrieval. Now, there’s a DIY alternative: Arq backup software and Backblaze’s B2 cloud storage system.

Great article. I’ve been using Arq for years, and storing to B2 with it since September, with three different Macs. It’s reliable and it’s inexpensive.

Since we’re talking about “rolling your own,” what about really doing it by hand? Say you have off-site ssh access to some box with lots of storage and halfway decent bandwidth. Is there anything special you’d need to consider if you decided to just use rsync to do your own networked version of Time Machine?

If we assume this box has a fixed IP it’s easy. Often that might not be the case though. Then let’s say this is a Mac. Any easy way to exploit FindMyMac so that you could use some kind of generic hostname that would automatically get forwarded to the current IP of your remote Mac? Something like

And assuming that’s not possible, what about exploiting FindMyMac to at least get the current IP of that remote Mac? There used to be the free OpenDNS with its DNS forwarding daemon, but of course that went all commercial, so there’s no more free option there. I would assume FindMyMac must allow doing this some way or another…

No offense to any TidBITS reader, but that’s really beyond the scope of the publication. We have some number of readers for whom it wouldn’t be a big deal, but it’s the kind of thing that could spiral out of control to provide the documentation for, and it’s really more of a Unix-style solution.

Back To My Mac (not Find My Mac, which is opaque to users) has a lot of tunneling and reliability issues, and I believe those issues have led people who tried in the past to build remote-connection and similar services on top of it to halt development on those products! I remember some AppleTalk bridges, for instance!

I also think there’s an issue of having a graphical front end and being able to use reliable third-party software that’s automated. I just don’t want to recommend generally to people to work at the bare-metal level. And there’s a fair amount of advice on this all over the net, if you’re looking for it.

I’ve been using Arq for about a year (local and cloud backups) since Crashplan started becoming problematic, which was a few months before they abandoned non-business users.

Initially I used B2 but found it somewhat unreliable and would suffer timeout issues.

I switched to using Arq/Wasabi instead and all those issues went away. Cheap too.

Another example of how Apple is missing in action here, particularly since, as the article states, Apple has all the components in place. But then Apple has never really gotten the cloud, as evidenced by its past Web storage and services mishaps and mistakes.

Apple, once a leader in innovation and exploration, now sadly trails the Amazons and Googles.

Glenn, I wonder why your piece elevates B2 above Amazon Drive (it’s in the title, after all), even though you mention Amazon Drive in the context of unused capacity. Amazon Drive seems like a relatively more straightforward thing for many TidBITS readers to set up, and as I understand it, the cost is the same whether or not you leave storage capacity unused in the $60 tier (ACD at $60/TB/year = $5/TB/month vs. B2 at $0.005/GB/month = $5.12/TB/month).

Thus, is there another benefit of B2 that causes you to put it first? Durability, transfer speeds, etc…?

A bunch of reasons, some mentioned, some implied.

  • Not everybody has Amazon Drive or wants to pay for that, and B2 is on demand.
  • We don’t know Amazon’s future intentions with the drive product and the ability of Arq to use it for backup. They might ban such a purpose (it’s not an included feature in the offering).
  • B2 is on demand: if you only need a few hundred gigs and aren’t otherwise paying for 1 TB of Amazon Drive, Amazon’s pricing doesn’t make sense if you’re frugal.
  • B2 is designed to be robust cloud storage. Amazon Cloud Drive is designed for consumer usage. I haven’t done head-to-head speed tests, however.
  • People may not trust Amazon for data storage, even if it’s encrypted and under their control.

ARQ has a listing of various destinations and prices at the bottom of its suggestions for which destination is best for you here:

Two things about the question of Amazon Drive vs. B2 (or Wasabi, or AWS, or Google Cloud Storage). Glenn is right: with destinations like B2, you pay for what you use rather than a full TB, as with Amazon Drive. I do not use a full TB, so I pay about $3/month for B2 storage. Also, you’ll see in that table that B2 is listed as “Best” speed and Amazon Drive as “Better” (though I haven’t really measured the speed in any sort of test).

I should also mention that I have Office 365, which comes with 1 TB of storage for the same $70/year fee. I don’t use much of that file storage, so I use it as another ARQ destination so that I have another online location (not for exactly the same folders, but for my most critical files; for example, I don’t back up my iTunes library there, because even if B2 fails, I can use iTunes Match as an emergency restoration if it comes to that).

Oh, and there are transfer fees with B2, AWS, and Google Cloud Storage, which you don’t pay with Amazon Drive, OneDrive, Dropbox, etc., but they are low. (B2 charges $0.01/GB to download, so you’d pay when you needed to restore something, plus small transaction fees for writing, deleting, and so on.) So storing a full 1 TB would cost slightly more than $0.005/GB/month, since you’d also pay for actually writing the files to storage.

B2 seems to compare well with AWS’s inexpensive Glacier storage, without the download rate limiting that Glacier imposes when you need to restore. One more advantage of B2 is that, as Crashplan used to do, if you truly had a disaster and wanted a local copy of your data, you can have them put a snapshot of it on a hard drive and mail it to you for a fee, which I don’t think Amazon would do with Amazon Drive (I believe it is sent FedEx next day).
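To put rough numbers on it, here’s a quick sketch of what a 1 TB backup would cost at the B2 rates quoted above ($0.005/GB-month storage, $0.01/GB download), treating the tiny transaction fees as negligible:

```shell
# Rough B2 cost estimate for 1 TB of backup data at the rates quoted above.
# Transaction (GET/PUT) fees are ignored; they're negligible for backups.
awk 'BEGIN {
  gb = 1024                 # 1 TB expressed in GB
  storage = gb * 0.005      # monthly storage cost in dollars
  restore = gb * 0.01       # one-time cost to download everything back
  printf "storage: $%.2f/month, full restore: $%.2f\n", storage, restore
}'
# -> storage: $5.12/month, full restore: $10.24
```

So even a worst-case full restore of a terabyte adds only about $10 on top of the monthly storage.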

Lastly, with the online drive sync services like Amazon Drive, you have to remember to customize options to prevent downloading/syncing all of the folders used for backup data to all of your synced devices. It’s not a huge deal, but you have to remember to do it.

This is harder than you might think, and there are a lot of things to consider that don’t seem obvious at all when you start.

The first is that there are many files on a Unix system that you cannot simply copy and expect the copy to work: mostly database files that are open. So if you sync your /var/db/, you have what you think is a backup, but it probably isn’t.

Second is versioning. This is doable in rsync, but not easy.
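For what it’s worth, the usual way to get versioning out of plain rsync is the --link-dest trick, which is essentially what rsnapshot automates. A minimal sketch, with backuphost and the paths as placeholders:

```shell
#!/bin/sh
# Dated snapshots with rsync: files unchanged since the previous snapshot are
# hard-linked rather than re-copied, so each version costs only the space of
# what actually changed.
SRC="$HOME/Documents/"
DEST="backuphost:/backups/mymac"
STAMP=$(date +%Y-%m-%d-%H%M%S)

rsync -a --delete \
      --link-dest=../latest \
      "$SRC" "$DEST/$STAMP/"

# Repoint "latest" at the snapshot just made, so the next run links against it.
ssh backuphost "ln -snf '$STAMP' /backups/mymac/latest"
```

Each run leaves a browsable dated directory on the destination, much like Time Machine’s folder-per-backup layout, but with none of the safety checks.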

Third is making sure that the local drives are always mounted (and yes, this is a problem on Mac OS more often than you’d think).

Those are the main issues, but there are others (privacy, for one: are you backing up users’ mail? How do you secure that?).

I do this myself, backing up my servers via rsnapshot, with scripts on the servers to dump the databases into a format that can be backed up, plus scripts to do many other things. But it is not something I would recommend anyone try: it took me many months to figure out how to get it all working well, and years later I am still tuning it.
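The database half of that can be as simple as a pre-backup hook that dumps each open database into a flat file that is safe to copy. A sketch; the dump directory is a placeholder, and the mysqldump/pg_dumpall lines are assumptions about which databases a given server runs:

```shell
#!/bin/sh
# Pre-backup hook: dump live databases to plain files that are safe to copy,
# then let rsnapshot (or rsync) pick up the dumps instead of the raw db files.
DUMPDIR=/var/backups/db
mkdir -p "$DUMPDIR"

# MySQL/MariaDB: a consistent logical dump without stopping the server
mysqldump --single-transaction --all-databases > "$DUMPDIR/mysql.sql"

# PostgreSQL: the same idea for every database in the cluster
pg_dumpall > "$DUMPDIR/postgres.sql"
```

Run from cron shortly before the rsnapshot interval fires, so the dumps are always fresher than the snapshot that copies them.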

And, of course, this is all shell-script Unix stuff where the Mac is nothing more than another bash session. There’s no GUI, no Mac goodness. It’s just Unix all the way down.

(My servers get backed up to my home computer, which gets backed up to Backblaze with an encryption key, so the files on Backblaze are secured.)

But for remotely backing up a Mac with something like what Time Machine does? There just isn’t a roll-your-own solution that works. Hell, having a Synology or Drobo on site mostly doesn’t work because of the far-too-frequent “must discard Time Machine backup and create a new one” issues. This even happens with a local dedicated drive, but far less often.

I’m not sure how frequently they keep that updated, but it’s a good guide. I picked B2 as it’s the best choice for most readers, as it’s intended for API-based cloud storage, it has no tiny transaction fees (only storage and download fees), and doesn’t require any real sorting out. Its Web-based front-end is also simple. Amazon S3 is baroque and Google Cloud is complicated but better. (Also, it’s Arq, not ARQ—not an initialism or an uppercase name.)

The tiny fees for GET, PUT, and other operations add up to almost nothing for backup operations (as opposed to continuous interactions with stored data for other purposes, where they can be meaningful).

I can’t find that option for B2. Can you point to an article? They do have an (expensive) upload option for up to 70 TB via a rentable storage unit.

I realized yesterday it’s actually a lot easier than I first thought.

Basically, it only required setting up port forwarding for AFP/SMB. The ssh session opens a tunnel, I mount a disk hooked up to the remote system via afp://localhost:forwarded_port, and then I point Time Machine at that. Done.
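For anyone wanting to replicate this, the shape of it is roughly as follows (the host, user, share name, and local port 10548 are all placeholders; mount_afp and tmutil are the stock macOS command-line tools):

```shell
#!/bin/sh
# Forward the remote Mac's AFP port (548) to a local port over ssh,
# mount the share through the tunnel, and point Time Machine at it.
PORT=10548
ssh -N -L "$PORT:localhost:548" user@remote-mac.example.com &

# Once the tunnel is up, mount the remote share via the loopback address...
mkdir -p /Volumes/RemoteBackup
mount_afp "afp://user:password@localhost:$PORT/Backups" /Volumes/RemoteBackup

# ...and tell Time Machine to use the mounted volume.
sudo tmutil setdestination /Volumes/RemoteBackup
```

The same tunnel trick works for SMB (port 445 and mount_smbfs) if the remote box doesn’t speak AFP.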

The only thing I haven’t got figured out yet is what to do when the remote IP changes. I have other means of checking on that, but in general I’d think that there must be some way to use Back To My Mac (yeah, that’s what I should have written) as a DNS forwarding service.

It’s on their pricing page: Cloud Storage Pricing Comparison: Calculate Your Costs (under “data by mail”). Also, just saw this: where they say that returning the drive makes the process free (except for shipping) - though it also says that it’s a trial program.

I also just found it here:

Get Your Files from B2 by Mail

You have a choice of how to receive your data from B2. You can download data directly or request that your data be shipped to you via FedEx.

That is extremely neat.

Weird, I’d searched their help files. Thanks!

So, B2 promises 99.999999% durability and is in a single data center in Sacramento, CA and Google Cloud Drive promises 99.999999999% and is spread across multiple data centers.

1000x more reliable sounds good in theory, although perhaps the solution for reliability isn’t a more reliable single cloud solution, but rather, a second, entirely redundant (different software, etc…) cloud solution.

Since this is the reliability of a backup, not of the only copy of data, I can’t see it mattering much. The odds of something happening to a backup that has 99.999999% durability at the same time you need to restore data would seem infinitesimally low. And of course, the Internet backup shouldn’t be your only backup, so something would have to happen to both the original and your local backups, plus the Internet backup.
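To make that concrete, here is a back-of-the-envelope calculation; the yearly loss probabilities for the original and the local backup are invented for illustration, and only the 1e-8 figure comes from B2’s quoted durability:

```shell
# Chance of losing ALL copies in the same year, assuming independent failures.
# The 3% figures are illustrative assumptions, not measured failure rates.
awk 'BEGIN {
  p_original = 0.03    # original disk dies (assumed)
  p_local    = 0.03    # local backup also unusable (assumed)
  p_cloud    = 1e-8    # cloud copy lost (99.999999% durability)
  printf "P(everything lost) ~ %.1e per year\n", p_original * p_local * p_cloud
}'
# -> P(everything lost) ~ 9.0e-12 per year
```

Even with pessimistic local numbers, the cloud copy’s durability is nowhere near the weakest link.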

None of my data is so valuable that I’d go beyond three backups (bootable duplicate, local archive, remote archive). Others may have different opinions. :-)

Exactly! Also, I presume based on Backblaze’s history (and that of Amazon S3 and other cloud providers) that their backup of my data is probably 1,000x more reliable than any local backup I make! They do redundancy in such a way that they’re doing backups of my backups.

If the remote backup is remote enough, that’s true. But if I lived in Northern California, I’d have a couple of qualms about the ‘remote’ backup being with BB in Sacramento. Sacramento has a lower earthquake risk than the rest of CA, but it’s not immune. There’s also volcano risk: Shasta is about due to pop off, as are a few others. Ash is amazingly nasty, and if the winds are going the wrong way it could easily mess up all of Northern CA.

At least BB says where their servers are. Amazon Drive and the other consumer ones I looked at don’t. Probably back east somewhere, but after the disaster is the wrong time to find out for sure.

Is anyone else disturbed that Backblaze’s only option for two-factor auth is insecure SMS? I created a free test account a while back but couldn’t even play with it, because it demands a phone number and an SMS code before letting you have access. I might be able to tolerate that once, but as a second factor it’s not only insecure (NIST says flat out not to use it), but since my phone is anonymously prepaid, if I lose the phone I may have permanently lost that phone number and with it the second factor.

Does Wasabi do something more sensible? I can’t find anything in their help.