Apple Network Failure Destroys an Afternoon of Worldwide Mac Productivity

Most of the contacts with Apple servers are from macOS processes, not their apps.

Yes, that has proven to be something of an over-reaction from many users and some misinformation from a self-described security expert, but also a wake-up call to Apple over privacy concerns they appear not to have considered. I personally went through the entire six-hour period without noticing it, probably because all the apps I was using were already open before the problems started. I was more focused on problems trying to download the Big Sur installer that appears to have been the root cause of this other issue.

Hello @ace
For the less tech-savvy TidBITS members, could you take a few minutes to explain what you mean by “apple network” (as opposed to iCloud, etc.), and how central problems with Apple can affect individuals working with stand-alone Macs?
Thank you!

This happened in the middle of our 4 day cycle of sending pages to our printer (we are a magazine publisher). A few days later I took advantage of what happened and suggested to our Editor that perhaps we shouldn’t keep allowing work to be done later and later. We are now always past deadline if you go by our old rule that everything should be completed the day before it ships. I pointed out that we now have so little slack that something like this could lead to us missing a printer’s deadline. Just making lemonade from this lemon.

I also used our VPN failing a couple of days after deadline as an example.

It will require a bit of hand waving, but in essence, Apple maintains an extremely complex and distributed network infrastructure (and iCloud is part and parcel of it). This infrastructure supports everything your Mac does that involves Apple, which might include checking for updates, checking the time, verifying certificate revocations, syncing data via iCloud, distributing two-factor authentication data, and much, much more. You can use a Mac offline for a while, but what you can do with it will be somewhat limited, and after some period of time, it will probably want to connect again. That’s not really a major problem, though, since most of what most people want to do with their Macs involves online access.

We might think of this as “Apple’s network” or a particular service as running on an “Apple server,” but that’s far from the truth. Apple operates multiple data centers around the world and has systems that spread the load from all this traffic among those data centers and other content delivery networks like Akamai.

I hope that helps a little. There’s no way to be more specific without having inside knowledge of Apple’s network setup and operations, and the number of ways that a Mac might not function as expected if it were kept entirely offline for a long time is quite substantial.

VERY interesting. I am frankly astounded. Thanks very much @ace for your time and explanation.

Last question, and I will leave you in peace. Is this a potential source of hacking? Could someone basically “cripple” all Macs? That seems incredible.

Of course anything is possible, but there are multiple layers of protection against such an outcome. Almost all of these Apple networks are secured with three layers of certificates with the purpose of guaranteeing that you are connecting to the intended server operated by Apple or one of their trusted vendors. One or more of those certificates would be immediately revoked as soon as any issue involving them was discovered. And such servers are located in physically secured facilities.

There was just an example of this system crippling Apple users with HP printers when HP mistakenly asked Apple to revoke the certificates of all its print drivers. Although Apple was able to fix that issue rather quickly, some users are still trying to recover from it.

And, of course, this discussion points out another example of how things can go wrong, even when security has not been compromised. Not a complete crippling, but there were many complaints from users during the six hours or so that productivity was impacted.

Thank you for a very instructive answer! Greatly appreciated.

@alvarnell’s answer is spot on, and my only addition is that Apple’s network infrastructure is among the most complex on the planet, and Apple has undoubtedly hired top network and security people to oversee it at all times. That’s not to say that mistakes like this one can’t happen—it’s difficult to predict failure cascades in situations of unprecedented load, or it could have been related to a hardware or network connectivity failure that also cascaded to other systems.

But you have to assume that Apple has one of the largest targets in the world painted on its back, so Apple systems are almost certainly under constant probing/attack from everyone from individual hackers to organized crime to government intelligence agencies (possibly even including ours). I don’t know anyone at Apple in that area, but I do know someone who used to work for Google in such a role, and my understanding is that it’s a non-stop battle to protect systems against such attacks.

So, as Al said, nothing’s impossible, but it’s not like Apple is a babe in the woods here.

You and me both.

I suspect the drive was in the process of writing when the failure began, and the drive in bay 3 was the unfortunate victim of a timing anomaly.

Who would know for sure? BUT: The SoftRAID logs bear out that it happened at the exact time that Apple’s service went down. Very thankful that it was just one drive.

OWC is replacing that drive. I shipped it off Wednesday, and am crossing my fingers that the three remaining drives in the RAID unit all perform well over the next 36 hours while I edit this weekend’s program. RAID 5 allows one drive to be pulled, and the remaining three contain enough information to recreate what was on the missing drive. But it can’t do that if another drive fails in the meantime.
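For anyone curious how three drives can stand in for a fourth, here is a minimal Python sketch of the XOR parity idea that RAID 5 is built on. It is an illustration only: the block contents and function names are made up for the example, real RAID 5 rotates the parity block across the drives, and it says nothing about how SoftRAID actually lays out data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recreate the block from the absent drive: the XOR of all survivors."""
    return xor_blocks(surviving_blocks)


# One stripe across a hypothetical four-drive array:
# three data blocks plus one parity block.
data = [b"edit", b"this", b"week"]
parity = xor_blocks(data)

# Pull the second drive; the other three blocks are enough to recover it.
recovered = rebuild_missing([data[0], data[2], parity])
```

With only two of the four blocks available, the XOR no longer pins down the missing bytes, which is why a second failure before the rebuild finishes is fatal to the array.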

My redundancy move will be to purchase a couple of spares…better quality and larger capacity, to support an eventual upgrade to the unit. Adding a larger capacity drive will not actually expand the capacity of the array until all four drives are replaced, so I’ll do that over time.
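The arithmetic behind that last point can be sketched in a few lines of Python. This assumes the textbook RAID 5 rule that every member is treated as if it were the size of the smallest drive, with one drive's worth of space going to parity. The 4 TB and 8 TB figures are hypothetical, not the sizes in this particular unit.

```python
def raid5_usable_tb(drive_sizes_tb):
    """Usable capacity under the textbook rule: (n - 1) * smallest drive."""
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_tb) - 1) * min(drive_sizes_tb)


# Hypothetical upgrade path from 4 TB drives to 8 TB drives, one at a time.
before = raid5_usable_tb([4, 4, 4, 4])
one_swapped = raid5_usable_tb([8, 4, 4, 4])
three_swapped = raid5_usable_tb([8, 8, 8, 4])
all_swapped = raid5_usable_tb([8, 8, 8, 8])
```

Under that rule the usable space stays where it started until the last small drive is gone, so the extra capacity of the early replacements sits idle in the meantime.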

I’m also looking at what category of files on the array could be considered archival, or reusable resources. Most of them are critical on the weekend they are produced, then their value is gone because in my work we don’t do re-runs.

Sounds like the “just-in-time” framework that spread from manufacturing to…well, everything. When applied to a parts inventory for, say, a vehicle assembly line, it was said to be very efficient. But as soon as something goes wrong (say, one part is delayed in delivery), the whole line is crippled.

I spent some time in newspaper production, in both editorial and graphics. Some deadlines could be bent, a bit, if it was important, but there was a big cost for that.

Good for you in seeing this unfortunate event as an opportunity.

So I can’t help wondering if Apple, trying to fix some underlying problem from November 12, has caused a new problem. Because apparently a lot of people who downloaded operating system installers (High Sierra and above) are having trouble getting App Store apps to launch. I just upgraded a client’s laptop from Yosemite to High Sierra and now every App Store app throws up a codesigning error. For an example, see Apps from Mac App Store crash when starte… - Apple Community

Code signing would seem to be involved here, but I doubt it’s related to the OCSP issues discussed in this article. There’s a thread on it at:

Howard Oakley talks about the history of Apple’s code signing approach here:

There’s a new solution that appears to work by rebuilding a cache:

Hope this helps everyone:

  1. Open a terminal.
  2. Run the following command:
sudo rm -f /Library/Keychains/crls/valid.sqlite3

I haven’t tested, but I think a reboot is not even necessary. And this does survive a reboot.

It looks like this file may be used by “trustd”. So in the worst-case scenario, this will rebuild trustd’s cache, so that should not affect the security of your system.
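Before deleting anything as root, it may be worth confirming that the file is actually there and seeing when it was last written. The following is a small, read-only Python sketch for that; the path is the one from the command above, while the function name and the output format are just illustrative choices. It may need to be run with sudo if the directory is not readable by your account.

```python
from datetime import datetime
from pathlib import Path

# Path taken from the rm command quoted above.
CACHE_PATH = Path("/Library/Keychains/crls/valid.sqlite3")


def describe_cache(path=CACHE_PATH):
    """Return a one-line, read-only summary of the cache file."""
    path = Path(path)
    if not path.exists():
        return f"{path}: not present"
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds")
    return f"{path}: {info.st_size} bytes, last modified {modified}"


if __name__ == "__main__":
    print(describe_cache())
```

Running it again after the rm command and after launching an App Store app would show whether the file has been recreated, which is what you would expect if it really is a cache that gets rebuilt.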