The Case for ARM-Based Macs

The chip foundry issue is the real problem. Intel is barely capable of fabbing at 10nm, while the A-series processors made by TSMC have been at 7nm for years. TSMC is reportedly already producing 5nm A14 chips for Apple, and I hear they’re working on 4nm.
This is what happened to Motorola, and why Apple switched to Intel.

2 Likes

You can do this on ARM too. Well, maybe not VMware, but Docker and Kubernetes work just fine on ARM (there are, to be sure, some images on Docker Hub that are x86-only, which is kind of annoying, but it’s usually easily fixable). Also, if you need it, Docker can actually run x86 images on an ARM system — or, as it happens, vice-versa — courtesy of qemu.

(I suspect VirtualBox might exist for ARM Linux too; I haven’t checked, mind.)

1 Like

ARM is by far the most popular processor family in the world.

Strictly speaking I’m not sure that’s true. I think I remember reading that, in fact, the lesser known ARC platform is more popular, because it’s embedded everywhere. (ARC also has an interesting history, for those of a curious persuasion.)

Whatever, ARM is certainly a very popular platform.

Users of the iMac, and especially the iMac Pro, probably want a more powerful processor than any ARM chip shipping now

I doubt it, actually. Ampere and Marvell (previously Cavium) are both shipping fast, high core count variants of 64-bit ARM that are certainly competitive in terms of overall performance with Intel’s Xeon chips. They might not be as fast per-core, but they make up for that with the large number of cores.

Apple’s 64-bit ARM cores are certainly faster than Marvell’s (I haven’t tried Ampere’s silicon, but I have run things on ThunderX2), and more on a par with Intel’s; if Apple wanted to make a high-end chip with a large number of cores, it could certainly do so, and the result would likely outrun the existing players in this space — not that that would bother them, because it seems unlikely that Apple will compete in the server market, which is what Ampere and Marvell are really aiming for.

It’s certainly possible for Apple to develop a competitive ARM processor—the only question is how long that would take.

For the high core count part of the market, this is the million dollar question, and I suspect we’ll see the answer fairly soon.

2 Likes

At least by some performance measures, current Graviton2 ARM chips outperform Intel’s Skylake and Cascade Lake processors on a per-core basis.

I take issue with some of the assumptions the authors make, but there’s no doubt that ARM-based chippery is capable of impressive performance.

1 Like

🙂 Indeed. I should probably have written single-core, which is really what I was talking about. Which is faster in multi-core is going to depend somewhat on the workload, but there’s no doubt the current ARM offerings are very powerful and give extremely good bang per buck.

I’d love to see a high core count CPU based on Apple’s performance cores; that would really be something.

1 Like

That’s good to know. I wasn’t aware of QEMU. Seems promising!

A bit of digging shows that VirtualBox is x86-64 only (older versions supported IA-32). That doesn’t seem likely to change, although I do expect something equivalent to show up eventually.

I keep hearing that ARM is really slow at emulating x86-64 due to some unspecified characteristic of its instruction set. That doesn’t really make sense to me on the surface, but the people saying it keep using that claim as a basis for a prediction that x86 apps will not function at all on ARM Macs. I can’t imagine that Apple would settle for that, shipping computers completely incompatible with all pre-existing software, but the people making the argument just stop there. “Emulation would be slow, so Apple won’t do it.” And then they move on, as if the issue is settled.

If the slowness really is an issue, and it is so slow that performance is unacceptable despite how much faster Apple’s ARM cores are than anything Intel has ever made, it seems to me that Apple would be well positioned to supplement the instruction set with a few custom opcodes that would eliminate the problem. With a Rosetta-like software binary translation engine aware of those opcodes, we’d be good to go. Or they could go farther and add a hardware translation layer, such that x86-64 code could execute natively without a software translator, but that seems unlikely to me. Just have the hardware folks focus on making the cores faster (and more efficient, of course).

I was around for both the PPC and x86 transitions. Both the 68K emulator and Rosetta were said to be slow relative to native code, but my personal upgrade path was such that each new Mac I bought was 3+ years newer than the old one, and that meant my emulated experience was faster than the prior native one. I expect that most users will be in the same boat, and as such any loss in performance due to emulation will be irrelevant.

2 Likes

I’ve heard that claim too. It’s plausible that there is some problem there (e.g. emulating on PPC the fast inverse square root trick common in Intel code was a nuisance: the x86 code moves a value from the FPU stack to an integer register, which on PPC necessitated flushing the data to memory and reading it back). I haven’t investigated in any detail whether or not there genuinely is a problem.
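For anyone who hasn’t seen it, here’s a minimal C sketch of that routine (the well-known Quake III version, with memcpy in place of the original’s pointer cast so the type pun stays well-defined). The two memcpy calls are exactly the float-to-integer-register moves being described, cheap on some architectures and expensive to emulate on others:

```c
#include <string.h>

/* Fast inverse square root: approximates 1/sqrt(x).
 * Assumes 32-bit float and unsigned int. The interesting part for
 * this discussion is the bit-for-bit transfer of a float into an
 * integer register and back again. */
float fast_inv_sqrt(float x)
{
    float half = 0.5f * x;
    unsigned int i;

    memcpy(&i, &x, sizeof i);        /* float bits -> integer register */
    i = 0x5f3759df - (i >> 1);       /* the famous magic-constant step */
    memcpy(&x, &i, sizeof x);        /* integer bits -> float register */

    return x * (1.5f - half * x * x);  /* one Newton-Raphson refinement */
}
```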

Even if it’s true, as you say, there are viable workarounds for Apple.

2 Likes

Everybody said the same thing when Java was invented. Then just-in-time compilers came along and mostly (but not completely, of course) solved that problem.

It is very likely that Apple’s solution, whatever it is, will involve some kind of JIT-compilation of x86 code to ARM, much like the way Transitive’s tech (used by Rosetta) converted PPC code to x86.

This is more complicated and expensive to develop than a simple interpreter, but it’s not a groundbreaking new concept either. It’s well within Apple’s ability to develop (or license or acquire) this technology.
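To make the interpreter/JIT distinction concrete, here’s a toy C sketch (purely illustrative, nothing to do with Apple’s or Transitive’s actual technology). A simple interpreter pays a fetch-and-dispatch cost for every guest instruction, every time it executes; a JIT-style translator scans a block of guest code once, emits equivalent native instructions, and pays that cost only at translation time:

```c
#include <stdio.h>

/* A trivial stack-machine interpreter for a made-up byte-code.
 * Every iteration fetches an opcode and dispatches through the
 * switch; that per-instruction overhead is what a JIT eliminates
 * by translating the whole block to native code up front. */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void interpret(const int *code)
{
    int stack[64];
    int sp = 0;

    for (int pc = 0; ; ) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];         break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    interpret(prog);  /* prints 5 */
    return 0;
}
```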

1 Like

I don’t believe this holds any more. The performance jump people experienced in 1995 or 2006 when upgrading from 3+ year-old hardware was huge compared to what we see today, which is exactly why people now upgrade less often.

Emulation will need to be extremely efficient for people’s workflows not to suffer in all but the cases where the system they’re upgrading from is very, very old.

I do believe Apple can pull that off, as @Shamino indicates above, but I’m not taking any of it for granted. It will require a lot of work.

“I wonder if it’s technically possible to do exactly that - make many apps on the Mac App Store immediately available on ARM”

Apple introduced Bitcode in 2015, so I think it’s technically possible if Apple starts collecting that intermediate representation when Mac apps are submitted to the Mac App Store. A more in-depth technical review is available here

The penultimate paragraph is worth noting.

Thank you to all for very interesting and insightful discussions.

2 Likes

So basically Apple went from CISC to RISC to CISC and now back to RISC.

I don’t really care much about Windows compatibility on Mac. I do care, however, about software feature compatibility. My fear is that software developers, who will now need to develop and maintain an x86 version for their Windows audience and an ARM version for their Mac audience, may not maintain both versions at the same pace. Office 365, Adobe and others…

1 Like

I wouldn’t be too concerned about that. Developers haven’t written application software in assembly language for a very long time.

Apps are written in high-level languages like C, C++ and others. These languages are mostly portable across processor architectures. Apps need to be careful about things like word size and endianness, but these are issues that developers have had to deal with for a very long time.
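As a concrete example of the kind of care involved, here’s a hypothetical C helper (the name read_le32 is mine, not from any particular codebase) that reads a 32-bit little-endian value from a file or network buffer in a way that works on any host, whatever its endianness or word size:

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer, one byte at
 * a time. A naive cast like *(uint32_t *)buf would silently produce
 * the wrong value on a big-endian host (and may fault on a CPU that
 * requires aligned loads); this version is correct everywhere. */
uint32_t read_le32(const uint8_t *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```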

Most portability issues these days revolve around the fact that different operating systems use different APIs to provide similar functionality. These issues exist today when trying to make a cross-platform app and those issues will remain the same if Macs should switch to yet another CPU architecture.

But even that isn’t the biggest issue. Quite a lot of cross-platform apps use a single code-base and a portability library (whether developed in-house or third-party) to target the different platforms. So the biggest portability concern will be waiting for the developer of the portability library to make a version for the new platform.
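A minimal sketch of that pattern, with a hypothetical shim of my own invention: one function name for the application code, per-OS implementations selected at compile time. Real portability layers (whether in-house or something like Qt or SDL) are this idea at scale:

```c
/* portable_sleep_ms: same API everywhere, different OS call underneath. */
#if defined(_WIN32)
#include <windows.h>
void portable_sleep_ms(unsigned ms)
{
    Sleep(ms);                       /* Win32 API */
}
#else
#include <time.h>
void portable_sleep_ms(unsigned ms)
{
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);            /* POSIX API */
}
#endif
```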

5 Likes

As you mentioned later, Dalvik, like any modern JVM, does a lot of JIT compilation and optimization, resulting in code that is just as fast as, and sometimes faster than, equivalent code written natively for the underlying processor in a low-level language like C. I think you’ll find that this “substantial performance penalty” is actually quite trivial, especially for any app where it really matters.

The guys on the Accidental Tech Podcast interviewed Chris Lattner (creator of LLVM) a few years ago (relevant bit is around ten minutes in, but the whole interview is amazing), and he explained the situation in some detail. IIRC, one notable issue is that Intel and ARM don’t use the same endianness. LLVM is not sufficiently expressive to compensate for that. Doing some research now, it seems that ARM is bi-endian, which means it can switch on the fly, but that has to be supported in hardware.

I’d agree with that if Apple’s ARM chips from last year weren’t already faster than anything Intel has to offer. And they’re likely to get faster still: TSMC is racing ahead with smaller fabrication processes while Intel has been stuck for years, and many of the world’s most talented microprocessor engineers and designers are flocking to Apple to work on the bleeding edge of their fields.

4 Likes

That, in its generality, is of course not true. Just because the very best $2k iPad Pro beats a $1200 MacBook Air in Geekbench does not mean any existing Axx can hold a candle to the kind of CPU performance we find in a modern Mac Pro or iMac Pro.

I agree that Intel’s progress has become painfully slow and embarrassingly timid, and I also agree that Apple’s Axx chips show great potential. But any judgment as to whether Axx can actually serve across the entire Mac line on the strength of allegedly vastly superior performance is entirely premature until Apple actually starts talking about its plans, shows hardware, and presents realistic benchmark data. IOW, any of that talk before WWDC is IMHO really just religious at this point.

Intel and ARM (as deployed by Apple) use the same (little) endianness. [ARM was historically bi-endian, but the modern ARM architecture is essentially little-endian with the ability to handle big-endian data.]
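To illustrate that last point: on a little-endian AArch64 machine you can still consume big-endian data cheaply, because a byte swap is a single REV instruction. A minimal sketch using a GCC/Clang builtin (the function name is my own):

```c
#include <stdint.h>

/* Convert a big-endian 32-bit value (e.g. from a network packet) to
 * host order on a little-endian machine. __builtin_bswap32 compiles
 * to a single REV instruction on AArch64 (BSWAP on x86). */
uint32_t be32_to_host(uint32_t big_endian_value)
{
    return __builtin_bswap32(big_endian_value);
}
```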

2 Likes

FWIW, just to be pedantic, Dalvik was replaced years ago (with Lollipop 5.0, which I believe was in 2014) by ART (Android Runtime). ART was offered as a preview alongside Dalvik in KitKat, but Dalvik was scrapped entirely with 5.0. ART is even better than Dalvik: apps are compiled into native code when they are installed and run that way, instead of being compiled at run time.

1 Like

I hear proponents of JIT compilers make this claim all the time, but I don’t believe it. It doesn’t make sense to me that code compiled to byte-code and then JIT-compiled to native instructions could possibly be as efficient as the same source code compiled directly to native instructions.

How else do you explain the fact that Android phones seem to have processors that are much faster than Apple’s, but produce apps with comparable (and often lower) performance?

I refuse to believe that Apple’s hardware engineers are super geniuses capable of creating an ARM-based SoC massively superior to anything Qualcomm’s engineers are capable of producing. Maybe a little better, but not nearly enough to explain the discrepancy between hardware capabilities and user experiences.

This is new to me too.

Just to be clear, Android is still using a JVM-like environment where developers compile their apps to byte-code. The difference between Dalvik and ART you’re describing is simply a matter of when the compilation from byte-code to native code takes place - at installation time instead of at run-time.

1 Like

Apologies, I overgeneralized. The A13 Bionic scores higher in single-core (a Geekbench score of 1327 for the iPhone 11 Pro) than any Intel chip Apple has ever shipped in a Mac (the i9-9900K in an iMac, scoring 1244).

I don’t see much point in comparing multi-core scores for this conversation. Apple can put as many of their cores as they wish into the custom-designed silicon that will presumably land in a Mac someday. It’s the speed of the individual cores that drives this conversation, and the existence of Intel chips that don’t suit Apple’s needs and therefore aren’t in any Macs isn’t particularly relevant (although it could become relevant if Intel made a particular set of decisions which they’ve never indicated any intention to make).

Cool, thanks for the additional context. I think I knew that at one point, but had completely forgotten.

1 Like