The M1 Mac SoC (system on a chip) and the A14 in the iPhone 12 and iPad Air share a common, scalable architecture - and what has everyone in a dither is the Firestorm high performance core used in both.
The A14 has two high performance Firestorm cores and four high efficiency Icestorm cores; the M1 has four Firestorms and four Icestorms.
What makes a computer feel fast is its high performance cores - not every task can be multithreaded, and multithreading (or multiprocessing) is an arduous process that introduces complexity and the chance of unintended consequences (bugs).
In servers, multiprocessing is the norm - most of the processes run by a server are discrete, so you can assign a processing engine (core) to each client and process the work of a bunch of clients in parallel. On consumer computers, throwing a bunch of cores at the problem can easily lead to a bunch of idle cores. The speed of your fastest process is still limited to the speed of your fastest core.
The hardest thing you can do is produce a faster core - by comparison, adding cores and dealing with the attendant bogeymen of cache coherence and the like is relatively simple. When you see a CPU maker add a bunch of cores to a consumer computer, what you’re seeing is a band-aid - a way to keep the numbers up without doing the really hard work of improving single core performance.
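The point about the fastest core can be made concrete with Amdahl's law, which is a standard result rather than anything specific to Apple's chips. The sketch below is only an illustration: the function name and the 50% parallel fraction are assumptions chosen for the example, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical workload: half of the work is inherently serial.
for cores in (1, 2, 4, 10):
    print(cores, "cores ->", round(amdahl_speedup(0.5, cores), 2), "x")
```

With half the work stuck on one core, no number of extra cores can ever double the speed, whereas a core that is twice as fast doubles it outright.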
Sure, there are workloads which can benefit from multiprocessing - transcoding video can be split into subtasks by chopping the video into chunks between keyframes and dispatching the work to any number of cores. Your OS can give a display refresh task a core to keep the video buffer refreshed independent of other processing going on in the foreground. But the number of parallel tasks which can be assigned in a consumer computer doesn’t justify the 10 core 20 thread CPU used in my 2020 iMac 5K - most of the time, most of the cores simply sit there and soak up power, though the dispatcher does round robin the work across the various cores to keep them from looking idle and to even out wear.
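The keyframe-chunking idea above can be sketched roughly as follows. This is a toy, not a real transcoder: `split_at_keyframes`, `transcode_chunk`, and the frame numbers are all hypothetical, and the "work" done per chunk is just counting frames. It uses a thread pool to show the dispatch pattern; a real CPU-bound job in Python would more likely use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor

def split_at_keyframes(frame_count, keyframes):
    """Return (start, end) frame ranges, one per keyframe-delimited chunk."""
    starts = sorted(keyframes)
    ends = starts[1:] + [frame_count]
    return list(zip(starts, ends))

def transcode_chunk(chunk):
    """Stand-in for real transcoding work: just reports the chunk length."""
    start, end = chunk
    return end - start

def transcode(frame_count, keyframes, workers=4):
    """Dispatch each independent chunk to a worker and collect results in order."""
    chunks = split_at_keyframes(frame_count, keyframes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the pieces can be stitched back together.
        return list(pool.map(transcode_chunk, chunks))

# Hypothetical 300-frame clip with a keyframe every 90 frames.
print(transcode(300, [0, 90, 180, 270]))
```

Because each chunk starts at a keyframe, no chunk depends on another, which is what makes this kind of workload easy to spread over many cores.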
So how did Apple - that lifestyle company from Cupertino - end up designing one of the fastest cores available, giving the Wintel alliance sleepless nights all around the world?
In 2008, Apple acquired PA Semi and worked with cash-strapped Intrinsity and Samsung to produce a FastCore Cortex-A8. The frenemies famously split: Apple used the IP together with Imagination’s PowerVR to create the A4, and Samsung took the tech to produce the Exynos 3. Apple then acquired Intrinsity, continued to hire engineering talent from IBM’s Cell and XCPU design teams, and hired Johny Srouji, who had worked on the POWER7 line at IBM, to direct the effort.
Apple continued this divergence from standard ARM designs, nurturing and building their Silicon Design Team (capitalized out of respect) and, rather than adopting standard ARM cores, building their own architecture and improving and optimizing it year by year for the last decade.
Whereas other ARM processor makers like Qualcomm and Samsung now pretty much use standard ARM designed cores, Apple has their own designs and architecture, and has expanded their processor acumen to the point where the Firestorm cores in the A14 and M1 are the most sophisticated processors in the world. They are an eight wide design with a massive reorder buffer (AnandTech estimates it at roughly 630 instructions) and the arithmetic units to back it up - which means the core can decode up to eight instructions per cycle and keep its out-of-order execution engine fed.
x86 processor makers are hampered by the CISC design and its variable instruction length. This means that at most they can produce a three or four wide design, and even for that the decoder has to be fiendishly clever, as it has to guess where one instruction ends and the next begins.
There’s a problem shared by x86-64 processor makers and Windows - they never met an instruction or feature they didn’t like. What happens then is you get a build-up of crud that no one uses, but it still consumes energy and engineering time to keep working.
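A toy model shows why instruction length matters to a wide decoder. The sketch below does not use real x86 or ARM encodings: the variable-length format, in which the first byte of each instruction holds its length, is invented purely for illustration, as are the function names. The fixed 4-byte width does match 64-bit ARM, where every instruction is 32 bits.

```python
def fixed_width_boundaries(code, width=4):
    """Every start offset is known up front, so N decoders can each grab a slot."""
    return list(range(0, len(code), width))

def variable_length_boundaries(code):
    """Toy encoding (NOT real x86): the first byte of each instruction is its length.

    Each start offset depends on the length of the previous instruction,
    so the boundaries have to be discovered one after another.
    """
    offsets = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length == 0:
            raise ValueError("invalid zero-length instruction at offset %d" % pos)
        offsets.append(pos)
        pos += length
    return offsets

print(fixed_width_boundaries(bytes(16)))
print(variable_length_boundaries(bytes([1, 3, 0, 0, 2, 0, 4, 0, 0, 0])))
```

In the fixed-width case the eighth instruction's position is known before the first has been looked at; in the variable-length case hardware has to either walk the stream serially or speculate about boundaries, which is the cleverness the decoder needs.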
AMD can get better single core speed by pushing up clocks (and dealing with the sharply increased heat, though chiplets are probably much harder to cool), and Intel by reducing the number of cores. The top of the 10 core 20 thread 10900K die actually had to be shaved down so that heat could get out of the chip fast enough, a sign that at 14nm it had reached the limits of physics. Both run so hot they are soon in danger of running into Moore’s Wall.
Apple OTOH ruthlessly pares underused or unoptimizable features.
When Apple determined that ARMv7 (32 bit ARM) was unoptimizable, they wrote it out of iOS, and removed those logic blocks from their CPUs in two years, repurposing the silicon real estate for more productive things. Intel, AMD, and yes even Qualcomm couldn’t do that in a decade.
Apple continues that with everything. Not enough people using 3D Touch? Deprecate it, remove it from the hardware, and replace it with Haptic Touch. Gone.
Here’s another secret of efficiency - make it a goal. Last year, on the A13 Bionic used in the iPhone 11 series, the Apple Silicon Team introduced hundreds of voltage domains so they could turn off parts of the chip not in use. Following their annual cadence, they increased the speed of the Lightning high performance and the Thunder high efficiency cores by 20% despite staying on a 7nm process. As an aside, they increased the speed of matrix multiplication and division (used in machine learning) by six times.
This year they increased the speed of the Firestorm high performance and Icestorm high efficiency cores by another 20% while moving from a 7nm to a 5nm process. That’s a hell of a compounding rate and explains how they got to where they are. Rumor has it they’ve bought all the 3nm capacity from TSMC for the A16 (and probably M2) next year.
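To put a number on that compounding, here is a back-of-the-envelope sketch. Only the two 20% steps come from the figures above; the function name is made up for the example, and the ten-generation line is a purely hypothetical extrapolation, not a claim about Apple's actual history.

```python
def compounded_gain(per_generation_gain, generations):
    """Total speedup after repeating the same fractional gain each generation."""
    return (1.0 + per_generation_gain) ** generations

# The two generations described above: 20% on top of 20%.
print(round(compounded_gain(0.20, 2), 2))

# Hypothetical: what the same rate would give if sustained for ten generations.
print(round(compounded_gain(0.20, 10), 2))
```

Two 20% steps already add up to 44% rather than 40%, and the gap between compounding and simple addition widens with every further generation.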
Wintel fans would deny the efficacy of the A series processors and say they were mobile chips, as if they used slower silicon with wheels on the bottom or more sluggish electrons.
What they really were was high efficiency chips, passively cooled and living in a glass sandwich. Move them out of that environment to somewhere they can breathe more easily, boost the clocks a tad, and they become raging beasts.
People say that the other processor makers will catch up in a couple of years, but that’s really tough to see. Apple Silicon is the culmination of a decade of intense processor design financed by a company with very deep pockets - one that is fully cognizant of the competitive advantage Apple Silicon affords. Here’s an article in AnandTech comparing the Firestorm cores to the competing ARM and x86 cores. It’s very readable for an article of its ilk.
Of course these are the Firestorm cores used in the A14, and they are not as performant as the cores in the M1 due to the M1’s higher 3.2 GHz clock speed.