Arm, the leading semiconductor and software design company, has unveiled its latest generation of processors, including the Cortex-A720 big core and Cortex-A520 small core. These cores bring gains in performance and energy efficiency along with architectural improvements, solidifying Arm’s position as a leader in processor technology.
The Cortex-A720, codenamed “Hunter,” is Arm’s most versatile performance core, often paired with the Cortex-A500 series of little cores in DynamIQ Shared Unit (DSU) configurations. It delivers a 20% improvement in power efficiency over its predecessor, the Cortex-A715.
Arm’s annual updates alternate between major pipeline changes and refinements to energy efficiency and area. The Cortex-A715 focused on wider decode, made possible by dropping 32-bit support. The Cortex-A720 retains a wide, deep design while prioritizing power efficiency.
In the Cortex-A720, Arm has made significant improvements to the front end, reducing the branch mispredict penalty and improving branch prediction. These changes improve both real-world application performance and power efficiency.
The Cortex-A720’s back end also sees improvements. Arm has pipelined the FDIV/FSQRT unit, resulting in a notable performance boost for floating-point divide and square-root operations. Data transfers between the FP/vector units and the integer units have been optimized, and store data latency has been reduced. Together, these changes improve overall performance and efficiency.
Memory improvements include a reduction in L2 cache hit latency from 10 cycles to 9 cycles. Arm has also introduced a new L2 spatial prefetch engine, and it reports generation-over-generation gains in the accuracy and coverage of the existing prefetchers.
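To put the L2 latency change in perspective, the short Python sketch below converts the 10-cycle and 9-cycle figures into nanoseconds. The 3.0 GHz clock is purely an assumption for illustration (the article gives no clock speed), and the function name `cycles_to_ns` is hypothetical.

```python
def cycles_to_ns(cycles: int, freq_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    # At f GHz the core completes f cycles per nanosecond.
    return cycles / freq_ghz


# Assumed clock speed, chosen only for illustration; not from the article.
ASSUMED_FREQ_GHZ = 3.0

old_cycles, new_cycles = 10, 9  # L2 hit latency: Cortex-A715 vs Cortex-A720
old_ns = cycles_to_ns(old_cycles, ASSUMED_FREQ_GHZ)
new_ns = cycles_to_ns(new_cycles, ASSUMED_FREQ_GHZ)

# The relative saving does not depend on the assumed clock speed.
saved_fraction = (old_cycles - new_cycles) / old_cycles

print(f"L2 hit latency: {old_ns:.2f} ns -> {new_ns:.2f} ns")
print(f"Relative reduction: {saved_fraction:.0%}")
```

Whatever the real clock speed, one cycle out of ten is a 10% cut in L2 hit latency. Only the absolute nanosecond figures depend on the assumed frequency.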
An interesting feature of the Cortex-A720 is that it is offered in two configurations: an area-optimized configuration and a full configuration. In the area-optimized configuration, the Cortex-A720 achieves a 10% improvement over the Cortex-A78. In the full configuration, it is up to 20% more energy efficient than the Cortex-A715, albeit at a higher area cost.
Alongside the Cortex-A720, Arm introduced the Cortex-A520, a highly efficient small core. The Cortex-A520 builds upon the success of its predecessor, the Cortex-A510, with microarchitectural improvements and optimizations. It is designed to be combined with the Cortex-A720 in various configurations using the DSU-120, making it suitable for area-constrained devices.
The Cortex-A520 fully supports Armv9, implementing version 9.2 of the architecture. It also expands Pointer Authentication (PAC) functionality and implements the QARMA3 algorithm for pointer authentication, reducing PAC overhead and latency.
Arm has prioritized energy and area efficiency in the Cortex-A520 by removing or scaling back certain components. Notably, one ALU has been removed, resulting in power savings throughout the pipeline. The memory system has also been rebuilt for improved efficiency.
Arm claims that the Cortex-A520 offers a 22% power reduction at the same performance level or approximately 8% more performance at the same power level.
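The two figures in Arm’s claim describe different operating points, so they translate into different performance-per-watt gains. The Python sketch below works through the arithmetic. It assumes the percentages can be read as simple ratios against the Cortex-A510, and the helper name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt, given performance and power
    expressed as ratios against the baseline core (1.0 = unchanged)."""
    return perf_ratio / power_ratio


# Claim 1: same performance at 22% less power.
iso_perf = perf_per_watt_gain(perf_ratio=1.0, power_ratio=1.0 - 0.22)

# Claim 2: roughly 8% more performance at the same power.
iso_power = perf_per_watt_gain(perf_ratio=1.08, power_ratio=1.0)

print(f"Iso-performance point: {iso_perf:.3f}x performance per watt")
print(f"Iso-power point:       {iso_power:.3f}x performance per watt")
```

Under these assumptions the iso-performance point works out to about 1.28x performance per watt and the iso-power point to 1.08x. This suggests the larger benefit comes from running the core at lower power rather than pushing it for extra speed.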