Intel has further detailed its next-gen Xeon data center CPU lineup for 2024, which will include Granite Rapids and Sierra Forest.
Intel Brings All The Xeon Goodness To Hot Chips 2023, Details Granite Rapids P-Core & Sierra Forest E-Core CPU Families For 2024
The Hot Chips 2023 session kicked off with Intel explaining how modern data center needs are expanding and becoming increasingly workload-centric. Workloads span HPC, AI, compute-intensive, high-density, and general-purpose tasks, which means a single type of core isn't always the best fit for all of them. As such, Intel has committed to its own answer to AMD's Zen 4 and Zen 4C strategy: P-Core and E-Core specific Xeon CPUs.
The P-Core Xeon CPUs are branded under the Granite Rapids family and are optimized for compute-intensive and AI workloads, while the E-Core Xeon CPUs are branded under the Sierra Forest family and are optimized for efficiency in high-density and scale-out workloads. Both CPU families share a common platform foundation and software stack, meaning both run on the same platforms.
Diving into the modularity aspects of its next-gen SoC architecture, Intel detailed how its Xeon CPUs, especially Granite Rapids, will feature separate compute and I/O silicon chiplets. These chiplets are interconnected using EMIB fabric on the same package, offering a high-bandwidth, low-latency path between them.
On the platform side, the Intel Granite Rapids-SP Xeon CPUs will scale from 1S up to 8S platforms, while Sierra Forest chips will scale from 1S up to 2S solutions. Both CPUs will feature a range of SKUs with variable core counts and thermal targets. The next-gen Xeon CPU platform will also support up to 12-channel DDR5/MCR (1-2DPC) memory, up to 136 PCIe Gen 5 lanes, and 6 UPI links (CXL 2.0).
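To put the 12-channel memory figure in perspective, peak theoretical bandwidth is just channels multiplied by transfer rate and bus width. The sketch below assumes a 64-bit (8-byte) effective bus per channel, DDR5-6400 as the baseline speed, and 8800 MT/s as an illustrative MCR DIMM rate; none of these exact speeds are confirmed in Intel's slides here.

```python
def peak_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: channels * transfers/s * bytes per transfer.

    mts is the rated speed in megatransfers per second; the result is GB/s.
    """
    return channels * mts * bus_bytes / 1000  # MT/s * bytes -> GB/s

# 12 channels of DDR5-6400 (assumed baseline speed for this platform)
ddr5 = peak_bandwidth_gbs(12, 6400)  # 614.4 GB/s
# 12 channels of MCR DIMMs at 8800 MT/s (illustrative figure)
mcr = peak_bandwidth_gbs(12, 8800)   # 844.8 GB/s
print(ddr5, mcr)
```

The same formula explains why the platform's jump from 8 to 12 channels matters as much as raw DIMM speed: channel count is a straight multiplier on peak bandwidth.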
Renders of several SKUs show three possible configurations: a single compute die with two I/O chiplets, dual compute chiplets with dual I/O chiplets, and, at the top, triple compute chiplets with dual I/O chiplets.
Intel will be using a modular mesh fabric to access all the chiplets which enables:
- Logically Monolithic Mesh enables direct access between agents within the socket
- Last Level Cache is shared amongst all cores and can be partitioned into per-die sub-numa clusters
- EMIB technology extends the high-speed fabric across all dies in the package
- Modularity and flexible routing allow per-die definition of rows and columns
- Fabric distributes IO traffic across multiple columns to ease congestion
- Global infrastructure is modular/hierarchical
The compute die on the Intel Xeon Granite Rapids chips will aim for higher performance and efficiency by utilizing the latest Intel 3 process node and a flexible row/column structure. The core tile itself consists of the CPU cores, based on the Redwood Cove architecture, along with their L2 cache, an LLC+SF+CHA slice, and a mesh fabric interface.
The Core Tile can be adapted to both P-Cores and E-Cores. Plus, there's the Advanced Memory subsystem on the same die which features a common controller/IO and full support for CXL-attached memory.
- Intel Xeon processors with E-cores (Sierra Forest) are enhanced to deliver density-optimized computing in the most power-efficient manner. Xeon processors with E-cores provide best-in-class power-performance density, offering distinct advantages for cloud-native and hyperscale workloads.
- 2.5x better rack density and 2.4x higher performance per watt.
- Support for 1S and 2S servers, with up to 144 cores per CPU and TDPs as low as 200W.
- Modern instruction set with robust security, virtualization, and AVX with AI extensions.
- Foundational memory RAS features, such as machine check and data cache ECC, standard in all Xeon CPUs.
- Intel Xeon processors with P-cores (Granite Rapids) are optimized to deliver the lowest total cost of ownership (TCO) for high-core performance-sensitive workloads and general-purpose compute workloads. Today, Xeon enables better AI performance than any other CPU, and Granite Rapids will further enhance AI performance. Built-in accelerators give an additional boost to targeted workloads for even greater performance and efficiency.
- 2-3x better performance for mixed AI workloads.
- Enhanced Intel AMX with support for new FP16 instructions.
- Higher memory bandwidth, core count, and cache for compute-intensive workloads.
- Socket scalability from one socket to eight sockets.
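The enhanced AMX support mentioned above boils down to a tile matrix multiply-accumulate: with the new FP16 instructions, the TMUL unit multiplies two FP16 tiles and accumulates into an FP32 tile. Below is a rough pure-Python stand-in for a single such step; the `tile_dp` helper and the 16x16 tile shape are illustrative only, not Intel's actual intrinsics (real AMX operates on TMM registers via TDP* instructions).

```python
# Conceptual sketch of one AMX-style tile multiply-accumulate:
#   C (FP32 accumulator tile) += A (FP16 tile) x B (FP16 tile)
def tile_dp(C, A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    for i in range(rows):
        for j in range(cols):
            acc = C[i][j]
            for k in range(inner):
                acc += A[i][k] * B[k][j]  # FP16 multiply, FP32 accumulate
            C[i][j] = acc
    return C

# Two 16x16 input tiles accumulated into a zeroed 16x16 output tile
A = [[1.0] * 16 for _ in range(16)]
B = [[2.0] * 16 for _ in range(16)]
C = [[0.0] * 16 for _ in range(16)]
tile_dp(C, A, B)
print(C[0][0])  # each output element sums 16 products of 1.0 * 2.0 = 32.0
```

The point of doing this in hardware is that one instruction replaces the entire inner loop nest above, which is where the claimed AI throughput gains come from.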
For Sierra Forest, the Intel E-Core Xeon family, the CPU's core tile will offer 2-4 cores per module, sharing an L2 cache, a frequency/voltage domain, and a mesh fabric interface. Each core is single-threaded, so the top SKU's compute tile delivers 144 cores and 144 threads from 36 core tiles. The LLC slice is shared amongst all cores in a socket and offers a high-bandwidth pipeline, giving up to 144 MB of L2 cache and 108 MB of LLC.
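The Sierra Forest totals above can be sanity-checked with simple arithmetic, assuming 4-core modules with 4 MB of shared L2 each and 3 MB of LLC per tile (the per-tile slice sizes are inferred from the stated totals, not confirmed by Intel):

```python
# Back-of-the-envelope check of the top Sierra Forest SKU figures
cores_per_tile = 4    # top end of the 2-4 cores-per-module range
core_tiles = 36
l2_per_tile_mb = 4    # assumed shared L2 per module
llc_slice_mb = 3      # hypothetical per-tile share of the socket-wide LLC

cores = cores_per_tile * core_tiles      # 144 cores (1 thread per core)
l2_total = l2_per_tile_mb * core_tiles   # 144 MB L2
llc_total = llc_slice_mb * core_tiles    # 108 MB LLC
print(cores, l2_total, llc_total)
```

Under these assumptions the numbers line up exactly with the 144-core / 144 MB L2 / 108 MB LLC configuration described above.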
Based on the current roadmap, Intel's 5th Gen Emerald Rapids Xeon CPUs will launch in Q4 2023, followed by Sierra Forest in the first half of 2024, with the P-Core-powered Granite Rapids family arriving shortly after.
Intel Xeon CPU Families (Preliminary):
Family Branding | Diamond Rapids | Clearwater Forest | Granite Rapids | Sierra Forest | Emerald Rapids | Sapphire Rapids | Ice Lake-SP | Cooper Lake-SP | Cascade Lake-SP/AP | Skylake-SP |
---|---|---|---|---|---|---|---|---|---|---|
Process Node | Intel 20A? | Intel 18A | Intel 3 | Intel 3 | Intel 7 | Intel 7 | 10nm+ | 14nm++ | 14nm++ | 14nm+ |
Platform Name | Intel Mountain Stream / Intel Birch Stream | Intel Mountain Stream / Intel Birch Stream | Intel Mountain Stream / Intel Birch Stream | Intel Mountain Stream / Intel Birch Stream | Intel Eagle Stream | Intel Eagle Stream | Intel Whitley | Intel Cedar Island | Intel Purley | Intel Purley |
Core Architecture | Lion Cove? | TBD | Redwood Cove+? | E-Core | Raptor Cove | Golden Cove | Sunny Cove | Cascade Lake | Cascade Lake | Skylake |
IPC Improvement (Vs Prev Gen) | TBD | TBD | TBD | TBD | 1%? | 19% | 20% | 0% | 0% | 10% |
MCP (Multi-Chip Package) SKUs | Yes | TBD | Yes | Yes | Yes | Yes | No | No | Yes | No |
Socket | LGA 4677 / 7529 | LGA 4677 / 7529 | LGA 4677 / 7529 | LGA 4677 / 7529 | LGA 4677 | LGA 4677 | LGA 4189 | LGA 4189 | LGA 3647 | LGA 3647 |
Max Core Count | Up To 144? | TBD | Up To 136? | 144-336 | Up To 64? | Up To 56 | Up To 40 | Up To 28 | Up To 28 | Up To 28 |
Max Thread Count | Up To 288? | TBD | Up To 272? | 144-336 | Up To 128 | Up To 112 | Up To 80 | Up To 56 | Up To 56 | Up To 56 |
Max L3 Cache | TBD | TBD | TBD | 144-336 MB L3? | 320 MB L3 | 105 MB L3 | 60 MB L3 | 38.5 MB L3 | 38.5 MB L3 | 38.5 MB L3 |
Vector Engines | AVX-1024/FMA3? | TBD | AVX-512/FMA3? | TBD | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 | AVX-512/FMA2 |
Memory Support | Up To 12-Channel DDR6-7200? | TBD | Up To 12-Channel DDR5-6400 | Up To 8-Channel DDR5-6400? | Up To 8-Channel DDR5-5600 | Up To 8-Channel DDR5-4800 | Up To 8-Channel DDR4-3200 | Up To 6-Channel DDR4-3200 | Up To 6-Channel DDR4-2933 | Up To 6-Channel DDR4-2666 |
PCIe Gen Support | PCIe 6.0 (128 Lanes)? | TBD | PCIe 5.0 (96 Lanes) | PCIe 5.0 (TBD Lanes) | PCIe 5.0 (80 Lanes) | PCIe 5.0 (80 lanes) | PCIe 4.0 (64 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) |
TDP Range (PL1) | Up To 500W? | TBD | Up To 500W | Up To 350W | Up To 350W | Up To 350W | 105-270W | 150W-250W | 165W-205W | 140W-205W |
3D Xpoint Optane DIMM | Donahue Pass? | TBD | Donahue Pass | TBD | Crow Pass | Crow Pass | Barlow Pass | Barlow Pass | Apache Pass | N/A |
Competition | AMD EPYC Venice | AMD EPYC Zen 5C | AMD EPYC Turin | AMD EPYC Bergamo | AMD EPYC Genoa ~5nm | AMD EPYC Genoa ~5nm | AMD EPYC Milan 7nm+ | AMD EPYC Rome 7nm | AMD EPYC Rome 7nm | AMD EPYC Naples 14nm |
Launch | 2025? | 2025 | 2024 | 2024 | 2023 | 2022 | 2021 | 2020 | 2018 | 2017 |
Source: Wccftech