January 23, 2025

Westside People

Complete News World

New P and E cores, Xe2-LPG graphics, and new NPU 4 deliver more AI performance

Intel this morning lifted the lid on some of the finer architectural and technical details of its upcoming Lunar Lake SoC, the chip that will power the next generation of Core Ultra mobile processors. Once again, Intel held one of its increasingly regular Tech Tour events for media and analysts, this time setting up shop in Taipei just before the start of Computex 2024. During the Tech Tour, Intel revealed several aspects of Lunar Lake, including its new P-core design, codenamed Lion Cove, and a new wave of E-cores that look a bit more like Meteor Lake's pioneering Low Power Island E-cores. Also unveiled was the Intel NPU 4, which Intel claims delivers up to 48 TOPS, exceeding Microsoft's Copilot+ requirements for the new era of AI PCs.

Intel's Lunar Lake represents a strategic evolution in its mobile SoC portfolio, building on the launch of Meteor Lake last year, with a focus on enhancing power efficiency and improving performance across the board. Lunar Lake dynamically allocates tasks to efficiency cores (E-cores) or performance cores (P-cores) based on workload requirements, using scheduling mechanisms intended to keep power usage and performance in balance. Once again, Intel Thread Director, along with Windows 11, plays a pivotal role in this process, guiding the operating system scheduler to make real-time adjustments that balance efficiency with computational power depending on the intensity of the workload.

Intel CPU Architecture Generations

| | Alder/Raptor Lake | Meteor Lake | Lunar Lake | Arrow Lake | Panther Lake |
|---|---|---|---|---|---|
| P-Core Architecture | Golden Cove/Raptor Cove | Redwood Cove | Lion Cove | Lion Cove | Cougar Cove? |
| E-Core Architecture | Gracemont | Crestmont | Skymont | Crestmont? | Darkmont? |
| GPU Architecture | Xe-LP | Xe-LPG | Xe2 | Xe2? | ? |
| NPU Architecture | N/A | NPU 3720 | NPU 4 | ? | ? |
| Active Tiles | 1 (Monolithic) | 4 | 2 | 4? | ? |
| Manufacturing Processes | Intel 7 | Intel 4 + TSMC N6 + TSMC N5 | TSMC N3E + TSMC N6 | Intel 20A + more | Intel 18A |
| Segment | Mobile + Desktop | Mobile | LP Mobile | HP Mobile + Desktop | Mobile? |
| Release Date (OEM) | Q4 2021 | Q4 2023 | Q3 2024 | Q4 2024 | 2025 |

Lunar Lake: Designed by Intel, built by TSMC

While there are many aspects of Lunar Lake you can dive into, perhaps it’s best to start with what is certainly the most notable: who built it.

Intel's Lunar Lake tiles are not built at any of its own foundry facilities. That is a sharp departure from historical precedent, and even from the recent Meteor Lake, whose compute tile is made on the Intel 4 process. Instead, both Lunar Lake tiles are made at TSMC, using a combination of the TSMC N3E and N6 processes. In 2021, Intel set out to free up its chip design teams to use the best possible foundry, whether internal or external, and nowhere is that more evident than here.

Overall, Lunar Lake represents Intel's second generation of disaggregated (tile-based) SoC architecture for the mobile market, replacing the Meteor Lake architecture in the lower-end space. At this time, Intel has revealed that it is using a 4P+4E (8-core) design with Hyper-Threading/SMT disabled, so the total number of threads supported by the processor is simply the number of CPU cores, e.g., 4P+4E/8T.

The Lunar Lake design pairs Intel's architecture team with TSMC's contract manufacturing to bring the latest Lion Cove P-cores to Lunar Lake, improving IPC as you would expect from a new generation. At the same time, Intel is also introducing Skymont E-cores, which replace Meteor Lake's Low Power Island Crestmont E-cores. However, it is worth noting that these E-cores do not connect to the ring bus like the P-cores do, making them a kind of hybrid LP E-core that combines the efficiency gains of the more advanced TSMC N3E node with a double-digit gain in IPC over the previous Crestmont cores.

The entire compute tile, including the P-cores and E-cores, is built on TSMC's N3E node, while the SoC tile is built on TSMC's N6 node.

At a higher level, Intel is once again using its Foveros packaging technology here. Both the compute tile and the SoC tile (now the "Platform Controller" tile) sit on top of a base tile, which provides high-speed, low-power routing between the tiles and further connectivity to the rest of the chip and beyond.

In another first for a mainstream Intel Core product, the Lunar Lake SoC also includes up to 32GB of LPDDR5X memory on the chip package itself. This is arranged as a pair of 64-bit memory chips, providing a total memory interface of 128 bits. As with other vendors using on-package memory, this change means users can't upgrade DRAM at will, and the memory configurations for Lunar Lake will ultimately be determined by the SKUs Intel chooses to ship.
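To make the memory interface arithmetic concrete, here is a minimal sketch that adds up the two 64-bit chips and estimates peak theoretical bandwidth. The data rate used (8533 MT/s) is an assumption for illustration only; the text above does not state a memory speed, and the function names are hypothetical.

```python
def bus_width_bits(chip_widths):
    """Total memory interface width is the sum of the per-chip widths."""
    return sum(chip_widths)


def peak_bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Peak theoretical bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# From the article: a pair of 64-bit memory chips.
width = bus_width_bits([64, 64])

# ASSUMPTION: data rate chosen for illustration, not taken from the article.
ASSUMED_MT_S = 8533

print(f"{width}-bit interface")
print(f"{peak_bandwidth_gb_s(width, ASSUMED_MT_S):.1f} GB/s peak at {ASSUMED_MT_S} MT/s")
```

Real-world throughput would sit below this peak figure, which ignores refresh, protocol overhead, and contention.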

With Lunar Lake, Intel is also putting a strong focus on AI, as the architecture integrates a new NPU called NPU 4. This NPU is rated for up to 48 TOPS of INT8 performance, making it ready for Microsoft's Copilot+ AI PC requirements. This is the target that all PC SoC vendors are aiming for, including AMD and Qualcomm.

Intel's integrated GPU will also be a contributing player here. Although it is not quite as efficient as a dedicated NPU, the Arc Xe2-LPG brings tens of additional T(FL)OPS of performance with it, along with some flexibility that an NPU doesn't offer. That's why you'll also see Intel rate these chips' performance in terms of total platform TOPS: in this case, 120 TOPS.
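As a quick worked example of how the two headline figures relate, the sketch below splits the 120 platform TOPS into the NPU's 48 TOPS and whatever remains for the other engines. The article does not break down that remainder, so the code does not either. The 40 TOPS Copilot+ threshold is Microsoft's published NPU requirement rather than a figure from this article, and the variable names are hypothetical.

```python
NPU_TOPS = 48        # from the article: NPU 4, INT8
PLATFORM_TOPS = 120  # from the article: total platform rating

# ASSUMPTION: Copilot+ NPU threshold, taken from Microsoft's requirements,
# not from this article.
COPILOT_PLUS_MIN_NPU_TOPS = 40

non_npu_tops = PLATFORM_TOPS - NPU_TOPS   # contributed by the GPU and CPU
npu_share = NPU_TOPS / PLATFORM_TOPS      # fraction of the platform total
headroom = NPU_TOPS - COPILOT_PLUS_MIN_NPU_TOPS

print(f"Non-NPU engines: {non_npu_tops} TOPS")
print(f"NPU share of platform total: {npu_share:.0%}")
print(f"Headroom over Copilot+ threshold: {headroom} TOPS")
```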

Intel's collaboration with Microsoft improves workload management through Intel Thread Director, which is being optimized for applications like the Copilot assistant. The timing of Lunar Lake's introduction sets the stage for a launch in Q3 2024, in time for the 2024 holiday market.

Intel Lunar Lake: Intel Thread Director update and power management improvements

To say that power efficiency is a major goal for Lunar Lake would be an understatement. Although Intel holds a commanding position in the laptop CPU market (AMD's share there is still only a small fraction), the company has been feeling pressure over the past few years from customer-turned-rival Apple, whose M-series Apple Silicon has been setting the standard for power efficiency. Now with Qualcomm trying to do the same thing for the Windows ecosystem with its upcoming Snapdragon

Intel's Thread Director and power management updates for Lunar Lake bring a range of significant improvements over Meteor Lake. Thread Director uses a heterogeneous scheduling policy, initially assigning tasks to a single E-core and expanding to other E-cores or to P-cores when needed. OS containment zones are designed to limit tasks to specific cores, which directly improves power efficiency while still providing the right core for the workload at hand. Integration with the power management system and the chip's quartet of Power Management Controllers (PMCs), in coordination with Windows 11, enables context-aware adjustments aimed at delivering the needed performance with minimal power usage and waste.
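The E-core-first policy described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Intel Thread Director or the Windows 11 scheduler: the `Core` class, the capacity numbers, and the `zone` parameter are all hypothetical, and a real scheduler would use hardware feedback rather than fixed capacities.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    kind: str        # "E" or "P"
    capacity: float  # hypothetical work units per tick (assumed values)
    load: float = 0.0

    def free(self) -> float:
        return self.capacity - self.load


def build_cores():
    """4P+4E layout from the article; capacities are made-up numbers."""
    e_cores = [Core(f"E{i}", "E", 1.0) for i in range(4)]
    p_cores = [Core(f"P{i}", "P", 2.0) for i in range(4)]
    return e_cores + p_cores


def place(demand, cores, zone="any"):
    """Try E-cores first, then spill to P-cores unless contained.

    zone="efficiency" mimics a containment zone: the task may only
    run on E-cores. Returns the chosen core name, or None if nothing
    in the allowed zone has room (the task would have to wait).
    """
    candidates = [c for c in cores if c.kind == "E"]
    if zone != "efficiency":
        candidates += [c for c in cores if c.kind == "P"]
    for core in candidates:
        if core.free() >= demand:
            core.load += demand
            return core.name
    return None


cores = build_cores()
for demand in (0.6, 0.6, 1.5):
    print(demand, "->", place(demand, cores))
print(1.5, "(contained) ->", place(1.5, cores, zone="efficiency"))
```

In this model, light tasks fill one E-core before spreading to the next, and only a task too large for any E-core lands on a P-core. Pinning a task to the efficiency zone trades peak performance for lower power, which is the trade-off the containment zones are described as making.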

Lunar Lake's scheduling strategy is built to handle power-sensitive applications efficiently. One example provided by Intel is video conferencing: these tasks are kept on the E-cores, maintaining performance while reducing power consumption by up to 35%, according to Intel's own data. These improvements come from collaboration with operating system developers like Microsoft, with the aim of striking the best balance between power consumption and performance.

Turning to power management, Lunar Lake handles this on the SoC itself, operating in efficiency, balanced, and performance modes that adapt to workload demands at runtime. This multi-layered approach allows the Lunar Lake SoC to operate efficiently. As with Intel Thread Director, the PMCs can balance power usage against performance needs.

Intel also plans to improve Thread Director by increasing scenario granularity, implementing AI-based scheduling hints, and enabling cross-IP scheduling within Windows 11. These improvements amount to workload management designed to boost overall power efficiency and deliver performance across different applications when it is needed, without wasting the power budget by allocating lighter tasks to higher-power P-cores.

Over the next few pages, we’ll explore the new P and E cores and Intel’s update to Arc Xe integrated graphics (Xe2-LPG).