The CPU on your device performs billions of computations every second and is responsible for how your computer functions. Working inside the CPU is the Arithmetic Logic Unit (ALU), which is responsible for mathematical tasks and is driven by the CPU's microcode.

Now, the set of instructions a CPU understands isn't static and can be extended, and one such extension was Intel's AVX-512 instruction set. However, Intel is moving away from AVX-512 on its consumer CPUs, removing its functionality from recent chips. But why? Why is Intel killing off AVX-512?

How Does an ALU Work?

Before getting to know the AVX-512 instruction set, it's essential to understand how an ALU works.

As the name suggests, the Arithmetic Logic Unit is used to perform mathematical and logical tasks. These tasks include operations like addition, multiplication, and comparisons, with floating-point calculations handled by closely related execution hardware. To accomplish these tasks, the ALU uses dedicated digital circuitry, which is driven by the clock signal of the CPU.

Therefore, the clock speed of a CPU sets the pace at which instructions are processed in the ALU. So, if your CPU runs at a 5GHz clock frequency, it goes through 5 billion clock cycles every second, and the number of instructions completed in each cycle depends on the design of the CPU. For this reason, CPU performance improves as the clock speed increases.
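
As a rough sketch of that relationship, the instruction rate of one core can be estimated as the clock frequency multiplied by the number of instructions completed per cycle (IPC). The function name and the IPC values below are hypothetical and chosen only for illustration; real IPC varies with the CPU design and the workload.

```python
def peak_instructions_per_second(clock_hz: float, instructions_per_cycle: float) -> float:
    """Estimate how many instructions one core completes per second."""
    return clock_hz * instructions_per_cycle


clock_hz = 5e9  # 5 GHz, the figure used in the text

# Hypothetical IPC values, not measurements of any real CPU.
for ipc in (0.5, 1.0, 4.0):
    rate = peak_instructions_per_second(clock_hz, ipc)
    print(f"IPC {ipc}: {rate:.2e} instructions per second")
```

With an IPC of exactly 1, a 5GHz core would complete 5 billion instructions per second; any other IPC scales that figure up or down.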

That said, as the CPU clock speed increases, so does the amount of heat the CPU generates. This is why extreme overclockers cool their systems with liquid nitrogen. Unfortunately, this rise in temperature at high frequencies prevents CPU manufacturers from pushing the clock frequency beyond a certain threshold.

So how does a new-generation processor offer better performance than older iterations? Well, CPU manufacturers use parallelism to boost performance. One way to achieve this parallelism is a multicore architecture, where several processing cores work side by side to increase the computational power of the CPU.

Another way to improve performance is by using a SIMD instruction set. In simple terms, a Single Instruction Multiple Data instruction enables the ALU to execute the same operation across several data points at once. This type of parallelism improves the performance of a CPU, and AVX-512 is a SIMD instruction set used to boost a CPU's performance when performing specific tasks.
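
The idea can be sketched with a toy model that counts how many add instructions are needed for the same work. This is plain Python and does not use real SIMD hardware; the function names and the default of 16 lanes (sixteen 32-bit values in one 512-bit register) are assumptions made for illustration.

```python
from typing import List, Tuple


def scalar_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Add element by element: one modeled instruction per pair."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def simd_add(a: List[float], b: List[float], lanes: int = 16) -> Tuple[List[float], int]:
    """Add `lanes` pairs per modeled instruction."""
    result, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


a = [float(i) for i in range(64)]
b = [1.0] * 64
scalar_result, scalar_count = scalar_add(a, b)
simd_result, simd_count = simd_add(a, b)
print(scalar_count, simd_count)  # prints "64 4"
```

Both functions return the same sums, but the modeled SIMD version needs 4 instructions where the scalar version needs 64.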

How Does Data Reach the ALU?

Now that we have a basic understanding of how an ALU works, we need to understand how data reaches the ALU.

To reach the ALU, data has to move through different storage systems. This data journey is based on a computing system's memory hierarchy. A brief overview of this hierarchy is given below:

  • Secondary memory: The secondary memory on a computing device consists of a permanent storage device, such as a hard drive or SSD. This device can store data permanently but is far slower than the CPU. Due to this, the CPU does not work on data directly from the secondary storage system.
  • Primary memory: The primary storage system consists of random access memory (RAM). This storage system is faster than the secondary storage system but cannot store data permanently. Therefore, when you open a file on your system, it moves from the hard drive to the RAM. That said, even RAM is not fast enough to keep up with the CPU.
  • Cache memory: The cache memory is embedded in the CPU and is the fastest memory system on a computer after the registers. This memory system is divided into three levels, namely the L1, L2, and L3 cache. Any data that needs to be processed by the ALU moves from the hard drive to the RAM and then to the cache memory. That said, the ALU cannot access data directly from the cache.
  • CPU registers: The registers in a CPU are very small, and depending on the computer architecture, each general-purpose register can hold 32 or 64 bits of data. Once the data moves into these registers, the ALU can access it and perform the task at hand.
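
The journey described in the list above can be sketched as a toy model. The class and method names are hypothetical, and the model deliberately ignores capacity limits, eviction, and timing; it only shows the order in which a value travels toward the registers.

```python
class MemoryHierarchy:
    """Toy model: levels ordered from fastest to slowest."""

    LEVELS = ["register", "cache", "ram", "storage"]

    def __init__(self, storage: dict):
        # Everything starts out in permanent storage only.
        self.levels = {name: {} for name in self.LEVELS}
        self.levels["storage"] = dict(storage)

    def load(self, address):
        """Return (value, path), where path lists the levels the value
        passed through on its way to a register."""
        for depth, name in enumerate(self.LEVELS):
            if address in self.levels[name]:
                value = self.levels[name][address]
                # Copy the value into every faster level on the way up.
                for faster in self.LEVELS[:depth]:
                    self.levels[faster][address] = value
                path = list(reversed(self.LEVELS[:depth + 1]))
                return value, path
        raise KeyError(address)


memory = MemoryHierarchy({0x10: 42})
print(memory.load(0x10))  # first access travels storage -> ram -> cache -> register
print(memory.load(0x10))  # second access is served from the register level
```

In this sketch the first load walks the whole hierarchy, while a repeated load of the same address is served from the fastest level.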

What Is AVX-512, and How Does It Work?

The AVX-512 instruction set builds on the earlier AVX and AVX2 instruction sets and was announced by Intel in 2013. Short for Advanced Vector Extensions, AVX-512 was first implemented in Intel's Xeon Phi (Knights Landing) architecture and later made it to Intel's server and high-end desktop processors with the Skylake-X CPUs.

In addition, the AVX-512 instruction set made its way to consumer systems with the Cannon Lake architecture and was later supported by the Ice Lake and Tiger Lake architectures.

The main goal of this instruction set was to accelerate tasks involving data compression, image processing, and cryptographic computations. By handling twice as much data per instruction as the previous generation, the AVX-512 instruction set can offer substantial performance gains in workloads that suit it.

So, how did Intel double the amount of data its CPUs can process per instruction with AVX-512?

Well, as explained earlier, the ALU can only access the data present in a CPU's registers. The Advanced Vector Extensions instruction sets add dedicated vector registers that are much wider than the general-purpose ones.

Due to this increase in size, the ALU can process multiple data points in a single instruction, increasing the system's performance.

In terms of register size, the AVX-512 instruction set offers thirty-two 512-bit registers, doubling both the number and the width of the sixteen 256-bit registers available with the older AVX instruction set.
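
The arithmetic behind those register sizes can be worked through in a few lines. The helper names are hypothetical; the register counts and widths are the sixteen 256-bit registers of AVX and the thirty-two 512-bit registers of AVX-512.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one register."""
    return register_bits // element_bits


def register_file_bytes(register_count: int, register_bits: int) -> int:
    """Total capacity of the vector registers in bytes."""
    return register_count * register_bits // 8


# 32-bit floats per register: AVX vs AVX-512
print(lanes(256, 32), lanes(512, 32))  # prints "8 16"

# 64-bit doubles per register: AVX vs AVX-512
print(lanes(256, 64), lanes(512, 64))  # prints "4 8"

# Total vector register capacity: AVX vs AVX-512
print(register_file_bytes(16, 256), register_file_bytes(32, 512))  # prints "512 2048"
```

A single AVX-512 instruction can therefore operate on sixteen 32-bit values at once, compared with eight for a 256-bit register.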

Why Is Intel Ending AVX-512?

As explained earlier, the AVX-512 instruction set offers several computational advantages. In fact, popular libraries like TensorFlow use it to provide faster computations on CPUs that support it.

So, why is Intel disabling AVX-512 on its recent Alder Lake processors?

Well, the Alder Lake processors are unlike the older ones manufactured by Intel. While the older chips used cores that all shared the same architecture, the Alder Lake processors use two different types of cores. These cores in the Alder Lake CPUs are known as P-cores and E-cores and are based on different architectures.

While the P-cores use the Golden Cove microarchitecture, the E-cores use the Gracemont microarchitecture. This difference in architectures prevents the scheduler from working correctly when particular instructions can run on one architecture but not on the other.

In the case of the Alder Lake processors, the AVX-512 instruction set is one such example, as the P-cores have the hardware to execute its instructions, but the E-cores do not.

For this reason, the Alder Lake CPUs do not support the AVX-512 instruction set.
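
One way to picture the problem is that software can only rely on features that every core it might be scheduled on supports. The sketch below models this as a set intersection; the feature sets are simplified, hypothetical stand-ins rather than complete feature lists for either core type.

```python
from typing import Iterable, Set


def safe_features(cores: Iterable[Set[str]]) -> Set[str]:
    """Features that are usable no matter which core a thread lands on."""
    cores = list(cores)
    return set.intersection(*cores) if cores else set()


# Simplified, illustrative feature sets (not exhaustive).
p_core = {"sse4_2", "avx2", "avx512f"}
e_core = {"sse4_2", "avx2"}

print(sorted(safe_features([p_core, e_core])))  # prints "['avx2', 'sse4_2']"
print(sorted(safe_features([p_core])))          # prints "['avx2', 'avx512f', 'sse4_2']"
```

With both core types active, AVX-512 drops out of the common set; with only P-cores in the mix, it remains available.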

That said, AVX-512 instructions can run on certain Alder Lake CPUs where Intel has not physically fused the feature off. To enable them, users have to disable the E-cores in the BIOS.
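
To check whether AVX-512 is actually exposed on a given machine, one option on Linux is to look for the avx512f flag (the AVX-512 Foundation subset) in /proc/cpuinfo. The sketch below assumes a Linux x86 system; the helper names are hypothetical, and other operating systems need a different approach.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Extract the feature flags of the first CPU listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is reported."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # only present on Linux
    if cpuinfo.exists():
        print("AVX-512F reported:", has_avx512f(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux")
```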

Is AVX-512 Needed on Consumer Chipsets?

The AVX-512 instruction set gives a CPU wider vector registers to enhance its performance. This boost in performance enables CPUs to crunch numbers faster, allowing users to run video/audio compression algorithms at higher speeds.

That said, this boost in performance can only be observed when a program has been written or compiled to take advantage of the AVX-512 instruction set.
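
A minimal sketch of why this matters: a program typically ships several code paths and picks the widest one that both the CPU and the program support. The names and the table of lane counts below are hypothetical, with lanes counted in 32-bit values.

```python
from typing import Iterable, Set

# Hypothetical table: 32-bit lanes processed per instruction by each code path.
KERNEL_LANES = {"scalar": 1, "avx2": 8, "avx512f": 16}


def pick_kernel(cpu_features: Set[str], shipped_kernels: Iterable[str]) -> str:
    """Choose the widest shipped code path that the CPU can run."""
    usable = [
        kernel for kernel in shipped_kernels
        if kernel == "scalar" or kernel in cpu_features
    ]
    return max(usable, key=lambda kernel: KERNEL_LANES[kernel], default="scalar")


cpu = {"avx2", "avx512f"}

# A program without an AVX-512 code path gains nothing from the hardware support.
print(pick_kernel(cpu, ["scalar", "avx2"]))             # prints "avx2"
print(pick_kernel(cpu, ["scalar", "avx2", "avx512f"]))  # prints "avx512f"
```

On the same AVX-512-capable CPU, only the second program benefits from the wider registers.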

For this reason, instruction set extensions like AVX-512 are better suited to server workloads, and consumer-grade chips can get by without complex instruction sets like AVX-512.