How Many X86-64 Instructions Are There Anyway?

x86 is an enormously popular instruction set that is used on most desktop computers and servers (but usually not on mobile devices like phones). Given this wide use, it might seem like an easy question to ask how many x86 instructions there are, but it turns out this is more intricate than it looks.

Background

x86 is the name of a whole family of so-called instruction set architectures (ISA for short). The ISA defines the available instructions of a CPU, but also the registers, the memory architecture, how interrupts are handled, and much more. It is different from the microarchitecture, though, which determines how the ISA is implemented. Different microarchitectures can implement the same ISA, which is what happens with different AMD and Intel microarchitectures implementing essentially the same ISA. Since the introduction of x86 by Intel in 1978, the architecture has evolved extensively, but with almost full backward compatibility. Partly for this reason, the instruction set is full of weird corner-cases and historical artifacts. Here, I will look at the 64-bit variant (sometimes called x86-64) of the ISA and attempt to count the number of instructions. Specifically, the ISA from the Haswell microarchitecture, one of the latest ISA available at the time of writing. Due to the backward compatibility, this instruction set still includes most of the instructions from the very first generation of x86, and anything in between.

Attempts 1 and 2: Mnemonics

Machine instructions (the ones and zeros that tell your computer what to do next) have a portion called the opcode that determines what operation is performed. A mnemonic is a symbolic name for these opcodes. For instance, to add one to the eax register, one could use the instruction addl$1,%eax. Here, addl is the mnemonic, and so we can count the number of unique mnemonics to determine the number of instructions. I've written a small program to do this:

$ ./instr-count --mode "mnemonic_att"
1279

Great, so we are done and there are 1,279 instructions, right? Actually, the story is more complicated. Those familiar with assembly will have realized that the l in addl is a suffix that indicates the bit width of the operands. For instance, to add one to a 16-bit register (instead of the 32-bit register eax), one would use addw$1,%ax. Both of these instructions are doing addition, maybe we should not count them separately. In fact, had we used Intel syntax for the instructions, then the same operation would be written as addeax,1 (note the absence of a width suffix, and the inverted ordering of operands). Let's try that:

$ ./instr-count --mode "mnemonic_intel"
981

Attempt 3: Mnemonics and Operand Types

If we look a little closer at mnemonics, we find that the situation is actually not so simple. Let's again look at the two instructions to add a constant to a 16 and 32-bit register, addw$0x1,%ax and addl$0x1,%eax, respectively. It turns out, that on 64-bit machines we consider here, they actually have relatively different semantics. The former just adds one to ax. On the other hand, addl$0x1,%eax also does the addition, but additionally zeros out the upper 32 bits of rax. Here is an example illustrating this behavior:

Note that the 32-bit addition on the right zeroes out the upper 32 bits of rax (marked in red), but the 16-bit addition (as well as the 8-bit addition not shown here) don't do that. That doesn't really seem like the same instruction any longer, even if both have the same (Intel) mnemonic. For another example why mnemonics can be misleading consider the instruction to perform a 32-bit signed multiplication: imull. There is a variant that multiplies two registers, and there is a variant that multiplies two registers and a constant. Again, even though both of these share the same mnemonic, they hardly seem like the same instruction (one multiplies two numbers, the other three numbers).

Therefore, maybe we should really count all the different combinations of mnemonic as well as the argument types. For instance, addl$1,%eax would be add_imm32_r32. This ensures that we correctly identify the examples shown as different instructions, and leaves us with 3,683 instructions:

$ ./instr-count --mode "operand_type"
3683

Attempt 4: Mnemonics and Operand Widths

Now we have a lot of instructions, and some of them seem like they are not really different instructions. For instance, the instruction to add a constant to a 32-bit register and the instruction to add a constant to a 32-bit memory location are counted separately, when the operation they perform could be argued to be identical. Thus, maybe a better way would be to consider the mnemonic in addition to the operand width. Our running example would be add_32_32, regardless of whether the operand is a register, memory location or constant. This way, we end up with 2,034 instructions:

$ ./instr-count --mode "operand_width"
2034

It's All Wrong

Unfortunately, there are even more weird corner cases and historical artifacts in x86 that make counting instructions hard. For instance, we used mnemonics to group the same "mathematical operation" regardless of the operands, but it turns out that doing a bitwise xor operation is done using xor (in Intel syntax), but also pxor, vpxor, vxorpd, vxorps, xorpd, and xorps depending on the operand size and other things (and even more in AT&T syntax). So if attempt 2 tries to group the same operation regardless of the operation, then that count is actually too high.

And on the other end of the spectrum, we tried to count every different operation separately in attempt 3. However, it turns out there are instructions that really are a family of instructions. For instance, consider cmpsd that compares two floating point numbers. A third operand, an 8-bit constant, determines if the comparison is for equality, inequality, if the first number is larger, and 5 more variants. These clearly seem like different instructions, yet we have counted them as a single one.

Unfortunately, I can't think of an easy automatic way of counting such nuances, so I will leave it at the numbers I have.

Conclusions

As I have hopefully convinced you by now, this innocent looking question is more difficult to answer than it might seem. It depends on your notion of what an instruction is, and is complicated by the many intricacies of the complex x86-64 instruction set. Here is an overview of possible answers:

Measure	Count	Comment
AT&T mnemonic (e.g., `addl`)	1,279	Counts the number of unique mnemonics in AT&T syntax.
Intel mnemonic (e.g., `add`)	981	This is a rough estimate of the number of different kinds of operations the x86 instruction set can perform, ignoring the operand type and size. There are various caveats as described earlier, such as some operations having multiple different mnemonics.
Mnemonic and operand types (e.g., `add_r32_imm32`)	3,683	Distinguishes instructions in every way possible and is thus a sort of upper bound on the number of instructions. If you are looking for a very fine-grained notion of instruction, this is a good measure.
Mnemonic and operand width (e.g., `add_32_32`)	2,034	This is an estimate of the number of different kinds of instructions, if an operation on 8 bits is considered to be different from the same operation on 16 bits (because sometimes, these actually have different semantics, as shown with the zeroing of the upper 32 bits for the `addl` instruction). Does not distinguish instructions that operate on registers vs. constants vs. memory locations.

Personally, the 4th attempt seems to capture my intuitive understanding of what an instruction is the closest. So, if I ever have to cross the bridge of death and get asked how many x86-64 instructions there are, I shall answer 2,034.

All the numbers in this blog post have been obtained through a small program making use of our awesome C++11 library for working with x86-64 assembly. Just like the library, my program is available as open source on GitHub.

How Many X86-64 Instructions Are There Anyway?

Background

Attempts 1 and 2: Mnemonics

Attempt 3: Mnemonics and Operand Types

Attempt 4: Mnemonics and Operand Widths

It's All Wrong

Conclusions

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List