This is how many registers the ISA exposes, not the number of registers actually in the CPU. Typical CPUs have hundreds of registers. For example, Zen 4's integer register file has 224 registers, and its FP/vector register file has 192 registers (per Wikipedia). This is useful to know because it can affect behavior. E.g. I've seen results where doing a register allocation pass with a large number of registers, followed by a pass with the number of registers exposed in the ISA, leads to better performance.
Someone
FTA: “For design reasons that are a complete mystery to me, the MMX registers are actually sub-registers of the x87 STn registers”
I think the main argument for doing that was that it meant that existing OSes didn’t need changes for the new CPU. Because they already saved the x87 registers on context switch, they automatically saved the MMX registers, and context switches didn’t slow down.
It also may have decreased the amount of die space needed, but I think that difference can't have been very large.
rep_lodsb
Nitpick (footnote 3): "64-bit kernels can run 32-bit userspace processes, but 64-bit and 32-bit code can’t be mixed in the same process."
That isn't true on any operating system I'm aware of. If both modes are supported at all, there will be a ring 3 code selector defined in the GDT for each, and I don't think there would be any security benefit to hiding the "inactive" one. A program could even use the LAR instruction to search for them.
At least on Linux, the kernel is perfectly fine with being called from either mode. FASM example code (with hardcoded selector, works on my machine):
```asm
format elf executable at $1_0000
entry start

segment readable executable

start:  mov     eax,4           ;32-bit syscall# for write
        mov     ebx,1           ;handle
        mov     ecx,Msg1        ;pointer
        mov     edx,Msg1.len    ;length
        int     $80
        call    $33:demo64      ;far call through the 64-bit user code selector
        mov     eax,4
        mov     ebx,1
        mov     ecx,Msg3
        mov     edx,Msg3.len
        int     $80
        mov     eax,1           ;exit
        xor     ebx,ebx         ;status
        int     $80

        use64                   ;assemble what follows as 64-bit code
demo64: mov     eax,1           ;64-bit syscall# for write
        mov     edi,1           ;handle
        lea     rsi,[Msg2]      ;pointer
        mov     edx,Msg2.len    ;length
        syscall
        retfd                   ;return to caller in 32-bit mode

Msg1    db "Hello from 32-bit mode",10
.len=$-Msg1
Msg2    db "Now in 64-bit mode",10
.len=$-Msg2
Msg3    db "Back to 32 bits",10
.len=$-Msg3
```
JonChesterfield
Good post! Stuff I didn't know x64 has. Sadly doesn't answer the "how many registers are behind rax" question I was hoping for, I'd love to know how many outstanding writes one can have to the various architectural registers before the renaming machinery runs out and things stall. Not really for immediate application to life, just a missing part of my mental cost model for x64.
Intel's next gen will add 16 more general-purpose registers (APX's r16-r31). Can't wait for the benchmarks.
bradley13
The amount of accumulated cruft in the x86 architecture is astounding.
Being a geezer, I remember when there was, for a brief moment, a genuine question whether National Semiconductor, Motorola, or Intel would win the PC market. The NS processors had a nice, clean architecture. The Motorola processors, meh, ok. Intel already had cruft from earlier efforts like the 4004, and was just ugly.
Of course, Intel won, Motorola came in second, and NS became a footnote.
The x86 architecture has only gotten uglier over time.
jsrcout
Tried to answer this question years back for just the "basic" x86 registers. Quickly realized there was never going to be any single answer until I had mastered the entire ISA. Oh well.
diffuse_l
Some minor nitpicks, but hey, we're counting registers, it's already quite nitpicky :)
As far as I can remember, you can't access the high 8 bits of si, di, or sp (and their low 8 bits, sil/dil/spl, are only reachable in 64-bit mode). ip isn't directly accessible at all.
The ancestry of x86 can actually be traced back to 8-bit CPUs: the high/low 8-bit halves of registers are remnants of an even older architecture, but I'm not sure about that off the top of my head.
I think most of the "weird" choices mentioned there boil down to limitations that seem absurd now but were real constraints at the time. The x87 stack can probably be traced back to exposing a minimal interface to the host processor: addressing one register instead of eight could save quite a few data lines (although a multiplexer could probably solve that), so this is just a wild guess.
MMX probably reused the register file of x87 to save die space.
burnt-resistor
Conservatively though, another answer could be when not considering subset registers as distinct:
16 GP
2 state (flags + IP)
6 seg
4 TRs
11 control
32 ZMM0-31 (repurposes 8 FPU GP regs)
1 MXCSR
6 FPU state
28 important MSRs
7 bounds
6 debug
8 masks
8 CET
10 FRED
=========
145 total
And don't forget another 10-20 for the local APIC.
"The answer" depends on the purpose and on which optional extensions are present. Function calls, task switching between processes in an OS, and saving emulator or virtual-machine state have different requirements and expectations. YMMV.
Related. Others?
How many registers does an x86-64 CPU have? (2020) - https://news.ycombinator.com/item?id=36807394 - July 2023 (10 comments)
How many registers does an x86-64 CPU have? - https://news.ycombinator.com/item?id=25253797 - Nov 2020 (109 comments)
Here's a good list for reference: https://sandpile.org/x86/initial.htm
Heh, am I the only one who was expecting an article about register renaming?
x86-64 ISA general-purpose register containers: the lower 8 and 16 bits of each 64-bit GPR.
Don't forget that x86-64, like ARM, is IP-locked; RISC-V is not.