Welcome to 16892 Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others


I'm digging into left and right shift operations in x86 assembly, such as shl eax, cl.

From the IA-32 Intel Architecture Software Developer’s Manual 3:

All IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.
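To make the quoted behaviour concrete, here is a rough C model of what the masking means for a 32-bit left shift. The function name shl32_masked is made up for this sketch and is not from the manual; it models only the result value, not the flags.

```c
#include <stdint.h>

/* Model of `shl eax, cl` on a 286-or-later processor: only the low
   5 bits of the count are used, so the effective count is 0..31. */
uint32_t shl32_masked(uint32_t value, uint8_t count)
{
    return value << (count & 31u);
}
```

Under this model a count of 32 behaves like a count of 0, and a count of 33 behaves like a count of 1.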

I'm trying to understand the reasoning behind this. Is it because, at the hardware level, it is hard to implement a shift across all 32 (or 64) bits of a register in one cycle?

Any detailed explanation would help a lot!


1 Answer

Edited to correct statement re: 80386, which (to my surprise) did have a barrel shifter.


Happy to hear the 286 described as "modern" :-)

The 8086 ran a SHL AX, CL in 8 clocks + 4 clocks per bit shifted. So if CL = 255 this is a seriously slow instruction!

So the 286 did everybody a favour and limited the count by masking it to 0..31, capping the instruction at 5 + 31 clocks. For 16-bit registers, that is an interesting compromise.
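The arithmetic behind those two figures can be sketched in C. The function names are invented for this illustration, and the formulas are just the register-shift clock counts quoted above, not a full timing model.

```c
#include <stdint.h>

/* 8086: 8 clocks plus 4 per bit shifted; the full 8-bit CL is used. */
unsigned shift_clocks_8086(uint8_t cl)
{
    return 8u + 4u * cl;
}

/* 286: 5 clocks plus 1 per bit shifted; CL is masked to 0..31 first. */
unsigned shift_clocks_286(uint8_t cl)
{
    return 5u + (cl & 31u);
}
```

With CL = 255 the first formula gives 1028 clocks, while the second never exceeds 36.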

[I found "80186/80188 80C186/80C188 Hardware Reference Manual" (order no. 270788-001) which says that this innovation appears there first. SHL et al ran 5+n clocks (for register operations), the same as the 286. FWIW, the 186 also added PUSHA/POPA, PUSH immed., INS/OUTS, BOUND, ENTER/LEAVE, IMUL immed. and SHL/ROL etc. immed. I do not know why the 186 appears to be a non-person.]

For the 386 they kept the same mask, which also applies to 32-bit register shifts. I found a copy of the "80386 Programmer's Reference Manual" (order no. 230985-001), which gives a clock count of 3 for all register shifts. The "Intel 80386 Hardware Reference Manual" (order no. 231732-002), section 2.4 "Execution Unit", says that the Execution Unit includes:

• The Data Unit contains the ALU, a file of eight 32-bit general-purpose registers, and a 64-bit barrel shifter (which performs multiple bit shifts in one clock).
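For readers unfamiliar with the term, a barrel shifter is commonly built from a fixed number of stages, one per bit of the shift count, so the work does not grow with the count. The C below is a generic textbook-style sketch of that idea for a 32-bit left shift. It is an assumption for illustration only: the manual does not describe the 386's actual circuit, and barrel_shl32 is a made-up name.

```c
#include <stdint.h>

/* Software model of a 5-stage logarithmic shifter (illustrative only).
   Stage k shifts by 2^k when bit k of the count is set, otherwise it
   passes the value through unchanged. */
uint32_t barrel_shl32(uint32_t value, unsigned count)
{
    for (unsigned k = 0; k < 5; ++k) {
        uint32_t shifted = value << (1u << k);
        /* select is all ones if bit k of count is set, else all zeros */
        uint32_t select = 0u - ((count >> k) & 1u);
        value = (shifted & select) | (value & ~select);
    }
    return value;
}
```

Only bits 0..4 of the count are ever examined here, so this model ignores higher count bits in the same way the 5-bit mask does.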

So, I do not know why they did not mask 32-bit shifts to 0..63. At this point I can only suggest the cock-up theory of history.

I agree it is a shame that there isn't a (GPR) shift which returns zero for any count >= the operand size. That would require the hardware to check for any bit set in the count above the bottom 5 (or 6, for 64-bit operands) and return zero if one is found. As a compromise, perhaps it could check just bit 5 (or bit 6).

[I haven't tried it, but I suspect that using PSLLQ et al is hard work -- shuffling count and value to xmm and shuffling the result back again -- compared to testing the shift count and masking the result of a shift in some branch-free fashion.]
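A minimal sketch of the "test the count and mask the result" approach in C, for a 32-bit value. The name shl32_or_zero is hypothetical, and the code is written without an explicit branch, but whether the compiler actually emits branch-free machine code is up to the compiler.

```c
#include <stdint.h>

/* Left shift that yields zero for any count >= 32, instead of
   wrapping the count the way the hardware mask does. */
uint32_t shl32_or_zero(uint32_t value, uint32_t count)
{
    /* keep is all ones when count < 32, all zeros otherwise */
    uint32_t keep = 0u - (uint32_t)(count < 32u);
    /* mask the count so the C shift itself is always well defined */
    return (value << (count & 31u)) & keep;
}
```

For example, shl32_or_zero(1, 31) gives 0x80000000, while shl32_or_zero(1, 32) gives 0.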

Anyway... the reason for the behaviour appears to be history.

