MIPS中的跳转/分支指令

本文节选自《See MIPS run2rd》/《MIPS体系结构透视》中的部分章节,结合个人理解,对部分译文有所改动。

1.5.2 Addressing and Memory Accesses

Jump instructions: The limited 32-bit instruction length is a particular problem for branches in anarchitecture that wants to support very large programs. The smallest opcode field in a MIPS instruction is 6 bits, leaving 26 bits to define the target of a jump. Since all instructions are four-byte aligned in memory,the two least significant address bits need not be stored,allowing an address range of 2^28= 256 MB. Rather than make this branch PC relative, this is interpreted as an absolute address within a 256MB segment. That’s inconvenient for single programs larger than this, although it hasn't been much of a problem yet!

Branches out of segment can be achieved by using a jump register instruction, which can go to any 32-bit address.
Conditional branches have only a 16-bit displacement field—giving a 2^18 byte range, since instructions are four-byte aligned—which is interpreted as a signed PC-relative displacement. Compilers can only code a simple conditional branch instruction if they know that the target will be within 128 KB of the instruction following the branch.

1.5.2 编址及内存访问

跳转指令(j):有限的32位指令长度对于大型程序的分支跳转支持确实是个难题。MIPS指令中最小的操作码域占6位,剩下的26位用于跳转目标的编址。由于所有指令在内存中都是4字节对齐的,因此最低的2个地址位是无需存储的,这样可供寻址范围为2^28=256MB。分支跳转地址被当做一个256MB的段内绝对地址,而非PC相对寻址。这对于地址范围超过256MB的跳转程序而言是无能为力的,所幸目前很少遇到这么大的远程跳转需求。
段外分支跳转可以使用寄存器跳转指令实现,它可以跳转到任一32位地址。

条件分支跳转指令(b)编码域的后16位broffset是相对PC的有符号偏移量,由于指令是4字节对齐的,因此可支持的跳转范围实际上是2^18=256KB(相对PC的-128KB~+128KB)。如果确定跳转目标地址在分支指令前后的128KB范围内,编译器就可以编码只生成一条简单的条件分支指令。

1.5.4 Programmer-Visible Pipeline Effects

Delayed branches: The pipeline structure of the MIPS CPU (Figure 1.3) means that when a jump/branch instruction reaches the execute phase and a new program counter is generated,the instruction after the jump will already have been started. Rather than discard this potentially useful work, the architecture dictates that the instruction after a branch must always be executed before the instruction at the target of the branch. The instruction position following any branch is called the branch delay slot.
--------------------------------------------------------------
It is the responsibility of the compiler system or the assembly programming wizard to allowfor and even to exploit the branchdelay; it turns out that it is usually possible toarrange that the instruction in the branch delay slot does useful work. Quite often,the instruction that would otherwise have beenplaced before the branch can be moved into the delay slot.
This can be a bit tricky on a conditional branch, where the branch delay instructionmust be (at least)harmless on bothpaths. Where nothing useful can be done, the delay slot isfilled with a nopinstruction.
--------------------------------------------------------------
Late Data from load(load delay slot): Another consequence of the pipeline is that a load instruction’s data arrives from the cache/memory system after the next instruction’s ALU phase starts—so it is not possible to use the data from a load in the following instruction. (See Figure 1.4 for how this works.)
Theinstruction position immediately after the load is called the load Delay Slot, and an optimizingcompiler will try to do something useful with it. The assembler will hide this from you but may end upputting in a nop.
Onmodern MIPS CPUs the load result is interlocked:If you try to use the result too early, the CPU stops until the data arrives. But on early MIPS CPUs, there were no interlocks, and the attempt to use data in the load delay slot led to unpredictable results.

1.5.4 程序员可见的流水线效果

延迟分支:MIPS CPU的流水线结构(见图1-3)意味着,当跳转/分支指令到达执行阶段并且新的程序计数器已经产生时,紧随其后的下一条指令已经开始执行了。MIPS体系架构并没有抛弃这部分潜在的有用工作,而是规定分支之后的指令总是在分支目标指令之前执行。紧随分支指令之后的位置称为分支延迟槽。

--------------------------------------------------------------
编译器或汇编程序可以考虑充分利用分支延迟,比如在延迟槽中填充有用的指令,把原来放在分支指令之前的指令移动到延迟槽中。

############################################
例如《VxWorks BSP for AMD's AU1500(MIPS)》的V100R001CPE\romMipsInit.s中定义ROM异常入口点时,XVECENT宏中利用分支延迟槽传参。
#define XVECENT(f,bev) b f; li k0,bev
############################################

对条件分支来说需要很小心,分支延迟指令必须对(至少)两条路径都不会产生不良影响。在没有任何可用操作时,延迟槽将填充空指令(nop)。

--------------------------------------------------------------
数据加载延迟(加载延迟槽):流水线的另一后果是数据加载指令的数据在下一条指令的ALU阶段启动之后才能从缓存或内存系统中取得,于是下一条指令便不能使用该数据。图1-4显示了此工作原理。
紧跟在加载指令后的指令位置叫做加载延迟槽,优化的编译器可以用延迟槽来做一些有用的工作。这一特性对程序员是透明的,很多时候汇编器向该延迟槽填入一个空操作指令(nop)。
在现代MIPS CPU中加载结果会导致互锁:如果你想过早使用该数据,CPU就会停下来直到该数据到达。但早期的MIPSCPU中没有互锁硬件,任何企图提前使用加载延迟槽中的数据的操作都将导致无法预料的结果。

############################################
XVECENT宏中的分支延迟槽执行数据加载,但是由于数据加载延迟,不能在f(romExcHandle)的第一条指令中访问k0参数,或者在li后添加一条nop指令。
############################################

8.7.8 Jumps, Subroutine Calls, and Branches
The MIPS architecture follows Motorola nomenclature for these instructions, as follows:
(1) PC-relative instructions are called“branch” and absolute-addressed instructions “jump”; the operation mnemonics begin with b orj.
(2) A subroutine call is “jump and link” or“branch and link” and the mne-monics end . . .al.
(3) All the branch instructions, even branch and link, are conditional, testing one or two registers. Unconditional versions can be and are readily synthesized—for example, beq $0, $0, label.
j: This instruction transfers control unconditionally to an absolute address. Actually, j doesn't quite manage a 32-bit address: The top 4 address bits of the target are not defined by the instruction, and the top 4 bits of the current PC value are used instead. Most of the time this doesn't matter; 28 bits still gives a maximum code size of 256 MB.
To reach a long way away, you must use thejr(jump to register) instruction,which is also used for computed jumps. You can write the j mnemonic with aregister, but it’s quite popular not to do so.
jal , jalr: These implement a direct and indirect subroutine call. As well as jumping to the specified address, they store the return address (the instruction's own address+8) in register ra, which is the alias for $31. Why add eight to the program counter? Remember that jump instructions,like branches, always execute theimmediately following branch delay slot instruction, sothe return
address needs to be the instruction after the branch delay slot. Subroutine return is done with a jump to register, most often jr ra.
PC-relative subroutine calls can use the bal,bgezal, andbltzalinstructions. The conditional branch-and-link instructions save their return address into ra even when the condition is false, which can be useful when doing a computation using the current instruction's address.
b: Unconditional PC-relative (though relatively short-range) branch.
bal: PC-relative function call.

8.7.8 跳转, 分支和子程序调用指令

MIPS体系架构的跳转、分支和子程序调用指令沿用了Motorola命名法:
(1)PC相对地址跳转指令称为“分支”,绝对地址跳转指令称为“跳转”,相应的助记符分别以b和j开头。
(2)子程序调用称为“跳转并链接”或“分支并链接”,相应的助记符以al结尾。
(3)所有的分支指令,包括分支并链接指令都是有条件的跳转,需要对某些寄存器进行测试。无条件的分支指令可以很容易地由其他指令合成,比如“beq $0, $0, label”——$0总是等于它自己,故eq总成立,从而实现了无条件跳转。例如b label=>beq $zero, $zero, offs,bal label=>bgezal $zero, offs。
j:该指令无条件跳转到一个绝对地址。实际上,j指令跳转到的地址并不是直接指定32位的地址(所有MIPS指令都是32位长,不可能全部用于编址数据域,那样的指令是无效的,也许只有nop):由于目的地址的最高4位无法在指令的编码中给出,32位地址的最高4位取值当前PC的最高4位。对于一般的程序而言,28位地址所支持的256MB跳转空间已经足够大了。

要实现更远程的跳转,必须使用jr指令跳转到指定寄存器中,该指令也用于需要计算合成跳转目标地址的情形。你可以使用j助记符后面紧跟一个寄存器表示寄存器跳转,不过一般不推荐这么做。

jal、jalr:这两条指令分别实现了直接和间接子程序调用。在跳转到指定地址实现子程序调用的同时,需要将返回地址(当前指令地址+8)保存到ra($31)寄存器中。为什么是当前指令地址加8呢?这是因为紧随跳转指令之后有一条延迟槽nop指令,加8刚好是延迟槽后面的那条有效指令。从子程序返回是通过寄存器跳转完成,通常调用jr ra。

基于PC相对寻址的位置无关子程序调用通过bal、bgezal和bltzal指令完成。条件分支和链接指令即使在条件为假的情况下,也会将它们的返回地址保存到ra中,这有利于需要计算当前指令地址的场合。
b:相对当前指令地址(PC)的无条件短距离跳转指令。
bal:基于当前指令地址(PC)的函数调用指令。

说明:

1.关于流水线,可参考《大话处理器》中第4章《微架构——处理器的内心世界》,其中“跟着顺溜学流水线”也许有助理解。
2.“jump and link” or “branch and link”中的link(链接),在语义的理解上,我们可以类比联想HTML中的href标签(超链接)。
3.在跳转到指定地址实现子程序调用的同时,需要将返回地址保存到ra寄存器,即通常所说的“函数调用的现场保护”,以便子程序返回时能够继续调用之前的流程。对于跳转/分支指令,MIPS CPU将自动保存ra;若子程序需要嵌套调用其他子程序,则必须先存储ra,通常是压入栈,子程序末尾弹出之前保存的ra,然后jr到ra。

--电子创新网--
粤ICP备12070055号