This post explores RISC-V assembly by examining the implementation of the setjmp and longjmp functions from the C standard library. I frequently find that I grasp concepts more quickly when I have actual code that I can disassemble because it allows me to connect information with intent. I believe RISC-V and similar efforts will fundamentally shift how computers are made and programmed. I hope that sharing my knowledge will inspire the same joy in others that I feel when imagining a future of open hardware.
Note: We will be using the RISC-V GNU toolchain throughout this post. If you would like to follow along, you can cross-compile the toolchain for a RISC-V target, or you can download a prebuilt toolchain and emulator. You can also compile and run the programs using a different target toolchain, but the assembly dump will be specific to that architecture. If you are more familiar with a different ISA, this may be a useful way to learn about RISC-V!
A Look at Local Jumps
Jumps are one of the fundamental components of control flow in programming. Nearly any Instruction Set Architecture (ISA) is going to have a jump instruction that lets you modify the program counter to execute the next instruction from the specified location in memory. We also see control flow statements in higher level languages that provide similar logic. An example of this would be the goto statement in C. Using goto is frowned upon by many C programmers because it makes it easy to write horribly complex code, but it does have useful applications, such as the centralized exiting of functions described in the Linux kernel coding style guide. It allows you to define a label, then “go to” that place in your program and resume execution from there. For example, the following program will print Here followed by a newline character over and over again.
goto.c
#include <stdio.h>

int main()
{
jumptome:
    printf("Here\n");
    goto jumptome;
}
gcc goto.c -o goto
./goto
Let’s take a look at the generated assembly for main() using objdump:
0000000000010158 <main>:
10158: 1141 addi sp,sp,-16
1015a: e406 sd ra,8(sp)
1015c: e022 sd s0,0(sp)
1015e: 0800 addi s0,sp,16
10160: 67c9 lui a5,0x12
10162: 6c078513 addi a0,a5,1728 # 126c0 <__errno+0xe>
10166: 1fc000ef jal ra,10362 <puts>
1016a: bfdd j 10160 <main+0x8>
As you can see, using goto in the program translates directly to the j RISC-V instruction, which jumps to memory address 10160, causing the processor to continuously execute our printf statement (which the compiler has lowered to a call to puts, since the string contains no format specifiers and ends in a newline). It would be much clearer for us to use a while statement here, but we will actually get the exact same assembly output with an infinite loop:
while.c
#include <stdio.h>

int main()
{
    while (1)
    {
        printf("Here\n");
    }
}
gcc while.c -o while
objdump -D while
0000000000010158 <main>:
10158: 1141 addi sp,sp,-16
1015a: e406 sd ra,8(sp)
1015c: e022 sd s0,0(sp)
1015e: 0800 addi s0,sp,16
10160: 67c9 lui a5,0x12
10162: 6c078513 addi a0,a5,1728 # 126c0 <__errno+0xe>
10166: 1fc000ef jal ra,10362 <puts>
1016a: bfdd j 10160 <main+0x8>
However, goto can provide more functionality than a loop, and is specifically useful for breaking out of a set of deeply nested loops. The aforementioned Linux kernel style guide gives the following example for an appropriate use of goto:
int fun(int a)
{
	int result = 0;
	char *buffer;

	buffer = kmalloc(SIZE, GFP_KERNEL);
	if (!buffer)
		return -ENOMEM;

	if (condition1) {
		while (loop1) {
			...
		}
		result = 1;
		goto out_free_buffer;
	}
	...
out_free_buffer:
	kfree(buffer);
	return result;
}
In this case, a descriptive label is being used to define a specific error path. Though only a small part of the function body is included here, you can imagine that there could be multiple stages in which the allocated buffer could become full, all of which you would handle by freeing the memory and returning the result. While some may still advocate for never using goto, this demonstrates that there are some benefits, such as not needing to duplicate the cleanup code throughout the function body.
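The kernel snippet above cannot be compiled on its own, so here is a rough user-space sketch of the same centralized-exit idea. The function joined_length and its label names are made up for illustration, and malloc/free stand in for kmalloc/kfree. Each failure point jumps to a label that releases only what has been acquired so far.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length of a followed by b, or -1 if an
 * allocation fails. Every exit passes through the labels at the bottom. */
int joined_length(const char *a, const char *b)
{
    int result = -1;
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *prefix;
    char *joined;

    prefix = malloc(len_a + 1);
    if (!prefix)
        goto out;                    /* nothing to free yet */
    memcpy(prefix, a, len_a + 1);

    joined = malloc(len_a + len_b + 1);
    if (!joined)
        goto out_free_prefix;        /* only prefix needs freeing */
    memcpy(joined, prefix, len_a);
    memcpy(joined + len_a, b, len_b + 1);
    result = (int)strlen(joined);

    free(joined);
out_free_prefix:
    free(prefix);
out:
    return result;
}
```

Calling joined_length("set", "jmp") from a main() of your own should take the success path, falling through both labels on the way out.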
Non-Local Jumps
Unfortunately (or fortunately if you are a strong proponent of never using goto), it is only valid in a local context. You cannot jump to a label outside of the function in which you are currently executing. For this reason, setjmp and longjmp were added to the C standard library to support non-local jumps. Let’s take a look at a minimal example of using these functions.
minimal.c
#include <stdio.h>
#include <setjmp.h>

static jmp_buf buf;

void b()
{
    printf("in function b\n");
    longjmp(buf, 1);
}

void a()
{
    printf("in function a\n");
    if (setjmp(buf))
        printf("back in function a\n");
    else
        b();
}

int main()
{
    a();
}
gcc minimal.c -o minimal
./minimal
in function a
in function b
back in function a
We can get a good understanding of what is going on here by taking a look at the setjmp Linux manual page. Specifically for this program, the following portions of the description are important:
In this case, setjmp() returns 0.
The longjmp() function uses the information saved in env to transfer control back to the point where setjmp() was called and to restore (“rewind”) the stack to its state at the time of the setjmp() call.
Following a successful longjmp(), execution continues as if setjmp() had returned for a second time.
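To make the "returns for a second time" wording concrete, here is a small sketch that counts how often control comes out of a single setjmp() call. The names env, direct_returns, fake_returns and count_setjmp_returns are invented for this illustration and are not part of minimal.c.

```c
#include <setjmp.h>

static jmp_buf env;          /* hypothetical name; plays the role of buf */
static int direct_returns;   /* times setjmp() returned 0 */
static int fake_returns;     /* times setjmp() returned because of longjmp() */

static void jump_back(void)
{
    longjmp(env, 1);         /* control never comes back to this function */
}

/* Returns how many times control came out of the one setjmp() call. */
int count_setjmp_returns(void)
{
    direct_returns = 0;
    fake_returns = 0;

    if (setjmp(env) == 0) {
        direct_returns++;    /* first pass: direct invocation */
        jump_back();
    } else {
        fake_returns++;      /* second pass: resumed by longjmp() */
    }
    return direct_returns + fake_returns;
}
```

The counters are file-scope variables on purpose: locals that change between setjmp() and longjmp() would need to be volatile to have reliable values after the jump.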
In the simplest of terms, these two functions allow us to save an address and return to it at a later point in execution. Behind the scenes, other values are also being saved into the buf, which we will look at momentarily. First, let’s see what the actual assembly output for a 64-bit RISC-V target looks like.
objdump -S minimal
00000000000103c4 <setjmp>:
103c4: 00153023 sd ra,0(a0)
103c8: e500 sd s0,8(a0)
103ca: e904 sd s1,16(a0)
103cc: 01253c23 sd s2,24(a0)
103d0: 03353023 sd s3,32(a0)
103d4: 03453423 sd s4,40(a0)
103d8: 03553823 sd s5,48(a0)
103dc: 03653c23 sd s6,56(a0)
103e0: 05753023 sd s7,64(a0)
103e4: 05853423 sd s8,72(a0)
103e8: 05953823 sd s9,80(a0)
103ec: 05a53c23 sd s10,88(a0)
103f0: 07b53023 sd s11,96(a0)
103f4: 06253423 sd sp,104(a0)
103f8: b920 fsd fs0,112(a0)
103fa: bd24 fsd fs1,120(a0)
103fc: 09253027 fsd fs2,128(a0)
10400: 09353427 fsd fs3,136(a0)
10404: 09453827 fsd fs4,144(a0)
10408: 09553c27 fsd fs5,152(a0)
1040c: 0b653027 fsd fs6,160(a0)
10410: 0b753427 fsd fs7,168(a0)
10414: 0b853827 fsd fs8,176(a0)
10418: 0b953c27 fsd fs9,184(a0)
1041c: 0da53027 fsd fs10,192(a0)
10420: 0db53427 fsd fs11,200(a0)
10424: 4501 li a0,0
10426: 8082 ret
Here is the implementation of setjmp in RISC-V assembly. Before diving too far in, it is important to understand the registers in the RISC-V architecture. Since we are using a 64-bit machine, each of the 32 general purpose registers is 64 bits wide. Though each of the registers is classified as general purpose, there are calling conventions that most compilers will adhere to.
Register | ABI Name | Description | Saver |
---|---|---|---|
x0 | zero | hardwired zero | - |
x1 | ra | return address | Caller |
x2 | sp | stack pointer | Callee |
x3 | gp | global pointer | - |
x4 | tp | thread pointer | - |
x5-7 | t0-2 | temporary registers | Caller |
x8 | s0 / fp | saved register / frame pointer | Callee |
x9 | s1 | saved register | Callee |
x10-11 | a0-1 | function arguments / return values | Caller |
x12-17 | a2-7 | function arguments | Caller |
x18-27 | s2-11 | saved registers | Callee |
x28-31 | t3-6 | temporary registers | Caller |
In addition to the registers, we must understand the few instructions that setjmp makes use of.
sd: store doubleword (stores the value in the register specified by the first operand into the address specified by the second)
fsd: the floating point counterpart to sd
li: load immediate (a pseudo-instruction that loads the second operand directly into the register specified by the first)
If you are interested in checking out all of the instructions available when writing RISC-V assembly, take a look at the programmer’s manual.
So what exactly are we doing here? The behavior of setjmp is specified as storing information about the calling function’s environment into the buffer. If you look back to the source code of minimal.c, you’ll see that we are passing the buffer of type jmp_buf into the setjmp function in a(). If we look at the dump of a(), you can see that we are jumping and linking (jal) to the address of the setjmp function:
0000000000010174 <a>:
10174: 1141 addi sp,sp,-16
10176: e406 sd ra,8(sp)
10178: e022 sd s0,0(sp)
1017a: 0800 addi s0,sp,16
1017c: 67c9 lui a5,0x12
1017e: 7f078513 addi a0,a5,2032 # 127f0 <__errno+0x1a>
10182: 238000ef jal ra,103ba <puts>
10186: 70018513 addi a0,gp,1792 # 14830 <buf>
1018a: 23a000ef jal ra,103c4 <setjmp>
1018e: 87aa mv a5,a0
10190: c799 beqz a5,1019e <a+0x2a>
10192: 67cd lui a5,0x13
10194: 80078513 addi a0,a5,-2048 # 12800 <__errno+0x2a>
10198: 222000ef jal ra,103ba <puts>
1019c: a019 j 101a2 <a+0x2e>
1019e: fbbff0ef jal ra,10158 <b>
101a2: 0001 nop
101a4: 60a2 ld ra,8(sp)
101a6: 6402 ld s0,0(sp)
101a8: 0141 addi sp,sp,16
101aa: 8082 ret
Jump and link just means that when the function we are jumping to finishes execution, it will return to where it was called. This is achieved by storing the address of the instruction that follows the jal into register x1 (ra), the return address register. When we jump to setjmp it is going to save information about the environment into buf, which will later be used to restore the environment when longjmp is called.
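As a rough model of what the 4-byte jal at 1018a does, the sketch below computes the link value and the jump target. The struct and function names are invented for this illustration. The offset 0x23a is not printed in the dump; it is simply the distance from 1018a to setjmp at 103c4.

```c
#include <stdint.h>

/* Hypothetical container for the two effects of a jal instruction. */
struct jal_result {
    uint64_t ra;   /* link value written to x1 (ra) */
    uint64_t pc;   /* address where execution continues */
};

/* Model of a 4-byte "jal ra, offset" located at address pc. */
struct jal_result model_jal(uint64_t pc, int64_t offset)
{
    struct jal_result r;
    r.ra = pc + 4;                  /* address of the next instruction */
    r.pc = pc + (uint64_t)offset;   /* pc-relative jump target */
    return r;
}
```

With the values from the dump of a(), model_jal(0x1018a, 0x23a) gives ra = 0x1018e (the mv a5,a0 that inspects the return value) and pc = 0x103c4 (the first instruction of setjmp).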
In setjmp, we see a long sequence of store instructions (sd / fsd). Each is taking a value from a register and storing it into buf, which is essentially a sequence of long int values, defined per target architecture in the C standard library. You can see the RISC-V implementation here (you can also see where our setjmp and longjmp assembly is coming from here). Most of these store operations are just moving saved registers into buf, but three are of primary importance in our simple example:
103c4: 00153023 sd ra,0(a0)
103c8: e500 sd s0,8(a0)
...
103f4: 06253423 sd sp,104(a0)
The first instruction is storing the ra register into the first entry in buf (the number preceding (a0) is the address offset in bytes; remember that buf values are stored sequentially). Our jal instruction in a() set ra to the address of the instruction immediately after the call, meaning the first entry in buf will have the information to return execution to the exact location just past our call to setjmp in the future. The other two instructions are storing the frame pointer and the stack pointer into buf. This will be important because we will also want our stack restored when we return with longjmp.
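One way to picture the byte offsets used in the dump is as a C struct. This is only a sketch inferred from the offsets in the disassembly above: the struct name and helper functions are hypothetical, and the real jmp_buf is an opaque array type whose layout belongs to the C library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the save area, matching the offsets in the dump. */
struct rv64_jmp_buf_sketch {
    uint64_t ra;       /*   0(a0)             return address       */
    uint64_t s[12];    /*   8(a0) ..  96(a0)  s0 (fp) through s11  */
    uint64_t sp;       /* 104(a0)             stack pointer        */
    double   fs[12];   /* 112(a0) .. 200(a0)  fs0 through fs11     */
};

/* Byte offset of saved register sN (N in 0..11). */
size_t offset_of_s(int n)
{
    return offsetof(struct rv64_jmp_buf_sketch, s) + (size_t)n * sizeof(uint64_t);
}

/* Byte offset of saved floating point register fsN (N in 0..11). */
size_t offset_of_fs(int n)
{
    return offsetof(struct rv64_jmp_buf_sketch, fs) + (size_t)n * sizeof(double);
}
```

Every slot is 8 bytes wide, which is why the offsets in setjmp climb in steps of 8 from 0 to 200.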
We finish up by setting the a0 register to 0, as we previously saw detailed in the specification that setjmp must return with value 0 on direct invocation, then we return (ret), which will take us back to where we called setjmp.
Our a() function will continue execution with an evaluation of the return value of setjmp. Since it is 0, our if statement will evaluate to false, and we will call b().
...
if (setjmp(buf))
    printf("back in function a\n");
else
    b();
...
Now in b(), we print our statement, then call longjmp with buf and our desired return value (1). Let’s take a look at the assembly for b():
0000000000010158 <b>:
10158: 1141 addi sp,sp,-16
1015a: e406 sd ra,8(sp)
1015c: e022 sd s0,0(sp)
1015e: 0800 addi s0,sp,16
10160: 67c9 lui a5,0x12
10162: 7e078513 addi a0,a5,2016 # 127e0 <__errno+0xa>
10166: 254000ef jal ra,103ba <puts>
1016a: 4585 li a1,1
1016c: 70018513 addi a0,gp,1792 # 14830 <buf>
10170: 2b8000ef jal ra,10428 <longjmp>
As you can see, the final instruction is to jump and link to longjmp. However, we will not return to the address that we called from because we are going to overwrite ra in longjmp (you can test that we do not return to b() by placing another print statement after the call to longjmp). Now let’s take a closer look at our longjmp implementation.
0000000000010428 <longjmp>:
10428: 00053083 ld ra,0(a0)
1042c: 6500 ld s0,8(a0)
1042e: 6904 ld s1,16(a0)
10430: 01853903 ld s2,24(a0)
10434: 02053983 ld s3,32(a0)
10438: 02853a03 ld s4,40(a0)
1043c: 03053a83 ld s5,48(a0)
10440: 03853b03 ld s6,56(a0)
10444: 04053b83 ld s7,64(a0)
10448: 04853c03 ld s8,72(a0)
1044c: 05053c83 ld s9,80(a0)
10450: 05853d03 ld s10,88(a0)
10454: 06053d83 ld s11,96(a0)
10458: 06853103 ld sp,104(a0)
1045c: 3920 fld fs0,112(a0)
1045e: 3d24 fld fs1,120(a0)
10460: 08053907 fld fs2,128(a0)
10464: 08853987 fld fs3,136(a0)
10468: 09053a07 fld fs4,144(a0)
1046c: 09853a87 fld fs5,152(a0)
10470: 0a053b07 fld fs6,160(a0)
10474: 0a853b87 fld fs7,168(a0)
10478: 0b053c07 fld fs8,176(a0)
1047c: 0b853c87 fld fs9,184(a0)
10480: 0c053d07 fld fs10,192(a0)
10484: 0c853d87 fld fs11,200(a0)
10488: 0015b513 seqz a0,a1
1048c: 952e add a0,a0,a1
1048e: 8082 ret
It looks a lot like setjmp, but instead of storing into buf, we are now loading back into registers. Our return address will be set to 1018e (the first entry in buf, 0(a0)), the instruction in a() immediately after our original call to setjmp. Similarly, our stack pointer (sp) and frame pointer (s0) are going to point to the same addresses as they did when we initially called setjmp.
Note: Resetting the frame and stack pointers can cause surprising behavior. Think about how returning from the function that calls setjmp before jumping back into it with longjmp could lead to errors on the stack. When a function returns, its stack frame is released, meaning the values it stored on the stack can be overwritten by later calls. The sp and s0 saved in buf would then point into memory that no longer belongs to that function, which is why the manual page describes calling longjmp after the function that called setjmp has returned as undefined behavior. Our minimal example avoids this because a() is still active when b() calls longjmp.
The last three instructions once again implement part of the specification in the Linux manual page:
This “fake” return can be distinguished from a true setjmp() call because the “fake” return returns the value provided in val. If the programmer mistakenly passes the value 0 in val, the “fake” return will instead return 1.
To do so, it uses two new instructions:
seqz: set if equal to zero (sets value of first operand to 1 if second operand equals 0, otherwise sets to 0)
add: add (sets value of first operand to the sum of the second two)
These effectively work together to make sure that longjmp either returns the int passed to it (in register a1) or 1. If a1 is equal to 0, the seqz will set a0 to 1, then add will add a0 (1) and a1 (0) and store the result (1) in a0. If a1 does not equal 0, seqz will set a0 to 0, then add will add a0 (0) and a1 (passed value) and store the result (passed value) in a0. Then we will return to the address specified in ra, which we restored from buf.
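The same arithmetic can be written out in C. In this sketch, fake_return_value is a made-up model of the seqz/add pair, and longjmp_zero_result (also a made-up name) asks the real library what a longjmp with val equal to 0 produces.

```c
#include <setjmp.h>

/* C model of the last two arithmetic instructions in longjmp. */
int fake_return_value(int val)
{
    int a0 = (val == 0);   /* seqz a0, a1     : 1 if val is 0, else 0 */
    a0 = a0 + val;         /* add  a0, a0, a1 : 1 + 0, or 0 + val     */
    return a0;
}

static jmp_buf env;        /* hypothetical name; plays the role of buf */

/* Passes 0 to longjmp() and reports what setjmp() then appears to return. */
int longjmp_zero_result(void)
{
    switch (setjmp(env)) {
    case 0:
        longjmp(env, 0);   /* direct return: jump back, passing 0 */
    case 1:
        return 1;          /* the "fake" return was turned into 1 */
    default:
        return -1;         /* any other value would be unexpected */
    }
}
```

The switch form is used because the C standard only allows setjmp() to appear in a few simple contexts, such as the controlling expression of an if or switch.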
Wrapping Up
Non-local jumps are not likely to be used extensively in most projects. However, they do a suitable job of demonstrating how higher-level programming concepts translate to machine code. They also introduce the beginning concepts of how concurrency can be implemented. In fact, setjmp and longjmp have been used to implement basic coroutines.
Send me a message @hasheddan on Twitter for any questions or comments!