Understanding Non-Local Jumps (setjmp/longjmp) in RISC-V Assembly

October 25, 2020

This post explores RISC-V assembly by examining the implementation of the setjmp and longjmp functions from the C standard library. I frequently find that I grasp concepts more quickly when I have actual code that I can disassemble because it allows me to connect information with intent. I believe RISC-V and similar efforts will fundamentally shift how computers are made and programmed. I hope that sharing my knowledge will inspire the same joy in others that I feel when imagining a future of open hardware.

Note: We will be using the RISC-V GNU toolchain throughout this post. If you would like to follow along, you can cross-compile the toolchain for a RISC-V target, or you can download a prebuilt toolchain and emulator. You can also compile and run the programs using a different target toolchain, but the assembly dump will be specific to that architecture. If you are more familiar with a different ISA, this may be a useful way to learn about RISC-V!

A Look at Local Jumps

Jumps are one of the fundamental components of control flow in programming. Nearly every Instruction Set Architecture (ISA) has a jump instruction that modifies the program counter so that the next instruction is fetched from a specified location in memory. Higher-level languages provide control flow statements with similar logic, such as the goto statement in C. Using goto is frowned upon by many C programmers because it makes it easy to write horribly complex code, but it does have useful applications, such as the centralized exiting of functions described in the Linux kernel coding style guide. It allows you to define a label, then “go to” that place in your program and resume execution from there. For example, the following program will print Here followed by a newline character over and over again.

goto.c


#include <stdio.h>

int main()
{
jumptome:
    printf("Here\n");
    goto jumptome;
}

gcc goto.c -o goto
./goto

Let’s take a look at the generated assembly for main() using objdump:


0000000000010158 <main>:
   10158:    1141                    addi   sp,sp,-16
   1015a:    e406                    sd     ra,8(sp)
   1015c:    e022                    sd     s0,0(sp)
   1015e:    0800                    addi   s0,sp,16
   10160:    67c9                    lui    a5,0x12
   10162:    6c078513                addi   a0,a5,1728 # 126c0 <__errno+0xe>
   10166:    1fc000ef                jal    ra,10362 <puts>
   1016a:    bfdd                    j      10160 <main+0x8>

As you can see, using goto in the program translates directly to the RISC-V j instruction, which jumps to memory address 10160, causing the processor to continuously execute our printf statement. It would be much clearer to use a while statement here, and we actually get the exact same assembly output with an infinite loop:

while.c


#include <stdio.h>

int main()
{
    while (1)
    {
        printf("Here\n");
    }
}

gcc while.c -o while
objdump -D while

0000000000010158 <main>:
   10158:    1141                    addi    sp,sp,-16
   1015a:    e406                    sd      ra,8(sp)
   1015c:    e022                    sd      s0,0(sp)
   1015e:    0800                    addi    s0,sp,16
   10160:    67c9                    lui     a5,0x12
   10162:    6c078513                addi    a0,a5,1728 # 126c0 <__errno+0xe>
   10166:    1fc000ef                jal     ra,10362 <puts>
   1016a:    bfdd                    j       10160 <main+0x8>

However, goto can provide more functionality than a loop, and is specifically useful for breaking out of a set of deeply nested loops. The aforementioned Linux kernel style guide gives the following example for an appropriate use of goto:


int fun(int a)
{
    int result = 0;
    char *buffer;

    buffer = kmalloc(SIZE, GFP_KERNEL);
    if (!buffer)
        return -ENOMEM;

    if (condition1) {
        while (loop1) {
            ...
        }
        result = 1;
        goto out_free_buffer;
    }
    ...
out_free_buffer:
    kfree(buffer);
    return result;
}

In this case, a descriptive label is being used to define a specific error path. Though only a small part of the function body is included here, you can imagine that there could be multiple stages in which the allocated buffer could become full, all of which you would handle by freeing the memory and returning the result. While some may still advocate for never using goto, this demonstrates that there are some benefits, such as not needing to duplicate cleanup code throughout the function body.
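The same pattern works outside the kernel. Here is a minimal userspace sketch of it, assuming a hypothetical helper named measure_copy and an arbitrary 64-byte buffer; neither comes from the style guide. Every exit after the allocation passes through a single label, so the call to free() is written only once.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64   /* arbitrary size chosen for this sketch */

/* Hypothetical helper: copies text into a heap buffer and returns its
 * length, or -1 on any failure. */
int measure_copy(const char *text)
{
    int result = -1;
    char *buffer = malloc(BUF_SIZE);

    if (!buffer)
        return -1;               /* nothing to clean up yet */

    if (text == NULL)
        goto out_free_buffer;    /* error path: result stays -1 */

    if (strlen(text) >= BUF_SIZE)
        goto out_free_buffer;    /* text would not fit in the buffer */

    strcpy(buffer, text);
    result = (int)strlen(buffer);

out_free_buffer:
    free(buffer);                /* single cleanup point for all paths */
    return result;
}
```

Calling measure_copy("hello") returns 5, while a NULL or oversized argument returns -1; in all three cases the buffer is released by the same free() call.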

Non-Local Jumps

Unfortunately (or fortunately if you are a strong proponent of never using goto), goto is only valid in a local context. You cannot jump to a label outside of the function in which you are currently executing. For this reason, setjmp and longjmp were added to the C standard library to support non-local jumps. Let’s take a look at a minimal example of using these functions.

minimal.c


#include <stdio.h>
#include <setjmp.h>

static jmp_buf buf;

void b()
{
    printf("in function b\n");
    longjmp(buf, 1);
}

void a()
{
    printf("in function a\n");
    if (setjmp(buf))
        printf("back in function a\n");
    else
        b();
}

int main()
{
    a();
}

gcc minimal.c -o minimal
./minimal

in function a
in function b
back in function a

We can get a good understanding of what is going on here by taking a look at the setjmp Linux manual page. Specifically for this program, the following portions of the description are important:

In this case, setjmp() returns 0.

The longjmp() function uses the information saved in env to transfer control back to the point where setjmp() was called and to restore (“rewind”) the stack to its state at the time of the setjmp() call.

Following a successful longjmp(), execution continues as if setjmp() had returned for a second time.
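The "returns for a second time" behavior can be observed directly. The sketch below is an illustration only: the names env, returns_seen, nonzero_returns and count_setjmp_returns are made up for this example. It uses static variables (not locals) to count how often control comes out of a single setjmp() call.

```c
#include <setjmp.h>

static jmp_buf env;
static int returns_seen;      /* how many times setjmp() has returned */
static int nonzero_returns;   /* how many of those were "fake" returns */

int count_setjmp_returns(void)
{
    returns_seen = 0;
    nonzero_returns = 0;

    if (setjmp(env) == 0) {
        /* Direct invocation: setjmp() returned 0. */
        returns_seen++;
        longjmp(env, 1);      /* jump back into the setjmp() call above */
    } else {
        /* Second return, caused by longjmp(). */
        returns_seen++;
        nonzero_returns++;
    }
    return returns_seen;
}
```

A single call to count_setjmp_returns() reports two returns from setjmp(), exactly one of which is nonzero.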

In the simplest of terms, these two functions allow us to save an address and return to it at a later point in execution. Behind the scenes, other values are also being saved into the buf, which we will look at momentarily. First, let’s see what the actual assembly output for a 64-bit RISC-V target looks like.

objdump -S minimal

00000000000103c4 <setjmp>:
   103c4:    00153023              sd    ra,0(a0)
   103c8:    e500                  sd    s0,8(a0)
   103ca:    e904                  sd    s1,16(a0)
   103cc:    01253c23              sd    s2,24(a0)
   103d0:    03353023              sd    s3,32(a0)
   103d4:    03453423              sd    s4,40(a0)
   103d8:    03553823              sd    s5,48(a0)
   103dc:    03653c23              sd    s6,56(a0)
   103e0:    05753023              sd    s7,64(a0)
   103e4:    05853423              sd    s8,72(a0)
   103e8:    05953823              sd    s9,80(a0)
   103ec:    05a53c23              sd    s10,88(a0)
   103f0:    07b53023              sd    s11,96(a0)
   103f4:    06253423              sd    sp,104(a0)
   103f8:    b920                  fsd    fs0,112(a0)
   103fa:    bd24                  fsd    fs1,120(a0)
   103fc:    09253027              fsd    fs2,128(a0)
   10400:    09353427              fsd    fs3,136(a0)
   10404:    09453827              fsd    fs4,144(a0)
   10408:    09553c27              fsd    fs5,152(a0)
   1040c:    0b653027              fsd    fs6,160(a0)
   10410:    0b753427              fsd    fs7,168(a0)
   10414:    0b853827              fsd    fs8,176(a0)
   10418:    0b953c27              fsd    fs9,184(a0)
   1041c:    0da53027              fsd    fs10,192(a0)
   10420:    0db53427              fsd    fs11,200(a0)
   10424:    4501                  li    a0,0
   10426:    8082                  ret

Here is the implementation of setjmp in RISC-V assembly. Before diving too far in, it is important to understand the registers in the RISC-V architecture. Since we are using a 64-bit machine, each of the 32 general purpose registers is 64 bits wide. Though each of the registers is classified as general purpose, there are calling conventions that most compilers will adhere to.

Register  ABI Name  Description                         Saver
x0        zero      hardwired zero                      -
x1        ra        return address                      Caller
x2        sp        stack pointer                       Callee
x3        gp        global pointer                      -
x4        tp        thread pointer                      -
x5-7      t0-2      temporary registers                 Caller
x8        s0 / fp   saved register / frame pointer      Callee
x9        s1        saved register                      Callee
x10-11    a0-1      function arguments / return values  Caller
x12-17    a2-7      function arguments                  Caller
x18-27    s2-11     saved registers                     Callee
x28-31    t3-6      temporary registers                 Caller

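In addition to the registers, we must understand the few instructions that setjmp makes use of. Of these, only li is a pseudoinstruction that the assembler expands into a real instruction; sd and fsd are ordinary store instructions.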

  • sd: store doubleword (stores the value in the register specified by the first operand into the address specified by the second)
  • fsd: the floating point counterpart to sd
  • li: load immediate (loads the second operand directly into the register specified by the first)

If you are interested in checking out all of the instructions available when writing RISC-V assembly, take a look at the programmer’s manual.

So what exactly are we doing here? The behavior of setjmp is specified as storing information about the calling function’s environment into the buffer. If you look back to the source code of minimal.c, you’ll see that we are passing the buffer of type jmp_buf into the setjmp function in a(). If we look at the dump of a() you can see that we are jumping and linking (jal) to the address of the setjmp function:


0000000000010174 <a>:
   10174:    1141                    addi    sp,sp,-16
   10176:    e406                    sd      ra,8(sp)
   10178:    e022                    sd      s0,0(sp)
   1017a:    0800                    addi    s0,sp,16
   1017c:    67c9                    lui     a5,0x12
   1017e:    7f078513                addi    a0,a5,2032 # 127f0 <__errno+0x1a>
   10182:    238000ef                jal     ra,103ba <puts>
   10186:    70018513                addi    a0,gp,1792 # 14830 <buf>
   1018a:    23a000ef                jal     ra,103c4 <setjmp>
   1018e:    87aa                    mv      a5,a0
   10190:    c799                    beqz    a5,1019e <a+0x2a>
   10192:    67cd                    lui     a5,0x13
   10194:    80078513                addi    a0,a5,-2048 # 12800 <__errno+0x2a>
   10198:    222000ef                jal     ra,103ba <puts>
   1019c:    a019                    j       101a2 <a+0x2e>
   1019e:    fbbff0ef                jal     ra,10158 <b>
   101a2:    0001                    nop
   101a4:    60a2                    ld      ra,8(sp)
   101a6:    6402                    ld      s0,0(sp)
   101a8:    0141                    addi    sp,sp,16
   101aa:    8082                    ret

Jump and link means that the processor records where to resume before it jumps: the address of the instruction following the jal is stored in register x1 (ra), the return address register, so that the called function can return to its caller when it finishes. When we jump to setjmp, it saves information about the environment into buf, which will later be used to restore the environment when longjmp is called.
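To make the jump-and-link arithmetic concrete, here is a small sketch that decodes the jal encodings shown in the dump of a(). The helper names (jal_offset, jal_target, jal_link, jal_rd) are hypothetical; the bit layout is the J-type format from the RISC-V base ISA, and the link value of pc + 4 assumes the 32-bit (uncompressed) encoding used here.

```c
#include <stdint.h>

/* Extract the signed byte offset from a 32-bit J-type (jal) instruction.
 * The immediate is scattered across the word:
 *   insn[31]    -> imm[20]      insn[30:21] -> imm[10:1]
 *   insn[20]    -> imm[11]      insn[19:12] -> imm[19:12]
 * imm[0] is always zero. */
int32_t jal_offset(uint32_t insn)
{
    int32_t imm = 0;
    imm |= (int32_t)((insn >> 31) & 0x1)   << 20;
    imm |= (int32_t)((insn >> 21) & 0x3ff) << 1;
    imm |= (int32_t)((insn >> 20) & 0x1)   << 11;
    imm |= (int32_t)((insn >> 12) & 0xff)  << 12;
    if (imm & (1 << 20))        /* sign-extend the 21-bit value */
        imm -= (1 << 21);
    return imm;
}

/* Address that execution jumps to. */
uint64_t jal_target(uint64_t pc, uint32_t insn)
{
    return (uint64_t)((int64_t)pc + jal_offset(insn));
}

/* Value written to the destination register: the next instruction. */
uint64_t jal_link(uint64_t pc)
{
    return pc + 4;
}

/* Destination register number (1 means x1, i.e. ra). */
unsigned jal_rd(uint32_t insn)
{
    return (insn >> 7) & 0x1f;
}
```

For the instruction 23a000ef at address 1018a, this gives a target of 103c4 (setjmp), a destination register of x1 (ra), and a link value of 1018e, which is the mv a5,a0 that follows the call. The backward call fbbff0ef at 1019e decodes to 10158, the start of b().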

In setjmp, we see a long sequence of store instructions (sd / fsd). Each takes a value from a register and stores it into buf, which is essentially a sequence of long int values, defined per target architecture in the C standard library. You can see the RISC-V implementation here (you can also see where our setjmp and longjmp assembly is coming from here). Most of these store operations are just moving saved registers into buf, but three are of primary importance in our simple example:


   103c4:    00153023              sd    ra,0(a0)
   103c8:    e500                  sd    s0,8(a0)
   ...
   103f4:    06253423              sd    sp,104(a0)

The first instruction stores the ra register into the first entry in buf (the number preceding (a0) is the address offset in bytes; remember that buf values are stored sequentially). Our jal instruction in a() set ra to the address of the instruction immediately after the call (1018e), meaning the first entry in buf holds the information needed to return execution to the point just after our setjmp call in the future. The other two instructions store the frame pointer and the stack pointer into buf. This is important because we also want our stack restored when we return with longjmp.
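The offsets in the setjmp dump can be written down as a C struct. This is only a sketch of the layout implied by the disassembly, not the library's actual jmp_buf definition: the struct name jmp_buf_sketch is made up, and it assumes that double is 8 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the buffer as setjmp fills it in the dump above. */
struct jmp_buf_sketch {
    uint64_t ra;       /* sd  ra,0(a0)                      */
    uint64_t s[12];    /* sd  s0,8(a0)   ... sd s11,96(a0)   */
    uint64_t sp;       /* sd  sp,104(a0)                     */
    double   fs[12];   /* fsd fs0,112(a0) ... fsd fs11,200(a0) */
};

/* Compile-time checks that the sketch matches the offsets in the dump. */
_Static_assert(offsetof(struct jmp_buf_sketch, ra) == 0,   "ra at 0");
_Static_assert(offsetof(struct jmp_buf_sketch, s)  == 8,   "s0 at 8");
_Static_assert(offsetof(struct jmp_buf_sketch, sp) == 104, "sp at 104");
_Static_assert(offsetof(struct jmp_buf_sketch, fs) == 112, "fs0 at 112");
```

With this layout the last floating point slot, fs[11], sits at offset 112 + 11 * 8 = 200, matching the final fsd in the dump, and the saved portion occupies 208 bytes in total.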

We finish up by setting the a0 register to 0, as we previously saw detailed in the specification that setjmp must return with value 0 on direct invocation, then we return (ret), which will take us back to where we called setjmp.

Our a() function will continue execution with an evaluation of the return value of setjmp. Since it is 0, our if statement will evaluate to false, and we will call b().

    ...
    if (setjmp(buf))
        printf("back in function a\n");
    else
        b();
    ...

Now in b(), we print our statement, then call longjmp with buf and our desired return value (1). Let’s take a look at the assembly for b():


0000000000010158 <b>:
   10158:    1141                    addi    sp,sp,-16
   1015a:    e406                    sd      ra,8(sp)
   1015c:    e022                    sd      s0,0(sp)
   1015e:    0800                    addi    s0,sp,16
   10160:    67c9                    lui     a5,0x12
   10162:    7e078513                addi    a0,a5,2016 # 127e0 <__errno+0xa>
   10166:    254000ef                jal     ra,103ba <puts>
   1016a:    4585                    li      a1,1
   1016c:    70018513                addi    a0,gp,1792 # 14830 <buf>
   10170:    2b8000ef                jal     ra,10428 <longjmp>

As you can see, the final instruction is to jump and link to longjmp. However, we will not return to the address that we called from because we are going to overwrite ra in longjmp (you can test that we do not return to b() by placing another print statement after the call to longjmp). Now let’s take a closer look at our longjmp implementation.
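The suggested experiment can be sketched without print statements by recording flags instead. The names below (env, reached_after_longjmp, resumed_in_caller, jumper, run_jump) are illustrative only; jumper plays the role of b() and run_jump the role of a().

```c
#include <setjmp.h>

static jmp_buf env;
static int reached_after_longjmp;  /* set only if longjmp() ever returned */
static int resumed_in_caller;      /* set when setjmp() returns nonzero */

static void jumper(void)
{
    longjmp(env, 1);
    reached_after_longjmp = 1;     /* never executed: ra was overwritten */
}

int run_jump(void)
{
    reached_after_longjmp = 0;
    resumed_in_caller = 0;

    if (setjmp(env))
        resumed_in_caller = 1;     /* we arrive here via longjmp() */
    else
        jumper();

    return resumed_in_caller;
}
```

After run_jump() returns, resumed_in_caller is 1 and reached_after_longjmp is still 0, showing that control went straight back to the setjmp call site rather than to the statement after longjmp.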

0000000000010428 <longjmp>:
   10428:    00053083                  ld        ra,0(a0)
   1042c:    6500                      ld        s0,8(a0)
   1042e:    6904                      ld        s1,16(a0)
   10430:    01853903                  ld        s2,24(a0)
   10434:    02053983                  ld        s3,32(a0)
   10438:    02853a03                  ld        s4,40(a0)
   1043c:    03053a83                  ld        s5,48(a0)
   10440:    03853b03                  ld        s6,56(a0)
   10444:    04053b83                  ld        s7,64(a0)
   10448:    04853c03                  ld        s8,72(a0)
   1044c:    05053c83                  ld        s9,80(a0)
   10450:    05853d03                  ld        s10,88(a0)
   10454:    06053d83                  ld        s11,96(a0)
   10458:    06853103                  ld        sp,104(a0)
   1045c:    3920                      fld        fs0,112(a0)
   1045e:    3d24                      fld        fs1,120(a0)
   10460:    08053907                  fld        fs2,128(a0)
   10464:    08853987                  fld        fs3,136(a0)
   10468:    09053a07                  fld        fs4,144(a0)
   1046c:    09853a87                  fld        fs5,152(a0)
   10470:    0a053b07                  fld        fs6,160(a0)
   10474:    0a853b87                  fld        fs7,168(a0)
   10478:    0b053c07                  fld        fs8,176(a0)
   1047c:    0b853c87                  fld        fs9,184(a0)
   10480:    0c053d07                  fld        fs10,192(a0)
   10484:    0c853d87                  fld        fs11,200(a0)
   10488:    0015b513                  seqz       a0,a1
   1048c:    952e                      add        a0,a0,a1
   1048e:    8082                      ret

It looks a lot like setjmp, but instead of storing into buf, we are now loading back into registers. Our return address will be set to 1018e (the first entry in buf, 0(a0)), the instruction in a() immediately after our original call to setjmp. Similarly, our stack pointer (sp) and frame pointer (s0) are going to point to the same addresses as they did when we initially called setjmp.

Note: Resetting the frame and stack pointers can cause surprising behavior. Think about what happens if the function that called setjmp returns before longjmp is called. When a function returns, its stack frame is released, so later calls can overwrite the values that were stored there, and longjmp would restore sp and s0 to point into a frame that no longer exists. Our minimal example does not run into this because a() is still active when b() calls longjmp.

The last three instructions once again implement part of the specification in the Linux manual page:

This “fake” return can be distinguished from a true setjmp() call because the “fake” return returns the value provided in val. If the programmer mistakenly passes the value 0 in val, the “fake” return will instead return 1.

To do so, it uses two new instructions:

  • seqz: set if equal to zero (sets value of first operand to 1 if second operand equals 0, otherwise sets to 0)
  • add: add (sets value of first operand to the sum of the second two)

These effectively work together to make sure that longjmp either returns the int passed to it (in register a1) or 1. If a1 is equal to 0, the seqz will set a0 to 1, then add will add a0 (1) and a1 (0) and store the result (1) in a0. If a1 does not equal 0, seqz will set a0 to 0, then add will add a0 (0) and a1 (passed value) and store the result (passed value) in a0. Then we will return to the address specified in ra, which we restored from buf.
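The two-instruction sequence can be modeled in C to check the reasoning above. This is a model of the arithmetic only, with a hypothetical function name; it does not perform a jump.

```c
/* Models the value that longjmp leaves in a0 before ret.
 * val corresponds to register a1 on entry. */
int longjmp_return_value(int val)
{
    int a0 = (val == 0);   /* seqz a0, a1 : 1 if val is 0, otherwise 0 */
    a0 = a0 + val;         /* add  a0, a0, a1 */
    return a0;
}
```

Passing 0 yields 1, and any other value is returned unchanged, which is the behavior the manual page describes for the "fake" return.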

Wrapping Up

Non-local jumps are not likely to be used extensively in most projects. However, they do a suitable job of demonstrating how higher-level programming concepts translate to machine code. They also introduce the beginning concepts of how concurrency can be implemented. In fact, setjmp and longjmp have been used to implement basic coroutines.

Send me a message @hasheddan on Twitter for any questions or comments!