RISC-V Bytes: Go 1.19's Register-Based Calling Convention

August 8, 2022

In our last post in the RISC-V Bytes series, I briefly alluded to the proposal to switch the Go Application Binary Interface (ABI) from a stack-based calling convention to a register-based calling convention. I also mentioned that it appeared at that time that the RISC-V port would support the new calling convention as early as Go 1.19.

Last week, Go 1.19 was officially released, and sure enough, tucked in the release notes was a section mentioning that the riscv64 port now supports passing arguments and results using registers. Go 1.19 is notably lighter than the 1.18 release, which included the long-awaited support for generics, but even with the lack of major new features, the RISC-V calling convention change seemed to, anecdotally, garner less attention than other updates. Perhaps this is due to the lack of available hardware for running meaningful RISC-V programs that would traditionally be written in Go (i.e. programs that run in userspace), or perhaps it is less newsworthy due to the fact that Go 1.17 initially added support for the register-based calling convention on linux/amd64, darwin/amd64, and windows/amd64, and 1.18 expanded to all operating systems for amd64, arm64, ppc64, and ppc64le. Needless to say, it is hardly a new feature.

However, readers of this series are likely more interested in RISC-V than the average Go user, and I haven’t seen any in-depth comparison of the calling convention before and after this change, so we are going to dedicate this post to diving in!

Getting Set Up

Before we get started, we need to ensure we have Go 1.18 and Go 1.19 both installed on our system. Fortunately, Go makes it very easy to manage multiple installed versions. The following commands will download go1.18.5, the latest patch release for 1.18, and go1.19.


$ go install golang.org/dl/go1.18.5@latest
go: downloading golang.org/dl v0.0.0-20220802161245-b76e4016dde5
$ go1.18.5 download
Downloaded   0.0% (     3234 / 141855946 bytes) ...
Downloaded   0.4% (   622592 / 141855946 bytes) ...
Downloaded  31.5% ( 44629680 / 141855946 bytes) ...
Downloaded  59.1% ( 83888666 / 141855946 bytes) ...
Downloaded  82.8% (117442840 / 141855946 bytes) ...
Downloaded 100.0% (141855946 / 141855946 bytes)
Unpacking /home/dan/sdk/go1.18.5/go1.18.5.linux-amd64.tar.gz ...
Success. You may now run 'go1.18.5'
$ go install golang.org/dl/go1.19@latest
$ go1.19 download
Downloaded   0.0% (    16384 / 148796421 bytes) ...
Downloaded  29.3% ( 43597536 / 148796421 bytes) ...
Downloaded  67.7% (100760832 / 148796421 bytes) ...
Downloaded 100.0% (148796421 / 148796421 bytes)
Unpacking /home/dan/sdk/go1.19/go1.19.linux-amd64.tar.gz ...
Success. You may now run 'go1.19'

For consistency, we’ll use the same simple program to demonstrate the stack-based and register-based calling conventions.

main.go


package main

import (
	"fmt"
)

//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(1, 2))
}

Note: we decorate our add() function with //go:noinline because we are compiling with optimization and want to demonstrate passing arguments and a return value between procedures. Without the directive, the compiler would eliminate the overhead of passing data between the two procedures by bringing the body of add() into main().

1.18 and Stack-Based Calling Convention

To look at how the stack-based calling convention works in Go 1.18, we need to build our program for RISC-V with the go1.18.5 version we installed.

$ GOOS=linux GOARCH=riscv64 go1.18.5 build -o main1.18 main.go

Next, we can jump into the disassembled machine code using objdump and less, and searching for main.main.

$ riscv64-unknown-linux-gnu-objdump -D main1.18 | less

We should find main.add right before main.main in the dump.


00000000000900c0 <main.add>:
   900c0:       00813283                ld      t0,8(sp)
   900c4:       01013303                ld      t1,16(sp)
   900c8:       006282b3                add     t0,t0,t1
   900cc:       00513c23                sd      t0,24(sp)
   900d0:       00008067                ret
   900d4:       0000                    unimp
        ...

00000000000900d8 <main.main>:
   900d8:       010db503                ld      a0,16(s11)
   900dc:       00256663                bltu    a0,sp,900e8 <main.main+0x10>
   900e0:       b20da2ef                jal     t0,6a400 <runtime.morestack_noctxt>
   900e4:       ff5ff06f                j       900d8 <main.main>
   900e8:       fa113423                sd      ra,-88(sp)
   900ec:       fa810113                addi    sp,sp,-88
   900f0:       00100293                li      t0,1
   900f4:       00513423                sd      t0,8(sp)
   900f8:       00200293                li      t0,2
   900fc:       00513823                sd      t0,16(sp)
   90100:       fc1ff0ef                jal     ra,900c0 <main.add>
   90104:       01813283                ld      t0,24(sp)
   90108:       04013423                sd      zero,72(sp)
   9010c:       04013823                sd      zero,80(sp)
   90110:       00513423                sd      t0,8(sp)
   90114:       8ed880ef                jal     ra,18a00 <runtime.convT64>
   90118:       01013283                ld      t0,16(sp)
   9011c:       00016317                auipc   t1,0x16
   90120:       66430313                addi    t1,t1,1636 # a6780 <type.*+0x6780>
   90124:       04613423                sd      t1,72(sp)
   90128:       04513823                sd      t0,80(sp)
   9012c:       000b8297                auipc   t0,0xb8
   90130:       fc42b283                ld      t0,-60(t0) # 1480f0 <os.Stdout>
   90134:       00044317                auipc   t1,0x44
   90138:       34430313                addi    t1,t1,836 # d4478 <go.itab.*os.File,io.Writer>
   9013c:       00613423                sd      t1,8(sp)
   90140:       00513823                sd      t0,16(sp)
   90144:       04810293                addi    t0,sp,72
   90148:       00513c23                sd      t0,24(sp)
   9014c:       00100293                li      t0,1
   90150:       02513023                sd      t0,32(sp)
   90154:       02513423                sd      t0,40(sp)
   90158:       c08fa0ef                jal     ra,8a560 <fmt.Fprintln>
   9015c:       00013083                ld      ra,0(sp)
   90160:       05810113                addi    sp,sp,88
   90164:       00008067                ret

This looks pretty similar to the assembly in our last post. We’ll focus in specifically on the instructions starting at 900f0 in main.main. We are:

  1. Loading 1 into temporary register t0. (address: 900f0)
  2. Storing the contents of t0 on the stack. (address: 900f4)
  3. Loading 2 into temporary register t1. (address: 900f8)
  4. Storing the contents of t1 on the stack. (address: 900fc)
  5. Jumping to main.add. (address: 90100)

At this point we have both of our arguments that we want to pass to add() (i.e. a and b) on the stack, and we are transferring control. Because add() is a small enough procedure, the compiler avoids performing the customary stack guard check (compare to the first few instructions in main()). The purpose of an ABI is to allow for procedures to communicate data by adhering to a set of rules. In this case, add() expects that its caller (main()) has placed the required arguments on the stack because that is what the Go ABI is dictating. Within add() we are:

  1. Loading the value stored at the memory address at an 8 byte offset from the stack pointer (sp) into temporary register t0. (address: 900c0)
  2. Loading the value stored at the memory address at a 16 byte offset from the stack pointer (sp) into temporary register t1. (address: 900c4)
  3. Adding the contents of t0 and t1. (address: 900c8)
  4. Storing the result in t0 at the memory address at a 24 byte offset from the stack pointer (sp). (address: 900cc)
  5. Returning to the address in the return address register (ra). (address: 900d0) This was automatically set to the address of the next instruction in main.main when we executed jal ra,900c0 <main.add>.

Check yourself: why are values on the stack stored 8 bytes apart from each other? Because we are targeting a 64-bit architecture (i.e. riscv64), the default int size is 64 bits (i.e. 8 bytes).

Similarly, when control is returned to main(), it assumes that add() has placed the return value on the stack, so it loads the values at the memory address at a 24 byte offset from the stack pointer (sp) into t0 (address: 90104), before subsequently storing it on the stack again before calling runtime.convT64 (address: 90114).

You might already be noticing that the load-store nature of RISC-V, coupled with Go’s requirement that arguments and return values be passed on the stack, is creating some unnecessary overhead here.

1.19 and Register-Based Calling Convention

Let’s compile with Go 1.19 and see how the generated machine code changes.

$ GOOS=linux GOARCH=riscv64 go1.19 build -o main1.19 main.go
$ riscv64-unknown-linux-gnu-objdump -D main1.19 | less

We’ll again find main.main and main.add in the disassembly, but the total number of instructions is noticeably less.


000000000008bcb0 <main.add>:
   8bcb0:       00b50533                add     a0,a0,a1
   8bcb4:       00008067                ret
        ...

000000000008bcc0 <main.main>:
   8bcc0:       010db303                ld      t1,16(s11)
   8bcc4:       00236663                bltu    t1,sp,8bcd0 <main.main+0x10>
   8bcc8:       b90dc2ef                jal     t0,68058 <runtime.morestack_noctxt.abi0>
   8bccc:       ff5ff06f                j       8bcc0 <main.main>
   8bcd0:       fc113023                sd      ra,-64(sp)
   8bcd4:       fc010113                addi    sp,sp,-64
   8bcd8:       00113023                sd      ra,0(sp)
   8bcdc:       00100513                li      a0,1
   8bce0:       00200593                li      a1,2
   8bce4:       fcdff0ef                jal     ra,8bcb0 <main.add>
   8bce8:       02013823                sd      zero,48(sp)
   8bcec:       02013c23                sd      zero,56(sp)
   8bcf0:       b488c0ef                jal     ra,18038 <runtime.convT64>
   8bcf4:       0000b297                auipc   t0,0xb
   8bcf8:       d2c28293                addi    t0,t0,-724 # 96a20 <type.*+0x6a20>
   8bcfc:       02513823                sd      t0,48(sp)
   8bd00:       02a13c23                sd      a0,56(sp)
   8bd04:       000bc597                auipc   a1,0xbc
   8bd08:       42c5b583                ld      a1,1068(a1) # 148130 <os.Stdout>
   8bd0c:       0003a517                auipc   a0,0x3a
   8bd10:       6fc50513                addi    a0,a0,1788 # c6408 <go.itab.*os.File,io.Writer>
   8bd14:       03010613                addi    a2,sp,48
   8bd18:       00100693                li      a3,1
   8bd1c:       00068713                mv      a4,a3
   8bd20:       a10fb0ef                jal     ra,86f30 <fmt.Fprintln>
   8bd24:       00013083                ld      ra,0(sp)
   8bd28:       04010113                addi    sp,sp,64
   8bd2c:       00008067                ret

Where we previously had to first load our values into registers, then store them on the stack, we now only load them into registers before jumping to main.add. The sequence consists of:

  1. Loading 1 into argument register a0. (address: 8bcdc)
  2. Loading 2 into argument register a1. (address: 8bce0)
  3. Jumping to main.add. (address: 8bce4)

add() is even more drastically simplified, only containing the add instruction (address: 8bcb0), then returning to main(). Just as the calling convention dictated in Go 1.18 that arguments and return values would be at a certain location on the stack, Go 1.19 dictates that they will be present in the specified argument registers.

Less instructions, assuming constant cycles per instruction (CPI), is more efficient. However, we are not only eliminating instructions, but specifically eliminating more expensive instructions. The example we are using is trivial, but in some cases the number of instructions may not be strictly less than the stack-based calling convention. For example, if the values were not already in the correct registers we would need to move them. However, accessing registers is much faster than accessing main memory, despite modern processors using sophisticated caching to make stack access quite fast. The proposal for the register-based ABI states that accessing registers is roughly 40% faster than accessing arguments on the stack (see linked benchmark). The release notes indicate that the register-based ABI results in typical performance improvements of 10% or more for RISC-V programs.

A side benefit is that registers are being used more conventionally as described in the RISC-V ABI. However, it is important to note that Go is still not adhering to platform ABIs. The proposal provides justification for why adhering to platform ABIs would not be optimal for Go’s language design, noting the differences between Go and C, as well as the value of maintaining a simple common ABI across platforms.

Concluding Thoughts

One of the major benefits of using high-level languages is that simply upgrading to a newer version can result in your program becoming significantly more performant. While most developers won’t need to dig into what’s happening behind the scenes, doing so can give you a better grasp of how design decisions you make in your code lead to actual instructions being executed on the machine. Furthermore, if you are interested in working on compilers and other parts of language toolchains, one of the best ways to get started is learning how the toolchain works for languages you already use.

As always, these posts are meant to serve as a useful resource for folks who are interested in learning more about RISC-V and low-level software in general. If I can do a better job of reaching that goal, or you have any questions or comments, please feel free to send me a message @hasheddan on Twitter!