In our last post in the RISC-V Bytes series, I briefly alluded to the proposal to switch the Go Application Binary Interface (ABI) from a stack-based calling convention to a register-based calling convention. I also mentioned that it appeared at that time that the RISC-V port would support the new calling convention as early as Go 1.19.
Last week, Go 1.19 was officially released, and sure enough, tucked in the release notes was a section mentioning that the riscv64 port now supports passing arguments and results using registers.
Go 1.19 is notably lighter than the 1.18 release, which included the long-awaited support for generics, but even with the lack of major new features, the RISC-V calling convention change seemed to, anecdotally, garner less attention than other updates. Perhaps this is due to the lack of available hardware for running meaningful RISC-V programs that would traditionally be written in Go (i.e. programs that run in userspace), or perhaps it is less newsworthy due to the fact that Go 1.17 initially added support for the register-based calling convention on linux/amd64, darwin/amd64, and windows/amd64, and 1.18 expanded to all operating systems for amd64, arm64, ppc64, and ppc64le. Needless to say, it is hardly a new feature.
However, readers of this series are likely more interested in RISC-V than the average Go user, and I haven’t seen any in-depth comparison of the calling convention before and after this change, so we are going to dedicate this post to diving in!
Getting Set Up
Before we get started, we need to ensure we have Go 1.18 and Go 1.19 both installed on our system. Fortunately, Go makes it very easy to manage multiple installed versions. The following commands will download go1.18.5, the latest patch release for 1.18, and go1.19.
$ go install golang.org/dl/go1.18.5@latest
go: downloading golang.org/dl v0.0.0-20220802161245-b76e4016dde5
$ go1.18.5 download
Downloaded 0.0% ( 3234 / 141855946 bytes) ...
Downloaded 0.4% ( 622592 / 141855946 bytes) ...
Downloaded 31.5% ( 44629680 / 141855946 bytes) ...
Downloaded 59.1% ( 83888666 / 141855946 bytes) ...
Downloaded 82.8% (117442840 / 141855946 bytes) ...
Downloaded 100.0% (141855946 / 141855946 bytes)
Unpacking /home/dan/sdk/go1.18.5/go1.18.5.linux-amd64.tar.gz ...
Success. You may now run 'go1.18.5'
$ go install golang.org/dl/go1.19@latest
$ go1.19 download
Downloaded 0.0% ( 16384 / 148796421 bytes) ...
Downloaded 29.3% ( 43597536 / 148796421 bytes) ...
Downloaded 67.7% (100760832 / 148796421 bytes) ...
Downloaded 100.0% (148796421 / 148796421 bytes)
Unpacking /home/dan/sdk/go1.19/go1.19.linux-amd64.tar.gz ...
Success. You may now run 'go1.19'
For consistency, we’ll use the same simple program to demonstrate the stack-based and register-based calling conventions.
main.go
package main

import (
	"fmt"
)

//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(1, 2))
}
Note: we decorate our add() function with //go:noinline because we are compiling with optimization and want to demonstrate passing arguments and a return value between procedures. Without the directive, the compiler would eliminate the overhead of passing data between the two procedures by bringing the body of add() into main().
1.18 and Stack-Based Calling Convention
To look at how the stack-based calling convention works in Go 1.18, we need to build our program for RISC-V with the go1.18.5 version we installed.
$ GOOS=linux GOARCH=riscv64 go1.18.5 build -o main1.18 main.go
Next, we can jump into the disassembled machine code using objdump and less, then search for main.main.
$ riscv64-unknown-linux-gnu-objdump -D main1.18 | less
We should find main.add right before main.main in the dump.
00000000000900c0 <main.add>:
900c0: 00813283 ld t0,8(sp)
900c4: 01013303 ld t1,16(sp)
900c8: 006282b3 add t0,t0,t1
900cc: 00513c23 sd t0,24(sp)
900d0: 00008067 ret
900d4: 0000 unimp
...
00000000000900d8 <main.main>:
900d8: 010db503 ld a0,16(s11)
900dc: 00256663 bltu a0,sp,900e8 <main.main+0x10>
900e0: b20da2ef jal t0,6a400 <runtime.morestack_noctxt>
900e4: ff5ff06f j 900d8 <main.main>
900e8: fa113423 sd ra,-88(sp)
900ec: fa810113 addi sp,sp,-88
900f0: 00100293 li t0,1
900f4: 00513423 sd t0,8(sp)
900f8: 00200313 li t1,2
900fc: 00613823 sd t1,16(sp)
90100: fc1ff0ef jal ra,900c0 <main.add>
90104: 01813283 ld t0,24(sp)
90108: 04013423 sd zero,72(sp)
9010c: 04013823 sd zero,80(sp)
90110: 00513423 sd t0,8(sp)
90114: 8ed880ef jal ra,18a00 <runtime.convT64>
90118: 01013283 ld t0,16(sp)
9011c: 00016317 auipc t1,0x16
90120: 66430313 addi t1,t1,1636 # a6780 <type.*+0x6780>
90124: 04613423 sd t1,72(sp)
90128: 04513823 sd t0,80(sp)
9012c: 000b8297 auipc t0,0xb8
90130: fc42b283 ld t0,-60(t0) # 1480f0 <os.Stdout>
90134: 00044317 auipc t1,0x44
90138: 34430313 addi t1,t1,836 # d4478 <go.itab.*os.File,io.Writer>
9013c: 00613423 sd t1,8(sp)
90140: 00513823 sd t0,16(sp)
90144: 04810293 addi t0,sp,72
90148: 00513c23 sd t0,24(sp)
9014c: 00100293 li t0,1
90150: 02513023 sd t0,32(sp)
90154: 02513423 sd t0,40(sp)
90158: c08fa0ef jal ra,8a560 <fmt.Fprintln>
9015c: 00013083 ld ra,0(sp)
90160: 05810113 addi sp,sp,88
90164: 00008067 ret
This looks pretty similar to the assembly in our last post. We’ll focus in specifically on the instructions starting at 900f0 in main.main. We are:
- Loading 1 into temporary register t0. (address: 900f0)
- Storing the contents of t0 on the stack. (address: 900f4)
- Loading 2 into temporary register t1. (address: 900f8)
- Storing the contents of t1 on the stack. (address: 900fc)
- Jumping to main.add. (address: 90100)
At this point we have both of our arguments that we want to pass to add() (i.e. a and b) on the stack, and we are transferring control. Because add() is a small enough procedure, the compiler avoids performing the customary stack guard check (compare to the first few instructions in main()).
The purpose of an ABI is to allow procedures to communicate data by adhering to a set of rules. In this case, add() expects that its caller (main()) has placed the required arguments on the stack because that is what the Go ABI dictates. Within add() we are:
- Loading the value stored at the memory address at an 8 byte offset from the stack pointer (sp) into temporary register t0. (address: 900c0)
- Loading the value stored at the memory address at a 16 byte offset from the stack pointer (sp) into temporary register t1. (address: 900c4)
- Adding the contents of t0 and t1. (address: 900c8)
- Storing the result in t0 at the memory address at a 24 byte offset from the stack pointer (sp). (address: 900cc)
- Returning to the address in the return address register (ra). (address: 900d0) This was automatically set to the address of the next instruction in main.main when we executed jal ra,900c0 <main.add>.
Check yourself: why are values on the stack stored 8 bytes apart from each other? Because we are targeting a 64-bit architecture (i.e. riscv64), the default int size is 64 bits (i.e. 8 bytes).
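The offsets seen in the listing (8, 16, and 24) can be reproduced with a little arithmetic. The following is a minimal sketch, not anything the compiler itself runs: slotOffsets is a hypothetical helper, and the starting offset of 8 is an assumption read off the disassembly, where main() saves its own return address at 0(sp) (sd ra,-88(sp) followed by addi sp,sp,-88).

```go
package main

import (
	"fmt"
	"unsafe"
)

// slotOffsets is a hypothetical helper (not part of any Go API). It returns
// the byte offsets of n consecutive word-sized stack slots starting at base.
func slotOffsets(base, wordSize uintptr, n int) []uintptr {
	offsets := make([]uintptr, n)
	for i := range offsets {
		offsets[i] = base + uintptr(i)*wordSize
	}
	return offsets
}

func main() {
	// The size of int follows the target: 8 bytes on 64-bit targets
	// such as riscv64 and amd64, 4 bytes on 32-bit targets.
	word := unsafe.Sizeof(int(0))

	// Slot 0 holds the caller's saved return address in the listing above,
	// so the arguments a and b and the result start one word later.
	fmt.Println(slotOffsets(word, word, 3))
}
```

Built for a 64-bit target, this prints the three offsets used for a, b, and the return value in the Go 1.18 disassembly.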
Similarly, when control is returned to main(), it assumes that add() has placed the return value on the stack, so it loads the value at the memory address at a 24 byte offset from the stack pointer (sp) into t0 (address: 90104), then stores it on the stack again (address: 90110) before calling runtime.convT64 (address: 90114).
You might already be noticing that the load-store nature of RISC-V, coupled with Go’s requirement that arguments and return values be passed on the stack, is creating some unnecessary overhead here.
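To make that overhead concrete, here is a toy model of the call to add() under the stack-based convention. It is only a sketch: the frame type and the addABI0 and callAddABI0 functions are hypothetical names invented for illustration, and the model counts only the loads and stores that move a, b, and the result, mirroring the instructions in the Go 1.18 listing.

```go
package main

import "fmt"

// frame is a toy stand-in for the caller's stack frame: 8-byte slots
// addressed by byte offset from sp, plus counters for memory accesses.
type frame struct {
	slots  map[int]int64
	loads  int
	stores int
}

func (f *frame) store(off int, v int64) { f.slots[off] = v; f.stores++ }
func (f *frame) load(off int) int64     { f.loads++; return f.slots[off] }

// addABI0 mirrors main.add in the 1.18 listing: ld, ld, add, sd.
func addABI0(f *frame) {
	t0 := f.load(8)    // ld t0,8(sp)
	t1 := f.load(16)   // ld t1,16(sp)
	f.store(24, t0+t1) // add t0,t0,t1 ; sd t0,24(sp)
}

// callAddABI0 mirrors the call site in main.main.
func callAddABI0(a, b int64) (int64, *frame) {
	f := &frame{slots: map[int]int64{}}
	f.store(8, a)  // sd t0,8(sp)
	f.store(16, b) // sd t1,16(sp)
	addABI0(f)     // jal ra,<main.add>
	return f.load(24), f // ld t0,24(sp)
}

func main() {
	sum, f := callAddABI0(1, 2)
	fmt.Printf("sum=%d loads=%d stores=%d\n", sum, f.loads, f.stores)
	// prints: sum=3 loads=3 stores=3
}
```

Six memory accesses are spent purely on handing two integers to a function and getting one back.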
1.19 and Register-Based Calling Convention
Let’s compile with Go 1.19 and see how the generated machine code changes.
$ GOOS=linux GOARCH=riscv64 go1.19 build -o main1.19 main.go
$ riscv64-unknown-linux-gnu-objdump -D main1.19 | less
We’ll again find main.main and main.add in the disassembly, but the total number of instructions is noticeably smaller.
000000000008bcb0 <main.add>:
8bcb0: 00b50533 add a0,a0,a1
8bcb4: 00008067 ret
...
000000000008bcc0 <main.main>:
8bcc0: 010db303 ld t1,16(s11)
8bcc4: 00236663 bltu t1,sp,8bcd0 <main.main+0x10>
8bcc8: b90dc2ef jal t0,68058 <runtime.morestack_noctxt.abi0>
8bccc: ff5ff06f j 8bcc0 <main.main>
8bcd0: fc113023 sd ra,-64(sp)
8bcd4: fc010113 addi sp,sp,-64
8bcd8: 00113023 sd ra,0(sp)
8bcdc: 00100513 li a0,1
8bce0: 00200593 li a1,2
8bce4: fcdff0ef jal ra,8bcb0 <main.add>
8bce8: 02013823 sd zero,48(sp)
8bcec: 02013c23 sd zero,56(sp)
8bcf0: b488c0ef jal ra,18038 <runtime.convT64>
8bcf4: 0000b297 auipc t0,0xb
8bcf8: d2c28293 addi t0,t0,-724 # 96a20 <type.*+0x6a20>
8bcfc: 02513823 sd t0,48(sp)
8bd00: 02a13c23 sd a0,56(sp)
8bd04: 000bc597 auipc a1,0xbc
8bd08: 42c5b583 ld a1,1068(a1) # 148130 <os.Stdout>
8bd0c: 0003a517 auipc a0,0x3a
8bd10: 6fc50513 addi a0,a0,1788 # c6408 <go.itab.*os.File,io.Writer>
8bd14: 03010613 addi a2,sp,48
8bd18: 00100693 li a3,1
8bd1c: 00068713 mv a4,a3
8bd20: a10fb0ef jal ra,86f30 <fmt.Fprintln>
8bd24: 00013083 ld ra,0(sp)
8bd28: 04010113 addi sp,sp,64
8bd2c: 00008067 ret
Where we previously had to first load our values into registers, then store them on the stack, we now only load them into registers before jumping to main.add. The sequence consists of:
- Loading 1 into argument register a0. (address: 8bcdc)
- Loading 2 into argument register a1. (address: 8bce0)
- Jumping to main.add. (address: 8bce4)
add() is even more drastically simplified, only containing the add instruction (address: 8bcb0), then returning to main(). Just as the calling convention dictated in Go 1.18 that arguments and return values would be at a certain location on the stack, Go 1.19 dictates that they will be present in the specified argument registers.
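The same call can be modeled in a few lines of Go under the register-based convention. This is a sketch only: the regs type and the addRegABI and callAddRegABI functions are hypothetical names, and just the two argument registers that appear in the Go 1.19 listing are modeled.

```go
package main

import "fmt"

// regs is a toy stand-in for the two argument registers used by add().
type regs struct{ a0, a1 int64 }

// addRegABI mirrors main.add in the 1.19 listing: add a0,a0,a1.
func addRegABI(r *regs) { r.a0 += r.a1 }

// callAddRegABI mirrors the call site in main.main.
func callAddRegABI(a, b int64) int64 {
	r := &regs{}
	r.a0 = a     // li a0,1
	r.a1 = b     // li a1,2
	addRegABI(r) // jal ra,<main.add>
	return r.a0  // the result is already in a0 for the next call
}

func main() {
	fmt.Println(callAddRegABI(1, 2)) // prints: 3
}
```

No stack slot is touched for the arguments or the result. In the listing, the value left in a0 by main.add is still there when runtime.convT64 is called at 8bcf0.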
Fewer instructions, assuming constant cycles per instruction (CPI), means a more efficient program. However, we are not only eliminating instructions, but specifically eliminating the more expensive ones: loads and stores. The example we are using is trivial, and in some cases the register-based calling convention may not produce strictly fewer instructions than the stack-based one. For example, if the values were not already in the correct registers, we would need to move them. Even so, accessing registers is much faster than accessing main memory, despite modern processors using sophisticated caching to make stack access quite fast. The proposal for the register-based ABI states that accessing registers is roughly 40% faster than accessing arguments on the stack (see linked benchmark). The release notes indicate that the register-based ABI results in typical performance improvements of 10% or more for RISC-V programs.
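As a hypothetical illustration of values not being in the right registers, consider a function that forwards its arguments in the opposite order. The sub and swapSub functions below are invented for this sketch. Assuming a and b arrive in a0 and a1, as they did for add(), the callee expects them the other way around, so the compiler has to shuffle them (typically through a temporary register) before the call. I have not included disassembly for this program, so the exact instructions emitted are left as an exercise.

```go
package main

import "fmt"

//go:noinline
func sub(a, b int) int {
	return a - b
}

// swapSub receives a and b in argument order but passes them to sub in the
// opposite order, so the two values must trade places before the call.
//
//go:noinline
func swapSub(a, b int) int {
	return sub(b, a)
}

func main() {
	fmt.Println(swapSub(5, 3)) // prints: -2
}
```

Building this with GOOS=linux GOARCH=riscv64 go1.19 build and searching the objdump output for main.swapSub, in the same way as before, should show how the arguments are rearranged.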
A side benefit is that registers are being used more conventionally as described in the RISC-V ABI. However, it is important to note that Go is still not adhering to platform ABIs. The proposal provides justification for why adhering to platform ABIs would not be optimal for Go’s language design, noting the differences between Go and C, as well as the value of maintaining a simple common ABI across platforms.
Concluding Thoughts
One of the major benefits of using high-level languages is that simply upgrading to a newer version can result in your program becoming significantly more performant. While most developers won’t need to dig into what’s happening behind the scenes, doing so can give you a better grasp of how design decisions you make in your code lead to actual instructions being executed on the machine. Furthermore, if you are interested in working on compilers and other parts of language toolchains, one of the best ways to get started is learning how the toolchain works for languages you already use.
As always, these posts are meant to serve as a useful resource for folks who are interested in learning more about RISC-V and low-level software in general. If I can do a better job of reaching that goal, or you have any questions or comments, please feel free to send me a message @hasheddan on Twitter!