In my last post, we explored how to build WASM modules, both with and without WASI support, using Clang. In a comment on Reddit, it was mentioned that much of the setup I walked through in that post could be avoided by just leveraging Zig’s WASI supprt. This is a great point, and I would recommend doing the same. The following command is inarguably simpler than what I described.
$ zig cc --target=wasm32-wasi
However, there are two reasons why knowing how to use Clang for compilation is useful. First, and most practical, is that I am working on a codebase that uses Clang for its compiler toolchain, so leveraging Zig is not currently an option. Second is that understanding the, admittedly more involved, Clang incantations taught us a little more about what actually goes into a WASM module, and how that changes when using WASI. In order to know exactly what is inside a WASM module, we need to crack it open though. That is what we are going to do today!
As a recap, one of the programs we compiled was a simple add()
function, which
accepted two integers and returned their sum.
wasm32_args.c
int add(int a, int b) {
return a+b;
}
We compiled it to a WASM module using the following command.
$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -o wasm32_args.wasm wasm32_args.c
This produced a binary file which can be recognized as a v1 WASM module.
$ file wasm32_args.wasm
wasm32_args.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
We can view the hex contents of the file using xxd
.
$ xxd wasm32_args.wasm
00000000: 0061 736d 0100 0000 010a 0260 0000 6002 .asm.......`..`.
00000010: 7f7f 017f 0303 0200 0105 0301 0002 063f ...............?
00000020: 0a7f 0141 8088 040b 7f00 4180 080b 7f00 ...A......A.....
00000030: 4180 080b 7f00 4180 080b 7f00 4180 8804 A.....A.....A...
00000040: 0b7f 0041 8008 0b7f 0041 8088 040b 7f00 ...A.....A......
00000050: 4180 8008 0b7f 0041 000b 7f00 4101 0b07 A......A....A...
00000060: a701 0c06 6d65 6d6f 7279 0200 115f 5f77 ....memory...__w
00000070: 6173 6d5f 6361 6c6c 5f63 746f 7273 0000 asm_call_ctors..
00000080: 0361 6464 0001 0c5f 5f64 736f 5f68 616e .add...__dso_han
00000090: 646c 6503 010a 5f5f 6461 7461 5f65 6e64 dle...__data_end
000000a0: 0302 0b5f 5f73 7461 636b 5f6c 6f77 0303 ...__stack_low..
000000b0: 0c5f 5f73 7461 636b 5f68 6967 6803 040d .__stack_high...
000000c0: 5f5f 676c 6f62 616c 5f62 6173 6503 050b __global_base...
000000d0: 5f5f 6865 6170 5f62 6173 6503 060a 5f5f __heap_base...__
000000e0: 6865 6170 5f65 6e64 0307 0d5f 5f6d 656d heap_end...__mem
000000f0: 6f72 795f 6261 7365 0308 0c5f 5f74 6162 ory_base...__tab
00000100: 6c65 5f62 6173 6503 090a 4202 0200 0b3d le_base...B....=
00000110: 0106 7f23 8080 8080 0021 0241 1021 0320 ...#.....!.A.!.
00000120: 0220 036b 2104 2004 2000 3602 0c20 0420 . .k!. . .6.. .
00000130: 0136 0208 2004 2802 0c21 0520 0428 0208 .6.. .(..!. .(..
00000140: 2106 2005 2006 6a21 0720 070f 0b00 3404 !. . .j!. ....4.
00000150: 6e61 6d65 0119 0200 115f 5f77 6173 6d5f name.....__wasm_
00000160: 6361 6c6c 5f63 746f 7273 0103 6164 6407 call_ctors..add.
00000170: 1201 000f 5f5f 7374 6163 6b5f 706f 696e ....__stack_poin
00000180: 7465 7200 6609 7072 6f64 7563 6572 7301 ter.f.producers.
00000190: 0c70 726f 6365 7373 6564 2d62 7901 0c55 .processed-by..U
000001a0: 6275 6e74 7520 636c 616e 673f 3137 2e30 buntu clang?17.0
000001b0: 2e36 2028 2b2b 3230 3233 3132 3039 3132 .6 (++2023120912
000001c0: 3432 3237 2b36 3030 3937 3038 6234 3336 4227+6009708b436
000001d0: 372d 317e 6578 7031 7e32 3032 3331 3230 7-1~exp1~2023120
000001e0: 3931 3234 3333 362e 3737 2900 2c0f 7461 9124336.77).,.ta
000001f0: 7267 6574 5f66 6561 7475 7265 7302 2b0f rget_features.+.
00000200: 6d75 7461 626c 652d 676c 6f62 616c 732b mutable-globals+
00000210: 0873 6967 6e2d 6578 74 .sign-ext
As described in the Binary Format portion of the WASM specification, each module is made up of sections. Each section begins with a 1-byte identifier.
ID (Decimal) | ID (Hex) | Section |
---|---|---|
0 | 0x00 | Custom |
1 | 0x01 | Type |
2 | 0x02 | Import |
3 | 0x03 | Function |
4 | 0x04 | Table |
5 | 0x05 | Memory |
6 | 0x06 | Global |
7 | 0x07 | Export |
8 | 0x08 | Start |
9 | 0x09 | Element |
10 | 0x0a | Code |
11 | 0x0b | Data |
12 | 0x0c | Data Count |
Each section must be present at most once, and they must be provided in-order,
with the exception being Custom sections, for which there may be an arbitrary
number and they may be present anywhere in the file. Every section begins with
its identifier, then an LEB128
variable-length encoded u32
size, followed by the contents of the section. In fact,
all integers in a WASM module are encoded using LEB128.
Decoding LEB128 Integers
LEB128 can be used to encode signed and unsigned integers of arbitrary length.
We will primarily be focused on u32
(unsigned 32-bit) integers today, so we’ll
skip detailing how to decode signed integers. You can find more details on the
previously linked Wikipedia page.
The algorithm for decoding unsigned integers is as follows:
- Take the least significant (lower) 7 bits of the next byte.
- Binary shift the 7 bits to the left by 7 multiplied by the byte number
(initially 0) and bitwise
OR
with previously decoded bits. - If the most significant bit (i.e. the 8th bit) is a
0
, stop decoding. Otherwise, go to step (1).
As an example, if we had the byte sequence a6 03
, we would decode it using the
following steps.
Take first byte and convert hex to binary.
0xa6 -> 10100110
Take least significant 7 bits.
10100110 -> 0100110
Shift bits left by 0 (this is the “0th” byte, 7*0 = 0
) and OR
with
previously decoded bits (none decoded yet).
0100110 -> 0100110
Observe that the 8th bit in 0xa6
is a 1
, so continue to the next byte.
0x03 -> 00000011
Take least significant 7 bits.
00000011 -> 0000011
Shift bits left by 7 (this is the “1st” byte, 7*1 = 7
) and OR
with
previously decoded bits.
0000011 -> 0000011 0000000
0000000 0100110 | 0000011 0000000 = 0000011 0100110
Observe that the 8th bit in 0x03
is 0
. We are done. Convert the final result
to decimal.
0000011 0100110 -> 422
Now that we know how to interpret integers, let’s start breaking down the sections.
Preamble Link to heading
00000000: 0061 736d 0100 0000 .... .... .... .... .asm.......`..`.
Before the first section is the “preamble”, which is how file
was able to
recognize that our binary was a v1 WASM module. The first 4 bytes decode to
\0asm
, with the next 4 bytes indicating the version. WASM is
little-endian, meaning that the
least significant byte is first. Therefore, the version number is 1.
0100 0000 -> 0000 0001 -> 1
The first section begins following the preamble.
Type Section Link to heading
00000000: .... .... .... .... 010a 0260 0000 6002 .asm.......`..`.
00000010: 7f7f 017f .... .... .... .... .... .... ...............?
We can identify the first section as the Type section, as indicated by the first
byte 01
. Following the identifier is the size, to which we can apply our LEB128
decoding.
0x0a -> 00001010 -> 0001010 -> 10
This informs us that the contents of this section should be 10 bytes in size.
The WASM specification describes the Type section contents as a vec
of
functype
. Vectors are simply an LEB128 encoded u32
length followed by a
sequence of their specified element type. The first byte is 02
, which by now
you can probably recognize as 2
without needing to actually perform LEB128
decoding. This means that we should see 2 functype
elements next.
Function types are prefixed with 0x60
, then two vec
, one for parameter
types, and one for return types, follows. We have 00 00
for the first
functype
. Remembering that vec
are prefixed with length. This is essentially
saying that our first function takes 0 parameters and returns 0 values. We can
use the wasm2wat
tool to verify.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -2 | tail -1
(type (;0;) (func))
The next functype
does have parameter and result types. There are two
parameters (0x02
), each encoded as 0x7f
, which corresponds to a signed
32-bit integer (i32
). There is one return type (0x01
), which is also an
i32
(0x7f
). Though we don’t have symbol information about this function yet,
given that it is the last one defined we can safely assume this is our add()
.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -3 | tail -1
(type (;1;) (func (param i32 i32) (result i32)))
Function Section Link to heading
00000010: .... .... 0303 0200 01.. .... .... .... ...............?
Next is the Function section (0x03
). We do not have an Import section (0x02
)
in this module because we did not import any symbols. Because sections must be
provided in-order, we can be certain that no Import section will be provided now
that we have seen the Function section.
The size of this section is 3
(0x03
), and the contents are specified as a
vec
of typeidx
(type index). A type index is a u32
, which will once again
be LEB128 encoded. Our vec
begins with 0x02
, so we should expect two type
indices. The following two bytes, 0x00
and 0x01
, correspond to the entries
in our previously detailed Types section.
At this point, we have two functions, each with their own type signature. For this specific module, the Type and Function sections may seem redundant. However, consider if we had another function in our module.
int multiply (int a, int b) {
return a*b
}
multiply()
has the same function signature as add()
, meaning that we would
have only one entry for (i32, i32) i32
in the Type section, then two entries
in the Function section that referenced the type signature by the corresponding
index. Compiling the new program with multiply()
added results in an
indentical preamble and Type section, but we can see that the Function section
(0x03
) now has length 4
(0x04
), with a type index vec
of length 3
(0x03
), and three type index entries (0x00
, 0x01
, 0x01
) with the latter
two referring to the same type signature.
00000000: 0061 736d 0100 0000 010a 0260 0000 6002
00000010: 7f7f 017f 0304 0300 0101 .... .... ....
Memory Section Link to heading
00000010: .... .... .... .... ..05 0301 0002 .... ...............?
We once again skip a section that is not present in our module (Table with ID
0x04
), and move on to Memory (0x05
). The Memory section contains a vec
of
memories (mem
), which are made up of limits
. In the current WASM
specification, only one memory may be defined. Our Memory section has size of
3
bytes (0x03
), and, as expected, the first byte (0x01
) specifies that the
length of the vec
is 1
. A limit
can include both a maximum and a minimum
size (both LEB128 encoded u32
), as indicated by the first byte. In our case,
the first byte is 0x00
, which means that only a minimum size will be defined,
and the maximum is free to grow to any size. If the first byte was 0x01
, both
a minimum and a maximum would be defined.
The threads proposal extends
limits
to be allow specifying whether memory is shared or unshared.
In our module, the minimum memory size is 2
(0x02
).
Global Section Link to heading
00000010: .... .... .... .... .... .... .... 063f ...............?
00000020: 0a7f 0141 8088 040b 7f00 4180 080b 7f00 ...A......A.....
00000030: 4180 080b 7f00 4180 080b 7f00 4180 8804 A.....A.....A...
00000040: 0b7f 0041 8008 0b7f 0041 8088 040b 7f00 ...A.....A......
00000050: 4180 8008 0b7f 0041 000b 7f00 4101 0b.. A......A....A...
The Global section (0x06
) is next, with a size of 0x3f
. With a larger size,
we can quickly check our LEB128 decoding to ensure that we don’t need to
consider subsequent bytes.
0x3f -> 00111111
The 8th bit is 0
, so we can simply convert to the decimal value of 63
for
our size. The section includes a vec
of globals (global
), where each
global
consists of a type (globaltype
) and expression (expr
). A
globaltype
is made up of a value type (valtype
) and a 1-byte flag (mut
)
indicating whether the value is mutable or not.
An expression is encoded by a sequence of instructions (instr
) with a
terminating byte (0x0b
) specifying the end of the sequence. The byte following
the section size, 0x0a
, informs us that 10 globals will be defined in the
vec
. We can easily
extract the first one by looking for the first instance of 0x0b
.
7f 0141 8088 040b
The 0x7f
should be familiar at this point as an i32
, which is the
globaltype
of this global
. The following byte, 0x01
, marks it as mutable
(mut
).
This is followed by the initialization expression, which includes the first
instruction we have seen. 0x41
is the opcode for i32.const
, which simply
returns a static i32
constant, which is specified by the following bytes.
We’ll need to use our LEB128 decoding to interpret it.
Take the first byte.
0x80 -> 10000000
Take the least significant 7 bits and shift left 0 bits.
10000000 -> 0000000
The 8th bit is a 1
so take the next byte.
0x88 -> 10001000
Take the least significant 7 bits and shift left 7 bits.
10001000 -> 0001000 0000000
OR
with previously decoded bits.
0000000 0000000 | 0001000 0000000 = 0001000 0000000
The 8th bit is a 1
so take the next byte.
0x04 -> 00000100
Take the least significant 7 bits and shift left 14 bits.
00000100 -> 0000100 0000000 0000000
OR
with previously decoded bits.
0000000 0001000 0000000 | 0000100 0000000 0000000 = 0000100 0001000 0000000
The 8th bit is a 0
so we are done.
0000100 0001000 0000000 -> 66560
We can verify our decoding using wasm2wat
again.
The
$__stack_pointer
symbol name will be found in a later section.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -34 | tail -1
(global $__stack_pointer (mut i32) (i32.const 66560))
The same process can be applied to all globals in the vec
, as shown in the
textual representation.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -43 | tail -10
(global $__stack_pointer (mut i32) (i32.const 66560))
(global (;1;) i32 (i32.const 1024))
(global (;2;) i32 (i32.const 1024))
(global (;3;) i32 (i32.const 1024))
(global (;4;) i32 (i32.const 66560))
(global (;5;) i32 (i32.const 1024))
(global (;6;) i32 (i32.const 66560))
(global (;7;) i32 (i32.const 131072))
(global (;8;) i32 (i32.const 0))
(global (;9;) i32 (i32.const 1))
Export Section Link to heading
00000050: .... .... .... .... .... .... .... ..07 A......A....A...
00000060: a701 0c06 6d65 6d6f 7279 0200 115f 5f77 ....memory...__w
00000070: 6173 6d5f 6361 6c6c 5f63 746f 7273 0000 asm_call_ctors..
00000080: 0361 6464 0001 0c5f 5f64 736f 5f68 616e .add...__dso_han
00000090: 646c 6503 010a 5f5f 6461 7461 5f65 6e64 dle...__data_end
000000a0: 0302 0b5f 5f73 7461 636b 5f6c 6f77 0303 ...__stack_low..
000000b0: 0c5f 5f73 7461 636b 5f68 6967 6803 040d .__stack_high...
000000c0: 5f5f 676c 6f62 616c 5f62 6173 6503 050b __global_base...
000000d0: 5f5f 6865 6170 5f62 6173 6503 060a 5f5f __heap_base...__
000000e0: 6865 6170 5f65 6e64 0307 0d5f 5f6d 656d heap_end...__mem
000000f0: 6f72 795f 6261 7365 0308 0c5f 5f74 6162 ory_base...__tab
00000100: 6c65 5f62 6173 6503 09.. .... .... .... le_base...B....=
The Export section (0x07
) consists of a vec
of export
, with each
containing a name
and an export description (exportdesc
). The name
is a
vec
of byte
, while the exportdesc
contains a 1-byte prefix indicating the
type of export, followed by an index to the appropriate section where the export
is defined.
I’ll leave it to the reader to LEB128 decode 0xa701
as the section size (167
bytes). The first byte following the section size, 0x0c
, indicates that the
vec
will contain 12 exports. The next byte, 0x06
, is the first byte of the
first export, and thus is defining the length of its name as 6
. The following
6 bytes can be converted to UTF-8
characters.
6d 65 6d 6f 72 79 -> memory
Because we compiled with -Wl,--export-all
, all symbols will be exported. In
this case, memory
is referring to the first element in the vec
in our Memory
section. The export type prefixes are defined as follows.
ID (Hex) | Export |
---|---|
0x00 | Function |
0x01 | Table |
0x02 | Memory |
0x03 | Global |
As expected, the prefix following memory
is 0x02
. The next byte 0x00
,
specifies that this export corresponds to the first memory in the vec
. The
next two exports are our functions. The first is __wasm_call_ctors
, which,
following the name definition, has a function prefix (0x00
) and an index to
the first function in the Function section vec
(0x00
).
11 5f 5f 77 61 73 6d 5f 63 61 6c 6c 5f 63 74 6f 72 73 -> __wasm_call_ctors
The second correponds out our add()
function, and refers to the second
(0x01
) function (0x00
) in the Function section.
61 64 64 -> add
The remaining exports are shown in their textual representation below.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -55 | tail -12
(export "memory" (memory 0))
(export "__wasm_call_ctors" (func $__wasm_call_ctors))
(export "add" (func $add))
(export "__dso_handle" (global 1))
(export "__data_end" (global 2))
(export "__stack_low" (global 3))
(export "__stack_high" (global 4))
(export "__global_base" (global 5))
(export "__heap_base" (global 6))
(export "__heap_end" (global 7))
(export "__memory_base" (global 8))
(export "__table_base" (global 9))
Code Section Link to heading
00000100: .... .... .... .... ..0a 4202 0200 0b3d le_base...B....=
00000110: 0106 7f23 8080 8080 0021 0241 1021 0320 ...#.....!.A.!.
00000120: 0220 036b 2104 2004 2000 3602 0c20 0420 . .k!. . .6.. .
00000130: 0136 0208 2004 2802 0c21 0520 0428 0208 .6.. .(..!. .(..
00000140: 2106 2005 2006 6a21 0720 070f 0b.. .... !. . .j!. ....4.
In this module, we don’t have a Start (0x08
) or Element (0x09
) section, so
next is the Code section (0x0a
). The section size is 66
(LEB128 encoded
0x42
), and it consists of the body and local variables of the each function as
a vec
of code
. Each code
element consists of a size (u32
), vec
of
locals
, and an expression (expr
) made up of instructions.
Keep in mind that we didn’t specify any optimizations when compiling this module (e.g.
-O3
), so our code section is going to be much longer than it needs to be. However, the unoptimized function body gives us an opportunity to explore more WASM instructions.
The vec
has two elements (0x02
), which correspond to the two entries in the
Function section. The first function has a size of 2
(0x02
), which we can
immediately know means that it has no locals or instructions. The vec
of
locals
has a length of 0
(0x00
) and the expr
, which is its sequence of
instructions, is a single 0x0b
(the expression terminating byte).
The next function, which is our add()
, has a size of 61
(0x3d
). Its vec
of locals
has length 1
(0x01
). Locals are encoded as a u32
count and a
value type (valtype
). That is, there is a single element in the vec
for each
value type for which at least one local exists. Because our vec
has length
1
, we know that all locals
are the same value type. Specifically, there are
6
(0x06
) locals of type i32
(0x7f
).
We’ll save a deep dive into the WASM instruction set architecture (ISA) for a future post, but the key difference from most ISAs you have likely interacted with is that WASM operates as a stack machine. Values that are to be used as operands for an instruction must first be pushed onto the stack, before subsequently being popped off the stack and used to compute a result, which is then pushed onto the stack.
The first instruction in our add()
function is encoded as 0x23
, which
corresponds to global.get
. This instruction takes the index (u32
) of a symbol in
the Global section, but you may notice something strange when LEB128 decoding
it.
80 80 80 80 00 -> 00000000 00000000 00000000 00000000
Why are we using the maximum length for LEB128 encoding an unsigned 32-bit
integer? The reason is related to Global section merging during
linking.
Though we only need one byte to encode index 0
, if the index of
$__stack_pointer
changes, we can now be certain that the global.get
instruction can be updated without changing the position of any other bytes in
the module.
As previously mentioned, WASM is a stack machine, so global.get
is going to
push the value of $__stack_pointer
onto the stack.
Our next instruction is 0x21
, which corresponds to local.set
. It pops a
value off the top of the stack, then stores it in the local at the index
specified by the supplied immediate, which in this case is 2
(0x02
).
Combining this instruction with the previous one results in storing the value of
$__stack_pointer
in the local at index 2
. Why use 2
and not 0
? In
accordance with the WASM specification, 2
actually refers to the first
declared local, as parameters are referenced as the first locals.
The parameters of the function are referenced through 0-based local indices in the function’s body; they are mutable.
You may already recognize these first few instructions as part of a function prologue in which we are “growing the stack”. The use of a stack when coming from a language like C can be confusing given that WASM has its own implicit stack. Also, in this particular example, it is unnecessary to manage a stack, and, as you’ll see in a moment, compiling with optimization removes these instructions. Nevertheless, a “shadow stack” is necessary in some real programs, and we’ll explore some examples in a future post.
Decompiling the full function body results in the following.
$ wasm2wat --enable-annotations wasm32_args.wasm | head -32 | tail -28
(func $add (type 1) (param i32 i32) (result i32)
(local i32 i32 i32 i32 i32 i32)
global.get $__stack_pointer
local.set 2
i32.const 16
local.set 3
local.get 2
local.get 3
i32.sub
local.set 4
local.get 4
local.get 0
i32.store offset=12
local.get 4
local.get 1
i32.store offset=8
local.get 4
i32.load offset=12
local.set 5
local.get 4
i32.load offset=8
local.set 6
local.get 5
local.get 6
i32.add
local.set 7
local.get 7
return)
The only essential instructions are accesing the parameters (local.get 0
and
local.get 1
), and the eventual adding of the values with i32.add
. This can
be observed by recompiling with ooptimization, then Decompiling.
clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -O3-o wasm32_args_optimized.wasm wasm32_args.c
$ wasm2wat --enable-annotations wasm32_args_optimized.wasm | head -8 | tail -4
(func $add (type 1) (param i32 i32) (result i32)
local.get 1
local.get 0
i32.add)
Custom Sections Link to heading
00000140: .... .... .... .... .... .... ..00 3404 !. . .j!. ....4.
00000150: 6e61 6d65 0119 0200 115f 5f77 6173 6d5f name.....__wasm_
00000160: 6361 6c6c 5f63 746f 7273 0103 6164 6407 call_ctors..add.
00000170: 1201 000f 5f5f 7374 6163 6b5f 706f 696e ....__stack_poin
00000180: 7465 7200 6609 7072 6f64 7563 6572 7301 ter.f.producers.
00000190: 0c70 726f 6365 7373 6564 2d62 7901 0c55 .processed-by..U
000001a0: 6275 6e74 7520 636c 616e 673f 3137 2e30 buntu clang?17.0
000001b0: 2e36 2028 2b2b 3230 3233 3132 3039 3132 .6 (++2023120912
000001c0: 3432 3237 2b36 3030 3937 3038 6234 3336 4227+6009708b436
000001d0: 372d 317e 6578 7031 7e32 3032 3331 3230 7-1~exp1~2023120
000001e0: 3931 3234 3333 362e 3737 2900 2c0f 7461 9124336.77).,.ta
000001f0: 7267 6574 5f66 6561 7475 7265 7302 2b0f rget_features.+.
00000200: 6d75 7461 626c 652d 676c 6f62 616c 732b mutable-globals+
00000210: 0873 6967 6e2d 6578 74 .sign-ext
The remaining bytes make up three Custom sections (0x00
), as there is no Data
(0x0b
) section or Data Count (0x0c
) section in our module. Custom sections
are mostly unstructured, but do begin with the same u32
size as other
sections, followed by a name
. The 0x00
byte identifies our first custom
section, and its size is 52
bytes (0x34
). The name
encoding starts with
the number of bytes in the name
, which in this case is 4
(0x04
). The
following 4 bytes are UTF-8 characters making up the name
.
6e 61 6d 65 -> n a m e
The name
of this custom section happens to literally be “name”. It also
happens to be the only Custom section that is defined in the WASM specification
(see Custom Sections in the Appendix). Like other sections, it should only be
included at most once, and it has the additional requirement of occurring after
the Data section. There are three subsections that may be included in the “name”
Custom section, identified by the following IDs.
ID (Hex) | Subsection |
---|---|
0x00 | Module name |
0x01 | Function names |
0x02 | Local names |
The first subsection is Function names (0x01
), and has size of 25
(0x19
).
It consists of a vec
of name
/ index
pairs, otherwise known as a “name
map”, which assigns the provided name
to the given index
in the Function
section. The vec
here is of length 2
(0x02
), and first element corresponds
to the first function in the Function section (0x00
). The name assigned to the
function is 17
bytes long (0x11
).
5f 5f 77 61 73 6d 5f 63 61 6c 6c 5f 63 74 6f 72 73 -> __wasm_call_ctors
The following function name entry can be decoded in the same manner, predictably
assigning add
to the second function in the Function section (0x01
).
61 64 64 -> add
The next and final subsection of the “name” Custom section is not actually
standardized in the core WASM specification, but rather part of the Extended
Name Section
proposal.
In the proposal, 7
(0x07
) is the index for the Global names subsection. This
subsection also contains a “name map”, providing pairs of a Global section index
and name. The first entry in our Global section (0x00
) is being assigned the
name __stack_pointer
.
5f 5f 73 74 61 63 6b 5f 70 6f 69 6e 74 65 72 -> __stack_pointer
The next two Custom sections are defined in the WASM tool-conventions
repository. The first is
named
producers
,
and is meant to denote the tools that were used to produce the WASM module. The
second is named
target_features
and must come after the producers
Custom secion when included. It describes
what features are used, and whether linking should fail if a given feature is or
is not in the allowed set.
The full decompilation of all three Custom sections is as follows.
$ wasm2wat --enable-annotations wasm32_args.wasm | tail -3
(@custom "name" "\01\19\02\00\11__wasm_call_ctors\01\03add\07\12\01\00\0f__stack_pointer")
(@custom "producers" "\01\0cprocessed-by\01\0cUbuntu clang?17.0.6 (++20231209124227+6009708b4367-1~exp1~20231209124336.77)")
(@custom "target_features" "\02+\0fmutable-globals+\08sign-ext"))
The
--enable-annotations
, which we have been using for allwasm2wat
invocations, is required to include Custom sections in the WAT output.
Final Thoughts Link to heading
The WASM module in this post did not include every section that can occur, but hopefully this breakdown was thorough enough that you feel confident in dissecting those that were omitted on your own. There are a number of topics, such as linking, optimization, and shadow stacks, that were mentioned in this post but were not covered in depth. Check back for future posts where we will go into greater detail.
As always, if you have feedback, questions, or just want to chat, feel free to
reach out to @hasheddan
on any of the platforms listed on the home
page.