Zero to WASI with Clang 17

I am currently working on a few projects that involve leveraging WebAssembly (WASM) modules, particularly with WebAssembly System Interface (WASI) support. While WASI is still in its early stages, support has already been added to the Rust compiler and Clang, as well as runtimes such as wasmtime and WAMR. However, getting an environment setup to compile arbitrary C programs to WASM can still be challenging.

I recently updated to Clang 17, which was first released as 17.0.1 in September of 2023. This release added support for shared libraries and position independent code (PIC) for WASM, but otherwise had relatively few updates for the target. Installing Clang and the releated LLVM packages has become rather straightforward with the automatic installation script.

$ bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"

wget / bash at your own peril!

This will install the latest stable release, which is currently 17.0.6. If you wish to install a specific version, you can invoke the script with an argument as indicated in the documentation.

$ wget https://apt.llvm.org/llvm.sh
$ chmod +x llvm.sh
$ sudo ./llvm.sh <version number>

On my Linux machine running Ubuntu 22.04, this installs LLVM in /usr/lib.

$ /usr/lib/llvm-17/
├── bin
├── build
├── cmake -> lib/cmake/llvm
├── include
├── lib
└── share

Clang can be found at /usr/lib/llvm-17/bin/clang. We can compile C to WASM without any further steps if we don’t need to perform any I/O. For example, the following program can be compiled as a WASM module with two exposed to the embedder.

Embedder is the term used in the WASM specification to describe host environments in which WASM modules are executed.

wasm32.c

int two() {
  return 2;
}

$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all  -o wasm32.wasm wasm32.c

Because Clang is typically compiling a binary that can be executed from a start symbol, we need to inform it, or specifically the linker it invokes, that it should not expect an entrypoint to be defined (-Wl,--no-entry). We also need to inform the linker that it should not attempt to include a standard library, which we don’t have for our WASM target (-nostdlib). Lastly, we need to ensure that our two() function is exported so that it can be called by the embedder (-Wl,--export-all). You can find the full set of linker options available for wasm-lld here.

To execute the two() function in a host environment, we can use wasmtime with the --invoke flag. If you haven’t already installed wasmtime, you can do so with the provided script.

curl https://wasmtime.dev/install.sh -sSf | bash

Then invoke two().

$ wasmtime run --invoke two wasm32.wasm
2

This is interesting, but most meaningful programs need to perform some sort of I/O. If running in a traditional host environment, such as Linux on my machine, a C program can perform I/O operations by calling functions from the C standard library. Behind the scenes these functions are frequently invoking syscalls, which allow a program running in userspace to access privileged resources (e.g. memory, disk, networking peripherals) via the operating system kernel. The embedder of our WASM module can be thought of similarly to an operating system; it has access to privileged resources that the module does not.

However, unlike binary executables compiled for an operating system and instruction set architecture, WASM modules sacrifice knowledge of a standard interface for accessing privileged resources in favor of portability. WASM modules can be run by any embedder when they don’t rely on the embedder importing any functionality. One option for accessing data outside of the WASM module is to accept arguments to exported functions. We can modify our previous example to demonstrate this.

wasm32_args.c

int add(int a, int b) {
  return a+b;
}

We can use the same command to compile this new module as we previously specified that all symbols should be exported (-Wl,--no-entry).

$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all  -o wasm32_args.wasm wasm32_args.c

We can also use wasmtime again to invoke the add() function, this time passing the necessary arguments.

$ wasmtime run --invoke add wasm32_args.wasm 1 2
3

This certainly expands the functionality that can be offered by a WASM module, and other mechanisms for passing data between the embedder and the module, such as writing to the module memory, then passing a pointer to the data, can be utilized in more sophisticated scenarios. However, this pushes much of the logic to the embedder, meaning that the module remains portable, but more work is required of consumers. If all embedders could agree on a standard set of functionality that would be exposed to modules, the modules could maintain portability, while being able to do almost anything that native binary executables can. Even more importantly, existing programs could be recompiled to target the WASM instruction set without having to change any code.

In reality, there are still frequently small code changes required to target WASM.

WASI was introduced to fill this gap. It is similar to POSIX, but as Lin Clark describes:

POSIX provides source code portability. You can compile the same source code with different versions of libc to target different machines. But WebAssembly needs to go one step beyond this. We need to be able to compile once and run across a whole bunch of different machines. We need portable binaries.

Because of this binary portability, only one libc implementation is necessary, namely wasi-libc. In order to utilize the functionality it offers in our WASM modules, we need to build a sysroot that can be linked against when compiling a module. This can be accomplished by cloning the repository, then running make with our Clang 17 installation.

If you are just interested in getting things working and don’t care to use your existing Clang / LLVM installation, you can download pre-compiled binaries from the wasi-sdk repository. In fact, a new release for LLVM 17 was published while I was writing this post.

$ git clone https://github.com/WebAssembly/wasi-libc.git && cd wasi-libc
$ make CC=/usr/lib/llvm-17/bin/clang

This should produce a sysroot directory, which contains the necessary header files and libraries, as denoted in the output.

#
# The build succeeded! The generated sysroot is in /home/hasheddan/code/github.com/WebAssembly/wasi-libc/sysroot.
#

We can inform Clang of the sysroot location using the --sysroot flag. By default, Clang will attempt to also link the LLVM compiler-rt builtins library. LLVM packages do not currently include the WASM build of the library (libclang_rt.builtins-wasm32.a), so we must either build it ourselves, or download the pre-compiled binary from the wasi-sdk release. Or… we could use some magic incantations with Clang to avoid attempting to link it for our simple example. First let’s define a program that utilizes I/O.

wasm32_wasi.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main() {
  FILE *fp;
  fp = fopen("hello.txt", "w");
  if (fp == NULL) {
    fprintf(stderr, "error opening file: %s\n", strerror(errno));
    exit(1);
  }

  fputs("Hello, world!\n", fp);
  fclose(fp);
  return 0;
}

All we are doing is opening a file and writing Hello, world! to it, but we are making use of several libc functions to do so. To compile this program to a WASM module with WASI support, we can use the following command.

$ /usr/lib/llvm-17/bin/clang -target wasm32-wasi --sysroot=/home/hasheddan/code/github.com/WebAssembly/wasi-libc/sysroot -nodefaultlibs -lc -o wasm32_wasi.wasm wasm32_wasi.c

There are a few notable differences from our previous invocations, including:

The -target is now specified as wasm32-wasi to indicate that we need WASI support.
We have provided a path to our sysroot that we built with wasi-libc.
We have added the -nodefaultlibs flag to prevent the linker from attempting to link libclang_rt.builtins-wasm32.a, then provided -lc to re-add linking libc.
We have removed -nostdlib, -Wl,--no-entry, and -Wl,--export-all as we do want to use the standard library (libc), we do have a defined entrypoint, and we don’t need to export all symbols (_start will be exported automatically).

Finally, we can use wasmtime to run our WASM WASI module, this time without specifying a function to invoke.

$ wasmtime run wasm32_wasi.wasm
error opening file: No such file or directory

The error here is due to WASI capabilities model, specifically related to filesystem rules, which differ from permissions in a POSIX environment.

One difference though is that POSIX normally allows processes to request a file descriptor for any file in the entire filesystem hierarchy, which is granted based on whatever security policies are in place. This doesn’t violate the capability model, but it doesn’t take full advantage of it. CloudABI, Fuchsia, and other capability-oriented systems prefer to take advantage of the hierarchical nature of the filesystem and require untrusted code to have a capability for a directory in order to access things inside that directory. This way, you can launch untrusted code, and at runtime give it access to specific directories, without having to set permissions in the filesystem or in per-application or per-user configuration settings.

In order for our program to create a file in the current directory, we need to give it access to the current directory.

$ wasmtime run --dir=. wasm32_wasi.wasm
$ cat hello.txt
Hello, world!

By exposing libc functions to our module, and granting it capabilities to access resources outside of its environment, we are now able to write programs that look much more like the native binaries we are used to. That’s enough to get us started, but check back for more posts as I continue diving deeper into the current state of WASM / WASI.