I am currently working on a few projects that involve leveraging WebAssembly (WASM) modules, particularly with WebAssembly System Interface (WASI) support. While WASI is still in its early stages, support has already been added to the Rust compiler and Clang, as well as runtimes such as wasmtime and WAMR. However, getting an environment setup to compile arbitrary C programs to WASM can still be challenging.
I recently updated to Clang 17, which was first released as
17.0.1
in September of
2023. This release added
support
for shared libraries and position independent code (PIC) for WASM, but otherwise
had relatively few updates for the target. Installing Clang and the releated
LLVM packages has become rather straightforward with the automatic installation
script.
$ bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
wget
/bash
at your own peril!
This will install the latest stable release, which is currently 17.0.6
. If you
wish to install a specific version, you can invoke the script with an argument
as indicated in the documentation.
$ wget https://apt.llvm.org/llvm.sh
$ chmod +x llvm.sh
$ sudo ./llvm.sh <version number>
On my Linux machine running Ubuntu 22.04, this installs LLVM in /usr/lib
.
$ /usr/lib/llvm-17/
├── bin
├── build
├── cmake -> lib/cmake/llvm
├── include
├── lib
└── share
Clang can be found at /usr/lib/llvm-17/bin/clang
. We can compile C to WASM
without any further steps if we don’t need to perform any I/O. For example, the
following program can be compiled as a WASM module with two
exposed to the
embedder.
Embedder is the term used in the WASM specification to describe host environments in which WASM modules are executed.
wasm32.c
int two() {
return 2;
}
$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -o wasm32.wasm wasm32.c
Because Clang is typically compiling a binary that can be executed from a start
symbol, we need to inform it, or specifically the linker it invokes, that it
should not expect an entrypoint to be defined (-Wl,--no-entry
). We also need
to inform the linker that it should not attempt to include a standard library,
which we don’t have for our WASM target (-nostdlib
). Lastly, we need to ensure
that our two()
function is exported so that it can be called by the embedder
(-Wl,--export-all
). You can find the full set of linker options available for
wasm-lld
here.
To execute the two()
function in a host environment, we can use wasmtime
with the --invoke
flag. If you haven’t already installed wasmtime
, you can
do so with the provided script.
curl https://wasmtime.dev/install.sh -sSf | bash
Then invoke two()
.
$ wasmtime run --invoke two wasm32.wasm
2
This is interesting, but most meaningful programs need to perform some sort of I/O. If running in a traditional host environment, such as Linux on my machine, a C program can perform I/O operations by calling functions from the C standard library. Behind the scenes these functions are frequently invoking syscalls, which allow a program running in userspace to access privileged resources (e.g. memory, disk, networking peripherals) via the operating system kernel. The embedder of our WASM module can be thought of similarly to an operating system; it has access to privileged resources that the module does not.
However, unlike binary executables compiled for an operating system and instruction set architecture, WASM modules sacrifice knowledge of a standard interface for accessing privileged resources in favor of portability. WASM modules can be run by any embedder when they don’t rely on the embedder importing any functionality. One option for accessing data outside of the WASM module is to accept arguments to exported functions. We can modify our previous example to demonstrate this.
wasm32_args.c
int add(int a, int b) {
return a+b;
}
We can use the same command to compile this new module as we previously
specified that all symbols should be exported (-Wl,--no-entry
).
$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -o wasm32_args.wasm wasm32_args.c
We can also use wasmtime
again to invoke the add()
function, this time
passing the necessary arguments.
$ wasmtime run --invoke add wasm32_args.wasm 1 2
3
This certainly expands the functionality that can be offered by a WASM module, and other mechanisms for passing data between the embedder and the module, such as writing to the module memory, then passing a pointer to the data, can be utilized in more sophisticated scenarios. However, this pushes much of the logic to the embedder, meaning that the module remains portable, but more work is required of consumers. If all embedders could agree on a standard set of functionality that would be exposed to modules, the modules could maintain portability, while being able to do almost anything that native binary executables can. Even more importantly, existing programs could be recompiled to target the WASM instruction set without having to change any code.
In reality, there are still frequently small code changes required to target WASM.
WASI was introduced to fill this gap. It is similar to POSIX, but as Lin Clark describes:
POSIX provides source code portability. You can compile the same source code with different versions of libc to target different machines. But WebAssembly needs to go one step beyond this. We need to be able to compile once and run across a whole bunch of different machines. We need portable binaries.
Because of this binary portability, only one libc implementation is necessary,
namely wasi-libc
. In order to
utilize the functionality it offers in our WASM modules, we need to build a
sysroot
that can be linked against when compiling a module. This can be accomplished by
cloning the repository, then running make
with our Clang 17 installation.
If you are just interested in getting things working and don’t care to use your existing Clang / LLVM installation, you can download pre-compiled binaries from the
wasi-sdk
repository. In fact, a new release for LLVM 17 was published while I was writing this post.
$ git clone https://github.com/WebAssembly/wasi-libc.git && cd wasi-libc
$ make CC=/usr/lib/llvm-17/bin/clang
This should produce a sysroot
directory, which contains the necessary header
files and libraries, as denoted in the output.
#
# The build succeeded! The generated sysroot is in /home/hasheddan/code/github.com/WebAssembly/wasi-libc/sysroot.
#
We can inform Clang of the sysroot
location using the --sysroot
flag. By
default, Clang will attempt to also link the LLVM compiler-rt
builtins
library. LLVM packages do not
currently include the WASM build of the library
(libclang_rt.builtins-wasm32.a
), so we must either build it ourselves, or
download the pre-compiled binary from the wasi-sdk
release.
Or… we could use some magic incantations with Clang to avoid attempting to
link it for our simple example. First let’s define a program that utilizes I/O.
wasm32_wasi.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int main() {
FILE *fp;
fp = fopen("hello.txt", "w");
if (fp == NULL) {
fprintf(stderr, "error opening file: %s\n", strerror(errno));
exit(1);
}
fputs("Hello, world!\n", fp);
fclose(fp);
return 0;
}
All we are doing is opening a file and writing Hello, world!
to it, but we are
making use of several libc functions to do so. To compile this program to a WASM
module with WASI support, we can use the following command.
$ /usr/lib/llvm-17/bin/clang -target wasm32-wasi --sysroot=/home/hasheddan/code/github.com/WebAssembly/wasi-libc/sysroot -nodefaultlibs -lc -o wasm32_wasi.wasm wasm32_wasi.c
There are a few notable differences from our previous invocations, including:
- The
-target
is now specified aswasm32-wasi
to indicate that we need WASI support. - We have provided a path to our
sysroot
that we built withwasi-libc
. - We have added the
-nodefaultlibs
flag to prevent the linker from attempting to linklibclang_rt.builtins-wasm32.a
, then provided-lc
to re-add linking libc. - We have removed
-nostdlib
,-Wl,--no-entry
, and-Wl,--export-all
as we do want to use the standard library (libc), we do have a defined entrypoint, and we don’t need to export all symbols (_start
will be exported automatically).
Finally, we can use wasmtime
to run our WASM WASI module, this time without
specifying a function to invoke.
$ wasmtime run wasm32_wasi.wasm
error opening file: No such file or directory
The error here is due to WASI capabilities model, specifically related to filesystem rules, which differ from permissions in a POSIX environment.
One difference though is that POSIX normally allows processes to request a file descriptor for any file in the entire filesystem hierarchy, which is granted based on whatever security policies are in place. This doesn’t violate the capability model, but it doesn’t take full advantage of it. CloudABI, Fuchsia, and other capability-oriented systems prefer to take advantage of the hierarchical nature of the filesystem and require untrusted code to have a capability for a directory in order to access things inside that directory. This way, you can launch untrusted code, and at runtime give it access to specific directories, without having to set permissions in the filesystem or in per-application or per-user configuration settings.
In order for our program to create a file in the current directory, we need to give it access to the current directory.
$ wasmtime run --dir=. wasm32_wasi.wasm
$ cat hello.txt
Hello, world!
By exposing libc functions to our module, and granting it capabilities to access resources outside of its environment, we are now able to write programs that look much more like the native binaries we are used to. That’s enough to get us started, but check back for more posts as I continue diving deeper into the current state of WASM / WASI.