Using a ConfigMap as an OCI Image Cache

May 22, 2021

For folks familiar with Crossplane, you likely know that we adopt the design practice of using interfaces over implementations as frequently as possible. Even if we begin with an implementation, such as the current composition engine, we make sure to consider a future with potentially many implementations for the same functionality.

One of the places where we have taken advantage of Kubernetes’ similar approach to interfaces is in how we cache Crossplane package images. Crossplane packages are single-layer OCI images comprised of a single YAML file with a stream of manifests indicating CRDs to be installed, dependencies on other packages, and more. When running in the context of Kubernetes, container images are typically pulled via the kubelet communicating with the container runtime on a given Node. However, because of Crossplane’s mandate that images adhere to a specific minimal format, going through the kubelet introduces additional overhead and complexity compared to just deferring to Crossplane downloading the packages directly 1. Also, because the package images are so small – the latest provider-helm release is 10.11 KB and even much larger packages, such as provider-aws, are typically well under 1 MB – the overhead of downloading is not extreme.

That being said, we do want to maintain a cache of these package images as Crossplane continuously ensures that the state of the cluster matches that dictated by the installed packages, requiring frequent access to their contents. Crossplane will use an emptyDir volume to cache these images by default, but for users who prefer more durable or flexible storage for their cache, it also allows for using a PersistentVolumeClaim instead.

In fact, when developing locally or running integration tests, we frequently run Crossplane with a hostPath PersistentVolume, allowing us to modify the contents of a specific image by simply copying into a local directory. This is great when running on a local machine, but is somewhat cumbersome to do if you are a Crossplane end-user (i.e. requires direct access to Nodes in your cluster).

Yesterday I was thinking about how we could reduce the burden on folks who want to rapidly test their package images without pushing them to a registry then having Crossplane install them by downloading from it. Essentially, I wanted to make the development loop feel as “Kubernetes-native” as possible. You might be thinking to yourself: “Dan, have you heard of the kubectl cp command?". As it turns out, I have! What a useful2 little tool! However, kubectl cp (and its good friend kubectl exec) are not super useful to us here as the former requires tar to be present in the container and the latter requires, well, at least something to write file contents. With Crossplane, we are out of luck because it is built on a distroless base image, which includes the absolute minimum components for our binary to run. Also, though kubectl cp and kubectl exec are commands provided by kubectl itself, they feel more like backdoors than interacting with the Kubernetes API.

So what do Kubernetes users do when they need to store some unstructured data? You guessed it3: ConfigMap. If I was to personify ConfigMap, I would describe it as that friend you have that has zero opinions and will say “sure!” to almost anything. Need to store some configuration data? I got you. Want me to handle some definitely sensitive information in plaintext? Say no more. When the API type you need doesn’t exist and you don’t want to create Yet Another CRD™, ConfigMap is the one that you call. And fortunately for us, ConfigMap is even willing to hold our non-UTF-8 bytes in its binaryData field.

Because OCI images are just the highest praised tarballs of all time, they too can be stored in this binaryData field, and kubectl even has a handy shortcut to create a ConfigMap from a directory or file:


$ kubectl create configmap package-cache --from-file=my-packages/ -n crossplane-system

Let’s say I have just built my Crossplane package in the my-packages directory:


$ kubectl crossplane build configuration --name=./mypackages/bestpackage.xpkg
$ ls my-packages
bestpackage.xpkg

The aforementioned kubectl create configmap command will create a ConfigMap with an entry in the binaryData map with a key of bestpackage.xpkg and value with the bytes of the package tarball.


apiVersion: v1
kind: ConfigMap
metadata:
  name: package-cache
  namespace: crossplane-system
binaryData:
  bestpackage.xpkg: c2hhMjU2Ojc5ZDYxYWIxODUxNmNhZTZmZj...

That’s pretty simple! Now how do we get that into Crossplane? Fortunately, Kubernetes lets us mount ConfigMaps as a volume on a Pod, so we can just replace the emptyDir in the core Crossplane Deployment with our ConfigMap:


    volumeMounts:
      - mountPath: /cache
        name: package-cache
...
volumes:
  - name: package-cache
    configMap:
        name: package-cache

Now, when we install a Configuration with a source image that matches an identifier that is already in our package cache, and a packagePullPolicy: Never, Crossplane will look at the ConfigMap, see that our package is present, and read its contents to install it.


apiVersion: pkg.crossplane.io/v1
kind: Configuration
metadata:
  name: best-package
spec:
  package: bestpackage
  packagePullPolicy: Never

This is great, and updating our ConfigMap will even be reflected in our mounted volume path in the core Crossplane container, so we can update the package contents on the fly, or add new ones.

But this has to be too good to be true right? Well, besides the fact that ConfigMaps are certainly not designed to be reliable persistent data stores, there are a few other problems we can encounter:

  1. ConfigMap volumes are read only, which means that if you try to actually install a package from a registry, Crossplane will fail to cache its contents because the path will not be writeable. This is especially a problem when the package we loaded into the cache has dependencies on packages that are not in the cache.
  2. ConfigMaps have a size limit of 1 MiB4. Even with our tiny Crossplane packages, we can run up on that quite quickly.

That being said, this workflow is only for development, so these limitations may be acceptable. An option for making this more usable in the future could involve supporting a ConfigMap volume source natively in the Crossplane Helm chart, and then supporting separate paths for a “read-only” cache and a “read-write” cache.

However, a potentially less disruptive solution for tackling this problem is to inject a small initContainer into the core Crossplane Pod that copies data from our ConfigMap volume into whatever volume is configured for our package cache. This allows for dynamic loading (and unloading) of local packages without requiring that we actually use a ConfigMap as our long-term cache. When automated with something like up, this is quite a pleasant experience.

You’ll notice that the package we install, bestpackage, has a source of bestpackage, which is the data we loaded using our ConfigMap and initContainer. However, it declares dependencies on provider-aws and provider-helm, which are both fetched from the Upbound Registry and stored in our emptyDir package cache. This gives us the best of both worlds as we don’t have to manually load a package’s whole dependency tree when we are only developing at its layer.

Closing Thoughts

While using a ConfigMap as any meaningful form of storage is ill-advised, hacks such as this one help us rethink how the tools we interact with every day actually work. It also serves as a demonstration of the many ways in which Crossplane is built to be extended, something I hope to explore further in coming posts.

Send me a message @hasheddan on Twitter for any questions or comments!


  1. Downloading package images directly also means that we have more control over what “pulling a package” means, once again providing an interface that enables multiple implementations behind the API line. But that’s a story for another day ;) ↩︎

  2. And potentially horrifingly dangerous↩︎

  3. It was in the title, but we could all use some extra wins these days! ↩︎

  4. Ah yes, the good ol’ Megabyte (MB) vs Mebibyte (MiB) distinction. I hate this just as much as you do, but I’m just glad we can all agree that numbers should always be represented as base 2. ↩︎