If you are familiar with Crossplane, you likely know that we adopt the design practice of using interfaces over implementations as frequently as possible. Even if we begin with an implementation, such as the current composition engine, we make sure to consider a future with potentially many implementations for the same functionality.
One of the places where we have taken advantage of Kubernetes’ similar approach
to interfaces is in how we cache Crossplane package images. Crossplane packages
are single-layer OCI images composed of a single YAML file with a stream of
manifests indicating CRDs to be installed, dependencies on other packages, and
more. When running in the context of Kubernetes, container images are typically
pulled via the kubelet communicating with the container runtime on a given Node.
However, because of Crossplane’s mandate that images adhere to a specific
minimal format, going through the kubelet introduces additional overhead and
complexity compared to having Crossplane download the packages directly [1].
Also, because the package images are so small (the latest provider-helm release
is 10.11 KB, and even much larger packages, such as provider-aws, are typically
well under 1 MB), the overhead of downloading is not extreme.
That being said, we do want to maintain a cache of these package images as
Crossplane continuously ensures that the state of the cluster matches that
dictated by the installed packages, requiring frequent access to their contents.
Crossplane will use an emptyDir volume to cache these images by default, but for
users who prefer more durable or flexible storage for their cache, it also
allows for using a PersistentVolumeClaim instead.
In fact, when developing locally or running integration tests, we frequently run
Crossplane with a hostPath PersistentVolume, allowing us to modify the contents
of a specific image by simply copying into a local directory. This is great when
running on a local machine, but is somewhat cumbersome if you are a Crossplane
end-user (i.e., it requires direct access to Nodes in your cluster).
Yesterday I was thinking about how we could reduce the burden on folks who want
to rapidly test their package images without first pushing them to a registry
and then having Crossplane download and install them. Essentially, I wanted to
make the development loop feel as “Kubernetes-native” as possible. You might be
thinking to yourself: “Dan, have you heard of the kubectl cp command?” As it
turns out, I have! What a useful [2] little tool! However, kubectl cp (and its
good friend kubectl exec) are not super useful to us here, as the former
requires tar to be present in the container and the latter requires, well, at
least something to write file contents. With Crossplane, we are out of luck
because it is built on a distroless base image, which includes the absolute
minimum components for our binary to run. Also, though kubectl cp and
kubectl exec are commands provided by kubectl itself, they feel more like
backdoors than interacting with the Kubernetes API.
So what do Kubernetes users do when they need to store some unstructured data?
You guessed it [3]: ConfigMap. If I were to personify ConfigMap, I would
describe it as that friend you have who has zero opinions and will say “sure!”
to almost anything. Need to store some configuration data? I got you. Want me to
handle some definitely sensitive information in plaintext? Say no more. When the
API type you need doesn’t exist and you don’t want to create Yet Another CRD™,
ConfigMap is the one that you call. And fortunately for us, ConfigMap is even
willing to hold our non-UTF-8 bytes in its binaryData field.
Because OCI images are just the highest praised tarballs of all time, they too
can be stored in this binaryData field, and kubectl even has a handy shortcut to
create a ConfigMap from a directory or file:
$ kubectl create configmap package-cache --from-file=my-packages/ -n crossplane-system
Let’s say I have just built my Crossplane package in the my-packages directory:
$ kubectl crossplane build configuration --name=./my-packages/bestpackage.xpkg
$ ls my-packages
bestpackage.xpkg
The aforementioned kubectl create configmap command will create a ConfigMap with
an entry in the binaryData map with a key of bestpackage.xpkg and a value
containing the bytes of the package tarball.
apiVersion: v1
kind: ConfigMap
metadata:
  name: package-cache
  namespace: crossplane-system
binaryData:
  bestpackage.xpkg: c2hhMjU2Ojc5ZDYxYWIxODUxNmNhZTZmZj...
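To make the binaryData encoding a little less magical, here is a minimal sketch of building an equivalent manifest by hand. The configmap_manifest function name and its arguments are my own placeholders, not anything kubectl or Crossplane provides; it simply base64-encodes one file and prints the YAML.

```shell
#!/bin/sh
# Print a ConfigMap manifest whose binaryData holds one file, base64-encoded.
# Usage: configmap_manifest NAME NAMESPACE FILE  (hypothetical helper)
configmap_manifest() {
  name=$1
  namespace=$2
  file=$3
  # binaryData values are base64 strings; strip newlines because some
  # base64 implementations wrap their output.
  encoded=$(base64 < "$file" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name
  namespace: $namespace
binaryData:
  $(basename "$file"): $encoded
EOF
}
```

Piping the output of something like configmap_manifest package-cache crossplane-system my-packages/bestpackage.xpkg into kubectl apply -f - should give roughly the same result as the kubectl create configmap shortcut, though the shortcut is the less error-prone route.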
That’s pretty simple! Now how do we get that into Crossplane? Fortunately,
Kubernetes lets us mount ConfigMaps as a volume on a Pod, so we can just replace
the emptyDir in the core Crossplane Deployment with our ConfigMap:
volumeMounts:
- mountPath: /cache
  name: package-cache
...
volumes:
- name: package-cache
  configMap:
    name: package-cache
Now, when we install a Configuration with a source image that matches an
identifier that is already in our package cache, and set
packagePullPolicy: Never, Crossplane will look at the ConfigMap, see that our
package is present, and read its contents to install it.
apiVersion: pkg.crossplane.io/v1
kind: Configuration
metadata:
  name: best-package
spec:
  package: bestpackage
  packagePullPolicy: Never
This is great, and updating our ConfigMap will even be reflected in our mounted
volume path in the core Crossplane container, so we can update the package
contents on the fly, or add new ones.
But this has to be too good to be true, right? Well, besides the fact that
ConfigMaps are certainly not designed to be reliable persistent data stores,
there are a few other problems we can encounter:
- ConfigMap volumes are read-only, which means that if you try to actually install a package from a registry, Crossplane will fail to cache its contents because the path will not be writable. This is especially a problem when the package we loaded into the cache has dependencies on packages that are not in the cache.
- ConfigMaps have a size limit of 1 MiB [4]. Even with our tiny Crossplane packages, we can run up against that quite quickly.
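Since the 1 MiB limit is easy to trip over, a rough pre-flight check can save a failed kubectl create. The sketch below sums the raw file sizes in a directory and compares them to 1,048,576 bytes. The dir_bytes and fits_in_configmap names are made up for illustration, and the API server's exact accounting may differ slightly, so treat this as an approximation.

```shell
#!/bin/sh
# Total size in bytes of the regular files directly inside a directory.
dir_bytes() {
  total=0
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}

# Succeeds when the directory's contents fit under the 1 MiB ConfigMap limit.
fits_in_configmap() {
  limit=$((1024 * 1024))   # 1 MiB = 1,048,576 bytes
  [ "$(dir_bytes "$1")" -le "$limit" ]
}
```

For example, fits_in_configmap my-packages || echo "too big for a ConfigMap" would flag a directory before you try to load it.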
That being said, this workflow is only for development, so these limitations may
be acceptable. An option for making this more usable in the future could involve
supporting a ConfigMap volume source natively in the Crossplane Helm chart, and
then supporting separate paths for a “read-only” cache and a “read-write” cache.
However, a potentially less disruptive solution for tackling this problem is to
inject a small initContainer into the core Crossplane Pod that copies data from
our ConfigMap volume into whatever volume is configured for our package cache.
This allows for dynamic loading (and unloading) of local packages without
requiring that we actually use a ConfigMap as our long-term cache. When
automated with something like up, this is quite a pleasant experience.
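The copy step such an initContainer performs can be tiny. Here is a minimal sketch, assuming the initContainer runs an image that ships a POSIX shell and cp (unlike the distroless Crossplane image), with the ConfigMap mounted at one path and the cache volume at another. The sync_packages name and both paths are placeholders of mine, not part of Crossplane.

```shell
#!/bin/sh
# Copy every .xpkg file from a read-only source directory into the cache.
# Usage: sync_packages SRC DST  (hypothetical helper; both paths are placeholders)
sync_packages() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  for f in "$src"/*.xpkg; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # A non-recursive cp follows symlinks, which matters because files in a
    # ConfigMap volume are exposed as symlinks.
    cp "$f" "$dst/"
  done
}
```

An initContainer could then run something like sync_packages /local-packages /cache before the core container starts. This sketch only adds files; removing packages that have disappeared from the ConfigMap would need extra logic.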
In this setup, the package we install, bestpackage, has a source of
bestpackage, which is the data we loaded using our ConfigMap and initContainer.
However, it declares dependencies on provider-aws and provider-helm, which are
both fetched from the Upbound Registry and stored in our emptyDir package cache.
This gives us the best of both worlds, as we don’t have to manually load a
package’s whole dependency tree when we are only developing at its layer.
Closing Thoughts
While using a ConfigMap as any meaningful form of storage is ill-advised, hacks
such as this one help us rethink how the tools we interact with every day
actually work. It also serves as a demonstration of the many ways in which
Crossplane is built to be extended, something I hope to explore further in
coming posts.
Send me a message @hasheddan on Twitter for any questions or comments!
[1] Downloading package images directly also means that we have more control over what “pulling a package” means, once again providing an interface that enables multiple implementations behind the API line. But that’s a story for another day ;)
[2] And potentially horrifyingly dangerous…
[3] It was in the title, but we could all use some extra wins these days!
[4] Ah yes, the good ol’ Megabyte (MB) vs Mebibyte (MiB) distinction. I hate this just as much as you do, but I’m just glad we can all agree that numbers should always be represented as base 2.