If I was to explain LLVM in a painfully simple manner, I might say something like: “LLVM is a framework that allows you to define a syntax for interacting with computers, and instantly have it work for all computers”. This description does a colossal disservice to the scope of the project, but it serves to illustrate the basic idea: there are many languages (frontends) and many CPU architectures (backends) and we don’t want to re-implement every translation permutation.
Furthermore, if we can make some generic optimizations in the middle, we get wins across all languages and architectures. Consumers of compilers built on LLVM can also pick and choose what sort of optimizations they want to enable, and most of them come with tradeoffs. When debugging a program, we may want to trade some performance optimizations for faster compilation times. When deploying an application to production, we may opt for the former. Importantly, the language designer does not have to implement all of these different options, they merely have to expose them.
The Rust compiler, for example, exposes optimization flags, such as `-C opt-level`, to allow users to tune these optimizations to their liking. It also allows for customization of the debug information included in the generated binary, using `-C debuginfo`. This is extremely valuable functionality, and something Rust mostly gets for free by being built on LLVM. There is much more we could explore here, and maybe we will in another post, but that is not our focus for today.
If you are interested in learning more about LLVM and how to use it, I highly recommend the Kaleidoscope tutorial.
Crossplane - Explained by a Human
So what does this have to do with Crossplane? Crossplane is neither a language nor a target architecture – its documentation says it is “… an open source Kubernetes add-on that supercharges your Kubernetes clusters enabling you to provision and manage infrastructure, services, and applications from kubectl”. That is some great marketing copy! 1
We’ll keep up our trend of overly simplifying extremely complex systems by
describing Kubernetes as a “distributed system
framework that just incidentally happens to be commonly used for orchestrating
containers”2. Even folks who have had only the most elementary introduction
to Kubernetes will likely recognize some of the API types it offers, such as Pods, Deployments, and Services. These are the components that lead the project to frequently be described as a “container orchestrator”, and they are built into the API server.
Kubernetes also, (in)famously3, allows anyone to extend the API using another built-in type named CustomResourceDefinitions (CRDs). Crossplane takes advantage of this functionality by introducing a few core CRDs into your cluster, then allowing you to add many more by installing various packages.
The core CRDs enable two main feature areas in Crossplane: composition and package management. Composition allows you to take granular Kubernetes API types and create new abstract API types that map to one or more of them. The package manager allows you to introduce, version, and upgrade more of these primitive API types in your cluster. Crossplane has two types of packages: Providers and Configurations. Let’s take a look at Providers first.
Crossplane Providers are a set of Kubernetes CRDs and a controller to reconcile them. These CRDs are high-fidelity, 1:1 representations of external
APIs. We refer to this type of CRD as a “managed resource”, and they are all
cluster-scoped4. The controllers that reconcile them know how to take the
specification of the resource and drive its external state to match it. Common
examples of Crossplane Providers include provider-aws, provider-gcp, provider-azure, and countless others. Each of these introduces CRDs for its APIs. For example, provider-gcp installs CRDs such as CloudSQLInstance, GKECluster, User, and more. Once these Providers are installed into a Crossplane cluster, you could create instances of them and they would be provisioned via the corresponding API.
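To make that concrete, here is a hedged sketch of creating a managed resource directly. The kind and forProvider fields mirror the CloudSQLInstance example that appears later in this post; the metadata name and secret name are made up for illustration:

```yaml
# Sketch: a directly-created managed resource (illustrative values).
apiVersion: database.gcp.crossplane.io/v1beta1
kind: CloudSQLInstance
metadata:
  name: example-db            # hypothetical name
spec:
  forProvider:
    databaseVersion: POSTGRES_9_6
    region: us-central1
    settings:
      tier: db-custom-1-3840
      dataDiskSizeGb: 20
  # Connection details (username, password, endpoint) land in this Secret.
  writeConnectionSecretToRef:
    namespace: crossplane-system
    name: example-db-conn     # hypothetical name
```

Applying a manifest like this would cause provider-gcp's controller to create the corresponding Cloud SQL instance and continuously reconcile it against the spec.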
Before jumping to Configuration packages, let’s take a brief detour into
composition. If we installed some of the Providers mentioned above, we now have
many managed resource CRDs in our cluster. Some of these are very complex, requiring configuration of many fields, and some depend on the existence of other resources before they can be used. An RDSInstance might require other AWS managed resources, such as a SecurityGroup, etc., as well as SQL resources, such as a Grant, before an application can connect. Setting all of this up correctly is difficult for even an experienced cloud architect5. In response,
organizations either require developers to contact infrastructure admins and ask
them to set up the proper resources they need, or they throw security out the
window and subject developers to baptism by fire. The first solution removes any
notion of DevOps6, and the second is not something anyone is going to be
excited about, unless your name is Ian.
Crossplane aspires to provide a third option by allowing infrastructure admins
to define abstract infrastructure types and templates that map to them. In the
example with the RDSInstance, the developer does not really want an RDSInstance. They want a PostgreSQL database. The fact that the
database is on AWS, GCP, or running in the server closet down the hall is of no
concern to them. However, they may want to influence some characteristics of
that database, such as its size, locality, or engine. Infrastructure admins can
choose what knobs to expose to developers, and may choose to expose more to some
teams than others.
Alright, enough playing around, let’s look at some
YAML. The two primary
API types that are used to facilitate this workflow are
CompositeResourceDefinitions (XRDs) and
Compositions. XRDs are what define
the abstract type. When an infrastructure admin creates an XRD, Crossplane
responds by creating a cluster-scoped CRD, and optionally a namespace-scoped CRD
as well. Here is one for a DB, with only one configurable field (storageGB), which is similar to the example from the Crossplane getting started guide:
```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: clusterdbs.example.org
spec:
  group: example.org
  names:
    kind: ClusterDB
    plural: clusterdbs
  claimNames:
    kind: DB
    plural: dbs
  connectionSecretKeys:
  - username
  - password
  - endpoint
  - port
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              parameters:
                type: object
                properties:
                  storageGB:
                    type: integer
                required:
                - storageGB
            required:
            - parameters
```
If you were to kubectl apply this manifest in your Kubernetes cluster where Crossplane is installed, you would see after a few seconds that a ClusterDB (spec.names.kind) cluster-scoped CRD existed, and a DB (spec.claimNames.kind) namespace-scoped one as well. As of this point, our cluster looks like the following:
Note: All of the CRD instances that exist in the “Cluster-Scoped Resources” box and define the managed resources (e.g.
RDSInstance) and core Crossplane types (e.g.
Composition) in the “Cluster-Scoped API Types” box have been omitted for brevity, but they are present as well.
Crossplane creates these CRDs on your behalf, using the schema you provide, as well as injecting some common fields that Crossplane uses for its machinery. Next, Kubernetes, as it does for all CRDs, will register the defined schemas as API types at the cluster and namespace scopes. Now anyone in the cluster, given proper RBAC permissions, could create a DB in a given namespace, or a ClusterDB at the cluster scope. In Crossplane
vernacular, we call the cluster-scoped variant (
ClusterDB) an “XR” or
“composite resource”, and the namespace-scoped variant (
DB) an “XRC” or
“composite resource claim”. The other thing Crossplane is going to do when an
XRD is created is dynamically start controllers to watch both the XR and XRC variants. These controllers watch for users to create instances of these types,
then use generic logic to match the instances to templates that have declared
they can satisfy the type. We’ll need to create at least one of those templates
for these types to be of any use to us. The Composition type is used to define a new template. These can be relatively simple, such as mapping our namespace-scoped DB to a cluster-scoped RDSInstance and hard-coding some fields. This is useful for policy enforcement and permission isolation7. An example
Composition that does this could look as follows:
```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: clusterdbs.aws.example.org
  labels:
    provider: aws
    environment: dev
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: ClusterDB
  resources:
  - name: rdsinstance
    base:
      apiVersion: database.aws.crossplane.io/v1beta1
      kind: RDSInstance
      spec:
        forProvider:
          region: us-east-1
          dbInstanceClass: db.t2.small
          masterUsername: masteruser
          engine: postgres
          engineVersion: "9.6"
          skipFinalSnapshotBeforeDeletion: true
          publiclyAccessible: true
        writeConnectionSecretToRef:
          namespace: crossplane-system
    patches:
    - fromFieldPath: "metadata.uid"
      toFieldPath: "spec.writeConnectionSecretToRef.name"
      transforms:
      - type: string
        string:
          fmt: "%s-postgresql"
    - fromFieldPath: "spec.parameters.storageGB"
      toFieldPath: "spec.forProvider.allocatedStorage"
    connectionDetails:
    - fromConnectionSecretKey: username
    - fromConnectionSecretKey: password
    - fromConnectionSecretKey: endpoint
    - fromConnectionSecretKey: port
```
Running kubectl apply with this manifest will create the Composition instance at the cluster scope. It declares in its compositeTypeRef that it can satisfy the ClusterDB abstraction, meaning that the dynamic controllers that Crossplane started for the XR (ClusterDB) and XRC (DB) will both view this Composition as a valid template to target. Furthermore, you’ll notice we are patching from spec.parameters.storageGB on the XR / XRC to the spec.forProvider.allocatedStorage field on the templated RDSInstance. This means that the developer can specify the size of the database without actually knowing what is satisfying the request, and also means that we could just as easily swap out the RDSInstance template for a CloudSQLInstance one for GCP:
```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: clusterdb.gcp.example.org
  labels:
    provider: gcp
    environment: dev
spec:
  writeConnectionSecretsToNamespace: crossplane-system
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: ClusterDB
  resources:
  - name: cloudsqlinstance
    base:
      apiVersion: database.gcp.crossplane.io/v1beta1
      kind: CloudSQLInstance
      spec:
        forProvider:
          databaseVersion: POSTGRES_9_6
          region: us-central1
          settings:
            tier: db-custom-1-3840
            dataDiskType: PD_SSD
            ipConfiguration:
              ipv4Enabled: true
              authorizedNetworks:
              - value: "0.0.0.0/0"
        writeConnectionSecretToRef:
          namespace: crossplane-system
    patches:
    - fromFieldPath: "metadata.uid"
      toFieldPath: "spec.writeConnectionSecretToRef.name"
      transforms:
      - type: string
        string:
          fmt: "%s-postgresql"
    - fromFieldPath: "spec.parameters.storageGB"
      toFieldPath: "spec.forProvider.settings.dataDiskSizeGb"
    connectionDetails:
    - fromConnectionSecretKey: username
    - fromConnectionSecretKey: password
    - fromConnectionSecretKey: endpoint
    - type: FromValue
      name: port
      value: "5432"
```
In this case,
spec.parameters.storageGB is being mapped to
spec.forProvider.settings.dataDiskSizeGb, but from the developer’s
perspective, it will have the same effect of setting the size of the database.
You can define as many of these templates as you like for an XR / XRC pair. The
template that is chosen when an actual instance of the XR / XRC is created is
dictated by labels (you’ll notice we added a few in the Composition manifests above).
Before we actually create an instance of an XRC or XR, let’s take another peek
at our cluster:
Note: CRD instances and XRD instances are omitted in “Cluster-Scoped Resources” here as well.
Things are coming along! However, I must confess something to you: I told a lie. It was well-meaning, but a lie nonetheless. I said that these templates could satisfy both the XR and the XRC. The truth of the matter is that they can only satisfy the XR, but the creation of an XRC results in creation of an XR.
Note: though we will not demonstrate it today, this property of an XRC being bound to an XR means that infrastructure admins could “pre-provision” infrastructure by creating an XR directly, then have developers just create the XRC that binds to the existing XR, rather than creating a new one entirely. Yep – there is some fun stuff you can do here.
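Though we won’t walk through it, such a pre-provisioned XR created directly by an admin could look roughly like this (a sketch using the ClusterDB schema defined above; the name and values are illustrative):

```yaml
# Sketch: creating the cluster-scoped XR directly (illustrative values).
apiVersion: example.org/v1alpha1
kind: ClusterDB
metadata:
  name: pre-provisioned-db    # hypothetical name
spec:
  parameters:
    storageGB: 100
  # Select a template the same way a claim would.
  compositionSelector:
    matchLabels:
      provider: aws
```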
Let’s create the following XRC and see what happens in our cluster.
```yaml
apiVersion: example.org/v1alpha1
kind: DB
metadata:
  name: my-db
  namespace: default
spec:
  parameters:
    storageGB: 20
  compositionSelector:
    matchLabels:
      provider: aws
  writeConnectionSecretToRef:
    name: db-conn
```
Whoa, that’s pretty simple! You’re telling me that’s all I have to do to create a fairly complex database instance? Neat! 8 Behind the scenes, Crossplane is executing the following steps:
- Create a corresponding cluster-scoped XR (my-db-xxx) for the namespace-scoped XRC (my-db).
- Match the XR to a compatible template. We specified provider: aws in our XRC, so the only Composition we can match to is clusterdbs.aws.example.org.
- Create the templated managed resources by taking the values provided on the XR / XRC and patching them to the resources as defined by the selected Composition. In this case, this results in the creation of a single RDSInstance with some hard-coded values and the patched value allocatedStorage: 20.

At that point, the provider-aws controller is responsible for managing the RDSInstance (it responds the same whether the resource was created directly by a user or indirectly using composition). Crossplane’s composition machinery will reflect relevant parts of the status back to the XR / XRC, as well as the connection details (which we won’t spend too much time on today).
This is a nice abstraction layer, but a rather trivial example. In a production
scenario, you likely do not want your database exposed publicly to the entire
internet, so you will need to set up networking infrastructure around it, and
reference it from the database. The Crossplane getting-started
has a wonderful “AWS with New VPC” example that I encourage you to take a look
at, but I will not copy it here because this post is getting quite long, I
haven’t even gotten to the point, and too large of a YAML wall at this point
would likely cause everyone but my own mother to abandon ship. The main idea is that we go through the exact same flow as above, but instead of just spitting out an RDSInstance, we spit out the RDSInstance along with all of the supporting networking resources it needs.
Now that we’ve established how Crossplane composition works, let’s get back to packaging. Your XRDs and Compositions are essentially how you are building
your own platform API, and if you have a limited number of clusters, it may make
sense to just throw them in a git repo and
kubectl apply them when you make
updates. There are a fair number of Crossplane users that follow this exact
strategy. However, I ask you to indulge me for a moment while I explore another
strategy that will likely seem excessive at first, but I promise to prove its
worth in a moment10.
As mentioned before, Crossplane has a concept of a Configuration package. This
is similar to the Provider package, but instead of CRDs and a controller, they
only include XRDs and
Compositions. They also can declare dependencies on
Provider packages, meaning that when you install a Configuration package,
Crossplane will make sure a compatible version of all Providers it depends upon
are present in the cluster and will install them if they are not. Both types of
Crossplane packages are formatted as OCI
images, meaning that they can be
pushed to any registry, such as DockerHub, GCR, or the Upbound Registry.
This allows you (and anyone else if the package is public or shared) to have a “one-click” experience of deploying your entire platform API to any Crossplane cluster in a repeatable way. Once again, this has limited value if you are running a single cluster for your internal control plane, but if you have any intention of sharing, it can be very powerful. 12
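For a sense of what a package declares, here is a hedged sketch of a Configuration package’s metadata file (commonly named crossplane.yaml). It assumes the meta.pkg.crossplane.io API group used by Crossplane’s package manager; the package name and version constraint are illustrative:

```yaml
# Sketch: Configuration package metadata with a Provider dependency.
apiVersion: meta.pkg.crossplane.io/v1
kind: Configuration
metadata:
  name: platform-db           # hypothetical package name
spec:
  # Crossplane installs a compatible provider-aws if one is not present.
  dependsOn:
  - provider: crossplane/provider-aws
    version: ">=v0.20.0"      # illustrative constraint
```

When this package is installed, Crossplane resolves the dependency and brings the Provider’s CRDs and controller into the cluster alongside your XRDs and Compositions.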
Crossplane as a Cluster API Backend
I think up until this point we have done an admirable job of showing how Crossplane can provide value to an organization that wishes to allow developers to self-service their infrastructure without making their environments the wild west from a policy enforcement perspective. But this has been shared before and isn’t really reason for another blog post.
What I am more interested in is how Crossplane’s packaging and composition models can be used with other Kubernetes controllers. And let’s not beat around the bush, there is one specific project where this could be of immense value: Cluster API (CAPI). I have been working with a few of the CAPI maintainers to think about how the two projects could collaborate.
DISCLAIMER: Cluster API is a rapidly changing project and I am certainly not the foremost expert. However, regardless of the exact details the concepts below should continue to hold.
The mission of Cluster API is to allow provisioning Kubernetes clusters using the Kubernetes API itself. CAPI works by implementing core controllers and API types, then interfacing with three types of providers:
- Cluster Infrastructure Providers: supplies whatever prerequisites are necessary for running machines. Examples might include networking, load balancers, firewall rules, and so on.
- Machine Infrastructure Providers: responsible for managing the lifecycle of provider-specific machine instances. These may be physical or virtual instances, and they represent the infrastructure for Kubernetes nodes.
- Bootstrap Providers: generates bootstrap data that is used to bootstrap a Kubernetes node.
In practice, Cluster and Machine providers are typically packaged together and have a lot of similarities with Crossplane providers: cluster-api-provider-aws, cluster-api-provider-gcp, and so on. Each of these providers supplies concrete implementations of the abstract types required by core CAPI. 13 For instance, core CAPI has a Cluster type, which references a concrete provider implementation in its spec.infrastructureRef. If using cluster-api-provider-gcp, the referenced type would be GCPCluster.
Behind the scenes, cluster-api-provider-gcp implements a GCPCluster controller that sets up all of the infrastructure required to run Kubernetes on GCP, other than the nodes (which are handled by the GCPMachine types and their corresponding controllers). The list of infrastructure resources managed by the GCPCluster controller includes the cluster network, firewall rules, and the load balancer components for the API server14.
This differs from Crossplane’s provider-gcp, where each of these resources has its own controller. In comparison, CAPI introduces abstraction at the controller level, while Crossplane implements it at the composition level.
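For reference, the CAPI referencing model described above can be sketched as follows (API versions reflect CAPI v1alpha3 and may differ in newer releases; names are illustrative):

```yaml
# Sketch: a core CAPI Cluster delegating to a provider-specific GCPCluster.
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: example-cluster       # hypothetical name
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: GCPCluster
    name: example-cluster
```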
In Crossplane, we would define a
GCPCluster XRD (or even just a generic
Cluster, since we can have many
Compositions for a single XRD) and a
Composition with templates for all of the resources listed above. Why move
abstraction to the composition level? There are a few reasons.
First, it leads to small, focused controllers with minimal scope. Crossplane provider controllers are interested in reconciling their 1:1 representative external resource with the corresponding object in the cluster. Presumably a reason for bundling many external resources into a single controller in CAPI is because they have many related fields and references to each other. However, Crossplane addresses this by the aforementioned cross-resource reference model.
Second, and most importantly, it allows for easy customization by users. In the CAPI model, if a user wants to change some aspect of the cluster infrastructure, the GCPCluster API must be modified to expose that field, and the controller must be modified to honor it. Alternatively, when using Crossplane composition, a user could modify or create a new Composition that maps to the same XRD with the Firewall template modified, or add a field to the XRD that patches to the relevant field on the Firewall for “provision-time” customization. This is faster, more flexible, and doesn’t require actually modifying any code.
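As an illustrative-only sketch of that idea — the Firewall API version and field paths below are placeholders rather than the real provider-gcp schema — a Composition exposing a single firewall knob might look like:

```yaml
# Sketch: exposing one Firewall setting on a hypothetical Cluster XRD.
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: clusters.custom.example.org   # hypothetical name
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: Cluster                     # hypothetical XRD kind
  resources:
  - name: firewall
    base:
      apiVersion: compute.gcp.crossplane.io/v1alpha1  # hypothetical group/version
      kind: Firewall
      spec:
        forProvider: {}
    patches:
    # Let developers set the allowed port at provision time.
    - fromFieldPath: "spec.parameters.apiServerPort"
      toFieldPath: "spec.forProvider.allowed[0].ports[0]"  # hypothetical path
```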
Third, since Crossplane allows users to package up
Compositions and XRDs into
Configuration packages, these different cluster configurations can be easily
shared. For instance, the community could define Configuration packages that
supply “production cluster configurations” and “development cluster
configurations”, with the latter optimizing for cost and the former optimizing
for high availability. Furthermore, these packages will automatically install
the necessary managed resource CRDs by declaring dependencies on the relevant Providers.
Lastly, it brings together the cloud-native community to accelerate the effort to define infrastructure using the declarative Kubernetes API. The CAPI, Crossplane, and various provider-specific communities have all done a lot of work to this end, but the longer we stay fragmented, the more duplicate work we will perform.
Weren’t We Talking About LLVM?
Ah yes, sorry about that, this post is supposed to be about Crossplane and LLVM. You may already see where we are going. We described Crossplane as a “backend” for Cluster API, but really Crossplane itself is the customizable translation layer, and Crossplane’s Providers are the backends that target infrastructure APIs, similar to the LLVM backends that target different ISAs. LLVM provides a friendlier interface to compute resources – CPU, memory, etc. – for its compiler frontends, while Crossplane provides a friendly interface to cloud infrastructure.
In the example above, CAPI, and operators for Kafka and Sourcegraph, are deploying distributed systems that require infrastructure. They all can run on various cloud providers, and Kafka and Sourcegraph can run on Kubernetes itself. Importantly, Crossplane providers can target any API, not just the cloud providers, so something like provider-helm could be a backend for deploying something like Kafka on Kubernetes.
Closing Thoughts
As you can tell from my endless droning on, I am very excited about this direction for the cloud-native community. While I paint a rosy picture, there is still quite a lot of work to be done, but I can say for certain that I, and others in the Crossplane community, will show up to do our part.
I also want to acknowledge the incredible work the Cluster API community has done to build an innovative project that continues to gain wide adoption. Any future work will be standing on the shoulders of giants.
Send me a message @hasheddan on Twitter for any questions or comments!
I’m allowed to say snarky stuff like this because I helped write it. ↩︎
I had too much coffee this morning and now I’m just getting carried away. ↩︎
This was my job title once and I still don’t know what it means – yet here I am using it in a blog post that I proclaim contains valuable information. Life comes at you fast. ↩︎
There I go again, using words that don’t mean anything. ↩︎
Read more about moving the level of permissioning to the level of abstraction in this other terrific post by Nic. Also, if you are familiar with Open Policy Agent your mind is probably already going to some wild places. ↩︎
Update: the day has progressed, the coffee is wearing off, and sarcasm levels are rising. ↩︎
This feature is another reason to have cluster-scoped managed resources – okay, you can continue yelling at me now. ↩︎
Apologies for burying the lede here. Also, did you know that it is spelled “lede” when you say “burying the lede”? I sure didn’t. Or maybe it isn’t spelled “lede” and “lead” is actually correct. I really don’t know at this point, you decide. ↩︎
The Upbound Registry provides some additional functionality for indexing Crossplane packages over any vanilla image registry, plus I work for Upbound so I am basically required by law to shill for it. There are some real benefits, but you can also use other registries if you please. ↩︎
I promise the moment of truth is coming. ↩︎
Hmm, this sounds familiar. ↩︎