A quick code search query reveals at least 7,000 Kubernetes Custom Resource Definitions in the open source corpus,[1] most of which are likely generated with controller-gen, a tool that turns Go structs with comment-based markers into Kubernetes CRD manifests, which end up being custom APIs served by the Kubernetes API server.
At LinkedIn, we develop our fair share of custom Kubernetes APIs and controllers to run workloads or manage infrastructure. In doing so, we rely heavily on the custom resource machinery and controller-gen to generate our CRDs.
Validate religiously
As a controller developer, you should only admit validated and complete custom resources into your API server.
Any resource that has illegal values or missing fields is begging for trouble to happen down the line. Your controllers should not have implicit defaults for resources. As the Kubernetes API conventions recommend:
In general we want default values to be explicitly represented in our APIs, rather than asserting that “unspecified fields get the default behavior”.
You cannot reliably compensate for a missing field in your controller in the long term, nor should you have to deal with an illegal value during reconciliation.
Explicit +required or +optional on every field
controller-gen has many different ways of marking a field “optional”:
- The Go struct field has the omitempty option in its JSON tag:
  type Car struct {
  	Brand string `json:"brand,omitempty"`
  }
- The struct field has the //+optional marker comment:
  type Car struct {
  	//+optional
  	Brand string `json:"brand"`
  }
- The struct field has the //+kubebuilder:validation:Optional marker comment:
  type Car struct {
  	//+kubebuilder:validation:Optional
  	Brand string `json:"brand"`
  }
…and typically you might think that’s it, but:
- You have a package-level marker on the Go package that makes all fields “optional by default” (a feature I have yet to find a use case for):
  //+kubebuilder:validation:Optional
  package v1beta1
This is simply far too many different ways to achieve something, and it offers too many ways to open your API up to more relaxed validation due to misconfiguration.
Up until controller-tools v0.16 (released last month) it was not possible to reliably mark a field as required. (Even if you specified the +required marker on the field, using the omitempty tag would silently make the field optional.)
For this reason alone, I strongly recommend upgrading to controller-tools v0.16+ and explicitly specifying +required or +optional markers on every single field of your API. If you do this, you may find that your API was already making some fields optional by mistake.
I still recommend keeping the package-level //+kubebuilder:validation:Required marker so that all struct fields are required by default as a safety net.
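A check like this can be automated. The following is a minimal sketch, not an existing tool: it uses the standard go/parser and go/ast packages to list struct fields whose doc comment carries neither a required nor an optional marker. The sample source, the `Color` and `Doors` fields, and the function names are hypothetical, and struct tags are left out of the sample for brevity.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical API package used only for this illustration.
const src = `package v1beta1

type CarSpec struct {
	// +required
	Brand string

	// +optional
	Color string

	Doors int
}
`

// unmarkedFields returns "Struct.Field" for every named struct field whose doc
// comment has neither a required nor an optional marker. Embedded fields are
// skipped because they have no names.
func unmarkedFields(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "api.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var missing []string
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, f := range st.Fields.List {
			if hasPresenceMarker(f.Doc) {
				continue
			}
			for _, name := range f.Names {
				missing = append(missing, ts.Name.Name+"."+name.Name)
			}
		}
		return true
	})
	return missing, nil
}

// hasPresenceMarker reports whether any line of the doc comment is exactly one
// of the four field-level markers discussed above (spaces are ignored).
func hasPresenceMarker(doc *ast.CommentGroup) bool {
	if doc == nil {
		return false
	}
	for _, c := range doc.List {
		text := strings.ReplaceAll(strings.TrimPrefix(c.Text, "//"), " ", "")
		switch text {
		case "+required", "+optional",
			"+kubebuilder:validation:Required", "+kubebuilder:validation:Optional":
			return true
		}
	}
	return false
}

func main() {
	missing, err := unmarkedFields(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // [CarSpec.Doors]
}
```

Run against a real API package, a check along these lines could fail CI whenever a new field lands without an explicit marker.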
Field Validation
Zero vs null pitfalls
A major pitfall in understanding CRD validation is that the Go type system allows for zero values to pass the +required check: empty strings (""), zero numerics (0, 0.0), empty slices ([]), or empty maps ({}) are all valid values for their respective types.
The OpenAPI schema validation looks for the non-null presence of the field in the request payload.
This mistake will usually fly under the radar because you’re probably writing your tests in Go. When the Go JSON serializer turns your test object into a JSON payload, the request body will have "field": "" (because you don’t have omitempty on the field), and the server will accept this as a valid resource. Your tests will not fail.
If you really want to disallow zero values or empty strings for a field, use markers like:
- +kubebuilder:validation:MinLength=1 for strings
- +kubebuilder:validation:Minimum=1 for integers.
NOTE
If you want to semantically distinguish between an “unspecified” field vs a “zero value”, define the Go struct field as pointer type. (That’s practically the only acceptable reason to use pointer fields on custom resource structs.)
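As a quick illustration of the pointer approach, the sketch below decodes two payloads into a hypothetical `WorkloadSpec` type (not part of the Car example) and shows that a nil pointer means "unspecified" while a pointer to 0 means "explicitly zero".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WorkloadSpec is a hypothetical type used only for this illustration.
type WorkloadSpec struct {
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
}

// describe decodes a payload and reports how the replicas field was provided.
func describe(payload string) string {
	var s WorkloadSpec
	if err := json.Unmarshal([]byte(payload), &s); err != nil {
		return "invalid: " + err.Error()
	}
	if s.Replicas == nil {
		return "unspecified"
	}
	return fmt.Sprintf("set to %d", *s.Replicas)
}

func main() {
	fmt.Println(describe(`{}`))             // unspecified
	fmt.Println(describe(`{"replicas":0}`)) // set to 0
}
```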
Nested fields are not always validated
Consider this Car custom resource type that has a required spec.brand field with an enum validation:
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

//+kubebuilder:object:root=true
type Car struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              CarSpec `json:"spec,omitempty"`
}

type CarSpec struct {
	//+kubebuilder:validation:Enum=BMW;Porsche;McLaren
	//+required
	Brand string `json:"brand"`
}
It’s still possible to create a resource like the following and skip the API server validation:
apiVersion: example.com/v1beta1
kind: Car
metadata:
  name: my-car
This is a valid object as far as the API server is concerned, but it’s not what you wanted. The Brand field was not validated because of how OpenAPI schema validation works: an object field is only validated if it is specified in the request payload.
If the YAML payload listed above had included a spec: {} on the wire, it would have been validated and the request would have been rejected.
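The behavior is easier to see in a toy model. The sketch below is a deliberately tiny stand-in for schema validation, not the API server's implementation: the `schema` type and the `validate` and `carSchema` functions are hypothetical. Required keys are checked on an object, and the validator only descends into child objects that are actually present in the payload.

```go
package main

import "fmt"

// schema is a deliberately tiny stand-in for an OpenAPI object schema.
type schema struct {
	required   []string
	properties map[string]*schema
}

// validate checks the required keys of obj and recurses only into child
// objects that exist in the payload.
func validate(obj map[string]any, s *schema, path string) []string {
	var errs []string
	for _, key := range s.required {
		if v, ok := obj[key]; !ok || v == nil {
			errs = append(errs, path+"."+key+": Required value")
		}
	}
	for key, child := range s.properties {
		if nested, ok := obj[key].(map[string]any); ok {
			errs = append(errs, validate(nested, child, path+"."+key)...)
		}
	}
	return errs
}

// carSchema models the Car type above: spec.brand is required, spec is not.
func carSchema() *schema {
	carSpec := &schema{required: []string{"brand"}}
	return &schema{properties: map[string]*schema{"spec": carSpec}}
}

func main() {
	noSpec := map[string]any{"kind": "Car"}
	emptySpec := map[string]any{"kind": "Car", "spec": map[string]any{}}

	fmt.Println(validate(noSpec, carSchema(), ""))    // []
	fmt.Println(validate(emptySpec, carSchema(), "")) // [.spec.brand: Required value]
}
```

In this model, the payload without spec passes because nothing ever looks inside a spec that isn't there. Adding "spec" to the root schema's required list would make it fail too.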
Markers aren’t always validated
While the controller-gen tool does extensive validation of the Go structs you are authoring to catch small mistakes, it does not validate markers it does not recognize:
type Car struct {
	//+kubebuilder:validation:enum=BMW;Porsche;McLaren
	Brand string `json:"brand,omitempty"`
}
If you can’t spot the error above, you’re one of the dozens of developers who thought controller-gen would’ve complained if you misspelled :Enum as :enum.
Do not rely on controller-gen to strictly validate every validation marker; inspect the generated CustomResourceDefinition manifests.
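Misspelled markers can also be caught with a small lint. The sketch below is an illustration, not an existing tool: it scans Go comments for markers that start with +kubebuilder:validation: and flags any whose name is not in an allowlist. The allowlist is partial and illustrative, so a real check would need the full set of markers supported by your controller-tools version. The `Model` field and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strings"
)

const prefix = "+kubebuilder:validation:"

// knownValidation is a partial, illustrative allowlist of marker names.
var knownValidation = map[string]bool{
	"Enum": true, "Minimum": true, "Maximum": true,
	"MinLength": true, "MaxLength": true, "Pattern": true,
	"MinItems": true, "MaxItems": true, "Required": true, "Optional": true,
}

// src is a hypothetical API file containing the misspelled marker from above.
const src = `package v1beta1

type Car struct {
	//+kubebuilder:validation:enum=BMW;Porsche;McLaren
	Brand string

	//+kubebuilder:validation:MinLength=1
	Model string
}
`

// suspiciousMarkers returns validation markers whose name is not in the
// allowlist. Marker names are case-sensitive, so "enum" is reported.
func suspiciousMarkers(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "api.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, group := range file.Comments {
		for _, c := range group.List {
			text := strings.TrimSpace(strings.TrimPrefix(c.Text, "//"))
			if !strings.HasPrefix(text, prefix) {
				continue
			}
			// The marker name ends at the first "=" or ":" after the prefix.
			name := strings.TrimPrefix(text, prefix)
			if i := strings.IndexAny(name, "=:"); i >= 0 {
				name = name[:i]
			}
			if !knownValidation[name] {
				out = append(out, text)
			}
		}
	}
	return out, nil
}

func main() {
	bad, err := suspiciousMarkers(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(bad) // [+kubebuilder:validation:enum=BMW;Porsche;McLaren]
}
```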
Field Defaulting
Defaulting on nested structs
Let’s extend the Car API to define a Transmission.Type field that’s not required, but defaults to Automatic using the +kubebuilder:default marker:
type CarSpec struct {
	//+kubebuilder:validation:Enum=BMW;Porsche;McLaren
	//+required
	Brand string `json:"brand"`
	//+optional
	Transmission Transmission `json:"transmission,omitempty"`
}
type Transmission struct {
	//+kubebuilder:default:=Automatic
	Type string `json:"type,omitempty"`
}
If you craft a request to create a Car resource and the request body lacks a transmission field like this:
spec:
  brand: BMW
the Transmission.Type field won’t be defaulted to Automatic, for the same reason described above for validation of nested structs: members of the Transmission type are defaulted only if the field has a non-null value in the request payload.
To avoid this pitfall, you can set the default value of the Transmission field to an empty object ({}), which triggers the defaulting of its nested fields:
//+kubebuilder:default:={}
//+optional
Transmission Transmission `json:"transmission,omitempty"`
It is now possible to omit the spec.transmission field from the request body on the wire, and the resulting object will have transmission: {type: Automatic}.
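The effect of the {} default can be reproduced in a toy model of defaulting. This is a simplified sketch under my own assumptions, not the API server's algorithm: the `schema` type and the `applyDefaults`, `clone`, and `specSchema` functions are hypothetical. Missing keys that have a default are filled in first, and the defaulter then descends only into child objects that exist.

```go
package main

import "fmt"

// schema is a tiny stand-in for a structural schema with defaults.
type schema struct {
	def        any // default used when the key is absent; nil means no default
	properties map[string]*schema
}

// applyDefaults fills in missing keys that have a default, then recurses into
// child objects that exist (sent by the client or just defaulted).
func applyDefaults(obj map[string]any, s *schema) {
	for key, child := range s.properties {
		if _, ok := obj[key]; !ok && child.def != nil {
			obj[key] = clone(child.def)
		}
		if nested, ok := obj[key].(map[string]any); ok {
			applyDefaults(nested, child)
		}
	}
}

// clone copies map defaults so objects never share the schema's default value.
func clone(v any) any {
	m, ok := v.(map[string]any)
	if !ok {
		return v
	}
	out := make(map[string]any, len(m))
	for k, val := range m {
		out[k] = clone(val)
	}
	return out
}

// specSchema models CarSpec; transmissionDefault is nil for "no default on
// the Transmission field" or an empty map for the {} default.
func specSchema(transmissionDefault any) *schema {
	return &schema{properties: map[string]*schema{
		"brand": {},
		"transmission": {
			def:        transmissionDefault,
			properties: map[string]*schema{"type": {def: "Automatic"}},
		},
	}}
}

func main() {
	without := map[string]any{"brand": "BMW"}
	applyDefaults(without, specSchema(nil))
	fmt.Println(without) // map[brand:BMW]

	with := map[string]any{"brand": "BMW"}
	applyDefaults(with, specSchema(map[string]any{}))
	fmt.Println(with) // map[brand:BMW transmission:map[type:Automatic]]
}
```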
Defaulting and validation at the same time
Let’s continue from the previous example, and make the spec.transmission.type field //+required like this:
type CarSpec struct {
	//+kubebuilder:default:={}
	//+optional
	Transmission Transmission `json:"transmission,omitempty"`
}
type Transmission struct {
	//+kubebuilder:default:=Automatic
	//+required
	Type string `json:"type"`
}
If you run controller-gen crd to generate a CRD from this and apply it to a cluster, you’ll see that it explicitly fails:
The CustomResourceDefinition "cars.example.com" is invalid: spec.validation.openAPIV3Schema.properties[spec].properties[transmission].default.type: Required value
In this error, the API server tells you that the default value transmission: {} is not a valid value for that field according to its OpenAPI schema validation, and refuses to accept the CustomResourceDefinition.
You can fix this by providing a more complete default value on the parent struct:
type CarSpec struct {
	//+kubebuilder:default:={type:Automatic}
	Transmission Transmission `json:"transmission,omitempty"`
	...
However, this is not ideal because we ended up specifying the default value "Automatic" twice: once on the CarSpec.Transmission.Type field, and once on the CarSpec.Transmission field. This is potentially a maintenance nightmare. Let me know if you have a better solution to this problem, but this is the only way I know to make this work.
Explicit defaults for zeroable fields
Suppose you’re implementing the ReplicaSet API and its controller, and you have fields like .status.readyReplicas updated by the controller. Since a single controller is responsible for updating the status, you probably make a PATCH request.
However, if you calculate a patch when both the “before” and “after” objects have readyReplicas: 0, the resulting payload will not have a readyReplicas field.
As a result, the Kubernetes API machinery will not set a value for this field, and your status will be missing it (which is not what you want, because your clients will expect this field to be present even when the value is 0). The field will remain non-existent in the status object until the controller updates it to a non-zero value, which might never happen.
That’s why you should consider explicitly configuring the default values on the controller-managed fields (where applicable) in case the controller sends a partial patch that never sets a value for the field:
type MyWorkloadStatus struct {
	//+kubebuilder:default:=0
	ReadyReplicas int32 `json:"readyReplicas"`
}
Functionally, this is not super critical, but when you see a mix of empty values and 0 values in your kubectl get output, you now know why it might be happening.
Conclusion
controller-gen has its fair share of quirks. Just like any weakly-typed system, controller-gen ideally should be accompanied by a more robust static analysis tool or linter that can catch these mistakes before they’re committed to the repository. I think we’re lacking tooling in this space (though some tools exist).
Comment-based markers pose a real risk of breaking the backwards compatibility on an API field (e.g. by making a field required, or removing an enum value). Human judgement goes only so far in getting these right.
controller-gen is maintained by a fairly small yet active developer community. If you use CRDs at your company, consider contributing to the project to make it better.
If you’re using controller-gen in your project, I hope this article helps you avoid some of the pitfalls we’ve learned. If you’re looking to make the ecosystem better, hopefully the problems inspire you to build static analysis tools that can catch these problems before they’re committed.
[1] This astounding number excludes most of the repos that also automatically generate the Go types (cloud providers, Crossplane, etc.). It’s fair to say we’re all probably overdoing CRDs a bit, but that’s a topic for another day.