The Knative project is usually explained as building blocks for “serverless on Kubernetes”. Because of this framing, most Kubernetes users are not aware of what Knative can do for their non-serverless workloads: better autoscaling and networking for stateless microservices on Kubernetes.

Let’s put “serverless” aside for a moment. Most Kubernetes users write microservices that follow the twelve-factor app manifesto: services should be stateless and communicate with each other over requests/RPCs.

Kubernetes does not operate at a layer friendly to such microservices. Built-in Kubernetes workload and networking features fall short on several common tasks in a microservices world:

  • need to write two manifests for an app (Deployment, Service)
  • no per-request/per-RPC load balancing for HTTP/gRPC [1]
  • no traffic splitting, so can’t do blue/green deployments easily
  • CPU/memory-based autoscaling (often slow, and not what you need)
  • no concurrency controls (i.e. max N in-flight requests per Pod)

Knative Serving installed on a Kubernetes cluster directly addresses these shortcomings of Kubernetes:

  • a “Service” object which is a combination of Kubernetes Service+Deployment
  • per-request load balancing for HTTP/gRPC
  • traffic splitting using revisions of a Service (blue/green deployments)
  • request-based rapid autoscaling out of the box, highly configurable
    • scale-to-zero (suspend) and 0-to-N (activation), also configurable
    • concurrency controls (limit in-flight requests per Pod)
  • (optional) automatic monitoring support for HTTP metrics (latency, requests, etc.)
  • (optional) automatic TLS certs and termination for external endpoints

Adopting a new component like Knative on the critical path of your stack can be a hard decision. There are several aspects of the Knative project that might change your mind about this:

  • Knative is about to hit v1.0 and become “stable”, and it has a solid community. [2]
  • Knative is modular: you can install only its Serving component.
  • Knative no longer depends on Istio: Knative needs a load balancer for routing and traffic splitting, but you can use alternatives like Gloo or Ambassador, or a gateway proxy built specifically for Knative such as Kourier.

The Knative “Service” API

Knative combines Kubernetes Deployment+Service into a single Service type.

Most stateless Kubernetes Deployment manifests can be migrated to Knative very easily, by changing apiVersion/kind and trimming some unnecessary parts:

# modify this:
apiVersion: apps/v1
kind: Deployment

# to this:
apiVersion: serving.knative.dev/v1
kind: Service
Every time you update the Knative Service, a new Revision object is created. Revision objects are immutable, which gives you a snapshot of each deployment. You can then use Revision objects to split traffic during rollouts, or simply roll back.
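
As a rough sketch of how a rollout could use Revisions, the manifest below sends 10% of traffic to the newest Revision and keeps 90% on the previous one. The revision name `hello-00001` and the image tag are hypothetical; actual Revision names depend on your cluster.

```yaml
# Traffic-splitting sketch; names, image, and percentages are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - image: example.com/hello:v2   # updating the template creates a new Revision
  traffic:
    - revisionName: hello-00001         # hypothetical name of the previous Revision
      percent: 90
    - latestRevision: true              # the Revision created from the template above
      percent: 10
```

Rolling back would then amount to setting the older Revision's `percent` back to 100.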

Knative Serving offers other APIs that developers may not deal with day-to-day, such as Configuration, which encourages you to separate your code and configuration (another twelve-factor app mantra).

Knative Autoscaling

As a developer, if you have had problems wrapping your mind around autoscaling your microservices in Kubernetes, there’s a good reason for it.

Kubernetes autoscaling is primarily done using the Horizontal Pod Autoscaler (HPA). The HPA controller looks at CPU/memory usage (or custom metrics) and operates on a long time window by default, so it is slow to react to sudden traffic spikes.

Knative autoscaling, on the other hand, is driven by “in-flight requests” (concurrency): if your service suddenly gets more requests than it can handle, Knative adds more Pods quickly because it operates on a shorter time window.

Knative knows about every request that comes to a Pod. [3] With Knative, you can configure your app with a “target concurrency” level, which is a soft target, or with a hard “concurrency” limit, which guarantees the maximum number of in-flight requests sent to each Pod. With the concurrency settings in mind, the Knative autoscaler closely monitors incoming requests and quickly decides how many Pods to add.
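
The two knobs described above could be set roughly as follows. This is a sketch: the values 20 and 50 and the image name are arbitrary examples, and you would normally pick whichever setting fits your app rather than copy both.

```yaml
# Concurrency settings sketch; values and image are illustrative only.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for about 20 in-flight requests per Pod.
        autoscaling.knative.dev/target: "20"
    spec:
      # Hard limit: no more than 50 in-flight requests are sent to a single Pod.
      containerConcurrency: 50
      containers:
        - image: example.com/hello:v1   # hypothetical image
```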

While your app is being scaled up, Knative will hold on to the incoming requests, and proxy them to the new Pods without dropping the requests, which is not possible with Kubernetes.

As a developer, I prefer to say “my app can handle 20 requests at a time” in a one-line config, rather than “target average 70% CPU utilization and scale my app between 1 and 30 pods” in a long HorizontalPodAutoscaler manifest.
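
For comparison, the CPU-based policy quoted above would look roughly like this as a HorizontalPodAutoscaler manifest. The sketch assumes the `autoscaling/v2` API (older clusters may only serve a beta version) and a hypothetical Deployment named `hello`.

```yaml
# HPA sketch: target 70% average CPU, scale between 1 and 30 Pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello              # hypothetical Deployment to scale
  minReplicas: 1
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With Knative, the equivalent intent fits in a single `containerConcurrency: 20` field on the Service's template spec.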

Moreover, Knative offers “scale-to-zero” for services that have not received traffic for some time (Kubernetes cannot do this natively [4]). This is achieved using the Knative activator component. In serverless fashion, this feature is turned on by default and therefore causes “cold starts”. However, it can easily be turned off.
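
One way to turn off scale-to-zero for a single service is to keep at least one Pod running through a minimum-scale annotation. The sketch below uses the `minScale` spelling; the exact annotation name has varied across Knative releases, so treat it as an assumption and check the documentation for your version.

```yaml
# Sketch: keep at least one Pod around to avoid cold starts.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        # Assumed annotation spelling; newer releases may prefer "min-scale".
        autoscaling.knative.dev/minScale: "1"
    spec:
      containers:
        - image: example.com/hello:v1   # hypothetical image
```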

You can refer to this blog post, this doc and this example to learn more about the Knative autoscaler.

Knative is still Kubernetes

Applications deployed with Knative are still Kubernetes services. They can connect to other Kubernetes services, and other services can connect to them using native Kubernetes service discovery.

Knative Services are deployed as Kubernetes Deployments. You can still specify the PodSpec values (environment variables, volume mounts, and other details) through the Knative Service object (example).
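
As an illustration of how familiar PodSpec fields carry over, the sketch below sets an environment variable and mounts a ConfigMap. The `backend` service, the `app-config` ConfigMap, and the paths are all hypothetical names made up for this example.

```yaml
# PodSpec fields inside a Knative Service; all names are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - image: example.com/hello:v1   # hypothetical image
          env:
            - name: BACKEND_ADDR
              # Reaches another in-cluster service through ordinary cluster DNS.
              value: "http://backend.default.svc.cluster.local"
          volumeMounts:
            - name: app-config
              mountPath: /etc/app
              readOnly: true
      volumes:
        - name: app-config
          configMap:
            name: app-config            # hypothetical ConfigMap
```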

For a Kubernetes developer, the barrier to entry for Knative is quite low. With managed services like Cloud Run on Anthos, you can have a managed Knative setup in your clusters, either in the cloud or on-premises.

I think many Kubernetes users are currently missing out on how the Knative project can help them. Let me know if you liked this explanation of Knative. I hope to continue writing more articles around Knative.

  1. Kubernetes doesn’t know anything about individual requests, as it only sees TCP traffic.

  2. Google Cloud Platform, Red Hat and IBM Cloud already offer Knative as a service.

  3. Knative achieves the concurrency controls by injecting a sidecar proxy (queue-proxy).

  4. Similarly, the Osiris project by Azure also offers a scale-to-zero controller.