Automate Canary Release Decisions in Your Kubernetes Cluster
December 30, 2022
Background
Nowadays, microservices have become a typical and widely used software architecture pattern. Services are loosely coupled and collaborate via APIs. The microservice pattern makes each application independently deployable and maintainable, so releases happen more frequently. But every release carries risk: you never know whether the new version hides a bug. That's why people use strategies like canary release and blue-green deployment to roll out the latest version gradually and reduce the risk.
A canary release splits traffic between two groups of the target service: the stable group and the canary group. An API gateway such as Apache APISIX exposes a microservice architecture through APIs efficiently and safely, and it ships with canary release support. Typically, there are two ways to decide how to split the traffic: the weight-based way and the predicate expression way.
Weight-Based Way
Users specify the proportion of traffic that will hit the canary group. For instance, 95% of the traffic is forwarded to the stable service while the remaining 5% is forwarded to the canary one.
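For reference, this is the same idea you could express directly against a standalone Apache APISIX instance (outside of API7 Cloud) with its traffic-split plugin. The sketch below is illustrative only: the route ID, the admin key, and the stable/canary node addresses are placeholders.
# Illustrative only: a 95/5 weight-based split configured directly on a
# standalone Apache APISIX via the Admin API and the traffic-split plugin.
# Replace the admin key and the node addresses with your own.
curl http://127.0.0.1:9080/apisix/admin/routes/1 \
  -H 'X-API-KEY: <your-admin-key>' -X PUT -d '
{
  "uri": "/api/error_page",
  "plugins": {
    "traffic-split": {
      "rules": [{
        "weighted_upstreams": [
          {
            "upstream": {
              "type": "roundrobin",
              "nodes": { "canary-service:80": 1 }
            },
            "weight": 5
          },
          { "weight": 95 }
        ]
      }]
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": { "stable-service:80": 1 }
  }
}'
The entry without an upstream (weight 95) falls back to the route's default upstream, i.e., the stable service.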
Predicate Expression Way
With the predicate expression way, only traffic matching the specified characteristics will hit the canary group. For instance, only HTTP requests carrying the X-Debug request header with a specific value will reach the canary service.
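Again as an illustrative sketch against a standalone Apache APISIX (the header name X-Debug comes from the example above; the value "true", the admin key, and the node addresses are placeholders), the same plugin accepts a match clause so that only matching requests are diverted to the canary upstream:
# Illustrative only: requests whose X-Debug header equals "true" go to the
# canary upstream; everything else falls through to the default (stable) upstream.
curl http://127.0.0.1:9080/apisix/admin/routes/1 \
  -H 'X-API-KEY: <your-admin-key>' -X PUT -d '
{
  "uri": "/api/error_page",
  "plugins": {
    "traffic-split": {
      "rules": [{
        "match": [
          { "vars": [["http_x_debug", "==", "true"]] }
        ],
        "weighted_upstreams": [
          {
            "upstream": {
              "type": "roundrobin",
              "nodes": { "canary-service:80": 1 }
            },
            "weight": 1
          }
        ]
      }]
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": { "stable-service:80": 1 }
  }
}'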
Automate Canary Release
When you operate the API gateway by calling its API or using the dashboard, there is a time lag in adjusting the traffic ratio (for the weight-based way) or the predicates (for the predicate expression way). Nowadays, more and more users orchestrate their microservices with Kubernetes. Can the canary release start automatically as soon as the new service version is created? In this article, I'll show you how to use API7 Cloud to automate the canary release in your Kubernetes cluster.
What is API7 Cloud
API7 Cloud is an any-cloud, multi-location SaaS platform for deploying, controlling, visualizing, and monitoring APIs at scale. Run your APIs anywhere but manage them just in one place. API7 Cloud uses Apache APISIX as the API Gateway to expose your APIs efficiently and safely.
To use API7 Cloud, you must deploy Apache APISIX on your own infrastructure, such as Docker or Kubernetes. You can use Cloud CLI to ease the deployment.
# Configure an access token from the API7 Cloud console.
cloud-cli configure --token {YOUR TOKEN}
# Deploy Apache APISIX (version 2.15.1) to namespace apisix, with only one replica.
cloud-cli deploy kubernetes \
--name my-apisix \
--namespace apisix \
--replica-count 1 \
--apisix-image apache/apisix:2.15.1-centos
Canary Release is one of the built-in features of API7 Cloud. Users can either configure the canary release rules via the console or call the API7 Cloud Open API. We aim to automate the canary release decisions, so we'll use the latter way.
Scenario
Let's say that in our Kubernetes cluster there is a simple error-page application, which always returns an error message. We are rolling out version 2.0 and want to use the canary release strategy to reduce the release risk. What's more, we also want to make the whole process automatic. Therefore, we create a canary release controller, which monitors changes to Kubernetes Service resources and then creates or modifies the canary releases on API7 Cloud via the API7 Cloud Go SDK. We only use the weight-based way to split the traffic. All components, including the Apache APISIX API gateway, are deployed on Kubernetes.
The canary release controller watches Service changes and reacts according to a few annotations, specifically:
- If the service contains the annotation api7.cloud/published-service, the canary release controller will try to create an Application on API7 Cloud.
- If the service has the annotation api7.cloud/published-canary-service, the canary release controller will try to set up the canary release rule on API7 Cloud, and the annotation api7.cloud/published-service-canary-percentage will decide the percentage.
Note that this controller is not complete (for example, it doesn't delete the Application when the service is deleted), but it's enough to demonstrate the automated canary release process; a sketch of a fully annotated Service follows.
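As a minimal, illustrative sketch (the selector app: error-page-v2 and port 80 are assumptions about the demo workload; the annotation keys and values are the ones described above), a fully annotated Service could be declared like this:
# Illustrative only: a Service carrying all three annotations the controller watches.
kubectl apply -n canary-release-demo -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: error-page-v2
  annotations:
    api7.cloud/published-service: "error-page"
    api7.cloud/published-canary-service: "true"
    api7.cloud/published-service-canary-percentage: "10"
spec:
  selector:
    app: error-page-v2
  ports:
    - port: 80
      targetPort: 80
EOF
In the walkthrough below, we instead apply the annotations step by step with kubectl annotate, which makes it easier to see how the controller reacts to each change.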
Here We Go!
Let's start by deploying Apache APISIX and the canary release controller. As mentioned above, we use Cloud CLI to deploy Apache APISIX. We also have two YAML files (error-page/manifest-v1.yaml and controller/manifest.yaml) for deploying the error-page application and the canary release controller.
- Please prepare an available Kubernetes cluster if you want to execute the following commands.
- The canary release controller needs an access token to call the API7 Cloud API. We fetch a token according to this doc and store it in a K8s secret.
kubectl create namespace canary-release-demo
# Deploy the error-page v1 version.
kubectl apply -f https://raw.githubusercontent.com/tokers/canary-release-automation-demo/main/error-page/manifest-v1.yaml -n canary-release-demo
# Create a K8s secret to save the API7 Cloud access token.
kubectl create secret generic api7-cloud --namespace canary-release-demo --from-literal token={Your Access Token}
# Deploy the canary release controller.
kubectl apply -f https://raw.githubusercontent.com/tokers/canary-release-automation-demo/main/controller/manifest.yaml -n canary-release-demo
# Check if all workloads are normal.
kubectl get all -n canary-release-demo
Check the Proxy
Let's publish this service by annotating it.
kubectl annotate service -n canary-release-demo error-page-v1 "api7.cloud/published-service=error-page"
The canary release controller will pick up this change and create an Application on API7 Cloud. Now let's access Apache APISIX to see if the proxy works.
kubectl port-forward -n canary-release-demo service/apisix-gateway 10080:80
curl http://127.0.0.1:10080/api/error_page -H 'Host: error-page' -s
If everything is OK, you'll see {"error": "injected by error_page service", "version": "v1"}.
Currently, the canary release controller creates a "match-everything" API in the Application, and the Host is the same as the Application name (error-page).
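As an optional sanity check, a request with a different Host header should not match the generated API; a vanilla Apache APISIX would be expected to answer with its default 404 "route not found" response in that case.
# Optional check: a non-matching Host should not hit the error-page Application.
curl http://127.0.0.1:10080/api/error_page -H 'Host: some-other-host' -s -i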
Roll out V2
We want to roll out version 2 of the error-page application. First, we deploy version 2 by applying manifest-v2.yaml, then we annotate the error-page-v2 service with the canary release annotations.
kubectl apply -f https://raw.githubusercontent.com/tokers/canary-release-automation-demo/main/error-page/manifest-v2.yaml -n canary-release-demo
# Tell the canary release controller to enable the canary release for error-page-v2, with a percentage of 10%.
kubectl annotate service -n canary-release-demo error-page-v2 "api7.cloud/published-canary-service=true" "api7.cloud/published-service-canary-percentage=10"
# Start the canary.
kubectl annotate service -n canary-release-demo error-page-v2 "api7.cloud/published-service=error-page"
Now let's send 100 requests to Apache APISIX again and see if some requests are forwarded to the canary service error-page-v2.
kubectl port-forward -n canary-release-demo service/apisix-gateway 10080:80
for ((i=0; i<100; i++)); do
  curl http://127.0.0.1:10080/api/error_page -H 'Host: error-page' -s
done
If everything goes well, only about 10% of the requests will reach error-page-v2 (not precisely 10%, due to the way Apache APISIX selects backends).
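To get a rough count of how many of the 100 responses came from the canary, you can grep the responses; this assumes the v2 response body reports "version": "v2", mirroring the v1 body shown earlier.
# Count responses that came from the canary version (assumes v2 returns "version": "v2").
for ((i=0; i<100; i++)); do
  curl http://127.0.0.1:10080/api/error_page -H 'Host: error-page' -s
  echo
done | grep -c '"version": "v2"'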
Rollback
Suppose we find that version 2 is unstable and want to roll it back. Before we do that, we stop the canary by changing the percentage to 0, then send requests to Apache APISIX again.
kubectl annotate service -n canary-release-demo error-page-v2 "api7.cloud/published-service-canary-percentage=0" --overwrite
for ((i=0; i<100; i++)); do
  curl http://127.0.0.1:10080/api/error_page -H 'Host: error-page' -s
done
You'll see that all requests now go to error-page-v1.
Release
After a while, we believe version 2 is stable enough and want all requests to reach it, so that the version 1 error-page application can be taken offline. We change the percentage to 100.
kubectl annotate service -n canary-release-demo error-page-v2 "api7.cloud/published-service-canary-percentage=100" --overwrite
for ((i=0; i<100; i++)); do
  curl http://127.0.0.1:10080/api/error_page -H 'Host: error-page' -s
done
Now all requests are proxied to error-page-v2, and error-page-v1 can be taken offline safely.
Summary
Canary release is an effective weapon for reducing release risk. However, tweaking the canary release strategy by hand tends to lag behind the release itself. This article shows how to operate canary releases declaratively and automate them to a certain extent. Some people might pursue a thoroughly automatic canary release with the help of GitOps components: for example, with Argo Rollouts one can automatically promote or roll back services. Argo Rollouts queries the service metrics and integrates with ingress controllers to change their CRDs, and ultimately the API gateway forwards the correct proportion of requests to the canary version.
Reference
Source Code for the error page and canary release controller: https://github.com/tokers/canary-release-automation-demo.