APISIX Motivates API Gateway Upgrade in Tencent BlueKing
This case study is shared by Lei Zhu, Technical Director of BlueKing PaaS Platform, IEG (Interactive Entertainment Group), Tencent
BlueKing is an integrated research and operations PaaS suite incubated within Tencent IEG (Interactive Entertainment Group) that serves multiple business units and internal platforms. It provides full life cycle services for the company's projects across the CI (Continuous Integration), CD (Continuous Delivery), and CO (Continuous Operation) stages.
- Low Performance: Poor performance under high concurrency and an inefficient routing algorithm, resulting in slow route matching and forwarding
- Tremendous DB Pressure: Routing policies are stored in MySQL, so the database is stressed by a large number of retrieval requests
- High Costs: Redis is heavily used in many scenarios, causing high network overhead costs
- Insufficient Isolation: Physical isolation cannot be achieved; long-lived connections are unstable
- Single Protocol Support: Only the HTTP protocol is supported
- No Dynamic Routing: Unfriendly to canary release and incapable of scenario-based encapsulation
- Lack of Service Discovery: Unfriendly to microservices architecture
- Turn platform capabilities into independent microservices, and carry out microservices transformation to form a PaaS architecture
- Use low-code technology to efficiently develop SaaS to use the microservices capabilities of the PaaS platform
- Flexibly respond to different service scenarios through various SaaS
- Realized the unified operations and expansion of the gateway using the CRD custom resource provided by K8s
- Provided richer features by integrating APISIX: resource management, version release, automatic documentation, permission management, observability, monitoring, and security protection
- Lowered the cost of supporting service discovery scenarios with unified developer interfaces and specifications
Game Diversity and Complexity Require a BlueKing API Gateway
Within Tencent, there may be thousands of games. Apart from some self-developed games, the rest are agency titles. The games differ in programming language, in the storage they depend on, and in architectural style, which is why BlueKing developed its own API gateway.
Faced with such complex business scenarios and a large number of heterogeneous architectures, BlueKing, as an internal platform, needs to transform its API gateway in three steps: turn platform capabilities into microservices to form a PaaS architecture, use low-code technology to develop SaaS efficiently on top of the PaaS microservices, and use various SaaS to handle different service scenarios.
Abstracting BlueKing's architecture yields an API diagram.
On the one hand, BlueKing is a complicated platform with complex requirements for a unified gateway. Besides working as a proxy that calls the platforms' APIs, BlueKing requires more gateway capabilities, such as service discovery, authentication and authorization, dynamic routing, canary release, and rate limiting.
On the other hand, with the development of cloud-native technology, many internal SaaS and platforms are now deployed in K8s clusters. This scenario puts forward new requirements for the gateway, such as unified traffic control of external call requests through a unified traffic gateway or API gateway.
At the same time, some internal business systems use some of the infrastructure capabilities of the BlueKing platform, such as container management or monitoring. They also need a unified service gateway to manage all call traffic.
With the development of internal business requirements, BlueKing's API gateway needs to support increasingly diverse scenarios.
BlueKing API Gateway 1.0
BlueKing API Gateway 1.0 aimed to let callers of the platforms (including various SaaS and process engines) call the API gateway directly, with protocol conversion and permission verification handled by the gateway.
The architecture was relatively simple and was divided into two parts: the server side and the management side. Each platform had to access the management side, configure its API resource addresses and the corresponding permissions, and finally register its services with the API Gateway.
After several years, with increasing requests and complex scenarios, the shortcomings of the BlueKing API Gateway 1.0 began to appear gradually. For example:
Poor framework performance: The Django framework was chosen for implementation. Its performance is average in high-concurrency scenarios and becomes stretched when processing massive requests.
Average routing implementation performance: The performance of the algorithm used in API routing is low, affecting the matching and forwarding speed of routes.
Databases are stressed: All routing policies are stored in MySQL. When there are many rules, the large number of retrieval requests puts heavy query pressure on the database.
High network costs: Redis is heavily used in various scenarios, resulting in excessive network overhead.
BlueKing API Gateway 2.0
In order to solve the above problems, BlueKing iterated on the basis of version 1.0 and designed and implemented version 2.0. Compared with the previous generation, the most significant change of version 2.0 was the re-implementation of the gateway framework and forwarding layer in Golang because Golang has more advantages than Python in handling large concurrent requests.
Other optimizations were also made. For example, a more efficient routing implementation was kept in memory, and a memory-based cache was introduced in the middle layer to reduce the dependence on Redis. The new architecture also added lifecycle management for gateways with multiple versions and environments, and introduced a plugin mechanism that lets developers extend gateway capabilities through plugins.
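To illustrate what keeping routes in memory can look like, here is a minimal sketch of a segment-based prefix tree. It is written in Python for brevity and is not BlueKing's Golang implementation; the `RouteTrie` class, the `{name}` parameter syntax, and the backend names are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}      # static segments
        self.param_child: Optional["_Node"] = None  # "{name}" segment
        self.param_name: Optional[str] = None
        self.backend: Optional[str] = None


class RouteTrie:
    """Hypothetical in-memory route table keyed by path segments."""

    def __init__(self) -> None:
        self._root = _Node()

    def add(self, path: str, backend: str) -> None:
        node = self._root
        for seg in filter(None, path.split("/")):
            if seg.startswith("{") and seg.endswith("}"):
                # One parameter child per node; the first name registered wins.
                if node.param_child is None:
                    node.param_child = _Node()
                    node.param_child.param_name = seg[1:-1]
                node = node.param_child
            else:
                node = node.children.setdefault(seg, _Node())
        node.backend = backend

    def match(self, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
        """Return (backend, path parameters) or None if no route matches.

        Static segments take priority over parameters; this sketch does
        not backtrack if the static branch later fails to match.
        """
        node = self._root
        params: Dict[str, str] = {}
        for seg in filter(None, path.split("/")):
            if seg in node.children:
                node = node.children[seg]
            elif node.param_child is not None:
                node = node.param_child
                params[node.param_name] = seg
            else:
                return None
        if node.backend is None:
            return None
        return node.backend, params
```

A lookup walks one node per path segment, so its cost depends on the depth of the path rather than on the number of registered routes, and no database query is involved.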
Overall, BlueKing API Gateway 2.0 addresses performance issues and most pain points encountered in version 1.0. But as time went by, new problems began to surface slowly.
Insufficient isolation: Real physical isolation cannot be achieved; the release process causes long-lived connections to be disconnected
Single protocol support: Only HTTP is supported, while the demand for non-HTTP protocols is increasing in actual scenarios
No dynamic routing rules: Routing rules matched by conditions are not supported; canary release scenarios are poorly served; scenario-based combination and encapsulation are not possible
Lack of service discovery capability: No automatic service discovery, which is unfriendly to microservices architecture
APISIX Stands Out in the Technology Selection of BlueKing API Gateway
There are many product systems in the company that need to use the API gateway. It is very difficult to integrate all the diverse requirements into a single gateway.
"Therefore, we had the idea of designing a distributed gateway. That is, a large gateway is split into many microgateways, which accommodate the differing gateway requirements of different systems," Zhu said.
The components of the distributed gateway architecture are mainly divided into two categories: the management side and the microgateway instance.
The management side uniformly manages and controls each microgateway, and the administrator of each gateway configures and manages that gateway through it. Microgateway instances are individual gateway services deployed independently. Each one carries the access traffic of a specific group of services and performs access control according to the settings of the management side. All microgateway instances are controlled by the same management side.
"In terms of the technology selection of the microgateway, we referred to several popular open-source gateways on the market, such as Kong and Tyk. After comparing popularity, technology stack, protocol support, and other dimensions, we finally chose APISIX as the most important backend technology of our microgateway," Zhu said.
Zhu said BlueKing selected APISIX because it is implemented on NGINX + Lua, so its overall performance has advantages over gateways based on Golang. Furthermore, APISIX is highly extensible and supports extending capabilities through plugins written in multiple programming languages. Zhu also pointed to its many typical use cases.
Besides, thanks to its great compatibility, APISIX can be customized according to BlueKing's needs. For example, BlueKing customized the control plane of APISIX according to internal requirements.
BlueKing API Gateway 3.0 Based on APISIX
In the cloud-native environment, K8s is the most critical basic component that needs attention. Because the entire microgateway is designed for the cloud-native environment, the 3.0 version has a new architecture design based on K8s.
The core idea is to use the custom resources (CRDs) provided by K8s to implement the full set of operations and extensions of the API gateway.
As shown in the figure above, the gateway introduces a set of K8s CRD resources, including BkGatewayStage (gateway environment), BkGatewayService (backend service), etc. Through these CRDs, BlueKing can control the specific behavior of each microgateway instance.
Several "Operators" in the figure are the core part of this architecture. Above is the Plugin Operators service, which contains a series of plugin operators. For example, the Operator responsible for service discovery will write the address of the backend service registered in the service discovery center into the K8s cluster.
The core Operator in the middle monitors all gateway-related CRD resources. The resource reconciler is responsible for converting the gateway configuration into a format that the APISIX microgateway instance can understand, thus realizing full life cycle management of the microgateway.
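The conversion step performed by the resource reconciler can be sketched as follows. This is a simplified illustration in Python, not BlueKing's Operator code. The input layout (`metadata.name`, `spec.endpoints`, `spec.loadBalance`) is a hypothetical stand-in for a BkGatewayService resource, whose real schema the article does not show. The output uses the `type`, `scheme`, and `nodes` fields of an APISIX upstream object.

```python
from typing import Any, Dict, List


def to_apisix_upstream(service: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a (hypothetical) BkGatewayService-like dict into an
    APISIX upstream object with a list of host/port/weight nodes."""
    spec = service["spec"]
    nodes = [
        {
            "host": ep["host"],
            "port": int(ep["port"]),
            "weight": int(ep.get("weight", 1)),
        }
        for ep in spec.get("endpoints", [])
    ]
    # Sort so that the same input always yields the same output,
    # which keeps the comparison in plan_changes stable.
    nodes.sort(key=lambda n: (n["host"], n["port"]))
    return {
        "name": service["metadata"]["name"],
        "type": spec.get("loadBalance", "roundrobin"),
        "scheme": spec.get("scheme", "http"),
        "nodes": nodes,
    }


def plan_changes(
    desired: Dict[str, Dict[str, Any]],
    actual: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare desired upstreams with those present on an instance and
    return the names to create, update, and delete."""
    return {
        "create": sorted(set(desired) - set(actual)),
        "update": sorted(
            name for name in set(desired) & set(actual)
            if desired[name] != actual[name]
        ),
        "delete": sorted(set(actual) - set(desired)),
    }
```

Running the comparison on every change to the watched resources is what makes the process a reconciliation: the instance is repeatedly driven toward the state described by the CRDs instead of being updated imperatively.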
Microgateways come in two deployment types:
Shared gateway: The default type, which is deployed on the platform, and the API access address is uniformly generated and managed by the platform.
Dedicated gateway: The user deploys a "microgateway" instance, which becomes a "dedicated gateway" after accessing the platform. The API access address needs to be manually managed, and the traffic flows directly from the "dedicated gateway" to the backend service.
There is only one unified management side, the capabilities of which, like multi-environment management and access control, are shared by all the gateways. However, among the different types of microgateway instances that it manages, the supported feature sets differ from each other.
Taking the shared gateway instance as an example, the feature set it supports is relatively basic, which mainly includes unified login authentication, rate-limiting, and multi-protocol support. But the independent dedicated gateway instances have their unique personalized capabilities. Because the dedicated gateway and the business belong to the same cluster, it can quickly realize dynamic routing, custom service discovery, etc., and use the robust scalability of APISIX to customize more capabilities.
Based on the above architecture and modes, BlueKing API Gateway 3.0 provides richer features with the support of APISIX, such as resource management, version release, automatic documentation, SDK, permission management, observability, monitoring and alerting, and security protection.
Practical Scenarios of BlueKing Gateway 3.0 Using APISIX
There are four typical practical scenarios within Tencent: service discovery, unified authentication, dynamic routing, and client certificate management.
Service discovery is a basic capability required by the microservices architecture. Internally, it is implemented through a custom resource (CRD). A valid service discovery YAML definition is shown in the code on the right side of the figure below.
After the above CRD resources are written into the K8s cluster, it will trigger the related actions of the service discovery-related controllers. Afterward, the reconciler can capture the corresponding service discovery configuration and create program objects related to service discovery.
Then the reconciler reads the relevant address information of the service discovery center through the built-in service discovery interface like Watcher and Lister and rewrites the obtained address back into the K8s cluster through the CRD resource BkGatewayEndpoints.
After some complex processing by the core Operator on the right, these endpoints are finally synchronized to the corresponding APISIX upstream. This completes the service discovery process.
To facilitate development, BlueKing implemented a general service discovery framework, which provides a unified development interface and specification and can be used to support other types of service discovery scenarios at a low cost.
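The Lister and Watcher roles described above can be sketched as a small interface plus a sync function. This is an illustration in Python under stated assumptions: the `Registry` protocol, its method names, the event tuple format, and the plain dict standing in for BkGatewayEndpoints are all hypothetical, since the article does not show the framework's real interface.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

# (action, service, address); action is "add" or "remove".
Event = Tuple[str, str, str]


class Registry(Protocol):
    """Hypothetical interface a service discovery backend implements."""

    def list_services(self) -> Dict[str, List[str]]: ...  # Lister role
    def watch_events(self) -> Iterable[Event]: ...        # Watcher role


class StaticRegistry:
    """In-memory registry used only to exercise the sketch."""

    def __init__(self, snapshot: Dict[str, List[str]], events: List[Event]):
        self._snapshot = snapshot
        self._events = events

    def list_services(self) -> Dict[str, List[str]]:
        return self._snapshot

    def watch_events(self) -> Iterable[Event]:
        return iter(self._events)


def sync_endpoints(registry: Registry, store: Dict[str, List[str]]) -> None:
    """Mirror registry addresses into `store`, the stand-in for the
    BkGatewayEndpoints resources written back to the cluster."""
    # Lister: start from a full snapshot and drop anything stale.
    store.clear()
    for service, addrs in registry.list_services().items():
        store[service] = sorted(set(addrs))
    # Watcher: apply incremental changes as they arrive.
    for action, service, addr in registry.watch_events():
        current = set(store.get(service, []))
        if action == "add":
            current.add(addr)
        elif action == "remove":
            current.discard(addr)
        else:
            continue  # unknown event types are ignored in this sketch
        if current:
            store[service] = sorted(current)
        else:
            store.pop(service, None)
```

Because `sync_endpoints` depends only on the two-method interface, supporting another discovery center in this sketch means writing another `Registry` implementation, which mirrors the low-cost extension the framework is said to offer.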
The unified authentication part is relatively simple. In daily practice, requests come from three sources: browsers, platforms, and individual users. Based on APISIX, BlueKing customized an authentication plugin, BK-Auth, to achieve unified authentication.
The specific implementation process is shown in the figure above. After a request arrives, the plugin reads the relevant credential information from the header and then calls the BK-Auth authentication service to verify the credential and read the corresponding SaaS information. The plugin then uses the private key agreed with the backend to issue a JWT token, writes it into the request header, and finally writes it into an APISIX variable.
In addition to unified authentication, internal projects also have some more complex authentication scenarios. When a SaaS calls a certain resource of a platform, the gateway must judge whether that SaaS has permission. This unified resource authentication is also implemented in Golang as an APISIX plugin, as shown in the figure below.
A client request first obtains the SaaS application information through the authentication step, then interacts with the authentication plugin over RPC when it passes through ext-plugin. The plugin queries the authentication-related data directly from a cache, which the management side keeps synchronized through full and incremental updates, and then completes the authentication.
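The credential-to-JWT flow can be sketched with the Python standard library. Several things here are assumptions rather than BlueKing's implementation: the header names are invented, `verify_credentials` is a stand-in for the call to the BK-Auth service, and the token is signed with HMAC-SHA256 (HS256) using a shared secret instead of the private key mentioned above, purely so the example stays self-contained.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical header names; the article does not name the real ones.
APP_CODE_HEADER = "X-App-Code"
APP_SECRET_HEADER = "X-App-Secret"
JWT_HEADER = "X-Gateway-JWT"


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_jwt(claims: Dict[str, Any], secret: bytes) -> str:
    """Build a compact HS256 JWT (a swap for the private-key signing)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes) -> Optional[Dict[str, Any]]:
    """Return the claims if the signature is valid, else None.

    Only the signature is checked; expiry is left to the caller.
    """
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode("utf-8")
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(expected.encode("utf-8"), sig_b64.encode("utf-8")):
        return None
    return json.loads(_b64url_decode(payload_b64))


def authenticate(
    headers: Dict[str, str],
    verify_credentials: Callable[[str, str], Optional[Dict[str, Any]]],
    secret: bytes,
    now: int,
    ttl: int = 300,
) -> Dict[str, str]:
    """Verify the caller's credentials and attach a signed JWT header."""
    app = verify_credentials(
        headers.get(APP_CODE_HEADER, ""), headers.get(APP_SECRET_HEADER, "")
    )
    if app is None:
        raise PermissionError("invalid application credentials")
    forwarded = dict(headers)
    forwarded[JWT_HEADER] = issue_jwt({"app": app, "iat": now, "exp": now + ttl}, secret)
    return forwarded
```

With this shape, a backend only needs to validate the signed token to trust the SaaS information inside it and does not have to call the authentication service again.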
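The full and incremental synchronization of the permission cache could work roughly as follows. This is a sketch in Python, not the Golang plugin itself. The version-number scheme, the `PermissionCache` class, and the `(app_code, resource)` grant format are assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

Grant = Tuple[str, str]  # (app_code, resource), a hypothetical format


class PermissionCache:
    """Local cache of grants, kept up to date by the management side."""

    def __init__(self) -> None:
        self._grants: Set[Grant] = set()
        self.version = 0

    def full_sync(self, grants: Iterable[Grant], version: int) -> None:
        """Replace the whole cache with a snapshot."""
        self._grants = set(grants)
        self.version = version

    def apply_delta(
        self,
        added: Iterable[Grant],
        removed: Iterable[Grant],
        version: int,
    ) -> bool:
        """Apply one incremental update.

        Returns False without changing anything if the delta does not
        directly follow the current version; the caller is then expected
        to fall back to full_sync.
        """
        if version != self.version + 1:
            return False
        self._grants |= set(added)
        self._grants -= set(removed)
        self.version = version
        return True

    def is_allowed(self, app_code: str, resource: str) -> bool:
        """Answer a permission check from memory, with no remote call."""
        return (app_code, resource) in self._grants
```

Incremental updates keep the common path cheap, while the full snapshot gives a way to recover when an update is missed.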
A typical dynamic routing application scenario comes from BlueKing's container management platform. The BlueKing container platform manages a lot of K8s clusters, some of which are service clusters and some are data clusters.
As a user, you often need to request the APIServers of these clusters. When a user request enters the microgateway, the gateway determines which cluster's APIServer to forward it to based on the request path.
After the request enters, the dynamic routing plugin first extracts the ID information of the cluster, then rewrites the route, and then determines whether the cluster is directly connected.
For non-directly connected clusters, a BCS cluster manager upstream is first generated and then interacts with the BCS Agent through it, and finally passes the request to the APIServer of the cluster.
For directly connected clusters, the process is similar to the unified authentication plugin above: the plugin periodically synchronizes some basic information related to the clusters. After finding the cluster information, the plugin generates the relevant upstream, redefines the connection logic through the APISIX plugin, and finally sends the request to the cluster's APIServer.
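The routing decision described above can be sketched as a single function. This is an illustration in Python, not the APISIX plugin. The `/clusters/{cluster_id}/...` path layout, the `/tunnel/...` rewrite for non-directly connected clusters, and the cluster manager address are all hypothetical, since the article does not give the real values.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical path layout: /clusters/<cluster_id>/<rest of API path>
_CLUSTER_PATH = re.compile(r"^/clusters/(?P<cluster_id>[^/]+)(?P<rest>/.*)?$")

# Hypothetical address of the cluster manager upstream.
CLUSTER_MANAGER = "cluster-manager.example.internal:8443"


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    direct: bool                      # True if the gateway can reach the APIServer
    apiserver: Optional[str] = None   # host:port, set for direct clusters


def route(path: str, clusters: Dict[str, Cluster]) -> Optional[Tuple[str, str]]:
    """Return (upstream address, rewritten path), or None if unroutable."""
    m = _CLUSTER_PATH.match(path)
    if m is None:
        return None
    # Step 1: extract the cluster ID from the request path.
    cluster = clusters.get(m["cluster_id"])
    if cluster is None:
        return None
    # Step 2: rewrite the route by stripping the cluster prefix.
    rest = m["rest"] or "/"
    # Step 3: choose the upstream based on connectivity.
    if cluster.direct and cluster.apiserver:
        return cluster.apiserver, rest
    # Non-direct clusters go through the cluster manager, which relays
    # the request to the agent inside the target cluster.
    return CLUSTER_MANAGER, f"/tunnel/{cluster.cluster_id}{rest}"
```

The `clusters` mapping plays the role of the periodically synchronized cluster information, so the decision itself needs no remote lookup per request.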
Client Certificate Management
In BlueKing’s practical scenarios, there is a class of systems that use a more complex client certificate verification mode when registering resources to the gateway. Therefore, if a user wants to request its resources, it must provide a valid client certificate.
The specific implementation is shown in the figure above. The gateway manager needs to configure the client certificate used by the gateway for different environments on the management side. After the configuration, the certificate will be published to the K8s cluster, where the corresponding microgateway is located.
This process uses some CRD resources and the official Secret resources of K8s, and is continuously reconciled by the core Operator service, which, for example, finds the corresponding certificate according to the domain name. An effective client certificate configuration is eventually reflected in the relevant configuration of the APISIX Service, as shown in the red box on the upper right of the figure above.
Apache APISIX is an open-source, dynamic, scalable, and high-performance cloud-native API gateway for all your APIs and microservices. Donated to the Apache Software Foundation by API7.ai, APISIX has grown into a top-level open-source Apache project.
With the development of microservices architecture and internal business projects, the previous API gateway could no longer meet the needs. Tencent BlueKing not only resolved these problems but also provided richer features by leveraging APISIX.
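Finding a certificate by domain name can be sketched as a lookup with an exact match first and a single-label wildcard as a fallback. The matching rules, the function name, and the mapping from domain pattern to Secret name are assumptions made for this Python example; the article does not describe how the Operator actually resolves certificates.

```python
from typing import Dict, Optional


def find_certificate(host: str, certs: Dict[str, str]) -> Optional[str]:
    """Return the Secret name holding the client certificate for `host`.

    `certs` maps a lowercase domain pattern, either exact
    ("api.example.com") or wildcard ("*.example.com"), to a Secret name.
    A wildcard covers exactly one leading label in this sketch.
    """
    host = host.lower().rstrip(".")
    if host in certs:
        return certs[host]
    # Replace the first label with "*" and try the wildcard entry.
    _, _, parent = host.partition(".")
    if parent:
        return certs.get(f"*.{parent}")
    return None
```

For instance, with entries for `api.example.com` and `*.example.com`, a request to `pay.example.com` would resolve to the wildcard entry, while `api.example.com` would resolve to its exact entry.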