APISIX Fuels Millions of DAU for Tencent Timi's Mobile Game Success

Yilia Lin

May 27, 2025

Case Study

Overview

About Tencent Timi

TiMi is a subsidiary of Tencent Games and the developer of several popular mobile games, including Call of Duty: Mobile, Pokémon Unite, Arena of Valor, and Honor of Kings.

Honor of Kings is one of the world's most popular MOBA mobile games. As of December 2023, it has recorded up to 160 million daily active users, 3 million concurrent online users, over 3.8 billion downloads, and over 300 million registered users. In May 2017, it topped the global mobile game revenue chart. (Source: Wikipedia)

Challenges

  • With limited operations experience and a mandate to cut costs, the Timi team sought a business-oriented API gateway to centralize authentication and traffic recording.
  • Frequent infrastructure migrations in overseas operations make reliance on cloud-specific solutions unfeasible, so all data and components must be migratable.
  • Due to the stringent compliance requirements of the overseas business, many internal company components cannot be used directly.
  • Within Timi, company-specific features such as service discovery, logging, and tracing need to be integrated with APISIX in a balanced way.

Results

  • By adopting APISIX's standalone architecture, the Timi team has eliminated dependencies on etcd, which is particularly advantageous in overseas markets where certain cloud services may not be available.
  • The team has streamlined the Kubernetes deployment process, leveraging the APISIX Helm Charts, simplifying configuration management, and standardizing the deployment process.
  • The team has achieved comprehensive end-to-end observability across the entire system, including metrics and log collection, tracing, and monitoring. This helps the team quickly identify and resolve issues, ensuring the smooth operation of its services.
  • Granular traffic management ensures an uninterrupted user experience and enables an automated backend workflow and canary releases with rapid rollback.

Background

Tencent Timi's backend development team primarily uses Golang and also handles operational responsibilities. To reduce the maintenance burden and save costs, the team needed a business-oriented API gateway with integrated features such as authentication and traffic recording. Additionally, its overseas business requires frequent infrastructure migrations, which necessitates migratable data and components.

Due to the stringent compliance requirements of the overseas gaming industry, many internal company components cannot be used directly. The team needs to balance the integration of APISIX with the existing infrastructure, such as internal service discovery, log standards, and trace reporting. Since these company-specific functions cannot be merged into the open-source upstream, the Tencent Timi Studio Group developed a customized version based on APISIX, called TAPISIX, adding a series of plugins designed for their internal technical environment.

TAPISIX Gateway Architecture

Their services run on Kubernetes (k8s) clusters, with APISIX serving as the traffic entry point and connecting to internal business services. Service discovery relies on the company's internal Polaris system, metric monitoring is achieved through the Prometheus plugin provided by APISIX, and log and trace collection are performed via OpenTelemetry and ClickHouse. For CI, they use OCI (similar to GitHub Actions), which supports pipeline definition through YAML; for CD, they selected Argo CD, which implements continuous deployment based on open-source solutions.

Why Tencent Timi Opted for APISIX

Tencent Timi has chosen APISIX as its API gateway solution, considering the following reasons:

Rich Features

APISIX provides features such as routing, rate limiting, load balancing, service discovery, and observability, offering end-to-end API management capabilities. Timi can use APISIX to manage the entire API lifecycle, improving API management efficiency and standardization.

APISIX dynamically adjusts load balancing based on backend service status and traffic conditions, ensuring even traffic distribution and enhancing system performance and reliability. It supports multi-dimensional traffic splitting and canary releases, enabling Timi to test new features on a small scale before full rollout. Additionally, APISIX's automatic circuit-breaking isolates faulty services during failures or latency issues, preventing cascading failures and ensuring system stability. Once services recover, traffic is automatically restored, enhancing fault tolerance and reducing user impact.

High Scalability

APISIX adopts a stateless architecture, enabling horizontal scaling in cloud-native environments. Timi can quickly scale up its API gateway by adding instances to meet high traffic demands during peak periods like game launches or promotional events. After traffic subsides, it can scale down to reduce resource usage and costs.

APISIX integrates seamlessly with Kubernetes, which perfectly suits Tencent Timi's environment. It supports service discovery and dynamic configuration in cloud-native environments, making it an ideal ingress controller for Kubernetes.

Thanks to the scalability of APISIX, Timi can customize its own API gateway based on APISIX by adding a series of plugins designed for their internal technical environment.

Flexible and Migratable

Timi's overseas business infrastructure requires frequent migrations, and all data and components must be migratable. APISIX's architecture and features align with this requirement, allowing Timi to easily migrate its API gateway services during overseas business transitions without impacting existing services.

APISIX supports dynamic configuration updates without restarting services. Changes to routing rules, plugins, and other configurations take effect in real time.

APISIX allows users to implement custom routing algorithms, enabling Timi to define routing rules based on specific business requirements, directing traffic to appropriate backend services.

Active Community Support

As a top-level project under the Apache Software Foundation, APISIX boasts a large and active open-source community. Timi can leverage the community's resources, such as documentation, tutorials, and plugins, to get started with APISIX quickly. It can also participate in community activities, collaborate with other users and developers, and stay updated on the latest developments in APISIX.

Customized Business-Oriented API Gateway

In building this API gateway, the Timi team needs to solve several issues:

  1. Reduce the development threshold, as frontline developers are familiar with Golang but not with the Lua language or APISIX plugin development
  2. Verify plugin features quickly
  3. Ensure plugin reliability and safety

To resolve these issues, the team took the following four measures.

1. Development Standards

The team defined a library, specified the storage path for plugins, and required plugins to adopt a single-file format, consistent with APISIX's single-file plugin mechanism, to facilitate management and maintenance.

Local running and testing are supported for quick development. By utilizing APISIX's Docker image, local plugins can be mounted into containers via volume mapping for convenient deployment. In addition, an echo-service (a service developed based on open-source Node.js) can be used to simulate upstream behaviors: it returns all request content, such as request headers, and by adding specific parameters to the request (e.g., requesting an HTTP 500 status code), upstream exceptions can be simulated, thereby comprehensively verifying plugin functionality.

TAPISIX Project Introduction

2. Local Running and Testing

The team provides convenient local development environment support to reduce the development threshold and accelerate verification.

  1. File Mapping: By mounting local plugin files into Docker containers, developers can test plugin changes in real-time.
  2. Build Makefile: Construct a Makefile to support quick startup of the plugin testing environment via the make run-dev command, ensuring seamless connection between local files and containers.
  3. Verify Plugins via Browser: Developers can verify plugins easily by accessing relevant interfaces in a browser, without additional deployment or configuration.
Run and Test
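A minimal sketch of such a local environment, assuming Docker Compose; the image tags, file paths, and the echo-server image are placeholders for illustration, not Timi's actual setup:

```yaml
# docker-compose.yaml -- hypothetical local plugin dev environment
services:
  apisix:
    image: apache/apisix:latest
    ports:
      - "9080:9080"            # verify plugins from a browser via this port
    volumes:
      # map the local single-file plugin into the container for live testing
      - ./plugins/my-plugin.lua:/usr/local/apisix/apisix/plugins/my-plugin.lua
      - ./conf/config.yaml:/usr/local/apisix/conf/config.yaml
  echo:
    # open-source Node.js echo server acting as the mock upstream
    image: ealen/echo-server:latest
    ports:
      - "8080:80"
```

The `make run-dev` target mentioned above can simply wrap `docker compose up`, so developers start the whole environment with a single command.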

By defining development standards and providing local development support, Timi has effectively lowered the development threshold and accelerated the plugin verification process. Developers can focus on feature implementation without worrying about complex deployment and testing procedures, thereby improving overall development efficiency.

3. Pipeline Construction

During pipeline construction, it is essential to ensure reliability and stability in plugin development. The development process is as follows:

  1. Developers create a new branch from the master branch for development, then submit a PR back to master once development is complete.
  2. After submitting a PR, the system will automatically trigger a Webhook to start the pipeline.
  3. Then, the system will conduct pipeline inspection:
    • Lint Check: Primarily checks code formatting standards.
    • Unit Testing: Runs unit tests to verify whether plugin functionality meets expectations.
    • Try Build: Constructs an image using the source code to verify its buildability.
Pipeline Building
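A pipeline implementing these three checks might look like the following. The YAML schema here is modeled on GitHub Actions purely for illustration; Timi's internal OCI tool uses its own, similar format, and the `make` targets are hypothetical:

```yaml
# hypothetical CI pipeline triggered by the PR webhook
name: plugin-ci
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint check          # code formatting standards
        run: make lint
      - name: Unit testing        # verify plugin functionality meets expectations
        run: make test
      - name: Try build           # verify the source builds into an image
        run: docker build -t tapisix:ci .
```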

4. Reliability Assurance (CR, lint, unit testing, black-box testing)

Timi utilizes the k6 testing framework from Grafana to validate core test cases. The k6 framework covers various scenarios and supports writing test cases declaratively. These test cases are replayed regularly to check the interfaces: even if only a single plugin is modified, comprehensive replay testing is conducted, covering parsing, service discovery, and more.

k6 Test Cases: Comprising hundreds of test cases covering core processes to ensure plugin reliability.

K6 Test
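A k6 test case for such replay testing can be sketched as follows; the endpoint and the check are illustrative stand-ins, not Timi's actual cases:

```javascript
// k6 replay-test sketch; run with `k6 run replay-test.js`
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // hypothetical core route exposed through the gateway
  const res = http.get('http://127.0.0.1:9080/api/echo');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```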

Through the complete process of local development, quick validation, PR submission, pipeline inspection, reliability assurance, and packaging deployment, Timi ensures that every stage of plugin development and deployment undergoes strict quality control.

Achievements after Using and Customizing Apache APISIX

Database-Free Architecture

Tencent Timi has successfully deployed APISIX in its overseas business by utilizing the standalone mode, which retains only the data plane with local configurations. This eliminates reliance on etcd, making it suitable for overseas scenarios where some cloud providers do not offer etcd services.
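In APISIX 3.x, standalone mode is enabled in config.yaml roughly as follows (the port shown is APISIX's default):

```yaml
# conf/config.yaml -- data plane only, routes read from a local apisix.yaml
deployment:
  role: data_plane
  role_data_plane:
    config_provider: yaml   # standalone mode: no etcd, config comes from file
apisix:
  node_listen: 9080
```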

Additionally, considering the stringent overseas data compliance requirements and the k8s-based deployment environment, Timi has implemented a k8s-friendly configuration management approach. This deployment strategy meets the specific needs of overseas operations and also enhances flexibility and compliance in managing configurations.

APISIX Deployment
  • YAML Configuration: All configurations are directly stored in YAML files for easy management and automated deployment.
  • ConfigMap Storage: YAML files are directly placed in k8s ConfigMaps to ensure configuration versioning and traceability.
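For example, route configurations can be shipped as a ConfigMap wrapping the apisix.yaml file. The names and the route below are illustrative; the trailing `#END` marker is required by APISIX's standalone file format:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: apisix-routes
data:
  apisix.yaml: |
    routes:
      - uri: /api/*
        upstream:
          type: roundrobin
          nodes:
            "example-service:80": 1
    #END
```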

Simplified Kubernetes Deployment

When managing config.yaml, the Timi team found that Kubernetes deployments rely on a series of complex configuration files, such as Service.yaml, ConfigMap.yaml, and Workload definitions. These numerous and detailed configuration files can make management complex and error-prone.

Helm Charts emerged as the solution: they template Kubernetes configuration files, significantly simplifying configuration management. The official APISIX Helm Chart allows for efficient management of settings such as instance counts, reducing manual YAML file handling.
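With the official chart, settings such as instance count become a few lines of values. The key names below are indicative only; they vary across chart versions, so check the values.yaml of the chart version in use:

```yaml
# values.yaml sketch for the apache/apisix Helm chart (keys are assumptions)
replicaCount: 3      # number of APISIX instances
etcd:
  enabled: false     # standalone mode: no etcd dependency
```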

However, a key follow-up issue arises: how to deploy Helm Charts or YAML files to a Kubernetes cluster. To address this, Timi adopted the GitOps model, deploying YAML files to a Kubernetes cluster via pipelines. Under the GitOps model, all configurations are stored in Git as code. By triggering CI/CD processes via Git, Timi achieved automated deployment. Both config.yaml and other configuration files are stored in Git, ensuring versioned management and traceability of configurations. This approach not only simplifies configuration management but also automates and standardizes the deployment process, enhancing overall efficiency and reliability.
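Under this GitOps model, the deployment repository can be wired to the cluster with an Argo CD Application resource along these lines (the repo URL, path, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tapisix-gateway
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/timi/deploy-repo.git   # deployment repository
    targetRevision: main
    path: charts/apisix
  destination:
    server: https://kubernetes.default.svc
    namespace: gateway
  syncPolicy:
    automated:       # auto-sync on Git changes
      prune: true
      selfHeal: true
```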

Deployment Workflow

This GitOps deployment workflow brings some advantages:

  • Configuration Consistency: All configuration changes are made through Git, ensuring system configuration consistency.
  • Security: Reduces the risk of manual modifications, with all changes traceable.
  • Automated Deployment: Achieves automated deployment and canary releases based on version changes in Argo CD or Git.

The team only needs to maintain two repositories: the code repository (for application code) and the deployment repository (for all deployment-related configuration files), which is simple and efficient. The key APISIX configuration files (e.g., routing and config.yaml startup configurations) are integrated into a single Helm Chart repository for unified management and deployment.

GitOps Advantages

The APISIX Ingress Controller, as the official community solution for k8s, follows this core process: By defining custom resources such as APISIXRoute, routing and other configurations are described in YAML files within k8s.
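For reference, such a custom resource looks like this (the route and service names are illustrative):

```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: example-route
spec:
  http:
    - name: rule1
      match:
        paths:
          - /api/*
      backends:
        - serviceName: example-service   # k8s Service to route to
          servicePort: 80
```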

APISIX Ingress Controller Workflow

After deploying these CRDs to the k8s cluster, the Ingress Controller continuously monitors the relevant CRD resources. It parses the configuration information from the CRDs and synchronizes it to APISIX by invoking APISIX's Admin API. The Ingress Controller thus primarily bridges the CRDs and APISIX, with the data ultimately written to etcd.

Timi didn't select the APISIX Ingress Controller for several reasons:

  1. Business-Oriented Gateway Positioning: As a business-oriented gateway, the team focuses on reducing development and operational thresholds to enhance usability and development efficiency.
  2. Operational Cost: Introducing the Ingress Controller adds an extra layer of operational complexity. It requires deep integration with Kubernetes, involving additional Golang code and Kubernetes API calls, which increases operational difficulty and cost.
  3. Environment Consistency: Due to reliance on the k8s environment, discrepancies between local development and online deployment environments may lead to inconsistencies such as "works locally but fails online," complicating fault diagnosis and resolution.
  4. Version Coupling: There is a strong coupling between APISIX and Ingress Controller versions. Since Timi uses a customized version of APISIX, they only maintain compatibility with specific versions. This may result in unsupported APIs or compatibility issues, affecting system stability and reliability.
  5. Configuration Opacity: With the Ingress Controller approach, the final configurations still need to be written to etcd, which may cause inconsistent configuration states. For example, Ingress Controller monitoring failures or poor etcd status may trigger issues like excessive connections, making the entire architecture chain more opaque and complex. In contrast, Helm Charts offers a comprehensive and auditable YAML file containing all routing configurations, making routing states clear and visible.

Enhanced Stability and Availability

APISIX configurations fall into two categories, each reloaded differently:

  1. APISIX Routing Configuration (apisix.yaml): Uses traditional loading methods to define routing configurations, including upstream routing and corresponding forwarding rules.
  2. Startup Configuration (config.yaml): Serves as the startup configuration file, specifying key parameters such as the APISIX runtime port. Changes to certain configuration items require a service restart to take effect.

Hot Reloading

When APISIX-related configurations change, the corresponding configmap.yaml is updated. However, the deployment.yaml (i.e., the APISIX deployment instance) remains unchanged. To address this issue, the k8s community proposes a solution that involves splitting configurations and utilizing hash and annotation methods. The content of the ConfigMap that needs to be changed is injected into the deployment.yaml as annotations, thereby achieving dynamic configuration updates.

  • apisix-configmap.yaml: Primarily stores APISIX's core configurations, such as routing rules. When this type of ConfigMap is modified, due to the built-in timer mechanism in APISIX, it periodically reads and updates the in-memory configuration information from the local file. Therefore, the APISIX service does not need to be restarted for the configuration to take effect.
  • config-configmap.yaml: Mainly includes basic configurations such as the APISIX runtime environment. When this type of ConfigMap is modified, as it involves the basic runtime environment settings of the APISIX service, a restart of the APISIX deployment instance is required to ensure the new configurations are correctly loaded and applied.

To automatically detect configuration changes and trigger the update process, Timi annotated the ConfigMap content with a hash and wrote the hash value into the deployment.yaml file. When configuration changes cause the hash value to update, the deployment.yaml file also changes. The k8s system detects this change and automatically triggers the update process, ensuring that the APISIX deployment instance promptly applies the new configurations.
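In a Helm chart, this hash-annotation pattern is commonly written with the `sha256sum` template function (the template file name is a placeholder):

```yaml
# deployment.yaml template snippet: roll the Deployment when ConfigMap content changes
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/config-configmap.yaml") . | sha256sum }}
```

Because the annotation value changes whenever the rendered ConfigMap changes, k8s sees a modified Pod template and triggers a rolling update automatically.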

Hot Reloading

End-to-End Observability

Runtime Operation

Metrics Collection

The Kubernetes ecosystem offers a widely used metrics collection solution, the Prometheus Operator. Data is regularly reported to external systems such as Prometheus by scraping the metrics ports and information exposed by services. The related k8s configurations are fully described in APISIX's Helm Chart.

Metrics
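With the Prometheus Operator, scraping APISIX is declared through a ServiceMonitor resource. The label selector and port name below are assumptions; the metrics path is the one exposed by APISIX's prometheus plugin:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: apisix
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: apisix   # must match the APISIX Service's labels
  endpoints:
    - port: prometheus                 # Service port exposing the metrics endpoint
      path: /apisix/prometheus/metrics
      interval: 15s
```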

Trace Reporting

Trace reporting is implemented based on the OpenTelemetry plugin provided by APISIX. This plugin sends data to the OpenTelemetry Collector via the OpenTelemetry protocol, which ultimately writes the data to ClickHouse for trace data collection and storage.

Trace
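With the opentelemetry plugin enabled, the Collector address is set in config.yaml roughly as follows (the collector hostname is a placeholder):

```yaml
# config.yaml: point APISIX's opentelemetry plugin at the Collector
plugin_attr:
  opentelemetry:
    collector:
      address: otel-collector.observability.svc:4318   # OTLP endpoint
```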

Log Collection

Log collection also utilizes the OpenTelemetry protocol. However, the OpenTelemetry plugin provided by APISIX only supports trace reporting, not log reporting, so local log storage is used instead. Employing a sidecar mode, APISIX writes its logs to a shared folder; another container mounted in the same Pod shares this log folder with the APISIX container and reports the logs via the OpenTelemetry protocol.

Log
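The sidecar pattern described above can be sketched in the Deployment spec as follows (the container images and mount paths are assumptions):

```yaml
spec:
  template:
    spec:
      containers:
        - name: apisix
          image: apache/apisix:latest
          volumeMounts:
            - name: logs
              mountPath: /usr/local/apisix/logs   # APISIX writes access/error logs here
        - name: log-collector                     # sidecar in the same Pod
          image: otel/opentelemetry-collector-contrib:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log/apisix
              readOnly: true                      # reads and reports the shared logs
      volumes:
        - name: logs
          emptyDir: {}                            # shared log folder
```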

Monitoring Dashboard

Additionally, Timi custom-developed dedicated monitoring panels based on the collected metric data to meet specific monitoring requirements. The alerting system is built using Grafana's open-source solution, leveraging its powerful visualization and alerting capabilities to achieve real-time monitoring and alerting of APISIX's operational status.

Monitoring and Alerting

Granular Traffic Management

Previously, Timi's traffic management architecture was as follows.

  1. EdgeOne served as the Content Delivery Network (CDN), handling initial traffic ingress.
  2. Traffic was then forwarded by Cloud Load Balancer (CLB) to the Ingress layer, which utilized Istio.
  3. Finally, requests reached the internal APISIX gateway for processing.

APISIX Replaces Ingress

With its business expanding, Timi recognized the need for a more efficient and flexible traffic management system. Consequently, they planned to replace the existing Istio-based Ingress layer with APISIX, using it as the ingress for their Kubernetes clusters.

Migration Solution Evaluation

During the migration process, the team evaluated two primary migration solutions:

  • Solution One: CDN Canary and Dual Domains – Deploy a new APISIX instance alongside the existing architecture to direct new traffic to this instance. However, this solution's drawback is the need to modify the front-end domain, which may impact user access and business continuity. After careful consideration, the team temporarily set aside this solution.
  • Solution Two: CDN Traffic Steering – This approach allows configuring multiple CLB routes and achieving traffic push based on percentages. Its advantage lies in the ability to gradually switch traffic to the new APISIX instance without changing the user access entry point. Additionally, the traffic ratio can be flexibly adjusted based on actual conditions, facilitating observation and evaluation of the migration effects.

Migration Solutions

Advantages of the Final Solution

Timi selected the second solution and successfully established a new traffic path: new traffic reaches APISIX directly through canary deployment. This new architecture offers the following significant advantages:

  • No Front-end Changes: The domain names and entry points accessed by front-end users remain unchanged, ensuring uninterrupted user experience and avoiding potential user confusion or access interruptions caused by domain changes.
  • Full Backend Autonomy: The backend gains autonomous control and management over traffic switching, enabling flexible adjustment of traffic distribution based on business needs and system status without reliance on external coordination.
  • Rapid Rollback Capability: With the canary release feature, any issues discovered during migration can be quickly rolled back to the original path, minimizing migration risks and ensuring stable business operations.
  • User-Transparent Migration: The entire migration process is transparent to users, who remain unaware of backend architectural changes during business access, ensuring a smooth and seamless migration. Below is the overall migration process.

Migration Practices

Conclusion

The Timi team has developed the business-oriented API gateway TAPISIX based on APISIX. As the core component of their gateway architecture, APISIX has played a crucial role in meeting stringent international compliance requirements, reducing development and operational overhead, and enhancing system flexibility and reliability.

APISIX's robust features—such as high-performance routing, dynamic configuration capabilities, and a rich plugin ecosystem—have enabled them to build a highly efficient, stable, and adaptable gateway platform. Looking ahead, they are excited to continue their collaboration with the APISIX community, exploring innovative application scenarios and unlocking greater value for their business.
