APISIX at Honor: Unmatched Reliability in Handling Millions of QPS
May 23, 2025
Overview
About Honor
Established in 2013, Honor is a global leader in smart terminal solutions. Its products are sold in over 100 countries and regions worldwide, with partnerships established with more than 200 operators. Honor operates over 52,000 experience stores and dedicated counters globally, with over 250 million active devices in use.
As Honor's business expands, it needs an API gateway capable of handling all of its traffic, which raises several challenges that must be resolved.
Challenges
- Complex request processing: After traffic is forwarded to the upstream, each request is mirrored to a third-party asset platform. If the recording platform fails to respond or experiences failures, the blocking mirroring operation can stall requests and disrupt traffic.
- Unstable traffic scheduling: Percentage-based routing leads to inconsistent traffic assignments. Identical requests may be routed to different environments, jeopardizing stability in consumer-facing scenarios.
- Cumbersome rate limiting: Manual coordination with administrators is required to allocate rate limits. In elastic scaling scenarios, mismatched throttling values can overwhelm backend services and cause system anomalies.
- Redis bottleneck constraints: Redis becomes a bottleneck in single-key scenarios. Frequent Redis requests result in significant network overhead and CPU usage exceeding 50%, while also adding 2–3 milliseconds of request latency.
- Vulnerable inline WAF: Inline WAF architectures are prone to single points of failure. If the WAF malfunctions, it can disrupt the entire traffic flow and negatively impact business operations.
Results
- Honor's APISIX-based gateway platform has scaled to handle peak traffic volumes of several million QPS.
- Over 100 custom plugins have been developed to address specific business requirements.
- Canary release strategies have been implemented for fine-grained traffic scheduling.
- Rate-limiting mechanisms are in place to effectively manage and control traffic flow.
- CPU usage has decreased from 20% to just 2% through the optimized health checker.
- Public network bandwidth costs have been reduced by two-thirds, significantly lowering operational expenses.
Background
Honor began exploring traffic gateway products in 2021, conducting preliminary research on APISIX in Q3, and officially introduced APISIX in Q4 to initiate the construction of its internal traffic gateway platform.
In 2022, APISIX was officially deployed in Honor's production environment. In Q1, pilot promotion of traffic access for consumer-facing (B2C) business began; in Q2, APIs were opened for deployment platforms, supporting traffic scheduling and container instance reporting. Furthermore, even when the deployment platform was not yet fully built, traffic access and scheduling could be performed by invoking APIs through scripts.
In 2023, Q1 saw the completion of APISIX-CP containerization, followed by the launch of APISIX-DP elastic scaling in Q3. By Q4, a single cluster supported over ten million connections, and by year-end, full coverage of cloud services for B2C business was achieved.
In 2024, APISIX-DP containerization was completed in Q1, the runtime architecture was optimized to version 2.0 in Q2, and in Q4, Honor reached a throughput of 1 million QPS (queries per second) in a single cluster. By year-end, Honor's API gateway, built on Apache APISIX, supported all of Honor's business projects.
To date, the APISIX-based gateway platform within Honor has achieved peak traffic volumes of millions of QPS. Leveraging APISIX's extensibility, nearly 100 custom plugins have been developed.
Honor Gateway Architecture
Why Honor Opted for Apache APISIX
Honor selected Apache APISIX as its API gateway due to its high performance, scalability, and reliability. APISIX handles Honor's massive traffic, achieving millions of QPS. It supports Honor's extensive business needs through nearly 100 custom plugins. Features such as canary releases, rate limiting, and circuit breaking ensure precise traffic scheduling and reliability.
High Performance
Apache APISIX delivers best-in-class performance among API gateways, with single-core throughput of 18,000 QPS and an average latency of 0.2 ms. This ensures that Honor's large user base and diverse services can be supported without degradation in service quality. After four years of use, Apache APISIX has proven this performance in practice, successfully managing Honor's massive traffic volumes and handling peak loads of millions of QPS.
Scalability
To handle high-traffic scenarios, the gateway can be rapidly scaled out and promptly recovered via virtual machines, supporting automatic scaling. For example, when CPU usage exceeds a threshold, machines are automatically scaled out and attached to the load balancer.
APISIX provides the scalability needed to accommodate Honor's rapidly growing and diverse business requirements. Honor has developed nearly 100 custom plugins and integrated them with APISIX. APISIX offers a flexible framework that can be tailored to meet specific business needs, allowing for easy expansion and adaptation as Honor's services evolve.

In addition to custom plugins, Honor has also extensively customized APISIX in several key areas, including traffic mirroring, canary release, rate limiting, and security, to better align with its business needs and technical requirements.
Rich Features
APISIX comes equipped with a comprehensive set of features critical for Honor's operations. Features such as canary releases enable gradual and controlled rollouts of new services, minimizing risk. Rate limiting helps manage traffic loads and prevent system overloads, while circuit breaking provides a mechanism to quickly respond to and isolate failures, ensuring overall system reliability. Additionally, the ability to integrate with WAF adds an extra layer of security and flexibility in traffic management.
These factors collectively make Apache APISIX the ideal choice for Honor's API gateway, supporting its global business operations and enhancing its service delivery capabilities.
Achievements after Using and Customizing Apache APISIX
Enhanced Traffic Processing Efficiency
Previously, after a request reached APISIX and was forwarded to the upstream, it was also mirrored to a third-party asset platform. This mirroring operation was blocking: if the recording platform did not respond or failed, it could block the client's request and severely impact production traffic stability.

To resolve this, Honor implemented asynchronous processing through a custom plugin. The asynchronous traffic processing workflow is as follows:
- When a request arrives, it is asynchronously saved to a queue.
- APISIX forwards the request to the upstream server. Once the upstream response is returned, the client request is complete.
- Asynchronous threads then extract requests from the queue and send them to the analytics platform for data recording. Since recorded requests include timestamps, the asynchronous processing does not affect production traffic.

The recording platform is responsible for data collection and, during playback, supports scaling traffic and adding headers for end-to-end stress testing. The queue size and thread performance parameters are also configurable to safeguard gateway performance.
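Below is a minimal sketch of this decoupling pattern in Python rather than an APISIX Lua plugin. The queue size, the worker thread, and the `send_to_recording_platform` callable are illustrative assumptions, not Honor's actual implementation.

```python
import queue
import threading
import time

# Bounded queue so a slow recording platform cannot exhaust gateway memory
# (the real plugin exposes queue size and thread count as configuration).
MIRROR_QUEUE = queue.Queue(maxsize=10000)

def handle_request(request, forward_to_upstream):
    """Serve the client on the hot path; mirroring never blocks it."""
    try:
        # Timestamp the copy so the analytics platform can reorder or replay it later.
        MIRROR_QUEUE.put_nowait({"request": request, "ts": time.time()})
    except queue.Full:
        pass  # drop the mirror copy rather than delay production traffic
    return forward_to_upstream(request)  # client gets the upstream response immediately

def mirror_worker(send_to_recording_platform):
    """Background thread: drain the queue and ship copies to the asset platform."""
    while True:
        item = MIRROR_QUEUE.get()
        try:
            send_to_recording_platform(item)  # failures here never touch client traffic
        except Exception:
            pass  # log or retry in a real system

threading.Thread(target=mirror_worker,
                 args=(lambda item: None,),  # placeholder sender for the sketch
                 daemon=True).start()
```

The key property is that the hot path only enqueues a timestamped copy; delivery to the recording platform happens entirely off the request path.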
Optimized Canary Release
The percentage-based routing supported by traditional canary release plugins can lead to inconsistent traffic assignments, which may affect the stability of B2C scenarios.
Honor added a key-hash plugin in front of the canary release plugin. It hashes requests based on headers or cookies to allocate traffic percentages, ensuring consistent routing for B2C scenarios. The canary release plugin has also been customized for precise scheduling, directing traffic via the API gateway as needed.
APISIX tags traffic with canary headers, which propagate through services. Services register with canary tags in the registration center. When handling requests, each service reads the tag, prioritizes matching canary instances, and falls back to production if none exist, ensuring consistent canary routing across the service chain.
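As a rough illustration of the key-hash idea (not the actual plugin), the sketch below hashes a stable request attribute into a bucket so that the same user always lands in the same environment; the user-id key and the 10% canary share are assumptions.

```python
import hashlib

CANARY_PERCENT = 10  # assumed canary share; configured per route in practice

def pick_environment(header_value: str) -> str:
    """Hash a stable request attribute (header or cookie value) into 0-99 and
    compare it against the canary percentage, so identical requests always
    land in the same environment."""
    digest = hashlib.md5(header_value.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < CANARY_PERCENT else "production"

# The same user id always maps to the same bucket:
assert pick_environment("user-42") == pick_environment("user-42")
```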
Precise Rate Limiting
Single-Node Rate Limiting
In single-node rate limiting, users had to manually split global rate limits (e.g., 4,000 QPS) among nodes (e.g., 2,000 QPS per node for two nodes), which was inefficient and error-prone. Additionally, during auto-scaling, the total rate limit would unintentionally increase (e.g., 3 nodes × 2,000 QPS = 6,000 QPS), risking backend overload and system instability.

To resolve these issues, Honor uses the server-info plugin to periodically write DP node information into etcd as leased keys, so etcd always holds an up-to-date view of the gateway's active DP nodes. Newly developed plugins fetch the node information from etcd and dynamically calculate the base rate limit each node should handle, ensuring the aggregate rate limit aligns with the actual number of nodes.

For optimized performance, only privileged processes are allowed to periodically retrieve gateway information from etcd, reducing etcd load and minimizing APISIX overhead. Privileged processes write retrieved data to shared memory, and other processes periodically query shared memory to obtain node information.
Furthermore, Honor abstracted the rate-limiting feature into a plugin with a unified interface. Plugins such as fixed-window rate limiting and custom performance plugins can query the node count from shared memory and dynamically adjust their configuration to meet optimization requirements.
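The per-node calculation itself is simple. Here is a sketch under the assumption that the live node count has already been read from shared memory; the function and variable names are illustrative.

```python
import math

def per_node_limit(global_limit_qps: int, active_node_count: int) -> int:
    """Divide the global limit across the nodes currently registered in etcd.

    When the cluster scales from 2 to 3 nodes, each node's share drops from
    2000 to ~1334 QPS, so the aggregate stays at the configured 4000 QPS
    instead of silently growing to 6000.
    """
    if active_node_count <= 0:
        return global_limit_qps  # fall back to the full limit on a single node
    return math.ceil(global_limit_qps / active_node_count)

print(per_node_limit(4000, 2))  # 2000
print(per_node_limit(4000, 3))  # 1334
```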
Distributed Rate Limiting
In single-key rate-limiting scenarios, when the rate-limiting rule applies to the route path, all requests are mapped to the same Redis key. This results in traffic being concentrated on a single Redis shard, making it impossible to achieve load balancing through horizontal scaling. Frequent Redis requests lead to a significant increase in CPU usage, with utilization rising by over 50%. Additionally, requests need to first access Redis for counting before being forwarded to the upstream, resulting in 2–3 ms of latency.

To resolve these issues, Honor introduced a local counting cache that allows requests to pass as long as the local count is greater than zero. The local cache is periodically and asynchronously synchronized with Redis: the requests consumed between two synchronization periods are counted and deducted from Redis, and the resulting Redis count then overrides the local cache, keeping the distributed rate limit consistent. This solution suits high-QPS applications and significantly reduces Redis performance bottlenecks and network overhead.
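A simplified sketch of this local-cache-plus-periodic-sync idea follows, using the redis-py client. The sync interval, key name, and single-key scope are assumptions, and the real plugin runs inside APISIX workers rather than Python threads.

```python
import threading
import time
import redis  # assumed: redis-py client, with a Redis instance reachable locally

SYNC_INTERVAL = 0.1          # seconds between Redis syncs (assumed)
r = redis.Redis()
LOCK = threading.Lock()

local_budget = 0             # requests this node may admit before the next sync
consumed_since_sync = 0      # requests admitted locally since the last sync

def allow(key: str) -> bool:
    """Admit the request from the local budget; only the sync loop talks to Redis."""
    global local_budget, consumed_since_sync
    with LOCK:
        if local_budget > 0:
            local_budget -= 1
            consumed_since_sync += 1
            return True
        return False

def sync_loop(key: str):
    """Deduct locally consumed requests from Redis, then let the Redis value
    override the local budget so all nodes converge on one global count."""
    global local_budget, consumed_since_sync
    while True:
        time.sleep(SYNC_INTERVAL)
        with LOCK:
            delta, consumed_since_sync = consumed_since_sync, 0
        remaining = r.decrby(key, delta)   # settle the deviation in one round trip
        with LOCK:
            local_budget = max(int(remaining), 0)

threading.Thread(target=sync_loop, args=("rl:/api/orders",), daemon=True).start()
```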

Reliable Circuit Breaker
The original circuit breaker plugin does not support circuit breaking based on failure rates. It only has on/off states, which may allow too many requests to pass during state transitions. This could exacerbate upstream service degradation and potentially collapse the gateway due to upstream response timeouts.
Honor customized APISIX's circuit breaker plugin to support percentage-based (failure-rate) circuit breaking with finer control across three states: closed, open, and half-open. The team also introduced a silent count mechanism so that a small number of requests cannot trigger state transitions:
- When the request count reaches the silent count and the failure rate exceeds the threshold, the circuit breaker transitions to the open state.
- After the circuit break time expires, the state transitions to half-open.
- In the half-open state, if the number of allowed requests reaches the configured value and upstream services recover, the state transitions to closed; if the failure rate remains high or there is no response, it reverts to the open state.
In addition, taking inspiration from Sentinel's sliding window, Honor opted for a fixed window that tracks failure rates within a time period, simplifying implementation and reducing performance overhead. Shared memory stores the circuit breaker state, ensuring consistent behavior across workers while avoiding the complexity and performance cost of sliding windows.
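A compact sketch of such a state machine is shown below. The thresholds are made-up defaults, the half-open handling is simplified (any probe failure reopens the breaker), and the fixed-window reset and shared-memory storage used in the real plugin are omitted.

```python
import time

class CircuitBreaker:
    """Failure-rate circuit breaker with a silent count and the three states
    described above (closed -> open -> half-open). A sketch only."""

    def __init__(self, silent_count=20, failure_rate=0.5,
                 open_seconds=10, half_open_probes=5):
        self.silent_count = silent_count      # min requests before a trip is possible
        self.failure_rate = failure_rate      # trip threshold within the window
        self.open_seconds = open_seconds      # how long to stay open
        self.half_open_probes = half_open_probes
        self.state = "closed"
        self.total = 0
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0

    def allow(self) -> bool:
        """Decide whether the gateway should forward this request upstream."""
        if self.state == "open":
            if time.time() - self.opened_at >= self.open_seconds:
                self.state, self.probes = "half-open", 0
                return True
            return False
        return True

    def record(self, success: bool) -> None:
        """Feed back the upstream result and drive state transitions."""
        if self.state == "half-open":
            if not success:
                self._trip()                  # simplified: any probe failure reopens
            else:
                self.probes += 1
                if self.probes >= self.half_open_probes:
                    self.state = "closed"     # upstream has recovered
                    self.total = self.failures = 0
            return
        self.total += 1
        self.failures += 0 if success else 1
        # Trip only once the silent count is reached AND the failure rate is exceeded.
        if (self.total >= self.silent_count
                and self.failures / self.total >= self.failure_rate):
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = time.time()
```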
Fortified WAF Architecture for Security
In a traditional inline WAF architecture, DNS records must be modified to route traffic through the WAF, which inspects and filters it before forwarding it to the origin server. However, this architecture is prone to single points of failure: if the WAF malfunctions, it can disrupt the entire traffic flow and adversely affect business operations.
To resolve this issue, Honor, in collaboration with API7.ai and Tencent Cloud, implemented a bypass WAF architecture. Traffic is directed straight to the APISIX cluster, and only the necessary portion is forwarded to the WAF for inspection:
- If the WAF detects normal traffic, it returns a 200 status code, allowing the request to pass through to the upstream server.
- If the WAF detects malicious attacks, it returns a status code such as 403, rejecting the request.
- If the WAF fails, traffic can be directly forwarded to the backend, preventing link interruption due to WAF failure and enhancing overall link reliability.
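To make the fail-open behavior concrete, here is a hedged sketch of the decision logic. The WAF endpoint, timeout, and use of the requests library are assumptions for illustration; only the 200-allow, 403-reject, and fail-open semantics come from the description above.

```python
import requests  # assumed HTTP client for the illustration

WAF_URL = "http://waf.internal/inspect"    # hypothetical inspection endpoint
WAF_TIMEOUT = 0.05                         # fail fast so the WAF cannot stall traffic

def should_forward(request_payload: dict) -> bool:
    """Send a copy of the request to the bypass WAF and honor its verdict;
    if the WAF is unreachable, fail open and let traffic through."""
    try:
        resp = requests.post(WAF_URL, json=request_payload, timeout=WAF_TIMEOUT)
    except requests.RequestException:
        return True                        # WAF failure must not break the link
    if resp.status_code == 200:
        return True                        # normal traffic: pass to upstream
    return False                           # e.g. 403: reject malicious request
```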
Improved Performance and Cost Control
Lower CPU Consumption by Optimized Health Checker
In high-traffic scenarios with over a thousand upstream nodes, frequent updates triggered the destruction and creation of health checkers, leading to significant CPU usage spikes.
Honor decreased CPU consumption from 20% to 2% by optimizing the health checker.
During upstream updates, health checkers are dereferenced but not destroyed, reducing the overhead of frequent destruction and creation. Newly created health checkers are cached with timestamps. Subsequent requests first check the cache: if no checker exists, one is created and cached; if a cached checker has not expired, it is returned directly; if it has expired, it is recreated.
All upstream nodes are batch-updated to the health checker's shared memory, reducing the overhead of node-by-node operations. Additionally, a concurrency control mechanism ensures only one worker creates health checkers at a time, preventing simultaneous operations and significantly reducing CPU consumption.
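A small sketch of the timestamped cache is shown below; the TTL, cache key, and create_checker callable are assumptions, and the single-creator concurrency control described above is omitted for brevity.

```python
import time

CHECKER_TTL = 30.0       # assumed cache lifetime in seconds
_checkers = {}           # upstream_key -> (checker, created_at)

def get_checker(upstream_key: str, create_checker):
    """Return a cached health checker for the upstream, recreating it only
    when the cached entry has expired rather than on every upstream update."""
    entry = _checkers.get(upstream_key)
    now = time.time()
    if entry is not None:
        checker, created_at = entry
        if now - created_at < CHECKER_TTL:
            return checker            # reuse: no destroy/create churn
    checker = create_checker()        # e.g. spawn the actual health-check loop
    _checkers[upstream_key] = (checker, now)
    return checker
```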
Cost Saving via Traffic Compression, Single-Line EIP, and Gateway Scaling
To reduce traffic costs, which account for about three-quarters of Honor's gateway expenses, Honor provides user-friendly compression plugins such as br (Brotli) and gzip. By including a compression identifier in the request, even non-technical personnel can significantly reduce traffic volume, with maximum compression rates exceeding 70%. This effectively lowers load balancer (LB) and EIP bandwidth costs under cloud providers' LB billing models.
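As a rough illustration of how much such payloads shrink, the sketch below applies gzip when the caller advertises support for it; the header handling is simplified and the Brotli path (which requires a third-party library) is omitted.

```python
import gzip

def maybe_compress(body: bytes, accept_encoding: str) -> tuple[bytes, str]:
    """Compress the response body when the caller signals support for gzip;
    repetitive JSON and text payloads commonly shrink by well over half."""
    if "gzip" in accept_encoding.lower():
        return gzip.compress(body), "gzip"
    return body, "identity"

payload = b'{"items": [' + b'{"sku": "HONOR-X", "qty": 1},' * 200 + b'{}]}'
compressed, encoding = maybe_compress(payload, "gzip, br")
print(len(payload), "->", len(compressed), encoding)
```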
To address the high costs of BGP EIP bandwidth, Honor configured a static single-line EIP for gateway clusters, backed by a backup BGP EIP, and used DNS intelligent resolution to route mainstream carrier traffic to corresponding single-line EIPs. This approach reduced costs to one-third of BGP EIP costs, saving around two-thirds on public network bandwidth expenses.
Additionally, Honor optimized gateway scaling by making it elastic based on CPU and memory utilization, ensuring resource usage stays within a reasonable range to prevent waste or insufficiency.
Summary
Since adopting APISIX in 2021, Honor has built a high-performance, scalable, and reliable API gateway that supports the rapid growth of its extensive business through continuous optimization and expansion. APISIX has become the backbone of Honor's operations, delivering holistic traffic coverage and sustaining peak throughput in the millions of QPS. Through deep customization, the platform has enabled advanced traffic management, stronger security, and significant cost reduction.
Looking ahead, Honor intends to further modernize its API gateway by integrating AI-driven analytics to automate traffic prioritization and predictive decision-making. Additionally, the company plans to implement containerized auto-scaling mechanisms within Kubernetes environments to streamline resource orchestration and accelerate deployment cycles.