Apache APISIX Unifies Full Traffic Management with Service Mesh

Wei Jin

October 28, 2022

Products

With the rapid growth of Cloud Native, Service Mesh is also heating up. There are now many well-known Service Mesh solutions, each with its own strengths, so the right choice varies with the industry and business context.

Current Situation and Pain Points of Service Mesh

The emergence of Service Mesh is closely tied to how business architecture has evolved. As Cloud Native took off, most enterprises began moving to microservices: applications became smaller and smaller, yet each one carried the same common capabilities inside. So we started to ask, "Is there a technology that can take over these common concerns?" Thus, the Sidecar pattern emerged.

sidecar.png

With a Sidecar, functions such as service registration and discovery, traffic management, observability, and security can be offloaded to it, so that business services can focus on business logic.

However, as Sidecars were adopted in practice, some pain points gradually became apparent:

  • Many solutions, and migration is hard once you have chosen one. There are numerous Service Mesh solutions today, and their characteristics and capabilities vary widely. Once a solution is picked, it tends to stay. If it later fails to meet new requirements as the business grows, migrating to another solution incurs a huge cost.
  • High cost of integrating with existing infrastructure. In practice, a Service Mesh often has to integrate with existing infrastructure, such as earlier microservice frameworks, message queues, or databases. Legacy issues and historical technical debt add further friction to this integration.
  • Performance loss and extra resource consumption. Whichever Service Mesh solution you choose, it introduces some performance overhead, and the Sidecar itself requires extra resources to be allocated alongside the business workload.
  • Poor extensibility. Some Service Mesh solutions cannot be extended to new protocols or features through their existing configuration methods, and they cannot be customized by plugging components in and out.

Given these business realities and pain points, we started to wonder whether an ideal Service Mesh solution could solve all of these problems.

What Does an Ideal Service Mesh Look Like?

ideal_service_mesh.png

As shown above, our requirements for a Service Mesh in business scenarios span multiple dimensions: resources, performance, traffic management, and extensibility. Beyond these, there are more detailed requirements in other dimensions. For example:

  • First, in terms of user experience, the barrier to getting started should be low. A Service Mesh is often applied by operations and maintenance staff rather than developers, so the cost of getting started is one of the factors that drive the choice of a solution.
  • Second, at the technical level, the control plane must be easy to configure from the outset, the relevant permissions must be strictly and safely controlled, and the configuration should stay close to what most users are already familiar with.
  • On the data plane, native support for multiple protocols, or even custom protocols, is preferable, because migrating legacy systems raises exactly these problems. Since a Sidecar is always present, its resource footprint must also remain manageable so that costs can be controlled effectively, and it should be extensible enough to allow customization.
  • Finally, across the product's ecosystem, both the community and bug fixes need to respond in a timely manner.

With such clear requirements and goals, the next step is to build a solution that comes close to this ideal.

APISIX-based Service Mesh Solution

Apache APISIX is a dynamic, real-time, high-performance cloud-native API gateway that provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more.

Hundreds of enterprises around the world already use Apache APISIX to handle business-critical traffic across finance, Internet, manufacturing, retail, carriers, and more, including NASA, European Factory Platform, China Airlines, China Mobile, Tencent, Huawei, Weibo, Netease, Ke Holdings Inc., 360, Taikang, and Nayuki Holdings Limited. An APISIX-based Service Mesh solution is therefore in strong demand, both in terms of usage and across different industry sectors.

The architecture of the current Service Mesh solution is based on Istio as the control plane and APISIX as the data plane.

We chose Istio as the control plane because it is the most popular Service Mesh solution today: it has an active community, nearly all mainstream cloud vendors support it, and it reflects current market demand and direction to a large extent. So instead of developing our own control plane, we chose to embrace Istio, which is a better fit and enjoys wider acceptance.

The data plane is where Apache APISIX can shine. So what does APISIX, a cloud-native API gateway, actually do as Istio's data plane in this Service Mesh solution?

Amesh.png

Those familiar with APISIX know that it uses etcd for configuration storage. But if APISIX runs as a Sidecar, where does its configuration come from? Packing an etcd instance into every Sidecar would consume far too many resources and would not be flexible enough.

Therefore, we first added an xDS-based configuration center to APISIX to remove the dependency on etcd, and then introduced Amesh, shown above: a Go program compiled into a shared library that is loaded when APISIX starts. Amesh talks to Istio over the xDS protocol, writes the configuration it receives into APISIX's xDS configuration center, and APISIX then generates the concrete routing rules and routes the corresponding requests.
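
To make the interaction more concrete, here is a minimal sketch, not the actual Amesh code, of how a Go program can subscribe to route configuration from Istio over the xDS Aggregated Discovery Service (ADS) using the envoyproxy/go-control-plane stubs; the istiod address and node ID below are placeholder assumptions:

```go
package main

import (
	"context"
	"log"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Connect to the Istio control plane (address is an assumption for this sketch).
	conn, err := grpc.Dial("istiod.istio-system.svc:15010",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Open an ADS stream and ask for RouteConfiguration resources.
	ads := discoveryv3.NewAggregatedDiscoveryServiceClient(conn)
	stream, err := ads.StreamAggregatedResources(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	err = stream.Send(&discoveryv3.DiscoveryRequest{
		Node:    &corev3.Node{Id: "sidecar~10.0.0.1~demo.default~default.svc.cluster.local"},
		TypeUrl: "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Each response carries route resources; a real implementation would
	// translate them into APISIX route rules and ACK with the version/nonce.
	for {
		resp, err := stream.Recv()
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("received %d route resources (version %s)",
			len(resp.Resources), resp.VersionInfo)
	}
}
```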

With this architecture in place, the following results are achieved:

  • Amesh and APISIX maintain the same lifecycle and can be turned on and off together.
  • Thanks to APISIX's native support for xDS discovery, configuration data is exchanged over the xDS protocol, keeping resource consumption under control.
  • CRDs can be used to extend the whole solution quickly and easily, especially for features Istio does not yet support. For example, APISIX's rich set of plugins can be configured through CRDs; with the controller and Istio working together, the extensibility of the Istio plus APISIX Service Mesh solution is maximized.

How the Solution Performs

Significant Performance Improvement

Data is always the most intuitive and convincing way to present a technology product. We ran stress tests in various scenarios with APISIX and Envoy each serving as the data plane of the Service Mesh solution, using up to 5,000 routes, and obtained the comparison below.

The test scenario shown here is "matching the 3,000th route out of 5,000 routes". There were many test scenarios; besides matching a route in the middle of the table, we also tested matching routes at the head and tail. Because there is too much data to show all of it, only the results for this middle-route scenario are presented below.

APISIX_envoy_performance.png

As the data above shows, under the same load and on the same machine configuration, APISIX delivers a 5x improvement in QPS. In request latency, APISIX is an order of magnitude lower than Envoy, with APISIX latencies in the microsecond range versus milliseconds for Envoy. You can also see that in this test APISIX's CPU usage is only about 50% while Envoy's CPU is already running at full capacity.

Reduced Resource Usage

Injecting a Sidecar into a Service Mesh usually consumes additional resources. The APISIX-based solution can reduce that resource consumption by about 60%. How is this possible?

First, configuration is distributed on demand. Istio does provide some on-demand policies for managing Sidecar resources, such as scoping by namespace, but applying them imposes an extra mental burden and management overhead on operators.

Second is whether the configuration is easy to understand. When you configure routes in Envoy and then inspect its routing information via a config dump, you may find thousands of entries even though far fewer routes actually exist. This has many implications: it slows startup and makes the configuration bulky, which in turn increases resource usage.

With APISIX, you can avoid these problems. APISIX's configuration is straightforward and clear, which reduces the data held in memory; combined with its high performance, this also lowers CPU consumption. Overall, the solution's resource consumption drops by about 60%.
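
For comparison, an APISIX route is a single compact JSON object. Below is a minimal sketch of creating one through the Admin API when APISIX runs as a standalone gateway; the admin address, port, and API key are placeholders that depend on your deployment (in mesh mode the same kind of route is generated from xDS data instead):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// A complete APISIX route: match /hello and forward to one upstream node.
	route := []byte(`{
	  "uri": "/hello",
	  "upstream": {
	    "type": "roundrobin",
	    "nodes": { "127.0.0.1:1980": 1 }
	  }
	}`)

	// Admin API endpoint and key are placeholders; adjust them to your deployment.
	req, err := http.NewRequest(http.MethodPut,
		"http://127.0.0.1:9180/apisix/admin/routes/1", bytes.NewReader(route))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("X-API-KEY", "edd1c9f034335f136f87ad84b625c8f1")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```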

Low Learning Cost and High Customization Capability

Firstly, as an active open source project of the Apache Software Foundation, APISIX provides a wealth of documentation and learning resources in the community, which reduces the learning cost for those who want to get started with APISIX.

Second, APISIX is implemented in Lua, a lightweight scripting language that is relatively easy to read and understand, which makes browsing the APISIX source code much clearer.

Finally, secondary development on top of APISIX is very easy. When you want to write custom plugins for your business, you are not limited to native Lua: APISIX's Plugin Runner lets you develop plugins in more familiar languages such as Python, Java, and Go, or even in Wasm.
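
As a rough illustration of what a Go Plugin Runner plugin looks like, the sketch below follows the pattern of the examples in the apache/apisix-go-plugin-runner project; the plugin itself is hypothetical and the exact method names can differ between runner versions, so treat the signatures as assumptions rather than a definitive API reference:

```go
package plugins

import (
	"encoding/json"
	"net/http"

	pkgHTTP "github.com/apache/apisix-go-plugin-runner/pkg/http"
	"github.com/apache/apisix-go-plugin-runner/pkg/plugin"
)

// HeaderStamp is a hypothetical plugin that answers requests with a configured body.
type HeaderStamp struct {
	plugin.DefaultPlugin
}

type HeaderStampConf struct {
	Body string `json:"body"`
}

func init() {
	// Register the plugin so the runner can expose it to APISIX.
	if err := plugin.RegisterPlugin(&HeaderStamp{}); err != nil {
		panic(err)
	}
}

func (p *HeaderStamp) Name() string {
	return "header-stamp"
}

func (p *HeaderStamp) ParseConf(in []byte) (interface{}, error) {
	conf := HeaderStampConf{}
	err := json.Unmarshal(in, &conf)
	return conf, err
}

// RequestFilter runs for each request routed through the plugin.
// Writing a response here short-circuits the upstream, as in the project's examples.
func (p *HeaderStamp) RequestFilter(conf interface{}, w http.ResponseWriter, r pkgHTTP.Request) {
	w.Header().Add("X-Resp-Runner", "Go")
	_, _ = w.Write([]byte(conf.(HeaderStampConf).Body))
}
```

A runner binary built from such a package is started alongside APISIX, which then delegates plugin execution to it; see the project's examples for how the runner process is launched.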

The ecological power of APISIX is also due to the product's strong extensibility.

APISIX itself offers a wide range of plugins for security, logging, observability, and more. These plugins can be used to the fullest when APISIX serves as a traditional gateway component. However, when APISIX works with Istio as the control plane to form a service mesh, many of these capabilities have no corresponding resources defined in Istio. So what can be done in this case?

That's where custom CRDs come in. For example, to add fault injection, the plugin already exists in APISIX, so we only need to supply a few extra parameters through a CRD. Fault injection is just one example: authentication, rate limiting, and other common APISIX plugins can be wired in the same way to achieve fine-grained control.

self_defined_CRD.png
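
As a purely illustrative sketch, the following Go snippet uses client-go's dynamic client to create such a custom resource that turns on APISIX's fault-injection plugin; the CRD group, version, kind, and field layout here are hypothetical stand-ins, not the solution's actual resource definitions:

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (path is an assumption).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical CRD identifying a plugin configuration for the mesh sidecars.
	gvr := schema.GroupVersionResource{
		Group:    "apisix.apache.org",
		Version:  "v1alpha1",
		Resource: "pluginconfigs",
	}

	// Enable APISIX's fault-injection plugin: abort 50% of requests with HTTP 500.
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "apisix.apache.org/v1alpha1",
		"kind":       "PluginConfig",
		"metadata":   map[string]interface{}{"name": "demo-fault-injection"},
		"spec": map[string]interface{}{
			"plugins": []interface{}{
				map[string]interface{}{
					"name": "fault-injection",
					"config": map[string]interface{}{
						"abort": map[string]interface{}{
							"http_status": int64(500),
							"percentage":  int64(50),
						},
					},
				},
			},
		},
	}}

	_, err = client.Resource(gvr).Namespace("default").
		Create(context.TODO(), obj, metav1.CreateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("plugin config created")
}
```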

The most immediate benefit of using CRDs this way is that the extension feels native: declarative configuration is easier for users to adopt, and it does not intrude on Istio's original configuration, so compatibility is preserved. Another benefit is that migration costs are lower for users already running Istio.

In summary, the APISIX-based Service Mesh solution is easier for developers and users to adopt and extend, and it offers excellent performance and a rich ecosystem. Thanks to APISIX's capabilities, it also provides good support for plugins and multiple protocols. We expect to bring you the next version of the APISIX-based Service Mesh at the end of the year, so stay tuned!

Future Prospects for the APISIX-based Service Mesh Solution

Looking back, the APISIX-based Service Mesh solution is already addressing the pain points described earlier and moving toward the ideal Service Mesh.

APISIX_service_mesh.png

In the APISIX-based Service Mesh solution, external traffic enters through APISIX Ingress and is then processed by APISIX inside the mesh: APISIX Ingress handles north-south traffic, while Amesh plus Istio handles east-west traffic.

A deployment architecture like this brings business-level value such as cost savings and a unified technology stack: common capabilities that were previously used in a traditional gateway can be replicated quickly into the Service Mesh. This enables unified management at the business level, keeps costs under control, and lets experience with the gateway, Ingress, and Service Mesh be reused to the fullest.

In subsequent iterations, the APISIX-based Service Mesh solution will continue to deepen, with planned directions including:

  • Building out capabilities based on APISIX's xRPC functionality.
  • Providing native support for heterogeneous, multi-protocol traffic.
  • Covering more scenarios and configurations, including Istio's, to significantly reduce user migration costs.

In conclusion, Service Mesh is bound to remain a major trend. Although no current solution is perfect, the field as a whole is spiraling upward, and the APISIX-based Service Mesh is likewise moving toward the ideal Service Mesh solution everyone has in mind.

Tags:
APISIX Basics, Traffic Management, Service Mesh