What Is Service Mesh?

Introduction to Service Mesh

A service mesh is a configurable infrastructure layer that manages interservice communication in microservice systems. It handles the traffic between microservices, also called east-west traffic.

In cloud-native environments, an application might consist of hundreds of services. Each service could have multiple instances, and those instances might be changing constantly. In such a complicated runtime environment, providing users with reliable access and keeping services stable has become a huge challenge. The service mesh emerged as a solution to this problem.

A service mesh is like TCP/IP between microservices: it handles interservice concerns such as network calls, rate limiting, and monitoring. Service meshes are mainly deployed on Kubernetes. The most classic deployment pattern is the sidecar, which moves general-purpose functions into a sidecar container and runs it in the same pod as the service container. The image below shows why it is called a service mesh.

The sidecar is not the only pattern for deploying a service mesh; there are also the DaemonSet pattern and the Ambient mesh pattern:

The difference between the DaemonSet pattern and the sidecar pattern is that the DaemonSet pattern runs only one proxy pod on each node of the Kubernetes cluster, and this pod acts as the proxy for the workloads on that node. Compared to the sidecar pattern, the DaemonSet pattern uses far fewer machine resources, but it has shortcomings such as poor isolation and hard-to-predict resource usage. You can find more differences in this article: Sidecars and DaemonSets: Battle of containerization patterns.
Ambient mesh is a new data plane mode introduced by Istio on September 7th, 2022. To remove the coupling between the mesh infrastructure and application deployment, Ambient mesh separates the data plane proxy from the application pod so that the two can be deployed independently.

Ambient mesh splits the data plane into a secure overlay layer and an L7 processing layer. The secure overlay layer handles functions such as TCP routing, monitoring metrics, access logging, and mTLS tunneling. The L7 processing layer provides everything the secure overlay layer does and adds functions such as HTTP routing and traffic control, observability, and rich L7 authorization policies.

In addition, Ambient mesh uses a shared agent called ztunnel (zero-trust tunnel), which runs on every node in the Kubernetes cluster and securely connects and authenticates the workloads inside the mesh. For details about Ambient mesh mode, see this article: Introducing Ambient Mesh
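To make the resource trade-off between the three patterns concrete, here is a minimal sketch that counts how many data plane proxies each pattern would run. The `Cluster` type, the `proxy_count` function, and the `l7_proxies` field are hypothetical names used only for illustration, and the numbers are made up; real resource usage also depends on how much traffic each proxy handles.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int           # number of Kubernetes nodes
    pods: int            # number of application pods inside the mesh
    l7_proxies: int = 0  # assumed number of separately deployed L7 proxies (ambient only)


def proxy_count(cluster: Cluster, pattern: str) -> int:
    """Return how many data plane proxies a deployment pattern would run."""
    if pattern == "sidecar":
        return cluster.pods  # one proxy container per application pod
    if pattern == "daemonset":
        return cluster.nodes  # one shared proxy pod per node
    if pattern == "ambient":
        # one ztunnel per node, plus any L7 processing proxies that are deployed
        return cluster.nodes + cluster.l7_proxies
    raise ValueError(f"unknown pattern: {pattern}")


if __name__ == "__main__":
    cluster = Cluster(nodes=10, pods=300, l7_proxies=4)
    for pattern in ("sidecar", "daemonset", "ambient"):
        print(pattern, proxy_count(cluster, pattern))
```

With these example numbers, the sidecar pattern runs one proxy for each of the 300 pods, while the node-level patterns run on the order of one proxy per node, which is where the resource savings (and the weaker isolation) come from.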

Why do we need service mesh?

Before service meshes became popular, many microservice architectures implemented service governance through a microservice framework working together with a control platform. However, this approach has the following issues:

The framework and the services are tightly coupled, which makes overall maintenance difficult and complex. Developers also need to understand the shared libraries, which keeps them from concentrating on service implementation.
Frameworks must be maintained for multiple languages, which increases maintenance costs.
Microservice frameworks are costly to upgrade, and an upgrade usually requires restarting the service.
Many different framework versions run in production at the same time, forcing teams to deal with complex compatibility issues.

To resolve these issues, former Twitter engineer William Morgan, one of the founders of Linkerd, proposed the concept of the "service mesh". A service mesh uses the sidecar pattern to decouple infrastructure from service logic without affecting the application, which makes upgrades and O&M uniform across languages.

Microservices Framework to Service Mesh

A service mesh moves functions such as traffic control, observability, and secure communication into infrastructure components, so developers don't need to worry about the concrete implementation of the communication layer or service management. They can leave the communication-related dirty work to the service mesh and focus on service development. In this way, a service mesh helps resolve the problems listed above.

How does service mesh work?

A service mesh does not add new functionality to an application's runtime environment: applications in any architecture still need rules that specify how requests get from A to B. The difference is that a service mesh extracts the logic that manages interservice communication from the individual services and abstracts it into an infrastructure layer.

Currently, most service meshes use an architecture that consists of a data plane and a control plane, as shown below:

The control plane

The control plane manages and configures the data plane and enforces policies while services are running. All control plane instances within a single service mesh share the same configuration resources.

The control plane focuses on delivering configuration and policies for security, observability, and traffic control. It also collects telemetry data from the data plane so that DevOps teams can use it.
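The relationship described above (one shared configuration pushed to every proxy, telemetry flowing back) can be sketched with a toy in-memory model. This is not how any particular mesh implements its control plane; `ControlPlane`, `Proxy`, and the setting names `mtls` and `rate_limit` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    """A data plane proxy that holds whatever configuration it was last given."""
    name: str
    config: Dict[str, object] = field(default_factory=dict)
    requests_seen: int = 0

    def apply(self, config: Dict[str, object]) -> None:
        self.config = dict(config)  # keep a private copy of the pushed settings

    def report(self) -> Dict[str, object]:
        return {"proxy": self.name, "requests": self.requests_seen}


class ControlPlane:
    """Holds one shared configuration and distributes it to every registered proxy."""

    def __init__(self) -> None:
        self.config: Dict[str, object] = {}
        self.proxies: List[Proxy] = []

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)
        proxy.apply(self.config)  # a new proxy starts with the current settings

    def update(self, **settings: object) -> None:
        self.config.update(settings)
        for proxy in self.proxies:  # push the change to the whole data plane
            proxy.apply(self.config)

    def collect_telemetry(self) -> List[Dict[str, object]]:
        return [proxy.report() for proxy in self.proxies]


if __name__ == "__main__":
    control_plane = ControlPlane()
    orders, payments = Proxy("orders"), Proxy("payments")
    control_plane.register(orders)
    control_plane.register(payments)
    control_plane.update(mtls="strict", rate_limit=100)
    orders.requests_seen = 3
    print(orders.config, payments.config)
    print(control_plane.collect_telemetry())
```

The point of the sketch is that a policy change is made once, in the control plane, and no service code is touched when it is rolled out.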

The data plane

The data plane usually consists of many sidecar proxies. Each sidecar runs alongside a service instance and controls the service's traffic by intercepting its data flow.

As mentioned earlier, a service mesh is typically implemented on Kubernetes with the sidecar pattern, with each proxy packaged as a container. The sidecar pattern uses an extra container to extend and enhance the main container; this extra container is called the sidecar container and is placed in the same pod as the service container. The service mesh, in turn, is the meshed network formed by those sidecar proxies.
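The idea of a sidecar that intercepts traffic on behalf of an unchanged service can be sketched in a few lines. This is a simplified in-process model, not a real network proxy: `orders_service`, `SidecarProxy`, and the fixed request budget used as a stand-in for rate limiting are all assumptions made for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def orders_service(request: str) -> str:
    """Business logic only: no rate limiting or metrics code lives here."""
    return f"order created for {request}"


class SidecarProxy:
    """Sits in front of a service and adds cross-cutting behavior around it."""

    def __init__(self, service: Handler, max_requests: int) -> None:
        self.service = service
        self.max_requests = max_requests  # simplistic budget, not a time-based limiter
        self.metrics: Dict[str, int] = {"accepted": 0, "rejected": 0}

    def handle(self, request: str) -> str:
        # Intercept the call before it reaches the service.
        if self.metrics["accepted"] >= self.max_requests:
            self.metrics["rejected"] += 1
            return "429 too many requests"
        self.metrics["accepted"] += 1
        return self.service(request)  # forward to the unchanged service


if __name__ == "__main__":
    proxy = SidecarProxy(orders_service, max_requests=2)
    for user in ("alice", "bob", "carol"):
        print(proxy.handle(user))
    print(proxy.metrics)
```

Because the limit and the counters live in the proxy, they can be changed or upgraded without rebuilding or redeploying `orders_service`, which is the decoupling the sidecar pattern is meant to provide.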

Applications of service mesh

In a microservice architecture, engineers usually encrypt publicly exposed services or restrict access to them, but they often overlook communication security inside the cluster. Many microservice applications still lack interservice encryption, and the cluster's internal traffic is even transferred in plaintext. As a result, internal traffic is vulnerable to eavesdropping and man-in-the-middle (MITM) attacks.

To protect the cluster's internal traffic from such attacks, we use mTLS (mutual TLS). mTLS secures communication between microservices within the service mesh: each side authenticates the other with certificates, and the interservice traffic is encrypted.

We could define the communication security policy directly inside each microservice and implement authentication and encryption there, but implementing the same function separately in every microservice is very inefficient. Adding a new function means modifying the service's code and intruding on its business logic. Even once the function is implemented, later iterations, upgrades, and tests require developers to spend more time on maintenance, which keeps them from focusing on developing service features.

With a service mesh, we can instead provide mTLS communication without the original service being aware of it, because all communication-related functions move to the sidecar proxies.

When two microservices need to communicate, their sidecar proxies first establish an mTLS connection and then send encrypted traffic over it. The sidecars exchange certificates and authenticate each other against the certificate authority. Before connecting, each sidecar checks the authentication policy sent by the control plane to determine whether the microservices are allowed to communicate. If they are, the sidecars use the negotiated session key to build a secure connection and encrypt the data exchanged between the microservices. Throughout this process, the service applications are unaffected, which reduces the burden on developers.
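The two checks a sidecar performs, a policy lookup and mutual certificate verification, can be sketched with Python's standard `ssl` module. This is only an illustration of what "mutual" means in TLS terms, not how any specific mesh proxy is built; the `is_allowed` helper, the policy layout, and the service names are hypothetical, and the certificate file paths are left as optional parameters because no real files are assumed.

```python
import ssl
from typing import Dict, Optional, Set


def is_allowed(policy: Dict[str, Set[str]], source: str, destination: str) -> bool:
    """Policy check: may `source` call `destination`? (hypothetical policy format)"""
    return source in policy.get(destination, set())


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS settings for the receiving sidecar."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # demand a client certificate: the "mutual" part
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs workload certificates
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this workload's identity
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS settings for the calling sidecar."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies the server certificate by default
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # presented to the server
    return ctx


if __name__ == "__main__":
    policy = {"payments": {"orders"}}  # only "orders" may call "payments"
    print(is_allowed(policy, "orders", "payments"))
    print(is_allowed(policy, "frontend", "payments"))
    print(server_context().verify_mode == ssl.CERT_REQUIRED)
```

In plain TLS only the client verifies the server; setting `CERT_REQUIRED` on the server side is what makes the verification go both ways. In a mesh, this configuration lives in the proxies, so the services themselves never load a certificate.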

This scenario shows how a service mesh can extend existing functionality without affecting existing services. Beyond internal traffic security features such as mTLS, a service mesh can also quickly add functions like traffic control, observability, and protocol encoding and decoding by modifying the control plane's configuration.

Conclusion

This article briefly introduced the basic concepts of the service mesh, how it works, and the benefits it brings. The service mesh reshapes microservice architecture and frees developers from the complexity of the microservice runtime environment so that they can focus on developing service features.

Even though a service mesh resolves many pain points in microservice architecture, it still has limitations. The complexity of software development never disappears; it only moves from one place to another. When we abstract service management into a separate layer, we take on additional O&M work and longer traffic paths. Furthermore, a service mesh needs to run in a cloud-native environment, which sets a higher bar for the DevOps team's expertise and engineering experience. Technology is just a tool for solving problems, so the benefits of a service mesh need to be weighed against the practical needs of each application.

With the rapid growth of cloud-native technology and the continued optimization of service meshes, the service mesh may well replace traditional microservice frameworks in the future and become the first choice for companies rebuilding their architecture around microservices and cloud native.
