Rate Limiting in API Management

Qi Guo

September 16, 2022

Technology

As the internet has grown, more and more companies have adopted cloud-native architectures and microservices. Because of how these architectures work, we often have to manage hundreds of services simultaneously. We therefore need to keep the system as a whole running smoothly while also looking after the security and stability of every API-based service.

Rate limiting is one of the most important ways to keep API-based services stable. However, the application would become extremely bloated if every service implemented its own rate limiting. As the entrance and exit of all traffic in the digital world, the API gateway enables unified API management for all services and protects their stable operation. This article shows how to achieve rate limiting with the Apache APISIX API gateway and describes rate-limiting strategies and techniques.

What Is Rate Limiting

Rate limiting is a strategy for controlling traffic and maximizing throughput. With rate limiting, only requests within a given constraint are allowed to access the system; any additional requests beyond the constraint are queued, downgraded in priority, or even refused and discarded. Rate limiting also protects the system from unexpected events such as traffic spikes or malicious attacks and enables the system to provide consistent, stable service.

For example, when a trending tweet creates a traffic spike, Twitter has to apply rate limiting to prevent server crashes caused by the traffic overload.

Why Do You Need It

First, let's look at some simple real-life examples of rate limiting. Tourist attractions often sell only a limited number of tickets during holidays, and at popular restaurants, we usually need to book ahead or wait a long time before we can eat.

Rate limiting brings many benefits at the API gateway as well. It places constraints on our API-based services, secures their smooth operation, and avoids the unnecessary losses caused by server crashes during traffic spikes. Here are five practical constraints it can enforce:

  1. Limit the request rate.
  2. Limit the number of requests per time unit.
  3. Delay requests.
  4. Refuse client requests.
  5. Limit the response rate.

When Would You Need It

Combined with identification and authentication, rate limiting delivers the most value and improves the availability of the system in the following ways:

  1. Avoid malicious attacks.
  2. Secure the system's stable operation and avoid server crashes due to traffic spikes.
  3. Prevent server crashes from request spikes caused by bugs in upstream or downstream services.
  4. Avoid too frequent expensive API calls.
  5. Reduce unnecessary waste of resources by limiting the frequency of API calls.

The Theory Behind Rate Limiting

In the previous sections, we introduced the benefits of rate limiting. In this section, let's look at the theory behind it. Rate limiting is implemented at a low level by specific algorithms. The most commonly used ones include:

  • Counter Algorithm
    • Fixed Window
    • Sliding Window
  • Leaky Bucket Algorithm
  • Token Bucket Algorithm

Counter Algorithm

A counter algorithm is relatively easy to understand, and it has two types:

The first type is the Fixed Window Algorithm, which maintains a counter within a fixed time unit and resets the counter to zero once that time unit has passed.
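For illustration, here is a minimal fixed-window sketch in Go. It is a simplified, hypothetical implementation (3 requests per 1-second window is an example value), not APISIX's internal code:

package main

import (
	"fmt"
	"sync"
	"time"
)

// FixedWindowLimiter allows at most `limit` requests per `window`.
type FixedWindowLimiter struct {
	mu          sync.Mutex
	limit       int
	window      time.Duration
	count       int
	windowStart time.Time
}

func NewFixedWindowLimiter(limit int, window time.Duration) *FixedWindowLimiter {
	return &FixedWindowLimiter{limit: limit, window: window, windowStart: time.Now()}
}

// Allow reports whether one more request fits into the current window.
func (l *FixedWindowLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	// Reset the counter once the current time unit has passed.
	if time.Since(l.windowStart) >= l.window {
		l.count = 0
		l.windowStart = time.Now()
	}
	if l.count < l.limit {
		l.count++
		return true
	}
	return false
}

func main() {
	l := NewFixedWindowLimiter(3, time.Second)
	for i := 1; i <= 5; i++ {
		fmt.Println("request", i, "allowed:", l.Allow()) // requests 4 and 5 are rejected
	}
}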

The second type is the Sliding Window Algorithm, an improvement on the first; it works as follows (see the sketch after this list):

  1. Split the time unit into several intervals (each called a block).
  2. Each block has a counter; any incoming request increments that block's counter by 1.
  3. After a fixed amount of time, the time window moves forward by one block.
  4. The total number of requests in the window is calculated by summing the counters of all blocks in the window; if the total exceeds the constraint, further requests in that window are rejected.
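Below is a minimal sliding-window sketch in Go that follows these steps. The limit, block count, and window size are example values, and this is a simplified illustration rather than a production-ready limiter:

package main

import (
	"fmt"
	"sync"
	"time"
)

// SlidingWindowLimiter splits the window into several blocks and sums
// the per-block counters to decide whether a request may pass.
type SlidingWindowLimiter struct {
	mu       sync.Mutex
	limit    int
	blockDur time.Duration
	counts   []int // one counter per block
	head     int   // index of the newest block
	lastTick time.Time
}

func NewSlidingWindowLimiter(limit, blocks int, window time.Duration) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{
		limit:    limit,
		blockDur: window / time.Duration(blocks),
		counts:   make([]int, blocks),
		lastTick: time.Now(),
	}
}

func (l *SlidingWindowLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	// Slide the window forward: for each elapsed block, reuse the
	// oldest slot as the new head and reset its counter.
	for time.Since(l.lastTick) >= l.blockDur {
		l.head = (l.head + 1) % len(l.counts)
		l.counts[l.head] = 0
		l.lastTick = l.lastTick.Add(l.blockDur)
	}
	// Sum all block counters to get the total for the window.
	total := 0
	for _, c := range l.counts {
		total += c
	}
	if total >= l.limit {
		return false // the window is full: reject the request
	}
	l.counts[l.head]++
	return true
}

func main() {
	// 3 requests allowed per 1-second window, split into 10 blocks.
	l := NewSlidingWindowLimiter(3, 10, time.Second)
	for i := 1; i <= 5; i++ {
		fmt.Println("request", i, "allowed:", l.Allow())
	}
}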

Leaky Bucket Algorithm

Imagine a leaky bucket: all requests first queue up in the bucket, and the bucket then sends them out at a constant rate.

[Figure: Leaky Bucket Algorithm]

If requests exceed the capacity of the bucket, the system abandons and refuses the overflowing requests. The leaky bucket algorithm limits the request rate and ensures that all requests are sent out at a constant rate, creating an easy-in, hard-out pattern.

The core steps of this algorithm:

  1. All requests are stored in a fixed-size bucket.
  2. The bucket sends out requests at a constant rate until it is empty.
  3. When the bucket is full, the system abandons any additional requests.
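The following Go sketch illustrates these steps with a buffered channel as the bucket. The capacity and drain rate are example values, and this is a simplified illustration of the idea, not APISIX's implementation:

package main

import (
	"fmt"
	"time"
)

func main() {
	const capacity = 5 // the bucket holds at most 5 queued requests
	bucket := make(chan int, capacity)

	// Requests arrive in a burst, much faster than the bucket drains.
	go func() {
		for i := 1; i <= 10; i++ {
			select {
			case bucket <- i:
				fmt.Println("queued request", i)
			default:
				// The bucket is full: the overflowing request is dropped.
				fmt.Println("dropped request", i)
			}
		}
		close(bucket)
	}()

	// The bucket leaks at a constant rate: one request every 100ms.
	for req := range bucket {
		fmt.Println("processed request", req)
		time.Sleep(100 * time.Millisecond)
	}
}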

Token Bucket Algorithm

The token bucket algorithm consists of two parts: token generation and token consumption. The token bucket generates tokens at a constant rate and stores them in a fixed-size storage bucket. When a request passes through, it takes one or more tokens from the bucket. When the number of tokens reaches the bucket's maximum capacity, newly generated tokens are discarded. Conversely, the bucket rejects incoming requests when no tokens are left.

[Figure: Token Bucket Algorithm]

The core steps of the token bucket algorithm:

  1. The token bucket generates tokens at a constant rate and puts them into the storage bucket.
  2. If the bucket is full, newly generated tokens are discarded directly. When a request arrives, it takes one or more tokens from the storage bucket.
  3. If no tokens are left in the bucket, the system rejects any incoming request.
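Here is a minimal token-bucket sketch in Go following these steps. The refill rate and capacity are example values, and this is a simplified illustration:

package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket refills `rate` tokens per second up to `capacity`
// and spends one token per request.
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64
	rate     float64 // tokens generated per second
	last     time.Time
}

func NewTokenBucket(rate, capacity float64) *TokenBucket {
	// The bucket starts full.
	return &TokenBucket{capacity: capacity, tokens: capacity, rate: rate, last: time.Now()}
}

func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	// Refill tokens for the elapsed time; tokens beyond the
	// capacity are discarded.
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false // no token left: reject the request
}

func main() {
	b := NewTokenBucket(3, 5) // 3 tokens/second, capacity of 5
	for i := 1; i <= 7; i++ {
		fmt.Println("request", i, "allowed:", b.Allow()) // the first 5 pass
	}
}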

Achieving Rate Limiting via the API Gateway

If only a few API-based services need rate limiting, we could implement the algorithms directly in each service. For example, if you develop your system in Go, you could use tollbooth or golang.org/x/time/rate; if you use Lua, you could use NGINX's limit_req and limit_conn modules or the lua-resty-limit-traffic library.
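As an example of the Go approach, here is a minimal HTTP handler built on golang.org/x/time/rate, which implements a token bucket. The route path matches the /user/login example used later in this article, and the limit values are illustrative:

package main

import (
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

func main() {
	// Allow 3 requests per second with a burst capacity of 5.
	limiter := rate.NewLimiter(rate.Limit(3), 5)

	http.HandleFunc("/user/login", func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			// No token available: reject the request.
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("ok"))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}

Note that this sketch applies a single global limit; a per-client limiter would keep one rate.Limiter per client key instead.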

If rate limiting is implemented inside each API-based service, each service sets its own constraints, and those constraints may differ from service to service. As the number of API-based services grows significantly, these differences create management problems. At that point, we could use the API gateway to manage all API services uniformly. When using the API gateway for rate limiting, we can also offload other non-business features to it, such as identification, authentication, logging, and observability.

Apache APISIX is a dynamic, real-time, high-performance cloud-native API gateway. APISIX currently supports more than 80 plugins and has built a rich ecosystem. We can manage the traffic of API-based services with APISIX's plugins, including limit-req, limit-conn, and limit-count. Below, I will walk through a use case that demonstrates APISIX's rate-limit plugins.

Suppose there is an API-based service (/user/login) that helps users log in. To avoid malicious attacks and resource starvation, we need to enable the rate-limiting function to secure the system's stability.

Limit Requests

The limit-req plugin limits the request rate using the leaky bucket algorithm; we can bind it to specific routes or consumers.

We could directly use APISIX's Admin API to create such a route:

X-API-Key is the admin_key in the APISIX configuration.

curl http://127.0.0.1:9080/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "methods": ["POST"],
    "uri": "/user/login",
    "plugins": {
        "limit-req": {
            "rate": 3,
            "burst": 2,
            "rejected_code": 503,
            "key": "remote_addr"
        }
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:1980": 1
        }
    }
}'

The meaning of this code snippet: we use the client's IP address as the key for limiting the request rate.

  • If the request rate is below 3 requests/second (rate), requests are processed normally;
  • If the request rate is above 3 requests/second but below 5 requests/second (rate + burst), the excess requests are delayed;
  • If the request rate exceeds 5 requests/second (rate + burst), the requests beyond the constraint return HTTP code 503.

If you want to learn more about limit-req, please check this doc: APISIX limit-req

Limit Connections

The limit-conn plugin limits the number of parallel requests (or parallel connections). The following example enables this plugin for /user/login:

curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "methods": ["POST"],
    "uri": "/user/login",
    "id": 1,
    "plugins": {
        "limit-conn": {
            "conn": 3,
            "burst": 2,
            "default_conn_delay": 0.1,
            "rejected_code": 503,
            "key": "remote_addr"
        }
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:1980": 1
        }
    }
}'

The meaning of this code snippet: we use the client's IP address as the key for limiting parallel connections.

  • If the same client has fewer than 3 parallel connections (conn), requests are processed normally with status 200;
  • If the client has more than 3 (conn) but fewer than 5 (conn + burst) parallel connections, the excess requests are slowed down by a delay of 0.1 seconds each (default_conn_delay);
  • If the client has more than 5 (conn + burst) parallel connections, the excess requests are refused with HTTP code 503.

If you want to learn more about limit-conn, please check this doc: APISIX limit-conn

Limit Count

The limit-count plugin is similar to GitHub's API rate limiting: it limits the total number of requests within a given time window and returns the remaining request count in the HTTP response headers. The following example enables this plugin for /user/login:

curl -i http://127.0.0.1:9080/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/user/login",
    "plugins": {
        "limit-count": {
            "count": 3,
            "time_window": 60,
            "rejected_code": 503,
            "key": "remote_addr",
            "policy": "local"
        }
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:9001": 1
        }
    }
}'

The meaning of this code snippet: we use the client's IP address as the key for limiting the number of requests; the counter is kept locally in memory (policy: local).

If a client sends more than 3 requests (count) within 60 seconds (time_window), the excess requests return HTTP code 503 (rejected_code).

If you want to learn more about limit-count, please check this doc: APISIX limit-count

The Advantages of Apache APISIX Rate Limiting

When we use NGINX to manage traffic, a burst of API requests exposes one of NGINX's shortcomings: it cannot load configuration dynamically. APISIX's resources (such as Routes and Services), on the other hand, support hot configuration reloading. Even during a burst of traffic, APISIX can instantly modify its rate-limiting and other security plugin configurations. Thanks to etcd's watch mechanism, APISIX can update its data plane within milliseconds, without reloading the service.

Apart from that, APISIX also supports cluster-level rate limiting. For example, we could change the policy of limit-count to redis or redis-cluster so that counters are shared between different APISIX nodes, limiting the rate across the whole cluster, as sketched below.
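For instance, based on the limit-count example above, switching the counter to a shared Redis instance might look like the following sketch (the Redis address is a placeholder; check the APISIX limit-count documentation for the exact schema of your version):

curl http://127.0.0.1:9080/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/user/login",
    "plugins": {
        "limit-count": {
            "count": 3,
            "time_window": 60,
            "rejected_code": 503,
            "key": "remote_addr",
            "policy": "redis",
            "redis_host": "127.0.0.1",
            "redis_port": 6379
        }
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:9001": 1
        }
    }
}'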

For a DevOps engineer, managing all API services through a graphical dashboard boosts productivity. APISIX provides a clean visual dashboard that makes modifying API configurations much more convenient.

Conclusion

Rate limiting is a common need in real business scenarios, and it is a crucial way to protect a system from traffic spikes and keep it running smoothly. Rate limiting is only one part of API management; many other techniques can provide vital security support and improve the user experience.

Tags:
API Gateway Concept, APISIX Basics