How to Deal with Bursty Traffic: Leaky Bucket and Token Bucket Algorithms

January 5, 2023

OpenResty (NGINX + Lua)

In the previous articles, we learned about code optimization and cache design, which are closely related to the application's overall performance and deserve our attention. However, in real business scenarios, we also need to consider the impact of burst traffic on performance. Burst traffic here may be normal, such as traffic from breaking news, promotions, etc., or abnormal traffic, such as DDoS attacks.

OpenResty is now mainly used as an access layer for Web applications such as WAFs and API gateways, which have to deal with the normal and abnormal burst traffic just mentioned. After all, if you can't handle bursty traffic, the back-end services can easily be brought down, and the business won't respond appropriately. So today, we'll look at ways to deal with burst traffic.

Traffic Control

Traffic control is a must-have feature for WAF (Web Application Firewall) and API gateways. It ensures that upstream services can function appropriately by channeling and controlling ingress traffic through some algorithms, thus keeping the system healthy.

Because the processing capacity of the back end is limited, we need to consider multiple aspects such as cost, user experience, and system stability. No matter which algorithm is used, it will inevitably cause normal user requests to be slowed down or even rejected, sacrificing part of the user experience. Therefore, we need to control traffic while balancing business stability and user experience.

In fact, there are many "traffic control" cases in real life. For example, during the Chinese Spring Festival travel rush, people flow crowded in subway stations, train stations, airports, and other transportation hubs because the handling capacity of these vehicles is limited. Therefore, passengers must wait in line and enter the station in batches to ensure their safety and regular operation of traffic.

This naturally affects the passenger experience, but on the whole, it ensures the efficient and safe operation of the system. For example, if there was no queuing and batching, but instead, everyone was allowed to enter the station in a swarm, the result would be that the whole system would be down.

Going back to technology, for example, let's assume that an upstream service is designed to handle 10,000 requests per minute. At peak times, if there is no flow control at the entry point and the stack of tasks reaches 20,000 per minute, the processing performance of this upstream service will degrade to perhaps only 5,000 requests per minute and continue to deteriorate, perhaps eventually leading to service unavailability. This is not the outcome we want to see.

The common traffic control algorithms we use to cope with such burst traffic are the leaky bucket algorithm and the token bucket algorithm.

Leaky Bucket Algorithm

Let's start by looking at the leaky bucket algorithm, which aims to keep a constant request rate and smooth bursts of traffic. But how is it achieved? First, look at the following conceptual abstraction from the Wikipedia introduction to the leaky bucket algorithm.

leaky bucket algorithm

We can imagine the client's traffic as water flowing from a water pipe with an uncertain flow rate, sometimes fast and sometimes slow. The external traffic processing module, which is the bucket that receives the water, has a hole at the bottom for leakage. This is the origin of the name of the leaky bucket algorithm, which has the following benefits:

First, Whether the flow into the bucket is a trickle or a monstrous flood, it is guaranteed that the rate of water flowing out of the bucket is constant. This steady traffic is friendly to upstream services, which is what traffic shaping is all about.

Second, The bucket itself has a certain volume and can accumulate a certain amount of water. This is equivalent to client requests that can be queued if they cannot be processed immediately.

Third, Water that exceeds the bucket's volume will not be accepted by the bucket but will flow away. The correspondent metaphor here is that if there are too many client requests that exceed the queue length, it will directly return a failure message to the client. At this moment, the server side can't handle so many requests and the queuing becomes unnecessary.

So how should this algorithm be implemented? Let's take the resty.limit.req library that comes with OpenResty as an example. It is a rate limit module implemented by the leaky bucket algorithm. We will introduce more about it in the following article. Today we'll start with a brief look at the following lines of code, which are the key ones:

local elapsed = now - tonumber(rec.last)
excess = max(tonumber(rec.excess) - rate * abs(elapsed) / 1000 + 1000,0)
if excess > self.burst then
    return nil, "rejected"
-- return the delay in seconds, as well as excess
return excess / rate, excess / 1000

Let's read these lines of code briefly. Where elapsed is the number of milliseconds between the current and last requests, and rate is the rate we set per second. Since the smallest unit of rate is 0.001 s/r, the code implemented above needs to be multiplied by 1000 to calculate it.

excess indicates the number of requests still in the queue, 0 means the bucket is empty, no requests in the queue, and burst refers to the volume of the entire bucket. If excess is greater than burst, it means that the bucket is full, and the traffic coming in will be discarded directly; if excess is greater than 0 and less than burst, it will enter the queue to wait for processing, and the excess/rate returned here is the time to wait.

In this way, we can control the queue length of burst traffic by adjusting the burst size, while the back-end service's processing capacity remains unchanged. But, of course, it depends on your business scenario whether you tell the user that there are too many requests and to retry later, or let the user wait for a longer period.

Token Bucket Algorithm

Both the token bucket algorithm and the leaky bucket algorithm have the same purpose, to ensure that back-end services are not beaten by bursts of traffic, although the two are not implemented in the same way.

The leaky bucket algorithm uses the endpoint IP to do the traffic and rate limit basics. In this way, each client's exit rate of the leaky bucket algorithm is fixed. However, this poses a problem:

Suppose user A's requests are frequent and other users' requests are infrequent. In that case, the leaky bucket algorithm will slow down or reject some of A's requests, even though the service can handle them at that time, even though the overall service pressure is not very high.

This is where the token bucket comes in handy.

While the leaky bucket algorithm is concerned with smoothing traffic, the token bucket allows bursts of traffic to enter the backend service. The principle of the token bucket is to put tokens into the bucket at a fixed rate and keep putting them in as long as the bucket is not full. In this way, all requests coming from the endpoint need to go to the token bucket to get the token first before the backend can process them; if there is no token inside the bucket, then the request will be rejected.

However, OpenResty does not implement token buckets to limit traffic and rate in its library. So here is a brief introduction to the token bucket-based rate limiting module lua-resty-limit-rate, which is open-sourced by the UPYUN, as an example:

local limit_rate = require "resty.limit.rate"

-- global 20r/s 6000r/5m
local lim_global ="my_limit_rate_store", 100, 6000, 2)

-- single 2r/s 600r/5m
local lim_single ="my_limit_rate_store", 500, 600, 1)

local t0, err = lim_global:take_available("__global__", 1)
local t1, err = lim_single:take_available(ngx.var.arg_userid, 1)

if t0 == 1 then
    return -- global bucket is not hungry
    if t1 == 1 then
        return -- single bucket is not hungry
        return ngx.exit(503)

In this code, we set up two token buckets: a global token bucket, and a token bucket with ngx.var.arg_userid as the key, divided by user. There is a combination of the two token buckets, which has the following main benefit:

  • Not having to determine the user's token bucket if there are still tokens in the global token bucket, and serving as many burst requests from users as possible if the backend service can run properly.
  • In the absence of a global token bucket, requests cannot be rejected indiscriminately, so it is necessary to determine the token bucket of individual users and reject requests from users with more burst requests. In this way, it is ensured that requests from other users are not affected.

Obviously, token buckets are more resilient than leaky buckets, allowing for situations where bursts of traffic are passed to backend services. But, of course, they both have their pros and cons, and you can choose to use them according to your situation.

NGINX's Rate Limit Module

With these two algorithms out of the way, let's finally look at how to implement a rate limit in the NGINX. In NGINX, the limit_req module is the most commonly used rate limit module, and the following is a simple configuration:

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

server {
    location /search/ {
        limit_req zone=one burst=5;

This code takes the client's IP address as the key, requests a 10M memory space address called one, and limits the rate to 1 request per second.

In the server's location, the one rate-limiting rule is also referenced, and the brust is set to 5. If the rate exceeds 1r/s, 5 requests can be queued simultaneously, giving a certain buffer area. If brust is not set, requests that exceed the rate will be rejected directly.

This NGINX module is based on a leaky bucket, so it is essentially the same as resty.limit.req in OpenResty, which we described above.


The biggest problem of rate-limiting in NGINX is that they cannot be modified dynamically. After all, you need to restart the configuration file after modifying it to make it effective, which is unacceptable in a rapidly changing environment. Therefore, the following article will look at implementing traffic and rate limits dynamically in OpenResty.

Finally, let's consider a question. From the perspective of WAF and API gateways, is there a better way to identify what are normal user requests and what are malicious ones? Because, for burst traffic from normal users, we can quickly scale up the back-end services to increase the capacity of the service, while for malicious requests, it is better to reject them directly at the access layer.

You are welcome to forward this article to your colleagues and friends to learn and progress together.