Dynamic Rate-Limiting in OpenResty

API7.ai

January 6, 2023

OpenResty (NGINX + Lua)

In the previous article, I introduced you to the leaky bucket and token bucket algorithms, which are common approaches to handling burst traffic, and we learned how to rate-limit requests through the NGINX configuration. However, rate limiting through the NGINX configuration is merely usable; it is still a long way from being genuinely practical.

The first problem is that the rate-limit key is restricted to NGINX variables and cannot be set flexibly. For example, there is no way to set different rate-limit thresholds for different provinces or different client channels, which is a common requirement that NGINX alone cannot satisfy.

An even bigger problem is that the rate cannot be adjusted dynamically: every change requires reloading the NGINX service. As a result, rate limiting based on different time periods can only be implemented clumsily through external scripts.

It is important to understand that technology serves the business, and at the same time, the business drives the technology. At the time of NGINX's birth, there was little need to adjust the configuration dynamically; it was more about reverse proxying, load balancing, low memory usage, and other similar needs that drove NGINX's growth. In terms of technology architecture and implementation, no one could have predicted the massive explosion of demand for dynamic and fine-grained control in scenarios such as the mobile Internet, IoT, and microservices.

OpenResty's Lua scripting makes up for NGINX's shortcomings in this area and is an effective complement to it. This is why OpenResty is so widely used as a replacement for NGINX. In the next few articles, I'll continue introducing you to more dynamic scenarios and examples in OpenResty. Let's start by looking at how to use OpenResty to implement dynamic rate limiting.

In OpenResty, we recommend using lua-resty-limit-traffic to limit traffic. It includes limit-req (limit request rate), limit-count (limit request count), and limit-conn (limit concurrent connections); and provides limit.traffic to aggregate these three methods.

Limit request rate

Let's start by looking at limit-req, which uses a leaky bucket algorithm to limit the rate of requests.

In the previous section, we briefly introduced the key implementation code of the leaky bucket algorithm in this resty library, and now we will learn how to use this library. First, let's look at the following sample code.

resty --shdict='my_limit_req_store 100m' -e 'local limit_req = require "resty.limit.req"
-- rate of 200 requests per second, with a burst of 100 extra requests allowed
local lim, err = limit_req.new("my_limit_req_store", 200, 100)
-- the second argument (true) commits this request to the shared dict
local delay, err = lim:incoming("key", true)
if not delay then
    if err == "rejected" then
        return ngx.exit(503)
    end
    return ngx.exit(500)
end

if delay >= 0.001 then
    ngx.sleep(delay)
end'

We know that lua-resty-limit-traffic uses a shared dict to store and count keys, so before using limit-req we need to declare the 100m of space for my_limit_req_store. The same goes for limit-conn and limit-count: each needs its own separate shared dict.
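
Outside the resty CLI used in the example above, these shared dicts are declared in nginx.conf. A minimal sketch follows; my_limit_req_store and my_limit_count_store match the names used in this article, while my_limit_conn_store is a placeholder of my own choosing, and the sizes depend on how many keys you expect to track:

http {
    # one dedicated shared dict per rate limiter
    lua_shared_dict my_limit_req_store   100m;
    lua_shared_dict my_limit_conn_store  100m;
    lua_shared_dict my_limit_count_store 100m;
}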

limit_req.new("my_limit_req_store", 200, 100)

The above line is one of the most critical lines of code. It means that a shared dict called my_limit_req_store is used to store the statistics, and the rate is set to 200 requests per second. Requests exceeding 200 but below 300 per second (300 being calculated as 200 + 100) are delayed and queued; requests beyond 300 are rejected.

After the setup is done, we have to process requests from clients; lim:incoming("key", true) does exactly that. incoming takes two parameters, which deserve a closer look.

The first parameter is the user-specified key for rate limiting. In the example above it is a string constant, which means the rate limit applies uniformly to all clients. If you want to limit the rate by province and channel, simply combine both pieces of information into the key; the following pseudo-code achieves this requirement.

local province = get_province(ngx.var.binary_remote_addr)
local channel = ngx.req.get_headers()["channel"]
local key = province .. channel
lim:incoming(key, true)

Of course, you can also customize the meaning of the key and the conditions under which incoming is called, which gives you very flexible rate limiting.
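
To make that concrete, here is a hypothetical sketch: internal traffic skips rate limiting entirely, while an untrusted channel gets a stricter limiter. The channel names and thresholds are made up for illustration, and the code is assumed to run in the access phase:

local limit_req = require "resty.limit.req"

-- two limiters with different thresholds, sharing one shared dict
local lim_default = limit_req.new("my_limit_req_store", 200, 100)
local lim_strict = limit_req.new("my_limit_req_store", 20, 10)

local channel = ngx.req.get_headers()["channel"] or "web"

-- skip rate limiting entirely for internal traffic
if channel ~= "internal" then
    -- stricter limiter for the hypothetical "open-api" channel
    local lim = (channel == "open-api") and lim_strict or lim_default
    local key = channel .. ":" .. ngx.var.binary_remote_addr
    local delay, err = lim:incoming(key, true)
    if not delay then
        return ngx.exit(err == "rejected" and 503 or 500)
    end
    if delay >= 0.001 then
        ngx.sleep(delay)
    end
end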

Let's look at the second parameter of the incoming function, a boolean value. It defaults to false, meaning the request is not recorded in the shared dict for statistics; the call is just a dry run. Only when it is set to true does the call take real effect, so in most cases you will need to set it to true explicitly.

You may wonder why this parameter exists at all. Consider a scenario where you set up two different limit-req instances with different keys, one keyed on the hostname and the other on the client's IP address. When a client request is processed, the incoming methods of both instances are called in order, as in the following pseudo-code.

local limiter_one, err = limit_req.new("my_limit_req_store", 200, 100)
local limiter_two, err = limit_req.new("my_limit_req_store", 20, 10)

limiter_one:incoming(ngx.var.host, true)
limiter_two:incoming(ngx.var.binary_remote_addr, true)

If the user's request passes limiter_one's threshold check but is rejected by limiter_two's check, then the limiter_one:incoming call should be treated as a dry run, and we should not count it.

In this case, the above code logic is not rigorous enough. We need to dry-run all the limiters first, so that if any limiter's threshold is triggered and the client request has to be rejected, we can return immediately.

for i = 1, n do
    local lim = limiters[i]
    -- only the last limiter commits for real; the others are dry runs
    local delay, err = lim:incoming(keys[i], i == n)
    if not delay then
        return nil, err
    end
end

This is what the second argument of the incoming function is for. The snippet above is the core code of the limit.traffic module, which is used to combine multiple rate limiters.
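
Incidentally, the dry run is not the only way to keep the counts consistent. Both limit-req and limit-count also provide an uncommit method, so another option is to commit the first limiter for real and roll it back if a later limiter rejects. A simplified sketch of this alternative (not the library's verbatim code), reusing limiter_one and limiter_two from above:

-- commit the first limiter for real
local delay1, err1 = limiter_one:incoming(ngx.var.host, true)
if not delay1 then
    return ngx.exit(err1 == "rejected" and 503 or 500)
end

local delay2, err2 = limiter_two:incoming(ngx.var.binary_remote_addr, true)
if not delay2 then
    -- the second limiter rejected, so undo the first limiter's count
    limiter_one:uncommit(ngx.var.host)
    return ngx.exit(err2 == "rejected" and 503 or 500)
end

local delay = math.max(delay1, delay2)
if delay >= 0.001 then
    ngx.sleep(delay)
end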

Limit the number of requests

Let's take a look at limit.count, a library that limits the number of requests. It works like GitHub's API rate limiting, restricting the number of requests a user can make within a fixed time window. As usual, let's start with some sample code.

local limit_count = require "resty.limit.count"

-- at most 5000 requests per 3600-second window
local lim, err = limit_count.new("my_limit_count_store", 5000, 3600)

local key = ngx.req.get_headers()["Authorization"]
local delay, remaining = lim:incoming(key, true)

You can see that limit.count and limit.req are used similarly. We start by defining a shared dict in nginx.conf.

lua_shared_dict my_limit_count_store 100m;

Then we create a limiter object with new, and finally use the incoming function to check and process the request.

However, the difference is that the second return value of the incoming function in limit-count represents the remaining calls, and we can add fields to the response header accordingly to give the client a better indication.

ngx.header["X-RateLimit-Limit"] = "5000"
ngx.header["X-RateLimit-Remaining"] = remaining
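
Putting the pieces together, a complete access-phase handler might look roughly like the sketch below. The 429 status code, the error handling, and the fallback to the client address when no Authorization header is present are my own additions for illustration:

access_by_lua_block {
    local limit_count = require "resty.limit.count"

    -- 5000 requests per 3600-second window
    local lim, err = limit_count.new("my_limit_count_store", 5000, 3600)
    if not lim then
        ngx.log(ngx.ERR, "failed to instantiate limit.count: ", err)
        return ngx.exit(500)
    end

    local key = ngx.req.get_headers()["Authorization"] or ngx.var.binary_remote_addr
    local delay, err = lim:incoming(key, true)

    if not delay then
        if err == "rejected" then
            -- quota for this window is used up
            ngx.header["X-RateLimit-Limit"] = "5000"
            ngx.header["X-RateLimit-Remaining"] = 0
            return ngx.exit(429)
        end
        return ngx.exit(500)
    end

    -- when the request is allowed, the second return value is the remaining quota
    ngx.header["X-RateLimit-Limit"] = "5000"
    ngx.header["X-RateLimit-Remaining"] = err
}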

Limit the number of concurrent connections

limit.conn is a library for limiting the number of concurrent connections. It differs from the two previously mentioned libraries in that it has a special leaving API, which I'll briefly describe here.

As mentioned above, limiting the request rate and the number of requests can be done entirely in the access phase. Limiting the number of concurrent connections is different: besides checking whether the threshold is exceeded in the access phase, it also requires calling the leaving API in the log phase.
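
Here is a sketch of what the access-phase half might look like, following the pattern in the library's documentation; the thresholds and the my_conn_store dict name reuse those from the combine example later in this article, and everything the log phase needs is stashed in ngx.ctx:

access_by_lua_block {
    local limit_conn = require "resty.limit.conn"

    -- at most 1000 concurrent requests plus a burst of 1000,
    -- with a default connection latency of 0.5 seconds
    local lim, err = limit_conn.new("my_conn_store", 1000, 1000, 0.5)
    if not lim then
        return ngx.exit(500)
    end

    local key = ngx.var.binary_remote_addr
    local delay, err = lim:incoming(key, true)
    if not delay then
        return ngx.exit(err == "rejected" and 503 or 500)
    end

    if lim:is_committed() then
        -- save what the log phase needs in order to call leaving()
        ngx.ctx.limit_conn = lim
        ngx.ctx.limit_conn_key = key
        ngx.ctx.limit_conn_delay = delay
    end

    if delay >= 0.001 then
        ngx.sleep(delay)
    end
}

With the limiter, key, and delay saved in ngx.ctx, the log phase can then call the leaving API: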

log_by_lua_block {
    local ctx = ngx.ctx
    local lim = ctx.limit_conn
    if lim then
        local latency = tonumber(ngx.var.request_time) - ctx.limit_conn_delay
        local key = ctx.limit_conn_key

        local conn, err = lim:leaving(key, latency)
    end
}

However, the core of this API is quite simple: it is the following line of code, which decrements the connection count by one. If you don't do this cleanup in the log phase, the connection count will keep climbing and soon hit the concurrency threshold.

local conn, err = dict:incr(key, -1)

Combination of rate limiters

This concludes the introduction of the three methods. Finally, let's look at how to combine limit.req, limit.conn, and limit.count. Here we need to use the combine function of limit.traffic.

local limit_req = require "resty.limit.req"
local limit_conn = require "resty.limit.conn"
local limit_traffic = require "resty.limit.traffic"

local lim1, err = limit_req.new("my_req_store", 300, 200)
local lim2, err = limit_req.new("my_req_store", 200, 100)
local lim3, err = limit_conn.new("my_conn_store", 1000, 1000, 0.5)

local limiters = {lim1, lim2, lim3}
local host = ngx.var.host
local client = ngx.var.binary_remote_addr
local keys = {host, client, client}

local states = {}
local delay, err = limit_traffic.combine(limiters, keys, states)

With the knowledge you just gained, this code should be easy to understand. The core of the combine function, which we already touched on when analyzing limit.req above, is implemented mainly with the help of incoming's dry-run mechanism and the uncommit function. This combination allows you to set different thresholds and keys for multiple limiters to meet more complex business requirements.
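
The return value of combine is handled just like incoming's: nil plus an error on rejection or failure, otherwise a delay to sleep. A minimal follow-up to the snippet above:

if not delay then
    if err == "rejected" then
        return ngx.exit(503)
    end
    ngx.log(ngx.ERR, "failed to limit traffic: ", err)
    return ngx.exit(500)
end

if delay >= 0.001 then
    ngx.sleep(delay)
end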

Summary

limit.traffic does not only support the three rate limiters mentioned today: as long as a rate limiter provides the incoming and uncommit APIs, it can be managed by limit.traffic's combine function.

Finally, I'll leave you with a homework question: can you write an example that combines the token bucket and leaky bucket rate limiters we introduced before? Feel free to write down your answer in the comments section to discuss with me, and you are also welcome to share this article with your colleagues and friends to learn and communicate together.