Introduction of Common APIs in OpenResty

API7.ai

November 4, 2022

OpenResty (NGINX + Lua)

In the previous articles, you became familiar with many important Lua APIs in OpenResty. Today, we will learn about some other general-purpose APIs, mainly related to regular expressions, time, and processes.

Let's start with regular expressions, the most commonly used and most important of these. In OpenResty, we should use the APIs provided by ngx.re.* to handle regular expression logic instead of Lua's own pattern matching. This is not only for performance reasons but also because Lua patterns are a dialect of their own that does not follow the PCRE specification, which is annoying for most developers.

In the previous articles, you have already come across some of the ngx.re.* APIs, and their documentation is very detailed, so I won't list them all again. Here, I will introduce just the following two.

ngx.re.split

The first one is ngx.re.split. Splitting strings is a very common operation, and OpenResty provides an API for it, but many developers can't find it and end up implementing it themselves.

Why? The ngx.re.split API is not part of lua-nginx-module but of lua-resty-core, and it is documented not on the lua-resty-core home page but in lua-resty-core/lib/ngx/re.md, in a third-level directory. As a result, many developers are completely unaware that this API exists.
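As a minimal sketch of how ngx.re.split is called, assuming the script is run with the resty CLI so that lua-resty-core is available. The input string and variable names are only illustrative:

```lua
-- ngx.re.split lives in lua-resty-core and must be required explicitly
local ngx_re = require "ngx.re"

-- The second argument is a regular expression, not a plain delimiter string
local parts, err = ngx_re.split("a,b,c,d", ",")
if not parts then
    ngx.say("split failed: ", err)
    return
end

for i, part in ipairs(parts) do
    ngx.say(i, ": ", part)
end
```

Note that the module is loaded with require "ngx.re" rather than being reachable through the global ngx.re table, which is part of why it is easy to miss.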

Similarly, APIs that are hard to discover include ngx_resp.add_header, enable_privileged_agent, etc., which we mentioned earlier. So how do we quickly solve this problem? In addition to reading the lua-resty-core home page documentation, you need to read through the *.md documentation in the lua-resty-core/lib/ngx/ directory as well.

lua_regex_match_limit

Second, I want to introduce lua_regex_match_limit. We haven't talked about the NGINX directives provided by OpenResty before because, in most cases, the default values are sufficient and there is no need to change them. The exception is the lua_regex_match_limit directive, which is related to regular expressions.

We know that a regex engine based on a backtracking NFA carries the risk of catastrophic backtracking, where the engine backtracks so much during a match that the CPU is pinned at 100% and the service is blocked.

Once catastrophic backtracking occurs, we need to use gdb to analyze a dump, or systemtap to analyze the live environment, in order to locate it. Unfortunately, it isn't easy to detect beforehand because only special requests trigger it. Attackers can exploit this, and ReDoS (Regular expression Denial of Service) refers to this type of attack.

Here, I mainly want to show how a single directive in OpenResty avoids the above problem simply and effectively:

lua_regex_match_limit 100000;

lua_regex_match_limit limits the number of backtracking steps taken by the PCRE regex engine. This way, even if catastrophic backtracking occurs, its consequences stay bounded and will not saturate your CPU.
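When the limit is hit, the ngx.re.* functions return nil together with an error string, so the calling code should tell "no match" apart from "the regex failed". The lua-nginx-module documentation gives "pcre_exec() failed: -8" as the error string for this case; the exact text may differ between PCRE versions, so treat it as an assumption. Below is a minimal sketch, where safe_match is a hypothetical helper name:

```lua
-- Hypothetical helper: wraps ngx.re.match and separates "no match" from "regex error"
local function safe_match(subject, regex)
    -- "jo": j enables PCRE JIT, o caches the compiled regex
    local m, err = ngx.re.match(subject, regex, "jo")
    if err then
        -- Errors such as an exceeded match limit are reported through err
        ngx.log(ngx.ERR, "regex failed: ", err)
        return nil, err
    end
    return m  -- nil here simply means there was no match
end

local m = safe_match("hello, 1234", [[\d+]])
if m then
    ngx.say(m[0])
end
```

Checking err separately matters because a request that trips the limit should usually be rejected or logged, not silently treated as a non-matching input.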

Time API

The most commonly used time API is ngx.now, which returns the current timestamp. The following line of code prints it:

$ resty -e 'ngx.say(ngx.now())'

The value returned by ngx.now includes a fractional part, so it is more precise. The related ngx.time API returns only the integer part. The others, ngx.localtime, ngx.utctime, ngx.cookie_time, and ngx.http_time, are mainly used to return and process time in different formats. If you want to use them, check the documentation; they are not difficult to understand, so I won't go through them here.

However, it is worth mentioning that the APIs that return the current time read it from a cache. Unless a non-blocking network I/O operation triggers an update, they keep returning the cached value rather than the real current time, which is not what we would like. Take a look at the following sample code:

$ resty -e 'ngx.say(ngx.now())
os.execute("sleep 1")
ngx.say(ngx.now())'

Between the two calls to ngx.now, we used a blocking call to sleep for 1 second, yet the timestamp returned is the same on both occasions.
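If a blocking call cannot be avoided and a fresh timestamp is needed, lua-nginx-module also provides ngx.update_time, which forcibly refreshes the cached time at the cost of a system call. Below is a minimal sketch, meant to be run with the resty CLI; the variable names are only illustrative:

```lua
local before = ngx.now()
os.execute("sleep 1")   -- blocking: the cached time is not refreshed meanwhile
local stale = ngx.now()

ngx.update_time()       -- force a refresh of the cached time
local fresh = ngx.now()

ngx.say("stale - before: ", stale - before)
ngx.say("fresh - before: ", fresh - before)
```

Because of the system call overhead, this is better kept for the rare places that really need it than sprinkled through hot code paths.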

So, what if we replace os.execute("sleep 1") with a non-blocking sleep function? For example:

$ resty -e 'ngx.say(ngx.now())
ngx.sleep(1)
ngx.say(ngx.now())'

This time, the two timestamps differ. This leads us to ngx.sleep, a non-blocking sleep function. In addition to sleeping for a specified amount of time, this function has another special purpose.

For example, if a piece of code performs intensive, time-consuming calculations, the request running it will hold on to the worker and its CPU for the whole time, causing other requests to queue up without a timely response. In that case, we can intersperse calls to ngx.sleep(0) so that the code gives up control and other requests can be processed as well.
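The idea can be sketched as follows, assuming a context where ngx.sleep is allowed (the resty CLI works). The function sum_squares and its chunk parameter are hypothetical and stand in for any CPU-heavy loop:

```lua
-- Hypothetical CPU-heavy task: sum the squares of 1..n,
-- yielding to the event loop after every `chunk` iterations
local function sum_squares(n, chunk)
    local total = 0
    for i = 1, n do
        total = total + i * i
        if i % chunk == 0 then
            ngx.sleep(0)  -- give up control so other requests can run
        end
    end
    return total
end

ngx.say(sum_squares(100000, 10000))
```

The chunk size is a trade-off: yielding too often adds scheduling overhead, while yielding too rarely keeps other requests waiting.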

Worker and process API

OpenResty provides the ngx.worker.* and ngx.process.* APIs to obtain information about workers and processes. The former relates to NGINX worker processes, while the latter covers NGINX processes in general: not only worker processes but also the master process, the privileged agent process, and so on.
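As a minimal sketch, the following script queries a few of these APIs. It assumes the resty CLI, and the values printed depend on how the process was started, so no output is shown:

```lua
-- ngx.process comes from lua-resty-core and must be required explicitly
local process = require "ngx.process"

ngx.say("worker id:    ", ngx.worker.id())
ngx.say("worker count: ", ngx.worker.count())
ngx.say("worker pid:   ", ngx.worker.pid())
ngx.say("exiting:      ", ngx.worker.exiting())
ngx.say("process type: ", process.type())
```

A typical use of process.type() is to run a piece of logic only in a particular kind of process, for example in the privileged agent but not in ordinary workers.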

The problem of true and null values

Finally, let's look at the issue of true and null values. In OpenResty, telling true values apart from null values has long been a troublesome and confusing point.

Let's look at the definition of a true value in Lua: everything except nil and false is a true value.

So, true values also include 0, the empty string, an empty table, and so on.
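A short check in plain Lua makes this concrete; is_truthy is a hypothetical helper name used only for this illustration:

```lua
-- Returns true for every value that Lua treats as true in a condition
local function is_truthy(v)
    return not not v
end

print(is_truthy(0))      -- true
print(is_truthy(""))     -- true
print(is_truthy({}))     -- true
print(is_truthy(nil))    -- false
print(is_truthy(false))  -- false
```

Developers coming from languages where 0 or the empty string is false should pay particular attention to the first two lines.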

Let's look at nil in Lua, which means undefined. For example, if you declare a variable but haven't initialized it, its value is nil.

$ resty -e 'local a
ngx.say(type(a))'

And nil is also a data type in Lua. Having understood these two points, let's now look at the other issues derived from these two definitions.

ngx.null

The first issue is ngx.null. Because Lua's nil cannot be used as the value of a table, OpenResty introduces ngx.null as the null value in the table.

$ resty -e  'print(ngx.null)'
null
$ resty -e 'print(type(ngx.null))'
userdata

As you can see from the two pieces of code above, ngx.null is printed as null, and its type is userdata, so can it be treated as a false value? Of course not. The boolean value of ngx.null is true.

$ resty -e 'if ngx.null then
ngx.say("true")
end'

So, keep in mind that only nil and false are false values. If you forget this, it is easy to fall into a trap, for example, when you use lua-resty-redis and write a check like the following:

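local res, err = red:get("dog")
-- ngx.null is a true value, so this check does not filter out a missing key
if res then
    res = res .. "test"  -- raises an error when res is ngx.null
end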

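If the return value res is nil, the function call has failed; if res is ngx.null, the key dog does not exist in Redis. Because ngx.null is a true value, a plain truthiness check on res does not filter it out, and the code crashes when it goes on to treat res as a string while the key dog does not exist.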
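A safer pattern checks for failure first and then compares against ngx.null explicitly. Below is a minimal sketch, where get_string is a hypothetical helper and red is assumed to be a connected lua-resty-redis object:

```lua
-- Hypothetical helper; `red` is assumed to be a connected lua-resty-redis object
local function get_string(red, key)
    local res, err = red:get(key)
    if not res then
        -- the call itself failed
        return nil, "redis error: " .. (err or "unknown")
    end
    if res == ngx.null then
        -- the call succeeded, but the key does not exist
        return nil, "key not found"
    end
    return res
end
```

With this helper, the caller only ever sees a string or nil plus a reason, so an ordinary if not value check works again.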

cdata:NULL

The second issue is cdata:NULL. When you call a C function through the LuaJIT FFI interface, and the function returns a NULL pointer, then you will encounter another kind of null value, cdata:NULL.

$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
if cdata_null then
    ngx.say("true")
end'

Like ngx.null, cdata:NULL is also a true value. What's more puzzling is that the following code prints true, which means that cdata:NULL compares equal to nil.

$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
ngx.say(cdata_null == nil)'

So how should we handle ngx.null and cdata:NULL? Making the application layer deal with these details is not a good solution. It is better to add a wrapper layer that hides them from the caller.
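One possible shape for such a wrapper is sketched below. The helper name to_nil is hypothetical, and the sketch assumes LuaJIT (for the FFI) and the resty CLI:

```lua
local ffi = require "ffi"

-- Hypothetical helper: maps nil, ngx.null and a NULL cdata pointer to plain nil
local function to_nil(v)
    -- `v == nil` is true for nil and, under LuaJIT, for a NULL cdata pointer
    if v == nil or v == ngx.null then
        return nil
    end
    return v
end

ngx.say(to_nil(ngx.null) == nil)               -- true
ngx.say(to_nil(ffi.new("void*", nil)) == nil)  -- true
ngx.say(to_nil("dog"))                         -- dog
```

A library that applies this kind of normalization at its boundary lets callers keep writing ordinary if not value checks.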


cjson.null

Finally, let's look at the null value that appears in cjson. The cjson library decodes a JSON null into a Lua lightuserdata and represents it as cjson.null.

$ resty -e 'local cjson = require "cjson"
local data = cjson.encode(nil)
local decode_null = cjson.decode(data)
ngx.say(decode_null == cjson.null)'

Lua's nil becomes cjson.null after being encoded and decoded by JSON. As you can imagine, it is introduced for the same reason as ngx.null, because nil cannot be used as a value in a table.
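The same value shows up for null fields inside a decoded object. In this minimal sketch, the JSON document and its field names are made up for illustration:

```lua
local cjson = require "cjson"

local obj = cjson.decode('{"name":"dog","owner":null}')

-- The key exists and holds cjson.null, which is a true value
if obj.owner then
    ngx.say("owner is truthy")
end

-- Compare against cjson.null explicitly to detect a JSON null
if obj.owner == cjson.null then
    ngx.say("owner is JSON null")
end
```

Both branches run, which is exactly why a bare if obj.owner check is not enough to detect a missing owner.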

Are you confused by so many kinds of null values in OpenResty? Don't worry. Read this part a few more times and sort it out for yourself, and the confusion will clear up. Of course, from now on we should think twice about whether a check like if not foo then really does what we intend.

Summary

Today's article introduces you to the Lua APIs commonly used in OpenResty.

Finally, I'll leave you with a question: in the ngx.now example, why does the value returned by ngx.now not change when there is no yield operation? You are welcome to share your opinion in the comments and to share this article, so that we can communicate and improve together.