Introduction of Common APIs in OpenResty
API7.ai
November 4, 2022
In the previous articles, you became familiar with many important Lua APIs in OpenResty. Today, we will learn about some other general-purpose APIs, mainly related to regular expressions, time, processes, and so on.
Regular Expressions-related APIs
Let's start with the most commonly used and most important ones: regular expressions. In OpenResty, we should use the APIs provided by ngx.re.* to handle regular-expression logic instead of Lua's own pattern matching. This is not only for performance reasons but also because Lua patterns are a dialect of their own and do not follow the PCRE syntax, which is annoying for most developers.
In the previous articles, you have already come across some of the ngx.re.* APIs, and their documentation is very detailed, so I won't list them all again. Here, I will introduce just the following two items.
ngx.re.split
The first one is ngx.re.split. Splitting strings is a very common operation, and OpenResty provides an API for it, but many developers can't find it and end up implementing it themselves.
Why? The ngx.re.split API is not in lua-nginx-module but in lua-resty-core; moreover, it is not documented on the lua-resty-core home page but in lua-resty-core/lib/ngx/re.md, three directory levels down. As a result, many developers are completely unaware that this API exists.
Similarly, APIs that are hard to discover include ngx_resp.add_header, enable_privileged_agent, etc., which we mentioned earlier. So how do we quickly solve this problem? In addition to reading the lua-resty-core home page documentation, you need to read through the *.md documentation in the lua-resty-core/lib/ngx/ directory as well.
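As a minimal sketch of what a call looks like, the snippet below splits a comma-separated string with ngx.re.split. The input string and variable names are illustrative only; the module has to be loaded with require because it ships with lua-resty-core.

```lua
-- ngx.re.split lives in lua-resty-core, so it must be required explicitly
local ngx_re = require "ngx.re"

-- Split a comma-separated string; the second argument is a PCRE regex
local parts, err = ngx_re.split("a,b,c,d", ",")
if not parts then
    ngx.log(ngx.ERR, "split failed: ", err)
    return
end

for i, part in ipairs(parts) do
    ngx.say(i, ": ", part)
end
```

Run under resty, this prints the four fields one per line, each prefixed with its index.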
lua_regex_match_limit
Second, I want to introduce lua_regex_match_limit. We haven't talked about the NGINX directives provided by OpenResty before because, in most cases, the default values are sufficient, and there is no need to change them. The exception is the lua_regex_match_limit directive, which is related to regular expressions.
We know that if we use a regular expression engine implemented as a backtracking NFA, there is a risk of catastrophic backtracking: the engine backtracks so much during matching that the CPU reaches 100% and the service is blocked.
Once catastrophic backtracking occurs, we need to use gdb to analyze a dump, or systemtap to analyze the live environment, to locate it. Unfortunately, it isn't easy to detect beforehand because only particular requests trigger it. Attackers can take advantage of this, and ReDoS (Regular expression Denial of Service) refers to this type of attack.
Here, I mainly want to show you how to use the following line of configuration in OpenResty to avoid the above problem simply and effectively:
lua_regex_match_limit 100000;
lua_regex_match_limit limits the amount of backtracking performed by the PCRE regular expression engine. This way, even if catastrophic backtracking occurs, the consequences are bounded and will not saturate your CPU.
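On the Lua side, hitting the limit surfaces as an error return from the ngx.re.* functions, so the calling code should check for it. The following is only a sketch: the pattern and input are a textbook backtracking example chosen for illustration, the directive is assumed to be set in the http block, and the exact error string depends on the PCRE build.

```lua
-- A pattern prone to catastrophic backtracking, and an input that triggers it
local regex = [[^(a+)+$]]
local subject = string.rep("a", 40) .. "!"

local m, err = ngx.re.match(subject, regex)
if err then
    -- With a match limit in place, the engine is expected to give up
    -- and report an error here rather than keep the CPU busy
    ngx.say("regex aborted: ", err)
elseif not m then
    ngx.say("no match")
else
    ngx.say("matched: ", m[0])
end
```

Either way, the subject can never match this pattern, so m stays nil; what the limit changes is how quickly the call returns.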
Time-related APIs
The most commonly used time API is ngx.now, which returns the current timestamp. The following line of code prints it:
$ resty -e 'ngx.say(ngx.now())'
As you can see from the printed result, ngx.now includes the fractional part, so it is more precise. The related ngx.time API returns only the integer part. The others, ngx.localtime, ngx.utctime, ngx.cookie_time, and ngx.http_time, are mainly used to return and process time in different formats. They are not difficult to understand, so if you want to use them, you can check the documentation; I won't go through them here.
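For a quick side-by-side look, here is a small sketch that calls each of these functions once. The labels and variable names are just for illustration; the comments summarize the formats as I understand them from the lua-nginx-module documentation.

```lua
local now = ngx.now()        -- seconds with a fractional (millisecond) part
local secs = ngx.time()      -- whole seconds only

ngx.say("ngx.now:         ", now)
ngx.say("ngx.time:        ", secs)
ngx.say("ngx.localtime:   ", ngx.localtime())        -- "yyyy-mm-dd hh:mm:ss", local time zone
ngx.say("ngx.utctime:     ", ngx.utctime())          -- same layout, in UTC
ngx.say("ngx.http_time:   ", ngx.http_time(secs))    -- formatted for HTTP headers
ngx.say("ngx.cookie_time: ", ngx.cookie_time(secs))  -- formatted for cookie expiry
```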
However, it is worth mentioning that the APIs that return the current time always return a cached value, rather than the real current time as we might expect, unless a non-blocking network I/O operation has been triggered in between. Take a look at the following sample code:
$ resty -e 'ngx.say(ngx.now())
os.execute("sleep 1")
ngx.say(ngx.now())'
Between the two calls to ngx.now, we used Lua's blocking os.execute to sleep for 1 second, but the printed results show that the timestamp returned is the same both times.
So, what if we replace it with a non-blocking sleep function? For example, the following new code:
$ resty -e 'ngx.say(ngx.now())
ngx.sleep(1)
ngx.say(ngx.now())'
This time, it prints two different timestamps. This leads us to ngx.sleep, a non-blocking sleep function. In addition to sleeping for a specified amount of time, this function has another special purpose.
For example, suppose you have a piece of code that performs intensive, time-consuming calculations. The request running this code keeps occupying the worker and the CPU during this time, causing other requests to queue up without a timely response. In this case, we can intersperse calls to ngx.sleep(0) so that the code gives up control and other requests can also be processed.
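A rough sketch of this idea follows. The function sum_squares and its yield_every parameter are hypothetical stand-ins for whatever heavy computation you actually have; the only OpenResty-specific part is the call to ngx.sleep(0).

```lua
-- Hypothetical CPU-heavy task: sum the squares of 1..n
local function sum_squares(n, yield_every)
    local total = 0
    for i = 1, n do
        total = total + i * i
        if i % yield_every == 0 then
            -- Give control back to the event loop so other requests can run
            ngx.sleep(0)
        end
    end
    return total
end

ngx.say(sum_squares(100000, 10000))
```

How often to yield is a trade-off: yielding too rarely keeps other requests waiting, while yielding on every iteration adds scheduling overhead.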
Worker and Process APIs
OpenResty provides the ngx.worker.* and ngx.process.* APIs to obtain information about workers and processes. The former relates to NGINX worker processes, while the latter covers NGINX processes in general: not only worker processes but also the master process, the privileged agent process, and so on.
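As a small illustration, the sketch below prints a few values from both API families. It uses only functions I am confident exist (ngx.worker.pid, ngx.worker.count, ngx.worker.id, ngx.worker.exiting, and type from the ngx.process module); the values you see depend on how and where the code runs.

```lua
-- ngx.process is provided by lua-resty-core and must be required
local process = require "ngx.process"

ngx.say("process type: ", process.type())
ngx.say("worker pid:   ", ngx.worker.pid())
ngx.say("worker count: ", ngx.worker.count())
-- tostring() guards against a nil id when not running in a regular worker
ngx.say("worker id:    ", tostring(ngx.worker.id()))
ngx.say("exiting:      ", tostring(ngx.worker.exiting()))
```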
The problem of true and null values
Finally, let's look at the issue of true and null values. In OpenResty, determining whether a value is true or null has always been a troublesome and confusing point.
Let's look at the definition of a true value in Lua: every value except nil and false is true.
So, true values also include 0, the empty string, the empty table, and so on.
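This is easy to verify with plain Lua. The helper is_true below is a hypothetical name introduced only for this demonstration; it simply reports how a value behaves in a condition.

```lua
-- Returns true when Lua treats the value as true in a condition
local function is_true(v)
    if v then
        return true
    end
    return false
end

print(is_true(0))      -- true
print(is_true(""))     -- true
print(is_true({}))     -- true
print(is_true(false))  -- false
print(is_true(nil))    -- false
```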
Now let's look at nil in Lua, which means undefined. For example, if you declare a variable but haven't initialized it, its value is nil.
$ resty -e 'local a
ngx.say(type(a))'
nil is also a data type in Lua. Having understood these two points, let's now look at the other issues derived from these two definitions.
ngx.null
The first issue is ngx.null. Because Lua's nil cannot be stored as a value in a table, OpenResty introduces ngx.null to represent a null value in a table.
$ resty -e 'print(ngx.null)'
null
$ resty -e 'print(type(ngx.null))'
userdata
As you can see from the two pieces of code above, ngx.null is printed as null, and its type is userdata. So can it be treated as a false value? Of course not. The boolean value of ngx.null is true.
$ resty -e 'if ngx.null then
ngx.say("true")
end'
So, keep in mind that only nil and false are false values. If you overlook this point, it is easy to fall into a pitfall, for example, when you use lua-resty-redis and make the following judgment:
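local res, err = red:get("dog")
if not res then
    -- res is nil: the call itself failed
    return nil, err
end
-- Pitfall: a missing key yields ngx.null, which passes the check above,
-- so concatenating it with a string raises an error
res = res .. "test"
If the return value res is nil, the function call has failed; if res is ngx.null, the key dog does not exist in Redis. Since ngx.null is a true value, the if not res check does not catch it, and the code crashes when the key dog does not exist.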
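One way to avoid this pitfall is to test for ngx.null explicitly. The sketch below assumes red is an already connected lua-resty-redis object; get_value and its error strings are hypothetical names chosen for illustration.

```lua
-- Hypothetical helper; `red` is assumed to be a connected lua-resty-redis object
local function get_value(red, key)
    local res, err = red:get(key)
    if not res then
        -- nil: the call itself failed (network error, timeout, ...)
        return nil, "redis error: " .. tostring(err)
    end
    if res == ngx.null then
        -- ngx.null: the call succeeded, but the key does not exist
        return nil, "not found"
    end
    return res
end
```

The caller then only has to deal with the usual Lua convention of a value, or nil plus an error message.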
cdata:NULL
The second issue is cdata:NULL. When you call a C function through the LuaJIT FFI and the function returns a NULL pointer, you will encounter another kind of null value, cdata:NULL.
$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
if cdata_null then
ngx.say("true")
end'
Like ngx.null, cdata:NULL is also true. But what's more puzzling is that the following code prints true, which means that cdata:NULL compares equal to nil.
$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
ngx.say(cdata_null == nil)'
So how should we handle ngx.null and cdata:NULL? Making the application layer deal with these troubles is not a good solution. It's better to add a second-level wrapper so that the caller does not need to know these details.
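As a minimal sketch of such a wrapper, the hypothetical helper to_nil below maps the null-like values discussed so far to a plain nil, so that an ordinary if not value then check works for the caller. It relies on the LuaJIT behavior shown above, where a NULL pointer compares equal to nil.

```lua
-- Hypothetical helper: map every "null-like" value to plain nil
local function to_nil(v)
    -- `v == nil` is true for nil and, under LuaJIT, for a cdata NULL pointer;
    -- ngx.null is a userdata and needs its own comparison
    if v == nil or v == ngx.null then
        return nil
    end
    return v
end

local ffi = require "ffi"
ngx.say(to_nil(ngx.null) == nil)               -- true
ngx.say(to_nil(ffi.new("void*", nil)) == nil)  -- true
ngx.say(to_nil("dog"))                         -- dog
```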
cjson.null
Finally, let's look at the null value that appears in cjson. The cjson library decodes a JSON null into a Lua lightuserdata, which is represented by cjson.null.
$ resty -e 'local cjson = require "cjson"
local data = cjson.encode(nil)
local decode_null = cjson.decode(data)
ngx.say(decode_null == cjson.null)'
Lua's nil becomes cjson.null after being encoded to JSON and decoded again. As you can imagine, it is introduced for the same reason as ngx.null: nil cannot be used as a value in a table.
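The same applies to a null that sits inside a JSON object, which is the case you are more likely to meet in practice. In the sketch below, the JSON text and field names are made up for illustration.

```lua
local cjson = require "cjson"

-- A JSON null inside an object decodes to cjson.null, not to Lua nil
local obj = cjson.decode('{"name": null, "age": 3}')

ngx.say(obj.name == cjson.null)  -- true
ngx.say(obj.name == nil)         -- false
ngx.say(obj.missing == nil)      -- true: a key that is absent altogether

-- A field is usable only if it is neither absent nor JSON null
if obj.name ~= nil and obj.name ~= cjson.null then
    ngx.say("name: ", obj.name)
end
```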
So far, have you been confused by so many kinds of null values in OpenResty? Don't worry. Read this part a few more times and sort it out for yourself, and you won't be confused. Of course, in the future, we need to think more carefully about whether a check like if not foo then really works when we write it.
Summary
Today's article introduces you to the Lua APIs commonly used in OpenResty.
Finally, I'll leave you with a question: in the ngx.now example, why is the value of ngx.now not updated when there is no yield operation? You are welcome to share your opinion in the comments, and to share this article so that we can communicate and improve together.