What makes OpenResty so special
API7.ai
October 14, 2022
In previous articles, you've learned about the two cornerstones of OpenResty: NGINX and LuaJIT, and I'm sure you're ready to start learning about the APIs OpenResty provides.
But don't be too hasty. Before you do, you need to spend a little more time familiarizing yourself with the principles and basic concepts of OpenResty.
Principles
OpenResty's Master and Worker processes each contain a LuaJIT VM, which is shared by all the coroutines within that process; the Lua code runs inside this VM.
At any given point in time, each Worker process can handle only one user's request, meaning only one coroutine is running. You may have a question: since NGINX can support C10K (tens of thousands of concurrent connections), doesn't it need to handle 10,000 requests simultaneously?
Of course not. NGINX uses epoll to drive events, reducing waiting and idling so that as many CPU resources as possible go to processing user requests. After all, the whole system achieves high performance only if individual requests are processed quickly enough. If a multi-threaded model were used instead, with one thread per request, C10K would quickly exhaust system resources.
At the OpenResty level, Lua's coroutines work in conjunction with NGINX's event mechanism. If an I/O operation such as querying a MySQL database occurs in the Lua code, it first calls the Lua coroutine's yield to suspend itself, then registers a callback with NGINX; after the I/O operation completes (or times out, or fails), the NGINX callback uses resume to wake the Lua coroutine. This cooperation between Lua coroutines and the NGINX event loop avoids having to write callbacks in the Lua code.
We can look at the following diagram, which describes the entire process. Both lua_yield and lua_resume are functions provided by Lua's C API.
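The cooperation described above can be imitated with plain Lua coroutines. Below is a heavily simplified toy model; the pending table and fake_io function are stand-ins I made up for NGINX's event queue and callback registration, not real OpenResty APIs.

```lua
-- A toy model of how OpenResty pairs Lua coroutines with an event loop.
local pending = {}  -- stand-in for NGINX's event/timer queue

local function fake_io()
    -- stand-in for registering a callback with NGINX
    -- (e.g. a timer or a socket event)
    table.insert(pending, coroutine.running())
    coroutine.yield()  -- suspend: hand control back to the "event loop"
end

local handler = coroutine.create(function()
    print("request A: start I/O")
    fake_io()
    print("request A: I/O finished, continue processing")
end)

-- The "event loop": run the handler until it yields...
coroutine.resume(handler)
-- ...and once the "I/O" completes, the callback resumes the coroutine.
for _, co in ipairs(pending) do
    coroutine.resume(co)
end
```

Note how the request handler reads top to bottom with no visible callbacks; that is exactly the programming model OpenResty's non-blocking APIs give you.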
On the other hand, if there are no I/O or sleep operations in the Lua code, for example when it performs only CPU-intensive encryption and decryption, the LuaJIT VM will be occupied by that Lua coroutine until the entire request has been processed.
To help you understand this more clearly, I've provided a snippet of the source code for ngx.sleep below. This code is located in ngx_http_lua_sleep.c, which you can find in the src directory of the lua-nginx-module project.
In ngx_http_lua_sleep.c, we can see the concrete implementation of the sleep function. The module first registers the C function ngx_http_lua_ngx_sleep as the Lua API ngx.sleep:
void ngx_http_lua_inject_sleep_api(lua_State *L)
{
    lua_pushcfunction(L, ngx_http_lua_ngx_sleep);
    lua_setfield(L, -2, "sleep");  /* exposes the C function as ngx.sleep */
}
The following is the main function of sleep; I have extracted only a few key lines here:
static int ngx_http_lua_ngx_sleep(lua_State *L)
{
    coctx->sleep.handler = ngx_http_lua_sleep_handler;
    ngx_add_timer(&coctx->sleep, (ngx_msec_t) delay);
    return lua_yield(L, 0);
}
As you can see:
- First, the callback function ngx_http_lua_sleep_handler is set.
- Then ngx_add_timer, an interface provided by NGINX, is called to add a timer to NGINX's event loop.
- Finally, lua_yield suspends the Lua coroutine and hands control back to the NGINX event loop.
The callback function ngx_http_lua_sleep_handler is triggered when the sleep operation completes. It calls ngx_http_lua_sleep_resume, which eventually wakes up the Lua coroutine using lua_resume. You can trace the details of these calls in the source code yourself, so I won't go into them here.
ngx.sleep is just the simplest example, but by dissecting it, you can see the basic principles of the lua-nginx-module.
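The same register-a-timer-and-yield mechanism is also exposed to Lua code directly through ngx.timer.at, which schedules a callback on NGINX's event loop. Here is a hedged sketch; the one-second delay and the log messages are just examples:

```lua
-- Typically placed in init_worker_by_lua_block.
local ok, err = ngx.timer.at(1, function(premature)
    if premature then
        return  -- the Worker is shutting down; don't start new work
    end
    ngx.log(ngx.INFO, "timer fired after 1 second")
end)
if not ok then
    ngx.log(ngx.ERR, "failed to create timer: ", err)
end
```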
Basic concepts
After analyzing the principles, let's refresh our memory of two important concepts in OpenResty: phases and non-blocking.
OpenResty, like NGINX, has the concept of phases, and each phase has its own distinct role:
- set_by_lua, which is used to set variables.
- rewrite_by_lua, for forwarding, redirecting, etc.
- access_by_lua, for access control, permissions, etc.
- content_by_lua, for generating the returned content.
- header_filter_by_lua, for response header filtering.
- body_filter_by_lua, for response body filtering.
- log_by_lua, for logging.
Of course, if the logic of your code is not too complex, you can execute it all in the rewrite or content phase.
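To make the phases concrete, here is a hedged sketch of how they map onto directives in nginx.conf; the location, variable names, and handler bodies are made up for the example:

```nginx
location /demo {
    set_by_lua_block $flag { return "1" }  # set_by_lua: set a variable

    access_by_lua_block {                  # access_by_lua: permission check
        if ngx.var.arg_token == nil then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }

    content_by_lua_block {                 # content_by_lua: build the response
        ngx.say("flag is ", ngx.var.flag)
    }

    log_by_lua_block {                     # log_by_lua: logging
        ngx.log(ngx.INFO, "served ", ngx.var.uri)
    }
}
```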
However, note that OpenResty's APIs have phase usage limits. Each API has a list of phases in which it can be used, and you will get an error if you call it outside that scope. This is very different from most other development environments.
As an example, I'll use ngx.sleep. The documentation shows that it can only be used in the following contexts, which do not include the log phase:
context: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*, ssl_certificate_by_lua*, ssl_session_fetch_by_lua*
And if you don't know this and use sleep in the log phase, where it isn't supported:
location / {
    log_by_lua_block {
        ngx.sleep(1)
    }
}
Then the NGINX error log contains an error-level message:
[error] 62666#0: *6 failed to run log_by_lua*: log_by_lua(nginx.conf:14):2: API disabled in the context of log_by_lua*
stack traceback:
[C]: in function 'sleep'
So, before you use the API, always remember to consult the documentation to determine if it can be used in the context of your code.
After reviewing the concept of phases, let's review non-blocking. First, let's be clear: all APIs provided by OpenResty are non-blocking.
I'll continue with the sleep-for-1-second requirement as an example. If you wanted to implement it in pure Lua, you would have to do something like this:
function sleep(s)
    local ntime = os.time() + s
    repeat until os.time() > ntime
end
Since standard Lua doesn't have a sleep function, I use a busy loop here that keeps checking whether the specified time has been reached. This implementation is blocking: during the second that sleep runs, Lua does nothing useful, while other requests that need to be processed simply wait.
However, if we switch to ngx.sleep(1), then, according to the source code we analyzed above, OpenResty can still process other requests (say, request B) during that second. The context of the current request (let's call it request A) is saved, the NGINX event mechanism later wakes the coroutine up, and processing returns to request A, so the CPU is always doing real work.
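Here is a minimal sketch of the non-blocking version in context; the /slow location is made up for illustration:

```nginx
location /slow {
    content_by_lua_block {
        -- yields the coroutine; the Worker keeps serving other requests
        ngx.sleep(1)
        ngx.say("done after 1 second, without blocking the Worker")
    }
}
```

While request A waits inside ngx.sleep, the same Worker can fully serve a concurrent request B.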
Variables and life cycle
In addition to these two important concepts, the lifecycle of variables is another area of OpenResty development where it's easy to make mistakes.
As I said before, in OpenResty I recommend that you declare all variables as local variables and use tools like luacheck and lua-releng to detect global variables. This applies to modules as well, for example:
local ngx_re = require "ngx.re"
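To make the local-versus-global pitfall concrete, here is a small hedged sketch; my_cache is a made-up name:

```lua
-- BAD: assigning without "local" creates a global variable;
-- luacheck will warn about setting a non-standard global here.
my_cache = {}

-- GOOD: scoped to this file/module.
local my_cache = {}
```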
In OpenResty, except for the two phases init_by_lua and init_worker_by_lua, every phase runs with an isolated table of global variables, to avoid contaminating other requests during processing. Even in the two phases where you can define global variables, you should try to avoid doing so.
As a rule, problems you might try to solve with global variables are better solved with module-level variables, and the result is much clearer. The following is an example of a variable in a module:
local _M = {}

_M.color = {
    red = 1,
    blue = 2,
    green = 3
}

return _M
I defined this module in a file called hello.lua, containing the table color, and then added the following configuration to nginx.conf:
location / {
    content_by_lua_block {
        local hello = require "hello"
        ngx.say(hello.color.green)
    }
}
This configuration requires the module in the content phase and prints the value of green as the HTTP response body.
You may wonder what makes module variables so useful.
A module is loaded only once within a given Worker process; after that, all requests handled by that Worker share the data in the module. We say that "global" data is suitable for encapsulating in modules because OpenResty's Workers are entirely isolated from each other: each Worker loads the module independently, and a module's data cannot cross Worker boundaries.
As for data that needs to be shared between Workers, I'll leave that for a later chapter, so you don't have to dig into it here.
However, there's one thing that can go wrong here: when accessing module variables, you'd better keep them read-only and not try to modify them. Otherwise, under high concurrency you'll get a race condition, a bug that unit tests can't detect, that occurs only occasionally in production, and that is hard to locate.
For example, suppose the current value of the module variable green is 3, and your code performs a plus-1 operation on it. Is the value of green now 4? Not necessarily; it could be 4, 5, or 6, because OpenResty does not take a lock when writing to a module variable. Multiple requests can therefore compete, and the module variable ends up being updated by several requests simultaneously.
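The race typically appears when a yielding operation sits between the read and the write. Here is a hedged sketch; counter.lua and the ngx.sleep call standing in for real I/O are my own illustration, not code from this article:

```lua
-- counter.lua: a module with a mutable variable (an anti-pattern)
local _M = { count = 0 }

function _M.incr()
    local n = _M.count  -- read the shared value
    ngx.sleep(0.001)    -- any yielding call: another request may run now
                        -- and update _M.count behind our back
    _M.count = n + 1    -- write back a possibly stale value: updates get lost
    return _M.count
end

return _M
```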
Having covered global, local, and module variables, let's discuss cross-phase variables.
There are situations where we need variables that span phases and can be both read and written. Variables like $host and $scheme, familiar from NGINX, satisfy the cross-phase condition but can't be created dynamically; you have to define them in the configuration file before you can use them. For example, you would have to write something like the following:
location /foo {
    set $my_var '';  # we need to create the $my_var variable first
    content_by_lua_block {
        ngx.var.my_var = 123
    }
}
OpenResty provides ngx.ctx to solve this kind of problem. It is a Lua table that can be used to store per-request Lua data, with the same lifetime as the current request. Let's look at this example from the official documentation:
location /test {
    rewrite_by_lua_block {
        ngx.ctx.foo = 76
    }
    access_by_lua_block {
        ngx.ctx.foo = ngx.ctx.foo + 3
    }
    content_by_lua_block {
        ngx.say(ngx.ctx.foo)
    }
}
You can see that we store a variable foo in ngx.ctx. This variable spans the rewrite, access, and content phases, and its value is finally printed in the content phase: 79, as expected.
Of course, ngx.ctx has its limitations.
For example, subrequests created with ngx.location.capture have their own separate ngx.ctx data, independent of the parent request's ngx.ctx.
Likewise, internal redirects created with ngx.exec destroy the original request's ngx.ctx and start over with a blank ngx.ctx.
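To see the subrequest limitation concretely, here is a hedged sketch along the lines of the official documentation's example; the /main and /sub locations are made up:

```nginx
location /main {
    content_by_lua_block {
        ngx.ctx.blah = 73
        local res = ngx.location.capture("/sub")
        -- the subrequest saw its own, empty ngx.ctx:
        ngx.say("subrequest saw: ", res.body)
        -- the parent request's ngx.ctx is untouched:
        ngx.say("parent still has: ", ngx.ctx.blah)
    }
}

location /sub {
    content_by_lua_block {
        ngx.say(tostring(ngx.ctx.blah))
    }
}
```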
Both of these limitations have detailed code examples in the official documentation, so you can check them yourself if you are interested.
Summary
Finally, a few more words. We've covered the principles of OpenResty and a few important concepts, but you don't need to memorize them; they make sense and come alive only when combined with real-world requirements and code.
How do you understand these concepts? Feel free to leave a comment and discuss with me, and you're welcome to share this article with your colleagues and friends so that we can all make progress together.