What makes OpenResty so special

API7.ai

October 14, 2022

OpenResty (NGINX + Lua)

In previous articles, you've learned about the two cornerstones of OpenResty: NGINX and LuaJIT, and I'm sure you're ready to start learning about the APIs OpenResty provides.

But don't be too hasty. Before you do, you need to spend a little more time familiarizing yourself with the principles and basic concepts of OpenResty.

Principles

Diagram1

OpenResty's Master and Worker processes each contain a LuaJIT VM. All coroutines within the same process share that VM, and Lua code runs inside it.

At any given moment, each Worker process can handle only one request, which means that only one coroutine is running. You may have a question: since NGINX can support C10K (ten thousand concurrent connections), doesn't it need to handle 10,000 requests simultaneously?

Of course not. NGINX uses epoll to drive events to reduce waiting and idling so that as many CPU resources as possible can be used to process user requests. After all, the whole thing achieves high performance only when individual requests are processed quickly enough. If a multi-threaded mode is used so that one request corresponds to one thread, then with C10K, resources can easily be exhausted.

At the OpenResty level, Lua's coroutines work in conjunction with NGINX's event mechanism. If an I/O operation such as querying a MySQL database occurs in Lua code, the code first calls the Lua coroutine's yield to suspend itself and then registers a callback in NGINX; after the I/O operation completes (or ends in a timeout or an error), the NGINX callback calls resume to wake up the Lua coroutine. This completes the cooperation between Lua coroutines and the NGINX event loop, and avoids writing callbacks in the Lua code.
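The yield-then-resume cooperation described above can be imitated in plain Lua. This is only a minimal sketch, not how lua-nginx-module is implemented: the `pending` list stands in for NGINX's event loop, and `query` and `handle_request` are hypothetical names invented for the illustration.

```lua
-- Toy event loop: callbacks that the "I/O layer" will fire later.
local pending = {}

-- Hypothetical non-blocking query: register a callback, then yield.
local function query(sql)
    local co = coroutine.running()
    pending[#pending + 1] = function()
        -- "I/O finished": resume the coroutine and hand it the result.
        assert(coroutine.resume(co, "rows for: " .. sql))
    end
    return coroutine.yield()   -- suspend until the callback resumes us
end

local results = {}

-- Request handler written in straight-line style, with no callbacks.
local function handle_request(id)
    local rows = query("SELECT " .. id)
    results[#results + 1] = id .. " -> " .. rows
end

-- Start two requests; each one runs until its first yield.
for id = 1, 2 do
    local co = coroutine.create(handle_request)
    assert(coroutine.resume(co, id))
end

-- Drain the event loop: fire the callbacks in registration order.
for _, callback in ipairs(pending) do
    callback()
end

for _, line in ipairs(results) do
    print(line)
end
```

Note that `handle_request` reads like blocking code even though control returns to the loop at every `query` call, which is the point of pairing coroutines with an event loop.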

We can look at the following diagram, which describes the entire process. Both lua_yield and lua_resume are part of the C API provided by Lua.

Diagram2

On the other hand, if there are no I/O or sleep operations in the Lua code, for example when it performs only CPU-intensive encryption and decryption, then the LuaJIT VM will be occupied by that Lua coroutine until the entire request is processed.

I've provided a snippet of source code for ngx.sleep below to help you understand this more clearly. This code is located in ngx_http_lua_sleep.c, which you can find in the src directory of the lua-nginx-module project.

In ngx_http_lua_sleep.c, we can see the concrete implementation of the sleep function. First, the Lua API ngx.sleep is registered with the C function ngx_http_lua_ngx_sleep.

void ngx_http_lua_inject_sleep_api(lua_State *L)
{
    lua_pushcfunction(L, ngx_http_lua_ngx_sleep);
    lua_setfield(L, -2, "sleep");
}

The following is the main function of sleep, and I have extracted only a few lines of the main code here.

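static int ngx_http_lua_ngx_sleep(lua_State *L)
{
    /* ... argument parsing and context lookup omitted ... */
    /* register the callback to run when the timer fires */
    coctx->sleep.handler = ngx_http_lua_sleep_handler;
    /* add the timer to NGINX's event loop */
    ngx_add_timer(&coctx->sleep, (ngx_msec_t) delay);
    /* suspend the current Lua coroutine */
    return lua_yield(L, 0);
}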

As you can see:

  • First, the callback function ngx_http_lua_sleep_handler is registered.
  • Then ngx_add_timer, an interface provided by NGINX, is called to add a timer to NGINX's event loop.
  • Finally, lua_yield suspends the Lua coroutine, giving control back to the NGINX event loop.

The ngx_http_lua_sleep_handler callback function is triggered when the sleep operation is complete. It calls ngx_http_lua_sleep_resume and eventually wakes up the Lua coroutine using lua_resume. You can trace the details of the calls in the code yourself, so I won't go into them here.

ngx.sleep is just the simplest example, but by dissecting it, you can see the basic principles of lua-nginx-module.

Basic concepts

After analyzing the principles, let's refresh our memory and recall two important OpenResty concepts: phases and non-blocking.

OpenResty, like NGINX, has the concept of phases, and each phase has its own distinct role:

  • set_by_lua, which is used to set variables.
  • rewrite_by_lua, for forwarding, redirecting, etc.
  • access_by_lua, for access, permissions, etc.
  • content_by_lua, for generating return content.
  • header_filter_by_lua, for response header filter processing.
  • body_filter_by_lua, for response body filtering.
  • log_by_lua, for logging.

Of course, if the logic of your code is not too complex, it is possible to execute it all in the rewrite or content phase.

However, note that OpenResty's APIs have phase usage limits. Each API has a list of phases in which it can be used, and you will get an error if you use it outside that list. This is very different from other programming languages.

As an example, I'll use ngx.sleep. From the documentation, I know that it can only be used in the following contexts and does not include the log phase.

context: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*, ssl_certificate_by_lua*, ssl_session_fetch_by_lua*

Suppose you don't know this and use sleep in the log phase, which doesn't support it:

location / {
    log_by_lua_block {
        ngx.sleep(1)
    }
}

The NGINX error log will then contain an error-level message.

[error] 62666#0: *6 failed to run log_by_lua*: log_by_lua(nginx.conf:14):2: API disabled in the context of log_by_lua*
stack traceback:
    [C]: in function 'sleep'

So, before you use the API, always remember to consult the documentation to determine if it can be used in the context of your code.
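When an API is not available in the phase at hand, one common workaround is to hand the work to a context where it is allowed. The sketch below assumes, per the lua-nginx-module documentation, that ngx.timer.at may be called from log_by_lua* and that ngx.sleep is permitted inside the timer callback (the ngx.timer.* context listed above). The function name `deferred` and the log messages are made up for the illustration.

```nginx
location / {
    log_by_lua_block {
        -- ngx.sleep is not allowed in this phase, but creating a timer is.
        local function deferred(premature)
            if premature then
                return  -- the worker is shutting down
            end
            ngx.sleep(1)  -- allowed: this runs in the ngx.timer.* context
            ngx.log(ngx.NOTICE, "deferred work done")
        end

        local ok, err = ngx.timer.at(0, deferred)
        if not ok then
            ngx.log(ngx.ERR, "failed to create timer: ", err)
        end
    }
}
```

The timer callback runs detached from the original request, so anything it needs from the request should be passed in as extra arguments to ngx.timer.at.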

After reviewing the concept of phases, let's review non-blocking. First, let's clarify that all APIs provided by OpenResty are non-blocking.

I'll continue with the 1-second sleep as an example. If you want to implement it in plain Lua, you have to do something like this.

local function sleep(s)
    local ntime = os.time() + s
    -- busy-wait: spin until the target time has passed
    repeat until os.time() > ntime
end

Since standard Lua doesn't have a sleep function, I use a loop here to keep checking whether the specified time has been reached. This implementation is blocking: during the second that sleep is running, the Worker just spins in the loop without doing any useful work, while other requests that need to be processed wait around.

However, if we switch to ngx.sleep(1), then according to the source code we analyzed above, OpenResty can still process other requests (say, request B) during this second. The context of the current request (let's call it request A) is saved; when the timer fires, the NGINX event mechanism wakes it up and execution returns to request A, so the CPU is always doing useful work.
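To make the difference concrete, here is a small plain-Lua simulation of two requests that each sleep for one second. It is only a sketch under stated assumptions: `now` is a virtual clock rather than real time, the `timers` list stands in for NGINX's timer handling, and the local `sleep` is a stand-in for ngx.sleep, not the real API.

```lua
local now = 0          -- virtual clock, in seconds
local timers = {}      -- entries of the form { at = wake-up time, co = coroutine }
local events = {}

-- Stand-in for ngx.sleep: register a timer, then yield to the scheduler.
local function sleep(seconds)
    timers[#timers + 1] = { at = now + seconds, co = coroutine.running() }
    coroutine.yield()
end

local function request(name)
    events[#events + 1] = name .. " start at " .. now
    sleep(1)
    events[#events + 1] = name .. " done at " .. now
end

-- Remove and return the timer that fires first (earliest registered wins ties).
local function pop_earliest()
    local best = 1
    for i = 2, #timers do
        if timers[i].at < timers[best].at then
            best = i
        end
    end
    return table.remove(timers, best)
end

-- Both requests start at time 0 and yield as soon as they call sleep.
for _, name in ipairs({ "A", "B" }) do
    assert(coroutine.resume(coroutine.create(request), name))
end

-- Scheduler: advance the clock to each timer and resume its coroutine.
while #timers > 0 do
    local timer = pop_earliest()
    now = timer.at
    assert(coroutine.resume(timer.co))
end

for _, line in ipairs(events) do
    print(line)
end
```

Both requests finish at virtual time 1, because request B starts while request A is suspended. With the busy-wait version, B could not even start until A had finished its loop.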

Variables and life cycle

In addition to these two important concepts, the lifecycle of variables is another area of OpenResty development that is easy to get wrong.

As I said before, in OpenResty, I recommend that you declare all variables as local variables and use tools like luacheck and lua-releng to detect global variables. The same applies to modules, as in the following.

local ngx_re = require "ngx.re"
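As a reminder of why the `local` keyword matters, the following plain-Lua sketch shows how a single missing `local` leaks a variable into the global table. It runs outside OpenResty and says nothing about OpenResty's own handling of globals; the names `handler`, `leaked`, and `scoped` are made up for the illustration.

```lua
local function handler()
    leaked = "visible everywhere"   -- no "local": stored in the global table
    local scoped = "handler only"   -- confined to this function
    return scoped
end

handler()

print(rawget(_G, "leaked"))   -- the accidental global survives the call
print(rawget(_G, "scoped"))   -- nil: the local never reached the global table
```

A static checker such as luacheck reports the assignment to `leaked` as setting a non-standard global, which is how this kind of slip is usually caught before it reaches production.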

In OpenResty, except for the two phases init_by_lua and init_worker_by_lua, an isolated table of global variables is set for all phases to avoid contaminating other requests during processing. Even in these two phases where you can define global variables, you should try to avoid doing so.

As a rule, problems you might be tempted to solve with global variables are better solved with variables in modules, and the result is much clearer. The following is an example of a variable in a module.

local _M = {}

_M.color = {
    red = 1,
    blue = 2,
    green = 3
}

return _M

I defined a module in a file called hello.lua, which contains the table color, and then I added the following configuration to nginx.conf.

location / {
    content_by_lua_block {
        local hello = require "hello"
        ngx.say(hello.color.green)
    }
}

This configuration will require the module in the content phase and print the value of green as the HTTP response body.

You may wonder what makes module variables so special.

The module will only be loaded once in the same Worker process; after that, all requests handled by that Worker share the data in the module. We say that "global" data is suitable for encapsulating in modules because OpenResty's Workers are entirely isolated from each other, so each Worker loads the module independently, and the module's data cannot cross Workers.
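The load-once behavior comes from Lua's own `require`, which caches each module in `package.loaded`. The sketch below shows this in plain Lua within a single process. It registers the module body through `package.preload` so that no hello.lua file is needed, and `load_count` is a counter added purely for the demonstration.

```lua
local load_count = 0

-- Register the module body under the name "hello" without needing a file.
package.preload["hello"] = function()
    load_count = load_count + 1
    local _M = {}
    _M.color = { red = 1, blue = 2, green = 3 }
    return _M
end

local first = require "hello"    -- runs the loader
local second = require "hello"   -- served from package.loaded

print(load_count)                -- the loader ran only once
print(first == second)           -- both names refer to the same table
```

Because every `require "hello"` in the same Lua VM returns the same table, anything stored in it is visible to all code running in that VM, which in OpenResty means all requests handled by one Worker.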

As for handling the data that needs to be shared between Workers, I'll leave that for a later chapter, so you don't have to dig into it here.

However, there's one thing that can go wrong here: when you access module variables, you'd better keep them read-only and not try to modify them. Otherwise you'll get a race under high concurrency, a bug that unit tests can't detect, that occurs only occasionally in production, and that is hard to locate.

For example, suppose the current value of the module variable green is 3 and your code adds 1 to it. Is the value of green now 4? Not necessarily; it could be 4, 5, or 6, because OpenResty does not lock writes to module variables. Every request handled by the same Worker shares the variable, so multiple requests can update it concurrently and race with one another.
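One way such a race can arise is when a request reads the value, yields for I/O, and writes the result back afterwards. The following plain-Lua sketch reproduces that interleaving by hand. It assumes the read and the write are separated by a yield, uses a local `counter` table as a stand-in for the module variable, and drives two coroutines manually in place of two concurrent requests.

```lua
local counter = { green = 3 }   -- stands in for a module-level table

-- Read, yield (as a non-blocking I/O call would), then write back.
local function increment()
    local seen = counter.green
    coroutine.yield()           -- another request may run here
    counter.green = seen + 1
end

local a = coroutine.create(increment)
local b = coroutine.create(increment)

assert(coroutine.resume(a))   -- A reads 3, then yields
assert(coroutine.resume(b))   -- B reads 3, then yields
assert(coroutine.resume(a))   -- A writes 4
assert(coroutine.resume(b))   -- B also writes 4: A's update is lost

print(counter.green)
```

Two increments ran, yet the counter moved by only one. Nothing crashed and no error was logged, which is why this kind of bug tends to slip past unit tests.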

Now that we've covered global, local, and module variables, let's discuss cross-phase variables.

There are situations where we need variables that span phases and can be read and written. NGINX variables such as $host and $scheme, which are familiar to us, do satisfy the cross-phase condition, but they can't be created dynamically; you have to define them in the configuration file before you can use them, as in the following example.

location /foo {
    set $my_var ''; # need to create $my_var variable first
    content_by_lua_block {
        ngx.var.my_var = 123
    }
}

OpenResty provides ngx.ctx to solve this kind of problem. It is a Lua table that can be used to store request-based Lua data with the same lifetime as the current request. Let's look at this example from the official documentation.

location /test {
    rewrite_by_lua_block {
        ngx.ctx.foo = 76
    }
    access_by_lua_block {
        ngx.ctx.foo = ngx.ctx.foo + 3
    }
    content_by_lua_block {
        ngx.say(ngx.ctx.foo)
    }
}

You can see that we have defined a variable foo that is stored in ngx.ctx. This variable spans the rewrite, access, and content phases and finally prints the value in the content phase, which is 79 as we expected.

Of course, ngx.ctx has its limitations.

For example, subrequests created with ngx.location.capture have their own separate ngx.ctx data, independent of the parent request's ngx.ctx.

Then again, internal redirects created with ngx.exec destroy the original request's ngx.ctx and regenerate it with a blank ngx.ctx.

Both of these limitations have detailed code examples in the official documentation, so you can check them yourself if you are interested.
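For convenience, here is a sketch of both limitations, modeled on the examples in the lua-nginx-module documentation. The location names and the keys `blah` and `foo` are only illustrative.

```nginx
location /sub {
    content_by_lua_block {
        ngx.say("sub pre: ", ngx.ctx.blah)
        ngx.ctx.blah = 32
        ngx.say("sub post: ", ngx.ctx.blah)
    }
}

location /main {
    content_by_lua_block {
        ngx.ctx.blah = 73
        ngx.say("main pre: ", ngx.ctx.blah)
        -- the subrequest gets its own, initially empty ngx.ctx
        local res = ngx.location.capture("/sub")
        ngx.print(res.body)
        ngx.say("main post: ", ngx.ctx.blah)
    }
}

location /new {
    content_by_lua_block {
        ngx.say(ngx.ctx.foo)
    }
}

location /orig {
    content_by_lua_block {
        ngx.ctx.foo = "hello"
        -- the internal redirect starts over with a blank ngx.ctx
        ngx.exec("/new")
    }
}
```

Requesting /main prints "main pre: 73", "sub pre: nil", "sub post: 32", and "main post: 73", showing that neither side sees the other's ngx.ctx. Requesting /orig prints nil rather than "hello".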

Summary

Finally, a few more words. We have covered the principles of OpenResty and a few important concepts, but you don't need to memorize them. After all, they only really make sense and come alive when combined with real-world requirements and code.

How do you understand these concepts? You are welcome to leave a comment and discuss them with me, and to share this article with your colleagues and friends, so that we can communicate and make progress together.