Knowledge of NGINX Used in OpenResty

From the previous post, you now have a general understanding of OpenResty. In the next few articles, I'll walk you through OpenResty's two cornerstones, NGINX and LuaJIT. Mastering these basics will help you learn OpenResty better.

Today I'll start with NGINX. I'll only cover the NGINX basics you are likely to use in OpenResty, which are a tiny subset of NGINX as a whole.

When it comes to configuration in OpenResty development, keep the following points in mind:

Keep nginx.conf as small as possible.
Avoid combining multiple directives such as if, set, and rewrite.
Don't use NGINX configuration, variables, or modules for anything that Lua code can handle.

Following these rules maximizes readability, maintainability, and extensibility. The NGINX configuration below is a typical bad example of using configuration as code:

location ~ ^/mobile/(web/app.htm) {
    set $type $1;
    set $orig_args $args;
    if ( $http_user_agent ~ "(iPhone|iPad|Android)" ) {
        rewrite  ^/mobile/(.*) http://touch.foo.com/mobile/$1 last;
    }
    proxy_pass http://foo.com/$type?$orig_args;
}

This is what we need to avoid when developing with OpenResty.
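For comparison, here is a minimal sketch of the same logic with the decision moved into Lua, using rewrite_by_lua_block (the block form of rewrite_by_lua). It reuses the placeholder hosts foo.com and touch.foo.com from above and assumes a resolver directive (or an upstream block) is configured, because proxy_pass with variables resolves the host name at runtime. Treat it as an illustration rather than a drop-in replacement.

```nginx
location ~ ^/mobile/(web/app\.htm) {
    # keep the captured path in $type before any other regex could overwrite $1
    set $type $1;

    rewrite_by_lua_block {
        -- the User-Agent header may be absent
        local ua = ngx.var.http_user_agent or ""

        -- "jo": use PCRE JIT and cache the compiled regex
        if ngx.re.find(ua, [[iPhone|iPad|Android]], "jo") then
            -- send mobile clients to the touch site, keeping path and query string
            local target = "http://touch.foo.com" .. ngx.var.uri
            if ngx.var.args then
                target = target .. "?" .. ngx.var.args
            end
            return ngx.redirect(target)  -- 302 by default
        end
    }

    # assumes a resolver is configured, since the URL contains variables
    proxy_pass http://foo.com/$type$is_args$args;
}
```

Because the branching now lives in ordinary Lua code, adding another rule later means editing a function rather than stacking more if and rewrite directives.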

NGINX Configuration

NGINX's behavior is controlled by configuration files, which can be thought of as a simple DSL. NGINX reads the configuration at startup and loads it into memory. If you modify the configuration file, you must restart or reload NGINX; the new configuration takes effect only after NGINX has read the file again. Among NGINX editions, only the commercial NGINX Plus offers some of this capability dynamically at runtime, through its API.

Let's start with the following simple configuration:

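worker_processes auto;            # one Worker per CPU core

pid logs/nginx.pid;
error_log logs/error.log notice;

worker_rlimit_nofile 65535;       # max open files per Worker process

events {
    worker_connections 16384;     # max simultaneous connections per Worker
}

http {
    server {
        listen 80;
        listen 443 ssl;

        # "listen ... ssl" needs a certificate; these paths are placeholders
        ssl_certificate     conf/cert.pem;
        ssl_certificate_key conf/cert.key;

        location / {
            proxy_pass https://foo.com;
        }
    }
}

stream {
    server {
        listen 53 udp;
        # a stream server needs a handler; this DNS backend address is a placeholder
        proxy_pass 127.0.0.1:5353;
    }
}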

Even this simple configuration involves several fundamental concepts.

First, each directive has its context, which is its scope in the NGINX configuration file.

The top level is main, which holds directives unrelated to any specific business logic, such as worker_processes, pid, and error_log. Contexts are also nested hierarchically: location lives in server, server lives in http, and http lives in main.

A directive cannot be used in the wrong context, and NGINX validates nginx.conf at startup. For example, if we move listen 80; from the server context into the main context and start NGINX, we get an error like this:

"listen" directive is not allowed here ......

Second, NGINX handles not only HTTP and HTTPS traffic but also TCP and UDP traffic: Layer 7 is handled in the http context and Layer 4 in the stream context. In OpenResty, lua-nginx-module and stream-lua-nginx-module correspond to these two, respectively.
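As a minimal sketch (the ports 8080 and 1234 are arbitrary choices), the same *_by_lua_block style works in both subsystems: lua-nginx-module serves the http block and stream-lua-nginx-module serves the stream block.

```nginx
events {}

http {
    server {
        listen 8080;
        location /hello {
            # handled by lua-nginx-module (Layer 7)
            content_by_lua_block {
                ngx.say("hello from the http subsystem")
            }
        }
    }
}

stream {
    server {
        listen 1234;
        # handled by stream-lua-nginx-module (Layer 4): writes raw bytes to the TCP client
        content_by_lua_block {
            ngx.say("hello from the stream subsystem")
        }
    }
}
```

You could exercise the first server with an HTTP client such as curl and the second with a raw TCP client such as nc.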

Note that OpenResty does not necessarily support every NGINX feature; it depends on which NGINX core your OpenResty release is built on. OpenResty's version number follows the NGINX version it bundles (for example, OpenResty 1.21.4.1 is based on NGINX 1.21.4), so this is easy to check.

Most of the directives in the nginx.conf above belong to the NGINX core modules ngx_core_module, ngx_http_core_module, and ngx_stream_core_module (proxy_pass, by contrast, comes from the proxy modules); their official documentation describes each directive in detail.

Master-Worker Mode

Now that we understand the configuration file, let's look at NGINX's multi-process model, shown in the figure below. When NGINX starts, it runs one Master process and one or more Worker processes, depending on the configuration.

Figure: NGINX Worker Mode

First, the Master process, as its name suggests, acts as the manager and does not handle client requests. It manages the Worker processes: it receives signals from the administrator and monitors the Workers' status. When a Worker process exits abnormally, the Master starts a new one.

Worker processes are the employees who do the real work: they handle client requests. They are forked from the Master process and are independent of one another. This multi-process model has clear advantages over Apache's multi-threaded model: there is no cross-thread locking, and debugging is easier. Even if one Worker crashes and exits, the other Workers usually keep working normally.
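To see the Master-Worker split for yourself, a minimal sketch like the following reports which Worker served a request. The port 8080 and the /whoami location are hypothetical. Repeated requests on new connections may land on different Workers, each with its own PID.

```nginx
worker_processes 4;   # the Master forks four Workers

events {
    worker_connections 1024;
}

http {
    server {
        listen 8080;
        location /whoami {
            content_by_lua_block {
                -- ngx.worker.id() is zero-based; ngx.worker.count() is the total
                ngx.say("worker ", ngx.worker.id(), " of ", ngx.worker.count(),
                        ", pid ", ngx.worker.pid())
            }
        }
    }
}
```

If you kill one Worker's PID and request again, the Master should have replaced it with a Worker that has a new PID, which is the supervision behavior described above.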

OpenResty adds a special privileged agent process to NGINX's Master-Worker model. This process does not listen on any ports and runs under the same system account as the NGINX Master process (usually a privileged one such as root), so it can perform tasks that require elevated privileges, such as writing to local disk files.
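The privileged agent is enabled from Lua through the ngx.process module of lua-resty-core. The fragment below, meant to be merged into an existing http block, enables the agent in init_by_lua and then runs a periodic task only in that process. The 60-second interval, the log file path, and the task itself are hypothetical.

```nginx
http {
    init_by_lua_block {
        local process = require "ngx.process"
        -- must be called in init_by_lua; the Master then spawns the agent with the Workers
        local ok, err = process.enable_privileged_agent()
        if not ok then
            ngx.log(ngx.ERR, "failed to enable privileged agent: ", err)
        end
    }

    init_worker_by_lua_block {
        local process = require "ngx.process"
        -- init_worker_by_lua also runs in the agent, so branch on the process type
        if process.type() ~= "privileged agent" then
            return
        end

        -- hypothetical task: append a timestamp to a local file every 60 seconds
        local ok, err = ngx.timer.every(60, function(premature)
            if premature then return end  -- NGINX is shutting down
            local f = io.open("/var/log/openresty-heartbeat.log", "a")
            if f then
                f:write(ngx.localtime(), "\n")
                f:close()
            end
        end)
        if not ok then
            ngx.log(ngx.ERR, "failed to create timer: ", err)
        end
    }
}
```

Keeping blocking file I/O in the agent means the Workers serving client traffic never wait on it.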

Combined with NGINX's hot binary upgrade mechanism, the privileged agent lets OpenResty upgrade its own binary on the fly without relying on any external program.

Reducing dependence on external programs and solving problems within the OpenResty process simplifies deployment, lowers operation and maintenance costs, and reduces the chance of errors. Both the privileged agent and ngx.pipe in OpenResty serve this purpose.

Execution Phase

Execution phases are also an essential feature of NGINX and are closely related to the specific implementation of OpenResty. NGINX has 11 execution phases, which we can see in the source code of ngx_http_core_module.h:

typedef enum {
    NGX_HTTP_POST_READ_PHASE = 0,

    NGX_HTTP_SERVER_REWRITE_PHASE,

    NGX_HTTP_FIND_CONFIG_PHASE,
    NGX_HTTP_REWRITE_PHASE,
    NGX_HTTP_POST_REWRITE_PHASE,

    NGX_HTTP_PREACCESS_PHASE,

    NGX_HTTP_ACCESS_PHASE,
    NGX_HTTP_POST_ACCESS_PHASE,

    NGX_HTTP_PRECONTENT_PHASE,

    NGX_HTTP_CONTENT_PHASE,

    NGX_HTTP_LOG_PHASE
} ngx_http_phases;

To learn more about what each of these 11 phases does, see the NGINX documentation; I won't go into them here.

Coincidentally, OpenResty also has 11 *_by_lua directives tied to NGINX phases, as shown in the figure below (from the lua-nginx-module documentation).

Figure: Order of Lua NGINX Module Directives

init_by_lua runs only in the Master process when it loads the configuration, and init_worker_by_lua runs only when each Worker process starts. The other *_by_lua directives are triggered by client requests and run repeatedly.

Because Workers are forked from the Master, anything loaded during init_by_lua is shared with every Worker through the operating system's copy-on-write (COW) mechanism. So in init_by_lua we can preload Lua modules and shared read-only data to save memory.

Most operations could be done inside content_by_lua, but I recommend splitting the logic by function, as follows:
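Here is a minimal sketch of that preloading pattern. It preloads the cjson.safe module bundled with OpenResty and registers a small read-only table under the hypothetical module name app.countries via package.loaded, so that a later require() in any Worker returns the preloaded copy. The port and location are also hypothetical.

```nginx
http {
    # runs once in the Master when it loads the configuration
    init_by_lua_block {
        -- preload a module so every forked Worker shares the loaded code
        require "cjson.safe"

        -- hypothetical read-only lookup table, built once and never modified
        package.loaded["app.countries"] = {
            CN = "China",
            US = "United States",
        }
    }

    # runs once in each Worker after it is forked
    init_worker_by_lua_block {
        ngx.log(ngx.NOTICE, "worker ", ngx.worker.id(), " started")
    }

    server {
        listen 8080;
        location /country {
            content_by_lua_block {
                -- require() finds the preloaded table in package.loaded
                local countries = require "app.countries"
                local cjson = require "cjson.safe"
                ngx.say(cjson.encode({ name = countries[ngx.var.arg_code] }))
            }
        }
    }
}
```

The table must stay read-only: a Worker that writes to it only changes its own copied pages, so other Workers will not see the change and the memory saving is lost.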

set_by_lua: setting variables.
rewrite_by_lua: forwarding, redirection, etc.
access_by_lua: access control, permissions, etc.
content_by_lua: generating the response content.
header_filter_by_lua: filtering the response headers.
body_filter_by_lua: filtering the response body.
log_by_lua: logging.
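As a minimal sketch of that split, the location below uses one *_by_lua_block directive per job. The /api paths, the X-API-Key header with its demo value, and the client_type variable are all hypothetical.

```nginx
location /api {
    # set_by_lua: compute a variable (keep it short; it must not block)
    set_by_lua_block $client_type {
        local ua = ngx.var.http_user_agent or ""
        if ngx.re.find(ua, [[Mobile]], "jo") then
            return "mobile"
        end
        return "desktop"
    }

    # rewrite_by_lua: redirect a hypothetical legacy path
    rewrite_by_lua_block {
        if ngx.var.uri == "/api/old" then
            return ngx.redirect("/api/new", ngx.HTTP_MOVED_PERMANENTLY)
        end
    }

    # access_by_lua: reject requests without the hypothetical API key
    access_by_lua_block {
        if ngx.var.http_x_api_key ~= "demo-key" then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }

    # content_by_lua: generate the response body
    content_by_lua_block {
        ngx.say("hello, ", ngx.var.client_type, " client")
    }

    # header_filter_by_lua: adjust headers; the body filter changes the length
    header_filter_by_lua_block {
        ngx.header["X-Client-Type"] = ngx.var.client_type
        ngx.header.content_length = nil
    }

    # body_filter_by_lua: append a trailer to the last chunk
    body_filter_by_lua_block {
        if ngx.arg[2] then
            ngx.arg[1] = ngx.arg[1] .. "(served by OpenResty)\n"
        end
    }

    # log_by_lua: record the outcome after the response is sent
    log_by_lua_block {
        ngx.log(ngx.NOTICE, "status=", ngx.status, " client_type=", ngx.var.client_type)
    }
}
```

Each concern can now be changed, tested, or removed on its own without touching the others.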

Let me give you an example to show the benefits of splitting things up this way. Suppose we expose many plaintext APIs and now need to add custom encryption and decryption logic. Do we have to change the code of every API?

location /mixed {
    content_by_lua '...';
}

Of course not. Thanks to phases, we can decrypt requests in the access phase and encrypt responses in the body filter phase without changing the original content phase code at all.

location /mixed {
    access_by_lua '...';
    content_by_lua '...';
    body_filter_by_lua '...';
}
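Here is a minimal sketch of that layout. Real encryption is out of scope, so Base64 (ngx.decode_base64 and ngx.encode_base64) stands in as a placeholder for your own decrypt and encrypt functions. The body-size limits are assumptions chosen so the request body stays in memory, and the content phase is a simple echo that knows nothing about encoding.

```nginx
location /mixed {
    # keep request bodies in memory so ngx.req.get_body_data() can see them
    client_max_body_size    1m;
    client_body_buffer_size 1m;

    # access phase: "decrypt" the request body before the content phase sees it
    access_by_lua_block {
        ngx.req.read_body()
        local body = ngx.req.get_body_data()
        if body then
            local plain = ngx.decode_base64(body)  -- placeholder for real decryption
            if not plain then
                return ngx.exit(ngx.HTTP_BAD_REQUEST)
            end
            ngx.req.set_body_data(plain)
        end
    }

    # original content phase, unchanged: echoes the (now plaintext) body
    content_by_lua_block {
        ngx.req.read_body()
        ngx.say("echo: ", ngx.req.get_body_data() or "")
    }

    # the body length changes, so drop any Content-Length header
    header_filter_by_lua_block {
        ngx.header.content_length = nil
    }

    # body filter phase: buffer all chunks, then "encrypt" the whole body at the end
    body_filter_by_lua_block {
        local chunk, eof = ngx.arg[1], ngx.arg[2]
        local buf = ngx.ctx.buf
        if not buf then
            buf = {}
            ngx.ctx.buf = buf
        end
        if chunk ~= "" then
            buf[#buf + 1] = chunk
            ngx.arg[1] = nil  -- hold this chunk back for now
        end
        if eof then
            ngx.arg[1] = ngx.encode_base64(table.concat(buf))  -- placeholder for real encryption
        end
    }
}
```

The body filter buffers the whole response because most encodings, Base64 included, cannot be applied chunk by chunk and then simply concatenated.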

Upgrade NGINX Binary On The Fly

Finally, let me briefly explain how to upgrade the NGINX binary on the fly. After you modify the NGINX configuration file, you must reload NGINX for the change to take effect, yet NGINX can upgrade its own binary on the fly. This may seem like putting the cart before the horse, but it's understandable given that NGINX started out with traditional static load balancing, reverse proxying, and file caching.

The hot upgrade works by sending signals to the old Master process. After replacing the old executable with the new one, you first send USR2: the old Master renames its PID file (to nginx.pid.oldbin) and starts a new Master process running the new binary, which in turn starts new Workers. You then send WINCH, which makes the old Master gracefully shut down its own Worker processes.

After these two steps, the new Master and its new Workers are handling traffic, but the old Master does not exit. The reason is simple: if you need to roll back, you can still send HUP to the old Master, which starts Workers with the old binary again without rereading the configuration, and then send QUIT to the new Master. Once you are sure no rollback is needed, send QUIT to the old Master so that it exits gracefully.

That's all it takes to upgrade the NGINX binary on the fly.

For more details, see the official NGINX documentation.

Summary

In general, what you use in OpenResty are the NGINX basics: configuration, the Master-Worker processes, execution phases, and so on. Everything else that Lua code can handle should, as far as possible, be solved in code rather than with NGINX modules and configuration. This shift in thinking is part of learning OpenResty.

Finally, I'll leave you with an open question: NGINX officially supports njs, which lets you write JavaScript to control some of NGINX's logic, much as OpenResty does with Lua. What do you think of it? Feel free to share your thoughts, and to share this article.