Why Does lua-resty-core Perform Better?

API7.ai

September 30, 2022

OpenResty (NGINX + Lua)

As we said in the previous two lessons, Lua is an embeddable language that keeps its core small and compact. You can embed Lua in Redis and NGINX to implement business logic flexibly. Lua also lets you call existing C functions and data structures, so you don't have to reinvent the wheel.

In Lua, you can call C functions through the Lua C API; in LuaJIT, you can also use FFI. In OpenResty:

  • In the core lua-nginx-module, APIs call C functions through the Lua C API.
  • In lua-resty-core, some of the APIs that already exist in lua-nginx-module are reimplemented using FFI.
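
To make the FFI side concrete before we look at OpenResty's own code, here is a minimal LuaJIT sketch. It is not taken from lua-resty-core; it uses the libc function strlen purely as a stand-in to show the two steps FFI needs: declare the C prototype with ffi.cdef, then call it through the ffi.C namespace.

```lua
local ffi = require "ffi"

-- Declare the C prototype. LuaJIT parses it at run time; no binding code is compiled.
ffi.cdef[[
size_t strlen(const char *s);
]]

-- A Lua string is converted to const char * automatically at the call site.
local len = ffi.C.strlen("OpenResty")

-- On 64-bit platforms size_t comes back as a 64-bit cdata value;
-- tonumber() turns it into an ordinary Lua number.
print(tonumber(len))  --> 9
```

Note that there is no lua_State and no stack manipulation anywhere: the call site is plain Lua, which is what lets the JIT compiler see through it.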

You may be wondering why these APIs need to be reimplemented with FFI.

Don't worry. Let's take ngx.decode_base64, a straightforward API, as an example and see how the Lua C API implementation differs from the FFI one. This will also give you an intuitive feel for their performance.

Lua CFunction

Let's look at how this is implemented in lua-nginx-module using the Lua C API. Searching for decode_base64 in the project's code leads to its implementation in ngx_http_lua_string.c:

lua_pushcfunction(L, ngx_http_lua_ngx_decode_base64);
lua_setfield(L, -2, "decode_base64");

This code may look unfriendly, but luckily we don't have to dig into the two functions starting with lua_ or the exact role of their arguments. We only need to know one thing: a CFunction, ngx_http_lua_ngx_decode_base64, is registered here, and it backs the publicly exposed API ngx.decode_base64.

Let's follow the trail and search for ngx_http_lua_ngx_decode_base64 in this C file. It is declared near the beginning of the file:

static int ngx_http_lua_ngx_decode_base64(lua_State *L);

A C function that can be called from Lua must follow the form required by Lua, which is typedef int (*lua_CFunction)(lua_State *L). It takes a pointer L of type lua_State as its only argument, and its integer return value indicates the number of values returned, not the return value itself.

It is implemented as follows (I have removed the error-handling code):

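static int
ngx_http_lua_ngx_decode_base64(lua_State *L)
{
    ngx_str_t p, src;

    /* argument 1 on the Lua stack: the base64-encoded string */
    src.data = (u_char *) luaL_checklstring(L, 1, &src.len);

    /* upper bound on the size of the decoded output */
    p.len = ngx_base64_decoded_length(src.len);

    /* scratch buffer for the decoded bytes, owned by the Lua GC */
    p.data = lua_newuserdata(L, p.len);

    if (ngx_decode_base64(&p, &src) == NGX_OK) {
        lua_pushlstring(L, (char *) p.data, p.len);

    } else {
        lua_pushnil(L);
    }

    /* exactly one result (a string or nil) was pushed onto the stack */
    return 1;
}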

The key pieces of this code are ngx_base64_decoded_length and ngx_decode_base64, both of which are provided by NGINX.

Functions written in C cannot hand their return values directly to Lua code; call arguments and return values are passed between Lua and C through the stack. This is why there is a lot of code that is hard to understand at first glance. In addition, this code cannot be traced by the JIT compiler, so for LuaJIT these operations are a black box and cannot be optimized.

LuaJIT FFI

Unlike a CFunction, the glue code of FFI is implemented in Lua, so it can be traced and optimized by the JIT compiler; the code is also more concise and easier to understand.

Let's take decode_base64 as the example again. Its FFI implementation is spread over two repositories, lua-resty-core and lua-nginx-module. Let's look at the code in the former first:

ngx.decode_base64 = function (s)
    local slen = #s
    -- upper bound on the decoded size
    local dlen = base64_decoded_length(slen)

    -- output buffer, plus an out-parameter for the actual decoded length
    local dst = get_string_buf(dlen)
    local pdlen = get_size_ptr()
    local ok = C.ngx_http_lua_ffi_decode_base64(s, slen, dst, pdlen)
    if ok == 0 then
        return nil
    end
    return ffi_string(dst, pdlen[0])
end

You will find that, compared with the CFunction version, the FFI code is much cleaner. The C side is ngx_http_lua_ffi_decode_base64 in the lua-nginx-module repository. If you are interested, you can check the implementation of this function yourself; it is straightforward, so I will not post the code here.
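
The dst buffer and the get_size_ptr() / pdlen[0] pair in the wrapper above follow a common FFI idiom: the C function writes into memory supplied by the caller and reports extra results through a pointer argument. Here is a minimal sketch of that idiom in plain LuaJIT. It uses libm's frexp and LuaJIT's ffi.copy as stand-ins, not the OpenResty functions, and it allocates fresh cdata each time, which is an assumption made for simplicity rather than a description of how the lua-resty-core helpers work.

```lua
local ffi = require "ffi"

ffi.cdef[[
double frexp(double x, int *exp);
]]

-- Out-parameter: a one-element array is passed to C as int *,
-- and the value written by C is read back at index 0.
local pexp = ffi.new("int[1]")
local mantissa = ffi.C.frexp(8, pexp)
print(mantissa, pexp[0])        -- 8 == 0.5 * 2^4

-- Destination buffer: bytes are written on the C side, then converted into
-- a Lua string once, with an explicit length (no reliance on a trailing NUL).
local dst = ffi.new("unsigned char[?]", 16)
ffi.copy(dst, "hello", 5)
local s = ffi.string(dst, 5)
print(s)
```

The last line mirrors ffi_string(dst, pdlen[0]) in the wrapper: the length comes from the out-parameter, not from scanning the buffer.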

If you look closely, did you notice the naming conventions in the code snippets above?

Yes, all functions in OpenResty follow naming conventions, and you can infer their purpose from their names. For example:

  • ngx_http_lua_ffi_: functions for handling NGINX HTTP requests that Lua calls through FFI.
  • ngx_http_lua_ngx_: functions for handling NGINX HTTP requests that Lua calls as CFunctions.
  • The other functions starting with ngx and lua are built-in functions of NGINX and Lua, respectively.

Further, the C code in OpenResty follows a strict coding standard, and I recommend reading the official C code style guide. It is a must-read for developers who want to learn OpenResty's C code and submit PRs. Otherwise, even if your PR is well written, reviewers will repeatedly ask you to change it because of code style issues.

For more APIs and details about FFI, I recommend reading the official LuaJIT tutorials and documentation. Technical columns are not a substitute for official documentation; in the limited time available, I can only point out the learning path and help you avoid detours. The difficult problems still need to be solved by you.

LuaJIT FFI GC

When using FFI, we may be confused: who will manage the memory requested in FFI? Should we release it manually in C, or should LuaJIT reclaim it automatically?

Here is a simple principle: LuaJIT is only responsible for the resources it allocates itself. ffi.C is the namespace of the C library, so memory you obtain through ffi.C is not managed by LuaJIT, and you are responsible for releasing it.

For example, if you request a memory block using ffi.C.malloc, you need to free it with the paired ffi.C.free. The official LuaJIT documentation has an equivalent example:

local p = ffi.gc(ffi.C.malloc(n), ffi.C.free)
...
p = nil -- Last reference to p is gone.
-- GC will eventually run finalizer: ffi.C.free(p)

In this code, ffi.C.malloc(n) requests a block of memory, and ffi.gc registers ffi.C.free as its finalizer. Because p is a cdata object managed by LuaJIT, ffi.C.free is called automatically when p is garbage-collected, which releases the C-level memory. In other words, you do not have to free p manually in the code above.
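
The documentation snippet above leaves out the declarations and the value of n, so it does not run on its own. Here is a runnable sketch of the same idea, assuming plain LuaJIT on a system whose C library exports malloc and free. The release wrapper and the finalized flag are illustrative additions so that the effect can be observed; in real code, passing ffi.C.free directly as the finalizer is enough.

```lua
local ffi = require "ffi"

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

local finalized = false

-- Illustrative finalizer: records that it ran, then releases the C memory.
local function release(ptr)
    finalized = true
    ffi.C.free(ptr)
end

local n = 1024
local p = ffi.gc(ffi.C.malloc(n), release)   -- attach the finalizer to the cdata
assert(p ~= nil, "malloc returned NULL")

ffi.fill(p, n)        -- zero-fill the block while p is still referenced

p = nil               -- drop the last reference to the cdata
collectgarbage()      -- force full GC cycles so the finalizer runs now,
collectgarbage()      -- instead of "eventually"
print(finalized)
```

One caveat when adapting this: the finalizer is attached to the cdata object p, not to the memory. A pointer derived from p, for example with ffi.cast, does not keep p alive, so keep a reference to p for as long as the block is in use.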

Note that if you want to request a large chunk of memory in OpenResty, I recommend using ffi.C.malloc instead of ffi.new, for two reasons:

  1. ffi.new returns cdata, which is part of the memory managed by LuaJIT.
  2. LuaJIT GC has an upper limit on the memory it can manage, and LuaJIT in OpenResty does not have the GC64 option enabled. Hence the upper limit of memory for a single worker is only 2 GB. Once this limit is exceeded, an error occurs.
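
A minimal sketch of that recommendation, assuming plain LuaJIT: the helper names alloc_buffer and free_buffer are hypothetical, not part of OpenResty. The block lives on the C heap, so only the small pointer cdata is managed by LuaJIT. ffi.gc acts as a safety net, and ffi.gc(buf, nil) detaches the finalizer when you want to release the memory at a known point.

```lua
local ffi = require "ffi"

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Hypothetical helper: allocate `size` bytes on the C heap, outside the
-- LuaJIT-managed heap, with free() registered as a safety-net finalizer.
local function alloc_buffer(size)
    local raw = ffi.C.malloc(size)
    if raw == nil then                 -- a NULL pointer compares equal to nil
        return nil, "out of memory"
    end
    return ffi.gc(ffi.cast("unsigned char *", raw), ffi.C.free)
end

-- Hypothetical helper: release the block immediately instead of waiting for GC.
-- ffi.gc(buf, nil) removes the finalizer first, so free() cannot run twice.
local function free_buffer(buf)
    ffi.C.free(ffi.gc(buf, nil))
end

local size = 16 * 1024 * 1024          -- 16 MB of ordinary C memory
local buf = assert(alloc_buffer(size))
buf[0], buf[size - 1] = 1, 2           -- index it like a C array
local first, last = buf[0], buf[size - 1]

free_buffer(buf)                       -- deterministic release
buf = nil                              -- do not touch the pointer after this
```

Whether to free explicitly or rely on the finalizer is a design choice: for large, short-lived buffers, explicit release avoids holding the memory until the next GC cycle.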

When using FFI, we also need to pay special attention to memory leaks. However, everyone makes mistakes, and as long as humans write the code, there are always bugs.

This is where OpenResty's robust testing and debugging toolchain comes in handy.

Let's talk about testing first. In the OpenResty system, we use Valgrind to detect memory leaks.

The test framework we mentioned in the previous course, Test::Nginx, has a special memory leak detection mode for running unit test suites; you only need to set the environment variable TEST_NGINX_USE_VALGRIND=1. The official OpenResty project runs a full regression in this mode before each release. We will go into more detail in the testing section later.

OpenResty's CLI, resty, also has a --valgrind option, which lets you run a piece of Lua code on its own, even if you haven't written a test case.

Let's look at the debugging tools.

OpenResty provides SystemTap-based extensions for live dynamic analysis of OpenResty programs. If you search for the keyword gc in that project's toolset, you will find two tools, lj-gc and lj-gc-objs.

For offline analysis like core dump, OpenResty provides a GDB toolset, and you can also search for gc in it and find the three tools lgc, lgcstat and lgcpath.

The specific usage of these debugging tools will be covered in detail in the debugging section later; for now, a general impression is enough. The point is that OpenResty has a dedicated set of tools to help you locate and solve these problems.

lua-resty-core

From the above comparison, we can see that the FFI approach is not only cleaner in code but can also be optimized by LuaJIT, which makes it the better choice. OpenResty has deprecated the CFunction implementations, and the corresponding code has been removed from the codebase. New APIs are now implemented in the lua-resty-core repository through FFI.

Before OpenResty 1.15.8.1 was released in May 2019, lua-resty-core was not enabled by default, which resulted in performance losses and potential bugs. I strongly recommend that anyone still using an older version enable lua-resty-core manually. You only need to add one line of code to the init_by_lua phase:

require "resty.core"

Of course, the belated 1.15.8.1 release added the lua_load_resty_core directive, and lua-resty-core is now enabled by default.

I personally feel that OpenResty is still too cautious about enabling lua-resty-core, and open-source projects should set similar features to be enabled by default as soon as possible.

lua-resty-core not only re-implements some of the APIs from the lua-nginx-module project, such as ngx.re.match and ngx.md5, but also implements several new APIs, such as ngx.ssl, ngx.base64, ngx.errlog, ngx.process, ngx.re.split, ngx.resp.add_header, ngx.balancer, and ngx.semaphore, which we will cover later in the OpenResty API chapter.

Summary

Having said all this, I'd like to conclude that FFI, while good, is not a performance silver bullet. The main reason it is efficient is that it can be traced and optimized by the JIT compiler. If you write Lua code that cannot be JIT-compiled and has to run in interpreted mode, FFI will actually be less efficient.
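
If you want to see this effect for yourself, here is a rough micro-benchmark sketch for plain LuaJIT. It uses the libc function toupper as an arbitrary stand-in for an FFI call and runs the same loop twice, once with the JIT compiler on and once with it switched off via jit.off() and jit.flush(). Treat it as an illustration only: the timings depend on your machine and LuaJIT build, and a loop this small says nothing about a real OpenResty workload.

```lua
local ffi = require "ffi"

ffi.cdef[[
int toupper(int c);
]]

local C = ffi.C

-- Call a trivial C function in a hot loop and time it with the CPU clock.
local function bench(iterations)
    local sum = 0
    local start = os.clock()
    for _ = 1, iterations do
        sum = sum + C.toupper(97)      -- 'a' (97) -> 'A' (65)
    end
    return os.clock() - start, sum
end

local N = 1e6

local t_jit, sum_jit = bench(N)        -- the hot loop can be compiled into a trace

jit.off()                              -- disable the JIT compiler globally
jit.flush()                            -- discard the traces compiled so far
local t_interp, sum_interp = bench(N)  -- same code, interpreter only

print(string.format("JIT: %.4fs  interpreter: %.4fs", t_jit, t_interp))
```

I would expect the interpreter-only run to be noticeably slower, which is the point of this summary: FFI pays off when the surrounding Lua code is actually compiled.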

So which operations can be JIT-compiled and which can't? How can we avoid writing code that can't be compiled? I'll reveal this in the next section.

Finally, a hands-on homework problem: can you find one or two APIs that exist in both lua-nginx-module and lua-resty-core, and then compare them in a performance test? You will see how significant the performance improvement of FFI is.

Feel free to leave a comment to share your thoughts and what you have learned, and you are welcome to share this article with your colleagues and friends so that we can learn and improve together.