Tips for 10x Performance Improvement in OpenResty: Table Data Structure

December 9, 2022

OpenResty (NGINX + Lua)

In OpenResty, in addition to the frequent performance problems with string operations, table operations are also a performance roadblock. In previous articles, we've covered table-related functions sporadically but not specifically in terms of performance improvements. Today, I'll take you through the performance impact of table operations.

There are two main reasons why developers know little about table-related performance optimization than string operations.

  1. The Lua used in OpenResty is a self-maintained LuaJIT branch, rather than the standard LuaJIT or standard Lua. most developers are unaware of the difference and tend to use the standard Lua table library to write OpenResty code
  2. Whether in the standard LuaJIT or in the LuaJIT branch maintained by OpenResty itself, the documentation related to table operations is hidden deep and difficult for developers to find. And there is no sample code in the documentation, so developers need to look for examples in open-source projects.

These two points make learning OpenResty a relatively high cognitive barrier, leading to polarized results - veteran OpenResty developers can write very high-performance code, while those just starting will wonder if OpenResty's high performance is real. But after you learn this lesson, you can easily cross the perception barrier and achieve a 10x performance improvement.

Before going into detail about table optimization, I would like to emphasize a simple principle of table-related optimization.

Try to reuse tables and avoid unnecessary table creation.

We will introduce optimization in terms of table creation, element insertion, emptying, and loop usage.

Pre-generated array

The first step is to create an array. In Lua, the way we create an array is simple.

local t = {}

The above line of code creates an empty array. You can also add the initialized data right when creating:

local color = {first = "red", "blue", third = "green", "yellow"}

However, the second writing method is more costly in terms of performance because it involves space allocation, resize and rehash of the array every time an array element is added and deleted.

So how should it be optimized? Space for time is a common optimization idea. Since the performance bottleneck here is the dynamic allocation of array space, then we can pre-generate an array of a specified size. Although this may waste some memory space, the multiple space allocation, resize, and rehash can be combined into one, which is much more efficient.

The table.new(narray, nhash) function in LuaJIT was added.

This function pre-allocates the specified array and hash space size instead of growing itself when inserting elements. This is the connotation of its two parameters, narray and nhash.

Here's a simple example to see how to use it. Since this function is a LuaJIT extension, we need to require the following before we can use it.

local new_tab = require "table.new"
 local t = new_tab(100, 0)
 for i = 1, 100 do
   t[i] = i
 end

Also, because the previous OpenResty did not fully bind LuaJIT and still supports standard Lua, some older code will do this for compatibility. If the function table.new is not found, an empty function will be simulated to ensure the uniformity of the caller.

local ok, new_tab = pcall(require, "table.new")
  if not ok then
    new_tab = function (narr, nrec) return {} end
  end

Manually calculating table subscript

Once you have a table object, the next step is to add elements to it. The most direct way to insert an element is to call the table.insert function:

local new_tab = require "table.new"
 local t = new_tab(100, 0)
 for i = 1, 100 do
   table.insert(t, i)
 end

Alternatively, get the length of the current array first and insert elements using indexes:

local new_tab = require "table.new"
 local t = new_tab(100, 0)
 for i = 1, 100 do
   t[#t + 1] = i
 end

However, both of them need to calculate the array's length first and then add new elements. This operation time complexity is O(n). In the above code example, the for loop will calculate the length of the array 100 times, so the performance is not good, and the larger the array is, the lower the performance will be.

Let's look at how the official lua-resty-redis library solves this problem.

local function _gen_req(args)
    local nargs = #args

    local req = new_tab(nargs * 5 + 1, 0)
    req[1] = "*" .. nargs .. "\r\n"
    local nbits = 2

    for i = 1, nargs do
        local arg = args[i]
        req[nbits] = "$"
        req[nbits + 1] = #arg
        req[nbits + 2] = "\r\n"
        req[nbits + 3] = arg
        req[nbits + 4] = "\r\n"
        nbits = nbits + 5
    end
    return req
end

This function pre-generates the array req, the size of which is determined by the function's input parameters, to ensure that as little space as possible is wasted.

It uses the nbits variable to maintain the subscript of the req manually, instead of using Lua's built-in table.insert function and the # operator to get the length.

You can see that in the for loop, some operations such as nbits + 1 are inserting elements directly as subscripts and keeping the subscript at the right value at the end with nbits = nbits + 5.

The advantage of this is obvious, it omits the O(n) operation of getting the size of the array and instead accesses it directly with the index, and the time complexity becomes O(1). The disadvantage is also obvious, it reduces the readability of the code, and the probability of error is greatly increased, so it is a double-edged sword.

Reusing a single table

Since the overhead of creating a table is so high, we naturally want to reuse it as much as possible. However, there are conditions for reuse. We first need to clean up the original data in the table to avoid impact on the next user.

This is where the table.clear function comes in handy. From its name you can see what it does, it clears all the data in the array, but the length of the array does not change. That is if you use table.new(narray, nhash) to generate an array of length 100, after clearing it, the length is still 100.

To give you a better idea of its implementation, I have given a code example below which is compatible with standard Lua:

local ok, clear_tab = pcall(require, "table.clear")
  if not ok then
    clear_tab = function (tab)
      for k, _ in pairs(tab) do
        tab[k] = nil
      end
    end
  end

As you can see, the clear function sets each element to nil.

Generally, we put this kind of circular table on the top level of a module. This way, when you use the functions in the module, you can decide whether to use them directly or after clearing them, depending on your situation.

Let's look at a practical application example. The following pseudo-code is taken from the open-source microservices API gateway Apache APISIX, and this is its logic when loading the plugin.

local local_plugins = core.table.new(32, 0)
local function load(plugin_names, wasm_plugin_names)
    local processed = {}
    for _, name in ipairs(plugin_names) do
        if processed[name] == nil then
            processed[name] = true
        end
    end
    for _, attrs in ipairs(wasm_plugin_names) do
        if processed[attrs.name] == nil then
            processed[attrs.name] = attrs
        end
    end

    core.log.warn("new plugins: ", core.json.delay_encode(processed))

    for name, plugin in pairs(local_plugins_hash) do
        local ty = PLUGIN_TYPE_HTTP
        if plugin.type == "wasm" then
            ty = PLUGIN_TYPE_HTTP_WASM
        end
        unload_plugin(name, ty)
    end

    core.table.clear(local_plugins)
    core.table.clear(local_plugins_hash)

    for name, value in pairs(processed) do
        local ty = PLUGIN_TYPE_HTTP
        if type(value) == "table" then
            ty = PLUGIN_TYPE_HTTP_WASM
            name = value
        end
        load_plugin(name, local_plugins, ty)
    end

    -- sort by plugin's priority
    if #local_plugins > 1 then
        sort_tab(local_plugins, sort_plugin)
    end

    for i, plugin in ipairs(local_plugins) do
        local_plugins_hash[plugin.name] = plugin
        if enable_debug() then
            core.log.warn("loaded plugin and sort by priority:",
                          " ", plugin.priority,
                          " name: ", plugin.name)
        end
    end

    _M.load_times = _M.load_times + 1
    core.log.info("load plugin times: ", _M.load_times)
    return true
end

You can see that the array local_plugins is the top-level variable for the plugin module. At the start of the load function, the table is cleared, and a new list of plugins is generated based on the current situation.

tablepool

By now, you've mastered the optimization of cycling through a single table. Then you can go one step further and use a cache pool to keep multiple tables for available access at any time, which is the function of the official lua-tablepool.

The following code shows the basic usage of a table pool. We can fetch a table from a specified pool and release it back when we are done using it:

local tablepool = require "tablepool"
local tablepool_fetch = tablepool.fetch
local tablepool_release = tablepool.release


local pool_name = "some_tag"
local function do_sth()
     local t = tablepool_fetch(pool_name, 10, 0)
     -- -- using t for some purposes
    tablepool_release(pool_name, t)
end

tablepool uses several of the methods we introduced earlier, and it has less than a hundred lines of code, so I highly recommend searching and studying it yourself. Here, I will mainly introduce its two APIs.

The first is the fetch method, which takes the same arguments as table.new, but with one more pool_name. fetch will call table.new to create a new array if there is no free array in the pool.

tablepool.fetch(pool_name, narr, nrec)

The second is release, a function that puts the table back into the pool. Among its arguments, the last one, no_clear, is used to configure whether to call table.clear to clear the array.

tablepool.release(pool_name, tb, [no_clear])

See how all the methods we introduced earlier are now co-related? However, be careful not to abuse tablepool for this reason. tablepool is not used much in real projects. For example, it is not used by Kong, and APISIX has only a few calls related to it. In most cases, even without the encapsulation of this layer tablepool, it is enough for us to use.

Summary

Performance optimization, a difficult area in OpenResty, is a hot spot. Today I introduced table-related performance optimization tips. I hope they can help you with your actual project.

Finally, think of a question: Can you do a performance test yourself and compare the performance difference before and after using table-related optimization techniques? Again, welcome to leave a comment and communicate with me as I'd love to know your practice and views. Welcome to share this article so that more people can participate in it together.