Tips for 10x Performance Improvement in OpenResty: `Table` Data Structure
API7.ai
December 9, 2022
In OpenResty, in addition to the frequent performance problems with `string` operations, `table` operations are also a performance roadblock. In previous articles, we've covered `table`-related functions sporadically, but not specifically in terms of performance improvements. Today, I'll take you through the performance impact of `table` operations.
There are two main reasons why developers know less about `table`-related performance optimization than about `string` operations.
- The Lua used in OpenResty is a self-maintained LuaJIT branch, rather than standard LuaJIT or standard Lua. Most developers are unaware of the difference and tend to write OpenResty code with the standard Lua `table` library.
- Whether in standard LuaJIT or in the LuaJIT branch maintained by OpenResty, the documentation on `table` operations is buried deep and hard for developers to find. There is no sample code in the documentation either, so developers have to look for examples in open-source projects.
These two points give OpenResty a relatively high cognitive barrier and lead to polarized results: veteran OpenResty developers can write very high-performance code, while those just starting out may wonder whether OpenResty's high performance is real. But after this lesson, you can easily cross that barrier and achieve a 10x performance improvement.
Before going into detail about `table` optimization, I would like to emphasize a simple principle of `table`-related optimization:
Try to reuse tables and avoid unnecessary table creation.
We will introduce optimizations in terms of `table` creation, element insertion, clearing, and recycling.
Pre-generated array
The first step is to create an array. In Lua, creating an array is simple:
local t = {}
The above line of code creates an empty array. You can also add initial data when creating it:
local color = {first = "red", "blue", third = "green", "yellow"}
However, the second way of writing is more costly in terms of performance, because every time an array element is added or deleted, it involves space allocation, `resize`, and `rehash` of the array.
So how should it be optimized? Trading space for time is a common optimization idea. Since the performance bottleneck here is the dynamic allocation of array space, we can pre-generate an array of a specified size. Although this may waste some memory, the multiple rounds of space allocation, `resize`, and `rehash` can be combined into one, which is much more efficient.
The `table.new(narray, nhash)` function was added in LuaJIT for exactly this purpose. It pre-allocates array and hash parts of the specified sizes, instead of growing the table every time elements are inserted. This is the meaning of its two parameters, `narray` and `nhash`.
Here's a simple example of how to use it. Since this function is a LuaJIT extension, we need to `require` it before we can use it:
local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
    t[i] = i
end
Also, because earlier versions of OpenResty did not fully bind to LuaJIT and still supported standard Lua, some older code does the following for compatibility: if the `table.new` function is not found, a dummy function is used instead so that callers see a uniform interface.
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function (narr, nrec) return {} end
end
Manually calculating the table index
Once you have a `table` object, the next step is to add elements to it. The most direct way to insert an element is to call the `table.insert` function:
local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
    table.insert(t, i)
end
Alternatively, get the length of the current array first and insert elements using indexes:
local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
    t[#t + 1] = i
end
However, both of these approaches need to get the array's length before adding the new element, an operation with O(n) time complexity. In the code examples above, the `for` loop calculates the length of the array 100 times, so the performance is not good, and the larger the array, the worse the performance.
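A simple way to avoid these repeated length lookups is to track the next free slot in a local variable yourself. Here is a minimal sketch of that idea, using a made-up `flatten` helper that writes two slots per input element, so the slot can't simply be derived from the loop variable:
local new_tab = require "table.new"

-- flatten { "a", "b", "c" } into { "item", "a", "item", "b", "item", "c" }
local function flatten(list)
    local n = #list
    local out = new_tab(n * 2, 0)   -- pre-allocate exactly the space we need
    local idx = 1                   -- next free slot, maintained by hand
    for i = 1, n do
        out[idx] = "item"
        out[idx + 1] = list[i]
        idx = idx + 2               -- no #out lookup anywhere in the loop
    end
    return out
end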
Let's look at how the official `lua-resty-redis` library solves this problem.
local function _gen_req(args)
    local nargs = #args

    local req = new_tab(nargs * 5 + 1, 0)
    req[1] = "*" .. nargs .. "\r\n"
    local nbits = 2

    for i = 1, nargs do
        local arg = args[i]
        req[nbits] = "$"
        req[nbits + 1] = #arg
        req[nbits + 2] = "\r\n"
        req[nbits + 3] = arg
        req[nbits + 4] = "\r\n"
        nbits = nbits + 5
    end
    return req
end
This function pre-generates the array `req`, whose size is determined by the function's input parameters, to ensure that as little space as possible is wasted.
It uses the `nbits` variable to maintain the index of `req` manually, instead of using Lua's built-in `table.insert` function or the `#` operator to get the length.
You can see that in the `for` loop, elements are inserted directly at computed indexes such as `nbits + 1`, and `nbits = nbits + 5` at the end of each iteration keeps the index pointing at the right slot.
The advantage of this is obvious: it avoids the O(n) operation of getting the array's length and instead accesses elements directly by index, so the time complexity becomes O(1). The disadvantage is also obvious: it reduces the readability of the code and greatly increases the chance of mistakes, so it is a double-edged sword.
Reusing a single table
Since the overhead of creating a `table` is so high, we naturally want to reuse tables as much as possible. However, reuse has a precondition: we first need to clean up the original data in the `table` to avoid affecting the next user.
This is where the `table.clear` function comes in handy. As its name suggests, it clears all the data in the table, but the allocated array size does not change. That is, if you use `table.new(narray, nhash)` to generate an array with 100 pre-allocated slots, those slots are still allocated after clearing.
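Here is a quick usage sketch, assuming the OpenResty/LuaJIT `table.clear` extension is available:
local new_tab   = require "table.new"
local clear_tab = require "table.clear"

local t = new_tab(100, 0)
for i = 1, 3 do
    t[i] = i
end

clear_tab(t)

-- all keys are gone, but the 100 pre-allocated slots are still there for reuse
print(#t)      -- 0
print(t[1])    -- nil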
To give you a better idea of what it does, here is an equivalent fallback implementation that is compatible with standard Lua:
local ok, clear_tab = pcall(require, "table.clear")
if not ok then
    clear_tab = function (tab)
        for k, _ in pairs(tab) do
            tab[k] = nil
        end
    end
end
As you can see, the `clear` function sets each element to `nil`.
Generally, we put this kind of recycled `table` at the top level of a module. This way, when you use the functions in the module, you can decide whether to use the table directly or clear it first, depending on your situation.
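Here is a minimal sketch of that pattern, assuming the LuaJIT `table.new` and `table.clear` extensions are available; the module and function names are made up for illustration:
-- hypothetical module that reuses one buffer table across calls
local new_tab   = require "table.new"
local clear_tab = require "table.clear"

local _M = {}

-- module-level table, allocated once and recycled on every call
local buf = new_tab(16, 0)

function _M.join(items, sep)
    clear_tab(buf)           -- wipe whatever the previous call left behind
    for i = 1, #items do
        buf[i] = items[i]
    end
    return table.concat(buf, sep)
end

return _M
Note that a module-level table like `buf` is shared by all requests in a worker, so this pattern is best for code paths that finish using the table before yielding.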
Let's look at a practical application example. The following pseudo-code is taken from the open-source microservices API gateway Apache APISIX; this is its logic for loading plugins.
local local_plugins = core.table.new(32, 0)

local function load(plugin_names, wasm_plugin_names)
    local processed = {}
    for _, name in ipairs(plugin_names) do
        if processed[name] == nil then
            processed[name] = true
        end
    end
    for _, attrs in ipairs(wasm_plugin_names) do
        if processed[attrs.name] == nil then
            processed[attrs.name] = attrs
        end
    end

    core.log.warn("new plugins: ", core.json.delay_encode(processed))

    for name, plugin in pairs(local_plugins_hash) do
        local ty = PLUGIN_TYPE_HTTP
        if plugin.type == "wasm" then
            ty = PLUGIN_TYPE_HTTP_WASM
        end
        unload_plugin(name, ty)
    end

    core.table.clear(local_plugins)
    core.table.clear(local_plugins_hash)

    for name, value in pairs(processed) do
        local ty = PLUGIN_TYPE_HTTP
        if type(value) == "table" then
            ty = PLUGIN_TYPE_HTTP_WASM
            name = value
        end
        load_plugin(name, local_plugins, ty)
    end

    -- sort by the plugin's priority
    if #local_plugins > 1 then
        sort_tab(local_plugins, sort_plugin)
    end

    for i, plugin in ipairs(local_plugins) do
        local_plugins_hash[plugin.name] = plugin
        if enable_debug() then
            core.log.warn("loaded plugin and sort by priority:",
                          " ", plugin.priority,
                          " name: ", plugin.name)
        end
    end

    _M.load_times = _M.load_times + 1
    core.log.info("load plugin times: ", _M.load_times)

    return true
end
You can see that the array `local_plugins` is a top-level variable of the `plugin` module. At the start of the `load` function, the `table` is cleared, and a new list of plugins is generated based on the current situation.
tablepool
By now, you've mastered the optimization of recycling a single table. You can go one step further and use a cache pool to keep multiple tables ready for use at any time, which is what the official lua-tablepool library does.
The following code shows the basic usage of a table pool. We can fetch a table from a specified pool and release it back when we are done using it:
local tablepool = require "tablepool"
local tablepool_fetch = tablepool.fetch
local tablepool_release = tablepool.release

local pool_name = "some_tag"

local function do_sth()
    local t = tablepool_fetch(pool_name, 10, 0)
    -- use t for some purposes
    tablepool_release(pool_name, t)
end
`tablepool` uses several of the methods we introduced earlier, and it has fewer than a hundred lines of code, so I highly recommend finding and studying it yourself. Here, I will mainly introduce its two APIs.
The first is the `fetch` method, which takes the same arguments as `table.new`, plus an extra `pool_name`. If there is no free `table` in the pool, `fetch` calls `table.new` to create a new one.
tablepool.fetch(pool_name, narr, nrec)
The second is `release`, a function that puts the `table` back into the pool. Its last argument, `no_clear`, configures whether to call `table.clear` to clear the table first.
tablepool.release(pool_name, tb, [no_clear])
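For example, if you know the next user of the table will overwrite every field anyway, you can skip the clearing step by passing `true` as the third argument; this is just a usage sketch, not a recommendation for every case:
-- return t to the pool without clearing it (only safe if the next user overwrites everything)
tablepool_release(pool_name, t, true)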
See how all the methods we introduced earlier come together here? However, be careful not to overuse `tablepool` because of this. In fact, `tablepool` is not used much in real projects: Kong does not use it, and APISIX has only a few calls related to it. In most cases, even without this extra layer of encapsulation, the earlier techniques are enough.
Summary
Performance optimization is both a hot topic and a difficult area in OpenResty. Today I introduced `table`-related performance optimization tips, and I hope they help you in your actual projects.
Finally, here's a question to think about: can you run a performance test yourself and compare the difference before and after applying these `table`-related optimization techniques? As always, feel free to leave a comment and share your practice and views with me, and feel free to share this article so that more people can join the discussion.
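If you want a starting point, here is a rough micro-benchmark sketch; the file name and iteration count are arbitrary, and you can run it with `resty` or `luajit`:
-- bench.lua: dynamic growth vs. pre-allocation with direct indexing
local new_tab = require "table.new"

local N = 1e6

local start = os.clock()
local t1 = {}
for i = 1, N do
    table.insert(t1, i)    -- grows the table and computes the length each time
end
print(("table.insert into {}:        %.3fs"):format(os.clock() - start))

start = os.clock()
local t2 = new_tab(N, 0)
for i = 1, N do
    t2[i] = i              -- pre-allocated slots, direct index access
end
print(("table.new + direct indexing: %.3fs"):format(os.clock() - start))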