What is table and metatable in Lua?
API7.ai
October 11, 2022
Today we'll learn about the only data structure in LuaJIT: table.
Unlike other scripting languages with rich data structures, LuaJIT has only one data structure, table, which does not distinguish between arrays, hashes, sets, and so on, but mixes them together. Let's review one of the examples mentioned before.
local color = {first = "red", "blue", third = "green", "yellow"}
print(color["first"]) --> output: red
print(color[1]) --> output: blue
print(color["third"]) --> output: green
print(color[2]) --> output: yellow
print(color[3]) --> output: nil
In this example, the table color contains both an array part and a hash part, and the two can be accessed without interfering with each other. For example, you can use the ipairs function to iterate through only the array part of the table.
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
for k, v in ipairs(color) do
    print(k)
end
'
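To make the split between the two parts concrete, here is a small sketch in plain Lua (it does not need OpenResty) that records what ipairs and pairs each visit. The names array_part and total are ours, chosen only for illustration.

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

-- ipairs walks the indexes 1, 2, 3, ... and stops at the first nil value
local array_part = {}
for i, v in ipairs(color) do
    array_part[i] = v
end

-- pairs visits every key, array and hash alike, in no guaranteed order
local total = 0
for k, v in pairs(color) do
    total = total + 1
end

print(table.concat(array_part, ", ")) --> blue, yellow
print(total)                          --> 4
```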
Table operations are so crucial that LuaJIT extends the standard Lua 5.1 table library, and OpenResty extends LuaJIT's table library even further. Let's take a look at each of these library functions.
The table library functions
Let's start with the standard table library functions. Lua 5.1 doesn't have many table library functions, so we can skim through them.
table.getn
Get the number of elements
As we mentioned in the Standard Lua and LuaJIT chapter, getting the correct number of all the table elements is a big problem in LuaJIT.
For sequences, you can use table.getn or the unary operator # to return the correct number of elements. The following example returns 3, as we would expect.
$ resty -e 'local t = { 1, 2, 3 }
print(table.getn(t)) '
For tables that are not sequences, the result is not the total number of elements. In the second example, the value returned is 1, even though the table holds two elements.
$ resty -e 'local t = { 1, a = 2 }
print(#t) '
Fortunately, OpenResty's LuaJIT branch provides an extension that replaces these hard-to-understand functions, which we will mention later. So in the OpenResty context, do not use the function table.getn or the unary operator # unless you know explicitly that you are getting the length of a sequence.
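Also, table.getn and the unary operator # are not O(1) operations. LuaJIT does not keep a stored element count; it searches the table for the length on every call (typically a binary search over the array part, so roughly O(log n) rather than the O(n) of a full scan), which is another reason to avoid them in hot code if possible.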
table.remove
Removes the specified element
The second one is the table.remove function, which removes elements from the table by index, i.e., only elements in the array part of the table can be removed. Let's look at the color example again.
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
table.remove(color, 1)
for k, v in pairs(color) do
    print(v)
end'
This code removes blue, the element at index 1. You may ask, how do I delete an element from the hash part of the table? It's as simple as setting the value corresponding to the key to nil. Thus, in the color example, the green corresponding to third is deleted.
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
color.third = nil
for k, v in pairs(color) do
    print(v)
end'
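Two details are easy to miss in the loops above: table.remove returns the element it removed, and the remaining array elements shift down to fill the gap. The following plain-Lua sketch (the variable name removed is ours) checks both, along with deletion from the hash part.

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

local removed = table.remove(color, 1)  -- returns the removed value

print(removed)      --> blue
print(color[1])     --> yellow (shifted down from index 2)
print(color[2])     --> nil
print(color.first)  --> red (the hash part is untouched)

color.third = nil   -- deleting from the hash part
print(color.third)  --> nil
```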
table.concat
Element splicing function
The third one is table.concat, the element splicing function. It joins the elements of the table in index order. Since it is again based on indexes, it only applies to the array part of the table. Here is the color example again.
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
print(table.concat(color, ", "))'
The table.concat function outputs blue, yellow and skips the hash part.
In addition, this function can take a start and an end index for the concatenation, for example:
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow", "orange"}
print(table.concat(color, ", ", 2, 3))'
This time the output is yellow, orange, skipping blue.
Please don't underestimate this seemingly unremarkable function: it can have a surprisingly large effect when optimizing performance, and it is one of the main characters in our later performance optimization chapters.
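As a rough illustration of why table.concat matters for performance, the sketch below builds the same string in two ways. Lua strings are immutable, so each .. in the second loop creates a new intermediate string, while the first version collects the pieces and joins them once. The names parts, joined, and s are ours; actual gains depend on the workload and should be measured.

```lua
-- Collect the pieces in a table and join them once
local parts = {}
for i = 1, 5 do
    parts[i] = tostring(i)
end
local joined = table.concat(parts, ",")

-- Same result with repeated "..", creating an intermediate string per step
local s = ""
for i = 1, 5 do
    s = s .. (i > 1 and "," or "") .. tostring(i)
end

print(joined)       --> 1,2,3,4,5
print(joined == s)  --> true
```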
table.insert
Inserts an element
Finally, let's look at the table.insert function. It inserts a new element at the specified index, which affects the array part of the table. To illustrate, we use the color example again.
$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
table.insert(color, 1, "orange")
print(color[1])
'
You can see that the first element of color becomes orange. Of course, you can also leave the index unspecified, in which case the element is appended to the end of the array by default.
I should note that table.insert is a very common operation, but its performance is not good. If you do not specify an index, LuaJIT has to call lj_tab_len each time to get the array length so that it can insert at the end. As with table.getn, getting the table length is not an O(1) operation.
So we should try to avoid using table.insert in hot code. For example, avoid a loop like this in a hot path:
local t = {}
for i = 1, 10000 do
    table.insert(t, i)
end
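One common way around this, sketched here under the assumption that you control every write to the table, is to track the length yourself and assign by index. The counter n is our own bookkeeping variable, not something Lua maintains.

```lua
local t = {}
local n = 0           -- track the length ourselves
for i = 1, 10000 do
    n = n + 1
    t[n] = i          -- direct assignment, no length lookup needed
end

print(n)     --> 10000
print(t[n])  --> 10000
```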
LuaJIT's table extension functions
Next, let's look at LuaJIT's table extension functions. LuaJIT extends standard Lua with two useful table functions, one for creating a table and one for emptying it, which I'll describe below.
table.new(narray, nhash)
Create a new table
The first one is the table.new(narray, nhash) function. Instead of letting the table grow as elements are inserted, this function pre-allocates space for the specified array and hash sizes, which is what its two parameters narray and nhash mean. Growing a table is a costly operation that involves memory allocation, resize, and rehash, and should be avoided where possible.
Note that the documentation for table.new is not on the LuaJIT website but deep in the extension documentation of the GitHub project, so it is hard to find even with Google, and not many engineers know about it.
Here's a simple example to show how it works. First of all, this function is an extension, so you need to require it before you can use it.
local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
    t[i] = i
end
As you can see, this code creates a new table with space pre-allocated for 100 array elements and 0 hash elements. Of course, you can also pre-allocate 100 array elements and 50 hash elements as needed, which is legal.
local t = new_tab(100, 50)
However, if you go beyond the preset size, you can still use the table normally, but performance will degrade and the point of using table.new will be lost.
In the following example, we have a preset size of 100, but we are using 200.
local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 200 do
    t[i] = i
end
You need to preset the size of the array and hash space in table.new according to the actual scenario so you can find a balance between performance and memory usage.
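Because table.new is an extension, code that may also run on an interpreter without it often guards the require with pcall. The sketch below shows one such guard; the fallback function is our own stand-in that ignores the size hints and simply returns an empty table.

```lua
-- Use table.new when it is available, otherwise fall back to a plain constructor
local ok, new_tab = pcall(require, "table.new")
if not ok or type(new_tab) ~= "function" then
    new_tab = function(narray, nhash)
        return {}   -- no pre-allocation in the fallback
    end
end

local t = new_tab(100, 0)
for i = 1, 100 do
    t[i] = i
end

print(#t)  --> 100
```

With the fallback in place the code behaves the same everywhere; only the pre-allocation benefit is lost when table.new is missing.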
table.clear()
Clears the table
The second one is the clear function, table.clear(). It clears all the data in a table but does not free the memory occupied by the array and hash parts. It is therefore useful when recycling Lua tables, because it avoids the overhead of repeatedly creating and destroying tables.
$ resty -e 'local clear_tab = require "table.clear"
local color = {first = "red", "blue", third = "green", "yellow"}
clear_tab(color)
for k, v in pairs(color) do
    print(k)
end'
However, there are not many scenarios where this function can be used, and in most cases, we should leave this task to the LuaJIT GC.
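For the cases where recycling does make sense, a reusable scratch table might look like the following sketch. The buffer buf and the function sum_squares are hypothetical, and the pcall guard with a hand-written fallback is our assumption so that the snippet also runs where table.clear is not available.

```lua
local ok, clear_tab = pcall(require, "table.clear")
if not ok then
    -- fallback for interpreters without table.clear: drop every key by hand
    clear_tab = function(t)
        for k in pairs(t) do
            t[k] = nil
        end
    end
end

local buf = {}  -- one scratch table, reused across calls

local function sum_squares(n)
    clear_tab(buf)
    for i = 1, n do
        buf[i] = i * i
    end
    local total = 0
    for i = 1, n do
        total = total + buf[i]
    end
    return total
end

print(sum_squares(3))  --> 14
print(sum_squares(2))  --> 5
```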
OpenResty's table extension functions
As I mentioned at the beginning, OpenResty maintains its own LuaJIT branch, which also extends table with several new APIs: table.isempty, table.isarray, table.nkeys, and table.clone.
Before using these new APIs, please check your OpenResty version, as most of them are only available in OpenResty 1.15.8.1 or later. This is because OpenResty went about a year without a new release before version 1.15.8.1, and these APIs were added during that interval.
I'll use table.nkeys as an example. The other three APIs are easy to understand from their names, so look through the GitHub documentation and you'll understand them. I have to say that OpenResty's documentation is of very high quality, covering code examples, whether a function can be JIT-compiled, what to watch out for, and so on. It is several orders of magnitude better than Lua's and LuaJIT's documentation.
Okay, back to the table.nkeys function. Its name may confuse you, but it is a function that returns the number of elements in the table, counting both the array part and the hash part. Therefore, we can use it instead of table.getn, for example, as follows.
local nkeys = require "table.nkeys"
print(nkeys({})) -- 0
print(nkeys({ "a", nil, "b" })) -- 2
print(nkeys({ dog = 3, cat = 4, bird = nil })) -- 2
print(nkeys({ "a", dog = 3, cat = 4 })) -- 3
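If the same code has to run where table.nkeys is not available, a pairs-based counter gives the same answers for the cases above. This is only a sketch: the fallback is our own function, and it walks the whole table, so it is O(n) per call.

```lua
local ok, nkeys = pcall(require, "table.nkeys")
if not ok then
    -- hypothetical fallback: count keys by iterating over the table
    nkeys = function(t)
        local n = 0
        for _ in pairs(t) do
            n = n + 1
        end
        return n
    end
end

print(nkeys({}))                                 --> 0
print(nkeys({ "a", nil, "b" }))                  --> 2
print(nkeys({ dog = 3, cat = 4, bird = nil }))   --> 2
print(nkeys({ "a", dog = 3, cat = 4 }))          --> 3
```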
Metatable
After talking about the table functions, let's look at the metatable, which is derived from table. The metatable is a concept unique to Lua and is widely used in real-world projects. It is not an exaggeration to say that you can find it in almost any lua-resty-* library.
A metatable behaves like operator overloading; for example, we can overload __add to compute the concatenation of two Lua arrays, or __tostring to define how a table is converted to a string.
Lua provides two functions for handling metatables.
- The first is setmetatable(table, metatable), which sets a metatable for a table.
- The second is getmetatable(table), which gets the table's metatable.
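As a minimal sketch of the __add idea mentioned above, the following plain-Lua code lets the + operator concatenate two arrays. The names mt, x, y, and z are ours; the result is given the same metatable so that it can be added and printed again.

```lua
local mt = {}

-- a + b returns a new array holding the elements of a followed by those of b
mt.__add = function(a, b)
    local result = setmetatable({}, mt)
    local n = 0
    for _, v in ipairs(a) do
        n = n + 1
        result[n] = v
    end
    for _, v in ipairs(b) do
        n = n + 1
        result[n] = v
    end
    return result
end

mt.__tostring = function(t)
    return table.concat(t, ", ")
end

local x = setmetatable({1, 2}, mt)
local y = setmetatable({3, 4}, mt)
local z = x + y

print(tostring(z))            --> 1, 2, 3, 4
print(getmetatable(z) == mt)  --> true
```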
After all this, you may be more interested in what it does, so let's look at what metatable is specifically used for. Here is a piece of code from an actual project.
$ resty -e 'local version = {
    major = 1,
    minor = 1,
    patch = 1
}
version = setmetatable(version, {
    __tostring = function(t)
        return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
})
print(tostring(version))
'
We first define a table named version, and as you can see, the purpose of this code is to print the version number stored in version. However, we can't print version directly. If you try it on a table without the metatable, you will see that printing it only outputs the address of the table.
print(tostring(version))
So, we need to customize the string conversion function for this table, which is __tostring, and this is where the metatable comes in. We use setmetatable to define the __tostring metamethod of the table version so that it prints the version number: 1.1.1.
In addition to __tostring, we often override the following two metamethods in the metatable in real projects.
One of them is __index. When we look up an element in a table, we first look for it directly in the table, and if it is not found there, the lookup continues with the __index of the metatable.
In the following example, we remove patch from the version table.
$ resty -e 'local version = {
    major = 1,
    minor = 1
}
version = setmetatable(version, {
    __index = function(t, key)
        if key == "patch" then
            return 2
        end
    end,
    __tostring = function(t)
        return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
})
print(tostring(version))
'
In this case, t.patch finds no value in the table itself, so the lookup goes to the __index function, which returns 2, and the code prints 1.1.2.
__index can be not only a function but also a table. If you run the following code, you'll see that it achieves the same result.
$ resty -e 'local version = {
    major = 1,
    minor = 1
}
version = setmetatable(version, {
    __index = {patch = 2},
    __tostring = function(t)
        return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
})
print(tostring(version))
'
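The lookup order described above can be checked with a short plain-Lua sketch. The defaults table is a hypothetical name of ours; rawget is the standard function that reads a field without consulting the metatable.

```lua
local defaults = { major = 0, minor = 0, patch = 2 }
local version = setmetatable({ major = 1, minor = 1 }, { __index = defaults })

print(version.major)             --> 1   (found in the table itself)
print(version.patch)             --> 2   (falls through to __index)
print(rawget(version, "patch"))  --> nil (rawget skips the metatable)

version.patch = 5                -- assignment writes to version, not to defaults
print(version.patch)             --> 5
print(defaults.patch)            --> 2
```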
Another metamethod is __call. It is similar to a functor that allows a table to be called.
Let's build on the code above that prints the version number and see how to call a table.
$ resty -e '
local version = {
    major = 1,
    minor = 1,
    patch = 1
}
local function print_version(t)
    print(string.format("%d.%d.%d", t.major, t.minor, t.patch))
end
version = setmetatable(version,
    {__call = print_version})
version()
'
In this code, we use setmetatable to add a metatable to the table version, and the __call metamethod inside it points to the function print_version. So, if we call version as a function, print_version is executed.
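A __call handler always receives the called table as its first argument, followed by whatever arguments were passed in the call. The sketch below, with a hypothetical prefix parameter of our own, shows that in plain Lua.

```lua
local version = setmetatable({ major = 1, minor = 1, patch = 1 }, {
    -- self is the table being called; prefix is the caller's first argument
    __call = function(self, prefix)
        return string.format("%s%d.%d.%d", prefix or "", self.major, self.minor, self.patch)
    end
})

print(version())     --> 1.1.1
print(version("v"))  --> v1.1.1
```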
getmetatable is the operation paired with setmetatable; it returns the metatable that has been set, as in the following code.
$ resty -e 'local version = {
    major = 1,
    minor = 1
}
version = setmetatable(version, {
    __index = {patch = 2},
    __tostring = function(t)
        return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
})
print(getmetatable(version).__index.patch)
'
In addition to the three metamethods we talked about today, there are some less frequently used ones; you can consult the documentation when you encounter them.
Object-oriented
Finally, let's talk about object orientation. As you may know, Lua is not an object-oriented language, but we can use metatables to implement OO.
Let's look at a practical example. lua-resty-mysql is the official MySQL client of OpenResty, and it uses metatables to simulate classes and class methods, which are used in the following way.
$ resty -e 'local mysql = require "resty.mysql" -- first, load the lua-resty library
local db, err = mysql:new() -- create a new instance of the class
db:set_timeout(1000) -- call a method of the class
'
You can execute the above code directly with the resty command line. These lines of code are easy to understand; the only thing that might trip you up is this:
When calling a class method, why is it a colon instead of a dot?
Actually, both colons and dots are fine here: db:set_timeout(1000) and db.set_timeout(db, 1000) are exactly equivalent. The colon is syntactic sugar in Lua that lets you omit the first argument, self, of a function.
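The equivalence holds for definitions as well as calls, which the following plain-Lua sketch demonstrates with a hypothetical account table of our own. A function defined with a colon gets a hidden self parameter, and either form can be called either way.

```lua
local account = { balance = 0 }

function account.deposit(self, amount)  -- dot form: self is written explicitly
    self.balance = self.balance + amount
    return self.balance
end

function account:withdraw(amount)       -- colon form: self is added implicitly
    self.balance = self.balance - amount
    return self.balance
end

print(account:deposit(100))             --> 100
print(account.deposit(account, 50))     --> 150
print(account:withdraw(30))             --> 120
print(account.withdraw(account, 20))    --> 100
```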
As we all know, there are no secrets once you read the source code, so let's look at the concrete implementation behind the lines above to better understand how to model object orientation with metatables.
local tcp = ngx.socket.tcp -- cosocket constructor provided by OpenResty
local _M = { _VERSION = '0.21' } -- use a table to simulate the class
local mt = { __index = _M } -- mt is short for metatable; __index points to the class itself
-- Constructor of the class
function _M.new(self)
    local sock, err = tcp()
    if not sock then
        return nil, err
    end
    return setmetatable({ sock = sock }, mt) -- the instance is a table whose metatable is mt
end
-- Member function of the class
function _M.set_timeout(self, timeout) -- self is the instance to operate on
    local sock = self.sock
    if not sock then
        return nil, "not initialized"
    end
    return sock:settimeout(timeout)
end
Table _M simulates a class: it is initialized with a single member variable, _VERSION, and member functions such as _M.set_timeout are defined on it afterwards. In the constructor _M.new(self), we return a table whose metatable is mt, and the __index metamethod of mt points to _M, so that the returned table emulates an instance of the class _M.
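The same pattern can be tried without OpenResty. The sketch below reuses the _M and mt naming from the excerpt, but the class itself (a counter with a name field and an incr method) is hypothetical and exists only to show that instances keep their own state while sharing methods through __index.

```lua
local _M = { _VERSION = "0.1" }   -- the "class"
local mt = { __index = _M }       -- instances look up missing fields in _M

function _M.new(self, name)
    -- each instance is a fresh table with its own state
    return setmetatable({ name = name, count = 0 }, mt)
end

function _M.incr(self)
    self.count = self.count + 1
    return self.count
end

local a = _M:new("a")
local b = _M:new("b")
a:incr()
a:incr()
b:incr()

print(a.count)            --> 2
print(b.count)            --> 1
print(a._VERSION)         --> 0.1 (found through __index)
print(rawget(a, "incr"))  --> nil (the method lives in _M, not in the instance)
```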
Summary
Well, that concludes the main content for today. Tables and metatables are heavily used in OpenResty's lua-resty-* libraries and in OpenResty-based open source projects. I hope this lesson makes it easier for you to read and understand their source code.
There are other standard functions in Lua besides the table library, which we'll learn together in the next lesson.
Finally, I'd like to leave you with a thought-provoking question. Why does the lua-resty-mysql library mimic OO as a layer of wrapping? You are welcome to discuss this question in the comments section and to share this article with your colleagues and friends so we can communicate and progress together.