What is table and metatable in Lua?

API7.ai

October 11, 2022

OpenResty (NGINX + Lua)

Today we'll learn about the only data structure in LuaJIT: table.

Unlike other scripting languages with rich data structures, LuaJIT has only one data structure, the table. It does not distinguish between arrays, hashes, collections, and so on; a single table can mix all of them. Let's review one of the examples mentioned before.

local color = {first = "red", "blue", third = "green", "yellow"}
print(color["first"])                 --> output: red
print(color[1])                         --> output: blue
print(color["third"])                --> output: green
print(color[2])                         --> output: yellow
print(color[3])                         --> output: nil

In this example, the table color contains both an array part and a hash part, and the two can be accessed without interfering with each other. For example, you can use the ipairs function to iterate through only the array part of the table.

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
for k, v in ipairs(color) do
      print(k)
end
'
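To make the contrast concrete, here is a minimal sketch that runs ipairs and pairs over the same color table. The counters array_count and total_count are names introduced only for this illustration.

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

-- ipairs walks the array part only: indices 1, 2, ... up to the first nil.
local array_count = 0
for i, v in ipairs(color) do
    array_count = array_count + 1
    print(i, v)
end

-- pairs visits every key, array and hash alike, in no guaranteed order.
local total_count = 0
for k, v in pairs(color) do
    total_count = total_count + 1
end

print(array_count)  -- 2
print(total_count)  -- 4
```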

The table operations are so crucial that LuaJIT extends the standard Lua 5.1 table library, and OpenResty extends LuaJIT's table library even further. Let's take a look at each of these library functions.

The table library functions

Let's start with the standard table library functions. Lua 5.1 doesn't have many table library functions, so we can skim through them.

table.getn Get the number of elements

As we mentioned in the Standard Lua and LuaJIT chapter, getting the correct number of all the table elements is a big problem in LuaJIT.

For sequences, you can use table.getn or the unary operator # to return the correct number of elements. The following example returns 3, as we would expect.

$ resty -e 'local t = { 1, 2, 3 }
  print(table.getn(t)) '

The correct value cannot be returned for tables that are not sequential. In the second example, the value returned is 1.

$ resty -e 'local t = { 1, a = 2 }
  print(#t) '

Fortunately, such difficult-to-understand functions have been replaced by extensions to LuaJIT, which we will mention later. So in the OpenResty context, do not use the function table.getn and the unary operator # unless you know explicitly that you are getting the sequence length.

Also, table.getn and the unary operator # do not run in constant time: the length is not stored in the table, so each call has to search for the end of the sequence. This is another reason to avoid them in hot code if possible.

table.remove Removes the specified element

The second one is the table.remove function, which removes elements in the table based on subscripts, i.e., only the elements in the array part of the table can be removed. Let's look at the color example again.

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
  table.remove(color, 1)
  for k, v in pairs(color) do
      print(v)
  end'

This code removes blue, the element with subscript 1. You may ask, how do I delete an element from the hash part of the table? It's as simple as setting the value corresponding to the key to nil. In the following example, the green corresponding to third is deleted this way.

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
  color.third = nil
  for k, v in pairs(color) do
      print(v)
  end'
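The two deletion styles can be compared side by side in a short sketch. It relies only on standard behavior: table.remove returns the removed value and shifts the later array elements down by one. The variable removed is introduced just for this illustration.

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

-- Array part: table.remove returns the removed value and closes the gap.
local removed = table.remove(color, 1)
print(removed)      -- blue
print(color[1])     -- yellow (moved from subscript 2 to subscript 1)
print(color[2])     -- nil

-- Hash part: assigning nil deletes the key; the array part is untouched.
color.third = nil
print(color.third)  -- nil
print(color.first)  -- red
```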

table.concat Element splicing function

The third one is the table.concat element splicing function. It joins the elements of the table into a string in subscript order. Since it is again based on subscripts, it only works on the array part of the table. Let's use the color example again.

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
print(table.concat(color, ", "))'

After using the table.concat function, it outputs blue, yellow and the hash part is skipped.

In addition, this function can take the start and end subscripts of the range to concatenate, for example:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow", "orange"}
print(table.concat(color, ", ", 2, 3))'

This time the output is yellow, orange, skipping blue.

Please don't underestimate this seemingly useless function. It can have a surprisingly large effect when optimizing performance, and it is one of the main characters in our later performance optimization chapters.
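As a hedged preview of why table.concat matters, one common pattern is to collect string pieces in a table and join them once, instead of appending with .. inside a loop, which creates a new intermediate string on every iteration. The sketch below is an assumption about the kind of usage meant here, not the article's own benchmark; parts and n are illustrative names.

```lua
-- Collect the pieces in the array part of a table...
local parts = {}
local n = 0
for i = 1, 5 do
    n = n + 1
    parts[n] = tostring(i)
end

-- ...and join them with a single call at the end.
local joined = table.concat(parts, ",")
print(joined)  -- 1,2,3,4,5
```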

table.insert Inserts an element

Finally, let's look at the table.insert function. It inserts a new element in the specified subscript, which affects the array part of the table. To illustrate, again, using the color example.

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
table.insert(color, 1,  "orange")
print(color[1])
'

You can see that the first element of color becomes orange. Of course, you can leave the subscript unspecified, in which case the element is inserted at the end of the array by default.

I should note that table.insert is a very common operation, but its performance is not good. If you do not specify a subscript, LuaJIT has to call lj_tab_len on every insert to get the array length so that it can append at the end. As with table.getn, getting the table length is not a constant-time operation.

So we should try to avoid table.insert in hot code, for example in a loop like this:

local t = {}
for i = 1, 10000 do
     table.insert(t, i)
end
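One way to avoid the repeated length lookup, sketched here under the assumption that the caller can track the element count itself, is to keep the next subscript in a local variable and store by index. The counter n is a name introduced for this illustration.

```lua
local t = {}
local n = 0          -- we track the array length ourselves
for i = 1, 10000 do
    n = n + 1
    t[n] = i         -- plain indexed store, no length lookup needed
end

print(n)     -- 10000
print(t[n])  -- 10000
```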

LuaJIT's table extension functions

Next, let's look at LuaJIT's table extension functions. LuaJIT extends the standard Lua with two beneficial table functions for creating and emptying a table, which I'll describe below.

table.new(narray, nhash) Create a new table

The first one is the table.new(narray, nhash) function. Instead of growing itself when inserting elements, this function will pre-allocate the space size of the specified array and hash, which is what its two parameters narray and nhash mean. Self-growth is a costly operation that involves space allocation, resize and rehash, and should be avoided at all costs.

Note here that the documentation for table.new is not on the LuaJIT website but is deep in the GitHub project's extended documentation, so it's hard to find it even if you Google it, so not many engineers know about it.

Here's a simple example, and I'll show you how it works. First of all, this function is extended, so before you can use it, you need to require it.

local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
   t[i] = i
end

As you can see, this code creates a new table with space pre-allocated for 100 array elements and 0 hash elements. Of course, you can pre-allocate space for 100 array elements and 50 hash elements as needed, which is also legal.

local t = new_tab(100, 50)

If you go beyond the preset size, the table still works as usual, but it has to grow again, so performance degrades and the point of using table.new is lost.

In the following example, we have a preset size of 100, but we are using 200.

local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 200 do
   t[i] = i
end

You need to preset the size of the array and hash space in table.new according to the actual scenario so you can find a balance between performance and memory usage.
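Because table.new is a LuaJIT extension, code that may also run on a plain Lua interpreter often guards the require with pcall. The following is a minimal sketch of that idiom; the fallback function is an assumption for illustration and simply ignores the size hints.

```lua
-- Try to load the LuaJIT extension; fall back to a plain constructor.
local ok, new_tab = pcall(require, "table.new")
if not ok or type(new_tab) ~= "function" then
    -- Fallback: no pre-allocation, the table grows on demand.
    new_tab = function(narray, nhash) return {} end
end

local t = new_tab(100, 0)
for i = 1, 100 do
    t[i] = i
end
print(t[100])  -- 100
```

With the fallback, the code behaves the same everywhere; only the pre-allocation benefit is lost outside LuaJIT.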

table.clear() Clears the table

The second one is the clear function table.clear(). It clears all the data in a table but does not free the memory occupied by the array and hash parts. Therefore, it is beneficial when recycling Lua tables to avoid the overhead of repeatedly creating and destroying tables.

$ resty -e 'local clear_tab = require "table.clear"
local color = {first = "red", "blue", third = "green", "yellow"}
clear_tab(color)
for k, v in pairs(color) do
     print(k)
end'

However, there are not many scenarios where this function can be used, and in most cases, we should leave this task to the LuaJIT GC.
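For the cases where recycling does make sense, a sketch of the pattern might look like this: one scratch table is cleared and refilled on each round instead of being recreated. The names buf and sums are hypothetical, and the pure-Lua fallback is an assumption so that the sketch also runs outside LuaJIT.

```lua
local ok, clear_tab = pcall(require, "table.clear")
if not ok or type(clear_tab) ~= "function" then
    -- Fallback for interpreters without table.clear: erase every key by hand.
    clear_tab = function(t)
        for k in pairs(t) do
            t[k] = nil
        end
    end
end

local buf = {}   -- scratch table reused across rounds
local sums = {}
for round = 1, 3 do
    clear_tab(buf)               -- empty the table but keep the table itself
    for i = 1, round do
        buf[i] = i * round
    end
    local sum = 0
    for _, v in ipairs(buf) do
        sum = sum + v
    end
    sums[round] = sum
end

print(sums[1], sums[2], sums[3])
```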

OpenResty's table extension functions

As I mentioned at the beginning, OpenResty maintains its own LuaJIT branch, which also extends table with several new APIs: table.isempty, table.isarray, table.nkeys and table.clone.

Before using these new APIs, please check your version of OpenResty, as most of them are only available in OpenResty 1.15.8.1 and later. This is because OpenResty had not had a new release for about a year before version 1.15.8.1, and these APIs were added in that interval.

I'll only use table.nkeys as an example here. The other three APIs are easy to understand from their names, so look through the GitHub documentation, and you'll understand. I have to say that OpenResty's documentation is of very high quality: it includes code examples, whether a function can be JIT-compiled, what to watch out for, and so on. It is several orders of magnitude better than Lua's and LuaJIT's documentation.

Okay, back to the table.nkeys function. Its naming may confuse you, but it is a function that gets the length of the table and returns the number of elements of the table, including the elements of the array and the hash part. Therefore, we can use it instead of table.getn, for example, as follows.

local nkeys = require "table.nkeys"
print(nkeys({}))  -- 0
print(nkeys({ "a", nil, "b" }))  -- 2
print(nkeys({ dog = 3, cat = 4, bird = nil }))  -- 2
print(nkeys({ "a", dog = 3, cat = 4 }))  -- 3
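To see what table.nkeys counts, it can help to compare it with a pure-Lua equivalent. The sketch below uses the real function when it is available and otherwise falls back to counting with pairs; the fallback is an illustration of the semantics, not OpenResty's implementation.

```lua
local ok, nkeys = pcall(require, "table.nkeys")
if not ok or type(nkeys) ~= "function" then
    -- Fallback: count every non-nil entry in both the array and hash parts.
    nkeys = function(t)
        local n = 0
        for _ in pairs(t) do
            n = n + 1
        end
        return n
    end
end

local with_hole = nkeys({ "a", nil, "b" })
local mixed = nkeys({ "a", dog = 3, cat = 4 })
print(with_hole)  -- 2
print(mixed)      -- 3
```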

Metatable

After talking about the table function, let's look at the metatable derived from table. The metatable is a unique concept in Lua, and is widely used in real-world projects. It is not an exaggeration to say that you can find it in almost any lua-resty-* library.

A metatable behaves like operator overloading; for example, we can overload __add to compute the concatenation of two Lua arrays, or __tostring to define how a table is converted to a string.

Lua, on the other hand, provides two functions for handling metatable.

  • The first is setmetatable(table, metatable), which sets up a metatable for a table.
  • The second is getmetatable(table), which gets the table's metatable.
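To make the operator-overloading comparison concrete, here is a small sketch of the __add case mentioned above: adding two arrays returns their concatenation. The metatable name array_mt and the variables x, y and z are introduced only for this illustration.

```lua
local array_mt = {}

-- __add: return a new array holding the elements of a followed by those of b.
array_mt.__add = function(a, b)
    local result = setmetatable({}, array_mt)
    local n = 0
    for _, v in ipairs(a) do
        n = n + 1
        result[n] = v
    end
    for _, v in ipairs(b) do
        n = n + 1
        result[n] = v
    end
    return result
end

-- __tostring: render the array as a comma-separated list.
array_mt.__tostring = function(t)
    return table.concat(t, ", ")
end

local x = setmetatable({1, 2}, array_mt)
local y = setmetatable({3, 4}, array_mt)
local z = x + y
print(tostring(z))  -- 1, 2, 3, 4
```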

After all this, you may be more interested in what it does, so let's look at what metatable is specifically used for. Here is a piece of code from an actual project.

$ resty -e ' local version = {
  major = 1,
  minor = 1,
  patch = 1
  }
version = setmetatable(version, {
    __tostring = function(t)
      return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
  })
  print(tostring(version))
'

We first define a table named version, and as you can see, the purpose of this code is to print out the version number stored in version. However, we can't print version directly. If you try the following call on a version table without the metatable, you will see that it only outputs the address of the table.

print(tostring(version))

So, we need to customize the string conversion function for this table, which is __tostring, and this is where the metatable comes in. We use setmetatable to reset the __tostring method of the table version to print out the version number: 1.1.1.

In addition to __tostring, we often override the following two metamethods in the metatable in real projects.

One of them is __index. When we look up an element in a table, we first look it up directly in the table, and if we don't find it, we go on to the __index of the metatable.

We remove the patch from the version table in the following example.

$ resty -e ' local version = {
  major = 1,
  minor = 1
  }
version = setmetatable(version, {
     __index = function(t, key)
         if key == "patch" then
             return 2
         end
     end,
     __tostring = function(t)
      return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
  })
  print(tostring(version))
'

In this case, t.patch is not found in the table itself, so the lookup falls through to the __index function, and the code prints 1.1.2.

__index can be not only a function but also a table, and if you try to run the following code, you'll see that they achieve the same result.

$ resty -e ' local version = {
  major = 1,
  minor = 1
  }
version = setmetatable(version, {
     __index = {patch = 2},
     __tostring = function(t)
      return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
  })
  print(tostring(version))
'
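One detail worth noting is that __index only supplies a value at lookup time; it does not copy the field into the table. A minimal sketch using the standard rawget function, which bypasses metamethods, shows the difference.

```lua
local version = setmetatable({ major = 1, minor = 1 }, {
    __index = { patch = 2 },
})

print(version.patch)             -- 2, found through __index
print(rawget(version, "patch"))  -- nil, the key is not stored in version itself

-- A plain assignment stores the key in version and shadows the fallback.
version.patch = 5
print(version.patch)                        -- 5
print(getmetatable(version).__index.patch)  -- 2, the fallback is unchanged
```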

Another metamethod is __call. It is similar to a functor that allows a table to be called.

Let's build on the code above that prints the version number and see how to call a table.

$ resty -e '
local version = {
  major = 1,
  minor = 1,
  patch = 1
  }
local function print_version(t)
     print(string.format("%d.%d.%d", t.major, t.minor, t.patch))
end
version = setmetatable(version,
     {__call = print_version})
  version()
'

In this code, we use setmetatable to add a metatable to the table version, and the __call metamethod inside it points to the function print_version. So, when we call version as if it were a function, print_version is executed with version as its first argument.
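The __call metamethod also receives any arguments passed in the call, after the table itself. The following sketch illustrates this with a hypothetical prefix argument that is not part of the original example; it returns the string instead of printing it.

```lua
local version = setmetatable({ major = 1, minor = 1, patch = 1 }, {
    -- self is the table being called; prefix is whatever the caller passes.
    __call = function(self, prefix)
        return string.format("%s%d.%d.%d",
            prefix or "", self.major, self.minor, self.patch)
    end,
})

local plain = version()
local tagged = version("v")
print(plain)   -- 1.1.1
print(tagged)  -- v1.1.1
```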

And getmetatable is the operation paired with setmetatable to get the metatable that has been set, like the following code.

$ resty -e ' local version = {
  major = 1,
  minor = 1
  }
version = setmetatable(version, {
     __index = {patch = 2},
     __tostring = function(t)
      return string.format("%d.%d.%d", t.major, t.minor, t.patch)
    end
  })
  print(getmetatable(version).__index.patch)
'

In addition to these three metamethods we talked about today, there are some infrequently used metamethods that you can consult the documentation to learn more about when you encounter them.

Object-oriented

Finally, let's talk about object orientation. As you may know, Lua is not an object-oriented language, but we can use metatables to implement OO.

Let's look at a practical example. lua-resty-mysql is the official MySQL client of OpenResty, and it uses metatables to simulate classes and class methods, which are used in the following way.

    $ resty -e 'local mysql = require "resty.mysql" -- first, load the lua-resty library
    local db, err = mysql:new() -- create a new instance of the class
    db:set_timeout(1000) -- call a method of the class
    '

You can execute the above code directly with the resty command line. These lines of code are easy to understand; the only thing that might puzzle you is the following question.

When calling a class method, why is it a colon instead of a dot?

Actually, both colons and dots work here: db:set_timeout(1000) and db.set_timeout(db, 1000) are exactly equivalent. The colon is syntactic sugar in Lua that lets you omit the first argument, self, of a function.
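The equivalence can be checked with a small self-contained sketch. The account table and its methods are hypothetical names used only for this illustration.

```lua
local account = { balance = 0 }

-- Dot definition: self is written out explicitly.
function account.deposit(self, amount)
    self.balance = self.balance + amount
    return self.balance
end

-- Colon definition: self is declared implicitly.
function account:withdraw(amount)
    self.balance = self.balance - amount
    return self.balance
end

account:deposit(100)           -- colon call passes account as self
account.deposit(account, 50)   -- dot call passes self by hand
account.withdraw(account, 30)  -- a colon-defined function called with a dot
print(account.balance)         -- 120
```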

As the saying goes, there are no secrets in front of the source code, so let's look at the concrete implementation behind the above lines of code so you can better understand how to model object orientation with metatables.

local tcp = ngx.socket.tcp -- cosocket constructor used by new() below
local _M = { _VERSION = '0.21' } -- Using the table to simulate a class
local mt = { __index = _M } -- mt is short for metatable, __index refers to the class itself
-- Constructor of the class
function _M.new(self)
    local sock, err = tcp()
    if not sock then
        return nil, err
    end
    return setmetatable({ sock = sock }, mt) -- an instance simulated with a table and a metatable
end

-- Member functions of a class
function _M.set_timeout(self, timeout) -- Use the self argument to get an instance of the class you want to operate on
  local sock = self.sock
  if not sock then
      return nil, "not initialized"
  end

  return sock:settimeout(timeout)
end

The table _M simulates a class. It is initialized with a single member variable, _VERSION, and then member functions such as _M.set_timeout are defined on it. In the constructor _M.new(self), we return a table whose metatable is mt, and the __index metamethod of mt points to _M, so the returned table emulates an instance of the class _M.
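The same pattern can be tried without a socket. The sketch below is a simplified stand-in that follows the structure described above, not the lua-resty-mysql source: the name and timeout fields and the version string are assumptions made for illustration.

```lua
local _M = { _VERSION = "0.01" }  -- the table that plays the role of the class
local mt = { __index = _M }       -- instances fall back to _M for methods

-- Constructor: each instance is a fresh table sharing the same metatable.
function _M.new(self, name)
    return setmetatable({ name = name, timeout = nil }, mt)
end

-- Member function: self is the instance the method was called on.
function _M.set_timeout(self, timeout)
    self.timeout = timeout
    return true
end

local a = _M:new("a")
local b = _M:new("b")
a:set_timeout(1000)

print(a.timeout)                       -- 1000
print(b.timeout)                       -- nil, instances keep separate state
print(rawget(a, "set_timeout"))        -- nil, the method lives in _M
print(a.set_timeout == b.set_timeout)  -- true, both resolve to _M.set_timeout
```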

Summary

Well, that concludes the main content for today. Tables and metatables are heavily used in OpenResty's lua-resty-* libraries and in OpenResty-based open source projects. I hope this lesson will make it easier for you to read and understand their source code.

There are other standard functions in Lua besides the table, which we'll learn together in the next lesson.

Finally, I'd like to leave you with a thought-provoking question. Why does the lua-resty-mysql library mimic OO as a layer of wrapping? You are welcome to discuss this question in the comments section and to share this article with your colleagues and friends so we can communicate and progress together.