Advantages and Disadvantages of `string` in OpenResty

In the last article, we got familiar with the common blocking functions in OpenResty, which are often misused by beginners. Starting from this article, we will get into the core of performance optimization. It covers a number of techniques that can quickly improve the performance of OpenResty code, so don't take it lightly.

In this process, we need to write more test code to experience how to use these optimization techniques and verify their effectiveness so we can make good use of them.

Behind the Scenes of Performance Optimization Tips

Optimization techniques all belong to the "practice" side, so before we get to them, let's talk about the "theory" of optimization.

Performance optimization methods change with each iteration of LuaJIT and OpenResty. Some methods may be made unnecessary by optimizations in the underlying technology and no longer need to be mastered; at the same time, new techniques will appear. Therefore, the most important thing is to master the ideas behind these techniques, which do not change.

Let's take a look at some of the critical ideas about performance in OpenResty programming.

Theory 1: Processing requests should be short, simple, and fast

OpenResty is a web server, so it often handles 1,000+, 10,000+, or even 100,000+ client requests simultaneously. Therefore, to achieve the highest overall performance, we must ensure that individual requests are processed quickly and that the resources they use, such as memory, are reclaimed promptly.

"Short" means that the request life cycle should be short, so that resources are not held for a long time without being released; even for long connections, a threshold on time or number of requests should be set so that resources are released regularly.
"Simple" means doing only one thing in an API. Break up complex business logic into multiple APIs and keep the code simple.
Finally, "fast" means not blocking the main thread and not running CPU-intensive operations in it. If you have to do so, don't forget to combine them with the other methods we introduced in the last article.

This architectural consideration applies not only to OpenResty but also to other development languages and platforms, so I hope you can understand it and think about it carefully.

Theory 2: Avoid generating intermediate data

Avoiding useless intermediate data is arguably the most important optimization idea in OpenResty programming. Let's look at a small example to explain what useless intermediate data is.

$ resty -e 'local s= "hello"
s = s .. " world"
s = s .. "!"
print(s)
'

In this code snippet, we did several splicing operations on the s variable to get the result hello world!. But only the final hello world! state of s is useful. The initial value of s and the intermediate assignments are all intermediate data that should be generated as little as possible.

The reason is that temporary data costs time to allocate and initialize, and it adds work for the garbage collector (GC). Do not underestimate these costs; if this happens in hot code such as loops, performance will be noticeably degraded. I will also explain this later with a string example.
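As a minimal sketch of the idea, the same result can be built in a single expression so that the partial values are never stored and later discarded. The variable name greeting is only illustrative, and the sketch assumes all the pieces are available at the point of concatenation. In stock Lua 5.1 and LuaJIT, a chained `..` expression should be compiled as one multi-operand concatenation, which is what makes this form cheaper.

```lua
-- Step-by-step version: each `..` produces a new string, and the
-- previous value of `s` becomes garbage for the GC to collect.
local s = "hello"
s = s .. " world"
s = s .. "!"

-- Single-expression version: the three pieces are joined in one
-- expression, with no named intermediate results.
local greeting = "hello" .. " world" .. "!"

-- Both forms produce the same final string.
print(s == greeting)
```

The snippet uses no ngx APIs, so it can be saved to a file and run with either luajit or resty.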

`string`s are immutable

Now, back to the subject of this article, string. Here, I'm highlighting the fact that strings are immutable in Lua.

Of course, this doesn't mean that strings can't be spliced, modified, etc., but when we modify a string, we don't change the original string but create a new string object and change the reference to the string. So naturally, if the original string does not have any other references, it will be recovered by Lua's GC (garbage collection).

The obvious benefit of this design is that it saves memory: only one copy of any given string is kept in memory, and different variables holding that string point to the same memory address.

The disadvantage of this design shows up when strings are created and reclaimed. Every time a new string is produced, LuaJIT has to call lj_str_new to check whether the string already exists and, if it does not, create it. If you do this very often, it has a massive impact on performance.

Let's look at a concrete example of string splicing in a loop, a pattern found in many OpenResty open-source projects.
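The "new object, new reference" behavior described above can be observed from plain Lua. The following is only an illustrative sketch with made-up variable names; it shows the visible semantics, not LuaJIT's internal string table.

```lua
local a = "hello"
local b = a            -- b refers to the same string as a
b = b .. " world"      -- builds a new string and rebinds b; a is untouched

print(a)               -- hello
print(b)               -- hello world

-- Library functions behave the same way: string.upper returns a new
-- string and leaves its argument unchanged.
local c = string.upper(a)
print(c)               -- HELLO
print(a)               -- hello
```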

$ resty -e 'local begin = ngx.now()
local s = ""
-- `for` loop, using `..` to perform string splicing
for i = 1, 100000 do
    s = s .. "a"
end
ngx.update_time()
print(ngx.now() - begin)
'

This sample code performs 100,000 string splices on the s variable and prints the elapsed time. Although the example is a bit extreme, it gives a good idea of the difference before and after performance optimization. Without optimization, this code runs for 0.4 seconds on my laptop, which is relatively slow. So how should we optimize it?

The answer was given in previous articles: use a table as a buffer, so that all the temporary intermediate strings disappear and only the original data and the final result remain. Let's look at the concrete code implementation.

$ resty -e 'local begin = ngx.now()
local t = {}
-- for loop that uses an array to hold the string, counting the length of the array each time
for i = 1, 100000 do
    t[#t + 1] = "a"
end
-- Join the array elements into one string using table.concat
local s = table.concat(t, "")
ngx.update_time()
print(ngx.now() - begin)
'

We can see that this code saves each string in turn in a table, and the index is determined by #t + 1, that is, the current length of the table plus 1. Finally, the table.concat function concatenates all the array elements. This naturally skips all the temporary strings and avoids 100,000 calls to lj_str_new and the GC work that follows.

That was our code analysis, but how well does the optimization work? The optimized code takes only 0.007 seconds, which means a performance improvement of more than 50 times. In an actual project, the improvement might be even more pronounced because, in this example, we only added one character a at a time.

What would the performance difference be if each new string were 10 a characters long instead of one?

Is 0.007 seconds good enough for us to stop optimizing? No, the code can still be improved. Let's modify one more line of code and see the result.
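One way to explore a question like this is to put both approaches behind a small timing harness and vary the length of the appended piece. The sketch below is only an assumption about how you might measure it. The helper names time_it, concat_with_operator, and concat_with_table are made up for illustration, and it uses os.clock instead of ngx.now so that it also runs outside OpenResty. The loop count is reduced to 10,000 to keep the run short; raise n to match the article's 100,000.

```lua
-- Hypothetical helper: measure the CPU time used by fn(...) and return
-- the elapsed seconds together with fn's result.
local function time_it(fn, ...)
    local start = os.clock()
    local result = fn(...)
    return os.clock() - start, result
end

-- Splice with `..`: every iteration creates a new, longer string.
local function concat_with_operator(n, piece)
    local s = ""
    for i = 1, n do
        s = s .. piece
    end
    return s
end

-- Collect the pieces in a table and join them once at the end.
local function concat_with_table(n, piece)
    local t = {}
    for i = 1, n do
        t[#t + 1] = piece
    end
    return table.concat(t, "")
end

local n = 10000
for _, piece in ipairs({ "a", string.rep("a", 10) }) do
    local t1, s1 = time_it(concat_with_operator, n, piece)
    local t2, s2 = time_it(concat_with_table, n, piece)
    assert(s1 == s2)  -- both approaches must build the same string
    -- Columns: piece length, seconds with `..`, seconds with table.concat
    print(#piece, t1, t2)
end
```

The absolute numbers depend on the machine and the LuaJIT build, so treat them as relative indicators only.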

$ resty -e 'local begin = ngx.now()
local t = {}
-- for loop, using an array to hold the string, maintaining the length of the array itself
for i = 1, 100000 do
    t[i] = "a"
end
local s =  table.concat(t, "")
ngx.update_time()
print(ngx.now() - begin)
'

This time, we changed t[#t + 1] = "a" to t[i] = "a", and with just one line of code we avoid 100,000 operations to get the length of the array. Remember the operation to get the length of an array that we mentioned in the table section earlier? It is not a constant-time field lookup: LuaJIT has to search for the end of the array, typically with a binary search that costs O(log n) per call, so it is a relatively expensive operation inside a hot loop. So here we simply maintain the array index ourselves to bypass the length operation. As the saying goes, if you can't afford to mess with it, you can avoid it.

Of course, this is a simpler way to write it. The following code illustrates more clearly how to maintain the index of an array by ourselves.

$ resty -e 'local begin = ngx.now()
local t = {}
local index = 1
for i = 1, 100000 do
    t[index] = "a"
    index = index + 1
end
local s = table.concat(t, "")
ngx.update_time()
print(ngx.now() - begin)
'
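If the same bookkeeping is needed in several places, it can be wrapped in a small helper. The following is a hypothetical sketch rather than anything provided by OpenResty: new_buffer, append, and tostring are names invented for this example. The extra function calls have a cost of their own, so in a very hot loop the inlined form shown above is likely still the better choice.

```lua
-- Hypothetical helper: a tiny string buffer that tracks its own length,
-- so the `#` operator is never evaluated while appending.
local function new_buffer()
    local parts, n = {}, 0
    local buf = {}

    function buf.append(piece)
        n = n + 1
        parts[n] = piece
    end

    function buf.tostring()
        -- Join only the slots we filled, from index 1 to n.
        return table.concat(parts, "", 1, n)
    end

    return buf
end

local buf = new_buffer()
buf.append("hello")
buf.append(" world")
buf.append("!")
print(buf.tostring())   -- hello world!
```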

Reduce other temporary `string`s

The mistake we just talked about, temporary strings caused by string splicing, is easy to spot. With the sample code above as a reminder, I believe we will not make similar mistakes again. However, OpenResty code also generates some better-hidden temporary strings, which are much harder to detect. For example, the string handling function discussed below is used very often. Can you imagine that it also generates temporary strings?

As we know, the string.sub function extracts a specified part of a string. As we mentioned earlier, strings in Lua are immutable, so extracting a substring creates a new string, which involves lj_str_new and subsequent GC operations.

$ resty -e 'print(string.sub("abcd", 1, 1))'

The above code fetches the first character of the string and prints it. It inevitably generates a temporary string. Is there a better way to achieve the same effect?

$ resty -e 'print(string.char(string.byte("abcd")))'

Naturally so. In this code, we first use string.byte to get the numeric code of the first character and then use string.char to convert the number back to the corresponding character. The string.byte call returns a number rather than a new string object, so the inspection step itself creates no temporary string; string.char is used here only so that the result can be printed as a character. Therefore, string.byte is the most efficient choice for scanning and analyzing strings.
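To make the scanning use case concrete, here is a minimal sketch that inspects strings by comparing character codes instead of cutting out substrings. The helpers starts_with_slash and count_char are hypothetical names introduced only for this illustration.

```lua
local byte = string.byte

-- Numeric code of "/" (47 in ASCII), computed once up front.
local SLASH = byte("/")

-- Hypothetical helper: does `path` start with "/"?
-- Compares numbers, so no one-character substring is created.
local function starts_with_slash(path)
    return byte(path, 1) == SLASH
end

-- Hypothetical helper: count how often the single character `ch`
-- occurs in `s`, again comparing numeric codes only.
local function count_char(s, ch)
    local code, count = byte(ch), 0
    for i = 1, #s do
        if byte(s, i) == code then
            count = count + 1
        end
    end
    return count
end

print(starts_with_slash("/index.html"))  -- true
print(count_char("/a/b/c", "/"))         -- 3
```

The equivalent check with string.sub(path, 1, 1) == "/" gives the same answer but has to produce a one-character string for every call.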

Leverage SDK support for `table` type

After learning how to reduce the temporary string, are you eager to try it? Then, we can take the result of the sample code above and output it to the client as the response body's content. At this point, you can pause and try to write this code yourself first.

$ resty -e 'local t = {}
local index = 1
for i = 1, 100000 do
    t[index] = "a"
    index = index + 1
end
local response = table.concat(t, "")
ngx.say(response)
'

If you can write this code, you're already ahead of most OpenResty developers. But there is a simpler way: OpenResty's Lua API already anticipates the use of tables for string splicing. APIs that may be handed a lot of strings, such as ngx.say, ngx.print, and cosocket:send, accept not only a string as a parameter but also an array-like table. (ngx.log is a little different: it takes a variable number of arguments and concatenates them itself.)

$ resty -e 'local t = {}
local index = 1
for i = 1, 100000 do
    t[index] = "a"
    index = index + 1
end
ngx.say(t)
'

In this last code snippet, we omit the local response = table.concat(t, ""), the string splicing step, and pass the table directly to ngx.say. This shifts the string splicing task from the Lua level to the C level, avoiding another string lookup, generation, and GC. For long strings, this is another significant performance gain.

Summary

After reading this article, we can see that a lot of OpenResty's performance optimization deals with various details. Therefore, we need to know LuaJIT and OpenResty's Lua API well to achieve optimal performance. This also reminds us that if we have forgotten the previous content, we must review and consolidate it in time.

Finally, here is a problem to think about: write the strings hello, world, and ! to the error log. Can you write sample code that does this without string splicing?

Also, don't forget the other question in the text. What would the performance difference be in the following code if each new string were 10 a characters long instead of one?

$ resty -e 'local begin = ngx.now()
local t = {}
for i = 1, 100000 do
    t[#t + 1] = "a"
end
local s =  table.concat(t, "")
ngx.update_time()
print(ngx.now() - begin)
'

You are also welcome to share this article with your friends to learn and communicate.