Keys to High Performance: `shared dict` and `lru` Cache

API7.ai

December 22, 2022

OpenResty (NGINX + Lua)

In the previous article, I introduced OpenResty's optimization techniques and performance-tuning tools, covering strings, tables, the Lua API, LuaJIT, SystemTap, flame graphs, and so on.

These are the cornerstones of system optimization, and you need to master them well. However, knowing them alone is not enough to handle real business scenarios. In a more complex business scenario, maintaining high performance is systematic work, not just code- and gateway-level optimization. It involves the database, network, protocol, cache, disk, and many other aspects, which is exactly why architects exist.

In today's article, let's take a look at a component that plays a critical role in performance optimization, the cache, and see how it is used and optimized in OpenResty.

Cache

At the hardware level, most computer components use caches to improve speed. For example, CPUs have multi-level caches, and RAID cards have read and write caches. At the software level, the databases we use are a very good example of cache design: there are caches in SQL statement optimization, index design, and disk reads and writes.

Here, I suggest you learn about MySQL's various caching mechanisms before designing your own cache. The material I recommend is the excellent book High Performance MySQL: Optimization, Backups, and Replication. When I was in charge of databases many years ago, I benefited a lot from this book, and many of my later optimizations also borrowed from MySQL's design.

Back to caching, we know that a caching system in a production environment needs to find the best solution based on its business scenarios and system bottlenecks. It's an art of balance.

In general, caching has two principles.

  • One is that the closer the cache is to the user's request, the better. For example, don't send an HTTP request at all if a local cache can answer it, don't send the request to the origin site if the CDN can serve it, and don't hit the database if OpenResty's cache can serve it.
  • The other is to try to solve the problem within the current process and on the local machine. Crossing processes, machines, or even data centers adds network overhead to the cache, which becomes very noticeable in high-concurrency scenarios.

In OpenResty, the design and use of caches also follow these two principles. There are two cache components in OpenResty: the shared dict cache and the lru cache. The former can only cache string objects, and there is only one copy of the cached data, which every worker can access, so it is often used for data communication between workers. The latter can cache any Lua object, but each copy can only be accessed within a single worker process, so there are as many copies of the cached data as there are workers.

The following table illustrates the differences between shared dict and lru cache:

| Cache component | Access scope | Cached data type | Data structure | Can get stale data | Number of APIs | Memory usage |
| --------------- | ------------ | ---------------- | -------------- | ------------------ | -------------- | ------------ |
| shared dict | between multiple workers | string objects | dict, queue | yes | 20+ | one copy of the data |
| lru cache | within a single worker | all Lua objects | dict | no | 4 | N copies of the data (N = number of workers) |

Neither shared dict nor lru cache is inherently better or worse; they should be chosen, and combined, according to your scenario.

  • If you don't need to share data between workers, the lru cache is the first choice: it can cache complex data types such as arrays and functions, and it has the highest performance.
  • If you do need to share data between workers, you can add a shared dict cache on top of the lru cache to form a two-level caching architecture, as sketched below.
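To make the two-level idea concrete, here is a minimal sketch of such a lookup. The zone name, the TTLs, and the fetch_from_source function are hypothetical and only for illustration; a real project would also need error handling and concurrency control (for example, locks), which this sketch leaves out:

resty --shdict='dogs 1m' -e 'local lrucache = require "resty.lrucache"
local lru = lrucache.new(200)
local shdict = ngx.shared.dogs

-- hypothetical callback standing in for the real data source
local function fetch_from_source(key)
    return "value-of-" .. key
end

local function get(key)
    -- level 1: the per-worker lru cache, stores Lua objects, no serialization
    local val = lru:get(key)
    if val then
        return val
    end

    -- level 2: the shared dict, visible to all workers, strings only
    val = shdict:get(key)
    if not val then
        val = fetch_from_source(key)
        shdict:set(key, val, 5)
    end
    lru:set(key, val, 5)
    return val
end

print(get("Tom"))'

In this sketch, the lru cache is checked first because it lives inside the worker and stores Lua objects directly; the shared dict acts as the second level shared by all workers.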

Next, let's take a look at these two caching methods in detail.

Shared dict cache

In the earlier article on Lua, we already introduced shared dict in detail; here is a brief review of its usage:

$ resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                               dict:set("Tom", 56)
                               print(dict:get("Tom"))'

You need to declare the shared memory zone dogs in the NGINX configuration file in advance before it can be used in Lua code. If you find during use that the space allocated to dogs is not enough, you have to modify the NGINX configuration file and then reload NGINX for the change to take effect, because a shared dict cannot be expanded or shrunk at runtime.
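For reference, here is a minimal sketch of what that declaration looks like in an OpenResty nginx.conf; the zone size, port, and location path are only for illustration:

worker_processes 1;
events {}

http {
    # a fixed-size shared memory zone named dogs, allocated at startup;
    # changing its size means editing this line and reloading NGINX
    lua_shared_dict dogs 10m;

    server {
        listen 8080;

        location /t {
            content_by_lua_block {
                local dict = ngx.shared.dogs
                dict:set("Tom", 56)
                ngx.say(dict:get("Tom"))
            }
        }
    }
}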

Next, let's focus on several performance-related issues in the shared dict cache.

Serialization of cached data

The first problem is the serialization of cached data. Since only string objects can be cached in the shared dict, if you want to cache an array, you have to serialize it on every set and deserialize it on every get:

resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                        dict:set("Tom", require("cjson").encode({a=111}))
                        print(require("cjson").decode(dict:get("Tom")).a)'

However, such serialization and deserialization operations are very CPU-intensive. If they happen on every request, you can clearly see their cost on the flame graph.

So, how can we avoid this cost with shared dictionaries? There is no silver bullet here: either avoid putting arrays into the shared dictionary at the business level, or splice the JSON string together manually yourself. Of course, the latter brings its own cost of string concatenation and may hide more bugs.

Most serialization can be eliminated at the business level. You can break up the contents of the array and store them in the shared dictionary as individual strings, as sketched below. If that doesn't work, you can also cache the array in the lru cache instead, trading memory space for the convenience and performance of the program.
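As a minimal sketch of the first idea, each field of the table can be stored under its own key so that no encode or decode is needed on the hot path; the key naming scheme here is just an illustration:

resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
-- instead of serializing a whole table with cjson.encode({age = 56, color = "black"}),
-- store each field as a separate scalar value under its own key
dict:set("Tom:age", 56)
dict:set("Tom:color", "black")

local age = dict:get("Tom:age")
local color = dict:get("Tom:color")
print(age, " ", color)'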

In addition, the key in the cache should also be as short and meaningful as possible, saving space and facilitating subsequent debugging.

Stale data

The shared dict also has a get_stale method for reading data. Compared with the get method, it can still return the value after the key has expired, along with an additional return value indicating whether the data is stale:

resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs
                            dict:set("Tom", 56, 0.01)
                            ngx.sleep(0.02)
                            local val, flags, stale = dict:get_stale("Tom")
                            print(val)'

In the above example, the data is cached in the shared dict for only 0.01 seconds, so 0.02 seconds after the set it has already expired. At this point, the data can no longer be obtained through the get interface, but the expired data may still be obtained through get_stale. I say "may" because the space occupied by expired data has a certain chance of being reclaimed and reused for other data; this is how the LRU algorithm works.

Seeing this, you may wonder: what is the use of fetching expired data? Don't forget that what we store in the shared dict is cached data; even if the cached data has expired, it doesn't mean that the source data has necessarily been updated.

For example, suppose the data source is MySQL. After we fetch the data from MySQL, we set a five-second timeout in the shared dict. Then, when the data expires, we have two options:

  • When the data does not exist, go to MySQL to query again, and put the result in the cache.
  • Determine whether the MySQL data has changed. If there is no change, read the expired data in the cache, modify its expiration time, and make it continue to take effect.

The latter is the more optimized solution, since it interacts with MySQL as little as possible and lets all client requests get their data from the fastest cache.
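Here is a minimal sketch of that second option built on get_stale; the data_unchanged and query_mysql functions are hypothetical stand-ins for the real MySQL access:

resty --shdict='dogs 1m' -e 'local dict = ngx.shared.dogs

-- hypothetical stand-ins for the real MySQL access
local function data_unchanged() return true end
local function query_mysql() return 56 end

dict:set("Tom", 56, 0.01)
ngx.sleep(0.02)

local val = dict:get("Tom")
if not val then
    local stale_val, _, is_stale = dict:get_stale("Tom")
    if stale_val and is_stale and data_unchanged() then
        -- the source has not changed: renew the expired entry instead of re-querying MySQL
        dict:set("Tom", stale_val, 5)
        val = stale_val
    else
        -- the source has changed or nothing is left in the cache: go back to MySQL
        val = query_mysql()
        dict:set("Tom", val, 5)
    end
end
print(val)'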

At this time, how to judge whether the data in the data source has changed has become a problem that we need to consider and solve. Next, let's take lru cache as an example to see how an actual project solves this problem.

lru Cache

There are only 5 interfaces for lru cache: new, set, get, delete, and flush_all. Only the get interface is related to the above problem. Let us first understand how this interface is used:

resty -e 'local lrucache = require "resty.lrucache"
local cache, err = lrucache.new(200)
cache:set("dog", 32, 0.01)
ngx.sleep(0.02)
local data, stale_data = cache:get("dog")
print(stale_data)'

You can see that in the lru cache, the second return value of the get interface is the stale data itself, instead of being split into two different APIs, get and get_stale, as in the shared dict. This kind of interface encapsulation is friendlier for code that wants to use expired data.

In actual projects, we generally recommend using version numbers to distinguish different data, so that the version number changes whenever the data changes. For example, the modified index in etcd can be used as a version number to mark whether the data has changed. With the concept of a version number, we can build a simple secondary encapsulation of the lru cache. For example, look at the following pseudo-code, taken from lrucache:

-- the function name and the lru_obj / item_ttl upvalues are illustrative
local function fetch_cached_obj(key, version, create_obj_fun, ...)
    local obj, stale_obj = lru_obj:get(key)
    -- If the data has not expired and the version has not changed, return the cached data directly
    if obj and obj._cache_ver == version then
        return obj
    end

    -- If the data has expired but can still be obtained, and the version has not changed,
    -- refresh its TTL and return the expired data in the cache directly
    if stale_obj and stale_obj._cache_ver == version then
        lru_obj:set(key, stale_obj, item_ttl)
        return stale_obj
    end

    -- If no usable data is found, or the version number has changed, get the data from the data source
    local obj, err = create_obj_fun(...)
    if not obj then
        return nil, err
    end

    obj._cache_ver = version
    lru_obj:set(key, obj, item_ttl)
    return obj
end

From this code, you can see that by introducing the concept of the version number, we fully use expired data to reduce the pressure on the data source and achieve optimal performance when the version number does not change.

In addition, the above solution contains another significant optimization: we separate the key from the version number and store the version number as an attribute of the value.

We know that the more conventional approach is to write the version number into the key, for example key_1234. This practice is very common, but in an OpenResty environment it is a waste. Why?

An example will make it clear. If the version number changes every minute, then key_1234 becomes key_1235 after one minute, and 60 different keys and 60 values are regenerated within an hour. This also means that the Lua GC has to reclaim the Lua objects behind the 59 obsolete key-value pairs; the more frequent the updates, the more resources object creation and GC consume.

Of course, all of this consumption can be avoided simply by moving the version number from the key into the value: no matter how frequently a key is updated, only two fixed Lua objects exist. As you can see, such optimization techniques are very ingenious, but behind a simple and clever technique, you need a deep understanding of OpenResty's API and caching mechanism.
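To make the contrast concrete, here is a minimal sketch; the key name, version value, and cached table are hypothetical:

resty -e 'local lrucache = require "resty.lrucache"
local cache = lrucache.new(200)
local version = "1234"                    -- hypothetical version number
local obj = { host = "example.com" }      -- hypothetical cached value

-- version in the key: every version change creates a new key and a new value object for the GC
cache:set("route_42_" .. version, obj)

-- version in the value: the key stays fixed, so only one key-value pair exists per key
obj._cache_ver = version
cache:set("route_42", obj)

print(cache:get("route_42")._cache_ver)'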

Summary

Although OpenResty's documentation is quite detailed, you need to experience and work out for yourself how to combine it with your business to get the greatest optimization effect. In many cases, a feature such as stale data gets only one or two sentences in the documentation, yet it can make a huge performance difference.

So, have you had similar experiences when using OpenResty? You are welcome to leave a comment and share with us, and feel free to share this article so we can learn and make progress together.