`lua-resty-*` Encapsulation Releases Developers from Multi-Level Caching
API7.ai
December 30, 2022
In the previous two articles, we learned about caching in OpenResty and the cache stampede problem, both of which are fundamentals. In real project development, however, developers prefer an out-of-the-box library that handles and hides all of these details, so that they can write business code with it directly.
This is a benefit of the division of labor: developers of basic components focus on flexible architecture, good performance, and code stability without caring about the upper-level business logic, while application engineers care more about business implementation and rapid iteration, hoping not to be distracted by the technical details of the lower layers. The gap between the two can be filled by wrapper libraries.
Caching in OpenResty faces the same problem. `shared dict` and `lru` caches are stable and efficient enough, but there are too many details to deal with. The "last mile" for application development engineers can be arduous without some useful encapsulation. This is where the importance of community comes into play: an active community will take the initiative to find the gaps and fill them quickly.
lua-resty-memcached-shdict
Let's get back to cache encapsulation. `lua-resty-memcached-shdict` is an official OpenResty project that uses `shared dict` to add a layer of encapsulation over `memcached`, handling details like cache stampede and expired data. If your cached data happens to be stored in `memcached` on the backend, you can try using this library.
Although it is an official OpenResty library, it is not included in the OpenResty package by default. If you want to test it locally, you first need to download its source code into OpenResty's Lua search path.
This encapsulation library implements the same solution we mentioned in the previous article: it uses `lua-resty-lock` for mutual exclusion, so that when the cache misses or expires, only one request goes to `memcached` to fetch the data, which avoids a cache stampede; if the latest data cannot be fetched, stale data is returned to the client instead.
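The core of this pattern can be sketched in a few lines of Lua. This is only a simplified illustration of the idea, not the library's actual code; the `locks` dictionary name and the `fetch_from_memcached` helper are placeholders:

```lua
local resty_lock = require "resty.lock"

-- Simplified sketch of the "lock + stale data" pattern, assuming a
-- lua_shared_dict named "locks" and a fetch_from_memcached() helper
-- defined elsewhere. Not the library's actual implementation.
local function fetch_with_lock(shdict, key)
    local value = shdict:get(key)
    if value then
        return value                -- cache hit, return directly
    end

    local lock, err = resty_lock:new("locks")
    if not lock then
        return nil, "failed to create lock: " .. err
    end

    local elapsed, err = lock:lock(key)
    if not elapsed then
        return nil, "failed to acquire lock: " .. err
    end

    -- another request may have populated the cache while we waited
    value = shdict:get(key)
    if value then
        lock:unlock()
        return value
    end

    -- only this request hits the backend; others wait or use stale data
    local fresh, err = fetch_from_memcached(key)
    if not fresh then
        lock:unlock()
        -- fall back to stale data if the backend query fails
        local stale = shdict:get_stale(key)
        return stale, err
    end

    shdict:set(key, fresh, 60)      -- cache for 60 seconds (example TTL)
    lock:unlock()
    return fresh
end
```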
However, although this `lua-resty` library is an official OpenResty project, it is not perfect:
- First, it has no test case coverage, meaning the code quality cannot be consistently guaranteed.
- Second, it exposes too many interface parameters, with 11 required and 7 optional parameters.
```lua
local memc_fetch, memc_store =
    shdict_memc.gen_memc_methods{
        tag = "my memcached server tag",
        debug_logger = dlog,
        warn_logger = warn,
        error_logger = error_log,

        locks_shdict_name = "some_lua_shared_dict_name",

        shdict_set = meta_shdict_set,
        shdict_get = meta_shdict_get,

        disable_shdict = false,  -- optional, default false

        memc_host = "127.0.0.1",
        memc_port = 11211,
        memc_timeout = 200,            -- in ms
        memc_conn_pool_size = 5,
        memc_fetch_retries = 2,        -- optional, default 1
        memc_fetch_retry_delay = 100,  -- in ms, optional, default to 100 (ms)

        memc_conn_max_idle_time = 10 * 1000,  -- in ms, for in-pool connections, optional, default to nil

        memc_store_retries = 2,        -- optional, default to 1
        memc_store_retry_delay = 100,  -- in ms, optional, default to 100 (ms)

        store_ttl = 1,  -- in seconds, optional, default to 0 (i.e., never expires)
    }
```
Most of the exposed parameters could be simplified by "creating a new `memcached` handler". The current way of encapsulating, which throws all the parameters at the user, is not user-friendly, so I would welcome interested developers to contribute PRs to optimize this.
The documentation of this encapsulation library also mentions further optimizations in the following directions:

- Use `lua-resty-lrucache` to add a `Worker`-level cache, rather than relying only on the server-level `shared dict` cache.
- Use `ngx.timer` to do asynchronous cache update operations.
The first direction is a very good suggestion, as cache performance within a worker is better; a minimal sketch of this idea follows below. The second suggestion is something you need to weigh against your actual scenario. However, I generally do not recommend it, not only because the number of timers is limited, but also because if the update logic goes wrong, the cache will never be updated again, which has a large impact.
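To make the first direction concrete, here is a minimal sketch of putting a worker-level `lua-resty-lrucache` in front of the `shared dict`; the dictionary name `my_cache` and the TTLs are made up for illustration:

```lua
local lrucache = require "resty.lrucache"

-- one lrucache instance per worker (created once at module load time)
local lru, err = lrucache.new(200)   -- hold up to 200 items
if not lru then
    error("failed to create lrucache: " .. (err or "unknown"))
end

local shdict = ngx.shared.my_cache   -- assumes: lua_shared_dict my_cache 10m;

local function get(key)
    -- L1: worker-level cache, no locks involved
    local value = lru:get(key)
    if value then
        return value
    end

    -- L2: server-level shared dict, shared by all workers
    value = shdict:get(key)
    if value then
        lru:set(key, value, 60)      -- promote to the worker-level cache for 60s
    end
    return value
end
```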
lua-resty-mlcache
Next, let's introduce a caching encapsulation commonly used in OpenResty: `lua-resty-mlcache`, which uses `shared dict` and `lua-resty-lrucache` to implement a multi-layer caching mechanism. Let's look at how this library is used in the following two code examples.
```lua
local mlcache = require "resty.mlcache"

local cache, err = mlcache.new("cache_name", "cache_dict", {
    lru_size = 500,   -- size of the L1 (Lua VM) cache
    ttl      = 3600,  -- 1h ttl for hits
    neg_ttl  = 30,    -- 30s ttl for misses
})
if not cache then
    error("failed to create mlcache: " .. err)
end
```
Let's look at the first piece of code. It loads the `mlcache` library and sets the initialization parameters. We would normally put this code in the `init` phase and only need to run it once.

In addition to the two required parameters, the cache name and the dictionary name, there is a third parameter: a table of 12 options, all optional, which fall back to default values when not filled in. This is much more elegant than `lua-resty-memcached-shdict`. If we were to design an interface ourselves, it would be better to adopt the `mlcache` approach: keep the interface as simple as possible while retaining enough flexibility.
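As a reference, the surrounding configuration might look roughly like this; the `lua_shared_dict` name, its size, and the `package.loaded` trick for sharing the instance are just one possible arrangement, not a requirement of `mlcache`:

```nginx
# nginx.conf: declare the shared dict that mlcache will use as its L2 cache
http {
    lua_shared_dict cache_dict 10m;

    init_by_lua_block {
        local mlcache = require "resty.mlcache"

        local cache, err = mlcache.new("cache_name", "cache_dict", {
            lru_size = 500,
            ttl      = 3600,
            neg_ttl  = 30,
        })
        if not cache then
            error("failed to create mlcache: " .. err)
        end

        -- keep the instance where request-phase code can reuse it,
        -- e.g. later: local cache = require("my_cache")
        package.loaded.my_cache = cache
    }
}
```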
Here is the second piece of code, the logic that runs when a request is processed.
```lua
local function fetch_user(id)
    return db:query_user(id)
end

local id = 123
local user, err = cache:get(id, nil, fetch_user, id)
if err then
    ngx.log(ngx.ERR, "failed to fetch user: ", err)
    return
end

if user then
    print(user.id) -- 123
end
```
As you can see, the multi-layer cache is hidden from the caller: you only need to use the `mlcache` object to fetch data and provide the callback function for when the cache misses or expires, and the complex logic behind it stays completely hidden.
You may be curious about how this library is implemented internally. Next, let's take a look at its architecture and implementation. The following image is a slide from a talk given by Thibault Charbonnier, the author of `mlcache`, at OpenResty Con 2018.
As you can see from the diagram, `mlcache` divides the data into three layers, namely `L1`, `L2`, and `L3`.
The `L1` cache is `lua-resty-lrucache`. Each `Worker` has its own copy, so with `N` `Worker`s there are `N` copies of the data, which means data redundancy. Since operating `lrucache` within a single `Worker` does not trigger locks, it has higher performance and is suitable as a first-level cache.
The `L2` cache is a `shared dict`. All `Worker`s share a single copy of the cached data, and the `L2` cache is queried if the `L1` cache does not hit. The `ngx.shared.DICT` API uses spinlocks to ensure the atomicity of operations, so we don't have to worry about race conditions here.
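For example, a counter can be incremented safely from any `Worker` without extra locking, because the `incr` operation itself is atomic (the dictionary name below is just an example):

```lua
-- assumes: lua_shared_dict counters 1m; in nginx.conf
local counters = ngx.shared.counters

-- incr() is atomic across all workers; the third argument initializes
-- the key to 0 if it does not exist yet
local newval, err = counters:incr("hits", 1, 0)
if not newval then
    ngx.log(ngx.ERR, "failed to incr: ", err)
end
```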
`L3` covers the case where the `L2` cache does not hit either: the callback function is executed to query the data source, such as an external database, and the result is then cached to `L2`. Here, to avoid a cache stampede, it uses `lua-resty-lock` to ensure that only one `Worker` goes to the data source to get the data.
From a request perspective:

- First, the request queries the `L1` cache within the `Worker` and returns directly if `L1` hits.
- If `L1` does not hit or the cached value has expired, it queries the `L2` cache shared between `Worker`s. If `L2` hits, it returns the result and also caches it in `L1`.
- If `L2` also misses or has expired, the callback function is called to look up the data in the data source and write it to the `L2` cache; this is the function of the `L3` data layer.
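The flow above can be expressed roughly as the following simplified sketch. It is not `mlcache`'s real implementation: serialization, negative caching, lock timeouts, and stale-data handling are all omitted:

```lua
-- Simplified sketch of the L1 -> L2 -> L3 lookup flow; NOT mlcache's source code
local resty_lock = require "resty.lock"

local function layered_get(self, key, callback, ...)
    -- L1: per-worker lrucache, no locking needed
    local value = self.lru:get(key)
    if value ~= nil then
        return value
    end

    -- L2: shared dict, shared by all workers
    value = self.dict:get(key)
    if value ~= nil then
        self.lru:set(key, value, self.ttl)       -- promote to L1
        return value
    end

    -- L3: only one worker runs the callback, guarded by lua-resty-lock
    local lock, err = resty_lock:new(self.lock_dict)
    if not lock then
        return nil, err
    end

    local elapsed, err = lock:lock(key)
    if not elapsed then
        return nil, err
    end

    -- the value may have been written while we waited for the lock
    value = self.dict:get(key)
    if value == nil then
        value = callback(...)                    -- query the data source
        -- the real mlcache serializes tables before writing to the shared dict
        self.dict:set(key, value, self.ttl)
    end
    lock:unlock()

    self.lru:set(key, value, self.ttl)
    return value
end
```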
You can also see from this process that cache updates are passively triggered by client requests. Even if one request fails to fetch the data, subsequent requests can still trigger the update logic, which keeps the cache as safe and fresh as possible.
However, although `mlcache` is well implemented, there is still a pain point: the serialization and deserialization of data. This is not a problem with `mlcache` itself, but the difference between `lrucache` and `shared dict` that we have repeatedly mentioned: in `lrucache` we can store all kinds of Lua data types, including `table`; but in `shared dict` we can only store strings.
`L1`, the `lrucache` cache, is the layer of data that users touch directly, and we want to be able to cache all kinds of data in it, including `string`, `table`, `cdata`, and so on. The problem is that `L2` can only store strings, so when data is promoted from `L2` to `L1`, we need a layer of conversion from strings into data types that we can hand directly to the user.
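A quick illustration of the difference, assuming a shared dict named `my_dict` and using `cjson` for the conversion:

```lua
local cjson    = require "cjson.safe"
local lrucache = require "resty.lrucache"

local lru  = lrucache.new(100)
local dict = ngx.shared.my_dict        -- assumes: lua_shared_dict my_dict 1m;

local user = { id = 123, name = "test" }

-- lrucache can store the Lua table directly
lru:set("user:123", user)

-- shared dict cannot store a Lua table, so it must be serialized first...
dict:set("user:123", cjson.encode(user))

-- ...and deserialized again when it is read back
local cached = cjson.decode(dict:get("user:123"))
```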
Fortunately, `mlcache` has taken this situation into account and provides an optional `l1_serializer` function in the `new` and `get` interfaces, specifically designed to handle the data processing when `L2` is promoted to `L1`. We can see the following sample code, which I extracted from my test case set.
```lua
local mlcache = require "resty.mlcache"

local cache, err = mlcache.new("my_mlcache", "cache_shm", {
    l1_serializer = function(i)
        return i + 2
    end,
})

local function callback()
    return 123456
end

local data = assert(cache:get("number", nil, callback))
assert(data == 123458)
```
Let me explain it quickly. In this case, the callback function returns the number `123456`; in `new`, the `l1_serializer` function we set adds `2` to the incoming number before setting the `L1` cache, so it becomes `123458`. With such a serialization function, the data can be converted more flexibly between `L1` and `L2`.
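In real projects, a more common use than adding `2` to a number is deserializing a string stored in `L2` into a Lua `table` for `L1`, for example with `cjson`. The callback and field names below are made up for illustration:

```lua
local cjson   = require "cjson.safe"
local mlcache = require "resty.mlcache"

local cache = assert(mlcache.new("my_mlcache", "cache_shm", {
    -- L2 keeps the JSON string; when it is promoted to L1,
    -- decode it so business code gets a real Lua table
    l1_serializer = function(json_str)
        return cjson.decode(json_str)
    end,
}))  -- assumes: lua_shared_dict cache_shm 1m;

local function fetch_user_json(id)
    -- imagine this queries a database and returns a JSON string
    return cjson.encode({ id = id, name = "user-" .. id })
end

local user = assert(cache:get("user:123", nil, fetch_user_json, 123))
print(user.name)   -- "user-123"
```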
Summary
With multiple caching layers, server-side performance can be maximized, and many details are hidden in between. At this point, a stable and efficient wrapper library saves us a lot of effort. I also hope these two wrapper libraries introduced today will help you better understand caching.
Finally, think about this question: is the `shared dict` layer of cache necessary? Would it be possible to use only `lrucache`? Feel free to leave a comment and share your opinion with me, and you are also welcome to share this article with more people to communicate and progress together.