Accurate Performance Testing with `wrk`
API7.ai
November 25, 2022
In this article, we'll talk about performance testing. This part is not unique to OpenResty but applies to other backend services.
Performance testing is everywhere: when we deliver products, they come with performance metrics such as QPS, TPS, latency, and the number of concurrent connections supported. For open-source projects, we also run a performance test before each release and compare it with the previous one to check for significant regressions. There are also neutral websites that publish comparative performance data for similar products. I have to say, performance testing is never far from us.
So, how do we do scientific and rigorous performance testing?
Performance Testing Tools
To do a good job, you must first sharpen your tools. Choosing a good performance testing tool is half the battle.
The Apache Benchmark tool, also known as `ab`, is probably the one you are most familiar with, and it is arguably the easiest performance testing tool to use. Unfortunately, it is not very useful: server-side software today is built on concurrent and asynchronous I/O and performs quite well, while `ab` does not take advantage of the machine's multiple cores and cannot generate enough stress. The results obtained from `ab` testing are therefore not realistic.
Therefore, we can set a criterion for choosing a stress testing tool: the tool itself must have solid performance and be able to generate enough stress to really pressure the server-side program.
Of course, you can also spend more money to launch many stress testing clients and turn them into a distributed stress testing system. But don't forget that this also increases the complexity.
Going back to OpenResty practice, our recommended performance testing tool is `wrk`. Why do we choose it?
First, `wrk` meets the selection criterion: the stress it generates on a single machine can easily drive NGINX to 100% CPU utilization, not to mention other server-side applications.
Second, `wrk` has a lot in common with OpenResty. `wrk` is not an open-source project written from scratch; it stands on the shoulders of LuaJIT and Redis and takes advantage of the system's multi-core resources to generate requests. In addition, `wrk` exposes a Lua API, so you can embed your own Lua scripts to customize the request headers and body, which makes it very flexible.
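As a taste of that flexibility, here is a minimal sketch of a `wrk` script based on its scripting documentation. The endpoint payload and the printed report format are made-up examples; only the `wrk` globals and the `done()` hook come from `wrk` itself:

```lua
-- Customize every request: wrk reads these globals before the run starts.
wrk.method = "POST"
wrk.body   = '{"hello": "world"}'            -- example payload (an assumption)
wrk.headers["Content-Type"] = "application/json"

-- The done() hook runs once after the test finishes;
-- latency values are reported in microseconds.
function done(summary, latency, requests)
  io.write(string.format("p99 latency: %.2f ms\n",
                         latency:percentile(99.0) / 1000))
end
```

You would then run it with something like `wrk -t2 -c10 -d30s -s post.lua http://127.0.0.1:8080/` (the script file name `post.lua` is an assumption).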
So how do we use `wrk`? It's as simple as the following command:
wrk -t12 -c400 -d30s http://127.0.0.1:8080/index.html
This means that `wrk` will use 12 threads and hold 400 long-lived connections open for 30 seconds, sending HTTP requests to the specified API endpoint. If you don't specify any parameters, `wrk` starts 2 threads and 10 long-lived connections by default.
Testing environment
With a testing tool in hand, we still can't start the stress test right away: we first need to check the testing environment. There are four main items to check, and I will discuss each of them in detail.
1. Turn off SELinux
If you have a CentOS/RedHat operating system, it is recommended that you turn off SELinux. Otherwise, you may encounter a lot of weird permission problems.
Let's check if SELinux is turned on with the following command.
$ sestatus
SELinux status: disabled
If it is on (`enforcing`), you can turn it off temporarily with `setenforce 0`. To turn it off permanently, modify the `/etc/selinux/config` file, changing `SELINUX=enforcing` to `SELINUX=disabled`.
2. Maximum number of open files
Then, you need to check the current system's overall maximum number of open files with the following command.
$ cat /proc/sys/fs/file-nr
3984 0 3255296
The last number, `3255296`, is the maximum number of open file handles. If this number is small on your machine, you need to modify the `/etc/sysctl.conf` file to increase it:
fs.file-max = 1020000
net.ipv4.ip_conntrack_max = 1020000
net.ipv4.netfilter.ip_conntrack_max = 1020000
After the modification, reload the settings for them to take effect:
sudo sysctl -p /etc/sysctl.conf
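As a quick sanity check after reloading, you can compare the handles currently allocated against the system-wide limit. This is a small sketch; the 80% warning threshold is an arbitrary choice of mine:

```shell
# /proc/sys/fs/file-nr holds three numbers: allocated, unused, and maximum.
read allocated unused max < /proc/sys/fs/file-nr
echo "allocated=$allocated max=$max"

# Warn when more than 80% of the system-wide limit is already in use.
if [ "$allocated" -gt $((max * 8 / 10)) ]; then
  echo "WARNING: approaching fs.file-max"
fi
```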
3. Process limits
In addition to the system-wide limit on open files, there is also a per-process limit, which you can check with the `ulimit` command:
$ ulimit -n
1024
You will notice that this value defaults to 1024, which is quite small. Since each user request corresponds to a file handle, and stress tests generate a great many requests, we need to raise this value to the million level. You can change it temporarily with the following command:
ulimit -n 1024000
You can also modify the configuration file `/etc/security/limits.conf` to make the change permanent (it takes effect on the next login):
* hard nofile 1024000
* soft nofile 1024000
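Note that `ulimit -n` shows the soft limit, while the hard limit caps how far a non-root process can raise it; that is why `limits.conf` sets both. A quick check of the two:

```shell
# Soft limit: the value currently enforced for this shell.
echo "soft limit: $(ulimit -Sn)"
# Hard limit: the ceiling a non-root process may raise the soft limit to.
echo "hard limit: $(ulimit -Hn)"
```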
4. NGINX configuration
Finally, you will need to make a small change to the NGINX configuration, which is the following three lines of code.
events {
worker_connections 10240;
}
This increases the number of connections per worker. The default value is only 512, which is not enough for high-stress tests.
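For completeness, here is a sketch of how this fits into the top level of `nginx.conf`. The extra directives are standard NGINX ones, and the values are only examples:

```nginx
# Use one worker per CPU core so the stress is spread across cores.
worker_processes auto;

# Raise the per-worker file-descriptor limit to match the connection count.
worker_rlimit_nofile 20480;

events {
    worker_connections 10240;
}
```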
Check before the stress test
At this point, the test environment is ready, and you must be eager to get started, right? But let's do one last check before launching the test with `wrk`. After all, people make mistakes, so a final cross-check is essential.
This last test can be divided into two steps.
1. Use the automated tool c1000k
`c1000k` comes from the author of SSDB. As the name suggests, the purpose of this tool is to check whether your environment can meet the requirements of one million (10^6) concurrent connections.
Using this tool is also straightforward. We start a `server` and a `client`, corresponding to a server program listening on port `7000` and a client program launching the stress test, simulating a stress test in a real environment:
./server 7000
./client 127.0.0.1 7000
Immediately afterward, the `client` sends requests to the `server` to check whether the current system environment can support one million concurrent connections. You can run it yourself and see the result.
2. Check whether the server program is running normally
If the server-side program is not working correctly, the stress test may turn into an error-log-refreshing test or a `404` response test.
So, the last and most crucial step of checking the test environment is to run the server-side unit test suite, or to manually call a few major interfaces, to make sure that all interfaces, responses, and HTTP response codes involved in the `wrk` test are normal, and that there are no error-level messages in `logs/error.log`.
Sending requests
Okay, now everything is ready. Let's start stress testing with `wrk`!
$ wrk -d 30 http://127.0.0.2:9080/hello
Running 30s test @ http://127.0.0.2:9080/hello
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 595.39us 178.51us 22.24ms 90.63%
Req/Sec 8.33k 642.91 9.46k 59.80%
499149 requests in 30.10s, 124.22MB read
Requests/sec: 16582.76
Transfer/sec: 4.13MB
I didn't specify any parameters here, so `wrk` starts 2 threads and 10 long-lived connections by default. You don't need to make the number of threads and connections in `wrk` very large; as long as the target program reaches 100% CPU utilization, that is enough.
But the duration of the stress test must not be too short: a stress test of a few seconds is meaningless, and would likely be over before the server program finishes its hot reload. At the same time, you need a monitoring tool like `top` or `htop` to check whether the target program on the server is running at 100% CPU utilization during the stress test.
In terms of observable behavior, if the CPU is fully loaded during the test, and CPU and memory usage drop back quickly after the test stops, then congratulations, the test completed successfully. However, if you see any of the following anomalies, you as a server-side developer should pay attention.
- The CPU cannot be fully loaded. This is not a `wrk` problem; it could be a network limitation, or a blocking operation in your code. You can determine which by reviewing the code or by using an off-CPU flame graph.
- The CPU stays fully loaded even after the stress stops. This indicates an infinite loop in the code, caused by a regular expression or a LuaJIT bug, both of which I have encountered in real environments. At this point, you'll need a CPU flame graph to pin it down.
Finally, let's look at the statistics from `wrk`. Concerning these results, we generally focus on two values.
The first is QPS, i.e. `Requests/sec: 16582.76`, an exact figure that indicates how many requests are processed per second on the server side.
The second is latency: `Latency 595.39us 178.51us 22.24ms 90.63%`, which is just as important as QPS and reflects the system's response speed. For example, for gateway applications, we want to keep the latency within 1 ms.
In addition, `wrk` provides a `--latency` option that prints the detailed percentage distribution of latency, for example:
Latency Distribution
50% 134.00us
75% 180.00us
90% 247.00us
99% 552.00us
However, the latency distribution data from `wrk` is not accurate, because it includes perturbation artificially introduced by the network and the tool itself, which amplifies the latency. This requires your special attention.
Summary
Performance testing is a skilled job, and not many people can do it correctly and well. I hope this article gives you a more comprehensive understanding of it.
Finally, I'll leave you with a question: `wrk` supports custom Lua scripts for stress testing, so can you write a simple Lua script based on its documentation? It may be a bit difficult, but once you finish it, you will understand the intent behind the interfaces that `wrk` exposes.
You are welcome to share this article with more people, and we will make progress together.