Accurate Performance Testing with `wrk`

API7.ai

November 25, 2022

OpenResty (NGINX + Lua)

In this article, we'll talk about performance testing. This topic is not unique to OpenResty; it applies to other backend services as well.

Performance testing is everywhere. When we deliver products, they come with performance metrics such as QPS, TPS, latency, the number of concurrent connections supported, and so on. For open-source projects, we also run a performance test before releasing a version and compare it with the previous one to see whether there is a significant regression. There are also neutral websites that publish comparative performance data for similar products. I have to say that performance testing is all around us.

So, how do we do scientific and rigorous performance testing?

Performance Testing Tools

To do a good job, you must first sharpen your tools. Choosing a good performance testing tool is half the battle.

You are probably familiar with the Apache Benchmark tool, also known as ab, which is arguably the simplest performance testing tool, but unfortunately it is not very useful. Modern server-side software is built on concurrency and asynchronous I/O, so its performance is quite good, while ab cannot take advantage of a machine's multiple cores and thus cannot generate enough stress. Results obtained under insufficient stress are not realistic.

This gives us a selection criterion for stress testing tools: the tool itself must have solid performance and be able to generate enough stress to saturate the server-side program.

Of course, you can also spend more money to start many stress testing clients and turn them into a distributed stress testing system. But don't forget that this also increases the complexity.

Going back to OpenResty practice, our recommended performance testing tool is wrk. Why do we choose it?

First, wrk meets the selection criterion. The stress generated by wrk on a single machine can easily drive NGINX to 100% CPU utilization, not to mention other server-side applications.

Second, wrk has a lot of similarities to OpenResty. wrk is not an open-source project written from scratch; it stands on the shoulders of LuaJIT and Redis and takes advantage of the system's multi-core resources to generate requests. In addition, wrk exposes a Lua API that allows you to embed your Lua scripts to customize the request headers and content, making it very flexible.

So how should we use wrk? It's as simple as looking at the following code snippet.

wrk -t12 -c400 -d30s http://127.0.0.1:8080/index.html

This means that wrk will use 12 threads to hold 400 persistent connections for 30 seconds while sending HTTP requests to the specified URL. If you don't specify any parameters, wrk will start 2 threads and 10 persistent connections by default.
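As mentioned above, wrk also lets you embed Lua scripts to customize requests. A minimal sketch of such a script might look like the following; the filename post.lua and the body/header values are illustrative assumptions, while wrk.method, wrk.body, and wrk.headers are fields of wrk's documented scripting interface.

```lua
-- post.lua: a minimal wrk script sketch (filename and values are assumptions)
-- wrk reads these fields of the global `wrk` table when building each request.
wrk.method = "POST"
wrk.body   = '{"name": "test"}'
wrk.headers["Content-Type"] = "application/json"
```

You would load it with the -s flag, e.g. wrk -t12 -c400 -d30s -s post.lua http://127.0.0.1:8080/index.html. Note that this script fragment is only meaningful inside wrk, which provides the wrk table before loading the script.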

Testing environment

After choosing the testing tool, we still can't start the stress test right away: we need to check the testing environment first. There are four main items to check, and I will discuss them in detail.

1. Turn off SELinux

If you have a CentOS/RedHat operating system, it is recommended that you turn off SELinux. Otherwise, you may encounter a lot of weird permission problems.

Let's check if SELinux is turned on with the following command.

$ sestatus
SELinux status: disabled

If it shows that SELinux is on (enforcing), you can turn it off temporarily with $ setenforce 0. To turn it off permanently, modify the /etc/selinux/config file and change SELINUX=enforcing to SELINUX=disabled.

2. Maximum number of open files

Then, you need to check the current system's overall maximum number of open files with the following command.

    $ cat /proc/sys/fs/file-nr
    3984 0 3255296

The last number 3255296 here is the maximum number of open files. If this number is small on your machine, you need to modify the /etc/sysctl.conf file to increase it.

fs.file-max = 1020000
net.ipv4.ip_conntrack_max = 1020000
net.ipv4.netfilter.ip_conntrack_max = 1020000

After the modification, run the following command for the change to take effect.

sudo sysctl -p /etc/sysctl.conf

3. Process limits

In addition to the overall maximum number of open files on the system, there is also a limit to the number of files a process can open, which you can check with the command ulimit.

$ ulimit -n
1024

You will notice that this value defaults to 1024, which is quite small. Since each request corresponds to a file handle, and stress tests generate a lot of requests, we need to increase this value to the order of one million. You can change it temporarily with the following command.

ulimit -n 1024000

You can also modify the configuration file /etc/security/limits.conf to make it permanent.

* hard nofile 1024000
* soft nofile 1024000

4. NGINX configuration

Finally, you will need to make a small change to the NGINX configuration, which is the following three lines of code.

events {
    worker_connections 10240;
}

This increases the number of connections per worker process. The default value is only 512, which is not enough for high-stress tests.

Check before the stress test

At this point, the test environment is ready. You must be eager to get started, right? But let's do one last check before launching the test with wrk. After all, people make mistakes, so it's essential to cross-check.

This last test can be divided into two steps.

1. Use the automated tool c1000k

c1000k comes from the author of SSDB. As the name suggests, the purpose of this tool is to check whether your environment can meet the requirement of 1,000,000 concurrent connections.

The use of this tool is also straightforward. We start a server and a client, corresponding to the server program listening on port 7000 and the client program launching the stress test to simulate the stress test in a real environment:

./server 7000
./client 127.0.0.1 7000

Immediately afterward, the client sends a request to the server to check if the current system environment can support one million concurrent connections. You can run it yourself and see the result.

2. Check whether the server program is running normally

If the server-side program is not working correctly, the stress test may turn into an error-log-flushing test or a 404-response test.

So, the last and most crucial step of checking the test environment is to run the server-side unit test suite, or to manually call a few of the main interfaces, to make sure that all the interfaces, responses, and HTTP response codes exercised by the wrk test are normal, and that there are no error-level messages in logs/error.log.
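As a sketch of such a manual check, the following shell snippet verifies the endpoint's HTTP status code and counts error-level lines in the log. The URL and log path are assumptions taken from this article's examples, so adjust them to your setup.

```shell
# Hypothetical pre-flight check; URL and log path are assumptions.
URL="http://127.0.0.1:9080/hello"
LOG="logs/error.log"

# 1. The endpoint should answer with HTTP 200 ("000" means it is unreachable).
code=$(curl -s -o /dev/null -w '%{http_code}' "$URL" 2>/dev/null || true)
code=${code:-000}
echo "HTTP status: $code"

# 2. There should be no error-level messages in the error log.
if [ -f "$LOG" ]; then
    errors=$(grep -c '\[error\]' "$LOG" || true)
    echo "error-level log lines: $errors"
else
    echo "log file not found: $LOG"
fi
```

Only when the status code is 200 and the error count is 0 does it make sense to start the stress test.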

Sending requests

Okay, so now everything is ready to go. Let's start stress testing with wrk!

$ wrk -d 30 http://127.0.0.2:9080/hello
Running 30s test @ http://127.0.0.2:9080/hello
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   595.39us  178.51us  22.24ms   90.63%
    Req/Sec     8.33k   642.91     9.46k    59.80%
  499149 requests in 30.10s, 124.22MB read
Requests/sec:  16582.76
Transfer/sec:      4.13MB

I didn't specify any parameters here, so wrk started 2 threads and 10 persistent connections by default. You don't need to set wrk's thread and connection counts very high; as long as the target program reaches 100% CPU utilization, that is enough.

But the duration of the stress test must not be too short; a stress test of a few seconds is meaningless, as it will likely be over before the server program even finishes its hot reload. At the same time, use a monitoring tool such as top or htop to check whether the target program on the server side is running at 100% CPU utilization during the test.

In terms of symptoms, if the CPU is fully loaded during the test, and CPU and memory usage drop back quickly after the test stops, then congratulations, the test completed successfully. However, if you see any of the following anomalies, you, as a server-side developer, should pay attention.

  • CPU cannot be fully loaded. This is not a wrk problem; it could be a network limitation or a blocking operation in your code. You can determine this by reviewing your code or using the off CPU flame graph.
  • CPU is always fully loaded, even after the stress stops. This indicates an infinite loop in the code, caused, for example, by a faulty regular expression or a LuaJIT bug, both of which I have encountered in real environments. At this point, you'll need the CPU flame graph to locate it.

Finally, let's look at the wrk statistics. Concerning this result, we generally focus on two values.

The first is QPS, i.e., Requests/sec: 16582.76, an exact figure that indicates how many requests the server processes per second.

The second is latency (Latency 595.39us 178.51us 22.24ms 90.63%), which is as important as QPS and reflects the system's response speed. For example, for gateway applications, we want to keep latency within 1 ms.

In addition, wrk provides a --latency option that prints a detailed percentile distribution of latency, for example:

Latency Distribution
        50% 134.00us
        75% 180.00us
        90% 247.00us
        99% 552.00us

However, wrk's latency distribution data is not accurate, because the network and the tool itself add perturbations that amplify the latency, so you need to pay special attention to it.

Summary

Performance testing is a technical job; not many people can do it right and well. I hope this article will give you a more comprehensive understanding of performance testing.

Finally, I'll leave you with a question: wrk supports custom Lua scripts for stress testing, so can you write a simple Lua script based on its documentation? It may be a bit difficult, but once you finish it, you'll understand the intent behind the interfaces wrk exposes.

You are welcome to share this article with more people, and we will make progress together.