Transforming Log Plugins to Enhance Observability
Logs are crucial in API gateways, especially for fault detection and troubleshooting, where they help reduce operation and maintenance costs. API gateway logs are also of great value in supporting data collection. To improve observability and system stability, the open-source API gateway Apache APISIX supports many log plugins, such as elasticsearch-logger, kafka-logger, the Loggly plugin, and error-log-logger.
The value of gateway logs
As businesses grow rapidly in the digital age, software architectures become increasingly complex, making fault detection and diagnosis much more challenging. This underscores the significance of software observability.
Logs are one of the three pillars of observability. They provide system administrators and developers with insights into the operational status of a system, which facilitates the timely identification and resolution of problems.
Moreover, logs can serve additional purposes, such as data mining, auditing, and security monitoring, helping to maintain system compliance and security.
An API gateway connects the application and the outside world, enabling organizations to better manage and monitor API calls, including the critical function of recording API call logs.
We will analyze the value of API gateway logs from two perspectives in the following sections.
The value of O&M
Whether in traditional systems O&M (operations and maintenance) or modern SRE (Site Reliability Engineering), discovering and fixing faults is always the top priority in ensuring system stability. This is because, for any large online business, every second of downtime can cause significant business losses and harm the user experience.
The API gateway sits at the forefront of the entire system and can act as a sentinel. Its access logs yield a wealth of data that is critical to the O&M system:
- Analyze upstream and downstream status codes to monitor website availability and the availability of specific upstream services.
- Use access logs to monitor website traffic and promptly detect attacks like DDoS.
- Analyze traffic trends to provide a reference for scaling business systems up or down.
- Track request processing times to generate interface-level performance reports as a data reference for optimizing business system performance.
The above are some of the best practices for API gateway logs that are widely adopted in the industry and implemented in operation and maintenance systems of many enterprises.
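Two of these practices, availability from status codes and per-service latency, can be sketched in a few lines of Python. This is only an illustration and not APISIX code: the record fields (`upstream`, `status`, `latency_ms`) and the sample values are hypothetical stand-ins for whatever your access log format exposes.

```python
from collections import defaultdict

# Hypothetical access-log records; the field names are illustrative assumptions.
SAMPLE_LOGS = [
    {"upstream": "orders", "status": 200, "latency_ms": 12.0},
    {"upstream": "orders", "status": 502, "latency_ms": 30.0},
    {"upstream": "orders", "status": 200, "latency_ms": 18.0},
    {"upstream": "users", "status": 200, "latency_ms": 5.0},
    {"upstream": "users", "status": 404, "latency_ms": 4.0},
]


def availability_by_upstream(logs):
    """Share of requests per upstream that did not end in a 5xx status."""
    total = defaultdict(int)
    ok = defaultdict(int)
    for entry in logs:
        total[entry["upstream"]] += 1
        if entry["status"] < 500:
            ok[entry["upstream"]] += 1
    return {name: ok[name] / total[name] for name in total}


def max_latency_by_upstream(logs):
    """Slowest observed request per upstream, in milliseconds."""
    worst = {}
    for entry in logs:
        name = entry["upstream"]
        worst[name] = max(worst.get(name, 0.0), entry["latency_ms"])
    return worst


print(availability_by_upstream(SAMPLE_LOGS))
print(max_latency_by_upstream(SAMPLE_LOGS))
```

In a real deployment these aggregations would normally run inside the log analysis system rather than in a script, but the logic is the same.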
The business value
Compared to the widely recognized operational value, the business value of gateway logs is often overlooked.
For example, in user behavior analysis, the most common way to collect data is event tracking code embedded in programs. For a well-designed API, however, gateway logs can naturally meet such requirements. As a result, we can do many things, such as:
- Analyze the client IP to determine the geographic distribution of traffic.
- Analyze the HTTP Referer to understand the source of user access to each page.
- Aggregate statistics for key APIs to obtain critical business indicators directly. For instance, by counting the successful calls to the user registration and order placement APIs, we can obtain the number of new users and orders within a specified period.
Of course, gateway logs are less flexible than event tracking and cannot fulfill custom data collection requirements, but they still hold enough business value to meet basic data mining needs.
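The registration and order example can be sketched as follows. The paths `/api/register` and `/api/orders`, the metric names, and the record fields are all hypothetical; substitute the routes and log fields of your own system.

```python
from collections import Counter

# Hypothetical mapping from (method, path) to a business metric name.
KEY_APIS = {
    ("POST", "/api/register"): "new_users",
    ("POST", "/api/orders"): "new_orders",
}


def business_counts(logs):
    """Count successful (2xx) calls to the key APIs."""
    counts = Counter()
    for entry in logs:
        metric = KEY_APIS.get((entry["method"], entry["uri"]))
        if metric is not None and 200 <= entry["status"] < 300:
            counts[metric] += 1
    return dict(counts)


# Illustrative records for one reporting period.
sample = [
    {"method": "POST", "uri": "/api/register", "status": 201},
    {"method": "POST", "uri": "/api/register", "status": 400},
    {"method": "POST", "uri": "/api/orders", "status": 200},
    {"method": "POST", "uri": "/api/orders", "status": 200},
    {"method": "GET", "uri": "/api/orders", "status": 200},
]
print(business_counts(sample))
```

Only successful calls are counted, so the failed registration and the read-only `GET` request do not inflate the figures.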
Apache APISIX is a dynamic, real-time, and high-performance cloud-native API gateway that provides rich traffic management functions such as load balancing, dynamic upstream, canary release, circuit breaking, identity authentication, and observability.
Many of its capabilities are provided through plugins, including dozens of logging plugins. The following sections use typical APISIX logging plugins as examples to explain how to integrate gateway logs with a log analysis system to unlock more value.
Introduction to typical logging plugins of APISIX
Elasticsearch is a distributed open-source search and analytics engine for processing large amounts of data and is widely used in log analysis. Its companion dashboard, Kibana, makes it easy to build custom statistical charts that meet an organization's needs for visual query analysis.
In practice, since most traditional software saves its logs to local files, a project in the Elasticsearch ecosystem called Filebeat is used to monitor log files on local machines and send incremental logs to the Elasticsearch server.
However, the elasticsearch-logger plugin provided by APISIX can send APISIX access logs directly to the Elasticsearch server, which provides several benefits:
- The deployment of the Filebeat component is unnecessary, resulting in a shorter processing chain and reduced computing resources.
- The logs are not stored on disk, so there is no need to worry about disk space usage. Access logs can be enormous in volume, and if file rotation is not handled properly, log files can quickly fill up the machine's disk and cause failures. Writing to disk can also reduce the performance of the gateway.
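To make "sending logs directly" concrete, the sketch below builds the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts for batches of documents. It is an illustration of the data format only and is not taken from the plugin's source; the index name `gateway-logs` and the log fields are hypothetical.

```python
import json


def build_bulk_body(entries, index_name):
    """Serialize log entries into an NDJSON body for the _bulk endpoint.

    Each document is preceded by an action line naming the target index,
    and every line, including the last, ends with a newline.
    """
    lines = []
    for entry in entries:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(entry))
    return "".join(line + "\n" for line in lines)


# Hypothetical batch of two access-log entries.
batch = [
    {"uri": "/api/orders", "status": 200},
    {"uri": "/api/orders", "status": 502},
]
print(build_bulk_body(batch, "gateway-logs"), end="")
```

Batching many entries into one request is what keeps the processing chain short without issuing one HTTP call per log line.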
Gateway access logs have a prominent characteristic: the volume of log data is proportional to the volume of business requests. The more requests, the more logs.
Online businesses typically exhibit periodic patterns in request volume. For instance, food delivery platforms tend to experience high traffic during mealtimes, while video sites experience peaks during after-work hours.
This presents a significant challenge for log storage systems, and ensuring that the system can function adequately during traffic peaks is a crucial skill for every Elasticsearch administrator.
A message queue is a well-suited tool for smoothing out traffic peaks. Introducing a message queue between the gateway and the storage system as a log buffer significantly reduces the pressure on the storage system during peak traffic periods.
For this purpose, APISIX provides kafka-logger, which delivers access logs to Kafka servers, avoiding direct impact on log storage.
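The buffering effect can be illustrated with a small, deterministic simulation. It does not use Kafka; the backlog counter simply stands in for the queue, and the arrival numbers and drain rate are made-up values chosen to show a traffic spike.

```python
def simulate_buffer(arrivals_per_tick, drain_rate):
    """Return (entries written to storage per tick, peak backlog).

    arrivals_per_tick: log entries the gateway emits in each time slot.
    drain_rate: maximum entries the consumer forwards to storage per slot.
    """
    if drain_rate <= 0:
        raise ValueError("drain_rate must be positive")
    pending = list(arrivals_per_tick)
    backlog = 0
    peak_backlog = 0
    writes = []
    # Keep ticking after arrivals stop until the backlog is fully drained.
    while pending or backlog:
        backlog += pending.pop(0) if pending else 0
        peak_backlog = max(peak_backlog, backlog)
        batch = min(drain_rate, backlog)
        backlog -= batch
        writes.append(batch)
    return writes, peak_backlog


# A spike of 10 entries in the second slot, drained at 4 entries per slot.
writes, peak = simulate_buffer([2, 10, 2, 0], drain_rate=4)
print(writes, peak)
```

The storage system never receives more than `drain_rate` entries per slot; the spike is absorbed by the backlog and written out over the following slots, at the cost of some delay.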
In recent years, [SaaS (Software as a Service)](https://en.wikipedia.org/wiki/Software_as_a_service) has gradually become popular, favored by many small and medium-sized enterprises for its low entry threshold and pay-as-you-go pricing model. Many SaaS products have emerged in the field of log analysis, and Loggly has become a leader with its rich log sources and analysis capabilities.
Against this backdrop, the active APISIX community has developed an out-of-the-box Loggly plugin, which only requires configuring credentials to send APISIX access logs directly to the Loggly service, making it highly convenient to use.
The plugins introduced above are all used to collect access logs. However, there is another type of log in APISIX, namely the error log (error.log), which is critical for diagnosing gateway malfunctions.
Therefore, APISIX provides the error-log-logger plugin to send error logs to a remote server for storage and analysis. In practice, the logging level of APISIX can be lowered to debug or info to print more detailed information about the gateway's operating status. With these logs, we can locate most problems.
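Once error logs are collected centrally, filtering by severity is usually the first analysis step. The sketch below assumes the standard nginx error log line layout (timestamp followed by the level in square brackets), which APISIX inherits from the nginx it is built on; the sample lines themselves are invented for illustration.

```python
import re

# Severity order used by nginx, from least to most severe.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]
LINE_RE = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\] ")


def filter_by_level(lines, minimum="warn"):
    """Keep lines whose level is at least `minimum`; skip unparseable lines."""
    threshold = LEVELS.index(minimum)
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) in LEVELS:
            if LEVELS.index(match.group(1)) >= threshold:
                kept.append(line)
    return kept


# Invented sample lines in the assumed layout.
sample_lines = [
    "2023/05/01 12:00:00 [info] 101#101: *1 client connected",
    "2023/05/01 12:00:01 [warn] 101#101: *2 upstream response is buffered",
    "2023/05/01 12:00:02 [error] 101#101: *3 connect() failed",
    "not a log line",
]
for line in filter_by_level(sample_lines, minimum="warn"):
    print(line)
```

Raising or lowering `minimum` mirrors the trade-off described above: lower levels give more detail for diagnosis at the cost of far more log volume.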
Gateway logs contain enormous value, and the abundance of log plugins in the Apache APISIX project shows that enterprise users in the community recognize it. In addition, these plugins further reduce the cost of setting up a log system for new users.
Furthermore, APISIX has two other types of observability plugins: metrics and tracing. Combined with log plugins, they further enhance the observability of the gateway and help build system stability.