etcd vs PostgreSQL
Jinhua Luo
March 17, 2023
Historical Background
PostgreSQL
PostgreSQL was originally developed in 1986 under the leadership of Professor Michael Stonebraker at the University of California, Berkeley. Over the course of several decades of development, PostgreSQL has emerged as the leading open-source relational database management system available today. Its permissive license enables anyone to use, modify, and distribute PostgreSQL freely, regardless of whether it is for private, commercial, or academic research purposes.
PostgreSQL offers robust support for both online analytical processing (OLAP) and online transaction processing (OLTP), boasting powerful SQL query capabilities and a broad range of extensions that allow it to meet nearly all commercial needs. As a result, it has garnered increasing attention in recent years. In fact, PostgreSQL's extensibility and high performance enable it to replicate the functionality of virtually any other type of database.
Image source (following CC 3.0 BY-SA license agreement): https://en.wikibooks.org/wiki/PostgreSQL/Architecture
etcd
How did etcd come into existence, and what problem does it solve?
In 2013, the startup team CoreOS developed a product called Container Linux. It's an open-source, lightweight operating system that prioritizes the automation and rapid deployment of application services. Container Linux requires applications to run in containers and provides a cluster management solution, making it convenient for users to manage services as if on a single machine.
To ensure that user services wouldn't experience downtime due to a node restart, CoreOS needed to run multiple replicas. But how would they coordinate between multiple replicas and avoid all replicas becoming unavailable during changes?
To address this issue, the CoreOS team needed a coordination service that could store service configuration information, provide distributed locking capabilities, and more. So, what was their approach? They followed the universal problem-solving method: first analyze the business scenario, pain points, and core objectives; then select a solution aligned with those goals, weighing an open-source community solution against developing a custom tool in-house.
A coordination service ideally needs to meet the following five goals:
- High availability with multiple data replicas
- Data consistency with version checking between replicas
- Minimal storage capacity: the coordination service should store only critical metadata for control-plane services and nodes, rather than user data. This minimizes the need for data sharding and avoids over-engineering.
- Functionality for CRUD (create, read, update, and delete), as well as a mechanism for listening to data changes. It should store the status information of services, and when there are changes or abnormalities in the services, it should quickly push the change event to the control plane. This helps improve service availability and reduces unnecessary performance overhead for the coordination service.
- Operational simplicity: the coordination service should be easy to operate, maintain, and troubleshoot. An easy-to-use interface can reduce the risk of errors, lower maintenance costs, and minimize downtime.
From the perspective of the CAP theorem, etcd is a CP (Consistency and Partition tolerance) system.
As the central component of a Kubernetes cluster, kube-apiserver uses etcd as its underlying storage.
On the one hand, etcd persists the resource objects created in a k8s cluster. On the other hand, etcd's data watch mechanism drives the entire cluster's Informer machinery, enabling continuous container orchestration.
Therefore, from a technical perspective, the core reasons why Kubernetes uses etcd are:
- etcd is written in Go language, which is consistent with the k8s technology stack, has low resource consumption, and is extremely easy to deploy.
- etcd’s strong consistency, watch, lease, and other features are core dependencies of k8s.
In summary, etcd is a distributed key-value database designed specifically for configuration management and distribution. As a cloud-native software, it offers out-of-the-box usability and high performance, making it superior to traditional databases in this particular area of need.
To make an objective comparison between etcd and PostgreSQL, which are two different types of databases, it is important to evaluate them in the context of the same requirement. Therefore, this article will only discuss the differences between the two in terms of their ability to meet the requirements of configuration management.
Data Model
Different databases have different data models that they present to users, and this factor determines the database's suitability for various scenarios.
Key-value vs SQL
The key-value data model is a popular model in NoSQL, which is also adopted by etcd. How does this model compare to SQL and what are its advantages?
First, let's take a look at SQL.
Relational databases maintain data in tables and provide an efficient, intuitive, and flexible way to store and access structured information.
A table, also known as a relation, is made up of columns that contain one or more categories of data, and rows, also known as table records, each holding a set of data for those categories. Applications retrieve data by using queries that employ operations such as "project" to identify attributes, "select" to identify tuples, and "join" to combine relations. The relational model for managing databases was developed in 1970 by Edgar Codd, a computer scientist at IBM.
Image source (complying with CC 3.0 BY-SA licensing agreement): https://en.wikipedia.org/wiki/Associative_entity
Records in a table do not have unique identifiers because tables are designed to accommodate multiple duplicate rows. To enable key-value queries, a unique index must be added to the field that serves as the key in the table. PostgreSQL's default index is btree, which, similar to etcd, can perform range queries on keys.
Structured query language (SQL) is a programming language for storing and processing information in a relational database. A relational database stores information in tabular form, with rows and columns representing different data attributes and the various relationships between the data values. You can use SQL statements to store, update, remove, search, and retrieve information from the database. You can also use SQL to maintain and optimize database performance.
PostgreSQL has expanded SQL with numerous extensions, rendering it a Turing-complete language. This means that SQL can perform any complex operation, facilitating the execution of data processing logic entirely on the server side.
In comparison, etcd is designed as a configuration management tool, with configuration data typically represented as a hash table. This is why its data model is structured as a key-value format, effectively creating a single large global table. CRUD operations can be performed on this table, which has only two fields: a unique key with version information, and an untyped value. As a result, clients must retrieve the full value for further processing.
Overall, the key-value structure of etcd simplifies SQL and is more convenient and intuitive for the specific task of configuration management.
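To make the key-value model concrete, here is a minimal sketch of CRUD against etcd using the official Go client (go.etcd.io/etcd/client/v3). The endpoint localhost:2379 and the key names are illustrative assumptions, not something mandated by etcd:

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Assumes a local etcd instance listening on localhost:2379.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Create/update: every change is stamped with a new global revision.
	if _, err := cli.Put(ctx, "/routes/1", "foobar"); err != nil {
		panic(err)
	}

	// Read: the value is opaque bytes; the client decodes it itself.
	resp, err := cli.Get(ctx, "/routes/1")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s (mod_revision=%d)\n", kv.Key, kv.Value, kv.ModRevision)
	}

	// Delete: internally this writes a tombstone rather than erasing history.
	if _, err := cli.Delete(ctx, "/routes/1"); err != nil {
		panic(err)
	}
}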
MVCC (Multi-Version Concurrency Control)
MVCC is an essential feature for versioning data in configuration management. It allows for:
- Querying historical data
- Determining the age of data by comparing versions
- Watching data, which requires versioning to enable incremental notifications
Both etcd and PostgreSQL have MVCC, but what are the differences between them?
etcd uses a globally incrementing 64-bit revision counter for its MVCC system. There is no need to worry about overflow: the counter can absorb a vast number of updates, even at a rate of millions per second. Each time a key-value pair is created or updated, it is assigned a new revision; when a key-value pair is deleted, a tombstone is written and the key's per-key version counter is reset to 0. Every change therefore produces a new version rather than overwriting the previous one.
Furthermore, etcd retains all versions of a key-value pair and makes them visible to users. The key-value data is never overwritten, and new versions are stored alongside the existing ones. The MVCC implementation in etcd also provides read-write separation, which allows users to read data without locking, making it suitable for read-intensive use cases.
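As a sketch of what version retention buys you (assuming the cli client from the earlier example), a Get pinned to an older revision returns the value as of that revision:

func historicalRead(ctx context.Context, cli *clientv3.Client) error {
	// First write: remember the global revision it was assigned.
	putResp, err := cli.Put(ctx, "/routes/1", "v1")
	if err != nil {
		return err
	}
	rev1 := putResp.Header.Revision

	// A second write supersedes it but does not overwrite the history.
	if _, err := cli.Put(ctx, "/routes/1", "v2"); err != nil {
		return err
	}

	// Read the key as it existed at rev1: still "v1" (until compaction).
	old, err := cli.Get(ctx, "/routes/1", clientv3.WithRev(rev1))
	if err != nil {
		return err
	}
	fmt.Printf("at revision %d: %s\n", rev1, old.Kvs[0].Value)
	return nil
}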
PostgreSQL's MVCC implementation differs from etcd's in that it is not focused on providing incrementing version numbers, but rather on implementing transactions and different isolation levels transparently to the user. MVCC is an optimistic locking mechanism that enables concurrent updates. Each row in a table records transaction IDs: xmin is the ID of the transaction that created the row, and xmax is the ID of the transaction that updated (or deleted) it.
- Transactions can only read data that has already been committed before them.
- When updating data, if a concurrent update is encountered, PostgreSQL waits for the conflicting transaction and then re-checks whether the row still matches the update condition before proceeding.
To view an example, please refer to the following link: https://devcenter.heroku.com/articles/postgresql-concurrency
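To see this bookkeeping directly, the system columns xmin and xmax can be selected on any table. A minimal Go sketch in the same database/sql style as the watch client later in this article, assuming the config table defined further below:

func showRowVersions(db *sql.DB) error {
	// xmin/xmax are system columns PostgreSQL keeps on every row for MVCC;
	// casting to text makes them easy to scan.
	rows, err := db.Query(`SELECT xmin::text, xmax::text, key FROM config`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var xmin, xmax, key string
		if err := rows.Scan(&xmin, &xmax, &key); err != nil {
			return err
		}
		// xmax = "0": no later transaction has updated or deleted this row version.
		fmt.Println(xmin, xmax, key)
	}
	return rows.Err()
}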
Unfortunately, using transaction IDs for version control of configuration data in PostgreSQL is not possible for several reasons:
- Transaction IDs are assigned per transaction, so all rows modified in the same transaction share one ID, and version control cannot be applied at the row level.
- Historical queries cannot be performed, and only the latest version of a row can be accessed.
- Transaction IDs are a 32-bit counter, prone to wraparound, and are reset during vacuuming.
- It is not possible to implement watch functionality based on transaction IDs.
As a result, PostgreSQL requires alternative methods for version control of configuration data since built-in support is unavailable.
Client Interface
The design of a client interface is a critical aspect when it comes to determining the cost and resource consumption associated with its use. By analyzing the differences between interfaces, one can make informed choices when selecting the most suitable option.
etcd’s kv/watch/lease APIs have proved to be particularly adept at managing configurations. However, how can one implement these APIs in PostgreSQL?
Unfortunately, PostgreSQL does not provide built-in support for these APIs, and encapsulation is necessary to implement them. To analyze their implementation, we will examine pg_watch_demo, a project I developed.
gRPC/HTTP vs TCP
PostgreSQL follows a multi-process architecture, where each process handles only one TCP connection at a time. It uses a custom protocol to deliver functionality via SQL queries and follows a request-response interaction model (similar to HTTP/1.1, which handles only one request at a time and requires pipelining for processing multiple requests simultaneously). However, given the high resource consumption and relatively low efficiency, a connection pool proxy (such as pgbouncer) is crucial for improving performance, especially in scenarios with high QPS.
On the other hand, etcd is designed on a multi-coroutine architecture in Golang and offers two user-friendly interfaces: gRPC and RESTful. These interfaces are easy to integrate with and are efficient in terms of resource consumption. Additionally, each gRPC connection can handle multiple concurrent queries, which ensures optimal performance.
Defining Data
etcd
message KeyValue {
bytes key = 1;
// Revision number when the key was created
int64 create_revision = 2;
// Revision number when the key was last modified
int64 mod_revision = 3;
// Incrementing counter that increases every time the key is updated.
// It is reset to zero when the key is deleted; a zero version marks a tombstone.
int64 version = 4;
bytes value = 5;
// The lease object used by the key for TTL. If the value is 0, then there is no TTL.
int64 lease = 6;
}
PostgreSQL
PostgreSQL needs to use a table to simulate etcd's global data space:
CREATE TABLE IF NOT EXISTS config (
key text,
value text,
-- Equivalent to `create_revision` and `mod_revision`
-- Here, a big integer incrementing sequence type is used to simulate revision
revision bigserial,
-- Tombstone
tombstone boolean NOT NULL DEFAULT false,
-- Composite index, search by key first, then by revision
primary key(key, revision)
);
get
etcd
etcd's get API has a wide range of parameters:
- Range queries: for example, setting key to /abc and range_end to /abd will retrieve all key-value pairs with /abc as the prefix.
- Historical queries: specifying revision or a range of mod_revision.
- Sorting and limiting the number of returned results.
message RangeRequest {
...
bytes key = 1;
// Range queries
bytes range_end = 2;
int64 limit = 3;
// Historical queries
int64 revision = 4;
// Sorting
SortOrder sort_order = 5;
SortTarget sort_target = 6;
bool serializable = 7;
bool keys_only = 8;
bool count_only = 9;
// Historical queries
int64 min_mod_revision = 10;
int64 max_mod_revision = 11;
int64 min_create_revision = 12;
int64 max_create_revision = 13;
}
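In the Go client these parameters surface as options on Get. A sketch combining a range query, sorting, a limit, and an illustrative historical revision, assuming the cli from earlier:

func rangeQuery(ctx context.Context, cli *clientv3.Client) (*clientv3.GetResponse, error) {
	// Keys in [/abc, /abd), i.e. everything prefixed with /abc, sorted by
	// mod_revision descending, at most 10 results, read at (illustrative) revision 100.
	return cli.Get(ctx, "/abc",
		clientv3.WithRange("/abd"),
		clientv3.WithSort(clientv3.SortByModRevision, clientv3.SortDescend),
		clientv3.WithLimit(10),
		clientv3.WithRev(100),
	)
}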
PostgreSQL
PostgreSQL can reproduce etcd's get through SQL, and even provide more complex functionality. Since SQL is a language rather than a fixed-parameter interface, it is highly versatile. Below is a simple example that retrieves the latest version of a key-value pair; because the primary key is a composite index, range searches on it are fast.
CREATE FUNCTION get1(kk text)
RETURNS table(r bigint, k text, v text) AS $$
    SELECT revision, key, value
    FROM config
    WHERE key = kk AND tombstone = false
    ORDER BY key, revision DESC
    LIMIT 1
$$ LANGUAGE sql;
put
etcd
message PutRequest {
bytes key = 1;
bytes value = 2;
int64 lease = 3;
// whether to respond with the key-value pair data before the update from this `Put` request.
bool prev_kv = 4;
bool ignore_value = 5;
bool ignore_lease = 6;
}
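A corresponding Go sketch, assuming the cli from earlier; leaseID is assumed to come from a previously granted lease (see the lease section), and WithPrevKV asks for the pair's value before this update:

func putWithLease(ctx context.Context, cli *clientv3.Client, leaseID clientv3.LeaseID) error {
	resp, err := cli.Put(ctx, "/routes/1", "foobar",
		clientv3.WithLease(leaseID), // attach a TTL via an existing lease
		clientv3.WithPrevKV(),       // return the previous key-value pair
	)
	if err != nil {
		return err
	}
	if resp.PrevKv != nil {
		fmt.Printf("previous value: %s\n", resp.PrevKv.Value)
	}
	return nil
}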
PostgreSQL
Just like in etcd, PostgreSQL does not execute changes in place. Instead, a new row is inserted, and a new revision is assigned to it.
CREATE FUNCTION set(k text, v text) RETURNS bigint AS $$
insert into config(key, value) values(k, v) returning revision;
$$ LANGUAGE SQL;
delete
etcd
message DeleteRangeRequest {
bytes key = 1;
bytes range_end = 2;
bool prev_kv = 3;
}
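In the Go client, range_end is usually computed from the key via WithPrefix. A sketch assuming the cli from earlier:

func deletePrefix(ctx context.Context, cli *clientv3.Client) error {
	// Delete every key under the /routes/ prefix and report how many were removed.
	resp, err := cli.Delete(ctx, "/routes/", clientv3.WithPrefix())
	if err != nil {
		return err
	}
	fmt.Printf("deleted %d keys\n", resp.Deleted)
	return nil
}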
PostgreSQL
Similar to etcd, deletion in PostgreSQL does not modify data in place. Instead, a new row is inserted with the tombstone field set to true to indicate that it is a tombstone.
CREATE FUNCTION del(k text) RETURNS bigint AS $$
insert into config(key, tombstone) values(k, true) returning revision;
$$ LANGUAGE SQL;
watch
etcd
message WatchCreateRequest {
bytes key = 1;
// Specifies the range of keys to watch
bytes range_end = 2;
// Starting revision for the watch
int64 start_revision = 3;
...
}
message WatchResponse {
ResponseHeader header = 1;
...
// For efficiency, multiple events can be returned
repeated mvccpb.Event events = 11;
}
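A Go sketch of consuming these messages, assuming the cli from earlier and a startRev revision to resume from; events arrive batched per WatchResponse, exactly as the message definition above suggests:

func watchRoutes(ctx context.Context, cli *clientv3.Client, startRev int64) {
	wch := cli.Watch(ctx, "/routes/", clientv3.WithPrefix(), clientv3.WithRev(startRev))
	for wresp := range wch {
		for _, ev := range wresp.Events {
			// ev.Type is PUT or DELETE; DELETE events carry the tombstoned key.
			fmt.Printf("%s %s = %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}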
PostgreSQL
PostgreSQL does not come with a built-in watch function; instead, a combination of triggers and channels achieves similar functionality. By using pg_notify, data can be sent to all applications listening on a specific channel.
-- trigger function for distributing put/delete events
CREATE FUNCTION notify_config_change() RETURNS TRIGGER AS $$
DECLARE
data json;
channel text;
is_channel_exist boolean;
BEGIN
IF (TG_OP = 'INSERT') THEN
-- use JSON to encode
data = row_to_json(NEW);
-- Extract channel name for distribution from key
channel = (SELECT SUBSTRING(NEW.key, '/(.*)/'));
-- Listeners hold advisory lock 9080; if we fail to take it, a listener exists
is_channel_exist = NOT pg_try_advisory_lock(9080);
IF is_channel_exist THEN
PERFORM pg_notify(channel, data::text);
ELSE
PERFORM pg_advisory_unlock(9080);
END IF;
END IF;
RETURN NULL; -- Result is ignored since this is an AFTER trigger
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER notify_config_change
AFTER INSERT ON config
FOR EACH ROW EXECUTE FUNCTION notify_config_change();
Since the watch feature is encapsulated, client applications must also implement corresponding logic. Using Golang as an example, the following steps must be taken:
- Start listening: when listening starts, all notify data will be cached at both the PostgreSQL and Golang channel levels.
- Retrieve all data using get_all(key_prefix, revision): this function reads all existing data starting from the specified revision. For each key, only the latest revision is returned, with any deleted data automatically removed. If revision is not specified, it returns the latest data for all keys with the given key_prefix.
- Watch for new data, including any notifications cached between the first and second steps, to avoid missing changes during this time window. Ignore any revisions already read in step two.
func watch(l *pq.Listener) {
for {
select {
case n := <-l.Notify:
if n == nil {
log.Println("listener reconnected")
log.Printf("get all routes from rev %d including tombstones...\n", latestRev)
// When reconnecting, resume transmission based on the revision before the disconnection.
str := fmt.Sprintf(`select * from get_all_from_rev_with_stale('/routes/', %d)`, latestRev)
rows, err := db.Query(str)
...
continue
}
...
// maintain a state that records the latest revision it has received
updateRoute(cfg)
case <-time.After(15 * time.Second):
log.Println("Received no events for 15 seconds, checking connection")
go func() {
// If no events are received for a prolonged period, check the health of the connection
if err := l.Ping(); err != nil {
log.Println("listener ping error: ", err)
}
}()
}
}
}
log.Println("get all routes...")
// When initializing, the application should obtain all current key-value pairs and then incrementally monitor updates through watch
rows, err := db.Query(`select * from get_all('/routes/')`)
...
go watch(listener)
transaction
etcd
etcd's transactions are a collection of multiple operations with conditional checks, and the modifications made by the transaction are atomically committed.
message TxnRequest {
// Specify the transaction execution condition
repeated Compare compare = 1;
// Operations to be executed if the condition is met
repeated RequestOp success = 2;
// Operations to be executed if the condition is not met
repeated RequestOp failure = 3;
}
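The compare/success/failure triple maps onto If/Then/Else in the Go client. A sketch that creates a key only if it does not exist yet, assuming the cli from earlier:

func createIfAbsent(ctx context.Context, cli *clientv3.Client) error {
	// create_revision = 0 means the key does not exist yet.
	resp, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.CreateRevision("/routes/1"), "=", 0)).
		Then(clientv3.OpPut("/routes/1", "foobar")).
		Else(clientv3.OpGet("/routes/1")).
		Commit()
	if err != nil {
		return err
	}
	// Succeeded reports whether the If condition held (the Then branch ran).
	fmt.Println("created:", resp.Succeeded)
	return nil
}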
PostgreSQL
The DO command in PostgreSQL executes an anonymous code block, which can run arbitrary statements, including calls to stored procedures. It supports multiple procedural languages, including the built-in PL/pgSQL as well as Python. With these languages, any conditional judgment, loop, or other control logic can be implemented, making it more versatile than etcd's transactions.
DO LANGUAGE plpgsql $$
DECLARE
n_plugins int;
BEGIN
SELECT COUNT(1) INTO n_plugins FROM get_all('/plugins/');
IF n_plugins = 0 THEN
perform set('/routes/1', 'foobar');
perform set('/upstream/1', 'foobar');
...
ELSE
...
END IF;
END;
$$;
lease
etcd
In etcd, it is possible to create a lease object that applications must renew periodically to prevent it from expiring. Each key-value pair can be linked to a lease object, and when the lease object expires, all associated key-value pairs will also expire, automatically deleting them.
message LeaseGrantRequest {
// TTL of the lease
int64 TTL = 1;
int64 ID = 2;
}
// Lease renewal
message LeaseKeepAliveRequest {
int64 ID = 1;
}
message PutRequest {
bytes key = 1;
bytes value = 2;
// Lease ID, used to implement TTL
int64 lease = 3;
...
}
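A Go sketch of the full lease lifecycle, again assuming the cli from earlier: grant, attach to a key, and keep alive in the background.

func register(ctx context.Context, cli *clientv3.Client) error {
	// Grant a lease with a 10-second TTL.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		return err
	}
	// Attach the lease to a key: the key is deleted when the lease expires.
	if _, err := cli.Put(ctx, "/nodes/1", "alive", clientv3.WithLease(lease.ID)); err != nil {
		return err
	}
	// KeepAlive renews the lease in the background; stop renewing and the key goes away.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		return err
	}
	go func() {
		for range ch { // drain keepalive responses
		}
	}()
	return nil
}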
PostgreSQL
- In PostgreSQL, a lease can be maintained through a foreign key. When querying, if the associated lease object has expired, the row is treated as a tombstone.
- Keepalive requests update the last_keepalive timestamp in the lease table.
-- The lease table comes first, since config references it.
CREATE TABLE IF NOT EXISTS lease (
    id bigint PRIMARY KEY,
    ttl int,
    last_keepalive timestamp
);

CREATE TABLE IF NOT EXISTS config (
    key text,
    value text,
    ...
    -- Use a foreign key to specify the associated lease object.
    lease bigint REFERENCES lease(id)
);
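A sketch of the expiry check in Go, in the same database/sql style as the watch client; the query below is a hypothetical illustration against the tables above, treating a row whose lease has missed its keepalive window as a tombstone:

func getAlive(db *sql.DB) (*sql.Rows, error) {
	// Rows with no lease never expire; leased rows survive only while
	// last_keepalive is within the ttl window. Expired rows act as tombstones.
	return db.Query(`
		SELECT c.key, c.value
		FROM config c
		LEFT JOIN lease l ON c.lease = l.id
		WHERE c.lease IS NULL
		   OR l.last_keepalive + l.ttl * interval '1 second' > now()`)
}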
Performance Comparison
PostgreSQL needs to simulate etcd's various APIs through encapsulation. So how does it perform? Here are the results of a simple test: https://github.com/kingluo/pg_watch_demo#benchmark.
The results show that read and write performance is nearly identical, with PostgreSQL even outperforming etcd. Additionally, the latency from an update occurring to the application receiving the event determines the efficiency of update distribution, and here the two also perform similarly: with the client and server on the same machine, watch latency was under 1 millisecond.
PostgreSQL, however, has some shortcomings worth mentioning:
- The WAL log for each update is larger, resulting in twice as much disk I/O compared to etcd.
- It consumes more CPU compared to etcd.
- Notify based on channels is a transaction-level concept. Updates to the same type of resource go to the same channel, and concurrent update requests contend for a mutual-exclusion lock, serializing them. In other words, using channels to implement watch reduces the parallelism of put operations.
This highlights that to achieve the same requirements, we need to invest more in learning and optimizing PostgreSQL.
Storage
The performance is determined by the underlying storage, and how data is stored determines the database's resource requirements for memory, disk, and other resources.
etcd
Architecture diagram of etcd storage:
etcd first writes updates to the write-ahead log (WAL) and flushes them to disk to ensure that the updates are not lost. Once the log is successfully written and confirmed by a majority of nodes, the results can be returned to the client. etcd also asynchronously updates TreeIndex and BoltDB.
To keep the log from growing indefinitely, etcd periodically takes a snapshot of the storage, after which logs from before the snapshot can be deleted.
etcd indexes all keys in memory (TreeIndex), recording the version information of each key, but only keeps a pointer to BoltDB (revision) for the value.
The value corresponding to the key is stored on disk and maintained using BoltDB.
Both TreeIndex and BoltDB use the btree data structure, which is known for its efficiency in lookups and range lookups.
TreeIndex structure diagram:
(Image source: https://blog.csdn.net/H_L_S/article/details/112691481, licensed under CC 4.0 BY-SA)
Each key is divided into different generations, with each deletion marking the end of a generation.
The pointer to the value is composed of two integers: the first, main, is the etcd transaction ID, while the second, sub, is the update ID of this key within that transaction.
BoltDB supports transactions and snapshots, and it stores the value corresponding to each revision.
(Image source: https://blog.csdn.net/H_L_S/article/details/112691481, licensed under CC 4.0 BY-SA)
Example of writing data:
Writing key="key1", revision=(12,1), value="keyvalue5"
. Note the changes in the red parts of treeIndex and BoltDB:
(Image source: https://blog.csdn.net/H_L_S/article/details/112691481, licensed under CC 4.0 BY-SA)
Deleting key="key", revision=(13,1)
creates a new empty generation in treeIndex and generates an empty value in BoltDB with key="13_1t"
.
Here, the t
stands for "tombstone". This implies that you cannot read the tombstone because the pointer in treeIndex is (13,1)
, but in BoltDB, it is 13_1t
, which cannot be matched.
(Image source: https://blog.csdn.net/H_L_S/article/details/112691481, licensed under CC 4.0 BY-SA)
It is worth noting that etcd schedules both reads and writes to BoltDB using a single goroutine to reduce random disk I/O and improve I/O performance.
PostgreSQL
Architecture diagram of PostgreSQL storage:
Similar to etcd, PostgreSQL appends updates to a log file first, and waits for the log to be successfully flushed to disk before considering the transaction complete. Meanwhile, the updates are written to the shared_buffer memory.
The shared_buffer is a memory area shared by all tables and indexes in PostgreSQL, serving as an in-memory mapping of these objects' on-disk pages.
In PostgreSQL, each table consists of multiple pages, with each page being 8 KB in size and containing multiple rows.
In addition to tables, indexes (such as btree indexes) are also made up of table pages in the same format. However, these pages are special and are interconnected to form a tree structure.
PostgreSQL is equipped with a checkpointer process that periodically flushes all modified table and index pages to disk. Log files from before each checkpoint can then be deleted and recycled, preventing the log from growing indefinitely.
Page structure:
(Image source: https://en.wikibooks.org/wiki/PostgreSQL/Page_Layout, licensed under CC 3.0 BY-SA)
Btree index structure:
(Image source: https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree, licensed under CC 3.0 BY-SA)
To enhance read performance, certain SQL statements in PostgreSQL use bitmaps to read scattered pages sequentially, improving I/O performance:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100;
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1 (cost=5.07..229.20 rows=101 width=244)
Recheck Cond: (unique1 < 100)
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0)
Index Cond: (unique1 < 100)
Conclusion
PostgreSQL and etcd both prioritize I/O performance in their storage design. etcd even places the index of all keys in memory, and both systems optimize batch operations for sequential disk reads and writes.
As a result, as seen in the performance comparison above, PostgreSQL and etcd demonstrate similar read and write performance.
However, compared to PostgreSQL, etcd requires greater memory capacity and faster disks, as detailed in their Hardware guidelines for administering etcd clusters.
Distributed Computing
Decentralization and data consistency are the hallmark features of etcd, and they are also necessary requirements for cloud-native systems. However, how can traditional databases fulfill these requirements?
etcd
Raft is a popular distributed protocol used by etcd to distribute updates to multiple nodes, ensuring that committed data is confirmed by a majority of nodes.
Raft has rigorously defined roles and its role switching diagram is shown below:
(Image source: https://raft.github.io, licensed under CC 3.0 BY-SA)
By default, all reads and writes are executed on the leader node in etcd. While the need for consistency in writes is easy to understand, consistency in reads is worth noting: it ensures that reads return committed data at the latest version, and that each read observes a version equal to or greater than the previous read.
To implement consistent reads, follower nodes obtain the latest version number from the leader. If a follower's data is older than the leader's, it waits for synchronization before serving the read.
As a result, read and write work in etcd is anchored to the leader, while the replication mechanism only guarantees replica availability and data consistency, without offering any load-balancing capability.
PostgreSQL
PostgreSQL originates from a traditional database background and does not include built-in implementations of distributed protocols like Raft. Nevertheless, it does possess the necessary data replication capabilities for clustering. By incorporating third-party Raft components, PostgreSQL can function as a distributed system that operates in the same manner as etcd.
PostgreSQL comes with a variety of basic features, including:
- Synchronous commit
- Quorum replication
- Failover trigger
- Hot standby
On the master node, transactions can be configured to require confirmation from multiple nodes for successful submission, with the number of confirmation nodes set to a majority (quorum).
For more information, see: https://www.2ndquadrant.com/en/blog/evolution-fault-tolerance-postgresql-synchronous-commit/
The role of data replication can be switched through a failover trigger, and tools like pg_rewind can remove data that hasn't been confirmed by a majority of nodes in order to rejoin the cluster later.
Hot standby allows for serializable reads, akin to etcd: already committed data can be read on replica nodes, though it is not guaranteed to be the latest version.
Below is an example of relevant configurations:
-- set quorum sync replication in postgresql.conf
-- assume you have 5 nodes; then at least 2 standbys must confirm each commit,
-- so you can tolerate 2 node failures
synchronous_commit = on
synchronous_standby_names = 'ANY 2 (*)'
-- if master fails, check flushed lsn of each standby
-- promote a standby with max lsn to master
select flushed_lsn from pg_stat_wal_receiver;
PostgreSQL provides full support for clustering on the data plane, and adding a Raft component on the control plane enables a decentralized cluster. The pg_raft component, a PostgreSQL worker process that I have provided to multiple commercial clients, offers cluster management functions such as Raft-based leader election.
Maintenance
etcd is a database designed for specific needs and consequently requires little maintenance, which is one of its selling points.
Meanwhile, PostgreSQL requires less maintenance from DBAs compared to other relational databases due to its well-designed architecture. Like etcd, many of the maintenance tasks in PostgreSQL are automatic and built-in.
Database management involves various routine maintenance tasks, but this discussion will focus on two: compaction and snapshot backup.
Compaction
Maintaining multiple versions of data can lead to a bloated database and decreased read/write efficiency. To address this issue, older versions of data should be deleted when they are no longer needed, and any resulting gaps should be merged through a process called compaction.
etcd
etcd offers compact and defrag operations in its API to support this process.
The compact operation deletes all old versions of data before a given revision, while preserving each key's latest version. For example, after compact 100, a key-value pair with key=foo, revision=87 is preserved, but the pair with key=foo, revision=65 is deleted. In other words, compact never deletes a key's current data version.
etcd also provides an auto compaction feature that allows users to specify how often to run compact, such as every few hours.
When compact is used, it leaves gaps in BoltDB, which must be consolidated using defrag. However, defrag involves significant I/O and blocks read and write operations, so it should be used with caution.
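Both operations are also exposed through the Go client; a sketch assuming the cli from earlier:

func compactAndDefrag(ctx context.Context, cli *clientv3.Client) error {
	// Drop all history before revision 100; each key's latest version survives.
	if _, err := cli.Compact(ctx, 100); err != nil {
		return err
	}
	// Defragment reclaims the gaps compaction leaves in BoltDB. It runs per
	// member, is I/O heavy, and blocks reads and writes, so schedule it off-peak.
	_, err := cli.Defragment(ctx, "localhost:2379")
	return err
}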
PostgreSQL
PostgreSQL's compaction process is similarly simple. For example, to delete old data before revision 100, one can use the following SQL command:
with alive as (
select r as revision from get_all('/routes/')
)
delete from config
where revision < 100 and not exists (
select 1 from alive where alive.revision = config.revision limit 1
);
If you need to execute compaction regularly, you can use crontab or pg_cron.
As for MVCC cleanup within the database itself, PostgreSQL has its own VACUUM command (VACUUM FULL being the equivalent of etcd's defrag), as well as an automated autovacuum feature.
Snapshot
Snapshot backup is a necessary maintenance task for databases, as it can be used for emergency recovery.
etcd
etcd provides an API for creating and restoring snapshots, for example:
$ etcdctl snapshot save backup.db
$ etcdctl --write-out=table snapshot status backup.db
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| fe01cf57 | 10 | 7 | 2.1 MB |
+----------+----------+------------+------------+
$ etcdctl snapshot restore backup.db
PostgreSQL
PostgreSQL also has very comprehensive backup tools:
- pg_basebackup is used to prepare data for new PostgreSQL replica nodes.
- pg_dump is used to clone the database online, with the option to select which tables to back up.
In fact, based on WAL and logical replication, PostgreSQL also supports more advanced backup mechanisms. Please see the following link for more information:
www.postgresql.org/docs/current/continuous-archiving.html
www.postgresql.org/docs/current/logical-replication.html
Conclusion
PostgreSQL is a versatile traditional SQL database, while etcd is a specialized distributed KV database.
Compared to a pure data access system like etcd, PostgreSQL has several additional benefits:
- Rich authentication mechanisms that can implement complete RBAC and fine-grained access control, support multi-tenancy (multiple database instances), can filter IP addresses, and do not require additional proxies.
- SQL has built-in schemas and supports foreign keys, so no additional control logic is needed to ensure data integrity.
- Supports JSON data types, JSON-based indexes, and various JSON operations, such as indexing routing configurations for routing matching.
- Supports data encryption and can access HashiCorp Vault to obtain secrets through FDW (Foreign Data Wrappers).
- Logical replication can achieve data synchronization between multiple independent clusters.
- Support for stored procedures, which can implement additional functionality, such as implementing upstream slow start.
In terms of functionality, PostgreSQL is a superset of etcd: through its rich built-in features and third-party components it can reproduce etcd's functionality, and it can also be cloud-native.
While using PostgreSQL to implement etcd's functionality is technically feasible, it is akin to converting an aircraft carrier into a cruiser: if you have no requirements beyond etcd's capabilities, the significant development and maintenance costs make this approach uneconomical.
The most significant advantage of etcd is its out-of-the-box nature, meeting the configuration distribution needs of the cloud-native era. etcd can also serve as a core component for features such as leader election, distributed locks, and task scheduling.