Leveraging API Gateway for Data Sovereignty and Data Compliance
November 28, 2022
Background and Challenges
With the growing popularity of smartphones, IoT, and high-speed mobile networks, a large amount of sensitive data is being generated, such as photos, financial transactions, geographic locations, DNA sequences, medical records, and clinical trial results. Statistical analysis of this data can accurately profile individuals, companies, or groups of people, threatening personal privacy and national security.
Legislative bodies in more and more countries are aware of the seriousness and urgency of this problem. They have introduced many laws and regulations to govern data collection and cross-border transfer. The General Data Protection Regulation (GDPR) in Europe and the Health Insurance Portability and Accountability Act (HIPAA) in the US are among the best-known of these laws. Many developing countries are also stepping up:
- Malaysia: Personal Data Protection Act 2010 (PDPA), effective in November 2013.
- China: The Cybersecurity Law of the People's Republic of China, the Data Security Law of the People's Republic of China, and the Personal Information Protection Law of the People's Republic of China took effect successively between 2017 and 2021.
- Brazil: General Data Protection Law (Lei Geral de Proteção de Dados, LGPD), effective in September 2020.
- Thailand: Thailand's Personal Data Protection Act (PDPA), effective in June 2022.
As a result, data generated by devices and users and stored by manufacturers falls under the supervision of multiple regulators. Enterprises, especially large multinationals, now face many urgent new questions:
- Which data can be collected? Which cannot?
- How and where is the data stored?
- Can data be transferred across borders?
Sorting out and formulating solutions for all of them would be a massive project. Here we focus on one issue:
For client data transmitted through an API, how can data sovereignty be determined at the API gateway level so that the data is processed and stored lawfully?
For example, an American user on a business trip in Europe uses his mobile phone to make a bank transfer. The transaction data must then be processed and stored by a server in the United States, even though a European server is closer.
In the second half of this article, we will give a specific technical solution for this example. Before that, let's first look at data sovereignty and compliance.
What are Data Sovereignty and Data Compliance?
Data Sovereignty
A country has sovereignty not only over physical space, such as its territory, airspace, and territorial waters, but also over its data and national cyberspace.
Take the General Data Protection Regulation (GDPR) as an example. It is an EU regulation on the privacy and protection of personal data, and one of its most basic requirements is that all collection of user data requires the user's consent and that users have the right to erase their stored personal data at any time.
Therefore, if a company wants to transfer European data to other regions, it must ensure that the destination country's level of data protection meets that of the EU. The need for data to comply with local laws is a real concern for multinational businesses.
Another concern is the USA PATRIOT Act, which places all data stored in the US, or stored by American companies, under the supervision of the United States. The US Department of Justice and the Central Intelligence Agency (CIA) have the right to ask companies to provide data. In 2013, the US Department of Justice asked Microsoft to disclose email information stored on its servers in Ireland. Microsoft refused on the grounds that complying would violate European Union regulatory requirements. The Department of Justice then took Microsoft to court, and Microsoft won the case. Later, to reduce their data sovereignty risk, many US companies placed their data centers directly in Europe, thinking this would be safe. Nevertheless, in some more recent cases, judges have ruled that the US has the authority to request data that US companies hold in Europe. This is the long-arm jurisdiction of the United States.
Data sovereignty has indeed brought significant challenges to the global business of enterprises, and how to properly handle the issue of data sovereignty in companies has become particularly important.
Data Compliance
For multinational companies, data synchronization is relatively simple if there is no requirement for data sovereignty. A user's data in the United States can easily be synchronized to servers in Asia and the United Kingdom, as shown in the diagram below. In this way, when an American travels in Asia, he can also access various data generated when he was back in the US.
With the compliance requirements of data sovereignty, a lot of data cannot be synchronized and accessed across countries. Enterprises need to distinguish users and isolate their associated data. A standard method is to divide users based on regions.
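The idea of dividing users by region and isolating their data can be illustrated with a minimal Python sketch. The `RegionalStore` class, the `CrossRegionAccessError` exception, and the sample user are hypothetical names introduced here for illustration; they are not part of any product mentioned in this article.

```python
from dataclasses import dataclass, field


class CrossRegionAccessError(Exception):
    """Raised when data is written to a store outside the user's home region."""


@dataclass
class RegionalStore:
    """A toy per-region data store that refuses data owned by other regions."""
    region: str
    records: dict = field(default_factory=dict)

    def put(self, user_id: str, home_region: str, data: dict) -> None:
        # Isolation rule (assumed): a store only accepts data of its own region.
        if home_region != self.region:
            raise CrossRegionAccessError(
                f"user {user_id!r} belongs to {home_region}, not {self.region}"
            )
        self.records[user_id] = data

    def get(self, user_id: str):
        # Returns None when the user has no data in this region.
        return self.records.get(user_id)


us_store = RegionalStore("US")
uk_store = RegionalStore("UK")
us_store.put("alice", "US", {"balance": 10})
```

In this sketch, an attempt to write the same US user's data into `uk_store` raises `CrossRegionAccessError`, which mirrors the isolation between regions described above.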
Take Amazon's Kindle as an example: e-books purchased by a user in the US cannot be downloaded to a Kindle registered with a Chinese account. This is because the data of different countries (regions) is entirely isolated. The architecture of the system is as follows:
So, what should be done technically if a user in the UK wants to access Amazon UK with a US account? Let's take a look at the architecture diagram below. Most of the existing API gateway products propose similar solutions.
Existing API Gateway Level Solution
We can put the core of this solution in one sentence:
The API gateway in the UK identifies the user. If the gateway finds that the user is registered in the US, the request is routed to the US servers for processing.
However, this approach hides some technical challenges as well as compliance risks:
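As a rough illustration of this routing decision, here is a minimal Python sketch. The user registry, the upstream URLs, and the `choose_upstream` function are all assumptions made for this example; a real gateway would look the user up in an external database instead of an in-memory dictionary.

```python
# Hypothetical registry: which region each user is registered in.
USER_HOME_REGION = {"alice": "US", "bob": "UK"}

# Hypothetical upstream addresses for each region.
REGION_UPSTREAMS = {
    "US": "https://us.example.internal",
    "UK": "https://uk.example.internal",
}


def choose_upstream(user_id: str, gateway_region: str):
    """Return (upstream URL, crosses_region) for a request seen at a gateway."""
    home_region = USER_HOME_REGION.get(user_id)
    if home_region is None:
        raise LookupError(f"unknown user: {user_id!r}")
    # The request always goes to the user's home region; the flag tells the
    # caller whether it has to be forwarded to another region.
    return REGION_UPSTREAMS[home_region], home_region != gateway_region


upstream, crosses_region = choose_upstream("alice", gateway_region="UK")
```

Here a US-registered user reaching the UK gateway is sent to the US upstream, and `crosses_region` is `True`, which is exactly the case that raises the challenges discussed next.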
- The API gateway needs fine-grained route scheduling capabilities: it must read data from the HTTP headers, request arguments, and request body, and combine it with external database queries to determine which server should handle the user's request.
- The networks of the two regions must be connected so that requests can be forwarded, for example between the UK server room and the US server room.
- The API gateway in the UK may already have offloaded the SSL certificate, read the API's content, and recorded the data to the local disk or other services through access logs, audit logs, observability systems, etc.
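The first challenge, reading the routing key from several parts of the request, might look like the following Python sketch. The `X-User-Id` header, the `user_id` query argument, and the `user_id` JSON field are hypothetical conventions chosen for this example; the article does not specify where the user identity is carried.

```python
import json
from urllib.parse import parse_qs, urlsplit


def extract_user_id(headers: dict, url: str, body):
    """Look for a user ID in the header, then the query string, then the body."""
    # 1. HTTP header (header names are case-insensitive).
    for name, value in headers.items():
        if name.lower() == "x-user-id":
            return value
    # 2. Request arguments in the query string.
    query = parse_qs(urlsplit(url).query)
    if "user_id" in query:
        return query["user_id"][0]
    # 3. JSON request body, if there is one and it parses.
    if body:
        try:
            payload = json.loads(body)
        except ValueError:  # covers invalid JSON and undecodable bytes
            return None
        if isinstance(payload, dict):
            return payload.get("user_id")
    return None
```

Note that step 3 requires the gateway to read the decrypted request body, which is precisely the compliance risk described in the last item of the list above.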
Is there a way to solve these problems?
Multilayer Network: Apache APISIX's Solution to Ensure the Compliance of API Data Transmission
Here we introduce the concept of a "multilayer network" in APISIX to ensure the compliance and security of data transmitted by API at the API Gateway level. A multilayer network, as the name suggests, divides the API gateway into two layers, Layer 1 and Layer 2, as shown in the following figure:
- Layer 1 API gateway: responsible for SSL certificate offloading, fine-grained route scheduling, and deciding which Layer 2 API gateway should handle the API requests.
- Layer 2 API gateway: This is the original API gateway, which does not need to worry about data compliance.
Back to the question at the beginning of the article: how can we ensure API data compliance for a user registered in the United States, regardless of where the transaction takes place?
First, the API request is sent to the Layer 1 API gateway, which is essentially Apache APISIX with an added multilayer network object, on which custom plugins can be bound:
- The Layer 1 API gateway defines the address, weight, and other information of the Layer 2 API gateway clusters. Here we set up the US cluster and the UK cluster:
http://Layer-1-API-Gateway-IP/apisix/admin/multilayer_network/clusters/cluster-US
{
    "desc": "description",
    "http_port": 80,
    "https_port": 443,
    "gateways": [
        {"host": "IP1", "weight": 1},
        {"host": "IP2", "weight": 2}
    ]
}
http://Layer-1-API-Gateway-IP/apisix/admin/multilayer_network/clusters/cluster-UK
{
    "desc": "description",
    "http_port": 80,
    "https_port": 443,
    "gateways": [
        {"host": "IP1", "weight": 1},
        {"host": "IP2", "weight": 2}
    ]
}
- Define the routing rules on the multilayer network and bind them to the bar plugin:
http://Layer-1-API-Gateway-IP/apisix/admin/multilayer_network/routes/bank-foo
{
    "desc": "bank API",
    "hosts": ["foo.com"],
    "uris": ["/*"],
    "plugin_id": "bar"
}
- Define custom plugins:
http://***/apisix/admin/multilayer_network/plugins/bar
{
    "desc": "plugin",
    "plugins": {
        "jwt-auth": {
            ... ...
        },
        "foo-upstream-selector": {
            "scheme": "HTTPS"
            ... ...
        },
        ... ...
    }
}
Here we bind two plugins. The jwt-auth plugin authenticates the request. The foo-upstream-selector plugin reads information such as the user ID, country/region, and the cluster the user belongs to from the database, and specifies which Layer 2 API gateway cluster the request is routed to.
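To make the selector's role more concrete, the following Python sketch imitates what a plugin like foo-upstream-selector might do. It is not the actual plugin. The host names, the `USER_CLUSTER` lookup table (standing in for the database), and the interpretation of `weight` as a proportional random-selection weight are all assumptions made for this example.

```python
import random

# Cluster definitions shaped like the JSON above; host names are hypothetical.
CLUSTERS = {
    "cluster-US": {
        "https_port": 443,
        "gateways": [
            {"host": "us-gw-1.example", "weight": 1},
            {"host": "us-gw-2.example", "weight": 2},
        ],
    },
    "cluster-UK": {
        "https_port": 443,
        "gateways": [
            {"host": "uk-gw-1.example", "weight": 1},
            {"host": "uk-gw-2.example", "weight": 2},
        ],
    },
}

# Stand-in for the database that maps each user to a Layer 2 cluster.
USER_CLUSTER = {"alice": "cluster-US", "bob": "cluster-UK"}


def select_layer2_gateway(user_id: str, rng=random):
    """Pick the user's cluster, then one of its gateways according to weight."""
    cluster_name = USER_CLUSTER[user_id]
    cluster = CLUSTERS[cluster_name]
    gateways = cluster["gateways"]
    # Assumption: a gateway with weight 2 is chosen twice as often as weight 1.
    chosen = rng.choices(gateways, weights=[g["weight"] for g in gateways], k=1)[0]
    return cluster_name, f"https://{chosen['host']}:{cluster['https_port']}"


cluster_name, target = select_layer2_gateway("alice", random.Random(0))
```

Whichever gateway is picked, a US-registered user is always forwarded to a gateway in cluster-US, so the request content is only handled by the Layer 2 gateway of the user's home region.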
This multilayer architecture ensures data compliance across different countries.
Conclusion
In short, the fine-grained route scheduling enabled by the API gateway's multilayer architecture can help enterprises process API data quickly and safely while meeting data compliance requirements. We provide this out-of-the-box functionality in API7 Cloud and API7 Enterprise. To learn more, fill out the form to contact us.