What Is GraphQL?

Zexuan Luo

November 4, 2022

Technology

GraphQL is a query and manipulation language for APIs that Facebook released in 2015. Unlike other API designs, GraphQL lets clients compose query statements against a pre-agreed data structure, and the server parses each statement and returns only what was asked for. GraphQL thus offers richness and flexibility while avoiding the performance cost of redundant data, which makes it a good choice for applications that deal with many complex data objects.

In 2018, GraphQL came out with a full specification and a stable version. That same year, Facebook donated the GraphQL project to the GraphQL Foundation under the Linux Foundation. Since then, GraphQL has been adopted by many open-source projects and commercial organizations. Today, there are several major client-side implementations of GraphQL. Server-side implementations are available in all major server-side programming languages, and even in niche languages such as D and R.

Some Real Scenarios and Challenges for GraphQL

The best-known example of GraphQL is GitHub's GraphQL API.

Before embracing GraphQL, GitHub provided a REST API to expose the rich data generated by millions of hosted projects. That API was so successful that it became a model for people to emulate when designing REST APIs. However, as the number of data objects grew and the objects gained more and more fields, the REST API began to reveal its drawbacks. On the server side, each call produced so much data that GitHub had to set strict limits on call frequency to reduce costs. On the developer side, a single call returned a lot of data, yet most of it was useless for the task at hand. To get a particular piece of information, developers often had to issue multiple queries and then write a lot of glue code to stitch the meaningful parts of the results into the desired content, all while wearing the shackles of the "number of calls" limit.

Therefore, GitHub embraced GraphQL as soon as it came out. GitHub became the "ambassador" of GraphQL, bringing it to thousands of developers. The GraphQL API is now GitHub's preferred API. Since first announcing support for GraphQL, GitHub has posted several articles about GraphQL every year. To help developers migrate to GraphQL, GitHub built an interactive query application specifically for this purpose: https://docs.github.com/en/graphql/overview/explorer. Developers can use it to learn how to write GraphQL.

However, GraphQL is not a panacea. Just recently, GitHub deprecated its own GraphQL implementation of the package API. Many people have also started discussing the shortcomings of GraphQL. Many of these problems stem from the fact that GraphQL's structure is so different from that of the HTTP standard that there is no easy way to map some GraphQL concepts onto HTTP structures such as the path or headers. Treating GraphQL as a normal HTTP API requires additional development work. As a result, developers who want to manage their own GraphQL APIs need a GraphQL-enabled API gateway.

How APISIX Supports GraphQL

Currently, APISIX supports dynamic routing based on several properties of a GraphQL request. With this capability, we can accept only specific GraphQL requests or forward different GraphQL requests to different upstreams.

Take the following GraphQL statement as an example:

query getRepo {
    owner {
        name
    }
    repo {
        created
    }
}

APISIX extracts the following three properties of GraphQL for routing:

  • graphql_operation
  • graphql_name
  • graphql_root_fields

In the above GraphQL statement:

  • graphql_operation corresponds to query
  • graphql_name corresponds to getRepo
  • graphql_root_fields corresponds to ["owner", "repo"]
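To make the three properties more concrete, here is a minimal Python sketch that pulls them out of a statement. It is only an illustration and not how APISIX parses GraphQL internally: the function name extract_graphql_properties is made up for this example, and the tokenizer assumes a single operation without aliases, directives, or fragments. Returning an empty name for anonymous queries is likewise a choice made here.

```python
import re

# Tokens we care about: string literals, names, and brackets. Everything else is skipped.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_][A-Za-z0-9_]*|[{}()]')
_OPERATIONS = ("query", "mutation", "subscription")


def _is_name(token):
    return token[0].isalpha() or token[0] == "_"


def extract_graphql_properties(document):
    """Hypothetical helper: extract operation type, name, and root fields.

    Simplifying assumptions: one operation per document, no aliases,
    directives, or fragment spreads at the root level.
    """
    text = re.sub(r"#[^\n]*", "", document)  # drop comments (naive)
    tokens = _TOKEN.findall(text)

    operation, name, pos = "query", "", 0  # shorthand `{ ... }` is a query
    if tokens and tokens[0] in _OPERATIONS:
        operation, pos = tokens[0], 1
        if pos < len(tokens) and _is_name(tokens[pos]):
            name, pos = tokens[pos], pos + 1

    root_fields = []
    brace_depth = paren_depth = 0
    for token in tokens[pos:]:
        if token == "(":
            paren_depth += 1
        elif token == ")":
            paren_depth -= 1
        elif paren_depth > 0:
            continue  # inside variable definitions or field arguments
        elif token == "{":
            brace_depth += 1
        elif token == "}":
            brace_depth -= 1
            if brace_depth == 0:
                break  # end of the operation's selection set
        elif brace_depth == 1 and _is_name(token):
            root_fields.append(token)  # a field directly under the operation

    return {
        "graphql_operation": operation,
        "graphql_name": name,
        "graphql_root_fields": root_fields,
    }


statement = """
query getRepo {
    owner {
        name
    }
    repo {
        created
    }
}
"""
print(extract_graphql_properties(statement))
```

For the statement above, the sketch yields query, getRepo, and ["owner", "repo"], which are the values listed in the text.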

Let's create a route to demonstrate APISIX's fine-grained routing capabilities for GraphQL:

curl http://127.0.0.1:9180/apisix/admin/routes/1 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -i -d '
{
    "methods": ["POST"],
    "uri": "/graphql",
    "vars": [
        ["graphql_operation", "==", "query"],
        ["graphql_name", "==", "getRepo"],
        ["graphql_root_fields", "has", "owner"]
    ],
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:2022": 1
        }
    }
}'

Next, send a request that carries a GraphQL statement:

curl -i -H 'content-type: application/graphql' \
-X POST http://127.0.0.1:9080/graphql -d '
query getRepo {
    owner {
        name
    }
    repo {
        created
    }
}'
HTTP/1.1 200 OK
...

We can see that the request reached the upstream because the query statement matched all three conditions.

Conversely, suppose we send a statement that does not match, for example one that does not include the owner field:

curl -i -H 'content-type: application/graphql' \
-X POST http://127.0.0.1:9080/graphql -d '
query getRepo {
    repo {
        created
    }
}'
HTTP/1.1 404 Not Found
...

This request does not match the routing rule, so APISIX responds with 404 Not Found.

We can additionally create a route that allows statements that do not contain an owner field to be routed to another upstream:

curl http://127.0.0.1:9180/apisix/admin/routes/2 \
-H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -i -d '
{
    "methods": ["POST"],
    "uri": "/graphql",
    "vars": [
        ["graphql_operation", "==", "query"],
        ["graphql_name", "==", "getRepo"],
        ["graphql_root_fields", "!", "has", "owner"]
    ],
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "192.168.0.1:2022": 1
        }
    }
}'

A request without the owner field now matches the new route:

curl -i -H 'content-type: application/graphql' \
-X POST http://127.0.0.1:9080/graphql -d '
query getRepo {
    repo {
        created
    }
}'
HTTP/1.1 200 OK
...
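The behavior of the two routes can be modeled with a few lines of Python. This is a simplified sketch rather than APISIX's actual matching logic: it only handles the == and has operators and the "!" negation used above, it tries routes in list order (real route selection also involves priorities and URI matching), and the names eval_condition and pick_upstream are invented for the example.

```python
# The vars sections and upstream nodes of the two routes created above.
ROUTES = [
    {
        "vars": [
            ["graphql_operation", "==", "query"],
            ["graphql_name", "==", "getRepo"],
            ["graphql_root_fields", "has", "owner"],
        ],
        "upstream": "127.0.0.1:2022",
    },
    {
        "vars": [
            ["graphql_operation", "==", "query"],
            ["graphql_name", "==", "getRepo"],
            ["graphql_root_fields", "!", "has", "owner"],
        ],
        "upstream": "192.168.0.1:2022",
    },
]


def eval_condition(condition, props):
    """Evaluate [var, op, value] or [var, "!", op, value] against props."""
    var, *rest = condition
    negate = rest[0] == "!"
    if negate:
        rest = rest[1:]
    op, expected = rest
    actual = props.get(var)
    if op == "==":
        result = actual == expected
    elif op == "has":
        result = expected in (actual or [])
    else:
        raise ValueError(f"unsupported operator in this sketch: {op}")
    return result != negate  # flip the result when "!" is present


def pick_upstream(routes, props):
    """Return the upstream of the first route whose conditions all hold."""
    for route in routes:
        if all(eval_condition(c, props) for c in route["vars"]):
            return route["upstream"]
    return None  # no route matched, which corresponds to a 404


with_owner = {
    "graphql_operation": "query",
    "graphql_name": "getRepo",
    "graphql_root_fields": ["owner", "repo"],
}
without_owner = dict(with_owner, graphql_root_fields=["repo"])

print(pick_upstream(ROUTES, with_owner))     # 127.0.0.1:2022
print(pick_upstream(ROUTES, without_owner))  # 192.168.0.1:2022
```

Because the third condition of the two routes is mutually exclusive, a getRepo query always lands on exactly one of them, and anything else (a mutation, say) matches neither.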

Prospects of APISIX's Future Support for GraphQL

In addition to dynamic routing, APISIX may also introduce more operations based on specific fields of GraphQL in the future. For example, GitHub's GraphQL API has a specific formula for rate limiting, and we can apply similar rules to convert a single GraphQL request into a corresponding number of "virtual calls" to accomplish GraphQL-specific rate limiting.
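As a rough illustration of the "virtual calls" idea, the sketch below assigns a cost to a query based on how many backend requests its nested connections could trigger. It is a toy model loosely inspired by that kind of formula, not GitHub's exact rule and not an existing APISIX feature. The input structure (a list of connections with a first size and optional children), the divisor of 100, and the function names are all assumptions made for this example.

```python
def count_requests(connections, parents=1):
    """Count backend requests: each connection is fetched once per parent node."""
    total = 0
    for conn in connections:
        total += parents
        # Every returned node may become a parent of the nested connections.
        total += count_requests(conn.get("children", []), parents * conn["first"])
    return total


def virtual_calls(connections, divisor=100):
    """Convert one GraphQL request into a number of virtual calls (at least 1)."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return max(1, round(count_requests(connections) / divisor))


# Hypothetical query shape: 100 repositories, 50 issues each, 60 labels per issue.
query = [
    {"first": 100, "children": [
        {"first": 50, "children": [
            {"first": 60},
        ]},
    ]},
]

print(count_requests(query))  # 1 + 100 + 5000 = 5101
print(virtual_calls(query))   # 51
```

A rate-limiting plugin could then deduct the result from the client's quota instead of counting every GraphQL request as a single call.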

We can also think of the problem in a different way. The application itself still provides the REST API, and the gateway, at the outermost layer, converts GraphQL requests into REST requests and REST responses into GraphQL responses. A GraphQL API provided in this way can reuse functions such as RBAC, rate limiting, and caching without any GraphQL-specific plugins. From a technical point of view, this idea is not that hard to implement. After all, in 2022, even REST APIs tend to provide OpenAPI specs as their schema, so the work is mostly a translation between the GraphQL schema and the OpenAPI schema, plus GraphQL-specific field filtering. (Of course, I must admit that I haven't tried it myself. There may be challenges in the details that have yet to be overcome.)
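The field filtering step mentioned above could look roughly like the following. This is a sketch under stated assumptions: the selection set is represented as a nested dict (an empty dict marks a leaf field), the REST response is made-up sample data, and select_fields is a hypothetical helper name.

```python
def select_fields(data, selection):
    """Keep only the fields named in selection, recursing into nested objects."""
    if isinstance(data, list):
        return [select_fields(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data.get(field)
        if sub_selection and value is not None:
            value = select_fields(value, sub_selection)
        result[field] = value
    return result


# Made-up REST response that contains more than the client asked for.
rest_response = {
    "owner": {"id": 1, "name": "apache", "type": "Organization"},
    "repo": {"id": 2, "created": "2019-06-01", "stars": 10},
}

# Selection set equivalent to: { owner { name } repo { created } }
selection = {"owner": {"name": {}}, "repo": {"created": {}}}

print(select_fields(rest_response, selection))
# {'owner': {'name': 'apache'}, 'repo': {'created': '2019-06-01'}}
```

The gateway would still call the unmodified REST API in this design. Only the response handed back to the GraphQL client is trimmed.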

Careful readers will find that the GraphQL APIs converted in this way can only operate on one model at a time, which obviously does not meet the flexibility requirements of GraphQL and is nothing more than a REST API in GraphQL's clothing. However, GraphQL has a concept called schema stitching that allows implementers to combine multiple schemas together.

As an example, we have two APIs, one called GetEvent and the other called GetLocation, which return the types Event and Location respectively.

type Event {
    id: String
    location_id: String
}

type Location {
    id: String
    city: String
}

type Query {
    GetEvent(id: String): Event
    GetLocation(id: String): Location
}

We can add a configuration that combines these two APIs into a new API called GetEventWithLocation, which looks like this:

type EventWithLocation {
    id: String
    location: Location
}

type Query {
    GetEventWithLocation(id: String): EventWithLocation
}

The stitching itself is carried out by the gateway. In the above example, the gateway splits the GetEventWithLocation call into two: it first calls GetEvent to obtain the location_id, then calls GetLocation with that ID and merges the two results into the combined data.
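The two-step call can be sketched in Python as follows. The in-memory tables and the functions get_event and get_location are stand-ins for the two REST APIs, and the sample records are invented. A real gateway would issue HTTP requests here.

```python
# Stand-ins for the data behind the two REST APIs (invented sample records).
EVENTS = {"e1": {"id": "e1", "location_id": "l1"}}
LOCATIONS = {"l1": {"id": "l1", "city": "Hangzhou"}}


def get_event(event_id):
    """Stand-in for the GetEvent API."""
    return EVENTS.get(event_id)


def get_location(location_id):
    """Stand-in for the GetLocation API."""
    return LOCATIONS.get(location_id)


def get_event_with_location(event_id):
    """Stitched resolver: feed Event.location_id into GetLocation."""
    event = get_event(event_id)
    if event is None:
        return None
    return {
        "id": event["id"],
        "location": get_location(event["location_id"]),
    }


print(get_event_with_location("e1"))
# {'id': 'e1', 'location': {'id': 'l1', 'city': 'Hangzhou'}}
```

The result has the shape of EventWithLocation: the location_id field is replaced by the full Location object.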

In short, by converting REST to GraphQL, each REST API can be turned into a corresponding GraphQL model; and with the help of schema stitching, multiple models can be combined into one GraphQL API. In this way, we can build a rich and flexible GraphQL API on top of the existing REST API and manage specific plugins at the granularity of the REST API. This design incidentally solves some of the API orchestration problems. As in the example above, we take the output of one API (Event.location_id) as the input of another API (Location.id).

Tags: GraphQL, API Gateway Concept