Lightweight summary of our paper “Semantics and Complexity of GraphQL”

While the ease of use of its query language has certainly contributed to the phenomenal adoption of GraphQL, little has been known about fundamental computational properties of this language. To fill this gap, together with Jorge Pérez from the Universidad de Chile, we have recently embarked on a formal study of the language. The result of our work is a research paper titled “Semantics and Complexity of GraphQL” (download a pre-print of our paper). After a typical academic review process, this paper has been accepted to be presented in The Web Conference 2018. Due to the highly technical, math-heavy nature of this work, the paper may not be easily digestible for a broader, Web-developer audience. Even the summary of the paper by Adrian Colyer of “the morning paper” is still fairly technical. Therefore, in this post I aim to provide a more lightweight high-level summary of the results of our work (if you prefer a more visual summary, you may want to check out the video recording of my talk at GraphQL Europe). Hereafter, when I talk about the GraphQL language, I mean the query-specific part of the language that the GraphQL framework introduces to retrieve data from a Web server.

We have studied this language in terms of a number of typical computation-related questions that people focus on when analyzing query languages. In Theoretical Computer Science such types of questions are captured in the form of formally-defined problems and the subject of study is to achieve a mathematically-proven understanding of how difficult it is to solve any possible instance of a given problem. A typical example of such a problem in the context of database query languages is to decide for any given answer that may be in the result of a given query over a given database whether this answer is indeed in the query result. You may notice that this decision problem is related to the task of producing the query result. In fact, the difficulty of this problem (for a given query language) is one of the most commonly used formal measures to identify how computationally complex a query language is. The problem is usually called the evaluation problem of a query language.

The Evaluation Problem of GraphQL

One of our contributions in our paper has been to show that evaluation problem of the GraphQL language is “very simple”. More precisely, we show that the problem is NL-complete. It is well-known that NL-complete problems can be solved in practice by programs that use a high degree of parallelism. Hence, this gives us our first positive observation about the GraphQL language. Admittedly, this is a very abstract finding. However, to appreciate this property of the GraphQL language we may look at other well-known query languages. For instance, we may consider the Relational Algebra, which is the formal foundation of SQL. It has been shown that the complexity of the evaluation problem of the Relational Algebra is PSPACE-complete. The same has been shown for version 1.0 of the RDF query language SPARQL. Problems in this class are highly intractable. As another example, for general conjunctive queries the evaluation problem is NP-complete. NP-complete problems are commonly believed to be less complex than PSPACE-complete problems but still intractable. In contrast, there is a class called PTIME which consists of problems that are known to be tractable; that is, for each such problem there exists an algorithm whose runtime is polynomial in the size of the input. This is the case for SPARQL basic graph patterns (which comprise the conjunctive fragment of SPARQL) for which the evaluation problem is in PTIME. How does this compare to the NL-completeness of the evaluation problem of GraphQL? The good news for GraphQL fans is that NL is an even lower complexity class than PTIME.

The Enumeration Problem of GraphQL

After having done this initial theoretical comparison of the GraphQL language to other query languages, we have looked at another, more practical computation-related problem. This problem is called the enumeration problem and is concerned with how difficult it is to produce the complete result of a query. To study this problem we made our life a bit easier by considering only a particular class of GraphQL queries. We call the queries in this class “non-redundant queries in ground-typed normal form“. Informally, these are queries for which the process of field collection (as specified in Section 6.3.2 of the GraphQL spec) is not necessary. For these queries, and any arbitrary set of data, we have shown that the respective query results can be produced symbol-by-symbol with only constant time delay between symbols. This property is the best that one can hope for in a query language and not many query languages possess this property!

At this point it has to be emphasized that focusing on non-redundant queries in ground-typed normal form is not a limitation of our work. In contrast! As we also show in our paper, every arbitrary GraphQL query can be rewritten into a non-redundant, ground-typed query that is guaranteed to produce the same result. For the necessary rewriting rules refer to Proposition 3.9 in our paper.

Given our positive finding regarding the constant time delay, we can also infer the following property of the GraphQL language: The time required to produce the complete query result (for any non-redundant query in ground-typed normal form) depends linearly on the size of this result. This is another highly desirable property of a query language!

The Issue of Exponential Result Size Blow-up in GraphQL

The only caveat regarding the latter finding is that there are cases in which the size of the result of a GraphQL query depends exponentially on the size of the query. For a simple example, consider querying data with a GraphQL schema whose query type has a field that allows us to retrieve data about a person, say Alice; this data has a scalar field name with value Alice and another field knows with an array of two data objects about two other persons; each of these data objects also has a knows field referring back to Alice. Then, for a query of the form

start { knows { knows { ... { knows { name } } ... } } }

in which 2xN knows fields are nested, the result contains the value Alice 2^N times! In other words, increasing the size of the query linearly (adding two more levels of nesting in each step) will cause the size of the query result to increase exponentially. This is not just a theoretical issue. In the introduction of our paper we describe an experiment in which we have observed the issue when querying the public GraphQL API of Github. As a sidenote, and in all fairness to the GraphQL language, such an exponential result size blow-up is possible in many query languages (all that it needs is the possibility of expressing conjunctive queries).

While we are the first to formally quantify the possible extend of the result size blow-up of GraphQL, the basic issue of potentially huge results has been recognized before in the GraphQL community. Existing approaches to address this issue are to restrict queries i) based on their nesting depth, ii) based on a calculation of the maximum possible number of result nodes at each level of the nesting depth, or ii) based on some cost estimation or complexity estimation for which users have to provide various cost factors for different elements of GraphQL schemas. Notice that all these approaches use heuristics and none of them takes into account the actual data being queried. As a consequence, these approaches fall short in providing a robust solution for the issue as they can fail in both directions: discarding requests for which the response may be of a manageable size, and allowing requests for which producing the complete response is too resource intensive.

Our Solution to the Result Size Issue

Instead of relying on data-agnostic heuristics and estimates, we introduce an approach which is accurate. This approach comes in the form of an algorithm that returns the exact size of the result of any given non-redundant GraphQL query in ground-typed normal form (see above) over any possible data set (without actually producing this query result). We use this algorithm to prove that the result size can be computed in an amount of time that depends only linearly on the product of the query size times the data set size. Hence, our algorithm achieves this complexity bound. The idea of the algorithm is to recursively consider every subquery and add up the sizes of results of the subquery for every part of the queried data for which the subquery has to be evaluated. During the recursive process, data objects are labeled with the result sizes of subqueries evaluated at them. These labels are then used to avoid the repeated calculation of subquery result sizes.

As an example, consider some simple example data that may be illustrated as the following graph of objects (where the node named r in the graph corresponds to the query object that would have to be specified in a GraphQL schema for this data).

Moreover, consider the following GraphQL query.

query {
  start {
    advisor {
      univ { name }
    }
    friend {
      univ { name }
    }
  }
}

Apparently, the result of this query over our example data would be the following JSON object.

start: {
  advisor: {
    univ: { name: UCh }
  }
  friend: {
    univ: { name: UCh }
  }
}

Now, instead of immediately starting to produce this result, suppose we first want to calculate its size. We measure the size of GraphQL query results in terms of symbols, where every field name counts as a symbol, and so does every scalar value and every special character (such as colons and curly braces). For instance, the aforementioned result of our example query happens to be of size 26. However, we want to calculate this size without already having produced the query result. To this end, we may first look at the root field start of the query object, and we observe that the number of result symbols that would be produced from this field is 4+x where the number 4 covers the symbols in the first and the last line of the example result and x is the sum of the result symbols obtained from going to the data object u and evaluating each of the two subqueries, q1 = advisor{univ{name}} and q2 = friend{univ{name}}. To make the following explanation more concise, let us denote the size of these two subquery results by writing size(q1,u) and size(q2,u), respectively. Thus, we know that the size of the complete query result is equivalent to

4 + size(q1,u) + size(q2,u)

What remains is to calculate the sizes of the two subquery results, which we can do in a recursive fashion using a similar reasoning as before: First, let’s focus on size(q1,u). By looking at the data object u, we observe that the root field advisor of subquery q1 is present in u. Therefore, we know that the subquery result will contain 4 result symbols from the advisor field plus the symbols obtained from going to data object v and evaluating the subquery q3 = univ{name}. Hence, we have that

size(q1,u) = 4 + size(q3,v)

and we need to enter the recursion again to calculate size(q3,v). In this case, we have that

size(q3,v) = 4 + size(q4,w)

with subquery q4 = name, and another recursion step is needed for size(q4,w). This step now brings us to a base case of the recursion because, by looking at data object w, we see that the result of subquery q4 is name:UCh and, thus, contains exactly 3 symbols. At this point, our algorithm adds a label to data object w to indicate that the result size of subquery q4 at this object is 3. Next, the recursion comes back to calculating the value of size(q3,v), for which it now becomes clear that this value is 7. So, data object v is labeled to indicate that the result of subquery q3 at this object is 7. Returning now to the calculation of size(q1,u), it becomes clear that size(q1,u)=11 and the data object u is labeled accordingly.

Being back at the initial step of the recursion now, we still have to calculate the value of size(q2,u). Observe that q2 contains the subquery q3 that we have seen before and it also has to be evaluated at the same data object as before (that is, object v). Therefore, to calculate

size(q1,u) = 4 + size(q3,v)

it is now possible to obtain the value of size(q3,v) by reading the corresponding label at object v instead of going into the recursion again for size(q3,v). The remaining steps of the calculation should not be difficult to guess at this point.

As can be observed from this example, labeling the data objects during the execution of our algorithm enables the algorithm to avoid repeating the same calculations and, thus, to achieve the polynomial runtime (even in cases in which actually producing the query result would have an exponential runtime). For a pseudo-code representation of the complete algorithm refer to our paper. We also have a prototypical JavaScript implementation of the algorithm.

Given this algorithm, we propose that GraphQL servers should perform the following steps for any given query:

First, compute the size of the response, which can be done efficiently by using our algorithm.
If the size is too big, reject query.
Else, inform the size to the client, and
Produce and send the response byte by byte.

Conceptual Contributions

Before wrapping up, I would like to also highlight some of the conceptual contributions that we have made in addition to all the aforementioned technical contributions. As a foundation of our work, we have developed a formally more precise definition of the GraphQL language. The main reason why this was necessary is that the semantics of GraphQL queries—i.e., the definition of what the expected result of any given query is—is given in the GraphQL specification by means of a recursive program specified by pseudo code. This recursion is based on an operation to resolve any field in a query. Surprisingly, this operation is not fully specified and, instead, simply assumes access to an “internal function […] for determining the […] value of [the] field.” While the lack of a more precise definition of this internal function is likely to be intentional (to allow for implementations of GraphQL on top of arbitrary database back-ends), it makes a systematic analysis of the GraphQL language unworkable. As a basis for providing a well-defined formalization of the language, we had to introduce a formal model that defines the notion of a so-called GraphQL graph. This notion is an artifact of realizing that GraphQL enables users to query an underlying data set in terms of a virtual, graph-like view of this data set. The form of this view depends on the GraphQL schema being used, and the resolver functions that implement the schema can be understood to establish the view. Our notion of a GraphQL graph is an abstraction to conceptualize such a view independent of its relationship to the underlying data set. As a sidenote for readers who are familiar with the notion of Property Graphs as supported by popular graph database systems like Neo4J and by the Apache Tinkerpop graph processing framework, a GraphQL graph is very similar to a Property Graph (with a few additional, GraphQL-specific features). The technical details of all these definitions can be found in our paper.

Thanks

As a final remark, I want to thank the GraphQL community (in particular, the initial creators of the original GraphQL spec) for the excellent documentation of GraphQL which has made our work of formalizing the language a pleasant journey.

Hi Ashera, thanks for your interest in our work! In your first example, our algorithm may visit the continent nodes and the country nodes multiple times, but only once per subquery for a subresult has to be produced at the visited node. In contrast, to actually produce the overall query result, these subresults have to be obtained multiple times from the visited nodes. Let’s make this a bit more concrete: Consider the following subquery of your query.

countries {
continent {
code
}
}

There will be a point during the execution of your query where we have to produce the result of this subquery for the graph node that represents the continent Europe. In fact, there are multiple different ways (“paths”) how we can reach the Europe node starting from the root/query node to come to this point. For instance,

query node –continents–> Europe –countries–> Sweden –continent–> Europe

…and many more (in this case, as many as there are countries in Europe). Each of these paths will contribute to a separate subtree of the overall query result. Then, for each of these paths, after reaching the Europe node, the subresult for the aforementioned subquery at the Europe node will always be the same. Hence, we will end up with identical copies of this same subresult in each of the subtrees of the overall query result. This is how the query results may blow up in size.

Now, for our algorithm we make use of the fact that the subresult of the given subquery at the Europe node is the same for all of the paths described above: When our algorithm reaches the Europe node with the given subquery for the first time, it calculates the size of the subresult (by calling itself recursively). Say that this subresult size is an integer k. Next, the algorithm annotates the Europe node to record that the size of the subquery for this node is k. Later, whenever the algorithm reaches the Europe node with the given subquery again (i.e., for each of the other possible paths), it does not need to call itself recursively anymore but, instead, may simply return the value k from the annotation.

So, in summary, the algorithm has to consider the subresult of each subquery at each respectively relevant graph node only once, whereas producing the query result (i.e., actually executing the query) requires repeatedly producing the subresults for all subqueries at all the respectively relevant graph nodes.

Your second question cannot be answered because your assumption here is wrong. There are two subqueries within the selection set of ‘a’:

b(first:10) { c }

and

b(last:30) { d }

Now, it is not just one ‘b’ node that will be visited by these two subqueries. In contrast, the first subquery visits 10 ‘b’ nodes and the second subquery visits 30 ‘b’ nodes. If the overall number of ‘b’ nodes that can be reached from the corresponding ‘a’ node is 40 or greater, then there is no overlap in terms of the ‘b’ nodes that the first subquery visits and the ‘b’ nodes that the second subquery visits. However, if the number is smaller than 40, then some ‘b’ nodes may be visited by both subqueries. For this case, I can answer your question, and the answer is: no, it does not matter for our algorithm. The reason is that, within the two subqueries, you are querying for something different. In other words, the next-level subqueries within these two subqueries are different.

I hope this helps.

4 Replies to “Lightweight summary of our paper “Semantics and Complexity of GraphQL””

Olaf Hartig says:

April 22, 2020 at 09:12

Hi Ashera, thanks for your interest in our work! In your first example, our algorithm may visit the continent nodes and the country nodes multiple times, but only once per subquery for a subresult has to be produced at the visited node. In contrast, to actually produce the overall query result, these subresults have to be obtained multiple times from the visited nodes. Let’s make this a bit more concrete: Consider the following subquery of your query.

countries {
continent {
code
}
}

There will be a point during the execution of your query where we have to produce the result of this subquery for the graph node that represents the continent Europe. In fact, there are multiple different ways (“paths”) how we can reach the Europe node starting from the root/query node to come to this point. For instance,

query node –continents–> Europe –countries–> Sweden –continent–> Europe

or

query node –continents–> Europe –countries–> Sweden –continent–> Europe

…and many more (in this case, as many as there are countries in Europe). Each of these paths will contribute to a separate subtree of the overall query result. Then, for each of these paths, after reaching the Europe node, the subresult for the aforementioned subquery at the Europe node will always be the same. Hence, we will end up with identical copies of this same subresult in each of the subtrees of the overall query result. This is how the query results may blow up in size.

Now, for our algorithm we make use of the fact that the subresult of the given subquery at the Europe node is the same for all of the paths described above: When our algorithm reaches the Europe node with the given subquery for the first time, it calculates the size of the subresult (by calling itself recursively). Say that this subresult size is an integer k. Next, the algorithm annotates the Europe node to record that the size of the subquery for this node is k. Later, whenever the algorithm reaches the Europe node with the given subquery again (i.e., for each of the other possible paths), it does not need to call itself recursively anymore but, instead, may simply return the value k from the annotation.

So, in summary, the algorithm has to consider the subresult of each subquery at each respectively relevant graph node only once, whereas producing the query result (i.e., actually executing the query) requires repeatedly producing the subresults for all subqueries at all the respectively relevant graph nodes.

Your second question cannot be answered because your assumption here is wrong. There are two subqueries within the selection set of ‘a’:

b(first:10) { c }

and

b(last:30) { d }

Now, it is not just one ‘b’ node that will be visited by these two subqueries. In contrast, the first subquery visits 10 ‘b’ nodes and the second subquery visits 30 ‘b’ nodes. If the overall number of ‘b’ nodes that can be reached from the corresponding ‘a’ node is 40 or greater, then there is no overlap in terms of the ‘b’ nodes that the first subquery visits and the ‘b’ nodes that the second subquery visits. However, if the number is smaller than 40, then some ‘b’ nodes may be visited by both subqueries. For this case, I can answer your question, and the answer is: no, it does not matter for our algorithm. The reason is that, within the two subqueries, you are querying for something different. In other words, the next-level subqueries within these two subqueries are different.

I hope this helps.
Ashera says:

April 17, 2020 at 05:57

Hello professor, I read your paper and I’m not quite sure if these cases will be handled by the suggested algorithm. Given the below queries, won’t the whole query be executed for the response size calculation? I would appreciate it if you can show the way the algorithm can be applied for these use-cases. Please point out if my idea is wrong altogether, as I am new to the area and just read your paper a few times.

If a schema like this is considered (note that this is just a part of the schema):

type Query {
continents(filter: ContinentFilterInput): [Continent!]!
}

type Country {
code: ID!
name: String!
continent: Continent!
}

type Continent {
code: ID!
name: String!
countries: [Country!]!
}

Then, if a query like this is provided:

{
continents {
countries {
continent {
countries {
continent {
code
}
}
}
}
}
}

I used a public GraphQL API to formulate the above-mentioned query. This is where I tested this query: https://countries.trevorblades.com/

Secondly, if a query like this is considered:
{ a { b(first:10) { c } b(last:30) { d } } }

Will the fact that node b has already been visited once matter, for the second section of the query (that is for “b(last:30) { d }”)?
Olaf Hartig says:

May 5, 2019 at 18:06

I am not sure what exactly you mean by “the query had still resolved by graphql server.” Notice that the query does not need to be answered completely in order to determine the result size as per our algorithm (i.e., the complete result does not need to be produced in order to determine its size). Nonetheless, running the algorithm to determine the result size (without producing the result) consumes server resource. Sure. Nothing comes for free. However, the amount of resources consumed does not grow exponentially with the size of the query, which is the case for the amount of resources consumed to produce the query result (i.e., to actually answer the query completely).
LiMingze says:

May 5, 2019 at 16:37

hey! professor, I have a question. the query had still resolved by graphql server before computing its result size, it had consumed server resources that nested queries brought. I want to know whether the proposion is merely cuting network traffic output.