Defining Graph Database Schemas by using the GraphQL Schema Definition Language

Abstract: An important component of the GraphQL framework is the GraphQL schema definition language (SDL). The original purpose of this language is to define a so-called GraphQL schema that specifies the types of objects that can be queried when accessing a specific GraphQL Web API. In this document we propose an approach to repurpose the SDL such that it may also be used to define schemas for graph databases that are represented using the Property Graph data model.

Update (June 9, 2019): In the meantime we have published a research paper about the approach described in this document. That research paper introduces additional features for the approach; additionally, it provides a formal definition and some complexity results that show fundamental properties of the approach.

We assume that the reader is familiar with the basics of both the GraphQL schema definition language and the notion of a Property Graph. To learn about the former, we suggest reading the “Schemas and Types” section of the official GraphQL documentation. A brief description of the latter can be found in the section “The Property Graph Model” of Neo4j’s documentation.

Outline:

  1. Informal Description of the Approach
  2. Detailed Definition

1. Informal Description

In this section we describe the proposed approach informally. That is, we outline our general idea of what it means for a Property Graph to conform to a schema defined in the GraphQL SDL and we introduce how the various features of the SDL are interpreted in this context. The next section shall then go into the details of the various features of the language and their exact meaning in the context of defining Property Graph schemas.

If you want to first see an example that covers many aspects of the proposal before you start reading, then jump to Example 12 and come back here when you are ready.

1.1 Specifying Types of Nodes using Object Type Definitions

The main type of elements defined in a GraphQL schema are object types that consist of a name and a list of field definitions. If used for Property Graphs, we propose that object types define the types of nodes that a conformant Property Graph may contain. That is, for every node in the graph, the label of the node must be the name of one of the object types in the schema. We call this object type the type of the node or we say that the node is of this type.

Example 1: Consider the following example schema. Every Property Graph that conforms to it can contain only two types of nodes: nodes whose label is “User” and nodes whose label is “UserSession”.

type UserSession {
  id: ID!
  user: User!
  startTime: Time
  endTime: Time
}

scalar Time

type User {
  id: ID!
  loginName: String! 
  name: String
}

While the name of an object type has to be used as the label of each node that is of this type, the field definitions of the object type specify what properties (key-value pairs) these nodes may have and what their outgoing edges may be. To this end, we generally distinguish two types of field definitions, namely, i) field definitions in which the type of the possible field values is a scalar type (e.g., Int, Float, String) or an enumeration type (or a list type that wraps a scalar type or an enumeration type) and ii) field definitions in which the type is an object type or a list type wrapping an object type. We call the former attribute definitions and the latter relationship definitions.

Example 2: The object type UserSession in the aforementioned example schema contains three attribute definitions (the fields id, startTime, and endTime) and one relationship definition (the field user). The three field definitions in the type User are all attribute definitions (notice that String and ID are built-in scalar types defined by the specification of the GraphQL SDL).
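The distinction between attribute definitions and relationship definitions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: field types are given as plain SDL type strings, and the helper names `named_type` and `classify_fields` are hypothetical and not part of any GraphQL library.

```python
BUILT_IN_SCALARS = {"Int", "Float", "String", "Boolean", "ID"}


def named_type(type_ref):
    """Strip list brackets and non-null markers, e.g. '[A!]' -> 'A'."""
    return type_ref.strip("[]!")


def classify_fields(fields, scalars=(), enums=()):
    """Split field names into attribute and relationship definitions."""
    simple = BUILT_IN_SCALARS | set(scalars) | set(enums)
    attributes = [name for name, t in fields.items() if named_type(t) in simple]
    relationships = [name for name, t in fields.items() if named_type(t) not in simple]
    return attributes, relationships


# The UserSession type from Example 1; Time is a scalar declared in that schema.
user_session = {"id": "ID!", "user": "User!", "startTime": "Time", "endTime": "Time"}
attributes, relationships = classify_fields(user_session, scalars={"Time"})
print(attributes)     # ['id', 'startTime', 'endTime']
print(relationships)  # ['user']
```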

1.2 Specifying Node Properties using Field Definitions

Every field definition that is an attribute definition specifies that the corresponding nodes may have a property (a key-value pair) whose key is the name of the field and whose values must be of a type that depends on the type given in that field definition. Hence, if the type in the field definition is a scalar type, the property value must be of that type; if, on the other hand, the type in the field definition is an enumeration type, the property value must be one of the values of that enumeration type. If the type in the field definition is a list type that wraps a scalar type or an enumeration type, then the property value must be an array of values of the wrapped type. Finally, if the type in the field definition is marked to be non-nullable (i.e., there is an exclamation mark after the type), then the property is mandatory; otherwise, it is optional.

Example 3: Given the aforementioned example schema, every node whose label is “User” may have two or three properties. That is, two properties are mandatory and they must have the keys “id” and “loginName”, respectively. The third property is optional and, if used, its key must be “name”. The value of this “name” property must be a (single) string, and so must be the value of the “loginName” property. The value of the “id” property must be a unique identifier. Similarly, nodes with the label “UserSession” may have one to three properties, where an “id” property is mandatory and the properties “startTime” and “endTime” are optional. Notice that the field definition of the field called user does not define a property because it is a relationship definition, not an attribute definition.
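As a rough illustration of Example 3, the following Python sketch checks only the keys of a node's properties against the attribute definitions of the User type. The function name `check_property_keys` and the dictionary representation are assumptions made for this sketch; checking the property values themselves is deliberately left out.

```python
def check_property_keys(fields, properties):
    """Return a list of violations concerning the keys of a node's properties."""
    problems = []
    for key in properties:
        if key not in fields:
            problems.append(f"unexpected property: {key}")
    for name, type_ref in fields.items():
        # A trailing '!' marks the attribute as non-nullable, i.e. mandatory.
        if type_ref.endswith("!") and name not in properties:
            problems.append(f"missing mandatory property: {name}")
    return problems


# Attribute definitions of the User type from Example 1.
USER_ATTRIBUTES = {"id": "ID!", "loginName": "String!", "name": "String"}

print(check_property_keys(USER_ATTRIBUTES, {"id": "u1", "loginName": "alice"}))
# []
print(check_property_keys(USER_ATTRIBUTES, {"id": "u1", "name": "Alice"}))
# ['missing mandatory property: loginName']
```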

1.3 Specifying Outgoing Edges using Field Definitions

Every field definition that is a relationship definition specifies that the corresponding nodes may have outgoing edges whose label is the name of the field and whose target node is of the type that is the object type given in that field definition. If the type in the field definition is a list type (wrapping an object type), then a node may have multiple such outgoing edges; otherwise, the nodes must not have more than one such outgoing edge. Hence, the latter case (no list type) presents a form of cardinality constraints for the relationship type that is captured by the field definition. Additionally, it is also possible to specify a form of participation constraints: If the type in the field definition is marked to be non-nullable, then it is mandatory for nodes to have such an outgoing edge; otherwise, it is optional.

Example 4: In a Property Graph that conforms to the aforementioned example schema, every “UserSession” node must have exactly one outgoing edge. The label of this edge must be “user” and the edge must point to a node with the label “User”. Such “User” nodes must not have any outgoing edges.

Example 5: To illustrate all possible combinations of the aforementioned constraints on outgoing edges, consider the following schema.

type A {
  name: String!
  favoriteB: B
  relatedA: [A]
}

type B {
  favoriteA: A!
  otherA: [A!]
}

Based on this schema, every “A” node may have at most one “favoriteB” edge to a “B” node (but it is not mandatory for every “A” node to have such an edge). Additionally, every “A” node may have an arbitrary number of “relatedA” edges to “A” nodes (including none). Hence, there may also be “A” nodes that do not have any outgoing edge. In contrast, every “B” node must have at least two outgoing edges. More precisely, it must have one “favoriteA” edge to an “A” node, and one or more “otherA” edges to “A” nodes.
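The edge-count constraints of Example 5 can be sketched as follows. This is a minimal illustration, assuming that only the four type forms used in the example occur (T, T!, [T], and [T!]) and that a node is described simply by the list of labels of its outgoing edges; the name `out_edge_count_violations` is hypothetical.

```python
from collections import Counter

# Relationship definitions of Example 5 (the attribute "name" is omitted).
SCHEMA = {
    "A": {"favoriteB": "B", "relatedA": "[A]"},
    "B": {"favoriteA": "A!", "otherA": "[A!]"},
}


def out_edge_count_violations(node_label, out_edge_labels):
    """Check how many outgoing edges of each label a node of the given type has."""
    counts = Counter(out_edge_labels)
    problems = []
    for field, type_ref in SCHEMA[node_label].items():
        is_list = type_ref.startswith("[")
        # 'T!' and '[T!]' both make the edge mandatory in this proposal.
        is_mandatory = type_ref.endswith("!") or type_ref.endswith("!]")
        if not is_list and counts[field] > 1:
            problems.append(f"at most one '{field}' edge allowed")
        if is_mandatory and counts[field] == 0:
            problems.append(f"at least one '{field}' edge required")
    return problems


print(out_edge_count_violations("A", []))             # []
print(out_edge_count_violations("B", ["favoriteA"]))  # ["at least one 'otherA' edge required"]
```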

In the cases in which a node may have multiple outgoing edges of the same type (i.e., with the same label), we may want to require that each of these edges must point to a different target node. To capture this type of constraints we may use the notion of directives that the GraphQL SDL introduces as a form of annotations that can be added to field definitions (and to other elements in a schema). That is, we introduce the directive @distinct that can be used for the aforementioned purpose. Additionally, we introduce the directive @noloops that can be used to specify that the target node of edges must not be the same as their source node; i.e., they must point to a target node that is different from the source node. Of course, such a no-loops constraint makes sense only for edges that may point to nodes of the same type as the source nodes.

Example 6: Assume we extend the previous example schema with the following object type.

type C {
  otherC: [C!] @distinct @noloops 
  b: [B]  @distinct
}

Now, we may have a “C” node that has one or more “otherC” edges to other “C” nodes, where each of these edges must point to a different “C” node (because of the @distinct directive) and none of them may point back to its source node (because of the @noloops directive). Similarly, a “C” node may have zero, one, or more “b” edges to distinct “B” nodes. In contrast, an “A” node, for instance, may have multiple “relatedA” edges that all point to the same “A” node, which may even be the source “A” node itself.

Notice that the directive @distinct is symmetric; that is, it represents a constraint not only for the source node of an edge but also for the target node. In other words, if a target node may have multiple incoming edges of the same type and the definition of this edge type (which appears in the object type definition for the source node) contains the @distinct directive, then all of these incoming edges must have different source nodes.
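A possible way to check the two directives is sketched below, assuming that edges are given as (source, label, target) triples; the function names are hypothetical. Because the @distinct check looks at (source, target) pairs, it is the same check regardless of whether it is read from the perspective of the source node or of the target node, which reflects the symmetry mentioned above.

```python
def distinct_violations(edges, label):
    """Return the (source, target) pairs that occur more than once for the given label."""
    seen = set()
    repeated = []
    for source, edge_label, target in edges:
        if edge_label != label:
            continue
        if (source, target) in seen:
            repeated.append((source, target))
        seen.add((source, target))
    return repeated


def loop_violations(edges, label):
    """Return the edges with the given label whose target is their own source."""
    return [(s, t) for s, edge_label, t in edges if edge_label == label and s == t]


edges = [
    ("c1", "otherC", "c2"),
    ("c1", "otherC", "c2"),  # second edge between the same pair: violates @distinct
    ("c2", "otherC", "c2"),  # loop: violates @noloops
    ("c1", "otherC", "c3"),
]
print(distinct_violations(edges, "otherC"))  # [('c1', 'c2')]
print(loop_violations(edges, "otherC"))      # [('c2', 'c2')]
```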

While the types of constraints introduced so far focus on the source node of edges, we may also want to express constraints from the perspective of the target nodes. On the one hand, we may want to require that for some type of edges, any possible target node can have at most one incoming edge of this type. We introduce the directive @uniqueForTarget to represent this constraint. On the other hand, we may also want to require that for some type of edges, any possible target node must have at least one incoming edge of this type. To capture this constraint we introduce the directive @requiredForTarget.

Example 7: Consider the following possible extension of the previous example schema.

type D {
  a: [A]  @uniqueForTarget
  b: [B]  @requiredForTarget
  c: [C]  @uniqueForTarget @requiredForTarget 
}

Based on this extension, we now have the following requirements: Every “A” node may have at most one incoming “a” edge (from a “D” node), but there may be “A” nodes that do not have any such incoming edge. Every “B” node must have at least one incoming “b” edge from a “D” node, and every “C” node must have exactly one incoming “c” edge.
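The target-side constraints of Example 7 can be sketched in the same style. The sketch assumes that edges are (source, label, target) triples and that the identifiers of all potential target nodes are known; it does not filter by the label of the source node, which is sufficient here because only the type “D” declares these fields. The name `in_edge_violations` is hypothetical.

```python
from collections import Counter


def in_edge_violations(target_ids, edges, label, unique=False, required=False):
    """Check @uniqueForTarget (unique) and @requiredForTarget (required) for one edge label."""
    incoming = Counter(target for _, edge_label, target in edges if edge_label == label)
    problems = []
    for node in target_ids:
        if unique and incoming[node] > 1:
            problems.append((node, "more than one incoming edge"))
        if required and incoming[node] == 0:
            problems.append((node, "no incoming edge"))
    return problems


edges = [("d1", "c", "c1"), ("d2", "c", "c1"), ("d1", "b", "b1")]

# "c" edges are declared with both directives in Example 7.
print(in_edge_violations(["c1", "c2"], edges, "c", unique=True, required=True))
# [('c1', 'more than one incoming edge'), ('c2', 'no incoming edge')]

# "b" edges are only required for their targets.
print(in_edge_violations(["b1"], edges, "b", required=True))
# []
```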

We emphasize that the types of constraints that we have introduced can be used to capture any combination of cardinality restrictions that is possible for binary relationships. The following list demonstrates all these combinations for an example relationship labeled “rel” from nodes of type “A” to nodes of type “B”.

  1. For a 1:1 relationship, the definition of the type “A” has to contain the following field definition.
     rel: B  @uniqueForTarget
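  2. For a 1:N relationship (every “A” node may be related to many “B” nodes, but every “B” node to at most one “A” node), the definition of the type “A” has to contain the following field definition.
     rel: [B]  @uniqueForTarget
  3. For an N:1 relationship (every “A” node may be related to at most one “B” node, but every “B” node to many “A” nodes), the definition of the type “A” has to contain the following field definition.
     rel: B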
  4. For an N:M relationship, the definition of the type “A” has to contain the following field definition.
     rel: [B]
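Each of the four field definitions in this list combines two independent choices: whether the type is a list type, and whether the directive @uniqueForTarget is present. The following sketch (with the hypothetical helper `degree_bounds`) derives from these two choices an upper bound on the number of “rel” edges per source node and per target node, where `None` stands for "unbounded".

```python
def degree_bounds(type_ref, directives=()):
    """Return (max edges per source node, max edges per target node)."""
    max_out = None if type_ref.startswith("[") else 1
    max_in = 1 if "uniqueForTarget" in directives else None
    return max_out, max_in


print(degree_bounds("B", ["uniqueForTarget"]))    # (1, 1)
print(degree_bounds("B"))                         # (1, None)
print(degree_bounds("[B]", ["uniqueForTarget"]))  # (None, 1)
print(degree_bounds("[B]"))                       # (None, None)
```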

1.4 Specifying Edges with Multiple Types of Target or Source Nodes based on Interface Types or Union Types

In addition to object types, the GraphQL SDL introduces the notion of interface types and union types. While our proposal does not use these notions to specify types that can be explicitly assigned to nodes in Property Graphs (that is what the object types are used for), we propose to use these notions to capture cases in which the possible target nodes of some types of edges may not all be of the same type.

Example 8: According to the following example schema, every “Person” node may have an outgoing “favoriteVehicle” edge that points either to a “Car” node or to a “Motorcycle” node.

type Person {
  id: ID!
  name: String!
  favoriteVehicle: Vehicle
}

union Vehicle = Car | Motorcycle 

type Car {
  brand: String!
  color: String
}

type Motorcycle {
  brand: String!
}

Example 9: Consider the following example schema. It captures exactly the same restrictions on Property Graphs as are captured by the schema in the previous example.

type Person {
  id: ID!
  name: String!
  favoriteVehicle: Vehicle
}

interface Vehicle {
  brand: String!
}
 
type Car implements Vehicle {
  brand: String!
  color: String
}

type Motorcycle implements Vehicle { 
  brand: String!
}

As the two examples illustrate, in the context of SDL-based schema definitions for Property Graphs, using interface types or union types are two different options that serve the exact same purpose. Our proposal allows for both options to give users more flexibility if they aim to use their schema also as a basis for developing a GraphQL API on top of their Property Graph.
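To make this equivalence concrete, the following sketch computes the set of node labels that a relationship of type Vehicle may point to, once for the union-based schema of Example 8 and once for the interface-based schema of Example 9. The data layout and the name `allowed_target_labels` are assumptions made for this illustration.

```python
def allowed_target_labels(type_name, object_types, unions):
    """object_types maps each object type to the interfaces it implements;
    unions maps each union type to its member types."""
    if type_name in object_types:
        return {type_name}
    if type_name in unions:
        return set(unions[type_name])
    # Otherwise treat the name as an interface type.
    return {name for name, interfaces in object_types.items() if type_name in interfaces}


# Example 8: Vehicle is a union type.
union_object_types = {"Person": set(), "Car": set(), "Motorcycle": set()}
union_types = {"Vehicle": {"Car", "Motorcycle"}}

# Example 9: Vehicle is an interface type.
interface_object_types = {"Person": set(), "Car": {"Vehicle"}, "Motorcycle": {"Vehicle"}}

from_union = allowed_target_labels("Vehicle", union_object_types, union_types)
from_interface = allowed_target_labels("Vehicle", interface_object_types, {})
print(from_union == from_interface)  # True
```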

While using union types or interface types as described above focuses on the possible target nodes of edges, we may also specify for some type of edges in a schema that the possible source nodes may not all be of the same type. To this end, the corresponding relationship definition simply needs to be repeated in every object type definition of all types of nodes that may have such outgoing edges.

Example 10: The following example schema allows both “Car” nodes and “Motorcycle” nodes to have an outgoing “owner” edge to some “Person” node.

type Car {
  brand: String!
  color: String
  owner: Person
}

type Motorcycle {
  brand: String!
  owner: Person
}

type Person {
  name: String!
}

1.5 Specifying Edge Properties using Field Argument Definitions

An important feature of Property Graphs that we have not covered so far is edge properties. To specify what properties an edge may have we use the definition of field arguments that can be provided for every field definition. That is, every field argument defined in a relationship field definition specifies that the corresponding edge may have a property (a key-value pair) whose key is the name of the field argument and whose value must be of the type mentioned in the definition of the field argument. Hence, field argument definitions that can be used in this way must have a scalar type, an enumeration type, or a list type that wraps a scalar or an enumeration type.

Example 11: We may modify the UserSession type in our initial example schema as follows.

type UserSession {
  id: ID!
  user(certainty:Float! comment:String): User! 
  startTime: Time
  endTime: Time
}

As a result of this modification, every “user” edge must have a “certainty” property with a floating point number as value. Additionally, such an edge may have an optional “comment” property with a string value.

As the previous example demonstrates, if the type in the field argument definition is marked as non-nullable, then the specified edge property is mandatory. If the type is a list type (wrapping a scalar type or an enumeration type), then the value of the specified edge property must be an array of values of the wrapped type.
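A check of the edge properties of Example 11 might look as follows. This is only a sketch: the mapping from SDL scalar types to Python types is our assumption, list types are not handled, and the name `edge_property_violations` is hypothetical.

```python
# Assumed mapping from the SDL scalar types used in Example 11 to Python types.
PYTHON_TYPES = {"Float": float, "String": str}

# Field arguments of the "user" relationship definition in Example 11.
USER_EDGE_ARGUMENTS = {"certainty": "Float!", "comment": "String"}


def edge_property_violations(arguments, properties):
    """Check the properties of one edge against the field argument definitions."""
    problems = []
    for name, type_ref in arguments.items():
        if name not in properties:
            if type_ref.endswith("!"):
                problems.append(f"missing mandatory edge property: {name}")
            continue
        expected = PYTHON_TYPES[type_ref.rstrip("!")]
        if not isinstance(properties[name], expected):
            problems.append(f"wrong value type for edge property: {name}")
    return problems


print(edge_property_violations(USER_EDGE_ARGUMENTS, {"certainty": 0.9}))
# []
print(edge_property_violations(USER_EDGE_ARGUMENTS, {"comment": "guess"}))
# ['missing mandatory edge property: certainty']
```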

1.6 Final Example

We complete the description of the proposed approach by presenting another example that covers many aspects of the proposal.

Example 12: The following figure of an example Property Graph is copied from the documentation of the Apache TinkerPop graph computing framework. The SDL code below the figure defines a schema for Property Graphs that the example graph conforms to.

type person {
  name: String!
  age: Int
  knows(weight:Float!): [person]  @distinct @noloops
  created(weight:Float!): [software]  @distinct @requiredForTarget 
}

type software {
  name: String!
  lang: Language
}

enum Language {
  java
  javascript
  python
}

1.7 Additional Remarks

While our proposal leverages and repurposes most of the features of the GraphQL SDL, some features cannot be meaningfully adapted when it comes to defining schemas for Property Graphs. Consequently, our proposed approach ignores these features. If a schema definition uses such a feature that we have not adapted for our approach, then this part of the schema definition should simply be ignored when checking whether a given Property Graph conforms to the schema definition.

For instance, when adopting the definition of field arguments as a means to specify what properties an edge may have, we deliberately consider only the field argument definitions that are based on a scalar type, on an enumeration type, or on a list type that wraps one of the former. Hence, we ignore field argument definitions in which the type of possible values is defined to be a complex input type. Such field argument definitions are not suitable to specify potential edge properties because the value of any edge property can only be a simple atomic value or a list of such values.

A related example are field arguments in attribute definitions. Recall that attribute definitions are the field definitions that, according to our proposal, can be used to specify what properties particular nodes may have. Since in the Property Graph model the key (or the value) of a node property cannot have additional arguments associated with it, field arguments in attribute definitions cannot be meaningfully used for our proposal. Hence, an SDL-based schema definition for Property Graphs should not contain attribute definitions that contain field arguments.

A last example of SDL features that are not meaningful for Property Graph schemas are the root operation types query, mutation, and subscription. These types specify the objects that have to be used as entry points in requests to a GraphQL API. For instance, the query type defines the object from which all queries have to start when retrieving data from a GraphQL API. Since the purpose of our proposal is to use the SDL to define schemas for Property Graphs (and not to define schemas for GraphQL APIs or to define operations for accessing Property Graphs), root operation types are not needed in our context. Notice, however, that by omitting root operation types, the SDL-based Property Graph schemas created based on our proposal are not complete GraphQL schemas (as used for GraphQL APIs) because at least the query type is mandatory in such GraphQL API schemas.

Nonetheless, even if it is not the primary purpose of the schemas defined based on our proposed approach, it seems like a natural next step to also use them as a basis for developing GraphQL APIs to access Property Graphs. To this end, they may be extended into actual GraphQL schemas as required for creating GraphQL APIs. From a technical perspective, the only thing that needs to be added in this case is the query type, and perhaps also the mutation type for providing write access. However, for practical purposes, further elements will have to be added when extending an SDL-based Property Graph schema into a GraphQL API schema.

In particular, it may be useful for GraphQL APIs over Property Graphs to support queries with which the directed edges in the graph can also be traversed in their opposite directions. Such a bidirectional traversal is not possible with a schema defined based on our approach. The reason for this limitation is that our SDL-based Property Graph schemas specify potential edges in the object types for the nodes for which the edges are outgoing. Hence, the object types in the schema that specify the potential target nodes do not contain any mention of the incoming edges. We emphasize that specifying every type of edges only once is sufficient for the purpose of defining a Property Graph schema, but it is not sufficient for supporting bidirectional traversal in GraphQL queries. We also emphasize that in query languages that are explicitly designed for Property Graphs (such as Gremlin and Cypher) it is a native feature that edges can be traversed both ways. In contrast, to enable bidirectional traversal in GraphQL queries, the schema of the GraphQL API has to explicitly mention potential edges twice: once from the perspective of the source nodes and once from the perspective of the target nodes. Although we believe that it is not difficult to address this limitation when extending an SDL-based Property Graph schema into a GraphQL API schema, we are planning to complement our proposed approach with guidelines on how to develop such an extension in a principled manner.

2. Detailed Definition

This section provides a detailed definition of what it means for a Property Graph to conform to a given schema that is defined in the GraphQL SDL. We define conformance on a per-node basis: a given Property Graph conforms to a given GraphQL schema iff all of its nodes conform to the schema. Hence, in the rest of this section we only need to define the per-node notion of conformance.

To this end, we first need to introduce a few auxiliary notions: First, from now on we simply use the word schema to refer to a schema that is defined in the GraphQL SDL. Second, for such a schema S, we define the notion of a simple type in S as follows: i) every built-in scalar type (i.e., Int, Float, String, Boolean, ID) is a simple type in S, ii) every scalar type for which the schema S contains a scalar type definition is a simple type in S, and iii) every enumeration type for which the schema S contains an enum type definition is a simple type in S.

Now we are ready for the main definition: Let n be a node in a Property Graph. Node n conforms to a schema S if each of the following six conditions holds.

Condition 1 (node label)

The label of the node n is set, and the schema S contains an object type definition whose Name element is equivalent to the label of the node n. In the following conditions we refer to this object type definition as ODT (note that “all types within a GraphQL schema must have unique names”).
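Condition 1, together with the graph-level definition given at the beginning of this section, can be sketched as follows. In this illustration a node is a Python dictionary with an optional "label" entry, and Condition 1 merely stands in for the full per-node check; all names are hypothetical.

```python
# Names of the object types defined by the schema in Example 1.
OBJECT_TYPE_NAMES = {"User", "UserSession"}


def satisfies_condition_1(node):
    """The label must be set and must be the name of an object type in the schema."""
    label = node.get("label")
    return label is not None and label in OBJECT_TYPE_NAMES


def graph_conforms(nodes, node_conforms):
    """A graph conforms iff every one of its nodes conforms."""
    return all(node_conforms(node) for node in nodes)


graph = [{"label": "User"}, {"label": "UserSession"}]
print(graph_conforms(graph, satisfies_condition_1))                         # True
print(graph_conforms(graph + [{"label": "Admin"}], satisfies_condition_1))  # False
```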

Condition 2 (existing properties of the node)

For every property (key-value pair) associated with the node n, there exists a field definition in ODT such that each of the following points is true:

  1. The Name element of this field definition is equivalent to the key of the property.
  2. The Type element of this field definition is one of the following:
    – the name of a simple type in S,
    – a non-null type that wraps a simple type in S, or
    – a list type that wraps any of the previous two.
  3. If the Type element of this field definition is the name of a simple type in S, then the value of the property is a value of this simple type.
  4. If the Type element of this field definition is a non-null type that wraps a simple type in S, then the value of the property is a value of this simple type.
  5. If the Type element of this field definition is a list type that wraps a simple type in S, then the value of the property is an array (potentially empty) of values of this simple type.
  6. If the Type element of this field definition is a list type that wraps a non-null type that, in turn, wraps a simple type in S, then the value of the property is a nonempty array of values of this simple type.

Notice that this condition deliberately ignores the arguments definition (if any) of such field definitions.
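Points 3 to 6 of this condition can be sketched as a single function. The sketch is restricted to the built-in scalar types (custom scalar types and enumeration types are omitted), and the mapping from these types to Python types, in particular mapping ID to `str`, is our assumption. Only the type forms covered by the points above are handled.

```python
# Assumed mapping from built-in SDL scalar types to Python types.
PYTHON_TYPES = {"Int": int, "Float": float, "String": str, "Boolean": bool, "ID": str}


def value_conforms(type_ref, value):
    """Check a property value against an SDL type such as 'String!' or '[Int!]'."""
    if type_ref.startswith("["):
        inner = type_ref[1:-1]  # e.g. '[Int!]' -> 'Int!'
        if not isinstance(value, list):
            return False
        if inner.endswith("!") and not value:
            return False  # point 6: the array must be nonempty
        expected = PYTHON_TYPES[inner.rstrip("!")]
        return all(isinstance(v, expected) for v in value)
    # Points 3 and 4: a single value of the (possibly non-null) simple type.
    return isinstance(value, PYTHON_TYPES[type_ref.rstrip("!")])


print(value_conforms("String!", "alice"))  # True
print(value_conforms("[Int]", []))         # True
print(value_conforms("[Int!]", []))        # False
```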

Condition 3 (existing out-edges of the node)

For every outgoing edge e of the node n, there exists a field definition in ODT such that each of the following points is true:

  1. The Name element of this field definition is equivalent to the label of the edge e.
  2. The Type element of this field definition is one of the following:
    – the name of an object type for which the schema S contains an object type definition,
    – the name of an interface type for which the schema S contains an interface type definition,
    – the name of a union type for which the schema S contains a union type definition,
    – a non-null type that wraps any of the previous three, or
    – a list type that wraps any of the previous four.
  3. If the Type element of this field definition is the name of an object type, then the target node of the edge e must have this name as its label.
  4. If the Type element of this field definition is the name of an interface type, then the schema S must contain an object type definition whose ImplementsInterfaces element mentions this name of the interface type and whose Name element is equivalent to the label of the target node of the edge e.
  5. If the Type element of this field definition is the name of a union type, then the UnionMemberTypes element in the union type definition of this union type must contain a name that is equivalent to the label of the target node of the edge e.
  6. If the Type element of this field definition is a wrapping type (i.e., a non-null type or a list type) and the underlying named type is an object type, then the target node of the edge e must have the name of this object type as its label.
  7. If the Type element of this field definition is a wrapping type and the underlying named type is an interface type, then the schema S must contain an object type definition whose ImplementsInterfaces element mentions the name of this interface type and whose Name element is equivalent to the label of the target node of the edge e.
  8. If the Type element of this field definition is a wrapping type and the underlying named type is a union type, then the UnionMemberTypes element in the union type definition of this union type must contain a name that is equivalent to the label of the target node of the edge e.
  9. If the Type element of this field definition is not a list type, then node n must not have another outgoing edge that has the same label as edge e.
  10. If this field definition contains a Directives element that contains the directive @distinct, then node n must not have another outgoing edge that has both i) the same label as edge e and ii) the same target node as edge e.
  11. If this field definition contains a Directives element that contains the directive @noloops, then the target node of the edge e must not be node n.
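The following sketch illustrates points 1 to 9 of this condition for the schema of Example 8; the checks for @distinct and @noloops (points 10 and 11) are left out for brevity. The outgoing edges of a node are given as (edge label, target label) pairs, and all names in the code are hypothetical.

```python
# Relationship definitions of the schema in Example 8 (attribute definitions omitted).
RELATIONSHIP_FIELDS = {
    "Person": {"favoriteVehicle": "Vehicle"},
    "Car": {},
    "Motorcycle": {},
}
UNIONS = {"Vehicle": {"Car", "Motorcycle"}}
IMPLEMENTS = {}  # object type name -> set of interface names (none in Example 8)


def target_labels(type_ref):
    """Labels that a target node may have for a field of the given type (points 3 to 8)."""
    name = type_ref.strip("[]!")  # the underlying named type
    if name in RELATIONSHIP_FIELDS:
        return {name}
    if name in UNIONS:
        return set(UNIONS[name])
    return {t for t, interfaces in IMPLEMENTS.items() if name in interfaces}


def out_edge_violations(node_label, out_edges):
    """Check the outgoing edges of one node; each edge yields at most one message."""
    problems = []
    edge_labels = [label for label, _ in out_edges]
    for edge_label, target_label in out_edges:
        type_ref = RELATIONSHIP_FIELDS[node_label].get(edge_label)
        if type_ref is None:
            problems.append(f"no field definition for '{edge_label}'")  # point 1
        elif target_label not in target_labels(type_ref):
            problems.append(f"'{edge_label}' must not point to '{target_label}'")
        elif not type_ref.startswith("[") and edge_labels.count(edge_label) > 1:
            problems.append(f"more than one '{edge_label}' edge")  # point 9
    return problems


print(out_edge_violations("Person", [("favoriteVehicle", "Car")]))     # []
print(out_edge_violations("Person", [("favoriteVehicle", "Person")]))
# ["'favoriteVehicle' must not point to 'Person'"]
```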

Condition 4 (existing in-edges of the node)

For every incoming edge e of the node n, there exists an object type definition in schema S such that i) the Name element of this object type definition is equivalent to the label of the source node of edge e and ii) this object type definition contains a field definition for which each of the following points is true:

  1. The Name element of this field definition is equivalent to the label of the edge e.
  2. The Type element of this field definition is one of the following:
    – the name of the object type whose name is equivalent to the label of node n,
    – the name of an interface type that is mentioned in the ImplementsInterfaces element of the object type definition ODT,
    – the name of a union type for which the schema S contains a union type definition whose UnionMemberTypes element contains a name that is equivalent to the label of node n,
    – a non-null type that wraps any of the previous three, or
    – a list type that wraps any of the previous four.
  3. If this field definition contains a Directives element that contains the directive @uniqueForTarget, then node n must not have any other incoming edge that has both i) the same label as edge e and ii) a source node whose label is the same as the label of the source node of edge e.
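Point 3 of this condition compares not only the labels of the incoming edges but also the labels of their source nodes. The sketch below illustrates this reading; it assumes that the incoming edges of a single node n are given as (source id, source label, edge label) triples and that the list contains only edges whose field definition carries @uniqueForTarget. The source type “E” in the usage example is hypothetical and does not appear in the example schemas of this document.

```python
def unique_for_target_violations(in_edges):
    """Return the (edge label, source label) combinations that occur more than once."""
    seen = set()
    problems = []
    for _source_id, source_label, edge_label in in_edges:
        key = (edge_label, source_label)
        if key in seen:
            problems.append(key)
        seen.add(key)
    return problems


# Two "a" edges from sources with different labels do not conflict ...
print(unique_for_target_violations([("d1", "D", "a"), ("e1", "E", "a")]))  # []
# ... but two "a" edges from "D" nodes do.
print(unique_for_target_violations([("d1", "D", "a"), ("d2", "D", "a")]))  # [('a', 'D')]
```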

Condition 5 (mandatory node properties and mandatory out-edges)

For every field definition contained in ODT, each of the following points is true:

  1. If the Type element of this field definition is a non-null type that wraps a simple type in S, then the node n is associated with a property whose key is equivalent to the Name element of the field definition.
  2. If the Type element of this field definition is a list type that wraps a non-null type that, in turn, wraps a simple type in S, then the node n is associated with a property whose key is equivalent to the Name element of the field definition.
  3. If the Type element of this field definition is a non-null type that wraps either an object type, an interface type, or a union type, then the node n has an outgoing edge whose label is equivalent to the Name element of the field definition.
  4. If the Type element of this field definition is a list type that wraps a non-null type that, in turn, wraps either an object type, an interface type, or a union type, then the node n has an outgoing edge whose label is equivalent to the Name element of the field definition.

Notice that this condition checks only the existence of mandatory node properties and mandatory out-edges, but not whether they violate the schema in some other way. The latter is checked based on Conditions 2 and 3.

Condition 6 (mandatory in-edges)

For every field definition FD’ in every object type definition ODT’ in schema S, if i) the field definition FD’ contains a Directives element that contains the directive @requiredForTarget and ii) the Type element of the field definition FD’ is any of the following:
    – the name of the object type whose name is equivalent to the label of node n,
    – the name of an interface type that is mentioned in the ImplementsInterfaces element of the object type definition ODT,
    – the name of a union type for which the schema S contains a union type definition whose UnionMemberTypes element contains a name that is equivalent to the label of node n,
    – a non-null type that wraps any of the previous three, or
    – a list type that wraps any of the previous four,
then the node n has an incoming edge whose label is equivalent to the Name element of the field definition FD’ and whose source node is labeled with the Name element of the object type definition ODT’.

Notice that this condition checks only the existence of mandatory in-edges, but not whether they violate the schema in some other way. The latter is checked based on Condition 4.

3. Suggestions for Defining a GraphQL API based on the Approach

TODO
