Abstract: This document specifies SPARQL* Update, an update language for RDF* graphs. This language extends SPARQL Update, the update language for RDF graphs, by adding RDF*-specific features and semantics.
We assume that the reader is familiar with SPARQL Update as well as with the RDF*/SPARQL* approach to represent and query statement-level metadata in RDF.
Acknowledgements: I would like to thank the following people for their valuable feedback on earlier versions of this document: Brian Chu (Cambridge Semantics), Erin Martin (Cambridge Semantics), Michael Schmidt (Amazon), Bryan Thompson (Amazon), Jem Rayfield (OntoText), and Olivier Corby (Inria). My work on this document was funded by the CENIIT program at Linköping University (project no. 17.05).
1. Informal Description
While SPARQL Update operates over a graph store that consists of pure RDF graphs, SPARQL* Update extends the notion of graph store to contain RDF* graphs instead of RDF graphs. That is, a graph store in the context of SPARQL* Update contains one (unnamed) slot holding an RDF* graph, referred to as the default graph, and zero or more named slots holding other RDF* graphs, referred to as named graphs. All graph management operations in SPARQL Update (CREATE, DROP, COPY, MOVE, ADD) carry over directly to SPARQL* Update, the only difference being that in SPARQL* Update these operations manage RDF* graphs. For instance, the CREATE operation in SPARQL* Update creates an RDF* graph rather than a pure RDF graph. Similarly, the graph update operations LOAD and CLEAR in SPARQL* Update operate on RDF* graphs.
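The structure of such a graph store can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tuple encoding of RDF* triples, the class name GraphStore, and its methods are hypothetical, and only CREATE and DROP on named graphs are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a term is an IRI or literal (str/int) or a nested
# triple, so an RDF* triple is a 3-tuple whose subject or object may be a tuple.
Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]
Graph = Set[Triple]


@dataclass
class GraphStore:
    """One unnamed default graph plus zero or more named RDF* graphs."""
    default: Graph = field(default_factory=set)
    named: Dict[str, Graph] = field(default_factory=dict)

    def create(self, iri: str) -> None:
        # CREATE: add a new, empty RDF* graph under a fresh name.
        if iri in self.named:
            raise ValueError(f"graph {iri} already exists")
        self.named[iri] = set()

    def drop(self, iri: str) -> None:
        # DROP: remove the named graph together with all of its triples
        # (raises KeyError if no graph with this name exists).
        del self.named[iri]
```

After store.create(":g1"), the graph store.named[":g1"] can hold a nested triple such as ((":bob", "foaf:age", 23), "dct:creator", ":crawler1") exactly as a pure RDF graph would hold a plain triple; nothing else about the store changes.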
The only operations that SPARQL* Update actually extends with additional functionality are the graph update operations INSERT DATA, DELETE DATA, and DELETE/INSERT. This section describes the effect of the extended operations informally. While this description focuses only on updates to the default graph, the operations can also be applied to the named graphs.
INSERT DATA
The INSERT DATA operation can be used to add a given set of triples into the graph store. In the context of SPARQL* Update, these triples may be nested RDF* triples. By the semantics of SPARQL* queries, it is then possible to query both for the nested triples and for the triples inside them.
Example 1. Suppose the following INSERT DATA operation is executed over an empty default graph (prefix declarations omitted).
INSERT DATA { << :bob foaf:age 23 >> dct:creator :crawler1 . }
After executing this operation, the default graph contains the given nested triple. Now, it is possible to query for this nested triple. For instance, the following SPARQL* query (prefix declarations omitted) returns a single solution mapping in which the variables ?a and ?c are mapped to the literal 23 and the IRI :crawler1, respectively.
SELECT ?a ?c WHERE { << :bob foaf:age ?a >> dct:creator ?c . }
Example 2. It is also possible to query only for the base triple that is inside the inserted nested triple. For instance, assuming the default graph with the nested triple as inserted in the previous example, the following SPARQL* query returns a single solution mapping in which the variable ?a is mapped to the literal 23.
SELECT ?a WHERE { :bob foaf:age ?a . }
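To make Examples 1 and 2 concrete, the following Python sketch evaluates a single (possibly nested) triple pattern. It assumes a hypothetical encoding (IRIs and literals as Python strings and ints, variables as strings starting with "?", a triple as a 3-tuple whose subject or object may itself be a triple) and, as a simplification of the formal SPARQL* semantics, that a query sees the stored triples together with every triple nested inside them. The function names are illustrative only.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple, Union

Term = Union[str, int, tuple]      # IRI/literal, or a nested triple (s, p, o)
Triple = Tuple[Term, Term, Term]
Mapping = Dict[str, Term]          # solution mapping: variable -> term


def contained(t: Triple) -> Iterator[Triple]:
    """Yield every triple nested (recursively) in the subject or object of t."""
    for part in (t[0], t[2]):
        if isinstance(part, tuple):
            yield part
            yield from contained(part)


def visible(graph: Set[Triple]) -> Set[Triple]:
    """Assumed query view: the stored triples plus every triple nested in them."""
    result = set(graph)
    for t in graph:
        result.update(contained(t))
    return result


def is_var(x: Term) -> bool:
    return isinstance(x, str) and x.startswith("?")


def match(pattern: Term, term: Term, mu: Mapping) -> Optional[Mapping]:
    """Return mu extended so that pattern matches term, or None if impossible."""
    if is_var(pattern):
        if pattern in mu:
            return mu if mu[pattern] == term else None
        return {**mu, pattern: term}
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple):
            return None
        for p, t in zip(pattern, term):
            mu = match(p, t, mu)
            if mu is None:
                return None
        return mu
    return mu if pattern == term else None


def select(graph: Set[Triple], pattern: Triple) -> List[Mapping]:
    """Solution mappings for a single (possibly nested) triple pattern."""
    solutions = []
    for t in visible(graph):
        mu = match(pattern, t, {})
        if mu is not None:
            solutions.append(mu)
    return solutions
```

With the graph of Example 1, the nested pattern yields one mapping with ?a bound to 23 and ?c bound to :crawler1, and the base pattern of Example 2 yields one mapping with ?a bound to 23. Because visible() returns a set, adding the base triple explicitly (as in the next example) does not change either result.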
Inserting a nested triple into an RDF* graph that already contains the base triple inside that nested triple must not change the results of queries for that base triple.
Example 3. Assume the default graph contains the following triple.
:bob foaf:age 23 .
When executing the query of Example 2 over this default graph, we obtain the same result as in Example 2 (namely, a single solution mapping in which ?a is mapped to 23). Now, if we execute the INSERT DATA operation of Example 1 over this default graph, the default graph contains two triples: the aforementioned triple that was in the graph initially and the new nested triple that was inserted by the INSERT DATA operation. Nonetheless, executing the query of Example 2 again still yields the same result as before.
The example illustrates that inserting a nested triple into an RDF* graph may cause some form of redundancy. The semantics of SPARQL* queries is defined to ignore such redundancy, as illustrated by the example and as discussed more formally in Section 2.4 of the RDF*/SPARQL* research paper. In fact, an implementation of an RDF* data store may choose to remove redundancies upon insert, or to add redundancies explicitly, as long as this does not affect the correctness of query results.
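One way a store might remove such redundancy is to drop every stored triple that already occurs nested inside another stored triple. The Python sketch below assumes the same hypothetical tuple encoding and the same simplified query view (stored triples plus all nested ones) as a stand-in for the formal semantics; drop_redundant is an illustrative name, not part of any specification.

```python
from typing import Iterator, Set, Tuple, Union

Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]


def contained(t: Triple) -> Iterator[Triple]:
    """Yield every triple nested (recursively) in the subject or object of t."""
    for part in (t[0], t[2]):
        if isinstance(part, tuple):
            yield part
            yield from contained(part)


def visible(graph: Set[Triple]) -> Set[Triple]:
    """Assumed query view: stored triples plus every triple nested in them."""
    result = set(graph)
    for t in graph:
        result.update(contained(t))
    return result


def drop_redundant(graph: Set[Triple]) -> Set[Triple]:
    """Remove stored triples that are already nested inside another stored triple."""
    implied: Set[Triple] = set()
    for t in graph:
        implied.update(contained(t))
    return graph - implied
```

For the two-triple graph of Example 3, drop_redundant keeps only the nested triple, yet visible() returns the same set before and after, which is the property the paragraph above requires of any such optimization.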
DELETE DATA
The DELETE DATA operation can be used to remove a given set of triples from the graph store. Since, in the case of SPARQL* Update, the graph store may contain nested triples, removing a triple also requires removing all nested triples that contain the triple to be removed (where containment may be recursive).
Example 4. Consider the following DELETE DATA operation (prefix declarations omitted).
DELETE DATA { :bob foaf:age 23 . }
If the default graph consists of the following two triples (where the second one is redundant), then executing this DELETE DATA operation will remove both of these triples (i.e., the default graph becomes empty).
<< :bob foaf:age 23 >> dct:creator :crawler1 .
:bob foaf:age 23 .
While deleting a base triple may result in the deletion of additional nested triples as illustrated in the previous example, it is also possible to explicitly delete nested triples. In this case, however, the triples that are inside the deleted nested triples must not be deleted from the graph store (unless explicitly specified in the DELETE DATA operation).
Example 5. Consider the following DELETE DATA operation (prefix declarations omitted).
DELETE DATA { << :bob foaf:age 23 >> dct:creator :crawler1 . }
If the default graph consists of the same two triples as in the previous example, then executing this DELETE DATA operation will result in the following default graph.
:bob foaf:age 23 .
Example 6. In contrast to the previous example, assume the default graph contains only the nested triple as follows.
<< :bob foaf:age 23 >> dct:creator :crawler1 .
Then, executing the DELETE DATA operation as in the previous example will also result in the following default graph (i.e., the base triple has to remain in the graph even if its existence in the graph before the deletion was only implicit through the nested triple that contained it).
:bob foaf:age 23 .
Example 7. Consider again the initial default graph in the previous example. The following DELETE DATA operation will leave this graph empty, and the same holds for the initial default graph in Examples 4 and 5.
DELETE DATA { :bob foaf:age 23 . << :bob foaf:age 23 >> dct:creator :crawler1 . }
DELETE/INSERT
The DELETE/INSERT operation can be used to remove or add triples based on variable bindings obtained by evaluating a given WHERE clause.
Example 8. The following DELETE/INSERT operation (prefix declarations omitted) replaces all nested triples in which the dct:creator is :crawler1 by nested triples in which the dct:creator is :newCrawler2.
DELETE { << ?s ?p ?o >> dct:creator :crawler1 . }
INSERT { << ?s ?p ?o >> dct:creator :newCrawler2 . }
WHERE { << ?s ?p ?o >> dct:creator :crawler1 . }
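The core of DELETE/INSERT is instantiating the two templates with every solution mapping of the WHERE clause. The Python sketch below shows this for a one-pattern WHERE clause, assuming the hypothetical tuple encoding (variables as strings starting with "?") and the simplified view that nested triples are matchable too. It only computes the two triple sets; actually removing the deleted triples must follow the DELETE DATA semantics described in this section (formally, the minus* operator of Section 2), not plain set difference.

```python
from typing import Dict, Iterator, Optional, Set, Tuple, Union

Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]
Mapping = Dict[str, Term]


def contained(t: Triple) -> Iterator[Triple]:
    for part in (t[0], t[2]):
        if isinstance(part, tuple):
            yield part
            yield from contained(part)


def is_var(x: Term) -> bool:
    return isinstance(x, str) and x.startswith("?")


def match(pattern: Term, term: Term, mu: Mapping) -> Optional[Mapping]:
    """Return mu extended so that pattern matches term, or None if impossible."""
    if is_var(pattern):
        if pattern in mu:
            return mu if mu[pattern] == term else None
        return {**mu, pattern: term}
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple):
            return None
        for p, t in zip(pattern, term):
            mu = match(p, t, mu)
            if mu is None:
                return None
        return mu
    return mu if pattern == term else None


def substitute(template: Term, mu: Mapping) -> Term:
    """Instantiate a (possibly nested) template; assumes all its variables are bound."""
    if is_var(template):
        return mu[template]
    if isinstance(template, tuple):
        return tuple(substitute(x, mu) for x in template)
    return template


def delete_insert(graph: Set[Triple], delete_tpl: Triple, insert_tpl: Triple,
                  where: Triple) -> Tuple[Set[Triple], Set[Triple]]:
    """Compute the triples to delete and to insert for a one-pattern WHERE clause."""
    candidates = set(graph)
    for t in graph:                      # nested triples are matchable too
        candidates.update(contained(t))
    solutions = []
    for t in candidates:
        mu = match(where, t, {})
        if mu is not None:
            solutions.append(mu)
    return ({substitute(delete_tpl, mu) for mu in solutions},
            {substitute(insert_tpl, mu) for mu in solutions})
```

For Example 8, passing the pattern << ?s ?p ?o >> dct:creator :crawler1 as both the DELETE template and the WHERE pattern selects every nested triple created by :crawler1, and the INSERT template produces the same base triples annotated with :newCrawler2.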
As specified for SPARQL Update, in SPARQL* Update it is possible to use variations of the DELETE/INSERT operation in which either the DELETE clause or the INSERT clause is omitted.
When removing triples via the DELETE clause (irrespective of whether the INSERT clause is omitted or not), the effects with respect to nested triples must be the same as described above for the DELETE DATA operation. For instance, deleting base triples must result in the additional deletion of all nested triples that contain the deleted base triples (see Example 4 above).
2. Detailed Definition
This section defines SPARQL* Update by extending the formalization of SPARQL Update that is given in the W3C Recommendation for SPARQL 1.1 Update.
Syntax
The grammar of SPARQL* Update is provided as part of the grammar of the SPARQL* query language, which is defined by the SPARQL 1.1 Query grammar with the extensions specified in Section 5.1 of the RDF*/SPARQL* document. As a result of these extensions, the production rules QuadData and QuadPattern, which are used in the definition of SPARQL Update, are extended to capture nested triples and nested triple patterns as demonstrated in the examples above.
Semantics
The semantics of SPARQL* Update operations can also be defined by a simple extension of the formalization of SPARQL Update. This extension assumes that any mention of “RDF triple” or “triple” in the recommendation document is understood as an RDF* triple; similarly, “RDF graph” and “solution mapping” are understood as RDF* graph and solution* mapping, respectively. The mention of the “evaluation function eval()” is understood as the function eval() defined in Section 5.3 of the RDF*/SPARQL* document.
Then, the only part of the SPARQL Update formalization that actually needs to be adapted is the definition of the auxiliary function Dataset-DIFF. To this end, we first need to define formally what it means for an RDF* triple to contain another RDF* triple.
Definition: Containment of Triples
Let t and t’ be RDF* triples. Then, containment of t’ in t is defined recursively as follows:
- If t’ is the subject or the object of t, then we say that t contains t’.
- If t contains an RDF* triple t” such that t” contains t’, then we say that t contains t’.
Example 9. Consider the following (double) nested triple.
<< <<:bob foaf:age 23>> dct:creator :crawler1 >> ex:metameta 123 .
This triple contains two triples, namely, the nested triple ((:bob, foaf:age, 23), dct:creator, :crawler1) and the regular (non-nested) triple (:bob, foaf:age, 23).
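The recursive definition of containment translates directly into code. The Python sketch below assumes a hypothetical encoding in which an RDF* triple is a 3-tuple whose subject or object may itself be such a tuple; contains and all_contained are illustrative names.

```python
from typing import Set, Tuple, Union

Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]


def contains(t: Triple, t2: Triple) -> bool:
    """True iff t contains t2 according to the recursive definition."""
    for part in (t[0], t[2]):            # only subject and object positions
        # Rule 1: t2 is the subject/object of t.
        # Rule 2: t2 is contained in a triple that t contains.
        if isinstance(part, tuple) and (part == t2 or contains(part, t2)):
            return True
    return False


def all_contained(t: Triple) -> Set[Triple]:
    """Collect every triple that t contains."""
    result: Set[Triple] = set()
    for part in (t[0], t[2]):
        if isinstance(part, tuple):
            result.add(part)
            result |= all_contained(part)
    return result
```

Applied to the double-nested triple of Example 9, all_contained returns exactly the two triples listed above; a triple never contains itself under this definition.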
In addition to the containment of RDF* triples, we need to introduce a specific notion of difference between two RDF* graphs.
Definition: minus* operator
Let G and G’ be two RDF* graphs. Then, G minus* G’ is defined as
G minus* G’ = ( G minus del(G, G’) ) union add1(G, G’) union add2(G, G’)
with
del(G, G’) = { t | t in G and t in G’ } union { t | t in G and there is a triple t’ in G’ such that t contains t’ },
add1(G, G’) = { t | there is a triple t’ in del(G, G’) such that t is the subject of t’ and t not in del(G, G’) },
add2(G, G’) = { t | there is a triple t’ in del(G, G’) such that t is the object of t’ and t not in del(G, G’) },
and G minus del(G, G’) is defined as set-difference over the sets of triples in the two graphs, and the union operator denotes the set-union.
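A direct Python transcription of minus* can serve as a sanity check of the definition. It assumes the hypothetical tuple encoding (a triple is a 3-tuple whose subject or object may itself be a 3-tuple) and reads G and G’ as the sets of explicitly stored triples; minus_star is an illustrative name.

```python
from typing import Set, Tuple, Union

Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]
Graph = Set[Triple]


def contains(t: Triple, t2: Triple) -> bool:
    """True iff t contains t2 (recursive containment of triples)."""
    for part in (t[0], t[2]):
        if isinstance(part, tuple) and (part == t2 or contains(part, t2)):
            return True
    return False


def minus_star(g: Graph, g2: Graph) -> Graph:
    """G minus* G' = (G minus del(G, G')) union add1(G, G') union add2(G, G')."""
    # del(G, G'): triples of G that are in G' or contain a triple of G'.
    deleted = {t for t in g if t in g2 or any(contains(t, t2) for t2 in g2)}
    # add1 / add2: triples that are the subject / object of a deleted triple
    # and are not themselves deleted.
    add1 = {t[0] for t in deleted if isinstance(t[0], tuple)} - deleted
    add2 = {t[2] for t in deleted if isinstance(t[2], tuple)} - deleted
    return (g - deleted) | add1 | add2
```

With base = (":bob", "foaf:age", 23) and nested = (base, "dct:creator", ":crawler1"), this sketch reproduces Examples 4, 5, and 6: deleting base from {nested, base} leaves the empty graph, while deleting nested from either {nested, base} or {nested} leaves {base}.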
Now we are ready to define the version of the Dataset-DIFF function that is adapted for SPARQL* Update. It differs from the original definition only in that the minus* operator is used in place of plain set difference between graphs.
Definition: Dataset-DIFF
Let DS = {DG} union {(iri_i, G_i) | 1 ≤ i ≤ n} and DS’ = {DG’} union {(iri’_j, G’_j) | 1 ≤ j ≤ m} be two RDF Datasets.
Let further graphNames(DS) = {iri_i | 1 ≤ i ≤ n} and graphNames(DS’) = {iri’_j | 1 ≤ j ≤ m}.
The Dataset-DIFF between DS and DS’ is defined as follows:
Dataset-DIFF(DS, DS’) = {DG minus* DG’} union { (iri, G) | iri in graphNames(DS) }
where G is defined as
- G_i for iri = iri_i such that iri_i in graphNames(DS) minus graphNames(DS’)
- G_i minus* G’_j for iri = iri_i = iri’_j in graphNames(DS) intersect graphNames(DS’).
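The adapted Dataset-DIFF can be sketched in Python as well. This assumes a hypothetical representation of an RDF dataset as a pair (default graph, dict from graph name to graph) and repeats the minus* transcription so the block is self-contained; dataset_diff is an illustrative name.

```python
from typing import Dict, Set, Tuple, Union

Term = Union[str, int, tuple]
Triple = Tuple[Term, Term, Term]
Graph = Set[Triple]
Dataset = Tuple[Graph, Dict[str, Graph]]   # (default graph, named graphs)


def contains(t: Triple, t2: Triple) -> bool:
    for part in (t[0], t[2]):
        if isinstance(part, tuple) and (part == t2 or contains(part, t2)):
            return True
    return False


def minus_star(g: Graph, g2: Graph) -> Graph:
    deleted = {t for t in g if t in g2 or any(contains(t, t2) for t2 in g2)}
    add1 = {t[0] for t in deleted if isinstance(t[0], tuple)} - deleted
    add2 = {t[2] for t in deleted if isinstance(t[2], tuple)} - deleted
    return (g - deleted) | add1 | add2


def dataset_diff(ds: Dataset, ds2: Dataset) -> Dataset:
    """Dataset-DIFF(DS, DS') with minus* in place of plain set difference."""
    dg, named = ds
    dg2, named2 = ds2
    result_named: Dict[str, Graph] = {}
    for iri, g in named.items():              # only names from graphNames(DS)
        if iri in named2:                     # name in both datasets
            result_named[iri] = minus_star(g, named2[iri])
        else:                                 # name only in DS: keep G_i as is
            result_named[iri] = g
    return minus_star(dg, dg2), result_named
```

Named graphs that occur only in DS’ are ignored, exactly as in the definition, because the result ranges only over graphNames(DS).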