Documentation

Background

The extraction of RDF statements from XML data can be a complicated process involving a lot of XSLT magic. Transformations are always specific to the XML data in question and cannot easily be applied to other XML repositories. It gets even more difficult if the markup of the XML data in question cannot be influenced.

However, the underlying principle of creating RDF statments out of XML data is simple.

  • If we take a XML resource or a value in it as the thing we want to talk about, this is our subject.
  • We want to join this subject to "something else" using a predicate from a controlled vocabulary.
  • This "something else" is our object, consisting either of the XML resource itself, some value in it or some other resource on the semantic web.

The XTriples webservice makes it possible to create such statements out of any XML data based on a simple XML configuration. The configuration defines the XML resources to crawl, the vocabularies to use and a number of XPATH/XQuery based statement patterns to apply on the data.

The webservice accepts POST or GET request to http://xtriples.spatialhumanities.de/extract.xql.

Requests

POST requests

You can submit POST requests to http://xtriples.spatialhumanities.de/extract.xql

The request body should contain your XTriples configuration. Additionally, you need to send the Content-Type HTTP header with a value of application/xml and the format HTTP header with one of the following values:

Reference
value result
rdf returns extraction result as RDF
turtle returns extraction result in Turtle notation
ntriples returns extraction result as N-Triples
nquads returns extraction result as N-Quads
trix returns extraction result as TriX named graph
json returns extraction result as JSON-LD
svg returns extraction result as SVG Graph
xtriples returns extraction result as XTriples XML for extractging purposes

If you send no format header, the format defaults to rdf.

GET requests

The most compact way to use the webservice is via HTTP GET requests. This is the URL scheme:

http://xtriples.spatialhumanities.de/extract.xql?configuration=###YOUR_URI###&format=###FORMAT_KEYWORD###

The keywords for the format parameter are the same as for direct POST requests (see above).

Basic configuration

A configuration consists of a simple XML structure that tells the webservice which XML collections to crawl and which statements to extract from each resource of a collection. It has three relevant areas.

Code
<xtriples>
    <configuration>
        <vocabularies>
            [1]
        </vocabularies>
        <triples>
            <statement>[...]</statement>
            <statement>[...]</statement>
            <statement>[...]</statement>
            [2]
        </triples>
    </configuration>
    <collection>
        [3]
    </collection>
</xtriples>

Below <collection> you configure the XML data that should be processed by the service. Below <vocabularies> you can configure the RDF vocabularies you would like to use. Below <triples> you configure one ore more <statement> patterns that contain your subjects, predicates and objects.

The <collection> tag

A collection consists of XML data. It can be a single XML file with repeating nodes that represent the "resources". It can also be a XML based list of URLs pointing to single XML resources. During extraction the webservice crawls over all resources of a collection and applies the configured statements patterns to each XML resource. It is possible to use several <collection> tags per configuration.

Reference
attributes required values description
uri no string The XML of the collection will be fetched from this URI if it is not submitted literally to the webservice.
max no integer When set the extraction will stop after the number of resources set in this attribute.

There are three methods for crawling the resources of a collection: XPATH based, link based and literal.

Note: It is possible to mix the three methods within a <collection> tag.

XPATH based resource crawling

XPATH based resource crawling is an automatic way of extracting XML resources. It is very handy if you want to crawl a collection with an unkown number of resources. The XPATH constructs the path to the XML resources of a collection. You specify it with curly braces in the uri attribute of a child <resource> of your <collection> tag.

Code
<xtriples>
    <configuration>
    [...]
    </configuration>
    <collection uri="http://xml.collection.somewhere/resources.xml">
        <resource uri="http://xml.collection.somewhere/resources/{//id}.xml" />
    </collection>
</xtriples>

In this example, the uri attribute of the <collection> tag points to a XML file that contains a list of IDs of XML resources to harvest. During extraction, this list provides the context for the expression in the uri attribute of the child <resource> tag. The attribute of the <resource> tag contains a URL. At any position within the URL string you are allowed to use XPATH expressions in curly braces that will be executed on the XML of the <collection>. In the above example, the service will walk through all ids found by the XPATH expression in the <collection> and substitute the curly brackets, building correct links to XML resources.

Example 1: XPATH based resource crawling with resources all in one single file

XMLConfigurationResultGraph

Example 2: XPATH based resource crawling with resources spread over multiple files

XMLConfigurationResultGraph

Link based resource crawling

Link based resource crawling works by submitting <resource> tags below the <collection> tag with complete URLs to the XML files.. This approach is handy if you know all URLs of a collection in advance or if you want to crawl a fixed number of resources. Just add any number of resource tags below the <collection> tag with the link to the according resource in the uri attribute. Here is an example:

Code
<xtriples>
    <configuration>
    [...]
    </configuration>
    <collection>
        <resource uri="http://xml.collection.somewhere/resources/1.xml" />
        <resource uri="http://xml.collection.somewhere/resources/2.xml" />
        <resource uri="http://xml.collection.somewhere/resources/3.xml" />
    </collection>
</xtriples>

This will make the webservice crawl the three resources in question.

Example 3: Link based resource crawling with fixed resources in the configuration file

ConfigurationResultGraph

Literal resource crawling

Crawling of literal XML resources is the fastest way of extracting statements. In this approach the XML resources are submitted literally to the webservice. Just submit any well formed XML below a <resource> tag.

Code
<xtriples>
    <configuration>
    [...]
    </configuration>
    <collection>
        <resource>
            <!-- Any wellformed XML -->
        </resource>
        <resource>
            <!-- Some more wellformed XML -->
        </resource>
    </collection>
</xtriples>
Example 4: Literal resource crawling with XML resources

ConfigurationResultGraph

The <vocabularies> tag

Below the <vocabularies> tag you can configure any number of RDF vocabularies you would like to use for your statements. Its comparable to the header of a SPARQL query where you declare your vocabularies. Each <vocabulary> tag can have the following attributes.

Reference
attributes required values description
prefix yes string Sets the prefix for the vocabulary. This prefix can then be used in the prefix attribute of a <subject>, <predicate> or <object> tag (see below).
uri yes xsAnyURI Sets the URI for the vocabulary.
Code
<xtriples>
    <configuration>
        <vocabularies>
            <vocabulary prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
            <vocabulary prefix="dc" uri="http://purl.org/dc/elements/1.1/"/>
            <vocabulary prefix="owl" uri="http://www.w3.org/2002/07/owl#"/>
        </vocabularies>
        [...]
    </configuration>
    [...]
</xtriples>

The <triples> tag

Below the <triples> tag you can define any number of <statement> patterns for the extraction. The statement patterns are the core of the XTriples webservice.

<triples>
    <statement>[...]</statement>
    <statement>[...]</statement>
    <statement>[...]</statement>
</triples>

The <statement> tag

Each <statement> consists of exactly one <subject>, <predicate> and <object> tag. An optional <condition> tag is allowed. Together they form a statement pattern that will be applied to each XML resource that is crawled by the webservice.

Reference
attributes required values description
repeat no integer and/or {XPATH/XQuery} This will repeat the extraction of the statement pattern as many times as the number set in the attribute. The value of the current iteration is written to the $repeatIndex variable. The attribute can contain a valid XPATH/XQuery expression in curly braces that must result in an integer (no node sets allowed).
Code
<statement>
    <subject prefix="resource">//id</subject>
    <predicate prefix="rdf">type</predicate>
    <object prefix="skos" type="uri">/string("Concept")</object>
</statement>

The code above results in a skos statement for each resource from the configured XML repository.

Example 5: FOAF statement for one resource

XMLConfigurationResultGraph

The <subject> tag

The <subject> tag of a statement pattern can either result in an URI or a blank node. In case of a blank node the blank node must have been created already by an earlier <object> tag.

Code
<statement>
    <subject prefix="###MY_VOCABULARY_PREFIX###">//id</subject>
    [...]
</statement>

In the above example, the XPATH expression will fetch the id value from each crawled resource. It is used together with the prefix attribute to construct the subject URIs.

Reference
attributes required values description
prefix no string The prefix attribute can contain a value defined in one of the vocabularies' prefix attributes. This binds the <subject> to a <vocabulary> namespace. During exraction the value of the prefix attribute is substituted with the URI that has been defined for the vocabulary's uri attribute. Similar to the prefix concept known from SPARQL.
type no
  • bnode
Only needed for connecting blank nodes. When you set the type of a <subject> tag to bnode, then you should apply the same identifier that has already been created by an earlier <object> tag expression. This will make the former object (blank node) the subject of a new statement.
resource no xsAnyURI If you set the resource attribute to a valid URL for an external XML document, this URL will be fetched during extraction and the <subject> XPATH/Xquery will be executed on the external XML rather than the XML of the current resource.
prepend no string + {XPATH/XQuery} The prepend attribute is handy if you need to prepend a value to the current patterns XPATH/XQuery result. The attribute value should contain a valid XPATH/XQuery expression in curly braces that results in a string (no node sets allowed). This expression will be executed after the <subject> expression and it's result will be prepended to the overall result.
append no string + {XPATH/XQuery} The append attribute works the same as the prepend attribute only that the result will be appended to the overall result.
Examples
Example 6: Extracting a subject URI

ConfigurationResultGraph

Example 7: Creating a subject blank node

ConfigurationResultGraph

Example 8: Including a value from an external XML resource in the subject

ConfigurationResultGraph

Example 9: Prepend and append values to the subject result

ConfigurationResultGraph

The <predicate> tag

The <predicate> tag of a statement pattern contains the property URIs of the vocabularies defined in the <vocabularies> section of the configuration. The final result of a predicate expression must always be an URI. The predicate tag mostly contains simple property strings. But it is also possible to execute an XPATH/XQuery in the tag or retrieve a value from an external XML resource by using the <resource> attribute.

Code
<statement>
    [...]
    <predicate prefix="###MY_VOCABULARY_PREFIX###">###MY_PROPERTY###</subject>
    [...]
</statement>
Reference
attributes required values description
prefix yes string The required prefix attribute of the <predicate> must contain a value defined in a vocabulary's prefix attribute. This binds the <predicate> to a <vocabulary> namespace. The value of the prefix attribute is substituted with the URI that has been defined in the vocabulary's uri attribute.
resource no xsAnyURI If you set the resource attribute to a valid URL for an external XML document, this URL will be fetched during extraction and the <subject> XPATH/Xquery will be executed on the external XML rather than the XML of the current resource.
prepend no string + {XPATH/XQuery} The prepend attribute is handy if you need to prepend a value to the current patterns XPATH/XQuery result. The attribute value should contain a valid XPATH/XQuery expression in curly braces that results in a string (no node sets allowed). This expression will be executed after the <predicate> expression and it's result will be prepended to the overall result.
append no string + {XPATH/XQuery} The append attribute works the same as the prepend attribute only that the result will be appended to the overall result.

The <object> tag

The <object> tag of a statement pattern can either result in an URI, a literal value or a blank node. Literal values can be typed or tagged with a language code. In case of a blank node, the object expression should lead to a unique identifier for the node.

Code
<statement>
    [...]
    <object prefix="###MY_VOCABULARY_PREFIX###" type="uri">###MY_VALUE###</object>
    [...]
    <object type="literal" lang="en">###MY_VALUE###</object>
    [...]
    <object type="literal" datatype="integer">###MY_VALUE###</object>
    [...]
    <object type="bnode">###MY_UNIQUE_ID###</object>
</statement>
Reference
attributes required values description
prefix no string If the result of the <object> tag is a URI, the prefix attribute can be used to bind the <object> to a defined <vocabulary> namespace. The value of the prefix attribute is substituted with the URI that has been defined in the according vocabulary's uri attribute. Similar to the prefix concept known from SPARQL.
type no
  • uri
  • literal
  • bnode
If the value is set to uri, the final result of the object expression should be a xsAnyURI. The <prefix> attribute can be used to bind a vocabulary namespace to the object value. Alternatively, you can of course construct or extract full URIs out of values in the current resource's XML without using a prefix. If the value is set to literal, a plain string value is expected as the result of the object expression. This value can then be typed using the <datatype> attribute or tagged with a language attribute by using the <lang> attribute. If the value is set to bnode the result of the object expression should be a unique identifier (see below for more details about blank nodes and some examples).
resource no xsAnyURI If you set the resource attribute to a valid URL for an external XML document, this URL will be fetched during extraction and the <subject> XPATH/Xquery will be executed on the external XML rather than the XML of the current resource.
lang no ISO code You can language tag your object literals by using this attribute. Set the attribute value to one of the ISO language codes.
datatype no XML Schema datatype You can type your object literals with this attribute. The attribute value can contain any of the official data types from XML schema (like integer, float, double, decimal, time, date etc.)
prepend no string + {XPATH/XQuery} The prepend attribute is handy if you need to prepend a value to the current patterns XPATH/XQuery result. The attribute value should contain a valid XPATH/XQuery expression in curly braces that results in a string (no node sets allowed). This expression will be executed after the <object> expression and it's result will be prepended to the overall result.
append no string + {XPATH/XQuery} The append attribute works the same as the prepend attribute only that the result will be appended to the overall result.
Example 10: Creating an object URI

ConfigurationResultGraph

Example 11: Creating a typed object literal

ConfigurationResultGraph

Example 12: Creating a language tagged literal

ConfigurationResultGraph

Example 13: Including a value from an external resource in the object

ConfigurationResultGraph

The <condition> tag

Sometimes a statement pattern should only be applied to resources if a specific condition matches. The <condition> tag allows you to define an expression and only if the result returns TRUE the statement pattern will be applied to the resource.

Code
<statement>
    <condition>/image</condition>
    <subject type="uri">/image/@url</subject>
    <predicate prefix="rdf">type</predicate>
    <object prefix="foaf" type="uri">Image</object>
</statement>

The above statement pattern would only be applied to resources that have an image tag. If there is no image tag, the XPATH returns empty and the statement pattern is not executed.

Example 14: Applying a <condition> to a statement pattern

ConfigurationResultGraph

Advanced configuration

Context variables

The context node for each subject/predicate/object expression is always the resource that is currently crawled. You can reference it with "." in your expressions or by using the $currentResource variable. When crawling single file repositories where resources consist of repeating nodes in one XML, it is also possible to walk above the resource node by using "..".

Reference
variable description
$currentResource The $currentResource variable contains the full XML content of the current resource. It can be used in all XQuery functions and XPATH expressions of the XTriples webservice (see example below).
$externalResource Same as $currentResource but for contexts where you load an external XML resource instead of the current one using the resource attribute in a subject, predicate or object tag.
$resourceIndex The $resourceIndex variable contains the number of the XML resource currently crawled.
$repeatIndex The $repeatIndex variable contains the number of the extraction iteration triggered by the repeat attribute of a statement pattern.
Example 15: Using the $currentResource variable in statement patterns

ConfigurationResultGraph

Example 16: Using the $repeatIndex variable in statement patterns

ConfigurationResultGraph

1:n statements

It is possible to create 1:n statements with a single statement pattern. This happens automatically when an XPATH/XQuery in the object part of a statement pattern yields a node set. The subject and predicate are then repeated for each node of the result set, creating n statements in total, each with the same subject and predicate but with different object values.

Example 17: Creating a 1:n statement with an object node set

ConfigurationResultSVGGraph

n:1 statements

Example 18: Creating a n:1 statement with a subject node sets

It is also possible to create n:1 statements with a single statement pattern. The technique is the same as with 1:n statements, with the difference that this time the XQuery/XPATH of the subject part of the statement pattern yields a node set. The predicate and object are then repeated for each node of the result set using a different subject value.

ConfigurationResultSVGGraph

n:m statements

In a use case where n-subjects should be bound to m-objects with a single statement pattern, the same rules as described above apply. In a n:m case both, the subject and the object expressions yield node sets. The webservice then iterates over all nodes of the subject result set and connects them to all nodes from the object result set.

Example 19: Creating n:m statements in a single statement pattern

ConfigurationResultSVGGraph

Error handling and debugging

Quite naturally when writing XPATH/XQuery expressions on a given XML, there will be some errors (like typecast errors etc.) or unexpected results (like empty node sets etc.). XTriples tries to handle such errors transparently. The strategy is not to terminate query execution due to a fatal error but to catch it and then pass a meaningful error description back to the user.

The easiest way to debug unexpected outcomes is to use the internal XTriples result format. This format is generated right before the transformation to RDF and contains all results of all statement expressions in a structure similar to the original configuration.

In this format it is easy to identify XPATH expressions for subjects, predicates or objects that came out empty or see the error messages thrown by XQuery functions. All that needs to be done to get the internal format for debugging is to append the call to the webservice with a &format=xtriples parameter.

Example 20: Using the internal xtriples result format for debugging purposes

ConfigurationResult

Webservice components

XTriples is built on top of some great software packages. The foundation for XTriples is the eXist XML database. The RDF transformations are done with the Apache any23 webservice. For SVG graph rendering, the Redefer rdf2svg webservice is used. A nice and simple software package for vizualising RDF graphs with JavaScript is the visualRDF webservice.

process chain of xtriples

Setup your own instance

eXist XML database

First of all install an instance of eXist on your local computer or a server of your choice. You can find the eXist packages and the installation instructions on the eXist webpage: http://exist-db.org/exist/apps/homepage/index.html. Once eXist is installed and running, grab the latest XTriples XAR file from here: http://download.spatialhumanities.de/ibr/. Alternatively you can build the XAR file yourself from the latest sources on GitHub. Once you have the XAR, simply install it via the eXist dashboard. Finally adapt the value of the $xtriplesWebserviceURL variable in extract.xql to the URL of your instance of XTriples.

Depending on how independent you want to run your instance of XTriples, you can optionally install local instances of the any23 and redefer webservices. If you do this dont forget to adapt the $any23WebserviceURL, $redeferWebserviceURL and $redeferWebserviceRulesURL in extract.xql to the URLs on which you run these webservices.

Any23 webservice

You can download the latest version of Apache Any23 right here: https://any23.apache.org/download.html. Check out the installation instructions.

Redefer webservice

You can download and install the latest version of the Redefer rdf2svg from here: https://github.com/rhizomik/redefer-rdf2svg.

Schema

Check out the RelaxNG schema for an exact documentation of all configuration options.