The gist Namespace Delimiter: Hash to Slash

The change in gist:

We recently changed the namespace for gist from

http://ontologies.semanticarts.com/gist#
to
http://ontologies.semanticarts.com/gist/

What you need to do:

This change is backwards-incompatible with existing versions of gist. The good news is that the changes needed are straightforward. Migrating to the new gist requires changing every use of a gist URI to the new namespace. This includes:

1. any ontology that imports gist
2. any ontology that does not import gist, but that refers to some gist URIs
3. any data set of triples that uses gist URIs

For 1 and 2, you need only change the namespace prefix and carry on as usual. For data files with triples that use gist URIs, you need to first change the namespace in the files and then reload the triples into any triple stores where the old files were loaded. If the triples use prefixed terms, you need only change the prefix declarations. If the triples use full URIs, you will need to do a global replace, swapping out the old namespace for the new one.
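The global replace can be scripted. The sketch below is a minimal illustration, not an official migration tool: it assumes your files are UTF-8 text serializations (Turtle, N-Triples, RDF/XML and so on) and simply swaps the namespace string, which covers prefix declarations and full URIs in one pass. The function names and the example.com subject are hypothetical, and gist#Person is used only as an illustrative term.

```python
from pathlib import Path

# Old (hash) and new (slash) gist namespaces, as given in this post.
OLD_NS = "http://ontologies.semanticarts.com/gist#"
NEW_NS = "http://ontologies.semanticarts.com/gist/"


def migrate_text(text: str) -> str:
    """Replace every occurrence of the old gist namespace with the new one.

    Prefix declarations contain the full namespace IRI, so this single
    replacement updates both prefixed terms and full URIs.
    """
    return text.replace(OLD_NS, NEW_NS)


def migrate_file(path: Path) -> bool:
    """Rewrite a UTF-8 file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    updated = migrate_text(original)
    if updated != original:
        path.write_text(updated, encoding="utf-8")
        return True
    return False


if __name__ == "__main__":
    # Hypothetical Turtle snippet with a prefix declaration and a full URI.
    sample = (
        "@prefix gist: <http://ontologies.semanticarts.com/gist#> .\n"
        "<http://example.com/alice> a <http://ontologies.semanticarts.com/gist#Person> .\n"
    )
    print(migrate_text(sample))
```

To process a directory, loop over something like Path("data").glob("*.ttl") and call migrate_file on each file (the data directory is a placeholder). The rewritten files still need to be reloaded into any triple store that held the old triples.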

The rationale for making this change:

We think that other ontologists and semantic technologists may be interested in the reasons for this change. To that end, we retrace the thought process and discussions we had internally as we debated the pros and cons of this change.

There are three key aspects of URIs that we are primarily interested in:

Global uniqueness – the ability of triple stores to self-assemble graphs without resorting to metadata relies on the fact that URIs are globally unique.
Human readability – we avoid traditional GUIDs because we prefer URIs that humans can read and understand.
Resolvability – we are interested in URIs that identify resources that could be located and resolved on the web (subject to security constraints).

The move from hash to slash was motivated by the third concern; the first two are not affected.

In the early days, the web was a web of documents. For efficiency reasons, the standards (including and especially RFC 3986 [1]) declared that the hash designated a “same-document reference”; that is, everything after the hash was assumed to be in the document identified by the string up to the hash. Therefore, the resolution was done in the browser and not on the server. This was a good match for the standards, and for small (single-document) ontologies. As such, for many years most ontologies used the hash convention, including OWL, RDF, SKOS, VoID, vCard, UMBEL and GoodRelations.

Those with large ontologies, or large datasets hosted in databases rather than documents, adopted the slash convention, including DBpedia, Schema.org, SNOMED, Facebook, FOAF, Freebase, OpenCyc and the New York Times.

The essential tradeoff concerns resolving the URI. If you can be reasonably sure that everything you would want to provide to the user at resolution time fits in a relatively small document, then the hash convention is fine.

If you wish your resolution to include additional data that may not be in the original document (say, where-used information that isn’t in the defining document), you need to do the resolution on the server. Because of the standards, the server does not see anything after the hash, so if you use the hash convention, rather than resolving the URI from the browser’s address bar, you must programmatically call a server, passing the URI as an argument in the API call.

With the slash convention, you have the choice of putting the URI in the address bar and getting it resolved, or calling an API as with the hash option above.
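The difference between the two conventions comes down to which part of the URI reaches the server. The sketch below uses Python's standard urllib.parse to show the request target an HTTP client sends for each style, and how a hash URI would instead have to be passed whole to an API. The resolver endpoint https://example.com/resolve and its uri parameter are hypothetical, not a real Semantic Arts service, and gist#Person / gist/Person are illustrative terms.

```python
from urllib.parse import quote, urlsplit


def request_target(uri: str) -> str:
    """Return the path (plus query) an HTTP client sends for this URI.

    The fragment (everything after '#') is split off before the request
    is made, so the server never sees it.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def api_lookup_url(uri: str, endpoint: str = "https://example.com/resolve") -> str:
    """Build a call to a HYPOTHETICAL resolver API, passing the whole URI
    (fragment included) percent-encoded as a query parameter."""
    return endpoint + "?uri=" + quote(uri, safe="")


if __name__ == "__main__":
    for uri in (
        "http://ontologies.semanticarts.com/gist#Person",  # hash style
        "http://ontologies.semanticarts.com/gist/Person",  # slash style
    ):
        print(uri, "->", request_target(uri))
    print(api_lookup_url("http://ontologies.semanticarts.com/gist#Person"))
```

For the hash URI the server receives only /gist, so every term in the namespace lands on the same document; for the slash URI it receives /gist/Person and can answer for that one concept. Encoding the full URI as a parameter keeps the # (as %23) intact for the server.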

If you commit to API calls, then hash has a slight advantage, as it is slightly easier to parse on the back end. In our opinion, this slight advantage is outweighed by the flexibility of being able to resolve through the address bar while still having the option of using an API call for resolution.
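The post does not spell out what makes hash easier to parse. One plausible reading is splitting a URI into namespace and local name: RFC 3986 permits only one # delimiter, so the split is unambiguous, whereas a slash appears throughout the namespace itself and a back end must fall back on a heuristic such as "split at the last slash". The sketch below illustrates that reading; the function names are hypothetical.

```python
def split_hash(uri: str) -> tuple[str, str]:
    """Split a hash-style URI into (namespace, local name).

    Only one '#' delimiter is allowed in a URI, so this is unambiguous.
    """
    ns, sep, local = uri.partition("#")
    if not sep:
        raise ValueError(f"no '#' in {uri!r}")
    return ns + sep, local


def split_slash(uri: str) -> tuple[str, str]:
    """Split a slash-style URI at its LAST '/' (a heuristic).

    '/' also appears inside the namespace, so this guesses wrong whenever
    the true namespace is not 'everything up to the last slash'.
    """
    ns, sep, local = uri.rpartition("/")
    if not sep:
        raise ValueError(f"no '/' in {uri!r}")
    return ns + sep, local


if __name__ == "__main__":
    print(split_hash("http://ontologies.semanticarts.com/gist#Person"))
    print(split_slash("http://ontologies.semanticarts.com/gist/Person"))
```

For gist the two give the same answer, but for a hypothetical URI such as http://example.com/ns/sub/Thing the slash heuristic cannot tell whether the namespace is http://example.com/ns/ or http://example.com/ns/sub/ without outside knowledge.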

The DBpedia SPARQL endpoint (http://dbpedia.org/sparql) has thoughtfully prepopulated 240 of the most common namespaces in its SPARQL editor. At the time of this writing, 59 of the 240 use the hash delimiter. Nearly 100 of the namespaces come from DBpedia’s decision to have a different namespace for each language; when these are excluded, the slash advantage is much less pronounced (90 slashes versus 59 hashes), but slash still predominates.
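A tally like this is easy to reproduce once you have the list of namespace IRIs. The sketch below classifies each namespace by its final character and counts the results. It does not fetch anything from DBpedia; it assumes you have already collected the IRIs yourself, and the six well-known namespaces in SAMPLE are a hand-picked illustration, so its totals say nothing about the real distribution.

```python
from collections import Counter


def delimiter(namespace: str) -> str:
    """Classify a namespace IRI by its final character."""
    if namespace.endswith("#"):
        return "hash"
    if namespace.endswith("/"):
        return "slash"
    return "other"


def tally(namespaces) -> Counter:
    """Count how many namespaces use each delimiter."""
    return Counter(delimiter(ns) for ns in namespaces)


# A few well-known namespaces for illustration, NOT the DBpedia list of 240.
SAMPLE = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
    "gist": "http://ontologies.semanticarts.com/gist/",
}

if __name__ == "__main__":
    print(tally(SAMPLE.values()))
```

Namespaces that end in neither delimiter (for example some URN-based ones) are counted as "other" rather than forced into either group.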

We are committed to providing, in the future, a resolution service that makes it easy to resolve our concepts through a browser’s address bar. For the present, the slash is just as good for all other purposes. We have decided to eat the small migration cost now rather than later.

[1] RFC 3986, “Uniform Resource Identifier (URI): Generic Syntax.” https://www.rfc-editor.org/info/rfc3986

Dave McComb