The Flagging Art of Saying Nothing

Who doesn’t like a nice flag? Waving in the breeze, reminding us of who we are and what we stand for. Flags are a nice way of providing a rallying point around which to gather and show our colors to the world. They are a way of showing membership in a group, or providing a warning. Which is why it is so unfortunate when we find flags in a data management system, because there they are reduced to saying nothing. Let me explain.

When we see Old Glory, we instantly know it is emblematic of the United States. We also instantly recognize the United Kingdom’s emblematic Union Jack and Canada’s Maple Leaf Flag. Another type of flag is a warning flag, alerting us to danger. In each case, we have a clear reference to what the flag represents. How about when you look at a data set and see ‘Yes’ or ‘7’? Sure, ‘Yes’ is a positive assertion and 7 is a number, but those are classifications, not meaning. Yes what? 7 what? There is no intrinsic meaning in these flags. Another step is required to understand the context of what is being asserted as ‘Yes’. Numeric values have even more ambiguity. Is it a count of something, perhaps 7 toasters? Is it a ranking, 7th place? Or perhaps it is just a label, Group 7?

In data systems the number of steps required to understand a value’s meaning is critical, both for reducing ambiguity and, more importantly, for increasing efficiency. An additional step is needed to understand that ‘Yes’ means ‘needs review’, so the processing steps have doubled just to extract its meaning. In traditional systems, the two-step flag dance is unavoidable because two steps were required to capture the value in the first place. First a structure has to be created to hold the value, the ‘Needs Review’ column. Then a value must be placed into that structure. More often than not, an obfuscated name like ‘NdsRvw’ is used, which requires a third step to understand what that means. Only when the structure is understood can the value and meaning the system designer was hoping to capture be deciphered.

In cases where it isn’t known what value should be contained in the structure, a NULL value is inserted as a placeholder. That’s right, a value literally saying nothing. Traditional systems are built as structure first, content second. First the schema, the structure definition, gets built. Then it is populated with content. The meaning of the content may or may not survive the contortions required to stuff it into the structure, but it gets stuffed in anyway in the hope it can be deciphered later when extracted for a given purpose. In situations where there is a paucity of data, there is a special name for a structure that largely says nothing – sparse tables. These are tables known in advance to be likely to contain only a very few of the possible values, but the structure still has to be defined before the rare-case values actually show up. Sparse tables are like requiring you to have a shoe box for every type of shoe you could possibly ever own even though you actually only own a few pairs.

Structure-first thinking is so embedded in our DNA that we find it inconceivable that we could manage data without first building the structure. As a result, flag structures are often put in to drive system functionality. Logic then gets built to execute the flag dance, and that logic runs every time interaction with the data occurs. The logic says something like this:
IF this flag DOESN’T say nothing
THEN do this next thing
OTHERWISE skip that next step
OR do something else completely.
Sadly, structure-first thinking requires this type of logic to be in place. The NULL placeholders are a default value to keep the empty space accounted for, and there has to be logic to deal with them.

Semantics, on the other hand, is meaning-first thinking. Since there is no meaning in NULL, there is no concept of storing NULL. Semantics captures meaning by making assertions. In semantics we write code that says “DO this with this data set.” No IF-THEN logic, just DO this and get on with it. Here is an example of how semantics maintains the fidelity of our information without having vacuous assertions.

The system can contain an assertion that the Jefferson contract is categorized as ‘Needs Review’, which puts it into the set of all contracts needing review. It is a subset of all the contracts. The rest of the contracts are in the set of all contracts NOT needing review. These are separate and distinct sets which collectively make up the set of all contracts, a third set. System functionality can be driven by simply selecting the set requiring action, the ‘Needs Review’ set; the set that excludes those needing review; or the set of all contracts. Because the contracts requiring review were placed into their own subset in a single step, the processing logic is cut in half. Where else can you get a 50% discount and do less work to get it?
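
As a concrete illustration, here is a minimal sketch of that single-step, set-based selection in Python using the rdflib library. The Jefferson contract comes from the example above; the namespace, the class names, and the second contract are assumptions added purely for illustration.

```python
# A minimal sketch of the set-based approach, assuming rdflib is installed.
# The namespace, class names, and the Adams contract are illustrative.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.com/contracts/")

g = Graph()
# Assert membership directly: no flag column, no NULL placeholders.
g.add((EX.JeffersonContract, RDF.type, EX.Contract))
g.add((EX.JeffersonContract, RDF.type, EX.ContractNeedingReview))
g.add((EX.AdamsContract, RDF.type, EX.Contract))  # not asserted as needing review

# "DO this with this data set": select the set that requires action in one step.
needs_review = set(g.subjects(RDF.type, EX.ContractNeedingReview))
all_contracts = set(g.subjects(RDF.type, EX.Contract))
no_review_needed = all_contracts - needs_review

print(needs_review)      # only the Jefferson contract
print(no_review_needed)  # everything else
```

There is no IF-THEN branch anywhere: the membership assertion and the set selection are each a single step.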

I love a good flag, but I don’t think they would have caught on if we needed to ask the flag-bearer what the label on the flagpole said to understand what it stood for.

Blog post by Mark Ouska 

For more reading on the topic, check out this post by Dave McComb.

The Data-Centric Revolution: Lawyers, Guns and Money

My book “The Data-Centric Revolution” will be out this summer.  I will also be presenting at Dataversity’s Data Architecture Summit coming up in a few months.  Both exercises reminded me that Data-Centric is not a simple technology upgrade.  It’s going to take a great deal more to shift the status quo.

Let’s start with Lawyers, Guns and Money, and then see what else we need.

A quick recap for those who just dropped in: The Data-Centric Revolution is the recognition that maintaining the status quo on enterprise information system implementation is a tragic downward spiral.  Almost every ERP, Legacy Modernization, MDM, or you name it project is coming in at ever higher costs and making the overall situation worse.

We call the status quo the “application-centric quagmire.”  The application-centric aspect stems from the observation that many business problems turn into IT projects, most of which end up with building, buying, or renting (Software as a Service) a new application system.  Each new application system comes with its own, arbitrarily different data model, which adds to the pile of existing application data models, further compounding the complexity, upping the integration tax, and inadvertently entrenching the legacy systems.

The alternative we call “data-centric.”  It is not a technology fix.  It is not something you can buy.  We hope for this reason that it will avoid the fate of the Gartner hype cycle.  It is a discipline and culture issue.  We call it a revolution because it is not something you add to your existing environment; it is something you do with the intention of gradually replacing your existing environment (recognizing that this will take time.)

Seems like most good revolutions would benefit from the Warren Zevon refrain: “Send lawyers, guns, and money.”  Let’s look at how this will play out in the data-centric revolution.

Click here to read more on TDAN.com

The 1st Annual Data-Centric Architecture Forum: Re-Cap

In the past few weeks, Semantic Arts hosted a new Data-Centric Architecture Forum.  One of the conclusions made by the participants was that it wasn’t like a traditional conference.  This wasn’t marching from room to room to sit through another talking-head, PowerPoint-led presentation. There were a few PowerPoint slides that served to anchor the discussion, but it was much more a continual co-creation of a shared artifact.

The consensus was:

  • Yes, let’s do it again next year.
  • Let’s call it a forum, rather than a conference.
  • Let’s focus on implementation next year.
  • Let’s make it a bit more vendor-friendly next year.

So retrospectively, last week was the first annual Data-Centric Architecture Forum.

What follows are my notes and conclusions from the forum.

Shared DCA Vision

I think we came away with a great deal of commonality and more specifics on what a DCA needs to look like and what it needs to consist of. The straw-man (see appendix A) came through with just a few revisions (coming soon).  More importantly, it grounded everyone on what was needed and gave a common vocabulary about the pieces.

Uniqueness

With all the brain power in the room, and given that people have been looking for this for a while, I think that if anyone had known of a platform or set of tools that provided all of this out of the box, they would have said so once we had described what such a solution entailed.

I think we have outlined a platform that does not yet exist and needs to.  With a bit of perseverance, next year we may have a few partial (maybe even more than partial) implementations.

Completeness

After working through this for 2 ½ days, I think if there were anything major missing, we would have caught it.  Therefore, this seems to be a pretty complete stack. All the components, and at least a first cut as to how they are related, seem to be in place.

Doable-ness

While there are a lot of parts in the architecture, most of the people in the room thought that most of the parts were well-known and doable.

This isn’t a DARPA challenge to design some state-of-the-art thing, this is more a matter of putting pieces together that we already understand.

Vision v. Reference Architecture

As noted right at the end, this is a vision for an architecture— not a specific architecture or a reference architecture.

Notes From Specific Sessions

DCA Strawman

Most of this was already covered above.  I think we eventually suggested that “Analytics” might deserve its own layer.  You could say that analytics is a “behavior,” but that seems to be burying the lead.

I also thought it might be helpful to spell out some of the specific key APIs suggested by the architecture. It also looks like we need to split the MDM style of identity management from user identity management, both for clarity and for positioning in the stack.

State of the Industry

There is a strong case to be made that knowledge-graph-driven enterprises are eating the economy.  Part of this may be because network-effect companies are sympathetic to network data structures.  But we think the case can be made that the flexibility inherent in KGs applies to companies in any industry.

According to research that Alan provided, the average enterprise now runs 1,100 different SaaS services.  This is fragmenting the data landscape even faster than legacy did.

Business Case

A lot of the resistance isn’t technical, but instead tribal.

Even within the AI community there are tribes with little cross-fertilization:

  • Symbolists
  • Bayesians
  • Statisticians
  • Connectionists
  • Evolutionaries
  • Analogizers

On the integration front, the tribes are:

  • Relational DB Linkers
  • Application-Centric ESB Advocates
  • Application-Centric RESTful developers
  • Data-centric Knowledge Graphers

Click here to read more on TDAN.com

The Data-Centric Revolution: Chapter 2

The Data-Centric Revolution

Below is an excerpt from, and a downloadable copy of, Chapter 2: “What is Data-Centric?”

CHAPTER 2

What is Data-Centric?

Our position is:

A data-centric enterprise is one where all application functionality is based on a single, simple, extensible data model.

First, let’s make sure we distinguish this from the status quo, which we can describe as an application-centric mindset. Very few large enterprises have a single data model. They have one data model per application, and they have thousands of applications (including those they bought and those they built). These models are not simple. In every case we examined, application data models are at least 10 times more complex than they need to be, and the sum total of all application data models is at least 100-1000 times more complex than necessary.

Our measure of complexity is the sum total of all the items in the schema that developers and users must learn in order to master a system.  In relational technology this would be the number of classes (tables) plus the number of all attributes (columns).  In object-oriented systems, it is the number of classes plus the number of attributes.  In an XML- or JSON-based system it is the number of unique elements and/or keys.
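
As a toy illustration of that counting, here is the measure applied to a made-up three-class schema (none of these names or numbers come from the book):

```python
# A toy illustration of the complexity measure: items = classes + attributes.
# The schema below is hypothetical.
schema = {
    "Customer": ["name", "address", "phone"],
    "Order": ["orderDate", "status", "total"],
    "OrderLine": ["quantity", "unitPrice"],
}
items = len(schema) + sum(len(attrs) for attrs in schema.values())
print(items)  # 3 classes + 8 attributes = 11 items to master
```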

The number of items in the schema directly drives the number of lines of application code that must be written and tested.  It also drives the complexity for the end user, as each item eventually surfaces in forms or reports, and the user must master what these items mean and how they relate to each other in order to use the system.

Very few organizations have applications based on an extensible model. Most data models are very rigid.  This is why we call them “structured data.”  We define the structure, typically in a conceptual model, and then convert that structure to a logical model and finally a physical (database specific) model.  All code is written to the model.  As a result, extending the model is a big deal.  You go back to the conceptual model, make the change, then do a bunch of impact analysis to figure out how much code must change.

An extensible model, by contrast, is one that is designed and implemented such that changes can be added to the model even while the application is in use. Later in this book, and especially in the two companion books, we get into a lot more detail on the techniques that need to be in place to make this possible.

In the data-centric world we are talking about a data model that is primarily about what the data means (that is, the semantics). It is only secondarily, and sometimes locally, about the structure, constraints, and validation to be performed on the data.

Many people think that a model of meaning is “merely” a conceptual model that must be translated into a “logical” model, and finally into a “physical” model, before it can be implemented. Many people think a conceptual model lacks the requisite detail and/or fidelity to support implementation. What we have found over the last decade of implementing these systems is that done well, the semantic (conceptual) data model can be put directly into production. And that it contains all the requisite detail to support the business requirements.

And let’s be clear, being data-centric is a matter of degree. It is not binary. A firm is data-centric to the extent (or to the percentage) its application landscape adheres to this goal.

Data-Centric vs. Data-Driven

Many firms claim to be, and many firms are, “data-driven.” This is not quite the same thing as data-centric. “Data-driven” refers more to the place of data in decision processes. A non-data-driven company relies on human judgement as the justification for decisions. A data-driven company relies on evidence from data.

Data-driven is not the opposite of data-centric. In fact, they are quite compatible, but merely being data-driven does not ensure that you are data-centric. You could drive all your decisions from data sets and still have thousands of non-integrated data sets.

Our position is that data-driven is a valid aspiration, though data-driven does not imply data-centric. Data-driven would benefit greatly from being data-centric as the simplicity and ease of integration make being data-driven easier and more effective.

We Need our Applications to be Ephemeral

The first corollary to the data-centric position is that applications are ephemeral, and data is the important and enduring asset. Again, this is the opposite of the current status quo. In traditional development, every time you implement a new application, you convert the data to the new application’s representation. These application systems are very large capital projects. This causes people to think of them like more traditional capital projects (factories, office buildings, and the like). When you invest $100 million in a new ERP or CRM system, you are not inclined to think of it as throwaway. But you should. Well, really you shouldn’t be spending that kind of money on application systems, but given that you already have, it is time to reframe this as a sunk cost.

One of the ways application systems have become entrenched is through the application’s relation to the data it manages. The application becomes the gatekeeper to the data. The data is a second-class citizen, and the application is the main thing. In data-centric, the data is permanent and enduring, and applications can come and go.

Data-Centric is Designed with Data Sharing in Mind

The second corollary to the data-centric position is default sharing. The default position for application-centric systems is to assume local self-sufficiency. Most relational database systems base their integrity management on having required foreign key constraints. That is, an ordering system requires that all orders be from valid customers. The way they manage this is to have a local table of valid customers. This is not sharing information. This is local hoarding, made possible by copying customer data from somewhere else. And this copying process is an ongoing systems integration tax. If they were really sharing information, they would just refer to the customers as they existed in another system. Some API-based systems get part of the way there, but there is still tight coupling between the ordering system and the customer system that is hosting the API. This is an improvement but hardly the end game.

As we will see later in this book, it is now possible to have a single instantiation of each of your key data types—not a “golden source” that is copied and restructured to the various application consumers, but a single copy that can be used in place.

Is Data-Centric Even Possible?

Most experienced developers, after reading the above, will explain to you why this is impossible. Based on their experience, it is impossible. Most of them have grown up with traditional development approaches. They have learned how to build traditional standalone applications. They know how applications based on relational systems work. They will use this experience to explain to you why this is impossible. They will tell you they tried this before, and it didn’t work.

Further, they have no idea how a much simpler model could recreate all the distinctions needed in a complex business application. There is no such thing as an extensible data model in traditional practice.

You need to be sympathetic and recognize that based on their experience, extensive though it might be, they are right. As far as they are concerned, it is impossible.

But someone’s opinion that something is impossible is not the same as it not being possible. In the late 1400s, most Europeans thought that the world was flat and sailing west to get to the far east was futile. In a similar vein, in 1900 most people were convinced that heavier than air flight was impossible.

The advantage we have relative to the pre-Columbians, and the pre-Wrights is that we are already post-Columbus and post-Wrights. These ideas are both theoretically correct and have already been proved.

The Data-Centric Vision

To hitch your wagon to something like this, we need to make a few aspects of the end game much clearer. We earlier said the core of this was the idea of a single, simple, extensible data model. Let’s drill in on this a bit deeper.

Click here to download the entire chapter.

Use the code SemanticArts for a 20% discount at Technicspub.com

Field Report from the First Annual Data-Centric Architecture Conference

Our Data-Centric Architecture conference a couple of weeks ago was pretty incredible. I don’t think I’ve ever participated in a single intense, productive conversation with 20 people that lasted 2 1/2 days, with hardly a let-up. Great energy, very balanced participation.

And I echo Mark Wallace’s succinct summary on LinkedIn.

I think one thing all the participants agreed on was that it wasn’t a conference, or at least not a conference in the usual sense. I think going forward we will call it the Data-centric Architecture Forum. Seems more fitting.

My summary takeaway was:

  1. This is an essential pursuit.
  2. There is nothing that anyone in the group (and this is a group with a lot of coverage) knows of that does what a Data-Centric Architecture has to do, out of the box.
  3. We think we have identified the key components. Some of them are difficult and have many design options that are still open, but no aspect of this is beyond the reach of competent developers, and none of the components are even that big or difficult.
  4. The straw-man held up pretty well. It seemed to work pretty well as a communication device. We have a few proposed changes.
  5. We all learned a great deal in the process.

A couple of immediate next steps:

  1. Hold the date, and save some money: We’re doing this again next year Feb 3-5, $225 if you register by April 15th: http://dcc.semanticarts.com.
  2. The theme of next year’s forum will be experience reports on attempting to implement portions of the architecture.
  3. We are going to pull together a summary of points made and changes to the straw-man.
  4. I am going to begin in earnest on a book covering the material covered.

Field Report by Dave McComb

Join us next year!

What will we talk about at the Data-Centric Conference?

“The knowledge graph is the only currently implementable and sustainable way for businesses to move to the higher level of integration needed to make data truly useful for a business.”

You may be wondering what some of our Data-Centric Conference panel topics will actually look like, and what the discussion will entail. This article from Forbes is an interesting take on knowledge graphs and is just the kind of thing we’ll be discussing at the Data-Centric Conference.

When we ask Siri, Alexa or Google Home a question, we often get alarmingly relevant answers. Why? And more importantly, why don’t we get the same quality of answers and smooth experience in our businesses where the stakes are so much higher?

The answer is that these services are all powered by extensive knowledge graphs that allow the questions to be mapped to an organized set of information that can often provide the answer we want.

Is it impossible for anyone but the big tech companies to organize information and deliver a pleasing experience? In my view, the answer is no. The technology to collect and integrate data so we can know more about our businesses is being delivered in different ways by a number of products. Only a few use constructs similar to a knowledge graph.

But one company I have been studying this year, Cambridge Semantics, stands out because it is focused primarily on solving the problems related to creating knowledge graphs that work in businesses. Cambridge Semantics’ technology is powered by AnzoGraph, its highly scalable graph database, and uses semantic standards, but the most interesting thing to me is how the company has assembled all the elements needed to create a knowledge graph factory, because in business we are going to need many knowledge graphs that can be maintained and evolved in an orderly manner.

Read more here: Is The Enterprise Knowledge Graph Finally Going To Make All Data Usable?

Register for the conference here.

P.S. The Early Bird Special for Data-Centric Conference registration runs out 12/31/18.

 

The Data-Centric Revolution: Implementing a Data-Centric Architecture

Dave McComb returns to The Data Administration Newsletter with news of roll-your-own data-centric architecture stacks. More precisely, he introduces what the early adopters of data-centric architectures will need in order to undertake the data-centric revolution and make such a necessary transition.

At some point, there will be full stack data-centric architectures available to buy, to use as a service, or as an open source project.  At the moment, as far as we know, there isn’t a full stack data-centric architecture available for direct implementation.  What this means is that early adopters will have to roll their own.

This is what the early adopters I’m covering in my next book have done and, I expect for the next year or two at least, what the current crop of early adopters will need to do.

I am writing a book that will describe in much greater detail the considerations that will go into each layer in the architecture.

This paper will outline what needs to be considered to become data-centric and give people an idea of the scope of such an undertaking.  You might have some of these layers already covered.

Find his answers in The Data-Centric Revolution: Implementing a Data-Centric Architecture.

Click here to read a free chapter of Dave McComb’s book, “The Data-Centric Revolution”.

The Data-Centric Revolution: Implementing a Data-Centric Architecture

At some point, there will be full stack data-centric architectures available to buy, to use as a service, or as an open source project.  At the moment, as far as we know, there isn’t a full stack data-centric architecture available for direct implementation.  What this means is that early adopters will have to roll their own.

This is what the early adopters I’m covering in my next book have done and, I expect for the next year or two at least, what the current crop of early adopters will need to do.

I am writing a book that will describe in much greater detail the considerations that will go into each layer in the architecture.

This paper will outline what needs to be considered to give people an idea of the scope of such an undertaking.  You might have some of these layers already covered.

Simplicity

There are many layers to this architecture, and at first glance it may appear complex.  I think the layers are a pretty good separation of concerns, and rather than adding to the complexity, I believe they may actually simplify it.

As you review the layers, do so through the prism of the two driving APIs.  There will be more than just these two APIs and we will get into the additional ones, as appropriate, but this is not going to be the usual Swiss army knife of a whole lot of APIs, with each one doing just a little bit.  The APIs are of course RESTful.

The core is composed of two APIs (with our working titles):

  • ExecuteNamedQuery—This API assumes a SPARQL query has been stored in the triple store and given a name. In addition, the query is associated with a set of substitutable parameters.  At run time, the name of the query is forwarded to the server with the parameter names and values.  The back end fetches the query, rewrites it with the parameter values in place, executes it, and returns the result to the client.  Note that if the front end did not know the names of the available queries, it could issue another named query that returns all the available named queries (with their parameters).  Also note that this implies the existence of an API that will get the queries into the database, but we’ll cover that in the appropriate layer when we get to it.
  • DeltaTriples—This API accepts two arrays of triples as its payload. One is the “adds” array, which lists the new triples that the server needs to create, and the other is “deletes,” which lists the triples to be removed.  This puts a burden on the client.  The client will be constructing a UI from the triples it receives in a request, allowing a user to change data interactively, and then evaluating what changed.  This part isn’t as hard as it sounds when you consider that order is unimportant with triples.  There will be quite a lot going on with this API as we descend down the stack, but the essential idea is that this API is the single route through which all updates pass, and it will ultimately result in an ACID-compliant transaction being applied to the triple store.  (A minimal sketch of both APIs follows this list.)
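
The sketch below shows one way these two endpoints could look, assuming Flask for the HTTP layer and an in-memory rdflib graph standing in for the triple store. The endpoint names come from the article; the request shapes, the query-storage scheme, and the parameter-substitution syntax are illustrative assumptions, and transaction handling, security, and literal values are omitted for brevity.

```python
# A minimal sketch of the two core APIs, assuming Flask and rdflib.
from flask import Flask, request, jsonify
from rdflib import Graph, URIRef

app = Flask(__name__)
store = Graph()                      # stand-in for the real triple store
named_queries = {                    # in a real system the queries live in the store
    "contractsNeedingReview":
        "SELECT ?c WHERE { ?c a <http://example.com/gist/ContractNeedingReview> . }"
}

@app.route("/ExecuteNamedQuery", methods=["POST"])
def execute_named_query():
    body = request.get_json()
    query = named_queries[body["queryName"]]
    # naive parameter substitution: replace ?param tokens with supplied values
    for name, value in body.get("parameters", {}).items():
        query = query.replace(f"?{name}", value)
    rows = [[str(v) for v in row] for row in store.query(query)]
    return jsonify(rows)

@app.route("/DeltaTriples", methods=["POST"])
def delta_triples():
    body = request.get_json()        # {"adds": [[s,p,o], ...], "deletes": [[s,p,o], ...]}
    # the single route through which all updates pass
    for s, p, o in body.get("deletes", []):
        store.remove((URIRef(s), URIRef(p), URIRef(o)))
    for s, p, o in body.get("adds", []):
        store.add((URIRef(s), URIRef(p), URIRef(o)))
    return jsonify({"added": len(body.get("adds", [])),
                    "deleted": len(body.get("deletes", []))})
```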

I’m going to proceed from the bottom (center) of the architecture up, with consideration for how these two key APIs will be influenced by each of the layers.

A graphic that ties this all together appears at the end of this article.

Data Layer

At the center of this architecture is the data.  It would be embarrassing if something else were at the center of the data-centric architecture.  The grapefruit wedges here are each meant to represent a different repository. There will be more than one repository in the architecture.

The darker yellow ones on the right are meant to represent repositories that are more highly curated.  The lighter ones on the left represent those less curated (perhaps data sets retrieved from the web).  The white wedge is a virtual repository.  The architecture knows where the data is but resolves it at query time. Finally, the cross hatching represents provenance data.  In most cases, the provenance data will be in each repository, so this is just a visual clue.

The two primary APIs bottom out here, and become queries and updates.

Federation Layer

One layer up is the ability to federate a query over multiple repositories.  At this time, we do not believe it will be feasible or desirable to spread an update over more than one repository (this would require the semantic equivalent of a two-phase commit).  In most implementations this will be a combination of the native abilities of a triple store, reliance on standards-based federation, and bespoke capability.  The federation layer will be interpreting the ExecuteNamedQuery requests.
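
As an example of the standards-based piece, SPARQL 1.1 federated query lets a single query reach a second repository through the SERVICE keyword. The sketch below assumes SPARQLWrapper and two hypothetical repository endpoints whose URLs, vocabulary, and contents are purely illustrative; the primary endpoint must itself support federated query for this to work.

```python
# A minimal sketch of standards-based federation (SPARQL 1.1 SERVICE).
# Endpoint URLs and the ex: vocabulary are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

# Join contracts held in a curated repository with review events held in a
# second, less curated repository.
query = """
PREFIX ex: <http://example.com/gist/>
SELECT ?contract ?reviewDate WHERE {
  ?contract a ex:ContractNeedingReview .
  SERVICE <http://repo2.example.com/sparql> {
    ?contract ex:lastReviewedOn ?reviewDate .
  }
}
"""

endpoint = SPARQLWrapper("http://repo1.example.com/sparql")
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["contract"]["value"], binding["reviewDate"]["value"])
```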

Click here to read more on TDAN.com

Are You Spending Way Too Much on Software?

Alan Morrison, senior research fellow at PwC’s Center for Technology and Innovation, interviews Dave McComb for strategy+business about why IT systems and software continue to cost more, but still under-deliver. McComb argues that legacy processes, excess code, and a mind-set that accepts high price tags as the norm have kept many companies from making the most of their data.

Global spending on enterprise IT could reach US$3.7 trillion in 2018, according to Gartner. The scale of this investment is surprising, given the evolution of the IT sector. Basic computing, storage, and networking have become commodities, and ostensibly cheaper cloud offerings such as infrastructure-as-a-service and software-as-a-service are increasingly well established. Open source software is popular and readily available, and custom app development has become fairly straightforward.

Why, then, do IT costs continue to rise? Longtime IT consultant Dave McComb attributes the growth in spending largely to layers of complexity left over from legacy processes. Redundancy and application code sprawl are rampant in enterprise IT systems. He also points to a myopic view in many organizations that enterprise software is supposed to be expensive because that’s the way it’s always been.

McComb, president of the information systems consultancy Semantic Arts, explores these themes in his new book, Software Wasteland: How the Application-Centric Mindset Is Hobbling Our Enterprises. He has seen firsthand how well-intentioned efforts to collect data and translate it into efficiencies end up at best underdelivering — and at worst perpetuating silos and fragmentation. McComb recently sat down with s+b and described how companies can focus on the standard models that will ultimately create an efficient, integrated foundation for richer analytics.

Click here to read the Question & Answer session.

The gist Namespace Delimiter: Hash to Slash

The change in gist:


We recently changed the namespace for gist from

  • old: http://ontologies.semanticarts.com/gist#
  • new: http://ontologies.semanticarts.com/gist/

What you need to do:

This change is backwards-incompatible with existing versions of gist. The good news is that the changes needed are straightforward. Migrating to the new gist requires changing all uses of gist URIs to the new namespace. This will include the following:

  1. any ontology that imports gist
  2. any ontology that does not import gist, but that refers to some gist URIs
  3. any data set of triples that uses gist URIs

For 1 and 2, you need only change the namespace prefix and carry on as usual.  For files of triples (case 3), you need to first change the namespaces and then reload the triples into any triple stores where the old files were loaded.  If the triples use prefixed terms, then you need only change the prefix declarations.  If the triples use full URIs, then you will need to do a global replace, swapping out the old namespace for the new one.
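
A minimal sketch of that global replace, assuming the data files are Turtle files in a local directory (the directory name and file extension are illustrative):

```python
# Rewrite the old gist namespace to the new one across a set of data files.
from pathlib import Path

OLD_NS = "http://ontologies.semanticarts.com/gist#"
NEW_NS = "http://ontologies.semanticarts.com/gist/"

for path in Path("data").glob("*.ttl"):
    text = path.read_text(encoding="utf-8")
    if OLD_NS in text:
        path.write_text(text.replace(OLD_NS, NEW_NS), encoding="utf-8")
        print(f"migrated {path}")
# After rewriting the files, reload them into any triple stores that held
# the old versions.
```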

The rationale for making this change:

We think that other ontologists and semantic technologists may be interested in the reasons for this change. To that end, we re-trace the thought process and discussions we had internally as we debated the pros and cons of this change.

There are three key aspects of URIs that we are primarily interested in:

  • Global Uniqueness – the ability of triple stores to self-assemble graphs without resorting to metadata relies on the fact that URIs are globally unique
  • Human readability – we avoid traditional GUIDs because we prefer URIs that humans can read and understand.
  • Resolvability – we are interested in URIs that identify resources that could be located and resolved on the web (subject to security constraints).

The move from hash to slash was motivated by the third concern; the first two are not affected.

In the early days the web was a web of documents.  For efficiency reasons, the standards (including and especially RFC 3986[1]) declared that the hash designated a “same-document reference”; that is, everything after the hash was assumed to be in the document represented by the string up to the hash.  Therefore, the resolution was done in the browser and not on the server. This was a good match for standards and for small (single-document) ontologies.  As such, for many years most ontologies used the hash convention, including OWL, RDF, SKOS, VoID, vCard, UMBEL, and GoodRelations.

Anyone with large ontologies or large datasets that were hosted in databases rather than documents adopted the slash convention, including DBpedia, Schema.org, SNOMED, Facebook, FOAF, Freebase, OpenCyc, and the New York Times.

The essential tradeoff concerns resolving the URI.  If you can be reasonably sure that everything you would want to provide to the user at resolution time would be in a relatively small document, then the hash convention is fine.

If you wish your resolution to return additional data that may not have been in the original document (say, where-used information that isn’t in the defining document), you need to do the resolution on the server.  Because of the standards, the server does not see anything after the hash, so if you use the hash convention, rather than resolving the URI from the URL address bar, you must programmatically call a server with the URI as an argument in the API call.

With the slash convention you have the choice of putting the URI in the URL bar and getting it resolved, or calling an API similar to the hash option above.
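
To make the difference concrete, the short sketch below shows what an HTTP client actually sends to the server in each case. The gist namespaces are the real ones discussed here; the term name and the resolution endpoint are hypothetical.

```python
# Why the hash convention forces an API-style call: HTTP clients strip the
# fragment, so the server never sees what follows '#'.
from urllib.parse import quote, urldefrag

hash_uri = "http://ontologies.semanticarts.com/gist#Category"
slash_uri = "http://ontologies.semanticarts.com/gist/Category"

# What actually gets requested from the server in each case:
print(urldefrag(hash_uri).url)    # http://ontologies.semanticarts.com/gist
print(urldefrag(slash_uri).url)   # http://ontologies.semanticarts.com/gist/Category

# So hash-style resolution has to pass the full URI explicitly, for example
# as a query parameter to a (hypothetical) resolution API:
print("http://ontologies.semanticarts.com/resolve?uri=" + quote(hash_uri, safe=""))
```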

If you commit to API calls then there is a slight advantage to hash as it is slightly easier to parse on the back end.  In our opinion this slight advantage does not compare to the flexibility of being able to resolve through the URL bar as well as still having the option of using an API call for resolution.

The DBpedia SPARQL endpoint (http://dbpedia.org/sparql) has thoughtfully prepopulated 240 of the most common namespaces in its SPARQL editor.  At the time of this writing, 59 of the 240 use the hash delimiter.  Nearly 100 of the namespaces come from DBpedia’s decision to have a different namespace for each language, and when these are excluded the slash advantage isn’t nearly as pronounced (90 slashes versus 59 hashes), but slash still predominates.

We are committed to providing, in the future, a resolution service to make it easy to resolve our concepts through a URL address bar.  For the present the slash is just as good for all other purposes.  We have decided to eat the small migration cost now rather than later.

[1] https://www.rfc-editor.org/info/rfc3986