Debugging Enterprise Ontologies

Michael Uschold gave a talk at the International Workshop on Completing and Debugging the Semantic Web held in Crete on May 30, 2016.   Here is a preview of the white paper, “Finding and Avoiding Bugs in Enterprise Ontologies” by Michael Uschold:

Finding and Avoiding Bugs in Enterprise Ontologies

Abstract: We report on ten years of experience building enterprise ontologies for commercial clients. We describe key properties that an enterprise ontology should have, and illustrate them with many real world examples. They are: correctness, understandability, usability, and completeness. We give tips and guidelines for how best to use inference and explanations to identify and track down problems. We describe a variety of techniques that catch bugs that an inference engine will not find, at least not on its own. We describe the importance of populating the ontology with data to drive out more bugs. We point out some common ontology design practices in the community that lead to bugs in ontologies and in downstream semantic web applications based on the ontologies. These include proliferation of namespaces, proliferation of properties and inappropriate use of domain and range. We recommend doing things differently to prevent bugs from arising.

Introduction
In a manner analogous to software debugging, ontologies need to be rid of their flaws. The types of flaws to be found in an ontology are slightly different from those found in software, and revolve around the ideas of correctness, understandability, usability and completeness. We report on our experience (spanning more than a decade) in building and debugging enterprise ontologies for large companies in a wide variety of industries including: finance, healthcare, legal research, consumer products, electrical devices, manufacturing and digital assets.

For the growing number of companies starting to use ontologies, the norm is to build a single ontology for a point solution in one corner of the business. For large companies, this leads to any number of independently developed ontologies, resulting in many of the same heterogeneity problems that ontologies are supposed to solve. It would help if they all used the same upper ontology, but most upper ontologies are unsuitable for enterprise use. They are hard to understand and use because they are large and complex, containing much more than is necessary, or because the focus is too academic to be of use in a business setting.

So the first step is to start with a small, upper, enterprise ontology such as gist [McComb 2006], which includes core concepts relevant to almost any enterprise. The resulting enterprise ontology itself will consist of a mixture of concepts that are important to any enterprise in a given industry, and those that are important to a particular enterprise. An enterprise ontology plays the role of an upper ontology for all the ontologies in a company (Fig. 1). Major divisions will import and extend it. Ontologies that are specific to particular applications will, in turn, import and extend those. The enterprise ontology evolves to be the semantic foundation for all major software systems and databases that are core to the enterprise.

Click here to download the white paper.

Click here to download the presentation.

Evolve your Non-Temporal Database in Place

At Semantic Arts, we recently decided to upgrade our internal system to turn something that was not temporal (our billing rates) into something that was. Normally, that would be a big change. As it turned out, it was straightforward and could be done as an in-place update. It made a good mini case study for how using semantics and a graph database can make these kinds of changes far less painful.

So, Dave McComb documented it in a YouTube video.


Click here to view: Upgrade a non Temporal Database in Place

Introduction to FIBO Quick Start

We have just launched our “FIBO Quick Start” offering. If you are in the financial industry, you have likely heard about the Financial Industry Business Ontology (FIBO), which has been championed by the EDM Council, a consortium that includes virtually the entire who’s who of the financial industry. We’ve been helping with FIBO almost since its inception, and more recently Michael Uschold has been co-leading the mortgage and loan ontology development effort. Along the way we’ve done several major projects for financial clients, and have distilled what we know into a safe and quick approach to adopting semantics in the financial sector. We have the capacity to take on one more client in the financial space, so if you’re interested, by all means contact us.

FIBO Quick Start: Developing Business Value Rapidly with Semantics

The Financial Industry Business Ontology is nearing completion. As of June 2016, nine major financial institutions have joined the early adopter program. It is reasonable to expect that in the future all Financial Industry participants will have aligned some of their systems with FIBO. Most have focused their initial projects on incorporating the FIBO vocabulary. This is a good first step and can jump start a lot of compliance work.

But the huge winners, in our opinion, will be the few institutions that see the potential and go all-in with this approach. For sixteen years, we have been working with large enterprises who are interested in adopting semantic technology. Initially, our work focused on architecture and design as firms experimented with ways to incorporate these new approaches. More recently, we have been implementing what we call the “data-centric approach” to building semantically-centered systems in an agile fashion.

Click here to read more. 

Naming an Inverse Property: Yay or Nay?


Figure 1: Quantum Entanglement


For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to whose perspective you are taking: the parent or the child, the guarantor or the thing being guaranteed. From the perspective of the child we might assert the triple: Michael :hasParent Joan. Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child, and vice versa. So asserting any triple results in the implicit assertion of an inverse triple. It’s a bit like quantum-entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple without giving it a name. For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan’s perspective to directly assert that she has another child.

Figure 2: Property with anonymous inverse


If we wish, we can give the inverse property a name. Two good candidates are: hasChild, and parentOf.

Figure 3: Property with named inverse
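
As a minimal Turtle sketch of the two options (the names and namespace here are illustrative, not taken from the figures):

@prefix :    <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:hasParent a owl:ObjectProperty .

# Named inverse: hasChild gets its own IRI and is tied to hasParent.
:hasChild a owl:ObjectProperty ;
	owl:inverseOf :hasParent .

# Asserting either triple lets a reasoner infer the corresponding inverse triple.
:Michael :hasParent :Joan .
:Joan    :hasChild  :Jennifer .   # Joan’s other child, asserted from her perspective

# Without a named inverse, axioms can still refer to the inverse anonymously
# via the property expression [ owl:inverseOf :hasParent ] .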

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful. Below are four downsides of using named inverses (roughly in order of growing importance). The first two relate to ease of learning and understanding the ontology; the last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower Inference: Too many named inverses can significantly slow down inference.
  4. More Space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property.

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that those benefits outweigh the downsides.

Semantic Modeling: Getting to the Core

Most large organizations have a lot of data and very little useful information. The reason is that every time they encounter a problem, they build (or, more often, buy) another computer application system. Each application has its own completely arbitrary data model, designed for the task at hand at that time, using whatever simplification seemed appropriate in that instance.

The net result, depending on the size of the organization, is hundreds or thousands of applications (occasionally tens of thousands), each with its own data model. Each data model has hundreds to thousands of tables (occasionally tens of thousands; the average SAP install has 95,000 tables), and each table has dozens of columns. You end up trying to run your company using upwards of millions of distinct data types. For all practical purposes, this is impossible.

Most companies spend most of their (very high) IT budget on maintaining these systems (as they are very complex) or attempting to integrate them (and doing a very partial job of it).

This seems pretty bleak and makes it hard to see a way out. What will drop the scales from your eyes is seeing a model that covers all the concepts you use to run your business with just a few hundred concepts, connected by a web of relationships. Typically, this core is then augmented by thousands of “taxonomic” distinctions; however, these thousands of distinctions can be organized and put in their place for much better management and understanding.


Once you have this core model (or ontology, as we call it, just to be fancy), everything becomes simpler: integration, because you map the complex systems to the simple core rather than to each other; and application development, because you build on a smaller footprint. It also becomes possible to incorporate types of data previously thought un-integrate-able, such as unstructured, semi-structured, and/or social media data.

Semantic Arts has built these types of core data models for over a dozen very large firms, in almost as many industries, and has helped them leverage the models for their future information systems. We can now do this in a predictable and short period of time. We’d be happy to discuss the possibilities with you.

Feel free to send us a note at [email protected].

Written by Dave McComb

The Evolution of the Data-Centric Revolution Part One

We have been portraying the move to a Data-Centric paradigm as a “Revolution” because of the major mental and cultural shifts that are prerequisites to making this shift. In another sense, the shift is the result of a long, gradual process; one which would have to be characterized as “evolutionary.”

This column is going to review some of the key missing links in the evolutionary history of the movement.

(For more on the Data Centric Revolution, see The Data Centric Revolution. In the likely event that you’re not already data-centric, see The Seven Warning Signs of Appliosclerosis.)

Applications as Decks of Cards

In the 50’s and 60’s, many computer applications made very little distinction between data and programs. A program was often punched out on thin cardboard “computer cards.” The data was punched out on the same kind of cards. The two decks of cards were put in the hopper together, and voila, output came out the other end. Payroll was a classic example of applications in this era. There was a card for each employee with their Social Security Number, rate of pay, current regular hours, overtime hours, and a few other essential bits of data. The program referred to data by the “column” numbers on the card where the data was found. Often people didn’t think of the data as separate from the program, as the two were intimately connected.

Click here to view on TDAN.com

What’s exciting about SHACL: RDF Data Shapes

An exciting new standard is under development at the W3C to add some much-needed functionality to OWL. The main goal is to provide a concise, uniform syntax (presently called SHACL, for Shapes Constraint Language) for both describing and constraining the contents of an RDF graph. This dual purpose is what makes it such an exciting and useful technology.


What is an RDF Data Shape?

An RDF shape is a formal syntax for describing how data is, how data should be, or how data must be.

For example:

ex:ProductShape 
	a sh:Shape ;
	sh:scopeClass ex:Product ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:datatype xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	];
	sh:property [
		sh:predicate ex:soldBy;
		sh:valueShape ex:SalesOrganizationShape ;
		sh:minCount 1;
	].

ex:SalesOrganizationShape
	a sh:Shape ;
	sh:scopeClass ex:SalesOrganization ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:datatype xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	] .

This can be interpreted as a description of what is (“Products have one label and are sold by at least one sales organization”), as a constraint (“Products must have exactly one label and must be sold by at least one sales organization”), or as a description of how data should be even if nonconforming data is still accepted by the system.  In the next sections I’d like to comment on a number of use cases for data shapes.

RDF Shapes as constraints

The primary use case for RDF data shapes is to constrain data coming into a system.  This is a non-trivial achievement for graph-based systems, and I think that the SHACL specification is a much better solution for achieving this than most.  Each of the SHACL atoms can, in principle, be expressed as an ASK query to evaluate the soundness of a repository.
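
For example, the minCount constraint on rdfs:label in the Product shape above could be rendered, roughly, as an ASK query that reports whether any violating products exist (the ex: namespace URI here is a placeholder):

PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# true if some Product lacks the required rdfs:label
ASK WHERE {
	?product a ex:Product .
	FILTER NOT EXISTS { ?product rdfs:label ?label }
}

Roughly speaking, a SHACL processor automates the generation and execution of checks like this one for every constraint in the shapes graph.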

RDF Shapes as a tool for describing existing data

OWL ontologies are good for describing the terms and how they can be used but lack a mechanism for describing what kinds of things have been said with those terms.  Data shapes fulfill this need nicely, which can make it significantly easier to perform systems integration work than simple diagrams or other informal tools.

Often in the course of building applications, the model is extended in ways that may be perfectly valid but otherwise undocumented.  Describing the data in RDF shapes provides a way to “pave the cow paths”, so to speak.

A benefit of this usage is that you get the advantages of being schema-less (since you may want to incorporate data even if it doesn’t conform) while still maintaining a model of how data can conform.

Another use case for this is when you are providing data to others.  In this case, you can provide a concise description of what data exists and how to put it together, which leads us to…

RDF Shapes as an outline for SELECT queries

A nice side-effect of RDF shapes that we’ve found is that once you’ve defined an object in terms of a shape, you’ve also essentially outlined how to query for it.

Given the example provided earlier, it’s easy to come up with:

SELECT ?product ?productLabel ?orgLabel WHERE {
	?product 
		a ex:Product ;
		rdfs:label ?productLabel ; 
		ex:soldBy ?salesOrg .
	?salesOrg
		a ex:SalesOrganization ;
		rdfs:label ?orgLabel .
}

None of this is made explicit by the OWL ontology; we need something either informal (e.g., diagrams and prose) or formal (e.g., the RDF shapes) to tell us how these objects relate in ways beyond disjointness, domain/range, etc.

RDF Shapes as a mapping tool

I’ve found RDF shapes to be tremendously valuable as a tool for specifying how very different data sources map together.  For several months now we’ve been performing data conversion using R2RML.  While R2RML expresses how to map the relational DB to an RDF graph, it’s still extremely useful to have something like an RDF data shapes document to outline what data needs to be mapped.
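
As a rough sketch of how the two fit together (the table, column, and namespace names here are hypothetical), an R2RML TriplesMap that populates the ex:Product portion of the earlier shape might look like this:

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Map rows of a hypothetical PRODUCT table to ex:Product resources.
ex:ProductMapping
	a rr:TriplesMap ;
	rr:logicalTable [ rr:tableName "PRODUCT" ] ;
	rr:subjectMap [
		rr:template "http://example.org/product/{PRODUCT_ID}" ;
		rr:class ex:Product
	] ;
	# PRODUCT_NAME supplies the one required rdfs:label.
	rr:predicateObjectMap [
		rr:predicate rdfs:label ;
		rr:objectMap [ rr:column "PRODUCT_NAME" ]
	] ;
	# SALES_ORG_ID links each product to the organization that sells it.
	rr:predicateObjectMap [
		rr:predicate ex:soldBy ;
		rr:objectMap [ rr:template "http://example.org/org/{SALES_ORG_ID}" ]
	] .

The shape then serves as the checklist of what the mapping must produce, which is the kind of symbiosis discussed next.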

I think there’s a lot of possibility for making these two specifications more symbiotic. For example, I could imagine combining the two (since it is all just RDF, after all) to specify in one pass what shape the data will take and how to map it from a relational database.

The future – RDF Shapes as UI specification

Our medium-term goal for RDF shapes is to generate a basic UI from a shapes specification. While this obviously wouldn’t work in 100% of use cases, there are a lot of instances where a barebones form UI would be fine, at least at first.  There are actually some interesting advantages to this; for instance, validation can be declared right in the model.

For further reading, see the W3C’s SHACL Use Cases and Requirements paper.  It touches on these use cases and many others.  One very interesting use case suggested in this paper is as a tool for data interoperability for loose-knit communities of practice (say, specific academic disciplines or industries lacking data consortia).  Rather than completely go without models, these communities can adopt guidelines in the form of RDF shapes documents.  I can see this being extremely useful for researchers working in disciplines lacking a comprehensive formal model (e.g., the social sciences); one researcher could simply share a set of RDF shapes with others to achieve a baseline level of data interoperability.

Governance in a Data-Centric Environment

How a Data-Centric Environment Becomes Harder to Govern

A traditional data landscape has the advantage of being extremely silo-ed. Dividing your entire data landscape into thousands of databases at least creates the potential that each database is small enough to be manageable.

As it turns out, this is more potential than actuality. Many of the application data models we look at are individually more complex than the entire enterprise model needs to be. However, that doesn’t help anyone trying to govern. It is what it is.

What is helpful about all this silo-ization is that each silo has a smaller community of interest.  When you cut through all the procedures, maturity models and the like, governance is a social problem.  Social problems, such as “agreement,” get harder the more people you get involved.

From this standpoint, the status quo has a huge advantage, and a Data-Centric firm has a big challenge: there are far more people whose agreement one needs to solicit and obtain.

The other problem that Data-Centric brings to the table is the ease of change. Data governance likes things that change more slowly than the governance process can manage. Often this is a toss-up: most systems are hard to change and most data governance processes are slow. They are pretty much made for each other.

I remember when we built our first model-driven application environment (unfortunately, we chose health care for our first vertical). We showed how you could change the UI, API, schema, constraints, etc. in real time. This freaked our sponsors out. They couldn’t imagine how they would manage [govern] this kind of environment. In retrospect, they were right. They would not have been able to manage it.

This doesn’t mean the approach isn’t valid; it means we need to spend a lot more time on the approach to governance. We have two huge things working against us: we are taking the scope from tribal silos to the entire firm, and we are increasing the tempo of change.

How a Data-Centric Environment Becomes Easier to Govern

A traditional data landscape has the disadvantage of being extremely silo-ed. You get some local governance from being silo-ed, but you have almost no hope of enterprise governance. This is why it’s high-fives all around for local governance, while little progress is made on firm-wide governance.

One thing that data-centric provides that makes the data governance issues tractable is an incredible reduction in complexity. Because governance is a human activity, getting down to human scales of complexity is a huge advantage.

Furthermore, to enjoy the benefits of data-centric you have to be prepared to share.  A traditional environment encourages copying of enterprise data to restructure it and adapt it to your own local needs.  Pretty much all enterprises have data on their employees.  Lots of data actually.  A large percentage of applications also have data on employees.  Some merely have “users” (most of whom are employees) and their entitlements, but many have considerably more.  Inventory systems have cycle counters, procurement systems have purchasing agents, incident systems have reporters, you get the pattern.

Each system is dealing with another copy (maybe manually re-entered, maybe from a feed) of the core employee data. Each system has structured the local representation differently and, of course, named all the fields differently. Some of this is human nature, or maybe data modeler nature, wanting to put their own stamp on things, but some of it is inevitable. When you buy a package, all the fields have names. Few, if any, of them are the names you would have chosen, or the names in your enterprise model, if you have one.

With the most mature form of data-centric, you would have one set of enterprise employee data. You can extend it, but the un-extended parts are used just as they are. For most developers, this idea sounds either too good to be true or too bad to be true. Most developers are comfortable with a world they control. This is a world of tables within their database. They can manage referential integrity within that world. They can predict performance within that world. They don’t like to think about a world where they have to accept someone else’s names and structures, and to agree with other groups’ decision making.

But once you overcome developer inertia on this topic and are actually re-using data as it is, you have opened up a channel of communication that naturally leads to shared governance. Imagine a dozen departments consuming the exact same set of employee data. Not local derivations of the HR golden record or the LDAP files, but an actual shared data set. They are incented to work together on the common data. The natural thing to happen, and we have seen this in mature organizations, is that the focus shifts to the smallest, realest, most common data elements. This social movement, and this focus on what is key and what is real, actually makes it easier to have common governance. You aren’t trying to foist one application’s view of the world on the rest of the firm; you are trying to get the firm to understand and communicate what it cares about and what it shares.

And this creates a natural basis for governance despite the fact that the scope became considerably larger.

Click here to read more on TDAN.com

Linked Data Platform

The Linked Data Platform (LDP) has achieved W3C Recommendation status, which is pretty much acceptance as a standard. There are some good hints in the LDP Primer and the LDP Best Practices documents.

This is the executive two paragraph treatment, to get you at least conversant with the topic.

Basically, LDP says that if you treat everything like a container, and use the ldp:contains relationship to the things in the container, then the platform can treat everything consistently. This gives us a RESTful interface onto an RDF database. You can read from it and write to it, as long as there is a way to map your ontology to Containers and ldp:contains relationships.

Say you have a bunch of inventory-related data. You could declare that there is an Inventory container, and the connection between the Inventory container and the Warehouses might be based on a hasStockkeepingLocation property. Each Warehouse in turn could be cast as a Container, and the contains relationship could point to the CatalogItems.
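
As a minimal sketch of that arrangement in Turtle (all of the resource names here are hypothetical):

@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix ex:  <http://example.org/> .

# The inventory as a container of warehouses...
ex:Inventory
	a ldp:BasicContainer ;
	ldp:contains ex:Warehouse12, ex:Warehouse14 .

# ...and each warehouse as a container of the catalog items it stocks.
ex:Warehouse12
	a ldp:BasicContainer ;
	ldp:contains ex:CatalogItem_A1, ex:CatalogItem_B7 .

An LDP server would then let a client GET ex:Inventory to list the warehouses, and POST to ex:Warehouse12 to add a new catalog item.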

A promising way of getting a RESTful interface on a triple store.