Whitepaper: Quantum Entanglement, Flipping Out and Inverse Properties

We take a deep dive into the pragmatic issues regarding the use of inverse properties when creating OWL ontologies.

Property Inverses and Perspectives

It is important to understand that logically, both perspectives always exist; they are joined at the hip. If Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. If, from one perspective, a new relationship link is created or an existing one is broken, then that change is immediately reflected when viewed from the other perspective. This is a bit like two quantum-entangled particles: a change in one is instantly reflected in the other, even if they are separated by millions of light years. Inverse properties and entangled particles are more like two sides of the same coin than two different coins.

Figure 2: Two sides of the same coin.

 

In OWL we call the property that is from the other perspective the inverse property. Given that a property and its inverse are inseparable, technically, you cannot create or use one without [implicitly] creating or using the other. If you create a property hasParent, there is an OWL syntax that lets you refer to and use that property’s inverse. In Manchester syntax you would write: “inverse(hasParent)”. The term ‘inverse’ is a function that takes an object property as an argument and returns the inverse of that property. If you assert that Michael hasParent Joan, then the inverse assertion, Joan inverse(hasParent) Michael, is inferred to hold. If you decide to give the inverse property the name parentOf, then the inverse assertion is that Joan parentOf Michael. This is summarized in Figure 3 and the table below.
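To make the named-inverse case concrete, here is a minimal Turtle sketch (the ex: namespace and the individuals are ours, purely for illustration) showing the owl:inverseOf axiom, the asserted triple, and the triple an OWL reasoner would infer:

@prefix ex:  <http://example.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ex:hasParent a owl:ObjectProperty .
ex:parentOf  a owl:ObjectProperty ;
	owl:inverseOf ex:hasParent .

# Asserted:
ex:Michael ex:hasParent ex:Joan .

# Inferred by an OWL reasoner from the owl:inverseOf axiom:
# ex:Joan ex:parentOf ex:Michael .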

Click here to read more and download the White-paper

Written by Michael Uschold

Naming an Inverse Property: Yay or Nay?


Figure 1: Quantum Entanglement

 

For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to which party's perspective you are taking: the parent or the child, the guarantor or the thing being guaranteed. From the perspective of the child we might assert the triple: Michael :hasParent Joan. Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. So asserting any triple results in the implicit assertion of an inverse triple. It's a bit like quantum-entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple. For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan's perspective to directly assert that she has another child.

Figure 2: Property with anonymous inverse

 

If we wish, we can give the inverse property a name. Two good candidates are hasChild and parentOf.

Figure 3: Property with named inverse

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts, we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful. Below are four downsides of using named inverses (roughly in order of growing importance). The first two relate to ease of learning and understanding the ontology. The last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower Inference: Too many named inverses can significantly slow down inference.
  4. More Space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property.

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that the benefits outweigh the downsides.
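In practice, not naming an inverse costs little at query time: SPARQL 1.1 can traverse a property in either direction with the inverse path operator (^). Here is a minimal sketch, assuming a hypothetical ex: namespace for the hasParent property:

PREFIX ex: <http://example.com/>

SELECT ?child WHERE {
	# Equivalent to asking: ?child ex:hasParent ex:Joan
	ex:Joan ^ex:hasParent ?child .
}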

Semantic Modeling: Getting to the Core

Most large organizations have a lot of data and very little useful information. The reason is that every time they encounter a problem, they build (or more often buy) another computer application system. Each application has its own completely arbitrary data model, designed for the task at hand, at that time, using whatever simplification seemed appropriate in that instance.

The net result, depending on the size of the organization, is hundreds or thousands of applications—occasionally, tens of thousands—each with its own data model. Each data model has hundreds to thousands of tables, occasionally tens of thousands (the average SAP install has 95,000 tables), and each table has dozens of columns. The upshot is trying to run your company using upwards of millions of distinct data types. For all practical purposes, this is impossible.

Most companies spend most of their (very high) IT budget on maintaining these systems (as they are very complex) or attempting to integrate them (and doing a very partial job of it).

This seems pretty bleak and makes it hard to see a way out. What will drop the scales from your eyes is seeing a model that covers all the concepts you use to run your business with just a few hundred concepts—a few hundred—and a web of relationships between them. Typically, this core is then augmented by thousands of “taxonomic” distinctions; however, those thousands of distinctions can be organized and put in their place for much better management and understanding.


Once you have this core model (or ontology, as we call it, just to be fancy), everything becomes simpler: integration, because you map the complex systems to the simple core and not to each other, and application development, because you build on a smaller footprint. And it now becomes possible to incorporate types of data previously thought un-integrate-able, such as unstructured, semi-structured, and/or social media data.

Semantic Arts has built these types of core data models for over a dozen very large firms, in almost as many industries, and helped to leverage them for their future information systems.  We now can do this in a very predictable and short period of time.  We’d be happy to discuss the possibilities with you.

Feel free to send us a note at [email protected].

Written by Dave McComb

The Evolution of the Data-Centric Revolution Part One

We have been portraying the move to a Data-Centric paradigm as a “Revolution” because of the major mental and cultural shifts that are prerequisites to making it. In another sense, the shift is the result of a long, gradual process, one which would have to be characterized as “evolutionary.”

This column is going to review some of the key missing links in the evolutionary history of the movement.

(For more on the Data Centric Revolution, see The Data Centric Revolution. In the likely event that you’re not already data-centric, see The Seven Warning Signs of Appliosclerosis.)

Applications as Decks of Cards

In the 50’s and 60’s, many computer applications made very little distinction between data and programs. A program was often punched out on thin cardboard “computer cards.” The data was punched out on the same kind of cards. The two decks of cards were put in the hopper together, and voila, output came out the other end. Payroll was a classic example of applications in this era. There was a card for each employee with their Social Security Number, rate of pay, current regular hours, overtime hours, and a few other essential bits of data. The program referred to data by the “column” numbers on the card where the data was found. Often people didn’t think of the data as separate from the program, as the two were intimately connected.

Click here to view on TDAN.com

The Data-Centric Revolution: The Warning Signs

Of all the dangers that befall those on the journey to data centrism, by far the greatest is Appliosclerosis. Appliosclerosis, or as lay people know it, hardening of the silos, can strike anyone at any time, but some are more prone to it than others. By the time Appliosclerosis has metastasized it may be too late: isolated and entrenched data models may already be firmly established in various vital departments, and extreme rationalization therapy may be the only option, perhaps followed by an intense taxo regimen.

In this brief piece we will lay out the symptoms that are most associated with the condition, and steps you can take to avoid early onset.

  • Warning Sign 1: Fear of New Wheels – one of the most consistent early behavioral predictors of Appliosclerosis is Wheelophobia. This usually begins with executives making statements such as “Let’s not reinvent the wheel here.” This is an innocuous-sounding bromide; after all, who wants yet another wheel? But this cliché is a Trojan Horse, and each of the Greek soldiers that come out of its belly carries the gift of yet another incompatible data model. Before you know it, the intention to avoid new wheels leaves the afflicted with a panoply of arbitrarily different and disconnected data models.
  • Warning Sign 2: The Not (Not Invented Here Syndrome) – Curiously, one of the most potent antibodies against Wheelophobia is the “Not Invented Here Syndrome” (NIHS). Those afflicted with NIHS (and it is generally believed to be hereditary) have a predisposition to custom build information systems whenever given a chance. While this does have many negative side effects, the positive side effect is that it curtails Appliosclerosis through two mechanisms. The first is starving nascent Applio tumors of the resources they need to develop. The second is that NIHS is a very slow-growing condition. Most organizations die with NIHS, not from it. The slow growth prompts some organizations to suppress the NIHS antibody with the bio-reactive NIHS complement (!NIHS).

Click here to read more on TDAN.com

What’s exciting about SHACL: RDF Data Shapes

An exciting new standard is under development at the W3C to add some much-needed functionality to OWL. The main goals are to provide a concise, uniform syntax (presently called SHACL, for Shapes Constraint Language) for both describing and constraining the contents of an RDF graph. This dual purpose is what makes this such an exciting and useful technology.

RDF Data Shapes

What is an RDF Data Shape?

An RDF shape is a formal syntax for describing how data is, how data should be, or how data must be.

For example:

ex:ProductShape 
	a sh:Shape ;
	sh:scopeClass ex:Product ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:datatype xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	];
	sh:property [
		sh:predicate ex:soldBy;
		sh:valueShape ex:SalesOrganizationShape ;
		sh:minCount 1;
	].

ex:SalesOrganizationShape
	a sh:Shape ;
	sh:scopeClass ex:SalesOrganization ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:datatype xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	] .

This can be interpreted as a description of what is (“Products have one label and are sold by at least one sales organization”), as a constraint (“Products must have exactly one label and must be sold by at least one sales organization”), or as a description of how data should be even if nonconforming data is still accepted by the system.  In the next sections I’d like to comment on a number of use cases for data shapes.

RDF Shapes as constraints

The primary use case for RDF data shapes is to constrain data coming into a system.  This is a non-trivial achievement for graph-based systems, and I think that the SHACL specification is a much better solution for achieving this than most.  Each of the SHACL atoms can, in principle, be expressed as an ASK query to evaluate the soundness of a repository.
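For instance, the minCount constraint on rdfs:label in ex:ProductShape above could be checked with an ASK query along these lines (a sketch only, reusing the hypothetical ex: namespace from the example):

PREFIX ex:   <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Returns true if any ex:Product is missing the required rdfs:label.
ASK WHERE {
	?product a ex:Product .
	FILTER NOT EXISTS { ?product rdfs:label ?label }
}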

RDF Shapes as a tool for describing existing data

OWL ontologies are good for describing the terms and how they can be used, but they lack a mechanism for describing what kinds of things have actually been said with those terms. Data shapes fulfill this need nicely, and they can make systems integration work significantly easier than diagrams or other informal tools.

Often in the course of building applications, the model is extended in ways that may be perfectly valid but otherwise undocumented.  Describing the data in RDF shapes provides a way to “pave the cow paths”, so to speak.

A benefit of this usage is that you get the advantages of being schema-less (since you may want to incorporate data even if it doesn’t conform) while still maintaining a model of how data can conform.

Another use case for this is when you are providing data to others.  In this case, you can provide a concise description of what data exists and how to put it together, which leads us to…

RDF Shapes as an outline for SELECT queries

A nice side-effect of RDF shapes that we’ve found is that once you’ve defined an object in terms of a shape, you’ve also essentially outlined how to query for it.

Given the example provided earlier, it’s easy to come up with:

SELECT ?product ?productLabel ?orgLabel WHERE {
	?product 
		a ex:Product ;
		rdfs:label ?productLabel ; 
		ex:soldBy ?salesOrg .
	?salesOrg
		a ex:SalesOrganization ;
		rdfs:label ?orgLabel .
}

None of this is made explicit by the OWL ontology—we need something either informal (e.g., diagrams and prose) or formal (e.g., the RDF shapes) to tell us how these objects relate in ways beyond disjointness, domain/range, etc.

RDF Shapes as a mapping tool

I’ve found RDF shapes to be tremendously valuable as a tool for specifying how very different data sources map together.  For several months now we’ve been performing data conversion using R2RML.  While R2RML expresses how to map the relational DB to an RDF graph, it’s still extremely useful to have something like an RDF data shapes document to outline what data needs to be mapped.

I think there’s a lot of possibility for making these two specifications more symbiotic. For example, I could imagine combining the two (since it is all just RDF, after all) to specify in one pass what shape the data will take and how to map it from a relational database.
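As a rough illustration of how the two could sit side by side, here is a hypothetical R2RML fragment (the PRODUCT table, its ID and NAME columns, and the ex: namespace are invented for this sketch) that would populate the data described by ex:ProductShape above:

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.com/> .

ex:ProductMap
	a rr:TriplesMap ;
	# Source table in the relational database
	rr:logicalTable [ rr:tableName "PRODUCT" ] ;
	# Each row becomes an ex:Product with an IRI built from its ID
	rr:subjectMap [
		rr:template "http://example.com/product/{ID}" ;
		rr:class ex:Product
	] ;
	# The NAME column supplies the rdfs:label required by the shape
	rr:predicateObjectMap [
		rr:predicate rdfs:label ;
		rr:objectMap [ rr:column "NAME" ]
	] .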

The future – RDF Shapes as UI specification

Our medium-term goal for RDF shapes is to generate a basic UI from a shapes specification. While this obviously wouldn’t work in 100% of use cases, there are a lot of instances where a barebones form UI would be fine, at least at first.  There are actually some interesting advantages to this; for instance, validation can be declared right in the model.

For further reading, see the W3C’s SHACL Use Cases and Requirements paper.  It touches on these use cases and many others.  One very interesting use case suggested in this paper is as a tool for data interoperability for loose-knit communities of practice (say, specific academic disciplines or industries lacking data consortia).  Rather than completely go without models, these communities can adopt guidelines in the form of RDF shapes documents.  I can see this being extremely useful for researchers working in disciplines lacking a comprehensive formal model (e.g., the social sciences); one researcher could simply share a set of RDF shapes with others to achieve a baseline level of data interoperability.

Governance in a Data-Centric Environment

How a Data-Centric Environment Becomes Harder to Govern

A traditional data landscape has the advantage of being extremely silo-ed.  By taking your entire data landscape and dividing it into thousands of databases, there is the potential that each database is small enough to be manageable.

As it turns out, this is more potential than actuality. Many of the individual application data models that we look at are individually more complex than the entire enterprise model should be. However, that doesn’t help anyone trying to govern. It is what it is.

What is helpful about all this silo-ization is that each silo has a smaller community of interest.  When you cut through all the procedures, maturity models and the like, governance is a social problem.  Social problems, such as “agreement,” get harder the more people you get involved.

From this standpoint, the status quo has a huge advantage, and a Data-Centric firm has a big challenge: there are far more people whose agreement one needs to solicit and obtain.

The other problem that Data-Centric brings to the table is the ease of change. Data Governance likes things that change no faster than the governance process can manage. Often this is a toss-up. Most systems are hard to change and most data governance processes are slow. They are pretty much made for each other.

I remember when we built our first model-driven application environment (unfortunately we chose health care for our first vertical). We showed how you could change the UI, API, Schema, Constraints, etc. in real time. This freaked our sponsors out. They couldn’t imagine how they would manage [govern] this kind of environment. In retrospect, they were right. They would not have been able to manage it.

This doesn’t mean the approach isn’t valid— it means we need to spend a lot more time on the approach to governance. We have two huge things working against us: we are taking the scope from tribal silos to the entire firm and we are increasing the tempo of change.

How a Data-Centric Environment Becomes Easier to Govern

A traditional data landscape has the disadvantage of being extremely silo-ed. You get some local governance from being silo-ed, but you have almost no hope of enterprise governance. This is why it’s high-fives all around for local governance while little progress is made on firm-wide governance.

One thing that data-centric provides that makes the data governance issues tractable is an incredible reduction in complexity. Because governance is a human activity, getting down to human scales of complexity is a huge advantage.

Furthermore, to enjoy the benefits of data-centric you have to be prepared to share.  A traditional environment encourages copying of enterprise data to restructure it and adapt it to your own local needs.  Pretty much all enterprises have data on their employees.  Lots of data actually.  A large percentage of applications also have data on employees.  Some merely have “users” (most of whom are employees) and their entitlements, but many have considerably more.  Inventory systems have cycle counters, procurement systems have purchasing agents, incident systems have reporters, you get the pattern.

Each system is dealing with another copy (maybe manually re-entered, maybe from a feed) of the core employee data. Each system has structured the local representation differently and of course named all the fields differently. Some of this is human nature, or maybe data modeler nature, in that people want to put their own stamp on things, but some of it is inevitable. When you buy a package, all the fields have names. Few, if any of them, are the names you would have chosen, or the names in your enterprise model, if you have one.

With the most mature form of data-centric, you would have one set of enterprise employee data. You can extend it, but the un-extended parts are used just as they are. For most developers, this idea sounds either too good to be true or too bad to be true. Most developers are comfortable with a world they control. This is a world of tables within their database. They can manage referential integrity within that world. They can predict performance within that world. They don’t like to think about a world where they have to accept someone else’s names and structures, and to agree with other groups’ decision making.

But once you overcome developer inertia on this topic and you are actually re-using data as it is, you have opened up a channel of communication that naturally leads to shared governance. Imagine a dozen departments consuming the exact same set of employee data. Not local derivations of the HR golden record, or the LDAP files, but an actual shared data set. They are incented to work together on the common data. The natural thing to happen, and we have seen this in mature organizations, is that the focus shifts to the smallest, realest, most common data elements. This social movement, and this focus on what is key and what is real, actually makes it easier to have common governance. You aren’t trying to foist one application’s view of the world on the rest of the firm; you are trying to get the firm to understand and communicate what it cares about and what it shares.

And this creates a natural basis for governance despite the fact that the scope became considerably larger.

Click here to read more on TDAN.com

The Data-Centric Revolution

This is the first of a regular series of columns from Dave McComb. Dave’s column, The Data-Centric Revolution, will appear every quarter. Please join TDAN.com in welcoming Dave to these pages and stop by often to see what he has to say.

We are in the early stages of what we believe will be a very long and gradual transition of corporate and government information systems. As the transition gets underway, many multi-billion-dollar industries will be radically disrupted. Unlike many other disruptions, the revenues currently flowing to information systems companies will not merely be reallocated to newer, more nimble players. Much of the revenue in this sector will simply evaporate as we collectively discover how large a portion of current IT spending is unnecessary.

The benefits will mostly accrue to the consumers of information systems, and those benefits will be proportional to the speed and completeness with which they embrace the change.

The Data-Centric Revolution in a Nutshell

In the data-centric enterprise, data will be a permanent shared asset and applications will come and go. When your re-ordering system no longer satisfies your changing requirements, you will bring in a new one, and let the old one go. There will be no data conversion. All analytics that worked before will continue to work. User interfaces, names of fields, and code values will be similar enough that very little training will be required.

Click here to read more on TDAN.com

Click here to read Chapter 2 of Dave’s book, “The Data-Centric Revolution”

Human Scale Software Architecture

In the physical built world there is the concept of “human scale” architecture, in other words, architecture that has been designed explicitly with the needs and constraints of humans in mind: humans that are typically between a few feet and 7 ft. tall and will only climb a few sets of stairs at a time, etc.

What’s been discovered in the physical construction of human scale architecture is that it is possible to build buildings that are more livable and more desirable to live in, are more maintainable, can be evolved and turned to different uses over time, and need not be torn down far short of their potential useful life. We bring this concept to the world of software and software architecture because we feel that some of the great tragedies of the last 10 or 15 years have been the attempts to build and implement systems that are far beyond human scale.

Non-human scale software systems

There have been many reported instances of “runaway” projects: mega projects and projects that collapse under their own weight. The much-quoted Standish Group reports that projects over $10 million in total cost have close to a 0% chance of finishing successfully, with success defined as delivering most of the promised functions within some reasonable percentage of the original budget.

James Gosling, father of Java, recently reported that most Java projects have difficulty scaling beyond one million lines of code. Our own observations of such mega projects as the Taligent Project, the San Francisco project, and various others, find that tens of thousands or in some cases hundreds of thousands of classes in a class library are not only unwieldy for any human to comprehend and manage but are dysfunctional in and of themselves.

Where does the “scale” kick in?

What is it about systems that exceeds the reach of humans? Unlike buildings where the scale is proportional to the size of our physical frames, information systems have no such boundary or constraint. What we have are cognitive limits. George Miller famously pointed out in the mid-fifties that the human mind could only retain in its immediate short-term memory seven, plus or minus two, objects. That is a very limited range of cognitive ability to hold in one’s short-term memory. We have discovered that the short-term memory can be greatly aided by visual aids and the like (see our paper, “The Magic Number 200+/- 50”), but even then there are some definite limits in the realm of visual acuity and field of vision.

Leveling the playing field

What data modelers found a long time ago, although in practice they had a difficult time disciplining themselves to implement it, was that complex systems needed to be “leveled,” i.e., partitioned into levels of detail such that at each level a human could comprehend the whole. We need this for our enterprise systems now. The complexity of existing systems is vast, and in many cases there is no leveling mechanism.

The Enterprise Data Model: Not Human Scale

Take, for instance, the corporate data model. Many corporations constructed a corporate data model in the 1980s or 1990s. Very often they started with a conceptual data model, which was then transformed into a logical data model and eventually found its way to a physical data model: an actual implemented set of tables, columns, and relationships in databases. And while there may have been some leveling or abstraction in the conceptual and logical models, there is virtually none in the physical implementation. There is merely a partitioning, which has usually occurred either by the accidental selection of projects or by the accidental selection of packages to acquire and implement.

As a result, we very often have the very same concept implemented in different applications with different names, or sometimes a similar concept with different names. In any case, what is implemented or purchased is very often a large flat model consisting of thousands, and usually tens of thousands, of attributes. Any programmer, and many users, must understand what all or many of these attributes are, how they are used, and how they are related to each other in order to be able to safely use the system or make modifications to it. Understanding thousands or tens of thousands of attributes is at the edge of human cognitive ability, and generally is only done by a handful of people who devote themselves to it full time.

Three approaches to taming the complexity

Divide and Conquer

One of the simplest ways of reducing complexity is to break the problem down. This only works if, after you’ve made the division, you no longer need to understand the rest of the parts in detail. Merely dividing an ERP system into modules generally does not reduce the scope of the complexity that needs to be understood.

Useful Abstraction

By abstracting we can gain two benefits. First, there are fewer things to know and deal with, and second, we can concentrate on behavior and rules that apply to the abstraction. Rather than deal separately with twenty types of licenses and permits (as one of our clients was doing), it is possible to treat all of them as special cases of a single abstraction. For this to be useful, two more things are needed: there must be a way to distinguish the variations without having to deal with the differences all the time, and it must be possible to deal with the abstraction without invoking all the detail. A small sketch of the idea follows.
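Here is a minimal Turtle sketch (the class and property names are invented purely for illustration): the many license and permit types become subclasses of one abstraction, and shared properties and rules are stated once against that abstraction.

@prefix ex:   <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Authorization  a owl:Class .
ex:BuildingPermit a owl:Class ; rdfs:subClassOf ex:Authorization .
ex:LiquorLicense  a owl:Class ; rdfs:subClassOf ex:Authorization .

# Properties (and any associated rules) attach to the abstraction once
# and apply to every specialization.
ex:issuedBy  a owl:ObjectProperty ;   rdfs:domain ex:Authorization .
ex:expiresOn a owl:DatatypeProperty ; rdfs:domain ex:Authorization .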

Just in Time Knowledge

Instead of learning everything about a data model, with the proper tools we can defer our learning about part of the model until we need it. This requires an active metadata repository that can explain the parts of the model we don’t yet know in terms that we do know.

Written by Dave McComb

White Paper: What is a Market?

The term “market” is common in business. We talk about the automotive market, the produce market, the disk drive market, etc. And yet, what do we really mean when we use that term? It is an instructive question, because very often CRM (Customer Relationship Management) systems or other sales analytic systems group customers and sales based on their “market”. However, if we don’t have a clear understanding of what a market is, we misrepresent and misgroup, and therefore mislead ourselves, as we study the implications of our advertising and promotional efforts.

The Historical Market

Historically, markets were physical places: marketplaces.  People went to the market in order to conduct trade. In many communities there was a single market where all goods were bought and sold.  However, as communities became larger and developed into cities, marketplaces began to specialize and there was a particular place to go to buy and sell fresh produce. There was another place to go to buy and sell spices; and yet another to buy and sell furniture.

As time progressed and the economy became more and more differentiated and cities grew larger and larger, marketplaces became even more specialized and more geographically localized.  So, for instance, we have the diamond marketplace in Antwerp, Belgium, and the screenplay marketplace in Hollywood.

Why Did Physical Marketplaces Emerge and Dominate?

The trend toward physical marketplaces was not necessarily inevitable. Buyers could seek out sellers at their own place of business and conduct business that way. Conversely, sellers could seek out buyers at their place of business. Two factors led to the popularity of the marketplace. One was that the physical movement to the marketplace was collectively more efficient for most of the participants. The second was that the marketplace allowed easy selection and comparison between similar offerings. Additionally, information about potential sources of, or demand for, a given product or service was not nearly as economical to obtain as it is today with computers and the Internet.

Download the White-paper

Written by Dave McComb