Whitepaper: Avoiding Property Proliferation

Domain and range for ontological properties are not about data integrity, but logical necessity. Misusing them leads to an inelegant (and unnecessary) proliferation of properties.

Logical Necessity Meets Elegance

Screwdrivers generally have only a small set of head configurations (flat, Phillips, hex) because the intention is to make accessing contents or securing parts easy (or at least uniform). Now, imagine how frustrating it would be if every screw and bolt in your house or car required a unique screwdriver head. They might be grouped together (for example, a bunch of different sized hex heads), but each one would be slightly different. Any maintenance task would take much longer and the amount of time spent just organizing the screwdrivers would be inordinate. Yet that is precisely the approach that most OWL modelers take when they over-specify their ontology’s properties.
On our blog, we once briefly discussed the concept of elegance in ontologies. A key criterion was, “An ontology is elegant if it has the fewest possible concepts to cover the required scope with minimal redundancy and complexity.” Let’s take a deeper look at object properties in that light. First, a quick review of some of the basics.

  1. An ontology describes some subject matter in terms of the meaning of the concepts and relationships within that ontology’s domain.
  2. Object properties are responsible for describing the relationships between things.
  3. In the RDFS and OWL modeling languages, a developer can declare a property’s domain and/or its range (the class to which the Subject and/or Object, respectively, must belong).

Break the Habit

In our many years’ experience teaching our classes on designing and building ontologies, we find that most new ontology modelers have a background in relational databases or Object-Oriented modelling/development. Their prior experience habitually leads them to strongly tie properties to classes via specific domains and ranges. Usually, this pattern comes from a desire to curate the triplestore’s data by controlling what is getting into it. But specifying a property’s domain and range will not (necessarily) do that.
For example, let’s take the following assertions:

  • The domain of the property :hasManager is class :Organization.
  • The individual entity :_Jane is of type class :Employee.
  • :_Jane :hasManager :_George.

Many newcomers to semantic technology (especially those with a SQL background) expect that the ontology will prevent the third statement from being entered into the triplestore because :_Jane is not declared to be of the correct class. But that’s not what happens in OWL. The domain declaration says that anything with a :hasManager must be an :Organization, so :_Jane must be one, which presumably is not the intended meaning. Because of OWL’s Open World paradigm, the only real constraints are those that prevent us from making statements that are logically inconsistent. Since in our example we have not declared the :Organization and :Employee classes to be disjoint, there is no logical reason that :_Jane cannot belong to both of those classes. A reasoning engine will simply infer that :_Jane is also a member of the :Organization class. No errors will be raised; the assertion will not be rejected. (That said, we almost certainly do want to declare those classes to be disjoint.)
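The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, the single inference rule applied is the RDFS domain rule, and the helper names (infer_domain_types, find_inconsistencies) are hypothetical, not part of any reasoner's API.

```python
# Minimal sketch: the RDFS domain rule applied to plain (subject, predicate, object) tuples.
# Helper names are hypothetical; a real reasoner applies many more rules than this one.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
OWL_DISJOINT = "owl:disjointWith"


def infer_domain_types(triples):
    """Return the input triples plus every rdf:type triple implied by rdfs:domain."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    inferred = set(triples)
    for prop, cls in domains:
        for s, p, _ in triples:
            if p == prop:
                # The subject of any statement using the property is inferred into the domain class.
                inferred.add((s, RDF_TYPE, cls))
    return inferred


def find_inconsistencies(triples):
    """Return (individual, class_a, class_b) for individuals typed into two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    clashes = set()
    for a, p, b in triples:
        if p == OWL_DISJOINT:
            for individual, classes in types.items():
                if a in classes and b in classes:
                    clashes.add((individual, a, b))
    return clashes


asserted = {
    (":hasManager", RDFS_DOMAIN, ":Organization"),
    (":_Jane", RDF_TYPE, ":Employee"),
    (":_Jane", ":hasManager", ":_George"),
}

# Without a disjointness axiom nothing is rejected; :_Jane is simply inferred to be an :Organization.
closure = infer_domain_types(asserted)
jane_is_org = (":_Jane", RDF_TYPE, ":Organization") in closure
clashes_open = find_inconsistencies(closure)

# With the classes declared disjoint, the same data becomes logically inconsistent.
with_disjointness = infer_domain_types(
    asserted | {(":Organization", OWL_DISJOINT, ":Employee")}
)
clashes_disjoint = find_inconsistencies(with_disjointness)
```

In this sketch the third statement is never blocked. The domain only adds an inferred type, and a problem surfaces only once the disjointness axiom is present.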

Read More and Download the White-paper

White Paper by Dan Carey

Screwdrivers & Properties

Screwdrivers generally have only a small set of head configurations (flat, Phillips, hex) because the intention is to make accessing contents or securing parts easy (or at least uniform).

Now imagine how frustrating it would be if every screw and bolt in your house or car required a unique screwdriver head.  They might be grouped together (for example, a bunch of different sized hex heads), but each one was slightly different.  Any maintenance task would take much longer and the amount of time spent just organizing the screwdrivers would be inordinate.

Yet that is precisely the approach that most OWL modelers take when they over-specify their ontology’s properties.

“Avoiding Property Proliferations – Part 1” discusses the pitfalls of habitually applying domains and ranges to properties.

Click here to download the whitepaper.

Binary Instances

Sometimes when we’re designing ontologies we’re faced with design choices that would lead us to create what we call “binary instances”: situations where it takes two instances (often of different classes) to capture one concept.  For instance, we may be considering creating a patient instance that is different from the corresponding person instance.

In an effort to move this design decision from the realm of arbitrary designer’s choice to something more principled, in this article we explore the factors that go into a decision that leads to binary instances.

Some Examples

This section will outline some examples that we have come across, as it is often easier to work from a large palette of examples than from abstractions.  Some of these examples may seem odd, and for some you may be surprised that anyone would consider them either one way or the other (binary or unary), but we have seen each of these at various times.

Our guess is that your background and predisposition will cause you to look at each one of these and say either “obviously one instance” or “obviously two instances,” but we suggest that any of these could go either way (a few are a bit of a stretch, but bear with us, we’re trying to make a point).  After the examples we introduce some principles that we think will lead to reasonably consistent decisions in this arena.

Statue v. Bronze

This is a classic philosophical argument.  What is the difference between the statue and the clay, or bronze?  The knee-jerk reaction is to think they are two things, but consider: if you have a 10-pound statue made out of 10 pounds of bronze, when you go to ship it will you be charged for 20 pounds of freight or 10?

Person v. Employee

When you take on a job, are you two things (person and employee) or one thing (person who is an employee)?  Hint: your employer and the Unemployment Insurance Agency are likely to come up with different answers for this one.

The Restrictions of Law v. The Text of Statute

If a lawmaker writes a law that says “it is illegal to turn right on a red light” and we model this, what do we end up with?  Semantically the law is a restriction on behavior.  There is a behavior (turning on the red) that the law intends to reduce the incidence of, either through cooperation or through punishment.  The question is: is the text of the law (the literal words) its own object, separate from the meaning of the words?  If we are writing a text management system, or even a statute management system, there probably is only the text object (the system doesn’t care much about what the words mean).  However, if we attempt to manage meaning, we need to consider that there are objects that represent the behavior we are interested in reducing, such that we could detect (via cameras, say) behavior in the world that was in violation.  The question then becomes: is there one object that represents the restriction and a second that holds the text of the law, or is there just the restriction with a data type property that is the text?

A Creative Work v. A Document

We know that there is a particular rendition of Moby Dick (in English or the Portuguese translation).  Certainly the English and Portuguese documents are different instances.  The real question is: is the “work” (Moby Dick in the slightly abstract) a different instance, and do we need it dragging around with each rendition (i.e., the Portuguese Moby Dick is a derivative of the creative work)?

Government Organization v. Region Governed

When we speak of Ukraine, are we referring to the governing body, which is an organization, or the region (recently diminished) that the government holds sway over?  Should we have one instance that represents the government and the region or two that are linked?

Specification v. Model

When companies design and build products they often create specifications (it has 8 GB of memory, is 8 inches wide and 2 inches tall, etc.) and they also create “models” which they usually name (iPhone 6 for instance).  Is the specification a separate object from the model, or is there just one object?

Position v. Incumbent

Barack Obama is the President of the United States.  Is that two instances or one?

Actor v. Role

When Val Kilmer played Doc Holliday in Tombstone, was there one instance (Val Kilmer) who was a Person and was a role, or are there two instances, the role and the person?

Event v. Time Interval

We say an event is something that happened over a particular time interval.  So a particular concert, your attendance at the staff meeting Tuesday morning or World War II would all be considered events.  Each of course has a beginning and ending date and time.  The question is: is the time interval (May 22 from 9:00 AM to 10:00 AM) a separate instance from the staff meeting that occurred over that interval?

Diagnosis v. Disease

Up until the moment we are diagnosed with cancer, or diabetes, or even toenail fungus, we are unaware of having the disease.  The diagnosis and the disease seem to coexist in most cases.  Are they two things or one?

Person v. Legal Person

We’ve seen systems that focus on the distinction between the flesh and blood person and the social artifact that is allowed to enter into contracts. Two instances or one?

Organization v. Organization in Role

In some systems we’ve seen recently there is a distinction between an Organization (say Goldman Sachs) and an Organization in a Role (Goldman Sachs as an Underwriter v. Goldman Sachs as a Trader).

Contract Document v. Financial Agreement

Two parties agree to a complex financial transaction.  They paper it up with a contract that they sign.  If we model the essence of their agreement, is it a separate instance from the written contract?  If not, how?

Person v. Patient

As a matter of history, your medical record is attached to your patient ID. If you’ve been to many medical institutions you have many patient IDs.  The question is, at any one of them are there two instances (Person and Patient) or one instance who is both Person and Patient?

Person v. Address

This one is hilarious.  Of course a person is separate from his or her address.  Except in almost every system ever built, where a person’s addresses are merely attributes attached to the Person record.  When should we make the two distinct instances?

Planned Task v. Completed Task

If we plan a vacation, that is what we would call a Planned Event. We can book flights, hotels and the like and continue to add to this instance.  When we finally go on the vacation, we’ve created an actual or historical event.  Is there one event that changed state from planned to actual, or two events?

Person v. Sole Proprietor

Many independent contractors file tax returns as “Sole Proprietors.”  Should we consider the person as a separate entity from the Sole Proprietor?

Part v. Catalog Item

Our definition of a Catalog Item is the description of parts to a sufficient level of detail that a buyer would accept any item offered that met the description.  The Catalog Item typically has a part number, in retail a UPC.  The physical part also has the same UPC. Is the part a different item from the Catalog Item?

Customer v. (Person or Organization)

Is the person or organization that purchased your product or received your services your customer, or is there another instance that represents your relationship with that entity?  Norms in your industry or limitations of your development environment probably color your answer here more than you think.

Relational technology makes it a relatively unnatural act to have, say, a Person table and an Organization table and then an order table with a foreign key to one or the other.  It’s far more “natural” in relational to have another table that represents the role of the customer.  Even if you have a “party” table (which both the Person and the Organization extend), you have created another instance.  There is an id for each entry in the Party table, an id for each entry in the Organization table (with a foreign key to the party) and an id for each entry in the Person table (with a foreign key to the party).  Even without the role concept, there is an extra instance there.

Having a technology that allows us to have a single id to represent either a Person or Organization (Object Oriented or Semantic Technology) doesn’t get us completely out of the woods.  Now we could have the order refer directly to the Person or Organization.  Now the question becomes: should we?

I have been told by a data modeler from an Australian airline, that many of the people riding in an airplane are not customers.  The only ones they consider to be customers are those that belong to their frequent flyer program.  This makes some sense: they need to keep track of the miles and segments flown and accumulate them, only for the frequent flyers.  Additionally they incur obligations (to redeem balances for flights) but again only for the frequent flyers.

Pictorially

What we’re talking about is: are there two different things, that each have their own identity and properties, but that occur as a pair:

[Figure: binary instances]

Or is there really just one thing, and it is the conventions of our speech that make us think there are two things when really all the properties are on the one thing?
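The two alternatives can be put side by side as data. The sketch below uses the Person v. Patient example with plain tuples standing in for triples. Every identifier, property name and value in it is hypothetical and chosen only for illustration.

```python
# Hypothetical identifiers, properties and values, for illustration only.

# One instance: a single identifier carries both types and all of the properties.
unary = {
    (":_Pat1", "rdf:type", ":Person"),
    (":_Pat1", "rdf:type", ":Patient"),
    (":_Pat1", ":dateOfBirth", "1970-01-01"),
    (":_Pat1", ":medicalRecordNumber", "MRN-001"),
}

# Two instances: each has its own identity and properties, and a link pairs them.
binary = {
    (":_Person1", "rdf:type", ":Person"),
    (":_Person1", ":dateOfBirth", "1970-01-01"),
    (":_Patient1", "rdf:type", ":Patient"),
    (":_Patient1", ":medicalRecordNumber", "MRN-001"),
    (":_Patient1", ":isRoleOf", ":_Person1"),
}


def subjects(triples):
    """Identifiers that appear as the subject of at least one triple."""
    return {s for s, _, _ in triples}


def properties_of(triples, subject):
    """Predicates attached directly to one identifier."""
    return {p for s, p, _ in triples if s == subject}
```

In the binary form every property has to be assigned to one identifier or the other, and a question such as "what is this patient's date of birth?" requires following the link between the pair.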

Historical Perspective

Very often design decisions are influenced by the tools that we use to implement solutions. We protest that our designs are independent of target architectures but years of designing databases and then converting them to relational DBMSs lead to thinking in design terms that more easily translate.

One implication is that relational DBMSs (and most Object Oriented languages) tend to see a class as a template for instances.  This has a tendency to suggest that instances that have properties not shared by most of the other instances should be shuttled off to another table.  This almost always ends up creating additional primary keys in other tables and therefore binary instances for anything that is in both tables.  Designers brought up on relational will be inclined to think of the Person and the Patient as two different instances (this isn’t wrong so much as it is an indication of how our experience shapes our design choices).

In an analogous fashion, Object Oriented developers often invoke the Decorator Pattern (from the Gang of Four Pattern Language).  In the decorator pattern, some of the functionality has been shuffled off to a companion object.  People from this background will tend to see the decorator as a separate individual.

Principles

Our starting point is ten principles: the first principle is: if at all possible have one instance.  The next eight principles suggest circumstances where one instance is not appropriate.  The last one, we call the ambiguity trump, says even if the principles suggest two instances are needed to model the concept in question, you have a final override to say: in this domain we don’t care enough about the distinction and are willing to live with the ambiguity.

Principle 1 – Ockham’s Razor – “Entities should not be multiplied needlessly” The first principle here says the benefit of the doubt goes to simplicity.  If you can represent the concept adequately with one instance, then by all means do so.  This should be the starting point.  Start by imagining one instance.

A second consideration for sticking with one, even if you are tempted by previous designs, habits, industry norms etc., is: with a binary set of objects, each property (predicate) that is to be attached to the concept, must be attached to one or the other.  If you find it difficult to decide which of the two the property belongs on, and you end up making arbitrary choices, you should really consider sticking with one.

Principle 2 – Cardinality – There are two aspects of the concept, and you’re considering whether to devote an instance to each.  One of the trump concepts is: can you have more than one of one aspect for each one of the other?  This is trickier than it first sounds, because we have fooled ourselves a lot over time with the way we couch the question.  One of the clearer cases is Person and Sole Proprietor.  Normally “Joe Jones, the plumber” is “Joe Jones” and when he files his taxes as a Sole Proprietor, the proprietorship is Joe.  Certainly he doesn’t have the firewall that he would have had, had he incorporated.  “Joe Jones, LLC” is recognized as a separate entity, can contract on its own behalf, and can, at least in theory, declare bankruptcy without bankrupting Joe.  So the corporate case is clearly two or more instances.  At first it would seem that the sole proprietor should fall back to principle 1. However, it turns out that Joe can have multiple Sole Proprietorships.  It doesn’t happen often, but the existence of this case makes the case that there must be something different between Joe and his Sole Proprietorship.

Principle 3 – Potential Instance Separation – Is it possible to separate the two aspects that are being potentially represented by two instances?  Can you have the statue without the bronze or vice versa? (Probably not, and this argues for one.)  Can you have a waterway without the river? (It seems a dry riverbed would satisfy the waterway without being a river, which argues for potential separation.)  Can some properties only logically apply to one of the pair and not the other?

Principle 4 – Separate properties – are there properties that would apply only to one of the instances?  For instance a property like “annual rainfall” would apply to a country region but not to the country government.   Often the different properties are shining a light on something deeper: that there are really two different types of things yearning to be separated.  In the case of the customer v. Person or Organization, when you start entertaining adding properties (number of segments flown, miles about to expire etc.) you may realize that the entity with the balances is actually an agreement.

Principle 5 – Behavioral Impact – do most (all?) real world behaviors that apply to one also apply to the other? If we end an employee (employment really) have we ended (killed) the person? (No wonder so many people cringe at the thought of termination.)

Principle 6 – Inference from Definition – if we have formal definitions for the classes that make sense and an inference engine infers one to be a subclass of the other, that makes a case for one instance.  If the formal definitions put the two in disjoint classes, that is a strong argument for two instances.

Principle 7 – Identity Function – is the way we establish whether we already have an instance different for one of these than for the other?  The identity function is a set of properties that we use to figure out whether we already have a particular instance in our database.  For instance if the identity function for Person is SSN + Date of Birth, and so is the identity function for Employee, then it argues for one instance (it may be that the identity functions are wrong, but it should at least have us pause to reflect).
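As a rough illustration of the identity-function test, the sketch below treats an identity function as a tuple of property names and compares the two. The property names, record layout and sample values are assumptions made for the example, not a recommendation for how to identify people.

```python
# Hypothetical identity functions: the properties used to decide
# "do we already have this instance?" for each class.
IDENTITY_KEYS = {
    "Person": ("ssn", "date_of_birth"),
    "Employee": ("ssn", "date_of_birth"),
}


def identity(record, kind):
    """Build the identity key for a record, using the identity function of its class."""
    return tuple(record[key] for key in IDENTITY_KEYS[kind])


def same_identity_function(kind_a, kind_b):
    """True when both classes are identified by exactly the same properties."""
    return IDENTITY_KEYS[kind_a] == IDENTITY_KEYS[kind_b]


# Made-up sample records.
person = {"ssn": "000-00-0000", "date_of_birth": "1980-05-22", "name": "Jane"}
employee = {"ssn": "000-00-0000", "date_of_birth": "1980-05-22", "salary": 50000}

# Same identity function and same key values: a hint that one instance may be enough.
argues_for_one = (
    same_identity_function("Person", "Employee")
    and identity(person, "Person") == identity(employee, "Employee")
)
```

If the two classes were identified by different properties (say, an employee number for Employee), the comparison would fail, and that would be one piece of evidence for two instances.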

Principle 8 – Granularity – Sometimes the two instances are trying to represent different levels of specificity.  For instance the difference between a Product Model and a Catalog Item may be level of detail.  If there are so many Product Models (or so little variation offered) that the Product Model and Catalog Item are at the same granularity, they could be considered one instance.  If however they are at different levels of detail, it makes the case for two instances.

Principle 9 – Temporal Difference – if one instance can end independently of the other, that is, if they have different lifetimes, it suggests two instances.

Principle 10 – Tolerating Ambiguity – there are cases where the above analysis suggests that there should be, and semantically there are, two instances, but in our domain we really don’t care.  For instance we may be convinced that the GeoRegion of a country is different from the organization that governs it, but for our application or domain, which will not exercise any of the properties that would highlight that difference, we may say we really don’t care.  In this case we would suggest creating a supertype of the two classes, and instantiating the supertype. So for instance you may create the class of GeoPoliticalEntities as the union of GeoRegion and Government Organization.  Make your instances of the supertype.  What this does is twofold:

  • If you later decide that you do need to make a distinction, very few things you’ve built to date will be adversely affected. Anything that didn’t care whether you were talking about a region or a government will still not care after you make that distinction.
  • If you have to interface with applications or domains that do make the distinction you will have what you need to incorporate their distinctions without upsetting your part of the system.
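The supertype approach can be sketched as follows. This is a simplification under stated assumptions: the union class is approximated by a plain subclass map, each individual has a single asserted class, and the individual names (:_Freedonia, :_FreedoniaGovernment) are hypothetical.

```python
# Simplified class hierarchy: both specific classes sit under the supertype.
# (A real ontology would define the supertype as a union; a subclass map is enough here.)
SUBCLASS_OF = {
    ":GeoRegion": ":GeoPoliticalEntity",
    ":GovernmentOrganization": ":GeoPoliticalEntity",
}


def is_a(cls, ancestor):
    """True when cls is the ancestor itself or one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instances_of(asserted_types, ancestor):
    """Individuals whose asserted class falls under the given ancestor, sorted by name."""
    return sorted(i for i, cls in asserted_types.items() if is_a(cls, ancestor))


# Today: we tolerate the ambiguity and instantiate only the supertype.
types_today = {":_Freedonia": ":GeoPoliticalEntity"}

# Later: the distinction matters, so the region and the government become separate individuals.
types_later = {
    ":_Freedonia": ":GeoRegion",
    ":_FreedoniaGovernment": ":GovernmentOrganization",
}

# Code that only asked about the supertype keeps working before and after the split.
before = instances_of(types_today, ":GeoPoliticalEntity")
after = instances_of(types_later, ":GeoPoliticalEntity")
```

Anything written against :GeoPoliticalEntity is unaffected by the later refinement, which is the first of the two benefits listed above.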

Re-examining the examples in light of the principles

Let’s return to the examples we introduced in the beginning and see if the principles shine any light on them.  Note: there will still be situations and domains that come to different conclusions, but we think these are the conclusions the above principles lead to.

Design Example | Proposal (one instance or two) | Principled Evidence
Statue v. Bronze | 1 | Principle 1: if you steal the statue you’ve stolen the bronze.  They’re really inseparable.  Also principle 7: how we establish the identity of the item (say we have an RFID tag on the statue; it is also identifying the bronze).
Person v. Employee | 2 for employers, 1 for unemployment | Principle 2 (you can have two jobs at a time), principle 4 (your employment has a salary and seniority, you don’t; you have a birthday, your employee role doesn’t) and principle 9 (your job can end before you do) argue for 2.  However, the Unemployment Division point of view argues for one: a formal definition of someone who is employed (has at least one job) argues for one by principle 6, and the cardinality argument works the other way (your second job doesn’t alter the unemployment rate).
The Restrictions of Law v. The Text of Statute | 2 | Principle 8 (granularity) and principle 2 (cardinality).  When we start to interpret the law and get it to the point that we can begin having systems make at least some initial determination of the legality of an action, we find that a given law is many restrictions, at many levels of detail.
A Creative Work v. A Document | 2 | Principle 2 (many derivatives from a single work).
Government Organization v. Region Governed | 2 | Principle 3 (we can separate the government from the land, and the land area can change without changing the government; sorry, Ukraine) and principle 4 (there are properties, such as rainfall, that apply to one and not the other).
Specification v. Model | 2 | Principle 8: in most cases the specification is at a lower level of detail than the product model (color is typically not part of the product model, but is typically in the specification, and in most product domains different colors of the same product are not equally interchangeable).
Position v. Incumbent | 2 | Principle 9 (the position usually outlives the incumbent) and also occasionally principle 2 (can have co-presidents, two people in one position).
Actor v. Role | 2 | Principle 2 (Greater Tuna, where two actors played all the roles).
Event v. Time Interval | 1 | Principle 6 (if a time interval is defined as having a start and an end, and so is an event, the event is a time interval).
Diagnosis v. Disease | 2 | Even though they initially co-exist, they soon develop their own timelines (principle 9) and properties.
Person v. Legal Person | 1 | Principle 1: the person is the legal person; there isn’t another entity to hide behind.  None of the other principles argues for 2.  Legal Person is a type of Person, except in the case where it means Organization, and in that case they are separate because of principle 6: they are disjoint and can’t be the same.
Organization v. Organization in Role | 1, unless there is something formal set up to establish the extra role | Even though there is a bit of temptation from principle 9, it isn’t convincing.  If you participate as a buyer in one transaction and a seller in another, are you three entities (yourself, you the buyer and you the seller)?  No, not really.  Only if there is something formal set up.  In the airline industry the difference between a customer (has a role, and therefore 2 entities) and a passenger (doesn’t, so one entity) is the frequent flyer agreement, where they are accumulating miles, getting various metal colors etc.
Contract Document v. Financial Agreement | 1 | Principle 1: the document is a representation of the agreement.  Where there are cardinality issues (the contract/agreement contains many obligations) the cardinality is true of both, in the same way (if the contract has 6 obligations so does the agreement).
Person v. Patient | 1 | Principle 1.  Unlike the cat with nine lives, the person that has 9 patient identities will die if any of them die, and will have drug/drug interactions regardless of which patients you give the drugs.
Person v. Address | 2 | Principle 1.  Addresses are not attributes of people.  Addresses are attributes of buildings that people live in and work in, which are obviously separate entities.
Planned Task v. Completed Task | 1 for personal, 2 for hospital or project management | Principle 2 (cardinality) trumps for any organization that has to keep track of either multiple appointments for one visit, or multiple reschedulings for the same task.  Where that doesn’t apply (say your vacation plan or personal to-dos) you can just have one task that transitions from planned to actual by merely being done.  In a way that is principle 10: there may be a difference in personal task management, but we just don’t care.
Person v. Sole Proprietor | 2 | Principle 2 (cardinality): since we can have multiple sole proprietorships, we need to allow for two.
Part v. Catalog Item | 2 | Principle 4: while they both appear to have some of the same characteristics (weight for instance) they aren’t really the same.  That is a structural similarity, not a semantic similarity.  A catalog with parts that weigh thousands of pounds can be picked up with a single hand.
Customer v. (Person or Organization) | 1 unless there is a separate agreement, then 2 | Principle 4: it is the existence of a separate agreement (separate from the individual order) that is the second instance.  Really the second instance isn’t “customer” but “customer agreement.”  In the absence of a second agreement (master agreement, frequent shopper agreement etc.) there is only need for one.

Greatest hits from the Data-Centric Manifesto

I was just reading through what some folks have written on the Data-Centric Manifesto web site. Thought I’d capture some of the more poignant:

“I believe [Linked] Data Centric approach is the way of the future. I am committing my company to assisting enterprises in their quest to Data-Centric transformation.” -Alex Jouravlev

 

“I have experienced first-hand in my former company the ravages of application-centric architectures. Development teams have rejected SQL-based solutions that performed 10 to 100 times better with less code and fewer resources, all because of application-centric dogma. Databases provide functional services, not just technical services – otherwise they’re not worth the money.” – Stew Ashton

 

“I use THE DATA-CENTRIC MANIFESTO as a mantra, a guide-line, a framework, an approach and a method, with which to add value as a consultant to large enterprises.” -Mark Besaans

 

“A data-centric approach will finally allow IT to really support the way we think and work instead of forcing us to think in capabilities of an application.” -Mark Schenk

 

“The principles of a data-centric approach would seem obvious, but the proliferation of application-centric implementations continues. Recognizing the difference is critical to positive change, and the benefits organizations want and need.” -Kim L Hoover

Data-centric is a major departure from the current application-centric approach to systems development and management. Migration to the data-centric approach will not happen by itself. It needs champions. If you’re ready to consider the possibility that systems could be more than an order of magnitude cheaper and more flexible, then become a signatory of the Data-Centric Manifesto.

Read more here.

Do Data Lakes Make My Enterprise Look Data-Centric?

Dave McComb discusses data lakes, schema, and data-centricity in his latest post on the Data Centric Revolution for The Data Administration Newsletter. Here’s a brief excerpt to pique your interest:

“I think it is safe to say that there will be declared successes in the Data Lake movement. A clever data scientist, given petabytes of data to troll through, will find insights that will be of use to the enterprise. The more enterprising will use machine learning techniques to speed up their exploration and will uncover additional insights.

But in the broader sense, we think the Data Lake movement will not succeed in changing the economics or overall architecture of the enterprise. In a way, the Data Lake is something to do instead of dealing with the very significant problems of legacy ecosystems and dis-economics of change.

Even at the analytics level, where the Data Lake has the most promise, we think it will fall short…

Conceptually, the Data Lake is not far off from the Data Centric Revolution. The data does have a more central position. However, there are three things that a Data Lake needs in order to be Data Centric…”

Click here to read the entire article.

 

Data-Centric vs. Data-Driven

In this column, I am making the case for data-centric architectures for enterprises.  There is a huge economic advantage to converting to the data-centric approach, but curiously few companies are making the transition. One reason may be the confusion of data-centric with data-driven, and the belief that you are already on the road to data-centric nirvana, when in fact you are nowhere near it.

Data-Centric

Data-centric refers to an architecture where data is the primary and permanent asset, and applications come and go.  In the data-centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone.

Many people may think this is what happens now or what should happen.  But it very rarely happens this way.  Businesses want functionality, and they purchase or build application systems.  Each application system has its own data model, and its code is inextricably tied with this data model.  It is extremely difficult to change the data model of an implemented application system, as there may be millions of lines of code dependent on the existing model.

Of course, this application is only one of hundreds or thousands of such systems in an enterprise.  Each application on its own has hundreds to thousands of tables and tens of thousands of attributes. These applications are very partially and very unstably “interfaced” to one another through some middleware that periodically schleps data from one database to another.

The data-centric approach turns all this on its head. There is a data model, specifically a semantic data model (more on that in a subsequent white paper), and each bit of application functionality reads and writes through the shared model. If there is application functionality that calculates suggested reorder quantities for widgets, it will make its suggestion and add it to the shared database, using the common core terms. Any other system can access the suggestions and know what they mean. If the reordering functionality goes away tomorrow, the suggestions will still be there.
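The reorder example can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `SharedStore`, the `suggestedReorderQuantity` term, and the naive demand-times-lead-time formula are hypothetical stand-ins, not part of any real system described in the article.

```python
class SharedStore:
    """Hypothetical stand-in for the shared database governed by one core model."""

    def __init__(self):
        # Facts are kept as (subject, predicate, object) tuples.
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        # Return every value recorded for this subject and predicate.
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)


def suggest_reorder(store, item, daily_demand, lead_time_days):
    """Hypothetical application function; it writes its result using a shared core term."""
    quantity = daily_demand * lead_time_days
    store.add(item, "suggestedReorderQuantity", quantity)
    return quantity


store = SharedStore()
suggest_reorder(store, "widget-42", daily_demand=10, lead_time_days=7)

# Simulate the reordering functionality going away.
del suggest_reorder

# Any other functionality can still read the suggestion through the shared term.
remaining = store.objects("widget-42", "suggestedReorderQuantity")
print(remaining)
```

The point of the sketch is only that the suggestion lives in the shared store under a commonly understood term, so it outlives the code that produced it.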

Click here to read more on TDAN.com

Debugging Enterprise Ontologies

Michael Uschold gave a talk at the International Workshop on Completing and Debugging the Semantic Web held in Crete on May 30, 2016.   Here is a preview of the white paper, “Finding and Avoiding Bugs in Enterprise Ontologies” by Michael Uschold:

Finding and Avoiding Bugs in Enterprise Ontologies

Abstract: We report on ten years of experience building enterprise ontologies for commercial clients. We describe key properties that an enterprise ontology should have, and illustrate them with many real world examples. They are: correctness, understandability, usability, and completeness. We give tips and guidelines for how best to use inference and explanations to identify and track down problems. We describe a variety of techniques that catch bugs that an inference engine will not find, at least not on its own. We describe the importance of populating the ontology with data to drive out more bugs. We point out some common ontology design practices in the community that lead to bugs in ontologies and in downstream semantic web applications based on the ontologies. These include proliferation of namespaces, proliferation of properties and inappropriate use of domain and range. We recommend doing things differently to prevent bugs from arising.

Introduction
In a manner analogous to software debugging, ontologies need to be rid of their flaws. The types of flaws to be found in an ontology are slightly different than those found in software, and revolve around the ideas of correctness, understandability, usability and completeness. We report on our experience (spanning more than a decade) in building and debugging enterprise ontologies for large companies in a wide variety of industries including: finance, healthcare, legal research, consumer products, electrical devices, manufacturing and digital assets. For the growing number of companies starting to use ontologies, the norm is to build a single ontology for a point solution in one corner of the business. For large companies, this leads to any number of independently developed ontologies resulting in many of the same heterogeneity problems that ontologies are supposed to solve. It would help if they all used the same upper ontology, but most upper ontologies are unsuitable for enterprise use. They are hard to understand and use because they are large and complex, containing much more than is necessary, or the focus is too academic to be of use in a business setting. So the first step is to start with a small, upper, enterprise ontology such as gist [McComb 2006], which includes core concepts relevant to almost any enterprise. The resulting enterprise ontology itself will consist of a mixture of concepts that are important to any enterprise in a given industry, and those that are important to a particular enterprise. An enterprise ontology plays the role of an upper ontology for all the ontologies in a company (Fig. 1). Major divisions will import and extend it. Ontologies that are specific to particular applications will, in turn, import and extend those. The enterprise ontology evolves to be the semantic foundation for all major software systems and databases that are core to the enterprise.

Click here to download the white paper.

Click here to download the presentation.

Evolve your Non-Temporal Database in Place

At Semantic Arts, we recently decided to upgrade our internal system to turn something that was not temporal (our billing rates) into something that was. Normally, that would be a pretty big change. As it turned out, it was straightforward and could be done as an in-place update. It made a good mini case study of how using semantics and a graph database can make these kinds of changes far less painful.

So, Dave McComb documented it in a YouTube video.

Click here to view: Upgrade a non Temporal Database in Place

Introduction to FIBO Quick Start

We have just launched our “FIBO Quick Start” offering. If you are in the financial industry, you have likely heard about the Financial Industry Business Ontology, which has been championed by the EDM Council, a consortium of virtually the entire who’s who of the financial industry. We’ve been helping with FIBO almost since its inception, and more recently Michael Uschold has been co-leading the mortgage and loan ontology development effort. Along the way we’ve done several major projects for financial clients, and have reduced what we know to a safe and quick approach to adopting semantics in the financial sector. We have the capacity to take on one more client in the financial space, so if you’re interested, by all means contact us.

FIBO Quick Start: Developing Business Value Rapidly with Semantics

The Financial Industry Business Ontology is nearing completion. As of June 2016, nine major financial institutions have joined the early adopter program. It is reasonable to expect that in the future all Financial Industry participants will have aligned some of their systems with FIBO. Most have focused their initial projects on incorporating the FIBO vocabulary. This is a good first step and can jump start a lot of compliance work.

But the huge winners, in our opinion, will be the few institutions that see the potential and go all-in with this approach. For sixteen years, we have been working with large enterprises who are interested in adopting semantic technology. Initially, our work focused on architecture and design as firms experimented with ways to incorporate these new approaches. More recently, we have been implementing what we call the “data-centric approach” to building semantically-centered systems in an agile fashion.

Click here to read more. 

Data-Centric and Model Driven

Model Driven Development

Model Driven seems to be enjoying a bit of an upsurge lately.  Gartner has recently been hyping (is it fair to accuse the inventors of the hype curve of hyping something? or is it redundant?) what they call “low code/ no code” environments.

Perhaps they are picking up on and reporting a trend, or perhaps they are creating one.

Model Driven Development has been around for a long time. To fill in what it is and where it came from, I’m going to recount my experience with Model Driven, as I think it provides a first-person narrative for most of what was happening in the field at the time.

I first encountered what would later be called Model Driven in the early 80’s when CAD (Computer Aided Design—of buildings and manufactured parts) was making software developers jealous.  Why didn’t we have workbenches where we could generate systems from designs?  Early experiments coalesced into CASE (Computer Aided Software Engineering).  I was running a custom ERP development project in the early 80’s (on an ICL Mainframe!) and we ended up building our own CASE platform.  The interesting thing about that platform was that we built the designs on recently acquired 8-bit microcomputers, which we then pushed to a compatible framework on the mainframe.  We were able to iterate our designs on the PCs, work out the logistical issues, and get a working prototype UI to review with the users before we committed to the build.

The framework built a scaffold of code based on the prototype and indicated where the custom code needed to go.  This forever changed my perspective on how systems could and should be built.

What we built was also being built at the same time by commercial vendors (we did this project in Papua New Guinea and were pretty out of the loop as to what was happening in mainstream circles). When we came up for air, we discovered what we had built was being called “I-CASE” (Integrated Computer Aided Software Engineering), which referred to the integration of design with development (seemed like that was the idea all along). I assume Gartner would call this approach “low code” as there still was application code to be written for the non-boilerplate functionality.

Next stop on my journey through model driven was another custom ERP build. By the late 80’s a few new trends had emerged. One was that CAD was being invaded by parametric modeling. Parametric modeling recognizes that many designs of physical products do not need to be redone by a human every time a small change is made to the input factors. A motor mount could be designed in such a way that a change to the weight, position, and torque would drive a new design optimized for those new factors. The trusses for a basketball court could be automatically redesigned if the span, weight, or snow load changed, and the design of big box retail outlets could be derived from, among other things, wind shear, maximum rainfall, and seismic potential.

The other trend was AI (remember AI?  Oh yeah, of course you remember AI, which you forgot about from the early 90’s until Watson and Google’s renaissance of AI).

Being privy to these two trends, we decided to build a parametric model of applications and have the code generation be driven by AI. Our goal was to be able to design a use case on a post-it note. We didn’t quite achieve our goal. Most of our designs were up to a page long. But this was a big improvement over what was on offer at the time. We managed to generate 97% of the code in this very sophisticated ERP system. While it was not a very big company, I have yet to see more complex requirements in any other system (lot-based inventory, multi-modal outbound logistics, a full ISO 9000 compliant laboratory information management system, in-line QA, complex real-time product disposition based on physical and chemical characteristics of each lot).

In the mid 90’s we were working on systems for ambulatory health care. We were building semantic models for our domain. Instead of parametric modeling, we defined all application behavior in a scripting language called Tcl. One day we drew on a whiteboard where all the Tcl scripts fit in the architecture (they defined the UI, the constraint logic, the schema, etc.). It occurred to us that with the right architecture, the Tcl code, and therefore the behavior of the application, could be reduced to data. The architecture would interpret the data and create the equivalent of application behavior.
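To make "behavior reduced to data" concrete, here is a minimal sketch in Python. It is not the patented architecture or the original Tcl design; the model format, the field names, and the `validate` and `render_form` helpers are all hypothetical, chosen only to show one generic interpreter deriving both constraint checking and a rudimentary form from the same data.

```python
# Hypothetical model expressed purely as data: no per-field application code.
PATIENT_VISIT_MODEL = {
    "patient_name": {"type": str, "required": True},
    "systolic_bp": {"type": int, "required": False, "min": 50, "max": 300},
}


def validate(model, record):
    """Interpret the model at run time and return a list of constraint violations."""
    errors = []
    for field, rules in model.items():
        if field not in record:
            if rules.get("required"):
                errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: below {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: above {rules['max']}")
    return errors


def render_form(model):
    """Derive a trivial text 'UI' from the same data; '*' marks required fields."""
    return [f"{name}{' *' if rules.get('required') else ''}" for name, rules in model.items()]


print(render_form(PATIENT_VISIT_MODEL))
print(validate(PATIENT_VISIT_MODEL, {"systolic_bp": 400}))
```

Changing the application's behavior in this sketch means editing the dictionary, not the interpreter, which is the property the paragraph above describes.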

We received what I believe to be the original patents on fully model driven application development (patent number 6,324,682). We eventually built an architecture that would interpret the data models and build user interfaces, constraints, transactions, and even schemas. We built several healthcare applications in this architecture and were rolling out many more when our need for capital and the collapse of the dot-com bubble ended this company.

I offer this up as a personal history of the “low code / no code” movement.  It is not only real, as far as we are concerned, but its value is underrepresented in the hype.

Data-Centric Architecture

More recently we have become attracted to the opportunity that lies in helping companies become data-centric.  This data-centric focus has mostly come from our work with semantics and enterprise ontology development.

What we discovered is that when an enterprise embraces the elegant core model that drives its business, all its problems become tractable. Integration becomes a matter of conforming to the core. New system development becomes building to a much, much simpler core model.

Most of these benefits come without embracing model driven.  There is amazing economy in reducing the size of your enterprise data model by two orders of magnitude.

Click here to read more on TDAN.com