Sometimes when we’re designing ontologies we’re faced with design choices that would lead us to create what we call “binary instances” or a situation where it will take the instantiation of two instances (often of different classes) in order to capture one concept. For instance we may be considering creating a patient instance that is different from the corresponding person instance.
In an effort to move this design decision from the realm of arbitrary designers choice to something more principled, this article we explore the factors that go into a decision that leads to binary instances.
This section will outline some examples that we have come across, as it is often easier to work from a large pallet of examples than from abstractions. Some of these examples may seem odd, some you may be surprised that anyone would consider them either one way or the other (binary or unary) but we have seen these at various times.
My guess is your background and predisposition will cause you to look at each one of these and say, either “obviously one instance” or “obviously two instances” but we suggest that any of these could go either way (a few are a bit of a stretch, but bear with it, we’re trying to make a point). After the examples we introduce some principles that we think will lead to reasonably consistent decisions in this arena.
Statue v. Bronze
This is a classic philosophical argument. What is the difference between the statue and the clay, or bronze. The knee jerk reaction is to think they are two things, but consider: if you have a 10-pound statute made out of 10 pounds of bronze, when you go to ship it will you be charged for 20 pounds of freight or 10?
Person v. Employee
When you take on a job, are you two things (person and employee) or one thing (person who is an employee). Hint: your employer and the Unemployment Insurance Agency are likely to come up with different answers for this one.
The Restrictions of Law v. The Text of Statute
If a lawmaker writes a law that says “it is illegal to turn right on a red light” and we model this. What do we end up with? Semantically the law is a restriction on behavior. Tthere is a behavior (turning on the red) that the law intends to reduce the incidence of, either through cooperation or through punishment. The question is: is the text of law (the literal words) its own object, separate from the meaning of the words. If we are writing a text management system, or even a statute management system, there probably is only the text object (the system doesn’t care much about what the words mean). However if we attempt to manage meaning, we need to consider that there are objects that represent the behavior we are interested in reducing, such that we could detect (via cameras say) behavior in the world that was in violation. The question then becomes: is there one object that represents the restriction and a second that holds the text of the law, or is there just the restriction with a data type property that is the text?
A Creative Work v A Document
We know that there is a particular rendition of Moby Dick (in English or the Portuguese translation). Certainly the English and Portuguese documents are different instances. The real question is: is the recognition of the “work” (Moby Dick in the slightly abstract) a different instance, and do we need it dragging around with each rendition ( i.e. The Portuguese Moby Dick is a derivative of the creative work)
Government Organization v. Region Governed
When we speak of the Ukraine, are we referring to the governing body, which is an organization, or the region (recently diminished) that the government holds sway over. Should we have one instance that represents the government and the region or two that are linked?
Specification v Model
When companies design and build products they often create specifications (is has 8 GB of memory, is 8 inches wide, and 2 inches tall, etc) and they also create “models” which they usually name (iPhone 6 for instance). Is the specification a separate object from the model, or is there just one object?
Position v. Incumbent
Barack Obama is the President of the United States. Is that two instances or one?
Actor v. Role
When Val Kilmer played Doc Holliday in Tombstone, was there one instance (Val Kilmer) who was a Person and was a role, or are there two instances, the role and the person?
Event v. Time Interval
We say an event is something that happened over a particular time interval. So a particular concert, your attendance at the staff meeting Tuesday morning or World War II would all be considered events. Each of course has a beginning and ending date and time. The question is: is the time interval (May 22 from 9:00 AM to 10:00 AM) a separate instance from the staff meeting that occurred over that interval?
Diagnosis v. Disease
Up until the moment we are diagnosed with Cancer, or Diabetes, or even Toe nail fungus, we were unaware of our having the disease. The diagnosis and the disease seem to coexist in most cases. Are they two things or one?
Person v. Legal Person
We’ve seen systems that focus on the distinction between the flesh and blood person and the social artifact that is allowed to enter into contract. Two instances or one?
Organization v. Organization in Role
In some systems we’ve seen recently there is a distinction between an Organization (say Goldman Sachs) and an Organization in a Role (Goldman Sachs as an Underwriter v. Goldman Sachs as a Trader)
Contract Document v. Financial Agreement
Two parties agree to a complex financial transaction. They paper it up with a contract that they sign. If we model the essence of their agreement is it a separate instance from the written contract? If not, how?
Person v. Patient
As a matter of history, your medical record is attached to your patient ID. If you’ve been to many medical institutions you have many patient IDs. The question is, at any one of them are there two instances (Person and Patient) or one instance who is both Person and Patient?
Person v. Address
This one is hilarious. Of course a person is separate from his or her address. Except in almost every system ever built, where a persons address are merely attributes attached to the Person record. When should we make the two distinct instances?
Planned Task v. Completed Task
If we plan a vacation, that is what we would call a Planned Event. We can book flights, hotels and the like and continue to add to this instance. When we finally go on the vacation, we’ve created an actual or historical event. Is there one event that changed state from planned to actual, or two events?
Person v. Sole Proprietor
Many independent contractors file tax returns as “Sole Proprietors” should we consider the person as a separate entity from the Sole Proprietor?
Part v. Catalog Item
Our definition of a Catalog Item, is the description of parts to a sufficient level of detail that a buyer would accept any item offered that met the description. The Catalog Item typically has a part number, in retail a UPC. The physical part also has the same UPC. Is the part a different item from the Catalog Item.
Customer v. (Person or Organization)
Is your customer, the person or organization that purchased your product or received your services, your customer, or is there another instance that represents your relationship with that entity? Norms in your industry or limitations of your development environment probably color your answer here more than you think.
Relational technology makes it a relatively unnatural act to have say a Person table and an Organization table and then an order table with a foreign key to one or the other. It’s far more “natural” in relational to have another table that represents the role of the customer. Even if you have a “party” table, (which both the Person and the Organization extend) you have created another instance. There is an id for each entry in the Party table, an id for each entry in the organization table (with a foreign key to the party) and an id for each entry in the person table (with a foreign key to the party). Even without the role concept, there is an extra instance there.
Having a technology that allows us to have a single id to represent either a Person or Organization (Object Oriented or Semantic Technology) doesn’t get us completely out of the woods. Now we could have the order refer directly to the Person or Organization. Now the question becomes: should we?
I have been told by a data modeler from an Australian airline, that many of the people riding in an airplane are not customers. The only ones they consider to be customers are those that belong to their frequent flyer program. This makes some sense: they need to keep track of the miles and segments flown and accumulate them, only for the frequent flyers. Additionally they incur obligations (to redeem balances for flights) but again only for the frequent flyers.
What we’re talking about is: are there two different things, that each have their own identity and properties, but that occur as a pair:
Or is there really just one thing, and it is the conventions of our speech that make us think there are two things when really all the properties are on the one thing.
Very often design decisions are influenced by the tools that we use to implement solutions. We protest that our designs are independent of target architectures but years of designing databases and then converting them to relational DBMSs lead to thinking in design terms that more easily translate.
One implication is that relational DBMSs (and most Object Oriented languages) tend to see a class as a template for instances. This has a tendency to suggest that instances that have properties not shared by most of the other instances should be shuttled off to another table. This almost always ends up creating additional primary keys in other tables and therefore binary instances for anything that is in both tables. Designed brought up on relational will be inclined to think of the Person and the Patient as two different instances (this isn’t wrong as much as it is an indication of how our experience shapes our design choices)
In an analogous fashion, Object Oriented developers often invoke the Decorator Pattern (from the Gang of Four Pattern Language). In the decorator pattern, some functionality has been shuffled off to a companion object that performs some of the functionality. People from this background will tend to see the decorator as a separate individual.
Our starting point is ten principles: the first principle is: if at all possible have one instance. The next eight principles suggest circumstances where one instance is not appropriate. The last one, we call the ambiguity trump, says even if the principles suggest two instances are needed to model the concept in question, you have a final override to say: in this domain we don’t care enough about the distinction and are willing to live with the ambiguity.
Principle 1 – Ockham’s Razor – “Entities should not be multiplied needlessly” The first principle here says the benefit of the doubt goes to simplicity. If you can represent the concept adequately with one instance, then by all means do so. This should be the starting point. Start by imagining one instance.
A second consideration for sticking with one, even if you are tempted by previous designs, habits, industry norms etc., is: with a binary set of objects, each property (predicate) that is to be attached to the concept, must be attached to one or the other. If you find it difficult to decide which of the two the property belongs on, and you end up making arbitrary choices, you should really consider sticking with one.
Principle 2 – Cardinality – There are two aspects of the concept, and you’re considering whether to devote an instance to each. One of the trump concepts is: can you have more than one of one aspect for each one of the other. This is trickier than it first sounds, because we have fooled ourselves a lot over time with the way we couch the question. One of the more clear cases is Person and Sole Proprietor. Normally “Joe Jones, the plumber” is “Joe Jones” and when he files his taxes as a Sole Proprietor, the proprietorship is Joe. Certainly he doesn’t have the firewall that he would have had, had he incorporated. “Joe Jones, LLC” is recognized as a separate entity, can contract on its own behalf, and can, at least in theory, and declare bankruptcy without bankrupting Joe. So the corporate case clearly two or more instances. But at first it would seem that the sole proprietor should fall back to principle 1. However, it turns out that Joe can have multiple Sole Proprietorships. It doesn’t happen often, but the existence of this case, makes the case that there must be something different between Joe and his Sole Proprietorship.
Principle 3 — Potential instance Separation — Is it possible to separate the two aspects that are being potentially represented by two instances? Can you have the statute without the bronze or vice versa? (probably not and this argues for one) Can you have a waterway without the river (seems like a dry riverbed would satisfy the waterway without being a river, argues for potential separation) can some properties only logically apply to one of the pair and not the other?
Principle 4 – Separate properties – are there properties that would apply only to one of the instances? For instance a property like “annual rainfall” would apply to a country region but not to the country government. Often the different properties are shining a light on something deeper: that there are really two different types of things yearning to be separated. In the case of the customer v Person or Organization, when you start entertaining adding properties (number of segments flown, miles about to expire etc.) you may realize that the entity with the balances is actually an agreement.
Principle 5 – Behavioral Impact – do most (all?) real world behaviors that apply to one also apply to the other? If we end an employee (employment really) have we ended (killed) the person (no wonder so many people cringe at the thought of termination).
Principle 6 – Inference from Definition – if we have formal definitions for the classes that make sense and an inference engine infers one to be a subclass of the other, that makes a case for one instance. If the formal definitions put the two in disjoint classes, that is a strong argument for two instances.
Principle 7 – Identify Function – is the way we establish whether we already have an instance different in one or the other of these? The identity function is a set of properties that we use to figure out whether we already have a particular instance in our database. For instance if the identity function for Person is SSN + Date of Birth, and so is the identity function for employee, then it argues for one instance (it may be that the identity functions are wrong, but it should at least have us pause to reflect)
Principle 8 Granularity – Sometimes the two instances are trying to represent different levels of specificity. For instance the difference between a Product Model and a Catalog Item may be level of detail. If there are so many Product Models (or so little variation offered) then the Product Model and Catalog Item are at the same granularity and could be considered one instance. If however they are at different levels of detail it makes the case for two instances.
Principle 9 – Temporal Difference – if one instance can end independent of the other, that is if they have different lifetimes, it suggests two instances.
Principle 10 – Tolerating Ambiguity — there are cases where the above analysis suggest that there should be, semantically there are, two instances, but in our domain we really don’t care. For instance we may be convinced that the GeoRegion of a country is different from the organization that governs it, but for our application or domain, which will not exercise any of the properties that would highlight that difference, we may say we really don’t care. In this case we would suggest created a supertype of the two classes, and instantiating the supertype. So for instance you may create the class of GeoPoliticalEntities as the union of GeoRegion and Government Organization. Make your instances of the supertype. What this does is two fold:
- If you later decide that you do need to make a distinction, very few things you’ve built to date will be adversely affected. Anything that didn’t care whether you were talking about a region or a government will still not care after you make that distinction.
- If you have to interface with applications or domains that do make the distinction you will have what you need to incorporate their distinctions without upsetting your part of the system.
Re-examining the examples in light of the principles
Let’s return to the examples we introduced in the beginning and see if the principles shine any light on them. Note: there will still be situations and domains that come to different conclusions, but we think these will be the conclusion informed from the above principles
|Design Example||Proposal (one instance or two)||Principled Evidence|
|Statue v. Bronze||1||Principle 1, if you steal the statue you’ve stolen the bronze. They’re really inseparable. Also principle 7, how we establish the identity of the item (say we have an RFID tag on the statute it is also identifying the bronze)|
|Person v. Employee||2 for employers, 1 for unemployment||Principle 2 (you can have two jobs at a time) and principle 4 (your employee(ment) has a salary and seniority, you don’t, you have a birthday, your employee role doesn’t) and principle 9 (your job can end before you do) argue for 2 . However the Unemployment Division point of view argues for one. A formal definition of someone who is employed (has at least one job) argues by principle 6 and the cardinality argument works the other way (your second job doesn’t alter the unemployment rate)|
|The Restrictions of Law v. The Text of Statute||2, and will have drug / drug interactions regardless of which patients you give the drugs.ed for one. ense tually an agreement.||Principle 8, granularity, and principle 2 cardinality. When we start to interpret the law and get it to the point that we can begin having systems make at least some initial determination of the legality of an action, we find that a given law is many restrictions and at many levels of detail.|
|A Creative Work v A Document||2||Principle 2 (many derivatives from a single work)|
|Government Organization v. Region Governed||2||Principle 3 (we can separate the government from the land, and the land area can change without changing the government (sorry Ukraine) and principle 4 there are properties (rainfall) that apply to one and not the other|
|Specification v Model||2||Principle 8 in most cases the specification is at a lower level of detail than the product model (color is typically not part of the product model, but is typically in the specification, and most product domains different colors of the same product are not equally interchangeable)|
|Position v. Incumbent||2||Principle 9 (the position usually outlives the incumbent) and also occasionally principle 2 (can have co-presidents, two people in one position)|
|Actor v. Role||2||Principle 2 (Greater Tuna where two actors played all the roles)|
|Event v. Time Interval||1||Principle 6 (if a time interval is defined as having a start and an end, and so is an event, the event is a time interval)|
|Diagnosis v. Disease||2||Even though they initially co-exist, they soon develop their own time lines (principle 9) and properties|
|Person v. Legal Person||1||Principle 1 the person is the legal person, there isn’t another entity to hide behind. None of the other principles argues for 2. Legal Person is a type of Person, except in the case where it means Organization and in that case they are separate because of principle 6, they are disjoint and can’t be the same.|
|Organization v. Organization in Role||1, unless there is something formal set up to establish the extra role||Even though there is a bit of temptation from principle 9 it isn’t convincing. If you participate as a buyer in one transaction and a seller in another are you three entities (yourself, you the buyer and you the seller) no not really. Only if there is something formal set up. In the airline industry the difference between a customer (has a role and therefore 2 entities) and a passenger (doesn’t one entity) is the frequent flyer agreement, where they are accumulating miles, getting various metal colors etc.|
|Contract Document v. Financial Agreement||1||Principle 1: the document is a representation of the agreement. Where there are cardinality issues (the contract/ agreement contains many obligations) the cardinality is true of both, in the same way (if the contract has 6 obligations so does the agreement).|
|Person v. Patient||1||Principle 1. Unlike the cat with nine lives, the person that has 9 patient identities will die if any of them die, and will have drug / drug interactions regardless of which patients you give the drugs.|
|Person v. Address||2||Principle 1 Addresses are not attributes of people. Addresses are attributes of buildings that people live in and work in which are obviously separate entities.|
|Planned Task v. Completed Task||1 for personal 2 for hospital, project management||Principle 2 (cardinality) trumps for any organization that has to keep track of either multiple appointments for one visit, or multiple reschedulings for the same task. Where that doesn’t apply (say your vacation plan or personal todo’s) you can just have one task that transitions from planned to actual by merely being done, in a way that is principle 10, suggesting their may be a difference in personal task management but we just don’t care .|
|Person v. Sole Proprietor||2||Principle 2 cardinality, since we can have multiple sole proprietorships, we need to allow for two.|
|Part v. Catalog Item||2||Principle 4: while they both appear to have some of the same characteristics (weight for instance) they aren’t really the same. That is a structural similarity not a semantic similarity. A catalog with parts that weight thousands of pounds can be picked up with a single hand.|
|Customer v (Person or Organization)||1 unless there is a separate agreement, then 2||Principle 4: it is the existence of a separate agreement (separate from the individual order) that is the second instance. Really the second instance isn’t “customer” but “customer agreement.” In the absence of a second agreement (Master agreement, frequent shopper agreement etc.) there is only need for one.|