White Paper: Shedding Light on the “Shared Services” Conversation

Although there are at least seven levels of granularity to “shared services,” little time has been spent to categorize these.

My observation is that although there are at least seven levels of granularity to “shared services,” little time has been spent categorizing them. Please refer to the illustration below. The degree of sharing runs a gamut from the most sharing at the top to the least at the bottom. The higher levels of sharing usually imply the levels below, but only most of the time, not all the time.

The colors could come in handy later to help visualize sharing by function and by agency in a large matrix. An example might help: let's say we were trying to sort out shared services in the area of the motor pool. Let's go through each level.

The motor pool example doesn’t quite do justice to the distinction between the application front end and the application back end, which we think may end up being the significant difference.

A larger and more traditional application may showcase that difference better. Let's take payroll. When most people talk about HR as a shared service they are talking about sharing the application (there hasn't been much discussion about the possibility of rebadging HR employees or relocating them). So assuming we're just talking about the HR application, there is still an extra degree of sharing to discuss: front end or back end. Traditionally, when you implement a package like SAP, most everyone affected has to learn the new application. It has new screens, new terminology, new workflow, new exceptions and new conventions. It requires new interfaces to existing systems in the field. This is why packaged implementations cost so much. The software isn't very expensive. The literal installation and configuration doesn't take all that much effort. It is the number and degree to which people, processes and other systems are impacted that runs the price tag up. For most of the agencies we have been involved with, HRMS was a wrenching conversion. Many have still not recovered to their previous level of productivity. But at least one agency that we know of had a pretty easy go of it. This is because they had built an app they called HR Café. HR Café was the interface that everyone in the agency knew and used. HR Café implemented many of their local idiosyncrasies. Almost no one had direct access to the old payroll system. So when HRMS came up, the agency just changed HR Café's interface so that it now interacted with HRMS, and there was very little collateral damage. The back end of HRMS was shared and not the front end. In this case, the good result was an inadvertent byproduct of some other good decisions that had been taken. But we think this approach can be generalized with a tremendous amount of economic benefit.
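To make the pattern concrete, here is a minimal sketch with entirely hypothetical names (PayrollBackend, LegacyPayroll, SharedHRMS and HRCafe are stand-ins, not the agency's actual interfaces): the agency-facing front end depends only on a thin back-end interface, so the back end can be swapped without retraining users or rewriting field systems.

```python
from typing import Protocol

class PayrollBackend(Protocol):
    """The only surface HR Cafe depends on; everything else is back-end detail."""
    def get_pay_stub(self, employee_id: str, period: str) -> dict: ...
    def submit_timesheet(self, employee_id: str, hours: float) -> None: ...

class LegacyPayroll:
    """Original agency payroll system (illustrative stand-in)."""
    def get_pay_stub(self, employee_id: str, period: str) -> dict:
        return {"employee": employee_id, "period": period, "source": "legacy"}
    def submit_timesheet(self, employee_id: str, hours: float) -> None:
        print(f"legacy: recorded {hours}h for {employee_id}")

class SharedHRMS:
    """Shared-service HRMS back end (illustrative stand-in)."""
    def get_pay_stub(self, employee_id: str, period: str) -> dict:
        return {"employee": employee_id, "period": period, "source": "hrms"}
    def submit_timesheet(self, employee_id: str, hours: float) -> None:
        print(f"hrms: recorded {hours}h for {employee_id}")

class HRCafe:
    """The front end everyone in the agency knows; unchanged when the back end is swapped."""
    def __init__(self, backend: PayrollBackend):
        self.backend = backend
    def show_pay_stub(self, employee_id: str, period: str) -> None:
        stub = self.backend.get_pay_stub(employee_id, period)
        print(f"Pay stub for {stub['employee']} ({stub['period']}) from {stub['source']}")

# The cutover is a configuration change, not a retraining exercise.
HRCafe(LegacyPayroll()).show_pay_stub("E-1001", "2024-06")
HRCafe(SharedHRMS()).show_pay_stub("E-1001", "2024-06")
```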


White Paper: The Distinctionary

Encyclopedias are generally not intended to help with definition. An encyclopedia is useful in that once you know what something means, you can find out what else is known about it.

Semantics is predicated on the idea of good definitions. However, most definitions are not very good. In this essay we’re going to explore why well-intentioned definitions miss the mark and propose an alternate way to construct definitions. We call this alternate way the “distinctionary.”

Dictionary Definitions

The dictionary creates definitions for words based on common and accepted usage. Generally, this usage is culled from reputable, published sources. Lexicographers comb through existing uses of a word and create definitions that describe what the word means in those contexts. Very often this will give you a reasonable understanding for many types of words. This is why dictionaries have become relatively popular and sometimes even bestsellers. However, it is not nearly enough. In the first place, there is not a great deal of visibility in attaching the definitions to their sources. There's a very casual relationship between the source of a definition and the definition itself.

Perhaps the larger problem is that the definition describes but it does not discern. In other words, if there are other terms or concepts that are close in meaning, this type of definition would not necessarily help you distinguish between them.

Thesauri Definitions

Another way to get at meaning is through a thesaurus. The trouble with a thesaurus is that it is a connected graph of similar concepts. This is helpful if you are overusing a particular word and would like to find a synonym, or if you want to search for a similar word with a slightly different concept. But again, it does very little good actually describing the differences between the similar terms.

WordNet

WordNet is an online searchable lexicon that in some ways is similar to a thesaurus. The interesting and important difference is that in WordNet there are six or seven relationship links between terms, and each has a specific meaning. So whereas in a thesaurus the two major links between terms are the synonym and antonym links (similar to, and not similar to), in WordNet there are links that define whether one term is a proper subtype of another term, whether one term is a part of another term, and so on. This is very helpful, and it takes us a good way toward definitions that make a difference.

Taxonomies

A rigorous taxonomy is a hierarchical arrangement of terms where each subterm is a proper subtype of the parent term. A really good taxonomy includes rule-in and rule-out tests to help with the placement of items in the taxonomy. Unfortunately, few good taxonomies are available, but they do form a good starting point for rigorous definitions.

Ontologies

An ontology, as Tom Gruber pointed out, is a specification of a conceptualization. A good ontology will have not only the characteristics of a good taxonomy, with formal subtyping and rules for inclusion and exclusion, but will also include other, more complex inference relationships. The ontology, like the taxonomy, also has the powerful notion of “committing to” the ontology. With a dictionary definition there's no formal concept of the user committing to the meaning as defined by the source authority for the term. However, we do find this in taxonomies and ontologies.

The Distinctionary

The preceding lays out a landscape of gradually increasing rigor in the tools we use for defining and managing the terms and concepts we employ. We're going to propose one more tool, not nearly as comprehensive or rigorous as a formal taxonomy or ontology, but one which we have found to be very useful in the day-to-day task of defining and using terms: the distinctionary.

The distinctionary is a glossary. It is distinct from other glossaries in that it is structured such that a term is first placed as a type of a broader term or concept, and then a definition is applied that distinguishes this particular term or concept from its peers.

Eventually, each of the terms or concepts referred to in a distinctionary definition, i.e., “this term is a subtype of another one,” would also have to have their own entry in the distinctionary. But in the short term and for practical purposes we have to agree that there is some common acceptance of some of the terms we use.

A Few Examples

I looked up several definitions of the word “badger.” In this case I was looking for the noun, the mammal. I remembered that a badger was an animal but I couldn't remember what kind of animal, so I thought maybe the dictionary would help. Here is what I found:

Badger:

Merriam Webster:

1 a: any of various burrowing mammals (especially Taxidea taxus and Meles meles) that are related to the weasel and are widely distributed in the northern hemisphere

Encarta:

a medium-sized burrowing animal that is related to the weasel and has short legs, strong claws, and a thick coat. It usually has black and white stripes on the sides of its head.

Cambridge Advanced Learner's Dictionary:

an animal with greyish brown fur, a black and white head and a pointed face, which lives underground and comes out to feed at night

American Heritage:

1. Any of several carnivorous burrowing mammals of the family Mustelidae, such as Meles meles of Eurasia or Taxidea taxus of North America, having short legs, long claws on the front feet, and a heavy grizzled coat.

Webster's Dictionary (1828 Edition)

1. In law, a person who is licensed to buy corn in one place and sell it in another, without incurring the penalties of engrossing.

2. A quadruped of the genus Ursus, of a clumsy make, with short, thick legs, and long claws on the fore feet. It inhabits the north of Europe and Asia, burrows, is indolent and sleepy, feeds by night on vegetables, and is very fat. Its skin is used for pistol furniture; its flesh makes good bacon, and its hair is used for brushes to soften the shades in painting. The American badger is called the ground hog, and is sometimes white.

Encyclopedia Definitions

Columbia Encyclopedia

name for several related members of the weasel family. Most badgers are large, nocturnal, burrowing animals, with broad, heavy bodies, long snouts, large, sharp claws, and long, grizzled fur. The Old World badger, Meles meles, is found in Europe and in Asia N of the Himalayas; it is about 3 ft (90 cm) long, with a 4-in. (10-cm) tail, and weighs about 30 lb (13.6 kg). Its unusual coloring, light above and dark below, is unlike that of most mammals but is found in some other members of the family. The head is white, with a conspicuous black stripe on each side. European badgers live, often in groups, in large burrows called sets, which they usually dig in dry slopes in woods. They emerge at night to forage for food; their diet is mainly earthworms but also includes rodents, young rabbits, insects, and plant matter. The American badger, Taxidea taxus, is about 2 ft (60 cm) long, with a 5-in. (13-cm) tail and weighs 12 to 24 lb (5.4–10.8 kg); it is very short-legged, which gives its body a flattened appearance. The fur is yellowish gray and the face black, with a white stripe over the forehead and around each eye. It is found in open grasslands and deserts of W and central North America, from N Alberta to N Mexico. It feeds largely on rodents and carrion; an extremely swift burrower, it pursues ground squirrels and prairie dogs into their holes, and may construct its own living quarters 30 ft (9.1 m) below ground level. American badgers are solitary and mostly nocturnal; in the extreme north they sleep through the winter. Several kinds of badger are found in SE Asia; these are classified in a number of genera. Badgers are classified in the phylum Chordata, subphylum Vertebrata, class Mammalia, order Carnivora, family Mustelidae.

Wikipedia

is an animal of the typical genus Meles or of the Mustelidae, with a distinctive black and white striped face. Badger is the common name for any animal of three subfamilies, which belong to the family Mustelidae: the same mammal family as the ferrets, the weasels, the otters, and several other types of carnivore.

Firstly, I intentionally picked a very easy word. Specific nouns like this are among the easiest things to define. I could have picked “love” or “quantum mechanics” or a verb like “generate” if I wanted to make this hard. As a noun, the definition of this word would be greatly aided by (although not completed by) a picture.

Let's look at what we got. First, all the definitions establish that a badger is an animal, or mammal. Anyone trying to find out what a badger was could safely be assumed to know what those two terms mean. Most rely on Latin genus/species definitions, which is not terribly helpful: if you already know the precise definition of those, then you already know what a badger is. Worse, many of them are imprecise in their references: “especially Taxidea taxus and Meles meles.” What is that supposed to mean?

Some of the more useful parts of these definitions are “burrowing” and “carnivorous.” However, these don't actually distinguish badgers from, say, skunks, foxes or anteaters. “Weasel-like” is interesting, but we don't know in what way they are like weasels. Indeed, some of these definitions would have you think they were weasels.

Encyclopedias are generally not intended to help with definition. An encyclopedia is useful in that once you know what something means, you can find out what else is known about it. However, these encyclopedia entries are much better at defining “badger” than the dictionary definitions. (By the way, a lot of the encyclopedia information will make great “rule in/rule out” criteria.)

I had to include the 1828 definition, if only for its humor value. To begin with, the first definition is one that, less than 200 years later, is now virtually extinct. The rest of the definition seems to be in good form, but mostly wrong (“genus Ursus” [bears], “feeds on vegetables,” “ground hog”) or irrelevant (“pistol furniture” and “brushes to soften the shades in painting”).

So what would the distinctionary entry look like for badger? I’m sad to say, even after reading all this I still don’t know what a badger is. Structurally the definition would look something like this:

A badger is a mammal. It is a four legged, burrowing carnivore. It is distinct from other burrowing carnivores in that [this is the part I still don’t know, but this part should distinguish it from close relatives (weasels and otters) as well as more distant burrowing carnivores, such as fox and skunk]. Its most distinguishing feature is two white stripes on the sides of its head.
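To make the structure concrete, here is a sketch of how a distinctionary entry might be captured as data; the field names (broader_term, differentiators, rule_in, rule_out) are ours, chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DistinctionaryEntry:
    term: str
    broader_term: str              # the genus: what kind of thing this is
    differentiators: list[str]     # what sets it apart from its peers
    rule_in: list[str] = field(default_factory=list)   # tests that confirm membership
    rule_out: list[str] = field(default_factory=list)  # tests that exclude near relatives

badger = DistinctionaryEntry(
    term="badger",
    broader_term="burrowing carnivorous mammal",
    differentiators=[
        "heavy, flattened body with short legs and long fore-claws",
        "black-and-white striped head",
    ],
    rule_in=["digs and lives in burrows (sets)", "grizzled coat"],
    rule_out=["weasels and otters have long, slender bodies",
              "foxes and skunks lack the striped, flattened build"],
)
```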

The point of the distinctionary is to help us keep from getting complacent about our definitions. In the everyday world of glossaries and dictionaries, most definitions sound good, but when you look more closely you realize that they hide as much ignorance as they reveal. As you can see from my above attempt at a distinctionary entry for badger, it’s pretty hard to cover up ignorance.

White Paper: Semantic Profiling

Semantic profiling is a technique using semantic-based tools and ontologies in order to gain a deeper understanding of the information being stored and manipulated in an existing system.

Semantic Profiling

In this paper we will describe an approach to understanding the data in an existing system through a process called semantic profiling.

What is semantic profiling?

Semantic profiling is a technique using semantic-based tools and ontologies in order to gain a deeper understanding of the information being stored and manipulated in an existing system. This leads to a more systematic and rigorous treatment of the problem and creates a result that can be correlated with profiling efforts in other applications.

Why would you want to do semantic profiling?

The immediate motivation to do semantic profiling is typically either a system integration effort, a data conversion effort, a new data warehousing project, or, more recently, a desire to use some form of federated query in order to pull together enterprise-wide information. Each of these may be the initial motivator for doing semantic profiling, but the question still remains: why do semantic profiling rather than any of the other techniques we might use? To answer that, let's look at each of the typically employed techniques:

  • Analysis. By far the most common strategy is some form of “analysis.” What this usually means is studying existing documentation and interviewing users and developers about how the current system works and what data is contained in it. From this, the specification for the extraction or interface logic is designed. This approach, while popular, is fraught with problems. The most significant is that what the documentation says and what the users and developers think or remember is very often not a high-fidelity representation of what will actually be found when one looks deeper.
  • Legacy understanding. The legacy understanding approach is to examine the source code of the system that maintains the current data and, from the source code, deduce the rules that are being applied to the data. This can be done by hand for relatively small applications. We have done it with custom analysis tools in some cases, and there are commercial products from companies like Relativity and Merant that automate this process. The strength of this approach is that it makes explicit some of what was implicit, and it's far more authoritative than the documentation: the code is what is actually executed, while the documentation is someone's interpretation of either what should have been done or what they believe was done. While legacy understanding can be helpful, it's generally expensive and time-consuming and still only gives a partial answer. The reason it only gives a partial answer is that there are many fields in most applications that have relatively little system enforcement of data values. Most fields with text data, and many fields with dates and the like, have very little system-enforced validation. Over time, users have adapted their usage and procedures have been refined to fill in the semantics the system leaves unenforced. It should be noted, though, that the larger the user base the more useful legacy understanding is. With a large user base, relying on informal convention becomes less and less likely, because the scale of the system means that users would have had to institutionalize their conventions, which usually means system changes.
  • Data profiling. Data profiling is a technique that has been popularized by vendors of data profiling software such as Evoke, Ascential and Firstlogic. The process relies on reviewing the existing data to uncover anomalies in the databases. These tools can be incredibly useful in finding areas where the content of the existing system is not what we would have expected it to be. Indeed, the popularity of these tools stems largely from the almost universal surprise when people are shown the content of databases they were convinced were populated only with clean, scrubbed data of high integrity, only to find a great number of irregularities. While we find data profiling very useful, we find that it doesn't go far enough. In this paper we'll outline a procedure that builds on it and takes it further.

So how is semantic profiling different?

The first difference is that semantic profiling is more rigorous. We will get into exactly why in the section on how to do semantic profiling, but the primary difference is that with data profiling you can search for and catalog as many anomalies as you like. After you've found and investigated five strange circumstances in a database, you can stop. It is primarily an aid to doing other things, and as such you can take it as far as you want. With semantic profiling, once you select a domain of study you are pretty much committed to take it “to ground.” The second main difference is that the results are reusable. Once you've done a semantic profile on one system, if you then profile another system the results from the first will be available and can be combined with those from the second. This is extremely useful in environments where you are attempting to draw information from multiple sources into one definitive source, whether that is for data warehousing or EII (Enterprise Information Integration). And finally, the semantic profiling approach sets up a series of testable hypotheses that can be used to monitor a system as it continues in production, to detect semantic drift.

What you’ll need

For this exercise you will need the following materials:

  • A database to be studied, with live or nearly live data. You can’t do this exercise with developer-created test data.
  • Data profiling software. Any of the major vendors' products will be suitable for this. It is possible to roll your own, although this can be a pretty time-consuming exercise.
  • A binding to your database available to the data profiling software. If your database is in a traditional relational form with an ODBC or JDBC access capability then that’s all you need. If your data is in some more exotic format you will need an adapter.
  • Meta-data. You will need access to as much of the official meta-data for the fields under study as you can find. It may be in a data dictionary, it may be in a repository, it may be in the copybooks; you may have to search around a bit for it.
  • An ontology editor. You will be constructing an ontology based on what you find in the actual data. There are a number of good ontology editors; for our purposes, the freeware Protégé from Stanford should be adequate for most projects.
  • An inferencing engine. While there are many proprietary inferencing engines, we strongly advocate adopting one based on the recent standards RDF and OWL. There are open-source and freeware options, such as OpenRDF or Kowari.
  • A core ontology. The final ingredient is a starting-point ontology that you will use to define concepts as you uncover them in your database. For some applications this may be an industry reference data model such as HL7 for health care. However, we advocate using what we call the semantic primes as the initial starting point. We'll cover the semantic primes in another white paper or perhaps in a book; for now, they are a relatively small number of primitive concepts that are very useful in clarifying your thinking regarding other concepts. (A minimal sketch of this setup follows the list.)
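Here is the minimal setup sketch promised above, assuming rdflib is installed and using Python's built-in sqlite3 as a stand-in for the ODBC/JDBC binding; the table and field names (T1, BB104) are hypothetical, echoing the example used later in this paper.

```python
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/profiling#")   # hypothetical namespace

# 1. Binding to the database under study (live or nearly live data).
conn = sqlite3.connect("legacy_copy.db")          # stand-in for the real binding

# 2. Core ontology: a tiny starting point standing in for the "semantic primes".
g = Graph()
g.bind("ex", EX)
g.add((EX.HistoricalDate, RDF.type, RDFS.Class))
g.add((EX.PlannedDate, RDF.type, RDFS.Class))

# 3. Record the initial hypothesis about field BB104 as an assertion we can test.
g.add((EX.BB104, RDF.type, EX.FieldUnderStudy))
g.add((EX.BB104, EX.hypothesizedType, EX.HistoricalDate))
g.add((EX.BB104, RDFS.comment, Literal("mask MMDDYYYY per the data dictionary")))

# 4. Profile the actual values so the hypothesis can be confronted with the data.
values = [row[0] for row in conn.execute("SELECT BB104 FROM T1")]
print(len(values), "values pulled for profiling")
```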

How to proceed

Overall, this process is one of forming and testing hypotheses about the semantics of the information in the extant database.

The hypotheses being formed concern both the fidelity and the precision of the definition of the items as well as uncovering and defining the many hidden subtypes that lurk in any given system.

This business of “running to ground” means that we will continue the process until every data item is unambiguously defined and all variations and subtypes have been identified and also unambiguously defined.

The process begins with some fairly simple hypotheses about the data, hypotheses that can be gleaned directly from the meta-data. Let's say we notice in the data dictionary that BB104 has a data type of date, or even that it has a mask of MMDDYYYY. We hypothesize that it is a date and, further, in our case, our semantic prime ontology forces us to select between a historical date and a planned date. We select historical. We add this assertion to our ontology. The assertion is that BB104 is of type historical date. We run the data profiling and find all kinds of stuff. We find that some of the “historical dates” are in the future. So, depending on the number of future dates and other contextual clues, we may decide either that our initial assignment was incorrect and these actually represent planned dates, some of which are in the past because the plans were made in the past, or, in fact, that most of these dates are historical dates but there are some records in this database of a different type. Additionally, we find some of these dates are not dates at all. This begins an investigation to determine if there's a systemic pattern to the dates that are not dates at all. In other words, is there a value in field BB101, BB102, or BB103 that correlates with the non-date values? And if so, does this create a different subtype of record where we don't need a date?
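Continuing that hypothetical BB104 example, the test of the “BB104 is a historical date” hypothesis might look roughly like this; the parsing rule and the buckets are ours, for illustration only.

```python
from datetime import date, datetime

def parse_mmddyyyy(value: str):
    """Return a date if the value matches the MMDDYYYY mask, else None."""
    try:
        return datetime.strptime(value, "%m%d%Y").date()
    except (ValueError, TypeError):
        return None

def test_historical_date_hypothesis(values, today=date.today()):
    """Partition BB104 values against the 'historical date' hypothesis."""
    non_dates, future, historical = [], [], []
    for v in values:
        d = parse_mmddyyyy(v)
        if d is None:
            non_dates.append(v)   # not a date at all: error, or a different record subtype?
        elif d > today:
            future.append(v)      # contradicts "historical"; maybe these are planned dates
        else:
            historical.append(v)
    return {"historical": len(historical), "future": len(future), "non_dates": len(non_dates)}

sample = ["06151998", "12012031", "N/A", "02291996"]
print(test_historical_date_hypothesis(sample))
# A large 'future' bucket suggests revising the assignment to planned date, or
# splitting the records into subtypes correlated with BB101..BB103.
```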

In some cases we will uncover errors that are just pure errors. We have found cases where data validation rules had changed over time and older records had different, anomalous values. And in some cases people have, on an exception basis, used system-level utilities to “repair” data records, creating these strange circumstances. Where what we uncover is finally determined to be a genuine error, rather than semantically defining it we should be creating a punch list for correcting it, and for correcting its cause where possible or necessary.

Meanwhile, back to the profiling exercise. As we discover subtypes with different constraints on their date values, we introduce these into the ontology we're building. In order to do this, as we document our date, we need to further qualify it. What is it the date of? For instance, if we determine that it is, in fact, a historical date, what event was recorded on that date? As we hypothesize and deduce this, we add it to the ontology, along with the information that this BB104 date is the “occurred on” date for the event we described. As we find that the database has some records with legitimate historical dates and others with future dates, and we find some correlation with another value, we hypothesize that there are indeed two types of historical events, or perhaps some historical events mixed with some planned events or planned activities. What we then do is define these as separate concepts in the ontology, with a predicate defining eligibility for each class. To make it simple, if we found that BB101 had one of two values, either P or H, we might hypothesize that H meant historical and P meant planned, and we would say that the inclusion criterion for planned events is that the value of BB101 equals P. This is a testable hypothesis. At some point the ontology becomes rich enough to begin its own interpretation. We load the data, either directly from the database or, more likely, from the profiling tool, as instances in the RDF inferencer. The inferencing engine itself can then challenge class assignments, detect inconsistent property values, and so on. We proceed in this fashion until we have unambiguously defined all the semantics of all the data in the area under question.
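As a sketch of how those subtype hypotheses might be captured so that an engine can challenge them, again using rdflib and the hypothetical BB101/BB104 fields: the class and property names are ours, and a full OWL treatment would express the inclusion criteria as property restrictions rather than as annotations.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/profiling#")
g = Graph()
g.bind("ex", EX)

# Subtype hypotheses: two kinds of event records hiding in one table.
g.add((EX.HistoricalEvent, RDFS.subClassOf, EX.Event))
g.add((EX.PlannedEvent, RDFS.subClassOf, EX.Event))
# Inclusion criteria recorded as testable annotations (profiling bookkeeping only).
g.add((EX.HistoricalEvent, EX.inclusionCriterion, Literal("BB101 = 'H'")))
g.add((EX.PlannedEvent, EX.inclusionCriterion, Literal("BB101 = 'P'")))
g.add((EX.occurredOn, RDFS.domain, EX.HistoricalEvent))  # BB104 as the 'occurred on' date

def classify(record):
    """Assign a record to a hypothesized class from its BB101 flag."""
    return EX.PlannedEvent if record["BB101"] == "P" else EX.HistoricalEvent

row = {"id": "R1", "BB101": "P", "BB104": "12012031"}
g.add((EX[row["id"]], RDF.type, classify(row)))

# The hypothesis is testable: a PlannedEvent dated far in the past, or a
# HistoricalEvent dated in the future, gets flagged for investigation.
print(g.serialize(format="turtle"))
```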

Conclusion

Having done this, what do you have? You have an unambiguous description of the data as it exists and a set of hypotheses against which you can test any new data to determine whether it agrees. More simply, you know exactly what is in your database if you were to perform a conversion or a system integration. More interestingly, you also have the basis for a set of rules if you wanted to do a combined integration of data from many sources. You would know, for instance, that you need to apply a predicate to records from certain databases to exclude those that do not match the semantic criteria you want to carry over from the other system. Say you want a “single view of the customer.” Of all the many records in all your many systems that say or allude to customer data, you will need to know which ones really are customers, and which are channel partners, prospects, or various other parties that might be included in some file. You need a way to define that unambiguously, or system-wide integration efforts are going to fall flat. We believe this to be the only rigorous and complete approach to the problem. While it is somewhat complex and time-consuming, it delivers reasonable value and contributes to the predictability of other efforts which are often incredibly unpredictable.

Written by Dave McComb

White Paper: The Enterprise Ontology

At the time of this writing almost no enterprises in North America have a formal enterprise ontology. Yet we believe that within a few years this will become one of the foundational pieces to most information system work within major enterprises. In this paper, we will explain just what an enterprise ontology is, and more importantly, what you can expect to use it for and what you should be looking for, to distinguish a good ontology from a merely adequate one.

What is an ontology?

An ontology is a “specification of a conceptualization.” This definition is a mouthful, but bear with me; it's actually pretty useful. In general terms, an ontology is an organization of a body of knowledge or, at least, an organization of a set of terms related to a body of knowledge. However, unlike a glossary or dictionary, which takes terms and provides definitions for them, an ontology works in the other direction. An ontology starts with a concept. We first have to find a concept that is important to the enterprise; and having found the concept, we need to express it as precisely as possible, and in a manner that can be interpreted and used by other computer systems. One difference between a dictionary or glossary and an ontology is, as we know, that dictionary definitions are not really processable by computer systems. But the other difference is that by starting with the concept and specifying it as rigorously as possible, we get definitive meaning that is largely independent of language or terminology. This is what the definition means by a “specification of a conceptualization.” In addition, of course, we then attach terms to these concepts, because in order for us humans to use the ontology we need to associate the terms that we commonly use.

Why is this useful to an enterprise?

Enterprises process great amounts of information. Some of this information is structured in databases, some is unstructured in documents, and some is semi-structured in content management systems. However, almost all of it is “local knowledge,” in that its meaning is agreed within a relatively small, local context. Usually that context is an individual application, which may have been purchased or built in-house. One of the most time- and money-consuming activities that enterprise information professionals perform is integrating information from disparate applications. The reason this typically costs a lot of money and takes a lot of time is not because the information is on different platforms or in different formats – these are very easy to accommodate. The expense comes from subtle semantic differences between the applications. In some cases the differences are simple: the same thing is given different names in different systems. However, in many cases the differences are much more subtle. The customer in one system may have an 80 or 90% overlap with the definition of a customer in another system, but it's the 10 or 20% where the definitions differ that causes most of the confusion; and there are many, many terms that are far harder to reconcile than “customer.” So the intent of the enterprise ontology is to provide a “lingua franca” to allow, initially, all the systems within an enterprise to talk to each other and, eventually, the enterprise to talk to its trading partners and the rest of the world.

Isn't this just a corporate data dictionary or consortium data standard?

The enterprise ontology does have many similarities in scope to both a corporate data dictionary and a consortium data standard. The similarity is primarily in the scope of the effort: both of those initiatives, as well as enterprise ontologies, aim to define the shared terms that an enterprise uses. The difference is in the approach and the tools. With both a corporate data dictionary and a consortium data standard, the interpretation and use of the definitions is strictly by humans, primarily system designers. With an enterprise ontology, the expression of the ontology is such that tools are able to interpret it and make inferences on the information while the system is running.


Written by Dave McComb

White Paper: Categories and Classes

Getting the categories and classes distinction right is one of the key drivers of the cost of traditional systems.

We've been working with two clients lately, both of whom are using an ontology as a basis for their SOA messages as well as the design of their future systems. As we've been building an ontology for this purpose, we became aware of a distinction that we think is quite important, and we wanted to formalize it and share it here. In an ontology there is no real distinction, that I know of, between a class and a category. That is: classes are used for categorizing, and you categorize things into classes. If you wanted to make a distinction, it might be that “category” leans toward the verb form, something you do, while “class” is the noun form.

Categories and Classes in Traditional Apps

But back in the world of traditional applications there is a quite significant difference (although, again, I don't believe this difference has ever been elaborated). In a traditional (relational or object-oriented) application, if you just wanted to categorize something (say by gender: male and female), you would create a gender attribute and, depending on how much control you wanted to put on its content, you would either create an enum, a lookup table, or just allow anything. On the other hand, if you wanted behavioral or structural differences between the categories (let's say you wanted to distinguish subcontractors from employees), you would set up separate classes or tables for them, potentially with different attributes and relationships. We've been studying lately what drives the cost of traditional systems, and getting this category/class distinction right is one of the key drivers. Here's why: in a traditional system, every time you add a new class you have increased the cost and complexity of the system. If you reverse-engineer the function point methodology, you'll see that the introduction of a new “entity” (class) is the single biggest cost driver in an estimate. So every distinction that might have been a class, but gets converted to a category, provides a big economic payoff. It's possible to overdo this: if you make something a category that should have been a class, you end up pushing behavior into the application code, which is generally even less tractable than the schema. So we were interested in coming up with some guidelines for when to make a distinction a category and when to make it a class.
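A small sketch of the two choices as they might appear in application code (all names are illustrative): gender is cheap to add as a category value on an attribute, whereas splitting employees and subcontractors into separate classes brings different attributes and behavior, and with them most of the extra cost.

```python
from dataclasses import dataclass
from enum import Enum

# Category: just a controlled value on an attribute; adding one costs almost nothing.
class Gender(Enum):
    MALE = "M"
    FEMALE = "F"

@dataclass
class Person:
    name: str
    gender: Gender            # categorizing, not subclassing

# Class: structural and behavioral differences warrant separate types (or tables),
# and each new one adds schema, mappings, screens, and test surface.
@dataclass
class Employee(Person):
    salary: float
    def monthly_cost(self) -> float:
        return self.salary / 12

@dataclass
class Subcontractor(Person):
    hourly_rate: float
    agency: str
    def monthly_cost(self) -> float:
        return self.hourly_rate * 160   # illustrative assumption: 160 billable hours
```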

Categories and Classes in gist

As it turns out, we had foreshadowed this distinction, although not for this reason, in gist, our upper ontology. Gist has a class called “category” whose intent is to carry categorical distinctions (from one lower-level ontology to another) without necessarily carrying their definitions. For instance, when we worked with a State Department of Transportation, we had a class in their enterprise ontology called “Roadside Feature.” A Roadside Feature has properties such as its location, and when and by what process it was recorded. Several of their applications had specific roadside features, for instance “fire hydrants.” In the application, fire hydrant is a class, and therefore it is one in the application ontology. But in the enterprise ontology, “fire hydrant” is an instance of the category class. Instances of fire hydrant are members of the Roadside Feature class at the enterprise ontology level, and are associated with the category “fire hydrant” via a property, “categorizedBy.” A fire hydrant can therefore be created in one application and communicated to another application that doesn't know the definition of fire hydrant, with almost no loss of information. The only thing that is lost on the receiving end is the definition of fire hydrant, not any of the properties that had been acquired by this fire hydrant.
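Here is a sketch of that pattern in triples using rdflib; the URIs are illustrative stand-ins rather than gist's or the DOT's actual ones, but the shape follows the description above: “fire hydrant” is an instance of the category class, the hydrant itself is a Roadside Feature, and the two are linked by categorizedBy.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

GIST = Namespace("http://example.org/gist#")   # stand-in for the gist namespace
ENT = Namespace("http://example.org/dot/")     # the DOT enterprise ontology (illustrative)

g = Graph()
g.bind("gist", GIST)
g.bind("ent", ENT)

# Enterprise-level vocabulary.
g.add((ENT.RoadsideFeature, RDF.type, RDFS.Class))
g.add((ENT.fireHydrantCategory, RDF.type, GIST.Category))   # "fire hydrant" is data, not schema
g.add((ENT.fireHydrantCategory, RDFS.label, Literal("fire hydrant")))

# A hydrant created in a local application, communicated without the receiver
# needing the application-level FireHydrant class definition.
g.add((ENT.hydrant42, RDF.type, ENT.RoadsideFeature))
g.add((ENT.hydrant42, GIST.categorizedBy, ENT.fireHydrantCategory))
g.add((ENT.hydrant42, ENT.location, Literal("milepost 12.4, US-287")))

print(g.serialize(format="turtle"))
```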


Written by Dave McComb

White Paper: The Seven Faces of Dr. “Class”

The Seven Faces of Dr. “Class”: Part 1

“Class” is a heavily overloaded term in computer science. Many technologies have implemented the concept slightly differently. In this paper we look at the sum total of concepts that might be implemented under the banner of “class” and then later we’ll look at how different technologies have implemented subsets.

The seven facets are:

  • Template
  • Set
  • Query
  • Type
  • Constraint
  • Inclusion
  • Elaboration

Template

One aspect of a class is to act as a “template” or “cookie cutter” for creating new instances. This is also called a “frame” based system, where the template sets up the frame in which the slots (properties) are defined. In the simplest case, say in a relational system where we define a table with DDL (Data Definition Language), we are essentially saying ahead of time what attributes a new instance (tuple) of this class (table) can have. Object-oriented systems have this same concept: each instance of a class can have the attributes defined in the class and its superclasses.
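For example, a class acting purely as a template might look like this sketch (Python stands in for any frame-based or object-oriented language; the relational analogue is noted in a comment):

```python
# Template facet: the class says, ahead of time, what slots a new instance can have.
# (The relational analogue would be: CREATE TABLE person (name TEXT, birth_date DATE).)
from dataclasses import dataclass
from datetime import date

@dataclass(slots=True)      # slots=True (Python 3.10+) makes the frame closed
class Person:               # the "cookie cutter" / frame
    name: str               # slot
    birth_date: date        # slot

p = Person(name="Ada Lovelace", birth_date=date(1815, 12, 10))
# p.favorite_color = "blue"  would raise AttributeError: not part of the template
```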

Set

A class can be seen as a collection of all the instances that belong to the set. Membership can be extensional (that is, instances are simply asserted to be members of the class) or intensional (see the discussion of the inclusion aspect below). Under the template aspect, it's almost like a caste system: instances are born into their class and stay there for their lifetime. With set-like classes, an instance can simultaneously be a member of many sets. What is interesting is what we don't say about class membership: with sets, we have the possibility that an instance is provably in the set, provably not in the set, or satisfiably either.
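A tiny sketch of the set aspect; plain Python sets are closed-world, so the open-world subtleties are only noted in comments:

```python
# Set facet: membership is asserted, and one instance can be in many sets at once.
pilots = {"amelia", "charles"}
authors = {"amelia", "antoine"}

# Extensional membership: "amelia" is in both sets simply because we asserted it.
print("amelia" in pilots and "amelia" in authors)   # True

# Contrast with the template/caste view, where an object is born into one class and
# stays there. Under an open-world reading, someone absent from both sets is not
# thereby provably a non-member; they are "satisfiably either".
```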

Query

Classes create an implied query mechanism. When we create instances of the Person class, it is our expectation that we can later query this class and get a list of its currently known members. In Cyc, classes are called “collections,” which reflects this idea that a class is, among other things, a collection of its members. A system would be pretty useless if we couldn't query the members of a class. We separate the query facet out here to shine a light on the case where we want to execute the query without previously having defined the class. For instance, if we tag photos in Flickr with a folksonomy, and someone later wants to find a photo that had a combination of tags, a class in the traditional sense was not created, unless you consider the act of writing the query to be the act of creating the class, in which case that is the type of class we're talking about here. This is primarily the way concept-like taxonomies such as SKOS operate: tags are proxies for future classes.

Type

Classes are often described as being types. But the concept of “type,” despite being bandied about a lot, is rarely well defined. The distinction we're going to use here is one of behavior: it is the type aspect that sets up the allowable behavior. This is a little clearer in implementations that have types and little else, like XSD. It is the XSD type “date” that sets up the behavior for evaluating before, after or concurrent. And it is the polymorphism of types in object orientation that sets up the various behaviors (methods) that an object can respond to. It is the “typeness” of a geographicalRegion instance that allows us to calculate things like its centroid and where the overlap or boundary is with another geographicalRegion. We rarely refer to the class of all items that have xsd:date as if it were a collection, but we do expect them all to behave the same.

Constraint

Constraints are generally implemented as “guards” that prevent noncompliant instances from being persisted. There is no reason that the constraints need to be associated with the classes; they could easily be written separately and applied to instances. But many implementations do package constraints with the class definition, for two reasons: one, the constraints are naturally written in and lexically tied to the class definition; the other is simply packaging, around the concept of cohesion. The constraint can be in a separate language (as with OCL, the Object Constraint Language) or may be an extension to the class definition (as ranges and foreign key constraints are in relational systems).
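As a sketch, here is a guard written separately from the class and then applied to instances, illustrating the point that constraints need not be packaged with the class definition; the specific rules are invented for illustration:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    birth_date: date

# Constraint facet: a guard that rejects noncompliant instances before persistence.
# Written apart from the class here, though most implementations package it with
# the class definition for cohesion (cf. OCL, or range/foreign-key constraints).
def check_person(p: Person) -> list[str]:
    problems = []
    if not p.name.strip():
        problems.append("name must not be blank")
    if p.birth_date > date.today():
        problems.append("birth_date must not be in the future")
    return problems

print(check_person(Person("Ada", date(1815, 12, 10))))   # []
print(check_person(Person("", date(2999, 1, 1))))        # two violations
```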

Inclusion

That is, inclusion criteria. This is for classes that support inference, and these are the rules that determine whether an instance is a member of the class, or whether all members of one class are necessarily members of another. It also covers exclusion criteria, which are just inferred membership in the complement. While it is conceivable to think of the “open world” without inclusion criteria, the open world really comes to the fore when we consider them. Once we have rules of inclusion in and exclusion from a set, we have set up the likelihood that there will be many instances that are neither provably members nor provably not members, hence “satisfiability.”
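A sketch of an inclusion criterion as an executable rule, reusing the hypothetical BB101 flag from the semantic profiling paper above; being plain, closed-world code, it can only hint at the satisfiability point in a comment:

```python
# Inclusion facet: rules that determine membership, rather than mere assertion.
def is_planned_event(record: dict) -> bool:
    """Inclusion criterion from the semantic-profiling example: BB101 = 'P'."""
    return record.get("BB101") == "P"

records = [{"id": "R1", "BB101": "P"}, {"id": "R2", "BB101": "H"}, {"id": "R3"}]
for r in records:
    print(r["id"], "planned?", is_planned_event(r))

# In this closed-world sketch R3 simply tests False. An open-world system would
# distinguish "provably not a member" from "not provable either way" (satisfiable),
# which is exactly the possibility the inclusion facet introduces.
```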

Elaboration

Elaboration is what else can be known about an item once one knows its class membership. In object-oriented systems you may know things about an instance because of the superclasses it is also a member of, but this is a very limited case: all of that elaboration was known at the time the instance was created. With more flexible systems, as an instance acquires new memberships, we know more about it. For instance, let's say we use a passport number as evidence of inclusion in the class of Citizens, and therefore the class of People; we can then know via elaboration that the passport holder has a birthday (without knowing what their birthday is).

To the best of our knowledge, there is no well-supported language or environment that supports all these facets well. As a practical consequence, designers select a language, implement the aspects that are native to it, and figure out other strategies for the remaining facets. In the next installment of this series, we will examine how popular environments satisfy these aspects, and what we need to do to shore up each.
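Returning to the passport example, here is a sketch of elaboration using rdflib together with the owlrl package (both assumed to be installed) to materialize RDFS entailments; the class and property names are ours:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics   # owlrl assumed available

EX = Namespace("http://example.org/elab#")
g = Graph()
g.bind("ex", EX)

# Axioms: having a passport number is evidence of citizenship; citizens are people.
g.add((EX.hasPassportNumber, RDFS.domain, EX.Citizen))
g.add((EX.Citizen, RDFS.subClassOf, EX.Person))

# All we assert about Paul is a passport number.
g.add((EX.paul, EX.hasPassportNumber, Literal("X1234567")))

DeductiveClosure(RDFS_Semantics).expand(g)   # materialize the RDFS entailments

print((EX.paul, RDF.type, EX.Citizen) in g)  # True - elaborated, not asserted
print((EX.paul, RDF.type, EX.Person) in g)   # True - and therefore a person
# The further step - every person has a birth date, value unknown - is an
# open-world existential that OWL can state but this RDFS closure will not add.
```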


Written by Dave McComb