Time Zones

Reflections on low-level ontology primitives.

We had a workshop last week on gist (our minimalist upper ontology). As part of the aftermath, I decided to get a bit more rigorous about some of the lowest-level primitives. One of the basic ideas about gist is that you may not be able to express every distinction you might want to make, but at least what you do exchange through gist will be understood and unambiguous. In the previous version of gist I had some low-level concepts, like distance, which was a subtype of magnitude. And there was a class distanceUnit, which was a subclass of unitOfMeasure. And unit of measure has a property that points to a conversion factor (i.e., how to convert from one unit of measure to the base unit of that “dimension”). But what occurred to me just after the workshop is that two applications or two organizations communicating through gist could still create problems by just picking a different base (i.e., if one said their base for distance was a meter and the other a foot, they have a problem).
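
To illustrate the conversion-factor idea, here is a minimal sketch in Python rather than OWL, using made-up dictionaries and function names (gist itself models this with classes and properties): every unit carries a factor to the base unit of its dimension, and all conversion goes through that shared base.

```python
# A minimal sketch of unit conversion via a base unit, with made-up names.
# The conversion factor says how to get from a unit to the base unit of
# its dimension (here, the meter for distance).

CONVERSION_FACTORS = {
    "meter": 1.0,       # the base unit of the distance dimension
    "foot": 0.3048,
    "kilometer": 1000.0,
}

def to_base(value: float, unit: str) -> float:
    """Convert a magnitude expressed in `unit` to the base unit of its dimension."""
    return value * CONVERSION_FACTORS[unit]

def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units by passing through the shared base unit."""
    return to_base(value, from_unit) / CONVERSION_FACTORS[to_unit]

# Two parties only get the same answer if they also agree on the base:
print(convert(100, "foot", "meter"))  # 30.48
```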

This was pretty easily solved by going to NIST, and getting the best thinking on what these dimensions should be and what the base unit of each dimension should be. Looking at it, I don’t think there ought to be much problem with people adopting these. Emboldened, I thought I would do the same for time.

For starters, universal time seems to be the way to go. However, many applications record time in local time, so we need some facility to recognize that and provide an offset. Here’s where the problem came in, and maybe you dear readers can help. After about an hour of searching the web, the best I could find for a standard in this area is something called the tz database. While you can look up various cities, I didn’t see anything definitive on what the geographical regions are that make up each of the time zones. To make things worse, the abbreviations for time zones are not unique; for instance, there is an EST in North America and one in Australia. If anyone has a thought in this area, I’m all ears.
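
As a small illustration of why the tz database’s region-based zone names help, here is a sketch using Python’s standard zoneinfo module (a modern convenience that simply ships the tz data). The abbreviation “EST” is ambiguous, but names like America/New_York and Australia/Sydney are not.

```python
# Disambiguating time zones via tz (IANA) database zone names,
# using Python's standard zoneinfo module (Python 3.9+).
from datetime import datetime
from zoneinfo import ZoneInfo

t = datetime(2011, 1, 15, 12, 0, tzinfo=ZoneInfo("UTC"))

for zone in ("America/New_York", "Australia/Sydney"):
    local = t.astimezone(ZoneInfo(zone))
    print(zone, local.isoformat(), local.tzname())

# The zone names, not the abbreviations, are the unambiguous identifiers:
# on this date New York reports UTC-05:00 ("EST") and Sydney UTC+11:00
# (abbreviated "AEDT" in current tz data, historically also "EST").
```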

Semantisize

Semantic technology resources

I was alerted to this site, www.semantisize.com, by a comment. It’s pretty cool. You can while away a lot of time on this site, which is rounding up lots of podcasts, videos, etc., all related to Semantic Technology. I got a kick out of a video of Eric Schmidt taking a question from the floor on “What is Web 3.0?” Schmidt’s answer: “I think you [the questioner] just made it up.”

Part 6: Definitions are even more important than terms are

In recent posts, we stated that while terms are less important than concepts and mean nothing from a formal semantics perspective, they are very important for socializing the ontology. The same is true for text definitions, but even more so. Just like terms, the text definitions and any other comments have zero impact on the inferences that will be sanctioned by the ontology axioms. However, from the perspective of communicating meaning (i.e. semantics) to a human being, they play a very important role. Many of the people who want to understand the enterprise ontology will mainly be looking at the terms and the text definitions, and never see the axioms. Text definitions help the human get a better idea of the intended semantics of a term, even for those who choose to view the axioms as well. For those interested in the axioms, the text helps clarify the meaning and makes it possible to spot errors in the axioms; for example, the text may imply something that conflicts with or is very different from what the axioms say. The text definitions can also say things that are too difficult, or unnecessary, to say formally with axioms.

Other comments that are not definitions but that should also be included in the ontology are examples and counterexamples: things that are true about a concept but are not part of defining it. Collectively, all this informal text that is hidden from the inference engine contributes greatly to human understanding of the ontology, which is on the critical path to putting the ontology to use.


Part 2: Don’t let terms get in the way!

It frequently happens that a group of experts use a term so differently that they just cannot agree on a single meaning or definition. This problem arises in spades in the area of ‘risk’. For example, in traditional operational risk management (ORM), when you measure risk, you multiply the probability of a loss times the amount of the loss. In the modern view of ORM, risk is a measure of loss at a level of uncertainty; the modern definition of risk requires both exposure and uncertainty[1]. So you get two different numbers if you measure risk from these different perspectives. One can go round and round with a group of experts trying to agree on a definition of ‘risk’ and generate a lot of heat with little illumination. But when we change our perspective from the term and instead start looking for underlying concepts that everyone agrees on, we don’t have to look very far. When we found them, we expressed them in simple non-technical terms to minimize ambiguity. Here they are:

  1. Something bad might happen
  2. There is a likelihood of the bad thing happening
  3. There are undesirable impacts whose nature and severity vary (e.g. financial, reputational)
  4. There is a need to take steps to reduce the likelihood of the bad thing happening, or to reduce the impact if it does happen.

After many discussions and no agreement on a definition of the term ‘risk’, we wrote down these four things and asked the experts: “When you are talking about risk, are you always talking about some combination of these four things?” “Yes” was unanimous. The experts differ on how to combine them and what to call them. For example, the modern view and the traditional view of risk each combine these underlying concepts in different ways to define what they mean by ‘risk’. In the modern view, if the probability of loss is 100%, there is no risk because there is no uncertainty. The concept that is called ‘risk’ in the traditional view is called ‘expected loss’ in the modern view, but it is the same underlying concept. Compared to wading through the muck and the mire of trying to agree on terms, focusing on the underlying concepts using simple non-jargon terms is like a hot knife going through cold butter.

Terms get in the way of a happy marriage too! How many times have you disagreed with your partner on the meaning of a word? It’s more than just semantics; it’s often emotional too. I believe we are all divided by a common language, in that no two people use words to mean exactly the same thing, even everyday words like “support” or “meeting”. I have learned that it is easier to learn and use the language of my spouse than it is to convince her that the term I use is the right one (despite the seductive appeal of the latter).
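
To make the contrast concrete, here is a toy calculation; the “uncertainty” measure below (the standard deviation of the loss) is my own stand-in for illustration, not the definition used in the cited paper.

```python
# Toy sketch: traditional ORM 'risk' (probability x loss) versus a crude
# proxy for the modern view's uncertainty (std. deviation of the loss).
import math

def expected_loss(p: float, loss: float) -> float:
    """Traditional ORM figure: probability of the loss times its amount."""
    return p * loss

def loss_uncertainty(p: float, loss: float) -> float:
    """Std. deviation of a loss occurring with probability p (toy proxy)."""
    return loss * math.sqrt(p * (1 - p))

for p in (0.10, 0.50, 1.00):
    print(f"p={p:.2f}: expected loss = {expected_loss(p, 1_000_000):>11,.0f}, "
          f"uncertainty = {loss_uncertainty(p, 1_000_000):>11,.0f}")

# At p = 1.00 the expected loss is the full amount, but the uncertainty is
# zero -- the modern view would say there is no longer any 'risk' to manage.
```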


[1] “A New Approach for Managing Operational Risk: Addressing the Issues Underlying the 2008 Global Financial Crisis,” sponsored by the Joint Risk Management Section of the Society of Actuaries, the Canadian Institute of Actuaries, and the Casualty Actuarial Society.

For further reading, refer to Michael Uschold’s additional posts in this series.


Application Scope: A Fractal View

The scope of an application is one of those topics that seems to be quite important and yet frustratingly slippery.

There are several very good reasons for its seeming importance. One, the scope of an application eventually determines its size and, therefore, the cost to develop and/or implement it. It’s also important in that it defines what a given team is responsible for and what they have chosen to leave undone or to leave for others to complete. But scope is slippery because our definitions of it merely sound good: we have descriptive phrases about what is in and what is out of scope.

But as a project progresses, we find initially a trickle and eventually a tidal wave of concerns that seem to sit right on the border between what is inside and outside the scope. Also, one of the most puzzling aspects of an application’s scope is the way a very similar scope description for one organization translates into a drastically different size of implementation project when applied to another organization. In this paper, we explore what scope really is, its relationship to the effort required to implement a project, and how an analogy to fractal geometry bears on the problem and offers an interesting metaphor for thinking about application scope issues.

What is Fractal Geometry?

Fractal geometries are based on recursive functions that describe a geometric shape. For instance, in the following illustrations we see a simple geometric shape, a triangle. In the second figure, we see a shape that’s been defined by a function which says, “in the middle of each side of the triangle put another triangle; and in the middle of each side of that triangle put another, and yet another, etc.” We’ve all seen the beautiful, multicolored, spiral shapes that have been created with Mandelbrot sets. These are all variations on the fractal geometry idea.

There are several things of interest about fractal geometries besides their aesthetic attractiveness. The main thing to note in this example, and in most others like it, is that while the contained area of the shape grows only a little with each iteration of the function (each increase in the resolution of the shape), the perimeter keeps growing by a constant factor. In this case, with each iteration the perimeter gets 33% longer: any one side that was three units long has its middle unit removed and replaced with two units of equal length, so that side now has four thirds of its former length. This is repeated on every side, and at every level.

If we were to continue to run the recursion, the detail would get finer and finer until the resolution was such that we could no longer see all the details of the edges. And yet, as we magnified the image, we would see that the detail was in fact there.

Fractal geometers sometimes say that Britain has an infinitely long coastline. The explanation is that if you took a low-resolution map of Britain and measured the coastline, you would get a particular figure. But as you increase the resolution of your map, you find many crags and edges and inlets that the previous resolution had omitted. If you were to add them up, the total coastline would now be greater. At an even higher resolution it would be greater again, and supposedly so ad infinitum. So the perimeter of a fractal shape is a matter of the resolution at which we observe and measure it. Hold that thought.
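
The arithmetic described here (the Koch snowflake) can be checked with a short sketch; the function below is my own illustration, not part of the paper.

```python
# Koch-snowflake arithmetic: each iteration multiplies the perimeter by 4/3
# (33% longer), while the enclosed area converges to a finite limit
# (8/5 of the original triangle's area).
import math

def koch_perimeter_and_area(iterations: int, side: float = 1.0):
    sides = 3                              # edges of the starting triangle
    length = side                          # length of each edge
    area = math.sqrt(3) / 4 * side ** 2    # area of the starting triangle

    for _ in range(iterations):
        # each edge sprouts a new triangle on its middle third
        area += sides * math.sqrt(3) / 4 * (length / 3) ** 2
        sides *= 4      # every edge is replaced by four edges...
        length /= 3     # ...each a third as long, so perimeter x 4/3
    return sides * length, area

for n in (0, 1, 2, 5, 10):
    p, a = koch_perimeter_and_area(n)
    print(f"iteration {n}: perimeter = {p:.3f}, area = {a:.4f}")
# The perimeter grows without bound; the area approaches ~0.6928,
# i.e. 8/5 of the original 0.4330.
```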

Application Size

When we talk about the size of an application, we are usually concerned with some aspect of it that could change in such a way that our effort to develop or implement it would change. For instance, a common size metric for applications has been the number of lines of code. A system with one million lines of code is considered in many ways ten times as large as one with 100,000 lines of code. That is a gross simplification, but let’s start there: it would take ten times as long to write one million lines of code as it would to write 100,000 lines of code.

Similarly, it would take ten times as long, on average, to modify part of a system that has one million lines of code as opposed to one with 100,000 lines of code. But lines of code was a reasonable metric only when most of our systems were written by hand, in procedural code, in languages that were at least comparable in their expressiveness.

Nowadays, a given system will be built of handwritten code plus generated code plus graphically designed components, style sheets, inherited code, libraries, etc. The general concepts of size and complexity still exist, but we are more likely to find the complexity in some other areas. One of the main areas where we find complexity in information systems is the size and variety of the user interfaces. For instance, a system with 100 different user interfaces can be thought of as one-tenth as complex as one with 1000 different user interfaces.

This relationship held with the lines-of-code metric in the old days because each screen was written by hand; the more screens, the more lines of code. Nowadays, although the connection between the screens and the lines of code may not be as valid because much of the code is generated, there is still ten times as much complexity in designing the user interfaces, laying them out, discussing them with users, etc. Another metric that tends to indicate a larger application is its number of APIs (application programming interfaces).

An API is a function call that the system has made public and that can be used by other systems. An application API could be almost anything: it could be get_quote(order_number) or it could be post_ledger(tran). The size of applications that support APIs tends to be proportional to the number of APIs they have published. This is partly because each API must have code behind it to implement the behavior described, and partly because the act of creating the signature and the interface requires definition, design, negotiating with other system users, and maintaining and upgrading these interfaces.

Finally, one other metric that tends to move proportionally with the size of application systems is the size of their schema. The schema is the definition of the data that is being maintained by the application. Classically, it is the tables and columns in the database. However, with object oriented or XML style systems, the schema will have a slightly different definition.

An application for a database with 10,000 items in its schema (which might be 1,000 tables and 9,000 columns in total), all other things being equal, will very often be ten times as complex as an application with 1,000 items in its schema. This is because each of those individual attributes is there for some reason and typically has to be moved from the database to a screen, an API, or an algorithm, and has to be moved back when it has been changed, etc.
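
As a rough illustration of the schema-size metric, here is a sketch that counts schema “items” (tables plus columns) in a SQLite database; the database file name is made up, and the count is only a crude proxy for size.

```python
# Count schema "items" (tables + columns) as a crude proxy for application
# size, using SQLite's catalog. The database path is hypothetical.
import sqlite3

def schema_item_count(db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    columns = 0
    for table in tables:
        # PRAGMA table_info returns one row per column of the table
        columns += len(conn.execute(f"PRAGMA table_info({table})").fetchall())
    conn.close()
    return len(tables) + columns

# All other things being equal, an application over a 10,000-item schema
# will tend to be roughly ten times the size of one over a 1,000-item schema.
print(schema_item_count("inventory.db"))
```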

Surface Area vs. Volume

Now, imagine an application as a shape. It might be an amorphous amoeba-like shape or it might be a very ordered and beautiful snowflake-like shape that could have been created from fractal geometry. The important thing to think about is the ratio of the surface area to the volume of the application. My contention is that most business applications have very little volume. What would constitute volume in a business application would be algorithms or rules about what is allowed and permissible. And while it may first seem that this is the preponderance of what you find when you look at a business application, the truth is far from that.

Typically, 1 or 2% of the code of a traditional business application is devoted to algorithms and/or true business rules. The vast majority is dealing with the surface area itself. To try another analogy, you may think of a business system as very much like a brain, where almost all the cognitive activity occurs on the surface and the interior is devoted to connecting regions of the surface. This goes a long way towards explaining why the surface of the brain is a convoluted and folded structure rather than a smooth organ like a liver.

Returning to our business application, let’s take a simple example of an inventory control system. Let’s say in this example that the algorithms of the application (the “volume”) have to do economic order quantity analysis, in other words, calculate how fast inventory is moving and determine, therefore, how much should be reordered and when. Now, imagine this is a very simple system. The surface area is relatively small compared to the volume and would consist of user interfaces to process receipts, to process issues, perhaps to process adjustments to the inventory, and a resultant inventory listing.
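
For a flavor of the algorithmic “volume” in this example, here is the textbook economic order quantity formula as a short sketch; the figures are invented and the inventory system itself is hypothetical.

```python
# The classic economic order quantity (EOQ) formula: the kind of algorithm
# that constitutes the "volume" of the example inventory system.
import math

def economic_order_quantity(annual_demand: float,
                            cost_per_order: float,
                            holding_cost_per_unit: float) -> float:
    """Order quantity that minimizes total ordering plus holding cost."""
    return math.sqrt(2 * annual_demand * cost_per_order / holding_cost_per_unit)

# e.g. 12,000 units/year demand, $50 per order, $2/unit/year to hold
print(round(economic_order_quantity(12_000, 50.0, 2.0)))  # ~775 units per order
```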

The schema, too, would be simple and, in this case, would have no API. Here is where it becomes interesting. This exact same system, implemented in a very small company with a tiny amount of inventory, would be fairly small and easy to develop and implement. Now take the same system and attempt to implement it in a large company.

The first thing you’ll find is that whatever simplistic arrangement we made for recording where the inventory was would now have to be supplemented with a more codified description, such as bin and location; for an even larger company that has automated robotic picking, we might need a very precise three-dimensional location.

To take it further, a company that stores its inventory in multiple geographic locations must keep track of not only the inventory in each of these locations but also the relative cost and time to move inventory between locations. This is a non-issue in our simplistic implementation.

So it would go with every aspect of the system. We would find in the large organization enough variation in the materials themselves that would warrant changes in the user interfaces and the schema to accommodate them, such as the differences between raw materials and finished goods, between serialized parts and non-serialized parts, and between items that do and do not have shelf life. We would find the sheer volume of the inventory would require us to invent functionality for cycle counting, and, as we got large enough, special functionality to deal with count errors, to hold picking while cycle count was occurring, etc.

Each of these additions increases the surface area, and, therefore, the size of the application. I’m not suggesting at all that this is wasteful. Indeed, each company will choose to extend the surface area of its application in exactly those areas where capital investment in computerizing their process will pay off. So what we have is the fractal shape becoming more complex in some areas and not in others, very dependent on the exact nature of the problem and the client site to which it is being implemented.

Conclusion

Once we see an application in this light, that is, as a fractal object with most of its size in or near its perimeter or its surface, we see why the “same” application implemented in organizations of vastly different size will result in applications of vastly different size. We also see why a general description of the scope of an application may do a fair job of describing its volume but can miss the mark wildly when describing its surface area and its perimeter.

To date, I know of no technique that allows us to scope and estimate the surface area of an application until we’ve done sufficient design, nor to analyze return on investment to determine just how far we should go with each aspect of the design. My intention with this white paper is to make people aware of the issue and of the sensitivity of project estimates to fine details that can rapidly add up.

Part 5: Good terms are important for socializing the ontology

In my previous post, I explained why, when building an enterprise ontology, it is a good idea to focus on concepts first and to decide on terms later. Today, we will discuss what to do when ‘later’ arrives. But first, if terms don’t matter from a logic perspective, why do we care about them? In short, they are essential for learning and understanding the ontology.

This is true even if there is a single developer, who should be able to know the meaning of a term at a glance, without having to rely on memory. It is more important if there are multiple developers. However, the most important reason to have good terms is that without them, it is nearly impossible for anyone else to become familiar with the ontology, which in turn severely limits its potential for being used. How do we choose good terms for an enterprise ontology? An ideal term is one that is strongly suggestive of its meaning and would be readily understood by anyone in the enterprise who needs to know what is going on and is unfamiliar with terms tied to specific applications.

A term being strongly suggestive of its meaning just makes things easier for everyone. It requires not only that the term is (or could be, with minimal disruption) commonly used across the enterprise to express the concept it is naming, but also that the same term is not used for a variety of other things. Such ambiguity is the enemy. Because an enterprise ontology is designed to represent the real world of the given enterprise independent of its applications, it is important that the terms be independent of any particular application.

This is easier said than done, as the terminology of a widely used application in an enterprise often becomes the terminology of the enterprise in general. Individuals in the enterprise forget that various terms are tied to a particular application and vendor just like we forget that ‘Kleenex’ is tied to a particular brand and manufacturer. Also, because the enterprise ontology is intended for use across the whole enterprise, it is not a good idea to use jargon terms that are only understood by specialists in a given area, and will likely be confusing to others. Future applications that are based on the enterprise ontology can introduce local terms that are understood by the narrower group of people.

To reap the most rewards from the enterprise ontology in the long term, it is important to explicitly link the terms in the application to the concepts in the enterprise ontology. This way, the terms in the application effectively become synonyms for the terms in the ontology reflecting the mapped concepts.


Part 4: Identify the underlying concepts

In the previous posts in the series, we discussed how it is important to focus on the concepts first and then the terms. Today we discuss identifying what the central concepts are in the enterprise. Every enterprise typically has a small handful of core concepts that all the other concepts hinge on. For retail manufacturing, it is all about products and specifications (of products), which leads to manufacturing. For health care, everything hinges on patients and providing care (to patients), which is the driver for diagnosis and treatment procedures. But how do we identify those core concepts? Unfortunately, there is no magic answer.

The trick is to get into beginner’s mind and start asking basic questions. Sometimes it takes a while before it is clear what the core concepts are. One good sign that you have them is that everything seems to click nicely into place: this small set is the distilled essence of a complex web of ideas. Once identified, this small handful of concepts becomes the glue that holds the enterprise ontology together, as well as the basis for the story used to explain and socialize it to stakeholders when it is ready.


Part 3: Concepts first, then terms

In my previous blog, I described how for very broad and general terms, it can be nearly impossible to get a roomful of experts to agree on a definition of the term. However, it can be relatively easy to identify a small set of core concepts that everyone agrees are central to what they are talking about when they use that particular term.

In this blog, we explore the role of concepts vs. terms in the ontology engineering process more broadly; that is, focusing on all terms, not just the more challenging ones. First of all, it is important to understand the role of terms when building an ontology in a formal logic formalism such as OWL. Basically, they don’t matter. Well, that’s not quite true. What is true is that from the perspective of the logic, the formal semantics, and the behavior of any inference engine that uses the ontology, they don’t matter.

You could change any term in an ontology, or all of them, and logically the ontology is still exactly the same. A rose by any other name smells just as sweet. You can call it ‘soccer’ or ‘football’, but it is still the same game. So, especially in the early stages of building the ontology, it is important to focus first on getting the concepts nailed down, and to defer any difficult discussions about terms. Of course, you have to give each concept a name so you can refer to it in the ontology and when talking to anyone else interested in the ontology. If there is a handy name that people are generally happy with, then just use that. When no good term comes to mind, use a long descriptive term that is suggestive of the meaning, and get back to it later.

Spectrograph

Last night at the EDM Council meeting, Dave Newman from Wells Fargo used spectroscopy as an analogy for the process of decomposing business concepts into their constituent parts. The more I’ve been thinking about it, the more I like it.

Last week I was staring at the concept “unemployment rate” as part of a client project. Under normal light (that is, using traditional modeling approaches) we’d see “unemployment rate” as a number, probably attach it to the thing that the unemployment rate was measuring, say the US Economy, and be done with it. But when we shine the semantic spectrometer at it, the constituent parts start to light up. It is a measurement. Stare at those little lines a bit harder: the measurement has a value (because the term “value” is overloaded, in gist we’d say it has a magnitude) and the magnitude is a percentage. Percentages are measured as ratios, and this one (stare a little harder at the spectrograph) is the ratio of two populations (in this case, groups of humans). One population consists of those people who are not currently employed and who have been actively seeking employment over the past week, and the other is the first group plus those who are currently employed.

These two populations are abstract concepts until someone decides to measure unemployment. At that time, the measurement process has us establish an intensional group (say, residents of Autauga County, Alabama) and perform some process (maybe a phone survey) on some sample (a sub-population) of the residents. Each contacted resident is categorized into one of three sub-sub-populations: currently working; currently not working and actively seeking work; and not working and not actively seeking work. (Note: there is another group that logically follows from this decomposition and is not of interest to the Bureau of Labor Statistics, but is of interest to recruiters: working and actively seeking employment.) Finally, the measurement process dictates whether the measure is a point in time or an average of several measures made over time.

This seems like a lot of work for what started as just a simple number. But look at what we’ve done: we have a completely non-subjective definition of what the concept means. We have a first-class concept that we can associate with many different reference points; for example, the same concept can be applied to national, state, or local unemployment. An ontology will organize this concept in close proximity to other closely related concepts. And the constituent parts of the model (the populations, for instance) are now fully reusable concepts as well. The other thing of interest is that the entire definition was built out of reusable parts (Magnitude, Measurement, Population, Measurement Process, Residence, and Geographic Areas) that existed (in gist) prior to this examination. The only thing that needed to be postulated to complete this definition was what would currently be two taxonomic distinctions: working and seeking work. David, thanks for that analogy.
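
As a rough sketch of this decomposition, here is how the pieces might be lined up in code; the class and field names are my own shorthand, loosely inspired by gist’s Magnitude/Measurement vocabulary rather than taken from the actual OWL.

```python
# A sketch of the unemployment-rate decomposition: a measurement whose
# magnitude is the ratio of two populations. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class Population:
    description: str
    count: int          # size established by some measurement process

@dataclass
class RatioMagnitude:
    numerator: Population
    denominator: Population

    @property
    def percentage(self) -> float:
        return 100.0 * self.numerator.count / self.denominator.count

@dataclass
class Measurement:
    of: str             # what is being measured, e.g. a geographic area
    magnitude: RatioMagnitude

# e.g. a (made-up) phone survey of a sample of county residents
unemployed = Population("not employed, actively seeking work in past week", 1_200)
labor_force = Population("the unemployed plus those currently employed", 24_000)

rate = Measurement(of="Autauga County, Alabama",
                   magnitude=RatioMagnitude(unemployed, labor_force))
print(f"{rate.of}: {rate.magnitude.percentage:.1f}% unemployment")  # 5.0%
```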

The Profound Effect of Linked Data on Enterprise Systems

Jan Voskuil, of Taxonic, recently sent me this white paper. It is excellent. It’s so good I put it in the taxonomic category, “I wish I would have written this”. But since I didn’t, I did the next best thing: Jan has agreed to let us host a copy here on our web site. Enjoy a succinct 18-page white paper suitable for technical and non-technical audiences. It is a very nice explanation of why the flexible data structures of linked data are making such a profound difference in the cost of changing enterprise systems.

Download whitepaper by Jan Voskuil – Linked Data in the Enterprise