SPARQL: Changing Instance URIs

In a prior blog (SPARQL: Updating the URI of an owl:Class in place) we looked at how to use SPARQL to rename a class in a triple store, using the example of renaming the class veh:Auto to veh:Car. The main steps were:

  1. change the instances of the old class to be instances of the new class
  2. replace the triples where the class is used in either the subject or object of the triple
  3. look around for anywhere else the old class name is used, and change accordingly.

The last step covers the fact that there may be a few other things you need to do to deal with all the consequences of renaming a class.  Today we will see how to handle the situation where your instances use a naming convention that includes the name of the class.  Let’s say the instances of Car (formerly Auto) all look like this: veh:_Auto_234 and veh:_Auto_12. We will want to change them to look like veh:_Car_234 and veh:_Car_12.

The main steps are:

  1. Figure out how you are going to use SPARQL string operations to create the new URI given an old URI.
  2. Replace triples using the oldURI in the object of a triple.
    1. Determine where the oldURI is used as the object in a triple, and use CONSTRUCT to preview the new triples using the results of step 1.
    2. Use DELETE and INSERT to swap out the old triples with the new URI in the object.
  3. Replace triples using the oldURI in the subject of a triple.
    1. Determine where the oldURI is used as the subject in a triple, and use CONSTRUCT to preview the new triples using the results of step 1.
    2. Use DELETE and INSERT to swap out the old triples with the new URI in the subject.

In practice, we do step 1 and step 2a at the same time.  We find a specific instance, and filter on just that one (e.g. veh:_Auto_234) to keep things simple. Because we will be using strings to create URIs, we have to spell the namespaces out in full, or else the URI will incorrectly contain the string “veh:” instead of the expanded form, which is: “http://ontologies.myorg.com/vehicles#”.

CONSTRUCT {?s ?p ?newURI}
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI.
       FILTER (?oldURI in (veh:_Auto_234))
       BIND (URI(CONCAT ("http://ontologies.myorg.com/vehicles#_Car_",
                         STRAFTER (STR(?oldURI),"_Auto_")))
             AS ?newURI)
       }

This should return a table something like this:

Subject                  Predicate      Object
veh:_TomJones            gist:owns      veh:_Car_234
veh:_JaneWrenchTurner    veh:repaired   veh:_Car_234
veh:_PeterSeller         veh:sold       veh:_Car_234

This tells you there are exactly three triples with veh:_Auto_234 in the object, and shows you what the new triples will be when you replace the old ones.   After this, you might want to remove the FILTER and see a wider range of triples, setting a LIMIT as needed. Now you are ready to do the actual replacement (step 2b).   This is what you do:

  1. Add a DELETE statement to remove the triple that will be replaced.
  2. Replace the “CONSTRUCT” with “INSERT”, leaving what is inside the braces unchanged.
  3. Leave the WHERE clause as it is, except to remove the FILTER statement, if it is still there (or just comment it out).
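
Applying those three steps to the CONSTRUCT query above gives something like this (still using the CONCAT/STRAFTER construction; a simpler alternative follows below):

DELETE {?s ?p ?oldURI}
INSERT {?s ?p ?newURI}
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI .
       BIND (URI(CONCAT ("http://ontologies.myorg.com/vehicles#_Car_",
                         STRAFTER (STR(?oldURI),"_Auto_")))
             AS ?newURI)
       }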

[Figure: Sample Graph of Triples]

This will do the change in place for all affected triples. Note that we constructed the URI from scratch, when all we really needed was a string replace.  The latter is simpler and more robust: CONCAT and STRAFTER give the wrong answer if the string “_Auto_” does not appear in the URI, whereas REPLACE leaves such URIs unchanged. Here is the query to execute, with the simpler string operation:

DELETE {?s ?p ?oldURI}
INSERT {?s ?p ?newURI }
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI .
       BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?newURI)
       }

Step 3 is pretty much identical, except the old URI appears in the subject rather than the object (a sketch follows the list below).  In fact, you can combine steps 2 and 3 into a single query.  There are a few things to watch out for:

  1. VERY IMPORTANT: make sure you do steps 2 and 3 in order.  If you do step 3 first, you will blow away the rdf:type statements that are needed to do step 2.
  2. It is easy to make mistakes; back up the store and work on a copy.
  3. When creating URIs from strings, use full namespaces rather than the abbreviated qname format.
  4. Check the count of all the triples before and after each time you make a change, and track down any differences.
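
For reference, here is a sketch of the step 3 query (same namespaces as above). The old URI now appears in the subject position, and the rdf:type triple is itself one of the triples being rewritten, which is why step 2 must already be done:

DELETE {?oldURI ?p ?o}
INSERT {?newURI ?p ?o}
WHERE {?oldURI rdf:type veh:Car .
       ?oldURI ?p ?o .
       BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?newURI)
       }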

SPARQL: Updating the URI of an owl:Class in place

Background

We have been developing solutions for our clients lately that involve loading an ontology into a triple store and building a UI for data entry. One of the challenges is how to handle renaming things.  If you want to change the URI of a class or property in Protégé, you load all the ontologies and datasets that use the old URI and use the rename entity command in the Refactor menu.  Like magic, all references to the URI are changed with the press of the Enter key.   In a triple store, it is not so easy. You have to track down and change all the triples that refer to the old URI. This means writing and executing SPARQL queries using INSERT and DELETE to make changes in place.  Below is an outline of how to rename a class.

Steps of Change

Let ?oldClass and ?newClass be variables bound to the URI for the old and new classes respectively – e.g. ?oldClass might be veh:Auto and ?newClass might be veh:Car.   The class rename operation involves the following steps:

  1. Change all instances of ?oldClass to be instances of ?newClass instead. e.g.
    veh:myTeslaS   rdf:type   veh:Auto is replaced with
    veh:myTeslaS   rdf:type   veh:Car
  2. Find and examine all the triples using ?oldClass as the object.  It may occur in triples where the subject is a blank node and the predicate is one of several predicates used for defining OWL restrictions, e.g. _:123B456x78  owl:someValuesFrom   veh:Auto
    Replace triples with the old class URI in the object with new triples using the  new URI. Note, you might want to do the first part of the next step before doing the replace.
  3. Find and examine all the triples using ?oldClass as the subject. It may occur in triples declaring subclass relationships and comments, as well as the triple creating the class in the first place, e.g. veh:Auto   rdf:type   owl:Class
    Replace triples with the old class URI in the subject with new triples using the  new URI.
  4. Look around for anywhere else that the old name may be used.  Possibilities include:
    1. If your instances use a naming convention that includes the name of the class (e.g. veh:_Auto_234), then you will have to find all the URIs that start with veh:_Auto and use veh:_Car instead.  We will look into this in a future blog.
    2. The class name may occur in comment strings and other documentation.
    3. It may also be used in SPARQL queries that are programmatically called.

Here is some SPARQL for how to do the first step.

# Rename class veh:Auto to veh:Car
# For each ?instance of ?oldClass
# Replace the triple <?instance rdf:type ?oldClass>
#               with <?instance rdf:type ?newClass>
DELETE {?instance rdf:type ?oldClass}
INSERT {?instance rdf:type ?newClass}
WHERE  {BIND (veh:Auto as ?oldClass)
        BIND (veh:Car as  ?newClass)
        ?instance rdf:type ?oldClass . }

Gotchas

There are many ways to make mistakes here. Watch for the following:

  • Having the DELETE before the INSERT seems wrong; fear not, it is just an oddity of SPARQL syntax.
  • Save out a copy of the triple store, in case things go wrong that are hard to undo.  One way to do this is to make all the changes to a copy of the triple store before making them in the production one. Do all the steps and make sure things worked.
  • Make sure your namespaces are defined.
  • Before you make a change in place using INSERT and DELETE, always use CONSTRUCT to see what new triples will be created (sketches of this preview and of a count check follow this list).
  • Think about the order in which you replace triples.  You can easily end up replacing, in one step, triples that you need in order to find the triples to replace in the next step.
  • Always check the total count of triples before and after an operation that replaces triples. Generally it should be the same; track down any exceptions.  The count may be lower due to duplicate triples that occur in different named graphs.
  • A cautious approach would be to first insert the new triples and then, in a second step, remove the old ones.  I tried this and it did not work; it seems like a bug.  Throw caution to the wind and do the delete and insert at once. You have a backup, and once you get the hang of it, the extra step will just be extra work.
  • It may not be possible to fully automate changes in comments and SPARQL queries that are used programmatically.  Check to see what needs to change, and what doesn’t.
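
To make the CONSTRUCT preview and the triple count check concrete, here are minimal sketches for the step 1 rename above (same namespaces assumed):

# Preview the new triples before running the DELETE/INSERT
CONSTRUCT {?instance rdf:type ?newClass}
WHERE  {BIND (veh:Auto as ?oldClass)
        BIND (veh:Car as  ?newClass)
        ?instance rdf:type ?oldClass . }

# Total triple count, to compare before and after the change
SELECT (COUNT(*) AS ?tripleCount)
WHERE  {?s ?p ?o}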

What Next?

After you get step 1 working, try out steps 2 and 3 on your own; all you need to do is some straightforward modifications to the above example.  Step 4 involves more exploration and custom changes.

In an upcoming blog, we explore one of those changes.  Specifically, if your naming convention for instances uses the class name in the URI then those instance URIs will have to change (e.g. from veh:_Auto_2421 to veh:_Car_2421).


Why Not to Use Boolean Datatypes in Taxonomies

Many taxonomies, especially well-designed taxonomies with many facets, have dimensions that consist of very few categories, often just two; treating these as Booleans may cause more harm than it is worth.


It is tempting to give these Boolean-like tags, such as “Yes”/”No” or “Y”/”N” or “True”/”False”, or even near-Booleans like “H”, “M”, “L”.  I’m going to suggest in this article not doing that, and instead using self-describing, meaningful names for the categories.

Before I do, let me offer a bit of color commentary on the types of situations where this shows up.  Recently we were designing a Resolution Planning system in the financial industry.  In the course of this design it became tempting to have categories for inter-affiliate services such as “resolution criticality,” “materiality,” or “impact on reputation”; in fact, these categories were part of the requirements from the regulators.  It was tempting to have the specific terms within each category be something like “yes”/”no” or “high”/”medium”/”low”, partly because you may want the reports to have columns like “resolution critical” with “yes” or “no” in the rows.

That’s the backdrop.  I can speak from experience that it is very tempting to just create two taxonomic categories “Yes” and “No.”  There are actually two flavors of this temptation:

  • Just create two terms, “yes” and “no”, and use them in all the places they occur; that is, there is an instance with a URI like :_Yes and an instance with a URI like :_No, with labels “Yes” and “No”.
  • Create different “yes” and “no” instances for each of the categories (that is, there is a URI with a name like :_resCrit_Yes which has the label “Yes”, and elsewhere a URI with a name like :_materiality_Yes).

I’m going to suggest that both are flawed.  The first requires us to have a new property for every distinction we make.  In other words we can’t just say “categorizedBy” as we do with other categories, because you would need the name of the property to find out what “yes” means.  While at first this seems reasonable, it leads to the type of design we find in legacy systems, with an excessive number of properties that have to be modeled, programmed to and learned by consumers of the data.  The second approach is closer to what we will advocate here, but doesn’t go far enough as we’ll see.

My perspective here is based on two things:

  • Years of forensic work, profiling and reverse engineering trying to deduce what existing data in legacy systems actually means, plus
  • My commitment to the “Data Centric Revolution” wherein data becomes the permanent artifact and applications come and go.  This is not the way things are now.  In virtually all organizations now when people want new functionality they implement new applications, and “convert” their data from the old to the new.  Moving to truly data centric enterprises will take some changes to points of view in this area.

I am reminded of a project we did with Sallie Mae, where we were using an ontology as the basis for their Service Oriented Architecture messages.  Every day we’d tackle a few new messages and try to divine what the elements and attributes in the legacy systems meant.  We would identify the obvious elements and have to send the analysts back to the developers to try to work out the more difficult ones. After several weeks of this I made an observation: the shorter the length of the element, the longer it would take us to figure out what it meant, with Booleans taking the longest.

I’ve been reflecting on this for years, and I think the confluence of our Resolution Planning application and the emergence of the Data Centric approach have led me to what the issue was and is.

“Yes” or even “True” doesn’t mean anything in isolation.  It only means something in context.  Yes is often the answer to a question, and if you don’t know what the question was, you don’t know what “yes” means.  And in an application-centric world, the question is in the application.  Often it appears in the user interface.  Then the reporting subsystem reinterprets it.  Usually, due to space restrictions, the reporting interpretation is an abbreviated version of the user interface version.  So the user interface might say “Would the unavailability of this service for more than 24 hours impair the ability for a resolution team to complete trades considered essential to continued operation of the financial system as a whole?” And the report might say “Resolution Critical.”  Of course the question could just as well be expressed the other way around: “Could a team function through the resolution period without this service?” (Where “Yes” would mean approximately the same as “No” to the previous question).

In either event, Boolean data like this does not speak for itself.  The data is inextricably linked to the application, which is what we’re trying to get beyond.

If we step back and reflect on what we’re trying to do we can address the problem.  We are attempting to categorize things.  In this case we’re trying to categorize “Inter-affiliate Services.”  The categories we are trying to put things in are categories like “Would be Essential in the Event of a Resolution” and “Would not be Essential in the Event of a Resolution.”   I recognize that this sounds a lot like “Yes” and “No” or perhaps the slightly improved “Essential” and “Non-Essential.”  Now if you ask the question “Would the unavailability of this service for more than 24 hours impair the ability for a resolution team to complete trades considered essential to continued operation of the financial system as a whole?” the user’s answer “Yes” would correspond to “Would be Essential in the Event of a Resolution.”  If the question were changed to “Could a team function through the resolution period without this service?” we would map “No” to “Would be Essential in the Event of a Resolution.”
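
As a rough sketch of what this looks like as triples (the ex: URIs and the categorizedBy property here are made up for illustration), the service points at a category whose name carries its full meaning:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.com/resolution#>

INSERT DATA {
  # The category says what it means in its own label
  ex:_EssentialInResolution rdfs:label
      "Would be Essential in the Event of a Resolution" .

  # A service is simply categorized by that category
  ex:_TradeSettlementService ex:categorizedBy ex:_EssentialInResolution .
}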

Consider the implication.  With the fully qualified categories, you get several advantages:

  • The data does speak for itself.  You can review the data and know what it means without having to refer to application code, and without being forever dependent on the application code for interpretation.
  • You could write a query and interpret the results, without needing labels from the application or the report.
  • You could query for all the essential services.  Consider how hard this would be in the Boolean case: you can query for things that are in the Resolution Critical mini taxonomy with the value of “Yes,” but you don’t really know what “Yes” means.  With the fully qualified category you just query for the things that are categorized by “Would be Essential in the Event of a Resolution” and you’ve got it (see the sketch after this list).
  • You can confidently create derivative classes.  Let’s say you wanted the set of all departments that provided resolution critical services.  You would just create a restriction class that related the department to the service with that category.  You could do it with the Boolean, but you’d be continually dogged by the question “what did ‘yes’ mean in this context?”
  • You can use the data outside the context in which it was originally created.  In a world of linked data, it will be far easier to consume and use data that has more fully qualified categories.
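
For example, the “all essential services” query from the list above might look roughly like this (again with the made-up ex: names from the sketch above):

PREFIX ex: <http://example.com/resolution#>

SELECT ?service
WHERE { ?service ex:categorizedBy ex:_EssentialInResolution }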

Finally, if you find you really need to put “Yes” on a report, you can always put an alternate display label on the category; that way the report can still show “yes” while the data itself says what it means, without having to refer to the application.

In conclusion: it is often tempting to introduce Boolean values, or very small taxonomies that function as Booleans, into your ontology design.  This leads to long-term problems with coupling between the data and the application, and hampers maintenance and long-term use of the data.

Preparing and using these more qualified categories only takes a bit more up front design work, and has no downside to implementation or subsequent use.

The Data-Centric Revolution: Data-Driven Resolution Planning

Perhaps the way to be ready for resolution is to flip from document-centric to data-centric. Build a system that expresses, in real time, the on-going agreements between the many legal entities within the firm. Capture who is doing what for whom.

We just completed a project with an Investment Bank, whose name you would recognize in a minute, but I shall refrain from revealing it, as they are very shy.

This project was interesting.  It seems the Federal Reserve, as well as the British equivalent, the PRA, have decided that in the event of another Lehman Brothers they’d like to be far more prepared.  Turns out there wasn’t “a” Lehman Brothers (not even a couple of siblings); there were dozens of material entities and of course thousands of entities in total.  Part of this was due to regulatory pressure: local countries wanted their banks to be domiciled and incorporated locally.  And much of it was opportunistic, logistical, whatever.  Regardless of the reason, when the shinola hit the fan it was not pretty.  No one knew which entity owned the debts, which entity owned the building, or employed the people, or had the rights to their software, or was even paying the utilities.  Needless to say it was an exercise that many don’t want to repeat.

So the regulators have put out guidelines for “Resolution Planning” aka: “The living will.” The basic idea is to understand and document the way you do business such that in the event of another melt down, when the Feds burst down the doors with their hazmat suits on, they will at least have a play book for what to do.

But regulators, being regulators, don’t tell the banks what to do.  They suggest.  And hint.  And then poke a bit if the banks aren’t getting the message.   So over the last few years the banks have been doing what banks do: they’ve been complying.  Or at least trying to comply.  And complying usually looks like the legal team and the compliance team getting together and writing.  And writing.  And writing.  Man, these people can write.  Thousands of pages describing how it all works and how to unwind things in the hopefully unlikely event things need unwinding.

Only problem is, the regulators decided (I should say hinted at) that they didn’t want a play book that made War and Peace look like a short story, they wanted something they could use.  Oops.

Not only was novel reading not something they wanted to do in a crisis, they knew that the novel they were reading was of course, hopelessly out of date.

Enter our hero (he too shall go unnamed, the same shyness thing that overtakes so many wall street types) with the idea that maybe this thing could be data driven rather than document driven.

This whole document driven/ data driven thing has been playing out in industry after industry.  What is bizarre is that in the second decade of the 21st century (you remember, flying cars and moon colonies) we still have an incredible number of our systems that are document driven.  It’s not to say that they aren’t computerized, it’s just that the protagonists haven’t made the mental (and therefore system level) shift to put the data first and the document as a by-product.  The airline industry made the shift in the 80’s.  If you’re old enough to recall we used to get these three part things with red carbon paper that allowed the bearer (no kidding) to get on the airplane, no id, nothing.  Lose that red carbon paper document and it was going to take you a while to get your money back, let alone get on an airplane. You could get your money back for a lost ticket,  but it was akin to a Freedom of Information Act request: theoretically possible but you really didn’t want to do it.

Nowadays the airline industry still has documents; you can print them and show them to the nice TSA people and the gate attendants, but lose one, no biggy, print another.  Indeed before you go to the airport you can print a dozen if you’d like.

So the airline industry has made the shift.  Our observation is that most businesses have one foot on the station and one foot on the train with this one.  That is, they have made the shift in some of their systems and not others. We did a project recently for the Washington Secretary of State (unlike banks they aren’t weird about having their name in print). As part of that project we looked at the state of the States with regard to corporation and charity registration.  While everyone is computerized, most are still document centric. That is, the submitted document is the real thing, and the system or the operators excerpt a few key details to put into their databases. You can query the database, but the reality is in the document. You can tell when you’ve made the shift: the data is the real thing and you only print it for convenient reading.  When the transition is complete the data will be the registration, and a document will merely be the rendering of the data already captured.

Returning to our hero, he had an insight that perhaps the way to be ready for resolution would be to flip from document-centric to data-centric.  Build a system that expresses, in real time, the on-going agreements between the many legal entities within the firm.  Capture who is doing what for whom.

As obvious as this sounds it runs completely counter culture.  Banks set up departments in India (and other countries).  They send work to India.  India is part of the department.  And yet, in reality, India is another Legal Entity. The firm might go bankrupt.  The entity in India might go bankrupt. The entity in India might get hit by a flood or hurricane.  No one is thinking in these terms. Except the regulators.

So, we built a system. Based, of course, on semantic technology and model driven development.  It’s been a struggle to get managers to think of their daily handoffs as being in the scope of a service level agreement between two legal entities, but they are.  We are working through how to characterize work, and how to describe the controls that have been put in place to ensure the work is getting completed to standard. We’ve piloted the system.  In three months it went through literally hundreds of changes.  We ascribe about half the agility to semantic technology and half to model driven technology.

And the output of the system looks like documents.  Indeed we dump a lot of our data to Excel spreadsheets because that is what bankers are comfortable with. But it is data-centric.

So the regulatory review was this week.  We haven’t heard the feedback.  Typically it takes many weeks to find out: did the regulators like the idea of being data-centric?

While we wait, we reflect.  We’re pretty convinced.  This is inevitable.  To become more compliant in resolution planning we believe that banks will have to make the shift from document driven to data driven.

Written by Dave McComb

D3 the Easy Way

We’ve found ourselves working with D3 (d3js.org) more and more lately, both for clients and for our own projects. So far we’ve really just begun to scratch the surface of what it can do (if you’re unfamiliar, take a moment to browse the examples). Despite our relative lack of experience with the library, we’ve been able to crank out demos at a wonderfully satisfying pace thanks to our decision early on to focus on creating high-level, DRY abstractions. Though there is a ton of flexibility and extensibility, D3 does seem to have a few broad categories of visualizations based around the shape of the input data. The one we’re going to focus on here is the flare.json format, which looks roughly like this:


{
  "name": "flare",
  "children": [
    {
      "name": "analytics",
      "children": [
        {
          "name": "cluster",
          "children": [
            { "name": "AgglomerativeCluster", "size": 3938 },
            { "name": "CommunityStructure", "size": 3812 },
            { "name": "HierarchicalCluster", "size": 6714 },
            { "name": "MergeEdge", "size": 743 }
          ]
        }
      ]
    }
  ]
}


This format seems to be the basis of a large number of useful visualizations dealing with nested, tree-shaped data. The challenge is that in semantic systems it’s just not possible to construct a SPARQL query that returns data in a format anywhere near the “flare” example above. What you’re more likely to see is something structurally very similar to this:


{
"keys": ["thisClass", "superClass", "instances"],
"0": { "instances": 0,
"thisClass": "<http://ontologies.semanticarts.com/gist#Equipment>",
"superClass": "<http://ontologies.semanticarts.com/gist#PhysicalIdentifiableItem>" },
"1": { "instances": 0,
"thisClass": "<http://ontologies.semanticarts.com/gist#SomeParentClass>",
"superClass": "None" },
"2": { "instances": 1,
"thisClass": "<http://ontologies.semanticarts.com/gist#SomeChildClass>",
"superClass": "<http://ontologies.semanticarts.com/gist#SomeParentClass>" },
"3": { "instances": 3,
"thisClass": "<http://ontologies.semanticarts.com/gist#AnotherChildClass>",
"superClass": "<http://ontologies.semanticarts.com/gist#SomeParentClass>" }
}
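
For what it’s worth, a query that yields flat rows shaped like these might look roughly like this (a sketch, not the exact query behind the data above; standard owl and rdfs prefixes assumed):

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?thisClass ?superClass (COUNT(?i) AS ?instances)
WHERE {
  ?thisClass a owl:Class .
  OPTIONAL { ?thisClass rdfs:subClassOf ?superClass }
  OPTIONAL { ?i a ?thisClass }
}
GROUP BY ?thisClass ?superClass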

So what to do? One path we could take is to write a server-side endpoint that returns data in the right structure, but this has the disadvantage of being less flexible, not to mention it spreads out concerns of the visualization layer wider than they need to be. The other option is to do the conversion on the front end. Rather than dive into writing a bespoke method to convert the specific data on hand to a format tailored to the visualization we want, why not take a step back and write a reusable abstraction layer? The end result, without giving too much away, looks roughly like this:


var root = d3z.toFlareRecursive({
  data: json,
  parentKey: 'superClass',
  childKey: 'thisClass',
  // Build a display name like "gist:Equipment" from the full URI
  nameKey: function(obj) {
    var prefix = obj['thisClass'].match(/\/([a-zA-Z_]*)\#/)[1],
        name = obj['thisClass'].match(/\#(.*)>/)[1];
    return prefix + ":" + name;
  },
  // Size nodes by instance count, with a minimum so empty classes stay visible
  sizeKey: function(obj) {
    var size = obj.instances + 4.5 || 4.5;
    return size;
  }
});

The advantage of this over D3’s built-in nest() method is that it will recursively construct n-depth trees from flat data, as long as the relationships are consistently expressed. This eliminates, in a reusable and flexible way, one of the major time-consumers in creating these visualizations: the data conversion. But why stop there? We created abstractions for various visualizations as well. Now, creating an icicle graph (like the one above) is as simple as:


new d3z.Icicle({
  root: root,
  width: 2400,
  container: "#body"
});


This is still in the early stages of development so it’ll be some time before we release it to the public, but stay tuned.

Ontology and Taxonomy: Strange Bedfellows

Explore the relationship between Taxonomy and Ontology with this presentation by Michael Uschold  from a keynote talk at the International Conference on Semantic Computing. 

Click Here to View The PDF

The Menu (Taxonomy) vs. the Meal (Ontology)

Taxonomy and Thesauri:

  • Focus is on words, not concepts (the menu).
  • Relationships are between terms: synonym, hyponym, broader/narrower term.
  • Each term should refer to just one concept.

Ontology:

  • Focus is on concepts (the meal).
  • Relationships are between concepts.
  • Formal definitions.
  • Automated inference.

How do we bring it all together?

  • Understand where each approach adds the most value.
  • Find the touch points and link them all up.
  • Can everyone and every tool live in harmony?
  • It is not impossible; we are pushing hard and it gets easier!


White Paper: The Enterprise Ontology

At the time of this writing almost no enterprises in North America have a formal enterprise ontology. Yet we believe that within a few years this will become one of the foundational pieces to most information system work within major enterprises. In this paper, we will explain just what an enterprise ontology is, and more importantly, what you can expect to use it for and what you should be looking for, to distinguish a good ontology from a merely adequate one.

What is an ontology?

An ontology is a “specification of a conceptualization.” This definition is a mouthful but bear with me, it’s actually pretty useful. In general terms, an ontology is an organization of a body of knowledge or, at least, an organization of a set of terms related to a body of knowledge. However, unlike a glossary or dictionary, which takes terms and provides definitions for them, an ontology works in the other direction. An ontology starts with a concept. We first have to find a concept that is important to the enterprise; and having found the concept, we need to express it in as precise a manner as possible and in a manner that can be interpreted and used by other computer systems. One of the differences between a dictionary or a glossary and an ontology is that, as we know, dictionary definitions are not really processable by computer systems. But the other difference is that by starting with the concept and specifying it as rigorously as possible, we get definitive meaning that is largely independent of language or terminology. That is what the definition means when it says an ontology is a “specification of a conceptualization.” In addition, of course, we then attach terms to these concepts, because in order for us humans to use the ontology we need to associate the terms that we commonly use with those concepts.

Why is this useful to an enterprise?

Enterprises process great amounts of information. Some of this information is structured in databases, some of it is unstructured in documents or semi structured in content management systems. However, almost all of it is “local knowledge” in that its meaning is agreed within a relatively small, local context. Usually, that context is an individual application, which may have been purchased or may have been built in-house. One of the most time- and money-consuming activities that enterprise information professionals perform is to integrate information from disparate applications. The reason this typically costs a lot of money and takes a lot of time is not because the information is on different platforms or in different formats – these are very easy to accommodate. The expense is because of subtle, semantic differences between the applications. In some cases, the differences are simple: the same thing is given different names in different systems. However, in many cases, the differences are much more subtle. The customer in one system may have an 80 or 90% overlap with the definition of a customer in another system, but it’s the 10 or 20% where the definition is not the same that causes most of the confusion; and there are many, many terms that are far harder to reconcile than “customer.” So the intent of the enterprise ontology is to provide a “lingua franca” to allow, initially, all the systems within an enterprise to talk to each other and, eventually, for the enterprise to talk to its trading partners and the rest of the world.

Isn’t this just a corporate data dictionary or consortia of data standards?

The enterprise ontology does have many similarities in scope to both a corporate data dictionary and consortia data standard. The similarity is primarily in the scope of the effort: both of those initiatives, as well as enterprise ontologies, aim to define the shared terms that an enterprise uses. The difference is in the approach and the tools. With both a corporate data dictionary and a consortia data standard the interpretation and use of the definitions is strictly by humans, primarily system designers. Within an enterprise ontology, the expression of the ontology is such that tools are able to interpret and make inferences on the information when the system is running.

Download the White-paper to Read More.

Written by Dave McComb

Concrete Abstractions


Gist is based on something we call “concrete abstractions.”

Most upper-level ontologies are based on “abstract abstractions”; that is, they are based on philosophical ideas that might be correct, but it is counterproductive to try to convince business people and IT people of what they are and what they mean.

We have taken the same posture as Google has with schema.org: most of our classes are classes of “concrete” objects. Watch the video to get an idea of what we’re talking about.

Ontologies and Taxonomies

Are you struggling with how to make the best use of your company’s knowledge assets that have grown overly complex?  Have you wondered how to blend the more informal taxonomic knowledge with the more formal ontological knowledge?  This has been a real head-scratcher for us for quite a while now.  We described some breakthroughs we have made in the past couple of years on this front in a keynote talk at the International Conference on Semantic Computing in Newport Beach.

Ontologies and Taxonomies: Strange Bedfellows

Abstract:

In large companies, key knowledge assets are often unnecessarily complex, making them hard to understand, evolve and reuse. Ambiguity is at the root of the problem; it is often reflected in poorly structured information. We describe an approach using taxonomies and ontologies to root out ambiguity and create a set of building blocks that acts as a solid foundation for creating more useful structure.

We describe the challenges of working with both taxonomies and ontologies, and how we married them to provide a foundation that supports integration across a wide range of enterprise assets including spreadsheets, applications and databases.

Click here to view the presentation.

Written and presented by Michael Uschold


How to Run a Project Over Budget by 300-500%

A Playbook you Don’t want to Follow

A while back, I was working for a large consulting firm. When I was returning to the US from an overseas assignment, I was allowed to select the city I would return to. I told my boss, who was on the board of this firm, my choice. He counseled against it as apparently the office was being hollowed out, having just hosted the largest project write-off in the history of the firm. (This was a while ago so these numbers will seem like rounding errors to today’s consultants, but I think the lessons remain the same.)

When I found out that we had written off something like $30 million, I asked how anyone could possibly run a project over budget by that much. He said, it’s pretty hard, but you have to follow this specific playbook:

  • The consulting firm starts the project, creates an estimate, staffs up the project and goes to work (so far this is pretty much like any other project).
  • At about the time they’ve burned through most of the budget, they realize they’re not done, and not likely to finish anytime soon. At this point they declare a change in scope and convince the client to accept most of the projected overrun. Typically at this point it’s projected to be about 50%.
  • As they near the end of the extension it becomes obvious that they won’t hit the extended budget either. Senior management in the consulting firm recognizes this as well and sacrifices the project manager, and brings in Project Manager #2.
  • PM #2 has a very standard script (I don’t know if there is a school for this or if they all work it out on their own): “This is way worse than we thought. It’s not 90% complete (as the outgoing Project Manager had said). It’s not even 50% complete.” New estimates and budgets are drawn up, and the client is apprised of the situation. The client has a lot into the project at this point, but is also very reluctant to pay for all this mismanagement. Eventually both parties agree to split the cost of the overrun. The total budget is now between 250% and 300% of the original.
  • In order to spend all this extra budget, and to get some much-needed new talent on the team, PM #2 brings in more staff. If the project completes within the new (3rd) budget (and sometimes it does) you have a reluctantly satisfied client (at least they got their system) and satisfied consultants (even at half price for the last portion, they were making money).
  • Alas, sometimes even that doesn’t work. And when it doesn’t, back to the playbook. Bring in PM #3. PM #3 has to be very senior. This has to work. PM #3, in order to maintain his or her reputation, has to make this troubled project succeed.
  • PM #3 doesn’t miss a beat. “This is way worse than we thought…” (almost any number can be inserted at this point, but 400–500% of the original is not out of range.) At this point there is no more going back to the client. The consulting firm will eat the rest of the overrun. PM #3 will make sure the new number will absolutely assure success. The consulting firm accepts the write-off and finishes the project.

That is pretty much the playbook for how to run a project over budget by that amount. You might well ask, how did they manage to run over in the first place?

Tolstoy said, “Happy families are all alike; every unhappy family is unhappy in its own way.” And so it is with software projects. Each seems to go bad for a different reason. And if you do enough, the odds will catch up to you. But that will be a subject for another article.

Written by Dave McComb