White Paper: The Value of Using Knowledge Graphs in Some Common Use Cases

We’ve been asked to comment on the applicability of Knowledge Graphs and Semantic Technology in service of a couple of common use cases.  We will draw on our own experience with client projects, as well as some examples we have come across through networking with our peers.

The two use cases are:

  • Customer 360 View
  • Compliance

We’ll organize this with a brief review of why these two use cases are difficult for traditional technologies, then a very brief summary of some of the capabilities that these new technologies bring to bear, and finally a discussion of some case studies that have successfully used graph and semantic technology to address these areas.

Why is This Hard?

In general, traditional technologies encourage complexity, and they encourage it through ad-hoc introduction of new data structures.  When you are solving the immediate problem at hand, introducing a new data structure (a new set of tables, a new JSON structure, a new message, a new API, whatever) seems expedient.  What is rarely noticed is the accumulated effect of many, many small decisions taken this way.  We were at a healthcare client who admitted (they were almost bragging about it) that they had patient data in 4,000 tables across their various systems.  This pretty much guarantees you have no hope of getting a complete picture of a patient’s health and circumstances. No human could write a 4,000-table join, and no system could process it even if one were written.

This shows up everywhere we look.  Every enterprise application we have looked at in detail is 10 to 100 times more complex than it needs to be to solve the problem at hand.  Systems of systems (that is, the sum total of the thousands of application systems managed by a firm) are 100 to 10,000 times more complex than they need to be.  This complexity burdens the users who have to consume information (so many systems to interrogate, each arbitrarily different) and the developers and integrators who fight a rearguard action to keep the whole at least partially integrated.

Two other factors contribute to the problem:

  • Acquisition – acquiring new companies inevitably brings another ecosystem of applications that must be dealt with.
  • Unstructured information – a vast amount of important information is still represented in unstructured (text) or semi-structured forms (XML, JSON, HTML). Up until now it has been virtually impossible to meaningfully combine this knowledge with the structured information businesses run on.

Let’s look at how these play out in the customer 360 view and compliance.

Customer 360

Eventually, most firms decide that it would be of great strategic value to provide a view of everything that is known about their customers. There are several reasons this is harder than it looks.  We summarize a few here:

  • Customer data is all over the place. Every system that places an order, or provides service, has its own, often locally persisted set of data about “customers.”
  • Customer data is multi-formatted. Email and customer support calls represent some of the richest interactions most companies have with their clients; however, these companies find data from such calls difficult to combine with the transactional data about customers.
  • Customers are identified differently in different systems. Every system that deals with customers assigns them some sort of customer ID. Some of the systems share these identifiers.  Many do not.  Eventually someone proposes a “universal identifier” so that each customer has exactly one ID.  This almost never works.  In 40 years of consulting I’ve never seen one of these projects succeed.  It is too easy to underestimate how hard it will be to change all the legacy systems that are maintaining customer data.  And as the next bullet suggests, it may not be logically possible.
  • The very concept of “customer” varies widely from system to system. In some systems the customer is an individual contact; in others, a firm; in another, a role; in yet another, a household. For some it is a bank account (I know how weird that sounds, but we’ve seen it).
  • Each system needs to keep different data about customers in order to perform its specific function. Centralizing this imposes the burden of gathering, at customer on-boarding time, a great deal of data that may never be used by anyone.

Compliance

The primary reason that compliance related systems are complex is that what you are complying with is a vast network of laws and regulations written exclusively in text and spanning a vast array of overlapping jurisdictions.  These laws and regulations are changing constantly and are always being re-interpreted through findings, audits, and court cases.

The general approach is to carve off some small scope, read up as much as you can, and build bespoke systems to support it. The first difficulty is that there are humans in the loop throughout the process.  All documents need to be interpreted, and for that interpretation to be operationalized it generally has to be embodied in a hand-crafted system.

A Brief Word on Knowledge Graphs and Semantic Technology

Knowledge Graphs and Graph Databases have gained a lot of mind share recently as it has become known that most of the most valuable digital-native firms have a knowledge graph at their core:

  • Google – the Google Knowledge Graph is what has made its answering capability so much better than the keyword search on which the company launched. It also powers targeted ad placement.
  • LinkedIn, Facebook, Twitter – all are able to scale and flex because they are built on graph databases.
  • Most Large Financial Institutions – almost all major financial institutions have some form of Knowledge Graph or Graph Database initiative in the works.

Graph Databases

A graph database expresses all its information in a single, simple relationship structure: two “nodes” are connected by an “edge.”

A node is some identifiable thing.  It could be a person or a place or an email or a transaction.  An “edge” is the relationship between two nodes.  It could represent where someone lives, that they sent or received an email, or that they were a party to a transaction.

A graph database does not need to have the equivalent of a relational table structure set up before any data can be stored, and you don’t need to know the whole structure of the database and all its metadata to use a graph database.  You can just add new edges and nodes to existing nodes as soon as you discover them.  The network (the graph) grows organically.

The most common use cases for graph databases are analytic.  There is a whole class of analytics that makes use of network properties (e.g., how closely x is connected to y, or what the shortest route is from a to b).
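
To make the idea concrete, here is a minimal sketch of a tiny graph and a network-style query. We use Python’s rdflib library purely as a convenient stand-in for a graph store, and the people, edges, and namespace are invented for illustration.

```python
# A minimal sketch of nodes and edges, using rdflib as a stand-in for a
# graph database.  All names and the namespace are illustrative only.
from rdflib import Graph, Namespace

EX = Namespace("http://example.com/demo#")
g = Graph()
g.bind("ex", EX)

# Nodes and edges can be added as they are discovered -- no table design up front.
g.add((EX.Alice, EX.knows, EX.Bob))
g.add((EX.Bob, EX.knows, EX.Carol))
g.add((EX.Carol, EX.worksFor, EX.AcmeCorp))

# A network-style question: who is Alice connected to, directly or indirectly,
# through one or more "knows" edges?  (A SPARQL 1.1 property path.)
results = g.query("""
    PREFIX ex: <http://example.com/demo#>
    SELECT ?person WHERE { ex:Alice ex:knows+ ?person }
""")
for row in results:
    print(row.person)   # ex:Bob, then ex:Carol
```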

Knowledge Graphs

Most graph databases focus on low level data: transactions, communications, and the like. If you add a knowledge layer onto this, most people refer to this as a knowledge graph.  The domain of medical knowledge (diseases, symptoms, drug/drug interaction, and even the entire human genome) has been converted to knowledge graphs to better understand and explore the interconnected nature of health and disease.

Often the knowledge in a knowledge graph has been harvested from documents and converted to the graph structure.  When you combine a knowledge graph with specific data in a graph database the combination is very powerful.

Semantic Technology

Semantic Technology is the open standards approach to knowledge graphs and graph databases.  (Google, Facebook, LinkedIn and Twitter all started with open source approaches, but have built their own proprietary versions of these technologies.)  For most firms we recommend going with open standards.  There are many open source and vendor supported products at every level of the stack, and a great deal of accumulated knowledge as to how to solve problems with these technologies.

Semantic technologies implement an alphabet soup of standards, including RDF, RDFS, OWL, SPARQL, SHACL, R2RML, JSON-LD, and PROV-O.  If you’re unfamiliar with them, this sounds like a bunch of techno-babble. The rap against semantic technology has been that it is complicated.  It is, especially if you have to embrace and understand it all at once.  But we have been using this technology for almost 20 years and have figured out how to help people adapt by using carefully curated subsets of each of the standards and leading through example, which drastically reduces the learning curve.

While there is still some residual complexity, we think it is well worth the investment in time.  The semantic technologies stack has solved a large number of problems that graph databases and knowledge graphs have to solve on their own, on a piecemeal basis.  Some of these capabilities are:

  • Schema – graph databases and even knowledge graphs have no standard schema, and if you wish to introduce one you have to implement the capability yourself. The semantic technologies have a very rich schema language that allows you to define classes based on what they mean in the real world.  We have found that disciplined use of this formal schema language creates enterprise models that are understandable, simple, and yet cover all the requisite detail.
  • Global Identifiers – semantic technology uses URIs (the Unicode version of which is called an IRI) to identify all nodes and arcs. A URI looks a lot like a URL, and best practice is to build them based on a domain name you own.  It is these global identifiers that allow the graphs to “self-assemble” (there is no writing of joins in semantic technology, the data is already joined by the system).
  • Identity Management – semantic technology has several approaches that make it manageable to live with the fact that you have assigned multiple identifiers to the same person, product, or place. One of the main ones is called “sameAs”; it allows the system to know that ‘n’ different URIs (which were produced from data in ‘n’ different systems, with ‘n’ different local IDs) all represent the same real-world item, so that all information attached to any of those URIs is available to all consumers of the data (subject to security, of course). A sketch of this appears after this list.
  • Resource Resolution – some systems have globally unique identifiers (you’ve seen those 48-character strings of numbers and letters that come with software licenses, and the like), but these are not very useful, unless you have a special means for finding out what any of them are or mean. Because semantic technology best practice says to base your URIs on a domain name that you own, you have the option for providing a means for people to find out what the URI “means” and what it is connected to.
  • Inference – with semantic technology you do not have to express everything explicitly as you do in traditional systems. A great deal of information can be inferred from the formal definitions in the semantic schema combined with the detailed data assertions.
  • Constraint Management – most graph databases and knowledge graphs were not built for online interactive end user update access. Because of their flexibility it is hard to enforce integrity management. Semantic technology has a model driven constraint manager that can ensure the integrity of a database is maintained.
  • Provenance – one key use case in semantic technology is combining data from many different sources. This creates a new requirement: when looking at data that has come from many sources, you often need to know where a particular bit of data came from.  Semantic technology has solved this in a general way that can go down to individual data assertions.
  • Relational and Big Data Integration – you won’t be storing all of your data in a graph database (semantic, or otherwise). Often you will want to combine data in your graph with data in your existing systems.  Semantic technology has provided standards, and there are vendors that have implemented these standards, such that you can write a query that combines information in the graph with that in a relational database or a big data store.
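
As promised above, here is a hedged sketch of two of these capabilities working together: “sameAs” identity management and schema-driven inference. The tooling (Python’s rdflib and owlrl libraries) and all of the identifiers are our own illustrative assumptions, not part of any client system.

```python
# A sketch of owl:sameAs identity management plus simple schema inference.
# rdflib + owlrl are our tooling assumptions; the URIs are invented.
from rdflib import Graph
import owlrl

g = Graph()
g.parse(format="turtle", data="""
@prefix ex:   <http://example.com/demo#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A small semantic schema: every Customer is a Party.
ex:Customer rdfs:subClassOf ex:Party .

# The same person, as loaded from two source systems with two local IDs.
ex:crm-10042     a ex:Customer ; ex:hasEmail "jane@example.com" .
ex:billing-77310 ex:hasPhone  "555-0100" .

# Identity management: one assertion links the two identifiers.
ex:crm-10042 owl:sameAs ex:billing-77310 .
""")

# Apply the standard OWL RL inference rules.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# Everything known about either identifier is now reachable from both, and
# the person is also inferred to be a Party via the schema.
for p, o in g.query("""
    PREFIX ex: <http://example.com/demo#>
    SELECT ?p ?o WHERE { ex:billing-77310 ?p ?o }
"""):
    print(p, o)
```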

It is hard to cover a topic as broad as this in a page, but hopefully this establishes some of what the approach provides.

Applying Graph Technology

So how do these technologies deliver capability to some more common business problems?

Customer 360

We worked with a bank that was migrating to the cloud.  As part of the migration they wanted to unify their view of their customers.  They brought together a task force from all the divisions to create a single definition of a customer.  This was essentially an impossible task.  For some divisions (Investment Banking) a customer was a company; for others (Credit Card processing) it was usually a person.  Not only were there differences in type; the data they wanted and were required to have in these different contexts also differed.  Further, one group (corporate) espoused a very broad definition of customer that included anyone that they could potentially contact.  Needless to say, the “Know Your Customer” group couldn’t abide this definition, as every new customer obligates them to perform a prescribed set of activities.

What we have discovered time and again is that if you start with a term (say, “Customer”) and try to define it, you will be deeply disappointed.  On the other hand, if you start with formal definitions (one of which for “Customer” might be, “a Person who is an owner or beneficiary on a financial account” (and of course financial account has to be formally defined)), it is not hard to get agreement on what the concept means and what the set of people in this case would be.  From there it is not hard to get to an agreed name for each concept.
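
To illustrate what such a formal definition can look like in practice, here is a minimal sketch in OWL of one possible sense of “Customer.” The class and property names, and the use of the rdflib and owlrl Python libraries, are our own illustrative assumptions rather than any client’s actual model.

```python
# One formal definition of "Customer": a Person who is a party to at least
# one FinancialAccount.  All names here are hypothetical.
from rdflib import Graph, URIRef
from rdflib.namespace import RDF
import owlrl

EX = "http://example.com/demo#"
g = Graph()
g.parse(format="turtle", data="""
@prefix ex:  <http://example.com/demo#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# AccountCustomer is *defined*, not just named.
ex:AccountCustomer owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        ex:Person
        [ a owl:Restriction ;
          owl:onProperty ex:isPartyTo ;
          owl:someValuesFrom ex:FinancialAccount ]
    )
] .

# Data as it might arrive from a source system.
ex:_Jane a ex:Person ; ex:isPartyTo ex:Account123 .
ex:Account123 a ex:FinancialAccount .
""")

owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# Jane is categorized as an AccountCustomer automatically; nobody had to
# tag her with that label in any source system.
print((URIRef(EX + "_Jane"), RDF.type, URIRef(EX + "AccountCustomer")) in g)  # True
```

Because the definition is formal, the same individual can simultaneously satisfy several differently scoped definitions of “Customer,” which is exactly the situation described next.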

In this case we ended up creating a set of formal, semantic definitions for all the customer-related concepts.  At first blush it might sound like we had simply capitulated and let everyone have their own definition of what a “Customer” was.  While there are multiple definitions of “Customer” in the model, they are completely integrated, in a way that lets any individual be automatically categorized under multiple definitions of “Customer” simultaneously (which is usually the case).

The picture shown below, which mercifully omits a lot of the implementation detail, captures the essence of the idea. Each oval represents a definition of “Customer.”


In the lower right is the set of people who have signed up for a free credit rating service.  These are people who have an “Account” (the credit reporting account), but it is an account without financial obligation (there is no balance, you cannot draw against it, etc.).  The Know Your Customer (KYC) requirements only kick in for people with Financial Accounts.  The overlap indicates that some people have both financial and non-financial accounts.  The blue star represents a financial customer who also falls under the KYC guidelines.  Finally, the tall oval at the top represents the set of people and organizations that are not to be customers, the so-called “Sanctions lists.”  You might think that these two ovals should not overlap, but with the sanctions continually changing and our knowledge of customer relationships constantly evolving, it is quite possible to discover after the fact that a current customer is on a sanctions list.  We’ve represented this as a brown star that is simultaneously a financial customer and someone who should not be a customer.

We think this approach uniquely deals with the complexity inherent in large companies’ relationships with their customers.

In another engagement we used a similar approach to find customers who were also vendors, which is often of interest, and typically hard to detect consistently.

Compliance

Compliance is also a natural fit for Knowledge Graphs.

Next Angles

Mphasis’ project “Next Angles” converts regulatory text into triples conforming to an ontology, which they can then use to evaluate particular situations (we’ve worked with them in the past on a semantic project).  In this white paper they outline how it has been used to streamline the process of detecting money laundering: http://ceur-ws.org/Vol-1963/paper498.pdf.

Legal and Regulatory Information Provider

Another similar project that we worked on was with a major provider of legal and regulatory information.  The firm ingests several million documents a day, mostly court proceedings but also all changes to laws and regulations.  For many years these documents were tagged by a combination of scripts and offshore human taggers.  Gradually the relevance and accuracy of their tagging began to fall behind that of their rivals.

They employed us to help them develop an ontology and knowledge graph; they employed the firm netOWL to perform the computational linguistics to extract data from documents and conform it to the ontology.  We have heard from third parties that the relevance of their concept-based search is now considerably ahead of their competitors.

They recently contacted us as they are beginning work on a next generation system, one that takes this base question to the next level: Is it possible to infer new information in search by leveraging the knowledge graph they have plus a deeper modeling of meaning?

Investment Bank

We are working in the Legal and Compliance Division of a major investment bank.  Our initial remit was to help with compliance with records retention laws. There is complexity at both ends of this domain.  At one end there are hundreds of jurisdictions promulgating and changing laws and regulations continually.  At the other end are the billions of documents and databases that must be classified consistently before they can be managed properly.

We built a knowledge graph that captured all the contextual information surrounding a document or repository: who authored it, who put it there, what department they were in, what cost code they charged, and so on.  Each bit of this contextual data had associated text available.  We were able to add some simple natural language processing that allowed them to accurately classify about 25% of the data under management.  While 25% is hardly a complete solution, it compares to the half of one percent that had been classified correctly up to that point.  Starting from this, they have launched a project with more sophisticated NLP and Machine Learning to create an end-user “classification wizard” that can be used by all repository managers.

We have moved on to other related compliance issues, including managing legal holds, operational risk, and a more comprehensive approach to compliance overall.

Summary: Knowledge Graphs & Semantic Technology

Knowledge Graphs and Semantic Technology are the preferred approach to complex business problems, especially those that require the deep integration of information that was previously hard to align, such as customer-related and compliance-related data.


White Paper: Avoiding Property Proliferation

Domain and range for ontological properties are not about data integrity, but logical necessity. Misusing them leads to an inelegant (and unnecessary) proliferation of properties.

Logical Necessity Meets Elegance

Screwdrivers generally have only a small set of head configurations (flat, Phillips, hex) because the intention is to make accessing contents or securing parts easy (or at least uniform). Now, imagine how frustrating it would be if every screw and bolt in your house or car required a unique screwdriver head. They might be grouped together (for example, a bunch of different sized hex heads), but each one was slightly different. Any maintenance task would take much longer and the amount of time spent just organizing the screwdrivers would be inordinate. Yet that is precisely the approach that most OWL modelers take when they over-specify their ontology’s properties.
On our blog, we once briefly discussed the concept of elegance in ontologies. A key criterion was, “An ontology is elegant if it has the fewest possible concepts to cover the required scope with minimal redundancy and complexity.” Let’s take a deeper look at object properties in that light. First, a quick review of some of the basics.

  1. An ontology describes some subject matter in terms of the meaning of the concepts and relationships within that ontology’s domain.
  2. Object properties are responsible for describing the relationships between things.
  3. In the RDFS and OWL modeling languages, a developer can declare a property’s domain and/or its range (the class to which the Subject and/or Object, respectively, must belong). Domain and range for ontological properties are not about data integrity, but logical necessity. Misusing them leads to an inelegant (and unnecessary) proliferation of properties.

Break the Habit

In our many years’ experience teaching our classes on designing and building ontologies, we find that most new ontology modelers have a background in relational databases or Object-Oriented modelling/development. Their prior experience habitually leads them to strongly tie properties to classes via specific domains and ranges. Usually, this pattern comes from a desire to curate the triplestore’s data by controlling what is getting into it. But specifying a property’s domain and range will not (necessarily) do that.
For example, let’s take the following assertions:

  • The domain of the property :hasManager is class :Organization.
  • The individual entity :_Jane is of type class :Employee.
  • :_Jane :hasManager :_George.

Many newcomers to semantic technology (especially those with a SQL background) expect that the ontology will prevent the third statement from being entered into the triplestore because :_Jane is not declared to be of the correct class. But that’s not what happens in OWL. The domain says that :_Jane must be an :Organization, which presumably is not the intended meaning. Because of OWL’s Open World paradigm, the only real constraints are those that prevent us from making statements that are logically inconsistent. Since in our example we have not declared the :Organization and :Employee classes to be disjoint, there is no logical reason that :_Jane cannot belong to both of those classes. A reasoning engine will simply infer that :_Jane is also a member of the :Organization class. No errors will be raised; the assertion will not be rejected. (That said, we almost certainly do want to declare those classes to be disjoint.)
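
The behavior described above is easy to see for yourself. The following sketch uses the rdflib and owlrl Python libraries (our choice of tooling; any OWL RL reasoner should behave the same way) to show that the assertion is accepted and the domain declaration simply generates an inference.

```python
# The :hasManager example above, run through an OWL RL reasoner.
# Note that nothing is rejected; :_Jane just picks up an extra type.
from rdflib import Graph, URIRef
from rdflib.namespace import RDF
import owlrl

EX = "http://example.com/demo#"
g = Graph()
g.parse(format="turtle", data="""
@prefix ex:   <http://example.com/demo#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:hasManager rdfs:domain ex:Organization .
ex:_Jane a ex:Employee .
ex:_Jane ex:hasManager ex:_George .
""")

owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# Open World: no error is raised and the triple is kept; instead the
# reasoner infers the (probably unintended) extra type.
print((URIRef(EX + "_Jane"), RDF.type, URIRef(EX + "Organization")) in g)  # True
print((URIRef(EX + "_Jane"), RDF.type, URIRef(EX + "Employee")) in g)      # still True
```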


White Paper by Dan Carey

White Paper: Quantum Entanglement, Flipping Out and Inverse Properties

We take a deep dive into the pragmatic issues regarding the use of inverse properties when creating OWL ontologies.

Property Inverses and Perspectives

It is important to understand that, logically, both perspectives always exist; they are joined at the hip. If Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. If, from one perspective, a new relationship link is created or an existing one is broken, then that change is immediately reflected when viewed from the other perspective. This is a bit like two quantum-entangled particles: the change in one is instantly reflected in the other, even if they are separated by millions of light years. Inverse properties and entangled particles are more like two sides of the same coin than two different coins.

Figure 2: Two sides of the same coin.

 

In OWL we call the property that is from the other perspective the inverse property. Given that a property and its inverse are inseparable, technically, you cannot create or use one without [implicitly] creating or using the other. If you create a property hasParent, there is an OWL syntax that lets you refer to and use that property’s inverse. In Manchester syntax you would write: “inverse(hasParent)”. The term ‘inverse’ is a function that takes an object property as an argument and returns the inverse of that property. If you assert that Michael hasParent Joan, then the inverse assertion, Joan inverse(hasParent) Michael, is inferred to hold. If you decide to give the inverse property the name parentOf, then the inverse assertion is that Joan parentOf Michael. This is summarized in Figure 3 and the table below.
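
A small sketch makes the entanglement tangible. Here we name the inverse explicitly and let a reasoner supply the other perspective; the rdflib and owlrl Python libraries are our own tooling assumptions.

```python
# hasParent / parentOf: assert one direction, infer the other.
from rdflib import Graph, URIRef
import owlrl

EX = "http://example.com/demo#"
g = Graph()
g.parse(format="turtle", data="""
@prefix ex:  <http://example.com/demo#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Naming the inverse "entangles" the two properties.
ex:hasParent owl:inverseOf ex:parentOf .

# Assert the relationship from one perspective only.
ex:Michael ex:hasParent ex:Joan .
""")

owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# The other perspective holds without ever being stored by hand.
print((URIRef(EX + "Joan"), URIRef(EX + "parentOf"), URIRef(EX + "Michael")) in g)  # True
```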


Written by Michael Uschold

White Paper: What is a Market?

The term “market” is common in business. We talk about the automotive market, the produce market, the disk drive market, etc. And yet, what do we really mean when we use that term?  It is an instructive question, because very often CRM (Customer Relationship Management) systems or other sales analytics systems group customers and sales based on their “market.”  However, if we don’t have a clear understanding of what a market is, we misrepresent and misgroup, and therefore mislead ourselves as we study the implications of our advertising and promotional efforts.

The Historical Market

Historically, markets were physical places: marketplaces.  People went to the market in order to conduct trade. In many communities there was a single market where all goods were bought and sold.  However, as communities became larger and developed into cities, marketplaces began to specialize and there was a particular place to go to buy and sell fresh produce. There was another place to go to buy and sell spices; and yet another to buy and sell furniture.

As time progressed and the economy became more and more differentiated and cities grew larger and larger, marketplaces became even more specialized and more geographically localized.  So, for instance, we have the diamond marketplace in Antwerp, Belgium, and the screenplay marketplace in Hollywood.

Why Did Physical Marketplaces Emerge and Dominate?

The trend toward physical marketplaces was not necessarily inevitable. Buyers could seek out sellers at their place of business and conduct business that way.  Conversely, sellers could seek out buyers at theirs.  Two factors led to the popularity of the marketplace. One was that the physical movement to the marketplace was collectively more efficient for most of the participants. The second was that the marketplace allowed easy selection and comparison between similar offerings.  Additionally, information about potential sources of, or demand for, a given product or service was nowhere near as cheap to obtain as it is today with computers and the Internet.


Written by Dave McComb

White Paper: How long should your URIs be?

This applies to URIs that a system needs to generate when it finds it needs to mint a new resource.

I’ve been thinking a lot about automated URI assignment lately. In particular, the scheme we’ve been using (relying on the database to maintain a “next available number” and incrementing it) is fraught with potential problems. However, I really don’t like the GUID style, with its large, unwieldy, and mostly unnecessarily long strings.

I did some back-of-the-envelope thinking and came up with the following recommendations. After the fact I decided to search the web and see what I could find. I found some excellent stuff, but not this in particular, nor anything that seemed to rule it out. Of note, Phil Archer has some excellent guidelines here: http://philarcher.org/diary/2013/uripersistence/. They are much broader than what I’m doing here, but very good. He even has “avoid auto increment” as one of his top 10 recommendations.

The points in this paper don’t apply to hand-crafted URIs (as you would typically have for your classes and properties, and even some of your hand-curated special instances). They apply to URIs that a system needs to generate when it needs to mint a new resource. A quick survey of the approaches and who uses them:

  • Hand curate all—dbpedia essentially has the author create a URI when they create a new topic.
  • Modest-sized number—dbpedia page IDs and page revision IDs look like next available number types.
  • Type+longish number—yago has URIs like yago:Horseman110185793 (class plus up to a billion numbers; not sure if there is a next available number behind this, but it kind of looks like there is).
  • Guids—cyc identifies everything with a long string like Mx4rvkS9GZwpEbGdrcN5Y29ycA.
  • Guids—Microsoft uses 128-bit GUIDs for identifying system components, such as {21EC2020-3AEA-4069-A2DD-08002B30309D}. The random version uses 6 bits to indicate the version and variant, and therefore has a namespace of about 10^36, thought to be large enough that the probability of generating the same number twice is negligible.

Being a pragmatist, I wanted to figure out whether there is an optimal size and way to generate URIs.
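
For the curious, here is a sketch of the kind of scheme those recommendations point toward: a short random suffix minted under a domain you own, avoiding both the auto-increment problem and full-length GUIDs. The domain, prefix, and 10-character length are illustrative assumptions, not a firm recommendation.

```python
# Mint URIs with a short random suffix instead of a "next available number".
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits   # 36 symbols
BASE = "http://data.example.com/id/"                 # a domain you own

def mint_uri(length: int = 10) -> str:
    """Return a new URI with a random suffix.

    36^10 is roughly 3.6 x 10^15 possibilities, so collisions are unlikely
    for most enterprises, yet the URI stays short enough to read and share.
    """
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return BASE + suffix

print(mint_uri())   # e.g. http://data.example.com/id/k3v9q0tz1m
```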


White Paper: Ontologies and Applications

Outlining three and a half ways that applications can have their schemas derived from enterprise ontologies.

Many people (ok, a few people) have asked us: “what is the relationship between an ontology and an application?” We usually say, “That’s an excellent question” (this is partly because it is, and partly because these ‘people’ are invariably our clients). Having avoided answering this for all this time we finally feel motivated to actually answer the question. It seems that there are three (ok three and a half) ways that ontologies are or can be related to applications. They are:

  • Inspiration
  • Transformation
  • Extension

But, I fail to digress… Let’s go back to the ‘tic tac toe’ board. We call the following a ‘tic tac toe’ board, because it looks like one:

[Figure: the ‘tic tac toe’ board – levels of abstraction (rows) by perspective (columns)]

What it is attempting to convey is that there are levels of abstraction and differences in perspective that we should consider when we are modeling. An application is in the lower middle cell.

Data models are in the middle square. Ontologies could be anywhere: an ontology is a formal way of representing a model, and so we could have an ontology that describes an application, an ontology of a logical model, even ontologies of data or of meta-metadata.

In our opinion the most interesting ontologies are in the middle top: these are ontologies that represent concepts independent of their implementation. This is where we find upper ontologies as well as enterprise ontologies.

Now some companies have built enterprise-wide conceptual models. The IRS has one, with 30,000 attributes. But all the ones we’ve seen are not actually in the top center cell; they are logical models of quite wide scope. Ambitious and interesting, but not really conceptual models, and typically far more complex than is useful. What we’ve found (and written about in other articles (ref the Elegance article)) is that a conceptual model can cover the same ground as a logical model with a small percentage of the total number of concepts. Not only are there fewer concepts in total, there are fewer concepts that need to be accepted and agreed to.


Written by Dave McComb

White Paper: Six Axes of Decoupling

Loose coupling has been a Holy Grail for systems developers for generations.

The virtues of loose coupling have been widely lauded, yet there has been little description about what is needed to achieve loose coupling. In this paper we describe our observations from projects we’ve been involved with.

Coupling

Two systems or two parts of a single system are considered coupled if a change to one of the systems unnecessarily affects the other system. So for instance, if we upgrade the version of our database and it requires that we upgrade the operating system for every client attached to that database, then we would say those two systems or those two parts of the system are tightly coupled. Coupling is widely understood to be undesirable because of the spread of the side effects. As systems get larger and more complex, anything that causes a change in one part to affect a larger and larger footprint in the entire system is going to be expensive and destabilizing.

Loose Coupling/Decoupling

So, the converse of this is to design systems that are either “loosely coupled” or “decoupled.” Loosely coupled systems do not arise by accident. They are intentionally designed such that change can be introduced around predefined flex points. For instance, one common strategy is to define an application programming interface (API) which external users of a module or class can use. This simple technique allows the interior of the class or module or method to change without necessarily exporting a change in behavior to the users.


The Role of the Intermediate

In virtually every system that we’ve investigated that has achieved any degree of decoupling, we’ve found an “intermediate form.” It is this intermediate form that allows the two systems or subsystems not to be directly connected to each other. As shown in Figure (1), they are connected through an intermediary. In the example described above with an API, the signature of the interface is the intermediate.
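
As a small illustration of an intermediate form, consider the API case: callers depend only on a stable interface, so the implementation behind it can change freely. This is a hedged sketch in Python; the names are invented and stand in for whatever signature the two sides actually agree on.

```python
# The "intermediate" is the CustomerLookup interface; callers and
# implementations are coupled to it, not to each other.
from typing import Optional, Protocol


class CustomerLookup(Protocol):
    """The intermediate form: a signature both sides agree on."""
    def by_email(self, email: str) -> Optional[dict]: ...


class InMemoryLookup:
    """One implementation; it could be swapped for a database-backed one
    tomorrow without touching any caller."""
    def __init__(self, store: dict):
        self._store = store

    def by_email(self, email: str) -> Optional[dict]:
        return self._store.get(email)


def greet(lookup: CustomerLookup, email: str) -> str:
    # The caller sees only the intermediate, never the implementation.
    customer = lookup.by_email(email)
    return f"Hello, {customer['name']}" if customer else "Hello, stranger"


print(greet(InMemoryLookup({"jane@example.com": {"name": "Jane"}}),
            "jane@example.com"))
```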


White Paper: Veracity

Encarta defines veracity as “the truth, accuracy or precision of something” and that seems like a pretty good place to start.

Our systems don’t model uncertainty very well, and yet that is exactly what we deal with on a day-to-day basis. This paper examines one aspect of modeling certainty, namely veracity, and begins a dialog on how to represent it.

Veracity

Encarta defines veracity as “the truth, accuracy or precision of something” and that seems like a pretty good place to start. In our case we will primarily be dealing with whether a symbolic representation of something in the real world faithfully represents the item in the real world. Primarily we are dealing with these main artifacts of systems:

  • Measurements – is the measurement recorded in the system an accurate reflection of what it was meant to measure in the real world?
  • Events – do the events recorded in the system accurately record what really happened?
  • Relationships – do the relationships as represented in the system accurately reflect the state of affairs in the world?
  • Categorization – are the categories that we have assigned things to useful and defensible?
  • Cause – do our implied notions of causality really bear out in the world? (This also includes predictions and hypotheses.)

Only the first has ever received systematic attention. Fuzzy numbers are one way of representing uncertainty in measurements, as are “interval math” and the uncertainty notation used in chemistry (2.034 ± 0.005, for instance).

But in business systems, all of these are recorded as if we are certain of them, and then as events unfold, we eventually may decide not only that we are not certain, but that we are certain of an opposite conclusion. We record an event as if it occurred and, until we have proof that it didn’t, we believe that it did.


White Paper: International Conference on Service Oriented Computing

In this write up I’ll try to capture the tone of the conference, what seemed to be important and what some of the more interesting presentations were.

This was the first ever Conference on Service Oriented Computing.  In some ways it was reminiscent of the first Object Oriented conference (OOPSLA in 1986): highly biased toward academic and research topics, while at the same time shining a light on the issues that are likely to face the industry over the next decade.  In this write-up I’ll try to capture the tone of the conference, what seemed to be important, and what some of the more interesting presentations were.

Why Trento?

Apparently, a year and a half ago several researchers in Service Oriented Computing began planning an Italian conference on the topic, and it spun up into an international conference. Trento was an interesting, but logistically difficult, choice.  Trento is in the Dolomite region of the Italian Alps and is difficult even for Europeans to get to.  It is a charming university town, founded in Roman times, with a rich history through the Middle Ages.  The town is a beautiful blend of old and new and very pedestrian friendly; large cobblestone courtyards can be found every few blocks, usually adjoining a Renaissance building or two.  We took a side trip one hour further up the Alps to Bolzano and saw Ötzi, “the ice man.”

This conference had some of the best after-hours arrangements of any I’ve attended: one night we got a guided tour of “the castle,” followed by a brief speech from the vice mayor and wine and dinner-sized hors d’oeuvres.  The final night was a tour of Ferrari Spumante, the leading producer of Italian “champagne,” with a five- or six-course sit-down dinner.

Attendees & Presenters

There were about 140 attendees, at least a third of whom were also presenters. All but eight were from academia, and we were among the six from North America.  Next year’s venue will be New York City in mid-November, which should change the nature and size of the audience considerably.

The opening keynote was by Peter Diry, who is in charge of a large European government research fund that is sinking billions into advanced technology research topics.  There was a great deal of interest in this, as I suspect many of the attendees’ bread was buttered either directly or indirectly by these funds.  Bertrand Meyer gave the pre-dinner keynote the night of the formal dinner.  He had a very provocative talk on the constructs that are needed to manage distributed concurrency (we’ve managed to avoid most of this in our designs, but you could certainly see how with some designs this could be a main issue).  Frank Heyman from IBM gave the final keynote, which was primarily about how all this fits into Grid computing and open standards.

The 37 major presenters, plus 10 who had informal talks at a wine and cheese event, were chosen from 140 submissions.  Apparently many of these people are leading lights in the research side of this discipline, although I had never heard of any of them. In addition, there were two half-day tutorials on the first day. Presentations were in English, although often highly accented English.

General Topics

It was a bit curious that the conference was about “Service Oriented Computing” and not “Service Oriented Architecture” as we usually hear it; the name marked some subtle and interesting distinctions.  This was far more about Web services than about EAI or Message Oriented Middleware, and the attendees were far more interested in Internet-scale problems than enterprise issues.

Some of the main themes that recurred throughout the conference were service discovery and composition, security, P2P and grid issues, and Quality of Service.  Everyone has pretty much accepted WSDL and BPEL4WS (which everyone just calls “bee pell”) as the de facto technologies that will be used.  There was some discussion and reference to the Semantic Web technologies (RDF, DAML-S and OWL).  They seemed to be pretty consistent on the difference between Orchestration and Choreography (more on that later).

There was a lot of talk about dynamic composition, but when you probed a bit, not as much agreement as to how far it was likely to go or when the dynamic-ness was likely to occur.

Things clarified for me

Several things weren’t necessarily presented in a single talk, but in combination and context they became clearer to me.  Many people may have already tripped to these observations, but for the sake of those who haven’t:

Virtualization

In much the same way that SAN and NAS virtualized storage (that is, removed the user’s specific knowledge of where the data was being stored), SOC is meant to virtualize functionality.  This is really the grid angle of Service Oriented Computing.  There were a few people there who noted that, unlike application servers or web servers, it will not be as easy to virtualize “stateful” services.

Service Discovery

Most of the discussion about service discovery concerned design-time discovery, although there were some who felt that using the UDDI registry in an interactive mode constituted run-time discovery.  There were many approaches described to aid the discovery process.

Capabilities

There was pretty widespread agreement that WSDL’s matching of signatures is not enough.  Getting beyond that was called several different things, and there were several different approaches to it.  One of the terms used was “capabilities”: in other words, how can we structure a spec that describes the capability of a service?  This means finding a way to describe how the state of the caller and the called objects is changed, as well as noting side effects (intentional and otherwise).

Binding

Frank Heyman from IBM made the point that WSDL is really about describing the binding between “port types” (what the service is constructed to deal with) and specific ports (what it gets attached to).  While the default binding is of course SOAP, he had several examples and could show that the binding was no more complex for JMS, J2EE, or even CICS Comm Region bindings.

Orchestration and Choreography

The tutorial clarified, and subsequent presentations seemed to agree, that Orchestration is what you do in machine time.  It is BPEL.  It is a unit of composition.  It is message routing, primarily synchronous.  While the tools that are good for Orchestration could be used for Choreography, that is not using each tool to its strength.

Choreography involves coordination, usually over time.  So when you have multiple organizations involved, you often have Choreography issues.  Same with having other people in the loop.  Most of what we currently think of as work flow will be subsumed into this choreography category.

Specific Talks of Note

Capabilities: Describing What Web Services Can Do – Phillipa Oaks, et al., Queensland University

This paper gets at the need to model what a service does if we are to have any hope of “discovering” services either at design time or run time.  They had a meta-model that expanded the signature-based description to include rules such as pre- and post-conditions, as well as effects on items not in the signature.  It also allowed for constraints on location, time, and manner of delivery.

Service Based Distributed Querying on the Grid – Alpdemir, et al., University of Manchester

I didn’t see this presentation, but after reading the paper I wish I had.  They outline the issues involved with setting up distributed queries using the OGSA (Open Grid Service Architecture) and OGSI (Open Grid Services Infrastructure).  They got into how to set up an architecture for managing distributed queries, and then into issues such as setting up and optimizing query plans in a distributed environment.

Single Sign-on for Service Based Computing – Kurt Geihs, et al., Berlin University of Technology

The presentation was given by Robert Kalchlosch (one of the “et al”s).  One of the best values for me was a good overview of Microsoft Passport and the Liberty Alliance, especially in regard to what cooperating services need to do to work with these standards.  This paper took the position that it may be more economical to leave services as they are and wrap them with a service/broker that handles the security, and especially the single sign-on, aspect.

Semantic Structure Matching for Assessing Web-Service Similarity – Yiqiao Wang, et al., University of Alberta

This talk covered issues and problems in using semantics (RDF) in service discovery.  They noted that a simple semantic match was not of much use, but that by coupling WordNet similarity with structural similarity they were able to get high-value matching in discovery.

“Everything Personal, not Just Business”: Improving User Experience through Rule-Based Service Customization – Richard Hull, et al., Bell Labs

Richard Hull wrote one of the seminal works in Semantic Modeling, so I was hoping to meet him.  Unfortunately he didn’t make it and sent a tape of his presentation instead.  The context was: if people had devices that revealed their geographic location, what sort of rules would they like to set up about who they would make this information available to?  One of the things of interest to us was their evaluation, and then dismissal, of general-purpose constraint-solving rule engines (like ILOG) for performance reasons.  They had some statistics showing very impressive performance for their rule evaluation.

Conclusion

The first ever Conference on Service Oriented Computing was a good one; it provided a great deal of food for thought and ideas about where this industry is headed in the medium term.

Written by Dave McComb

White Paper: How Service Oriented Architecture is Changing the Balance of Power Between Information Systems Line and Staff

As service oriented architecture (SOA) begins to become widely adopted through organizations, there will be major dislocations in the balance of power and control within IS organizations.

As service oriented architecture (SOA) begins to become widely adopted through organizations, there will be major dislocations in the balance of power and control within IS organizations. In this paper when we refer to information systems (IS) line functions, we mean those functions that are primarily aligned with the line of business systems, especially development and maintenance. When we refer to the IS staff functions, we’re referring to functions that maintain control over the shared aspects of the IS structure, such as database administration, technology implementation, networks, etc.

What is Service Oriented Architecture?

Service oriented architecture is primarily a different way to arrange the major components in an information system.  There are many technologies necessary to implement an SOA, and we will touch on them briefly here; but the important distinction for most enterprises will be that the exemplar implementations of SOA involve major changes in the boundaries between systems and in how systems communicate.

In the past, when companies wished to integrate their applications, they either attempted to put multiple applications on a single database or wrote individual interfacing programs to connect one application to another.  The SOA approach says that all communication between applications will be done through a shared message bus and it will be done in messages that are not application-specific.  This definition is a bit extreme for some people, especially those who are just beginning their foray into SOA, but this is the end result for the companies who wish to enjoy the benefit that this new approach promises.

A message is an XML document or transaction that has been defined at the enterprise level and represents a unit of business functionality that can be exchanged between systems.  For instance, a purchase order could be expressed as an XML document and sent between the system that originated it, such as a purchasing system, and a system that is interested in it, perhaps an inventory system.

The message bus is implemented in a set of technologies that ensure that the producers and consumers of these messages are not talking directly to each other.  The message bus mediates the communication in much the same way as the bus within a personal computer mediates communication between the various subcomponents.

The net result of these changes is that functionality can be implemented once, put on the message bus, and subsequently used by other applications.  For instance, logic that was once replicated in every application (such as production of outbound correspondence, collection on receivables, workflow routing, management of security and entitlements), as well as functionality that has not existed because of a lack of a place to put it (such as enterprise wide cross-referencing of customers and vendors), can now be implemented only once.  [Note to self: I think that sentence is not correct anymore.]  However, in order to achieve the benefits from this type of arrangement, we are going to have to make some very fundamental changes to the way responsibilities are coordinated in the building and maintaining of systems.

Web Services and SOA

Many people have confused SOA with Web services.  This is understandable as both deal with communications between applications and services over a network using XML messages.  The difference is that Web services is a technology choice; it is a protocol for the API (application programming interface).  A service oriented architecture is not a technology but an overall way of dividing up the responsibilities between applications and having them communicate.  So, while it is possible to implement an SOA using Web services technology, this is not the only option.  Many people have used message oriented middleware, enterprise application integration technologies, and message brokers to achieve the same end.  More importantly, merely implementing Web services in a default mode will not result in a service oriented architecture.  It will result in a number of point-to-point connections between applications merely using the newest technology.

Now let’s look at the organizational dynamics that are involved in building and maintaining applications within an enterprise.

The Current Balance of Power

In most IS organizations, what has evolved over the last decade or so is a balance of power between the line organizations and the staff organizations that looks something like the following.

In the beginning, the line organizations had all the budget, all the power, and all the control.  They pretty much still do.  The reason they have the budget and the power is that it’s the line organization that has been employed to solve specific business problems.  Each business problem brings with it a return on investment analysis which specifies what functionality is needed to solve that particular problem.  Typically, each business owner or sponsor has not been very interested in spending any more money than needed in order to solve anyone else’s problem.

However, somewhere along the line some of the central IS staff noticed that solving similar problems over and over again, arbitrarily differently, was dis-economic to the enterprise as a whole.  Through a long series of cajoling and negotiating, they have managed to wrest some control of some of the infrastructure components of the applications from the line personnel.  Typically, the conversations went something like, “I can’t believe this project went out and bought their own database management system, paid a whole bunch of money when we already have one which would’ve worked just fine!”  And through the process, the staff groups eventually wrested at least some degree of control over such things as choice of operating systems, database management systems, middleware and, in some cases, programming languages.  They also very often had a great deal of influence or at least coordination on data models, data naming standards, and the like.  So what has evolved is a sort of happy peace where the central groups can dictate the technical environment and some of the data considerations, while the application groups are free to do pretty much as they will with the scope of their application, functionality, and interfaces to other applications.

The decentralization of these decisions leads to dis-economic behavior for much the same reason; however, it is not quite as obvious, because the corporation is not visibly shelling out for yet another unnecessary database management system license.

The New World Order

In the New World, the very things that the line function had most control of, namely the scope, functionality, and interfaces of its applications, will move into the province of the staff organization.  In order to get the economic benefit of the service oriented architecture, the main thing that has to be determined centrally for the enterprise as a whole is: what is the scope of each application and service, and what interfaces is it required to provide to others?

In most organizations, this will not go down easily.  There’s a great deal of inertia and control built up over many years with the current arrangement.  Senior IS management is going to have to realize that this change needs to take place and may well have to intervene at some fairly low levels.  As Clayton Christensen argued in his recent book The Innovator’s Solution, the strategic direction that an enterprise or department takes doesn’t matter nearly as much as whether it can get agreement from the day-to-day decision makers who allocate resources and set short-term goals.  For most organizations, this will require a two-pronged attack.  On one hand, senior IS management, and especially the staff function management, will have to partner more closely with the business units that are sponsoring the individual projects.  Part of this partnering will be to educate the sponsors on the economic benefits that accrue to applications that adhere to the architectural guidelines.  While at first this sounds like a difficult thing to convince them of, the economic benefits in most cases are quite compelling.  Not only are there benefits to be had on the individual or initial project, but the real benefit for the business owner is that this approach can be shown to lead to much greater flexibility, which is ultimately what the business owner wants.  This is really a governance issue, but we need to be careful not to confuse the essence of governance with the bureaucracy that it often entails.

The second prong of the two-pronged approach is to put a great deal of thought into how project managers and team leads are rewarded for “doing the right thing.”  In most organizations, regardless of what is said, most rewards go to the project managers who deliver the promised functionality on time and on budget.  It is up to IS management to add to these worthwhile goals equivalent goals aimed at contributing to and complying with the newer, flexible architecture, such that a project that goes off and does its own thing will be seen as a renegade and that regardless of hitting its short-term budgets, the project managers will not be given accolades but instead will be asked to try harder next time.  Each culture, of course, has to find its own way in terms of its reward structure but this is the essential issue to be dealt with.

Finally, and by a funny coincidence, the issues that were paramount to the central group, such as choice of operating system, database, programming language, and the like, are now very secondary considerations.  It’s quite conceivable that a given project or service will find that acquiring an appliance running on a completely different operating system and database management system can be far more cost-effective, even when you consider the overhead costs of managing the additional technologies.  This difference comes from two sources.  First, in many cases, the provider of the service will also provide all the administrative support for the service and its infrastructure, effectively negating any additional cost involved in managing the extra infrastructure.  Second, the service oriented architecture implementation technologies shield the rest of the enterprise from being aware of what technology, language, operating system, and DBMS are being used, so the decision does not have the secondary side effects that it does in pre-SOA architectures.

Conclusion

To wrap up, the move to service oriented architecture is not going to be a simple transition or one that can be accomplished by merely acquiring products and implementing a new architecture.  It is going to be accompanied by an inversion in the traditional control relationship between line and staff IS functions.

In the past, the business units and the application teams they funded determined the scope and functionality of projects, while the central IS groups determined technology and, to some extent, common data standards.  In the service oriented future these responsibilities will move in opposite directions.  The scope and functionality of projects will be an enterprise-wide decision, whilst individual application teams will have more flexibility in the technologies they can economically use and the data designs they can employ.

The primary benefits of the architecture will accrue only to those who commit to a course of action where the boundaries, functionality, and interface points of their systems are no longer delegated to the individual projects implementing them, but are determined at a corporate level ahead of time, with only the implementation delegated to the line organization.  This migration will be resisted by many of the incumbents, and the IS management that wishes to enjoy the benefits will need to prepare itself for the investment in cultural and organizational change that will be necessary to bring it about.
