SHACL and OWL

There is a meme floating around in the internet ether these days: “Is OWL necessary, or can you do everything you need to with SHACL?” We use SHACL most days and OWL every day, and we find both quite useful.

It’s a matter of scope. If you limited your scope to replacing individual applications, you could probably get away with just using SHACL. But frankly, if that is your scope, maybe you shouldn’t be in the RDF ecosystem at all. If you are just making a data graph, and not concerned with how it fits into the broader picture, then Neo4j or TigerGraph should give you everything you need, with much less complexity.

If your scope is unifying the information landscape of an enterprise, or an industry / supply chain, and if your scope includes aligning linked open data (LOD), then our experience says OWL is the way to go. At this point you’re making a true knowledge graph.

By separating meaning (OWL) from structure (SHACL) we find it feasible to share meaning without having to share structure. Payroll and HR can share the definition and identity of employees, while sharing very little of their structural representations.
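
As a minimal sketch of how this separation can play out (the ex: names below are hypothetical, not from any published ontology), OWL carries the shared definition of what an Employee is, while Payroll and HR each attach their own SHACL shape describing the structure they require:

```turtle
@prefix ex:  <https://example.org/enterprise/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Shared meaning (OWL): an Employee is anything that is a party to some employment agreement.
ex:Employee a owl:Class ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty     ex:isPartyTo ;
        owl:someValuesFrom ex:EmploymentAgreement
    ] .

# Payroll's structural requirements (SHACL) for employee records.
ex:PayrollEmployeeShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:salary ; sh:datatype xsd:decimal ; sh:minCount 1 ] .

# HR's structural requirements (SHACL) for the same employees.
ex:HREmployeeShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:hireDate ; sh:datatype xsd:date ; sh:minCount 1 ] .
```

Both departments agree on the identity and definition of ex:Employee; only the shapes, which live with each application, differ.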

Employing formal definitions of classes takes most of the ambiguity out of systems. We have found that leaning into full formal definitions greatly reduces the complexity of the resulting enterprise ontologies.

We have a methodology we call “think big/ start small.” The “think big” portion is primarily about getting a first version of an enterprise ontology implemented, and the “start small” portion is about standing up a knowledge graph and conforming a few data sets to the enterprise ontology. As such, the “think big” portion is primarily OWL. The “start small” portion consists of small incremental extensions to the core model (also in OWL), conforming the data sets to the ontology (TARQL, R2RML, SMS, or similar technologies), SHACL to ensure conformance, and SPARQL to prove that it all fits together correctly.
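
For that last step, a sanity-check query along these lines is often enough to confirm that the mapped data actually landed in the classes the ontology defines (the ex: class names are placeholders):

```sparql
PREFIX ex: <https://example.org/enterprise/>

# How many instances did the mappings produce for each class of interest?
SELECT ?class (COUNT(?instance) AS ?instances)
WHERE {
  ?instance a ?class .
  FILTER (?class IN (ex:Employee, ex:EmploymentAgreement))
}
GROUP BY ?class
```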

For us, it’s not one tool, or one standard, or one language for all purposes. For us it’s like Mr. Natural says, “get the right tool for the job.”

The Data-Centric Revolution: Avoiding the Hype Cycle

Gartner has put “Knowledge Graphs” at the peak of inflated expectations. If you are a Knowledge Graph software vendor, this might be good news. Companies will be buying knowledge graphs without knowing what they are. I’m reminded of an old cartoon of an executive dictating into a dictation machine: “…and in closing, in the future implementing a relational database will be essential to the competitive survival of all firms. Oh, and Miss Smith, can you find out what a relational database is?” I imagine this playing out now, substituting “knowledge graph” for “relational database” and by-passing the misogynistic secretarial pool.

If you’re in the software side of this ecosystem, put some champagne on ice, dust off your business plan, and get your VCs on speed dial. Happy times are imminent.

Oh no! Gartner has put Knowledge Graphs at the peak of the hype cycle for Artificial Intelligence

Those of you who have been following this column know that our recommendations for data-centric transformations strongly encourage semantic technology and model driven development implemented on a knowledge graph. As we’ve said elsewhere, it is possible to become data-centric without all three legs of this stool, but it’s much harder than it needs to be. We find our fate at least partially tethered to the knowledge graph marketplace. You might think we’d be thrilled by the news that is lifting our software brethren’s boats.

But we know how this movie / roller coaster ends. Once a concept scales this peak, opportunists come out of the woodwork. Consultants will be touting their “Knowledge Graph Solutions” application and vendors will repackage their Content Management System or ETL Pipeline product as a key step on the way to Knowledge Graph nirvana. Anyone who can spell “Knowledge Graph” will have one to offer.

Some of you will avoid the siren’s song, but many will not. Projects will be launched with great fanfare. Budgets will be blown. What is predictable is that these projects will fail to deliver on their promises. Sponsors will be disappointed. Naysayers will trot out their “I told you so’s.” Gartner will announce Knowledge Graphs are in the Trough of Disillusionment. Opportunists will jump on the next band wagon.

Click here to continue reading on TDAN.com

If you’re interested in Knowledge Graphs, and would like to avoid the trough of disillusionment, contact me: [email protected]

Smart City Ontologies: Lessons Learned from Enterprise Ontologies

For the last 20 years, Semantic Arts has been helping firms design and build enterprise ontologies to get them on the data-centric path. We have learned many lessons from the enterprise that can be applied in the construction of smart city ontologies.

What is similar between the enterprise and smart cities?

  • They both have thousands of application systems. This leads to thousands of arbitrarily different data models, which leads to silos.
  • The enterprise and smart cities want to do agile, BUT agile is a more rapid way to create more silos.

What is different between the enterprise and smart cities?

  • In the enterprise, everyone is on the same team working towards the same objectives.
  • In smart cities there are thousands of different teams working towards different objectives. For example:

      • Utility companies.
      • Sanitation companies.
      • Private and public industry.

  • Large enterprises have data lakes and data warehouses.
  • In smart cities there are little bits of data here and there.

What have we learned?

“Simplicity is the ultimate sophistication.”

20 years of Enterprise Ontology construction and implementation has taught us some lessons that apply to Smart City ontologies. The Smart City Landscape informs us how to apply those lessons, which are:

Think big and start small.

Simplicity is key to integration.

Low code / No code is key for citizen developers.

Semantic Arts gave this talk at the W3C Workshop on Smart Cities, originally recorded on June 25, 2021. Click here to view the talks and interactive sessions recorded for this virtual event. A big thank you to W3C for allowing us to contribute!

A Slice of Pi: Some Small Scale Experiments with Sensor Data and Graphs

There was a time when RDF and triplestores were only seen through the lens of massive data integration. Teams went to great extremes to show how many gazillion triples per second their latest development could ingest, and large integrations did likewise with enormous datasets. This was entirely appropriate and resulted in the outstanding engineering achievements that we now rely on daily. However, for this post I would like to look at the other end of the spectrum – the ‘small is beautiful’ end – and I will relate a story about networking tiny, community-based air quality sensors, a Raspberry Pi, low-code orchestration software, and an in-memory triplestore. This is perhaps one of a few blog posts, we’ll have to see. It starts with looking at using graphs at the edge, rather than the behemoth at the centre where this blog started.

The proposition was to see if the sensor data could be brought together through the Node-RED low-code/no-code framework and fed into RDFox, an in-memory triplestore, with data summaries periodically pushed to a central, larger triplestore. Now, I’m not saying that this is the only way to do this – I’m sure many readers will have their views on how it can be done, but I wanted to see if a minimalist approach was feasible. I also wanted to see if it was rapid and reproducible. Key to developing a broad network of any IoT is the need to scale. What is also needed, though, is some sort of ‘semantic gateway’ where meaning can be added to what might otherwise be a very terse MQTT or similar feed.

So, let’s have a look for a sensor network to use. I found a great source of data from the Aberdeen Air Quality network (see https://www.airaberdeen.org/ ) led by Ian Watt as part of the Aberdeen ODI and Code The City activities. They are contributing to the global Sensor.Community (formerly Luftdaten) network of air quality sensors. Ian and his associated community have built a handful of small sensors that detect particulate matter in air, the PM10 and PM2.5 categories of particles. These are pollutants that lead to various lung conditions and are generated in exhaust from vehicles, and other combustion and abrasion actions that are common in city environments. Details of how to construct the sensors are given in https://wiki.57north.org.uk/doku.php/projects/air_quality_monitor and https://sensor.community/en/sensors/ . The sensors are all connected to the Sensor.Community (https://sensor.community/en/) project from which their individual JSON data feeds can be polled by a REST call over HTTP(S). These sensors cost about 50 Euros to build, in stark contrast to the tens of thousands of Euros that would be required to provide the large government air quality sensors that are the traditional and official sources of air quality information (http://www.scottishairquality.scot/latest/?la=aberdeen-city). And yet, despite the cheapness of the devices, many studies, including those of Prof Rod Jones and colleagues at Cambridge University (https://www.ch.cam.ac.uk/group/atm/person/rlj1001), have found that a wide network of cheap sensors can provide reliable and useful data for air quality monitoring.

So, now that we’ve mentioned Cambridge University we can go on to mention Oxford, and in particular RDFox, the in-memory triplestore and semantic reasoner from Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). In this initial work, we are not using the Datalog reasoning or rapid materialization that this triplestore affords, but instead we stick with the simple addition of triples, and extraction of hourly digests of the data. In fact, RDFox is capable of far more beyond the scope of today’s exercise and much of the process here could be streamlined and handled more elegantly. However, I chose not to make use of these attributes for the sake of showing the simplicity of Node-RED. You might expect this to require a large server, but you’d be wrong – I managed all of it on a tiny Raspberry Pi 4 running the ARM version of Ubuntu. Thanks to Peter Crocker and Diana Marks of Oxford Semantic Technologies for help with the ARM version.

Next up is the glue, pipelining the raw JSON data from the individual sensors into a semantic gateway in which the data is transformed into RDF triples using the Semantic Arts ‘gist’ ontology (https://www.semanticarts.com/gist/ and https://github.com/semanticarts/gist/releases). I chose to do this using a low-code/no-code solution called Node-RED (https://nodered.org/). This framework uses a GUI with pipeline components drawn onto a canvas and linked together by arrows (how very RDF). As the website says, “Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways. It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.” This is exactly what we need for this experiment in minimalism. Node-RED provides a wealth of functional modules for HTTP and MQTT calls, templating, decisions, debugging, and so forth. It makes it easier than traditional coding to pipeline together a suite of processes to acquire data, transform it, and then push it on to somewhere else. And this ‘somewhere else’ was both the local RPi running RDFox and the Node-RED service, and also a remote instance of RDFox, here operating with disk-based persistence.
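
To make the rest of this concrete, here is a simplified sketch of what a single templated reading might look like once it lands in RDFox. The aq: namespace and property names are illustrative only; the actual flow maps to gist terms, which are not reproduced here:

```turtle
@prefix aq:  <https://example.org/airquality/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# One raw reading from one sensor, as templated out of the JSON feed.
aq:_reading_7239_20210625T1405 a aq:ParticulateReading ;
    aq:fromSensor    aq:_sensor_7239 ;
    aq:particleClass aq:_PM2_5 ;
    aq:recordedAt    "2021-06-25T14:05:00Z"^^xsd:dateTime ;
    aq:measuredValue "11.4"^^xsd:decimal .
```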

Following the threading together of a sequence of API calls to the air sensor endpoints, some templating to an RDF model based on ‘gist’, and uploading to RDFox over HTTP (Figs 1 & 2), the RDFox triplestore can then be queried using a SPARQL CONSTRUCT query to extract a summary of the readings for each sensor on an hourly basis. This summary includes the minimum and maximum readings within the hour period for both particle categories (PM10 and PM2.5) for each of the sensors, together with the number of readings that were available within that hour period. The summary is then uploaded to the remote RDFox instance (Figs 3 & 4), and that store becomes the source of the hourly information for dashboards and the like (Fig 5). Clearly, this approach can be scaled widely by simply adding more Raspberry Pi units.
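
The digest query itself is only shown in the figures, but a CONSTRUCT in roughly this shape would do the job, reusing the illustrative aq: terms from the sketch above (in practice the hour boundaries would be injected by Node-RED rather than hard-coded):

```sparql
PREFIX aq:  <https://example.org/airquality/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?digest a aq:HourlyDigest ;
          aq:fromSensor    ?sensor ;
          aq:particleClass ?particleClass ;
          aq:minValue      ?min ;
          aq:maxValue      ?max ;
          aq:readingCount  ?readings .
}
WHERE {
  {
    # Aggregate the previous hour's readings per sensor and particle class.
    SELECT ?sensor ?particleClass
           (MIN(?v) AS ?min) (MAX(?v) AS ?max) (COUNT(?v) AS ?readings)
    WHERE {
      ?reading a aq:ParticulateReading ;
               aq:fromSensor    ?sensor ;
               aq:particleClass ?particleClass ;
               aq:recordedAt    ?t ;
               aq:measuredValue ?v .
      FILTER (?t >= "2021-06-25T13:00:00Z"^^xsd:dateTime &&
              ?t <  "2021-06-25T14:00:00Z"^^xsd:dateTime)
    }
    GROUP BY ?sensor ?particleClass
  }
  # Mint one digest node per sensor / particle class / hour.
  BIND (IRI(CONCAT(STR(?sensor), "/digest/2021-06-25T13/",
                   STRAFTER(STR(?particleClass), "airquality/"))) AS ?digest)
}
```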

The code is available from https://github.com/semanticarts/airquality

The experiment worked well. There were minor challenges in getting used to the Node-RED framework, and I personally had the challenge of swapping between the dot notation for navigating JSON within Javascript and within the Node-RED Mustache templating system. In all, it was a couple of Friday-afternoon experiment sessions, time well spent on an enjoyable starter project.

Fig 1:  The ‘gather sensor data’ flow in Node-RED.

 

Fig 2: Merging the JSON data from the sensors into an RDF template using Mustache-style templating.

 

Fig 3: A simple flow to query RDFox on an hourly basis to get digests of the previous hour’s sensor data.

 

Fig 4: Part of the SPARQL ‘CONSTRUCT’ query that is used to create hourly digests of sensor data to send to the remote persisted RDFox instance.

 

Fig 5: Querying the RDFox store with SPARQL to see hourly maximum and minimum readings of air pollutants at each of the sensor sites.

 


The Data-Centric Revolution: Fighting Class Proliferation

One of the ideas we promote is elegance in the core data model of a Data-Centric enterprise. This is harder than it sounds. Look at most application-centric data models: you would think they would be simpler than the enterprise model; after all, each covers only a small subset of it. Yet we often find individual application data models that are far more complex than the enterprise model that covers them.

You might think that the enterprise model is leaving something out, but that’s not what we’re finding when we load data from these systems. We can generally get all the data and all the fidelity in a simpler model.

It behooves us to ask a pretty broad question:

Where and when should I add new classes to my Data-Centric Ontology?

To answer this, we’re going to dive into four topics:

  1. The tradeoff of convenience versus overhead
  2. What is a class, really?
  3. Where is the proliferation coming from?
  4. What options do I have?

Convenience and Overhead

In some ways, a class is a shorthand for something (we’ll get a bit more detailed in the next paragraph). As such, putting a label to it can often be a big convenience. I have a very charming book called Thing Explainer – Complicated Stuff in Simple Words,[1] by Randall Munroe (the author of the xkcd comics). The premise of Thing Explainer is that even very complex technical topics, such as dishwashers, plate tectonics, the International Space Station, and the Large Hadron Collider, can all be explained using a vocabulary of just ten hundred words. (To give you an idea of the lengths he goes to, he uses “ten hundred” instead of “one thousand” to save a word in his vocabulary.)

So instead of coining a new word in his abbreviated vocabulary, “dishwasher” becomes “box that cleans food holders” (food holders being bowls and plates). I lived in Papua New Guinea part time for a couple of years, and the national language there, Tok Pisin, has only about 2,000 words. They ended up with similar word salads. I remember the grocery store was at “plas bilong san kamup,” or “place belong sun come up,” which is Tok Pisin for “East.”

It is much easier to refer to “dishwashers” and “East” than their longer equivalents. It’s convenient. And it doesn’t cost us much in everyday conversation.

But let’s look at the convenience / overhead tradeoff in an information system that is not data-centric. Every time you add a new class (or a new attribute) to an information system you are committing the enterprise to deal with it potentially for decades to come. The overhead starts with application programming: that new concept has to be referred to by code, and not just a small amount. I’ve done some calculations in my book, Software Wasteland, that suggest each attribute added to a system adds at least 1,000 lines of source code—code to move the item from the database to some API, code to take it from the API and put it in the DOM or something similar, code to display it on a screen, in a report, maybe even in a drop-down list, code to validate it. Given that it costs money to write and test code, this is adding to the cost of a system. The real impact is felt downstream, felt in application maintenance, especially felt in the brittle world of systems integration, and it is felt by the users. Every new attribute is a new field on a form to puzzle about. Every new class is often a new form. New forms often require changes to process flow. And so, the complexity grows.

Finally, there is cognitive load. When we have to deal with dozens or hundreds of concepts, we don’t have too much trouble. When we get to thousands, it becomes a real undertaking. Tens of thousands and it’s a career. And yet many individual applications have tens of thousands of concepts. Most large enterprises have millions, which is why becoming data-centric is so appealing.

One of the other big overheads in traditional technology is duplication. When you create a new class, let’s say “hand tools,” you may have to make sure that the wrench is in the Hand Tools class / table and also in the Inventory table. This reliance on humans and procedures to remember to put things in more than one place is a huge undocumented burden.

We want to think long and hard before introducing a new class or even a new attribute.

Read more on TDAN.com

Achieving Clarity in your Data Ecosystem

Achieving clarity in your data ecosystem is more difficult than ever these days.

With false news, cyber-attacks, social media, and a constant blitz of propaganda, how does one sort it all out? Even our data and information practices have suffered from this proliferation (data warehouse, data lake, data fabric, data mesh … and that is the short list). Data terminologies emerge like potholes every spring on Minnesota roads (I should know, being a 50-year resident and having hit more than my share).

Disambiguating these different terms (their advantages and disadvantages), along with a history of why they developed and why they continue to be part of almost every organizational data ecosystem, was the topic for Dave McComb, founder of Semantic Arts, and Dan DeMers, CEO of Cinchy. Their conversational webinar provided great insights and graphics to give needed clarity. To use my earlier analogy, it filled some potholes.

The conclusion was that no single solution exists. What did emerge is that how things relate to one another, and how they are connected with context, is vital to business agility, innovation, and ultimately competitive advantage. Moving away from application-centric to more data-centric thinking will inherently get you there faster and with abundantly less technical debt. Untangling 50 years of data mess starts by looking at the challenge with a different, data-centric lens.

DCAF 2021: Third Annual Data-Centric Architecture Forum Re-Cap

Written by Peter Winstanley

The Data-Centric Architecture Forum was a success!

If growth of participants is an indicator of the popularity of an idea, then the Third Annual Data-Centric Architecture Forum is reflecting a strong increase in popularity, for this year over 230 participants joined together for a three-day conversation.  This is a huge increase on the 50 or so who braved the snows of Fort Collins last year.  Perhaps the fact that the meeting was online was a contributing factor, but then spending three days in online meetings with presentations, discussion forums, and discussions with vendors needs a distinct kind of stamina, as it misses out on the usual conviviality of a conference – meals together, deep discussions over beer or coffee, and thoughtful walks and engaging sightseeing trips.  This year’s Data-Centric Architecture Forum was a paradigm shift in itself.  The preparatory work of Matt Faye and colleagues at Authentric provided us with a virtual auditorium, meeting rooms, Q&A sessions, socialization venues, and vendor spaces.  The Semantic Arts team were well-rehearsed with this new environment, but it was reassuring to find that conference attendees soon became acquainted with the layout and, quite quickly, the conference was on a roll with only the very infrequent glitch that was quickly sorted by Matt and team.

Paradigm shift was not only evident in the venue, it was also a central theme of the conference, the idea that we are on the cusp of a broad transformation of practice in informatics, particularly within the enterprise.  Dave McComb placed the Kuhnian idea of revolution squarely on the table as he commenced proceedings.   In many ways this is something we have become all too familiar with as the internet has given us hospitality companies with no hotels, taxi companies with no cars, and so on.  Here we are moving to there being applications with no built-in data store.  How can this possibly work?  It flies in the face of decades, perhaps centuries, of system design.  This Forum focused on architecture – the key elements necessary to implement a data-centric approach.

Some of the presentations covered the whole elephant, trunk to tail, whereas others focused on specific aspects.  I’ll take a meander through the key messages for me, but as ever in these sorts of reviews, there is no easy way to do justice to everyone’s contribution, and my focus may not be your focus.  However, given that the Forum was a ‘digital first’ production, you will be able to access the talks, slide decks and discussions yourself to make up your own mind—and I hope that you do.  A full complement of all recorded presentations is available for purchase at the same price as admission.  They can be purchased here, or inquire further at [email protected]

Understanding that “The future is already here — it’s just not evenly distributed” means that we have to disentangle the world around us and sift out the ideas and implementations that show this future, recognising early that it may arise from places where the technology isn’t the most sophisticated but the marketing is more advanced (thinking Betamax vs VHS here).  As Mark Musen pointed out in “A Data-Centric Architecture for Ensuring the Quality of Scientific Data”, when given free rein people make a mess of adding metadata, and this can be remedied by designing minimal subsets that do a ‘good enough’ job.  Once a community realises that a satisficing minimum metadata set can deliver benefit in a domain, this model can be rolled out with similar good effect to other domains.  We know from Herbert Simon’s work that organisations naturally gravitate to this ‘satisficing’ concept of operations, and as Alan Morrison and Mark Ouska discussed in their presentation on lowering the barriers to entry, going with the organisational flow – ensuring that there is an organisational language to express the new ideas – is a key element in successful adoption.  How else are we to bring to market the range of technologies presented by the 13 vendors exhibiting at the Forum?  Their benefits need to be describable in user stories that have resonance in all enterprises, for this isn’t just a revolution for science or for engineering; as Berners-Lee tweeted about the World Wide Web at the start of the London Olympic Games, “This is for everyone”.

Being for everyone requires that the technologies, such as the inclusion of time and truth or the responding to events that modern triplestores make possible, can be populated at scale with soundly-created information assets.  The approach to “SemOps,” an automation ecosystem to provide scalable support to people managing enterprise data assets in a data-centric modality, was the focus of the presentation by Wallace and Karii.  Being for everyone also means that information needs to be used across domains, and not just within the highly tailored channels that are typical of current application-centric architectures.  Jay Yu from Intuit and Dan Gshwend from Amgen, among others, showed their organisations’ paths to this generalised, cross-domain use of enterprise information, and the social dimension of this liberation of data across the enterprise was considered by Mike Pool and also by Laura Madsen, who both shared their experiences of governance in data-centric worlds.  Security was also covered, albeit later in a video presentation, by Rich Sinnott from Melbourne.

So, where are we at?  With attendance from North and South America, Europe and Oceania, the Forum showed us that there is a global appeal to the ideas of data-centricity.  There is commercial activity by various scales of solution vendors and implementing enterprises.  There is also consideration both within enterprises and by specialist consultants in the human factors associated with implementation and management of data-centric architectures.  However, there are still considerable challenges in cross-domain implementation of data-centricity, and the need to scale simultaneously not only the technical infrastructures and human skills, but also the involvement of individuals at a personal level in the management of their information and their active involvement in the contribution of that information to the global web of data.  The news from the BBC on their work with Solid pods and other personal information stores gave the Forum an inkling of the scale of change that is about to hit us.  Let us hope that the Third Data-Centric Architecture Forum has played a catalytic role in this global transformation, and I hope to have many enjoyable discussions with readers as we evaluate progress on our journey at the next Forum in a year’s time.

Click here to purchase DCAF 2021 Presentation recordings.

Telecom Frameworx Model: Simplified with “gist”

We recently recast large portions of the telecom Frameworx Information Model into an Enterprise Ontology using patterns and reusable parts of the gist upper ontology.  We found that extending gist with the information content of the Frameworx model yields a simple telecom model that is easy to manage, federate, and extend, as described below.  Recognizing that slow time to market and models too complex for cognitive consumption are typical barriers to success within the telecom industry, we’re confident this approach will help overcome a few of those hurdles and expedite adoption.

The telecommunications industry has made a substantial investment to define the Frameworx Information Model (TMF SID), an Enterprise-wide information model commonly implemented in a relational database, as described in the GB922 User’s Guide.

Almost half of the GB922 User’s Guide is dedicated to discussing how to translate the Information Model to a Logical Model, and then translate the Logical Model to a Physical Model. With gist and our semantic knowledge graph approach, these transformations are no longer required. The simple semantic model and the data itself are linked together and co-exist in a triplestore without requiring transformations.
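
As a tiny illustration of what “no transformations” means here (the ex: names are hypothetical), a class from the model and the data about one account sit side by side as triples in the same store:

```turtle
@prefix ex:   <https://example.org/telecom/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Model triples.
ex:CustomerAccount a owl:Class ;
    rdfs:subClassOf ex:Agreement .

# Data triples about one account, in the same graph.
ex:_account_10031 a ex:CustomerAccount ;
    ex:isHeldBy ex:_org_ExampleTelco .
```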

Click here to read more.

Co-produced by Phil Blackwood and Dave McComb of Semantic Arts

The 90s Are Over, Let’s Stop Basing Capital Cost Decisions on Lagging Indicators

Remember the good old days of emerging digital technology? Accessing information through a dial-up internet connection. Saving data to floppy discs or CDs. Sending emails to have them printed for storage. Mobile connectivity was new, exciting, and… slow compared to what we have today.

In the energy sector, data access limitations influenced the structure of traditional execution workflows for capital projects. It was common – and still is – for project execution models to focus on document-based deliverables over raw data.

The inherent problem with a document-centric approach is that documents take time to produce. Let’s imagine the workflow for a technology evaluation study that:

  • Begins with initial input from multiple departments.
  • Gets reviewed by 2-3 management layers on the project organizational chart.
  • Finally lands on the desk of a senior decision-maker.

This process could easily take two weeks or longer. But what happens during those two weeks? Work doesn’t get paused. The project continues to progress. The information initially collected for the study no longer represents current project conditions. By the time it gets to the decision-maker, the study is based on two-week-old lagging indicators.

A lot can change on a project in that amount of time. Execution workflows built around lagging indicators tend to:

  • Lead to costly and unnecessary errors caused by decisions based on old information.
  • Stymie innovation with rigid and slow processes that limit experimentation.

Click here to read more. 

Originally posted in: Digital Transformation

Click here to Read an Advanced Chapter from the Data-Centric Revolution by Dave McComb

A Data-Centric Approach to Managing Customer Data

by Phil Blackwood, Ph.D.

Without a doubt every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of each customer. However, this customer data is typically scattered across hundreds of applications, its meaning is embedded in code written years ago, and much of its value is locked away in silos. Compounding the problem, stakeholders in different parts of the business are likely to have different views of what the word “customer” means because they support different kinds of interactions with customers.

In this post, we’ll outline how to tackle these issues and unlock the value of customer data. We’ll use semantics to establish simple common terminology, show how a knowledge graph can provide 360 degree views, and explain how to classify data without writing code.

The semantic analysis will have three parts: first we consider the simple use case illustrated in the diagram below, then we take a much broader view by looking at Events, and finally we dive deeper into the meaning of the diagram by using the concept of Agreements.


The diagram shows an event in which a customer purchases a shirt from a shop. Ask stakeholders around your company what types of events customers participate in, and you are likely to get a long list. It might look something like this (the verbs are from the viewpoint of your company):

  • Answer general questions about products and services
  • Create billing account for products and services
  • Create usage account for a product or service
  • Deliver product or service (including right-to-use)
  • Finalize contract for sale of product or service
  • Help a customer use a product or service
  • Identify a visitor to our web site.
  • Determine a recommender of a product or service
  • Find a user of a product or service
  • Migrate a customer from one service to another
  • Migrate a service from one customer to another
  • Prepare a proposal for sale of product or service
  • Receive customer agreement to terms and conditions
  • Receive payment for product or service
  • Rent product or service
  • Sell product or service
  • Send bill for product or service
  • Ship product

We can model these events using classes from the gist ontology, with one new class consisting of the categories of events listed above. When we load data into our knowledge graph, we link each item to its class and we relate the items to each other with object properties. For example, an entry for one event might look like:

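The original post showed this entry as a figure; a minimal Turtle sketch in the same spirit might look like the following, where the ex: names are invented for illustration and stand in for the corresponding gist classes and properties:

```turtle
@prefix ex: <https://example.org/customer/> .

# A purchase event, typed with a category rather than a bespoke event class.
# (ex:Event, ex:Person, etc. stand in for the gist classes the post refers to.)
ex:_event_42 a ex:Event ;
    ex:isCategorizedBy ex:_SellProductOrService ;  # one of the 18 categories listed above
    ex:hasParticipant  ex:_person_JaneDoe ;        # the customer
    ex:hasParticipant  ex:_org_OurCompany ;        # us
    ex:involves        ex:_product_blueShirt .

ex:_SellProductOrService a ex:CustomerEventCategory .  # the one new class of categories
ex:_person_JaneDoe       a ex:Person .
ex:_product_blueShirt    a ex:Product .
```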

By using categories instead of creating 18 new classes of events, we keep the model simple and flexible. We can round out the picture by realizing that the Person could instead be an Organization (company, non-profit, or government entity) and the Product could instead be a Service (e.g. window washing).

In a green-field scenario, the model and the data are seamlessly linked in a knowledge graph and we can answer many different questions about our customers. However, in most companies a considerable amount of customer data exists in application-centric silos. To unlock existing customer data, we have to first understand its meaning and then we can link it into the knowledge graph by using the R2RML data mapping language. This data federation allows us to write queries using the simple, standard semantic model and get results that include the existing data.
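
An R2RML mapping for a legacy customer table might look roughly like this; the table, column, and ex: names are hypothetical, and the subject class stands in for the appropriate gist class:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <https://example.org/customer/> .

# Map rows of a legacy CUSTOMER table into the knowledge graph.
ex:CustomerMapping a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "CUSTOMER" ] ;
    rr:subjectMap [
        rr:template "https://example.org/customer/_person_{CUSTOMER_ID}" ;
        rr:class ex:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasName ;
        rr:objectMap [ rr:column "FULL_NAME" ]
    ] .
```

Once mapped, those rows answer the same SPARQL queries as natively loaded triples.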

For any node in the knowledge graph, we have a 360 degree view of the data about the node and its context. A Person node can be enriched with data from social media. An Organization node can be enriched with data about corporate structure, subsidiaries, or partnerships.

Now let’s pivot from the broad event-based perspective to look more closely at the meaning of the original example. Implicit in the idea of a sale is an agreement between the buyer and the seller; once the agreement is made, the seller is obligated to deliver something, while the buyer must pay for it. The “something” is a product or service. We can model the transaction like this:
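
The post illustrated this with a diagram; a hedged Turtle sketch of the same agreement-and-obligation pattern, again with invented ex: names standing in for the relevant gist terms, might be:

```turtle
@prefix ex: <https://example.org/customer/> .

# The sale as an agreement that carries two obligations.
ex:_agreement_42 a ex:Agreement ;
    ex:hasParty ex:_person_JaneDoe ;      # the buyer
    ex:hasParty ex:_org_OurCompany .      # the seller

ex:_obligation_deliver a ex:Obligation ;
    ex:isPartOf    ex:_agreement_42 ;
    ex:obligatedBy ex:_org_OurCompany ;   # the seller must deliver
    ex:concerns    ex:_product_blueShirt .

ex:_obligation_pay a ex:Obligation ;
    ex:isPartOf    ex:_agreement_42 ;
    ex:obligatedBy ex:_person_JaneDoe ;   # the buyer must pay
    ex:concerns    ex:_payment_42 .
```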

This basic pattern of agreement and obligation covers many use cases. The agreement could be the simple act of placing the shirt on the check-out counter, or it could be a contract. Delivery and payment could coincide in time, or not. Payments or deliveries, or both, could be monthly.

If our Contract Administration group wants a simple way to identify all the customers who have a contract, we can create a Class named ContractCustomer and populate it automatically from the data in our knowledge graph. To do this, we would write an expression similar to a query that defines what we mean by ContractCustomer, declare the Class to be equivalent to the expression, and then run an off-the-shelf, standards-based inference engine to populate the new class. With no code needed … it’s model-driven.
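
A minimal sketch of that equivalence in OWL, using invented ex: names, shows how little is involved:

```turtle
@prefix ex:  <https://example.org/customer/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# ContractCustomer is defined as anything that is a party to some Contract.
ex:ContractCustomer a owl:Class ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty     ex:isPartyTo ;
        owl:someValuesFrom ex:Contract
    ] .
```

A standard OWL reasoner will then classify every qualifying Person or Organization into ex:ContractCustomer automatically.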

This method of automatically populating classes can be used to support the wide variety of needs of stakeholders in different parts of the company, even though they do not have the same definition of customer. For example, you could provide classes like PayingCustomer and ProductUsers that can be used to simplify the way the data is accessed or to become building blocks in the model to build upon. With this approach, there is no need to try to get everyone to agree on a single definition of customer. It lets everyone stay focused on what will help them run their part of the business.

While many refinements are possible, we’ve outlined the core of a data-centric solution to the knotty problem of managing customer data. The semantic analysis reveals a simple way to capture information about customer interactions and agreements. A knowledge graph supports 360 degree views of the data, and an inference engine allows us to populate classes automatically without writing a single line of code.

I hope you can glean some ideas from this discussion to help your business, and that you get a sense of why semantics, knowledge graphs, and model-driven-everything are three keys to data-centric architecture.