Greatest hits from the Data-Centric Manifesto

I was just reading through what some folks have written on the Data-Centric Manifesto website. I thought I’d capture some of the more poignant ones:

“I believe [Linked] Data Centric approach is the way of the future. I am committing my company to assisting enterprises in their quest to Data-Centric transformation.” – Alex Jouravlev


“I have experienced first-hand in my former company the ravages of application-centric architectures. Development teams have rejected SQL-based solutions that performed 10 to 100 times better with less code and fewer resources, all because of application-centric dogma. Databases provide functional services, not just technical services – otherwise they’re not worth the money.” – Stew Ashton


“I use THE DATA-CENTRIC MANIFESTO as a mantra, a guide-line, a framework, an approach and a method, with which to add value as a consultant to large enterprises.” – Mark Besaans


“A data-centric approach will finally allow IT to really support the way we think and work instead of forcing us to think in capabilities of an application.” – Mark Schenk


“The principles of a data-centric approach would seem obvious, but the proliferation of application-centric implementations continues. Recognizing the difference is critical to positive change, and the benefits organizations want and need.” – Kim L Hoover

Data-centric is a major departure from the current application-centric approach to systems development and management. Migration to the data-centric approach will not happen by itself. It needs champions. If you’re ready to consider the possibility that systems could be more than an order of magnitude cheaper and more flexible, then become a signatory of the Data-Centric Manifesto.

Read more here.

Do Data Lakes Make My Enterprise Look Data-Centric?

Dave McComb discusses data lakes, schema, and data-centricity in his latest post on the Data Centric Revolution for The Data Administration Newsletter. Here’s a brief excerpt to pique your interest:

“I think it is safe to say that there will be declared successes in the Data Lake movement. A clever data scientist, given petabytes of data to troll through, will find insights that will be of use to the enterprise. The more enterprising will use machine learning techniques to speed up their exploration and will uncover additional insights.

But in the broader sense, we think the Data Lake movement will not succeed in changing the economics or overall architecture of the enterprise. In a way, the Data Lake is something to do instead of dealing with the very significant problems of legacy ecosystems and dis-economics of change.

Even at the analytics level, where the Data Lake has the most promise, we think it will fall short…

Conceptually, the Data Lake is not far off from the Data Centric Revolution. The data does have a more central position. However, there are three things that a Data Lake needs in order to be Data Centric…”

Click here to read the entire article.


Data-Centric vs. Data-Driven

In this column, I am making the case for Data Centric architectures for enterprises.  There is a huge economic advantage to converting to the data-centric approach, but curiously few companies are making the transition. One reason may be the confusion of Data Centric with Data Driven, and the belief that you are already on the road to data centric nirvana, when in fact you are nowhere near it.

Data-Centric

Data-centric refers to an architecture where data is the primary and permanent asset, and applications come and go.  In the data-centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone.

Many people may think this is what happens now or what should happen.  But it very rarely happens this way.  Businesses want functionality, and they purchase or build application systems.  Each application system has its own data model, and its code is inextricably tied to this data model.  It is extremely difficult to change the data model of an implemented application system, as there may be millions of lines of code dependent on the existing model.

Of course, this application is only one of hundreds or thousands of such systems in an enterprise.  Each application on its own has hundreds to thousands of tables and tens of thousands of attributes. These applications are very partially and very unstably “interfaced” to one another through some middleware that periodically schleps data from one database to another.

The data centric approach turns all this on its head. There is a data model—a semantic data model (but more on that in a subsequent white paper)—and each bit of application functionality reads and writes through the shared model.  If there is application functionality that calculates suggested reorder quantities for widgets, it will make its suggestion and add it to the shared database, using the common core terms.  Any other system can access the suggestions and know what they mean.  If the reordering functionality goes away tomorrow, the suggestions will still be there.
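To make this concrete, here is a minimal sketch in Python with rdflib. The namespace, class, and property names are hypothetical illustrations, not an actual core model:

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# Hypothetical shared core vocabulary; in practice this would be a
# curated enterprise ontology, not something each application invents.
CORE = Namespace("https://example.com/core/")

shared = Graph()  # stands in for the shared, application-independent store

def suggest_reorder(graph, sku, quantity):
    """Write a reorder suggestion using the common core terms, so any
    other consumer of the shared model can find and interpret it."""
    suggestion = CORE[f"suggestion-{sku}"]
    graph.add((suggestion, RDF.type, CORE.ReorderSuggestion))
    graph.add((suggestion, CORE.forProduct, CORE[f"product-{sku}"]))
    graph.add((suggestion, CORE.suggestedQuantity,
               Literal(quantity, datatype=XSD.integer)))

suggest_reorder(shared, "widget-42", 500)

# If the reordering application is retired tomorrow, these triples
# and their meaning remain in the shared model for any other system.
for triple in shared:
    print(triple)
```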

Click here to read more on TDAN.com

Debugging Enterprise Ontologies

Michael Uschold gave a talk at the International Workshop on Completing and Debugging the Semantic Web held in Crete on May 30, 2016.   Here is a preview of the white paper, “Finding and Avoiding Bugs in Enterprise Ontologies” by Michael Uschold:

Finding and Avoiding Bugs in Enterprise Ontologies

Abstract: We report on ten years of experience building enterprise ontologies for commercial clients. We describe key properties that an enterprise ontology should have, and illustrate them with many real world examples. They are: correctness, understandability, usability, and completeness. We give tips and guidelines for how best to use inference and explanations to identify and track down problems. We describe a variety of techniques that catch bugs that an inference engine will not find, at least not on its own. We describe the importance of populating the ontology with data to drive out more bugs. We point out some common ontology design practices in the community that lead to bugs in ontologies and in downstream semantic web applications based on the ontologies. These include proliferation of namespaces, proliferation of properties and inappropriate use of domain and range. We recommend doing things differently to prevent bugs from arising.

Introduction
In a manner analogous to software debugging, ontologies need to be rid of their flaws. The types of flaws to be found in an ontology are slightly different than those found in software, and revolve around the ideas of correctness, understandability, usability and completeness. We report on our experience (spanning more than a decade) in building and debugging enterprise ontologies for large companies in a wide variety of industries including: finance, healthcare, legal research, consumer products, electrical devices, manufacturing and digital assets. For the growing number of companies starting to use ontologies, the norm is to build a single ontology for a point solution in one corner of the business. For large companies, this leads to any number of independently developed ontologies resulting in many of the same heterogeneity problems that ontologies are supposed to solve. It would help if they all used the same upper ontology, but most upper ontologies are unsuitable for enterprise use. They are hard to understand and use because they are large and complex, containing much more than is necessary, or the focus is too academic to be of use in a business setting. So the first step is to start with a small, upper, enterprise ontology such as gist [McComb 2006], which includes core concepts relevant to almost any enterprise. The resulting enterprise ontology itself will consist of a mixture of concepts that are important to any enterprise in a given industry, and those that are important to a particular enterprise. An enterprise ontology plays the role of an upper ontology for all the ontologies in a company (Fig. 1). Major divisions will import and extend it. Ontologies that are specific to particular applications will, in turn, import and extend those. The enterprise ontology evolves to be the semantic foundation for all major software systems and databases that are core to the enterprise.
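The layering Uschold describes maps directly onto owl:imports. Here is a minimal sketch in Python with rdflib; the ontology IRIs are invented for illustration and are not the real gist or client IRIs:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDF

# Hypothetical IRIs illustrating the layering described above.
GIST = URIRef("https://example.com/gist")                  # small upper ontology
ENTERPRISE = URIRef("https://example.com/acme/enterprise") # enterprise ontology
CLAIMS = URIRef("https://example.com/acme/claims")         # application ontology

g = Graph()
for onto in (GIST, ENTERPRISE, CLAIMS):
    g.add((onto, RDF.type, OWL.Ontology))

# Each layer imports, and then extends, the one above it.
g.add((ENTERPRISE, OWL.imports, GIST))
g.add((CLAIMS, OWL.imports, ENTERPRISE))

print(g.serialize(format="turtle"))
```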

Click here to download the white paper.

Click here to download the presentation.

Evolve your Non-Temporal Database in Place

At Semantic Arts, we recently decided to upgrade our internal system to turn something that was not temporal (our billing rates) into something that was. Normally, that would be a pretty big change.  As it turned out, it was straightforward and could be done as an in-place update.  It made a good mini case study for how using semantics and a graph database can make these kinds of changes far less painful.
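The essence of the change, sketched below in Python with rdflib, is to move from a single, overwritable rate value to rate nodes that carry their own effective dates. The names here are hypothetical; the actual model is shown in the video:

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("https://example.com/billing/")  # hypothetical namespace
g = Graph()

# Non-temporal: the rate is a single value that gets overwritten.
g.add((EX.consultant1, EX.billingRate, Literal(150, datatype=XSD.integer)))

# Temporal: each rate becomes its own node with a validity date, so old
# rates are preserved rather than overwritten. In a graph database this
# is just new triples -- no schema migration required.
g.remove((EX.consultant1, EX.billingRate, None))
rate = EX["rate-2016-01"]
g.add((EX.consultant1, EX.hasRate, rate))
g.add((rate, RDF.type, EX.BillingRate))
g.add((rate, EX.ratePerHour, Literal(175, datatype=XSD.integer)))
g.add((rate, EX.effectiveFrom, Literal("2016-01-01", datatype=XSD.date)))
```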

So, Dave McComb documented it in a YouTube video.


Click here to view: Upgrade a non Temporal Database in Place

Introduction to FIBO Quick Start

We have just launched our “FIBO Quick Start” offering.  If you are in the financial industry you have likely heard about the Financial Industry Business Ontology, which has been championed by the EDM Council, a consortium of virtually the entire who’s who of the financial industry. We’ve been helping with FIBO almost since its inception, and more recently Michael Uschold has been co-leading the mortgage and loan ontology development effort.  Along the way we’ve done several major projects for financial clients, and have reduced what we know to a safe and quick approach to adopting semantics in the financial sector. We have the capacity to take on one more client in the financial space, so if you’re interested, by all means contact us.

FIBO Quick Start: Developing Business Value Rapidly with Semantics

The Financial Industry Business Ontology is nearing completion. As of June 2016, nine major financial institutions have joined the early adopter program. It is reasonable to expect that in the future all Financial Industry participants will have aligned some of their systems with FIBO. Most have focused their initial projects on incorporating the FIBO vocabulary. This is a good first step and can jump start a lot of compliance work.

But the huge winners, in our opinion, will be the few institutions that see the potential and go all-in with this approach. For sixteen years, we have been working with large enterprises who are interested in adopting semantic technology. Initially, our work focused on architecture and design as firms experimented with ways to incorporate these new approaches. More recently, we have been implementing what we call the “data-centric approach” to building semantically-centered systems in an agile fashion.

Click here to read more. 

Data-Centric and Model Driven

Model Driven Development

Model Driven seems to be enjoying a bit of an upsurge lately.  Gartner has recently been hyping (is it fair to accuse the inventors of the hype curve of hyping something? or is it redundant?) what they call “low code / no code” environments.

Perhaps they are picking up on and reporting a trend, or perhaps they are creating one.

Model Driven Development has been around for a long time. To backfill what this is and where it came from, I’m going to recount my experience with Model Driven, as I think it provides a first-person narrative for most of what was happening in the field at the time.

I first encountered what would later be called Model Driven in the early 80’s when CAD (Computer Aided Design—of buildings and manufactured parts) was making software developers jealous.  Why didn’t we have workbenches where we could generate systems from designs?  Early experiments coalesced into CASE (Computer Aided Software Engineering).  I was running a custom ERP development project in the early 80’s (on an ICL Mainframe!) and we ended up building our own CASE platform.  The interesting thing about that platform was that we built the designs on recently acquired 8-bit microcomputers, which we then pushed to a compatible framework on the mainframe.  We were able to iterate our designs on the PCs, work out the logistical issues, and get a working prototype UI to review with the users before we committed to the build.

The framework built a scaffold of code based on the prototype and indicated where the custom code needed to go.  This forever changed my perspective on how systems could and should be built.

What we built was also being built at the same time by commercial vendors (we did this project in Papua New Guinea and were pretty out of the loop as to what was happening in mainstream circles).  When we came up for air, we discovered what we had built was being called “I-CASE” (Integrated Computer Aided Software Engineering), which referred to the integration of design with development (seemed like that was the idea all along).  I assume Gartner would call this approach “low code,” as there still was application code to be written for the non-boilerplate functionality.

Next stop on my journey through model driven was another ERP custom build.  By the late 80’s a few new trends had emerged.  One was that CAD was being invaded by parametric modeling.  Parametric modeling recognizes that many designs of physical products do not need to be redesigned by a human every time a small change is made to the input factors.  A motor mount could be designed in such a way that a change to the weight, position, and torque would drive a new design optimized for those new factors. The trusses for a basketball court could be automatically redesigned if the span, weight, or snow load changed, and the design of big box retail outlets could be derived from, among other things, wind shear, maximum rainfall, and seismic potential.

The other trend was AI (remember AI?  Oh yeah, of course you remember AI, which you forgot about from the early 90’s until Watson and Google’s renaissance of AI).

Being privy to these two trends, we decided to build a parametric model of applications and have the code generation be driven by AI.  Our goal was to be able to design a use case on a Post-it note. We didn’t quite achieve our goal.  Most of our designs were up to a page long.  But this was a big improvement over what was on offer at the time.  We managed to generate 97% of the code in this very sophisticated ERP system.  While it was not a very big company, I have yet to see more complex requirements in any system (lot-based inventory, multi-modal outbound logistics, a full ISO 9000-compliant laboratory information management system, in-line QA, and complex real-time product disposition based on the physical and chemical characteristics of each lot).

In the mid 90’s we were working on systems for ambulatory health care.  We were building semantic models for our domain.  Instead of parametric modeling, we defined all application behavior in a scripting language called Tcl. One day we drew on a whiteboard where all the Tcl scripts fit in the architecture (they defined the UI, the constraint logic, the schema, etc.). It occurred to us that with the right architecture, the Tcl code, and therefore the behavior of the application, could be reduced to data.  The architecture would interpret the data and create the equivalent of application behavior.
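As a toy illustration of that idea (the structure and field names here are invented; the real system used Tcl and a far richer model), the “application” below is pure data, and a generic engine interprets it:

```python
# The "application" is data: a model of a form, not code.
patient_form_model = {
    "entity": "Patient",
    "fields": [
        {"name": "name", "type": "string", "required": True},
        {"name": "birth_date", "type": "date", "required": True},
        {"name": "phone", "type": "string", "required": False},
    ],
}

def validate(model, record):
    """Generic engine: interprets the model to enforce constraints.
    Change the model data and the application's behavior changes,
    with no new code written."""
    errors = []
    for field in model["fields"]:
        if field["required"] and not record.get(field["name"]):
            errors.append(f"{field['name']} is required")
    return errors

print(validate(patient_form_model, {"name": "Joan"}))
# ['birth_date is required']
```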

We received what I believe to be the original patents on fully model driven application development (patent number 6,324,682).  We eventually built an architecture that would interpret the data models and build user interfaces, constraints, transactions, and even schemas.  We built several healthcare applications in this architecture and were rolling out many more when our need for capital and the collapse of the .com bubble ended this company.

I offer this up as a personal history of the “low code / no code” movement.  It is not only real, as far as we are concerned, but its value is underrepresented in the hype.

Data-Centric Architecture

More recently we have become attracted to the opportunity that lies in helping companies become data-centric.  This data-centric focus has mostly come from our work with semantics and enterprise ontology development.

What we discovered is that when an enterprise embraces the elegant core model that drives their business, all their problems become tractable.  Integration becomes a matter of conforming to the core.  New system development becomes building to a much, much simpler core model.

Most of these benefits come without embracing model driven.  There is amazing economy in reducing the size of your enterprise data model by two orders of magnitude.

Click here to read more on TDAN.com

The Evolution of the Data Centric Revolution Part Two

In the previous installment (The Data Centric Revolution: The Evolution of the Data Centric Revolution Part One), we looked at some of the early trends in application development that foreshadowed the data centric revolution, including punched cards, magnetic tape, indexed files, databases, ERP, Data Warehouses, and Operational Data Stores.

In this installment, we pick up the narrative with some of the more recent developments that are paving the way for a data centric future.

Master Data Management

Somewhere along the line, someone noticed (perhaps they harkened back to the reel-to-reel days) that there were two kinds of data that are, by now, mixed together in the typical database application: transactional data and master data.  The master data was data about entities, such as Customers, Vendors, Equipment, Fixed Assets, or Products.  This master data was often replicated widely. For instance, every order entry system has to have yet another Customer table because of integrity constraints, if nothing else.

If you could just get all the master data in one place, you’d have made some headway.  In practice, it rarely happened. Why? In the first place, it’s pretty hard.  Most of the MDM packages are still using older, brittle technology, which makes it difficult to keep up with the many and various end-points to be connected.  Secondly, it only partially solved the problem, as each system still had to maintain a copy of the data, if for nothing else, for their data integrity constraints.  Finally, it only gave a partial solution to the use cases that justified it. For example, the 360° view of the customer was a classic justification, but people didn’t want a 360° view of the master data; they wanted to see the transaction data.  Our observation is that most companies that had the intention to implement several MDMs gave up after about a year and a half, when they found out they weren’t getting the payout they expected.

Canonical Message Model

Service Oriented Architecture (SOA) was created to address the dis-economy in the system integration space.  Instead of point-to-point interfacing, you could send transactional updates onto a bus (the Enterprise Service Bus), and allow rules on the bus to distribute the updates to where they are needed.

The plumbing of SOA works great.  It’s mostly about managing messages and queues and making sure messages don’t get lost, even if part of the architecture goes down. But most companies stalled out on their SOA implementations because they had not fully addressed their data issues.  Most companies took the APIs that each of their applications “published” and then put them on the bus as messages.  This essentially required all the other end-points to understand each other.  This was point-to-point interfacing over a bus.  To be sure, it is an improvement, but not as much as was expected.

Enter the Canonical Message Model.  This is a little-known approach that generally works well where we’ve seen it applied.  The basic concept is to create an elegant [1] model of the data that is to be shared.  The trick is in the elegance.  If you can build a simple model that captures the distinctions that need to be communicated, there are tools that will help you build shared messages derived from that simple model.  Having a truly shared message is what gets one out of the point-to-point trap. Each application “talks” through messages to the shared model (which is only instantiated “in motion,” so the versioning problem that dogged the ODS is much easier to solve), which in turn “talks” to the receiving application.
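Here is a minimal sketch of the pattern in Python; the message shapes and field names are invented for illustration. Each endpoint translates to and from the canonical model, so adding an endpoint means writing two translators rather than one per peer:

```python
# Hypothetical application-specific message shapes.
def from_order_system(msg):
    """Translate the order system's native message into the canonical model."""
    return {"customerId": msg["cust_no"], "orderTotal": msg["amt"]}

def to_billing_system(canonical):
    """Translate the canonical model into the billing system's native format."""
    return {"CUSTOMER": canonical["customerId"],
            "TOTAL_DUE": canonical["orderTotal"]}

# The canonical message exists only "in motion" on the bus.
raw = {"cust_no": "C-1001", "amt": 249.00}
print(to_billing_system(from_order_system(raw)))
# {'CUSTOMER': 'C-1001', 'TOTAL_DUE': 249.0}
```

With N endpoints, this comes to 2N translators instead of the N×(N−1) mappings of point-to-point interfacing over a bus.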

Click here to continue reading on TDAN.com

Whitepaper: Quantum Entanglement, Flipping Out and Inverse Properties

We take a deep dive into the pragmatic issues regarding the use of inverse properties when creating OWL ontologies.

Property Inverses and Perspectives

It is important to understand that logically, both perspectives always exist; they are joined at the hip. If Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. If from one perspective, a new relationship link is created or an existing one is broken, then that change is immediately reflected when viewed from the other perspective. This is a bit like two quantumly entangled particles. The change in one is instantly reflected in the other, even if they are separated by millions of light years. Inverse properties and entangled particles are more like two sides of the same coin than two different coins.

Figure 2: Two sides of the same coin.

 

In OWL we call the property that is from the other perspective the inverse property. Given that a property and its inverse are inseparable, technically, you cannot create or use one without [implicitly] creating or using the other. If you create a property hasParent, there is an OWL syntax that lets you refer to and use that property’s inverse. In Manchester syntax you would write: “inverse(hasParent)”. The term ‘inverse’ is a function that takes an object property as an argument and returns the inverse of that property. If you assert that Michael hasParent Joan, then the inverse assertion, Joan inverse(hasParent) Michael, is inferred to hold. If you decide to give the inverse property the name parentOf, then the inverse assertion is that Joan parentOf Michael. This is summarized in Figure 3 and the table below.
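This entailment can be demonstrated with an OWL-RL reasoner. Here is a minimal sketch in Python using rdflib and the owlrl package; the IRIs are illustrative:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL
import owlrl

EX = Namespace("https://example.com/family/")
g = Graph()

# Declare parentOf as the named inverse of hasParent.
g.add((EX.hasParent, OWL.inverseOf, EX.parentOf))

# Assert one direction only.
g.add((EX.Michael, EX.hasParent, EX.Joan))

# Materialize the OWL-RL closure; the inverse assertion is inferred.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

print((EX.Joan, EX.parentOf, EX.Michael) in g)  # True
```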

Click here to read more and download the White-paper

Written by Michael Uschold

Naming an Inverse Property: Yay or Nay?


Figure 1: Quantum Entanglement

 

For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to which party you are taking the perspective of: the parent or the child, the guarantor or the thing being guaranteed.  From the perspective of the child we might assert the triple: Michael :hasParent Joan.  Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa.  So asserting any triple results in the implicit assertion of an inverse triple.  It’s a bit like quantumly entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple.  For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan’s perspective to directly assert that she has another child.

Figure 2: Property with anonymous inverse

 

If we wish, we can give the inverse property a name. Two good candidates are: hasChild, and parentOf.

Figure 3: Property with named inverse

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts, we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful.  Below are four downsides of using named inverses (roughly in order of growing importance).  The first two relate to ease of learning and understanding the ontology. The last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower Inference: Too many named inverses can significantly slow down inference.
  4. More Space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property (see the sketch just after this list).
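Point 4 can be seen directly in a small sketch reusing the same hypothetical setup as above (Python with rdflib and owlrl): after materialization, every asserted hasParent triple has a parentOf counterpart.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL
import owlrl

EX = Namespace("https://example.com/family/")
g = Graph()
g.add((EX.hasParent, OWL.inverseOf, EX.parentOf))

# Ten assertions in one direction...
for i in range(10):
    g.add((EX[f"child{i}"], EX.hasParent, EX.Joan))

owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# ...yield ten materialized inverse triples as well.
print(len(list(g.triples((None, EX.hasParent, None)))))  # 10
print(len(list(g.triples((None, EX.parentOf, None)))))   # 10
```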

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that those benefits outweigh the downsides.