Data-Centric Revolution: Is Knowledge Ontology the Missing Link?

“You would think that after knocking around in semantics and knowledge graphs for over two decades I’d have had a pretty good idea about Knowledge Management, but it turns out I didn’t.

I think in the rare event the term came up I internally conflated it with Knowledge Graphs and moved on. The first tap on the shoulder that I can remember was when we were promoting work on a mega project in Saudi Arabia (we didn’t get it, but this isn’t likely why). We were trying to pitch semantics and knowledge graphs as the unifying fiber for the smart city the Neom Line was to become.

In the process, we came across a short list of Certified Knowledge Management platforms they were considering. Consider my chagrin when I’d never heard of any of them. I can no longer find that list, but I’ve found several more since…”

Read the rest: Data Centric Revolution: Is Knowledge Ontology the Missing Link? – TDAN.com

Interested in joining the discussion? Join the gist Forum (Link to register here)

The Data-Centric Revolution: Zero Copy Integration

I love the term “Zero Copy Integration.” I didn’t come up with it; the Data Collaboration Alliance did. The Data Collaboration Alliance is a Canada-based advocacy group promoting localized control of data along with federated access.

What I like about the term is how evocative it is. Everyone knows that all integration consists of copying and transforming data, whether you do that through an API, through ETL (Extract, Transform, and Load), or through Data Lake-style ELT (Extract, Load, and leave it to someone else to maybe eventually Transform). Either way, we know from decades of experience that integration is at its core copying data from a source to a destination.

This is why “copy-less copying” is so evocative.  It forces you to rethink your baseline assumptions.

We like it because it describes what we’ve been doing for years, and never had a name for. In this article, I’m going to drill a bit deeper into the enabling technology (i.e., what do you need to have in place to get Zero Copy Integration to work), then do a case study, and finally wrap up with “do you literally mean zero copy?”

 

Read more at: The Data-Centric Revolution: Zero Copy Integration – TDAN.com

The Data-Centric Revolution: Detour/Shortcut to FAIR

The FAIR principles for data sets are gaining traction, especially in the pharmaceutical industry in Europe. FAIR stands for: Findable, Accessible, Interoperable and Reusable. In a world of exponential data growth and ever-increasing silo-ization, the FAIR principles are needed more than ever. In this article, we will first summarize the FAIR principles and describe the typical roadmap to FAIR. After that we will argue that using a Data-Centric approach is the best route to achieving FAIR principles. This material was recently presented at a FAIR workshop for a European pharmaceutical firm. FAIR emerged as a response to the fragmentation of data within and among most large firms. Most of the publicly available FAIR materials tend to focus on awareness and assessment (how FAIR are you?) and leave the way-finding to individual companies. Difficulties in sharing and applying data to problems create a friction on innovation that FAIR is meant to address.

Keep reading at The Data-Centric Revolution: Detour / Shortcut to FAIR – TDAN.com

The Data-Centric Revolution: OWL as a Discipline

Many developers pooh-pooh OWL (the dyslexic acronym for the Web Ontology Language). Many decry it as “too hard,” which seems bizarre, given that most developers I know pride themselves on their cleverness (and, as anyone who takes the time to learn OWL knows, it isn’t very hard at all). It does require you to think slightly differently about the problem domain and your design. And I think that’s what developers don’t like. If they continue to think in the way they’ve always thought, and try to express themselves in OWL, yes, they will and do get frustrated.

That frustration might manifest itself in a breakthrough, but more often, it manifests itself in a retreat. A retreat perhaps to SHACL, but more often the retreat is more complete than that, to not doing data modeling at all. By the way, this isn’t an “OWL versus SHACL” discussion; we use SHACL almost every day. This is an “OWL plus SHACL” conversation.

The point I want to make in this article is that it might be more productive to think of OWL not as a programming language, not even as a modeling language, but as a discipline. A discipline akin to normalization.

Keep reading: The Data-Centric Revolution: OWL as a Discipline – TDAN.com

Read more of Dave’s articles: mccomb – TDAN.com

Incremental Stealth Legacy Modernization

I’m reading the book Kill it with Fire by Marianne Bellotti. It is a delightful book. Plenty of pragmatic advice, both on the architectural side (how to think through whether and when to break up that monolith) and the organizational side (how to get and maintain momentum for what are often long, drawn-out projects). So far in my reading she seems to advocate incremental improvement over rip and replace, which is sensible, given the terrible track record with rip and replace. Recommended reading for anyone who deals with legacy systems (which is to say anyone who deals with enterprise systems, because a majority are or will be legacy systems).

But there is a better way to modernize legacy systems. Let me spoil the suspense: it is Data-Centric. We are calling it Incremental Stealth Legacy Modernization because no one is going to get the green light to take this on directly. This article is for those playing the long game.

Legacy Systems

Legacy is the covering concept for a wide range of activities involving aging enterprise systems. I had the misfortune of working in Enterprise IT just as the term “Legacy” became pejorative. It was the early 1990s, and we were just completing a long-term strategic plan for Johns Manville. We decided to call it the “Legacy Plan,” as we thought those involved with it would leave a legacy to those who came after. The ink had barely dried when “legacy” acquired a negative connotation. (While writing this I just looked it up, and Wikipedia thinks the term had already acquired its negative connotation in the 1980s. Seems to me if it were in widespread use someone would have mentioned it before we published that report.)

There are multiple definitions of what makes something a legacy system. Generally, it refers to older technology that is still in place and operating. What tends to keep legacy systems in place are networks of complex dependencies. A simple stand-alone program does not become a legacy system, because when the time comes, it can easily be rewritten and replaced. Legacy systems have hundreds or thousands of external dependencies, which often are not documented. Removing, replacing, or even updating legacy systems runs the risk of violating some of those dependencies. It is the fear of this disruption that keeps most legacy systems in place. And the longer a system stays in place, the more dependencies it accretes.

If these were the only forces affecting legacy systems, they would stay in place forever. The countervailing forces are obsolescence, dis-economy, and risk. While many parts of the enterprise depend on the legacy system, the legacy system itself has dependencies. The system is dependent on operating systems, programming languages, middleware, and computer hardware. Any of these dependencies can become obsolescent and eventually obsolete. Obsolete components are no longer supported and therefore represent a high degree of risk of total failure of the system. The two main dimensions of dis-economy are operations and change. A modern system can typically run at a small fraction of the operating costs of a legacy system, especially when you tally up all the licenses for application systems, operating systems, and middleware, and add in the salary costs for the operators and administrators who support it. The dis-economy of change is well known, coming in the form of integration debt. Legacy systems are complex and brittle, which makes change hard. The cost to make even the smallest change to a legacy system is orders of magnitude more than the cost to make a similar change to a modern, well-designed system. They are often written in obscure languages. One of my first legacy modernization projects involved replacing a payroll system written in assembler language with one that was to be written in “ADPAC.” You can be forgiven for thinking it is insane to have written a payroll system in assembler language, and even more so for replacing it with a system written in a language that no one in the 21st century has heard of, but this was a long time ago, and it is indicative of where legacy systems come from.

Legacy Modernization

Eventually the pressure to change overwhelms the inertia to leave things as they are. This usually does not end well, for several reasons. Legacy modernization is usually long delayed. There is not a compelling need to change, and as a result, for most of the life of a legacy system, resources have been assigned to other projects that get short-term net positive returns. Upgrading the legacy system represents low upside. The new system will do the same thing the old legacy system did, perhaps a bit cheaper or a bit better, but not fundamentally differently. Your old payroll system is paying everyone, and so will a new one.

As a result, the legacy modernization project is delayed as long as possible. When the inevitable precipitating event occurs, the replacement becomes urgent. People are frustrated with the old system. Replacing the legacy system with some more modern system seems like a desirable thing to do. Usually this involves replacing an application system with a package, as this is the easiest project to get approved. These projects were called “Rip and Replace” until the success rate of this approach plummeted. It is remarkable how expensive these projects are and how frequently they fail. Each failure further entrenches the legacy system and raises the stakes for the next project.

As Ms. Bellotti points out in Kill it with Fire, many times the way to go is incremental improvement. By skillfully understanding the dependencies and engineering decoupling techniques, such as APIs and intermediary data sets, it is possible to stave off some of the highest-risk aspects of the legacy system. This is preferable to massive modernization projects that fail but, interestingly, has its own downsides: major portions of the legacy system continue to persist, and as she points out, few developers want to sign on to this type of work.

We want to outline a third way.

The Lost Opportunity

After a presentation on Data-Centricity, someone in the audience pointed out that data warehousing represented a form of Data-Centricity. Yes, in a way it does. With Data Warehouses, and more recently Data Lakes and Data Lakehouses, you have taken a subset of the data from numerous data silos and put it in one place for easier reporting. Yes, this captures a few of the data-centric tenets.

But what a lost opportunity. Think about it: we have spent the last 30 years setting up ETL pipelines and gone through several generations of data warehouses (from Kimball / Inmon roll-your-own, to Teradata and Netezza, to Snowflake, and dozens more along the way) but have not gotten one inch closer to replacing any legacy systems. Indeed, the data warehouse is entrenching the legacy systems deeper by being dependent on them for its source of data. The industry has easily spent hundreds of billions of dollars, maybe even trillions, over the last several decades on warehouses and their ecosystems, but rather than getting us closer to legacy modernization we have gotten further from it.

Why no one will take you seriously

If you propose replacing a legacy system with a Knowledge Graph you will get laughed out of the room. Believe me, I’ve tried. They will point out that the legacy systems are vastly complex (which they are), have unknowable numbers of dependent systems (they do), the enterprise depends on their continued operation for its very existence (it does) and there are few if any reference sites of firms that have done this (also true). Yet, this is exactly what needs to be done, and at this point, it is the only real viable approach to legacy modernization.

So, if no one will take you seriously, and therefore no one will fund you for this route to legacy modernization, what are you to do? Go into stealth mode.

Think about it: if you did manage to get funded for a $100 million legacy replacement project, and it failed, what do you have? The company is out $100 million, and your reputation sinks with the $100 million. If instead you get approval for a $1 million Knowledge Graph-based project that delivers $2 million in value, they will encourage you to keep going. Nobody cares what the end game is except you.

The answer then, is incremental stealth.

Tacking

At its core, it is much like sailing into the wind. You cannot sail directly into the wind. You must tack, and sail as close into the wind as you can, even though you are not headed directly towards your target. At some point, you will have gone far to the left of the direct line to your target, and you need to tack to starboard (boat speak for “right”). After a long starboard tack, it is time to tack to port.

In our analogy, taking on legacy modernization directly is sailing directly into the wind. It does not work. Incremental stealth is tacking. Keep in mind though, just incremental improvement without a strategy is like sailing with the wind (downwind): it’s fun and easy, but it takes you further from your goal, not closer.

The rest of this article lays out what we think the important tacking strategy should be for a firm that wants to take the Data-Centric route to legacy modernization. We have several clients that are on the second and third tack in this series.

I’m going to use a hypothetical HR / Payroll legacy domain for my examples here, but they apply to any domain.

Leg 1 – ETL to a Graph

The first tack is the simplest. Just extract some data from legacy systems and load it into a graph database. You will not get a lot of resistance to this, as it looks familiar. It looks like yet another data warehouse project. The only trick is getting sponsors to go this route instead of the tried-and-true data warehouse route. The key enablers here are to find problems well suited to graph structures, such as those that rely on graph analytics or shortest-path problems, and to find data that is hard to integrate in a data warehouse. A classic example is integrating structured data with unstructured data, which is nearly impossible in traditional warehouses and merely tricky in graph environments.
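
To make this first leg concrete, here is a minimal sketch (Python with rdflib) of pulling rows out of a legacy table and writing them as triples. The table, its columns, and the ex: vocabulary are illustrative assumptions, not a prescription; in practice the same job is often done with R2RML mappings or a graph store's bulk loader.

```python
# Minimal ETL-to-graph sketch: read employee rows from a (hypothetical) legacy
# table and emit them as RDF triples ready to load into the graph database.
import sqlite3

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

EX = Namespace("https://example.com/ontology/")

conn = sqlite3.connect("legacy_hr.db")  # stand-in for the legacy source
rows = conn.execute("SELECT emp_id, full_name, hire_date FROM employees")

g = Graph()
g.bind("ex", EX)

for emp_id, full_name, hire_date in rows:
    emp = URIRef(f"https://example.com/id/employee/{emp_id}")
    g.add((emp, RDF.type, EX.Employee))
    g.add((emp, EX.name, Literal(full_name)))
    g.add((emp, EX.hireDate, Literal(hire_date, datatype=XSD.date)))

# Hand this file to the graph store's loader, or post it to its endpoint.
g.serialize(destination="employees.ttl", format="turtle")
```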

The only difficulty is deciding how long to stay on this tack. As long as each project is adding benefit, it is tempting to stay on this tack for a long, long time. We recommend staying this course at least until you have a large subset of the data in at least one domain in the graph, refreshed frequently.

Let’s say that after being on this tack for a long while, you have all the key data on all your employees in the graph, updated frequently.

Leg 2 – Architecture MVP

On the first leg of the journey there are no updates being made directly to the graph. Just as in a data warehouse: no one makes updates in place in the data warehouse. It is not designed to handle that, and it would mess with everyone’s audit trails.

But a graph database does not have the limitations of a warehouse. It is possible to have ACID transactions directly in the graph. But you need a bit of architecture to do so. The challenge here is creating just enough architecture to get through your next tack. Where you start depends a lot on what you think your next tack will be. You’ll need constraint management to make sure your early projects are not loading invalid data back into your graph. Depending on the next tack, you may need to implement fine-grained security.
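
As one hedged illustration of the constraint-management piece, the sketch below uses pyshacl to check an incoming batch against a SHACL shape before it is committed to the graph. The shape and the ex: properties are assumptions invented for the example.

```python
# Validate incoming data against a SHACL shape before writing it to the graph.
from pyshacl import validate
from rdflib import Graph

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <https://example.com/ontology/> .

ex:EmployeeShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:name ;     sh:minCount 1 ; sh:datatype xsd:string ] ;
    sh:property [ sh:path ex:hireDate ; sh:maxCount 1 ; sh:datatype xsd:date ] .
""", format="turtle")

incoming = Graph().parse("employees.ttl", format="turtle")

conforms, _, report = validate(incoming, shacl_graph=shapes)
if not conforms:
    raise ValueError(f"Rejecting update:\n{report}")
# Otherwise hand the triples to the graph store's transaction API.
```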

Whatever you choose, you will need to build or buy enough architecture to get your first update in place functionality going.

Leg 3 – Simple New Functionality in the Graph

In this leg we begin building update-in-place business use cases. We recommend not trying to replace anything yet. Concentrate on net new functionality. Some of the current best places to start are maintaining reference data (common shared data such as country codes, currencies, and taxonomies) and/or some metadata management. Everyone seems to be doing data cataloging projects these days; they could just as well be done in the graph and give you experience working through this new paradigm.

The objective here is to spend enough time on this tack that developers become comfortable with the new development paradigm. Coding directly to the graph involves new libraries and new patterns.
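
For a taste of what that new pattern looks like, here is a small, hypothetical example of maintaining a piece of reference data with a SPARQL UPDATE rather than an ORM insert. It runs against an in-memory rdflib graph; against a real store the same update would be posted to its SPARQL endpoint. The vocabulary is invented for the example.

```python
# Maintain a reference-data record (a country code) directly in the graph.
from rdflib import Graph

g = Graph()  # stand-in for a connection to the graph store

g.update("""
PREFIX ex:   <https://example.com/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT DATA {
    <https://example.com/id/country/NL> a ex:Country ;
        skos:prefLabel "Netherlands"@en ;
        ex:isoAlpha2 "NL" .
}
""")

# Read it back the same way any consumer would: with a query, not a join.
for country, code in g.query("""
    PREFIX ex: <https://example.com/ontology/>
    SELECT ?country ?code WHERE { ?country a ex:Country ; ex:isoAlpha2 ?code }
"""):
    print(country, code)
```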

Optionally, you may want to stay on this tack long enough to build “model driven development” (low code / no code in Gartner speak) capability into the architecture. The objective of this effort is to drastically reduce the cost of implementing new functionality in future tacks. Comparing before-and-after metrics on code development, code testing, and code defects will make a startling case for the new approach. Or you could leave model driven development to a future tack.

Using the payroll / HR example, this leg adds new functionality that depends on HR data but that nothing else yet depends on. Maybe you build a skills database, or a learning management system. It depends on what is not yet in place that can be purely additive. These are the good places to start demonstrating business value.

Leg 4 – Understand the Legacy System and its Environment

Eventually you will get good at this and want to replace some legacy functionality. Before you do, it will behoove you to do a bunch of deep research. Many legacy modernization attempts have run aground from not knowing what they did not know.

There are three things that you don’t fully know at this point:

• What data is the legacy system managing?
• What business logic is the legacy system delivering?
• What systems are dependent on the legacy system, and what is the nature of those dependencies?

If you have done the first three tacks well, you will have all the important data from the domain in the graph. But you will not have all the data. In fact, at the metadata level, it will appear that you have the tiniest fraction of the data. In your Knowledge Graph you may have populated a few hundred classes and used a few hundred properties, but your legacy system has tens of thousands of columns. By appearances you are missing a lot. What we have discovered anecdotally, but have not proven yet, is that legacy systems are full of redundancy and emptiness. You will find that you do have most of the data you need, but before you proceed you need to prove this.

We recommend data profiling using software from a company such as GlobalIDs, IoTahoe, or BigID. This software reads all the data in the legacy system and profiles it. It discovers patterns and creates histograms, which reveal where the redundancy is. More importantly, you can find data that is not in the graph and have a conversation about whether it is needed. A lot of the data in legacy systems consists of accumulators (YTD, MTD, etc.) that can easily be replaced by aggregation functions, processing flags that are no longer needed, and a vast number of fields that are no longer used but that both business and IT are afraid to let go of. This exercise will provide that certainty.
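
For readers who want a feel for what such profiling surfaces, here is a toy sketch of the idea: per-column fill rates and value histograms make the emptiness and redundancy visible. The table and column names are hypothetical; the commercial tools do this at enterprise scale and with far more sophistication.

```python
# Toy column profiler: fill rates and top-value histograms for a legacy table.
import sqlite3
from collections import Counter

conn = sqlite3.connect("legacy_hr.db")  # stand-in for the legacy source
cursor = conn.execute("SELECT * FROM employees")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

for i, name in enumerate(columns):
    values = [row[i] for row in rows]
    filled = [v for v in values if v not in (None, "")]
    fill_rate = len(filled) / len(values) if values else 0.0
    top_values = Counter(filled).most_common(3)
    print(f"{name}: {fill_rate:.0%} populated, top values {top_values}")

# Columns that are rarely populated, or whose histogram duplicates another
# column's, are candidates to leave behind rather than migrate.
```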

Another source of fear is “business logic” hidden in the legacy system. People fear that we do not know all of what the legacy system is doing and that turning it off will break something. There are millions of lines of code in that legacy system; surely it is doing something useful. Actually, it is not. There is remarkably little essential business logic in most legacy systems. I know, as I’ve built complex ERP systems and implemented many packages. Most of this code is just moving data from the database to an API, or to a transaction to another API, or into a conversational control record, or to the DOM (if it is a more modern legacy system), onto the screen and back again. There is a bit of validation sprinkled throughout, which some people call “business logic,” but that is a stretch; it’s just validation. There is some mapping (when the user selects “Male” in the drop-down, put “1” in the gender field). And occasionally there is a bit of bona fide business logic. Calculating economic order quantities, critical paths, or gross-to-net payroll calculations are genuine business logic. But they represent far less than 1% of the code base. The value is in being sure you have found them and inserted them into the graph.

This is where reverse engineering or legacy understanding software plays a vital role. Ms. Bellotti is 100% correct on this point as well. If you think these reverse engineering systems are going to automate your legacy conversion, you are in for a world of hurt. But what they can do is help you find the genuine business logic and provide some comfort to the sponsors that there isn’t something important the legacy system is doing that no one knows about.

The final bit of understanding is the dependencies. This is the hardest one to get complete. The profiling software can help. Some tools can detect when the histogram of social security numbers in system A changes and the same change shows up in system B the next day; that is almost certainly an interface. But beyond this, the best you can do is catalog all the known data feeds and APIs. These are the major mechanisms that other systems use to become dependent on the legacy system. You will need strategies to mimic these dependencies to begin the migration.

This tack is purely research, and therefore does not deliver any perceived immediate gain. You may need to bundle it with some other project that is providing incremental value to get it funded, or you may fund it via a contingency budget.

Leg 5 – Become the System of Record for some subset

Up to this point, data has been flowing into the graph from the legacy system or originating directly in the graph.

Now it is time to begin the reverse flow. We need to find an area where we can begin the flow going in the other direction. We now have enough architecture to build and answer use cases in the graph; it is time to start publishing rather than subscribing.

It is tempting to want to feed all the data back to the legacy system, but the legacy system has lots of data we do not want to source. Furthermore, doing so entrenches the legacy system even deeper. We need to pick off small areas where we could decommission part of the legacy system.

Let’s say there was a certificate management system in the legacy system. We replace this with a better one in the graph and quit using the legacy one. But from our investigation above, we realize that the legacy certificate management system was feeding some points to the compensation management system. We just make sure the new system can feed the compensation system those points.

Leg 6 – Replace the dependencies incrementally

Now the flywheel is starting to turn. Encouraged by the early success of the reverse flow, the next step is to work out the data dependencies in the legacy system and work out a sequence to replace them.

The legacy payroll system is dependent on the benefit elections system. You now have two choices. You could replace the benefits system in the Graph. Now you will need to feed the results of the benefit elections (how much to deduct for the health care options etc.) to the legacy system. This might be the easier of the two options.

But the option that has the most impact is the other one: replace the payroll system. You have the benefits data feeding into the legacy system. If you replace the payroll system, there is nothing else (in HR) you need to feed. A feed to the financial system and the government reporting system will be necessary, but you will have taken a much bigger leap in the legacy modernization effort.

Leg 7 – Lather, Rinse, Repeat

Once you have worked through a few of those, you can safely decommission the legacy system a bit at a time. Each time, pick off an area that can be isolated. Replace the functionality and feed the remaining bits of the legacy infrastructure if necessary. Just stop using that portion of the legacy system. The system will gradually atrophy. No need for any big bang replacement. The risk is incremental and can be rolled back and retried at any point.

Conclusion

We do not go into our clients claiming to be doing legacy modernization, but it is our intent to put them in a position where they could realize it over time by applying knowledge graph capabilities.

We all know that at some point all legacy systems will have to be retired. At the moment the state of the art seems to be either “rip and replace” usually putting a packaged application in to replace the incumbent legacy system, or incrementally improve the legacy system in place.

We think there is a risk-averse, predictable, and self-funding route to legacy modernization, and it is done through Data-Centric implementation.

SHACL and OWL

There is a meme floating around out in the internet ether these days: “Is OWL necessary, or can you do everything you need to with SHACL?” We use SHACL most days and OWL every day and we find it quite useful.

It’s a matter of scope. If you limited your scope to replacing individual applications, you could probably get away with just using SHACL. But frankly, if that is your scope, maybe you shouldn’t be in the RDF ecosystem at all. If you are just making a data graph, and not concerned with how it fits into the broader picture, then Neo4j or TigerGraph should give you everything you need, with much less complexity.

If your scope is unifying the information landscape of an enterprise, or an industry / supply chain, and if your scope includes aligning linked open data (LOD), then our experience says OWL is the way to go. At this point you’re making a true knowledge graph.

By separating meaning (OWL) from structure (SHACL) we find it feasible to share meaning without having to share structure. Payroll and HR can share the definition and identity of employees, while sharing very little of their structural representations.
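
Here is a minimal sketch of that separation, with every name invented for the example: both departments commit to the shared ex:Employee class (meaning), while each enforces its own SHACL shape (structure), and a single query still finds the one shared employee.

```python
# Shared meaning (OWL class), separate structures (SHACL shapes per department).
from pyshacl import validate
from rdflib import Graph

PREFIXES = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <https://example.com/ontology/> .
"""

shared = PREFIXES + "ex:Employee a owl:Class ."   # the shared meaning

# Two structural views of the same employee, one per department.
payroll_data = Graph().parse(data=shared + """
ex:emp1 a ex:Employee ; ex:payGrade "G7" .
""", format="turtle")

hr_data = Graph().parse(data=shared + """
ex:emp1 a ex:Employee ; ex:reportsTo ex:emp2 .
""", format="turtle")

# Each department enforces only its own structure.
payroll_shape = Graph().parse(data=PREFIXES + """
ex:PayrollShape a sh:NodeShape ; sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:payGrade ; sh:minCount 1 ] .
""", format="turtle")

hr_shape = Graph().parse(data=PREFIXES + """
ex:HrShape a sh:NodeShape ; sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:reportsTo ; sh:minCount 1 ] .
""", format="turtle")

assert validate(payroll_data, shacl_graph=payroll_shape)[0]
assert validate(hr_data, shacl_graph=hr_shape)[0]

# Merging the graphs shows the shared identity: one employee, two structures.
merged = payroll_data + hr_data
count = len(merged.query(
    "PREFIX ex: <https://example.com/ontology/> "
    "SELECT DISTINCT ?e WHERE { ?e a ex:Employee }"))
print(count)  # 1
```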

Employing formal definitions of classes takes most of the ambiguity out of systems. We have found that leaning into full formal definitions greatly reduces the complexity of the resulting enterprise ontologies.

We have a methodology we call “think big / start small.” The “think big” portion is primarily about getting a first version of an enterprise ontology implemented, and the “start small” portion is about standing up a knowledge graph and conforming a few data sets to the enterprise ontology. As such, the “think big” portion is primarily OWL. The “start small” portion consists of small incremental extensions to the core model (also in OWL), conforming the data sets to the ontology (TARQL, R2RML, SMS, or similar technologies), SHACL to ensure conformance, and SPARQL to prove that it all fits together correctly.
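
As a small, hypothetical illustration of that last step, a few SPARQL spot checks can show whether data conformed from different sources actually joins up. The file names and properties below are assumptions made up for the example.

```python
# Sanity-check conformed data: do paychecks resolve to known employees?
from rdflib import Graph

g = Graph()
g.parse("employees.ttl", format="turtle")   # conformed from the HR extract
g.parse("paychecks.ttl", format="turtle")   # conformed from the payroll extract

orphans = g.query("""
PREFIX ex: <https://example.com/ontology/>
SELECT ?pay WHERE {
    ?pay a ex:Paycheck .
    FILTER NOT EXISTS { ?pay ex:paidTo ?emp . ?emp a ex:Employee . }
}""")
print(f"{len(orphans)} paychecks do not resolve to a known employee")
```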

For us, it’s not one tool, or one standard, or one language for all purposes. For us it’s like Mr. Natural says, “get the right tool for the job.”

What is a Market?

The term “market” is very common in business. We talk about the automotive market, the produce market, the disk drive market, etc. And yet, what do we really mean when we use that term? It is an instructive question because very often CRM (Customer Relationship Management) systems or other sales analytic systems group customers and sales based on their “market.” However, if we don’t have a clear understanding of what a market is, we misrepresent and misgroup, and therefore mislead ourselves as we study the implications of our advertising and promotional efforts.

The Historical Market

Historically, markets were physical places: marketplaces. People went to the market in  order to conduct trade. In many communities there was a single market where all goods were bought and sold. However, as communities became larger and developed into cities, marketplaces began to specialize and there was a particular place to go to buy and sell fresh produce. There was another place to go to buy and sell spices; and yet another to buy and sell furniture. 

As time progressed and the economy became more and more differentiated and cities  grew larger and larger, marketplaces became even more specialized and more geographically localized. So, for instance, we have the diamond marketplace in Antwerp,  Belgium, and the screenplay marketplace in Hollywood. 

Why Did Physical Marketplaces Emerge and Dominate? 

The trend toward physical marketplaces was not necessarily inevitable. Buyers could seek out sellers at their own place of business and conduct business that way. Conversely, sellers could seek out buyers in their own place of business. Two factors led to the popularity of the marketplace. One was that the physical movement to the marketplace was collectively more efficient for most of the participants. The second was that the marketplace allowed easy selection and comparison between similar offerings. Additionally, information about potential sources or demands for a given product or service was not nearly as cheap to come by as it is today with computers and the Internet.

Why Did Marketplaces Specialize? 

Marketplaces specialized because the sheer volume of selection and choice eventually became overwhelming. Imagine in the current day if we went to a marketplace where every possible product and service was available for sale from every possible supplier. The marketplace would be so vast that no one could traverse it. It certainly makes one wonder how COMDEX was as successful as it was. A general-purpose electronics show offers far more variety than most participants would deal with. So, as the marketplaces became larger, they began to self-organize into specializations of offerings.

But Why Did They Choose the Specializations That They Did? 

Why, for instance, did a marketplace in antique furniture develop and not a marketplace of chairs? Each is a specialization. The chair marketplace could conceivably offer a  wide variety of chairs, an infinite variety of chairs, including antique chairs. It would allow buyers to select the widest possible range of chairs. But that is not what occurred.  What occurred was an antique furniture marketplace where buyers could buy antique  chairs, antique tables, antique hutches, etc. And in parallel, a marketplace for  contemporary furniture has emerged. We see this in virtually every market that we look  at. There is not really a car market or marketplace. There is a luxury car market, an SUV  market, a luxury SUV market, etc. 

What Makes a Market? 

In modern times, we only occasionally rely on a physical marketplace to define a market.  Buyers and sellers get together and conduct business outside of a physical or even virtual marketplace. 

However, we still have the useful concept of a “market”. As with many other categories,  we group together or lump historical activity into categories with the intent that if the  categories are well-chosen and most members of the category are representative  members, then predictions made about the category will be valid for many of the  members. 

So what forms the basis of the categories? Categories can be based on attributes of the offering. So, for instance, we could make up a category of blue cars: using blue as an attribute of cars, we could group buyers and sellers into the blue and not-blue marketplaces. But unless “blueness” was a selection metric, this is not how the market will form.

No, markets form when a group of customers begins selecting products or services based on one or two characteristics of an offering. For hundreds of years there have been potato markets. Generally they competed on the basis of price and perhaps freshness. But when McDonald’s began buying potatoes based on length and straightness (they wanted long french fries to stand up in those little cardboard envelopes), a market emerged to satisfy this demand.

In The Innovator’s Dilemma and The Innovator’s Solution, Clayton Christensen outlines case after case of how markets formed around one set of metrics, but the incumbent vendors oversupplied the metric they were competing on and left room for a new entrant to compete on a different metric. Eventually a new market emerged to serve a segment of the prior market that was more interested in the new metric of competition.

So for instance, the computer disk drive market at one time primarily competed on capacity and price per MB of capacity. For years each season brought bigger and bigger drives at lower and lower cost per MB. At one of the firms the engineers “invented” a smaller (8”) disk drive, but none of the existing customers were interested in a drive with less capacity and greater cost per MB for what it had.

The engineers defected and started a new company, which had rough going for a while until the minicomputer market emerged, which had slightly different selection criteria. The capacity of the mainframe drives already exceeded what those customers needed. They were more interested in the total cost of the unit and the footprint. A second market emerged to supply these metrics to these new customers. Christensen goes on to describe how in some cases (the disk drive being one of them) the continued improvement in the new market eventually crowded into and displaced the established players from the existing market.

For the purposes of this paper, though, we are more interested in how new markets come to be. And this is it. The way we currently define markets, based on SIC codes and the like, is deeply flawed and does not produce categories with comparable attributes. Until we begin to discover what really creates and defines the boundaries around markets, our categorizations are not going to be very useful.

Christensen goes on to say in The Innovator’s Solution that failing to distinguish which of the markets you make offers to are in what he calls the “sustaining” category and which are in the “disruptive” category will lead to marketing and financial ruin for the unwary.

Conclusion

To wrap it up, we believe that a market is a categorization based on a tacit agreement between a group of buyers and sellers as to what the important selection criteria or selection metrics for the offering are. Markets so defined do not have static long-term boundaries; they are constantly in flux. Marketing executives and professionals involved with analytics and business information would do well to align their market segments with this definition, as we believe that a group of customers or suppliers who share common selection criteria will also share much more common behavior and reception to specific promotional activities.

The Data-Centric Revolution: Avoiding the Hype Cycle

Gartner has put “Knowledge Graphs” at the peak of inflated expectations. If you are a Knowledge Graph software vendor, this might be good news. Companies will be buying knowledge graphs without knowing what they are. I’m reminded of an old cartoon of an executive dictating into a dictation machine: “…and in closing, in the future implementing a relational database will be essential to the competitive survival of all firms. Oh, and Miss Smith, can you find out what a relational database is?” I imagine this playing out now, substituting “knowledge graph” for “relational database” and bypassing the misogynistic secretarial pool.

If you’re in the software side of this ecosystem, put some champagne on ice, dust off your business plan, and get your VCs on speed dial. Happy times are imminent.

Oh no! Gartner has put Knowledge Graphs at the peak of the hype cycle for Artificial Intelligence

Those of you who have been following this column know that our recommendations for data-centric transformations strongly encourage semantic technology and model driven development implemented on a knowledge graph. As we’ve said elsewhere, it is possible to become data-centric without all three legs of this stool, but it’s much harder than it needs to be. We find our fate at least partially tethered to the knowledge graph marketplace. You might think we’d be thrilled by the news that is lifting our software brethren’s boats.

But we know how this movie / roller coaster ends. Once a concept scales this peak, opportunists come out of the woodwork. Consultants will be touting their “Knowledge Graph Solutions,” and application vendors will repackage their Content Management System or ETL Pipeline product as a key step on the way to Knowledge Graph nirvana. Anyone who can spell “Knowledge Graph” will have one to offer.

Some of you will avoid the siren’s song, but many will not. Projects will be launched with great fanfare. Budgets will be blown. What is predictable is that these projects will fail to deliver on their promises. Sponsors will be disappointed. Naysayers will trot out their “I told you so’s.” Gartner will announce Knowledge Graphs are in the Trough of Disillusionment. Opportunists will jump on the next band wagon.

Click here to continue reading on TDAN.com

If you’re interested in Knowledge Graphs, and would like to avoid the trough of disillusionment, contact me: [email protected]

The Enterprise Ontology

At the time of this writing, almost no enterprises in North America have a formal enterprise ontology. Yet we believe that within a few years this will become one of the foundational pieces of most information system work within major enterprises. In this paper, we will explain just what an enterprise ontology is and, more importantly, what you can expect to use it for and what you should be looking for to distinguish a good ontology from a merely adequate one.

What is an ontology?  

An ontology is a “specification of a conceptualization.” This definition is a mouthful, but bear with me; it’s actually pretty useful. In general terms, an ontology is an organization of a body of knowledge or, at least, an organization of a set of terms related to a body of knowledge. However, unlike a glossary or dictionary, which takes terms and provides definitions for them, an ontology works in the other direction. An ontology starts with a concept. We first have to find a concept that is important to the enterprise; and having found the concept, we need to express it in as precise a manner as possible, and in a manner that can be interpreted and used by other computer systems. One of the differences between a dictionary or a glossary and an ontology is that dictionary definitions are not really processable by computer systems. The other difference is that by starting with the concept and specifying it as rigorously as possible, we get definitive meaning that is largely independent of language or terminology. This is what the definition means when it says an ontology is a “specification of a conceptualization.” In addition, of course, we then attach terms to these concepts, because in order for us humans to use the ontology we need to associate the terms that we commonly use.

Why is this useful to an enterprise?  

Enterprises process great amounts of information. Some of this information is structured in databases, some of it is unstructured in documents or semi-structured in content management systems. However, almost all of it is “local knowledge,” in that its meaning is agreed upon within a relatively small, local context. Usually, that context is an individual application, which may have been purchased or may have been built in-house.

One of the most time- and money-consuming activities that enterprise  information professionals perform is to integrate information from  disparate applications. The reason this typically costs a lot of money  and takes a lot of time is not because the information is on different  platforms or in different formats – these are very easy to  accommodate. The expense is because of subtle, semantic differences  between the applications. In some cases, the differences are simple:  the same thing is given different names in different systems. However,  in many cases, the differences are much more subtle. The customer in  one system may have an 80 or 90% overlap with the definition of a  customer in another system, but it’s the 10 or 20% where the  definition is not the same that causes most of the confusion; and there  are many, many terms that are far harder to reconcile than  “customer.” 

So the intent of the enterprise ontology is to provide a “lingua franca”  to allow, initially, all the systems within an enterprise to talk to each  other and, eventually, for the enterprise to talk to its trading partners  and the rest of the world. 

Isn’t this just a corporate data dictionary or consortia of data  standards?  

The enterprise ontology does have many similarities in scope to both a corporate data dictionary and a consortia data standard. The similarity is primarily in the scope of the effort: both of those initiatives, as well as enterprise ontologies, aim to define the shared terms that an enterprise uses. The difference is in the approach and the tools. With both a corporate data dictionary and a consortia data standard, the interpretation and use of the definitions is strictly by humans, primarily system designers. With an enterprise ontology, the expression of the ontology is such that tools are able to interpret it and make inferences on the information when the system is running.

How to build an enterprise ontology  

The task of building an enterprise ontology is relatively straightforward. You would be greatly aided by purchasing a good  ontology editor, although reasonable ontology editors are available for  free. The analytical work is similar to building a conceptual enterprise data model and involves many of the same skills: the ability to form good abstractions, to elicit information from users through interviews,  as well as to find informational clues through existing documentation and data. One of the interesting differences is that as the ontology is being built it can be used in connection with data profiling to see whether the information that is currently being stored in information systems does in fact comply with the rules that the ontology would suggest. 

What to look for in an enterprise ontology  

What distinguishes a good or great enterprise ontology from a merely  adequate one are several characteristics that will mostly be exercised  later in the lifecycle of the actual use of the ontology. Of course, they  are important to consider at the time you’re building the ontology. 

Expressiveness 

The ontology needs to be expressive enough to describe all the distinctions that an enterprise makes. Most enterprises of any size at all have tens of thousands to hundreds of thousands of distinctions that they use in their information systems. Not only is each piece of schema in all of their databases a distinction, but so are many of the codes they have in code tables, as well as decisions that are called out either in code or in procedure manuals. The sum total of all these distinctions is the operating ontology of the enterprise. However, they are not formally expressed in one place. The structure, as well as the base concepts used, needs to be rich enough that when a new concept is uncovered it can be expressed in the ontology.

Elegance 

At the same time, we need to strive for an elegant representation. It would be simple, but perhaps simplistic, to take all the distinctions in all the current systems, put them in a simple repository, and call them an ontology. This misses some of the great strengths of an ontology. We want to use our ontology not only to document and describe distinctions but also to find similarities. In these days of Sarbanes-Oxley regulations, it would be incredibly helpful to know which distinctions and which parts of which schemas deal with financial commitments and “material transactions.”

Inclusion and exclusion criteria 

Essentially, the ontology is describing distinctions amongst “types.” In many cases, what we would like to know is whether a given instance is of a particular type. Let’s say it’s a record in a product table; therefore it’s of type “product.” But in another system we may have inventory, and we would like to know whether this instance is also compatible with the type that we’ve defined as inventory. In order to do this, we need in the ontology a way to describe inclusion and exclusion criteria: what other clues we would use if we or another system were evaluating a particular instance to determine whether it was, in fact, of a particular type. For instance, if inventory were defined as physical goods held for resale, one inclusion criterion might be weight, because weight is an indicator of a physical good. Clearly, there would be many more, as well as criteria for excluding. But this gives you an idea.

Cross-referencing capability

Another criterion that is very important is the ability to keep track of where the distinction was found; that is, which system currently implements and uses this particular distinction. This is very important for producing any type of where-used information because as we change our distinctions it might have side effects on other systems. 

Inferencing 

Inferencing is the ability to find or infer additional information based on the information we have. For instance, if we know that an entity is a person we can infer that the person has a birthday, whether we know it or not, and we can also infer that the person is less than 150  years old. While this sounds simple at this level, the power in an ontology is when the inference chains become long and complex and we can use the inferencing engine itself to make many of these conclusions on-the-fly. 
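
To give a flavor of this, the hedged sketch below materializes an RDFS closure with the owlrl library: from a subclass axiom and a domain axiom, the engine infers class memberships that nobody asserted. The vocabulary is invented for the example.

```python
# Materialize inferred facts from ontology axioms plus asserted data.
import owlrl
from rdflib import Graph

g = Graph().parse(data="""
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.com/ontology/> .

ex:Employee rdfs:subClassOf ex:Person .        # ontology axioms
ex:hasBirthDate rdfs:domain ex:Person .

ex:emp1 a ex:Employee .                        # asserted facts
ex:jane ex:hasBirthDate "1990-04-01" .
""", format="turtle")

# The inference engine adds facts nobody typed in.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print(g.query("""
PREFIX ex: <https://example.com/ontology/>
ASK { ex:emp1 a ex:Person . ex:jane a ex:Person . }
""").askAnswer)  # True: both memberships were inferred, not asserted
```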

Foreign-language support 

As we described earlier, the ontology is a specification of a conceptualization to which we attach terms. It doesn’t take much to add the ability to attach foreign-language terms. This adds a great deal of power for developers who wish to present the same information, and the same screens, in multiple languages, as we are really just manipulating the concepts and attaching the appropriate language at runtime.
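
Here is a minimal sketch of the idea, using made-up labels: one language-neutral concept carries several language-tagged terms, and the display language is chosen at runtime.

```python
# One concept, several language-tagged labels, picked at display time.
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

g = Graph().parse(data="""
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.com/ontology/> .

ex:Invoice a rdfs:Class ;
    rdfs:label "Invoice"@en , "Factura"@es , "Rechnung"@de .
""", format="turtle")

def label_for(concept, lang):
    """Return the label in the requested language, falling back to English."""
    labels = {lbl.language: str(lbl) for lbl in g.objects(concept, RDFS.label)}
    return labels.get(lang, labels.get("en"))

print(label_for(URIRef("https://example.com/ontology/Invoice"), "de"))  # Rechnung
```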

Some of these characteristics are aided by the existence of tools or  infrastructures, but many of them are produced by the skill of the  ontologist.

Summary  

We believe that the enterprise ontology will become a cornerstone of many information systems in the future. It will become a primary part of the systems integration infrastructure: as one application’s data is translated into the ontology, we will very rapidly know what the corresponding schema and terms are and what transformations are needed to get to another application. It will become part of the corporate search strategy as search moves beyond mere keywords into actually searching for meaning. It will become part of business intelligence and data warehousing systems, as naïve users can be led to similar terms in the warehouse repository, aiding their manual search and query construction.

Many more tools and infrastructures will become available over the  next few years that will make use of the ontology, but the prudent  information manager will not wait. He or she will recognize that there  is a fair lead time to learn and implement something like this, and any  implementation will be better than none because this particular  technology promises to greatly leverage all the rest of the system  technologies.

Smart City Ontologies: Lessons Learned from Enterprise Ontologies

For the last 20 years, Semantic Arts has been helping firms design and build enterprise ontologies to get them on the data-centric path. We have learned many lessons from the enterprise that can be applied in the construction of smart city ontologies.

What is similar between the enterprise and smart cities?

  • They both have thousands of application systems. This leads to thousands of arbitrarily different data models, which leads to silos.
  • The enterprise and smart cities want to do agile, BUT agile is a more rapid way to create more silos.

What is different between the enterprise and smart cities?

  • In the enterprise, everyone is on the same team working towards the same objectives.
  • In smart cities there are thousands of different teams working towards different objectives. For example:

Utility companies.
Sanitation companies.
Private and public industry.

  • Large enterprises have data lakes and data warehouses.
  • In smart cities there are little bits of data here and there.

What have we learned?

“Simplicity is the ultimate sophistication.”

20 years of Enterprise Ontology construction and implementation has taught us some lessons that apply to Smart City ontologies. The Smart City Landscape informs us how to apply those lessons, which are:

  • Think big and start small.
  • Simplicity is key to integration.
  • Low code / No code is key for citizen developers.

Semantic Arts gave this talk at the W3C Workshop on Smart Cities, originally recorded on June 25, 2021. Click here to view the talks and interactive sessions recorded for this virtual event. A big thank you to W3C for allowing us to contribute!