From Labels to Verbs – Child’s Play!

Watching a child acquire their first language is nothing like watching an adult learn a new one. The 2-year-old is learning to vocalise while also making sure they attach the correct labels to the things around them: “Mummy”, “Daddy”, “cat”, “car”, “sea” and so on. Often this is done with gestures and pointing. The whole person is involved in enunciating and conveying meaning as part of a dialogue. Others in the child’s environment will be encouraging them, but the child, too, will be observing and understanding at a level well beyond their ability to join in – at least for a month or two.

Over time, labels get replaced by phrases. Verbs and adjectives come into the mix. The degree of sophistication improves both the communication, and the ability to show that something communicated to the child has been understood as intended. 

Are there any lessons from the child acquiring language from which enterprises can learn when it comes to their development in acquiring semantic skills, assets and technologies? I think there are. I see that enterprises follow a somewhat similar path to the child in acquiring  semantic skills – tending to start with simple collections of labels (controlled vocabularies and  simple taxonomies) before venturing into using more complex information structures with verbs  and adjectives. (These come from ontologies that provide more scope for knowledge representation than controlled vocabularies and taxonomies do). 

Historically, this has been the case. It took more than two thousand years to get from Plato’s Socrates “carving nature at its joints”, helping us understand what ‘things’ there are in our world, to William A. Woods asking “What’s in a link?” in 1975.

Semantic Arts has developed a core, upper ontology that carves enterprise information at its seams. This has been described in many ways, most recently in a form like the Mendeleev periodic table of elements. But where are the links? Are we still persuading enterprises to adopt semantics the way a child acquires language, rather than the way someone who already has language skills learns a new one? Do people in business and industry see their information assets as telling a story? Do they understand that information ‘bricks’ can be organised to build a variety of information stories, in the same way that a set of Lego/Duplo bricks can be organised into the shape of a house, or a boat, or a space rocket? It is the links, the predicates in the RDF model, that bring instances of classes together into a phrasal structure as simple as a 3-year-old’s language constructs.

Let’s use the remainder of this post just to examine the vocabulary of Semantic Arts’ ‘gist’  ontology from the perspective of the links – those predicates on which predicate logic is based. 

‘gist’ version 13 has 63 object properties. These are the property type that relates one ‘thing’ to another ‘thing’. There are also 50 data properties, the type that relates a ‘thing’ to a ‘string’ of some sort. We know that the ‘string’ could be assigned to a ‘thing’ type by using xsd:anyURI, but let’s leave that for now. 
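
If you would like to check these counts against your own copy of gist, here is a minimal sketch using Python and rdflib (the file name gistCore.ttl is illustrative; point it at whichever release file you have):

    from rdflib import Graph
    from rdflib.namespace import OWL, RDF

    g = Graph()
    g.parse("gistCore.ttl")  # local copy of the gist v13 release; path assumed

    object_props = set(g.subjects(RDF.type, OWL.ObjectProperty))
    data_props = set(g.subjects(RDF.type, OWL.DatatypeProperty))
    print(len(object_props), "object properties;", len(data_props), "data properties")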

There are 3 object properties where only the domain (the origin of the relationship ‘arrow’) is  specified in the ontology: 

gist:owns
gist:providesOrderFor
gist:isAbout

and 14 where the range alone (the class at the pointy end of the relationship ‘arrow’) is specified: 

gist:hasAccuracy 
gist:hasParty 
gist:hasPhysicalLocation 
gist:comesFromAgent 
gist:hasAspect 
gist:isIdentifiedBy 
gist:isAllocatedBy
gist:isMadeUpOf 
gist:comesFromPlace 
gist:goesToPlace 
gist:hasMagnitude 
gist:isRecognizedBy 
gist:hasAddress 
gist:goesToAgent

There are only 6 object properties where the ontology ‘constrains’ us to a specific set of both  domain and range classes: 

gist:isGeoContainedIn 
gist:hasUnitOfMeasure 
gist:prevents
gist:hasUnitGroup 
gist:hasBiologicalParent 
gist:isFirstMemberOf

This leaves 40 object properties that are a little more flexible in their intended use. 

gist:isExpressedIn 
gist:isCategorizedBy 
gist:hasDirectBroader
gist:hasBroader 
gist:isRenderedOn 
gist:isGovernedBy 
gist:hasParticipant 
gist:hasGiver 
gist:precedesDirectly 
gist:requires 
gist:isPartOf 
gist:precedes 
gist:allows 
gist:isDirectPartOf
gist:contributesTo 
gist:hasMultiplier 
gist:isMemberOf 
gist:hasGoal 
gist:accepts 
gist:hasUniqueNavigationalParent 
gist:hasNavigationalParent 
gist:isTriggeredBy 
gist:links 
gist:isBasedOn 
gist:isConnectedTo 
gist:hasRecipient 
gist:occursIn 
gist:isAffectedBy
gist:refersTo 
gist:hasSubtrahend 
gist:hasDivisor 
gist:conformsTo 
gist:hasAddend 
gist:hasUniqueBroader
gist:produces 
gist:isUnderJurisdictionOf 
gist:linksFrom 
gist:offers 
gist:hasIncumbent 
gist:linksTo

One of our observations is that the more one focuses on classes, the tighter one sticks to domains within the enterprise. These then follow the verticals of the periodic table. We are reinforcing the siloed view, a little: we keep interoperability, but we look at the enterprise from the perspective of business areas. If we move to modelling with relationships, however, we can think more openly about how similar patterns occur between the ‘things’ of the enterprise across different areas. It leads us to a much more abstract way of thinking about the enterprise, because these relationships crop up all over the place. (This is one reason the ‘gist’ ontology does not specify domains or ranges for the majority of its properties. It increases flexibility and opens up more possibilities for patterns.) 

For a bit of semantic fun, look in the real world for opportunities to use the gist properties as verbs. Go into work and think about how many types of ‘thing’ you can see in the enterprise where ‘thing  A’ gist:produces ‘thing B’. See where you can find ‘thing X’ gist:hasGoal ‘thing Y’. Unlike the child we started this article with, you as the reader already know language. So you can use more than  just labels and start making statements of a phrasal nature that use these ‘gist’ relationships. 

gist:produces
    a owl:ObjectProperty ;
    rdfs:isDefinedBy <https://w3id.org/semanticarts/ontology/gistCore> ;
    skos:definition "The subject creates the object."^^xsd:string ;
    skos:example "A task produces a deliverable."^^xsd:string ;
    skos:prefLabel "produces"^^xsd:string .

gist:hasGoal
    a owl:ObjectProperty ;
    rdfs:isDefinedBy <https://w3id.org/semanticarts/ontology/gistCore> ;
    skos:definition "The reason for doing something"^^xsd:string ;
    skos:prefLabel "has goal"^^xsd:string .
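
Here, as a minimal sketch in Python with rdflib, is the kind of phrasal statement the exercise above invites. The gist namespace IRI and the ex: individuals are illustrative assumptions, not part of the ontology:

    from rdflib import Graph, Namespace

    GIST = Namespace("https://w3id.org/semanticarts/ns/gist/")  # assumed prefix IRI; use the one from your gist release
    EX = Namespace("https://example.com/enterprise/")

    g = Graph()
    g.bind("gist", GIST)
    g.bind("ex", EX)

    # "Thing A produces thing B": a production run produces a batch of widgets.
    g.add((EX.ProductionRun_42, GIST.produces, EX.WidgetBatch_2024_07))

    # "Thing X has goal thing Y": a marketing campaign has a revenue target as its goal.
    g.add((EX.SpringCampaign, GIST.hasGoal, EX.Q2RevenueTarget))

    print(g.serialize(format="turtle"))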

The Case for Enterprise Ontology

I was asked by one of our senior staff why someone might want an enterprise ontology.  From my perspective, there are three main categories of value for integrating all your  enterprise’s data into a single core: 

  • Economy 
  • Cross Domain Use Cases 
  • Serendipity 

Economy 

For many of our clients there is an opportunity that stems from simple rationalization and elimination of duplication. Every replicated data set incurs costs: costs to create and maintain the processes that generate it, but far bigger costs for data reconciliation. Inevitably, each extract and population creates variation. These variations add up, triggering additional research to find out why there are slight differences between these datasets.  

Even with ontology-based systems, these differences creep in. We know that many of our clients’ ontology-based domains contain an inventory (or a sub-inventory). Employees are a good example. These sub-directories show up all over the place. There is a very good chance each domain has its own feed from HR. They may be fed from the same system, but as is often the case, each was pointed at a warehouse or a different system as its source. And even if they came from the same source, the pipeline, IRI assignment and transformation are all likely different.  

Here’s an illustration from a large bank associated with records retention within their  legal department. One part of this project involved getting a full directory of all the  employees into the graph. Later on we were working with another group on the technical infrastructure, and they wanted to get their own feed from HR to convert into triples. Fortunately we were able to divert them by pointing out that there was already a feed that provided curated employee triples.  

They accepted our justification but asked, “Can we have a copy of those triples to conform to our needs?” This gave us the opportunity to explain that there is no conforming. Each triple is an individually asserted fact with its own provenance. You either accept it or ignore it. There really isn’t anything to conform, and no need to restructure. 
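
One way to picture “accept it or ignore it” is to keep each source’s assertions in their own named graph, so the provenance travels with the triples and nothing needs to be restructured for a new consumer. A minimal sketch with Python and rdflib, with all IRIs invented for illustration:

    from rdflib import Dataset, Namespace, Literal

    EX = Namespace("https://example.com/bank/")

    ds = Dataset()
    hr_feed = ds.graph(EX.hrFeed2024)            # named graph identifies the source
    hr_feed.add((EX.emp_1001, EX.hasJobTitle, Literal("Records Analyst")))
    hr_feed.add((EX.emp_1001, EX.memberOf, EX.LegalDepartment))

    # A consumer simply reads the graphs it accepts; nothing is copied or "conformed".
    for s, p, o in ds.graph(EX.hrFeed2024):
        print(s, p, o)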

At first glance all their subdomains seemed to stand alone, but the truth was that there was a surprising amount of overlap between them. There were many similar but not identical definitions of “business units.” There were several incompatible ways to describe geographic aggregation. Many different divisions dealt with the same counterparties or the same products. And it is only when the domains are unified that most of these differences come to light.  

Just unifying and integrating duplicate data sets provided economic justification for the project. We know of another company that justified their whole graph undertaking  simply from the rationalization and reduction of subscriptions to the same or similar  datasets from different parts of the business.  

The good news is that harmonizing ontologically based systems is an order of magnitude cheaper than traditional systems.  

Cross Domain Use Cases 

Reuse of concepts is one of the most compelling reasons for an enterprise ontology.  Some of the obvious cross-domain use cases from some of our pharmaceutical clients  include:  

  • Translation of manufacturing process from bench to trial to full scale 
  • Integration of Real-World Evidence and Adverse events 
  • Collapsing submission time for regulatory reporting 
  • Clinical trial recruiting  
  • Cross channel customer integration 

Some of the best opportunities come from combining previously separate sub-domains. Sometimes you can know this going into a project. But sometimes you don’t discover the opportunity until you are well into the project. Those are the ones that fall into the serendipity category.  

Serendipity 

I’ve recently come to the realization that the most important use cases for unification  might in fact be serendipity. That is, the power might be in unanticipated use cases.  I’ll give some examples and then we’ll point you to a video from one of Amazon’s lead  ontologists who came to the same conclusion.  

Schneider-Electric 

We did a project for Schneider-Electric (see case study). We constructed the scaffolding of their enterprise ontology and then drilled in on their product catalog and  offering. Our initial goal was to get their 1 million parts into a knowledge graph and  demonstrate that it was as complete and as detailed as their incumbent system. At the end of the project we had all their products in a knowledge graph, with all their physical, electrical, thermal and many other characteristics defined and classified.  

Serendipity 1: Inherent Product Compatibility 

We interviewed product designers to capture the nature of product compatibility. With our greatly simplified ontology, it was easy to write a different type of rule (using SPARQL) that persisted the “inherent” compatibility of parts into the catalog. Doing this reversed the sequence of events. Previously, because the compatibility process was difficult and time-consuming, they would wait until they were ready to sell a line of products in a new market before beginning the compatibility studies. Not knowing the compatibility added months to their time-to-market. In the new approach, the graph knew which products were compatible before the decision to offer them in new markets.  
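
The rule itself was, of course, specific to Schneider’s catalog. Purely as an illustration of the pattern, here is a sketch of a SPARQL INSERT run from Python with rdflib; every class and property name (ex:CircuitBreaker, ex:ratedVoltage, ex:compatibleWith and so on) is hypothetical:

    from rdflib import Graph

    g = Graph()
    g.parse("catalog.ttl")  # illustrative file name

    # Materialize ("persist") inherent compatibility so it can be queried later
    # without re-running the logic each time.
    g.update("""
    PREFIX ex: <https://example.com/catalog/>
    INSERT { ?breaker ex:compatibleWith ?contactor }
    WHERE {
      ?breaker   a ex:CircuitBreaker ; ex:ratedVoltage ?v ; ex:mountingRail ?rail .
      ?contactor a ex:Contactor      ; ex:ratedVoltage ?v ; ex:mountingRail ?rail .
    }
    """)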

Serendipity 2: Standards Alignment 

Schneider were interested in aligning their product offerings with the standard called  eCl@ss which has over 15,000 classes and thousands of attributes. It is a complex mapping process, which had been attempted before but abandoned. By starting with the extreme simplification of the ontology (46 classes and 36 properties out of the several hundred in the enterprise ontology), working toward the standard was far easier and we had an initial map completed in about two months.  

Serendipity 3: Integrating Acquisitions 

Schneider had acquired another electrical part manufacturer, Clipsal. They asked if we could integrate the Clipsal catalogue with the new graph catalogue. Clipsal also had a complex product catalogue. It was not as complex as Schneider’s, but it was complex and structured quite differently.  

Rather than reverse engineering the Clipsal catalogue, we just asked their data engineers to point us to where the 46 classes and 36 properties were in the catalogue. Once we’d extracted all that, we asked if we were missing anything. It turned out there were a few items, which we added to the model.  

The whole exercise took about six weeks. At the end of the project we were reviewing the Schneider-Electric page on Wikipedia and found that they had acquired Clipsal over ten years prior. When we asked why they hadn’t integrated the catalogue in all that time, they responded that it was “too hard.”

All three of these use cases are of interest, because they weren’t the use cases we were hired to solve but only manifested when the data was integrated into a simple model.  

—————————– 

Amazon Story of Serendipity 

This video of Ora Lassila is excellent and inspiring. 

https://videolectures.net/videos/iswc2024_lassila_web_and_ai

If you don’t have time to watch the whole thing, skip to minute 14:40, where he describes the “inventory graph” for tracking packages in the Amazon ecosystem. They have 1 trillion triples in the graph, and query response is far better than it was in their previous systems. At minute 23:20 he makes the case for serendipity.

How a “User” Knowledge Graph Can Help Change Data Culture

Identity and Access Management (IAM) has had the same problem since  Fernando Corbató of MIT first dreamed up the idea of digital passwords in  1960: opacity. Identity in the physical world is rich and well-articulated, with a wealth of different ways to verify information on individual humans and devices. By contrast, the digital realm has been identity data impoverished, cryptic and inflexible for over 60  years now. 

Jans Aasman, CEO of Franz, provider of the entity-event knowledge graph solution AllegroGraph, envisions a “user” knowledge graph as a flexible and more manageable data-centric solution to the IAM challenge. He presented on the topic at this past summer’s Data-Centric Architecture Forum, which Semantic Arts hosted near its headquarters in Fort Collins, Colorado. 

Consider the specificity of a semantic graph and how it could facilitate secure access control. Knowledge graphs constructed of subject-predicate-object triples make it possible to set rules and filters in an articulated and yet straightforward manner. Information about individuals that’s been collected for other HR purposes  could enable this more precise filtering. 

For example, Jans could disallow others’ access to a triple that connects “Jans”  and “salary”. Or he could disallow access to certain predicates. 
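
As a minimal sketch of that idea (not AllegroGraph’s actual security machinery; the namespace and the deny list are invented for illustration), predicate-level filtering can be expressed in a few lines with Python and rdflib:

    from rdflib import Graph, Namespace

    EX = Namespace("https://example.com/hr/")
    DENIED_PREDICATES = {EX.salary, EX.homeAddress}   # attributes to withhold

    def visible_triples(g: Graph, requester: str):
        """Yield only the triples the requester is allowed to see."""
        for s, p, o in g:
            if p in DENIED_PREDICATES and requester != "hr-admin":
                continue  # withhold triples that use sensitive predicates
            yield s, p, o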

Identity and access management vendors call this method Attribute-Based  Access Control (ABAC). Attributes include many different characteristics of users and  what they interact with, which is inherently more flexible than role-based access control  (RBAC). 

Cell-level control is also possible, but as Forrest Hare of Summit Knowledge Solutions points out, such security doesn’t make a lot of sense, given how much meaning is absent from cells controlled in isolation. “What’s the classification of the number 7?” he asked. Without more context, it seems silly to control cells that are just storing numbers or individual letters, for example. 

Simplifying identity management with a knowledge graph approach  

Graph databases can simplify various aspects of the process of identity  management. Let’s take Lightweight Directory Access Protocol, or LDAP, for example. 

This vendor-agnostic protocol has been around for 30 years, but it’s still popular  with enterprises. It’s a pre-web, post-internet hierarchical directory service and authentication protocol. 

“Think of LDAP as a gigantic, virtual telephone book,” suggests access control management vendor Foxpass. Foxpass offers a dashboard-based LDAP management product which it claims is much easier to manage than OpenLDAP. 

Companies that don’t use LDAP often use Microsoft’s Active Directory instead, a broader, database-oriented identity and access management product that covers more of the same bases. Microsoft bundles AD with its Server and Exchange products, a means of lock-in that has been quite effective. Lock-in, obviously, inhibits innovation in general. 

Consider the whole of identity management as it exists today and how limiting it has been. How could enterprises embark on the journey of using a graph database-oriented approach as an alternative to application-centric IAM software? The first step  involves the creation of a “user” knowledge graph. 

Access control data duplication and fragmentation  

Semantic Arts CEO Dave McComb, in his book Software Wasteland, estimated that 90 percent of data is duplicated. Application-centric architectures in use since the days of mainframes have led to user data sprawl. Part of the reason there is so much duplication of user data is that authentication, authorization, and access control (AAA) methods require that more bits of personally identifiable information (PII) be shared with central repositories for AAA purposes. 

B2C companies are particularly prone to hoovering up these additional bits of  PII lately and storing that sensitive info in centralized repositories. Those repositories become one-stop shops for identity thieves. Customers who want to pay online have to  enter bank routing numbers and personal account numbers. As a result, there’s even more duplicate PII sprawl.

One of the reasons a “user” knowledge graph (and a knowledge graph enterprise foundation) could be innovative is that enterprises who adopt such an approach can move closer to zero-copy integration architectures. Model-driven development of the type that knowledge graphs enable assumes and encourages shared data and logic. 

A “user” graph coupled with project management data could reuse the same  enabling entities and relationships repeatedly for different purposes. The model-driven development approach thus incentivizes organic data management. 

The challenge of harnessing relationship-rich data  

Jans points out that enterprises, for example, run massive email systems that could be tapped to analyze project data for optimization purposes. And  disambiguation by unique email address across the enterprise can be a starting point  for all sorts of useful applications. 

Most enterprises don’t apply unique email address disambiguation, but Franz has a pharma company client that does, an exception that proves the rule. Email continues to be an untapped resource in many organizations precisely because it’s a treasure trove of relationship data. 

Problematic data farming realities: A social media example  

Relationship data involving humans is sensitive by definition, but the reuse potential of sensitive data is too important to ignore. Organizations do need to interact with individuals online, and vice versa. 

Former US Federal Bureau of Investigation (FBI) counterintelligence agent Peter  Strzok quoted from Deadline: White House, an MSNBC program in the US aired on  August 16: 

“I’ve served I don’t know how many search warrants on Twitter (now known as X) over the years in investigations. We need to put our investigator’s hat on and talk about tradecraft a little bit. Twitter gathers a lot of information. They don’t just have your tweets. They have your draft tweets. In some cases, they have deleted tweets. They have DMs that people have sent you, which are not encrypted. They have your draft DMs, the IP address from which you logged on to the account at the time, sometimes the location at which you accessed the account and other applications that are associated with your Twitter account, amongst other data.”

X and most other social media platforms, not to mention law enforcement  agencies such as the FBI, obviously care a whole lot about data. Collecting, saving, and  allowing access to data from hundreds of millions of users in such a broad,  comprehensive fashion is essential for X. At least from a data utilization perspective,  what they’ve done makes sense. 

Contrast these social media platforms with the way enterprises collect and  handle their own data. That collection and management effort is function- rather than human-centric. With social media, the human is the product. 

So why is a social media platform’s culture different? Because with public social media, broad, relationship-rich data sharing had to come first. Users learned first-hand  what the privacy tradeoffs were, and that kind of sharing capability was designed into  the architecture. The ability to share and reuse social media data for many purposes  implies the need to manage the data and its accessibility in an elaborate way. Email, by contrast, is a much older technology that was not originally intended for multi-purpose reuse. 

Why can organizations like the FBI successfully serve search warrants on data from data farming companies? Because social media started with a broad data sharing assumption and forced a change in the data sharing culture. Then came adoption.  Then law enforcement stepped in and argued effectively for its own access. 

Broadly reused and shared, web data about users is clearly more useful than siloed data. Shared data is why X can have the advertising-driven business model it does. One-way social media contracts with users require agreement with provider terms. The users have one choice: Use the platform, or don’t. 

The key enterprise opportunity: A zero-copy user PII graph that respects users  

It’s clear that enterprises should do more to tap the value of the kinds of user data that email, for example, generates. One way to sidestep the sensitivity issues associated with reusing that sort of data would be to treat the most sensitive user data separately. 

Self-sovereign identity (SSI) advocate Phil Windley has pointed out that agent-managed, hashed messaging and decentralized identifiers could make it unnecessary to duplicate identifiers that correlate. If a bartender just needs to confirm that a patron  at the bar is old enough to drink, the bartender could just ping the DMV to confirm the  fact. The DMV could then ping the user’s phone to verify the patron’s claimed adult status.
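
A minimal sketch of that information flow, in Python: a real system would use public-key signatures and DID documents rather than the shared demo key below, but the point is that the verifier receives only a yes/no answer, never a birthdate or a correlatable identifier:

    from dataclasses import dataclass
    import hmac, hashlib

    ISSUER_SECRET = b"dmv-demo-key"     # stands in for the DMV's signing key

    @dataclass
    class AgeClaim:
        subject_did: str                # opaque decentralized identifier
        over_21: bool
        signature: bytes

    def issue(subject_did: str, over_21: bool) -> AgeClaim:
        msg = f"{subject_did}:{over_21}".encode()
        return AgeClaim(subject_did, over_21, hmac.new(ISSUER_SECRET, msg, hashlib.sha256).digest())

    def bartender_check(claim: AgeClaim) -> bool:
        msg = f"{claim.subject_did}:{claim.over_21}".encode()
        valid = hmac.compare_digest(claim.signature, hmac.new(ISSUER_SECRET, msg, hashlib.sha256).digest())
        return valid and claim.over_21  # the verifier learns only yes/no

    print(bartender_check(issue("did:example:123", over_21=True)))   # True, with no PII disclosed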

Given such a scheme, each user could manage and control their access to their  own most sensitive PII. In this scenario, the PII could stay in place, stored, and encrypted on a user’s phone. 

Knowledge graphs lend themselves to this less centralized, yet more fine-grained and transparent, approach to data management. By supporting self-sovereign identity and a data-centric architecture, a Chief Data Officer could help the Chief Risk Officer mitigate the enterprise risk associated with the duplication of personally identifiable information, a true win-win.

Zero Copy Integration and Radical Simplification

Dave McComb’s book Software Wasteland underscored a fundamental problem: enterprise software sometimes costs 1,000 times more than it ought to. The poster child for cost overruns highlighted in the book was Healthcare.gov, the public registration system for the US Affordable Care Act, enacted in 2010. By 2018, the US Federal government had spent $2.1 billion to build and implement the system. Most of that money was wasted. The government ended up adopting many of the design principles embodied in an equivalent system called HealthSherpa, which cost $1 million to build and implement. 

In an era where the data-centric architecture Semantic Arts advocates should be the  norm, application-centric architecture still predominates. But data-centric architecture doesn’t just reduce the cost of applications. It also attacks the data duplication problem attributable to  poor software design. This article explores how expensive data duplication has become, and  how data-centric, zero-copy integration can put enterprises on a course to simplification. 

Data sprawl and storage volumes  

In 2021, Seagate became the first company to ship three zettabytes’ worth of hard disks. It took them 36 years to ship the first zettabyte, six years to ship the second, and only one additional year to ship the third. 

The company’s first product, the ST-506, was released in 1980. The ST-506 hard disk, when formatted, stored five megabytes (where a megabyte is 1,000² bytes). By comparison, an IBM RAMAC 305, introduced in 1956, stored five to ten megabytes. The RAMAC 305 weighed 10 US tons (the equivalent of nine metric tonnes). By contrast, the Seagate ST-506, 24 years later, weighed five US pounds (or 2.27 kilograms). 

A zettabyte is the equivalent of 7.3 trillion MP3 files or 30 billion 4K movies, according to  Seagate. When considering zettabytes: 

  • 1 zettabyte equals 1,000 exabytes. 
  • 1 exabyte equals 1,000 petabytes. 
  • 1 petabyte equals 1,000 terabytes. 

IDC predicts that the world will generate 178 zettabytes of data by 2025. At that pace, “The  Yottabyte Era” would succeed The Zettabyte Era by 2030, if not earlier. 

The cost of copying  

The question becomes, how much of the data generated will be “disposable” or unnecessary data? In other words, how much data do we actually need to generate, and how much do we really need to store? Aren’t we wasting energy and other resources by storing more than we need to? 

Let’s put it this way: If we didn’t have to duplicate any data whatsoever, the world would only have to generate 11 percent of the data it currently does. In 2021 terms, we’d only need  to generate 8.7 zettabytes of data, compared with the 78 zettabytes we actually generated worldwide over the course of that year. 

Moreover, Statista estimates that the ratio of unique to replicated data stored worldwide will decline to 1:10 from 1:9 by 2024. In other words, the trend is  toward more duplication, rather than less. 

The cost of storing oodles of data is substantial. Computer hardware guru Nick  Evanson, quoted by Gerry McGovern in CMSwire, estimated in 2020 that storing two  yottabytes would cost $58 trillion. If the cost per byte stored stayed constant, 40 percent of the world’s economic output would be consumed in 2035 by just storing data. 
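
A quick back-of-the-envelope check of those figures (assuming decimal units, where one yottabyte is 10^24 bytes):

    # Implied unit cost from the $58 trillion / 2 yottabyte estimate quoted above.
    cost_for_two_yb = 58e12            # dollars
    bytes_in_two_yb = 2 * 10**24
    cost_per_tb = cost_for_two_yb / bytes_in_two_yb * 10**12
    print(f"Implied storage cost: ~${cost_per_tb:.0f} per terabyte")   # about $29/TB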

Clearly, we should be incentivizing what graph platform Cinchy calls “zero-copy  integration”–a way of radically reducing unnecessary data duplication. The one thing we don’t  have is “zero-cost” storage. But first, let’s finish the cost story. More on the solution side and zero-copy integration later. 

The cost of training and inferencing large language models  

Model development and usage expenses are just as concerning. The cost of training  machines to learn with the help of curated datasets is one thing, but the cost of inferencing–the  use of the resulting model to make predictions using live data–is another. 

“Machine learning is on track to consume all the energy being supplied, a model that is costly, inefficient, and unsustainable,” Brian Bailey pointed out in Semiconductor Engineering in 2022. AI model training expense has increased with the size of the datasets used but, more importantly, as the number of parameters increases by four, the energy consumed in the process increases by 18,000 times. Some AI models included as many as 150 billion parameters in 2022. The more recent ChatGPT LLM training involved 180 billion parameters. Training can often be a continuous activity to keep models up to date. 

But the applied model aspect of inferencing can be enormously costly. Consider the AI  functions in self-driving cars, for example. Major car makers sell millions of cars a year, and each  one they sell is utilizing the same carmaker’s model in a unique way. 70 percent of the energy  consumed in self-driving car applications could be due to inference, says Godwin Maben, a  scientist at electronic design automation (EDA) provider Synopsys. 

Data Quality by Design  

Transfer learning is a machine learning term that refers to how machines can be taught  to generalize better. It’s a form of knowledge transfer. Semantic knowledge graphs can be a  valuable means of knowledge transfer because they describe contexts and causality well with  the help of relationships.

Well-described knowledge graphs provide the context in contextual computing.  Contextual computing, according to the US Defense Advanced Research Projects Agency  (DARPA), is essential to artificial general intelligence. 

A substantial percentage of training set data used in large language models is more or less duplicate data, precisely because of poorly described context that leads to a lack of generalization ability. Thus the reason why the only AI we have is narrow AI. And thus the reason large language models are so inefficient. 

But what about the storage cost problem associated with data duplication? Knowledge graphs can help with that problem also, by serving as a means for logic sharing. As Dave has  pointed out, knowledge graphs facilitate model-driven development when applications are  written to use the description or relationship logic the graph describes. Ontologies provide the logical connections that allow reuse and thereby reduce the need for duplication. 

FAIR data and Zero-Copy Integration  

How do you get others who are concerned about data duplication on board with semantics and knowledge graphs? By encouraging data and coding discipline that’s guided by FAIR principles. As Dave pointed out in a December 2022 blog post, semantic graphs and FAIR principles go hand in hand: https://www.semanticarts.com/the-data-centric-revolution-detour-shortcut-to-fair/ 

Adhering to the FAIR principles, formulated by a group of scientists in 2016, promotes  reusability by “enhancing the ability of machines to automatically find and use the data, in  addition to supporting its reuse by individuals.” When it comes to data, FAIR stands for Findable, Accessible, Interoperable, and Reusable. FAIR data is easily found, easily shared,  easily reused quality data, in other words. 

FAIR data implies the data quality needed to do zero-copy integration. 

Bottom line: When companies move to contextual computing by using knowledge  graphs to create FAIR data and do model-driven development, it’s a win-win. More reusable  data and logic means less duplication, less energy, less labor waste, and lower cost. The term  “zero-copy integration” underscores those benefits.

A Knowledge Model for Explainable Military AI

Forrest Hare, Founder of Summit Knowledge Solutions, is a retired US Air Force targeting and information operations officer who now works with the Defense Intelligence Agency (DIA). His  experience includes integrating intelligence from different types of communications, signals,  imagery, open source, telemetry, and other sources into a cohesive and actionable whole. 

Hare became aware of semantic technology while at SAIC and is currently focused on building a space + time ontology called the DIA Knowledge Model, so that Defense Department intelligence can use it to contextualize these multi-source inputs. 

The question becomes, how do you bring objects that don’t move and objects that do move into the same information frame with a unified context? The information is currently organized by collectors and producers. 

The object-based intelligence that does exist involves things that don’t move at all. Facilities,  for example, or humans using phones that are present on a communications network are more or less static. But what about the things in between such as trucks that are only intermittently present? 

Only sparse information is available about these. How do you know the truck that was there  yesterday in an image is the same truck that is there today? Not to mention the potential hostile forces who own the truck that have a strong incentive to hide it. 

Objects in object-based intelligence not only include these kinds of assets, but also events and locations that you want to collect information about. In an entity-relationship sense, objects are entities. 

Hare’s DIA Knowledge Model uses the ISO-standard Basic Formal Ontology (BFO) to unify domains so that the information from different sources is logically connected and therefore makes sense as part of a larger whole. BFO’s maintainers (Director Barry Smith and his team at the National Center for Ontological Research (NCOR) at the University at Buffalo) keep the ontology strictly limited to 30 or so classes. 

The spatial-temporal regions of the Knowledge Model are what’s essential to do the kinds of  dynamic, unfolding object tracking that’s been missing from object-based intelligence. Hare gave the example of a “site” (an immaterial entity) from a BFO perspective. A strict geolocational definition of “site” makes it possible for both humans and machines to make sense of the data about sites. Otherwise, Hare says, “The computer has no idea how to  understand what’s in our databases, and that’s why it’s a dumpster fire.”

This kind of mutual human and machine understanding is a major rationale behind explainable  AI. A commander briefed by an intelligence team must know why the team came to the  conclusions it did. The stakes are obviously high. “From a national security perspective, it’s extremely important for AI to be explainable,” Hare reminded the audience. Black boxes such as ChatGPT as currently designed can’t effectively answer the commander’s question on how the intel team arrived at the conclusions it did. 

Finally, the explainability of knowledge models like the DIA’s becomes even more critical as information flows into the Joint Intelligence Operations Center (JIOC). Furthermore, the various branches of the US Armed Forces must supply and continually update a Common Intelligence Picture that’s actionable by the US President, who is the Commander in Chief of the military as a whole. 

Without this conceptual and spatial-temporal alignment across all service branches, joint operations can’t proceed as efficiently and effectively as they should. Certainly, the risk of failure looms much larger as a result.

How US Homeland Security Plans to Use Knowledge Graph

During this summer’s Data-Centric Architecture Forum, Ryan Riccucci, Division Chief for U.S. Border Patrol – Tucson (AZ) Sector, and his colleague Eugene Yockey gave a glimpse of the data environment within the US Department of Homeland Security (DHS), as well as how the effort to transform that environment has been evolving. 

The DHS celebrated its 20-year anniversary recently. The Federal department’s data challenges are substantial, considering the need to collect, store, retrieve and manage information associated with 500,000 daily border crossings, 160,000 vehicles, and $8 billion in imported goods processed daily by 65,000 personnel. 

Riccucci is leading an ontology development effort within the Customs and Border  Patrol (CBP) agency and the Department of Homeland Security more generally to support  scalable, enterprise-wide data integration and knowledge sharing. It’s significant to note that a  Division Chief has tackled the organization’s data integration challenge. Riccucci doesn’t let leading-edge, transformational technology and fundamental data architecture change intimidate him. 

Riccucci described a typical use case for the transformed, integrated data sharing  environment that DHS and its predecessor organizations have envisioned for decades. 

The CBP has various sensor nets that monitor air traffic close to or crossing the borders  between Mexico and the US, and Canada and the US. One such challenge on the Mexican border is Fentanyl smuggling into the US via drones. Fentanyl can be 50 times as powerful as morphine. Fentanyl overdoses caused 110,000 deaths in the US in 2022. 

On the border with Canada, a major concern is gun smuggling via drone from the US to Canada. Though legal in the US, Glock pistols, for instance, are illegal and in high demand in Canada. 

The challenge in either case is to intercept the smugglers retrieving the drug or weapon drops while they are in the act. Drones may only be active for seven to 15 minutes at a time, so  the opportunity window to detect and respond effectively is a narrow one. 

Field agents ideally need to see enough real-time, mapped airspace information when a sensor is activated to move quickly and directly to the location. Specifics are important; verbally relayed information, by contrast, can often be less specific, causing confusion or misunderstanding.

The CBP’s successful proof of concept involved basic Resource Description Framework (RDF) triples, demonstrating semantic capabilities with just this kind of information: 

Sensor → Act of sensing → drone (SUAS, SUAV, vehicle, etc.) 
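
A minimal sketch of that pattern in Python with rdflib; the IRIs below are invented for illustration and are not CBP’s actual model:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import XSD

    EX = Namespace("https://example.com/cbp/")
    g = Graph()

    # A sensor, an act of sensing, and the object sensed, with the time and
    # location needed to qualify a drone interdiction.
    g.add((EX.sensing_0423, EX.performedBy, EX.towerSensor_17))
    g.add((EX.sensing_0423, EX.detected, EX.suas_track_88))        # small unmanned aircraft system
    g.add((EX.sensing_0423, EX.atTime, Literal("2023-06-14T02:17:00Z", datatype=XSD.dateTime)))
    g.add((EX.sensing_0423, EX.nearLocation, EX.sector_tucson_grid_B7))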

In a recent test scenario, CBP collected 17,000 records that met specified time/space requirements for a qualified drone interdiction over a 30-day period. 

The overall impression that Riccucci and Yockey conveyed was that DHS has both the budget and the commitment to tackle this and many other use cases using a transformed, data-centric architecture. By capturing information in an interoperable format, the DHS has been apprehending the bad guys with greater frequency and precision.

SIX AXES OF DECOUPLING

Loose coupling has been a Holy Grail for systems developers for generations.

The virtues of loose coupling have been widely lauded, yet there has been little description about what is needed to achieve loose coupling.  In this paper we describe our observations from projects we’ve been involved with. 

Coupling  

Two systems or two parts of a single system are considered coupled if a change to one of the systems unnecessarily affects the other system. So for instance, if we upgrade the version of our database and it requires  that we upgrade the operating system for every client attached to that database, then we would say those two systems or those two parts of  the system are tightly coupled. 

Coupling is widely understood to be undesirable because of the spread  of the side effects. As systems get larger and more complex, anything that causes a change in one part to affect a larger and larger footprint in the entire system is going to be expensive and destabilizing. 

Loose Coupling/Decoupling  

So, the converse of this is to design systems that are either “loosely  coupled” or “decoupled.” Loosely coupled systems do not arise by accident. They are intentionally designed such that change can be introduced around predefined flex points. 

For instance, one common strategy is to define an application programming interface (API) which external users of a module or class can use. This simple technique allows the interior of the class or module or method to change without necessarily exporting a change in behavior to the users. 
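
A minimal sketch of that idea in Python (the names are illustrative): callers depend only on the interface, so the implementation behind it can change without exporting a change in behavior:

    from abc import ABC, abstractmethod

    class FreightRater(ABC):                      # the intermediate form: a stable interface
        @abstractmethod
        def rate(self, weight_kg: float, destination: str) -> float: ...

    class DomesticRater(FreightRater):            # one implementation behind the interface
        def rate(self, weight_kg: float, destination: str) -> float:
            return 5.00 + 0.80 * weight_kg

    def quote_shipping(rater: FreightRater, weight_kg: float, destination: str) -> float:
        # The caller is coupled only to FreightRater, not to any concrete rater.
        return rater.rate(weight_kg, destination)

    print(quote_shipping(DomesticRater(), 12.0, "Denver"))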

The Role of the Intermediate  

In virtually every system that we’ve investigated that has achieved any degree of decoupling, we’ve found an “intermediate form.” It is this intermediate form that allows the two systems or subsystems not to be directly connected to each other. 

As shown in Figure (1), they are connected through an intermediary. In the example described above with an API, the signature of the interface is the intermediate. 

What Makes a Good Intermediary?  

An intermediary needs several characteristics to be useful: 

It doesn’t change as rapidly as its clients. Introducing an intermediate that changes more frequently than either the producer or consumer of the service will not reduce change traffic in the system. Imagine a system built on an API which changes on a weekly basis. Every producer and consumer of the services that use the API would have to change along with the API and chaos would ensue. 

It is nonproprietary. A proprietary intermediary is one that is effectively owned and controlled by a single group or small number of vendors. The reason proprietary intermediaries are undesirable is because the rate of change of the intermediary itself has been placed outside the control of the consumer. In many cases to use the service you must adopt the intermediary of the provider. It should also be noted that in many cases the controller of the proprietary standard has  incentive to continue to change the standard if that can result in additional revenue for upgrades and the like. 

It is evolvable. It’s highly unlikely that anyone will design an intermediate form that is correct for all time from the initial design.  Because of this, it’s highly desirable to have intermediate forms that are evolvable. The best trait of an evolvable intermediate is that it can be added on to, without invalidating previous uses of it. We sometimes more accurately call this an accretive capability, meaning that things can be added on incrementally. The great advantage of an evolvable or  accretive intermediary is that if there are many clients and many suppliers using the intermediary they do not have to all be changed in  lockstep, which allows many more options for upgrade and change. 

It is simple to use. An intermediate form that is complex or overly difficult to use will not be used: either other, more varied forms will be adopted, or the intermediate form will be skipped altogether and the benefit lost. 

Shared Intermediates  

In addition to the simple reduction in change traffic from having the intermediate be more stable than the components at either end, in most cases the intermediate also allows reuse of connections. This has been popularized in the systems integration business, where people have pointed out time and time again that creating a hub will drastically reduce the number of interfaces needed to supply a system.

In Figure (2), we have an example of what we call the traditional interface math, where the introduction of a hub or intermediate form can drastically reduce the number of interconnections in a system.  

People selling hubs very often refer to this as n × (n − 1) / 2, or sometimes simply the n² problem (with ten systems, that is 45 potential point-to-point interfaces versus ten connections to a hub). While this makes for very compelling economics, our observation is that the true math for this style of system is much less generous, though still positive. Just because two systems might be interconnected does not mean that they will be. Systems are not divided completely arbitrarily, and therefore not every interconnection need be accounted for. 

Figure (3) shows a more traditional scenario where, in the case on the left without a hub, there are many but not an exponential number of interfaces between systems. As the coloring shows, if you change one of those systems, any of the systems it touches may be affected and should at least be reviewed with an impact analysis. In the figure on the right, when the one system is changed, the evaluation is whether the effect spreads beyond the intermediary hub in the center. If it does not, if the system continues to obey the dictates of the intermediary form, then the change effect is, in fact, drastically reduced. 

The Axes of Decoupling  

We found in our work that, in many cases, people desire to decouple their systems and even go through the effort of creating intermediate forms or hubs and then build their systems to connect to those intermediate forms. However, as the systems evolve, very often they realize that a change in one of the systems does, in fact, “leak through”  the abstraction in the intermediate and affects other systems. 

In examining cases such as this, we have determined that there are six major considerations that cause systems that otherwise appear to be decoupled to have a secret or hidden coupling. We call these the axes of  decoupling. If a system is successfully decoupled on each of these axes,  then the impact of a change in any one of the systems should be greatly minimized. 

Technology Dependency  

The first axis that needs to be decoupled, and in some ways the hardest,  is what we call technology dependency. In the current state of the practice, people attempt to achieve integration, as well as economy of system operation, by standardizing on a small number of underlying technologies, such as operating systems and databases. The hidden trap in this is that it is very easy to rely on the fact that two systems or subsystems are operating on the same platform. As a result, developers find it easy to join a table from another database to one in their own database if they find that to be a convenient solution. They find it easy  to make use of a system function on a remote system if they know that  the remote system supports the same programming languages, the same  API, etc. 

However, this is one of the most pernicious traps because as a complex system is constructed with more and more of these subtle technology dependencies, it becomes very hard to separate out any portion and re-implement it.

The solution to this, as shown in Figure (4), is to introduce an intermediate form that ensures that a system does not talk directly to another platform. The end result is that each application or subsystem or service can run on its own hardware, in its own operating system, using its own database management system, and not be affected by changes in other systems. Of course, each system or subsystem does have a technological dependency on the technology of the intermediary in the middle. This is the trade-off; you introduce a dependence on one platform in exchange for being independent of n other platforms. In the current state of the art, most people use what’s called an integration broker to achieve this. An integration broker is a product such as IBM’s WebSphere, TIBCO or BEA, which allows one application to communicate with another without being aware of, or caring about, what platform the second application runs on. 

Destination Dependency  

Even when you’ve successfully decoupled the platforms the two applications rely on, we’ve sometimes observed problems where one  application “knows” of the existence and location of another application or service. By the way, this will become a very “normal problem” as Web services become more popular because the default method of implementing Web services has the requester knowing of the nature and destination of the service. 


In Figure (5), we show a little more clearly, through an example, two systems with an intermediary. In this case, the distribution and shipping application would like to send messages to a freight application, for instance to get a freight rating or to determine how long it would take to get a package somewhere. Imagine introducing a new service in the freight area that handled some international shipping while domestic shipping continued to be done the old way. If we had not decoupled these services, it is highly likely that the calling program would now need to be aware of the difference and make a determination in terms of what message to send, what API to call, where to send its request, etc. The only other defense would be to have yet another service that accepted all requests and then dispatched them; but this is really an unnecessary artifact that would have to be added into a system where the destination intermediary had not been designed in. 

Syntax Intermediary  

Classically, an application programming interface defines very specifically the syntax of any message sent between two systems. For instance, the API specifies the number of arguments, their order, and their type; and any change to any of those will affect all of the calling programs. EDI (electronic data interchange) likewise relies very much on a strict syntactical definition of the message being passed between partners. 



In Figure (6), we show a small snippet of XML, which has recently become the de facto syntactic intermediate form. Virtually all new initiatives now use XML as the syntactic lingua franca. As such, any two systems that communicate through XML at least do not have to mediate differences at that syntactic level. Also, fortunately, XML is a  nonproprietary standard and, at least to date, has been evolving very slowly. 

Semantic Intermediary  

Where systems integration projects generally run into the greatest amount of trouble is with semantic differences, or ambiguities in the meaning of the information being passed back and forth. Traditionally, we find that developers build interfaces, run them, and test them against live data, and then find that the ways in which the systems have been used do not conform particularly well to the spec. Additionally, the names, and therefore the implied semantics, of all the elements used in the interface are typically different from system to system and must be reconciled. The n² way of resolving this is to reconcile every system to every other system, a very tedious process.

There have been a few products and some approaches, as we show very simply and schematically in Figure (7), that have attempted to provide a semantic intermediary. Two that we’re most familiar with are Contivo and Unicorn. Over the long term, the intent of the Semantic Web is to build shared ontologies in OWL, the Web Ontology Language, a derivative of RDF and DAML+OIL. In the long term, it’s expected that systems will be able to communicate shared meaning through mutually committed ontologies. 

Identity Intermediary

  
A much subtler coupling that we’ve found in several systems is in the use of identifiers. Most systems have identifiers for all the key real-world and invented entities that they deal with. For instance, most systems have identifiers for customers, patients, employees, sales orders, purchase orders, production lines, etc. All of these things must be given unique, unambiguous names. That is not the problem; the problem is that each system has a tendency to create its own identifiers for items that are very often shared. In the real world, there is only one instance of many of these items. There is only one of each of us as individuals, one of each building, one of each corporation, etc. And yet each system tends to create its own numbering system, and when it discovers a new customer it will give it the next available customer number.

In order to communicate unambiguously with a system that has done this, the two main approaches to date have been either to force universal identifiers onto a large number of systems or to store other people’s identifiers in your own system. Both of these approaches are flawed and do not scale well. In the case of the universal identifier, besides all the problems of attempting to get coverage on the multiple domains, there is the converse problem of privacy. Once people are given universal identifiers, for instance, it’s very hard to keep information about individuals anonymous. The other approach, storing others’ identifiers in your systems, does not scale well because as the number of systems you must communicate with grows, the number of other identifiers that you must store also grows. In addition, there is the problem of being notified when any changes to these identifiers occur.

 

In Figure (8), we outline a new intermediary, which is just beginning to be discussed as a general-purpose service, variously called the identity intermediary or the handle intermediary. (We’ve begun shifting away from calling it an identity intermediary because the security industry has been referring to identity systems, and that does not mean exactly the same thing as what we mean here.) Essentially, this is a service through which each subscribing system recognizes that it may be dealing with an entity that any of the other systems may have previously dealt with.

So this has a discovery piece: systems can discover whether they are dealing with, communicating with, or aware of any entity that has already been identified in the larger federation. It also acts as a cross-reference, so that each system need not keep track of all the synonyms of identifiers or handles in all the other systems. Figure (8) shows a very simple representation of this with two very similar individuals that need to be identified separately. To date, the only system that we know of that covers some of this territory is called ChoiceMaker, but it is not configured to be used in exactly the manner that we show here. 
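
As a minimal sketch of what such a handle intermediary does (invented names, no product implied): each system keeps its own local identifier, and the intermediary maintains the cross-reference so that no system has to store the others’ identifiers:

    import itertools
    from typing import Optional

    class HandleRegistry:
        def __init__(self):
            self._next = itertools.count(1)
            self._by_local = {}          # (system, local_id) -> handle
            self._by_handle = {}         # handle -> set of (system, local_id)

        def register(self, system: str, local_id: str, same_as_handle: Optional[int] = None) -> int:
            """Register a local identifier; link it to an existing handle if this
            entity has already been discovered by another system."""
            handle = same_as_handle if same_as_handle is not None else next(self._next)
            self._by_local[(system, local_id)] = handle
            self._by_handle.setdefault(handle, set()).add((system, local_id))
            return handle

        def synonyms(self, system: str, local_id: str):
            """All other systems' identifiers for the same real-world entity."""
            handle = self._by_local[(system, local_id)]
            return self._by_handle[handle] - {(system, local_id)}

    reg = HandleRegistry()
    h = reg.register("CRM", "CUST-0098")
    reg.register("Billing", "4471", same_as_handle=h)
    print(reg.synonyms("CRM", "CUST-0098"))   # {('Billing', '4471')}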

Nomenclature Intermediary  

Very similar to the identity or handle intermediary is the nomenclature intermediary. We separate it because typically, with the identity intermediary, we’re dealing with discovered real world entities and the reason we have synonyms is because multiple different systems are “discovering” the same physical real-world item. 

In the case of the nomenclature intermediary, we’re dealing with an invented categorization system. Sometimes categorization systems are quite complex; in the medical industry we have SNOMED, HCPCS, and the CPT nomenclature. But we also have incredibly simple, and very often internally invented, classification systems: every code file where we might have seven types of customer, or order, or accident, or whatever, that we tend to codify in order to get more uniformity, is a nomenclature. What is helpful about having intermediary forms is that they enable multiple systems to either share or map to a common set of nomenclatures or codes. 

Figure (9) shows a simple case of how the mapping could be centralized. Again, this is another example where, over the long term, developments in the Semantic Web may be a great help and may provide clearinghouses for the communication between disparate systems. In the meantime, the only example that we’re aware of where a company has internally devoted a lot of attention to this is Allstate Insurance Co., which has built what they call a domain management system, in which they have found, catalogued, and cross-referenced over 6,000 different nomenclatures in use within Allstate.

Summary  

Loose coupling has been a Holy Grail for systems developers for generations. There is no silver bullet that will slay these problems; however, as we have discussed in this paper, there are a number of specific disciplined things that we can look at as developers, and as we continue to pay attention to these, we will make our systems more and more decoupled, and therefore easier and easier to evolve and change.

Documents, Events and Actions

We have recently been reexamining the weird relationship of “documents” to “events” in enterprise information systems and have surfaced some new insights that are worth  sharing.  

Documents and Events 

Just to make sure we are all seeing things clearly, the documents we’re referring to are  those that give rise to financial change in an enterprise. This includes invoices,  purchase orders, receiving reports and sales contracts. We’re not including other documents like memos, reports, news articles and emails – nor are we focusing on document structures such as JSON or XML.  

In this context, the “events” represent the recording of something happening that has  a high probability of affecting the finances of the firm. Many people call these  “transactions” or “financial transactions.” The deeper we investigated, the more we  found a need to distinguish the “event” (which is occurring in the real world) from the  “transaction” (which is its reflection in the database). But I’m getting ahead of myself  and will just stick with documents and events for this article. 

Documents and Events, Historically 

For most of recorded history, the document was the event, or at least it was the only tangibly recorded interpretation of the event. That piece of actual paper was both the document and the representation of the event. When you wrote up a purchase order  (and had it signed by the other party) you had an event.  

In the 1950s we began computerizing these documents, turning them into a skeuomorph (a design that imitates a real-world object to make it more familiar). The user interfaces looked like paper forms. There were boxes at the top for “ship to” and “bill to” and small boxes in the middle for things like “payment terms” and “free on board.” These were accompanied by line items for the components that made up the bill, invoice, purchase order, timecard, etc.

For the longest time, the paper was also the “source document” which would be entered into the computer at the home office. Somewhere along the way some clever person realized you could start by entering the data into the computer for things you originated and then print out the paper. That paper was then sent to the other party for them to key it into their system.  

Now, most of these “events” are not produced by humans but by some other computer program. Programs such as bill-of-materials processors can generate purchase orders much faster than a room full of procurement specialists. Many industries now consider these “events” to be primary; the documents (if they exist at all) are part of the audit trail. Industries like healthcare long ago replaced the “superbill” (a document on a clipboard with three dozen check boxes representing what the physician did to you on that visit) with 80 specific types of HL7 messages that ricochet back and forth between provider and payer.

And yet, even in the 21st century, we still find ourselves often excerpting facts from  unstructured documents and entering them into our computer systems. Here at  Semantic Arts, we take the contracts we’ve signed with our clients and scan them for the tidbits that we need to put into our systems (such as the budgets, time frame,  staffing and billing rates) and conveniently leave the other 95% of the document in a file somewhere.  

Documents and Events, what is the difference?

So for hundreds of years, documents and events were more or less the same thing. Now they have drifted apart. In today’s environment, the real question is not “what’s the difference?” but rather “which one is the truth?” In other words, if there is a difference, which one do we use? There is no one-size-fits-all answer to that dilemma; it varies from industry to industry.

But I think it’s fairly safe to say the current difference is that an “event” is a structured  data representation of the business activity, while a “document” is the unstructured  data representation. Either one could have come first. Each is meant to be the reflection of the other.  
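
To make that distinction concrete, here is a minimal sketch (the class and property names are hypothetical, not taken from gist or from any system described here) in which the structured event carries the facts and simply points at the unstructured document that mirrors it:

@prefix ex:  <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The "event": a structured representation of the business activity
ex:invoice-2024-031  a  ex:InvoiceEvent ;
    ex:hasCustomer    ex:AcmeCorp ;
    ex:invoiceAmount  "12500.00"^^xsd:decimal ;
    ex:invoiceDate    "2024-03-31"^^xsd:date ;
    # The "document": the unstructured rendition, stored as a file
    ex:renderedAs     <http://example.com/files/invoice-2024-031.pdf> .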

The Event and the Transaction 

The event has a very active sense to it because it occurs at a specific point in time. We therefore record it in our computer system by creating a transaction, which updates our database with a posting date and an effective accounting date.

The transaction and the event often appear to be the same thing, partly because so many events terminate in the accounting department. But, in reality, the transaction adds information to the event that allows it to be posted. The main information being added is the valuation, the classification and the effective dates. Most people enter these at the same time they capture the event, but they are distinct. The distinction is more obvious when you consider events such as “issuing material” to a production order. The issuer doesn’t know what account number should be charged, nor do they know the valuation (that is buried in an accounting policy that determines whether to cost this widget at the most recent cost, the oldest cost or the average cost of widgets on hand). So the “transaction” is different from the “event” even if they occur at the same time.
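
A minimal sketch of that separation (again with hypothetical class and property names) might look like this: the event records only what the issuer knows, and the transaction adds the classification, valuation and effective date needed for posting.

@prefix ex:  <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The event: what the person at the stockroom actually knows
ex:issue-7741  a  ex:MaterialIssueEvent ;
    ex:item               ex:Widget ;
    ex:quantity           "40"^^xsd:integer ;
    ex:toProductionOrder  ex:prodOrder-1182 .

# The transaction: the same happening, enriched so it can be posted
ex:txn-7741  a  ex:Transaction ;
    ex:recordsEvent       ex:issue-7741 ;
    ex:chargedToAccount   ex:workInProcessInventory ;   # the classification
    ex:valuation          "312.40"^^xsd:decimal ;       # set by the costing policy
    ex:effectiveDate      "2024-03-31"^^xsd:date .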

Until fairly recently, administrators wouldn’t sit at their computer and enter invoices until they were prepared for them to be issued. Most people wait until they ship the widget or complete the milestone before they key in the invoice data and email it to their customer. In this circumstance, the event and the transaction are contemporaneous: they happen at the same time. And the document being sent to the customer follows shortly thereafter.

One More Disconnect  

We are implementing data-centric accounting at Semantic Arts and have disconnected the “event” (the structured data representation) from its classification as an event. We realized that as soon as we had signed a contract, we knew at least one of the two aspects of our future invoices, and in many cases we knew both. For fixed-price projects, we knew the amount of the future invoices; the only thing we didn’t know was when we could invoice them, because that was based on the date of some given milestone. For time-and-materials contracts we know the dates of our future invoices (often the end of the month) but don’t know the amount. And for our best-efforts contracts we know the dates and the amounts and adjust the scope to fit.

But knowing these things and capturing them in our accounting system created a problem: they weren’t actually real yet (or at least they weren’t real enough to be invoices). The sad thing was they looked just like invoices. They had all the data, and it was all valid. They could be rendered to PDFs, and even printed, but we knew we couldn’t send all the invoices to our client all at once. So we now had some invoices in our system that weren’t really invoices, and we didn’t have a good way to make the distinction.

As we puzzled over this, we came across a university that was dealing with the same  challenge. In their case they were implementing “commitment accounting,” which is trying to keep track of the commitments (purchase orders mostly) that are outstanding as a way to prevent overrunning budgets. As people entered their purchase orders  (structured records as we’ve been describing them) the system captured them as events. These events were captured and tallied by the system. In order to get the  system to work, people entered purchase orders long before they were approved. In fact, you have to enter them to get an event (or a document) that can be approved and  agreed to by your vendor.  

The problem was many of these purchase order events never were approved. The  apparent commitments vastly exceeded the budgets, and the whole system was shut  down.  

Actions 

We discovered that it isn’t the document, and it isn’t even the event (if we think of the event as the structured data record of the business event) that makes the financial effect real. It is something we are now calling the “action,” or really a special type of  “action.” 

There is a magic moment when an event, or perhaps more accurately a proto-event, becomes real. On a website, it is the “buy” button. In the enterprise, it is often the “approval” button.

As we worked on this, we discovered it is just one of the steps in a workflow. The  workflow for a purchase order might start with sourcing, getting quotes, negotiating, etc. The special step that makes the purchase order “real” isn’t even the last step.  After the purchase order is accepted by the vendor, we still need to exchange more  documents to get shipping notifications, deal with warranties, etc. It is one of those steps that makes the commitment. We are now calling this the “green button.” There is one step, one button in the workflow progression that makes the event real. In our internal systems we’re going to make that one green, so that employees know when they are committing the firm.  
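
Representing that in data is straightforward. As a sketch (the workflow, the step names and the CommittingStep class are all hypothetical, not part of any published ontology), the “green button” is simply the one step in the workflow singled out as the step that commits the firm:

@prefix ex: <http://example.com/> .

# A purchase-order workflow and its steps
ex:poWorkflow  a  ex:Workflow ;
    ex:hasStep  ex:sourcing , ex:getQuotes , ex:negotiate ,
                ex:approvePO , ex:sendToVendor , ex:trackShipping .

# The one step that commits the firm: the "green button"
ex:approvePO  a  ex:CommittingStep .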

Once you have this idea in your head, you’ll be surprised how often it is missed. I go on my bank’s website and work through the process of transferring money. I get a number of red buttons, and with each one I wonder, “is this the green one?” Nope, one more step before we’re committed. Same with booking a flight. There are lots of purple buttons, but you have to pay close attention before you notice which one of those purple buttons is really the green one.

Promotion 

And what does the green button in our internal systems do? Well, it varies a bit,  workflow to workflow, but in many cases it just “promotes” a draft item to a committed one. 

In a traditional system you would likely have draft items in one table and then copy them over to the approved table. Or you might have a status and just be careful to  exclude the unapproved ones from most queries.  

But we’ve discovered that many of these events can be thought of as subtypes of their  draft versions. When the green button gets pressed in an invoicing workflow, the draft invoice gains another triple, which makes it also an approved or a submitted invoice – in addition to its being a draft invoice.  
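
In triples, the promotion is about as small an update as you can make. A minimal sketch (the class names here are hypothetical, not drawn from gist):

@prefix ex: <http://example.com/> .

# Before the green button: the item exists only as a draft
ex:invoice-2024-031  a  ex:DraftInvoice .

# Pressing the green button adds a single triple; the same resource
# is now also an approved invoice, without being copied anywhere
ex:invoice-2024-031  a  ex:ApprovedInvoice .

Nothing is moved between tables; queries that want only committed invoices simply ask for the approved type.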

Summary 

We in the enterprise software industry have had a long history of conflating documents  and events. Usually we get away with it, but occasionally it bites us.  

What we’re discovering now, with the looming advent of data-centric accounting, is the need not only to distinguish the document from the event but also to distinguish the event (as a structure) from the action that enlivens it. We see this as an important step in the further automation of direct financial reporting.

gist: Buckets, Buckets Everywhere: Who Knows What to Think

We humans are categorizing machines, which is to say, we like to create metaphorical buckets and put things inside. But there are different kinds of buckets, and different ways to model them in  OWL and gist. The most common bucket represents a kind of thing, such as Person or Building.  Things that go into those buckets are individuals of those kinds, e.g. Albert Einstein, or the particular office building you work in. We represent this kind of bucket as an owl:Class and we use rdf:type to put something into the bucket. 
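
In Turtle, with a hypothetical ex: namespace, that first kind of bucket looks like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.com/> .

ex:Person  a  owl:Class .                  # the bucket: a kind of thing
ex:AlbertEinstein  rdf:type  ex:Person .   # putting an individual into the bucket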

Another kind of bucket is when you have a group of things, like a jury or a deck of cards, that are functionally connected in some way. Those related things go into the bucket (12 members of a jury, or 52 cards). We have a special class in gist called Collection for this kind of bucket. A specific bucket of this sort will be an instance of a subclass of gist:Collection. E.g. OJs_Jury is an instance of the class Jury, a subclass of gist:Collection. We use gist:memberOf to put things into the bucket. Convince yourself that these buckets do not represent a kind of thing. A jury is a kind of thing; a particular jury is not. We would use rdf:type to connect OJ’s jury to the owl:Class Jury, and use gist:memberOf to connect the specific jurors to OJ’s jury.
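
The same example in Turtle (ex: is a hypothetical namespace; the gist namespace URI shown is the w3id one used by recent gist releases, so substitute whichever version you resolve):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
@prefix ex:   <http://example.com/> .

ex:Jury  rdfs:subClassOf  gist:Collection .   # the kind of bucket
ex:OJs_Jury  rdf:type  ex:Jury .              # one particular bucket
ex:SheilaWoods  gist:memberOf  ex:OJs_Jury .  # putting a juror into it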

A third kind of bucket is a tag, which represents a topic and is used to categorize individual items for the purpose of indexing a body of content. For example, the tag “Winter” might be used to index photographs, books and/or YouTube videos. Any content item that depicts or relates to winter in some way should be categorized using this tag. In gist, we represent this in a way that is structurally the same as how we represent buckets that are collections of functionally connected items. The differences are 1) the bucket is an instance of a subclass of gist:Category, rather than of gist:Collection and 2) we put things into the bucket using gist:categorizedBy rather than gist:memberOf. The Winter tag is essentially a bucket containing all the things that have been indexed or categorized using that tag.
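
And the tag example in Turtle (again with hypothetical ex: names):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
@prefix ex:   <http://example.com/> .

ex:Tag  rdfs:subClassOf  gist:Category .                   # the kind of bucket
ex:Winter  rdf:type  ex:Tag .                              # the tag itself
ex:WinterOfOurDiscontent  gist:categorizedBy  ex:Winter .  # indexing a book with the tag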

Below is a summary table showing these different kinds of buckets, and how we represent them in  OWL and gist.

Kind of Bucket | Example | Representing the Bucket | Putting something in the Bucket
Individual of a Kind | John Doe is a Person | Instance of owl:Class | rdf:type
A bucket with functionally connected things inside | Sheila Woods is a member of OJ’s Jury | Instance of a subclass of gist:Collection | gist:memberOf
An index term for categorizing content | The book “Winter of our Discontent” has Winter as one of its tags | Instance of a subclass of gist:Category | gist:categorizedBy


Morgan Stanley: Data-Centric Journey 

Morgan Stanley has been on the semantic/data-centric journey with us for about six years. Their approach is the adoption of an RDF graph and the development of a semantic knowledge base to help answer domain-specific questions, formulate classification recommendations and deliver quality search to their internal users. Their primary objective is to enable the firm to retrieve, retain and protect information (i.e., to know where the information resides, how long it must be maintained and what controls apply to it).

The knowledge graph is being developed by the Information Management team under the direction of Nic Seyot (Managing Director and Head of Data & Analytics for Non-Financial  Risk). Nic is responsible for the development of the firm-wide ontology for trading surveillance, compliance, global financial crime and operational risk. Nic’s team is also helping other departments across the firm discover and embrace semantic data modeling for their own use cases.  

Morgan Stanley has tens of thousands of discrete repositories of information. There are many different groups with specialized knowledge about the primary objectives as well as many technical environments to deal with. Their motivating principle is to understand the  conceptual meaning of the information across these various departments and  environments so that they can answer compliance and risk questions.  

A good example is a query from a user about the location of sensitive information (with many conflicting classifications) and whether they are allowed to share it outside of the firm. The answer to this type of question involves knowledge of business continuity, disaster recovery, emergency planning and many other areas of control. Their approach is to leverage semantic modeling, ontologies and a knowledge graph to answer that question comprehensively.

To build the knowledge graph around these information repositories, they hired Semantic Arts to create a core ontology around issues that are relevant to the entire firm, including personnel, geography, legal entities, records management, organization and a number of firm-wide taxonomies. Morgan Stanley is committed to open standards and W3C principles, which they have combined with their internal standards for quality governance. They created a Semantic Modeling and Ontology Consortium to help govern and maintain that core ontology. Many divisions within the firm have joined the consortium’s advisory board, and it is viewed as an excellent way of facilitating cooperation between divisions.

The adoption-based principle has been a success. They have standardized ETL and virtualization to get information structured and into their knowledge graph. The key use case is enterprise search: giving departments the ability to search for their content by leveraging the tags, lists, categories and taxonomies they already use as facets. One of the key benefits is an understanding of the network of concepts and terms, and how they relate to one another, within the organization.

Semantic Arts ontologists helped engineer the network of concepts that went into their semantic thesaurus and how those concepts interconnect within the firm. They started out with over 6,500 policies and procedures as a curated corpus of the firm’s knowledge. They used natural language processing to extract the complexity of relationships out of their combined taxonomies (over half a million concepts). We worked with them to demonstrate the power of conceptual simplification, helping them transform these complex relationships into broader, narrower and related properties that enable users to ask business questions in their own context (and acronyms) and enhance the quality of search without manual curation. Our efforts helped reduce the noise, merge concepts with similar meaning and identify critical topics to support complex queries from users.
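
As a hypothetical sketch of that simplification (the concept names are invented, and the use of SKOS broader/narrower/related properties is an assumption; the post does not name the specific vocabulary):

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ms:   <http://example.com/thesaurus/> .   # hypothetical thesaurus namespace

# Complex, taxonomy-specific relationships reduced to three simple ones
ms:RecordsRetention  skos:broader  ms:InformationManagement ;
                     skos:related  ms:LegalHold .
ms:EmailArchiving    skos:broader  ms:RecordsRetention .   # narrower is implied in the other direction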

Contact Us: 

Overcome integration debt with proven semantic solutions. 

Contact Semantic Arts, the experts in data-centric transformation, today! 

CONTACT US HERE 

Address: Semantic Arts, Inc. 

123 N College Avenue Suite 218 

Fort Collins, CO 80524 

Email: [email protected] 

Phone: (970) 490-2224