Understanding the Graph Center of Excellence

The Knowledge Management community has gotten good at extracting and curating  knowledge. 

There is a confluence of activity – including generative AI models, digital twins and shared ledger capabilities – that is having a profound impact on enterprises. Recent research by analysts at Gartner places contextualized information and graph technologies at the center of their impact radar for emerging technologies. This recognition of the importance of these critical enablers to define, contextualize and constrain data for consistency and trust is all part of the maturity process for today’s enterprise. It is also beginning to shine a light on the emergence of the Graph Center of Excellence (CoE) as an important contributor to achieving strategic objectives.

For companies that are ready to make the leap from being application-centric to data-centric – and for companies that have successfully deployed single-purpose graphs in business silos – the CoE can become the foundation for ensuring data quality and reusability. Instead of transforming data for each new viewpoint or application, the data is stored once in a machine-readable format that retains its original context, connections and meaning and can be used for any purpose.

And now that you have demonstrated value from your initial (lighthouse) project, the  pathway to progress primarily centers on the investment in people. The goal at this stage of development is to build a scalable and resilient semantic graph as a data hub for all business-driven use cases. This is where building a Graph CoE becomes a critical asset because the journey to efficiency and enhanced capability must be guided.  

Along with the establishment of a Graph CoE, enterprises should focus on the creation of  a “use case tree” or “business capability model” to identify where the data in the graph  can be extended. This is designed to identify business priorities and must be aligned with the data from initial use cases. The objective is to create a reusable architectural framework and a roadmap to deliver incremental value and capitalize on the benefits of content reusability. Breakthrough progress comes from having dedicated resources for the design, construction and support of the foundational knowledge graph.  

The Graph CoE would most logically be an extension of the Office of Data Management and the domain of the Chief Data Officer. It is a strategic initiative that focuses on the adoption of semantic standards and the deployment of knowledge graphs across the enterprise. The goal is to establish best practices, implement governance and provide expertise in the development and use of the knowledge graph. Think of it as both the hub of graph activities within your organization and the mechanism to influence organizational culture.

Some of the key elements of the Graph CoE include: 

• Information Literacy: A Graph CoE is the best approach to ensure organizational understanding of the root causes and liabilities resulting from technology fragmentation and misalignment of data across repositories. It is the organizational advocate for new approaches to data management. The message for all senior executive stakeholders is to both understand the causes of the data dilemma and recognize that properly managed data is an achievable objective.  Information literacy and cognition about the data pathway forward is worthy of being elevated as a ‘top-of-the-house’ priority.  

• Organizational Strategy: One of the fundamental tasks of the Graph CoE is to define the overall strategy for leveraging knowledge graphs within the organization. This includes defining the underlying drivers (e.g., cost containment, process automation, flexible query, regulatory compliance, governance simplification) and prioritizing use cases (e.g., data integration, digitalization, enterprise search, lineage traceability, cybersecurity, access control). The opportunities exist when you gain trust across stakeholders that there is a path to ensure that data is true to original intent, defined at a granular level and in a format that is traceable, testable and flexible to use.

• Data Governance: The Graph CoE is responsible for establishing data policies and standards to ensure that the semantic layer is built using sound engineering principles that emphasize simplicity and reusability. When resolvable identity is combined with precise meaning, quality validation and data lineage, governance shifts away from manual reconciliation. With a knowledge graph at the foundation, organizations can create a connected inventory of what data exists, how it is classified, where it resides, who is responsible, how it is used and how it moves across systems (see the sketch following this list). This changes the governance operating model – by simplifying and automating it.

• Knowledge Graph Development: The Graph CoE should lead the development of each of the knowledge graph components. This includes working with subject matter experts to prioritize business objectives and build use case relationships.  Building data and knowledge models, data onboarding, ontology development,  source-to-target mapping, identity and meaning resolution and testing are all areas of activity to address. One of the critical components is the user experience and data extraction capabilities. Tools should be easy to use and help teams do  their job faster and better. Remember, people have an emotional connection to the way they work. Win them over with visualization. Invest in the user interface.  Let them gain hands-on experience using the graph. The goal should be to create value without really caring what is being used at the backend. 

• Cross-Functional Collaboration: The pathway to success starts with the clear and visible articulation of support by executive management. It is both essential and meaningful because it drives organizational priorities. The lynchpin, however,  involves cooperation and interaction among teams from related departments to deploy and leverage the graph capabilities most effectively. Domain experts from  technology are required to provide the building blocks for developing  applications and services that leverage the graph. Business users identify and prioritize use cases to ensure the graph addresses their evolving requirements.  Governance policies need to be aligned with insights from data stewards and compliance officers. Managing the collaboration is essential for orchestrating the  successful shift from applications-centric to data-centric across the enterprise.  
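As an illustration of the connected governance inventory described under Data Governance above, a few triples can capture what a dataset is, how it is classified, where it resides, who is responsible and where it flows. This is a minimal sketch in RDF (Turtle syntax); the namespace and property names are hypothetical placeholders, not an actual corporate or gist vocabulary.

  @prefix ex:  <https://example.com/gov/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # One dataset, described once: what it is, how it is classified,
  # where it resides, who is responsible, and where it flows.
  ex:CustomerMaster  a  ex:Dataset ;
      ex:classifiedAs   ex:PersonallyIdentifiableInformation ;
      ex:residesIn      ex:CrmSystem ;
      ex:hasSteward     ex:JaneDoe ;
      ex:feedsInto      ex:BillingSystem ;
      ex:lastValidated  "2024-01-15"^^xsd:date .

Queries over an inventory like this are what allow the governance operating model to be simplified and automated rather than manually reconciled.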

After successfully navigating the initial stages of your project, the onward pathway to progress should focus on the development of the team of involved stakeholders. The first hurdle is to identify and expand the circle of data owners who know the location and health of the data. Much of this is about organizational dynamics and understanding who the players are, who is trusted, who is feared, who elicits cooperation and who is out to kill the activity.

This coincides with the development of an action plan and the assembly of the team of skilled practitioners needed to ensure success. Enterprises will need an experienced architect who understands the workings of semantic technologies and knowledge graphs to lead the team. The CoE will need ontologists to engineer content and manage the mapping of data. Knowledge graph engineers are needed to coordinate the meaning of data, knowledge and content models. This will also require a project manager to be an advocate for the team and the development process.  

A final note: organizations working on their AI readiness must understand that it requires being ready from the perspective of people, technology and data. The AI-ready data component means incorporating context with the data. Gartner points this out by noting that it necessitates a shift from the traditional ETL mindset to a new ECL (extract, contextualize and load) orientation. This ensures meaningful data connections. Gartner advises enterprises to leverage semantic metadata as the core for facilitating data connections.

The Graph CoE is an important step in transforming your lighthouse project or silo deployment into a true enterprise platform. A well-structured CoE should be viewed as a driver of innovation and agility within the enterprise that facilitates better data integration, improves operational efficiency, contextualizes AI and enhances the user experience. It is the catalyst for building organizational capabilities for long-term strategic advantage and one of the key steps in the digital transformation journey. 

The Enterprise Ontology 

At the time of this writing, almost no enterprises in North America have a formal enterprise ontology. Yet we believe that within a few years this will become one of the foundational pieces of most information system work within major enterprises. In this paper, we will explain just what an enterprise ontology is and, more importantly, what you can expect to use it for and what you should look for to distinguish a good ontology from a merely adequate one.

What is an ontology?  

An ontology is a “specification of a conceptualization.” This definition is a mouthful, but bear with me; it’s actually pretty useful. In general terms, an ontology is an organization of a body of knowledge or, at least, an organization of a set of terms related to a body of knowledge. However, unlike a glossary or dictionary, which takes terms and provides definitions for them, an ontology works in the other direction. An ontology starts with a concept. We first have to find a concept that is important to the enterprise; having found the concept, we need to express it as precisely as possible and in a manner that can be interpreted and used by other computer systems. One difference between a dictionary or glossary and an ontology is that dictionary definitions are not really processable by computer systems. The other difference is that by starting with the concept and specifying it as rigorously as possible, we get definitive meaning that is largely independent of language or terminology. That is what the definition means by a “specification of a conceptualization.” In addition, of course, we then attach terms to these concepts, because in order for us humans to use the ontology we need to associate the terms that we commonly use.
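As a minimal, hypothetical sketch (the IRIs are invented for illustration), here is what this looks like in RDF/OWL using Turtle syntax: the concept is specified once, with a stable identifier, and the terms people commonly use are attached to it as labels rather than the concept being derived from the terms.

  @prefix ex:   <https://example.com/ontology/> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # The concept comes first and gets a stable, language-independent identifier ...
  ex:Customer  a  owl:Class ;
      # ... and the terms people use are then attached to it.
      rdfs:label    "customer" , "client" , "account holder" ;
      rdfs:comment  "A party that has agreed to purchase goods or services." .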

Why is this useful to an enterprise?  

Enterprises process great amounts of information. Some of this information is structured in databases, some of it is unstructured in documents or semi-structured in content management systems. However, almost all of it is “local knowledge,” in that its meaning is agreed upon within a relatively small, local context. Usually, that context is an individual application, which may have been purchased or built in-house.

One of the most time- and money-consuming activities that enterprise  information professionals perform is to integrate information from  disparate applications. The reason this typically costs a lot of money  and takes a lot of time is not because the information is on different  platforms or in different formats – these are very easy to  accommodate. The expense is because of subtle, semantic differences  between the applications. In some cases, the differences are simple:  the same thing is given different names in different systems. However,  in many cases, the differences are much more subtle. The customer in  one system may have an 80 or 90% overlap with the definition of a  customer in another system, but it’s the 10 or 20% where the  definition is not the same that causes most of the confusion; and there  are many, many terms that are far harder to reconcile than  “customer.” 

So the intent of the enterprise ontology is to provide a “lingua franca”  to allow, initially, all the systems within an enterprise to talk to each  other and, eventually, for the enterprise to talk to its trading partners  and the rest of the world. 

Isn’t this just a corporate data dictionary or consortia of data  standards?  

The enterprise ontology does have many similarities in scope to both a corporate data dictionary and a consortia data standard. The similarity is primarily in the scope of the effort: both of those initiatives, as well as enterprise ontologies, aim to define the shared terms that an enterprise uses. The difference is in the approach and the tools. With both a corporate data dictionary and a consortia data standard, the interpretation and use of the definitions is strictly by humans, primarily system designers. Within an enterprise ontology, the expression of the ontology is such that tools are able to interpret and make inferences on the information when the system is running.
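The difference can be made concrete with a small, hypothetical example. A data dictionary might record the sentence “every purchase order has exactly one buyer” for a human designer to read; an ontology states the same rule as an axiom (here in OWL, Turtle syntax, with invented IRIs) that a reasoner can check and act on while the system is running.

  @prefix ex:   <https://example.com/ontology/> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  # "Every purchase order has exactly one buyer," stated so that a reasoner
  # can validate data against it and draw inferences at runtime.
  ex:PurchaseOrder  a  owl:Class ;
      rdfs:subClassOf  [ a  owl:Restriction ;
                         owl:onProperty   ex:hasBuyer ;
                         owl:cardinality  "1"^^xsd:nonNegativeInteger ] .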

How to build an enterprise ontology  

The task of building an enterprise ontology is relatively straightforward. You would be greatly aided by purchasing a good  ontology editor, although reasonable ontology editors are available for  free. The analytical work is similar to building a conceptual enterprise data model and involves many of the same skills: the ability to form good abstractions, to elicit information from users through interviews,  as well as to find informational clues through existing documentation and data. One of the interesting differences is that as the ontology is being built it can be used in connection with data profiling to see whether the information that is currently being stored in information systems does in fact comply with the rules that the ontology would suggest. 

What to look for in an enterprise ontology  

What distinguishes a good or great enterprise ontology from a merely  adequate one are several characteristics that will mostly be exercised  later in the lifecycle of the actual use of the ontology. Of course, they  are important to consider at the time you’re building the ontology. 

Expressiveness 

The ontology needs to be expressive enough to describe all the distinctions that an enterprise makes. Most enterprises of any size have tens of thousands to hundreds of thousands of distinctions that they use in their information systems. Not only is every piece of schema in their databases a distinction, but so are many of the codes they keep in code tables, as well as decisions that are called out either in code or in procedure manuals. The sum total of all these distinctions is the operating ontology of the enterprise. However, they are not formally expressed in one place. The structure, as well as the base concepts used, needs to be rich enough that when a new concept is uncovered it can be expressed in the ontology.

Elegance 

At the same time, we need to strive for an elegant representation. It would be simple, but perhaps simplistic, to take all the distinctions in all the current systems, put them in a simple repository and call them an ontology. This misses some of the great strengths of an ontology. We want to use our ontology not only to document and describe distinctions but also to find similarities. In these days of Sarbanes-Oxley regulations, it would be incredibly helpful to know which distinctions and which parts of which schemas deal with financial commitments and “material transactions.”

Inclusion and exclusion criteria 

Essentially, the ontology is describing distinctions amongst “types.” In many cases, what we would like to know is whether a given instance is of a particular type. Let’s say it’s a record in a product table, therefore it’s of type “product.” But in another system we may have inventory, and we would like to know whether this instance is also compatible with the type that we’ve defined as inventory. In order to do this, we need a way in the ontology to describe inclusion and exclusion criteria: what clues we or another system would use when evaluating a particular instance to determine whether it was, in fact, of a particular type. For instance, if inventory were defined as physical goods held for resale, one inclusion criterion might be weight, because weight is an indicator of a physical good. Clearly, there would be many more, as well as exclusion criteria. But this gives you an idea.
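One way such criteria can be expressed, sketched here purely as a hypothetical OWL example in Turtle syntax, is as a class definition whose conditions an instance must satisfy in order to be classified as inventory.

  @prefix ex:  <https://example.com/ontology/> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # Inventory as "physical goods held for resale," with the presence of a
  # weight serving as one machine-checkable inclusion clue.
  ex:Inventory  owl:equivalentClass  [
      a  owl:Class ;
      owl:intersectionOf ( ex:PhysicalGood
                           [ a owl:Restriction ;
                             owl:onProperty  ex:heldForPurpose ;
                             owl:hasValue    ex:Resale ]
                           [ a owl:Restriction ;
                             owl:onProperty     ex:hasWeight ;
                             owl:minCardinality "1"^^xsd:nonNegativeInteger ] ) ] .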

Cross referencing capability 

Another criterion that is very important is the ability to keep track of where the distinction was found; that is, which system currently implements and uses this particular distinction. This is very important for producing any type of where-used information because as we change our distinctions it might have side effects on other systems. 

Inferencing 

Inferencing is the ability to find or infer additional information based on the information we have. For instance, if we know that an entity is a person we can infer that the person has a birthday, whether we know it or not, and we can also infer that the person is less than 150  years old. While this sounds simple at this level, the power in an ontology is when the inference chains become long and complex and we can use the inferencing engine itself to make many of these conclusions on-the-fly. 
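A hypothetical fragment, in Turtle, shows how the first of those inferences could be licensed: the class-level axiom carries the knowledge, so a reasoner can draw the conclusion without it ever being stored with the data.

  @prefix ex:   <https://example.com/ontology/> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

  # Class-level axiom: every person has some birth date.
  ex:Person  rdfs:subClassOf  [ a  owl:Restriction ;
                                owl:onProperty     ex:hasBirthDate ;
                                owl:minCardinality "1"^^xsd:nonNegativeInteger ] .

  # Asserted fact ...
  ex:AdaLovelace  a  ex:Person .
  # ... from which a reasoner infers that ex:AdaLovelace has a birth date,
  # even though no date value is recorded anywhere in the data.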

Foreign-language support 

As we described earlier, the ontology is a specification of a conceptualization to which we attach terms. It doesn’t take much to add foreign-language terms as well. This adds a great deal of power for developers who wish to present the same information, and the same screens, in multiple languages, as we are really just manipulating the concepts and attaching the appropriate language at runtime.
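In RDF this amounts to language-tagged labels attached to the same concept; a minimal hypothetical sketch in Turtle:

  @prefix ex:   <https://example.com/ontology/> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # One concept, many terms: the application picks the label that matches
  # the user's language at runtime.
  ex:Invoice  rdfs:label  "invoice"@en ,
                          "facture"@fr ,
                          "Rechnung"@de ,
                          "factura"@es .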

Some of these characteristics are aided by the existence of tools or  infrastructures, but many of them are produced by the skill of the  ontologist.

Summary  

We believe that the enterprise ontology will become a cornerstone in many information systems in the future. It will become a primary part of the systems integration infrastructure as one application will be translated into the ontology and we will very rapidly know what the corresponding schema and terms are and what transformations are needed to get to another application. It will become part of the corporate search strategy as search moves beyond mere keywords into actually searching for meaning. It will become part of business intelligence and data warehousing systems as naïve users can be led to similar terms in the warehouse repository and aid their manual search and query construction. 

Many more tools and infrastructures will become available over the  next few years that will make use of the ontology, but the prudent  information manager will not wait. He or she will recognize that there  is a fair lead time to learn and implement something like this, and any  implementation will be better than none because this particular  technology promises to greatly leverage all the rest of the system  technologies.

How US Homeland Security Plans to Use Knowledge Graph

During this summer’s Data Centric Architecture Forum, Ryan Riccucci, Division Chief for  U.S. Border Patrol – Tucson (AZ) Sector, and his colleague Eugene Yockey gave a glimpse of what the data environment is like within the US Department of Homeland Security (DHS), as well as how transforming that data environment has been evolving. 

The DHS celebrated its 20-year anniversary recently. The Federal department’s data challenges are substantial, considering the need to collect, store, retrieve and manage information associated with 500,000 border crossings, 160,000 vehicles and $8 billion in imported goods processed daily by 65,000 personnel.

Riccucci is leading an ontology development effort within the Customs and Border Protection (CBP) agency, and the Department of Homeland Security more generally, to support scalable, enterprise-wide data integration and knowledge sharing. It is significant that a Division Chief has tackled the organization’s data integration challenge. Riccucci doesn’t let leading-edge, transformational technology and fundamental data architecture change intimidate him.

Riccucci described a typical use case for the transformed, integrated data sharing  environment that DHS and its predecessor organizations have envisioned for decades. 

The CBP has various sensor nets that monitor air traffic close to or crossing the borders  between Mexico and the US, and Canada and the US. One such challenge on the Mexican border is Fentanyl smuggling into the US via drones. Fentanyl can be 50 times as powerful as morphine. Fentanyl overdoses caused 110,000 deaths in the US in 2022. 

On the border with Canada, a major concern is gun smuggling via drone from the US to Canada. Though legal in the US, Glock pistols, for instance, are illegal and in high demand in Canada. 

The challenge in either case is to intercept the smugglers retrieving the drug or weapon drops while they are in the act. Drones may only be active for seven to 15 minutes at a time, so  the opportunity window to detect and respond effectively is a narrow one. 

Field agents ideally need to see enough real-time, mapped airspace information when a sensor is activated to move quickly and directly to the location. Specifics are important; verbally relayed information, by contrast, can often be less specific, causing confusion or misunderstanding.

The CBP’s successful proof of concept involved basic Resource Description Framework (RDF) triples – semantic capabilities built around just this kind of information:

Sensor → Act of sensing → drone (SUAS, SUAV, vehicle, etc.) 
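Rendered as RDF in Turtle syntax, that pattern might look like the sketch below; the namespace, identifiers and property names are illustrative assumptions rather than CBP’s actual model.

  @prefix ex:  <https://example.com/cbp/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # A single act of sensing links the sensor to the detected drone,
  # with the time and location a field agent would need.
  ex:sensing-42  a  ex:ActOfSensing ;
      ex:performedBy  ex:sensor-17 ;
      ex:detected     ex:drone-903 ;
      ex:atLocation   ex:gridCell-B7 ;
      ex:occurredAt   "2023-06-01T02:14:00Z"^^xsd:dateTime .

  ex:drone-903  a  ex:SmallUnmannedAircraftSystem .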

In a recent test scenario, CBP collected 17,000 records that met specified time/space requirements for a qualified drone interdiction over a 30-day period. 

The overall impression that Riccucci and Yockey conveyed was that DHS has both the budget and the commitment to tackle this and many other use cases using a transformed data-centric architecture. By capturing information in an interoperable format, the DHS has been apprehending the bad guys with greater frequency and precision.

Client 360 – A Foundational Challenge

When Lehman Brothers collapsed in 2008, CROs, CFOs and chief compliance officers were stuck poring over annual reports and frantically searching within corporate documents to determine Lehman’s actual corporate structure – including who was bankrupt, who funded whom, who guaranteed what, and who would hold the obligations when everything was finally sorted out. It took an extraordinary 14 years after the collapse to untangle it all, owing to the complexity of unwinding a globally interconnected financial institution.

This legal entity identification problem during the 2008 financial crisis proved to be a systemic  weakness that hampered the ability of regulators to understand and respond to what was  happening in financial markets. Without a standard way of identifying financial institutions and  their relationships to each other – it was near impossible to track interconnectedness, monitor  risks, aggregate exposure, or coordinate regulatory responses. 

KYC/AML 

These challenges were not limited to systemic risk. Firms struggled to maintain consistent  customer identification across their various lines of business and trading operations in order to  meet their Know Your Customer (KYC) and Anti-Money Laundering (AML) obligations. The  complexity of corporate structures makes it almost impossible to track fund flows, identify  suspicious patterns or connect subsidiaries and affiliates across jurisdictions. 

Client 360 

The evolution from corporate entity identification to individual customer identification has been  a natural progression in financial services as well as for many other industry sectors. The rise of  the “customer 360” (better termed “client 360”) approach represents the goal of creating a  complete and unified view of each customer across all business touchpoints. With  fragmentation, however, an individual might have multiple accounts across different product  lines using a host of name variations that result in missed opportunities for cross-selling or  blindness in terms of relationship management.

“Customer” is a trigger word and has been one of the top data management challenges  for companies since the beginning. The plethora of internal battles led to a simple  conclusion – stop trying to harmonize the descriptors. There is no single view of  customer. Every stakeholder’s definition is valid, just not the same. Focus on meaning,  not words – make every person and organization the company touches an “entity” and  assign to every entity a “role” (often multiple roles). Simple and elegant.
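A hypothetical Turtle sketch of that entity-and-role pattern follows; the class and property names are invented for illustration (gist defines its own vocabulary for parties and roles).

  @prefix ex: <https://example.com/client360/> .

  # One entity the firm touches in several capacities, each captured as a role.
  ex:JaneSmith  a  ex:Person ;
      ex:playsRole  ex:JaneSmith-retailCustomerRole ,
                    ex:JaneSmith-loanGuarantorRole .

  ex:JaneSmith-retailCustomerRole  a  ex:CustomerRole ;
      ex:rolePlayedFor  ex:RetailBankingDivision .

  ex:JaneSmith-loanGuarantorRole  a  ex:GuarantorRole ;
      ex:rolePlayedFor  ex:CommercialLendingDivision .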

Root of the Problem 

All three of these challenges – legal entity identification, KYC/AML and client 360 – stem from the same  root problem: the inability to create consistent identity and meaning across various systems and  databases. Each system speaks its own language and uses proprietary identifiers that become  semantically incompatible data silos. Cross-border complications magnify the problem. These  silos often number hundreds or thousands across large firms. Conventional approaches  (deduplication, centralization, cross-referencing) have proven themselves to be unreliable.  

This entity resolution challenge makes integration across sources extremely difficult and  hampers understanding the relationship between clients, products, interactions, obligations and  transactions. As a result, teams remodel the same entities in different systems. That makes it hard to reconcile. And within these divergent models, programmers use different terms for the  same concept, use the same terms for different concepts or ignore important nuances altogether, making collaboration harder. These discrepancies and broken references are hard to  detect across repositories. And while foreign keys and joins exist, they are often inconsistently  modeled and poorly documented – requiring manual reconciliation by domain experts to find  and fix data quality issues. The lack of entity (and meaning) resolution is risky, costly and totally  unnecessary. 

Semantic Standards as the Foundation 

By addressing the challenges of entity and meaning resolution, organizations can aggregate all  client data into a single, unified view. The most efficient and effective way to accomplish this is to  put the data and the model at the center of the system. This is what we advocate as data-centric architecture – leveraging semantic standards and graph technology to ensure that applications  conform to the data, not the other way around. Semantic interoperability is the key. 

In a data-centric environment, we assign a unique identifier to every data concept. This enables  firms to link data wherever it resides to one master ID – eliminating the need to continually move  and map data across the enterprise. Rather than each system having its own definition of  “customer,” “legal entity,” or “beneficial owner,” semantic standards ensure a shared  understanding of requirements between business stakeholders and application developers.  
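For example, the records that two systems hold about the same legal entity can be tied to one master identifier instead of being copied and re-mapped; a minimal sketch with invented identifiers:

  @prefix ex:  <https://example.com/entity/> .
  @prefix owl: <http://www.w3.org/2002/07/owl#> .

  # One master identifier for the legal entity ...
  ex:AcmeHoldingsLLC  a  ex:LegalEntity .

  # ... and the records each source system keeps are declared to denote it,
  # so data can stay where it is and still resolve to a single entity.
  <https://crm.example.com/account/88231>         owl:sameAs  ex:AcmeHoldingsLLC .
  <https://tradebooking.example.com/party/AX009>  owl:sameAs  ex:AcmeHoldingsLLC .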

As a result, systems can automatically understand and translate between different formats because everyone uses the same definitions for business concepts. When a new entity is created, the systems understand its place in the corporate hierarchy without additional mapping. Data and application models can be catalogued and mapped to ensure that users can find where a business concept resides. Instead of pulling data from multiple systems, the data-centric approach maintains a semantic model of each customer and their relationships. New data is automatically integrated based on semantic understanding rather than manual ELT processes. This data-centric approach becomes the foundational infrastructure for achieving Client 360.

Client 360 Maturity Cycle 

Many in the financial services industry are already moving toward this vision, with leading  institutions implementing knowledge graph solutions built on semantic standards. The  migration from solving entity identification problems to enabling Client 360 represents more  than technological evolution – it’s a fundamental shift toward semantic-first data architecture (without the rip and replace of traditional methods). Below is a three-level maturity guideline to  help you implement a common language across your organization … 

1. Maturity Level 1: Demonstration of Capability – This involves working with your SMEs to verify business requirements and build the business capability model. The goal of this maturity level is to integrate at least two of your client-related datasets into a single model based on the client 360 ontology. The team will write and execute scripts to transform the data and test it for logic and reasoning validity. The result will be a core knowledge graph that enables key stakeholders to understand the query and analytical capabilities of the data-centric approach by looking at their own data.

2. Maturity Level 2: Expanded Capability – This level focuses on harvesting additional customer datasets related to client 360. The goal is to link use cases (e.g., KYC, risk exposure analysis, CCAR, Basel III, FRTB, BCBS 239, Rule 4210, lineage traceability, cost of service, customer classification, profitability analysis, issue management) based on your internal priorities. The result will be an expanded domain ontology required to implement entity resolution and ensure conformance of the data to internal data service agreements (DSAs) and service level agreements (SLAs).

3. Maturity Level 3 – Semantic Operations and GUI – Install a licensed, production-ready  triplestore. You will rewrite RDF transformation scripts for your internal environment and  set up data transformation workflows. This includes implementing change management  approval processes, automated quality testing and entitlement controls. This should  include training in expanding analytical and reporting capabilities as well as  implementation of graphical user interfaces.

A Final Word 

Starting your data-centric journey with legal entities and individuals is strategically sound as well  as wise data policy. These “entities” represent the core actors in almost every business process.  They are the primary subjects that most other data points relate to – they drive transactions, sign  contracts, participate in supply chains and have a wide variety of relationships with your  organization. We have learned that by adopting semantic standards for these foundational  elements, you create a stable baseline upon which all other data relationships can be built. 

By virtue of their centricity, restructuring your data environment as a connected infrastructure for organizations and people delivers immediate and tangible value across most departments in terms of relationship management, reduced data duplication and enhanced regulatory compliance. Adopting data-centric standards for clients and legal entities is the first step in unraveling the critical connections that translate into better risk assessments and opportunity identification. Client 360 represents the path of least resistance – and one that delivers maximum initial impact for your organization.

Zero Copy Integration and Radical Simplification

Dave McComb’s book Software Wasteland underscored a fundamental problem: enterprise software sometimes costs 1,000 times more than it ought to. The poster child for cost overruns highlighted in the book was Healthcare.gov, the public registration system for the US Affordable Care Act, enacted in 2010. By 2018, the US Federal government had spent $2.1 billion to build and implement the system. Most of that money was wasted. The government ended up adopting many of the design principles embodied in an equivalent system called HealthSherpa, which cost $1 million to build and implement.

In an era where the data-centric architecture Semantic Arts advocates should be the  norm, application-centric architecture still predominates. But data-centric architecture doesn’t just reduce the cost of applications. It also attacks the data duplication problem attributable to  poor software design. This article explores how expensive data duplication has become, and  how data-centric, zero-copy integration can put enterprises on a course to simplification. 

Data sprawl and storage volumes  

In 2021, Seagate became the first company to ship three zettabytes’ worth of hard disks. It took them 36 years to ship the first zettabyte, six years to ship the second, and only one additional year to ship the third.

The company’s first product, the ST-506, was released in 1980. The ST-506 hard disk, when formatted, stored five megabytes (1000² bytes per megabyte). By comparison, an IBM RAMAC 305, introduced in 1956, stored five to ten megabytes. The RAMAC 305 weighed 10 US tons (the equivalent of nine metric tonnes). By contrast, the Seagate ST-506, 24 years later, weighed five US pounds (or 2.27 kilograms).

A zettabyte is the equivalent of 7.3 trillion MP3 files or 30 billion 4K movies, according to  Seagate. When considering zettabytes: 

  • 1 zettabyte equals 1,000 exabytes. 
  • 1 exabyte equals 1,000 petabytes. 
  • 1 petabyte equals 1,000 terabytes. 

IDC predicts that the world will generate 178 zettabytes of data by 2025. At that pace, “The  Yottabyte Era” would succeed The Zettabyte Era by 2030, if not earlier. 

The cost of copying  

The question becomes, how much of the data generated will be “disposable” or unnecessary data? In other words, how much data do we actually need to generate, and how much do we really need to store? Aren’t we wasting energy and other resources by storing more than we need to?

Let’s put it this way: If we didn’t have to duplicate any data whatsoever, the world would only have to generate 11 percent of the data it currently does. In 2021 terms, we’d only need  to generate 8.7 zettabytes of data, compared with the 78 zettabytes we actually generated worldwide over the course of that year. 

Moreover, Statista estimates that the ratio of unique to replicated data stored worldwide will decline to 1:10 from 1:9 by 2024. In other words, the trend is  toward more duplication, rather than less. 

The cost of storing oodles of data is substantial. Computer hardware guru Nick  Evanson, quoted by Gerry McGovern in CMSwire, estimated in 2020 that storing two  yottabytes would cost $58 trillion. If the cost per byte stored stayed constant, 40 percent of the world’s economic output would be consumed in 2035 by just storing data. 

Clearly, we should be incentivizing what graph platform Cinchy calls “zero-copy  integration”–a way of radically reducing unnecessary data duplication. The one thing we don’t  have is “zero-cost” storage. But first, let’s finish the cost story. More on the solution side and zero-copy integration later. 

The cost of training and inferencing large language models  

Model development and usage expenses are just as concerning. The cost of training  machines to learn with the help of curated datasets is one thing, but the cost of inferencing–the  use of the resulting model to make predictions using live data–is another. 

“Machine learning is on track to consume all the energy being supplied, a model that is costly, inefficient, and unsustainable,” Brian Bailey pointed out in Semiconductor Engineering in 2022. AI model training expense has increased with the size of the datasets used, but more importantly, as the number of parameters increases by a factor of four, the amount of energy consumed in the process increases by 18,000 times. Some AI models included as many as 150 billion parameters in 2022. The more recent ChatGPT LLM training involved 180 billion parameters. Training can often be a continuous activity to keep models up to date.

But the applied-model aspect of inferencing can be enormously costly. Consider the AI functions in self-driving cars, for example. Major car makers sell millions of cars a year, and each one they sell is utilizing the same carmaker’s model in a unique way. Some 70 percent of the energy consumed in self-driving car applications could be due to inference, says Godwin Maben, a scientist at electronic design automation (EDA) provider Synopsys.

Data Quality by Design  

Transfer learning is a machine learning term that refers to how machines can be taught  to generalize better. It’s a form of knowledge transfer. Semantic knowledge graphs can be a  valuable means of knowledge transfer because they describe contexts and causality well with  the help of relationships.

Well-described knowledge graphs provide the context in contextual computing.  Contextual computing, according to the US Defense Advanced Research Projects Agency  (DARPA), is essential to artificial general intelligence. 

A substantial percentage of the training set data used in large language models is more or less duplicate data, precisely because poorly described context leads to a lack of generalization ability. That is one reason the only AI we have is narrow AI, and one reason large language models are so inefficient.

But what about the storage cost problem associated with data duplication? Knowledge graphs can help with that problem also, by serving as a means for logic sharing. As Dave has  pointed out, knowledge graphs facilitate model-driven development when applications are  written to use the description or relationship logic the graph describes. Ontologies provide the logical connections that allow reuse and thereby reduce the need for duplication. 

FAIR data and Zero-Copy Integration  

How do you get others who are concerned about data duplication on board with semantics and knowledge graphs? By encouraging data and coding discipline that’s guided by FAIR principles. As Dave pointed out in a December 2022 blog post, semantic graphs and FAIR principles go hand in hand: https://www.semanticarts.com/the-data-centric-revolution-detour-shortcut-to-fair/

Adhering to the FAIR principles, formulated by a group of scientists in 2016, promotes  reusability by “enhancing the ability of machines to automatically find and use the data, in  addition to supporting its reuse by individuals.” When it comes to data, FAIR stands for Findable, Accessible, Interoperable, and Reusable. FAIR data is easily found, easily shared,  easily reused quality data, in other words. 

FAIR data implies the data quality needed to do zero-copy integration. 

Bottom line: When companies move to contextual computing by using knowledge  graphs to create FAIR data and do model-driven development, it’s a win-win. More reusable  data and logic means less duplication, less energy, less labor waste, and lower cost. The term  “zero-copy integration” underscores those benefits.

Data-Centric Credentialling

In order to ensure that clients can get what they expect when they buy software or services that purport to be “data-centric” we are going to implement a credentialling program. The program will be available at three levels. 

Implementation Awards 

These are assessments and awards given to clients for projects or enterprises to recognize the milestones on their journey to becoming completely data-centric.  

It is a long journey. There is great benefit along the way, and these awards are meant to recognize progress on that journey.

Software Certification 

The second area is certifying that software meets the goals of the data-centric approach. There will be two major categories:

  • Middleware – databases, messaging systems, and non-application-specific tools that might be used in a data-centric implementation will be evaluated on their consistency with the approach.
  • Applications – as described in the book “Real Time Financial Accounting, the Data Centric Way,” we expect vertical industry applications to be far easier to make consistent with the data-centric approach. Horizontal applications will be evaluated based on their ease of being truly integrated with the rest of a data-centric enterprise. Adhering to open models and avoiding proprietary structures will also improve the rating in this area.

Professional Services  

There will be two levels of professional services credentialling, one based on what you know and the other on what you’ve done.  

The “what you know” credential will be based on studying and testing, akin to the certifications of the Project Management Institute or the Data Management DMBOK.

The “what you’ve done” recognizes that a great deal of the ability to deliver these types of projects is based on field experience. 

Amgen: Data Centric Architecture

Amgen is a large biotechnology company committed to unlocking the potential of biology for patients suffering from serious illnesses by discovering, developing, manufacturing and delivering innovative human therapeutics. Amgen CEO Bob Bradway focuses on innovation to set the cultural direction. According to Bradway: “Push the boundaries of biotechnology and knowledge to be part of the process of changing the practice of medicine.”

Amgen’s goal is to provide life-changing value to patients with expediency.  Democratized access to enterprise data speeds the process from drug discovery to drug delivery. One element Amgen’s strategic data leadership agreed upon is that a common language expedites product development by removing ambiguities that slow business processes.  

Data capture comes from a multitude of information systems, each using their own data model and unique vocabularies. Different systems use different terminology to refer to the same concept. An organization steeped in data silos no longer works. The challenge is to provide a common intuitive model for all systems and people to use. Once such a model is in place, it is no longer laborious and expensive for enterprise consumers to benefit from the data. A decision to establish a semantic layer for building an enterprise data fabric emerged.  

Amgen developed a vision of a Data-Centric Architecture (DCA) that transforms data from being system-specific to being universally available. Data is organized and unambiguously represented in data domains within a Semantic layer. 

Broadridge: Legacy Understanding

This firm processes some 70% of all the back-office activity on Wall Street. They have three major systems for different types of financial instruments and jurisdictions. The three systems are barely integrated. Bringing a client up on any of their systems is a multiyear endeavor. Getting combined reporting from these three systems is nearly impossible.

They have embarked on a major initiative to create a path toward integration, initially integrating the systems they have, ultimately delivering a fully integrated system. 

One of the barriers is the complexity of the existing systems. They are massively complex and built on completely different architectures. 

One of the systems was designed with an extremely table-driven design. While this made it  very flexible, it also created performance problems as well as understanding problems.  There are only two people in the world who understand all the intricacies of the system. 

We built an ontology of the functions that the system covered. We then took all the metadata in the tables that drive the processing and loaded them into a triple store. We  constructed a series of SPARQL queries that allowed relatively new personnel to pose and  get answers to complex questions regarding how the existing system works. This has become a key input into their project to understand and integrate their systems to create legacy understanding.
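As a hypothetical illustration of the approach (not the actual Broadridge model), a row from one of the driving tables might be lifted into the triple store along these lines, after which questions about how the system behaves become graph queries:

  @prefix ex: <https://example.com/legacy/> .

  # Metadata lifted from a configuration table that drives the processing.
  ex:rule-1041  a  ex:ProcessingRule ;
      ex:configuredInTable    ex:TBL_EVENT_RULES ;
      ex:appliesToInstrument  ex:CorporateBond ;
      ex:triggeredByEvent     ex:CouponPayment ;
      ex:implementsFunction   ex:IncomeAccrual .

  # A question such as "which rules fire on coupon payments?" then becomes
  # a straightforward SPARQL query over this metadata.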

Contact Us: 

Overcome integration debt with proven semantic solutions. 

Contact Semantic Arts, the experts in data-centric transformation, today! 

CONTACT US HERE 

Address: Semantic Arts, Inc. 

123 N College Avenue Suite 218 

Fort Collins, CO 80524 

Email: [email protected] 

Phone: (970) 490-2224

Teacher Retirement System: Enterprise Architecture

The Teacher Retirement System is one of the largest pension funds in the country, with 1 million active teachers and 250,000 retirees. They run the organization on a series of aging mainframe systems. In the late 1990s, they attempted a major upgrade to their technology, but finally had to abandon that path. Since then, they have mostly been front-ending their existing systems with newer proxy systems to deal with web-based clients and the like.

We worked with them to design a future enterprise architecture, featuring SOA and ontology-driven messages. They have begun work on some of the early projects in the plan.


gistBFO: An Open-Source, BFO Compatible Version of gist

Dylan Abney (a), Katherine Studzinski (a), Giacomo De Colle (b, c), Finn Wilson (b, c), Federico Donato (b, c), and John Beverley (b, c)

(a) Semantic Arts, Inc.
(b) University at Buffalo
(c) National Center for Ontological Research

ORCiD iDs: Dylan Abney https://orcid.org/0009-0005-4832-2900, Katherine Studzinski https://orcid.org/0009-0001-3933-0643, Giacomo De Colle https://orcid.org/0000-0002-3600-6506, Finn Wilson https://orcid.org/0009-0002-7282-0836, Federico Donato https://orcid.org/0009-0001-6600-240X, John Beverley https://orcid.org/0000-0002-1118-1738

Abstract. gist is an open-source, business-focused ontology actively developed by Semantic Arts. Its lightweight design and use of everyday terminology have made it a useful tool for kickstarting domain ontology development in a range of areas including finance, government, and pharmaceuticals. The Basic Formal Ontology (BFO) is an ISO/IEC standard upper ontology that has similarly found practical application across a variety of domains, especially biomedicine and defense. Given its demonstrated utility, BFO was recently adopted as a baseline standard in the U.S. Department of Defense and Intelligence Community.

 Because BFO sits at a higher level of abstraction than gist, we see an opportunity  to align gist with BFO and get the benefits of both: one can kickstart domain  ontology development with gist, all the while maintaining an alignment with the  BFO standard. This paper presents such an alignment, which consists primarily of subclass relations from gist classes to BFO classes and includes some subproperty axioms. The union of gist, BFO, and this alignment is what we call “gistBFO.” The upshot is that one can model instance data using gist and then instances of gist classes can be mapped to BFO. This not only achieves compliance with the BFO  standard; it also enables interoperability with other domains already modelled using  BFO. We describe a methodology for aligning gist and BFO, provide rationale for decisions we made about mappings, and detail a vision for future development. 

Keywords. Ontology, upper ontology, ontology alignment, gist, BFO 

1. Introduction 

In this paper, we present an alignment between two upper ontologies: gist and the Basic  Formal Ontology (BFO). While both are upper ontologies, gist and BFO exhibit rather different formal structures. An alignment between these ontologies allows users to get  the benefits of both. 

An ontology is a representational artifact which includes a hierarchy of classes of entities and logical relations between them [1, p.1]. Ontologies are increasingly being  used to integrate diverse sorts of data owing to their emphasis on representing implicit  semantics buried within and across data sets, in the form of classes and logical relations  among them [2]. Such formal representations facilitate semantic interoperability, where diverse data is connected by a common semantic layer. Ontologies have additionally  proven valuable for clarifying the meanings of terms [3] and supporting advanced  reasoning when combined with data, in the form of knowledge graphs [4]. 

The Basic Formal Ontology (BFO) is an upper-level ontology that is used by over  700 open-source ontologies [5]. It is designed to be very small, currently consisting only  of 36 classes, 40 object properties, and 602 axioms [6]. BFO satisfies the conditions for counting as a top-level ontology, described in ISO/IEC 21838-1:2021: it is “…created to represent the categories…shared across a maximally broad range of domains” [7].  ISO/IEC 21838-2:2021 establishes BFO as a top-level ontology standard [8]. The BFO  ecosystem adopts a hub-and-spokes strategy for ontology extensions, where classes in  BFO form a hub, and new subclasses of BFO classes are made as spokes branching out  from it. Interoperability between different ontologies can be preserved by linking up to  BFO as a common hub. All classes in BFO are subclasses of bfo:Entity2 [9], which  includes everything that has, does, or will exist. Within this scope, BFO draws a fundamental distinction with two classes: bfo:Continuant and bfo:Occurrent.  Roughly, a continuant is a thing that persists over some amount of time, whereas an  occurrent is something that happens over time [1]. A chef is an example of a continuant,  and an act of cooking is an example of an occurrent. 

gist is a business-focused upper-level ontology that has been developed over the last 15+ years and used in over 100 commercial implementations [10]. Ontology elements found in gist leverage everyday labels in the interest of facilitating stakeholder understanding to support rapid modeling. Much like BFO, gist contains a relatively small number of terms, relations, and formally specified axioms: it has 98 classes, 63 object properties, 50 datatype properties, and approximately 1400 axioms at the time of this writing. Approximately 20 classes are at the highest level of the gist class hierarchy. Subclasses are defined using a distinctionary pattern,3 which includes using a subclass axiom along with disjointness axioms and property restrictions to distinguish a class from its parents and siblings. gist favors property restrictions over domain and range axioms to maintain generality and avoid a proliferation of properties [12]. Commonly used top-level classes include gist:Commitment, gist:Event, gist:Organization, gist:PhysicalIdentifiableItem, and gist:Place.
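A hypothetical sketch of the distinctionary pattern in Turtle: a subclass axiom, a property restriction and a disjointness axiom that together distinguish a new class from its parent and siblings. The domain class names, the property and the prefix IRIs are shown only for illustration.

  @prefix ex:   <https://example.com/domain/> .
  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # A new domain class distinguished from its parent by a property restriction ...
  ex:PurchaseCommitment  rdfs:subClassOf  gist:Commitment ,
      [ a  owl:Restriction ;
        owl:onProperty      ex:isAbout ;          # hypothetical property
        owl:someValuesFrom  ex:PurchaseOrder ] ;
      # ... and from a sibling class by an explicit disjointness axiom.
      owl:disjointWith  ex:EmploymentCommitment .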

Ontology alignments in general are useful because they allow interoperability between ontologies and consequently help prevent what has been called the ontology silo  problem, which arises when ontologies covering the same domain are constructed independently from one another, using differing syntax and semantics [13]. Ontologists  typically leverage the Resource Description Framework (RDF) and vocabularies extended from it, to maintain flexibility when storing data into graphs, which goes some  way to address silo problems. If, however, data is represented in RDF using different ontologies, enriched with different semantics, then ontology silo problems emerge.  Alignment between potential ontology silos can address this problem by allowing the data to be interpreted by each aligned ontology. 

Needless to say, given the respective scopes of gist and BFO, as well as their overlapping users and domains, we have identified them as ontology silos worth aligning. For current users of gist, alignment provides a way to leverage BFO without requiring any additional implementation. For new users of gist, it provides a pragmatic base for building domain ontologies. This is of particular importance as BFO was recently adopted as a baseline standard in the U.S. Department of Defense and Intelligence Community [14]. For stakeholders in both the open and closed space, the alignment proposed here will allow users to model a domain in gist and align with other ontologies in the BFO ecosystem, satisfying the requirements of leveraging an ISO standard. In the other direction, users of BFO will be able to leverage domain representations in gist, gaining insights into novel modeling patterns, potential gaps in the ecosystem, and avenues for future modeling research.

2 We adopt the convention of displaying class names in bold, prepended with a namespace identifier indicating provenance.
3 The distinctionary pattern outlined in [11] is like the Aristotelian approach described in [1].

2. Methodology 

In this section we discuss the process we used to build what we call “gistBFO,” an ontology containing a semantic alignment between gist and BFO. We started by creating an RDF turtle file that would eventually contain all the mappings, and then manually worked out the connections between gist and BFO starting from the upper-level classes of both ontologies. We specified that our new ontology imports both the gist and BFO  ontologies, complete with their respective terms and axioms. To make use of gistBFO, it  can be imported into a domain ontology that currently uses gist. 

Figure 1. gistBFO import hierarchy4 

2.1. Design principles 

To describe our methodology, it is helpful to distinguish between alignments and mappings [15]. By alignment we mean a set of assertions (or “triples”) of the form <s, p,  o> that relate the terms of one ontology to another. gistBFO contains such an alignment.  The individual assertions contained within an alignment are mappings.  

4 This diagram is adapted from a similar diagram in [10].

gist:Specification subClassOf bfo:GenericallyDependentContinuant5 (bfo:GDC, hereafter) is an example of one mapping in gistBFO’s alignment [16].

By way of evaluation, we have designed gistBFO to exhibit a number of important properties: consistency, coherence, conservativity, specificity, and faithfulness [17]. An ontology containing an alignment is consistent just in case its mappings and the component ontologies do not logically entail a contradiction. For example, if a set of assertions entails both that Bird is equivalent to NonBird and that NonBird is equivalent to the complement of Bird, then it is inconsistent. Relatedly, such an ontology is coherent just in case all of its classes are satisfiable. In designing an ontology, a common mistake is creating an unsatisfiable class – a class that cannot have members on pain of a contradiction.6 Suppose a class A is defined as a subclass of both B and the complement of B. Anything asserted as a member of A would be inferred to be a member of B and its complement, resulting in a contradiction. Note that the definition of A itself does not result in a logical inconsistency; it is only when an instance is asserted to be a member of A that a contradiction is generated.

Consistency and coherence apply to gistBFO as a whole (i.e., the union of gist, BFO,  and the alignment between them). The next several apply more specifically to the alignment. 

An alignment is conservative just in case it does not add any logical entailments  within the aligned ontologies.7 Trivially, gistBFO allows more to be inferred than either gist or BFO alone, since it combines the two ontologies and adds mapping assertions between them. However, it should not imply anything new within gist or BFO, which would effectively change the meanings of terms within the ontologies. For example, gist  never claims that gist:Content subClassOf gist:Event. If gistBFO were to imply this,  it would not only be ontologically suspect, but it would extend gist in a non-conservative  manner, effectively changing the meaning of gist:Content. Similarly, BFO never claims  that BFO:GDC subClassOf BFO:Process (again, for good reason); so if gistBFO were  to imply this, this too would make it a non-conservative extension, changing the content  of BFO itself. It is desirable for the alignment to be a conservative extension of gist and  BFO so that it is not changing the meaning of terms within gist or BFO. By the same  token, if gistBFO were to remove axioms from gist or BFO, it would need to be handled carefully so that it too preserves the spirit of the two ontologies. (More on this in Section  4.1.1.) Additionally, if gistBFO does not remove any axioms from gist or BFO, there is  no need to maintain separate artifacts with modified axioms. 

An alignment is specific to the extent that terms from the ontologies are related to the most specific terms possible. For example, one possible alignment between gist and BFO would contain mappings from each top-level gist class to bfo:Entity. While this would constitute a bona fide alignment by our characterization above, it is not an interesting or useful one. If it achieves BFO-compliance, it does so only in a trivial sense. For this reason, we aimed to be specific with our alignment and mapped gist classes to the lowest BFO classes that were appropriate.

5 Strictly speaking, the IRI for generically dependent continuant in BFO is obo:BFO_0000031, but we use bfo:GenericallyDependentContinuant (and bfo:GDC for short). The actual subclass relation used in the alignment is rdfs:subClassOf, but the namespace prefix is dropped for brevity.
6 In OWL, unsatisfiable classes are subclasses of owl:Nothing, the class containing no members. It is analogous to the set-theoretic notion of “the empty set.”
7 See [17, p. 3] for a more formal explanation of conservativity in the context of an alignment.

An alignment is faithful to the extent that it respects the intended meanings of the terms in each ontology. Intent is not always obvious, but it can often be gleaned from  formal definitions, informal definitions/annotations, and external sources. 

We aim in this work for gistBFO to exhibit the above properties. Note also that two ontologies are said to be synonymous just in case anything expressed in one ontology can be expressed in terms of the other (and vice versa) [18]. We do not attempt to establish synonymy with this alignment. First, for present purposes, our strategy is to model in gist and then move to BFO, not the other way around. Second, the alignment in its current form consists primarily of subclass assertions from gist classes to BFO classes. With an accurate subclassing bridge, instances modeled in gist achieve an initial level of BFO-compliance, since they can be inferred into BFO classes. A richer mapping might take an instance modeled in gist and translate it entirely into BFO, preserving as much meaning as possible. For example, something modeled as a gist:Event with gist:startDateTime and gist:endDateTime might be modeled as a bfo:Process related to a bfo:TemporalRegion. We gesture at some of these richer mappings in the Conclusion section; our ultimate plan is to investigate them in future work. So, while we do not attempt to establish synonymy between gist and BFO at present, we do aim to preserve as much meaning as possible in the current alignment. In that respect, our work provides a firm foundation for a richer, more complex semantic alignment between gist and BFO.

Given our aim of creating a BFO-compliant version of gist, we have created a consistent, coherent, conservative, specific, and faithful ontology. Since both gist and  BFO are represented in the OWL DL profile, consistency and coherence were established using HermiT, a DL reasoner [19, 20]. By running the reasoner, we were able to establish that no logical inconsistencies or unsatisfiable classes were generated. While it is  undecidable in OWL 2 DL whether an alignment is a conservative extension, one can  evaluate the approximate deductive difference by looking more specifically at the  subsumption relations that hold between named classes in gist or BFO.8 We checked, for example, that no new entailments between gist classes were introduced. Specificity and  faithfulness are not as easily measured, but we detail relevant design choices in the  Discussion section as justification for believing our alignment exhibits these properties  as well. 

2.2. Identifying the mappings 

The properties detailed in Section 2.1 give a sense of our methodological aims for gistBFO. Now we turn to our methods for creating the mappings within the alignment. In our initial development of the alignment, we leveraged the BFO Classifier [22].  Included in the BFO Classifier was a decision diagram that allowed us to take instances of gist classes, answer simple questions, and arrive at a highly plausible candidate superclass in BFO. For example, consider a blueprint for a home. In gist, a blueprint  would fall under gist:Specification. To see where a blueprint might fall in BFO, we  answered the following questions: 

8 The set of changed subsumption entailments from combining ontologies with mappings has been called the approximate deductive difference [17, p.3; 21].

Q: Does this entity persist in time or unfold in time? A: It persists. So, a blueprint is a bfo:Continuant.

Q: Is this entity a property of another entity, or does it depend on at least one other entity? A: Yes, a blueprint depends on another entity (e.g., a sheet of paper) to be represented.

Q: May the entity be copied between a number of bearers? A: Yes, a blueprint can be copied across multiple sheets of paper. So, a blueprint is a bfo:GDC.

Given that blueprints are members of gist:Specification and bfo:GDC (at least according to our answers above), bfo:GDC was considered a plausible candidate superclass for gist:Specification. And indeed, as we think about all the possible instances of gist:Specification, they all seem like they would fall under bfo:GDC.

Our alignment was not produced solely with the BFO Classifier. Our teams include lead developers, stakeholders, and users of both gist and BFO. Classification was refined through consensus-driven meetings, where the meanings of ontology elements in the respective structures were discussed, debated, and clarified. Thus, while the BFO Classifier tool provided a very helpful starting point for discussions of alignment, careful effort went into identifying and verifying that the gist and BFO mappings were as accurate as possible.

Tables 1 and 2 contain a non-exhaustive list of important classes and definitions from gist and BFO that we refer to throughout the paper. 

BFO Class: Elucidation/Definition

Continuant: An entity that persists, endures, or continues to exist through time while maintaining its identity.

Independent Continuant: A continuant which is such that there is no x such that it specifically depends on x and no y such that it generically depends on y.

Specifically Dependent Continuant: A continuant which is such that (i) there is some independent continuant x that is not a spatial region, and which (ii) specifically depends on x.

Generically Dependent Continuant: An entity that exists in virtue of the fact that there is at least one of what may be multiple copies.

Material Entity: An independent continuant that at all times at which it exists has some portion of matter as continuant part.

Immaterial Entity: An independent continuant which is such that there is no time t when it has a material entity as continuant part.

Object: A material entity which manifests causal unity and is of a type instances of which are maximal relative to the sort of causal unity manifested.

Occurrent: An entity that unfolds itself in time or is the start or end of such an entity or is a temporal or spatiotemporal region.

Process: An occurrent that has some temporal proper part and for some time has a material entity as participant.

Table 1. Selected BFO classes and definitions [6]

gist Class: Elucidation/Definition

Event: Something that occurs over a period of time, often characterized as an activity being carried out by some person, organization, or software application or brought about by natural forces.

Organization: A generic organization that can be formal or informal, legal or non-legal. It can have members, or not.

Building: A relatively permanent man-made structure situated on a plot of land, having a roof and walls, commonly used for dwelling, entertaining, or working.

Unit of Measure: A standard amount used to measure or specify things.

Physical Identifiable Item: A discrete physical object which, if subdivided, will result in parts that are distinguishable in nature from the whole and in general also from the other parts.

Specification: One or more characteristics that specify what it means to be a particular type of thing, such as a material, product, service or event. A specification is sufficiently precise to allow evaluating conformance to the specification.

Intention: Goal, desire, aspiration. This is the “teleological” aspect of the system that indicates things are done with a purpose.

Temporal Relation: A relationship existing for a period of time.

Category: A concept or label used to categorize other instances without specifying any formal semantics. Things that can be thought of as types are often categories.

Collection: A grouping of things.

Is Categorized By: Points to a taxonomy item or other less formally defined class.

Is Member Of: Relates a member individual to the thing, such as a collection or organization, that it is a member of.

Table 2. Selected gist classes and definitions [23]

3. Results 

The gistBFO alignment contains 43 logical axioms. Of these, 35 are subclass assertions relating gist classes to more general classes in BFO. All gist classes have a superclass in BFO.9 The remaining eight axioms are subproperty assertions. We focused on mapping key properties in gist (e.g., gist:isCategorizedBy and gist:isMemberOf) to BFO properties. While mapping gist properties to more specific properties in BFO does not serve the use case of starting with gist and inferring into BFO, it nevertheless provides a richer connection between the ontologies, which we view as a worthy goal.

In addition to these 43 logical axioms, gistBFO also contains annotations expressing  the rationale behind some of the mapping choices. We created an annotation property  gist:bfoMappingNote for this purpose. 
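A minimal sketch of what such an annotation might look like; the note text, and the choice to annotate the gist class directly rather than the mapping axiom, are illustrative assumptions rather than the actual contents of gistBFO.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # assumed gist namespace

gist:bfoMappingNote a owl:AnnotationProperty .

gist:Specification
    rdfs:subClassOf obo:BFO_0000031 ;   # bfo:GDC
    gist:bfoMappingNote "Specifications are copyable, information-like entities, so they are placed under generically dependent continuant."@en .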

At the highest level, almost all classes in gist fall under bfo:Continuant, since their instances are things that persist through time rather than unfold over time. Exceptions to this are instances falling under gist:Event and its subclasses, which (generally) fall under bfo:Occurrent.

Some of the gist subclasses of bfo:Continuant include gist:Collection, gist:PhysicallyIdentifiableItem, and gist:Content. Within BFO, continuants break down into bfo:IndependentContinuant (entities that bear properties), bfo:GDC (copyable patterns that are often about other entities), and bfo:SDC (properties borne by independent continuants). With respect to our alignment, subclasses of bfo:IndependentContinuant introduced by the mapping include gist:Building, gist:Component, and other material entities like gist:PhysicalSubstance.10 Subclasses of bfo:GDC include gist:Content, gist:Language, gist:Specification, gist:UnitOfMeasure, and

9 An exception is gist:Artifact, which, in addition to being difficult to place in BFO, is slated for removal from gist.
10 Best practice in BFO is to avoid mass terms [1], whereas gist:PhysicalSubstance is intentionally designed to represent them—e.g., a particular amount of sand. Regardless, this class of mass terms would map into a subclass of bfo:IndependentContinuant.

gist:Template—all things that can be copied across multiple bearers.11 One subclass of bfo:SDC is gist:TemporalRelation—a relational quality holding between multiple entities.
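
The class-level mappings just described can be summarized as plain subclass assertions. The sketch below uses the paper's readable bfo: shorthand (declared here with a placeholder prefix IRI) rather than the numeric obo: IRIs used in the released file (see footnote 5), and each gist class is shown against a BFO ancestor named in the text; the release may assert a more specific superclass.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bfo:  <https://example.org/bfo-readable/> .   # placeholder for the readable shorthand
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # assumed gist namespace

# Independent continuants
gist:Building          rdfs:subClassOf bfo:IndependentContinuant .
gist:Component         rdfs:subClassOf bfo:IndependentContinuant .
gist:PhysicalSubstance rdfs:subClassOf bfo:IndependentContinuant .

# Generically dependent continuants
gist:Content           rdfs:subClassOf bfo:GenericallyDependentContinuant .
gist:Language          rdfs:subClassOf bfo:GenericallyDependentContinuant .
gist:Specification     rdfs:subClassOf bfo:GenericallyDependentContinuant .
gist:UnitOfMeasure     rdfs:subClassOf bfo:GenericallyDependentContinuant .
gist:Template          rdfs:subClassOf bfo:GenericallyDependentContinuant .

# Specifically dependent continuants
gist:TemporalRelation  rdfs:subClassOf bfo:SpecificallyDependentContinuant .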

In most cases, the subclass assertions are simple in construction, relating a named  class in gist to a named class in BFO, for example, gist:Specification subClassOf  bfo:GDC. A more complex pattern involves the use of OWL property restrictions. For  example, gist:ControlledVocabulary was asserted to be a subclass of bfo:GDCs that  have some bfo:GDC as a continuant part. 

gist:ControlledVocabulary
    rdfs:subClassOf [
        a owl:Class ;
        owl:intersectionOf (
            # class = bfo:GDC
            obo:BFO_0000031
            [
                a owl:Restriction ;
                # property = bfo:hasContinuantPart
                owl:onProperty obo:BFO_0000178 ;
                # class = bfo:GDC
                owl:someValuesFrom obo:BFO_0000031
            ]
        )
    ] .

In other cases, we employed a union pattern—e.g., gist:Intention is a subclass of the  union of bfo:SDC and bfo:GDC. Had we chosen a single named superclass in BFO for  gist:Intention, it might have been bfo:Continuant. The union pattern, however, allows  our mapping to exhibit greater specificity, as discussed above.  

Figures 2 through 4 illustrate important subclass relationships between gist and BFO  classes: 

Figure 2. Continuants in gist 

11 Many of these can be understood as various sorts of ‘information’, which should be classified under  bfo:GDC. For example, units of measurement are standardized information which describe some magnitude  of quantity.

Figure 3. Independent and dependent continuants in gist 

Figure 4. gist:Event 

4. Discussion 

In this section we discuss in depth some specific mappings we made, focusing most  closely on some challenging cases. 

4.1.1. gist:Intention and gist:Specification 

One challenging case was gist:Intention and its subclass gist:Specification. The textual definition of gist:Intention suggests it is a mental state that is plausibly placed under bfo:SDC. That said, the textual definition of gist:Specification (think of a blueprint) suggests this class plausibly falls under bfo:GDC. Given that bfo:SDC and bfo:GDC are disjoint in BFO, this would result in a logical inconsistency. We thus appear to have encountered a genuine logical challenge to our mapping.

Exploring strategies for continuing our effort, we considered importing a “relaxed” version of BFO that drops the disjointness axiom between bfo:SDC and bfo:GDC. Arguably this option would respect the spirit of gist (by placing gist:Intention and gist:Specification in their true homes in BFO) while losing a bit of the spirit of BFO. While this may appear to be an unsatisfactory mapping strategy, we maintain that—if such relaxing of constraints is properly documented and tracked—there is considerable benefit in adopting it. Given two ontologies developed independently of one another, there are likely genuine semantic differences between them, differences that cannot be adequately addressed by simply adopting different labels. Clarifying, as much as possible, what those differences are can be incredibly valuable when representing data using each ontology structure. Put another way, if gist and BFO exhibited some one-to-one semantic mapping, so that everything in gist corresponds to something in BFO and vice versa, it would follow that the languages of gist and BFO were simply two different ways to talk about the same domain. We find this less interesting, to be candid, than formalizing the semantic overlap between these structures and noting precisely where they semantically differ. One way in which such differences might be recorded is by observing and documenting—as suggested in this option—where logical constraints such as disjointness might need to be relaxed in the alignment.

That stated, relaxing constraints should be the last, not the first, option pursued, since for the benefits highlighted above to manifest, it is incumbent on us to identify where exactly there is semantic alignment and to formalize this as clearly as possible. With that in mind, we pursue another option here, namely, to use a disjunctive definition for gist:Intention—asserting it to be a subclass of the union of bfo:GDC and bfo:SDC. While this disjunctive definition perhaps does not square perfectly with the text definition of gist:Intention, it does seem to be in the spirit of how gist:Intention is actually used—sometimes like a bfo:SDC (in the case of a gist:Function), sometimes like a bfo:GDC (in the case of a gist:Specification). This option does not require a modified version of BFO. It also aligns with our goal of exhibiting specificity in our mapping, since otherwise we would have been inclined to assert gist:Intention to be simply a subclass of bfo:Continuant.

gist:Intention
    rdfs:subClassOf [
        a owl:Class ;
        owl:unionOf (
            obo:BFO_0000020   # bfo:SDC
            obo:BFO_0000031   # bfo:GDC
        )
    ] .

This mapping arguably captures the spirit of both gist and BFO while remaining  conservative—i.e., it does not change any of the logical entailments within gist or BFO. 

4.1.2. gist:Organization 

gist:Organization was another interesting case. During the mapping we consulted the Common Core Ontologies (CCO), a suite of mid-level ontologies extended from BFO, for guidance, since it includes an organization class [24]. cco:Organization falls under bfo:ObjectAggregate. Arguably, however, organizations can be understood as something over and above the aggregate of their members, perhaps even persisting when there are no members. For this reason, we considered bfo:ImmaterialEntity and bfo:GDC as superclasses of gist:Organization. On the one hand, the challenge with asserting that gist:Organization is a subclass of bfo:ImmaterialEntity is that instances of the latter cannot have material parts, and yet organizations often do, namely, members. On the other hand, there is plausibly a sense in which organizations can be understood as, say, prescriptions or directions (bfo:GDC) for how members occupying positions in that organization should behave, whether or not there are ever actual members. The CCO characterization of organization does not seem to reflect this sense, given that it is defined in terms of members. It was thus important for our team to clarify which sense, if either or both, was best reflected in gist:Organization.

Ultimately, we opted for asserting bfo:ObjectAggregate as the superclass for gist:Organization, as the predominant sense in which the latter is to be understood concerns members of such entities. This is, importantly, not to say there are no genuine alternative senses of organization worth modeling in both gist and within the BFO ecosystem; rather, it is to say that, after reflection, the sense most clearly at play for gist:Organization involves membership. For some gist classes, annotations and examples made it clear that they belonged under a certain BFO class. In the case of gist:Organization, gist is arguably neutral with respect to a few candidate superclasses. Typically what is most important in an enterprise context is modeling organizational structure (with sub-organizations) and organization membership. Perhaps this alone does not require that gist:Organization be understood as a bfo:ObjectAggregate; nevertheless, practical considerations pointed in favor of it. Adopting this subclassing has the benefit of consistency with CCO (and a fortiori BFO) and allows for easy modeling of organization membership in terms of BFO.
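
A minimal sketch of the adopted mapping and its effect on instance data; the individuals are hypothetical, and the readable bfo:ObjectAggregate shorthand (with a placeholder prefix) stands in for the numeric OBO IRI used in the release.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bfo:  <https://example.org/bfo-readable/> .   # placeholder for the readable shorthand
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # assumed gist namespace
@prefix ex:   <https://example.org/> .

# Adopted mapping
gist:Organization rdfs:subClassOf bfo:ObjectAggregate .

# Hypothetical membership data modeled in gist ...
ex:_Organization_acme a gist:Organization .
ex:_Person_1 gist:isMemberOf ex:_Organization_acme .

# ... from which a reasoner infers:
#   ex:_Organization_acme a bfo:ObjectAggregate .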

4.1.3. gist:Event 

At a first pass, a natural superclass (or even equivalent class) for gist:Event is  bfo:Process. After all, ‘event’ is an alternative label for bfo:Process in BFO. Upon  further evaluation, it became clear that some instances of gist:Event would not be  instances of bfo:Process—namely, future events. In BFO, with its realist interpretation,  processes have happened if they are to be represented. It is in this way that BFO  differentiates how the world could be, e.g., this portion of sodium chloride could  dissolve, from how the world is, e.g., this portion of sodium chloride dissolves. Future  events can be modeled as specifications, ultimately falling under bfo:GDC. In contrast,  a subclass of gist:Event, namely gist:ScheduledEvent, includes within its scope events  that have not yet started. There is thus not a straightforward mapping between  bfo:Process and gist:Event. Following our more conservative strategy, however, the  identified discrepancy can be accommodated by asserting that gist:Event is a defined  subclass of the union of bfo:GDC and bfo:Process.12 In this respect, we are able to  represent instances of gist:Event that have started (as instances of bfo:Process) and  those that have not (as instances of bfo:GDC). 

gist:Event
    rdfs:subClassOf [
        a owl:Class ;
        owl:unionOf (
            obo:BFO_0000031   # bfo:GDC
            obo:BFO_0000015   # bfo:Process
        )
    ] .

4.1.4. gist:Category 

gist:Category is a commonly used class in gist. It allows one to categorize an entity  without introducing a new class into an ontology. It guards against the proliferation of  classes with little or no semantics; instead, such categories are treated as instances, which  

12 It is common in gist to model planned-event-turned-actual-events as single entities that persist through both stages. When a plan goes from being merely planned to actually starting, it can flip from a bfo:GDC to a bfo:Process. Events that have a gist:actualStartDateTime will be instances of bfo:Process, and the presence of this property could be used to automate the flip. Different subclasses of gist:Event will be handled differently—e.g., gist:HistoricalEvent is a subclass of bfo:Process that would not require the transition from bfo:GDC.

are related to entities by the predicate gist:isCategorizedBy. So, for example, one might have an assertion like ex:_Car_1 gist:isCategorizedBy ex:_TransmissionType_manual, where the object of this triple is an instance of ex:TransmissionType, which would be a subclass of gist:Category.

If one thinks of BFO as an ontology of particulars, and if instances of gist:Category are not particulars but instead types of things, then arguably gist:Category does not have  a home in BFO. 

Nevertheless, as a commonly-used class in gist, it is helpful to find a place for it in  BFO if possible. One option is bfo:SDC: indeed, there are some classes in CCO (e.g.,  cco:EyeColor) that seem like they could be subclasses of gist:Category. However,  instances of bfo:SDC (e.g., qualities and dispositions) are individuated by the things that  bear them (e.g., the eye color of a particular person), which does not seem to capture the  spirit of gist:Category. Ultimately, we opted for bfo:GDC as the superclass in part  because of the similarity of instances of gist:Category to information content entities in  CCO, which are bfo:GDCs. 
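
Putting the transmission example together with the adopted mapping gives a sketch like the following; the individuals are hypothetical, and the readable bfo: shorthand (placeholder prefix) again stands in for the numeric OBO IRI.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bfo:  <https://example.org/bfo-readable/> .   # placeholder for the readable shorthand
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # assumed gist namespace
@prefix ex:   <https://example.org/> .

# Adopted mapping: categories are generically dependent continuants.
gist:Category rdfs:subClassOf bfo:GenericallyDependentContinuant .

# The example from the text: a category is an instance, not a new class.
ex:TransmissionType rdfs:subClassOf gist:Category .
ex:_TransmissionType_manual a ex:TransmissionType .
ex:_Car_1 gist:isCategorizedBy ex:_TransmissionType_manual .

# A reasoner then infers:
#   ex:_TransmissionType_manual a gist:Category , bfo:GenericallyDependentContinuant .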

5. Conclusion 

5.1. Future work 

We have established a foundational mapping between gist and BFO. From this  foundation going forward we aim to improve gistBFO along multiple dimensions. The  first set of improvements relate to faithfulness. While we are confident in many of the  mappings we have made, we expect the alignment to become more and more accurate as  we continue development. In some cases, the intended meanings of concepts are obvious  from formal definitions and annotations. In other cases, intended meaning is best  understood by discussions about how the concepts are used in practice. As we continue  discussions with practitioners of gist and BFO, the alignment will continue to improve. 

Another aim related to faithfulness is to identify richer mappings. In its current form gistBFO allows instance data modeled under gist to be inferred into BFO superclasses. While this achieves an initial connection with BFO, a deeper mapping could take something modeled in gist and translate it to BFO. Revisiting the previous example, something modeled as a gist:Event with gist:startDateTime and gist:endDateTime might be modeled as a bfo:Process related to a bfo:TemporalRegion. Many of these types of modeling patterns can be gleaned from formal definitions and annotations, but they do not always tell the whole story. Again, this is a place where continued discussions with practitioners of both ontologies can help. From a practical perspective, more complex mappings like these could be developed using a rule language (e.g., Datalog or SWRL) or SPARQL INSERT queries.
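
To make the idea concrete, the sketch below shows hypothetical instance data in both forms. The individuals, the readable bfo: shorthand, and the use of a property named bfo:occupiesTemporalRegion are illustrative assumptions; in practice the translation would be produced by a rule or a SPARQL INSERT query rather than by the alignment's subclass axioms.

@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix bfo:  <https://example.org/bfo-readable/> .   # placeholder for the readable shorthand
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # assumed gist namespace
@prefix ex:   <https://example.org/> .

# As modeled in gist (hypothetical data):
ex:_Meeting_1 a gist:Event ;
    gist:startDateTime "2025-03-01T09:00:00Z"^^xsd:dateTime ;
    gist:endDateTime   "2025-03-01T10:00:00Z"^^xsd:dateTime .

# A richer translation into BFO might instead produce:
ex:_Meeting_1 a bfo:Process ;
    bfo:occupiesTemporalRegion ex:_TemporalRegion_1 .   # assumed relation name
ex:_TemporalRegion_1 a bfo:TemporalRegion .
# Attaching the concrete timestamps to the temporal region would require
# vocabulary beyond BFO itself.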

We have also considered alignment with the Common Core Ontologies (CCO). One of the challenges with this alignment is that gist and CCO sit at similar levels of abstraction. Indeed, gist and CCO even appear to share classes that exhibit overlapping semantics, e.g., language and organization. The similar level of abstraction creates a challenge because it is not always easy to determine which classes are more general than which. For example, are gist:Organization and cco:Organization equivalent, or is one a superclass of the other? Furthermore, because there are considerably more classes in CCO than BFO, preserving consistency with a growing set of alignment axioms becomes more of a concern. Despite the challenges, a mapping between gist and CCO would help with interoperability, and it is a topic we intend to pursue in the future to that end.

5.2. Final remarks 

We have presented an open-source alignment between gist and BFO. We described a  methodology for identifying mappings, provided rationale for the mappings we made,  and outlined a vision for future development. Our hope is that gistBFO can serve as a  practical tool, promoting easier domain ontology development and enabling  interoperability. 

Acknowledgements 

Thank you to Dave McComb for support at various stages of the gistBFO design process,  from big-picture discussions to input on specific mappings. Thanks also to Michael  Uschold and Ryan Hohimer for helpful discussions about gistBFO. 

References 

[1] Arp R, Smith B, Spear AD. Building Ontologies with Basic Formal Ontology. Cambridge, Massachusetts: The MIT Press; 2015. p. 220.

[2] Hoehndorf R, Schofield PN, Gkoutos GV. The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform. 2015 Nov;16(6):1069–80. doi:10.1093/bib/bbv011

[3] Neuhaus F, Hastings J. Ontology development is consensus creation, not (merely) representation. Applied Ontology. 2022;17(4):495–513. doi:10.3233/AO-220273

[4] Chen X, Jia S, Xiang Y. A review: Knowledge reasoning over knowledge graph. Expert Systems with Applications. 2020 Mar;141:112948. doi:10.1016/j.eswa.2019.112948

[5] Basic Formal Ontology Users [Internet]. Available from: https://basic-formal-ontology.org/users.html

[6] GitHub [Internet]. Basic Formal Ontology (BFO) Wiki – Home. Available from: https://github.com/BFO-ontology/BFO/wiki/Home

[7] ISO/IEC 21838-1:2021: Information technology — Top-level ontologies (TLO) Part 1: Requirements [Internet]. Available from: https://www.iso.org/standard/71954.html

[8] ISO/IEC 21838-2:2021: Information technology — Top-level ontologies (TLO) Part 2: Basic Formal Ontology (BFO) [Internet]. Available from: https://www.iso.org/standard/74572.html

[9] Otte J, Beverley J, Ruttenberg A. Basic Formal Ontology: Case Studies. Applied Ontology. 2021 Aug;17(1). doi:10.3233/AO-220262

[10] McComb D. A BFO-ready Version of gist [Internet]. Semantic Arts. Available from: https://www.semanticarts.com/wp-content/uploads/2025/01/20241024-BFO-and-gist-Article.pdf

[11] McComb D. The Distinctionary [Internet]. Semantic Arts; 2015 Feb. Available from: https://www.semanticarts.com/white-paper-the-distinctionary/

[12] Carey D. Avoiding Property Proliferation [Internet]. Semantic Arts. Available from: https://www.semanticarts.com/wp-content/uploads/2018/10/AvoidingPropertyProliferation012717.pdf

[13] Trojahn C, Vieira R, Schmidt D, Pease A, Guizzardi G. Foundational ontologies meet ontology matching: A survey. Semantic Web. 2022 Jan 1;13(4):685–704. doi:10.3233/SW-210447

[14] Gambini B. Intelligence Community adopt resource developed by UB ontologists [Internet]. News Center. 2024 [cited 2025 Mar 30]. Available from: https://www.buffalo.edu/news/releases/2024/02/department-of-defense-ontology.html

[15] Euzenat J, Shvaiko P. Ontology Matching. 2nd edition. Heidelberg: Springer; 2013. doi:10.1007/978-3-642-38721-0

[16] GitHub [Internet]. gistBFO. Available from: https://github.com/semanticarts/gistBFO

[17] Prudhomme T, De Colle G, Liebers A, Sculley A, Xie P "Karl", Cohen S, Beverley J. A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology. Sci Data. 2025 Feb 17;12(1):282. doi:10.1038/s41597-025-04580-1

[18] Aameri B, Grüninger M. A New Look at Ontology Correctness. Logical Formalizations of Commonsense Reasoning, Papers from the 2015 AAAI Spring Symposium; 2015. doi:10.1613/jair.5339

[19] Shearer R, Motik B, Horrocks I. HermiT: A highly-efficient OWL reasoner. OWLED; 2008. Available from: https://ceur-ws.org/Vol-432/owled2008eu_submission_12.pdf

[20] Glimm B, Horrocks I, Motik B, Stoilos G, Wang Z. HermiT: an OWL 2 reasoner. Journal of Automated Reasoning. 2014;53:245–269. doi:10.1007/s10817-014-9305-1

[21] Solimando A, Jiménez-Ruiz E, Guerrini G. Minimizing conservativity violations in ontology alignments: algorithms and evaluation. Knowl Inf Syst. 2017;51:775–819. doi:10.1007/s10115-016-0983-3

[22] Emeruem C, Keet CM, Khan ZC, Wang S. BFO Classifier: Aligning Domain Ontologies to BFO. 8th Joint Ontology Workshops; 2022.

[23] GitHub [Internet]. gist. Available from: https://github.com/semanticarts/gist

[24] Jensen M, De Colle G, Kindya S, More C, Cox AP, Beverley J. The Common Core Ontologies. 14th International Conference on Formal Ontology in Information Systems; 2024. doi:10.48550/arXiv.2404.17758