gist: Buckets, Buckets Everywhere: Who Knows What to Think

We humans are categorizing machines, which is to say, we like to create metaphorical buckets and put things inside. But there are different kinds of buckets, and different ways to model them in OWL and gist. The most common bucket represents a kind of thing, such as Person or Building. Things that go into those buckets are individuals of those kinds, e.g. Albert Einstein, or the particular office building you work in. We represent this kind of bucket as an owl:Class and we use rdf:type to put something into the bucket.

Another kind of bucket is a group of things, like a jury or a deck of cards, that are functionally connected in some way. The related things go into the bucket (the 12 members of a jury, or the 52 cards). gist has a special class, gist:Collection, for this kind of bucket. A specific bucket of this sort will be an instance of a subclass of gist:Collection; for example, OJ’s jury is an instance of the class Jury, a subclass of gist:Collection. We use gist:memberOf to put things into the bucket. Convince yourself that these buckets do not represent a kind of thing: a jury is a kind of thing, but a particular jury is not. We would use rdf:type to connect OJ’s jury to the owl:Class Jury, and gist:memberOf to connect the individual jurors to OJ’s jury.

A third kind of bucket is a tag, which represents a topic and is used to categorize individual items for the purpose of indexing a body of content. For example, the tag “Winter” might be used to index photographs, books, and/or YouTube videos. Any content item that depicts or relates to winter in some way should be categorized using this tag. In gist, we represent this in a way that is structurally the same as how we represent buckets of functionally connected items. The differences are that 1) the bucket is an instance of a subclass of gist:Category, rather than of gist:Collection, and 2) we put things into the bucket using gist:categorizedBy rather than gist:memberOf. The Winter tag is essentially a bucket containing all the things that have been indexed or categorized using that tag.

Below is a summary table showing these different kinds of buckets and how we represent them in OWL and gist.

| Kind of Bucket | Example | Representing the Bucket | Putting Something in the Bucket |
|---|---|---|---|
| Individual of a kind | John Doe is a Person | Instance of owl:Class | rdf:type |
| A bucket with functionally connected things inside | Sheila Woods is a member of OJ’s Jury | Instance of a subclass of gist:Collection | gist:memberOf |
| An index term for categorizing content | The book “Winter of Our Discontent” has Winter as one of its tags | Instance of a subclass of gist:Category | gist:categorizedBy |
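
To make the table concrete, here is a minimal Turtle sketch of the three patterns. The ex: namespace, the Tag class, and all of the instance IRIs are illustrative, and the gist names follow the usage in this article (check the current gist release for the exact namespace IRI and property spellings).

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix gist: <https://ontologies.semanticarts.com/gist/> .   # illustrative; use the IRI of your gist release
@prefix ex:   <https://example.com/> .                        # illustrative namespace

# 1. A kind of thing: an owl:Class, with rdf:type putting an individual into the bucket.
ex:Person      a owl:Class .
ex:JohnDoe     rdf:type ex:Person .

# 2. A group of functionally connected things: an instance of a subclass of
#    gist:Collection, with gist:memberOf putting the members into the bucket.
ex:Jury        a owl:Class ;
               rdfs:subClassOf gist:Collection .
ex:OJsJury     rdf:type ex:Jury .
ex:SheilaWoods gist:memberOf ex:OJsJury .

# 3. An index term: an instance of a subclass of gist:Category,
#    with gist:categorizedBy linking content items to the tag.
ex:Tag         a owl:Class ;
               rdfs:subClassOf gist:Category .
ex:Winter      rdf:type ex:Tag .
ex:WinterOfOurDiscontent gist:categorizedBy ex:Winter .
```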


Semantic Arts’ 25-Year History

Semantic Arts Enters Its 25th Year

According to the U.S. Bureau of Labor Statistics, 15,336 companies were founded in Colorado in the year 2000. By 2024, only 2,101 of those companies remained. While we can speculate endlessly about why just ~14% survived recessions, pandemics, and international conflicts, the impression is clear.

The organizations that endured, and ideally thrived, deserve recognition for their adaptability and resourcefulness. Seeing as we are entering our 25th anniversary year, this statistic is a big deal. Most companies do not make it that long.

70% of companies fail in their first 10 years.

Even the venerable S&P 500 companies have an average lifespan of 21 years.

Resilience is in Our DNA

So here we are at 25, just getting warmed up.

To make things even more interesting, it is cool to have spent 25 years in an industry that many people think is only a few years old. Many people have only recently stumbled into semantics based on open standards, knowledge graphs, and data-centric thinking, and are surprised to find a company that has been specializing in this since before Facebook was founded.

It hasn’t always been easy or a smooth ride, but we like to think longevity is in our DNA.

Keep reading for a look at three of the most important lessons we’ve learned, a brief tour of our biggest achievements over the past 25 years, and a glimpse of where we’re windsurfing to next. As a bonus for reading through the entirety of our history, we’ll give you the inside scoop on Dave McComb’s origin story leading up to the founding of Semantic Arts.

3 Lessons Learned Surviving 25 Years

You learn a few things after surviving and eventually thriving for 25 years.

After you learn them and then state them, they often sound obvious and trivial. The reality is that we had to learn them to get to where we are today. We hope they serve you as much as they have served us.

LESSON 1:

Becoming data-centric is more of a program than a project. It is more of a journey and process than a destination or product.

We’ve observed a consistent pattern among our clients: once they discover the data-centric approach, they want it immediately. But meaningful transformation requires rethinking deeply held beliefs and shedding long-standing stigmas. This paradigm shift challenges cultural norms, restructures how information assets are organized, and redefines how knowledge is shared (in more meaningful and context-rich ways).

We’ve also seen what happens when organizations resist the data-centric shift. Despite initial interest, they cling to legacy mindsets, siloed systems, and entrenched hierarchies. The transformation stalls because cultural resistance outweighs technical readiness. Information remains fragmented, knowledge-sharing stays shallow, and AI initiatives struggle to produce meaningful results, often reinforcing the very inefficiencies the organization hoped to overcome.

LESSON 2:

Successful data-centric transformations require you to simultaneously look at the big picture and the fine-grain details.

Through decades of execution (and refinement of that execution), we employ a “think big” (Enterprise) / “start small” (Business Domain) approach to implementing data-centric architecture. We advocate doing both the high-level and low-level models in tandem to ensure extendable and sustained success.

If you only start small (which every agile project advises), you end up recreating the very point solutions and silos you’re working to integrate. And only thinking big tends to build enterprise data models that do not get implemented (we know, because that’s where we started).

Doing both simultaneously affords two things that clients appreciate.

1. It demonstrates a solution to a choice problem set by leveraging real production data, in a way that a skeptic can understand.

2. It ensures future-proofing while avoiding vendor lock-in. After the first engagement with a client, each new project fits into the broader data-centric architecture and is pre-integrated. This work can later be re-used and leveraged to extend the ontological model.

LESSON 3:

To instill confidence, you need to prove value through a series of projects validating the utility of the data-centric paradigm.

Most of our clients re-engage us after the initial engagement to guide them through adoption. Generally, we extend the engagement by bringing our approach to more sub-domains. In parallel, we help the client think through the implementation details of the architecture by modeling the business via an ontology and contextually connecting information with a semantic knowledge graph.

Part of the magic of our modular approach to extending a knowledge graph is that each newly integrated subdomain expands the limitless applications of clean, well-structured, and verified data. The serendipitous generation of use cases can’t be planned (they are not always obvious), but it often creates opportunities that delight our clients and exceed their expectations.

Let’s take a text-guided tour of what led us to these conclusions, as well as the events that shaped our history.

A Historical Account of Semantic Arts

If we look at the official registration date with the Colorado Secretary of State, Semantic Arts was formed on August 14, 2000. However, reality is rarely as clear-cut as what’s captured on paper. In fact, we had already been operating loosely as Semantic Arts for several months prior.

Stick around, and we’ll take you through the journey, from August 2000 to the time of this writing, August 2025.

FOUNDING & EARLY EXPLORATION (2000)

  • In 2000, the idea of applying semantics to information systems was just beginning to gain traction, with emerging efforts like SHOE, DAML, and OIL.
  • Leaning into this promising field, the company was aptly named Semantic Arts and served as a vessel through which contracts flowed to the consultants, all of whom were subcontractors.
  • There was virtually no demand for semantic consulting, largely due to a lack of understanding of what “semantic” even meant, so Semantic Arts focused on delivering traditional IT consulting projects (such as feasibility studies and SOA message modeling), often embedding semantic models behind the scenes to build internal capabilities.

THE 1ST SEMANTIC WAVE NEVER CAME (2001–2002)

  • In 2001, the “Semantic Web” was formally introduced by Tim Berners-Lee, Jim Hendler, and Ora Lassila in Scientific American, and given Berners-Lee’s legacy as the inventor of the World Wide Web, excitement soared.
  • On the surface, it appeared that Semantic Arts was poised to ride what seemed to be the next monster wave, but the wave never came.
  • Despite the hype, potential clients remained unaware of or uninterested in semantics, and adoption stagnated.

BOOKS, CLIENTS, AND THE BIRTH OF gist (2002–2004) 

  • From 2002 to 2003, Dave McComb authored Semantics in Business Systems: The Savvy Manager’s Guide, while Semantic Arts primarily sustained itself through contracts with the State of Washington.
  • Behind the scenes, Semantic Arts developed semantic models for departments such as Labor & Industry and Transportation, and it was during the Department of Transportation project that gist, the open-source upper ontology, was born.
  • A small capital call in 2003 helped keep Semantic Arts viable, with Dave McComb becoming majority owner, and Simon Robe joining as the minority shareholder. 

EVANGELISM WITHOUT DEMAND (2005–2007) 

  • From 2005–2012, Semantic Arts produced the Semantic Technology Conference and simultaneously began teaching how to design and build business ontologies.
  • Despite the proactive outreach efforts, the market remained indifferent.
  • During this time, an ontology for Child Support Enforcement in Colorado was created, but  clients were still largely unreceptive to semantic technologies.

THE FIRST WAVE OF REAL DEMAND (2008–2011) 

  • In 2008, interest in semantics began to emerge with Sallie Mae being among the first to seek an ontology for a content management system.  
  • Semantic Arts advised the team to build a Student Loan Ontology instead, a decision that proved critical when legacy systems could not support a new loan type, marking the first real demonstration of the serendipitous power of semantics. 
  • Other clients soon followed: LexisNexis (their next-generation Advantage platform), Sentara (healthcare delivery), and Procter & Gamble (R&D and material safety).

FROM DESIGN TO IMPLEMENTATION (2012–2016) 

  • By 2012, Semantic Arts had matured into a premier ontology design firm; however,  increased efficiency meant projects became smaller, and few enterprises required more than one enterprise ontology. 
  • A pivotal change occurred when an intern transformed the internal timecard system into a graph-based model, which became the prototype for Semantic Arts’ first implementation project, partnering with Goldman Sachs to solve a “living will” regulatory challenge. 
  • This era saw deeper implementations, including a product catalog for Schneider Electric in partnership with Mphasis, and marked the period when Dave McComb eventually bought out Simon Robe to become the sole owner of Semantic Arts. 

SCALING THE DATA-CENTRIC MOVEMENT (2017–2019) 

  • By 2017, implementation projects had overtaken design as Semantic Arts’ core business, and feedback from those projects helped rapidly evolve gist, with clients including Broadridge, Dun & Bradstreet, Capital One, Discourse.ai (now TalkMap), Euromonitor, Standard & Poor’s, and Morgan Stanley. 
  • Dave McComb published Software Wasteland, followed by The Data-Centric Revolution,  both of which galvanized interest in reforming enterprise modeling. 
  • Up to this point, Semantic Arts was primarily composed of highly experienced senior ontologists and architects, but with the growth of implementation work, they developed repeatable methodologies and began hiring junior ontologists and developers to support delivery at scale. 

INSTITUTIONALIZING THE VISION (2020–2024) 

  • Around 2020, Semantic Arts realized that version 1.0 of the model-driven system was not going to satisfy increasing demands, so work began on a more ambitious version 2.0 (code-named Spark), a low-code, next-generation model-driven system.
  • In parallel, implementation work toward data-centric transformations continued at pace  with clients including Morgan Stanley, Standard & Poor’s, Amgen, the Center for Internet  Security, PricewaterhouseCoopers, Electronic Arts, PCCW (Hong Kong Telecom), Payzer,  Juniper Networks, Wolters Kluwer, and the Institute for Defense Analyses. 
  • At some point, Semantic Arts decided that the industry needed some companies that could become fully data-centric in a finite amount of time, which led to further self-experimentation and, in an unplanned way, to data-centric accounting and the book promoting it, Real-Time Financial Accounting: The Data-Centric Way, by Dave McComb and Cheryl Dunn, to be published in late 2025.

THE NEW SELF-GOVERNANCE OPERATING MODEL (2025) 

  • In 2025, Semantic Arts entered a new era of self-governance as ownership transferred to the Semantic Arts Trust, secured by a royalty agreement that ensures independence from market acquisition. 
  • The firm is now guided by a five-person Governance Committee, responsible for key deliberative functions such as budgeting, staffing levels, and strategic direction, alongside  a new President (Mark Wallace), who leads day-to-day strategic execution. 
  • One of the first key initiatives in transitioning to this self-governance model is to improve the discipline and repeatability of the marketing and sales functions, making the pipeline of new work more predictable. 

If you’re interested in learning more about why we transitioned into an employee-governed company, we’ll leave you in suspense just a little while longer. We’re currently writing a companion article to this one, where we’ll share more about the company’s secret sauce, cultural  DNA, and what makes Semantic Arts as unique and bespoke as the work we do for our clients. 

You can find more information on our about us page here:  https://www.semanticarts.com/about-us/

Looking towards the Future 

As we reflect on the last 25 years, we adjust our sails to ride the wind of our lessons into the next 25. We have a plan. It is not set in stone, but it is surprising how many things have remained constant over these last few decades, and we anticipate them staying constant into the future.

Most software companies operate on hockey-stick business plans that forecast explosive growth over the next few years. If you’re a software firm, that pace is both possible and desirable. But as a professional services firm, there is a natural limit to how fast we can, and should, grow. We’ve seen that natural growth limit in other professional services firms, and we’ve experienced it ourselves. We think the limit is around 25% per year. Under that number, culture and quality can still be maintained, even as a firm grows.

We’ve chosen the slightly more ambitious 26% per year as our target. 26% yearly growth is the rate at which a firm doubles in size every three years. We won’t always hit this exact target, but it is what we are aiming for. After all, the vast backlog of legacy applications, combined with the continuing accumulation of new legacy systems, suggests that we will have meaningful, high-impact work for far longer than 25 years.

If you’re a history buff, you might appreciate learning a thing or two about Dave McComb’s origin story. His professional background deeply shaped the DNA of Semantic Arts and continues to  influence how it functions today. 

Dave McComb’s Origin Story 

Since we’re reviewing Semantic Arts’ history in 25-year increments, we’ll do the same with Dave,  starting in 1975 and leading up to the founding of Semantic Arts. Like a skyscraper, an organization can only rise as high as its foundation is strong, and thanks to Dave’s remarkable background and expertise, Semantic Arts has been built into a truly exceptional organization. 

BREAKING INTO THE REAL WORLD (1975 – 1979) 

  • Dave started his career in software in 1975, teaching the class “The Computer in Business”  at Portland State University while getting his MBA.  
  • The same year, he got his first paid consulting gig, for an architectural firm (maybe that’s the source of his fascination with architectural firms); to computerize the results of some  sort of survey they had issued for a whopping $200 fixed price bid.  
  • He joined Arthur Andersen (the accounting firm) in their “Administrative Division,” which would become Andersen Consulting and eventually Accenture. 
  • After five years of building and implementing systems for inventory management, construction management, and payroll, he was made a manager and shipped off (in a plane) to Singapore.
  • After rescuing a plantation management system project that was going badly, he ended up in Papua New Guinea (no good deed goes unpunished). 

BUILDING AN ERP SYSTEM FROM SCRATCH (1980 – 1989) 

  • The island of Bougainville, Papua New Guinea, was home to what was, at the time, the world’s largest copper and gold mine.
  • Their systems were pathetic, so Dave launched a project to build an ERP system from the ground up (SAP R/2 did exist at the time but was not available on the ICL mainframes that ran the mine).
  • The plan was fairly audacious: to build out a multi-currency production planning, materials management, purchasing and payables system of some 600 screens and 600 reports with  25 people in two years. 
  • The success of that project was mostly due to partially automating the implementation of use cases.  

AI BEFORE IT WAS COOL (1990 – 1994) 

  • Around 1990, Dave returned to the U.S. and was tasked with delivering another custom ERP system, this time for a diatomaceous earth mine of similar size and scope to the previous mine in Papua New Guinea.
  • This project leveraged even more automation: 98% of the several million lines of code were generated (using artificial intelligence, in 1991).
  • Around this time, Dave started the consulting firm First Principles, Inc.  
  • One of the anchor clients was BSW, the architectural firm that designed all the Walmarts in  North America, and it was on this project, in 1992, that First Principles decided to apply semantics to the design of database systems. 

TURNING A CORNER AT THE END OF THE CENTURY (1995-1999) 

  • First Principles was rolled into Velocity Healthcare Informatics, a dot-com-era healthcare software company.
  • Velocity Healthcare Informatics built and patented the first fully model-driven application environment, where code was not generated, but behavior was expressed based on information in the model.
  • Alongside this new model-driven application, the nascent semantic methodology evolved and was grafted onto an Object-Oriented Database.  
  • Velocity Healthcare Informatics created a semantic model of healthcare of which the medical director of WebMD, after a multi-hour interrogation of his team in 1999, said, “I wish we had that when we started.”
  • Velocity Healthcare Informatics built several systems in this environment, including Patient  Centered Outcomes, Case Management, Clinical Trial Recruiting and Urology Office  Management.  
  • Towards the turn of the century, Velocity Healthcare Informatics was preparing for the road show to go public in March of 2000 when the dot com bubble burst.  
  • Velocity Healthcare Informatics imploded in such a way that its intellectual property could not be salvaged, and as a result, several of the employees jointly formed a new company in the late spring of 2000.

Semantic Arts, Inc. Celebrates its 25th Anniversary

Pioneering Data-Centric Transformations to Modernize IT Architecture, Advance Knowledge Systems, and Enable Foundational AI

CONTACT INFO:

Dave McComb

Phone: (970) 490-2224

Email: [email protected]

Website: https://www.semanticarts.com/

Fort Collins, Colorado – August 14, 2025:

According to the U.S. Bureau of Labor Statistics, 15,336 companies were founded in Colorado in the year 2000. By 2024, only 2,101 of those companies remained. While we can speculate endlessly about why just ~14% survived recessions, pandemics, and international conflicts, the impression is clear. Organizations that endured, and ideally thrived, deserve recognition for their adaptability and resourcefulness.

On August 14, Semantic Arts is celebrating its 25th year of leading the data-centric revolution. In those 25 years, we have made the long and sometimes treacherous journey from a one-person consultancy to a globally respected firm.

Throughout that time, Semantic Arts has guided organizations to unlock the power of semantic knowledge graphs to structure, manage, and scale their data. With over 100 successfully completed projects, we have refined our “Think Big, Start Small” approach, aligning strategic priorities with high-impact use cases where knowledge graphs create measurable value. As a result, we have come to specialize in end-to-end design and implementation of enterprise semantic knowledge graphs.

Company CEO Dave McComb remarked: “Our 25-year journey has proven that while technologies evolve, the core challenges persist. The vast backlog of legacy applications, and the continuing additions to that backlog, suggest that it will be far more than 25 years before we run out of work. Lucky for us, it also means we’re just getting started in bringing meaningful, semantic transformations.”

To put things into perspective, we have supported multinational organizations like Amgen, Broadridge, and Morgan Stanley to undergo their semantic transformation, through the adoption of taxonomies, ontologies, and semantic knowledge graphs. We’ve developed and continuously evolved gist, a foundational ontology built for practical enterprise use.

We are proud to faithfully serve organizations undergoing their data-centric transformations, in the same fashion that sherpas support and guide high-altitude climbers in the mountaineering world.

As a matter of fact, we’d like to extend an invitation for you, our dear reader, to sample our guidance in a 30-minute, no-strings-attached consultation. During this session, we’ll share how to avoid common pitfalls and reduce ongoing project risks. We guarantee it will improve your chances of launching successful pilot projects using taxonomies, ontologies, and knowledge graphs.

If you are interested in having a friendly chat, email us at [email protected] with a summary of your goals.

We’ll set up a time that works for you.

FOR MORE INFORMATION:

Contact JT Metcalf, Chief Administrative Officer at [email protected] or

call us at (970) 490-2224

Semantic Arts’ Secret Sauce

An organization founded by an individual or small group is deeply shaped by the priorities and capabilities of its founder(s), as well as the market and industry it enters. The baseline requirement for any startup is to identify and meet the needs of specific types of customers or clients in order to sustain and grow financially. How an organization chooses to do that, how it responds to feedback that validates or challenges its value proposition, and how it adapts over time could each be the subject of a thesis on its own.

To understand an organization, you might take what is written in a business plan at face value, consult the company charter to understand its aspirations, or evaluate performance through public-facing news, press releases, and financial reports to get a sense of its real-world impact. But we are not here to get lost in all that, or to bore you with the generic, hollow promises often engraved inside generic institutions.

In this paper, we want to turn our house into glass to share what makes this professional services firm so darn special to us and our clients. We will start by sharing some background on our newly minted employee-governance operating structure, why it matters for our culture and growth, the organizational foundations behind it, and a few internal practices that foster trust, knowledge sharing, and organizational alignment in our day-to-day work.

Our New Employee-Governance Model

As of 2025, Semantic Arts is an employee-governed company. It is not employee owned, as enlightened as that might sound, because employee ownership is still a risk factor for premature death. We have watched that situation unfold right here in our own backyard (Fort Collins, Colorado). The New Belgium Brewing Company, of the famous Fat Tire Ale, became employee owned in 2012. For several years it was one of the darlings of the ESOP movement. But by 2019 the employees succumbed to the same temptation that traditional owners face: the lure of the exit, when they sold to the Kirin company. Semantic Arts is now owned by a perpetual benefit trust. The company pays a small royalty to the trust in exchange for the privilege of continuing the execution of a company that has great people, a proven methodology, and an excellent reputation. As part of the transition to self-governance, Mark Wallace, a long-time senior consultant with the firm, stepped up to the role of President, while Dave transitioned to CEO.

Employee-Governance Supports Culture and Growth

There are some interesting and subtle things that our structure does to provide alignment.

The first, which has been in place for over a decade, helps align the incentives of the company with the incentives of the employee.

The second, which is just being implemented, more closely aligns the interests of our clients with the interests of Semantic Arts.

Employee and Semantic Arts Alignment

Most companies are, to some degree, at odds with their employees. If they can pay the employees less, they will make more profit. If they can outsource the work, it improves their margins. And when it comes to promotions, there are a limited number of boxes in the upper rungs of the org chart.

At Semantic Arts we have created an incentive pay system that aligns the company’s goals with the employees’. As employees become more skilled, we can charge more for those skills in the marketplace. The incentive system continually adjusts their pay without any negotiation. The company also makes more money from the upskilled employees. As a result, the company is continually investing in training and coaching. And while there are positions, there is no limit to how many people can be in any of them.

Client and Semantic Arts Alignment

In a comparable way, most consulting firms, really most types of firms, have some degree of inherent conflict with their clients. If they can charge a bit more for an engagement, it goes to the bottom line.

Because of our employee alignment, we often do something that other consulting firms would not think of doing, and if they did think of it, they would not do it. When we take on a fixed-price engagement and finish ahead of budget, most firms would pocket the difference; indeed, that is the reward for taking the risk. But because of our employee incentives, in those cases the employee did not have the opportunity to earn the full amount of the contract. As standard practice, we try to find additional work that we can do for the client, to produce meaningful results within the original budget.

Additionally, our new trust arrangement adds another level of alignment. Professional services firms make money off the difference between what they charge for their consultants and what they pay them. The owner of a professional services firm is incented to maximize this difference, not just to provide distributions to the owners, but also because it is the main factor in determining the ultimate sale price of the firm on exit.

Clients know this. They know most consultants’ list prices are inflated, and they negotiate aggressively accordingly. They believe the reduction in fees comes out of the margin. We have shared with our clients that it does not work that way here. At Semantic Arts, the reduction in fees comes out of the consultants’ pay (not immediately and directly, but it factors strongly). Now we have further increased our alignment. The perpetual trust cannot be sold. We do not inflate our rates to try to boost our profits. As a result, there is no incentive to increase the margin to increase the final sale value. And the trust receives its payment off the top line, not the bottom line, so the trust would rather see growth than profit.

Besides, there is no real mechanism for distributing profit. Any profit we gain through client engagements is retained to tide us over through lean times. From a practical point of view, the firm will operate like a non-profit. That is because we deeply value and protect the reputation we have built as honest brokers.

You might ask, “Why go through the trouble of creating a money management mechanism within the organization instead of simply maximizing profitability?” While most firms aim to maximize profits and then exit, our philosophy is a little different, as you will find reflected in our vision, mission, and values.

Semantic Arts Organizational Foundations


At Semantic Arts, the cultural line between management and employees is nearly invisible in day-to-day activities, thanks to the high degree of autonomy and transparency built into the organization’s operational framework.

Our Vision:

We want to be a part of a world where any employee of an organization has easy access to all the information they need to do their job, subject to authorization. Data is not siloed away in hundreds or thousands of application-specific systems but is presented in a data-centric fashion. We have built methodology, tools, and quality assurance processes to make sure we continue to deliver the highest possible quality outcome.

Our Mission:

Our motivation when we get out of bed is to transform as many organizations as possible.

Our Shared Values are:

  • Create value: All our projects should create value for our clients. We are not trying to set up situations where our clients pay us because we have coerced them or made them dependent on us.
  • One voice: We can disagree violently in the office but will present our best consensus when we are with the client.
  • Share equitably: We focus on ways to share the results of our efforts with clients, partners, and employees. Equitable sharing is based on contributions.
  • Lifetime learning: We value continual study and research to look for better ways, and to stay current in our discipline, rather than milking whatever expertise we have.
  • Show respect: While focusing on results, we remain mindful of the contribution, options, and rights of others. In all our dealings, we need to be humble.
  • Have fun: We are not here to suffer for the sake of the company. We can accomplish all we set out to, stay true to our values, have a good time, and take time to enjoy our lives.

6 Semantic Arts Organizational Practices

A critical component of our core competencies and strategic advantage in implementing data-centric transformations is rooted in the internal practices we have embedded into our culture. These practices enable us to blend the experience, skills, and strengths of our collective into a cohesive unit, allowing us to act like a stick of dynamite for our clients’ toughest problems.

  1. Weekly Staff Meetings: Twice a week, we hold company-wide staff meetings to report on active project work, discuss ways to deliver more value to our clients, and address any
    technical or operational challenges that arise. These meetings offer everyone an opportunity to engage meaningfully in current and future activities, while also serving as a vehicle for learning through osmosis.
  2. Knowledge Exchange: Once a week, we hold open Q&A sessions where employees are encouraged to share challenges they are facing or present new ideas. These sessions
    quickly become a rich forum for collective problem solving, offering a fast and effective way to get up to speed on a variety of issues, and the solutions that address them.
  3. gist Development Meeting: One of our most valuable internal resources and core competencies is the continuous use and refinement of gist, our open-source upper ontology, in client projects. Over the past decade, we’ve leveraged gist in more than 100 engagements to accelerate time-to-value and deliver impactful results. With each project, we have applied gist across diverse domains (such as pharma, manufacturing, cyber security, and finance), allowing us to iteratively refine our knowledge and approach. Each month, we actively evolve gist based on our collective learnings and community feedback.
  4. Friday Sessions: We reserve a weekly session on Friday for our consultants or select invited guests to deliver a prepared presentation on a topic of their choice and expertise.
    The goal is to infuse our team with insights on cutting-edge technologies, whether from technical vendors, consultants in complementary areas, or internal projects we want to share with the broader company.
  5. Heavy Investments into R&D: We have superb technical talent with deep expertise in ontology modeling, software engineering, and the end-to-end development of enterprise knowledge graphs, from design through to live production environments. This core differentiator of technical excellence is shared openly and cross-pollinated across a wide
    range of interests and domains. Individual curiosity, combined with the resolution of new or recurring technical challenges, results in reusable scripts, design patterns, operational strategies, and innovative ways to deliver greater value in less time.
  6. Drinking our Champagne: A natural consequence of our heavy investment in R&D activities and projects is that we have had the opportunity to make, and drink, our own champagne throughout the entire lifespan of Semantic Arts. Our focus on reducing the effort required to deliver value to clients, combined with a commitment to continuously elevating our practice, has fostered a culture of ongoing innovation, for our people, processes, and tooling.

The Secret Sauce

The people who make up this organization, our ongoing practices, and the incredibly rich and stimulating client work we engage in are part of what makes working at Semantic Arts feel like you have stepped back in time to the intellectually rigorous forums of ancient Greece.
Iron sharpens iron; that’s our way of life.

At Semantic Arts, you do not need to wear a toga to feel like a philosopher, nor a lab coat to feel like a research scientist. But you will need to get used to Hawaiian shirts on Fridays, and spending hours wrestling with deeply stimulating intellectual challenges.

Both our clients and our people are better because of them.

And if you want the inside scoop on some of the history and origin story of our founder, Dave McComb, as well as the top 3 lessons that have kept us surviving and thriving, visit our about us page here: https://www.semanticarts.com/about-us/.

Stay in Touch with Us

We have an open-door policy here at Semantic Arts, with our staff and clients.
If you’d like to stay in the loop or engage with us in a more formal discussion, we’ll always make time to talk; see below to find out how!

For general interest, we have a newsletter:

Sign up for the newsletter: https://lp.constantcontactpages.com/sl/gS1gQIf/newslettersignup

For practitioners and information seekers, we have community events:

Sign up to take part in the Estes Park Group, a non-commercial monthly review of topics around data-centric architecture: https://lp.constantcontactpages.com/sl/fe6usRp/EstesParkGroup

Sign up to take part in the gist Forum, a monthly discussion around gist related developments and presentations: https://lp.constantcontactpages.com/sl/e1DBewb/gistForum

For prospective employees, we have a recruitment process:

  1. Review the job description to make sure you are a fit: Ontologist Job Description
  2. Complete our job application found here: Semantic Arts Ontologist Application
  3. Email your resume to [email protected] (after completion of the application)

For clients, we have our contact information:

Contact JT Metcalf, Chief Administrative Officer, at [email protected] to provide us with context and more information around your priorities, roadmap, and any efforts your organization has already conducted. We’ll find a suitable time to have a conversation and shape a clear path forward.

Should you have any other inquiries, you can call us at (970) 490-2224

Attribution 4.0 International

Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses
does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.

Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC- licensed material, or material used under an exception or limitation to copyright. More considerations for licensors: wiki.creativecommons.org/Considerations_for_licensors

Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the
licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the
license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other
reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. More considerations for the public: wiki.creativecommons.org/Considerations_for_licensees

Creative Commons Attribution 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License (“Public License”). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.

Section 1 — Definitions.

a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.

b. Adapter’s License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.

c. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright
and Similar Rights.

d. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.

e. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.

f. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.

g. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your
use of the Licensed Material and that the Licensor has authority to license.

h. Licensor means the individual(s) or entity(ies) granting rights under this Public License.

i. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.

j. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.

k. You means the individual or entity exercising the Licensed Rights under this Public
License. Your has a corresponding meaning.

Section 2 — Scope.

a. License grant

  1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
    • a. reproduce and Share the Licensed Material, in whole or in part; and
    • b. produce, reproduce, and Share Adapted Material.
  2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
  3. Term. The term of this Public License is specified in Section 6(a).
  4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a) (4) never produces Adapted Material.
  5. Downstream recipients.
    • a. Offer from the Licensor — Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
    • b. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
  6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).

b. Other rights

  1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
  2. Patent and trademark rights are not licensed under this Public License.
  3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties.

Section 3 — License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. Attribution.

  1. If You Share the Licensed Material (including in modified form), You must:
    a. retain the following if it is supplied by the Licensor with the Licensed Material:
    • i. identification of the creator(s) of the Licensed Material and any others designated
      to receive attribution, in any reasonable manner requested by the Licensor (including by
      pseudonym if designated);
    • ii. a copyright notice;
    • iii. a notice that refers to this Public License;
    • iv. a notice that refers to the disclaimer of warranties;
    • v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

b. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

c. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

  1. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
  2. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
  3. If You Share Adapted Material You produce, the Adapter’s License You apply must not prevent recipients of the Adapted Material from complying with this Public License.

Section 4 — Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:

a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database;

b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and

c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights

Section 5 — Disclaimer of Warranties and Limitation of Liability.

a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.

Section 6 — Term and Termination.

a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.

b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:

1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or

2. upon express reinstatement by the Licensor.

For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.

c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.

d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.

Section 7 — Other Terms and Conditions.

a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.

b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.

Section 8 — Interpretation.

a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.

b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.

c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.

d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.

=======================================================================

Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at creativecommons.org/policies, Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.

Creative Commons may be contacted at creativecommons.org.

Building an Ontology with LLMs

We implement Enterprise Knowledge Graphs for our clients. One of the key skills in doing so is ontology modeling. One might think that with the onslaught of ChatGPT and the resulting death knell of professional services, we’d be worried. We’re not. We are using LLMs in our practice and finding ways to leverage them in what we do, but using them to design ontologies is not one of the use cases we’re leaning on.

A Financial Reporting Ontology

Last week Charlie Hoffman, an accomplished accountant and CPA, showed me the financial reporting ontology he had built with the help of an LLM. Like so many of us these days, he was surprised at the credible job it had done in so little time. It loaded into Protégé, and the reasoner ran successfully (there weren’t any real restrictions, so that isn’t too hard to pull off). It created a companion SHACL file. In the prompt, he asked it to base the ontology on gist, our upper ontology, and sure enough, there was a gist namespace (an old one, but still a correct one) with the requisite gist: prefix. It built a bunch of reasonable-sounding classes and properties in the gist namespace (technically, namespace squatting, but we haven’t gotten very far on ethical AI yet).

Now I look at this and think: while it is a clever trick, it would not have helped me build a financial reporting ontology at all (a task I have been working on in my spare time, so I would have welcomed the help if there were any). I would have tossed out every line; there wasn’t a single line in the file I would have kept.

One Click Ontology Building

But here’s where it gets interesting. A few months ago, at the KM World AI Conference, one of my fellow panelists, Dave Hannibal of Squirro, stated confidently that within a year there would be a one-click ontology builder. As I reflect on it, he was probably right. And I think there is a market for that. I overheard attendees saying, “even if the quality isn’t very good, it’s a starting point, and we need an ontology to get started.”

An old partner and mentor once told me, “Most people are better editors than authors.” What he meant was: give someone a blank sheet of paper and they struggle to get started, but give them a first draft and they tear into it.

The Zeitgeist

I think the emerging consensus out there is roughly as follows:

  • GraphRAG is vastly superior to prompt engineering or traditional RAG (it’s kind of hard for me to call something “traditional” that’s only a year old) in terms of reining in LLM errors and hallucinations.
  • In order to do GraphRAG, you need a Knowledge Graph, preferably a curated Enterprise Knowledge Graph.
  • A proper Enterprise Knowledge Graph has an Ontology at its core.
  • Ontology modeling skills are in short supply and therefore are a bit of a bottleneck to this whole operation.
  • Therefore, getting an LLM to create even a lousy ontology is a good starting point.

This seems to me to be the zeitgeist as it now exists. But I think the reasoning is flawed and it will lead most of its followers down the wrong path.

The flawed implicit assumption

You see, lurking behind the above train of thought is an assumption. That assumption is that we need to build a lot of ontologies. Every project needs an ontology.

There are already tens of thousands of open-source ontologies “out there” and unknowable multiples of that on internal enterprise projects. The zeitgeist seems to suggest that with the explosion of LLM powered projects we are going to need orders of magnitude more ontologies. Hundreds of thousands, maybe millions. And our only hope is automation.

The Coming Ontology Implosion

What we need are orders of magnitude fewer ontologies. You really see the superpowers of ontologies when you have the simplest possible expression of complex concepts in an enterprise. Small is beautiful. Simpler is better. Fewer is liberating.

I have nearly 1000 ontologies on our shared drive that I’ve scavenged over the years (kind of a hobby of mine). Other than gist, I’d say there are barely a handful that I would rate as “good.” Most range from distracting to actively getting in the way of getting something done. And this is the training set that LLMs went to ontology school on.

Now I don’t think the world has all the ontologies it needs yet. However, when the dust settles, we’ll be in a much better place the fewer and simpler the remaining ontologies are. Because what we’re trying to do is negotiate the meaning of our information, between ourselves and between our systems. Automating the generation of ontologies is going to slow progress down.

How Many Ontologies Do We Need?

Our work with a number of very large as well as medium-sized firms has convinced me that, at least for the next five years, every enterprise will need an Enterprise Ontology. As in one. This enterprise ontology, which some of our clients call their “core ontology,” is extended into their specific sub-domains.

But let’s look at some important numbers.

  • gist, our starter kit (which is free and freely available on our web site) has about 100 classes and almost that many properties, for a cognitive load of 200 concepts.
  • When we build enterprise ontologies, we often move many distinctions into taxonomies. This shifts a big part of the complexity of business information out of the structure (the ontology and the shapes derived from it) and into a much simpler structure that can be maintained by subject matter experts and has very little chance of disrupting anything based on the ontology. It is not unusual to have many thousands of distinctions in taxonomies, but that complexity does not leak into the structure or complexity of the model (see the sketch after this list).
  • When we work with clients to build their core ontology, we often double or triple the number of concepts that we started with in gist, to 400-600 total concepts. This gets the breadth and depth needed to provide what we call the scaffolding to include all the key concepts in their various lines of businesses and functions.
  • Each department often extends this further, but it continues to astound us how little extension is often needed to cover the requisite variety. We have yet to find a firm that really needs more than about 1000 concepts (classes and properties) to express the variety of information they are managing.
  • A well-designed Enterprise Ontology (a core and a series of well-managed extensions) will have far fewer concepts to master than even an average-sized enterprise application database schema. Orders of magnitude fewer concepts than a large packaged application, and many, many orders of magnitude fewer than the sum total of all the schemas that have been implemented.
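Here is a minimal Turtle sketch of what moving a distinction into a taxonomy looks like. gist:Category is the gist class referred to above; the subclass, the instances, and the exact name of the categorization predicate (shown here as gist:isCategorizedBy) are illustrative and should be checked against the gist release you are using.

  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <https://example.com/> .

  # One class in the ontology covers the whole distinction ...
  ex:OrderPriority rdfs:subClassOf gist:Category .

  # ... while the individual distinctions live in a taxonomy that subject
  # matter experts can maintain without touching the ontology.
  ex:_OrderPriority_expedited a ex:OrderPriority .
  ex:_OrderPriority_standard  a ex:OrderPriority .

  ex:_Order_123 gist:isCategorizedBy ex:_OrderPriority_expedited .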

We’re already seeing signs of a potential further decrease. Most firms in the same industry share about 70-80% of their core concepts, so industry ontologies will emerge. I mean useful ones: there are many industry ontologies out there already, but we haven’t found any useful ones yet. As they emerge, and as firms move to specializing a shared industry ontology, they will need even fewer new unique concepts.

What we need are a few thousand well-crafted concepts that information providers and consumers can agree on and leverage. We currently have millions of concepts in the many ontologies that are out there, and billions of concepts in the many database schemas that are out there.

We need a drastic reduction in quantity and a ramp-up in quality if we are to have any hope of reining in the complexity we have created. Using LLMs for ontology building is a major distraction from that goal. Let’s use LLMs instead for things they are good at, like extracting information from text, finding complex patterns in noise, and generating collateral content at wicked rates to improve the marketing department’s vanity metrics.

Data-Centric: How Big Things Get Done (in IT)

Dave McComb

I read “How Big Things Get Done” when it first came out about six months ago.[1] I liked it then. But recently, I read another review of it, and another coin dropped. I’ll let you know what the coin was toward the end of this article, but first I need to give you my own review of this highly recommended book.

The lead author, Bent Flyvbjerg, is a professor of “Economic Geography” (whatever that is) and has a great deal of experience with engineering and architecture. Early in his career, he was puzzling over why mass transit projects seemed routinely to go wildly over budget. He examined many in great detail; some of his stories border on the comical, except for the money and disappointment that each new round brought.

He was looking for patterns, for causes. He began building a database of projects. He started with a database of 178 mass transit projects, but gradually branched out.

It turns out there wasn’t anything especially unique about mass transit projects. Lots of large projects go wildly over budget and schedule, but the question was: Why?

It’s not all doom and gloom and naysaying. He has some inspirational chapters about the construction of the Empire State Building, the Hoover Dam, and the Guggenheim Museum in Bilbao. All of these were in the rarified atmosphere of the less than ½ of 1% of projects that came in on time and on budget.

Flyvbjerg contrasted them with a friend’s brownstone renovation, California’s bullet train to nowhere, the Pentagon (it is five-sided because the originally proposed site had roads on five sides), and the Sydney Opera House. The Sydney Opera House was a disaster of such magnitude that the young architect who designed it never got another commission for the rest of his career.

Each of the major projects in his database has key data points recorded, such as original budget and schedule and final cost and delivery date. The database is organized by type of project (nuclear power generation versus road construction, for instance). The current version of the database has 38,000 projects. From this database, he can calculate the average amount projects run over budget by project type.

IT Projects

He eventually discovered IT projects. He finds them to be among the most likely projects to run over budget. According to his database, IT projects run over budget by an average of 73%. This database is probably skewed toward larger projects and more public ones, but this should still be of concern to anyone who sponsors IT projects.

He described some of my favorites in the book, including healthcare.gov. In general, I think he got it mostly right. Reading between the lines, though, he seems to think there is a logical minimum that the software projects should be striving for, and therefore he may be underestimating how bad things really are.

This makes sense from his engineering/architecture background. For instance, the Hoover Dam has 4.3 million cubic yards of concrete. You might imagine a design that could have removed 10 or 20% of that, but any successful dam-building project would involve nearly 4 million cubic yards of concrete. If you can figure out how much that amount of concrete costs and what it would take to get it to the site and installed, you have a pretty good idea of what the logical minimal possible cost of the dam would be.

I think he assumed that early estimates for the cost of large software projects, such as healthcare.gov at $93 million, may have been closer to the logical minimum price, which just escalated from there, to $2.1 billion.

What he didn’t realize, but readers of Software Wasteland[2] as well as users of healthsherpa.com[3] did, was that the actual cost to implement the functionality of healthcare.gov is far less than $2 million; not the $93 million originally proposed, and certainly not the $2.1 billion it eventually cost. He likely reported healthcare.gov as a 2,100% overrun (final budget of $2.1 billion / original estimate of $93 million). This is what I call the “should cost” overrun. But the “could cost” overrun was closer to 100,000% (one hundred thousand percent, which is a thousand-fold excess cost).

From his database, he finds that IT projects are in the top 20%, but not the worst if you use average overrun as your metric.

He has another metric that is also interesting, called the “fat tail.” If you imagine the distribution of project overruns around a mean, there are two tails to the bell curve: one on the left (projects that overrun less than average) and one on the right (projects that overrun more than average). If overruns were normally distributed, you would expect 68% of the projects to be within one standard deviation of the mean and 95% within two standard deviations. But that’s not what you find with IT projects. Once they go over, they have a very good chance of going way over, which means the right side of the bell curve goes kind of horizontal. He calls this a “fat tail.” IT projects have the fattest tails of all the projects in his database.

IT Project Contingency

Most large projects have “contingency budgets.” That is an amount of money set aside in case something goes wrong.

If the average large IT project goes over budget by 73%, you would think that most IT project managers would use a number close to this for their contingency budget. That way, they would hit their budget-with-contingency half the time.

If you were to submit a project plan with a 70% contingency, you would be laughed out of the capital committee. They would think that you have no idea how to manage a project of this magnitude. And they would be right. So instead, you put a 15% contingency (on top of the 15% contingency your systems integrator put in there) and hope for the best. Most of the time, this turns out badly, and half the time, this turns out disastrously (in the “fat tail” where you run over by 447%). As Dave Barry always says, “I am not making this up.”

Legacy Modernization

These days, many of the large IT projects are legacy modernization projects. Legacy modernization means replacing technology that is obsolete with technology that is merely obsolescent, or soon to become so. These days, a legacy modernization project might be replacing Cobol code with Java.

It’s remarkable how many of these there are. Some come about because programming languages become obsolete (really, it just becomes too hard to find programmers to work on code that is no longer padding their resumes). Far more common are vendor-forced migrations: “We will no longer support version 14.4 or earlier; clients will be required to upgrade.” What used to be an idle threat is now binding, as staying current is essential in order to have access to patches for zero-day vulnerabilities.

When a vendor-forced upgrade is announced, often the client realizes this won’t be as easy as it sounds (mostly because the large number of modifications, extensions, and configurations they have made to the package over the years are going to be very hard to migrate). Besides, having been held hostage by the vendor for all this time, they are typically ready for a break. And so, they often put it out to bid, and bring in a new vendor.

What is it about these projects that makes them so prone to overruns? Flyvbjerg touches on it in the book. I will elaborate here.

Remember when your company implemented its first payroll system? Of course you don’t, unless you are, like, 90 years old. Trust me, everyone implemented their first automated payroll system in the 1950s and 1960s (so I’m told, I wasn’t there either). They implemented them with some of the worst technology you can imagine. Mainframe Basic Assembler Language and punched cards were state of the art on some of those early projects. These projects typically took dozens of person years (OK, back in those days they really were man years) to complete. This would be $2-5 million at today’s wages.

These days, we have modern programming languages, tools, and hardware that is literally millions of times more powerful than what was available to our ancestors. And yet a payroll system implementation in a major company is a multi-hundred-million-dollar undertaking these days. “Wait, Dave, are you saying that the cost of implementing something as bog standard as a payroll system has gone up by a factor of 100, while the technology used to implement it has improved massively?” Yes, that is exactly what I’m saying.

To understand how this could be, consider the following diagram.

This is an actual diagram from a project with a mid-sized (7,000-person) company. Each box represents an application and each line an interface. Some are APIs, some are ETLs, and some are manual. All must be supported through any conversion.

My analogy is with heart transplantation. Any butcher worth their cleaving knife could remove one person’s heart and put in another in a few minutes. That isn’t the hard part. The hard part is keeping the patient alive through the procedure and hooking up all those arteries, veins, nerves, and whatever else needs to be restored. You don’t get to quit when you’re half done.

And so it is with legacy modernization. Think of any of those boxes in the above diagram as a critical organ. Replacing it involves reattaching all those pink lines (plus a bunch more you don’t even know are there).

DIMHRS was the infamous DoD project to upgrade their HR systems. They gave up with north of a billion dollars invested when they realized they likely only had about 20% of the interfaces completed and they weren’t even sure what the final number would be.

Back to Flyvbjerg’s Book

We can learn a lot by looking at the industries where projects run over the most and run over the least. The five types of projects that run over the most are:

  • Nuclear storage
  • Olympic Games
  • Nuclear power
  • Hydroelectric dams
  • IT

To paraphrase Tolstoy, “All happy projects are alike; each unhappy project is unhappy in its own way.”

The unhappiness varies. The Olympics is mostly political. Sponsors know the project is going to run wildly over, but want to do the project anyway, so they lowball the first estimate. Once the city commits, they have little choice but to build all the stadiums and temporary guest accommodations. One thing all of these have in common is they are “all or nothing” projects. When you’ve spent half the budget on a nuclear reactor, you don’t have anything useful. When you have spent 80% of the budget and the vendor tells you you are half done, you have few choices other than to proceed. Your half a nuclear plant is likely more liability than asset.

 

Capital Project Riskiness by Industry [4]

And so it is with most IT projects. Half a legacy modernization project is nothing.

Now let’s look at the bottom of Flyvbjerg’s table:

  • Roads
  • Pipelines
  • Wind power
  • Electrical transmission
  • Solar power

Roads. Really? That’s how bad the other 20 categories are.

What do these have in common? Especially wind and solar.

They are modular. Not modular as in made of parts; even nuclear power is modular in some fashion. They are modular in how their value is delivered. If you plan a wind project with 100 turbines, then when you have installed 10 of them, you are generating 10% of the power you hoped the whole project would deliver. You can stop at this point if you want (you probably won’t, as you’re coming in on budget and getting results).

In my mind, this is one reason wind and solar are going to outpace most predictions of their growth. It’s not just because they are green, or even that they are more economical (they are); they are also far more predictable and lower risk. People who invest capital like that.

Data-Centric as the Modular Approach to Digital Transformation

That’s when the coin dropped.

What we have done with data-centric is create a modular way to convert an enterprise’s entire data landscape. If we pitched it as one big monolithic project, it would likely be hundreds of millions of dollars, and by the logic above, high risk and very likely to go way over budget.

But instead, we have built a methodology that allows clients to migrate toward data-centric one modest sized project at a time. At the end of each project, the client has something of value they didn’t have before, and they have convinced more people within their organization of the validity of the idea.

Briefly how this works:

  • Design an enterprise ontology. This is the scaffolding that prevents subsequent projects from merely re-platforming existing silos into neo-silos.
  • Load data from several systems into a knowledge graph (KG) that conforms to the ontology in a sandbox. This is nondestructive. No production systems are touched.
  • Update the load process to run live. This does introduce some redundant interfaces; it does not require changes to existing systems, only some additions to the spaghetti diagram (this is all for the long-term good).
  • Grow the domain footprint. Each project can add more sources to the knowledge graph. Because of the ontology, the flexibility of the graph and the almost free integration properties of RDF technology, each domain adds more value, through integration, to the whole.
  • Add capability to the KG architecture. At first, this will be view-only capability. Visualizations are a popular first capability. Natural language search is another. Eventually, firms add composable and navigable interfaces, wiki-like. Each capability is its own project and is modular and additive as described above. If any project fails, it doesn’t impact anything else.
  • Add live transaction capture. This is the inflection point. Up to this point, the project has been a richer and more integrated data warehouse, relying on the legacy systems for all its information, much as a data warehouse does. At this juncture, you implement the ability to build use cases directly on the graph. These use cases are not bound to each other in the way that monolithic legacy system use cases are. They are bound only to the ontology and therefore are extremely modular.
  • Make the KG the system of record. With the use case capability in place, the graph can become the source system and system of record for some data. Any data sourced directly in the graph no longer needs to be fed from the legacy system. People can continue to update it in the legacy system if there are other legacy systems that depend on it, but over time, portions of the legacy system will atrophy.
  • Legacy avoidance. We are beginning to see clients who are far enough down this path that they have broken the cycle of dependence they had been locked into for decades. The cycle is: if we have a business problem, we need to implement another application to solve it; it’s too hard to modify an existing system, so let’s build another. Once a client starts to reach critical mass in some subset of their business, they become less eager to leap into another neo-legacy project.
  • Legacy erosion. As the KG becomes less dependent on the legacy systems, the users can begin partitioning off parts of it and decommissioning them a bit at a time. This takes a bit of study to work through the dependencies, but is definitely worth it.
  • Legacy replacement. When most of the legacy system’s data is already in the graph, and many of the use cases have been built, managers can finally propose a low-risk replacement project. Those pesky interface lines are still there, but there are two strategies that can be used in parallel to deal with them. One is to start furthest downstream, with the legacy systems that are fed by others but do little feeding of their own. The other is to replicate the interface functionality, but from the graph.

We have done dozens of these projects. This approach works. It is modular, predictable, and low-risk.

If you want to talk to someone about getting on a path of modular modernization that really works, look us up.

The New Gist Model for Quantitative Data

Phil Blackwood

Every Enterprise can benefit from having a simple, standard way to represent quantitative data. In this blog post, we will provide examples of how to use the new gist model of quantitative data released in gist version 13. After illustrating key concepts, we will look at how all the pieces fit together and provide one concrete end-to-end example.

Let’s examine the following:

  1. How is a measurement represented?
  2. Which units can be used to measure a given characteristic?
  3. How do I convert a value from one unit to another?
  4. How are units defined in terms of the International System of Units?

First, we want to be able to represent a fact like:

“The patio has an area of 144 square feet.”

The area of the patio is represented with a magnitude that links the measured thing to an aspect and a unit of measure (sketched below), where:

A magnitude is an amount of some measurable characteristic.

An aspect is a measurable characteristic like cost, area, or mass.

A unit of measure is a standard amount used to measure or specify things, like US dollar, meter, or kilogram.
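Sketched in Turtle, the pattern looks roughly like this. gist:Magnitude and gist:hasAspect are the gist terms this post names; the ex:-prefixed resources and the unit, value, and linking predicates are illustrative stand-ins rather than the exact gist 13 property names, and the namespace shown is the w3id one used by recent gist releases.

  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix ex:   <https://example.com/> .

  ex:_Magnitude_patioArea
      a gist:Magnitude ;                                   # an amount of a measurable characteristic
      gist:hasAspect ex:_Aspect_area ;                     # what is being measured: area
      ex:hasUnitOfMeasure ex:_UnitOfMeasure_squareFoot ;   # assumed predicate name
      ex:numericValue 144.0 .                              # assumed predicate name

  ex:_Patio_1 ex:hasMagnitude ex:_Magnitude_patioArea .    # assumed linking predicate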

Second, we need to be able to identify which units are applicable for measuring a given aspect. Consider a few simple examples: the aspects distance, energy, and cost.

For every aspect there is a group of applicable units. For example, there is a group of units that measure energy density, where:

A unit group is a collection of units that can be used to measure the same aspect.

A common scenario is that we want to validate the combination of aspect and unit of measure. All we need to do is check whether the unit of measure is a member of the unit group for the aspect.
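A hedged Turtle sketch of that check, using assumed ex: names for the unit-group class and the grouping and membership predicates (only gist:UnitOfMeasure is a gist term here):

  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix ex:   <https://example.com/> .

  ex:_UnitGroup_area
      a ex:UnitGroup ;                       # assumed class name
      ex:isUnitGroupOf ex:_Aspect_area .     # assumed predicate tying the group to its aspect

  ex:_UnitOfMeasure_squareFoot
      a gist:UnitOfMeasure ;
      ex:isMemberOf ex:_UnitGroup_area .     # membership; exact predicate name varies by gist version

  # Validating "area measured in square feet" reduces to asking whether the
  # membership triple above exists for the aspect's unit group.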

Next, we want to be able to convert measurements from one unit to another. A conversion like this makes sense only when the two units measure the same aspect. For example, we can convert pounds to kilograms because they both measure mass, but we can’t convert pounds to seconds. When a conversion is possible, the rule is simple: multiply the value by the conversion factor of the unit you are converting from (which yields the value in base units), then divide by the conversion factor of the unit you are converting to.

There is an exception to the rule above for units of measure that do not share a common zero value: for these, an offset is added before multiplying by the conversion factor. For example, 0 degrees Fahrenheit is not the same temperature as 0 Kelvin.

To convert from Kelvin to Fahrenheit, reverse the steps: first divide by the conversion factor and then subtract the offset.

To convert a value from Fahrenheit to Celsius, first use the conversion above to convert to Kelvin, and then convert from Kelvin to Celsius.
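As a worked example, here is how degree Fahrenheit might be encoded and used, assuming (as the rule above implies) that the offset is added before multiplying by the conversion factor. The predicate names are again stand-ins; the constants are standard (0 Kelvin = -459.67 degrees Fahrenheit).

  @prefix ex: <https://example.com/> .

  ex:_UnitOfMeasure_degreeFahrenheit
      ex:conversionFactor 0.555556 ;     # 5/9
      ex:conversionOffset 459.67 .

  # Fahrenheit to Kelvin: add the offset, then multiply by the factor.
  #   68 degrees F  ->  (68 + 459.67) * 5/9  =  293.15 K
  # Kelvin to Fahrenheit (the reverse): divide by the factor, then subtract the offset.
  #   293.15 K      ->  293.15 / (5/9) - 459.67  =  68 degrees F
  # Kelvin to Celsius (factor 1, offset 273.15):
  #   293.15 K - 273.15  =  20 degrees C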

Next, we will look at how units of measure are related to the International System of Units, which defines a small set of base units (kilogram, meter, second, Kelvin, etc.) and expresses every other unit of measure in terms of them.

Take the units of power, for example: every such expression is a multiple of kilogram·meter² per second³. We can avoid redundancy by “attaching” the exponents of the base units to the unit group. That way, when adding a new unit of measure to the unit group for power, there is no need to re-enter the data for the exponents.

These expressions also carry the conversion factors: each conversion factor appears as the initial number on the right-hand side of its unit’s definition.

The conversion factors and exponents allow units of measure to be expressed in terms of the International System of Units, which acts as something of a Rosetta Stone for understanding units of measure.
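For example, the unit group for power might carry the exponents once, with each member unit contributing only its conversion factor. The predicate names below are illustrative stand-ins; the physics (power = kilogram·meter²/second³, 1 horsepower ≈ 745.7 watts) is standard.

  @prefix ex: <https://example.com/> .

  ex:_UnitGroup_power
      ex:hasKilogramExponent 1 ;         # assumed predicate names for the exponents
      ex:hasMeterExponent    2 ;
      ex:hasSecondExponent  -3 .         # power = kilogram * meter^2 / second^3

  ex:_UnitOfMeasure_watt
      ex:isMemberOf ex:_UnitGroup_power ;
      ex:conversionFactor 1.0 .          # the SI-coherent unit of power

  ex:_UnitOfMeasure_horsepower
      ex:isMemberOf ex:_UnitGroup_power ;
      ex:conversionFactor 745.7 .        # 1 mechanical horsepower is about 745.7 watts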

One additional bit of modeling allows calculations of the form:

(45 miles per hour) x 3 hours = 135 miles

To enable this type of math, we represent miles per hour directly in terms of miles and hours, as sketched below.
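A sketch of that definition, with assumed predicate names for the numerator and denominator units:

  @prefix ex: <https://example.com/> .

  ex:_UnitOfMeasure_milePerHour
      ex:hasMultiplier ex:_UnitOfMeasure_mile ;   # numerator unit (assumed predicate name)
      ex:hasDivisor    ex:_UnitOfMeasure_hour .   # denominator unit (assumed predicate name)

  # With the hour recorded as the divisor, (45 mile/hour) x (3 hour) cancels the
  # hours and leaves 135 mile.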

Putting the pieces together:

The standard representation of a magnitude links a numeric value, an aspect, and a unit of measure.

Every aspect has a group of units that can be used to measure it.

Every member of a unit group can be represented as a multiple of the same product of powers of base units of the International System of Units, where each base unit X can be one of:

  • Ampere
  • Bit
  • Candela
  • Kelvin
  • Kilogram
  • Meter
  • Mole
  • Number
  • Other
  • Radian
  • Second
  • Steradian
  • USDollar

Every unit of measure belongs to one or more unit groups, and it can be defined in terms of other units acting as multipliers and divisors.

We’ll end with a concrete example, diastolic blood pressure.

The unit group for blood pressure is a collection of units that measure blood pressure. The unit group is related to the exponents of base units of the International System of Units.

Finally, one member of the unit group for blood pressure is millimeter of mercury. The scope note gives an equation relating the unit of measure to the base units (in this case, kilogram, meter, and second).
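Putting all of that into one hedged Turtle sketch for a diastolic reading of, say, 80 mmHg. gist:Magnitude, gist:hasAspect, and gist:UnitOfMeasure are gist terms; the remaining names are illustrative, and 1 mmHg = 133.322 kilogram/(meter·second²), i.e. 133.322 pascal.

  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix ex:   <https://example.com/> .

  ex:_Magnitude_dbpReading1
      a gist:Magnitude ;
      gist:hasAspect ex:_Aspect_diastolicBloodPressure ;
      ex:hasUnitOfMeasure ex:_UnitOfMeasure_millimeterOfMercury ;
      ex:numericValue 80.0 .

  ex:_UnitGroup_bloodPressure
      ex:isUnitGroupOf ex:_Aspect_diastolicBloodPressure ;
      ex:hasKilogramExponent 1 ;
      ex:hasMeterExponent   -1 ;
      ex:hasSecondExponent  -2 .         # pressure = kilogram / (meter * second^2)

  ex:_UnitOfMeasure_millimeterOfMercury
      a gist:UnitOfMeasure ;
      ex:isMemberOf ex:_UnitGroup_bloodPressure ;
      ex:conversionFactor 133.322 .      # 1 mmHg = 133.322 pascal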

The diagrams above were generated using a visualization tool. For more examples and some basic queries, visit the GitHub site gistReferenceData.

In closing, we would like to acknowledge the re-use of concepts from QUDT, namely:

  • every magnitude has an aspect, via the new gist property hasAspect
  • aspects are individuals instead of categories or subclasses of Magnitude as in gist 12
  • exponents are represented explicitly, enabling calculations

The Data-Centric Revolution: Best Practices and Schools of Ontology Design

This article originally appeared at The Data-Centric Revolution: Best Practices and Schools of Ontology Design – TDAN.com. Subscribe to TDAN directly for this and other great content!

I was recently asked to present “Enterprise Ontology Design and Implementation Best Practices” to a group of motivated ontologists and wanna-be ontologists. I was flattered to be asked, but I really had to pause for a bit. First, I’m kind of jaded by the term “best practices.” Usually, it’s just a summary of what everyone already does, a sort of “corporate common sense.” Occasionally, there is some real insight in the observations, and even more rarely, there are best practices that are not yet mainstream practices. I wanted to shoot for that latter category.

As I reflected on a handful of best practices to present, it occurred to me that intelligent people may differ. We know this because on many of our projects, there are intelligent people and they often do differ. That got me to thinking: “Why do they differ?” What I came to was that there are really several different “schools of ontology design” within our profession. They are much like “schools of architectural design” or “schools of magic.” Each of those has their own tacit agreement as to what constitutes “best practice.”

Armed with that insight, I set out to identify the major schools of ontological design, and outline some of their main characteristics and consensus around “best practices.” The schools are (these are my made-up names, to the best of my knowledge none of them have planted a flag and named themselves — other than the last one):

  • Philosophy School
  • Vocabulary and Taxonomy School
  • Relational School
  • Object-Oriented School
  • Standards School
  • Linked Data School
  • NLP/LLM School
  • Data-Centric School

There are a few well-known ontologies that are a hybrid of more than one of these schools. For instance, most of the OBO Life Sciences ontologies are a hybrid of the Philosophy and Taxonomy Schools. I think this will make more sense after we describe each school individually.

Philosophy School

The philosophy school aims to ensure that all modeled concepts adhere to strict rules of logic and conform to a small number of well vetted primitive concepts.

Exemplars

The Basic Formal Ontology (BFO), DOLCE and Cyc are the best-known exemplars of this school.  Each has a set of philosophical primitives that all derived classes are meant to descend from.

How to Recognize

It’s pretty easy to spot an ontology that was developed by someone from the philosophy school. The top-level classes will be abstract philosophical terms such as “occurrent” and “continuant.”

Best Practices

All new classes should be based on the philosophical primitives. You can pretty much measure the adherence to the school by counting the number of classes that are not direct descendants of the 30-40 base classes.

Vocabulary and Taxonomy School

The vocabulary and taxonomy school tends to start with a glossary of terms from the domain and establish what they mean (vocabulary school) and how these terms are hierarchically related to each other (taxonomy school). The two schools are more alike than different.

The taxonomy school especially tends to be based on standards that were created before the Web Ontology Language (OWL). These taxonomies often model a domain as hierarchical structures without defining what a link in the hierarchy actually means. As a result, they often mix sub-component and sub-class hierarchies.

Exemplars

Many life sciences ontologies, such as SNOMED, are primarily taxonomy ontologies, and only secondarily philosophy school ontologies. Also, the Suggested Upper Merged Ontology is primarily a vocabulary ontology; it was mostly derived from WordNet, and one of its biggest strengths is its cross-reference to 250,000 words and their many word senses.

How to Recognize

Vast numbers of classes. There are often tens of thousands or hundreds of thousands of classes in these ontologies.

Best Practices

For the vocabulary and taxonomy schools, completeness is the holy grail. A good ontology is one that contains as many of the terms from the domain as possible. The Simple Knowledge Organization System (SKOS) was designed for taxonomies. Thus, even though it is implemented in OWL, it is designed to add semantics to taxonomies that are often less rigorous, using generic predicates such as skos:broader and skos:narrower rather than more precise subclass axioms or object properties such as “part of.” SKOS is a good tool for integrating taxonomies with ontologies.
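A tiny SKOS fragment of the kind this school produces (standard SKOS terms; the concepts are illustrative):

  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
  @prefix ex:   <https://example.com/> .

  ex:Season a skos:Concept ;
      skos:prefLabel "Season"@en .

  ex:Winter a skos:Concept ;
      skos:prefLabel "Winter"@en ;
      skos:broader ex:Season .      # says only "Season is broader than Winter";
                                    # nothing about subclassing or parthood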

Relational School

Most data modelers grew up with relational design, and when they design ontologies, they rely on ways of thinking that served them well in relational.

Exemplars

These are mostly internally created ontologies.

How to Recognize

Relational ontologists tend to be very rigorous about putting specific domains and ranges on all their properties. Properties are almost never reused. All properties will have inverses. Restrictions will be subclass axioms, and you will often see restrictions with “min 0” cardinality, which doesn’t mean anything to an inference engine, but to a relational ontologist it means “optional cardinality.” You will also see “max 1” and “exactly 1” restrictions which almost never imply what the modeler thought, and as a result, it is rare for relational modelers to run a reasoner (they don’t like the implications).
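For illustration, here is the kind of OWL (in Turtle) a relational-school modeler typically writes; the class and property names are made up:

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:   <https://example.com/> .

  ex:PurchaseOrder rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty ex:hasShippingNote ;
      owl:minCardinality "0"^^xsd:nonNegativeInteger   # "optional" to the modeler; vacuous to a reasoner
  ] .

  ex:PurchaseOrder rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty ex:hasBuyer ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger   # under the open world this drives inference
  ] .                                                  # (e.g. merging individuals), not data validation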

Best Practices

For relational ontologists, best practice is to make ontologies as similar to existing relational structures as possible. Often, the model is a direct map from an existing relational system.

Modelers in the relational school (as well as the object-oriented school coming up next) tend to bring the “Closed World Assumption” (CWA) with them from their previous experience. CWA takes a mostly implicit attitude that the information in the system is a complete representation of the world. The “Open World Assumption” (OWA) takes the opposite starting point: that the data in the system is a subset of all knowable information on the subject.

CWA was and is more appropriate in narrow scope, bounded applications. When we query your employee master file looking for “Dave McComb” and don’t get a hit, we reasonably assume that he is not an employee of your enterprise. When TSA queries their system and doesn’t get a hit, they don’t assume that he is not a terrorist. They still use the X-ray and metal detectors. This is because they believe that their information is incomplete. They are open worlders. More and more of our systems combine internal and external data in ways that are more likely to be incomplete.

There are techniques for closing the open world, but the relational school tends not to use them because they assume their world is already closed.

Object-Oriented School

Like the relational school, the object-oriented school comes from designers who grew up with object-oriented modeling.

Exemplars

Again, a lot of object-oriented (OO) ontologies are internal client projects, but a few public ones of note include eCl@ss and Schema.org. eCl@ss is a standard for describing electrical products that has been converted into an ontology. The ontology version has 60,000 classes, which combine taxonomic and OO-style modeling. Schema.org is an ontology for tagging web sites that Google promotes to normalize SEO. It started life fairly elegant; it now has 1,300 classes, many of which are taxonomic distinctions rather than real classes.

How to Recognize

One giveaway for the object-oriented school is designing in SHACL. SHACL is a semantic constraint language, which is quite useful as a guard for updates to a triple store. Because SHACL is less concerned with meaning and more concerned with structure, many object-oriented ontologists prefer it to OWL for defining their classes.

Even those who design in OWL have some characteristic tells. OO ontologists tend to use subclassing far more than relational ontologists. They tend to declare which class is a subclass of another, rather than allowing the inference engine to infer subsumption. There is also a tendency to believe that the superclass will constrain subclass membership.

Best Practices

OO ontologies tend to co-exist with GraphQL and focus on JSON output. This is because the consuming applications are object-oriented, and this style of ontology and architecture has less impedance mismatch with those applications. The level of detail tends to mirror the kind of detail you find in an application system. Best practices for an OO ontology would never call for the tens of thousands or hundreds of thousands of classes of a taxonomy ontology, nor for the minimalist view of the philosophy or data-centric schools. They tend to make all distinctions at the class level.

Standards School

This is a Janus school with two faces, one facing up and one facing down. The downward-facing one is concerned with building ontologies that others can (indeed should) reuse. The upward-facing one comprises the enterprise ontologies that import the standard ontologies in order to conform.

Exemplars

Many of the most popular ontology standards are produced and promoted by the W3C. These include DCAT (the Data Catalog Vocabulary), the Ontology for Media Resources, PROV-O (an ontology of provenance), the Time Ontology, and Dublin Core (an ontology for metadata, particularly around library science).

How to Recognize

For the down facing standards ontology, it’s pretty easy. They are endorsed by some standards body. Most common are W3C, OMG and Oasis. ISO has been a bit late to this party, but we expect to see some soon. (Everyone uses the ISO country and currency codes, and yet there is no ISO ontology of countries or currencies.) There are also many domain-specific standard ontologies that are remakes of their previous message model standards, such as FHIR from HL7 in healthcare and ACORD in insurance.

The upward facing standards ontologies can be spotted by their importing a number of standard ontologies each meant to address an aspect of the problem at hand.

Best Practices

Best practice for downward facing standards ontologies is to be modular, fairly small, complete and standalone. Unfortunately, this best practice tends to result in modular ontologies that redefine (often inconsistently) shared concepts.

Best practice for upward facing standards ontologies is to rely as much as possible on ontologies defined elsewhere. This usually starts off by importing many ontologies and ends up with a number of bridges to the standards when it’s discovered that they are incompatible.

Linked Open Data School

The linked open data school promotes the idea of sharing identifiers across enterprises. Linked data is very focused on instance (individual or ABox) data, and only secondarily on classes.

Exemplars

The poster child for LOD is DBpedia, the LOD knowledge graph derived from the Wikipedia infoboxes. The school also includes related datasets such as Wikidata and the entire Linked Open Data Cloud.

I would put the Global Legal Entity Identifier Foundation (GLEIF) in this school, as their primary focus is sharing between enterprises and they are more focused on the ABox (the instances).

How to Recognize

Linked open data ontologies are recognizable by their instances, often millions and in many cases billions of them. The ontologies (the TBox) are often very naïve, as they are frequently derived directly from informal classifications made by text editors in Wikipedia and its kin.

You will see many ad hoc classes raised to the status of a formal class in LOD ontologies. I just noticed the classes dbo:YearInSpaceFlight and yago:PsychologicalFeature100231001.

Best Practices

The first best practice (honored more in the breach) is to rely on other organizations’ IRIs. This is often clumsy because, historically, each organization invented identifiers for things in the world (their employees and vendors, for instance), and they tend to build their IRIs around these well-known (at least locally) identifiers.

A second best practice is entity resolution and “owl:sameAs.” Entity resolution can determine whether two IRIs represent the same real-world object. Once recognized, one of the organizations can choose to adopt the other’s IRI (the previous best practice) or continue to use their own but record the identity through owl:sameAs (which is mostly motivated by the following best practice).
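For example, keeping your own IRI while recording the identity might look like this (the internal IRI is made up; the DBpedia IRI is a real, resolvable one):

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <https://example.com/> .

  # Internal IRI on the left (a namespace we control); externally minted IRI on the right.
  ex:_Organization_ibm owl:sameAs <http://dbpedia.org/resource/IBM> .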

LOD creates the opportunity for IRI resolution at the instance level. Put the DBpedia IRI for a famous person in your browser address bar and you will be redirected to the DBpedia resolution page for that individual, showing everything DBpedia knows about them. For security reasons, most enterprises don’t yet do this. Because of this, another best practice is to only create triples with subjects whose domain name you control. Anything you state about an IRI in someone else’s namespace will not be available for resolution by the organization that minted the subject IRI.

NLP/LLM School

There is a school of ontology design that says turn ontology design over to the machines. It’s too hard anyway.

Exemplars

Most of these are also internal projects. About every two to three years, we see another startup with the premise that ontologies can be built by machines. For most of history, these were cleverly tailored NLP systems. The original works in this area took large teams of computational linguists to master.

This year (2023), they are all LLMs. You can ask ChatGPT to build an ontology for [fill in the blank] industry, and it will come up with something surprisingly credible looking.

How to Recognize

For LLMs, the first giveaway is hallucinations. These are hard to spot and require deep domain and ontology experience to pick out. The second clue is humans with six fingers (just kidding). There aren’t many publicly available LLM-generated ontologies (or if there are, they are so good we haven’t detected that they were machine generated).

Best Practices

Get a controlled set of documents that represent the domain you wish to model. This is better than relying on what ChatGPT learned by reading the internet.

And have a human in the loop. This approach shows significant promise, and several researchers have already created prototypes that utilize it. Treat the NLP/LLM-created artifacts primarily as speed-reading aids or intelligent assistants for the ontologist.

In the broader adoption of LLMs, there is a lot of energy going into using knowledge graphs as “guard rails” against some of the LLMs’ excesses, and into the value of keeping a human in the loop. Our immediate concern is that there are advocates of letting generative AI design ontologies outright, and as such it becomes a school of its own.

Data-Centric School

The data-centric school of ontology design, as promoted by Semantic Arts, focuses on ontologies that can be populated and implemented. In building architecture, they often say “It’s not architecture until it’s built.” The data-centric school says, “It’s not an ontology until it has been populated (with instance level, real world data, not just taxonomic tags).” The feedback loop of loading and querying the data is what validates the model.

Exemplars

gist, an open-source OWL ontology, is the exemplar data-centric ontology. SchemaApp, Morgan Stanley’s compliance graph, Broadridge’s Data Fabric, Procter & Gamble’s Material Safety graph, Schneider Electric’s product catalog graph, Standard & Poor’s commodity graph, Sallie Mae’s Service Oriented Architecture, and dozens of small firms’ enterprise ontologies are based on gist.

How to Recognize

Importing gist is a dead giveaway. Other telltale signs include a modest number of classes (fewer than 500 for almost all enterprises) and eschewing inverse and transitive properties (the overhead for these features in a large knowledge graph far outweighs their expressive power). Another giveaway is delegating taxonomic distinctions to instances of subclasses of gist:Category rather than making them classes in their own right.

Best Practices

One best practice is to give non-primitive classes “equivalent class” restrictions that define class membership and are used to infer the class hierarchy. Another is to put domains and ranges at very high levels of abstraction (or omit them completely) in order to promote property reuse and reduce future refactoring.
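A sketch of the first of these practices, with made-up class and property names around the gist:Person class:

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
  @prefix ex:   <https://example.com/> .

  ex:Employee
      a owl:Class ;
      owl:equivalentClass [
          a owl:Class ;
          owl:intersectionOf (
              gist:Person
              [ a owl:Restriction ;
                owl:onProperty ex:isPartyTo ;
                owl:someValuesFrom ex:EmploymentAgreement ]
          )
      ] .

  # Any gist:Person who is party to an EmploymentAgreement is inferred to be an
  # Employee, and ex:Employee is inferred to be a subclass of gist:Person.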

Another best practice is to load a knowledge graph with data from the domain of discourse to prove that the model is appropriate and at the requisite level of detail.

Summary

One of the difficulties in getting wider adoption of ontologies and knowledge graphs is that if you recruit and/or assemble a group of ontologists, there is a very good chance you will have members from several of the schools described above. There is a good chance they will have conflicting goals, and even different definitions of what “good” is. Often, they will not even realize that their difference of opinion is due to their being members of different tribes.

There isn’t one of these schools that is better than any of the others for all purposes. They each grew up solving different problems and emphasizing different aspects of the problem.

When you look at existing ontologies, especially those that were created by communities, you’ll often find that many are an accidental hybrid of the above schools. This is caused by different members coming to the project from different schools and applying their own best practices to the design project.

Rather than try to pick which school is “best,” you should consider what the objectives of your ontology project are and use that to determine which school is better matched. Select ontologists and other team members who are willing to work to the style of that school. Only then is it appropriate to consider “best practices.”

Acknowledgement

I want to acknowledge Michael Debellis for several pages of input on an early draft of this paper. The bits that didn’t make it into this paper may surface in a subsequent paper.

DCA Forum Recap: Forrest Hare, Summit Knowledge Solutions

A knowledge model for explainable military AI

Forrest Hare, Founder of Summit Knowledge Solutions, is a retired US Air Force targeting and information operations officer who now works with the Defense Intelligence Agency (DIA). His experience includes integrating intelligence from different types of communications, signals, imagery, open source, telemetry, and other sources into a cohesive and actionable whole.

Hare became aware of semantic technology while at SAIC and is currently focused on building a space + time ontology called the DIA Knowledge Model, so that Defense Department intelligence can use it to contextualize these multi-source inputs.

The question becomes, how do you bring objects that don’t move and objects that do move into the same information frame with a unified context? The information is currently organized by collectors and producers.

The object-based intelligence that does exist involves things that don’t move at all.  Facilities, for example, or humans using phones that are present on a communications network are more or less static. But what about the things in between such as trucks that are only intermittently present?

Only sparse information is available about these. How do you know the truck that was there yesterday in an image is the same truck that is there today? Not to mention the potential hostile forces who own the truck that have a strong incentive to hide it.

Objects in object-based intelligence not only include these kinds of assets, but also events and locations that you want to collect information about. In an entity-relationship sense, objects are entities.

Hare’s DIA Knowledge Model uses the ISO-standard Basic Formal Ontology (BFO) to unify domains so that information from different sources is logically connected and therefore makes sense as part of a larger whole. BFO’s maintainers (director Barry Smith and his team at the National Center for Ontological Research (NCOR) at the University at Buffalo) keep the ontology strictly limited to 30 or so classes.

The spatial-temporal regions of the Knowledge Model are what’s essential to do the kinds of dynamic, unfolding object tracking that’s been missing from object-based intelligence. Hare gave the example of a “site” (an immaterial entity) from a BFO perspective. A strict geolocational definition of “site” makes it possible for both humans and machines to make sense of the data about sites. Otherwise, Hare says, “The computer has no idea how to understand what’s in our databases, and that’s why it’s a dumpster fire.”

This kind of mutual human and machine understanding is a major rationale behind explainable AI. A commander briefed by an intelligence team must know why the team came to the conclusions it did. The stakes are obviously high. “From a national security perspective, it’s extremely important for AI to be explainable,” Hare reminded the audience. Black boxes such as ChatGPT as currently designed can’t effectively answer the commander’s question on how the intel team arrived at the conclusions it did.

Finally, the explainability that knowledge models like the DIA’s provide becomes even more critical as information flows into the Joint Intelligence Operations Center (JIOC). Furthermore, the various branches of the US Armed Forces must supply and continually update a Common Intelligence Picture that is actionable by the US President, who is the Commander in Chief of the military as a whole.

Without this conceptual and spatial-temporal alignment across all service branches, joint operations can’t proceed as efficiently and effectively as they should.  Certainly, the risk of failure looms much larger as a result.

Contributed by Alan Morrison