Financial Data Transparency Act “PitchFest”

The Data Foundation hosted a PitchFest on “Unlocking the vision of the Financial Data Transparency Act” a few days ago. Selected speakers were given 10 minutes to present their best ideas on how to use the improved financial regulatory information and data.

The Financial Data Transparency Act is a new piece of legislation directly affecting the financial services industry. In short, it directs financial regulators to harmonize data collections and move to machine- (and people-) readable forms. The goal is to reduce the burdens of compliance on regulated industries, increase the ability to analyze data, and enhance overall transparency.

Two members of our team, Michael Atkin and Dalia Dahleh, were given the opportunity to present. Below is the text of Michael Atkin’s pitch:

  1. Background – Just to set the stage. I’ve been fortunate to have served as scribe, analyst, advocate and organizer for data management since 1985. I’ve always been a neutral facilitator – allowing me to sit on all sides of the data management issue all over the world – from data provider to data consumer to market authority to regulator. I’ve helped create maturity models outlining best practice, performed benchmarking to measure progress, documented the business case, and created and taught the Principles of Data Management at Columbia University. I’ve also served on the SEC’s Market Data Advisory Committee and the CFTC’s Technical Advisory Committee, and as the Chair of the Data Subcommittee of the OFR’s Financial Research Advisory activity during the financial crisis of 2008. So, I have some perspective on the challenges the regulators face and the value of the FDTA.
  2. Conclusion (slide 2) – My conclusions after all that exposure are simple. There is a real data dilemma for many entities. The dilemma is caused by fragmentation of technology. It’s nobody’s fault. We have business and operational silos. They are created using proprietary software. The same things are modeled differently based on the whim of the architects, the focus of the applications and the nuances of the technical solution. This fragmentation creates “data incongruence” – where the meaning of data in one repository doesn’t match other repositories. We have the same words with different meanings. We have the same meaning using different words. And we have nuances that get lost in translation. As a result, we spend enormous effort and money moving data around, reconciling meaning and doing mapping. As one of my banking clients said, “My projects end up as expensive death marches of data cleansing and manipulation just to make the software work.” And we do this over and over, ad infinitum. Not only do we suffer from data incongruence – we suffer from the limitations of the relational technology that still dominates our way of processing data. For the record, relational technology is over 50 years old. It was (and is) great for computation and structured data. It’s not good for ad hoc inquiry and scenario-based analysis. The truth is that data has become isolated and mismatched across repositories due to technology fragmentation and the rigidity of the relational paradigm. Enterprises (including government enterprises) often have thousands of business and data silos – each based on proprietary data models that are hard to identify and even harder to change. I refer to this as the bad data tax. It costs most organizations somewhere around 40-60% of their IT budget to address. So, let’s recognize that this is a real liability. One that diverts resources from business goals, extends time-to-value for analysts, and leads to knowledge worker frustration. The new task before FSOC leadership and the FDTA is now about fixing the data itself.
  3. Solution (slide 3) – The good news is that the solution to this data dilemma is actually quite simple and twofold in nature. First – adopt the principles of good data hygiene. And on that front, there appears to be good progress thanks to efforts around the Federal Data Strategy and things related to BCBS 239 and the Open Government Data Act. But governance alone will not solve the data dilemma. The second thing that is required is to adopt data standards that were specifically designed to address the problems of technology fragmentation. And these open, web-based data standards are quite mature. They include the Internationalized Resource Identifier (or IRI) for identity resolution. The use of ontologies – which enable us to model simple facts and relationship facts. And the expression of these things in standards like RDF for ontologies, OWL for inferencing and SHACL for business rules. From these standards you get a number of capabilities. You get quality by math (because the ontology ensures precision of meaning). You get reusability (which eliminates the problem of hard-coded assumptions and the problem of doing the same thing in slightly different ways). You get access control (because the rules are embedded in the data and not constrained by systems or administrative complexity). You get lineage traceability (because everything is linked to a single identifier so that data can be traced as it flows across systems). And you get good governance (since these standards use resolvable identity, precise meaning and lineage traceability to shift governance from people-intensive data reconciliation to more automated data applications).
  4. FDTA (slide 4) – Another important point is that this is happening at the right time. I see the FDTA as the next step in a line of initiatives seeking to modernize regulatory reporting and reduce risk. I’ve witnessed the efforts to move to T+1 (to address the clearing and settlement challenge). I’ve seen the recognition of global interdependencies (with the fallout from Long-Term Capital, Enron and the problems of derivatives in Orange County). We’ve seen the problems of identity resolution that led to KYC and AML requirements. And I was actively involved in understanding the data challenges of systemic risk during the credit crisis of 2008. The problem with all these regulatory activities is that most of them are not about fixing the data. Yes, we did get the LEI and data governance. Those are great things, but far from what is required to address the data dilemma. I also applaud the adoption of XBRL (and the concept of data tagging). I like the XBRL taxonomies (as well as the Eurofiling regulatory taxonomies) – but they are designed vertically, report by report, with a limited capability for linking things together. Not only that, most entities are just extracting XBRL into their relational environments, which does little to address the problem of structural rigidity. The good news is that all the work that has gone into the adoption of XBRL can be leveraged. XML is good for data transfer. Taxonomies are good for unraveling concepts and tagging. And the shift from XML to RDF is straightforward and would not affect those who are currently reporting using XBRL. One final note before I make our pitch. Let’s recognize that XBRL is not the way the banks are managing their internal data infrastructures. They suffer from the same dilemmas as the regulators, and almost every G-SIB and D-SIB I know is moving toward semantic standards. Because even though the FDTA is about the FSOC agencies – it will ultimately affect the financial institutions. I see this as an opportunity for collaboration between regulators and the regulated in building the infrastructure for the digital world.
  5. Proposal (slide 5) – Semantic Arts is proposing a pilot project to implement the foundational infrastructure of precise data about financial instruments (including identification, classification, descriptive elements and corporate actions), legal entities (including entity types as well as information about ownership and control), obligations (associated with issuance, trading, clearing and settlement), and holdings in the portfolios of the regulated entities. These are the building blocks of linked risk analysis. To implement this initiative, we propose starting with a single, simple model of the information from one of the covered agencies. The initial project would focus on defining the enterprise model and conforming two to three key data sets to the model. The resulting model would be hosted on a graph database. Subsequent projects would involve expanding the footprint of data domains added to the graph, and gradually building functionality to begin to reverse the legacy creation process. We would initiate things by leveraging the open-standard upper ontology (gist) from Semantic Arts, as well as the work of the Financial Industry Business Ontology (from the EDM Council) and any other vetted ontology, like the one OFR is building for CFI. Semantic Arts has a philosophy of “think big” (like cross-agency interoperability) but “start small” (like a business domain of one of the agencies). The value of adopting semantic standards is threefold – and can be measured using the “three C’s” of metrics. The first C is cost containment, starting with data integration and including areas focused on business process automation and consolidation of redundant systems (best known as technical agility). The second C is capability enhancement for analysis of the degrees of interconnectedness, the nature of transitive relationships, state-contingent cash flows, collateral flows, guarantees and the transmission of risk. The final C is implementation of the control environment, focused on tracking data flow, protecting sensitive information, preventing unwanted outcomes, managing access and ensuring privacy.
  6. Final Word (contact) – Just a final word to leave you with. Adopting these semantic standards can be accomplished at a fraction of the cost of what you spend each year supporting the vast cottage industry of data integration workarounds.  The pathway forward doesn’t require ripping everything out but instead building a semantic “graph” layer across data to connect the dots and restore context.  This is what we do.  Thank you.

Link to Slide Deck

How to Take Back 40-60% of Your IT Spend by Fixing Your Data

Creating a semantic graph foundation helps your organization become data-driven while significantly reducing IT spend

Organizations that quickly adapt to changing market conditions have a competitive advantage over their peers. Achieving this advantage is dependent on their ability to capture, connect, integrate, and convert data into insight for business decisions and processes. This is the goal of a “data-driven” organization. However, in the race to become data-driven, most efforts have resulted in a tangled web of data integrations and reconciliations across a sea of data silos that add up to between 40% and 60% of an enterprise’s annual technology spend. We call this the “Bad Data Tax”. Not only is this expensive, but the results often don’t translate into the key insights needed to deliver better business decisions or more efficient processes.

This is partly because integrating and moving data is not the only problem. The data itself is stored in a way that is not optimal for extracting insight. Unlocking additional value from data requires context, relationships, and structure, none of which are present in the way most organizations store their data today.

Solution to the Data Dilemma

The good news is that the solution to this data dilemma is actually quite simple. It can be accomplished at a fraction of the cost of what organizations spend each year supporting the vast industry of data integration workarounds. The pathway forward doesn’t require ripping everything out but building a semantic “graph” layer across data to connect the dots and restore context. However, it will take effort to formalize a shared semantic model that can be mapped to data assets, and turn unstructured data into a format that can be mined for insight. This is the future of modern data and analytics and a critical enabler to getting more value and insight out of your data.

This shift from relational to graph approaches has been well documented by Gartner, which advises that “using graph techniques at scale will form the foundation of modern data and analytics” and “graph technologies will be used in 80% of data and analytics innovations by 2025.” Most of the leading market research firms consider graph technologies a “critical enabler.” And while there is a great deal of experimentation underway, most organizations have only scratched the surface, in a use-case-by-use-case fashion. While this may yield great benefits for the specific use case, it doesn’t fix the causes behind the “Bad Data Tax” that organizations are facing. Until executives begin to take a more strategic approach with graph technologies, they will continue to struggle to deliver the insights that will give them a competitive edge.

Modernizing Your Data Environment

Most organizations have come of age in a world dominated by technology. There have been multiple technology revolutions that have necessitated the creation of big organizational departments to make it all work. In spite of all the activity, the data paradigm hasn’t evolved much. Organizations are still managing data using relational technology invented in the 1970s. While relational databases are the best fit for managing structured data workloads, they are not good for ad hoc inquiry and scenario-based analysis.

Data has become isolated and mismatched across repositories and silos due to technology fragmentation and the rigidity of the relational paradigm. Enterprises often have thousands of business and data silos, each based on proprietary data models that are hard to identify and even harder to change. This has become a liability that diverts resources from business goals, extends time-to-value for analysts, and leads to business frustration. The new task before leadership is now about fixing the data itself.

Fixing the data is possible with graph technologies and web standards that share data across federated environments and between interdependent systems. These approaches have matured to ensure data precision, flexibility, and quality. Because these open standards are based on granular concepts, they become reusable building blocks for a solid data foundation. Adopting them removes ambiguity, facilitates automation, and reduces the need for data reconciliation.
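
As a minimal sketch of what such a shared graph layer can look like, the Python/rdflib fragment below maps two silo-specific terms to one shared concept and queries through it; the namespaces, class names, and data are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical namespaces; a real project would use governed, resolvable IRIs.
CORE = Namespace("https://example.com/core/")   # the shared semantic layer
HR   = Namespace("https://example.com/hr/")     # vocabulary of silo 1
CRM  = Namespace("https://example.com/crm/")    # vocabulary of silo 2

g = Graph()

# Each silo's local term is declared a subclass of one shared concept.
g.add((HR.Worker, RDFS.subClassOf, CORE.Person))
g.add((CRM.Contact, RDFS.subClassOf, CORE.Person))

# Instance data keeps its silo vocabulary.
g.add((HR.emp42, RDF.type, HR.Worker))
g.add((HR.emp42, RDFS.label, Literal("Ada Lovelace")))
g.add((CRM.c9, RDF.type, CRM.Contact))
g.add((CRM.c9, RDFS.label, Literal("Grace Hopper")))

# One query against the shared concept reaches both silos through the mapping.
query = """
SELECT ?name WHERE {
  ?cls rdfs:subClassOf core:Person .
  ?person a ?cls ; rdfs:label ?name .
}
"""
for row in g.query(query, initNs={"core": CORE, "rdfs": RDFS}):
    print(row.name)
```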

Data Bill of Rights

Organizations need to remind themselves that data is simply a representation of real things (customers, products, people, and processes) where precision, context, semantics, and nuance matter as much as the data itself. For those who are tasked with extracting insight from data, there are several expectations that should be honored: that the data is available and accessible when needed, stored in a format that is flexible and accurate, retains the context and intent of the original data, and is traceable as it flows through the organization.

This is what we call the “Data Bill of Rights”. Providing this Data Bill of Rights is achievable right now without a huge investment in technology or massive disruption to the way the organization operates.

Strategic Graph Deployment

Many organizations are already leveraging graph technologies and semantic standards for their ability to traverse relationships and connect the dots across data silos. These organizations are often doing so on a case-by-case basis covering one business area and focusing on an isolated application, such as fraud detection or supply chain analytics. While this can result in faster time-to-value for a singular use case, without addressing the foundational data layers, it results in another silo without gaining the key benefit of reusability.

The key to adopting a more strategic approach to semantic standards and knowledge graphs starts at the top with buy-in across the C-suite. Without this senior sponsorship, the program will face an uphill battle of overcoming the organizational inertia with little chance of broad success. However, with this level of support, the likelihood dramatically increases of getting sufficient buy-in across all the stakeholders involved in managing an organization’s data infrastructure.

While starting as an innovation project can be useful, forming a Graph Center of Excellence will have an even greater impact. It gives the organization a dedicated team to evangelize and execute the strategy, score incremental wins to demonstrate value, and leverage best practices and economies of scale along the way. The team would be tasked with both building the foundation and prioritizing graph use cases against organizational priorities.

One key benefit from this approach is the ability to start small, deliver quick wins, and expand as value is demonstrated. There is no getting around the mandate to initially deliver something practical and useful. A framework for building a Graph Center of Excellence will be published in the coming weeks.

Scope of Investment Required

Knowledge graph advocates admit that a long tail of investment is necessary to realize a knowledge graph’s full potential. Enterprises need basic operational information, including an inventory of the technology landscape and a roadmap of the data and systems to be merged, consolidated, eliminated, or migrated. They need a clear vision of the systems of record, data flows, transformations, and provisioning points. And they need to be aware of the costs associated with acquiring platforms, triplestore databases, pipeline tools, and other components needed to build the foundational layer of the knowledge graph.

In addition to the plumbing, organizations also need to understand the underlying content that supports business functionality. This includes the reference data about business entities, agents, and people; the taxonomies and data models about contract terms and parties; the meaning of ownership and control; notions of parties and roles; and so on. These concepts are the foundation of the semantic approach. They might not be exciting, but they are critical because they are the scaffolding for everything else.

Initial Approach

When thinking about the scope of investment, the first graph-enabled application can take anywhere from 6-12 months from conception to production. Much of the time needs to be invested in getting data teams aligned and mobilized – which underscores the essential nature of leadership and the importance of starting with the right set of use cases. It needs to be operationally viable and solve a real business problem. The initial use case has to be important to the business.

With the right strategic approach in place, the first delivery is infrastructure plus pipeline management and people. This gets the organization to an MVP, including an incremental project plan and rollout. The second delivery should consist of the foundational building blocks for workflow and reusability. This will prove the viability of the approach.

Building Use Cases Incrementally

The next series of use cases should be based on matching functionality to capitalize on concept reusability. This will enable teams to shift their effort from building the technical components to adding incremental functionality. This translates to 30% of the original cost and a rollout that could be three times faster. These costs will continue to decrease as the enterprise expands reusable components – achieving full value around the third year.

The strategic play is not the $3-$5 million for the first few domains, but the core infrastructure required to run the organization moving forward. It is absolutely possible to continue to add use cases on an incremental level, but not necessarily the best way to capitalize on the digital future. The long-term cost efficiency of a foundational enterprise knowledge graph (EKG) should be compared to the costs of managing thousands of silos. For a big enterprise, this can be measured in hundreds of millions of dollars – before factoring in the value proposition of enhanced capabilities for data science and complying with regulatory obligations to manage risks.

Business Case Summary

Organizations are paying a “Bad Data Tax” of 40% – 60% of their annual IT spend on the tangled web of integrations across their data silos. To make matters worse, following this course does not help an organization achieve their goal of being data-driven. The data itself has a problem. This is due to the way data is traditionally stored in rows, columns, and tables that do not have the context, relationships, and structure needed to extract the needed insight.

Adding a semantic graph layer is a simple, non-intrusive solution to connect the dots, restore context, and provide what is needed for data teams to succeed. While the Bad Data Tax alone quantifiably justifies the cost of solving the problem, it scarcely scratches the surface of the full value delivered. The opportunity cost side, though more difficult to quantify, is no less significant with the graph enabling a host of new data and insight capabilities (better AI and data science outcomes, increased personalization and recommendations for driving increased revenue, more holistic views through data fabrics, high fidelity digital twins of assets, processes, and systems for what-if analysis, and more).

While most organizations have begun deploying graph technologies in isolated use cases, they have not yet applied them foundationally to solving the Bad Data Tax and fixing their underlying data problem. Success will require buy-in and sponsorship across the C-suite to overcome organizational inertia. For best outcomes, create a Graph Center of Excellence focused on strategically deploying both a semantic graph foundation and high-priority use cases. The key will be in starting small, delivering quick wins with incremental value and effectively communicating this across all stakeholders.

While initial investments can start small, expect initial projects to take from 6-12 months. To cover the first couple of projects, a budget between $1.5-$3 million should be sufficient. The outcomes will justify further investment in graph-based projects throughout the organization, each deploying 30% faster and cheaper than early projects through leveraging best practices and economies of scale.

Conclusion

The business case is compelling – the cost to develop a foundational graph capability is a fraction of the amount wasted each year on the Bad Data Tax alone. Addressing this problem is both easier and more urgent than ever. Failing to develop the data capabilities that graph technologies offer can put organizations at a significant disadvantage, especially in a world where AI capabilities are accelerating and critical insight is being delivered in near real time. The opportunity cost is significant. The solution is simple. Now is the time to act.

 

This article originally appeared at How to Take Back 40-60% of Your IT Spend by Fixing Your Data – Ontotext, and was reposted here.

 

HR Tech and The Kitchen Junk Drawer

I often joke that when I started with Semantic Arts nearly two years ago, I had no idea a solution existed to a certain problem that I well understood. I had experienced many of the challenges and frustrations of an application-centric world but had always assumed it was just a reality of doing business. As an HR professional, I’ve heard over the years about companies having to pick the “best of the worst” technologies. Discussion boards are full of people dissatisfied with current solutions – and when they try new ones, they are usually dissatisfied with those too!

The more I have come to understand the data-centric paradigm, the more I have discovered its potential value in all areas of business, but especially in human resources. It came as no surprise to me when a recent podcast by Josh Bersin revealed that the average large company is using 80 to 100 different HR technology systems (link). Depending on who you ask, HR comprises twelve to fifteen key functions – meaning that we have an average of six applications for each key function. Even more ridiculously, many HR leaders would admit that there are probably even more applications in use that they don’t know about. Looking beyond HR at all core business processes, larger companies are using more than two hundred applications, and the number is growing by 10% per year, according to research by Okta from earlier this year (link). From what we at Semantic Arts have seen, the problem is actually much greater than this research indicates.

Why Is This a Problem?

Most everyone has experienced the headaches of such application sprawl. Employees often have to crawl through multiple systems, wasting time and resources, either to find data they need or to recreate the analytics required for reporting. As more systems come online to try to address gaps, employees are growing weary of learning yet another system that carries big promises but usually fails to deliver (link). Let’s not forget the enormous amount of time spent by HR Tech and other IT resources to ensure everything is updated, patched and working properly. Then, there is the near daily barrage of emails and calls from yet another vendor promising some incremental improvement or ROI that you can’t afford to miss (“Can I have just 15 minutes of your time?”).

Bersin’s podcast used a great analogy for this: the kitchen drawer problem. We go out and procure some solution, but it gets thrown into the drawer with all the other legacy junk. When it comes time to look in the drawer, either it’s so disorganized or we are in such a hurry that it seems more worthwhile to just buy another app than to actually take the time to sort through the mess.

Traditional Solutions

When it comes to legacy applications, companies don’t even know where to start. We don’t even know who is using which system, so we don’t dare shut off or replace anything. So we end up with a mess of piecemeal integrations that may solve the immediate issue but just kick the technical debt down the road. Sure, there are a few ETL and other integration tools out there that can be helpful, but without a unified data model and a broad plan, these initiatives usually end up in the drawer with all the other “flavor of the month” solutions.

Another route is to simply put a nice interface over the top of everything, such as ServiceNow or other similar solutions. This can enhance the employee experience by providing a “one stop shop” for information, but it does nothing to address the underlying issues. These systems have gotten quite expensive, and can run $50,000-$100,000 per year (link). The systems begin to look like ERPs in terms of price and upkeep, and eventually they become legacy systems themselves.

Others go out and acquire a “core” solution such as SAP, Oracle, or another ERP system. They hope that these solutions, together with the available extensions, will provide the same interface benefits. A company can then buy or build apps that integrate. Ultimately, these solutions are also expensive and become “black boxes” where data and its related insights are not visible to the user due to the complexity of the system. (Intentional? You decide…). So now you go out and either pay experts in the system to help you manipulate it or settle for whatever off-the-shelf capabilities and reporting you can find. (For one example of how this can go, see link).

A Better Path Forward

Many of the purveyors of these “solutions” would have you believe there is no better way forward; but those familiar with data-centricity know better. To be clear, I’m not a practitioner or technologist. I joined Semantic Arts in an HR role, and the ensuing two years have reshaped the way I see HR and especially HR information systems. I’ll give you a decent snapshot as I understand it, along with an offer: if you’re interested in the ins and outs of these things, I’d be happy to introduce you to someone who can answer your questions in greater detail.

Fundamentally, a true solution requires a mindset shift away from application silos and integration, towards a single, simple model that defines the core elements of the business, together with a few key applications that are bound to that core and speak the same language. This can be built incrementally, starting with specific use cases and expanding as it makes sense. This approach means you don’t need to have it “all figured out” from the start. With the adoption of an existing ontology, this is made even easier … but more on that later.

Once a core model is established, an organization can begin to deal methodically with legacy applications. You will find that over time many organizations go from legacy avoidance to legacy erosion, and eventually to legacy replacement. (See post on Incremental Stealth Legacy Modernization). This allows a business to slowly clean out that junk drawer and avoid filling it back up in the future (and what’s more satisfying than a clean junk drawer?).

Is this harder in the short term than traditional solutions? It may appear so on the surface, but really it isn’t. When a decision is made to start slowly, companies discover that the flexibility of semantic knowledge graphs allows for quick gains. Application development is less expensive and applications more easily modified as requirements change. Early steps help pay for future steps, and company buy-in becomes easier as stakeholders see their data come to life and find key business insights with ease.

For those who may be unfamiliar with semantic knowledge graphs, let me try to give a brief introduction. A graph database is a fundamental shift away from the traditional relational structure. When combined with formal semantics, a knowledge graph provides a method of storing and querying information that is more flexible and functional (more detail at link or link). Starting from scratch would be rather difficult, but luckily there are starter models (ontologies) available, including one we’ve developed in-house called gist, which is both free and freely available. By building on an established structure, you can avoid re-inventing the wheel.
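
As a rough sketch of what this looks like in practice, the Python/rdflib fragment below describes an HR fact against upper-ontology concepts. The gist namespace IRI and the property names shown are illustrative assumptions rather than quotations from gist; the real class and property names come from the downloaded ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

GIST = Namespace("https://w3id.org/semanticarts/ns/ontology/gist/")  # assumed IRI
EX = Namespace("https://example.com/hr/")                            # invented

g = Graph()
# g.parse("gistCore.ttl")  # the gist ontology file downloaded from Semantic Arts

# An offer letter described with upper-ontology concepts (class names assumed)
# and a couple of invented HR-level properties.
g.add((EX.offer123, RDF.type, GIST.Agreement))
g.add((EX.offer123, EX.hasParty, EX.candidate7))
g.add((EX.candidate7, RDF.type, GIST.Person))
g.add((EX.candidate7, RDFS.label, Literal("Jordan Diaz")))

print(g.serialize(format="turtle"))
```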

HR departments looking to leverage AI and large language models in the future will find this data-centric transformation even more essential, but that’s a topic for another time.

Conclusion

HR departments face unique challenges. They deal with large amounts of information and, as non-revenue-producing departments, must justify their spending. The proliferation of systems and applications drains employee morale and productivity and represents a major source of wasted budget.

By adopting data-centric principles and applying them intentionally in future purchasing and application development, HR departments can realize greater strategic insights while saving money and providing a richer employee experience.

Taken all the way to completion, adoption of these technologies and principles would mean business data stored in a single, secured location. Small apps or dashboards can be rapidly built and deployed as the business evolves. No more legacy systems, no more hidden data, no more frustration with systems that simply don’t work.

Maybe, just maybe, this model will provide a success story that leads the rest of the organization to adopt similar principles.

 

JT Metcalf is the Chief Administrative Officer at Semantic Arts, managing HR functions along with wearing many other hats.

The Data-Centric Revolution: “RDF is Too Hard”

This article originally appeared at The Data-Centric Revolution: “RDF is Too Hard” – TDAN.com. Subscribe to The Data Administration Newsletter for this and other great content!


By Dave McComb

We hear this a lot. We hear it from very smart people. Just the other day we heard someone say they had tried RDF twice at previous companies and it failed both times. (RDF stands for Resource Description Framework,[1] which is an open standard underlying many graph databases). It’s hard to convince someone like that to try again.

That particular refrain came from a Neo4j user (Neo4j being the leading contender in the LPG, or Labeled Property Graph, camp). We hear the same thing from three camps: the relational camp, the JSON camp, and the aforementioned LPG camp.

Each has a different reason for believing this RDF stuff is just too hard, and convincing those who’ve encountered setbacks to give it another shot is challenging. In this article, I’ll explore the nuances of RDF, shedding light on its challenges and strengths in the context of enterprise integration and application development.

For a lot of problems, the two-dimensional world of relational tables is appealing. Once you know the column headers, you pretty much know how to get to everything. It’s not quite one form per table, but it isn’t wildly off from that. You don’t have to worry about some of the rows having additional columns, you don’t have to worry about some cells being arrays or having additional depth. Everything is just flat, two-dimensional tables. Most reporting is just a few joins away.

JSON is a bit more interesting. At some point you discover, or decree if you’re building it, that your dataset has a structure. Not a two-dimensional structure as in relational, but more of a tree-like structure. More specifically, it’s all about determining if this is an array of dictionaries or a dictionary of arrays. Or a dictionary of dictionaries. Or an array of arrays. Or any deeply nested combination of these simple structures. Are the keys static — that is, can they be known specifically at coding time, or are they derived dynamically from the data itself? Frankly, this can get complex, but at least it’s only locally complex. A lot of JSON programming is about turning someone else’s structure into a structure that suits the problem at hand.
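
A tiny illustration of the point about shapes, with invented data: the same records can arrive as an array of dictionaries or as a dictionary of arrays, and much of the work is reshaping one into the other.

```python
# The same two records as an array of dictionaries ...
rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# ... and as a dictionary of arrays.
columns = {"id": [1, 2], "name": ["Ada", "Grace"]}

# Much JSON programming is reshaping one form into the other.
as_columns = {key: [row[key] for row in rows] for key in rows[0]}
assert as_columns == columns
```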

One way to think of LPG is as JSON on top of a graph database. It has a lot of the flexibility of JSON coupled with the flexibility of graph traversals and graph analytics. It solves problems that are difficult to solve with relational or plain JSON, and it has beautiful graphics out of the box.

Each of these approaches can solve a wide range of problems. Indeed, almost all applications use one of those three approaches to structure the data they consume.

And I have to admit, I’ve seen a lot of very impressive Neo4j applications lately. Every once in a while, I question myself and wonder aloud whether we should be using Neo4j. Not because RDF is too hard for us; we’ve mastered it and have many successful implementations running at client sites and internally. But maybe, if LPG really is easier, we should switch. And maybe it just isn’t worth disagreeing with our prospects.

Enterprise Integration is Hard

Then it struck me. The core question isn’t really “RDF v LPG (or JSON or relational),” it’s “application development v. enterprise integration.”

I’ve heard Jans Aasman, CEO of Franz, the creators of AllegroGraph, make this observation more than once: “Most application developers have dedicated approximately 0 of their neurons contemplating how what they are working on is going to fit in with the rest of their enterprise,  whereas people who are deeply into RDF may spend upwards of half their mental cycles thinking of how the task and data at hand fits into the overall enterprise model.”

That, I think, is the nub of the issue. If you are not concerned with enterprise integration, then maybe those features that scratch the itches that enterprise integration creates are not worth the added hassle.

Let’s take a look at the aspects of enterprise integration that are inherently hard, why RDF might be the right tool for the job, and why it might be overkill for traditional application development.

Complexity Reduction

One of the biggest issues in dealing with enterprise integration is complexity. Most mid-to-large enterprises harbor thousands of applications. Each application has thousands of concepts (tables and columns, or classes and attributes, or forms and fields) that must be learned to become competent in using the application or in debugging and extending it. No two application data models are alike. Even two applications in the same domain (e.g., two inventory systems) will have comically different terms, structures, and even levels of abstraction.

Each application is at about the complexity horizon that most mere mortals can handle. The combination of all those models is far beyond the ability of individuals to grasp.

Enterprise Resource Planning applications and Enterprise Data Modeling projects have shone a light on how complex it can get to attempt to model all an enterprise’s data. ERP systems now have tens of thousands of tables, and hundreds of thousands of columns. Enterprise Data Modeling fell into the same trap. Most efforts attempted to describe the union of all the application models that were in use. The complexity made them unusable.

What few of those focused on point solutions realize is that there is a single, simple model at the heart of every enterprise. It is simple enough that motivated analysts and developers can get their heads around it in a finite amount of time. And it can be mapped to the existing complex schemas in a lossless fashion.

The ability to posit these simple models is enabled by RDF (and its bigger brothers OWL and SHACL). RDF doesn’t guarantee you’ll create a simple or understandable model (there are plenty of counterexamples out there) but it at least makes the problem tractable.

Concept Sharing

An RDF-based system is mostly structure-free, so we don’t have to be concerned with structural disparities between systems, but we do need a way to share concepts. We need a way to know that “employee,” “worker,” “user,” and “operator” are all referring to the same concept – or, if they aren’t, in what ways they overlap.

In an RDF-based system we spend a great deal of time understanding the concepts that are being used in all the application systems, and then creating a way for both the meaning and the identity of each concept to be easily shared across the enterprise – and for the map between the existing application schema elements and the shared concepts to be well known and findable.

One mechanism that helps with this is the idea that concepts have global identifiers (URIs/IRIs) that can be resolved. You don’t need to know which application defined a concept; the domain name (and therefore the source authority) is right there in the identifier and can be used much like a URL to surface everything known about the concept. This is an important feature of enterprise integration.
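
A small illustration of the idea, using only the Python standard library and an invented concept IRI: the authority is readable directly from the identifier, and because the identifier is also a URL it can, by convention, be dereferenced for the full definition.

```python
from urllib.parse import urlparse

# A hypothetical shared-concept identifier; the authority is visible in the IRI.
concept_iri = "https://ontology.example.com/hr/Employee"

print(urlparse(concept_iri).netloc)  # ontology.example.com -- who governs the definition

# Because the IRI is also a URL, it can (by convention) be dereferenced to fetch
# everything published about the concept, e.g. with content negotiation:
# import urllib.request
# request = urllib.request.Request(concept_iri, headers={"Accept": "text/turtle"})
# turtle_text = urllib.request.urlopen(request).read().decode("utf-8")
```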

Instance Level Integration

It’s not just the concepts. All the instances referred to in application systems have identifiers.  But often the identifiers are local. That is, “007” refers to James Bond in the Secret Agent table, but it refers to “Ham Sandwich” in the company cafeteria system.

The fact that systems have been creating identity aliases for decades is another problem that needs to be addressed at the enterprise level. The solution is not to attempt, as many have in the past, to jam a “universal identifier” into the thousands of affected systems. It is too much work, and they can’t handle it anyway. Plus, there are many identity problems that were unpredicted at the time their systems were built (who imagined that some of our vendors would also become customers?) and are even harder to resolve.

The solution involves a bit of entity resolution, coupled with a flexible data structure that can accommodate multiple identifiers without getting confused.
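
Here is a minimal sketch of that flexible structure in Python/rdflib, using the “007” example above; the namespaces and property names are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, OWL

EX = Namespace("https://example.com/model/")              # invented shared vocabulary
PERSON = Namespace("https://example.com/person/")         # enterprise-level IRIs
MENU = Namespace("https://example.com/menuitem/")
AGENTS = Namespace("https://example.com/secretagents/")   # one application's IRIs

g = Graph()

# "007" is kept as a local identifier scoped to its source, not as identity itself.
g.add((PERSON.jamesBond, EX.identifiedBy, Literal("007")))   # secret-agent table code
g.add((MENU.hamSandwich, EX.identifiedBy, Literal("007")))   # cafeteria item code
# The two resources never collide: their IRIs, not the local codes, carry identity.

# When entity resolution concludes that two application IRIs denote the same
# individual, that finding is recorded explicitly rather than by rewriting keys.
g.add((AGENTS.agent007, OWL.sameAs, PERSON.jamesBond))
```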

Data Warehouse, Data Lake, and Data Catalog all in One

Three solutions have been mooted over the last three decades to partially solve the enterprise integration problem: data warehouses, lakes, and catalogs.  Data warehouses acknowledged that data has become balkanized.  By conforming it to a shared dimensional model and co-locating the data, we could get combined reporting.  But the data warehouse was lacking on many fronts: it only had a fraction of the enterprise’s data, it was structured in a way that wouldn’t allow transactional updates, and it was completely dependent on the legacy systems that fed it. Plus, it was a lot of work.

The data lake approach said co-location is good, let’s just put everything in one place and let the consumers sort it out. They’re still trying to sort it out.

Finally, the data catalog approach said: don’t try to co-locate the data, just create a catalog of it so consumers can find it when they need it.

The RDF model allows us to mix and match the best of all three approaches. We can conform some of the enterprise data (we usually recommend all the entity data such as MDM and the like, as well as some of the key transactional data). An RDF catalog, coupled with an R2RML or RML style map, will not only allow a consumer to find data sets of interest, in many cases they can be accessed using the same query language as the core graph. This ends up being a great solution for things like IoT, where there are great volumes of data that only need to be accessed on an exception basis.
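
As a rough illustration, the fragment below shows a minimal R2RML mapping (the rr: vocabulary is the W3C standard; the table, column, and class names are invented). Handed to an R2RML processor, a mapping like this lets rows of a relational table be surfaced as RDF, and queried, only when needed.

```python
# A minimal R2RML mapping (standard W3C rr: vocabulary; the table, column, and
# class names are invented). An R2RML processor uses it to expose rows of a
# relational table as RDF that can be queried alongside the core graph.
r2rml_mapping = """
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <https://example.com/model/> .

<#SensorReadingMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "SENSOR_READINGS" ] ;
    rr:subjectMap [
        rr:template "https://example.com/reading/{READING_ID}" ;
        rr:class ex:SensorReading
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasValue ;
        rr:objectMap [ rr:column "READING_VALUE" ]
    ] .
"""
```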

Query Federation

We hinted at query federation in the above paragraph. The fact that query federation is built into the spec (of SPARQL, which is the query language of choice for RDF, and also doubles as a protocol for federation) allows data to be merged at query time, across different database instances, different vendors and even different types of databases (with real time mapping, relational and document databases can be federated into SPARQL queries).
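
A minimal sketch of what that looks like in a query, assuming a local rdflib graph and a placeholder remote endpoint; the SERVICE clause is the SPARQL 1.1 federation mechanism.

```python
from rdflib import Graph

# A federated query: the SERVICE clause (SPARQL 1.1) asks a remote endpoint for
# matching triples and joins them with local data at query time. The endpoint
# URL and vocabulary below are placeholders.
query = """
PREFIX ex: <https://example.com/model/>
SELECT ?instrument ?rating WHERE {
  ?instrument a ex:FinancialInstrument .           # local graph
  SERVICE <https://ratings.example.com/sparql> {   # remote endpoint
    ?instrument ex:hasRating ?rating .
  }
}
"""

g = Graph()
# g.parse("local_instruments.ttl")
# for row in g.query(query):   # the engine dispatches the SERVICE block remotely
#     print(row.instrument, row.rating)
```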

Where RDF Might Be Overkill

The ability to aid enterprise integration comes at a cost. Making sure you have valid, resolvable identifiers is a lot of work. Harmonizing your data model with someone else’s is also a lot of work. Thinking primarily in graphs is a paradigm shift. Anticipating and dealing with the flexibility of schema-later modeling adds a lot of overhead. Dealing with the oddities of open world reasoning is a major brain breaker.

If you don’t have to deal with the complexities of enterprise integration, and you are consumed by solving the problem at hand, then maybe the added complexity of RDF is not for you.

But before you believe I’ve just given you a free pass consider this: half of all the work in most IT shops is putting back together data that was implemented by people who believed they were solving a standalone problem.

Summary

There are many aspects of the enterprise integration problem that lend themselves to RDF-based solutions. The very features that help at the enterprise integration level may indeed get in the way at the point solution level.

And yes, it would in theory be possible to graft solutions to each of the above problems (and more, including provenance and fine-grained authorization) onto relational, JSON or LPG. But it’s a lot of work and would just be reimplementing the very features that developers in these camps find so difficult.

If you are attempting to tackle enterprise integration issues, we strongly encourage you to consider RDF. There is a bit of a step function to learn it and apply it well, but we think it’s the right tool for the job.

Morgan Stanley, Global Fortune 100 Financial Institution, Transforms Information & Knowledge Management

CASE STUDY: Morgan Stanley, Global Fortune 100 Financial Institution, Transforms Information & Knowledge Management

Morgan Stanley is one of the largest investment banking and wealth management firms, with offices in more than 42 countries and more than 60,000 employees, ranking 67th on the 2018 Fortune 500 list of the largest US corporations by total revenue. Headquartered in New York City, the organization faced challenges in information retrieval, records retention, and legal hold capabilities, with steep compliance fines as the alternative. Securing data from outside threats is critical, but information locked up inside the friendly firewall hamstrings the business’s ability to operate, even without regulatory pressures. With worldwide data expected to swell 10-fold by 2025, a better solution was needed. Leadership at Morgan Stanley solicited several consulting experts and chose Semantic Arts to guide the strategic resolution of this massive information sprawl while enabling greater information retrieval and easier user consumption.

“Information Management,” part of the legal department, took the lead, as it was chartered with knowing about all data sets within the firm: structured, unstructured, and everything in between. A major undertaking for any group, let alone a global giant with divisions all over the world.

PROBLEM STATEMENT: Information Management determined that existing traditional architectures and relational data structures were failing to keep pace with data growth and the management of information assets. A solution that offered scale, extensibility, and an enhanced user search experience was the primary objective. Like other organizations entrenched in data silos and single ownership, information resided in many data sources (SQL, Oracle, SAP, SharePoint, Excel, PDF, videos, and shared files, to name a few), making accurate data aggregation difficult. Decades of integration have resulted in highly dependent systems and applications. In fact, changes to any data schemas were laborious coding and testing exercises that yielded little business benefit. In short, it was problematic to access the right data and costly to make even simple changes.

STRATEGY: By collaborating with Semantic Arts, experts in data and digital transformation, a data strategy was established for better information management. After lengthy evaluation, implementation of Semantic Knowledge Graphs and a flexible Ontology that could accommodate future information growth was chosen. It offered strategic value for supporting numerous domain areas simultaneously, including risk management, regulatory compliance, asset management, and adviser information retrieval, while linking data from each domain. Additionally, an important advantage of a Semantic Knowledge Graph approach is its architectural capacity for nearly limitless extensibility across the enterprise for reuse. This factored into the long-term reasoning and vision of becoming Data-Centric.

APPROACH: Starting strategic initiatives like this can be particularly tricky in that achieving a balance between building a foundation for future success and immediate results can be a high wire act in organizational politics. With the advice of Semantic Arts, a “Think Big and Start Small” initial phase of work was proposed and accepted. This involved building a core Ontology in parallel with a Domain model, whereby both will be connected for building data relationships in future phases. This strategy will address the mission of contextually enriching the data organizationally, which in turn can be leveraged for greater insights in making business decisions and improved data governance.


RESULTS: A small team of consultants and Morgan Stanley SMEs assembled for a 6-month assignment. During the engagement, results came quickly. Within the initial weeks, after loading the data into a triplestore and applying some very simplistic natural language processing routines, the team took the firm from 0.5% of information tagged to 25%, a 50-fold increase in information classification with relatively nominal effort.

By incorporating the Semantic Arts strategy of instituting a flexible Ontology and Knowledge Graphs, the improved visibility and harmonization of information across multiple data sets quickly captured the attention of business capability owners. Remarkably, the existing technologies had achieved only 1% accuracy. Collaboratively, the team captured hundreds of regulatory jurisdictions used for promulgating rules. Linking this data with billions of internal documents from disparate databases provided contextual information surrounding a document or repository, enabling a self-assembling capability. Previously, aggregation was manually driven, inaccurate, clumsy and time-consuming.

OTHER DOMAINS JOIN IN: Follow-up engagements with Equity Research and Operations Resiliency soon followed as the changes made a tangible impact. Those domain teams have taken on smaller use cases to answer difficult questions while leveraging the functionality of the core Ontology foundation developed by the Semantic Arts consultants during the first initiative. The inherent nature of Knowledge Graphs, linking data relationships, can transform into a Siri-like experience, offering answers and recommendations and learning when tied to AI capabilities. Furthermore, the information within the graph gains contextual value because it is connected, resulting in a single model. The business value of capturing knowledge to expand wisdom multiplies as the connections between domains become realized. Beginnings are taking form: the removal of data silos, of data replication, and of costly integration of application functionality.

CHANGING WALL STREET: Combining a strategic data plan with Knowledge Graphs as a companion solution is making a difference. Wall Street reports are now being unlocked with AskResearch chatbot capabilities that extract value by delivering hard-to-find information from hundreds of data sources. With coaching in best-practice Ontology development, the Equities Research team has successfully continued expanding the graph’s initial use case.

“You have this historical archive sitting in a library and there is so much value embedded in it, but traditionally it has been hard to unlock that value because insights and data are fixed in monolithic PDFs.” – D’Arcy Carr, the global head of research, editorial, and publishing

Claims of future time savings (in billions per year) are hard to quantify, but usage of the chatbot is clearly and steadily increasing. Leveraging Knowledge Graphs as the backbone for information retrieval was critical for intuitive search functionality and for realizing a self-service capability for users.

FINANCIAL INDUSTRY INFORMATION FUTURE: According to Forbes, the ability to leverage AI and Machine Learning in tandem with Knowledge Graphs is the future of the financial industry. Their use will soon shift from a competitive edge to a must-have. Further discussions between Semantic Arts and Marketing and HR innovators at Morgan Stanley are in flight, with more dynamic results pending.

Semantic Arts provides professional management consulting services for untangling the ad hoc patchwork of systems integration and turbo-charging new Knowledge and Information initiatives. We call it the “Data-Centric Revolution”: it inverts the dependency between data models and application code, so that, in short order, the code becomes dependent on the shared information model. Join the Revolution!

gist Jumpstart

This blog post is for anyone responsible for Enterprise data management who would like to save time and costs by re-using a great piece of modeling work. It updates an earlier blog post, “A brief introduction to the gist semantic model”.

A core semantic model, also called an upper ontology, is a common model across the Enterprise that includes major concepts such as Event, Agreement, and Organization. Using an upper ontology greatly simplifies data integration across the Enterprise. Imagine, for example, being able to see all financial Events across your Enterprise; that kind of visibility would be a powerful enabler for accurate financial tracking, planning, and reporting.

If you are ready to incorporate semantics into your data environment, consider using the gist upper ontology. gist is available for free from Semantic Arts under a Creative Commons license. It is based on more than a hundred data-centric projects done with major corporations in a variety of lines of business. gist “is designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and the least amount of ambiguity.” The Wikipedia entry for upper ontologies compares gist to other ontologies and gives a good sense of why gist is a match for Enterprise data management: it is comprehensive, unambiguous, and easy to understand.

 

So, what exactly is in gist?

First, gist includes types of things (classes) involved in running an Enterprise. Some of the more frequently used gist classes, grouped for ease of understanding, are:

Some of these classes have subclasses that are not shown. For example, an Intention could be a Goal or a Requirement.

Gist also includes properties that are used to describe things and to describe relationships between things. Many of the gist properties can be approximately grouped as above:

Other commonly used gist properties include:

Next, let’s look at a few typical graph patterns that illustrate how classes and properties work together to model the Enterprise world.

An Account might look like:

An Event might look like:

An ID such as a driver’s license might look like:
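
Since the original diagrams are not reproduced here, the Python/rdflib sketch below gives a rough sense of the Event and ID patterns. The gist namespace IRI, class names, and property names are illustrative assumptions rather than quotations from gist; consult the ontology itself for the actual terms.

```python
from rdflib import Graph, Literal, Namespace, RDF, XSD

GIST = Namespace("https://w3id.org/semanticarts/ns/ontology/gist/")  # assumed IRI
EX = Namespace("https://example.com/data/")                          # invented

g = Graph()

# Roughly, an Event pattern: something that happens at a time with participants.
g.add((EX.payment42, RDF.type, GIST.Event))
g.add((EX.payment42, EX.occursAtDateTime,
       Literal("2024-03-01T09:30:00", datatype=XSD.dateTime)))    # property name assumed
g.add((EX.payment42, EX.hasParticipant, EX.acmeCorp))             # property name assumed
g.add((EX.acmeCorp, RDF.type, GIST.Organization))

# Roughly, an ID pattern: an identifier that identifies something and has a value.
g.add((EX.license123, RDF.type, GIST.ID))                         # class name assumed
g.add((EX.license123, EX.identifies, EX.jordanDiaz))              # property name assumed
g.add((EX.license123, EX.hasIDValue, Literal("D123-4567-8901")))  # property name assumed

print(g.serialize(format="turtle"))
```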

To explore gist in more detail, you can view it in an ontology editor such as Protégé. Try looking up the Classes and Properties in each group above (who, what, where, why, etc.). Join the gist Forum for regular discussion and updates.

Take a look at gist. It’s worth your time, because adopting gist as your upper ontology can be a significant step toward reversing the proliferation of data silos within your Enterprise.

Further reading and videos:

3-part video introduction to gist:

  1. https://www.youtube.com/watch?v=YbaDZSuhm54&t=123s
  2. https://www.youtube.com/watch?v=UzNVIErpGpQ&t=206s
  3. https://www.youtube.com/watch?v=2g0E6cFro18&t=14s

Software Wasteland, by Dave McComb

The Data-Centric Revolution, by Dave McComb

Demystifying OWL for the Enterprise, by Michael Uschold

 

Diagrams in this blog post were generated using a visualization tool.

Data-Centric Revolution: Is Knowledge Ontology the Missing Link?

“You would think that after knocking around in semantics and knowledge graphs for over two decades I’d have had a pretty good idea about Knowledge Management, but it turns out I didn’t.

I think in the rare event the term came up I internally conflated it with Knowledge Graphs and moved on. The first tap on the shoulder that I can remember was when we were promoting work on a mega project in Saudi Arabia (we didn’t get it, but this isn’t likely why). We were trying to pitch semantics and knowledge graphs as the unifying fiber for the smart city the Neom Line was to become.

In the process, we came across a short list of Certified Knowledge Management platforms they were considering. Consider my chagrin when I’d never heard of any of them. I can no longer find that list, but I’ve found several more since…”

Read the rest: Data Centric Revolution: Is Knowledge Ontology the Missing Link? – TDAN.com

Interested in joining the discussion? Join the gist Forum (Link to register here)

A Knowledge Graph for Mathematics

This blog post is for anyone interested in mathematics and knowledge representation, and in how the two connect to career progression in today’s changing information ecosystem. Mathematics and knowledge representation have a strong common thread: they both require finding good abstractions and simple, elegant solutions, and they both have a foundation in set theory. The topic could serve as the starting point for an accessible academic research project that deals with the foundations of mathematics while also developing commercially marketable knowledge representation skills.

Hypothesis: Could the vast body of mathematical knowledge be put into a knowledge graph? Let’s explore, because doing so could provide a searchable database of mathematical concepts and help identify previously unrecognized connections between concepts.

Every piece of data in a knowledge graph is a semantic triple of the form:

subject – predicate – object.

A brief look through mathematical documentation reveals the frequent appearance of semantic triples of the form:

A implies B, where A and B are statements.

“A implies B” is itself a statement, equivalent to “If A then B”. Definitions, axioms, and theorems can be stated using these if/then statements. The if/then statements build on each other, starting with a foundation of definitions and axioms (statements so fundamental they are made without proof). Furthermore, the predicate “implies” is transitive, meaning an “implies” relationship can be inferred from a chain of other “implies” relationships.

…. hence the possibility of programmatically discovering relationships between statements.
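To make that concrete, here is a small hypothetical sketch: if every statement is a node in the graph and a property such as ex:implies links statements, a SPARQL property path can walk chains of implications of any length. The prefix, property, and statement IRIs below are made up for illustration only.

prefix ex: <https://example.com/math/>

# Find every statement ?b that follows from ex:StatementA
# through a chain of one or more ex:implies links.
select ?b
where {
  ex:StatementA ex:implies+ ?b .   # "+" means one or more hops (transitive closure)
}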

Before speculating further, let’s examine two examples from the field of point set topology, which deals abstractly with concepts like continuity, connectedness, and compactness.

Definition: a collection of sets T is a topology if and only if the following are true:

• the union of sets in any subcollection of T is a member of T
• the intersection of sets in any finite subcollection of T is a member of T.

Problem: Suppose there is a topology T and a set X that satisfies the following condition:

• for every member x of X there is a set Tx in T with x in Tx and Tx a subset of X.

Show that X is a member of T.

Here’s a diagram showing the condition stated in the problem, which holds for every x in X:

Perhaps you can already see what happens if we take the union of all of the Tx’s, one for each x in X.

In English, the solution to the problem is:

The union of all sets Tx is a subset of X because every Tx is a subset of X.

The union of all sets Tx contains X because there is a Tx containing x, for every x in X.

Based on the two statements above, the union of all sets Tx equals X because it is both a subset and a superset of X.

Finally, since every Tx belongs to T, the union of all sets Tx (which is X) is a member of T.
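For readers who prefer symbols, the whole argument compresses to one line:

\[
\bigcup_{x \in X} T_x \subseteq X
\quad\text{and}\quad
X \subseteq \bigcup_{x \in X} T_x
\quad\Longrightarrow\quad
X = \bigcup_{x \in X} T_x \in T .
\]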

Let’s see how some of this might look in a knowledge graph. According to the definition of topology:

Applying this pattern to the problem above, we find:

While it may seem simple to recognize the sameness of the patterns on the left side of the two diagrams above, what precisely is it that makes the pattern in the problem match the pattern in the definition of topology? The definition applies because both left-hand statements conform to the same graph pattern:

This graph pattern consists of two triple patterns, each of which has the form:

[class of the subject] – predicate – [class or datatype of the object].

We now have the beginnings of a formal ontology based on triple patterns that we have encountered so far. Statements, including complex ones, can be represented using triples.

Note: in the Web Ontology Language, the properties hasSubject, hasPredicate, and hasObject will need to be annotation properties (they can be used in queries but will not be part of automated inference).
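As a minimal sketch of that representation (the IRIs are made up; only the property names hasSubject, hasPredicate, and hasObject come from the note above), the statement “X is a member of T” might be recorded as:

prefix ex: <https://example.com/math/>

insert data {
  # The statement "X is a member of T", reified as a node so that
  # other statements (for example, implications) can refer to it.
  ex:Statement_X_isMemberOf_T  a  ex:Statement ;
      ex:hasSubject   ex:SetX ;
      ex:hasPredicate ex:isMemberOf ;
      ex:hasObject    ex:CollectionT .
}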

Major concepts can be represented as classes:

It’s generally good practice to use classes for major concepts, while using other methods, such as categories, to model the remaining distinctions.

Other triple patterns we have seen describe a variety of relationships between sets and collections of sets, summarized as:

Could the vast body of mathematical knowledge be put into a knowledge graph? Certainly a substantial amount of it can: the part that can be expressed as “A implies B”.

However, much remains to be done. For example, we have not looked at how to distinguish between a statement that is asserted to be true versus, for example, a statement that is part of an “if” clause.

Or imagine a math teacher on Monday saying “x + 3 = 7” and on Tuesday saying “x – 8 = 4”. In a knowledge graph, every thing has a unique permanent ID, so if x is 4 on Monday, it is still 4 on Tuesday. Perhaps there is a simple way to bridge the typical mathematical re-use of non-specific names like “x” and the knowledge graph requirement of unique IDs; finding it is left to the reader.

For a good challenge, try stating the Urysohn Lemma using triples, and see how much of its proof can be represented as triples and triple patterns.

To understand modeling options within the Web Ontology Language (OWL), I refer the reader to the book Demystifying OWL for the Enterprise by Michael Uschold. The serious investigator might also want to explore the semantics of RDF* (also known as RDF-star), since it explicitly deals with the semantics of statements.

Special thanks to Irina Filitovich for her insights and comments.

The ABCs of QUDT

This blog post is for anyone interested in understanding units of measure for the physical world.

The dominant standard for units of measure is the International System of Units, part of a collaborative effort that describes itself as:

Working together to promote and advance the global comparability of measurements.

While the International System of Units is defined in a document, QUDT has taken the next step and defined an ontology and a set of reference data that can be queried via a public SPARQL endpoint. QUDT provides a wonderful resource for data-centric efforts that involve quantitative data.

QUDT is an acronym for Quantities, Units, Dimensions, and Types. With 72 classes and 178 properties in its ontology, QUDT may at first appear daunting. In this note, we will use a few simple SPARQL queries to explore the QUDT graph. The main questions we will answer are:

  1. What units are applicable for a given measurable characteristic?
  2. How do I convert a value from one unit to another?
  3. How does QUDT support dimensional analysis?
  4. How can units be defined in terms of the International System of Units?

Let’s jump right in. Please follow along as a hands-on exercise. Pull up the QUDT web site at:

https://qudt.org/

On the right side of the QUDT home page select the link to the QUDT SPARQL Endpoint where we can run queries:

From the SPARQL endpoint, select the query option.

Question 1: What units are applicable for a given measurable characteristic?

First, let’s look at the measurable characteristics defined in QUDT. Copy-paste this query into the SPARQL endpoint:

prefix qudt: <http://qudt.org/schema/qudt/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?qk
where { ?qk rdf:type qudt:QuantityKind . }
order by ?qk

 

QUDT calls the measurable characteristics QuantityKinds.

Note that there is a Filter box that lets us search the output.

Type “acceleration” into the Filter box and then select the first value, Acceleration, to get a new tab showing the properties of Acceleration. Voila, we get a list of units for measuring acceleration:

Now to get a complete answer to our first question, just add a line to the query:

prefix qudt: <http://qudt.org/schema/qudt/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?qk ?unit
where {
  ?qk rdf:type qudt:QuantityKind ;
      qudt:applicableUnit ?unit ;  # new line
  .
}
order by ?qk ?unit

The output shows the units of measure for each QuantityKind.
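If we only care about one QuantityKind, such as Acceleration, a values clause (the same device used in the queries that follow) narrows the result. This variant is our own, not from the QUDT documentation:

prefix qudt: <http://qudt.org/schema/qudt/>

select ?unit
where {
  values ?qk { <http://qudt.org/vocab/quantitykind/Acceleration> }
  ?qk qudt:applicableUnit ?unit .
}
order by ?unit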

Question 2: How do I convert a value from one unit to another?

Next, let’s look at how to do a unit conversion from feet to yards, with meter as an intermediary:

To convert from feet to meters, multiply by 0.3048. Then to convert from meters to yards, divide by 0.9144. Therefore, to convert from feet to yards, first multiply by 0.3048 and then divide by 0.9144. For example:

27 feet = 27 x (0.3048/0.9144) yards

= 9 yards

The 0.3048 and 0.9144 are in QUDT as the conversionMultipliers of foot and yard, respectively. You can see them with this query:

prefix qudt: <http://qudt.org/schema/qudt/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?unit ?multiplier
where {
  values ?unit {
    <http://qudt.org/vocab/unit/FT>
    <http://qudt.org/vocab/unit/YD> }
  ?unit qudt:conversionMultiplier ?multiplier .
}

This example of conversionMultipliers answers our second question; to convert values from one unit of measure to another unit of measure, first multiply by the conversionMultiplier of the “from” unit and then divide by the conversionMultiplier of the “to” unit. [note: for temperatures, offsets are also needed]
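As a quick check of that rule, the following query converts 27 feet to yards directly on the endpoint and should return 9. The arithmetic in the bind is ours, not something defined by QUDT:

prefix qudt: <http://qudt.org/schema/qudt/>

select ?feet ?yards
where {
  values ?feet { 27 }
  <http://qudt.org/vocab/unit/FT> qudt:conversionMultiplier ?multFrom .  # 0.3048
  <http://qudt.org/vocab/unit/YD> qudt:conversionMultiplier ?multTo .    # 0.9144
  bind(?feet * ?multFrom / ?multTo as ?yards)  # multiply by the "from" multiplier, divide by the "to" multiplier
}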

Question 3: How does QUDT support dimensional analysis?

To answer our third question we will start with a simple example:

Force = mass x acceleration

In the following query, we retrieve the exponents of Mass, Acceleration, and Force to validate that Force does indeed equal Mass x Acceleration:

prefix qudt: <http://qudt.org/schema/qudt/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?qk ?dv ?exponentForMass ?exponentForLength ?exponentForTime
where {
  values ?qk {
    <http://qudt.org/vocab/quantitykind/Mass>
    <http://qudt.org/vocab/quantitykind/Acceleration>
    <http://qudt.org/vocab/quantitykind/Force> }
  ?qk qudt:hasDimensionVector ?dv .
  ?dv qudt:dimensionExponentForMass   ?exponentForMass ;
      qudt:dimensionExponentForLength ?exponentForLength ;
      qudt:dimensionExponentForTime   ?exponentForTime ;
  .
}

Recall that to multiply “like terms” with exponents, add the exponents, e.g.

length^1 x length^2 = length^3

In the QUDT output, look at the columns for Mass, Length, and Time. Note that in each column the exponents associated with Mass and Acceleration add up to the exponent associated with Force, as expected.
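Spelled out for this example, with M, L, and T standing for the Mass, Length, and Time exponents:

\[
\underbrace{M^{1}L^{0}T^{0}}_{\text{Mass}}
\times
\underbrace{M^{0}L^{1}T^{-2}}_{\text{Acceleration}}
= M^{1+0}\,L^{0+1}\,T^{0-2}
= \underbrace{M^{1}L^{1}T^{-2}}_{\text{Force}}
\]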

Question 4: How can units be defined in terms of the International System of Units?

Finally, we want to see how QUDT can be used to define units in terms of the base units of the International System of Units as defined in the SI Brochure. We want to end up with equations like:

1 inch = 0.0254 meters

1 foot per second squared = 0.3048 meters per second squared

1 pound per cubic yard = 0.5932764212577829 kilograms per cubic meter

Delving deeper into QUDT, we see the concept of QuantityKindDimensionVector. Every unit and every quantity kind is related to one of these QuantityKindDimensionVectors.

Let’s unpack what that means by way of an example, showing that the dimension vector A0E0L1I0M0H0T-2D0 means Length x Time^-2 (linear acceleration):

Start with dimension vector: A0E0L1I0M0H0T-2D0

Each letter stands for a base dimension, and the vector can also be written as:

Amount^0 x ElectricCurrent^0 x Length^1 x Intensity^0 x Mass^0 x Heat^0 x Time^-2 x Other^0

Every term with an exponent of zero equals 1, so this expression can be reduced to:

Length x Time^-2 (also known as Linear Acceleration)

The corresponding expression in terms of base units of the International System of Units is:

Meter x Second^-2 (the standard unit for acceleration)

… which can also be written as:

meter per second squared

Using this example as a pattern, we can proceed to query QUDT to get an equation for each QUDT unit in terms of base units. To reduce the size of the query we will focus on mechanics, where the base dimensions are Mass, Length, and Time and the corresponding base units are kilogram, meter, and second.

Here is the query to create the equations we want; run it on the QUDT SPARQL Endpoint and see what you get:

prefix qudt: <http://qudt.org/schema/qudt/>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

select distinct ?equation
where {
  ?unit rdf:type qudt:Unit ;
        qudt:conversionMultiplier ?multiplier ;
        qudt:hasDimensionVector ?dv ;
        rdfs:label ?unitLabel ;
  .
  ?dv qudt:dimensionExponentForMass   ?expKilogram ;  # translate to units
      qudt:dimensionExponentForLength ?expMeter ;
      qudt:dimensionExponentForTime   ?expSecond ;
      rdfs:label ?dvLabel ;
  .
  filter(regex(str(?dv), "A0E0L.*I0M.*H0T.*D0"))   # mechanics
  filter(!regex(str(?dv), "A0E0L0I0M0H0T0D0"))
  filter(?multiplier > 0)
  bind(str(?unitLabel) as ?unitString)
  # to form a label for the unit:
  #    put positive terms first
  #    omit zero-exponent terms
  #    change exponents to words
  bind(if(?expKilogram > 0, concat("_kilogram_", str(?expKilogram)), "") as ?SiUnitTerm4)
  bind(if(?expMeter    > 0, concat("_meter_",    str(?expMeter)),    "") as ?SiUnitTerm5)
  bind(if(?expSecond   > 0, concat("_second_",   str(?expSecond)),   "") as ?SiUnitTerm7)
  bind(if(?expKilogram < 0, concat("_kilogram_", str(-1 * ?expKilogram)), "") as ?SiUnitTerm104)
  bind(if(?expMeter    < 0, concat("_meter_",    str(-1 * ?expMeter)),    "") as ?SiUnitTerm105)
  bind(if(?expSecond   < 0, concat("_second_",   str(-1 * ?expSecond)),   "") as ?SiUnitTerm107)
  bind(concat(?SiUnitTerm4,   ?SiUnitTerm5,   ?SiUnitTerm7)   as ?part1)
  bind(concat(?SiUnitTerm104, ?SiUnitTerm105, ?SiUnitTerm107) as ?part2)
  bind(if(?part2 = "", ?part1,
       if(?part1 = "", concat("per", ?part2),
       concat(?part1, "_per", ?part2))) as ?SiUnitString1)
  bind(replace(?SiUnitString1,  "_1_|_1$",   "_")             as ?SiUnitString2)
  bind(replace(?SiUnitString2,  "_2_|_2$",   "Squared_")      as ?SiUnitString3)
  bind(replace(?SiUnitString3,  "_3_|_3$",   "Cubed_")        as ?SiUnitString4)
  bind(replace(?SiUnitString4,  "_4_|_4$",   "ToTheFourth_")  as ?SiUnitString5)
  bind(replace(?SiUnitString5,  "_5_|_5$",   "ToTheFifth_")   as ?SiUnitString6)
  bind(replace(?SiUnitString6,  "_6_|_6$",   "ToTheSixth_")   as ?SiUnitString7)
  bind(replace(?SiUnitString7,  "_7_|_7$",   "ToTheSeventh_") as ?SiUnitString8)
  bind(replace(?SiUnitString8,  "_8_|_8$",   "ToTheEighth_")  as ?SiUnitString9)
  bind(replace(?SiUnitString9,  "_9_|_9$",   "ToTheNinth_")   as ?SiUnitString10)
  bind(replace(?SiUnitString10, "_10_|_10$", "ToTheTenth_")   as ?SiUnitString11)
  bind(replace(?SiUnitString11, "^_", "") as ?SiUnitString12)  # tidy up
  bind(replace(?SiUnitString12, "_$", "") as ?SiUnitString13)
  bind(?SiUnitString13 as ?SiUnitLabel)
  bind(concat("1 ", str(?unitLabel), " = ", str(?multiplier), "  ", ?SiUnitLabel) as ?equation)
}
order by ?equation

The result of this query is a set of equations that tie more than 1200 units back to the base units of the International System of Units, which in turn are defined in terms of seven fundamental physical constants.

And that’s a wrap. We answered all four questions with only 3 QUDT classes and 6 QUDT properties:

  1. What units are applicable for a given measurable characteristic?
  2. How do I convert a value from one unit to another?
  3. How does QUDT support dimensional analysis?
  4. How can units be defined in terms of the International System of Units?

For future reference, here’s a map of the territory we explored:

One final note: kudos to everyone who contributed to QUDT; it has a lot of great information in one place. Thank you!

The Data-Centric Revolution: Zero Copy Integration

I love the term “Zero Copy Integration.” I didn’t come up with it; the Data Collaboration Alliance did. The Data Collaboration Alliance is a Canadian-based advocacy group promoting localized control of data along with federated access.

What I like about the term is how evocative it is. Everyone knows that all integration consists of copying and transforming data, whether you do that through an API, through ETL (Extract, Transform, and Load), or through Data Lake style ELT (Extract, Load, and leave it to someone else to maybe eventually Transform). Either way, we know from decades of experience that integration is, at its core, copying data from a source to a destination.

This is why “copy-less copying” is so evocative.  It forces you to rethink your baseline assumptions.

We like it because it describes what we’ve been doing for years, and never had a name for. In this article, I’m going to drill a bit deeper into the enabling technology (i.e., what do you need to have in place to get Zero Copy Integration to work), then do a case study, and finally wrap up with “do you literally mean zero copy?”

 

Read more at: The Data-Centric Revolution: Zero Copy Integration – TDAN.com