Investment Bank: Economic Architecture

We worked with a large investment bank that was embarking on a series of projects to further automate its back office. One of the first tasks was to understand in greater detail what all 5,000 people in the back office were doing. The bank built an “Economic Architecture” that was essentially the equivalent of a continually running Activity-Based Costing project.

They asked managers to estimate the percentage of time each of their reports spent on a standard list of activities. However, the activity list was not stabilizing, and many managers had difficulty deciding which of the many activities to use. This mattered because the list was slated to become part of management reporting, and perhaps eventually the basis for charging back to the front office the cost of the activities performed to settle some of these very complex instruments.

We were called in to create a rational basis for the activity taxonomy. We ended up decomposing the working list of roughly 800 activities into a set of orthogonal facets. What was fascinating was that the facets were far simpler than the long, complex list of activities. Once someone knew the facets (such as financial product and market, along with a simple set of verbs and modifiers), they knew what all the activities were, as each activity was just a concatenation of facet values.

More interestingly, we discovered along the way that the facets provided a level of categorization that could be instrumented in the workflow and source systems. The list of 800 activities was too arbitrary to allow for automation, but the facets aligned closely with primitive concepts found in most systems.
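To make the idea concrete, here is a minimal sketch in Turtle of how an activity decomposes into facets. The vocabulary (my:hasProduct, my:Equity, and so on) is hypothetical, not the bank's actual taxonomy:

@prefix my: <https://example.com/activities#> .

# Facet values: a verb, a financial product, and a market.
my:Settle    a my:ActivityVerb .
my:Equity    a my:FinancialProduct .
my:USMarket  a my:Market .

# An entry from the 800-item activity list is just a combination of facets.
my:SettleUSEquityTrade
    a my:BackOfficeActivity ;
    my:hasVerb    my:Settle ;
    my:hasProduct my:Equity ;
    my:hasMarket  my:USMarket .

Because each facet is something a workflow or source system can record directly, time captured against facets can be rolled up to any activity without maintaining the 800-item list by hand.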

We completed the redefinition and got agreement on the new activities. The new activities are in production, and the bank is looking at applying this concept beyond its back-office operations.

Contact Us: 

Overcome integration debt with proven semantic solutions. 

Contact Semantic Arts, the experts in data-centric transformation, today! 

CONTACT US HERE 

Address: Semantic Arts, Inc. 

123 N College Avenue Suite 218 

Fort Collins, CO 80524 

Email: [email protected] 

Phone: (970) 490-2224

Washington State: SOA Design and Ontology 

In our initial engagement, we did a rapid but detailed review of 200 applications, interfaces, current initiatives, the long-range plan, and a newly proposed system. We found several areas where the state could leverage work in progress to speed up its new project initiative, and several areas where, with a slight change in scope and priority, the new initiatives would actually reduce the amount of redundancy and inconsistency.

We helped them build a high-fidelity depiction of their current “as-is” state. The content of an existing, largely unread 400-page report was rendered, and substantially updated, as a single very large graphic of the as-is condition. We then worked with them to define their long-term SOA architecture with shared services.

Ontology Consultant Job Description

As a Semantic Arts ontologist, you will be essential in fixing the tangled mess of information in client enterprise systems and in promoting a world where enterprise information is widely understood and easily accessed by all who have permission. Come work with the best in the business on interesting projects with global leaders!

Working together with other ontology consultants, you will take existing design artifacts and work with subject matter experts to convert models into formal semantic expressions for clients. We work with a diverse set of clients of all sizes and across industries, so you can expect a variety of work across many domains of knowledge. We have a strong sense of team, with no rigid hierarchy, and place a high value on individual input.

Requirements: 

  • A passion for information and knowledge modeling 
  • Trained in ontology development, either through formal education or on the job.
  • Should have experience in data modeling or related analytical skills. 
  • Strong interpersonal communication skills, experience managing client, stakeholder, and internal interactions. 
  • Experience in OWL, RDF, SPARQL and the ability to program against triplestores
  • A desire to learn new domains of knowledge in a fast-paced environment.
  • Bachelor’s degree in Computer Science, Information Systems, Knowledge Management,  Engineering, Philosophy, Business, or similar. 

Nice to Have: 

  • Prior use or understanding of W3C semantic web standards 
  • An advanced academic degree.

About Us: 

We have been promoting a vision of Data-Centric Architecture for more than 20 years, and people are catching on! We have been awarded the 2022 “Colorado Companies to Watch”, 2022 “Top 30 Innovators of the Year”, 2021 “30 Innovators to Watch”, and 2020 “30 Best Small Companies to Watch”. Semantic Arts is growing quickly and expanding our domains, projects, and roles. We have assembled what might be the largest team of individuals passionately dedicated to this task, making Semantic Arts a great place to develop skills and grow professionally in this exciting field.

What We Offer: 

  • Remote Position, with travel for onsite work with clients required (up to 3 days every 3  weeks) 
  • Professional development fund to develop skills, attend conferences, and advance your career. 
  • Medical, Dental, and Vision Benefits 
  • SIMPLE IRA with company match 
  • Student Loan Reimbursement
  • Annual Bonus Potential 
  • Equipment Purchase Assistance 
  • Employee Assistance Program 

Employment Type: 

Full-time 

Authorization: 

Candidates must be authorized to work for any employer within the US, UK, or Canada. We are not currently able to sponsor visas or hire outside of those countries. 

Compensation: 

Compensation for this position varies based on experience, billable utilization, and other factors. Entry-level ontologists start around $70,000 USD annually and generally rise quickly, with the overall average being approximately $150,000 USD and about a third of consultants averaging more than $175,000 USD. More details are shared during the interview process.

Semantic Arts is committed to the full inclusion of all qualified individuals. In keeping with our commitment, we will take steps to ensure that people with disabilities are provided reasonable accommodations. Accordingly, if a reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential duties of the position, and/or to receive all other benefits and privileges of employment, please contact our HR representative at [email protected].

Semantic Arts is an Equal Opportunity Employer. We respect and seek to empower each  individual and support the diverse cultures, perspectives, skills, and experiences within our  workforce. We support an inclusive workplace where employees excel based on merit,  qualifications, experience, ability, and job performance.

Amgen: Data Centric Architecture

Amgen is a large biotechnology company committed to unlocking the potential of biology for patients suffering from serious illnesses by discovering, developing, manufacturing, and delivering innovative human therapeutics. Amgen CEO Bob Bradway focuses on innovation to set the cultural direction. According to Bradway: “Push the boundaries of biotechnology and knowledge to be part of the process of changing the practice of medicine.”

Amgen’s goal is to provide life-changing value to patients quickly. Democratized access to enterprise data speeds the process from drug discovery to drug delivery. One thing Amgen’s strategic data leadership agreed upon is that a common language expedites product development by removing the ambiguities that slow business processes.

Data comes from a multitude of information systems, each using its own data model and unique vocabularies. Different systems use different terminology to refer to the same concept. An organization steeped in data silos no longer works. The challenge is to provide a common, intuitive model for all systems and people to use. Once such a model is in place, it is no longer laborious and expensive for enterprise consumers to benefit from the data. From this came the decision to establish a semantic layer for building an enterprise data fabric.

Amgen developed a vision of a Data-Centric Architecture (DCA) that transforms data from being system-specific to being universally available. Data is organized and unambiguously represented in data domains within a Semantic layer. 

Extending an Upper-Level Ontology

If you have been following my blogs over the past year or so, then you will know I am a big  fan of adopting an upper-level ontology to help bootstrap your own bespoke ontology  project. Of the available upper-level ontologies I happen to like gist as it embraces a “less is more” philosophy. 

Given that this is 3rd party software with its own lifecycle, how does one “merge” such an upper ontology with your own? Like most things in life, there are two primary ways. 

CLONE MODEL 

This approach is straightforward: simply clone the upper ontology and then modify/extend it directly as if it were your own (being sure to retain any copyright notice). The assumption here is that you will change the “gist” namespace into something else like “mydomain”. The benefit is that you don’t risk any 3rd-party updates affecting your project down the road. The downside is that you lose out on the latest enhancements and improvements over time, which, if you wish to adopt them, you would have to manually refactor into your own ontology.

As the inventors of gist have many dozens of man-years of hands-on experience with  developing and implementing ontologies for dozens of enterprise customers, this is not an  approach I would recommend for most projects. 

EXTEND MODEL 

Just as you extend any 3rd-party software library in your own namespace, you should extend an upper-level ontology in your own namespace as well. This involves just a couple of simple steps:

First, declare your own namespace as an owl:Ontology, then import the 3rd-party upper-level ontology (e.g. gist) into that ontology. Something along the lines of this:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://ont.mydomain.com/core>
    a owl:Ontology ;
    owl:imports <https://ontologies.semanticarts.com/o/gistCore11.0.0> .

Second, define your “extended” classes and properties, referencing the appropriate gist classes and properties in subclass, subproperty, domain, and/or range assertions as needed. A few samples are shown below (where “my” is the prefix for your ontology namespace):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix my:   <https://ont.mydomain.com/core#> .
@prefix gist: <https://ontologies.semanticarts.com/gist/> .  # check the namespace IRI for your gist version

my:isFriendOf
    a owl:ObjectProperty ;
    rdfs:domain gist:Person .

my:Parent
    a owl:Class ;
    rdfs:subClassOf gist:Person .

my:firstName
    a owl:DatatypeProperty ;
    rdfs:subPropertyOf gist:name .

The above definitions allow you to update to new versions of the upper-level ontology* without losing any of your extensions. Simple, right?

*When a 3rd party upgrades the upper-level ontology to a new major version (defined as non-backward-compatible), you may find changes that need to be made to your extension ontology. As a hypothetical example, if Semantic Arts decided to remove the class gist:Person, the assertions made above would no longer be compatible. Fortunately, when it comes to major updates, Semantic Arts has consistently provided a set of migration scripts that assist with updating your extended ontology as well as your instance data. Other 3rd parties may or may not follow suit.
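One practical payoff of the extension approach: because your terms are tied to gist with subclass and subproperty axioms, queries written against the gist terms will also return your extended data once inference is enabled. A minimal sketch in SPARQL, assuming the definitions above and a triplestore with RDFS reasoning turned on:

PREFIX gist: <https://ontologies.semanticarts.com/gist/>

# With inference, instances of my:Parent come back as gist:Person,
# and my:firstName values come back as gist:name values.
SELECT ?person ?name
WHERE {
  ?person a gist:Person ;
          gist:name ?name .
}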

Understanding the Graph Center of Excellence

The Knowledge Management community has gotten good at extracting and curating  knowledge. 

There is a confluence of activity, including generative AI models, digital twins, and shared-ledger capabilities, that is having a profound impact on enterprises. Recent research by analysts at Gartner places contextualized information and graph technologies at the center of their impact radar for emerging technologies. This recognition of the importance of these critical enablers to define, contextualize, and constrain data for consistency and trust is all part of the maturity process for today’s enterprise. It is also beginning to shine a light on the emergence of the Graph Center of Excellence (CoE) as an important contributor to achieving strategic objectives.

For companies that are ready to make the leap from being application-centric to data-centric, and for companies that have successfully deployed single-purpose graphs in business silos, the CoE can become the foundation for ensuring data quality and reusability. Instead of transforming data for each new viewpoint or application, the data is stored once in a machine-readable format that retains the original context, connections, and meaning, and that can be used for any purpose.

And once you have demonstrated value from your initial (lighthouse) project, the pathway to progress centers primarily on investment in people. The goal at this stage of development is to build a scalable and resilient semantic graph as a data hub for all business-driven use cases. This is where building a Graph CoE becomes a critical asset, because the journey to efficiency and enhanced capability must be guided.

Along with establishing a Graph CoE, enterprises should focus on creating a “use case tree” or “business capability model” to identify where the data in the graph can be extended. This is designed to identify business priorities and must be aligned with the data from initial use cases. The objective is to create a reusable architectural framework and a roadmap to deliver incremental value and capitalize on the benefits of content reusability. Breakthrough progress comes from having dedicated resources for the design, construction, and support of the foundational knowledge graph.

The Graph CoE would most logically be an extension of the Office of Data Management and the domain of the Chief Data Officer. It is a strategic initiative that focuses on the adoption of semantic standards and the deployment of knowledge graphs across the enterprise. The goal is to establish best practices, implement governance, and provide expertise in the development and use of the knowledge graph. Think of it as both the hub of graph activities within your organization and the mechanism to influence organizational culture.

Some of the key elements of the Graph CoE include: 

• Information Literacy: A Graph CoE is the best approach to ensure organizational understanding of the root causes and liabilities resulting from technology fragmentation and misalignment of data across repositories. It is the organizational advocate for new approaches to data management. The message for all senior executive stakeholders is to both understand the causes of the data dilemma and recognize that properly managed data is an achievable objective.  Information literacy and cognition about the data pathway forward is worthy of being elevated as a ‘top-of-the-house’ priority.  

• Organizational Strategy: One of the fundamental tasks of the Graph CoE is to define the overall strategy for leveraging knowledge graphs within the organization. This includes defining the underlying drivers (i.e., cost containment,  process automation, flexible query, regulatory compliance, governance  simplification) and prioritizing use cases (i.e., data integration, digitalization,  enterprise search, lineage traceability, cybersecurity, access control). The opportunities exist when you gain trust across stakeholders that there is a path to ensure that data is true to original intent, defined at a granular level and in a format that is traceable, testable and flexible to use. 

• Data Governance: The Graph CoE is responsible for establishing data policies and standards to ensure that the semantic layer is built using sound engineering principles that emphasize simplicity and reusability. When resolvable identity is combined with precise meaning, quality validation, and data lineage, governance shifts away from manual reconciliation. With a knowledge graph at the foundation, organizations can create a connected inventory of what data exists, how it is classified, where it resides, who is responsible, how it is used, and how it moves across systems (see the query sketch after this list). This changes the governance operating model by simplifying and automating it.

• Knowledge Graph Development: The Graph CoE should lead the development of each of the knowledge graph components. This includes working with subject matter experts to prioritize business objectives and build use case relationships. Building data and knowledge models, data onboarding, ontology development, source-to-target mapping, identity and meaning resolution, and testing are all areas of activity to address. One of the critical components is the user experience and data extraction capabilities. Tools should be easy to use and help teams do their job faster and better. Remember, people have an emotional connection to the way they work. Win them over with visualization. Invest in the user interface. Let them gain hands-on experience using the graph. The goal should be to create value without users having to care what is being used at the backend.

• Cross-Functional Collaboration: The pathway to success starts with the clear and visible articulation of support by executive management. It is both essential and meaningful because it drives organizational priorities. The lynchpin, however,  involves cooperation and interaction among teams from related departments to deploy and leverage the graph capabilities most effectively. Domain experts from  technology are required to provide the building blocks for developing  applications and services that leverage the graph. Business users identify and prioritize use cases to ensure the graph addresses their evolving requirements.  Governance policies need to be aligned with insights from data stewards and compliance officers. Managing the collaboration is essential for orchestrating the  successful shift from applications-centric to data-centric across the enterprise.  
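As referenced above, here is a minimal sketch of the kind of “connected inventory” query a Graph CoE could make routine. The vocabulary (inv:Dataset, inv:steward, and so on) is hypothetical, standing in for whatever inventory ontology the CoE adopts:

PREFIX inv: <https://example.com/inventory#>

# What data exists, how it is classified, where it resides, and who is responsible.
SELECT ?dataset ?classification ?system ?steward
WHERE {
  ?dataset a inv:Dataset ;
           inv:classification ?classification ;
           inv:residesIn ?system ;
           inv:steward ?steward .
}

Because the inventory is itself graph data, the same query extends naturally to lineage (“how does it move across systems?”) by traversing a hypothetical inv:flowsTo property.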

After successfully navigating the initial stages of your project, the onward pathway to progress should focus on developing the team of involved stakeholders. The first hurdle is to identify the data owners who know the location and health of the data. Much of this is about organizational dynamics: understanding who the players are, who is trusted, who is feared, who elicits cooperation, and who is out to kill the activity.

This coincides with the development of an action plan and the assembly of the team of skilled practitioners needed to ensure success. Enterprises will need an experienced architect who understands the workings of semantic technologies and knowledge graphs to lead the team. The CoE will need ontologists to engineer content and manage the mapping of data. Knowledge graph engineers are needed to coordinate the meaning of data, knowledge and content models. This will also require a project manager to be an advocate for the team and the development process.  

And a final note: organizations working on their AI readiness must understand that it requires being ready from the perspective of people, technology, and data. The AI-ready data component means incorporating context with the data. Gartner points this out by noting that it necessitates a shift from the traditional ETL mindset to a new ECL (extract, contextualize, and load) orientation. This ensures meaningful data connections. Gartner advises enterprises to leverage semantic metadata as the core for facilitating data connections.
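To illustrate the “contextualize” step, here is a minimal sketch: a SPARQL CONSTRUCT that re-expresses flat, extracted staging data against a shared semantic model before it is loaded into the graph. All names (stg:, model:) are illustrative:

PREFIX stg:   <https://example.com/staging#>
PREFIX model: <https://example.com/model#>

# Contextualize: flat staging fields become typed, connected statements.
CONSTRUCT {
  ?cust a model:Customer ;
        model:hasName ?name ;
        model:locatedIn ?city .
}
WHERE {
  ?cust stg:cust_name ?name ;
        stg:cust_city ?city .
}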

The Graph CoE is an important step in transforming your lighthouse project or silo deployment into a true enterprise platform. A well-structured CoE should be viewed as a driver of innovation and agility within the enterprise that facilitates better data integration, improves operational efficiency, contextualizes AI and enhances the user experience. It is the catalyst for building organizational capabilities for long-term strategic advantage and one of the key steps in the digital transformation journey. 

DCA Forum Recap: Forrest Hare, Summit Knowledge Solutions

A knowledge model for explainable military AI

Forrest Hare, Founder of Summit Knowledge Solutions, is a retired US Air Force targeting and information operations officer who now works with the Defense Intelligence Agency (DIA). His experience includes integrating intelligence from different types of communications, signals, imagery, open source, telemetry, and other sources into a cohesive and actionable whole.

Hare became aware of semantic technology while at SAIC and is currently focused on building a space-and-time ontology called the DIA Knowledge Model, so that Defense Department intelligence can use it to contextualize these multi-source inputs.

The question becomes, how do you bring objects that don’t move and objects that do move into the same information frame with a unified context? The information is currently organized by collectors and producers.

The object-based intelligence that does exist involves things that don’t move at all.  Facilities, for example, or humans using phones that are present on a communications network are more or less static. But what about the things in between such as trucks that are only intermittently present?

Only sparse information is available about these. How do you know the truck that was there yesterday in an image is the same truck that is there today? Not to mention the potential hostile forces who own the truck that have a strong incentive to hide it.

Objects in object-based intelligence not only include these kinds of assets, but also events and locations that you want to collect information about. In an entity-relationship sense, objects are entities.

Hare’s DIA Knowledge Model uses the ISO-standard Basic Formal Ontology (BFO) to unify domains so that the information from different sources is logically connected and therefore makes sense as part of a larger whole. BFO’s maintainers (Director Barry Smith and his team at the National Center for Ontological Research (NCOR) at the University at Buffalo) keep the ontology strictly limited to 30 or so classes.

The spatial-temporal regions of the Knowledge Model are what’s essential to do the kinds of dynamic, unfolding object tracking that’s been missing from object-based intelligence. Hare gave the example of a “site” (an immaterial entity) from a BFO perspective. A strict geolocational definition of “site” makes it possible for both humans and machines to make sense of the data about sites. Otherwise, Hare says, “The computer has no idea how to understand what’s in our databases, and that’s why it’s a dumpster fire.”
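To give a flavor of what a strict, machine-usable definition of a site might look like in RDF, here is an illustrative sketch. These are not the DIA Knowledge Model’s actual terms; the km: vocabulary is made up, and only the geo (WGS84) vocabulary is a published W3C vocabulary:

@prefix km:  <https://example.com/knowledge-model#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A site pinned to a geospatial footprint, so humans and machines can
# agree that observations from different collectors refer to the same place.
km:Site_1138 a km:Site ;
    geo:lat  "38.8895"^^xsd:decimal ;
    geo:long "-77.0353"^^xsd:decimal .

# An intermittently present object (a truck) observed at that site,
# bounded by an explicit time interval.
km:TruckObservation_42 a km:Observation ;
    km:observedObject km:Truck_7 ;
    km:atSite         km:Site_1138 ;
    km:duringInterval km:Interval_2023_08_16 .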

This kind of mutual human and machine understanding is a major rationale behind explainable AI. A commander briefed by an intelligence team must know why the team came to the conclusions it did. The stakes are obviously high. “From a national security perspective, it’s extremely important for AI to be explainable,” Hare reminded the audience. Black boxes such as ChatGPT as currently designed can’t effectively answer the commander’s question about how the intel team arrived at its conclusions.

Finally, the explainability of knowledge models like the DIA’s becomes even more critical as information flows into the Joint Intelligence Operations Center (JIOC). Furthermore, the various branches of the US Armed Forces must supply and continually update a Common Intelligence Picture that’s actionable by the US President, who is Commander in Chief of the military as a whole.

Without this conceptual and spatial-temporal alignment across all service branches, joint operations can’t proceed as efficiently and effectively as they should.  Certainly, the risk of failure looms much larger as a result.

Contributed by Alan Morrison

Financial Data Transparency Act “PitchFest”

The Data Foundation hosted a PitchFest on “Unlocking the vision of the Financial Data Transparency Act” a few days ago. Selected speakers were given 10 minutes to present their best ideas on how to use the improved financial regulatory information and data.

The Financial Data Transparency Act is a new piece of legislation directly affecting the financial services industry. In short, it directs financial regulators to harmonize data collections and move to machine (and people) readable forms. The goal is to reduce the burdens of compliance on regulated industries, increase the ability to analyze data, and to enhance overall transparency.

Two members of our team, Michael Atkin and Dalia Dahleh, were given the opportunity to present. Below is the text of Michael Atkin’s pitch:

  1. Background – Just to set the stage. I’ve been fortunate to have been in the position of scribe, analyst, advocate, and organizer for data management since 1985. I’ve always been a neutral facilitator, allowing me to sit on all sides of the data management issue all over the world, from data provider to data consumer to market authority to regulator. I’ve helped create maturity models outlining best practice, performed benchmarking to measure progress, documented the business case, and created and taught the Principles of Data Management at Columbia University. I’ve also served on the SEC’s Market Data Advisory Committee, the CFTC’s Technical Advisory Committee, and as the Chair of the Data Subcommittee of the OFR’s Financial Research Advisory activity during the financial crisis of 2008. So, I have some perspective on the challenges the regulators face and the value of the FDTA.
  2. Conclusion (slide 2) – My conclusions after all that exposure are simple. There is a real data dilemma for many entities. The dilemma is caused by fragmentation of technology. It’s nobody’s fault. We have business and operational silos. They are created using proprietary software. The same things are modeled differently based on the whim of the architects, the focus of the applications, and the nuances of the technical solution. This fragmentation creates “data incongruence” – where the meaning of data from one repository doesn’t match other repositories. We have the same words, with different meanings. We have the same meaning using different words. And we have nuances that get lost in translation. As a result, we spend countless effort and money moving data around, reconciling meaning, and doing mapping. As one of my banking clients said: “My projects end up as expensive death marches of data cleansing and manipulation just to make the software work.” And we do this over and over ad infinitum. Not only do we suffer from data incongruence, we suffer from the limitations of relational technology that still dominates our way of processing data. For the record, relational technology is over 50 years old. It was (and is) great for computation and structured data. It’s not good for ad hoc inquiry and scenario-based analysis. The truth is that data has become isolated and mismatched across repositories due to technology fragmentation and the rigidity of the relational paradigm. Enterprises (including government enterprises) often have thousands of business and data silos, each based on proprietary data models that are hard to identify and even harder to change. I refer to this as the bad data tax. It costs most organizations somewhere around 40-60% of their IT budget to address. So, let’s recognize that this is a real liability. One that diverts resources from business goals, extends time-to-value for analysts, and leads to knowledge worker frustration. The new task before FSOC leadership and the FDTA is now about fixing the data itself.
  3. Solution (slide 3) – The good news is that the solution to this data dilemma is actually quite simple and twofold in nature. First, adopt the principles of good data hygiene. And on that front, there appears to be good progress thanks to efforts around the Federal Data Strategy and things related to BCBS 239 and the Open Government Data Act. But governance alone will not solve the data dilemma. The second thing that is required is to adopt data standards that were specifically designed to address the problems of technology fragmentation. And these open, web-based data standards are quite mature. They include the Internationalized Resource Identifier (or IRI) for identity resolution; the use of ontologies, which enable us to model simple facts and relationship facts; and the expression of these things in standards like RDF for ontologies, OWL for inferencing, and SHACL for business rules (a small sample appears after this list). From these standards you get a bunch of capabilities. You get quality by math (because the ontology ensures precision of meaning). You get reusability (which eliminates the problem of hard-coded assumptions and the problem of doing the same thing in slightly different ways). You get access control (because the rules are embedded into the data and not constrained by systems or administrative complexity). You get lineage traceability (because everything is linked to a single identifier so that data can be traced as it flows across systems). And you get good governance (since these standards use resolvable identity, precise meaning, and lineage traceability to shift governance from people-intensive data reconciliation to more automated data applications).
  4. FDTA (slide 4) – Another important component is that this is happening at the right time. I see the FDTA as the next step in a line of initiatives seeking to modernize regulatory reporting and reduce risk. I’ve witnessed the efforts to move to T+1 (to address the clearing and settlement challenge). I’ve seen the recognition of global interdependencies (with the fallout from Long-Term Capital, Enron, and the problems of derivatives in Orange County). We’ve seen the problems of identity resolution that led to KYC and AML requirements. And I was actively involved in understanding the data challenges of systemic risk with the credit crisis of 2008. The problem with all these regulatory activities is that most of them are not about fixing the data. Yes, we did get LEI and data governance. Those are great things, but far from what is required to address the data dilemma. I also applaud the adoption of XBRL (and the concept of data tagging). I like the XBRL taxonomies (as well as the Eurofiling regulatory taxonomies), but they are designed vertically, report by report, with a limited capability for linking things together. Not only that, most entities are just extracting XBRL into their relational environments, which does little to address the problem of structural rigidity. The good news is that all the work that has gone into the adoption of XBRL can be leveraged. XML is good for data transfer. Taxonomies are good for unraveling concepts and tagging. And the shift from XML to RDF is straightforward and would not affect those who are currently reporting using XBRL. One final note before I make our pitch. Let’s recognize that XBRL is not the way the banks are managing their internal data infrastructures. They suffer from the same dilemmas as the regulators, and almost every G-SIB and D-SIB I know is moving toward semantic standards. Because even though FDTA is about the FSOC agencies, it will ultimately affect the financial institutions. I see this as an opportunity for collaboration between regulators and the regulated in building the infrastructure for the digital world.
  5. Proposal (slide 5) – Semantic Arts is proposing a pilot project to implement the foundational infrastructure of precise data about financial instruments (including identification, classification, descriptive elements, and corporate actions), legal entities (including entity types as well as information about ownership and control), obligations (associated with issuance, trading, clearing, and settlement), and holdings about the portfolios of the regulated entities. These are the building blocks of linked risk analysis. To implement this initiative, we are proposing you start with a single simple model of the information from one of the covered agencies. The initial project would focus on defining the enterprise model and conforming two to three key data sets to the model. The resulting model would be hosted on a graph database. Subsequent projects would involve expanding the footprint of data domains to be added to the graph, and gradually building functionality to begin to reverse the legacy creation process. We would initiate things by leveraging the open-standard upper ontology (gist) from Semantic Arts as well as the work of the Financial Industry Business Ontology (from the EDM Council) and any other vetted ontology, like the one OFR is building for CFI. Semantic Arts has a philosophy of “think big” (like cross-agency interoperability) but “start small” (like a business domain of one of the agencies). The value of adopting semantic standards is threefold and can be measured using the “three C’s” of metrics. The first C is cost containment, starting with data integration, and includes areas focused on business process automation and consolidation of redundant systems (best known as technical agility). The second C is capability enhancement for analysis of the degrees of interconnectedness, the nature of transitive relationships, state-contingent cash flow, collateral flow, guarantees, and the transmission of risk. The final C is implementation of the control environment, focused on tracking data flow, protecting sensitive information, preventing unwanted outcomes, managing access, and ensuring privacy.
  6. Final Word (contact) – Just a final word to leave you with. Adopting these semantic standards can be accomplished at a fraction of the cost of what you spend each year supporting the vast cottage industry of data integration workarounds.  The pathway forward doesn’t require ripping everything out but instead building a semantic “graph” layer across data to connect the dots and restore context.  This is what we do.  Thank you.
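As referenced in item 3, here is a minimal illustration of “quality by math”: a SHACL shape that validates reported legal entities against a precise, shared definition. The fr: vocabulary is hypothetical, standing in for whatever model the covered agencies adopt:

@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix fr:  <https://example.com/finreg#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Every reported legal entity must carry exactly one well-formed,
# 20-character LEI.
fr:LegalEntityShape
    a sh:NodeShape ;
    sh:targetClass fr:LegalEntity ;
    sh:property [
        sh:path fr:hasLEI ;
        sh:datatype xsd:string ;
        sh:pattern "^[A-Z0-9]{20}$" ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .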

Link to Slide Deck

DCA Forum Recap: Jans Aasman, Franz

How a “user” knowledge graph can help change data culture

Identity and Access Management (IAM) has had the same problem since Fernando Corbató of MIT first dreamed up the idea of digital passwords in 1960: opacity. Identity in the physical world is rich and well-articulated, with a wealth of different ways to verify information on individual humans and devices. By contrast, the digital realm has been identity data impoverished, cryptic and inflexible for over 60 years now.

Jans Aasman, CEO of Franz, provider of the entity-event knowledge graph solution AllegroGraph, envisions a “user” knowledge graph as a flexible and more manageable data-centric solution to the IAM challenge. He presented on the topic at this past summer’s Data-Centric Architecture Forum, which Semantic Arts hosted near its headquarters in Fort Collins, Colorado.

Consider the specificity of a semantic graph and how it could facilitate secure access control. Knowledge graphs constructed of subject-predicate-object triples make it possible to set rules and filters in an articulated yet straightforward manner. Information about individuals that’s been collected for other HR purposes could enable this more precise filtering.

For example, Jans could disallow others’ access to a triple that connects “Jans” and “salary”. Or he could disallow access to certain predicates.
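A minimal sketch of what predicate-level denial might look like in SPARQL (illustrative only; hr:salary is a made-up predicate, and a production triplestore would typically enforce this with built-in security filters rather than per-query clauses):

PREFIX hr: <https://example.com/hr#>

# Serve everything in the graph except salary assertions.
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER (?p != hr:salary)
}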

Identity and access management vendors call this method Attribute-Based Access Control (ABAC). Attributes include many different characteristics of users and what they interact with, which is inherently more flexible than role-based access control (RBAC).

Cell-level control is also possible, but as Forrest Hare of Summit Knowledge Solutions points out, such security doesn’t make a lot of sense, given how much meaning is absent from cells controlled in isolation. “What’s the classification of the number 7?” he asked. Without more context, it seems silly to control cells that are just storing numbers or individual letters, for example.

Simplifying identity management with a knowledge graph approach

Graph databases can simplify various aspects of the process of identity management. Let’s take Lightweight Directory Access Protocol, or LDAP, for example.

This vendor-agnostic protocol has been around for 30 years, but it’s still popular with enterprises. It’s a pre-web, post-internet hierarchical directory service and authentication protocol.

“Think of LDAP as a gigantic, virtual telephone book,” suggests access control management vendor Foxpass. Foxpass offers a dashboard-based LDAP management product which it claims is much easier to manage than OpenLDAP.

If companies don’t use LDAP, they often use Microsoft’s Active Directory instead, a broader, database-oriented identity and access management product that covers more of the same bases. Microsoft bundles AD with its Server and Exchange products, a means of lock-in that has been quite effective. Lock-in, obviously, inhibits innovation in general.

Consider the whole of identity management as it exists today and how limiting it has been. How could enterprises embark on the journey of using a graph database-oriented approach as an alternative to application-centric IAM software? The first step involves the creation of a “user” knowledge graph.

Access control data duplication and fragmentation

Semantic Arts CEO Dave McComb, in his book Software Wasteland, estimated that 90 percent of data is duplicated. Application-centric architectures in use since the days of mainframes have led to user data sprawl. Part of the reason there is such duplication of user data is that authentication, authorization, and access control (AAA) methods require that more bits of personally identifiable information (PII) be shared with central repositories for AAA purposes.

B2C companies are particularly prone to hoovering up these additional bits of PII lately and storing that sensitive info in centralized repositories. Those repositories become one-stop shops for identity thieves. Customers who want to pay online have to enter bank routing numbers and personal account numbers. As a result, there’s even more duplicate PII sprawl.

One of the reasons a “user” knowledge graph (and a knowledge graph enterprise foundation) could be innovative is that enterprises that adopt such an approach can move closer to zero-copy integration architectures. Model-driven development of the type that knowledge graphs enable assumes and encourages shared data and logic.

A “user” graph coupled with project management data could reuse the same enabling entities and relationships repeatedly for different purposes. The model-driven development approach thus incentivizes organic data management.

The challenge of harnessing relationship-rich data

Jans points out that enterprises, for example, run massive email systems that could be tapped to analyze project data for optimization purposes. And disambiguation by unique email address across the enterprise can be a starting point for all sorts of useful applications.

Most enterprises don’t apply unique email address disambiguation, but Franz has a pharma company client that does, an exception that proves the rule. Email continues to be an untapped resource in many organizations, even though it’s a treasure trove of relationship data.

Problematic data farming realities: A social media example

Relationship data involving humans is sensitive by definition, but the reuse potential of sensitive data is too important to ignore. Organizations do need to interact with individuals online, and vice versa.

Former US Federal Bureau of Investigation (FBI) counterintelligence agent Peter Strzok said the following on Deadline: White House, an MSNBC program in the US, aired on August 16:

“I’ve served I don’t know how many search warrants on Twitter (now known as X) over the years in investigations. We need to put our investigator’s hat on and talk about tradecraft a little bit. Twitter gathers a lot of information. They just don’t have your tweets. They have your draft tweets. In some cases, they have deleted tweets. They have DMs that people have sent you, which are not encrypted. They have your draft DMs, the IP address from which you logged on to the account at the time, sometimes the location at which you accessed the account and other applications that are associated with your Twitter account, amongst other data.” 

X and most other social media platforms, not to mention law enforcement agencies such as the FBI, obviously care a whole lot about data. Collecting, saving, and allowing access to data from hundreds of millions of users in such a broad, comprehensive fashion is essential for X. At least from a data utilization perspective, what they’ve done makes sense.

Contrast these social media platforms with the way enterprises collect and handle their own data. That collection and management effort is function- rather than human-centric. With social media, the human is the product.

So why is a social media platform’s culture different? Because with public social media, broad, relationship-rich data sharing had to come first. Users learned first-hand what the privacy tradeoffs were, and that kind of sharing capability was designed into the architecture. The ability to share and reuse social media data for many purposes implies the need to manage the data and its accessibility in an elaborate way. Email, by contrast, is a much older technology that was not originally intended for multi-purpose reuse.

Why can organizations like the FBI successfully serve search warrants on data from data farming companies? Because social media started with a broad data sharing assumption and forced a change in the data sharing culture. Then came adoption. Then law enforcement stepped in and argued effectively for its own access.

Broadly reused and shared, web data about users is clearly more useful than siloed data. Shared data is why X can have the advertising-driven business model it does. One-way social media contracts with users require agreement with provider terms. The users have one choice: Use the platform, or don’t.

The key enterprise opportunity: A zero-copy user PII graph that respects users

It’s clear that enterprises should do more to tap the value of the kinds of user data that email, for example, generates. One way to sidestep the sensitivity issues associated with reusing that sort of data would be to treat the most sensitive user data separately.

Self-sovereign identity (SSI) advocate Phil Windley has pointed out that agent-managed, hashed messaging and decentralized identifiers could make it unnecessary to duplicate identifiers that correlate. If a bartender needs only to confirm that a patron at the bar is old enough to drink, the bartender could simply ping the DMV to confirm the fact. The DMV could then ping the user’s phone to verify the patron’s claimed adult status.

Given such a scheme, each user could manage and control their access to their own most sensitive PII. In this scenario, the PII could stay in place, stored, and encrypted on a user’s phone.
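A minimal sketch of what such a claim might look like as a W3C Verifiable Credential in JSON-LD (illustrative values throughout: the DMV issuer DID and the ageOver term are made up, and a real deployment would add a cryptographic proof block and selective-disclosure machinery):

{
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  "type": ["VerifiableCredential", "AgeCredential"],
  "issuer": "did:example:dmv",
  "credentialSubject": {
    "id": "did:example:patron-phone",
    "ageOver": 21
  }
}

The point of the design is what is absent: no name, no birth date, no address. The bartender learns exactly one fact, and the sensitive PII never leaves the patron’s phone.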

Knowledge graphs lend themselves to such a less centralized, and yet more fine-grained and transparent approach to data management. By supporting self-sovereign identity and a data-centric architecture, a Chief Data Officer could help the Chief Risk Officer mitigate the enterprise risk associated with the duplication of personally identifiable information—a true, win-win.

 

Contributed by Alan Morrison

How to Take Back 40-60% of Your IT Spend by Fixing Your Data

Creating a semantic graph foundation helps your organization become data-driven while significantly reducing IT spend

Organizations that quickly adapt to changing market conditions have a competitive advantage over their peers. Achieving this advantage depends on their ability to capture, connect, integrate, and convert data into insight for business decisions and processes. This is the goal of a “data-driven” organization. However, in the race to become data-driven, most efforts have resulted in a tangled web of data integrations and reconciliations across a sea of data silos that add up to between 40% and 60% of an enterprise’s annual technology spend. We call this the “Bad Data Tax”. Not only is this expensive, but the results often don’t translate into the key insights needed to deliver better business decisions or more efficient processes.

This is partly because integrating and moving data is not the only problem. The data itself is stored in a way that is not optimal for extracting insight. Unlocking additional value from data requires context, relationships, and structure, none of which are present in the way most organizations store their data today.

Solution to the Data Dilemma

The good news is that the solution to this data dilemma is actually quite simple. It can be accomplished at a fraction of the cost of what organizations spend each year supporting the vast industry of data integration workarounds. The pathway forward doesn’t require ripping everything out, but rather building a semantic “graph” layer across data to connect the dots and restore context. However, it will take effort to formalize a shared semantic model that can be mapped to data assets, and to turn unstructured data into a format that can be mined for insight. This is the future of modern data and analytics and a critical enabler for getting more value and insight out of your data.
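One concrete, non-intrusive way to build that layer is to map existing relational tables to the shared semantic model in place, rather than migrating them. A minimal sketch using the W3C R2RML mapping standard (the CUSTOMER table and the model: vocabulary are illustrative):

@prefix rr:    <http://www.w3.org/ns/r2rml#> .
@prefix model: <https://example.com/model#> .

# Map each row of the legacy CUSTOMER table to a model:Customer in the graph.
<#CustomerMap>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "CUSTOMER" ] ;
    rr:subjectMap [
        rr:template "https://example.com/customer/{CUST_ID}" ;
        rr:class model:Customer ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate model:hasName ;
        rr:objectMap [ rr:column "CUST_NAME" ] ;
    ] .

The source system keeps running untouched; the mapping gives its rows stable identifiers and shared meaning in the graph layer.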

This shift from a relational to a graph approach has been well documented by Gartner, which advises that “using graph techniques at scale will form the foundation of modern data and analytics” and that “graph technologies will be used in 80% of data and analytics innovations by 2025.” Most of the leading market research firms consider graph technologies to be a critical enabler. And while there is a great deal of experimentation underway, most organizations have only scratched the surface in a use-case-by-use-case fashion. While this may yield great benefits for the specific use case, it doesn’t fix the causes behind the “Bad Data Tax” that organizations are facing. Until executives begin to take a more strategic approach to graph technologies, they will continue to struggle to deliver the insights that would give them a competitive edge.

Modernizing Your Data Environment

Most organizations have come of age in a world dominated by technology. There have been multiple technology revolutions that have necessitated the creation of big organizational departments to make it all work. In spite of all the activity, the data paradigm hasn’t evolved much. Organizations are still managing data using relational technology invented in the 1970s. While relational databases are the best fit for managing structured data workloads, they are not good for ad hoc inquiry and scenario-based analysis.

Data has become isolated and mismatched across repositories and silos due to technology fragmentation and the rigidity of the relational paradigm. Enterprises often have thousands of business and data silos, each based on proprietary data models that are hard to identify and even harder to change. This has become a liability that diverts resources from business goals, extends time-to-value for analysts, and leads to business frustration. The new task before leadership is now about fixing the data itself.

Fixing the data is possible with graph technologies and web standards that share data across federated environments and between interdependent systems. The approach has evolved for ensuring data precision, flexibility, and quality. Because these open standards are based on granular concepts, they become reusable building blocks for a solid data foundation. Adopting them removes ambiguity, facilitates automation, and reduces the need for data reconciliation.

Data Bill of Rights

Organizations need to remind themselves that data is simply a representation of real things (customers, products, people, and processes) where precision, context, semantics, and nuance matter as much as the data itself. For those who are tasked with extracting insight from data, there are several expectations that should be honored: the data should be available and accessible when needed, stored in a format that is flexible and accurate, retain the context and intent of the original data, and be traceable as it flows through the organization.

This is what we call the “Data Bill of Rights”. Providing this Data Bill of Rights is achievable right now without a huge investment in technology or massive disruption to the way the organization operates.

Strategic Graph Deployment

Many organizations are already leveraging graph technologies and semantic standards for their ability to traverse relationships and connect the dots across data silos. These organizations are often doing so on a case-by-case basis covering one business area and focusing on an isolated application, such as fraud detection or supply chain analytics. While this can result in faster time-to-value for a singular use case, without addressing the foundational data layers, it results in another silo without gaining the key benefit of reusability.

The key to adopting a more strategic approach to semantic standards and knowledge graphs starts at the top with buy-in across the C-suite. Without this senior sponsorship, the program will face an uphill battle of overcoming the organizational inertia with little chance of broad success. However, with this level of support, the likelihood dramatically increases of getting sufficient buy-in across all the stakeholders involved in managing an organization’s data infrastructure.

While starting as an innovation project can be useful, forming a Graph Center of Excellence will have an even greater impact. It can give the organization a dedicated team to evangelize and execute the strategy, score incremental wins to demonstrate value, and leverage best practices and economies of scale along the way. The team would be tasked with both building the foundation and prioritizing graph use cases against organizational priorities.

One key benefit from this approach is the ability to start small, deliver quick wins, and expand as value is demonstrated. There is no getting around the mandate to initially deliver something practical and useful. A framework for building a Graph Center of Excellence will be published in the coming weeks.

Scope of Investment Required

Knowledge graph advocates admit that a long tail of investment is necessary to realize its full potential. Enterprises need basic operational information including an inventory of the technology landscape and the roadmap of data and systems to be merged, consolidated, eliminated, or migrated. They need to have a clear vision of the systems of record, data flows, transformations, and provisioning points. They need to be aware of the costs associated with the acquisition of platforms, triplestore databases, pipeline tools, and other components needed to build the foundational layer of the knowledge graph.

In addition to the plumbing, organizations need to understand the underlying content that supports business functionality. This includes the reference data about business entities, agents, and people; the taxonomies and data models about contract terms and parties; the meaning of ownership and control; notions of parties and roles; and so on. These concepts are the foundation of the semantic approach. They might not be exciting, but they are critical because they are the scaffolding for everything else.

Initial Approach

When thinking about the scope of investment, the first graph-enabled application can take anywhere from 6 to 12 months from conception to production. Much of the time needs to be invested in getting data teams aligned and mobilized, which underscores the essential nature of leadership and the importance of starting with the right set of use cases. The initial use case needs to be operationally viable, solve a real business problem, and be important to the business.

With the right strategic approach in place, the first delivery is infrastructure plus pipeline management and people. This gets the organization the MVP, including an incremental project plan and rollout. The second delivery should consist of the foundational building blocks for workflow and reusability. This will prove the viability of the approach.

Building Use Cases Incrementally

The next series of use cases should be based on matching functionality to capitalize on concept reusability. This will enable teams to shift their effort from building the technical components to adding incremental functionality. This translates to 30% of the original cost and a rollout that could be three times faster. These costs will continue to decrease as the enterprise expands reusable components – achieving full value around the third year.

The strategic play is not the $3-$5 million for the first few domains, but the core infrastructure required to run the organization moving forward. It is absolutely possible to continue to add use cases on an incremental level, but not necessarily the best way to capitalize on the digital future. The long-term cost efficiency of a foundational enterprise knowledge graph (EKG) should be compared to the costs of managing thousands of silos. For a big enterprise, this can be measured in hundreds of millions of dollars – before factoring in the value proposition of enhanced capabilities for data science and complying with regulatory obligations to manage risks.

Business Case Summary

Organizations are paying a “Bad Data Tax” of 40% to 60% of their annual IT spend on the tangled web of integrations across their data silos. To make matters worse, following this course does not help an organization achieve its goal of being data-driven. The data itself has a problem. This is due to the way data is traditionally stored in rows, columns, and tables that do not have the context, relationships, and structure needed to extract the needed insight.

Adding a semantic graph layer is a simple, non-intrusive solution to connect the dots, restore context, and provide what is needed for data teams to succeed. While the Bad Data Tax alone quantifiably justifies the cost of solving the problem, it scarcely scratches the surface of the full value delivered. The opportunity cost side, though more difficult to quantify, is no less significant with the graph enabling a host of new data and insight capabilities (better AI and data science outcomes, increased personalization and recommendations for driving increased revenue, more holistic views through data fabrics, high fidelity digital twins of assets, processes, and systems for what-if analysis, and more).

While most organizations have begun deploying graph technologies in isolated use cases, they have not yet applied them foundationally to solving the Bad Data Tax and fixing their underlying data problem. Success will require buy-in and sponsorship across the C-suite to overcome organizational inertia. For best outcomes, create a Graph Center of Excellence focused on strategically deploying both a semantic graph foundation and high-priority use cases. The key will be in starting small, delivering quick wins with incremental value and effectively communicating this across all stakeholders.

While initial investments can start small, expect initial projects to take 6 to 12 months. To cover the first couple of projects, a budget between $1.5 and $3 million should be sufficient. The outcomes will justify further investment in graph-based projects throughout the organization, each deploying 30% faster and cheaper than early projects by leveraging best practices and economies of scale.

Conclusion

The business case is compelling – the cost to develop a foundational graph capability is a fraction of the amount wasted each year on the Bad Data Tax alone. Addressing this problem is both easier and more urgent than ever. Failing to develop the data capabilities that graph technologies offer can put organizations at a significant disadvantage, especially in a world where AI capabilities are accelerating and critical insight is being delivered in near real time. The opportunity cost is significant. The solution is simple. Now is the time to act.

 

This article originally appeared at How to Take Back 40-60% of Your IT Spend by Fixing Your Data – Ontotext, and is reposted here.