How a “user” knowledge graph can help change data culture
Identity and Access Management (IAM) has had the same problem since Fernando Corbató of MIT first dreamed up the idea of digital passwords in 1960: opacity. Identity in the physical world is rich and well articulated, with a wealth of ways to verify information about individual humans and devices. By contrast, the digital realm has been cryptic, inflexible, and impoverished in identity data for more than 60 years.
Jans Aasman, CEO of Franz, provider of the entity-event knowledge graph solution AllegroGraph, envisions a “user” knowledge graph as a flexible, more manageable, data-centric solution to the IAM challenge. He presented on the topic at this past summer’s Data-Centric Architecture Forum, which Semantic Arts hosted near its headquarters in Fort Collins, Colorado.
Consider the specificity of a semantic graph and how it could facilitate secure access control. Knowledge graphs constructed of subject-predicate-object triples make it possible to set rules and filters in an articulated and yet straightforward manner. Information about individuals that’s been collected for other HR purposes could enable this more precise filtering.
For example, Jans could deny others access to a triple that connects “Jans” and “salary,” or he could deny access to certain predicates altogether.
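To make the triple-level and predicate-level rules concrete, here is a minimal Python sketch that assumes an in-memory list of triples rather than a real graph database. The data, the `visible_triples` function, and the two deny sets are hypothetical; a product such as AllegroGraph would express such rules through its own security features, which this sketch does not attempt to reproduce.

```python
from typing import Iterator, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative data only; the salary figure and address are made up.
GRAPH = [
    Triple("Jans", "worksFor", "Franz"),
    Triple("Jans", "hasTitle", "CEO"),
    Triple("Jans", "salary", "123456"),
    Triple("Jans", "homeAddress", "1 Example Street"),
    Triple("Forrest", "worksFor", "Summit Knowledge Solutions"),
]

# Hypothetical policy: hide one specific subject-predicate connection...
DENIED_CONNECTIONS = {("Jans", "salary")}
# ...and hide every triple that uses a given predicate.
DENIED_PREDICATES = {"homeAddress"}


def visible_triples(graph: list[Triple], requester: str) -> Iterator[Triple]:
    """Yield only the triples the requester is allowed to see."""
    for t in graph:
        if t.subject == requester:
            yield t  # owners always see facts about themselves
        elif (t.subject, t.predicate) in DENIED_CONNECTIONS:
            continue  # connection-level rule
        elif t.predicate in DENIED_PREDICATES:
            continue  # predicate-level rule
        else:
            yield t


for t in visible_triples(GRAPH, "Forrest"):
    print(t)
```

Forrest sees Jans’s employer and title but neither the salary triple nor the address triple, while Jans still sees everything about himself.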
Identity and access management vendors call this method Attribute-Based Access Control (ABAC). Attributes can describe many characteristics of users and of the resources they interact with, which makes ABAC inherently more flexible than role-based access control (RBAC), where permissions hinge on role membership alone.
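The difference between the two models can be sketched in a few lines of Python. In this hypothetical example, all names, attributes, and thresholds are illustrative: the RBAC check looks only at role membership, while the ABAC rule combines attributes of the user, the resource, and the request context.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)
    attributes: dict[str, object] = field(default_factory=dict)


@dataclass
class Resource:
    name: str
    attributes: dict[str, object] = field(default_factory=dict)


# RBAC: an action is allowed if the user holds any role mapped to it.
RBAC_POLICY = {"read:salary_report": {"hr_manager"}}


def rbac_allows(user: User, action: str) -> bool:
    return bool(user.roles & RBAC_POLICY.get(action, set()))


# ABAC: a rule is a predicate over user, resource, and context attributes.
def abac_allows(user: User, resource: Resource, action: str, context: dict) -> bool:
    if action != "read":
        return False
    return (
        user.attributes.get("department") == resource.attributes.get("department")
        and user.attributes.get("clearance", 0) >= resource.attributes.get("sensitivity", 0)
        and bool(context.get("on_corporate_network", False))
    )


# Hypothetical users and resource.
alice = User("alice", roles={"hr_manager"}, attributes={"department": "HR", "clearance": 2})
bob = User("bob", roles={"engineer"}, attributes={"department": "R&D", "clearance": 3})
report = Resource("salary_report", {"department": "HR", "sensitivity": 2})

print(rbac_allows(alice, "read:salary_report"))
print(abac_allows(alice, report, "read", {"on_corporate_network": False}))
```

Under RBAC, Alice’s role alone grants access; under the ABAC rule, the same request is refused when it comes from outside the corporate network, a distinction the role table cannot express.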
Cell-level control is also possible, but as Forrest Hare of Summit Knowledge Solutions points out, such security doesn’t make a lot of sense, given how much meaning is lost when cells are controlled in isolation. “What’s the classification of the number 7?” he asked. Without more context, it seems silly to control cells that just store numbers or individual letters.
Simplifying identity management with a knowledge graph approach
Graph databases can simplify several aspects of identity management. Take the Lightweight Directory Access Protocol (LDAP), for example.
This vendor-agnostic protocol has been around for 30 years, but it’s still popular with enterprises. It’s a pre-web, post-internet hierarchical directory service and authentication protocol.
“Think of LDAP as a gigantic, virtual telephone book,” suggests access control management vendor Foxpass. Foxpass offers a dashboard-based LDAP management product which it claims is much easier to manage than OpenLDAP.
Enterprises that don’t use a standalone LDAP directory are likely to use Microsoft’s Active Directory (AD), a broader, database-oriented identity and access management product that covers much of the same ground. Microsoft bundles AD with its Server and Exchange products, a means of lock-in that has been quite effective. Lock-in, of course, inhibits innovation in general.
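One way to see what a graph adds is to flatten an LDAP-style tree into triples. The Python sketch below assumes a toy set of entries keyed by distinguished name (DN) and uses deliberately naive DN splitting; real DNs can contain escaped commas (see RFC 4514), so production code should use a proper parser. The `locatedIn` predicate and the `members_of` helper are hypothetical. Once the directory is expressed as triples, both the hierarchy and cross-cutting links such as `memberOf` become edges that can be queried the same way.

```python
from typing import Optional

# Illustrative LDAP-style entries: DN -> multi-valued attributes.
ENTRIES = {
    "dc=example,dc=com": {"objectClass": ["domain"]},
    "ou=People,dc=example,dc=com": {"objectClass": ["organizationalUnit"]},
    "ou=Groups,dc=example,dc=com": {"objectClass": ["organizationalUnit"]},
    "cn=engineering,ou=Groups,dc=example,dc=com": {"objectClass": ["groupOfNames"]},
    "uid=jdoe,ou=People,dc=example,dc=com": {
        "objectClass": ["inetOrgPerson"],
        "cn": ["Jane Doe"],
        "mail": ["jdoe@example.com"],
        "memberOf": ["cn=engineering,ou=Groups,dc=example,dc=com"],
    },
}


def parent_dn(dn: str) -> Optional[str]:
    # Naive: drop the first RDN. Breaks on escaped commas.
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def ldap_to_triples(entries: dict) -> list[tuple[str, str, str]]:
    triples = []
    for dn, attrs in entries.items():
        parent = parent_dn(dn)
        if parent is not None and parent in entries:
            triples.append((dn, "locatedIn", parent))  # tree position as an edge
        for attr, values in attrs.items():
            for value in values:
                triples.append((dn, attr, value))  # each attribute value as an edge
    return triples


def members_of(triples: list[tuple[str, str, str]], group_dn: str) -> set[str]:
    return {s for s, p, o in triples if p == "memberOf" and o == group_dn}


triples = ldap_to_triples(ENTRIES)
print(members_of(triples, "cn=engineering,ou=Groups,dc=example,dc=com"))
```

Nothing in this representation limits a person to one position in a tree, so relationships that LDAP’s hierarchy handles awkwardly, such as project assignments, could be added as further triples.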
Consider the whole of identity management as it exists today and how limiting it has been. How could enterprises embark on the journey of using a graph database-oriented approach as an alternative to application-centric IAM software? The first step involves the creation of a “user” knowledge graph.
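As a rough illustration of that first step, the Python sketch below merges two hypothetical source extracts, an HR feed and a directory feed, into one set of facts about each person, keyed by a single stable identifier. The field names, the records, and the `https://example.com/user/` namespace are assumptions. Each fact is stored as a quad that also records its source, so provenance survives the merge.

```python
from collections import defaultdict

# Hypothetical source extracts; field names are illustrative.
HR_RECORDS = [
    {"employee_id": "E100", "name": "Jans", "title": "CEO", "department": "Executive"},
    {"employee_id": "E200", "name": "Dana", "title": "Engineer", "department": "R&D"},
]
DIRECTORY_RECORDS = [
    {"employee_id": "E100", "login": "jans", "groups": ["executives"]},
    {"employee_id": "E200", "login": "dana", "groups": ["engineering", "vpn-users"]},
]


def user_iri(employee_id: str) -> str:
    # One stable identifier per person (hypothetical namespace).
    return f"https://example.com/user/{employee_id}"


def build_user_graph() -> set[tuple[str, str, str, str]]:
    """Return quads: (subject, predicate, object, source)."""
    quads = set()
    for rec in HR_RECORDS:
        s = user_iri(rec["employee_id"])
        for attr in ("name", "title", "department"):
            quads.add((s, attr, rec[attr], "hr"))
    for rec in DIRECTORY_RECORDS:
        s = user_iri(rec["employee_id"])
        quads.add((s, "login", rec["login"], "directory"))
        for group in rec["groups"]:
            quads.add((s, "memberOf", group, "directory"))
    return quads


def describe(quads: set, subject: str) -> dict[str, set[str]]:
    """Collect every value recorded for a subject, across all sources."""
    profile = defaultdict(set)
    for s, p, o, _source in quads:
        if s == subject:
            profile[p].add(o)
    return dict(profile)


graph = build_user_graph()
print(describe(graph, user_iri("E200")))
```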
Access control data duplication and fragmentation
In his book Software Wasteland, Semantic Arts CEO Dave McComb estimated that 90 percent of data is duplicated. Application-centric architectures, in use since the days of mainframes, have led to user data sprawl. Part of the reason for so much duplicated user data is that authentication, authorization, and access control (AAA) methods require that more bits of personally identifiable information (PII) be shared with central repositories.
B2C companies have lately been particularly prone to hoovering up these additional bits of PII and storing that sensitive information in centralized repositories. Those repositories become one-stop shops for identity thieves. Customers who want to pay online have to enter bank routing numbers and personal account numbers. As a result, there’s even more duplicate PII sprawl.
One reason a “user” knowledge graph (and a knowledge graph foundation for the enterprise) could be innovative is that enterprises that adopt such an approach can move closer to zero-copy integration architectures. Model-driven development of the kind that knowledge graphs enable assumes and encourages shared data and logic.
A “user” graph coupled with project management data could reuse the same enabling entities and relationships repeatedly for different purposes. The model-driven development approach thus incentivizes organic data management.
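The reuse idea can be sketched with one shared set of triples that two different “applications” read in place. Everything here is hypothetical: the identifiers, the `memberOf`, `worksOn`, `hasRepository`, and `hasSkill` predicates, and both rules. The point of the sketch is that an access-control check and a staffing query traverse the same entities and relationships without either one keeping its own copy.

```python
# One shared graph (illustrative triples). Neither function below copies it.
GRAPH = {
    ("user:dana", "memberOf", "team:search"),
    ("user:lee", "memberOf", "team:search"),
    ("user:sam", "memberOf", "team:billing"),
    ("team:search", "worksOn", "project:relaunch"),
    ("team:billing", "worksOn", "project:invoicing"),
    ("project:relaunch", "hasRepository", "repo:relaunch-web"),
    ("user:dana", "hasSkill", "skill:graph-modeling"),
    ("user:sam", "hasSkill", "skill:graph-modeling"),
}


def objects(graph: set, s: str, p: str) -> set[str]:
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def subjects(graph: set, p: str, o: str) -> set[str]:
    return {s for (s, p2, o2) in graph if p2 == p and o2 == o}


# Use 1, access control: a user may access a repository if one of their
# teams works on the project that owns it.
def can_access_repo(graph: set, user: str, repo: str) -> bool:
    projects = subjects(graph, "hasRepository", repo)
    teams = objects(graph, user, "memberOf")
    return any(objects(graph, team, "worksOn") & projects for team in teams)


# Use 2, staffing: people with a skill who are not already on the project.
def staffing_candidates(graph: set, project: str, skill: str) -> set[str]:
    teams = subjects(graph, "worksOn", project)
    already_on = {u for t in teams for u in subjects(graph, "memberOf", t)}
    return subjects(graph, "hasSkill", skill) - already_on


print(can_access_repo(GRAPH, "user:dana", "repo:relaunch-web"))
print(staffing_candidates(GRAPH, "project:relaunch", "skill:graph-modeling"))
```

If Sam were moved onto the search team, a single new `memberOf` triple would change both answers at once, with no second system to update.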
The challenge of harnessing relationship-rich data
Jans points out, for example, that enterprises run massive email systems that could be tapped to analyze and optimize projects. And disambiguating people by unique email address across the enterprise can be a starting point for all sorts of useful applications.
Most enterprises don’t apply unique email address disambiguation, but Franz has a pharma company client that does, an exception that proves the rule. Email continues to be an untapped resource in many organizations precisely because it’s a treasure trove of relationship data.
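Assuming that a lowercased email address is enough to disambiguate a person, a minimal Python sketch can turn message headers into a weighted who-talks-to-whom graph. It uses the standard library’s `email` package (`message_from_string`, `Message.get_all`, and `email.utils.getaddresses`); the sample messages and the `normalize` rule are illustrative assumptions, and real address matching is usually messier than lowercasing.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Illustrative raw messages; a real pipeline would read from a mail archive.
RAW_MESSAGES = [
    "From: Dana Lee <Dana.Lee@Example.com>\n"
    "To: sam@example.com, \"Lee, Pat\" <pat@example.com>\n"
    "Subject: relaunch\n\nbody",
    "From: dana.lee@example.com\n"
    "To: Sam <SAM@example.com>\n"
    "Cc: pat@example.com\n"
    "Subject: re: relaunch\n\nbody",
]


def normalize(address: str) -> str:
    # Hypothetical disambiguation rule: one lowercased address per person.
    return address.strip().lower()


def communication_edges(raw_messages: list[str]) -> tuple[Counter, dict[str, str]]:
    """Count sender->recipient edges and remember a display name per address."""
    edges: Counter = Counter()
    display_names: dict[str, str] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = getaddresses(msg.get_all("From", []))
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for s_name, s_addr in senders:
            s_key = normalize(s_addr)
            if s_name:
                display_names.setdefault(s_key, s_name)
            for _r_name, r_addr in recipients:
                edges[(s_key, normalize(r_addr))] += 1
    return edges, display_names


edges, names = communication_edges(RAW_MESSAGES)
for (sender, recipient), count in edges.items():
    print(names.get(sender, sender), "->", recipient, count)
```

Without the normalization step, “Dana.Lee@Example.com” and “dana.lee@example.com” would appear as two different people, which is exactly the fragmentation that enterprise-wide disambiguation is meant to prevent.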
Problematic data farming realities: A social media example
Relationship data involving humans is sensitive by definition, but the reuse potential of sensitive data is too important to ignore. Organizations do need to interact with individuals online, and vice versa.
Consider what former US Federal Bureau of Investigation (FBI) counterintelligence agent Peter Strzok said on Deadline: White House, an MSNBC program that aired in the US on August 16:
“I’ve served I don’t know how many search warrants on Twitter (now known as X) over the years in investigations. We need to put our investigator’s hat on and talk about tradecraft a little bit. Twitter gathers a lot of information. They just don’t have your tweets. They have your draft tweets. In some cases, they have deleted tweets. They have DMs that people have sent you, which are not encrypted. They have your draft DMs, the IP address from which you logged on to the account at the time, sometimes the location at which you accessed the account and other applications that are associated with your Twitter account, amongst other data.”
X and most other social media platforms, not to mention law enforcement agencies such as the FBI, obviously care a whole lot about data. Collecting, saving, and allowing access to data from hundreds of millions of users in such a broad, comprehensive fashion is essential for X. At least from a data utilization perspective, what they’ve done makes sense.
Contrast these social media platforms with the way enterprises collect and handle their own data. That collection and management effort is function- rather than human-centric. With social media, the human is the product.
So why is a social media platform’s culture different? Because with public social media, broad, relationship-rich data sharing had to come first. Users learned first-hand what the privacy tradeoffs were, and that kind of sharing capability was designed into the architecture. The ability to share and reuse social media data for many purposes implies the need to manage the data and its accessibility in an elaborate way. Email, by contrast, is a much older technology that was not originally intended for multi-purpose reuse.
Why can organizations like the FBI successfully serve search warrants on data from data farming companies? Because social media started with a broad data sharing assumption and forced a change in the data sharing culture. Then came adoption. Then law enforcement stepped in and argued effectively for its own access.
Broadly reused and shared, web data about users is clearly more useful than siloed data. Shared data is why X can have the advertising-driven business model it does. Social media contracts with users are one-way, requiring agreement with the provider’s terms. Users have one choice: use the platform, or don’t.
The key enterprise opportunity: A zero-copy user PII graph that respects users
It’s clear that enterprises should do more to tap the value of the kinds of user data that email, for example, generates. One way to sidestep the sensitivity issues associated with reusing that sort of data would be to treat the most sensitive user data separately.
Self-sovereign identity (SSI) advocate Phil Windley has pointed out that agent-managed, hashed messaging and decentralized identifiers could make it unnecessary to duplicate identifiers that correlate. If a bartender just needs to confirm that a patron at the bar is old enough to drink, the bartender could just ping the DMV to confirm the fact. The DMV could then ping the user’s phone to verify the patron’s claimed adult status.
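The bartender scenario can be simulated in memory as a sketch. The `Wallet` and `DMV` classes, the one-time token, and the method names are all hypothetical; real self-sovereign identity systems rely on decentralized identifiers, verifiable credentials, and public-key signatures, none of which this toy reproduces. What it does show is the shape of the exchange: the bartender gets only a yes or no, the birth date stays on the phone, and each visit uses a fresh random token, so the bar never stores an identifier that could be correlated across visits.

```python
import secrets
from datetime import date
from typing import Optional


class DMV:
    """Hypothetical verifier service: relays an age question to the patron's phone."""

    def __init__(self) -> None:
        self._pending: dict[str, "Wallet"] = {}

    def register_token(self, token: str, wallet: "Wallet") -> None:
        self._pending[token] = wallet

    def verify_min_age(self, token: str, min_age: int, today: date) -> bool:
        # Tokens are single-use: pop removes them so they cannot be replayed.
        wallet: Optional[Wallet] = self._pending.pop(token, None)
        if wallet is None:
            return False
        return wallet.confirm_min_age(min_age, today)


class Wallet:
    """Hypothetical agent on the user's phone. The birth date never leaves it."""

    def __init__(self, birth_date: date, dmv: DMV) -> None:
        self._birth_date = birth_date
        self._dmv = dmv

    def present(self) -> str:
        # A fresh random token per visit, so visits can't be linked by it.
        token = secrets.token_urlsafe(16)
        self._dmv.register_token(token, self)
        return token

    def confirm_min_age(self, min_age: int, today: date) -> bool:
        b = self._birth_date
        had_birthday = (today.month, today.day) >= (b.month, b.day)
        age = today.year - b.year - (0 if had_birthday else 1)
        return age >= min_age


dmv = DMV()
patron = Wallet(date(1990, 5, 1), dmv)
token = patron.present()                          # patron shows a token, not a license
print(dmv.verify_min_age(token, 21, date.today()))  # bartender learns only True/False
```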
Given such a scheme, each user could manage and control access to their own most sensitive PII. In this scenario, the PII could stay in place, stored and encrypted on the user’s phone.
Knowledge graphs lend themselves to such a less centralized, yet more fine-grained and transparent, approach to data management. By supporting self-sovereign identity and a data-centric architecture, a Chief Data Officer could help the Chief Risk Officer mitigate the enterprise risk associated with duplicated personally identifiable information: a true win-win.
Contributed by Alan Morrison