We often come across use cases that highlight the power of RDF knowledge graphs. One such case involved analyzing a simple violations dataset to explore complex trends in team culture and conduct within a large investment bank.
Among a number of compliance-related datasets, we incorporated the bank’s employee violations data into an integrated ontological model. The source data consisted of the violator’s employee ID, the type of violation, the incident’s date, and a few additional data points about the incident itself. Within three weeks, we modeled, converted, and loaded the data into the development triplestore. The development knowledge graph was already populated with the firm’s organizational structure and human resources personnel data.
Combining these datasets, together with SPARQL’s property paths, allowed us to quickly produce reports highlighting cultural trends that would have been extremely difficult to find in the original siloed data. In one example, we looked for senior personnel who had committed violations and who also manage employees who committed similar violations. Such a pattern could point to managers who may be cultivating certain negative behaviors within their teams.
Using property paths, we fine-tuned the query to include indirect managers within a variable number of levels in the company hierarchy, since cultural tendencies can propagate down the organizational ladder. We also calculated the number of personnel within these multi-level teams to provide a quantitative perspective.
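To make the idea concrete, here is a minimal sketch of the multi-level manager check. It is not the query used in the project. The vocabulary (`ex:reportsTo`, `ex:committedBy`, `ex:violationType`), the function names, and the toy data are all assumptions made for illustration, and "similar violations" is simplified to "same violation type". The `QUERY` string shows how a one-to-three-level reporting chain can be written with standard SPARQL 1.1 path operators. The Python functions walk the same hierarchy by hand, so the logic can be checked without a triplestore.

```python
from collections import defaultdict

# Hypothetical vocabulary: these ex: names are illustrative, not the bank's ontology.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?manager ?type (COUNT(DISTINCT ?employee) AS ?violators)
WHERE {
  ?mv ex:committedBy ?manager ;
      ex:violationType ?type .
  ?ev ex:committedBy ?employee ;
      ex:violationType ?type .
  # one to three reporting levels between employee and manager
  ?employee ex:reportsTo/ex:reportsTo?/ex:reportsTo? ?manager .
}
GROUP BY ?manager ?type
"""


def team_under(manager, reports_to, max_levels):
    """Return everyone within max_levels reporting steps below manager."""
    direct = defaultdict(set)
    for employee, boss in reports_to:
        direct[boss].add(employee)
    team, frontier = set(), {manager}
    for _ in range(max_levels):
        # Step one level down, skipping anyone already seen (guards against cycles).
        frontier = {e for boss in frontier for e in direct[boss]} - team - {manager}
        if not frontier:
            break
        team |= frontier
    return team


def flag_managers(reports_to, violations, max_levels=3):
    """List (manager, type, violators, team size) where a team member shares the manager's violation type."""
    by_person = defaultdict(set)
    for person, vtype in violations:
        by_person[person].add(vtype)
    findings = []
    for manager, types in sorted(by_person.items()):
        team = team_under(manager, reports_to, max_levels)
        for vtype in sorted(types):
            violators = sorted(e for e in team if vtype in by_person.get(e, ()))
            if violators:
                findings.append((manager, vtype, violators, len(team)))
    return findings


# Toy data (invented): (employee, direct manager) and (person, violation type).
REPORTS_TO = [("bob", "alice"), ("carol", "bob"), ("dave", "carol"), ("erin", "dave")]
VIOLATIONS = [
    ("alice", "conflict_of_interest"),
    ("carol", "conflict_of_interest"),
    ("erin", "conflict_of_interest"),
    ("bob", "late_filing"),
]

if __name__ == "__main__":
    for finding in flag_managers(REPORTS_TO, VIOLATIONS, max_levels=3):
        print(finding)
```

Raising `max_levels` widens the team that is searched, which corresponds to adding another optional `ex:reportsTo?` step to the path in the query. The team size returned alongside each finding supplies the denominator for the quantitative view.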
This exposed, for instance, that 17 out of the 54 employees who report to various managers below Ms. Greed breached conflict-of-interest guidelines this year. This included 2 employees who committed repeat violations, and 6 who committed additional related violations. Ms. Greed herself participated in a private investment without authorization. All this was found, without a starting or seeding data point, using only one SPARQL query. It would have been nearly impossible to uncover the related nature of these violations with the legacy relational data structure.
In another example, we leveraged HR data to analyze the distribution of violations among the various job titles and hire types within different departments. We revealed, for instance, (a) which types of violations were common at different levels of seniority, (b) what percentage of contingent workers (i.e., contractors) committed the various types of violations, and (c) which departments fostered problematic conduct at senior management levels.
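As a rough illustration of calculation (b), the sketch below computes the percentage of each hire type's headcount that committed each type of violation. The function name, the hire-type labels, and the data are invented for the example. It assumes every person named in a violation also appears in the staff list, and it counts distinct people rather than incidents, so repeat offenders are not counted twice.

```python
from collections import defaultdict


def violation_rates(staff, violations):
    """Map (hire type, violation type) to the percentage of that hire type's headcount involved."""
    headcount = defaultdict(int)
    hire_type_of = {}
    for person, hire_type in staff:
        headcount[hire_type] += 1
        hire_type_of[person] = hire_type
    # Collect distinct offenders per (hire type, violation type).
    offenders = defaultdict(set)
    for person, vtype in violations:
        offenders[(hire_type_of[person], vtype)].add(person)
    return {
        key: 100.0 * len(people) / headcount[key[0]]
        for key, people in offenders.items()
    }


# Toy data (invented): (person, hire type) and (person, violation type).
STAFF = [
    ("a", "contractor"), ("b", "contractor"), ("c", "contractor"), ("d", "contractor"),
    ("e", "employee"), ("f", "employee"),
]
INCIDENTS = [("a", "gift_policy"), ("a", "gift_policy"), ("b", "gift_policy"), ("e", "late_filing")]

if __name__ == "__main__":
    for key, rate in sorted(violation_rates(STAFF, INCIDENTS).items()):
        print(key, rate)
```

The same grouping could be keyed on job title or department instead of hire type to cover cases (a) and (c).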
Unfortunately, we did not have historical organizational data. Had it been available, we could have explored many additional questions. We could have looked at the likelihood of promotions or demotions following the various types of violations, and vice versa. We could also have studied the effect of the onboarding or departure of senior personnel on the conduct of their teams.
One of the next datasets we will be working on pertains to the bank’s compliance training attendance. We can then look at correlations between training and violations, to study the effectiveness of training across different topics and departments. By plugging a small dataset into our expanding knowledge base, we can instantly multiply our analytical possibilities.