The Data Foundation hosted a PitchFest on “Unlocking the vision of the Financial Data Transparency Act” a few days ago. Selected speakers were given 10 minutes to present their best ideas on how to use the improved financial regulatory information and data.
The Financial Data Transparency Act (FDTA) is a new piece of legislation directly affecting the financial services industry. In short, it directs financial regulators to harmonize their data collections and move to machine-readable (and people-readable) forms. The goal is to reduce the burden of compliance on regulated industries, increase the ability to analyze data, and enhance overall transparency.
Two members of our team, Michael Atkin and Dalia Dahleh, were given the opportunity to present. Below is the text of Michael Atkin’s pitch:
- Background – Just to set the stage: I’ve been fortunate to serve as scribe, analyst, advocate and organizer for data management since 1985. I’ve always been a neutral facilitator, which has allowed me to sit on all sides of the data management issue all over the world – from data provider to data consumer to market authority to regulator. I’ve helped create maturity models outlining best practice, performed benchmarking to measure progress, documented the business case, and created and taught the Principles of Data Management at Columbia University. I’ve also served on the SEC’s Market Data Advisory Committee, on the CFTC’s Technology Advisory Committee, and as chair of the Data Subcommittee of the OFR’s Financial Research Advisory Committee in the wake of the 2008 financial crisis. So, I have some perspective on the challenges the regulators face and on the value of the FDTA.
- Conclusion (slide 2) – My conclusions after all that exposure are simple. There is a real data dilemma for many entities. The dilemma is caused by the fragmentation of technology. It’s nobody’s fault. We have business and operational silos. They are built on proprietary software. The same things are modeled differently based on the whims of the architects, the focus of the applications and the nuances of the technical solution. This fragmentation creates “data incongruence” – where the meaning of data in one repository doesn’t match that in other repositories. We have the same words with different meanings. We have the same meaning expressed in different words. And we have nuances that get lost in translation. As a result, we spend enormous effort and money moving data around, reconciling meaning and doing mapping. As one of my banking clients put it, “My projects end up as expensive death marches of data cleansing and manipulation just to make the software work.” And we do this over and over, ad infinitum. Not only do we suffer from data incongruence – we also suffer from the limitations of the relational technology that still dominates our way of processing data. For the record, relational technology is over 50 years old. It was (and is) great for computation and structured data. It is not good for ad hoc inquiry and scenario-based analysis. The truth is that data has become isolated and mismatched across repositories due to technology fragmentation and the rigidity of the relational paradigm. Enterprises (including government enterprises) often have thousands of business and data silos, each based on proprietary data models that are hard to identify and even harder to change. I refer to this as the bad data tax. It costs most organizations somewhere around 40-60% of their IT budget to address. So let’s recognize that this is a real liability. One that diverts resources from business goals, extends time-to-value for analysts, and leads to knowledge worker frustration. The new task before FSOC leadership under the FDTA is fixing the data itself.
- Solution (slide 3) – The good news is that the solution to this data dilemma is actually quite simple and twofold in nature. First, adopt the principles of good data hygiene. On that front there appears to be good progress, thanks to efforts around the Federal Data Strategy and related work such as BCBS 239 and the Open Government Data Act. But governance alone will not solve the data dilemma. The second requirement is to adopt data standards that were specifically designed to address the problems of technology fragmentation. These open, web-based data standards are quite mature. They include the Internationalized Resource Identifier (IRI) for identity resolution; ontologies, which let us model simple facts and relationship facts; and the expression of these things in standards like RDF for the data itself, OWL for ontologies and inferencing, and SHACL for business rules (see the first sketch at the end of this post). From these standards you get a set of capabilities. You get quality by math (because the ontology ensures precision of meaning). You get reusability (which eliminates the problem of hard-coded assumptions and of doing the same thing in slightly different ways). You get access control (because the rules are embedded in the data and not constrained by systems or administrative complexity). You get lineage traceability (because everything is linked to a single resolvable identifier, so data can be traced as it flows across systems). And you get good governance (since these standards use resolvable identity, precise meaning and lineage traceability to shift governance from people-intensive data reconciliation to more automated data applications).
- FDTA (slide 4) – Another important point is that this is happening at the right time. I see the FDTA as the next step in a line of initiatives seeking to modernize regulatory reporting and reduce risk. I’ve witnessed the efforts to move to T+1 (to address the clearing and settlement challenge). I’ve seen the recognition of global interdependencies (with the fallout from Long Term Capital, Enron and the derivatives problems in Orange County). We’ve seen the problems of identity resolution that led to KYC and AML requirements. And I was actively involved in understanding the data challenges of systemic risk during the credit crisis of 2008. The problem with all these regulatory activities is that most of them are not about fixing the data. Yes, we did get the LEI and data governance. Those are great things, but far from what is required to address the data dilemma. I also applaud the adoption of XBRL (and the concept of data tagging). I like the XBRL taxonomies (as well as the Eurofiling regulatory taxonomies), but they are designed vertically, report by report, with limited capability for linking things together. Not only that, most entities are simply extracting XBRL into their relational environments, which does little to address the problem of structural rigidity. The good news is that all the work that has gone into the adoption of XBRL can be leveraged. XML is good for data transfer. Taxonomies are good for unraveling concepts and tagging. And the shift from XML to RDF is straightforward and would not affect those who currently report using XBRL (see the second sketch at the end of this post). One final note before I make our pitch. Let’s recognize that XBRL is not the way the banks manage their internal data infrastructures. They suffer from the same dilemmas as the regulators, and almost every G-SIB and D-SIB I know is moving toward semantic standards. Even though the FDTA is about the FSOC agencies, it will ultimately affect the financial institutions. I see this as an opportunity for collaboration between the regulators and the regulated in building the infrastructure for the digital world.
- Proposal (slide 5) – Semantic Arts is proposing a pilot project to implement the foundational infrastructure of precise data about financial instruments (including identification, classification, descriptive elements and corporate actions), legal entities (including entity types as well as information about ownership and control), obligations (associated with issuance, trading, clearing and settlement), and holdings covering the portfolios of the regulated entities. These are the building blocks of linked risk analysis (see the third sketch at the end of this post). To implement this initiative, we propose starting with a single, simple model of the information from one of the covered agencies. The initial project would focus on defining the enterprise model and conforming two to three key data sets to it. The resulting model would be hosted on a graph database. Subsequent projects would expand the footprint of data domains added to the graph and gradually build the functionality needed to begin reversing the legacy creation process. We would initiate things by leveraging the open-standard upper ontology from Semantic Arts (GIST), as well as the work of the Financial Industry Business Ontology (from the EDM Council) and any other vetted ontology, such as the one the OFR is building for CFI. Semantic Arts has a philosophy of “think big” (like cross-agency interoperability) but “start small” (like a business domain of one of the agencies). The value of adopting semantic standards is threefold and can be measured using the “three C’s” of metrics. The first C is cost containment, starting with data integration and extending to business process automation and the consolidation of redundant systems (best known as technical agility). The second C is capability enhancement for analysis of the degrees of interconnectedness, the nature of transitive relationships, state-contingent cash flow, collateral flow, guarantees and the transmission of risk. The final C is implementation of the control environment, focused on tracking data flow, protecting sensitive information, preventing unwanted outcomes, managing access and ensuring privacy.
- Final Word (contact) – Just a final word to leave you with. Adopting these semantic standards can be accomplished at a fraction of what you spend each year supporting the vast cottage industry of data integration workarounds. The pathway forward doesn’t require ripping everything out; it means building a semantic “graph” layer across your data to connect the dots and restore context. This is what we do. Thank you.
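
To make the standards named in the Solution slide concrete, here is a minimal sketch of the IRI / RDF / SHACL stack using the open-source rdflib and pyshacl Python libraries. The namespace, the entity and the instrument are hypothetical placeholders, not part of any regulator’s model or of Semantic Arts’ ontologies.

```python
# A minimal sketch of IRIs, RDF facts and SHACL rules.
# All names (the "ex:" namespace, AcmeBank, the bond) are invented.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, XSD
from pyshacl import validate

EX = Namespace("https://example.org/fin#")

# 1. Identity resolution: each entity and instrument gets a resolvable IRI.
data = Graph()
data.bind("ex", EX)
acme = EX.AcmeBank        # IRI standing in for an LEI-keyed legal entity
bond = EX.AcmeBond2030    # IRI for an instrument it issued

# 2. Simple facts and relationship facts, expressed as RDF triples.
data.add((acme, RDF.type, EX.LegalEntity))
data.add((acme, RDFS.label, Literal("Acme Bank plc")))
data.add((bond, RDF.type, EX.DebtInstrument))
data.add((bond, EX.issuedBy, acme))
data.add((bond, EX.maturityDate, Literal("2030-06-30", datatype=XSD.date)))

# 3. Business rules as SHACL shapes: every instrument must name exactly one issuer.
shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <https://example.org/fin#> .
ex:InstrumentShape a sh:NodeShape ;
    sh:targetClass ex:DebtInstrument ;
    sh:property [ sh:path ex:issuedBy ; sh:minCount 1 ; sh:maxCount 1 ] .
""", format="turtle")

conforms, _, _ = validate(data, shacl_graph=shapes)
print("Data conforms to the rules:", conforms)
print(data.serialize(format="turtle"))
```

Because the rule travels with the data as a SHACL shape, anyone receiving the graph can re-run the same validation without custom code, which is the point above about shifting governance from people-intensive reconciliation to more automated checks.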
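The FDTA slide claims that the shift from XBRL’s XML tagging to RDF is straightforward. The sketch below illustrates the idea on a single, heavily trimmed XBRL-style fact; it is not a full XBRL processor, and the report IRI and filer identifier are invented for the example.

```python
# A simplified illustration of mapping one tagged XBRL-style fact to RDF.
# This is a sketch, not a complete XBRL processor.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import XSD

# A single tagged fact, heavily trimmed for illustration.
xbrl_fragment = """
<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Assets contextRef="FY2023" unitRef="USD" decimals="0">125000000</us-gaap:Assets>
</xbrl>
"""

root = ET.fromstring(xbrl_fragment)
fact = root.find("{http://fasb.org/us-gaap/2023}Assets")

REPORT = Namespace("https://example.org/report/FY2023#")  # hypothetical report IRI
GAAP = Namespace("http://fasb.org/us-gaap/2023#")

g = Graph()
g.bind("gaap", GAAP)
# The tagged concept becomes a predicate; the filer becomes the subject IRI.
g.add((REPORT.filer, GAAP.Assets, Literal(fact.text, datatype=XSD.decimal)))
print(g.serialize(format="turtle"))
```

Nothing about the filer’s existing submission changes in this picture; the mapping happens on the receiving side, which is why the shift would not affect those who currently report using XBRL.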
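Finally, a small illustration of the “linked risk analysis” described in the Proposal slide: once entities, control links and holdings live in one graph, a transitive SPARQL query can walk the chain of interconnectedness. The entities and relationships below are invented for illustration only.

```python
# A minimal sketch of linked risk analysis over a graph of entities and holdings.
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/fin#")
g = Graph()
g.bind("ex", EX)

# Ownership/control chain plus a holding, as plain triples.
triples = [
    (EX.HoldCo, EX.controls, EX.AcmeBank),
    (EX.AcmeBank, EX.controls, EX.AcmeSecurities),
    (EX.AcmeSecurities, EX.holds, EX.AcmeBond2030),
]
for t in triples:
    g.add(t)

# Which instruments sit anywhere below HoldCo in the control chain?
query = """
PREFIX ex: <https://example.org/fin#>
SELECT ?subsidiary ?instrument WHERE {
    ex:HoldCo ex:controls+ ?subsidiary .
    ?subsidiary ex:holds ?instrument .
}
"""
for row in g.query(query):
    print(f"{row.subsidiary} holds {row.instrument}")
```

In a graph, extending this model to the other building blocks named in the proposal (obligations, corporate actions, collateral flow) is largely a matter of adding predicates rather than redesigning schemas.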