Governance in a Data-Centric Environment

How a Data-Centric Environment Becomes Harder to Govern

A traditional data landscape has the advantage of being extremely siloed.  By dividing your entire data landscape into thousands of databases, you create the potential for each database to be small enough to be manageable.

As it turns out, this is more potential than actuality.  Many of the individual application data models that we look at are more complex than the entire enterprise model should be.  However, that doesn’t help anyone trying to govern.  It is what it is.

What is helpful about all this silo-ization is that each silo has a smaller community of interest.  When you cut through all the procedures, maturity models and the like, governance is a social problem.  Social problems, such as “agreement,” get harder the more people you involve.

From this standpoint, the status quo has a huge advantage, and a Data-Centric firm has a big challenge: there are far more people whose agreement one needs to solicit and obtain.

The other problem that Data-Centric brings to the table is the ease of change.  Data Governance likes things that change no faster than its process can manage.  Often this is a toss-up.  Most systems are hard to change and most data governance processes are slow.  They are pretty much made for each other.

I remember when we built our first model-driven application environment (unfortunately we chose health care for our first vertical).  We showed how you could change the UI, API, schema, constraints, etc. in real time.  This freaked our sponsors out.  They couldn’t imagine how they would manage [govern] this kind of environment.  In retrospect, they were right.  They would not have been able to manage it.

This doesn’t mean the approach isn’t valid; it means we need to spend a lot more time on the approach to governance. We have two huge things working against us: we are taking the scope from tribal silos to the entire firm, and we are increasing the tempo of change.

How a Data-Centric Environment Becomes Easier to Govern

A traditional data landscape has the disadvantage of being extremely siloed.  Being siloed gets you some local governance, but you have almost no hope of enterprise governance.  This is why it’s high-fives all around for local governance, while little progress is made on firm-wide governance.

One thing that data-centric provides that makes the data governance issues tractable is an incredible reduction in complexity.  Because governance is a human activity, getting down to human scales of complexity is a huge advantage.

Furthermore, to enjoy the benefits of data-centric you have to be prepared to share.  A traditional environment encourages copying enterprise data in order to restructure it and adapt it to your own local needs.  Pretty much all enterprises have data on their employees.  Lots of data, actually.  A large percentage of applications also have data on employees.  Some merely have “users” (most of whom are employees) and their entitlements, but many have considerably more.  Inventory systems have cycle counters, procurement systems have purchasing agents, incident systems have reporters; you get the pattern.

Each system is dealing with another copy (maybe manually re-entered, maybe from a feed) of the core employee data.  Each system has structured the local representation differently and of course named all the fields differently.  Some of this is human nature, or maybe data modeler nature (they want to put their own stamp on things), but some of it is inevitable.  When you buy a package, all the fields already have names.  Few, if any, of them are the names you would have chosen, or the names in your enterprise model, if you have one.

With the most mature form of data-centric, you would have one set of enterprise employee data.  You can extend it, but the un-extended parts are used just as they are.  For most developers, this idea sounds either too good to be true or too bad to be true.  Most developers are comfortable with a world they control.  This is a world of tables within their database.  They can manage referential integrity within that world.  They can predict performance within that world.  They don’t like to think about a world where they have to accept someone else’s names and structures, and to go along with other groups’ decision making.
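
As a rough illustration of "extend it, but use the un-extended parts just as they are," here is a minimal Python sketch.  Everything in it is an assumption made for illustration: the class names (Employee, PurchasingAgent), the field names, and the sample record are hypothetical and do not come from any particular system.  The point is only that the procurement extension holds a reference to the shared record rather than a renamed local copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Shared enterprise record (hypothetical field names)."""
    employee_id: str
    full_name: str
    email: str


@dataclass
class PurchasingAgent:
    """Hypothetical procurement extension: it points at the shared
    record and adds only what procurement itself needs."""
    employee: Employee
    approval_limit_usd: int


# One shared data set, keyed by the enterprise identifier.
employees = {
    "E100": Employee("E100", "Ada Example", "ada@example.com"),
}

agent = PurchasingAgent(employee=employees["E100"], approval_limit_usd=5000)

# The extension reads the shared names and values; nothing is re-keyed.
print(agent.employee.full_name, agent.approval_limit_usd)
```

Marking the shared record as frozen is one possible design choice, not the only one: it means a consuming application cannot quietly alter the common data, so changes have to go through whoever governs the shared set.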

But once you overcome developer inertia on this topic and you are actually re-using data as it is, you have opened up a channel of communication that naturally leads to shared governance. Imagine a dozen departments consuming the exact same set of employee data.  Not local derivations of the HR golden record, or the LDAP files, but an actual shared data set.  They have an incentive to work together on the common data.  The natural thing to happen, and we have seen this in mature organizations, is that the focus shifts to the smallest, realest, most common data elements.  This social movement, and this focus on what is key and what is real, actually makes it easier to have common governance.  You aren’t trying to foist one application’s view of the world on the rest of the firm; you are trying to get the firm to understand and communicate what it cares about and what it shares.

And this creates a natural basis for governance, even though the scope has become considerably larger.

