Data-Centric’s Role in the Reduction of Complexity

Complexity Drives Cost in Information Systems

A system with twice the number of lines of code will typically cost more than twice as much to build and maintain.

There is no economy of scale in enterprise applications. There is a diseconomy of scale. In manufacturing, every doubling of output results in a predictable reduction in the cost per unit. This is often called a learning curve or an experience curve.

Just the opposite happens with enterprise applications. Every doubling of code size means that the additional code is added at ever lower productivity. This is because of complex dependency. When you manufacture widgets, each widget has no relationship to, or dependency on, any of the other widgets. With code, it is just the opposite: each line must fit in with all those that preceded it. We can reduce the dependency with discipline, but we cannot eliminate it.

If you are interested in reducing the cost of building, maintaining, and integrating systems, you need to tackle the complexity issue head on.

The first stopping point on this journey is recognizing the role that schema plays in the proliferation of code. Study software estimating methodologies, such as function point analysis, and you will quickly see the central role that schema size plays in code bloat. Function point analysis estimates effort based on inputs such as the number of fields on a form, the elements in a transaction, or the columns in a report. Each of these is directly driven by the size of the schema. If you add attributes to your schema, they must show up in forms, transactions, and reports; otherwise, what was the point?

I recently did a bit of forensics on a popular, well-known, high-quality application, QuickBooks, which I think is representative. The QuickBooks code base is 10 million lines of code. The schema consists of 150 tables and 7,500 attributes (7,650 schema concepts in total). That means that each schema concept, on average, contributed about 1,300 lines of code to the solution. Most studies have placed the cost to build and deploy software at between $10 and $100 per line of code (it is an admittedly large range, but you have to start somewhere). That means that each attribute added to the schema commits the enterprise to somewhere between $13K and $130K of expense just to deploy, and probably an equal amount over the life of the product for maintenance.
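The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the calculation described above: the function name is hypothetical, and rounding to the nearest hundred lines mirrors the article's "about 1,300" rather than any formal estimating method.

```python
def cost_per_concept(lines_of_code, tables, attributes, cost_per_line):
    """Rough deployment cost attributable to one schema concept.

    A schema concept is a table or an attribute, as counted in the article.
    """
    concepts = tables + attributes
    # Round to the nearest hundred lines, as the article does (about 1,300).
    lines_per_concept = round(lines_of_code / concepts, -2)
    return lines_per_concept * cost_per_line


# Figures quoted in the article: 10 million lines, 150 tables, 7,500 attributes.
low = cost_per_concept(10_000_000, 150, 7_500, cost_per_line=10)
high = cost_per_concept(10_000_000, 150, 7_500, cost_per_line=100)

print(f"Each schema concept: ${low:,.0f} to ${high:,.0f} to deploy")
```

Doubling the range to account for lifetime maintenance, as the paragraph suggests, would put each added attribute at roughly twice these amounts.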

I hope this gives data modelers a bit of pause. It is so easy to add another column, let alone another table, to a design; it is sobering to consider the economic impact.

But that’s not what this article is about. This article is about the insidious multiplier effect that not following the data-centric approach is having on enterprises these days.

Let us summarize what is happening in enterprise applications:

The size of each application’s schema is driving the cost of building, implementing, and maintaining it (even if the application is purchased).
The number of applications drives the cost of systems integration (which is now 30-60% of all IT costs).
The overlap, without alignment, is the main driver of integration costs (if the fields are identical from application to application, integration is easy; if the applications have no overlap, integration is unnecessary).

We now know that most applications can be reduced in complexity by a factor of 10-100. That is pretty good. But the system-of-systems potential is even greater. Even very complex enterprises have a core model of just a few hundred concepts. Most of the remaining distinctions can be made taxonomically, without programming changes.

When each subdomain directly extends the core model, the complexity is no longer multiplicative; it is only incrementally additive.

We worked with a manufacturing company whose core product management system had 700 tables and 7,000 attributes (7,700 concepts). Our replacement system had 46 classes and 36 attributes (82 concepts), almost a 100-fold reduction in complexity. The company had acquired another company that had its own systems, completely and arbitrarily different, though smaller and simpler at 60 tables and 1,000 attributes, or 1,060 concepts in total. To accommodate the differences in the acquired company, we had to add 2 concepts to the core model, or about 3%.

Normally, trying to integrate 7,700 concepts with 1,060 concepts would require a very complex systems integration project. But once the problem is reduced to its essence, we see that the acquisition adds only about a 3% increment to the core model, which is easily managed.
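For readers who want to check the ratios, here is a small Python sketch using only the counts quoted above. The helper name is hypothetical; a "concept" is counted as the article counts it, tables (or classes) plus attributes.

```python
def concept_count(tables_or_classes, attributes):
    """Count schema concepts the way the article does: containers plus attributes."""
    return tables_or_classes + attributes


legacy = concept_count(700, 7_000)     # original product management system
replacement = concept_count(46, 36)    # data-centric replacement
acquired = concept_count(60, 1_000)    # acquired company's systems
added_to_core = 2                      # concepts added to absorb the acquisition

reduction_factor = legacy / replacement
increment_pct = 100 * added_to_core / replacement

print(f"Legacy: {legacy} concepts, replacement: {replacement} concepts")
print(f"Reduction factor: {reduction_factor:.0f}x")
print(f"Increment to the core model: {increment_pct:.1f}%")
```

The reduction works out to roughly 94-fold, and the increment to a little under 2.5%, in line with the article's rounded "almost 100-fold" and "about 3%".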

What does this have to do with data centricity?

Until you embrace data centricity, you think that the 7,700 concepts and the 1,060 concepts are valid and necessary. You’d be willing to spend considerable money to integrate them. (It is worth mentioning that the client in this case had acquired the other company ten years earlier and had still not integrated their systems, mostly due to the “complexity” of doing so.)

Once you embrace data centricity, you begin to see the incredible opportunities.

You don’t need data centricity to fix one application. You merely need elegance: a discipline that guides you to the simplest design that solves the problem. You may have thought you were doing that already. What is interesting is that real creativity comes with constraints. When you constrain your design choices to align with a firm’s “core model,” it is surprising how rapidly the complexity drops. More importantly for the long-term economics, the divergence in the overlapping parts drops even faster.

When you step back and look at the economics though, there is a bigger story:

The total cost of enterprise applications is roughly proportional to:

These items are multiplicative (except for the last, which is a divisor). This means that if you cut any one of them in half, the overall result drops by half. If you cut two of them in half, the result drops by a factor of four, and if you cut all of them in half, the result is an eight-fold reduction in cost.
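The compounding can be sketched in Python. This is a minimal illustration, assuming three independent cost drivers (a count inferred from the eight-fold figure) that are left unnamed here; the function name is hypothetical. Each number is the factor by which one driver improves, which for a divisor term means the factor by which it grows.

```python
from math import prod


def overall_reduction(improvements):
    """Combined cost reduction when each multiplicative driver improves by its factor."""
    return prod(improvements)


one_halved = overall_reduction([2, 1, 1])      # one driver cut in half
two_halved = overall_reduction([2, 2, 1])      # two drivers cut in half
all_halved = overall_reduction([2, 2, 2])      # all three cut in half
all_tenfold = overall_reduction([10, 10, 10])  # all three improved tenfold

print(one_halved, two_halved, all_halved, all_tenfold)
```

Because the drivers multiply, modest improvements in each one compound into a much larger overall reduction than any single improvement would suggest.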

Cutting any of these in half is not that hard. If you reduce them all by a factor of ten (very doable), the result is a 1,000-fold reduction in cost. That sounds too incredible to believe, but let’s take a closer look at what it would take to reduce each by half or by a factor of ten.

Click here to read more on TDAN.com
