System Project Failure: The Heuristics of Risk

Information Systems Project Failure: The Heuristics of Risk

An Evaluation of Risk Factors in Large Systems Engineering Projects

This article was originally published in the Journal of Information Systems Management, Volume 8, Number 1, Winter 1991. It is reprinted here by permission of the publisher: www.crcpress.com

System project failures are a well-known part of systems development; however, not all of the potential risks of planning and executing a project are equally well known. This article offers heuristic guidelines to help Information Systems managers assess these inherent risk factors before initiating and while executing a project. Case examples illustrate how a particular risk factor can result in a failed systems development effort.

Most Information Systems managers who have been responsible for any type of systems development effort are familiar with project failure. Although publicity is rare, a few project failures have received coverage in the trade press. Some failed projects have been labeled runaways, a runaway being “a system that is millions [of dollars] over budget, years behind schedule, and – if ever completed – less effective than promised.”

Several years ago, another runaway project disaster received notoriety because the failure affected five million New Jersey state drivers.

The fourth-generation language used to develop the system was not equipped to handle the daily volume of car registrations and license renewals. According to one accounting firm’s study, the problem is widespread: runaway projects accounted for some 35% of the total projects under development in its 600 largest client organizations. Federal government officials have become wary of a similar phenomenon known as grand design approaches to systems development.

A grand design, based on traditional systems development principles, dictates specification of all requirements up front. Problems surface in the federal government bureaucracies because projects are usually large in size and complexity. A large project scope means that costs escalate, time schedules lag, and user, project, and vendor personnel change frequently. Furthermore, Congress balks at the estimated costs, forcing compromise solutions that are prone to implementation failure.

This article attempts to determine the patterns that exist within systems projects that begin well and, for whatever reasons, finish less successfully. To focus the initial work, each project selected used a traditional systems development methodology. Each project failed to meet user expectations. The following observations tie in well with the authors’ experiences with systems development projects. More importantly, these observations may be useful in helping project managers assess the impact of changes to their projects relative to their future success or failure. Furthermore, senior managers and systems sponsors may find help with their decisions regarding the continuation or cancellation of troubled systems projects.

Framework to Identify System Failure Factors

The factors are organized along two dimensions: planning versus executing and technical versus human. The placement of the factors defies pigeonholing; as the framework suggests, few factors fall purely within one dimension. Broken lines in the exhibit indicate dimension zones. For the purposes of this article, the framework is explained clockwise, beginning with those factors that are most planning related and continuing through planning-human, executing-human, executing, executing-technical, and planning-technical.

Planning Factors

Although most project failures surface in late execution, the problems often originate during planning and can occur irrespective of the planning approach being used. Such problems involve mistakes made in time and budget estimates as well as in compression (i.e., project scheduling and management activities).

Estimating

Independent of specific tools or techniques, project managers or project sponsors generally use one of five estimating approaches: available budget, level of effort, bottom-up, comparable case, and executive mandate. Note that the approach taken appears to correlate with prospects for project success.

Available Budget

Information Systems managers accustomed to operating their department in a stable environment usually believe that projects will continue within the allotted budget, plus or minus a small percentage. The threat to their project’s success is that they fail to recognize that taking on a project adds many tasks, and virtually none of these tasks can be shortchanged without jeopardizing the entire project.

Level of Effort

The level-of-effort estimate relies on a judgment call that a certain number of employees should be able to complete each task in a certain amount of time. For example, task one should take three people two months to complete. This approach at least recognizes that there is a relationship between the effort and the tasks initiated. However, the approach fails because there is no inherent feedback loop. Because the estimate is judgmental and no two tasks are the same, the next project is always slightly different. Thus, this type of project estimating does not improve with project manager experience.

Bottom-Up Approach

With this approach, planners ascertain individual tasks (e.g., number of pages of procedures to be written, number of employees to be trained, number of screens to be designed). Then, they assign resources and time blocks to accomplish the tasks. Because these estimating parameters are reasonably constant, they provide a common denominator from project to project. Through project time control systems, Information Systems managers know whether or not their estimates are correct. Therefore, the bottom-up method has a self-correcting mechanism as the project proceeds and a learning component to use for each project.
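The self-correcting mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the article: the class UnitRate, the function bottom_up_estimate, the smoothing weight, and every number below are hypothetical, chosen only to show how counted units, per-unit rates, and time-control actuals fit together.

```python
from dataclasses import dataclass


@dataclass
class UnitRate:
    """Planned hours per unit of one countable deliverable (hypothetical)."""
    hours_per_unit: float

    def update(self, actual_hours: float, units_done: int, weight: float = 0.5) -> None:
        """Blend the observed rate into the planning rate (simple smoothing)."""
        observed = actual_hours / units_done
        self.hours_per_unit = (1 - weight) * self.hours_per_unit + weight * observed


def bottom_up_estimate(counts: dict[str, int], rates: dict[str, UnitRate]) -> float:
    """Sum count * rate over every countable task type."""
    return sum(counts[name] * rates[name].hours_per_unit for name in counts)


# Illustrative numbers only; none of them come from the article.
rates = {
    "procedure_pages": UnitRate(4.0),
    "employees_trained": UnitRate(8.0),
    "screens_designed": UnitRate(20.0),
}
counts = {"procedure_pages": 100, "employees_trained": 50, "screens_designed": 30}

initial = bottom_up_estimate(counts, rates)

# Time-control data: the first 10 screens took 300 hours (30 hours each),
# so the planning rate for screens moves from 20 toward 30.
rates["screens_designed"].update(actual_hours=300.0, units_done=10)
revised = bottom_up_estimate(counts, rates)

print(initial, revised)
```

Because the rates are expressed per page, per trainee, and per screen, the corrected values can be carried into the next project, which is the learning component the bottom-up approach offers and the level-of-effort approach lacks.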

Comparable Case

Very often, especially in long-term planning, estimates must be made for projects before the detail required for bottom-up estimating is known. Usually, a handful of parameters determine the size of the project and allow it to be compared with similar projects in similar-size organizations in the same industry. For example, although accounts receivable projects may be very different between a construction company and a hospital, they should generally be comparable between hospitals that are approximately the same size. This method is particularly beneficial for project prioritization and rough-cut decisions regarding the necessary worker hours. However, statistics from similar projects may be difficult to obtain. Once the project begins, planners may convert to the bottom-up method.

Executive Mandate

At times, an organization’s senior executives dictate a project completion date on the basis of political, environmental, and market conditions. For example, one organization’s president pushed up a target completion date from three years to one year for a $25 million project involving major technological change and data conversion. Estimates crunched data conversion to two weeks and allowed only one week for a systems test. The reason: that president wanted to implement the system before his retirement.

As a general rule and regardless of the estimating method used, the more detail included in estimating project time and budget, the more accurate the estimate. In practice, planners do not include enough detail, and a primary source of systems failures is a predilection toward gross underestimation. Gross underestimation does not mean 20% to 30% underestimation, but rather 100% to 1,000% (based on field observations). Furthermore, an acceptable estimate with sufficient detail does not guarantee that the project will be allocated the appropriate resources. The following example is a case in point.

Example 1

A client of one of the authors was, for reasons outside his control, required to replace all of the organization’s information systems. He had a staff of eight and a 1½-year timetable to complete the task. His own level-of-effort estimate suggested that this time was sufficient. An outside group performed a comparable case estimate indicating that the implementation would require approximately 25,000 workdays, or almost 20 times what he had available. Senior management was shocked and agreed in principle with the external estimate, but did nothing other than transfer responsibility for the project to another department.

The project, still staffed at eight but gradually growing, was allowed to drift. Two years later, there were more than 100 employees on the project with a considerable amount of work yet to be done. Had the organization acted on the estimates at the time, the project probably would have essentially been complete at the end of the two-year period and certainly the team members would have felt much better about their involvement.

Compression

This is the act of speeding up tasks. There are two types of compression: planned (i.e., the fast-track approach) and unplanned (i.e., the catch-up approach). Fast-tracked compression management is an art requiring the ability to begin tasks in parallel that are usually conducted sequentially. Managers must rely on their own judgment to predict enough of the outcomes of the tasks usually done first to successfully mesh succeeding tasks and compress the schedule for the final outcome.

Catch-up compression management is probably more common than fast-track management in systems development projects. The problem is that certain tasks do not compress well. Studies have shown that in many cases assigning more staff to a project will not serve to compress the project completion timetable. Instead, adding staff usually delays completion of the project even further.
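One commonly cited explanation for this effect, which Frederick Brooks discusses in “The Mythical Man-Month,” is that coordination effort grows with the number of pairs of people who may need to talk to one another. The sketch below assumes that simple pairwise model; it is an illustration, not the method of the studies the article refers to. The function name communication_paths is hypothetical, and the team sizes of 8 and 100 simply echo the staffing levels in Example 1.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Pairwise paths grow roughly with the square of head count.
for size in (8, 16, 100):
    print(size, communication_paths(size))
```

Under this assumption, doubling a team from 8 to 16 more than quadruples the number of potential communication paths (28 to 120), which suggests why late additions to a project can cost more in coordination than they return in output.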

Planning-Human Factors

The planning-human factors include planning aspects that deal predominantly with human communication and scheduling. These factors are bid strategy, staffing, and scheduling.

Bid Strategy

Almost all contemporary systems projects use outside vendors for hardware, software, or other services. These products and services are often put out to bid. The bidding strategy, whether it be for the entire project or for subcontracting portions of the project, has a major impact on the project’s success. Unfortunately, there is no single best bidding methodology. However, most project managers or organizations have a favorite approach. Examples include a fixed-price strategy (i.e., bidding to a set price) and always subcontracting software development to someone other than its designer to prevent conflict of interest. Among the most popular bidding strategies is the lowest bidder strategy.

The inherent risk in selecting the lowest bidder is the magnitude of differences in productivity between programmers and designers, often a factor of 10:1. The impact of this productivity differential is even greater for the individuals who have a major role in the project (e.g., systems architects and project managers). With such great individual variability in quality and productivity, selecting the low-bid vendor would seem to be an almost certain prescription for failure, and it usually is. Government agencies in particular are forced through regulations to accept the lowest bidder.

As a result, they go to great lengths to disqualify vendors that private businesses would have eliminated on the basis of subjective evaluation. To do so, they redefine their requirements to be strict enough that unqualified vendors are eliminated for submitting nonresponsive bids.

Staffing

There are two facets to the staffing problem: incorrect staffing and staff turnover.

Incorrect Staffing

The most serious aspect of the incorrect staffing problem stems directly from the estimating problem. That is, inadequate total staff is due to a shortsighted estimate. Other incorrect staffing problems are retaining project members who are not results oriented, or who lack the ability to cooperate as team members, or who do not have the skills called for in the work plan (e.g., failure to include staff with end-user knowledge, or with sufficient technical knowledge of the architecture, or with systems development experience). The following example illustrates project difficulties when there is a lack of end-user knowledge.

Example 2

An agency was processing several million dollars’ worth of revenue checks per month. The processing procedure was very complex and required some extensive reconciliation. Many of the checks were for large amounts of money. In the current procedure, the check accompanied the reconciling paperwork until the reconciliation was complete. At that point the check was forwarded for deposit. As long as the reconciliation was done on the same day, this was not a major problem.

However, the reconciliation process had become very involved, and as a result, checks were being delayed or misplaced. The proposed and designed solution was to create a system that would log the movement of the check so that at any given time anyone could tell the exact location of a check.

When it was suggested that a more straightforward solution would be to put the checks directly into the bank and do the reconciliation later, the new system was strongly resisted. A significant problem in this example was that there was no one on the system project team familiar with current practices in revenue and collection systems.

Staff Turnover

This is a two-pronged problem. One prong is ensuring sufficient continuity throughout the life of the project; the other prong is recognizing that there will be major changes in the composition of the team as the project progresses. It is difficult to ensure absolute continuity of staff to a project team because employees are always free to quit, or they may get reassigned. However, it is possible to avoid planned changes in the staff team. The classic waterfall life cycle methodology provides the opportunity, and in some cases the requirement, to reconsider and rebid every project at the end of every systems development phase.

Besides the effect of losing the skills and knowledge of the outgoing members of the team, there is a much more important and subtle factor at work. That is, the new team may feel no commitment to the design or the solution as proposed in the previous phase. Particularly tenuous are those decisions related to project scope and expense. As a result, there is very often a subtle rescoping of the project every time the members of the project team change.

Scheduling

Still in the realm of planning-human, although now at a more tactical level, are those factors related to scheduling (including the sequencing of activities). The sequence and scheduling of project activities will vary by project type. For example, transaction processing systems usually follow the traditional systems development life cycle (SDLC) approach whereas decision support systems may use applications prototyping. Whatever methodology is used, the sequence and scheduling of project activities should follow a logical, additive progression toward defined goals.

Executing-Human Factors

Project execution, from the human (or personal) side, may be stalled by the lack of feedback, user involvement, and motivation. Each factor inhibits smooth execution of the project.

Feedback

The lack of unbiased feedback surfaces when project managers and systems personnel believe that they can move forward and meet impossible deadlines. Part of the problem is that progress in software or systems development is mostly invisible. Programmers may deceive both themselves and their managers about the extent of their progress. For example, a programmer believed to be 90% complete on a module may actually have 50% of the work yet to do.

User Involvement

This factor includes both the importance of user participation in the design and adequate user training to use the system. The time allotted to procedure development and user training is often too short. User management usually regards systems training as an addition to the regular job requirements rather than giving users sufficient time away from their job responsibilities to learn the system. One of the keys to a systems project’s success is to establish ownership among systems users. An effective method of accomplishing user ownership is to let users change requirements during the development process or tailor requirements afterwards. However, this is a two-edged sword if change is not controlled. The importance of user ownership is illustrated in the following example.

Example 3

Because of regulatory changes, a company needed to change both its cost accounting system and allocation methods. Users initiated an internal project that defined some unnecessarily complex algorithms and required a major mainframe development effort. A senior vice-president, realizing that the project would not be completed by the established deadline, authorized a second, external team to create a backup system. Within three weeks, the external team completed the second system using a fully functional prototype at about one-tenth the development costs and one-tenth the operating costs of the internal project.

By this point, the users had such a strong personal involvement with the system they were developing that they rejected this new system despite its advantages. From the external team’s perspective, the project was a failure in spite of the overwhelming cost, schedule, and functional advantages. This failure occurred because the users were not involved in the development of the second system.

Motivation

Motivation is a universal personnel problem. With respect to systems personnel, standard time allotments for a given unit of work are not always valid because the motivation of systems personnel varies greatly; that is, the level of motivation of systems staff members will determine how quickly (or slowly) they complete the work. Two aspects of motivating systems personnel are project-reward structures and the project manager’s success in motivating the team members.

The reward structure can have a significant effect on a project’s outcome. For example, one project team was staffed by employees who performed this systems implementation in addition to their full-time jobs. They were told that when the system converted, they would have the new positions that this new system implied.

As a result, team members worked a lot of overtime to complete the project. Another project effort involved the same arrangement except for the reward structure. Team members had to develop the system in their spare time without the incentive of building future jobs. As a result, the project failed.

Executing Factors

Change management and subsequent workarounds and catch-ups are execution problems stemming directly from the lack of unbiased feedback. When people set impossible deadlines, they act frantically to get the work done rather than admit that they are behind schedule.

Change Management

Once a design specification is finalized, the order goes out to freeze the spec. Although no specifications are ever absolutely frozen, too many changes after finalization may create havoc. One change made to a program module or other design feature often creates a domino effect. Actual changes are often on a larger scale than originally intended.

The extent and impact of subsequent changes to design specifications are products of design quality, project team-user relations, and the project team’s attitude toward change. Analysts who want to please everyone may bring about a project’s demise. Every suggestion brought forward by a user becomes a change. Changes beget other changes until the project dies of accumulated entropy, as in the following example.

Example 4

One project exploded with accumulated changes (see Exhibit 2). Suggested changes (e.g., an inventory aging report) were documented on functional investigation memos (FIMs). A FIM is a one-page narrative describing the change from a user’s perspective. The team analyzed each FIM to determine such affected systems components as data base layout changes or program changes. The change to each affected component was documented on what was called a systems investigation memo (SIM).

Once authorized, SIMs became technical change requests (TCRs), which consisted of module-level detail from which programmers could code. The idea was to accumulate all changes and then open each module only once. This philosophy missed an important point. That is, it is a better strategy to focus on the impact of a single change throughout the entire system than to focus on testing individual modules. In any case, requested changes kept coming in and, as the exhibit suggests, complexity multiplied. After nearly a year of implementing changes, the change backlog was larger than the original change backlog.

After a change in management, the team eliminated all of the unnecessary changes, backed out of some of the coding changes that had already been made, and established a completed version of the system within five months.

Workarounds

Some project managers will continue to find more ingenious ways to live with a problem rather than solve it. For example, one organization implemented an MRP II system (Manufacturing Resource Planning) that was inconsistent with its previous business practices, which were already identified on PC software and procedures. Rather than reconciling the inconsistencies and opting in favor of one of the two systems, employees looked for ways to work around the inconsistencies.

For example, the data element due-date on the informal PC system was the most recent estimate as to when the materials would arrive. In contrast, due-date on the formal MRP II system was based on calculated need dates from the current engineering design. Instead of recognizing the problem that they were trying to pursue two ways to conduct business, the employees attempted to reconcile the two due dates. Eventually, the workarounds became intolerable, and within nine months, the PC system and old procedures were abandoned.

Executing-Technical Factors

Although there are human aspects to vendor, control, and performance factors, the authors view these factors as predominantly technical issues that can compromise project execution.

Vendor

As outsiders, vendors usually get the blame for many project problems, and often, they deserve it. Depending on the particular role they play, vendors may be part of other failure factors discussed in this article. Here, the concentration is on vendor problems in conjunction with the request for proposal (RFP). Ironically, the RFP process may result in the very problems that the process was designed to avoid. That is, the RFP may lead to the implementation of inappropriate products or solutions. Through RFPs, vendors are often requested to respond to user requirements through a checklist evaluation. The evaluation will be weighted by the customer to determine the vendor with the highest score.

This method, however, has several flaws. First, vendors are trusted to make the evaluation for themselves and considerable license is usually used with their answers. Customers often try to address this by making the vendor’s response an addendum to the eventual contract. Experience suggests that this approach is not very effective.

Second, implicit in the checklist evaluation approach is the assumption that a series of very minute and discrete requirements will actually result in a comprehensive high-quality system. It is often surprising how vendors will continue to add features into their system with very little regard to the impact on the integrated whole, as in the following example.

Example 5

A medium-sized construction company performed a very detailed RFP with checklists.

A relatively unknown software vendor came in first on the functional requirements checklist evaluation even though its solution was written for an exotic operating system and programming language. Management ignored a recommendation to give weight to a more established vendor’s credibility and installed base. Problems occurred immediately.

The package promised a multiplicity of features (e.g., the A/R system supported either an open item or balance forward processing variable by customer). However, these features constantly interfered with one another; the total number of options far exceeded any possibility of testing all combinations and permutations. The implementation team knew that the vendor had performed little integration testing. As a result, they had to do the vendor’s debugging as well as a considerable amount of design. The only way the team could make the system workable was to remove large portions of code to reduce testing requirements to a manageable level.
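The testing problem in this example follows from simple arithmetic: if options can be set independently, the number of configurations is the product of the choices for each option. The sketch below is illustrative only. The function name configuration_count, the figure of 30 two-way options, and the rate of one test per second are assumptions, not details reported for this package.

```python
from math import prod


def configuration_count(choices_per_option: list[int]) -> int:
    """Number of distinct configurations when options are set independently."""
    return prod(choices_per_option)


# Hypothetical package: 30 independent two-way switches, in the spirit of
# open-item versus balance-forward processing selectable per customer.
total = configuration_count([2] * 30)

# Even at one automated test per second, running every combination once
# would take decades.
seconds_per_year = 60 * 60 * 24 * 365
years = total / seconds_per_year

print(total, round(years, 1))
```

Seen this way, the team’s decision to remove large portions of code is a way of shrinking the list of options until the product becomes testable.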

Control

Controls are a vital part of information systems and in particular those that handle financial transactions. Control code in these systems may be greater than the entire application proper, and adding controls to a system after the fact may cost as much as the original system itself. One organization experienced these costs with a general ledger accounting system, as shown in this example.

Example 6

The literature promoting a system advertised its ability to post transactions into any period: past, present, or future. “What if,” stated the literature, “you find an invoice in the bottom of the drawer that is more than a year old? It does happen.”

However, the controls for this complexity were not built into the system. Transactions could be posted in previously closed months without a corresponding transaction to change the current month’s balance. Miskeying a date could post a transaction into the distant future, where it would be omitted from all financial reports until that date arrived. The solution was straightforward but not easy. The application needed a new control record to indicate which months and years were open for posting. Unfortunately, nearly 100 transactions in nine applications had to be changed to check for an open accounting period prior to creating a transaction, and all programs had to be reviewed to check for the possibility of creating unbalanced general ledger transactions.
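The two controls described in this example, an open-period check and a balance check, can be sketched as follows. This is a minimal sketch under stated assumptions, not the design of the system in question: the Ledger class, the ClosedPeriodError exception, the account names, and the amounts are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


class ClosedPeriodError(Exception):
    """Raised when a transaction is dated in a period not open for posting."""


@dataclass
class Ledger:
    # Control record: the (year, month) pairs currently open for posting.
    open_periods: set[tuple[int, int]] = field(default_factory=set)
    entries: list[tuple[date, str, int]] = field(default_factory=list)

    def post(self, when: date, lines: dict[str, int]) -> None:
        """Post a balanced set of lines (account -> signed amount in cents)."""
        if (when.year, when.month) not in self.open_periods:
            raise ClosedPeriodError(f"{when:%Y-%m} is not open for posting")
        if sum(lines.values()) != 0:
            raise ValueError("debits and credits do not balance")
        for account, amount in lines.items():
            self.entries.append((when, account, amount))


ledger = Ledger(open_periods={(1991, 1)})
ledger.post(date(1991, 1, 15), {"cash": -50_000, "supplies": 50_000})

# A miskeyed year is rejected instead of vanishing into a future period.
try:
    ledger.post(date(1999, 1, 15), {"cash": -50_000, "supplies": 50_000})
    rejected = False
except ClosedPeriodError:
    rejected = True

print(rejected, len(ledger.entries))
```

Placing both checks in a single posting routine is the easy part; the cost in the example came from retrofitting the equivalent check into every transaction that had been written without one.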

Performance

Almost every meaningful systems development project has performance problems. If not solved during design or conversion, performance problems may never be resolved. Some examples include a minicomputer-based system with an interface file with seven records that took more than 12 hours to process, and an online shop floor control system with a 45-minute response time for some transactions.

Planning-Technical Factors

This last section targets two factors that may lead to systems failure. They are experimenting with new technology and technical architecture (i.e., designing the system independent of technical considerations).

Experimenting with New Technology

Experimenting with new technologies is not a problem unless managers take a mainstream business systems development project and jeopardize it by experimenting with unproven technology. The following example illustrates this problem with a data base system.

Example 7

In the midst of the design of a large online materials management system, a hardware vendor suggested to a client that an advanced data base system it was working on would solve the client’s data base tuning and performance problems.

This data base system relied on transferring part of the data base search logic to the disk read heads, which would allow it to search an entire disk for unstructured information very rapidly without draining the CPU resources. One of the authors pointed out that it would be useful for unstructured queries, but the application being designed was for designated transactions that knew which data base records they required. The vendor persisted and sent senior designers of the product to the client to argue the case.

Fortunately for the client, the vendor’s own statistics provided evidence that this product would not help the performance of the application and indeed could hinder it significantly. It seemed that as more users got on this system and began queuing up unstructured queries, the system degraded exponentially. Although this particular client was spared the expense and distraction of this technical experimentation, another client (in the same city) purchased the system to use it for both transaction processing and unstructured queries. These unstructured queries so degraded the transaction processing that a separate machine had to be set up to provide queries on a nonintegrated, standalone basis.

Technical Architecture

Not too long ago, it was popular to design systems independent of their technical architecture. The intention was to prevent knowledge of technical details from biasing the best functional solution. However, this does not work well, as shown in the following example.

Example 8

Analysts familiar only with minicomputer architectures were trying to develop an application for a mainframe environment. In this case, a minicomputer architecture would not work because in a mainframe online architecture, information is keyed into the terminal where it is buffered and stored until the user completes a full screen of information. Then, the user presses a send key, and the entire screen’s worth of information is sent as one block to the mainframe.

This initiates the application program long enough to process the incoming message, perform any necessary data base updates, and format and send another screen of information to the user. The application program then effectively terminates. In a minicomputer online architecture, however, the application program is constantly active when the user is at the workstation. Every time the user interacts with the screen, the application program responds.

In one of its late design reviews of this project, management noted that the user design implied a minicomputer architecture; that is, interaction with the CPU after entry of every field. This interaction was used not only to create error messages but also to change the layout of the screen. At this point, the analysts refused to change their design and convinced users to purchase a standalone minicomputer. This meant a needless expense to the users and gave them a nonintegrated, standalone system.
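The difference between the two online architectures in this example can be expressed as a count of how often the application program must respond. The sketch below is a deliberately simplified model: the function name host_interactions and the workload of 1,000 screens with 20 fields each are hypothetical and are not figures from the project.

```python
def host_interactions(screens: int, fields_per_screen: int, per_field: bool) -> int:
    """Count how often the application program is invoked.

    per_field=False models the block-mode style: one message per completed screen.
    per_field=True models the interactive style: one exchange per field entered.
    """
    return screens * (fields_per_screen if per_field else 1)


# Hypothetical workload: 1,000 screens a day with 20 fields each.
block_mode = host_interactions(1000, 20, per_field=False)
field_mode = host_interactions(1000, 20, per_field=True)

print(block_mode, field_mode)
```

A design that validates input or changes the screen layout after every field therefore assumes the per-field count, which is why it could not simply be moved onto an architecture built around one message per screen.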

Conclusion

Structural and civil engineering made some of their great strides when they systematically studied failed structures and then incorporated lessons from those investigations into their methods, standards, and approaches. The information systems industry is still in the early stages of a similar evolution.

It is as if the industry is just beginning to investigate collapsing bridges and broadly categorize the failures (e.g., structural problems, weakness in materials, and unanticipated environmental forces such as flooding). Systems failures are commonplace. Heuristic advice about how to prevent systems failure once a project is underway is less common. The 15 project-risk factors, identified on the systems failure risk framework, and the case examples illustrating how each of these factors can contribute to project failure, are designed to help Information Systems managers understand and control their own systems development projects.

Using this framework as a guide, Information Systems managers can broaden their perspective on the sources of potential problems, and in so doing, prevent some of the unnecessary project failures they currently face.

 


David McComb is president of First Principles Inc, a consulting firm that specializes in the application of object-oriented technology to business systems. Previous to founding the company, McComb was a senior manager for Andersen Consulting, where for 12 years he managed major systems development projects for clients on six continents. He received a BS degree and an MBA from Portland State University. Jill Y. Smith is an assistant professor of MIS in the College of Business Administration at the University of Denver. She obtained a PhD in business computer information systems from the University of North Texas. She is also a research associate in UNT’s Information Systems Research Center.

Notes

1. J. Rothfeder, “It’s late, costly, incompetent—but try firing a computer system,” Business Week (November 7, 1988), pp 164—165.

2. D. Kull, “Anatomy of a 4GL Disaster,” Computer Decisions (January 11, 1986), pp 58—65.

3. Rothfeder.

4. An Evaluation of the Grand Design Approach to Developing Computer Based Application Systems, Information Resources Management Service, US General Services Administration (September 1988).

Written by Dave McComb

Deplorable Software

Why can’t we deploy software as well as we did fifty years ago?

The way we build and deploy software is deplorable. The success rate of large software projects is well under 50%. Even when successful, the capital cost is hideous. In “No Silver Bullet,” the essay reprinted in his famous “The Mythical Man-Month,” Frederick Brooks observed that complexity comes in two flavors: essential (the complexity that comes from the nature of the problem) and accidental (the complexity that we add in the act of attempting to solve the problem). He seemed to be suggesting that the essential portion was unavoidable (true) and the larger of the two. That may have been true in the 60’s but I would suggest that most of what we deal with now is complexity in the solution.

Let’s take a mundane payroll system as a case in point. The basic functionality of a payroll system has barely changed in 50 years. There have always been different categories of employees (exempt and non), different types of time (regular, overtime, hazardous, etc.), different types of deductions (before and after tax), different types and jurisdictions of taxes (federal, state, local), various employer contributions (pension, 401K, etc.), and all kinds of weird formulas for calculating vacation accrual. There have always been dozens of required government reports, and the need to print checks or make electronic deposits. But 50 years ago, with tools that today look as crude as a flint axe, we were able in a few dozen man months to build payroll systems that could pay thousands of employees. Nowadays our main options are either to pay a service (hundreds of thousands per year in transaction costs for thousands of employees) or implement a package (typically many millions of dollars and dozens to hundreds of person years to get it implemented).

I’m not a Luddite. I’m not pining for the punched cards. But I really do want an answer to the question: what went wrong? Why have we made so little progress in software over the last 50 years?

Written by Dave McComb

Interested in a Solution? Read Dave McComb’s “Software Wasteland”

A Minimalist Upper Ontology

Gist is designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and the least amount of ambiguity, an approach known as a minimalist upper ontology.

A title guaranteed to scare off just about everyone; if you’re not familiar with work on upper ontologies, the title is just opaque. If you are familiar, you’ll likely think that the combination of “minimalist” with “upper ontology” is an oxymoron. So, now that I’ve gotten rid of all my audience, I can probably say just about anything. And will. Let’s review our position here. For two systems to communicate they must commit to a common ontology. It doesn’t matter how elegant or clever your ontology is, if no one else shares it, you don’t participate in anything broader than your own ontology. Given that, there are three main positions:

  • Wait until you want to integrate, and then build a bridge ontology. This works, but it is numerically exhausting if you have a lot of other ontologies to link to, since each one needs its own bridge.
  • Integrate on a topic by topic basis. Use a set of special purpose ontologies to link up. This is a reasonable strategy and works for a lot of things (geography for instance).
  • Commit to an upper ontology early. If you commit to a very broad upper ontology, you are conceptually linked to anyone else who does.
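To make the scaling concern behind the first option concrete, here is a minimal sketch. It assumes, purely for illustration, that a community of n ontologies either builds one bridge per pair or has each ontology commit once to a shared upper ontology; the function names are hypothetical.

```python
def pairwise_bridges(n: int) -> int:
    """Bridges needed if every pair of n ontologies is linked directly."""
    return n * (n - 1) // 2


def upper_ontology_commitments(n: int) -> int:
    """Commitments needed if each of n ontologies maps once to a shared upper ontology."""
    return n


for n in (3, 10, 50):
    print(n, pairwise_bridges(n), upper_ontology_commitments(n))
```

Under these assumptions the bridge count grows quadratically (45 bridges for 10 ontologies, 1,225 for 50), while commitments to a shared upper ontology grow linearly.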

For some that third strategy is very appealing. And there are some options to choose from here, most notably Cyc and SUMO. But, there is also a dark side. Any time you commit to an ontology, you agree to be bound by all the assertions made in that ontology. If nothing else, you need to review them, understand them, and determine whether committing to them will cause problems.

As a result, the most popular shared ontologies to date have been narrow scope upper ontologies, such as Dublin Core for documents, FoaF for contact lists and interests, and RSS for news feeds. What they all share is a small set of concepts, and relatively few constraints. I have postulated, built and will present what I call a “minimalist upper ontology” called Gist. Gist is very broad in scope, comparable to the large upper ontologies (in this case I am trying to cover commercial information systems, so most of the corporate and government systems, but not games, compilers, embedded or scientific systems). But I have tried to mimic the size of the more popular ontologies: there are about 50 concepts in this ontology.

I believe there are immediate benefits for projects adopting it just to remove ambiguity from their definitions. But longer term I think it sets a basis for much broader scale cooperation. I think I’m on to something here. And it will only be of value if it is shared. The Gist ontology is available for free download at https://www.semanticarts.com/gist/

 

Necessary and Sufficient

What’s the real difference between necessary and sufficient?

We just completed another training class, and like they say, “no one learns more than the instructor.” In this case the blindingly obvious and yet elusive pattern that revealed itself was the separation of the sufficient from the necessary. Until last week, while we had an intellectual understanding of the distinction between “necessary” and “necessary and sufficient” (and a very tenuous grip on sufficient but not necessarily necessary), we weren’t using the distinction consistently in our designs.

In the course of discussion, prompted by some questions in the class and elaborated in the bar (thank god for cocktail napkins), we were able to tease out the patterns of “sufficient” (technically a superclass of a restriction) from necessary (a subclass of a restriction), and to line them up with some design patterns. “Sufficient” is essentially the “rule in” pattern. For instance, having a child who is a human is sufficient to make you a human. But of course it is not necessary. Having a biologicalMother who is an animal is necessary to be a Person, but not sufficient. I’m going back to gist and factoring out the necessary from the sufficient.
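The two patterns above can be sketched in plain code. This is only an illustration, not how gist or an OWL reasoner implements them: the Individual class and both function names are hypothetical, and the example works on a small closed set of facts rather than under OWL's open-world semantics.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Individual:
    name: str
    types: Set[str] = field(default_factory=set)
    children: List["Individual"] = field(default_factory=list)
    biological_mother: Optional["Individual"] = None


def apply_sufficient_human(x: Individual) -> None:
    """Sufficient ("rule in"): having a child who is a Human makes x a Human."""
    if any("Human" in child.types for child in x.children):
        x.types.add("Human")


def apply_necessary_person(x: Individual) -> None:
    """Necessary: being a Person implies the biologicalMother is an Animal.

    The implication only runs one way: an Animal mother never rules x in as a Person.
    """
    if "Person" in x.types and x.biological_mother is not None:
        x.biological_mother.types.add("Animal")


# Sufficient: the parent is ruled in as a Human because of the child.
child = Individual("child", types={"Human"})
parent = Individual("parent", children=[child])
apply_sufficient_human(parent)

# Necessary but not sufficient: an Animal mother does not make the kitten a Person.
cat = Individual("cat", types={"Animal"})
kitten = Individual("kitten", biological_mother=cat)
apply_necessary_person(kitten)

# Necessary: a known Person lets us conclude something about the mother.
mother = Individual("mother")
person = Individual("person", types={"Person"}, biological_mother=mother)
apply_necessary_person(person)
```

After running this, the parent carries the type Human, the kitten is still not a Person, and the mother of the known Person has been typed as an Animal.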

Written by Dave McComb

It Isn’t Architecture Until It’s Built

It’s our responsibility as architects to make sure our work is implemented. We’ve been dealing a lot lately with questions about what makes a good architecture, what should be in an architecture, what’s the difference between a technical architecture and an information architecture, etc. But somewhere along the line we failed to emphasize perhaps one of the most important points in this business.

“It isn’t architecture until it’s built.” While that seems quite obvious when it’s stated, it’s the kind of observation that we almost need to have tattooed or at least put on our wall where we won’t forget it. It’s very easy to invest a great deal of time in elegant designs and even plans. But until and unless the architecture gets implemented it isn’t anything at all; it’s just a picture. What does get implemented has an architecture. It may not be very good architecture. It may not lend itself to easy modification or upgrade or any of the many virtues that we desire in our “architected” solutions. But it is architecture.

So the implication for architects is that we need to dedicate whatever percent of our time is necessary to ensure that the work gets implemented. It’s really very binary. You belong to an architectural group. Maybe you are the only architect, or maybe there are five people in your group. In either case, if your work product results in a change to the information architecture the benefits can be substantial. Almost any architectural group could be justified by a 10 or 20% improvement. Frankly, in most shops a 50 to 90% improvement in many areas is possible. So on the one side, if a new architecture gets adopted at all it’s very likely to have a high payback. But the flip side is that a new architecture, no matter how elegant, is not worth anything if it’s not implemented and the company would be acting rationally if it terminated all the architects.

The implication is that as architects we need to determine the optimal blend between designing the “best architecture” and investing our time in the various messy and political activities that ensure that an architecture will get implemented. These range from working through governance procedures to making sure that management is clear about a vision to continually returning to the cost-benefit advantages, etc. The specifics are many, and varied. In many organizations you may be lucky enough that you may not have to invest a great deal of your time to get a new architecture and implement it. Perhaps you’re fortunate enough to have insightful leadership or a culture that is eager to embrace a new architecture. If that’s the case, you might get away with spending 10 or 20% of your time ensuring that your architecture is getting implemented and spend the vast majority on developing, designing and enhancing the architecture itself.

However, if you’re like many organizations, life for the architect will not be that easy. You might find it profitable to spend nearly half your time in activities that are meant to promote the adoption of the architecture. Certainly you should never pass up an opportunity to make a presentation or help goad a developer along. Indeed, the importance is so great that given an opportunity to present you would do well to invest disproportionately in the quality of the presentation, as a perfunctory presentation about the status or a particular technical standard is not likely to move developers or management to adopt it and you may need to return to the theme over and over again.

As someone once pointed out to me, in matters such as this the optimal amount of communication is to over communicate. The rule is when you’re absolutely certain that you’ve communicated an idea so many times, so thoroughly, and so exhaustively that it just is not possible that anyone could tolerate hearing it any more, that’s probably just about right. Experience says that when we think we’ve communicated something thoroughly and repeatedly, three-fourths of the audience has still not internalized the message. People are busy and messages bounce off them and need to be repeated over and over and over. And you’ll find that each part of your audience will accept and internalize a message in a different way from a different presentation and at different rates. I’m continually amazed when I get some feedback with a particular stakeholder at some meeting. The coins finally drop and they become advocates for some position we’d taken. In some cases I’ve gone back and realized that we’ve presented it many times to that person and somehow finally one time it took. In other cases we realized that, in fact, we hadn’t already presented it to that individual. We thought we’d covered everyone but they weren’t in certain meetings or we repeat something so many times we think everybody must have heard it and, in fact, that’s not the case at all.

In closing, I’d like to recommend that every architect make a little plaque and put it near their desk that says: “It isn’t architecture until it’s built.” That might help you decide what you are going to do tomorrow morning when you come into work.

Written by Dave McComb

Event-Driven Architecture

If you’ve been reading the trade press lately, you no doubt have come across the term event-driven architecture as the latest buzzword in the enterprise architecture space.

So you dig about to find out just what is this event-driven architecture. And if you dig around a bit, you’ll find that event-driven architecture (EDA) is an application architecture style that is defined primarily by the interchange of real-time or near-real-time messages or events.

Astute readers of this web site, our white papers, attendees at our seminars, and of course our clients, recognize that this is exactly what we have been espousing for years as to what a good service oriented architecture looks like. You may recall our Enterprise Message Modeling architecture that prominently featured publishing. Event analysis was to define key messages being sent from application to application. You may recall our many exhortations to use “publish and subscribe” approaches for message dispatch whenever possible. You may recall us relying on events to populate managed replicated stores for just this purpose.

So, you might ask, why does the industry need a new acronym to do what it should have been doing all along?

First, a bit of history. In the 1960s MRP (Material Requirements Planning) was born. To the best of my knowledge, the first commercial implementation was at the GE Sylvania television plant. The system started from the relatively simple idea that a complex Bill of Material could be exploded and time phased to create a set of requisitions for either inventory parts or purchased parts. But these early systems went considerably beyond that and “closed the loop,” checking inventory, lead times, etc. After the successes of these early systems, a number of packaged software vendors began offering MRP software. However, to meet the common denominator and make the product as simple as possible, these products very often did not “close the loop”; they did not factor in changes in demand to already existing schedules, etc. Then a mini-industry, APICS, the American Production and Inventory Control Society, sprang up to help practitioners deal with these systems. What they soon proposed was that these MRP systems needed to be “closed loop.” Sure enough, a few vendors did produce “closed loop” systems. This created a marketing problem. The response was MRPII and a change in the acronym; it now stood for Manufacturing Resource Planning.

“MRPII is what everyone needs.” And most of the education and marketing was about the shortcomings of the earlier MRP systems. Of course, the earlier MRP systems were, for the most part, just bad implementations, not something that was more primitive, in the way that we look at Paleolithic art.

And so it is with SOA. Apparently what has happened is that the Web services movement has become associated with service oriented architecture. However, most practitioners of Web services are comfortable using Web services as a simple replacement for the remote procedure call (RPC). As a result, many organizations are finding their good intentions of SOA being sucked down into a distributed request/reply environment, which is not satisfying the issues the architecture was meant to address. Nor is it delivering on the promises of the architecture: loose coupling, and the commoditization of shared services.
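The difference between request/reply and publish and subscribe can be illustrated with a minimal in-process sketch. Everything here is an assumption made for illustration: the EventBus class, the "OrderPlaced" event, and the billing and shipping handlers are hypothetical, and delivery is synchronous, whereas a real event-driven architecture would typically put a message broker between publisher and subscribers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Toy publish/subscribe dispatcher: publishers never name their consumers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver the event to every current subscriber; return how many received it."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


log = []
bus = EventBus()

# Two independent applications react to the same event.
bus.subscribe("OrderPlaced", lambda event: log.append(("billing", event["order_id"])))
bus.subscribe("OrderPlaced", lambda event: log.append(("shipping", event["order_id"])))

# The order application only announces what happened.
delivered = bus.publish("OrderPlaced", {"order_id": 42})
unheard = bus.publish("OrderCancelled", {"order_id": 42})
```

With an RPC-style call the order application would have to know about, and wait on, both billing and shipping. Here a third subscriber could be added without touching the publisher, which is the loose coupling the architecture was meant to deliver.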

Perhaps it’s inevitable we’ll have to deal with new acronyms like EDA. But if you’ve been tuned in here for a while, think of EDA as SOA done right.

Written by Dave McComb

 

Strategy and Your Stronger Hand

The December 2005 issue of the Harvard Business Review has excellent articles by two of my favorite business authors, Geoffrey Moore (“Strategy and Your Stronger Hand”) and Clayton Christensen (“Marketing Malpractice: The Cause and the Cure,” which is applicable as we start looking at commercializing Semantic Technology).

Moore’s article has many fresh insights; chief among them is that companies have a dominant business model. The model does not depend on the industry they are in, nor their age or size. He likens this to our dominant “handed-ness” and as the editor pointed out on the editorial page, “It’s easier to convert a shortstop into an outfielder than it is to change a southpaw into a righty.”

Some firms’ dominant model is “volume operations” and for others it is “complex systems.” The first relies on many customers, brands, advertising, channels and compelling offers. The latter relies on targeted customers and the integration of third party products into total solutions. For each the grass often looks greener in the other model, but almost no business succeeds when they attempt to change models.

The rhythm of most high tech sectors is that the complex sale companies forge new territories and solve unique customer problems. The volume companies come in later and try to commoditize the solution. To survive, the complex sale companies need to do two things simultaneously: defend, for as long as possible, the position they have already won, and move up the solution chain and incorporate the newly commoditized components into an even more interesting solution.

The one thing they need to avoid is trying to convert their own early wins into volume opportunities. What does this have to do with semantics? We are just beginning the commercial roll out of this technology. We will have all the fits and starts of any new high tech sector. We have an opportunity to be a bit more self aware.

Those of us in the complex sale sector need to be aware that volume operations from adjacent marketplaces will soon enter ours. We need to be continually vigilant about incorporating rather than competing, and moving on up the solution chain. Consumers of this technology have the opposite challenge: how to recognize which aspects of their problems require “complex” solutions and which aspects are ripe to be solved with “volume” solutions.

 

The Zachman Framework

Shortly after you start your inquiry about software architecture, or enterprise architecture as it is often called, you will come across the Zachman Framework.

The Zachman Framework is a product of John Zachman who has been championing this cause for at least 15 years, first with IBM and then on his own. As with so many things in this domain, we have an opinion on the Zachman Framework and are more than willing to share it.

What Is the Zachman Framework?

First though, let’s describe just what the Zachman Framework is. John Zachman believes, as we do, that software is a large-scale, human-created artifact and that, as such, we may learn a considerable amount from the analogy between software and other large-scale artifacts of human creation. In the early days of software development, there were many who felt that perhaps software was a creative activity more akin to writing or artistry or perhaps craftsmanship on a small scale.

However, this idea has mostly faded as the scale and interconnectedness of the pieces has continued to increase. Some of John’s early writings compared software or enterprise architecture to the architectures needed to support airframe manufacturing, aircraft, rockets and the like. His observation was that in order to deal with the complexity of the problem in these domains, people have historically divided the problem into manageable pieces. The genius of John’s approach and his framework has been in the orthogonality of the dimensions along which he divided the framework.

The Zachman Framework is displayed as a matrix. However, John is careful to point out that it is not a “matrix” but a “schema” and should not be extended or modified, as in his belief it is complete in its depth and breadth.

The orthogonal dimensions referred to above are shown in rows and columns here. In the rows are the points of view of the various stakeholders of the project or the enterprise. So for instance, at the highest level is the architecture from the point of view of the owner of the project or the enterprise and as we transform the architecture through the succeeding rows we gradually get to a more and more refined scope, as would be typical of the people who need to implement the architecture or eventually the products of the architecture. In a similar way, the columns are orthogonal dimensions and in this case, John refers to them as the interrogatories. So, each column is an answer to one of Rudyard Kipling’s “six honest serving-men”: who, what, when, where, how, and why. Each column is a different take on the architecture, so for instance, the “what” column deals with information, the “things” about which the system is dealing.

In the aircraft analogy, it would be the materials and the parts. Likewise, the “how” column refers primarily to functions or processes; the “where” to the distribution of networking; the “when” to scheduling cycle times and workflow; the “who” to the people and organizations involved in each of the processes; and the “why” to strategy and eventually down to business rules.

Behind this framework then are models which allow you to describe an architecture or any artifact; a specific design of a part of a product or database table, or whatever, within the domain of that cell. Many people have the misperception that at the high-level there is less detail and that you add detail as you come down the rows. As John is very fond of pointing out, you need “excruciating” detail at every level and what is occurring at the transition from row to row is the addition of implementation constraints that make the thing buildable.

John has been a tireless champion of this cause, and from that standpoint we have him to thank for pointing out that this is an issue, and furthermore for championing it and keeping it in the forefront of discussion for a long, long period of time. He’s been instrumental in making sure that senior management understands the importance and the central role of enterprise architecture.

What the Zachman Framework Is Not

At this point though, we need to point out that the Zachman Framework is not an architecture. And the construction of models behind the framework is not, in and of itself, an architecture. It may be a way to describe an architecture, it may be a very handy way for gathering and organizing the input you need into an architectural definition project, but it is not an architecture nor is it a methodology for creating one. We believe the framework is an excellent vehicle for explaining, communicating, and understanding either a current architecture or a proposed architecture.

However, it is our belief that a software architecture, much like a building architecture or an urban plan, is a created and designed artifact that can only be described and modeled after it has been created and that the act of modeling it is not the act of creating it. So in closing, to reconcile our approach with the Zachman Framework we would say that firstly, we have a methodological approach to creating enterprise software architecture. Secondly, we have considerable experience in actually performing this and creating architectures that people have used to develop and implement systems. Thirdly, these architectures that we have designed/developed can be modeled, described, and communicated using the Zachman Framework. But that does not mean that they were, or in our opinion, even could be created through a methodological modeling process as suggested by the Zachman Framework.

Architecture and Planning

“Action without planning is folly but planning without action is futile.”

In this write-up, we explore the intimate connection between architecture and planning. At first blush, they seem to be completely separate disciplines. On closer examination, they appear to be two sides of the same coin. But in the final examination, we find that they are intimately intertwined but still separate and potentially independent. The motivation for this paper was an observation that much of our work deals with system planning of some variety. And yet, there is virtually nothing on our web site on this topic.

On one level that may be excusable. There is nothing drastically new about our brand of planning that distinguishes it from planning as it has been practiced for decades. On the other hand, system architectures typically are new and evolving and there are new observations to be made. But there’s more to it than that. We have so baked planning into our architectural work that we no longer notice that it’s there. This paper is the beginning of an attempt to extricate the planning and describe it as a sub discipline of its own. Are architecture and planning the same thing? Can we have one without the other? This is where we begin our discussion.

Certainly, we can have planning without architecture. Any trivial planning is done without architecture. We can plan a trip to the store or a vacation without dealing with architecture. We can even do a great deal of business planning, even system planning, as long as the implicit assumption is that the new projects will continue using any existing architecture. So certainly, we can have planning without architecture. But can we have architecture without planning? Well, certainly it’s possible to do some architectural work without planning.

There are two major ways this can come to be. One is that we can allow developers to develop whatever architecture they want without subjecting it to a planning process. The end product of this is the ad hoc or accidental changes that so characterize the as built architectures we find. The other way, which is as common, is to allow an architectural group to define an architecture without requiring that they determine how we get from where we are to where we want to be. Someone once said, “Action without planning is folly but planning without action is futile.” The architect who does architectural work without doing any planning is really just participating in an exercise in futility.

An intentional architecture requires a desired “to be” state, where some aspect of software development, maintenance or operation is better than it currently is. There are many potential aspects to the better state in the “to be” architecture: it could be less risky, it could be more productive, it could scale better, it could be more flexible, it could be easier for end-users to use, it could be more consistent, etc.

What they all share is that it is not the same as what exists now and in order to migrate from the “as is” to the “to be” requires planning. In the nineties, we seemed able to get away with a much more simplistic view of planning. “Rip and replace” was the order of the day once you determined what the target architecture looked like. Most organizations now have far too much invested in their legacy systems to contemplate a “rip and replace” strategy to improve either their architectures or their applications. As a result, the onus is on the architects to determine incremental strategies for shifting the existing architecture to the desired one. The company must continue to run through the potentially long transition period.

The constraints of the many interim stages of the evolving architecture and applications create many challenges for the planner. In some ways, it’s much like the widening of a heavily trafficked highway: it would be quite simple to widen it if we could merely get all the traffic off of it, but given that we can’t, there is often an extremely elaborate series of detours, each of which has to be planned, implemented and executed. In conclusion, I think we can see that architecture desperately needs planning. Indeed, the two are inseparable. While planning can certainly live on in the absence of architecture, architecture will not make any meaningful progress in any established company without an extreme commitment to planning.

By Dave McComb

Response Time Quanta

How do we perceive software response time? (I’m indebted to another author for this insight, but unfortunately I cannot credit him or her because I’ve lost the reference and can’t find it either in the pile of papers I call an office or on the Internet. So, if anyone is aware whose insight this was, please let me know so I can acknowledge them.)

Basic Thesis

  • In most situations, a faster response from a system (whether it is a computer system or a human system) is more desirable than a slower one.
  • People develop strategies for dealing with their experience of and expectation of response times from systems.
  • Attempts to improve response time will not even be perceived (and therefore will be effort wasted) unless the improvement crosses a threshold to where the user changes his or her strategy.

These three observations combine to create a situation where the reaction to response time improvement is not linear: a 30% improvement in response time may produce no effect, while a 40% improvement may have a dramatic effect. It is this “quantum-like” effect that gave rise to the title.

First Cut Empirical Model – No Overlaps

Our first cut of the model lumps each response into a non-overlapping range. As we’ll observe later, it is not likely to be that simple; however, it is surprising how far you can get with this.

| Quanta Name | Response time | Example | User perception | User response/strategy |
|---|---|---|---|---|
| Simultaneous | Less than 1/10th of a sec | Mouse cursor delay on a fast system, selection highlight, turning on an incandescent light bulb | Users believe that the two things are the same thing. That there is no indirection. Moving the mouse is moving the cursor, that the click directly selects the item and that the switch turns on the light | Transparency. Users are not aware there is an intermediary between their action and the result |
| Instant | 1/10th – ½ second | Scrolling, dropping physical object | Barely perceptible difference between the stimulus and the response, but just enough to realize the stimulus causes the effect. | Users are aware but in control. Their every action is swiftly answered with a predictable response. No strategy required. |
| Snappy/Quick | ½ – 2 seconds | Opening a new window, pulling a drop down list, turning on a fluorescent light | Must pay attention, "did I click that button?" (Have you ever spun the knob on a bedside lamp in a hotel, thinking it wasn't working, when you were just too fast for the fluorescent?) | Brief pause, to prevent initiating the response twice. Requires conscious attention to what you are doing, which distracts from the direct experience. |
| Pause | 2 – 10 seconds | A good web site, on a good connection. The time for someone to orally respond to a question | I have a few seconds to focus my attention elsewhere. I can plan what I'm going to do next, start another task etc. Frustration if it's not obvious the activity is in progress (hourglass needed). | Think of or do something else. Many people now click on a web link, and then task switch to another program, look at their watch or something else. This was the time when data entry people would turn the page to get to the next document. |
| Mini Task | 10 – 90 seconds | Launching a program, shutting down, asking for someone to pass something at the dinner table | This task is going into the background until it is complete. Time to start another task (but not multiple other tasks). Time for a progress bar. | You're obligated to do something else to avoid boredom. Pick up the phone, check your todo list, engage in conversation, etc. |
| Task | 90 seconds – 10 minutes | A long compile, turning on your computer, rewinding a video tape | Not only do I start another task of comparable length, I also expect to have some notification that the first task is complete (a dialog box, the click the video makes). | This is where the user starts another task, very often changing context (leaving the office, getting on the phone, etc.), however, the second task may be interruptible when the first task finishes. |
| Job | 10 – 60 minutes | Very long compile, do a load of laundry | Job is long enough that it is not worth hanging around until it is complete. | Plan ahead for this, do not casually start a process that will take this long until you have other filler tasks planned (lunch, a meeting, something to read, etc.). Come back when you're pretty sure it will be done |
| Batch process | 1 – 12 hours | Old-fashioned MRP or large report run, airplane flight. | Deal with the schedule more than monitoring the actual event in progress. | Schedule these. |
| Wait | ½ – 3 days | Response to email, Reference check call back, Dry cleaning | I potentially have too many of these at once. I'll lose track of them if I don't write them down. | Todo lists |
| Project | 3 days – 4 months | Software Project, Marketing campaign, Gardening | This is too long to wait to find out what is happening. | Active statusing at periodic intervals |

My contention is that once a user recognizes a situation and categorizes it into one of these quanta, they will adopt the appropriate strategy. For many of the strategies, they won’t notice that the response time has improved unless it improves enough to cause them to change strategies. Getting a C++ compile time down from 4 minutes to 2 minutes likely won’t change anyone’s work habits, but going to a Pause or Snappy turnaround, as in a Java IDE, will. In many cases the strategy obviates any awareness of the improvement. If I drop my car at the car wash before lunch and pick it up afterward, I’ll have no idea whether they improved the throughput such that what used to take 40 minutes now takes only 15. However, a drive-through that takes only 10 minutes might cause me to change how I do car washes.
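The table and the contention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `QUANTA`, `classify`, and `changes_strategy` are hypothetical, each range is treated as half-open (a value exactly on a boundary falls into the longer quantum), and "4 months" is approximated as 120 days. None of these choices comes from the article itself.

```python
from bisect import bisect_right

# Upper bound of each quantum in seconds, copied from the table.
# Assumption: ranges are half-open, [lower, upper).
QUANTA = [
    ("Simultaneous", 0.1),
    ("Instant", 0.5),
    ("Snappy/Quick", 2),
    ("Pause", 10),
    ("Mini Task", 90),
    ("Task", 10 * 60),
    ("Job", 60 * 60),
    ("Batch process", 12 * 3600),
    ("Wait", 3 * 86400),
    ("Project", 120 * 86400),  # assumption: "4 months" taken as 120 days
]
_UPPER_BOUNDS = [upper for _, upper in QUANTA]


def classify(seconds: float) -> str:
    """Return the name of the quantum a response time falls into."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    index = bisect_right(_UPPER_BOUNDS, seconds)
    # Assumption: anything beyond the last bound is still a Project.
    return QUANTA[min(index, len(QUANTA) - 1)][0]


def changes_strategy(before: float, after: float) -> bool:
    """True if an improvement moves the response time into another quantum."""
    return classify(before) != classify(after)


if __name__ == "__main__":
    # The compile example: 4 minutes down to 2 minutes stays within one quantum.
    print(classify(240), classify(120), changes_strategy(240, 120))
    # The car wash example: 40 minutes down to 15 minutes also stays put.
    print(classify(40 * 60), classify(15 * 60), changes_strategy(40 * 60, 15 * 60))
```

Under these assumptions, the compile and car wash improvements both stay inside a single quantum, which is the sense in which the user would not notice them.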

Overlapping Edges

While I think the quantum effect is quite valid, I don’t believe that the categories are quite as precise as I suggested, and I think the boundaries may shift as someone moves up and down the hierarchy. For instance, a 2.5-second response time may in some contexts be considered Snappy.

Implications

I think this has implications for systems design as well as business design. The customer-facing part of a business presents a response time to the customer. The first implication is that any project (software, hardware or network improvement, or business process reengineering) should have a response time goal, with a stated reason for it, that is just as valid as any other requirement of the project. Where an improvement is desired, it should be required to cross at least one quantum threshold, and the benefit ascribed to doing so should be documented.

IBM made hay in the 1970s with studies showing that dramatic productivity gains from sub-second response time on their systems more than made up for the increased cost of hardware. What was interesting was that the mathematical savings from the time shaved off each transaction weren’t enough to justify the change; the gains came because users worked with their systems differently (i.e., they were more engaged) when the response time went down.

Some implications for:

Call center response time: if you expect it will be a “job” (more than 10 minutes), you will plan your call much more carefully.

Online ordering: when products arrive first thing the next morning and people expect that, they deal with ordering differently and set up reminders that something will arrive.

Installation programs: unless it is a “mini task” and can be done in-line (like getting a plug-in), you need to make sure that all the questions can be answered up front and the install can then run in the background. Many writers of installation programs wrongly believe that asking the user questions throughout the installation process will make them think the installation is snappy. Hello, nobody thinks that. They expect it to be a “task” and would like to turn their attention elsewhere. However, if they do something else and come back to find the install stopped because it was waiting for more information from the user, they get pissed (it was supposed to be done when they got back to it).
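The point about installation programs can be sketched as a two-phase structure: collect every answer first, then run every step without prompting again. The function `run_install` and its parameters are hypothetical names chosen for illustration, and this is a minimal sketch of the idea, not a description of how any real installer works.

```python
from typing import Callable, Dict, List


def run_install(
    questions: List[str],
    steps: List[Callable[[Dict[str, str]], str]],
    ask: Callable[[str], str] = input,
) -> List[str]:
    """Ask every question up front, then run all steps unattended."""
    # Phase 1 (interactive): the user answers everything before any work starts.
    answers = {question: ask(question) for question in questions}
    # Phase 2 (unattended): no step is allowed to prompt, so the user can
    # turn their attention elsewhere and expect the install to finish.
    return [step(answers) for step in steps]


if __name__ == "__main__":
    log = run_install(
        ["Install directory? ", "Create a shortcut (y/n)? "],
        [
            lambda a: "copying files to " + a["Install directory? "],
            lambda a: "shortcut: " + a["Create a shortcut (y/n)? "],
        ],
    )
    print("\n".join(log))
```

Passing `ask` as a parameter is a design choice made here so that the interactive phase can be replaced by canned answers in a test or a scripted install.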

Written by Dave McComb