Architecture and Planning

“Action without planning is folly but planning without action is futile.”

In this write-up, we explore the intimate connection between architecture and planning. At first blush, they seem to be completely separate disciplines. On closer examination, they appear to be two sides of the same coin. But on final examination, we find that they are intimately intertwined yet still separate and potentially independent. The motivation for this paper was an observation that much of our work deals with system planning of some variety. And yet, there is virtually nothing on our web site on this topic.

On one level that may be excusable. There is nothing drastically new about our brand of planning that distinguishes it from planning as it has been practiced for decades. On the other hand, system architectures typically are new and evolving, and there are new observations to be made. But there’s more to it than that. We have so baked planning into our architectural work that we no longer notice that it’s there. This paper is the beginning of an attempt to extricate the planning and describe it as a subdiscipline of its own. Are architecture and planning the same thing? Can we have one without the other? This is where we begin our discussion.

Certainly, we can have planning without architecture. Any trivial planning is done without architecture. We can plan a trip to the store or a vacation without dealing with architecture. We can even do a great deal of business planning, even system planning, as long as the implicit assumption is that the new projects will continue using any existing architecture. So certainly, we can have planning without architecture. But can we have architecture without planning? Well, certainly it’s possible to do some architectural work without planning.

There are two major ways this can come to be. One is that we can allow developers to develop whatever architecture they want without subjecting it to a planning process. The end product of this is the ad hoc or accidental changes that so characterize the as-built architectures we find. The other way, which is just as common, is to allow an architectural group to define an architecture without requiring that they determine how we get from where we are to where we want to be. Someone once said, “Action without planning is folly but planning without action is futile.” The architect who does architectural work without doing any planning is really just participating in an exercise in futility.

An intentional architecture requires a desired “to be” state, where some aspect of software development, maintenance or operation is better than it currently is. There are many potential aspects to the better state in the “to be” architecture: it could be less risky, it could be more productive, it could scale better, it could be more flexible, it could be easier for end-users to use, it could be more consistent, etc.

What they all share is that the “to be” state is not the same as what exists now, and migrating from the “as is” to the “to be” requires planning. In the nineties, we seemed able to get away with a much more simplistic view of planning. “Rip and replace” was the order of the day once you determined what the target architecture looked like. Most organizations now have far too much invested in their legacy systems to contemplate a “rip and replace” strategy to improve either their architectures or their applications. As a result, the onus is on the architects to determine incremental strategies for shifting the existing architecture to the desired one. The company must continue to run through the potentially long transition period.

The constraints of the many interim stages of the evolving architecture and applications create many challenges for the planner. In some ways, it’s much like the widening of a heavily trafficked highway: it would be quite simple to widen if we could merely get all the traffic off of it, but given that we can’t, there is often an extremely elaborate series of detours, each of which has to be planned, implemented, and executed. In conclusion, I think we can see that architecture desperately needs planning. Indeed, the two are inseparable. While planning can certainly live on in the absence of architecture, architecture will not make any meaningful progress in any established company without an extreme commitment to planning.

By Dave McComb

Response Time Quanta

How do we perceive software response time? (I’m indebted to another author for this insight, but unfortunately I cannot credit him or her because I’ve lost the reference and can’t find it either in my pile of papers I call an office, nor on the Internet. So, if anyone is aware whose insight this was, please let me know so I can acknowledge them.)

Basic Thesis

  • In most situations, a faster response from a system (whether it is a computer system or a human system) is more desirable than a slower one.
  • People develop strategies for dealing with their experience of and expectation of response times from systems.
  • Attempts to improve response time will not even be perceived (and therefore will be effort wasted) unless the improvement crosses a threshold to where the user changes his or her strategy.

These three observations combine to create a situation where the reaction to response time improvement is not linear: a 30% improvement in response time may produce no effect, while a 40% improvement may have a dramatic effect. It is this “quantum-like” effect that gave rise to the title.

First Cut Empirical Model – No Overlaps

Our first cut of the model lumps each response into a non-overlapping range. As we’ll observe later, it is likely not that simple; however, it is surprising how far you can get with this.

Each quantum below lists its response-time range, an example, the user’s perception, and the user’s response or strategy.

Simultaneous (less than 1/10th of a second)
  • Example: Mouse cursor delay on a fast system, selection highlight, turning on an incandescent light bulb.
  • Perception: Users believe the two things are the same thing, that there is no indirection: moving the mouse is moving the cursor, the click directly selects the item, and the switch turns on the light.
  • Response/strategy: Transparency. Users are not aware there is an intermediary between their action and the result.

Instant (1/10th to 1/2 second)
  • Example: Scrolling, dropping a physical object.
  • Perception: Barely perceptible difference between the stimulus and the response, but just enough to realize the stimulus causes the effect.
  • Response/strategy: Users are aware but in control. Their every action is swiftly answered with a predictable response. No strategy required.

Snappy/Quick (1/2 to 2 seconds)
  • Example: Opening a new window, pulling down a drop-down list, turning on a fluorescent light.
  • Perception: Must pay attention: “did I click that button?” (Have you ever spun the knob on a bedside lamp in a hotel, thinking it wasn’t working, when you were just too fast for the fluorescent?)
  • Response/strategy: Brief pause, to prevent initiating the response twice. Requires conscious attention to what you are doing, which distracts from the direct experience.

Pause (2 to 10 seconds)
  • Example: A good web site on a good connection; the time for someone to orally respond to a question.
  • Perception: I have a few seconds to focus my attention elsewhere. I can plan what I’m going to do next, start another task, etc. Frustration if it’s not obvious the activity is in progress (hourglass needed).
  • Response/strategy: Think of or do something else. Many people now click on a web link and then task-switch to another program, look at their watch, or something else. This was the time when data entry people would turn the page to get to the next document.

Mini task (10 to 90 seconds)
  • Example: Launching a program, shutting down, asking for someone to pass something at the dinner table.
  • Perception: This task is going into the background until it is complete. Time to start another task (but not multiple other tasks). Time for a progress bar.
  • Response/strategy: You’re obligated to do something else to avoid boredom: pick up the phone, check your to-do list, engage in conversation, etc.

Task (90 seconds to 10 minutes)
  • Example: A long compile, turning on your computer, rewinding a video tape.
  • Perception: Not only do I start another task of comparable length, I also expect some notification that the first task is complete (a dialog box, the click the video makes).
  • Response/strategy: This is where the user starts another task, very often changing context (leaving the office, getting on the phone, etc.); however, the second task may be interruptible when the first task finishes.

Job (10 to 60 minutes)
  • Example: A very long compile, doing a load of laundry.
  • Perception: The job is long enough that it is not worth hanging around until it is complete.
  • Response/strategy: Plan ahead for this; do not casually start a process that will take this long until you have other filler tasks planned (lunch, a meeting, something to read, etc.). Come back when you’re pretty sure it will be done.

Batch process (1 to 12 hours)
  • Example: An old-fashioned MRP or large report run, an airplane flight.
  • Perception: Deal with the schedule more than monitoring the actual event in progress.
  • Response/strategy: Schedule these.

Wait (1/2 to 3 days)
  • Example: Response to an email, a reference-check call back, dry cleaning.
  • Perception: I potentially have too many of these at once; I’ll lose track of them if I don’t write them down.
  • Response/strategy: To-do lists.

Project (3 days to 4 months)
  • Example: A software project, a marketing campaign, gardening.
  • Perception: This is too long to wait to find out what is happening.
  • Response/strategy: Active statusing at periodic intervals.

My contention is that once a user recognizes a situation and categorizes it into one of these quanta, they will adopt the appropriate strategy. For many of the strategies, they won’t notice whether the response time has improved until and unless it improves enough to cause them to change strategies. Getting a C++ compile time down from 4 minutes to 2 minutes likely won’t change anyone’s work habits, but going to a Pause or Snappy turnaround, like in a Java IDE, will. In many cases the strategy obviates any awareness of the improvement. If I drop my car at the car wash before lunch and pick it up afterward, I’ll have no idea if they improved the throughput such that what used to take 40 minutes now only takes 15. However, a drive-through that only takes 10 minutes might cause me to change how I do car washes.
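Because the first-cut ranges are non-overlapping, the bucketing itself is mechanical. Here is a minimal sketch; the threshold values are lifted straight from the table above, while the function and variable names are mine.

```python
# Minimal sketch: classify a response time (in seconds) into the
# non-overlapping quanta from the table above. Each entry is the
# upper bound of a quantum, taken directly from the table.
QUANTA = [
    (0.1, "Simultaneous"),
    (0.5, "Instant"),
    (2, "Snappy/Quick"),
    (10, "Pause"),
    (90, "Mini task"),
    (600, "Task"),             # 10 minutes
    (3600, "Job"),             # 60 minutes
    (12 * 3600, "Batch process"),
    (3 * 86400, "Wait"),
    (120 * 86400, "Project"),  # roughly 4 months
]

def quantum(seconds: float) -> str:
    """Return the quantum a given response time falls into."""
    for upper_bound, name in QUANTA:
        if seconds < upper_bound:
            return name
    return "Project"

# A 4-minute compile and a 2-minute compile land in the same quantum,
# so the improvement is unlikely to change anyone's strategy.
assert quantum(240) == quantum(120) == "Task"
assert quantum(1.5) == "Snappy/Quick"
```

The point of the sketch is simply that an improvement only matters when it moves a response across one of these boundaries.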

Overlapping Edges

While I think the quantum effect is quite valid, I don’t believe the categories are quite as precise as I have suggested, and I think they may vary as someone moves up and down the hierarchy. For instance, a 2.5-second response time may in some contexts be considered snappy.

Implications

I think this has implications for systems design as well as business design. The customer-facing part of a business presents a response time to the customer. The first implication is that in any project (software, hardware or network improvement, or business process reengineering) there should be a response time goal, with a rationale behind it, treated as just as valid as any other requirement of the project. Where an improvement is desired, the improvement should cross at least one quantum threshold, and the benefit ascribed to doing so should be documented. IBM made hay in the 70’s with studies showing that the dramatic productivity gains from sub-second response time on their systems more than made up for the increased cost of hardware. What was interesting was that the mathematical savings from the time shaved off each transaction weren’t enough to justify the change; rather, users worked with their systems differently (i.e., they were more engaged) when the response time went down.

Some implications:

  • Call center response time: if you expect the call will be a “job” (more than 10 minutes), you will plan your call much more carefully.
  • Online ordering: when products arrive first thing the next morning and people expect that, they deal with ordering by setting up reminders that something will arrive.
  • Installation programs: unless the install is a “mini task” and can be done in-line (like getting a plug-in), you need to make sure that all the questions can be answered up front so the install can then run in the background. Many writers of installation programs wrongly believe that asking the user questions throughout the installation process will make them think the installation is snappy. Hello: nobody thinks that. They expect it to be a “task” and would like to turn their attention elsewhere. However, if they do something else, come back, and find the install stopped because it was waiting for more info from the user, they get pissed (it was supposed to be done when they got back to it).

Written by Dave McComb

Time Zones

Reflections on low-level ontology primitives.

We had a workshop last week on gist (our minimalist upper ontology). As part of the aftermath, I decided to get a bit more rigorous about some of the lowest level primitives. One of the basic ideas about gist is that you may not be able to express every distinction you might want to make, but at least what you do exchange through gist will be understood and unambiguous. In the previous version of gist I had some low level concepts, like distance, which was a subtype of magnitude. And there was a class distanceUnit which was a subclass of unitOfMeasure. And unit of measure has a property that points to conversion factor (i.e., how to convert from one unit of measure to the base unit of that “dimension”). But what occurred to me just after the workshop is that two applications or two organizations communicating through gist could still create problems by just picking a different base (i.e., if one said their base for distance was a meter and another a foot, they have a problem).

This was pretty easily solved by going to NIST, and getting the best thinking on what these dimensions should be and what the base unit of each dimension should be. Looking at it, I don’t think there ought to be much problem with people adopting these. Emboldened, I thought I would do the same for time.
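To make the idea concrete, here is a minimal sketch of the conversion-factor scheme described above, assuming the SI/NIST base units; the class and property names are simplified stand-ins rather than gist’s actual terms.

```python
# Minimal sketch of the unit-of-measure idea described above: every unit
# belongs to a dimension and carries a conversion factor to that
# dimension's agreed base unit (meter for length, per the SI/NIST
# convention). Names here are illustrative, not gist's published vocabulary.

class UnitOfMeasure:
    def __init__(self, name: str, dimension: str, to_base_factor: float):
        self.name = name
        self.dimension = dimension
        self.to_base_factor = to_base_factor  # multiply to get base units

    def to_base(self, magnitude: float) -> float:
        """Convert a magnitude expressed in this unit to the base unit."""
        return magnitude * self.to_base_factor

meter = UnitOfMeasure("meter", "Length", 1.0)
foot = UnitOfMeasure("foot", "Length", 0.3048)

# Two organizations can exchange a distance unambiguously as long as both
# agree on the base unit of the dimension and convert to it.
print(foot.to_base(10))   # 3.048 (meters)
print(meter.to_base(10))  # 10.0  (meters)
```

The whole scheme only works because the base unit per dimension is agreed in advance, which is exactly the problem the NIST list solves.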

For starters, universal time seems to be the way to go. However, many applications record time in local time, so we need some facility to recognize that and provide an offset. Here’s where the problem came in, and maybe you, dear readers, can help. After about an hour of searching the web, the best I could find for a standard in this area is something called the tz database. While you can look up various cities, I didn’t see anything definitive on what the geographical regions are that make up each of the time zones. To make things worse, the abbreviations for time zones are not unique; for instance, there is an EST in North America and one in Australia. If anyone has a thought in this area, I’m all ears.
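Incidentally, the tz database’s unambiguous identifiers (e.g., America/New_York rather than EST) do avoid the abbreviation collision, even if they don’t define the geographic regions themselves. Here is a minimal sketch using Python’s standard zoneinfo module (Python 3.9+), assuming the IANA tz data is available on the system or via the tzdata package.

```python
# Minimal sketch: record local times against unambiguous IANA tz
# identifiers and normalize to universal time. "EST" is ambiguous
# (North America vs. Australia); "America/New_York" and
# "Australia/Sydney" are not.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

local_ny = datetime(2013, 7, 1, 9, 0, tzinfo=ZoneInfo("America/New_York"))
local_syd = datetime(2013, 7, 1, 9, 0, tzinfo=ZoneInfo("Australia/Sydney"))

print(local_ny.astimezone(timezone.utc))   # 2013-07-01 13:00:00+00:00
print(local_syd.astimezone(timezone.utc))  # 2013-06-30 23:00:00+00:00
```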

Semantisize

Semantic technology resources

I was alerted to this site: www.semantisize.com from a comment. It’s pretty cool. You can while away a lot of time on this site which is rounding up lots of podcasts, videos, etc., all related to Semantic Technology. I got a kick out of a video of Eric Schmidt taking a question from the floor on “What is Web 3.0?” Schmidt’s answer: “I think you [the questioner] just made it up.”

Application Scope: A Fractal View

The scope of an application is one of those topics that seems to be quite important and yet frustratingly slippery.

There are several very good reasons for its seeming importance. One, the scope of an application eventually determines its size and, therefore, the cost to develop and/or implement it. It’s also important in that it defines what a given team is responsible for and what they have chosen to leave undone or be left for others to complete. But it’s slippery because our definitions of scope sound good; we have descriptive phrases about what is in and what is out of scope.

But as a project progresses, we find initially a trickle and eventually a tidal wave of concerns that seem to be right on the border between what is inside and outside scope. Also, one of the most puzzling aspects of an application’s scope is the way a very similar scope description for one organization translates into a drastically different size of implementation project when applied to another organization. In this paper, we explore what scope really is, the relationship between it and the effort to implement a project, and how an analogy to fractal geometry will have some bearing and perhaps be an interesting metaphor for thinking about application scope issues.

What is Fractal Geometry?

Fractal geometries are based on recursive functions that describe a geometrical shape. For instance, in the following illustrations we see a simple geometric shape, a triangle. In the second figure, we see a shape that’s been defined by a function which says, “in the middle of each side of the triangle put another triangle; and in the middle of the side of that triangle put another, and yet another, etc.” We’ve all seen the beautiful, multicolored, spiral shapes that have been created with Mandelbrot sets. These are all variations of the fractal geometry idea.

There are several things of interest about fractal geometries besides their aesthetic attractiveness. The main thing to note in this example, and in most others like it, is that while the contained area of the shape grows a little bit with each iteration of the function, or with each increase in resolution of the shape, the perimeter continues to grow at a constant rate. In this case, with each iteration, the perimeter gets 33% longer: on any one side that was three units long, we could remove the middle unit and replace it with two equal-length units, so we now have four thirds of the perimeter on that one side. This is repeated on all sides, and at every level. If we were to continue to run the recursion, the detail would get finer and finer until the resolution would be such that we would no longer be able to see all the details of the edges. And yet, as we magnified the image, we would see that the detail was in fact there.

Fractal geometricians sometimes say that Britain has an infinitely long coastline. The explanation is that if you took a low-resolution map of Britain and measured the coastline, you would get a particular figure. But as you increase the resolution of your map, you find many crags and edges and inlets that the previous resolution had omitted. If you were to add them up, the total coastline would now be greater. And if you went to an even higher resolution, it would be greater again, and supposedly so ad infinitum.

So, as we see from this, the perimeter of a fractal shape is a matter of the resolution at which we observe and measure it. Hold that thought.
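A tiny sketch of the arithmetic, assuming the Koch-snowflake-style construction described above, where each iteration swaps the middle third of every side for two segments of the same length:

```python
# Minimal sketch of the perimeter growth described above: each iteration
# replaces the middle third of every side with two segments of the same
# length, so the perimeter is multiplied by 4/3 every time, while the
# enclosed area grows ever more slowly and stays bounded.
def koch_perimeter(side_length: float, sides: int, iterations: int) -> float:
    perimeter = side_length * sides
    for _ in range(iterations):
        perimeter *= 4 / 3  # remove 1 of 3 units, add back 2
    return perimeter

for i in range(5):
    print(i, round(koch_perimeter(3.0, 3, i), 2))
# 0 9.0, 1 12.0, 2 16.0, 3 21.33, 4 28.44 -- unbounded as i grows
```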

Application Size

When we talk about the size of application, we are usually concerned with some aspect of that application that could change in such a way that our effort to develop or implement it would change. For instance, a common size metric for applications has been number of lines of code. A system with one million lines of code is considered in many ways ten times as large as one with 100,000 lines of code. That is a gross simplification but let’s start there. It would take ten times as long to write one million lines of code as it would to write 100,000 lines of code.

Similarly, it would take ten times as long, on average, to modify part of a system that has one million lines of code as opposed to one with 100,000 lines of code. But lines of code was a reasonable metric only when most of our systems were written by hand, in procedural code, in languages that were at least comparable in their expressiveness.

Nowadays, a given system will be built of handwritten code plus generated code plus graphically designed components, style sheets, inherited code, libraries, etc. The general concepts of size and complexity still exist, but we are more likely to find the complexity in some other areas. One of the main areas where we find complexity in information systems is the size and variety of the user interfaces. For instance, a system with 100 different user interfaces can be thought of as one-tenth as complex as one with 1000 different user interfaces.

This relationship held with the lines of code metric in the old days because each screen was written by hand; the more screens, the more lines of code. Nowadays, although the connection between the screens and the lines of code may not be as valid because much of the code is generated, there is still ten times as much complexity in designing the user interfaces, laying them out, discussing them with users, etc. Other metrics that tend to indicate larger sized applications are APIs (application programming interfaces).

An API is a function call that the system has made public and can be used by other systems. An application API could be almost anything. It could be get_quote(order_number) or it could be post_ledger(tran). The size of applications that support APIs tends to be proportional to the number of APIs they have published. This is partly because each API must have code behind it to implement the behavior described, and partly because the act of creating the signature and the interface requires definition, design, negotiating with other system users, and maintaining and upgrading these interfaces.
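Purely for illustration (the two signatures below are just the hypothetical examples named above, stubbed out; they are not any real system’s API):

```python
# Illustrative stubs of the two hypothetical APIs mentioned above. Each
# published signature implies code behind it plus the definition, design,
# negotiation, and maintenance work described in the text.
def get_quote(order_number: str) -> dict:
    """Return pricing details for an order (stubbed for illustration)."""
    return {"order_number": order_number, "total": 0.0}

def post_ledger(tran: dict) -> None:
    """Post a transaction to the general ledger (stubbed for illustration)."""
    ...
```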

Finally, one other metric that tends to move proportionally with the size of application systems is the size of their schema. The schema is the definition of the data that is being maintained by the application. Classically, it is the tables and columns in the database. However, with object oriented or XML style systems, the schema will have a slightly different definition.

An application for a database with 10,000 items in its schema (which might be 1,000 tables and 9,000 columns in total) will, all other things being equal, very often be ten times as complex as an application with 1,000 items in its schema. This is because each of those individual attributes is there for some reason and typically has to be moved from the database to a screen, or an API, or an algorithm, and has to be moved back when it’s been changed, etc.

Surface Area v Volume

Now, imagine an application as a shape. It might be an amorphous amoeba-like shape or it might be a very ordered and beautiful snowflake-like shape that could have been created from fractal geometry. The important thing to think about is the ratio of the surface area to the volume of the application. My contention is that most business applications have very little volume. What would constitute volume in a business application would be algorithms or rules about what is allowed and permissible. And while it may first seem that this is the preponderance of what you find when you look at a business application, the truth is far from that.

Typically, 1 or 2% of the code of a traditional business application is devoted to algorithms and/or true business rules. The vast majority is dealing with the surface area itself. To try another analogy, you may think of a business system as very much like a brain, where almost all the cognitive activity occurs on the surface and the interior is devoted to connecting regions of the surface. This goes a long way towards explaining why the surface of the brain is a convoluted and folded structure rather than a smooth organ like a liver.

Returning to our business application, let’s take a simple example of an inventory control system. Let’s say in this example that the algorithms of the application (the “volume”) have to do economic order quantity analysis, in other words, calculate how fast inventory is moving and determine, therefore, how much should be reordered and when. Now, imagine this is a very simple system. The surface area is relatively small compared to the volume and would consist of user interfaces to process receipts, to process issues, perhaps to process adjustments to the inventory, and a resultant inventory listing.

The schema, too, would be simple and, in this case, would have no API. Here is where it becomes interesting. This exact same system, implemented in a very small company with a tiny amount of inventory, would be fairly small and easy to develop and implement. Now take the same system and attempt to implement it in a large company.

The first thing you’ll find is that whatever simplistic arrangement we made for recording where the inventory was would now have to be supplemented with a more codified description, such as bin and location; for an even larger company that has automated robotic picking, we might need a very precise three-dimensional location.

To take it further, a company that stores its inventory in multiple geographic locations must keep track of not only the inventory in each of these locations but also the relative cost and time to move inventory between locations. This is a non-issue in our simplistic implementation.

So it would go with every aspect of the system. We would find in the large organization enough variation in the materials themselves that would warrant changes in the user interfaces and the schema to accommodate them, such as the differences between raw materials and finished goods, between serialized parts and non-serialized parts, and between items that do and do not have shelf life. We would find the sheer volume of the inventory would require us to invent functionality for cycle counting, and, as we got large enough, special functionality to deal with count errors, to hold picking while cycle count was occurring, etc.

Each of these additions increases the surface area, and, therefore, the size of the application. I’m not suggesting at all that this is wasteful. Indeed, each company will choose to extend the surface area of its application in exactly those areas where capital investment in computerizing their process will pay off. So what we have is the fractal shape becoming more complex in some areas and not in others, very dependent on the exact nature of the problem and the client site to which it is being implemented.

Conclusion

Once we see an application in this light, that is, as a fractal object with most of its size in or near its perimeter or its surface, we see why the “same” application implemented in organizations of vastly different size will result in applications of vastly different size. We also see why a general description of the scope of an application may do a fair job of describing its volume but can miss the mark wildly when describing its surface area and its perimeter.

To date, I know of no technique that allows us to scope and estimate the surface area of an application until we’ve done sufficient design, nor to analyze return on investment to determine just how far we should go with each aspect of the design. My intention with this white paper is to make people aware of the issue and of the sensitivity of project estimates to fine details that can rapidly add up.

Spectrograph

Last night at the EDM Council meeting, Dave Newman from Wells Fargo used spectroscopy as an analogy for the process of decomposing business concepts into their constituent parts. The more I’ve been thinking about it, the more I like it.

Last week I was staring at the concept “unemployment rate” as part of a client project. Under normal light (that is, using traditional modeling approaches) we’d see “unemployment rate” as a number, probably attach it to the thing the unemployment rate was measuring, say the US economy, and be done with it. But when we shine the semantic spectrometer at it, the constituent parts start to light up. It is a measurement. Stare at those little lines a bit harder: the measurement has a value (because the term “value” is overloaded, in gist we’d say it has a magnitude) and the magnitude is a percentage. Percentages are measured as ratios, and this one (stare a little harder at the spectrograph) is the ratio of two populations (in this case, groups of humans). One population consists of those people who are not currently employed and who have been actively seeking employment over the past week, and the other is the first group plus those who are currently employed.

These two populations are abstract concepts until someone decides to measure unemployment. At that time, the measurement process has us establish an intensional group (say residents of Autauga County, Alabama) and perform some process (maybe a phone survey) on some sample (a sub-population) of the residents. Each contacted resident is categorized into one of three sub-sub-populations: currently working; currently not working and actively seeking work; and not working and not actively seeking work. (Note: there is another group that logically follows from this decomposition, is not of interest to the Bureau of Labor Statistics, but is of interest to recruiters: working and actively seeking employment.) Finally, the measurement process dictates whether the measure is a point in time or an average of several measures made over time.

This seems like a lot of work for what started as just a simple number. But look at what we’ve done: we have a completely non-subjective definition of what the concept means. We have a first-class concept that we can associate with many different reference points; for example, the same concept can be applied to national, state, or local unemployment. An ontology will organize this concept in close proximity to other closely related concepts. And the constituent parts of the model (the populations, for instance) are now fully reusable concepts as well. The other thing of interest is that the entire definition was built out of reusable parts (Magnitude, Measurement, Population, Measurement Process, Residence, and Geographic Areas) that existed (in gist) prior to this examination. The only thing that needed to be postulated to complete this definition was what would currently be two taxonomic distinctions: working and seeking work.

David, thanks for that analogy.
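As a back-of-the-envelope sketch of the ratio at the bottom of that decomposition (the class and field names are illustrative stand-ins, not gist’s vocabulary, and the survey counts are invented):

```python
# Minimal sketch of the decomposition above: the unemployment rate is a
# measurement whose magnitude is a percentage, the ratio of two populations
# drawn from a surveyed sample. Counts below are invented for illustration.
from dataclasses import dataclass

@dataclass
class SurveySample:
    employed: int
    unemployed_seeking: int  # not working, actively sought work last week
    not_seeking: int         # not working, not actively seeking

    @property
    def labor_force(self) -> int:
        # the unemployed-and-seeking group plus the currently employed
        return self.employed + self.unemployed_seeking

    @property
    def unemployment_rate(self) -> float:
        # ratio of the two populations, expressed as a percentage
        return 100 * self.unemployed_seeking / self.labor_force

sample = SurveySample(employed=940, unemployed_seeking=60, not_seeking=200)
print(f"{sample.unemployment_rate:.1f}%")  # 6.0% -- not_seeking is excluded
```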

The Profound Effect of Linked Data on Enterprise Systems

Jan Voskuil, of Taxonic, recently sent me this white paper. It is excellent. It’s so good I put it in the taxonomic category, “I wish I would have written this.” But since I didn’t, I did the next best thing: Jan has agreed to let us host a copy here on our web site. Enjoy a succinct 18-page white paper suitable for technical and non-technical audiences. It’s a very nice explanation of why the flexible data structures of linked data are making such a profound difference in the cost of changing enterprise systems.

Download whitepaper by Jan Voskuil – Linked Data in the Enterprise

How Data Became Metadata

Is it just me, or is the news that the NSA is getting off the hook on its surveillance of us because it’s just “metadata” more than a bit duplicitous? Somehow the general public is being sold the idea that if the NSA is not looking at the content of our phone calls or email, then maybe it’s alright. I’m not sure where this definition of metadata came from, but unfortunately it’s one of the first that the general public has had, and it’s in danger of sticking. Our industry has not done itself any favors by touting cute definitions of metadata such as “data about data.” Not terribly helpful. Those of us who have been in the industry longer than we want to admit generally refer to metadata as being equivalent to schema. So the metadata for an email system might be something like:

  • sender (email address)
  • to (email addresses)
  • cc (email addresses)
  • bcc (email addresses)
  • sent (datetime)
  • received (datetime)
  • read (boolean)
  • subject (text)
  • content (text)
  • attachments (various mimetypes)

If we had built an email system in a relational database, these would be the column headings (we’d have to make a few extra tables to allow for the multi-value fields like “to”). If we were designing in XML, these would be the elements. But that’s it. That’s the metadata. Ten pieces of metadata.

The NSA is suggesting that the values of the first seven fields are also “metadata” and that only the values of the last three constitute “data.” Seems like an odd distinction to me. And if calling the first seven presumably harmless “metadata” and thereby giving themselves a free pass doesn’t creep you out, then check out this: https://immersion.media.mit.edu/. Some researchers at MIT have created a site where you can look at your own email graph in much the same way that the NSA can. Here is my home email graph (that dense cloud above is the co-housing community I belong to and the source of most of my home email traffic). Everybody’s name is on their nodes. All patterns of interaction are revealed. And this is just one view. There are potentially views over time, views with whether an email was read, responded to, how soon, etc.

Imagine this data for everyone in the country, coupled with the same data for phone calls. If that doesn’t raise the hackles of your civil libertarianism, then nothing will. ‘Course, it’s only metadata.

by Dave McComb
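To make the distinction concrete, here is a minimal sketch (invented messages, illustrative field names) of how much interaction structure falls out of just those first seven “metadata” fields, without ever touching subject or content:

```python
# Minimal sketch: build a who-emails-whom graph from header fields alone
# (sender/to/cc), never reading subject or content. Messages are invented.
from collections import Counter

messages = [
    {"sender": "alice@example.org", "to": ["bob@example.org"], "cc": []},
    {"sender": "bob@example.org", "to": ["alice@example.org"], "cc": ["carol@example.org"]},
    {"sender": "alice@example.org", "to": ["carol@example.org"], "cc": []},
]

edges = Counter()
for msg in messages:
    for recipient in msg["to"] + msg["cc"]:
        edges[(msg["sender"], recipient)] += 1

# Each edge is a pattern of interaction recoverable from "just metadata".
for (sender, recipient), count in edges.items():
    print(f"{sender} -> {recipient}: {count}")
```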

The re-release of DBBO: Why it’s better.

I’m writing this on an airplane as I’m watching the South Park episode where the lads attempt to save classic films from their directors, who want to re-release them to make them more politically correct and appeal to new audiences (the remake of Saving Private Ryan where all the guns were replaced with cell phones, for instance). So it’s with great trepidation that I describe the motivation and result of the “re-release” of DBBO.

We’ve been teaching “Designing and Building Business Ontologies” for over 10 years, and we have been constantly updating it, generally adding more material. What we found was that while the material was quite good, we had to rush to get through it all. Additionally, we had added material that caused us to explain concepts we hadn’t yet covered in order to make a particular point. And of course there was ever more material we wanted to add. We were also interested in modularizing the material, so that people could take less than the full course if that’s what they needed.

We essentially created a dependency graph of the concepts we were covering and dramatically re-sequenced things. This shone a light on a number of areas where additional content was needed. Our first trial run was an internal training project with Johnson & Johnson. This group at J&J had already built some ontologies in Protégé and TopBraid and had training in Linked Open Data and SPARQL. So we were able to try out two aspects of the new modularity. First, if students had the right prerequisites, they could start in the middle somewhere. Second, with the new packaging, would it be possible to spread the training out over a longer period of time and still give the students something they could use in the short term? So with J&J we did the new days 3, 4 and 5 in two sessions separated by two weeks. I’m happy to report that it went very well.

Then in May we had our first public session. We’ve decided to have days 1-3 on Wednesday through Friday and days 4-6 on the following Monday through Wednesday. Some people who really want to learn all there is to learn can power through all six sessions contiguously. There is a lot to do in the Fort Collins area, and we’ve heard the break is good for consolidating what was learned before getting back into it. Others have decided to take days 1-3 and come back later to finish up days 4-6.

The course is now more logically structured and not quite as rushed (although there is still a lot of material). Hard as it is to imagine with all the hands-on work, we created nearly 2,000 slides, 90% of which were new or substantially upgraded.

Learn more or sign up for DBBO.

Dave McComb

Semantic Tech meets Data Gov

Watch out world! Eric Callmann, a vet of data governance, recently joined the Semantic Arts team as a consultant. We like his fresh and unique perspective on how to use semantic technology to help manage the massive amounts of data that could potentially drive us all mad.

We did a little Q&A with Eric to find out more about him and his take on #SemTech2013.

SA: What’s your background and how did we find you?

Eric Callmann: My background is in developing information quality programs at large information services companies. I found my way to Semantic Arts by way of an Information Governance Council, where I met Dave McComb.

SA: What made you want to take the leap into semantics?

EC: The arena of semantics and ontologies is at the forefront of emerging, next-generation tools to help us manage the larger and larger amounts of data being produced. Also, the development of data governance programs had me thinking that as the realm of semantics grows, there is a greater need to govern and ensure the quality of the triples created within ontologies. I shared this with Dave and he felt the same way. As a result, I joined Semantic Arts in May and have been drinking from the semantic fire hose ever since.

SA: So we sent you to SemTech in June, what was your experience?

EC: I was fortunate to be able to attend SemTech2013. For someone who is not a diehard techie (e.g., I don’t build apps, manage databases, install hardware, etc.) but rather a business person who knows how to leverage technology, the event was a way to see into the future.

SA: What do you mean by ‘seeing into the future’?

EC: What I mean by seeing into the future is that semantic technology is making, and will continue to make, it easier for all of us to find the things we need, both on the internet and within enterprises. For example, Walmart presented their use of semantics to improve the search experience on Walmart.com. They are leveraging publicly available information, such as Wikipedia, to help the search engine understand the context of what one is searching for. Throughout the conference there were numerous presentations about using this technology in new ways to improve analytics and develop new services. To say the least, the amount of innovation and entrepreneurship happening in this space is astounding. There are already new services being offered that use this technology, such as Whisk.com, which allows you to find a recipe and have your grocery list created automatically at your favorite grocery store (Note: this is only available in the UK…lucky Londoners). If Walmart is doing it, it is pretty obvious that Google and Yahoo! are leveraging this great technology in really cool ways too.

More from Eric’s perspective in the future. We are excited to have Eric on board! If you have any questions for Eric, make sure to shoot him an email: [email protected]