We recently conducted a webinar on How to Build an Agile Enterprise Ontology for Dataversity. Here are links to both the live recording and slides of the webinar from May 14, 2015.

We have a simple and effective way in gist to represent a wide range of physical quantities such as ‘82 kg’, ‘3 meters’ and ‘20 minutes’. Each quantity has a number and a unit, such as ‘meter’ or ‘second’. In addition to these simple units, we have unit multiplication and division to represent more complex units, e.g. for speed and acceleration. A standard speed unit is meters per second [m/s] and a standard acceleration unit is meters per second per second [(m/s)/s] or simply [m/s^2].
Physicists as well as business people like to avoid the inconvenience of working with very large or very small numbers like 1000 meters, or .000000000001 meters (a trillionth of a meter). If you counted to see if the number of zeros was correct, you understand the problem. So we create units like kilometer and picometer and give them conversion factors. This works for any kind of unit (time, electric current, mass). Note that the standard units have a conversion of 1 (which, in normal parlance, means there is no conversion necessary). See figure 1 for some examples.

Figure 1: Example Quantities
We have also found a need for counting units like dozen or gross. For example, a wine merchant stocks and sells cases of 12 bottles of wine, so counting in dozens is more convenient than counting single bottles of wine. What is interesting is that we can use the exact same structure for representing ‘4 dozen’ or ‘7 gross’ as we do for representing things like ‘82 kg’ and ‘20 minutes’. Take ‘4 dozen’: the number is 4, the unit is ‘dozen’ and the conversion is 12.
In gist there is also a way to represent percentages, which we have always treated as a ratio. After all, when speaking of a percentage, there is always an explicit or implicit ratio somewhere. For example:
The units for the first example are barrels/barrels which cancel out leaving a pure number. Similarly, the units for the second example are grams/grams which again cancel out. In fact, every ratio unit that corresponds to a percentage will cancel out and leave a pure number. This means that although it may be useful to do so, we don’t need to represent gist:Percentage using a ratio unit.
Another thing that we never realized before is that, being a pure number, a percentage can be represented in the same way we represent dozen or gross. The only difference is the conversion (12 vs. .01). We can use this same structure to represent:
See figure 2 for the representational structures.

Figure 2: One structure for number units and ordinary units
Notice how ‘4 cm’ is very similar to ‘4 percent’:
This means we can use the same computational mechanism to perform units conversion for pure numbers like 4 dozen and 4% as we do for ordinary physical quantities like 4 cm or 82 kg.
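To make the shared structure concrete, here is a minimal Python sketch. It is not gist itself: the names Unit, Quantity, conversion and in_standard_unit are assumptions made up for illustration, and ‘each’ is the standard number unit that is introduced further down in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    standard_unit: str  # name of the standard unit this unit converts to
    conversion: float   # multiply by this to express a value in the standard unit


@dataclass(frozen=True)
class Quantity:
    number: float
    unit: Unit

    def in_standard_unit(self) -> float:
        """Return the number expressed in the unit's standard unit."""
        return self.number * self.unit.conversion


# Ordinary physical units (a standard unit has a conversion of 1).
METER = Unit("meter", "meter", 1)
CENTIMETER = Unit("centimeter", "meter", 0.01)

# Number units, using exactly the same structure.
EACH = Unit("each", "each", 1)
DOZEN = Unit("dozen", "each", 12)
PERCENT = Unit("percent", "each", 0.01)

four_cm = Quantity(4, CENTIMETER)
four_dozen = Quantity(4, DOZEN)
four_percent = Quantity(4, PERCENT)

print(four_dozen.in_standard_unit())  # 48
print(four_cm.in_standard_unit())
print(four_percent.in_standard_unit())
```

The one method handles all three quantities; nothing in it knows whether the unit is physical or a number unit.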
One question remains. Whereas we can readily see that the conversion factor for kilometer is based on the standard unit of meter, and the conversion factor for hour is based on the standard unit of second, what are the conversion factors of 12, .01 and .0001 (for dozen, percent and basis point) based on? What does it mean to have a standard unit for these pure numbers with a conversion of 1?
Let’s look to see how gist represents dozen and kilometer to see if that gives us any insight.
Curiously, while ‘meter’ actually means something to us, and we know what it means to say ‘3 meters’, it is strange to think what ‘3 eaches’ could possibly mean. I invite you to stare at the following table for a while and see some analogies.

Figure 3: Standard Unit for Pure Number Quantities
Then notice that:
But what is it, such that if you have 48 of them gives you the number 48? The answer is the number one: 48 x 1 = 48. So the meaning of gist:each is the number one acting as a unit. This is a mathematical abstraction. The ??’s in figure 2 stand for ‘each’ which is the standard number unit. So when you say ‘3 eaches’ it is just 3 of the number one which is just the pure number 3. As an aside, we can also say that ‘each’ is the identity element for unit multiplication and division. This is analogous to the number 1 being the identity element for multiplication and division of numbers.
Note that while conceptually they mean the same thing, syntactically gist:each is very different from the number one as a number whose datatype is, say, integer or float.
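The identity-element remark can be illustrated with a small Python sketch. The representation is an assumption chosen for brevity, not how gist models units: each unit is a mapping from base-unit name to exponent, and the empty mapping stands in for ‘each’.

```python
# A unit is modelled as {base unit name: integer exponent}.
# The empty mapping plays the role of 'each': no base units at all.
EACH = {}


def multiply(a, b):
    """Multiply two units by adding exponents, dropping any that cancel to zero."""
    result = dict(a)
    for base, exponent in b.items():
        total = result.get(base, 0) + exponent
        if total == 0:
            result.pop(base, None)
        else:
            result[base] = total
    return result


def divide(a, b):
    """Divide unit a by unit b, i.e. multiply a by the inverse of b."""
    inverse = {base: -exponent for base, exponent in b.items()}
    return multiply(a, inverse)


METER = {"meter": 1}
SECOND = {"second": 1}
BARREL = {"barrel": 1}

speed = divide(METER, SECOND)          # m/s
acceleration = divide(speed, SECOND)   # m/s^2
ratio = divide(BARREL, BARREL)         # barrels/barrels cancels out

print(speed, acceleration, ratio)
```

Multiplying or dividing any unit by EACH leaves it unchanged, and a ratio unit such as barrels/barrels reduces to EACH, which is why a percentage ends up as a pure number.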
Notice that for these pure numbers in convenient sized units, we are usually counting things: how many dozens, how many basis or percentage points, or how many parts per million. We refer to ‘each’ thing as ‘one’ thing being counted. So that links gist:each to the number one. Thus, despite the awkwardness of speaking of ‘3 eaches’ the names ‘Count’, ‘CountingUnit’ and ‘each’ are quite reasonable.
Finally, insofar as all instances of CountingUnit are based on the number one, and all instances of Count represent pure numbers, we can think of every CountingUnit as a degenerate unit, and we can think of gist:Count as a degenerate quantity. A ‘real’ quantity is not just a number; it has a number and a non-numeric unit.
So in conclusion:
This blog is wholly inspired by an observation made by Dave McComb that collections and queries have an interesting relationship.
Collections frequently arise when creating enterprise ontologies. In manufacturing, there are lists of approved suppliers and lists of ingredients. In finance there are baskets of stocks making up a portfolio, and a 30-year mortgage corresponds to a collection of 360 monthly payments. In healthcare, there are lists of side effects for a given drug, and a patient bill is essentially a collection of line items, each with an associated cost. We will look at two ways to model collections and consider the pros and cons of each. We will consider a list of approved suppliers for salt.
Represent the list as an explicit collection
The first is to create an explicit collection called, say, _ApprovedSaltSuppliers. The members of this collection are each suppliers of salt, say _Supplier1, _Supplier14 and _Supplier23. We can link each of the suppliers to the collection using the object property gist:memberOf. So far, we have 4 instances, one property, three triples and no classes.

Figure 1: Simple Collection
It is always good practice to say what class an instance belongs to. What kind of a thing is the instance _ApprovedSaltSuppliers? First, it is a collection. We have a class for that called gist:Collection, so we will want _ApprovedSaltSuppliers to be an instance of gist:Collection. However, it is more than just any collection; it is not a jury, and it is not a deck of cards. More specifically, it is a list of approved suppliers. So we create a class called ListOfApprovedSuppliers and declare _ApprovedSaltSuppliers to be an instance of that class. We also make ListOfApprovedSuppliers a subclass of gist:Collection, ensuring that _ApprovedSaltSuppliers is an instance of gist:Collection.
Using a SPARQL query to get the list
The second approach is inspired by the fact that a SPARQL query returns a report that is a collection of items from a triple store based on some specific criteria. Instead of having an explicit collection in the triple store for approved suppliers of a given substance, you could simply link that substance to each of the approved suppliers and then write a SPARQL query to find them from a triple store of past, present and potential future suppliers. See figure 2 for how this would be done.
Note, we have added some contextual information. First, we indicated the kind of substance they are being approved to supply. We also have a link to a person in charge of maintaining that list. Can you think of what other information you might want to associate with the list of approved salt suppliers if you were modeling this for a specific organization?

Figure 2: Comparing two approaches
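The two approaches can be compared side by side in a small Python sketch, with triples as plain tuples standing in for a triple store. The property name isApprovedSupplierOf, the individuals _Salt, _Sugar and _Supplier7, and the helper subjects are all hypothetical; only gist:memberOf and the salt supplier names come from the text above.

```python
# Triples are (subject, predicate, object) tuples.

# Approach 1: an explicit collection, with members linked by gist:memberOf.
explicit_triples = [
    ("_ApprovedSaltSuppliers", "rdf:type", "ListOfApprovedSuppliers"),
    ("_Supplier1", "gist:memberOf", "_ApprovedSaltSuppliers"),
    ("_Supplier14", "gist:memberOf", "_ApprovedSaltSuppliers"),
    ("_Supplier23", "gist:memberOf", "_ApprovedSaltSuppliers"),
]

# Approach 2: no collection instance; each supplier is linked directly to the
# substance, and the list only exists as the result of a query.
# 'isApprovedSupplierOf' is a hypothetical property name.
direct_triples = [
    ("_Supplier1", "isApprovedSupplierOf", "_Salt"),
    ("_Supplier14", "isApprovedSupplierOf", "_Salt"),
    ("_Supplier23", "isApprovedSupplierOf", "_Salt"),
    ("_Supplier7", "isApprovedSupplierOf", "_Sugar"),
]


def subjects(triples, predicate, obj):
    """Return the sorted subjects of all triples matching predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


members = subjects(explicit_triples, "gist:memberOf", "_ApprovedSaltSuppliers")
queried = subjects(direct_triples, "isApprovedSupplierOf", "_Salt")

print(members)
print(queried)
```

Both calls return the same three suppliers. The difference is that only the first approach has an individual, _ApprovedSaltSuppliers, to which further information such as the person maintaining the list can be attached.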
How should we skin this cat?
Which approach is best under what circumstances and why? Have another look at the examples above in finance, manufacturing and healthcare. For each, consider the following questions:
For the list of approved suppliers, we already saw that the answer to the first question is yes. The answer to the second question is also yes because suppliers come and go. There will likely be a moderate amount of change. The third question is less clear. It could be that an approved supplier list is just connected to a given substance, and nothing else. In this case, the answer to the third question would be yes. The way we have modeled it includes a named individual responsible for creating and maintaining it. The list might also be part of a larger body of documentation. Or, if it is a larger organization where different divisions had their own approved supplier list, you would need to indicate which organization is being supplied. With this extra information, the answer to the third question would be no.
Consider a patient bill. It is not that obvious how to represent the information so a simple query can give an answer. First, a given patient might have many bills over time, which would connect them to many different line items. It is a bit awkward. Second, these items will never change. Finally, while it makes sense to represent a patient bill as being in essence, a list of line items, it is much more than that. It is not only connected to the patient, but also to the hospital, to the provider, and possibly to an insurance company.
To the extent that the answer to the above three questions is yes, you are probably better off just writing an on-demand query. Conversely, if the answers to the above questions tend to be no, then you probably do want to represent and manage an explicit collection.
In 2013 marijuana became legal in Colorado and Washington. It did not escape the notice of our clients that all Semantic Arts staff are from those two states. Suspicions grew deeper when we told them about the large collaborative knowledge base called Freebase. The looks on their faces told us they were thinking about freebase. We were but a whisker away from the last straw on the day we introduced them to the idea of degeneracy. “No, no,” we insisted, “we are talking about a situation that commonly arises in mathematics and computing where something is considered to be a degenerate case of something else.” That was a close call; fortunately, all that confusion was “just semantics”.
Today we explain the idea of degeneracy and why it is useful for computing and ontology.

Figure 1: Examples of Degeneracy
Examples of Degeneracy: A circle is defined to be the set of points that are equidistant from a given point. The radius of the circle is that distance. But what do you have if the radius is zero? You have the set of points that are zero length from a given point, which is to say just that one point. In mathematics we would say a point is the degenerate case of a circle.
We all know what a rectangle is, but what happens if the width of an otherwise ordinary rectangle is zero? Then you just get a single line segment. Again, we say that a line segment is a degenerate case of a rectangle.
A set normally has two or more members, otherwise what is the point of calling it a set? Yet, the need for speaking of and representing sets that have 0 or 1 elements often arises. It happens so frequently that they have names: empty set and singleton set. They are degenerate cases of a set.
An example of a more complex structure than a set, is a process that consists of any number of tasks, and some ordering indicating what tasks must be done before what other tasks. However, sometimes, during computation or analysis, it can be convenient or even necessary to allow for processes that have zero tasks, or just one task. We could refer to such processes as empty or singleton processes. These are degenerate cases of a process that ordinarily should have two or more tasks.
What do all these examples have in common? What can we say about every case of degeneracy?
Definition
I propose the following as a working definition of degeneracy. We say that an X is a degenerate case of a Y when:

Figure 2: Number units as Degenerate Units of Measure
Why bother?
It might seem rather silly, or something that only mathematicians would bother about, but it turns out that in computing, degeneracy is very important. Let’s say you want to compute the average of an arbitrary set of numbers. In everyday parlance, it makes no sense to speak of the average of a single number. However, you want your algorithm to work if you get a set that happens to only have one number in it. You therefore want to be able to pass the algorithm a set with one element in it.
When processes are being executed, tasks are being done, so a set of tasks may dwindle down to one and then to zero. If you want the algorithm to work, it needs to understand what it means to have a process with 1 or zero tasks. When there are no tasks left, the task is to do nothing. Sometimes it is even helpful to consciously put a ‘do nothing’ task in a plan or process.
Generally speaking, degenerate cases are useful when you want computational infrastructure to still work on the edge cases.
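Here is a minimal Python sketch of both edge cases. The function names are made up for illustration, and the process runner is deliberately simplified: it runs tasks in list order and ignores the ordering constraints a real process would carry.

```python
def average(numbers):
    """Average of a collection of numbers; a single number is a valid input."""
    numbers = list(numbers)
    if not numbers:
        raise ValueError("the average of an empty set is undefined")
    return sum(numbers) / len(numbers)


def run_process(tasks):
    """Run each task in order and collect the results.

    A process with zero tasks is the degenerate case: the loop body never
    runs, so the 'task' is to do nothing and the result is an empty list.
    """
    return [task() for task in tasks]


print(average({2, 4, 6}))
print(average({7}))                      # singleton set: still works
print(run_process([]))                   # empty process: nothing to do
print(run_process([lambda: "mixed"]))    # singleton process
```

No special branch is needed for the singleton set or the empty process; the general code already covers them. The one case that genuinely has no answer, the average of an empty set, is reported as an error rather than guessed at.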
The most interesting example that I have seen of this in the context of ontology work arises in the context of doing unit conversions for physical quantities. For example, you convert 4 cm to meters using a conversion factor of .01. You convert 3 kg to grams by using a conversion factor of 1000. We have an ontology for representing such physical quantities with units and conversion factors. Using this ontology, we have code to do units analysis and computing conversions.
It turns out to be convenient to give pure numbers ‘units’ just like we give physical quantities units. For example, a wine merchant might sell cases with 12 bottles each. A unit of ‘dozen’ would come in handy, with a conversion of 12. Another convenient number unit is percent with a conversion of .01. To convert 250% to the true number, you multiply 250 by .01 to get 2.5. To convert 4 dozen to the true number, you multiply 4 by 12 to get 48.
This is completely analogous to converting 250 cm to meters: you multiply 250 by .01 to get 2.5 meters. It turns out that you can represent pure numbers this way using units in a way that is exactly analogous to how you represent physical quantities like 4 cm and 5 watts. This means that the same code for doing units conversions on physical quantities with ordinary units also works for number units. A pure number like 48 represented as a 4 with the unit ‘dozen’ is a degenerate case of a physical quantity such as ‘250 cm’. Pure number units like dozen and percent are degenerate cases of ordinary physical units like cm or watt.
Go back and look at the previous section and check the extent to which pure numbers and number units fit the definition of degeneracy.
For a detailed look at number units see the blog: Quantities, Number Units and Counting
As the Season to be Jolly gets into full swing, I reflect on one of my favorite ways to be jolly: laughter. What makes us laugh (or not) in a given situation? Some people just laugh, like my mother, who regularly burst into gales of laughter, for no apparent reason. The more puzzled our looks, the harder she laughed. I dedicate this blog to my mom, who passed away just before Christmas a year ago.
What happened?
There are many things that can trigger us to laugh. It is often someone making a quip or writing/telling a story or joke with the intention to make others laugh. Of course, such attempts often fall flat. Conversely, sometimes things just happen that are found to be funny, and no one was playing the role of comedian. The trigger event might be something that is unfolding live – perhaps just a stray thought that comes into your head. Or you might be watching videos, reading, or listening to a podcast. Or worse, the person next to you on a boring commute, earbuds in place, is doing so and laughing their head off.
Then what did you do?
So an event happens that comes into our awareness, and we outwardly respond to it in some way. If we are not impressed, we may: grimace, groan, guffaw, blank stare, or say “yark yark”. On the positive side, reactions include: smile, chuckle, giggle, laugh, gales of laughter and falling over laughing. In extreme cases, each time you recall the event, in subsequent minutes, hours and days, you will again laugh uncontrollably. It could be weeks, months or even years before your response to merely remembering the original event fades back to a mere smile, or warm feeling inside. See the diagram below. Note the similarity to how gratitude was characterized in the Thanksgiving Blog several weeks ago.

Why did you do that?
What is interesting is not so much the trigger event itself, nor even what the reactions are: it’s what happens in between. What do we find funny and why? First, we see or understand the thing that is [or is supposed to be] funny. When the event is an overt attempt at humor, we call this ‘getting it’. The next step is very personal; how much and why do we appreciate what we just saw or understood? Many people that get a bad pun won’t enjoy or appreciate it in any way. Other people may readily acknowledge that that same pun is pretty bad, but they giggle nevertheless. Below are just a few thoughts that come to mind: patterns for things that contribute to being funny.

For the 2014 Season to be Jolly, you now have a simple conceptual model describing the main elements of laughter and amusement and how they relate to each other. In our business, we call this an ‘ontology’. Sometimes the hardest thing to come up with is a good name for an ontology. The ones that first come to mind are descriptive, but boring – like Ontology of Laughter (OoL), or The Laughter Ontology (TLO). Or if you want a cool acronym, you might try Crafted Ontology Of Laughter (yark yark). Alternatively, the name you think of calls attention to itself by trying so hard to be clever it falls flat on its face, like “OnJollygy” (pronounced: on jolly jee).
Exercise for the reader:
For example, for me, this name just came as a random thought (trigger event). Because I am a total sucker for plays on words, I immediately enjoyed it and giggled with glee (reaction) – but the first person I told gave me a blank stare and the next one just grimaced.
Feel free to think I’m a bit weird. I attribute this to genetics. My mom never needed a reason to laugh and my dad is a die-hard punster who wears a t-shirt that says: “A Mature Pun is Fully Groan”. His favorite was a quintuple pun involving lions, sea gulls and commerce – have you heard that one?
Whether you experience Groans, Giggles or Gales of Laughter, may your Holidays be Jollily filled with Joy.
I wrote the following last year, and am inspired to share this more publicly, today, the Monday of Thanksgiving week, 2014.
It is Thanksgiving Day, 2013 and I just came across an article describing how science has tied gratitude to “the tendency to feel more hopeful and optimistic about one’s own future, better coping mechanisms for dealing with adversity and stress, [and] less instances of depression” among other things.
But what exactly is gratitude? And will all forms of gratitude give a similar boost to happiness? Examples range from a simple automatic thank you when someone opens a door for you, to a profound mystical experience where one may find oneself weeping in an alpine meadow of flowers surrounded by glacier-draped peaks.
An ontological analysis that I conducted in the last hour reveals the following essential elements of gratitude.
Here is a picture. The rectangles represent the key kind of things, the links indicate how they are related to each other. Note that the person has to be aware of the trigger event in order for them to express gratitude in response to it.

Main Elements of Gratitude
In addition, every gratitude response will have a certain form and a certain character. The character of the gratitude response might be unconscious or conscious; if the latter, it might be at a thinking level without any real feeling, or it might be deeply felt. A gratitude response will also take a certain form, e.g. returning a favor, a verbal or written thank you, or just an inner feeling.
This is what is essential, but there are various optional things too. For example, the trigger event may have been triggered by another person, or others may have had nothing to do with it (e.g. a rainbow). If the former, the triggering person may or may not have explicitly intended to benefit the thankful one. They may have just made an offhand remark that someone found value in, or written something in an article that had a great benefit to a reader who emailed a thank you. In this example, the gratitude response was targeted at the author who triggered the response, but that would not always be the case. The thankful one may have just felt deep gratitude that the author never knew about.
Then there is the question of whether the gratitude response actually affected anyone. The author might be pleased that a reader expressed gratitude, or they might not care, or they might not even see the thank you email. Even the thankful one might not be affected, if their gratitude response is fully on autopilot (e.g. thanking one for opening a door).
Below is a diagram summarizing all the points we have raised about gratitude. It is essentially an ontology of gratitude. The dotted lines indicate optional links, the solid ones are necessary.

So how can we use this ontology of gratitude to pave the road to euphoria? I speculate that the science on gratitude will show that the gratitude has to be felt to be the most valuable. On Thanksgiving, we often go around the table and say what we are thankful for. But does it really mean anything? Are we just saying it or do we really feel it?
This Thanksgiving, I am thankful for my creative mind and that I have a job that pays me to do what I love: distilling the essence from a complex web of ideas. It is deeply felt. There, I feel much better already!
Happy Thanksgiving
We humans are categorizing machines, which is to say, we like to create metaphorical buckets, and put things inside. But there are different kinds of buckets, and different ways to model them in OWL and gist. The most common bucket represents a kind of thing, such as Person, or Building. Things that go into those buckets are individuals of those kinds, e.g. Albert Einstein, or the particular office building you work in. We represent this kind of bucket as an owl:Class and we use rdf:type to put something into the bucket.
Another kind of bucket is when you have a group of things, like a jury or a deck of cards, that are functionally connected in some way. Those related things go into the bucket (12 members of a jury, or 52 cards). We have a special class in gist called Collection, for this kind of bucket. A specific bucket of this sort will be an instance of a subclass of gist:Collection. E.g. OJs_Jury is an instance of the class Jury, a subclass of gist:Collection. We use gist:memberOf to put things into the bucket. Convince yourself that these buckets do not represent a kind of thing. A jury is a kind of thing, a particular jury is not. We would use rdf:type to connect OJ’s jury to the owl:Class Jury, and use gist:memberOf to connect the specific jurors to OJ’s jury.

A third kind of bucket is a tag which represents a topic and is used to categorize individual items for the purpose of indexing a body of content. For example, the tag “Winter” might be used to index photographs, books and/or YouTube videos. Any content item that depicts or relates to winter in some way should be categorized using this tag. In gist, we represent this in a way that is structurally the same as how we represent buckets that are collections of functionally connected items. The differences are 1) the bucket is an instance of a subclass of gist:Category, rather than of gist:Collection and 2) we put things into the bucket using gist:categorizedBy rather than gist:memberOf. The Winter tag is essentially a bucket containing all the things that have been indexed or categorized using that tag.

Below is a summary table showing these different kinds of buckets, and how we represent them in OWL and gist.
| Kind of Bucket | Example | Representing the Bucket | Putting something in the Bucket |
| --- | --- | --- | --- |
| Individual of a Kind | John Doe is a Person | Instance of owl:Class | rdf:type |
| A bucket with functionally connected things inside | Sheila Woods is a member of OJ’s Jury | Instance of a subclass of gist:Collection | gist:memberOf |
| An index term for categorizing content | The book “Winter of our Discontent” has Winter as one of its tags | Instance of a subclass of gist:Category | gist:categorizedBy |
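The table can also be read as data. Below is a rough Python sketch with triples as plain tuples; the individual names and the class Tag (a stand-in for some subclass of gist:Category) are assumptions for illustration, as is the helper in_bucket.

```python
# Each triple is (subject, predicate, object).
triples = [
    # Bucket 1: a kind of thing, filled using rdf:type.
    ("_JohnDoe", "rdf:type", "Person"),
    # Bucket 2: a collection, which is itself an instance of a class (Jury)
    # and is filled using gist:memberOf.
    ("OJs_Jury", "rdf:type", "Jury"),
    ("_SheilaWoods", "gist:memberOf", "OJs_Jury"),
    # Bucket 3: a tag, filled using gist:categorizedBy.
    # 'Tag' is a hypothetical subclass of gist:Category.
    ("_Winter", "rdf:type", "Tag"),
    ("_WinterOfOurDiscontent", "gist:categorizedBy", "_Winter"),
]


def in_bucket(triples, predicate, bucket):
    """List everything placed in the bucket using the given predicate."""
    return [s for s, p, o in triples if p == predicate and o == bucket]


print(in_bucket(triples, "rdf:type", "Person"))
print(in_bucket(triples, "gist:memberOf", "OJs_Jury"))
print(in_bucket(triples, "gist:categorizedBy", "_Winter"))
```

Note that OJs_Jury and _Winter appear on both sides: each is put into a class bucket with rdf:type, and each is itself a bucket that other things are put into with a different property.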
In a prior blog (SPARQL: Updating the URI of an owl:Class in place) we looked into how to use SPARQL to rename a class in a triple store. The main steps are below. We showed how to do this for the example of renaming the class veh:Auto to veh:Car.
The last step addresses the fact that there are a few other things that you might need to do to address all the consequences of renaming a class. Today we will see how to handle the situation where your instances use a naming convention that includes the name of the class. Let’s say the instances of Car (formerly Auto) are all like this: veh:_Auto_234 and veh:_Auto_12. We will want to change them to be like: veh:_Car_234.
The main steps are:
1. Find the instances whose URIs contain the old class name.
2. Replace the triples that have the old URI in the object:
   a. Use CONSTRUCT to preview the new triples using the results of step 1.
   b. Use DELETE and INSERT to swap out the old triples with the new URI in the object.
3. Replace the triples that have the old URI in the subject:
   a. Use CONSTRUCT to preview the new triples using the results of step 1.
   b. Use DELETE and INSERT to swap out the old triples with the new URI in the subject.

In practice, we do step 1 and step 2a at the same time. We find a specific instance, and filter on just that one (e.g. veh:_Auto_234) to keep things simple. Because we will be using strings to create URIs, we have to spell the namespaces out in full, or else the URI will incorrectly contain the string “veh:” instead of the expanded form, which is: “http://ontologies.myorg.com/vehicles#”.
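PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX veh: <http://ontologies.myorg.com/vehicles#>

# Steps 1 and 2a: preview the replacement triples for one instance
CONSTRUCT {?s ?p ?newURI}
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI .
       FILTER (?oldURI IN (veh:_Auto_234))
       BIND (URI(CONCAT("http://ontologies.myorg.com/vehicles#_Car_",
                        STRAFTER(STR(?oldURI), "_Auto_")))
             AS ?newURI)
}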
This should return a table something like this:
| Subject | Predicate | Object |
| --- | --- | --- |
| veh:_TomJones | gist:owns | veh:_Car_234 |
| veh:_JaneWrenchTurner | veh:repaired | veh:_Car_234 |
| veh:_PeterSeller | veh:sold | veh:_Car_234 |
This tells you there are exactly three triples with veh:_Auto_234 in the object, and shows you what the new triples will be when you replace the old ones. After this, you might want to remove the FILTER and see a wider range of triples, setting a LIMIT as needed. Now you are ready to do the actual replacement (step 2b). This is what you do:
- Add a DELETE statement to remove the triple that will be replaced.
- Replace “CONSTRUCT” with “INSERT”, leaving alone what is in the brackets.
- Leave the WHERE clause as it is, except to remove the FILTER statement, if it is still there (or just comment it out).
Sample Graph of Triples
This will do the change in place for all affected triples. Note that we have constructed the URI from scratch, when all we really needed to do was do a string replace. The latter is simpler and more robust. Using CONCAT and STRAFTER gives the wrong answer if the string “_Auto_” does not appear in the URI. Here is the query to execute, with the simpler string operation:
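PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX veh: <http://ontologies.myorg.com/vehicles#>

# Step 2b: swap the old instance URI for the new one wherever it is the object
DELETE {?s ?p ?oldURI}
INSERT {?s ?p ?newURI}
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI .
       BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?newURI)
}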
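To see why the string replace is the more robust of the two, here is a small Python sketch that mimics both string operations outside the triple store. The helper names are made up for illustration, and Python's str.replace stands in for SPARQL's REPLACE, which is regex-based but behaves the same for a plain pattern like “_Auto_”. The URI ending in myTeslaS is a made-up instance that does not follow the naming convention.

```python
BASE = "http://ontologies.myorg.com/vehicles#"


def strafter(text, marker):
    """Mimic SPARQL STRAFTER: the text after the first occurrence of marker,
    or the empty string when marker does not occur at all."""
    _, found, rest = text.partition(marker)
    return rest if found else ""


def new_uri_by_concat(old_uri):
    """Build the new URI from scratch, as in the CONCAT/STRAFTER query."""
    return BASE + "_Car_" + strafter(old_uri, "_Auto_")


def new_uri_by_replace(old_uri):
    """Build the new URI by replacing the class name in place."""
    return old_uri.replace("_Auto_", "_Car_")


typical = BASE + "_Auto_234"   # follows the naming convention
odd = BASE + "myTeslaS"        # a Car whose URI does not contain "_Auto_"

print(new_uri_by_concat(typical))
print(new_uri_by_replace(typical))
print(new_uri_by_concat(odd))
print(new_uri_by_replace(odd))
```

For the typical URI both functions agree. For the odd one, the from-scratch version yields a URI ending in “_Car_” with nothing after it, while the replace version returns the URI unchanged.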
Step 3 is pretty much identical, except flip the subject and object. In fact, you can combine steps 2 and 3 into a single query. There are a few things to watch out for:
We have been developing solutions for our clients lately that involve loading an ontology into a triple store, and building a UI for data entry. One of the challenges is how to handle renaming things. If you want to change the URI of a class or property in Protégé you load all the ontologies and datasets that use the old URI and use the rename entity command in the Refactor menu. Like magic, all references to the URI are changed with the press of the Enter key. In a triple store, it is not so easy. You have to track down and change all the triples that refer to the old URI. This means writing and executing SPARQL queries using INSERT and DELETE to make changes in place. Below is an outline of how to rename a class.
Let ?oldClass and ?newClass be variables bound to the URI for the old and new classes respectively – e.g. ?oldClass might be veh:Auto and ?newClass might be veh:Car. The class rename operation involves the following steps:
1. Change all instances of ?oldClass to be instances of ?newClass instead, e.g. veh:myTeslaS rdf:type veh:Auto is replaced with veh:myTeslaS rdf:type veh:Car.
2. Find and replace all triples with ?oldClass as the object. It may occur in triples where the subject is a blank node and the predicate is one of the several used for defining OWL restrictions, e.g. _:123B456x78 owl:someValuesFrom veh:Auto.
3. Find and replace all triples with ?oldClass as the subject. It may occur in triples for declaring subclass relationships, comments as well as the triple creating the class in the first place, e.g. veh:Auto rdf:type owl:Class.
4. Look for anything else that needs to change as a consequence of the rename. For example, if your instance URIs follow a naming convention that includes the class name (e.g. veh:_Auto_234) then you will have to find all the URIs that start with veh:_Auto and use veh:_Car instead. We will look into this in a future blog.

Here is some SPARQL for how to do the first step.
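# Rename class veh:Auto to veh:Car
# For each ?instance of ?oldClass
# Replace the triple <?instance rdf:type ?oldClass>
# with <?instance rdf:type ?newClass>
# (the veh: namespace below is an example; substitute your own)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX veh: <http://ontologies.myorg.com/vehicles#>

DELETE {?instance rdf:type ?oldClass}
INSERT {?instance rdf:type ?newClass}
WHERE {BIND (veh:Auto AS ?oldClass)
       BIND (veh:Car AS ?newClass)
       ?instance rdf:type ?oldClass . }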
There are many ways to make mistakes here. Watch for the following:
- If putting the DELETE before the INSERT seems wrong, fear not; it is just an oddity in the SPARQL syntax.
- Before doing an INSERT and DELETE, always use CONSTRUCT to see what new triples will be created.

After you get step 1 working, try out steps 2 and 3 on your own; all you need to do is make some straightforward modifications to the above example. Step 4 involves more exploration and custom changes.
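If it helps to see the overall shape of steps 1 to 3 before writing the SPARQL, here is a rough in-memory sketch in Python, with triples as plain tuples instead of a triple store. The function names and the sample triples are made up for illustration. The sketch separates the preview (what would be deleted and inserted) from the actual change, in the same spirit as running CONSTRUCT first.

```python
def plan_class_rename(triples, old_class, new_class):
    """Return (to_delete, to_insert) for every triple that mentions old_class
    as its subject or object. Nothing is changed yet; this is the preview."""
    to_delete, to_insert = [], []
    for s, p, o in triples:
        if old_class in (s, o):
            to_delete.append((s, p, o))
            to_insert.append((
                new_class if s == old_class else s,
                p,
                new_class if o == old_class else o,
            ))
    return to_delete, to_insert


def apply_changes(triples, to_delete, to_insert):
    """Remove the old triples and add the new ones."""
    kept = [t for t in triples if t not in to_delete]
    return kept + to_insert


triples = [
    ("veh:myTeslaS", "rdf:type", "veh:Auto"),            # step 1: instance typing
    ("_:123B456x78", "owl:someValuesFrom", "veh:Auto"),  # step 2: class as object
    ("veh:Auto", "rdf:type", "owl:Class"),               # step 3: class as subject
    ("veh:_TomJones", "gist:owns", "veh:myTeslaS"),      # unaffected
]

to_delete, to_insert = plan_class_rename(triples, "veh:Auto", "veh:Car")
print(to_insert)  # inspect the preview before applying it
renamed = apply_changes(triples, to_delete, to_insert)
print(renamed)
```

In a real triple store the three steps would be separate queries, each previewed on its own; the sketch folds them together only because the in-memory case is so small.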
In an upcoming blog, we explore one of those changes. Specifically, if your naming convention for instances uses the class name in the URI then those instance URIs will have to change (e.g. from veh:_Auto_2421 to veh:_Car_2421).

We’ve found ourselves working with D3 (d3js.org) more and more lately, both for clients and for our own projects. So far we’ve really just begun to scratch the surface of what it can do (if you’re unfamiliar, take a moment to browse the examples). Despite our relative lack of experience with the library, we’ve been able to crank out demos at a wonderfully satisfying pace thanks to our decision early on to focus on creating high-level, DRY abstractions. Though there is a ton of flexibility and extensibility, D3 does seem to have a few broad categories of visualizations based around the shape of input data. The one we’re going to focus on here is the flare.js format, which looks roughly like this:
https://gist.github.com/scogle/15c1da0f631129da64a7
This format seems to be the basis of a large number of useful visualizations dealing with nested, tree-shaped data. The challenge is that in semantic systems it’s just not possible to construct a SPARQL query that returns data in a format anywhere near the “flare” example above. What you’re more likely to see is something structurally very similar to this:
https://gist.github.com/scogle/471166e99b1b1a25b82f
So what to do? One path we could take is to write a server-side endpoint that returns data in the right structure, but this has the disadvantage of being less flexible, not to mention it spreads out concerns of the visualization layer wider than they need to be. The other option is to do the conversion on the front end. Rather than dive into writing a bespoke method to convert the specific data on hand to a format tailored to the visualization we want, why not take a step back and write a reusable abstraction layer? The end result, without giving too much away, looks roughly like this:
https://gist.github.com/scogle/cab5a3894a32a3122058
The advantage of this over D3’s built-in nest() method is that it will recursively construct n-depth trees from flat data, as long as the relationships are consistently expressed. This takes care of one of the major time sinks in creating these visualizations, the data conversion, in a reusable and flexible way. But why stop there? We created abstractions for various visualizations as well. Now, creating an icicle graph (like the one above) is as simple as:
https://gist.github.com/scogle/7752abf671003cae5469
This is still in the early stages of development so it’ll be some time before we release it to the public, but stay tuned.
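In the meantime, for readers who want to experiment, the flat-to-nested conversion described above can be sketched in a few lines. This is an independent illustration in Python, not our library: it assumes the flat query results have already been reduced to (parent, child) pairs, that the relationships contain no cycles, and that the target is the name/children nesting used by the flare-style examples. The function name build_tree and the sample rows are made up.

```python
import json


def build_tree(rows, root):
    """Nest flat (parent, child) rows into {"name": ..., "children": [...]}
    dictionaries, to any depth. Assumes the relationships form a tree."""
    children_of = {}
    for parent, child in rows:
        children_of.setdefault(parent, []).append(child)

    def node(name):
        kids = children_of.get(name, [])
        if not kids:
            return {"name": name}  # leaf: no "children" key
        return {"name": name, "children": [node(kid) for kid in kids]}

    return node(root)


# Hypothetical flat rows, e.g. one row per subclass relationship.
rows = [
    ("Vehicle", "Car"),
    ("Vehicle", "Truck"),
    ("Car", "Sedan"),
]

tree = build_tree(rows, "Vehicle")
print(json.dumps(tree, indent=2))
```

Indexing the rows by parent first keeps the recursion simple: each node just looks up its own children, so the depth of the tree never has to be known in advance.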