Team Building – The Importance of an Ontology

Effective communication is something that many in the C-suite aspire to have in their organization. But what does this really entail in practice? At one level, it is about the high-level aspirations of the organization, the expectations of clients, the hopes of investors, and so on. At another level, deeper into the workings of the enterprise, it is about clear messaging. It is about ensuring that effort is not duplicated because of a misunderstanding. It is about being able to re-use information assets effectively within different lines of business. It is about analyses of data reaching the same conclusion, and about that conclusion not being open to challenge because the wrong analytical procedure was used.

There are many good blogs and papers on this topic. They keep cropping up regularly, so the topic clearly has not been solved. One example, https://blog.landscapeprofessionals.org/team-building-the-importance-of-a-shared-vocabulary/, could have been in any discipline but happens to concern landscaping. It clearly expresses the need for a ‘shared vocabulary’ and a ‘standardized terminology’. One sentence really stands out for me: “Having a consistent language when interacting with clients creates a cohesive experience for them no matter who they are speaking to in your organization.” The “cohesive experience” is what it is all about – that warm feeling when you know you’re on the same wavelength as somebody else. From that, they can intuit your next move, show empathy, and so on.

In writing “Is An Upper Ontology Useful?” [https://ceur-ws.org/Vol-3661/10-SHORTPWinstanley-ISKOUK2023.pdf] I specifically called out the utility of an upper ontology as a vocabulary in common – just like the shared vocabulary that is desired among landscape professionals and many others. But is an upper ontology something too technical to bring a cohesive experience to a company like yours? Not really. The process of bringing an ontology into an enterprise is a very social one. People must get together to work out what types of ‘thing’ they have in their business, what words get used across all parts of the business and client base to refer to these ‘things’ in a consistent manner, and how they can be defined in a non-circular way. The experience of building an ontology develops cohesion. Sometimes the development of an ontology exposes pre-existing fracture lines within the staff. But using an RDF and OWL ontology together with IRIs as identifiers provides a mechanism to handle ambiguity gracefully. Gone are the days when people were kept in a room until one “winner” succeeded in getting their way. The move from using labels to using namespace-based IRIs as identifiers of the concepts in a shared vocabulary (“strings to things”) gives your organization considerable flexibility to handle variation of expression while still showing the shared meaning, and to separate out the meanings of concepts where labels alone might cause confusion because multiple concepts share the same label.
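
As a small illustration of “strings to things”, the sketch below shows the pattern in Turtle. The ex: namespace and the concept IRIs are hypothetical, chosen only to demonstrate the idea: one IRI can carry several labels (variation of expression with shared meaning), and two IRIs can share a label without their meanings being conflated.

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.com/vocab/> .

# one 'thing', several strings
ex:_Client
    skos:prefLabel "client"@en ;
    skos:altLabel  "customer"@en, "account holder"@en .

# two different 'things' that happen to share the label "bank"
ex:_FinancialInstitution  skos:prefLabel "bank"@en .
ex:_RiverBank             skos:prefLabel "bank"@en .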

If you want effective communication in your enterprise, and you feel the need for a shared vocabulary, then implement an ontology. Yes, it’s for information integration, and that means something involving computers. But that is not the only aspect of implementing an ontology. The process itself is helpful for team building, and the output is an excellent way to bring that cohesive communication experience to your business and your clients.

From Labels to Verbs – Child’s Play!

Watching a child acquire their first language skills is nothing like acquiring a new language. The 2-year-old is learning vocalisation, and also ensuring that they get the correct labels for the things around them. “Mummy”, “Daddy”, “cat”, “car”, “sea” and so on. Often this is done with gestures and pointing. The whole person is involved in enunciating and conveying meaning as part of a dialogue. Others in the child’s environment will be encouraging them, but the child, too, will be observing and understanding at a level way beyond their ability to join in – at least for a month or two. 

Over time, labels get replaced by phrases. Verbs and adjectives come into the mix. The degree of sophistication improves both the communication and the ability to show that something communicated to the child has been understood as intended.

Are there any lessons from the child acquiring language that enterprises can learn from as they acquire semantic skills, assets and technologies? I think there are. I see that enterprises follow a somewhat similar path to the child in acquiring semantic skills – tending to start with simple collections of labels (controlled vocabularies and simple taxonomies) before venturing into using more complex information structures with verbs and adjectives. (These come from ontologies, which provide more scope for knowledge representation than controlled vocabularies and taxonomies do.)

Historically, this has been the case. We went well over two millennia from Plato’s Socrates “carving nature at its joints” to help us understand what ‘things’ there are in our world, to William A. Woods writing “What’s in a link?”.

Semantic Arts has developed a core, upper ontology that carves enterprise information at its seams. This has been described in many ways, most recently in a form like the Mendeleev periodic table of elements. But where are the links? Are we still persuading enterprises to use semantics in the way a child acquires language, rather than in the way someone who already has language skills learns a new one? Do people in business and industry see their information assets as telling a story? Have they the understanding that information ‘bricks’ can be organised to build a variety of information stories in the same way that a set of Lego/Duplo bricks can be organised into the shape of a house, or a boat, or a space rocket? It is the links, the predicates in the RDF model, that help bring together instances of classes into a phrasal structure that is as simple as a 3-year-old’s language constructs.

Let’s use the remainder of this post just to examine the vocabulary of Semantic Arts’ ‘gist’ ontology from the perspective of the links – those predicates on which predicate logic is based.

‘gist’ version 13 has 63 object properties. These are the property type that relates one ‘thing’ to another ‘thing’. There are also 50 data properties, the type that relates a ‘thing’ to a ‘string’ of some sort. We know that the ‘string’ could be assigned to a ‘thing’ type by using xsd:anyURI, but let’s leave that for now. 
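
If you want to check these counts against whichever release of ‘gist’ you have loaded, a query along the lines of the one below will do it. This is a minimal sketch and assumes that the gistCore ontology file has been loaded into the default graph of your triplestore.

# count the declared object and datatype properties
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?propertyType (COUNT(DISTINCT ?p) AS ?n)
WHERE {
  VALUES ?propertyType { owl:ObjectProperty owl:DatatypeProperty }
  ?p a ?propertyType .
}
GROUP BY ?propertyType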

There are 3 object properties where only the domain (the origin of the relationship ‘arrow’) is specified in the ontology:

gist:owns
gist:providesOrderFor
gist:isAbout

and 14 where the range alone (the class at the pointy end of the relationship ‘arrow’) is specified:

gist:hasAccuracy 
gist:hasParty 
gist:hasPhysicalLocation 
gist:comesFromAgent 
gist:hasAspect 
gist:isIdentifiedBy 
gist:isAllocatedBy
gist:isMadeUpOf 
gist:comesFromPlace 
gist:goesToPlace 
gist:hasMagnitude 
gist:isRecognizedBy 
gist:hasAddress 
gist:goesToAgent

There are only 6 object properties where the ontology ‘constrains’ us to a specific set of both domain and range classes:

gist:isGeoContainedIn 
gist:hasUnitOfMeasure 
gist:prevents
gist:hasUnitGroup 
gist:hasBiologicalParent 
gist:isFirstMemberOf

This leaves 40 object properties that are a little more flexible in their intended use; a query for checking each property’s declared domain and range is sketched after the list:

gist:isExpressedIn 
gist:isCategorizedBy 
gist:hasDirectBroader
gist:hasBroader 
gist:isRenderedOn 
gist:isGovernedBy 
gist:hasParticipant 
gist:hasGiver 
gist:precedesDirectly 
gist:requires 
gist:isPartOf 
gist:precedes 
gist:allows 
gist:isDirectPartOf
gist:contributesTo 
gist:hasMultiplier 
gist:isMemberOf 
gist:hasGoal 
gist:accepts 
gist:hasUniqueNavigationalParent
gist:hasNavigationalParent
gist:isTriggeredBy 
gist:links 
gist:isBasedOn 
gist:isConnectedTo 
gist:hasRecipient 
gist:occursIn 
gist:isAffectedBy
gist:refersTo 
gist:hasSubtrahend 
gist:hasDivisor 
gist:conformsTo 
gist:hasAddend 
gist:hasUniqueBroader
gist:produces 
gist:isUnderJurisdictionOf
gist:linksFrom
gist:offers
gist:hasIncumbent 
gist:linksTo

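To see these groupings for yourself rather than taking my word for it, the declared domains and ranges can be listed with a query like the one below. Again, this is a sketch that assumes gistCore has been loaded into the default graph.

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# list each object property with its declared domain and range, if any
SELECT ?p ?domain ?range
WHERE {
  ?p a owl:ObjectProperty .
  OPTIONAL { ?p rdfs:domain ?domain }
  OPTIONAL { ?p rdfs:range  ?range }
}
ORDER BY ?p
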
One of our observations is that the more one focuses on classes, the tighter one sticks to domains within the enterprise. These then follow the verticals of the periodic table. We are reinforcing the siloed view, a little. We keep interoperability, but we are looking at the enterprise from the perspective of business areas. But if we move to modelling with relationships, then we can think more openly about how similar patterns occur between the ‘things’ of the enterprise across different areas. It leads us to a much more abstract way of thinking about the enterprise, because these relationships will crop up all over the place. (This is one reason the ‘gist’ ontology does not specify domains or ranges for the majority of its properties: it increases flexibility and opens up more possibilities for patterns.)

For a bit of semantic fun, look in the real world for opportunities to use the gist properties as verbs. Go into work and think about how many types of ‘thing’ you can see in the enterprise where ‘thing A’ gist:produces ‘thing B’. See where you can find ‘thing X’ gist:hasGoal ‘thing Y’. Unlike the child we started this article with, you as the reader already know language, so you can use more than just labels and start making statements of a phrasal nature that use these ‘gist’ relationships; the property definitions, and a small example of using them, follow below.

gist:produces
    a owl:ObjectProperty ;
    rdfs:isDefinedBy <https://w3id.org/semanticarts/ontology/gistCore> ;
    skos:definition "The subject creates the object."^^xsd:string ;
    skos:example "A task produces a deliverable."^^xsd:string ;
    skos:prefLabel "produces"^^xsd:string .

gist:hasGoal
    a owl:ObjectProperty ;
    rdfs:isDefinedBy <https://w3id.org/semanticarts/ontology/gistCore> ;
    skos:definition "The reason for doing something"^^xsd:string ;
    skos:prefLabel "has goal"^^xsd:string .
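
Here is what such phrasal statements might look like in practice. This is a minimal sketch: the ex: namespace and the instance IRIs are hypothetical, and the gist: prefix is assumed to resolve to the namespace of whichever gist release you use; only the two properties themselves come from the ontology.

@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # namespace assumed; check your release
@prefix ex:   <https://example.com/data/> .

# 'thing A' gist:produces 'thing B'
ex:_Task_PackagingRedesign  gist:produces  ex:_Deliverable_NewLabelArtwork .

# 'thing X' gist:hasGoal 'thing Y'
ex:_Project_WinterCampaign  gist:hasGoal   ex:_Goal_MoreRepeatOrders .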

A Slice of Pi: Some Small Scale Experiments with Sensor Data and Graphs

There was a time when RDF and triplestores were only seen through the lens of massive data integration. Teams went to great lengths to show how many gazillion triples per second their latest development could ingest, and large integration projects did likewise with enormous datasets. This was entirely appropriate and resulted in the outstanding engineering achievements that we now rely on daily. However, for this post I would like to look at the other end of the spectrum – the ‘small is beautiful’ end – and I will relate a story about networking tiny, community-based air quality sensors, a Raspberry Pi, low-code orchestration software, and an in-memory triplestore. This is perhaps one of a few blog posts, we’ll have to see. It starts with looking at using graphs at the edge, rather than in the behemoth at the centre where this post started.

The proposition was to see whether the sensor data could be brought together through the Node-RED low-code/no-code framework, fed into RDFox, an in-memory triplestore, and then summarised periodically and pushed to a central, larger triplestore. Now, I’m not saying that this is the only way to do this – I’m sure many readers will have their own views on how it could be done – but I wanted to see if a minimalist approach was feasible. I also wanted to see if it was rapid and reproducible. Key to developing any broad IoT network is the ability to scale. What is also needed, though, is some sort of ‘semantic gateway’ where meaning can be added to what might otherwise be a very terse MQTT or similar feed.

So, let’s have a look for a sensor network to use. I found a great source of data from the Aberdeen Air Quality network (see https://www.airaberdeen.org/), led by Ian Watt as part of the Aberdeen ODI and Code The City activities. They are contributing to the global Sensor.Community (formerly Luftdaten) network of air quality sensors. Ian and his associated community have built a handful of small sensors that detect particulate matter in air, the PM10 and PM2.5 categories of particles. These are pollutants that lead to various lung conditions and are generated in exhaust from vehicles and other combustion and abrasion actions that are common in city environments. Details of how to construct the sensors are given in https://wiki.57north.org.uk/doku.php/projects/air_quality_monitor and https://sensor.community/en/sensors/. The sensors are all connected to the Sensor.Community (https://sensor.community/en/) project, from which their individual JSON data feeds can be polled by a REST call over HTTP(S). These sensors cost about 50 Euros to build, in stark contrast to the tens of thousands of Euros required for the large government air quality sensors that are the traditional and official sources of air quality information (http://www.scottishairquality.scot/latest/?la=aberdeen-city). And yet, despite the low cost of the devices, many studies, including those of Prof Rod Jones and colleagues at Cambridge University (https://www.ch.cam.ac.uk/group/atm/person/rlj1001), have found that a wide network of cheap sensors can provide reliable and useful data for air quality monitoring.
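
For orientation, a single poll of one of these per-sensor JSON feeds returns something roughly like the abbreviated example below. Treat the field names and values as illustrative assumptions rather than a specification; the exact payload varies by sensor and should be checked against the live feed. (In the Luftdaten convention, P1 is the PM10 reading and P2 is the PM2.5 reading.)

{
  "timestamp": "2021-02-11 14:05:07",
  "sensor": { "id": 12345, "sensor_type": { "name": "SDS011" } },
  "location": { "latitude": "57.14", "longitude": "-2.11" },
  "sensordatavalues": [
    { "value_type": "P1", "value": "11.30" },
    { "value_type": "P2", "value": "6.40" }
  ]
}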

So, now that we’ve mentioned Cambridge University we can go on to mention Oxford, and in particular RDFox, the in-memory triplestore and semantic reasoner from Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). In this initial work, we are not using the Datalog reasoning or rapid materialization that this triplestore affords; instead we stick with the simple addition of triples and the extraction of hourly digests of the data. In fact, RDFox is capable of far more than the scope of today’s exercise requires, and much of the process here could be streamlined and handled more elegantly. However, I chose not to make use of these capabilities for the sake of showing the simplicity of Node-RED. You might expect this to require a large server, but you’d be wrong – I managed all of it on a tiny Raspberry Pi 4 running the ARM version of Ubuntu. Thanks to Peter Crocker and Diana Marks of Oxford Semantic Technologies for help with the ARM version.

Next up is the glue: pipelining the raw JSON data from the individual sensors into a semantic gateway in which the data will be transformed into RDF triples using the Semantic Arts ‘gist’ ontology (https://www.semanticarts.com/gist/ and https://github.com/semanticarts/gist/releases). I chose to do this using a low-code/no-code solution called Node-RED (https://nodered.org/). This framework uses a GUI with pipeline components drawn onto a canvas and linked together by arrows (how very RDF). As the website says, “Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways. It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.” This is exactly what we need for this experiment in minimalism. Node-RED provides a wealth of functional modules for HTTP and MQTT calls, templating, decisions, debugging, and so forth. It makes it easier than traditional coding to pipeline together a suite of processes that acquire data, transform it, and then push it on to somewhere else. And this ‘somewhere else’ was both the local RPi running RDFox and the Node-RED service, and also a remote instance of RDFox, here operating with disk-based persistence.
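
To give a flavour of the ‘semantic gateway’ step, the fragment below shows the kind of Turtle such a template might emit for a single PM10 reading. It is a hypothetical sketch rather than the project’s actual mapping: the ex: IRIs, the modelling pattern, and the choice of gist properties are assumptions made for illustration.

@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .   # namespace assumed; check your release
@prefix ex:   <https://example.com/aq/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# one reading, linked to the sensor it comes from and categorized as PM10
ex:_Reading_12345_20210211T1405
    gist:comesFromAgent  ex:_Sensor_12345 ;
    gist:isCategorizedBy ex:_PM10 ;
    gist:hasMagnitude    ex:_Magnitude_12345_20210211T1405 .

# the measured value (the value property name is an assumption)
ex:_Magnitude_12345_20210211T1405
    gist:hasUnitOfMeasure ex:_MicrogramsPerCubicMetre ;
    gist:numericValue     "11.30"^^xsd:decimal .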

Following the threading together of a sequence of API calls to the air sensor endpoints, some templating to an RDF model based on ‘gist’, and uploading to RDFox over HTTP (Figs 1 & 2), the RDFox triplestore can then be queried using a SPARQL CONSTRUCT query to extract a summary of the readings for each sensor on an hourly basis. This summary includes the minimum and maximum readings within the hour for both particle categories (PM10 and PM2.5), for each of the sensors, together with the number of readings that were available within that hour. The summary is then uploaded to the remote RDFox instance (Figs 3 & 4), and that store becomes the source of the hourly information for dashboards and the like (Fig 5). Clearly, this approach can be scaled widely by simply adding more Raspberry Pi units.
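
A CONSTRUCT query of the following general shape produces that kind of digest. It is a sketch under the same assumptions as the fragment above (hypothetical ex: terms and modelling), not the exact query used here; a real version would also filter the readings to the previous hour.

PREFIX gist: <https://w3id.org/semanticarts/ns/ontology/gist/>
PREFIX ex:   <https://example.com/aq/>

CONSTRUCT {
  ?digest ex:forSensor    ?sensor ;
          ex:forCategory  ?category ;
          ex:minValue     ?min ;
          ex:maxValue     ?max ;
          ex:readingCount ?n .
}
WHERE {
  {
    SELECT ?sensor ?category (MIN(?v) AS ?min) (MAX(?v) AS ?max) (COUNT(?v) AS ?n)
    WHERE {
      ?reading gist:comesFromAgent  ?sensor ;
               gist:isCategorizedBy ?category ;
               gist:hasMagnitude    ?m .
      ?m gist:numericValue ?v .
      # a time filter on the reading would go here
    }
    GROUP BY ?sensor ?category
  }
  BIND(IRI(CONCAT(STR(?sensor), "/hourly-digest")) AS ?digest)
}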

The code is available from https://github.com/semanticarts/airquality

The experiment worked well. There were minor challenges in getting used to the Node-RED framework, and I personally had the challenge of swapping between the dot notation for navigating JSON within JavaScript and within the Node-RED Mustache templating system. In all, it took a couple of Friday-afternoon experiment sessions, time well spent on an enjoyable starter project.

Fig 1:  The ‘gather sensor data’ flow in Node-RED.

Fig 2: Merging the JSON data from the sensors into an RDF template using Mustache-style templating.

Fig 3: A simple flow to query RDFox on an hourly basis to get digests of the previous hour’s sensor data.

Fig 4: Part of the SPARQL ‘CONSTRUCT’ query that is used to create hourly digests of sensor data to send to the remote persisted RDFox instance.

Fig 5: Querying the RDFox store with SPARQL to see hourly maximum and minimum readings of air pollutants at each of the sensor sites.


DCAF 2021: Third Annual Data-Centric Architecture Forum Re-Cap

Written by Peter Winstanley

The Data-Centric Architecture Forum was a success!

If growth in participants is an indicator of the popularity of an idea, then the Third Annual Data-Centric Architecture Forum reflects a strong increase in popularity: this year over 230 participants joined together for a three-day conversation.  This is a huge increase on the 50 or so who braved the snows of Fort Collins last year.  Perhaps the fact that the meeting was online was a contributing factor, but then spending three days in online meetings with presentations, discussion forums, and discussions with vendors needs a distinct kind of stamina, as it misses out on the usual conviviality of a conference – meals together, deep discussions over beer or coffee, and thoughtful walks and engaging sightseeing trips.  This year’s Data-Centric Architecture Forum was a paradigm shift in itself.  The preparatory work of Matt Faye and colleagues at Authentric provided us with a virtual auditorium, meeting rooms, Q&A sessions, socialization venues, and vendor spaces.  The Semantic Arts team were well-rehearsed in this new environment, but it was reassuring to find that conference attendees soon became acquainted with the layout and, quite quickly, the conference was on a roll, with only the very infrequent glitch that was quickly sorted by Matt and team.

Paradigm shift was not only evident in the venue; it was also a central theme of the conference: the idea that we are on the cusp of a broad transformation of practice in informatics, particularly within the enterprise.  Dave McComb placed the Kuhnian idea of revolution squarely on the table as he commenced proceedings.  In many ways this is something we have become all too familiar with as the internet has given us hospitality companies with no hotels, taxi companies with no cars, and so on.  Here we are moving to there being applications with no built-in data store.  How can this possibly work?  It flies in the face of decades, perhaps centuries, of system design.  This Forum focused on architecture – the key elements necessary to implement a data-centric approach.

Some of the presentations covered the whole elephant, trunk to tail, whereas others focused on specific aspects.  I’ll take a meander through the key messages for me, but as ever in these sorts of reviews, there is no easy way to do justice to everyone’s contribution, and my focus may not be your focus.  However, given that the Forum was a ‘digital first’ production, you will be able to access the talks, slide decks and discussions yourself to make up your own mind – and I hope that you do.  A full set of the recorded presentations is available for purchase at the same price as admission.  They can be purchased here, or you can inquire further at [email protected]

Understanding that “The future is already here — it’s just not evenly distributed” means that we have to disentangle the world around us and sift out the ideas and implementations that show this future, and perhaps recognise early that the future may well arise from places where the technology isn’t the most sophisticated but the marketing is more advanced (think Betamax vs VHS).  As Mark Musen pointed out in “A Data-Centric Architecture for Ensuring the Quality of Scientific Data”, when given free rein people make a mess of adding metadata, and this can be remedied by designing minimal subsets that do a ‘good enough’ job.  Once a community realises that a satisficing minimum metadata set can deliver benefit in a domain, the model can be rolled out with similar good effect to other domains.  We know from Herbert Simon’s work that organisations naturally settle into this ‘satisficing’ mode of operation, and as Alan Morrison and Mark Ouska discussed in their presentation on lowering the barriers to entry, going with the organisational flow – ensuring that there is an organisational language to express the new ideas – is a key element in successful adoption.  How else are we to bring to market the range of technologies presented by the 13 vendors exhibiting at the Forum?  Their benefits need to be describable in user stories that have resonance in all enterprises, for this isn’t just a revolution for science or for engineering; just as Berners-Lee tweeted about both the Olympic Games and the World Wide Web at the start of the London Olympic Games, “This is for everyone”.

Being for everyone requires that the technologies, such as the handling of time and truth or the responses to events that are possible in modern triplestores, can be populated at scale with soundly created information assets.  The approach to “SemOps”, an automation ecosystem providing scalable support to people managing enterprise data assets in a data-centric modality, was the focus of the presentation by Wallace and Karii.  Being for everyone also means that information needs to be used across domains, and not just within the highly tailored channels that are typical of current application-centric architectures.  Jay Yu from Intuit and Dan Gshwend from Amgen, among others, showed their organisations’ paths to this generalised, cross-domain use of enterprise information, and the social dimension of this liberation of data across the enterprise was considered by Mike Pool and also by Laura Madsen, who both shared their experiences of governance in data-centric worlds.  Security was also covered, albeit later in a video presentation, by Rich Sinnott from Melbourne.

So, where are we at?  With attendance from North and South America, Europe and Oceania, the Forum showed us that there is global appeal to the ideas of data-centricity.  There is commercial activity by various scales of solution vendors and implementing enterprises.  There is also consideration, both within enterprises and by specialist consultants, of the human factors associated with the implementation and management of data-centric architectures.  However, there are still considerable challenges in the cross-domain implementation of data-centricity, and in the need to scale simultaneously not only the technical infrastructures and human skills, but also the involvement of individuals at a personal level in the management of their information and their active contribution of that information to the global web of data.  The news from the BBC on their work with Solid pods and other personal information stores gave the Forum an inkling of the scale of change that is about to hit us.  Let us hope that the Third Data-Centric Architecture Forum has played a catalytic role in this global transformation, and I hope to have many enjoyable discussions with readers as we evaluate progress on our journey at the next Forum in a year’s time.

Click here to purchase DCAF 2021 Presentation recordings.