User:Ottomachin/Real world modelling and OWL

At a recent meeting the workshop leader reminded us that our primary task was "to create a model of the real world in OWL". This task statement can be decomposed into two activities:

  • model the real world
  • express the model in the Web Ontology Language, OWL (although OWL 2 now seems to have become a "virtual" language, insofar as it has several syntaxes, and their number seems to be growing!).

The purpose of this memo is to show that there is:

  • a preferred method for creating models of the real world, i.e. ER (Entity-Relationship) modelling (now subsumed in UML)
  • a preferred method for publication of these models in OWL, i.e. systematic translation.

Apart from simple common sense, another justification for decomposing the primary task into two subtasks can be found in the Protégé user documentation:

"We assume you already know what you are doing and have a list of the main elements of the ontology written down in pencil or sketched in a text editor, mind-map, concept map or whatever tool you prefer. (Of course you have to have done the thinking first. This is just how to get the results of thinking into Protégé as easily as possible)". [1]

This statement, as well as confirming the two subtasks of modelling and conversion to OWL, also hints at other dimensions of them: for the first, we should probably use some tool; for the second, at least one requirement is that it can be done easily.

Modelling

Greek Philosophy

Modelling the real world is a well-known and well-understood activity. Since at least the time of the ancient Greek philosophers there has been an understanding of ontology (aspects of being) and epistemology (aspects of knowledge), and of their application to understanding the real world. Plato (born c. 428 BC) described the:

"... apprehension of the unchanging form (ie. types and properties) and the relationships between them".[2]

His student Aristotle (born 384 BC), perhaps the most famous of all ancient Greek philosophers and the father of logic, is known for his categories of being and his considerations of the "bare particulars" (objects in themselves) and their "inherence relations" (their properties and relationships).[3] In various ontological theories of the "categories of being" it can be seen that the common core categories are category, property and relationship.[4] Aristotle was also aware of questions such as whether or not an object should be considered as just the totality of its attributes and relationships (i.e. "bundle theory", which, while perhaps not philosophically convincing, does have some utility in the actual practice of modelling).

Modern Philosophy

A modern philosophical dictionary[5] summarises these philosophical concepts:

"In its weakest sense, the word object is the most all-purpose of nouns, and can replace a noun in any sentence at all ... Thus objects are things as diverse as the pyramids, Alpha Centauri, the number seven, my belief in predestination, and your mother's fear of dogs."

This is very close to a common modern definition of an entity: a "Person, Place, Event, Concept or Thing". The dictionary goes on to quote a more modern philosopher, Charles S. Peirce, who succinctly defines the broad notion of an object as follows:

"By an object, I mean anything that we can think, i.e. anything we can talk about."

The dictionary then concludes with a key summary:

"In a more restricted sense, an object is something that can have properties and bear relations to other objects."

Entity, Attribute and Relationship

In other words, the world consists of entities, and these entities may have attributes and may participate in relationships. The consequence of all this philosophy is that we can be quite confident that, if we wish to create a description or model of the real world, we can do so using just these three concepts:

  • entity (class, category, type, object)
  • attribute (property, quality)
  • relationship (relation, association)
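As a minimal illustration, the three concepts can be written down as plain Python data structures. The class names `Entity`, `Attribute` and `Relationship` and the PERSON/ORGANISATION example are hypothetical, chosen only to show that nothing beyond these three concepts is needed to record a simple model; they are not taken from any ER or UML tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """A property or quality of an entity, e.g. a person's name."""
    name: str
    datatype: str  # e.g. "string", "date"


@dataclass
class Entity:
    """A type of thing in the world (class, category, type)."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    """A named association from one entity type to another."""
    name: str
    source: Entity
    target: Entity


def describe(entities, relationships):
    """Render the model as short, readable statements."""
    lines = []
    for entity in entities:
        names = ", ".join(a.name for a in entity.attributes)
        lines.append(f"{entity.name} has attributes: {names}")
    for rel in relationships:
        lines.append(f"{rel.source.name} {rel.name} {rel.target.name}")
    return lines


# A tiny, hypothetical model of part of the real world.
person = Entity("PERSON", [Attribute("name", "string"), Attribute("birth_date", "date")])
organisation = Entity("ORGANISATION", [Attribute("name", "string")])
employed_by = Relationship("employed_by", person, organisation)

for line in describe([person, organisation], [employed_by]):
    print(line)
```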

In fact, if we should embark upon modelling the world using some other technique, we must answer the question: what is the philosophical and theoretical basis for that technique? If there is no cogent answer, then we must accept that we are being arbitrary and ad hoc in our modelling, i.e. we are just making it up as we go along.

It is a common refrain in ontological texts that: "Anyone can say anything about anything".

But just because we can, doesn't mean we should.

Data Modelling

As the field of "Data Processing" evolved from the mid-20th century, software engineers realised that functional requirements could be categorised as either "data requirements" or "process requirements". Analysts of data requirements eventually realised that the data should in fact constitute a structural picture of the business, and that the technological details of implementation should be subordinate to the need for realism in this picture. This culminated in 1976 with the seminal paper[6] by Peter Chen, the father of Entity-Relationship modelling, which describes the goal of creating models of the real world and arrives at the same fundamental concepts as the philosophers who had gone before him:

"A data model, called the entity-relationship model, is proposed. This model incorporates some of the important semantic information about the real world." ...

"The entity-relationship model adopts the more natural view that the real world consists of entities and relationships."


The paper then contrasts this with the data modelling methodologies common at the time:

"The data structure diagram is a representation of the organisation of records and is not an exact representation of entities and relationships."

Other authors had a similar programme, e.g. Kent:[7]

"One thing we ought to have clear in our minds at the outset of a modelling endeavour is whether we are intent on describing a portion of "reality" (some human enterprise) or a data processing activity." ...

"Most models describe data processing activities, not human enterprises." ...

"They pretend to describe entity types, but the vocabulary is from data processing: fields, data items, values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead techniques for locating records in files." ...

"Failure to make the distinction leads to confusion regarding the roles of symbols ...".

Comparison of the concepts from the various domains of knowledge representation shows the synonymy amongst the terms used:

Domain         | Type    | Object Property | Data Property
Epistemology   | object  | relation        | quality
Grammar        | noun    | verb            | adjective
ER modelling   | entity  | relationship    | attribute
UML            | class   | association     | attribute
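Read from left to right, each row of the table suggests a translation rule: entity to `owl:Class`, relationship to `owl:ObjectProperty`, attribute to `owl:DatatypeProperty`. The following is a minimal sketch of such a rule set, emitting Turtle text. The namespace `http://example.org/model#`, the dictionary-based input format and the `Entity_attribute` naming scheme are assumptions for illustration, not the behaviour of any particular translator.

```python
# Hypothetical namespace; a real ontology would use its own IRI.
PREFIXES = """@prefix : <http://example.org/model#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def to_turtle(entities, relationships):
    """Translate an ER model into OWL statements in Turtle syntax.

    entities: {entity name: {attribute name: xsd datatype}}
    relationships: [(relationship name, source entity, target entity)]
    """
    out = [PREFIXES]
    for entity, attributes in entities.items():
        # ER entity / UML class -> owl:Class
        out.append(f":{entity} a owl:Class .")
        for attr, xsd_type in attributes.items():
            # ER attribute / UML attribute -> owl:DatatypeProperty
            out.append(f":{entity}_{attr} a owl:DatatypeProperty ; "
                       f"rdfs:domain :{entity} ; rdfs:range xsd:{xsd_type} .")
    for name, source, target in relationships:
        # ER relationship / UML association -> owl:ObjectProperty
        out.append(f":{name} a owl:ObjectProperty ; "
                   f"rdfs:domain :{source} ; rdfs:range :{target} .")
    return "\n".join(out)


model = {"Person": {"name": "string"}, "Organisation": {"name": "string"}}
print(to_turtle(model, [("employedBy", "Person", "Organisation")]))
```

Prefixing each datatype property with its entity name keeps `Person_name` and `Organisation_name` distinct; a translator could instead share one `name` property with a union domain, but that is a modelling decision rather than a mechanical one.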

It is thus clear that the ER modelling technique is purpose-built for modelling the real world. We are also fortunate that a wealth of supporting training, experience, best practice and academic literature has been built up over the last three decades, not to mention millennia of philosophical consideration. The technique is sound: unlike OWL, for instance, its usage has no remaining semantic paradoxes.

Modelling Tools

Nor are we limited to pencils and the backs of envelopes as modelling tools. Many ER tools are available, although it is important that the tool is a proper ER tool and not just a record modelling tool in disguise (ERwin being a notable failure in this regard). It can also be shown that a small subset of the many UML class diagram elements is sufficient to be completely analogous to ER diagrams, so the new generation of UML CASE (Computer-Aided Software Engineering) tools can be used in place of ER CASE tools. In describing the purpose of ontologies, the Protégé user documentation[8] says:

"Why would someone want to develop an ontology? Some of the reasons are: to share common understanding of the structure of information among people or software agents" ... "Sharing common understanding of the structure of information among people or software agents is one of the more common goals in developing ontologies (Musen 1992; Gruber 1993)."

The UML class diagrams are exactly "structural" in nature with "information" as their focus. The article continues to give a description "in practical terms" of what the steps are in the development of an ontology, which could just as easily be instructions for building an ER or UML model:

"In practical terms, developing an ontology includes:

  • defining classes in the ontology,
  • arranging the classes in a taxonomic (subclass–superclass) hierarchy,
  • defining slots and describing allowed values for these slots"


The subsequent steps in the article, however, relate more closely to entering instance data into a database.
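The three practical steps quoted above can be sketched in the same style. The sketch below uses hypothetical names loosely echoing the Protégé pizza tutorial (the tutorial itself models spiciness differently); it defines classes, arranges them with `rdfs:subClassOf`, and declares a slot whose allowed values are, by assumption, an enumerated class (`owl:oneOf`). It also checks that the taxonomy has no cycles, since in OWL a subclass cycle makes every class in it equivalent, which is rarely intended.

```python
# Hypothetical classes and slot, loosely echoing the Protégé pizza tutorial.
PREFIXES = """@prefix : <http://example.org/pizza#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""

# Step 2 input: class -> direct superclass (None for a root class).
SUPERCLASS = {"Pizza": None, "NamedPizza": "Pizza", "Margherita": "NamedPizza"}

# Step 3 input: slot -> (domain class, allowed values).
SLOTS = {"hasSpiciness": ("Pizza", ["Mild", "Medium", "Hot"])}


def check_acyclic(superclass):
    """Raise ValueError if following superclass links ever revisits a class."""
    for start in superclass:
        seen, cls = set(), start
        while cls is not None:
            if cls in seen:
                raise ValueError(f"subclass cycle involving {cls}")
            seen.add(cls)
            cls = superclass.get(cls)
    return True


def to_turtle(superclass, slots):
    lines = [PREFIXES]
    for cls, parent in superclass.items():
        lines.append(f":{cls} a owl:Class .")  # step 1: define the class
        if parent is not None:
            lines.append(f":{cls} rdfs:subClassOf :{parent} .")  # step 2: taxonomy
    for slot, (domain, values) in slots.items():
        # Step 3: the allowed values become an enumerated class of individuals.
        members = " ".join(f":{v}" for v in values)
        lines.append(f":{slot} a owl:ObjectProperty ; rdfs:domain :{domain} ; "
                     f"rdfs:range [ a owl:Class ; owl:oneOf ( {members} ) ] .")
    return "\n".join(lines)


check_acyclic(SUPERCLASS)
print(to_turtle(SUPERCLASS, SLOTS))
```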

The OWL 2 Primer[9] also supports, though grudgingly, the idea that in many cases an ontology is very closely related to a data model:

"OWL 2 is not a database framework. Admittedly, OWL 2 documents store information and so do databases. Moreover a certain analogy between assertional information and database content as well as terminological information and database schemata can be drawn ... Still, technically, databases provide a viable backbone in many ontology-oriented systems."

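The analogy drawn in the primer can be made concrete by splitting a set of statements into terminological ones (comparable to a database schema) and assertional ones (comparable to table rows). The sketch below uses a deliberately simplified, assumed rule based on a handful of RDFS/OWL vocabulary terms; an OWL 2 toolkit classifies axioms using the full OWL 2 structural specification, not a short list like this one.

```python
# Simplified, assumed rule: a few schema-level predicates and types mark a
# statement as terminological; everything else is treated as assertional.
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range",
                     "owl:equivalentClass", "owl:inverseOf"}
SCHEMA_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}


def split_tbox_abox(triples):
    """Partition (subject, predicate, object) triples into (tbox, abox)."""
    tbox, abox = [], []
    for s, p, o in triples:
        is_schema = p in SCHEMA_PREDICATES or (p == "rdf:type" and o in SCHEMA_TYPES)
        (tbox if is_schema else abox).append((s, p, o))
    return tbox, abox


triples = [
    (":Person", "rdf:type", "owl:Class"),           # cf. CREATE TABLE person
    (":name", "rdf:type", "owl:DatatypeProperty"),  # cf. a column definition
    (":name", "rdfs:domain", ":Person"),
    (":jill", "rdf:type", ":Person"),               # cf. INSERT INTO person
    (":jill", ":name", '"Jill"'),
]
tbox, abox = split_tbox_abox(triples)
print(len(tbox), "terminological,", len(abox), "assertional")  # 3 terminological, 2 assertional
```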
The analogy with data models is even stronger in our primary field of interest, interoperability, where our key objective is to discover and relate together various data structures. The benefits of visual modelling tools have also been known for many years; one of the primary benefits of ER modelling has always been its visual nature:

"The utility of a visual syntax for modelling languages has been shown in practice and visual modelling paradigms such as the Entity Relationship (ER) model or the Unified Modelling Language (UML) are used frequently for the purpose of conceptual modeling. Consequently the necessity of a visual syntax for knowledge representation (KR) languages has been argued frequently in the past."[10]

Translation

Once we have a model, the next step is to create a representation of it in OWL. As previously mentioned, we would like this process to be as easy as possible (amongst other requirements!):

"The semantic web and web service take ontology into usage to describe the important concepts and relations among them. But the construction of ontology from scratch is costly and difficult. In this paper an approach is proposed to construct OWL ontology from XML document with the help of entity-relation model, and this approach will alleviate the difficulties in ontology construction."[11]

Another paper[12] also describes the need for a model and for automatic translation of models into OWL:

"Effective collaboration among customer and team members is essential for the creation of the correct ontology model. Equally necessary is a mechanism to automatically transform this model into ontology script."

The paper then describes some of the other benefits of automatic translation of models into OWL:

"This paper describes a new methodology, “Model Driven Ontology”, in which using a standard modeling activity as a key process ... This would lead to a consistent ontology model validated and approved by all members ... is then systematically transformed to a formal ontology, facilitating the development of enterprise-wide information exchange and sharing, which can be uniformly developed, centrally maintained, and efficiently reused ..."

It has long been obvious that most software engineering processes should, wherever possible, be automated and "untouched by human hand"; "Model Driven" architecture, design and programming are all in vogue. The benefits of automation are well understood. Automated processes are:

  • manageable
  • auditable
  • reversible
  • repairable
  • repeatable
  • shareable
  • efficient
  • standardised
  • self-documenting
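Several of these properties, in particular repeatable and auditable, follow directly from making the translation a deterministic function of the model. As a minimal sketch, assuming the translator's output is a list of (subject, predicate, object) string tuples: writing the statements in a canonical sorted order means the same model always yields byte-identical output, so a SHA-256 digest of that output can be recorded and compared between runs. The one-statement-per-line format and the use of a digest are illustrative choices, not features of any tool cited here.

```python
import hashlib


def serialise(triples):
    """Emit one statement per line in a canonical (sorted, de-duplicated) order."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(set(triples))) + "\n"


def fingerprint(triples):
    """SHA-256 digest of the canonical serialisation, for audit records."""
    return hashlib.sha256(serialise(triples).encode("utf-8")).hexdigest()


run_1 = [(":Person", "rdf:type", "owl:Class"), (":Org", "rdf:type", "owl:Class")]
run_2 = list(reversed(run_1))  # same model, statements produced in another order

assert serialise(run_1) == serialise(run_2)
print(fingerprint(run_1) == fingerprint(run_2))  # True
```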

The following figure[13] shows a categorisation of various approaches to the automatic generation of ontologies:

Figure 1 Classification of database-to-ontology mapping approaches.

The same source also surveys some tools:

Figure 2 Features of different database-to-ontology mapping tools

Note that DataGenie has now been superseded by DataMaster.[14][15] Note also that the focus of this memo is the creation of new ontologies rather than mapping to existing ones.

Methods

There are several options for creating ontologies from UML models. The first consideration is the difference between ontology Classes and Individuals, which is analogous to the difference between table definitions and row instances in a relational database. One immediately obvious approach, then, is to generate a relational database from a UML model and then translate from the database into OWL. This approach would also allow any required instance data to be inserted into the relational tables and then exported into OWL. The D2R tool[16] supports the export of both tables and rows from a relational database into OWL classes and OWL individuals. This tool creates proper OWL classes, whereas DataMaster and some other tools[17] create definitions of tables and columns wrapped in RDF/XML, usually in Relational.OWL.

Figure 3 Schema represented in OWL and Relational.OWL

Although obviously useful for certain purposes, Relational.OWL is not what we want here, as it models a database rather than the domain, i.e. it is not a model of the real world.
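By contrast, the kind of output described above for D2R (tables as proper OWL classes, rows as individuals) can be sketched with Python's built-in `sqlite3` module. This is a simplified assumption of the general idea, not D2R's mapping language or output: each table becomes an `owl:Class`, each non-key column a datatype property, and each row an individual whose IRI is built from the primary key. The `country` table is hypothetical.

```python
import json
import sqlite3


def table_to_turtle(conn, table, key):
    """Schema -> class and properties; rows -> individuals.

    Table and column names are interpolated into SQL directly, so this is
    only suitable for trusted, known identifiers.
    """
    cols = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    cls = table.capitalize()
    lines = [f":{cls} a owl:Class ."]
    for col in cols:
        if col != key:
            lines.append(f":{col} a owl:DatatypeProperty ; rdfs:domain :{cls} .")
    query = f"SELECT {', '.join(cols)} FROM {table} ORDER BY {key}"
    for row in conn.execute(query):
        values = dict(zip(cols, row))
        subject = f":{table}_{values[key]}"
        lines.append(f"{subject} a owl:NamedIndividual , :{cls} .")
        for col in cols:
            if col != key:
                # json.dumps quotes and escapes typical text values acceptably for Turtle.
                literal = json.dumps(str(values[col]), ensure_ascii=False)
                lines.append(f"{subject} :{col} {literal} .")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO country VALUES (?, ?)", [("FR", "France"), ("DE", "Germany")])
print(table_to_turtle(conn, "country", "code"))
```

The classes come from the schema and the individuals from the data, mirroring the table/row analogy; whether the result models the domain still depends on whether the database schema did.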

It is also possible to create OWL ontologies directly from a UML model. The obvious path is to export the UML model to an XMI file and have an automatic translator read the XMI and generate the OWL ontology. Because XMI is a standard, this should give the translator a measure of independence from the UML CASE tool used; however, it is well known that vendors' adherence to the XMI standard is by no means perfect. Many groups are working on tools for generating OWL from UML.[18][19][20][21] The OMG (Object Management Group) has also recently published the Ontology Definition Metamodel (ODM), a set of standards, metamodels, profiles and XMI files to support UML to OWL mapping.[22] An example[23] using the ODM is available on the Eclipse site.
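A minimal sketch of the XMI path, using Python's standard `xml.etree.ElementTree`. The XMI fragment is hand-written and simplified, with namespace URIs of the kind used by recent XMI/UML exports; real exports are larger and differ between CASE tools and XMI versions, which is exactly the standardisation problem noted above. Here each `uml:Class` element becomes an `owl:Class` and each `generalization` becomes `rdfs:subClassOf`.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/spec/XMI/20131001"

# Hand-written, simplified XMI; real CASE-tool exports are larger and vary.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001"
         xmlns:uml="http://www.omg.org/spec/UML/20131001">
  <uml:Model name="Example">
    <packagedElement xmi:type="uml:Class" xmi:id="c1" name="Person"/>
    <packagedElement xmi:type="uml:Class" xmi:id="c2" name="Employee">
      <generalization xmi:id="g1" general="c1"/>
    </packagedElement>
  </uml:Model>
</xmi:XMI>"""


def xmi_to_turtle(xmi_text):
    """Map uml:Class elements to owl:Class and generalizations to rdfs:subClassOf."""
    root = ET.fromstring(xmi_text)
    xmi_type, xmi_id = f"{{{XMI_NS}}}type", f"{{{XMI_NS}}}id"
    # Compares the literal "uml:" prefix; a robust reader would resolve it.
    classes = [el for el in root.iter("packagedElement")
               if el.get(xmi_type) == "uml:Class"]
    name_by_id = {el.get(xmi_id): el.get("name") for el in classes}
    lines = []
    for el in classes:
        name = el.get("name")
        lines.append(f":{name} a owl:Class .")
        for gen in el.findall("generalization"):
            parent = name_by_id[gen.get("general")]
            lines.append(f":{name} rdfs:subClassOf :{parent} .")
    return "\n".join(lines)


print(xmi_to_turtle(SAMPLE_XMI))
```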

UML Class models translate well to ontology Classes, but if individuals are also required then additional object modelling may be necessary. An alternative hybrid approach, where instance data is sourced from other datastores, e.g. spreadsheets or databases, would not be difficult and would probably be preferable.

Fortunately, we would not in general expect to need much instance data: we would certainly expect to model a class for PERSON, but it is rather unlikely that we would also wish to model Jack, Jill, Tom, Dick and Harry. Any instance data of interest is most likely to be "lookup data", such as "reference data" (e.g. country_codes, currency_codes) or perhaps "master data" (e.g. product_code, department_code), and rarely, if at all, "transaction data". Lookup data occurs in much lower volumes than transaction data. If transaction-level data is required, it will almost certainly be sourced from operational databases and would require migration rather than modelling.
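The hybrid approach for lookup data can be sketched as follows, assuming the reference data arrives as a CSV export from a spreadsheet. The class name `Currency`, the column names `code` and `name`, and the `Class_code` IRI scheme are hypothetical; the point is that the class comes from the model while its individuals come from the external datastore.

```python
import csv
import io
import json


def lookup_individuals(csv_file, cls, key_column, label_column):
    """Turn each row of a lookup table into a labelled individual of `cls`."""
    lines = []
    for row in csv.DictReader(csv_file):
        code = row[key_column].strip()
        # json.dumps supplies quoting and escaping suitable for typical labels.
        label = json.dumps(row[label_column].strip(), ensure_ascii=False)
        lines.append(f":{cls}_{code} a owl:NamedIndividual , :{cls} ; rdfs:label {label} .")
    return "\n".join(lines)


# Stand-in for a spreadsheet export of reference data.
sheet = io.StringIO("code,name\nEUR,Euro\nGBP,Pound sterling\n")
print(lookup_individuals(sheet, "Currency", "code", "name"))
```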

It is worth noting that the JC3IEDM (Joint Consultation, Command and Control Information Exchange Data Model), an ER model, has recently been translated into OWL. Not only is this an example of exactly the kind of task that will become more common in the future, but there should also be lessons learned, and even some technology, arising from this project.

Conclusion

The purpose of this article has been to show that there is:

  • a preferred method for creating models of the real world, i.e. ER (Entity-Relationship) modelling
  • a preferred method for representation of these models in OWL, i.e. automated and systematic translation

To create correct models, a proven technique must be used. To avoid the errors and idiosyncrasies that arise from hand coding, and to achieve standardised, auditable, repeatable, reversible and repairable ontologies efficiently, a systematic, automated approach is necessary.

 
no hands ma!

References

  1. http://protegewiki.stanford.edu/index.php/Protege4Pizzas10Minutes
  2. http://en.wikipedia.org/wiki/Plato
  3. http://en.wikipedia.org/wiki/Aristotle
  4. http://en.wikipedia.org/wiki/Category_of_being
  5. http://www.statemaster.com/encyclopedia/Object-%28philosophy%29
  6. Chen, Peter Pin-Shan. "The Entity-Relationship Model - Toward a Unified View of Data". ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, pp. 9-36.
  7. Kent, William. Data and Reality. North Holland, 1978.
  8. http://protege.stanford.edu/publications/ontology_development/ontology101.pdf
  9. http://www.w3.org/TR/2009/REC-owl2-primer-20091027/
  10. Brockmans, Sara; Volz, Raphael; Eberhart, Andreas; Löffler, Peter. Visual Modeling of OWL DL Ontologies Using UML. http://www.aifb.uni-karlsruhe.de/WBS/sbr/publications/iswc04%20sbr.pdf
  11. http://www.computer.org/portal/web/csdl/doi?doc=doi/10.1109/CIS.Workshops.2007.139
  12. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-440/paper14.pdf
  13. Ghawi, Raji; Cullot, Nadine. Database-to-Ontology Mapping Generation for Semantic Interoperability. http://vision.u-bourgogne.fr/Le2i/user_data/publications/InterDB07-Ghawi.pdf
  14. http://protege.stanford.edu/conference/2007/presentations/10.01_Nyulas.pdf
  15. http://protegewiki.stanford.edu/index.php/DataMaster
  16. http://www4.wiwiss.fu-berlin.de/bizer/d2r-server
  17. http://www.crpit.com/confpapers/CRPITV43deLaborda.pdf
  18. http://wiki.cimtool.org/UMLOWL.html
  19. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1344798
  20. http://vision.u-bourgogne.fr/Le2i/user_data/publications/InterDB07-Ghawi.pdf
  21. http://umlowlgen.com
  22. OMG. Ontology Definition Metamodel (ODM), Version 1.0. May 2009.
  23. http://www.eclipse.org/m2m/atl/usecases/ODMImplementation