Developing A Topic Map Programming Model


Table of Contents

Justification for Developing a Topic Map Programming Model
Reduced Developer Learning Curve
Application Portability
Enable Library Development
Relationship to TMQL
Requirements of a Topic Map Programming Model
Architecture Proposals
DOM Node / Node List based Architecture
Graph-based Architecture
XTM Conceptual Model Architecture
XTM Syntax-based Architecture
XTM Syntax-based Stream Architecture
Conclusions

Abstract

Topic maps provide a standard data model for describing complex, interconnected information. ISO/IEC 13250:2000 and XTM 1.0 provide standard serializations of that data model using SGML and XML syntax. We believe that, having addressed the problem of interchange, it is time for the topic map community to consider the development of a standard programming interface. Developing a standard programming model has the benefits of flattening the learning curve for developers and protecting the investment that businesses make in bespoke topic map application development.

This paper describes an attempt to develop a programming model for topic maps which meets the requirements of maximum simplicity, maximum practicality and minimal size. We explore possible means of representation including the use of the Document Object Model's Node and NodeList interfaces, an abstract graph-based representation and an object-oriented interface based on the XTM Conceptual Model. In addition, this paper considers the standard programmatic tasks of topic map parsing, serialization and traversal and suggests a possible programming model to enable these tasks to be carried out more efficiently.

Justification for Developing a Topic Map Programming Model

The topic map programming model embodies two major pieces of work: the development of a topic map data model, and the development of an object-oriented API for modern object-oriented languages such as Java, C++ and Python. In our opinion, it is the latter which is the important end-product of this process. Standard APIs are in general a Good Thing. The reasons why are manifold.

Reduced Developer Learning Curve

By introducing a standard API for topic maps, a developer can take her knowledge from one implementation of a topic map system to another without having to spend significant amounts of time learning a new API. Additionally, by standardising on a single API, training material and developer support communities need not be restricted to the vendors. This would make it possible for a much larger body of training material to be made available to the topic map "newbie". An extant example of this is the number of DOM API training courses which are available (a quick Google search lists 3,220 hits for the search string "DOM API Training").

Application Portability

For organisations investing in the development of topic map applications, the existence of a standard API provides a degree of protection for that investment. The current situation for topic map application developers is that moving between systems requires a complete rewrite of the customised code. So despite having invested in a portable format for information organisation and exchange, customers are still locked in to a particular vendor's implementation by the APIs used to develop their bespoke applications or application extensions.

Enable Library Development

A standard API for topic map access and manipulation would also allow developers to create higher-level application development libraries and tools which are portable across all systems implementing that API. This could enable the development of high-level toolkits such as standard indexing and querying APIs, toolkits for topic map generation from meta-data sources and topic map visualisation and navigation applications. Again, XML's SAX and DOM show the way, with applications ranging from transformation (XSLT) to content management applications (Cocoon and Zope) << More examples ??? >> all being created on top of these standard access and manipulation APIs.

Standard APIs should be seen as a necessary prerequisite for topic maps to be moved into widespread consumer applications such as web browsers and operating systems.

Relationship to TMQL

Topic Map Query Language (TMQL) is a proposed work item for both ISO and XTM. TMQL will provide a standardised language for topic map query and update, similar in scope to that of SQL for relational database systems. There is overlap between the purpose of TMQL and that of a standard topic map API in that both are attempting to define a standard means of data access and manipulation. The current TMQL proposal defines operations on topic maps which return topic maps as their 'result set'. To be useful to a client application, such result sets would still require representation in a data model and APIs for accessing that data. In this way a standard topic map API would be a natural adjunct to TMQL, providing a JDBC-like API for manipulating the results of a TMQL query.

Requirements of a Topic Map Programming Model

Minimal size: Cover the core and let developers create the extensions. Keep the number of classes and operations low.

Simplicity: Ground the API in structures which will be familiar to a topic map user.

Practicality: Support common processing operations: import, export, traversal and direct manipulation. Additional practicality, such as providing merging, indexing and filtering, would sacrifice the simplicity principle. In other words, draw a box around the core model and use that core model to provide a solid foundation for building support for future standard applications such as TMQL (compare to XSLT on top of DOM).

Constraints:

  • No representation of templating mechanisms
Should indexing be out of scope for this exercise?

Architecture Proposals

<< One sub-section for each proposal. Each sub-section should contain a description of the model; some simple examples of the kinds of interfaces which would be provided by the architecture (in Java / Python / IDL); and the advantages and disadvantages (esp. wrt the Requirements). >>

DOM Node / Node List based Architecture

Attempt to develop TM API as an Extension to the DOM.

Will probably have to give up halfway through due to lack of flexibility in the architecture - not surprising, as it is really an architecture for representing the syntax.

Graph-based Architecture

The graph-based topic map API architecture is developed from the 4th December draft of the XTM Processing Model (XTMP). This document defines the processing of a topic map document into a graph structure which consists of just three types of node and four types of connecting arc between nodes. The nodes represent the basic elements of the abstract data model of topic maps: topics, associations and scopes. The connecting arcs are:

  • Scope Participant Arcs which connect a scope node to a topic node which defines one of the subjects in that scope.
  • Association Scope Arcs which connect an association node to the scope node which defines the context within which the association is considered to be valid.
  • Association Member Arcs which connect an association node to a topic node which plays a role in the association. These arcs are optionally labelled by a further topic node which characterises the role being played in the association.
  • Association Template Arcs which connect an association node to a topic node which defines a template for the association. The template constrains the roles and the role players which may be used in the association.

The processing model also defines the concept of the Subject Identity Point, which is a binding point where all topics with the same subject are merged. The concept of merging is central to topic maps, and subject and subject identity are pivotal to this concept. A subject may be represented in two distinct ways: by reference to the addressable object which is the subject (the subject constituting resource) or by reference to an addressable object which describes the subject (the subject indicator resource). A subject identity point is shared by all subject indicator resources which describe the same subject (and the subject constituting resource which is the subject, if such a resource exists).

Developing the API

In developing this data model into an API, we have elected to create classes representing each of the node types and class associations to represent the arcs. However, this approach makes it impossible to capture the 'label' property of an Association Member Arc, and so the Association Member Arc has to be promoted to a first-class object and given a property to represent this label.
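The promotion of the Association Member Arc to a first-class object can be illustrated with a minimal sketch. All class, attribute and method names here (TopicNode, ScopeNode, MemberArc, AssociationNode, players_of) are hypothetical and chosen for illustration only; they are not taken from the XTM Processing Model or from Figure 1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class TopicNode:
    name: str  # illustrative label only, not part of the graph model


@dataclass(eq=False)
class ScopeNode:
    # Scope Participant Arcs: plain references to topic nodes
    participants: list = field(default_factory=list)


@dataclass(eq=False)
class MemberArc:
    """Association Member Arc as a first-class object (hypothetical name)."""
    player: TopicNode                   # topic playing a role
    role: Optional[TopicNode] = None    # the optional 'label' of the arc


@dataclass(eq=False)
class AssociationNode:
    scope: Optional[ScopeNode] = None      # Association Scope Arc
    template: Optional[TopicNode] = None   # Association Template Arc
    members: list = field(default_factory=list)  # MemberArc objects

    def players_of(self, role):
        # Return the topics attached through arcs labelled with `role`
        return [arc.player for arc in self.members if arc.role is role]


# Usage: an association with two labelled member arcs
composer, opera = TopicNode("composer"), TopicNode("opera")
puccini, tosca = TopicNode("Puccini"), TopicNode("Tosca")
written_by = AssociationNode(
    members=[MemberArc(puccini, composer), MemberArc(tosca, opera)]
)
```

The unlabelled arcs remain ordinary object references, while only the member arc carries its own property, which keeps the class count low.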

The second decision to be made concerns the representation of the concept of the Subject Identity Point. A Subject Identity Point consists of zero or one subject constituting resources and zero or more subject indicator resources. The XTM Processing Model defines rules which require that, after processing, there is only one topic node for each Subject Identity Point. This one-to-one relationship means that the properties of a Subject Identity Point could be expressed as properties of the topic node.
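Folding the Subject Identity Point into the topic node might look like the sketch below, which also shows one simple way of arriving at a single topic node per subject. The names (TopicNode, subject_resource, subject_indicators, same_subject, merge_topics) are assumptions made for this example, and the merging shown is a simplification rather than the full rule set of the XTM Processing Model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicNode:
    # Identity properties folded into the topic node (hypothetical names)
    subject_resource: Optional[str] = None                     # zero or one
    subject_indicators: set = field(default_factory=set)       # zero or more


def same_subject(a, b):
    """Two topics share a subject if they share the subject constituting
    resource or at least one subject indicator resource."""
    if a.subject_resource is not None and a.subject_resource == b.subject_resource:
        return True
    return bool(a.subject_indicators & b.subject_indicators)


def merge_topics(topics):
    """Return a list with one topic node per subject identity."""
    merged = []
    for topic in topics:
        current = TopicNode(topic.subject_resource, set(topic.subject_indicators))
        remaining = []
        for other in merged:
            if same_subject(current, other):
                # Absorb the identity properties of the matching node.
                # Simplification: if both nodes name a different subject
                # resource, the one on `current` is kept.
                current.subject_indicators |= other.subject_indicators
                if current.subject_resource is None:
                    current.subject_resource = other.subject_resource
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


# Usage: the third topic links the first two, so all three collapse into one
topics = [
    TopicNode(subject_indicators={"http://example.org/psi/x"}),
    TopicNode(subject_resource="http://example.org/doc"),
    TopicNode("http://example.org/doc", {"http://example.org/psi/x"}),
    TopicNode(subject_indicators={"http://example.org/psi/y"}),
]
result = merge_topics(topics)
```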

Finally an object is required to represent the entire graph with all of its nodes and arcs. This is provided by the TopicMap class which simply serves as a container of topic nodes, scope nodes and association nodes.

The UML diagram of this API is shown in Figure 1. It should be noted that this proposal is very liberal in allowing almost all references between classes to be traversed bidirectionally. Should it be? Should we move some of the reverse lookups to an index, e.g. a LocatorIndex? Also, is it necessary to be able to traverse from a TNode to an SNode?

Figure 1. UML Diagram of the Node Graph Topic Map API
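One possible answer to the question of moving reverse lookups out of the core classes is sketched here. The TopicMap class is kept as a plain container of nodes, and the lookup from a locator back to a topic lives in a separate index object. The name LocatorIndex comes from the question above; its constructor, the topic_for method and the attribute names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class TopicNode:
    subject_indicators: set = field(default_factory=set)


@dataclass
class TopicMap:
    """Container of topic, scope and association nodes; no lookup logic."""
    topics: list = field(default_factory=list)
    scopes: list = field(default_factory=list)
    associations: list = field(default_factory=list)


class LocatorIndex:
    """Reverse lookup from a locator to the topic node it identifies.

    The index is a snapshot: it is built once from the topic map and is
    not updated if the map changes afterwards.
    """

    def __init__(self, topic_map):
        self._by_locator = {}
        for topic in topic_map.topics:
            for locator in topic.subject_indicators:
                self._by_locator[locator] = topic

    def topic_for(self, locator):
        # Returns None when no topic uses the locator
        return self._by_locator.get(locator)


# Usage
puccini = TopicNode({"http://example.org/psi/puccini"})
tm = TopicMap(topics=[puccini])
index = LocatorIndex(tm)
```

Keeping the index outside the core classes reduces the number of bidirectional references in the model, at the cost of having to rebuild or maintain the index when the topic map is modified.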

API Analysis

Very minimal - but probably too much so.

XTM Conceptual Model Architecture

Translate the UML from the conceptual model group directly into classes.

This will probably turn out to be very close to the kind of model I think we both have in mind, but will lack some constructs.

Figure 2. UML Class Diagram for Conceptual Model-based API

XTM Syntax-based Architecture

Create a model that more directly reflects the XTM syntax (justification: once you know the syntax, using the API is not a big mental leap).

XTM Syntax-based Stream Architecture

Assuming that the XTM Syntax-based architecture turns out to be the favourite, then go on to suggest a SAX-style stream-based architecture.
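A SAX-style interface of this kind might look as follows. This is a minimal sketch and not a proposed design: TopicMapHandler, CollectNames, XTMAdapter and parse_xtm are hypothetical names, and only the topic and baseNameString elements of XTM 1.0 are handled. The adapter is built on Python's standard xml.sax module and translates XML-level events into topic-map-level callbacks.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"


class TopicMapHandler:
    """Hypothetical callback interface for topic-map-level events."""
    def start_topic(self, topic_id): pass
    def base_name(self, text): pass
    def end_topic(self): pass


class CollectNames(TopicMapHandler):
    """Example client: collects the base names of each topic."""
    def __init__(self):
        self.names = {}
        self._current = None

    def start_topic(self, topic_id):
        self._current = topic_id
        self.names[topic_id] = []

    def base_name(self, text):
        self.names[self._current].append(text)

    def end_topic(self):
        self._current = None


class XTMAdapter(ContentHandler):
    """Translates SAX element events into TopicMapHandler callbacks."""
    def __init__(self, handler):
        super().__init__()
        self.handler = handler
        self._text = None  # buffer, active only inside baseNameString

    def startElementNS(self, name, qname, attrs):
        if name == (XTM_NS, "topic"):
            self.handler.start_topic(attrs.get((None, "id")))
        elif name == (XTM_NS, "baseNameString"):
            self._text = []

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        if name == (XTM_NS, "baseNameString"):
            self.handler.base_name("".join(self._text))
            self._text = None
        elif name == (XTM_NS, "topic"):
            self.handler.end_topic()


def parse_xtm(data, handler):
    """Parse XTM bytes and drive the given handler."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(XTMAdapter(handler))
    parser.parse(io.BytesIO(data))


# Usage
doc = b"""<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini"><baseName><baseNameString>Puccini</baseNameString></baseName></topic>
  <topic id="tosca"><baseName><baseNameString>Tosca</baseNameString></baseName></topic>
</topicMap>"""
collector = CollectNames()
parse_xtm(doc, collector)
```

As with SAX itself, the client never holds the whole topic map in memory, which suits import and export but not traversal or direct manipulation.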

Conclusions

Some final thoughts about the selection process.

Any interesting points that come up during evaluation of the different architectures.

Bibliography

XTMP. XML Topic Maps (XTM) Processing Model 1.0 (http://www.topicmaps.org/xtm/1.0/xtmp1-20001204.html). TopicMaps.Org. 4th December 2000.


Copyright Ontopia AS, 2001