Architecture Proposals

In this section we present each of the architectures considered as candidates for a topic map programming model.

API-1: DOM Extension Architecture

The XML Document Object Model (DOM) provides a simple abstraction of an XML document as a tree (or collection of trees) consisting of nodes which represent the XML document markup and content. It is popular with developers because of its simplicity - especially for a developer already familiar with the concepts of XML markup - and because of its functionality - for example, being able to locate the DOM node which has a specific ID attribute value, or locating the set of nodes representing elements with a specific tag name.

API-1 is a topic map programming API developed as an extension of the DOM, similar to the HTML extension which is part of the DOM specification. The DOM Node class provides basic node hierarchy operations, such as insertion and deletion of nodes, management of a node's child list, and support for both depth-first and breadth-first traversal. The DOM Element class, which is derived from the DOM Node class, adds access to the element's tag name and attributes. The topic map extension provides a set of additional classes, all derived from the DOM Element class. Most of the classes provide no additional functionality other than a labelling function (returning a distinct value for the nodeType property). The exceptions are TopicMap, AddressableSubject and NonAddressableSubject, which return the URI of the base address of the topic map, the addressable subject or the subject indicator respectively; and TopicReference, which returns the type of reference (with distinct values for references made directly to the topic and references made via a subject indicator reference) and the locator used in the reference. The UML diagram in Figure 1 shows the class structure of this architecture. The DOM classes Node, Document and Element are shown in this diagram along with some of their public functions, to give a feel for the range of operations such an implementation would make available to the programmer.

Developing The API

The principal design issue in developing this architecture is the handling of topic references within the constraints of a tree-based architecture. The data model which we are attempting to represent is not a tree, but a graph of interconnected topics. A tree cannot be used to represent a graph (in the general case) without a construct for cross-linking between tree nodes which are not in a direct parent-child relationship. We provide this construct in the form of a TopicReference Node which is a surrogate for a Topic Node. The TopicReference Node must provide a function to resolve the reference to a TopicNode (which will be a direct child of the TopicMap Node).
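The resolution step can be sketched with Python's standard `xml.dom.minidom`. This is only an illustration under stated assumptions: it works on simplified XTM-like markup with a plain `href` attribute (real XTM uses XLink attributes and namespaces), and `resolve_topic_ref` is a hypothetical helper name, not part of the proposed API.

```python
from xml.dom import minidom

# Simplified XTM-like markup (assumption: plain href instead of xlink:href).
XTM = """<topicMap>
  <topic id="t1"/>
  <topic id="t2"/>
  <association>
    <member><topicRef href="#t1"/></member>
    <member><topicRef href="#t2"/></member>
  </association>
</topicMap>"""


def resolve_topic_ref(topic_ref):
    """Return the <topic> child of the root whose id matches the reference."""
    target_id = topic_ref.getAttribute("href").lstrip("#")
    root = topic_ref.ownerDocument.documentElement
    for child in root.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.tagName == "topic"
                and child.getAttribute("id") == target_id):
            return child
    return None  # dangling reference


doc = minidom.parseString(XTM)
refs = doc.getElementsByTagName("topicRef")
resolved = [resolve_topic_ref(r) for r in refs]
```

Because the link exists only as an attribute value, every traversal from a reference to its topic repeats this search (or consults an index that must be kept up to date).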

Another decision, common to the development of all the architectures, is the representation of syntactic short-cut constructs such as the <instanceOf> element (which is a short-cut for creating a type-instance association between two topics) and names (which are privileged forms of occurrence). For this representation, we choose to follow the form of DOM extensions such as the HTML DOM and represent the syntax directly. This means that type/association equivalencies and other syntactic short-cuts are directly reflected in the model.

The third issue regards the representation of subjects which are not directly represented by topics in the topic map. Such subjects may be referenced from <subjectIndicatorRef> elements in parent elements such as <subjectIdentity>, <instanceOf> and <member>. For this model, we may either simply represent the parsed XTM syntax or else 'normalize' the syntax in some way. One proposed method of normalization is to reify all subjects referenced in the topic map. This means that when a <subjectIndicatorRef> is imported into this model and its parent is any element other than a <subjectIdentity> element, a new Topic Node is created as a child of the TopicMap Node. The only child of the new Topic Node is a SubjectIdentity Node, which contains a single NonAddressableSubject Node whose href attribute has the same value as the href attribute of the <subjectIndicatorRef> element. The <subjectIndicatorRef> element itself is then represented by a TopicReference Node which points to the newly created Topic Node. The advantage of this normalization is that the handling of references to non-addressable subjects is simplified, as the application need only ever deal with a reference to a Topic Node.
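A minimal sketch of this normalization, again using `xml.dom.minidom` on simplified XTM-like markup. The assumptions are loud ones: the sketch keeps the XTM element names rather than the node types of the proposed extension, uses a plain `href` attribute, and the function name and the `reified-N` id scheme are invented for illustration.

```python
from xml.dom import minidom

XTM = (
    '<topicMap>'
    '<topic id="t1"><instanceOf>'
    '<subjectIndicatorRef href="http://example.org/psi/person"/>'
    '</instanceOf></topic>'
    '</topicMap>'
)


def reify_subject_indicator_refs(doc):
    """Replace each non-identity subjectIndicatorRef with a topicRef to a new topic."""
    root = doc.documentElement
    created = 0
    # Copy the node list first, because the tree is modified inside the loop.
    for ref in list(doc.getElementsByTagName("subjectIndicatorRef")):
        if ref.parentNode.tagName == "subjectIdentity":
            continue  # this reference already identifies a topic
        created += 1
        topic_id = "reified-%d" % created  # hypothetical id scheme

        # New topic whose only child is a subjectIdentity holding the indicator.
        topic = doc.createElement("topic")
        topic.setAttribute("id", topic_id)
        identity = doc.createElement("subjectIdentity")
        indicator = doc.createElement("subjectIndicatorRef")
        indicator.setAttribute("href", ref.getAttribute("href"))
        identity.appendChild(indicator)
        topic.appendChild(identity)
        root.appendChild(topic)

        # The original reference becomes a reference to the new topic.
        topic_ref = doc.createElement("topicRef")
        topic_ref.setAttribute("href", "#" + topic_id)
        ref.parentNode.replaceChild(topic_ref, ref)
    return created


doc = minidom.parseString(XTM)
created = reify_subject_indicator_refs(doc)
```

The sketch creates one topic per reference; a fuller version would presumably reuse an existing topic when two references name the same subject indicator.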

The following table and UML diagram illustrate the form of the proposed DOM extension programming model. The table shows the proposed node types for the DOM extension, with an indication of the expected containment hierarchy (the expected parent of an instance of the node type) and a mapping to the XTM element that the node type represents.[1] The UML diagram shows that the classes representing topic map constructs contain no methods or attributes, as all properties of these topic map constructs can be accessed using the DOM Level 1 operations defined by their base classes. In practice, however, the methods of the base classes would almost certainly be supplemented by convenience functions if the API were developed further.

Node Type (Type in the DOM)  Node parent                             Represents
---------------------------  --------------------------------------  ------------------------------------------
TopicMap (Element)           None                                    topicMap
Type (Element)               Topic, Occurrence, Association, Member  instanceOf and roleSpec
Topic (Element)              TopicMap                                topic
SubjectIdentity (Element)    Topic                                   subjectIdentity
AddressableSubject           SubjectIdentity                         subjectIdentity/resourceRef
NonAddressableSubject        SubjectIdentity                         subjectIdentity/subjectIndicatorRef
BaseName (Element)           Topic                                   baseName
Occurrence (Element)         Topic                                   occurrence
Scope (Element)              Association, BaseName, Occurrence       scope
Name (Element)               BaseName, VariantName                   baseNameString, variantName/resourceData
Variant (Element)            BaseName, Variant                       variant
Parameters (Element)         Variant                                 parameters
Reference (Element)          Occurrence, Variant                     resourceRef
Association (Element)        TopicMap                                association
Member (Element)             Association                             member
TopicReference (Element)     Member, Scope, Parameters               topicRef, subjectIndicatorRef

Figure 1 - UML for API-1

API Analysis

API-1 offers complete coverage of the XTM syntax, with a class for each of the elements defined in the XTM DTD. As the API is node-based, most of the classes are provided for tagging requirements only. While it would be possible to remove many of the classes shown in Figure 1, these classes do provide the essential hook for extensibility and the development of more complete APIs on a consistent base. Including the DOM classes of Document, Node and Element required to represent a topic map, the API consists of 19 classes and at least 13 class methods (more methods are defined for the DOM classes than are shown in the diagram, but these 13 are the minimum needed to traverse and manipulate the topic map).

Figure 2 shows a simple topic map represented in the programming model of API-1. The associations shown in red between TopicRef objects and Topic objects are generated by resolving each TopicRef to the Topic it references. The topic map represented by the data structure in this diagram consists of a single association (assoc) between two topics (topic1 and topic2), both of which have a single base name in the unconstrained scope and one of which has an occurrence. The association and the roles of the association are typed by published subjects which are indicated by the reifying topics (rt1, rt2 and at1). This API requires a total of 27 objects to represent the topic map. The large number of programming constructs is due to the closeness of API-1 to the somewhat verbose XTM syntax, which requires that <topicRef> and <instanceOf> elements in the XTM syntax of the topic map have matching constructs in the programming model.

Figure 2 - API-1 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

A more serious criticism of this model is that the ordered-tree model of the DOM is not a suitable model for representing a topic map. The DOM regards node order as important, which is not the case for a topic map.[2] The DOM NodeLists are ordered lists, not sets, and so do not directly support operations such as duplicate suppression which are required for a complete implementation of the topic map model. Most seriously of all, the DOM provides no explicit support for making references between nodes. This means that either the extension API must define such support (which could be regarded as breaking the DOM model to support topic maps) or else references can only be supported through the manipulation of DOM Attribute Node values. The API presented here does not create explicit references between nodes, but instead relies on the run-time resolution of attribute values to resolve TopicRef nodes to their referenced Topic node. This form of syntax-based reference is awkward for the developer to create, and maintaining the integrity of references would be more difficult than in a system which uses direct object-to-object references.
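The list-versus-set point can be seen in a small `xml.dom.minidom` example. The markup is a simplified assumption (plain `href` attributes, no namespaces); the duplicate reference in the scope survives in the DOM node list, and it is the caller who has to impose set semantics.

```python
from xml.dom import minidom

# A scope that names the same topic twice (simplified XTM-like markup).
doc = minidom.parseString(
    '<scope>'
    '<topicRef href="#en"/><topicRef href="#fr"/><topicRef href="#en"/>'
    '</scope>'
)

refs = doc.getElementsByTagName("topicRef")

# The DOM keeps document order and every duplicate.
hrefs_in_order = [r.getAttribute("href") for r in refs]

# Set semantics have to be added on top by the application.
scope_as_set = set(hrefs_in_order)
```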

API-2: Graph-based Architecture

The graph-based topic map API architecture is developed from the 4th December 2000 draft of the XTM Processing (XTMP) model. This document is no longer available on the Web; however, the model is substantially similar to the Topic Map Processing Model proposed by Newcomb and Biezunski. The XTMP model document and the Topic Map Processing Model both define the processing of a topic map document into a graph structure which consists of just three types of node and four types of connecting arc between nodes. The nodes represent the basic elements of the abstract data model of topic maps: topics, associations and scopes. The connecting arcs are:

  • Scope Participant Arcs which connect a scope node to a topic node which defines one of the subjects in that scope.

  • Association Scope Arcs which connect an association node to the scope node which defines the context within which the association is considered to be valid.

  • Association Member Arcs which connect an association node to a topic node which plays a role in the association. These arcs are optionally labelled by a further topic node which characterises the role being played in the association.

  • Association Template Arcs which connect an association node to a topic node which defines a template for the association, that constrains the roles and the role players which may be used in the association.

The processing model also defines the concept of the Subject Identity Point, which is a binding point where all topics with the same subject are merged. The concept of merging is central to topic maps, and subject and subject identity are pivotal to this concept. A subject may be represented in two distinct ways - by reference to the addressable object which is the subject (the subject constituting resource) or by reference to an addressable object which describes the subject (the subject indicator resource). A subject identity point is shared by all subject indicator resources which describe the same subject (and the subject constituting resource which is the subject, if such a resource exists).

Developing the API

In developing the XTMP model into an API, we have elected to create classes representing each of the node types and class associations to represent the arcs. However, this approach makes it impossible to capture the 'label' property of an Association Member Arc, and so the Association Member Arc has to be promoted to a first-class object and given a property to represent this label.
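A rough Python sketch of this decision follows. The class names TNode and ANode appear in the text; `SNode`, `MemberArc` and all attribute names are assumptions made for the sketch, not the names used in the UML. Unlabelled arcs are plain object references, while the member arc is a class of its own so that it can carry the label.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TNode:
    """Topic node."""
    name: str


@dataclass(eq=False)
class SNode:
    """Scope node; the list models the scope participant arcs."""
    participants: List[TNode] = field(default_factory=list)


@dataclass(eq=False)
class MemberArc:
    """Association member arc promoted to an object so it can hold a label."""
    player: TNode
    label: Optional[TNode] = None  # topic characterising the role played


@dataclass(eq=False)
class ANode:
    """Association node."""
    scope: Optional[SNode] = None      # association scope arc
    template: Optional[TNode] = None   # association template arc
    members: List[MemberArc] = field(default_factory=list)


opera = TNode("opera")
music = TNode("music")
subclass_role = TNode("subclass")
superclass_role = TNode("superclass")
assoc = ANode(members=[MemberArc(opera, subclass_role),
                       MemberArc(music, superclass_role)])
```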

The second decision to be made regards the representation of the concept of the Subject Identity Point. A Subject Identity Point consists of zero or one subject constituting resources and zero or more subject indicator resources. The XTM Processing Model defines rules which require that in a consistent topic map there be only one topic node for each Subject Identity Point. This one-to-one relationship means that the properties of a Subject Identity Point (the subject constituting and subject indicating resources) may be expressed as properties of the TNode class. Doing this does not prevent the API from representing topic maps which are not consistent, as any such topic map would simply include more than one TNode with the same value for either subjectIndicatingResource or subjectConstitutingResource.
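As an illustration, an inconsistent topic map in this representation can be detected by grouping TNodes on their identity properties. The sketch below is an assumption-laden reading of the text: the attribute names are Python-style renderings of the two properties, and `find_unmerged` is a hypothetical helper, not part of the proposed API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(eq=False)
class TNode:
    name: str
    subject_constituting_resource: Optional[str] = None        # zero or one
    subject_indicating_resources: Set[str] = field(default_factory=set)


def find_unmerged(tnodes):
    """Map each shared identity locator to the TNodes that still need merging."""
    by_locator = defaultdict(list)
    for node in tnodes:
        if node.subject_constituting_resource is not None:
            key = ("constituting", node.subject_constituting_resource)
            by_locator[key].append(node)
        for uri in node.subject_indicating_resources:
            by_locator[("indicating", uri)].append(node)
    return {key: nodes for key, nodes in by_locator.items() if len(nodes) > 1}


a = TNode("opera-a", subject_indicating_resources={"http://example.org/psi/opera"})
b = TNode("opera-b", subject_indicating_resources={"http://example.org/psi/opera"})
c = TNode("page", subject_constituting_resource="http://example.org/page.html")
unmerged = find_unmerged([a, b, c])
```

An empty result corresponds to a consistent topic map in the sense described above: at most one TNode per Subject Identity Point.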

The third issue regards the representation of the class-instance relationship. The XTMP model uses a templating mechanism to define the core association types which are required to express class-instance relationships, topic-occurrence relationships and other fundamental relationships of the topic map data model. For this reason, we need to break with our previously imposed constraint against inclusion of templating mechanisms in the programming model and include a template property for an ANode which references the TNode which defines the association template.

Finally, an object is required to represent the entire graph with all of its nodes and arcs. This is provided by the TopicMap class, which simply serves as a container of topic nodes, scope nodes and association nodes.

The UML diagram of this API is shown in Figure 3. It should be noted that this proposal is very liberal in allowing almost all references between classes to be traversed bidirectionally. It is arguable that bidirectional traversal of properties should be left out of the core programming model subsystem, delegating these reverse look-ups to an indexing subsystem built on top of the core model. However, the bidirectional nature of these relationships is part of the essence of the topic map, especially when viewing that topic map as a graph.
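A small sketch of what bidirectional traversal implies for an implementation: both ends of a relationship must be updated together. The attribute and method names here are assumptions for illustration, and the member arc is reduced to a direct topic reference to keep the example short.

```python
class TNode:
    def __init__(self, name):
        self.name = name
        self.associations = []  # reverse direction: associations this topic is in


class ANode:
    def __init__(self):
        self.members = []       # forward direction: topics playing a role

    def add_member(self, topic):
        # Keep both directions in step in a single operation.
        self.members.append(topic)
        topic.associations.append(self)


puccini = TNode("puccini")
tosca = TNode("tosca")
composed_by = ANode()
composed_by.add_member(puccini)
composed_by.add_member(tosca)
```

The alternative mentioned above, an indexing subsystem, would drop the `associations` list from the core class and answer the reverse question from a separately maintained index.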

Figure 3 - UML Diagram of API-2

Model Analysis

The programming model developed for API-2 is extremely minimal. It certainly has the desired property of being small in size, with just 6 classes (although a total of 40 methods are required to provide complete access to all of the properties of the topic map), but this simplicity is achieved at a cost to practicality, as shown by the collaboration diagram in Figure 4. The diagram shows a similar simple topic map to that shown in Figure 2 for API-1, except that, to maintain some clarity in the diagram, the occurrence of one of the topics is not shown. Without this occurrence, 31 API objects are required to express the topic map (this total includes the TopicMap object which is not shown in the diagram). With the addition of the occurrence, an extra 6 objects would be required to express the topic-occurrence association template and an extra 5 to represent the topic-occurrence association, bringing the total number of objects required to 42. In practical use, the API complicates the job of the programmer, who must be familiar with the XTM Processing Model as well as the XTM Syntax specification to be able to create and manipulate topic maps.

On the positive side, API-2 treats all syntactic constructs, with the exception of the <scope>, <member> and <subjectIdentity> constructs, as TNodes, so reification of topic map constructs other than <topic>s is easily implemented. API-2 also includes full support for the templating mechanism described by the XTM Processing Model, a feature which is not an integral part of any of the other APIs developed here.

Figure 4 - API-2 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

API-3: XTM Conceptual Model Based Architecture

The XTM 1.0 Specification includes an annex which describes the conceptual model implemented by the specification. As a record of 'what was in the minds' of the group which produced the XTM Specification, this document provides important input into the process of developing a programming model. To examine whether this model is sufficient for a programming model, we present here a programming model developed upon the XTM Conceptual Model.

The programming model presented as a pair of UML diagrams below is derived directly from the UML diagrams presented in the XTM 1.0 Specification annex. The first diagram shows the top-level class hierarchy of the API. The class Subject provides the explicit representation of the reification function of topics. Its relationship with the class Class is used to represent the various type-instance relationships which exist in the XTM model and are represented syntactically by the <instanceOf> and <roleSpec> elements.

Figure 5 - UML Class Diagram of API-3 - Upper Hierarchy
Figure 6 - UML Class Diagram of API-3 - Main Classes

Model Analysis

While the Conceptual Model clearly defines the relationship between topics and real world objects (by the use of Subject, NonAddressableSubject and Resource classes), the additional constructs required to do so add three extra classes, complicating the API for developers and causing API-3 to diverge from the XTM syntax. In fact, despite consisting of some 13 classes and 40 methods, API-3 lacks complete coverage of the syntax, as the <variantName> syntactic construct is not represented. To represent the <variantName> construct, it is necessary to consider a <variantName> as an occurrence of a topic with a fixed role type and with a scope defined as the union of the subjects referenced from the <parameter> elements of its ancestor <variant> elements and the <scope> element of its ancestor <baseName> element. From a conceptual perspective, this is clean, as the two forms (a <variantName> and an <occurrence> of a specific type) are equivalent and it is redundant to include both forms in the model. However, from a programmer's perspective, the need to iterate or search through all of the occurrences of a topic to locate and manipulate its variant names, and the lack of the ability to create a nested hierarchy of variant names as provided by the XTM syntax, are weaknesses in the programming model.
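The scope computation described here is a plain set union. As a sketch, with subjects represented by locator strings and a hypothetical function name:

```python
def variant_name_scope(base_name_scope, ancestor_parameters):
    """Union of the baseName scope and the parameters of every ancestor variant.

    base_name_scope: subjects from the <scope> of the ancestor <baseName>.
    ancestor_parameters: one collection of subjects per ancestor <variant>.
    """
    scope = set(base_name_scope)
    for parameters in ancestor_parameters:
        scope |= set(parameters)
    return frozenset(scope)


# A variant name nested two <variant> levels below an English base name.
scope = variant_name_scope({"#english"}, [{"#display"}, {"#short"}])
```

The nesting of the variants is lost in the result, which is the weakness noted above: the flat occurrence carries the combined scope but not the hierarchy it came from.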

The way in which API-3 expresses the class-instance relationship also diverges from the XTM syntax. API-3 allows any Subject instance to be in a class-instance relationship with zero or more Class instances (each of which is a NonAddressableResource). This is an accurate reflection of the underlying model of topic maps. However, the mechanism provided by the XTM syntax for defining class-instance relationships is to reify the Subject and the Classes to Topics and to define a class-instance relationship between the reifying topics. This syntactic mechanism should be more directly supported by a programming model to enable simpler import and export of XTM syntax data to/from the programming model, and to improve the mapping between the syntax and the programming model for developers already familiar with the XTM syntax and the mechanism of reification. That said, Figure 7 shows how much simpler this makes the collection of objects required to express a sub-type/super-type relationship. The core concepts of sub-type, super-type and the sub-type/super-type association are represented as three Class objects (which are derived from Subject and so may have 0 or more SubjectIndicators), without the need for creating topics to reify the subjects. This means that only 16 objects are required to express the topic map (including the TopicMap object which is not shown in the diagram for clarity).

Figure 7 - API-3 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

API-4: Modified Conceptual Model-Based Architecture

API-3 shows great promise as a base from which to develop a practical topic map programming model. To refine the model, we need to reduce the complexity of the representation of the relationship between a topic and a subject and we need to add the necessary classes to enable the syntactic constructs of <variant> and <variantName> to be represented more explicitly in the programming model.

Developing The API

To simplify the upper hierarchy of API-3, we choose instead to implement the Topic/Locator relationship from API-2. We remove Class, Subject, Resource and NonAddressableResource, and add the Locator class, making it the type for the properties of subject and subjectIndicator (changed from subjectConstitutingResource and subjectIndicatingResource, to more closely match the current state of the XTM Specification).

Having removed Subject and Class from the class hierarchy of the API, we must replace the now-removed class-instance association for at least Topic, Association and TopicCharacteristic. Any class-instance relationship in a topic map may be represented as an association between a topic reifying the typed node and the topic or topics which reify the class of the node. However, the most common syntactic form of representation is the use of the <instanceOf> child element for the typed node, which is equivalent to defining an association between a topic reifying the typed node and the topic or topics reifying the class of that node. Thus the mechanism for the representation of type-instance associations in the programming model is directly related to the mechanism used for representing the reification of constructs of the topic map (such as associations, members, occurrences and so on). The means of representing reification of topic map constructs within a programming model may be broadly divided into the "implicit reification" of topic map constructs and the "explicit reification" of the constructs. Implicit reification requires that the programming model's class hierarchy acknowledges that all topic map constructs may be reified and so may exhibit any of the properties of a Topic. Typically, this would be done by making the Topic class a super-class of all other classes representing topic map constructs. Implicit reification is already a part of API-2. Explicit reification requires that the programmer control reification through the creation of a Topic object which regards another topic map construct as its subject indicator - typically an explicit reification programming model provides no direct support for reification beyond a means to uniquely and persistently identify any topic map construct. API-1 and API-3 are examples of explicit reification programming models.
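The two styles can be contrasted in a short Python sketch. Everything except the class name Topic is an assumption made for illustration: the attribute names, the `occ-N` identifier scheme and the `reify` helper are invented, and the classes are cut down to the minimum needed to show the difference.

```python
import itertools


class Topic:
    def __init__(self):
        self.base_names = []
        self.subject_indicators = []


# Implicit reification: the construct is itself a Topic.
class ImplicitOccurrence(Topic):
    def __init__(self, resource):
        super().__init__()
        self.resource = resource


# Explicit reification: the construct only offers a persistent identifier.
class ExplicitOccurrence:
    _ids = itertools.count(1)  # hypothetical id generator

    def __init__(self, resource):
        self.resource = resource
        self.object_id = "occ-%d" % next(self._ids)


def reify(construct):
    """Create a Topic that takes the construct as its subject indicator."""
    topic = Topic()
    topic.subject_indicators.append("#" + construct.object_id)
    return topic


# Implicit: naming an occurrence needs no extra object.
implicit = ImplicitOccurrence("http://example.org/score.pdf")
implicit.base_names.append("Score")

# Explicit: the programmer creates and names the reifying topic.
explicit = ExplicitOccurrence("http://example.org/score.pdf")
reifier = reify(explicit)
reifier.base_names.append("Score")
```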

Figure 8 - Modified API-3 Class hierarchy with Implicit Reification
Figure 9 - Modified API-3 Class hierarchy with Explicit Reification

The implicit reification programming model (shown in Figure 8) has the advantage that a programmer is not required to perform any operations to establish the reification of a topic map construct. If she wants to give an Occurrence object a BaseName, this may be achieved simply through the inherited functions of the Topic class. The explicit reification model (shown in Figure 9) has the advantage of being more closely related to the XTM syntax. However, maintaining a close relationship with the syntax here exposes the developer to one of the weaknesses of the serialized syntactic form. A divergence from the syntax may therefore be justified, in that it delivers far greater functionality while there is still a very clear relationship between the syntactic form of reification and its representation in the programming model. To support implicit reification of all topic map constructs, we choose to make Topic a super-type of Association and Member. The class TopicCharacteristic has no methods or properties which are common to its subclasses, so it is removed from the class hierarchy.

In order to support the class-instance association in its commonly-used syntactic short-cut form (using <instanceOf> to represent a class-instance association in the unconstrained scope), we define the classes property for a Topic as a collection of Topic objects. The classes property represents only the class-instance associations in the unconstrained scope. To get all types in a particular scope, we must define a helper operation getTypes(Scope s) which returns all of the Topic objects which play the role of class in a class-instance association in which the current topic plays the instance role, where the association member characteristics of each role are in the scope s. It is not necessary to provide an equivalent setTypes() function, as this operation may be implemented by creating an Association object which links the Topic object and its type. As we have already solved reification by deriving all other topic map constructs from Topic, the solution for Topic applies equally to all other topic map constructs represented in the API.
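One possible reading of getTypes(Scope s), sketched in Python. The simplifications are assumptions: a scope is modelled as a set of Topic objects and matched by equality, the association type and roles are marked with strings, a member is a (role, topic) pair, and Association does not derive from Topic as it would in API-4.

```python
CLASS_INSTANCE = "class-instance"  # hypothetical marker for the association type
CLASS_ROLE = "class"
INSTANCE_ROLE = "instance"


class Topic:
    def __init__(self, name):
        self.name = name
        self.associations = []  # associations in which this topic plays a role

    def get_types(self, scope):
        """Return the topics playing the class role for this instance in `scope`."""
        wanted = frozenset(scope)
        types = []
        for assoc in self.associations:
            if assoc.kind != CLASS_INSTANCE or assoc.scope != wanted:
                continue
            if (INSTANCE_ROLE, self) not in assoc.members:
                continue  # this topic is the class here, not the instance
            types.extend(p for role, p in assoc.members if role == CLASS_ROLE)
        return types


class Association:
    def __init__(self, kind, scope, members):
        self.kind = kind
        self.scope = frozenset(scope)
        self.members = list(members)  # (role, Topic) pairs
        for _, player in self.members:
            player.associations.append(self)


italian = Topic("italian")
composer = Topic("composer")
puccini = Topic("puccini")
Association(CLASS_INSTANCE, {italian},
            [(CLASS_ROLE, composer), (INSTANCE_ROLE, puccini)])
```

Creating the Association is all that is needed to assign a type, which is why no setter is required; the empty set passed as `scope` would correspond to the unconstrained scope covered by the classes property.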

To complete the coverage of the XTM syntax in API-4, two new classes are added, Variant and VariantName. Initially, both Variant and VariantName are derived from TopicNode. However, as both BaseName and Variant share the property of a list of child Variants, the API is extended to introduce a common super-class, VariantContainer. The resulting API class diagram is shown in Figure 10.
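The shared super-class can be sketched as follows. The class names VariantContainer, BaseName and Variant come from the text; the attributes, `add_variant` and `all_variants` are assumptions, the variant name is reduced to a string, and the derivation from the topic class is omitted for brevity.

```python
class VariantContainer:
    """Shared behaviour: anything that can hold a list of child Variants."""

    def __init__(self):
        self.variants = []

    def add_variant(self, variant):
        self.variants.append(variant)
        return variant


class BaseName(VariantContainer):
    def __init__(self, name_string):
        super().__init__()
        self.name_string = name_string


class Variant(VariantContainer):
    def __init__(self, parameters, variant_name=None):
        super().__init__()
        self.parameters = set(parameters)
        self.variant_name = variant_name


def all_variants(container):
    """Yield every Variant below `container`, depth first."""
    for variant in container.variants:
        yield variant
        yield from all_variants(variant)


name = BaseName("Giacomo Puccini")
display = name.add_variant(Variant({"display"}, "Puccini"))
display.add_variant(Variant({"sort"}, "puccini, giacomo"))
```

Because BaseName and Variant share the container behaviour, code that walks the nested variant hierarchy does not need to distinguish between them.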

Figure 10 - API-4 final class diagram

API Analysis

API-4 maintains a very close mapping to the XTM syntax. All of the syntactic constructs can be mapped to a class or property in the API which in most cases shares the name of the syntactic construct. The only construct without a direct mapping is the <subjectIdentity>, the content of which is represented by the subjectIndicators and subject properties of the Topic class. This complete coverage is costly in terms of additional classes and functions, bringing the total size of this API to 11 classes and 48 methods. Much of the complexity of the API is contained in the Topic class (with 13 class methods), which is the super-type for most of the other classes. Figure 11 shows that for our simple example, API-4 proves no more complex than API-3, requiring 16 objects (including the TopicMap object which is not shown) to represent the topic map.

Figure 11 - API-4 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

[1] Where a node type represents an element in context, the context is shown using XPath slash-separated path syntax.

[2] In fact, it could be argued that for certain applications, it is desirable for the ordering of objects in the source topic map to be preserved. This is especially the case for editing applications where the ability to 'round-trip' a file without significant alteration to its content is often seen as a desirable feature.
