Topic Map Fragment Exchange

The exchange of small subsets of topic map information is the principle form of communication between TMShare peers. A topic map fragment is a self-contained topic map, expressed in the XTM 1.0 interchange syntax. The fragment is created as a copy of constructs from one or more source topic maps. The constructs which are copied are based on a simple algorithm which is described in more detail below.

Fragment Requests

Requests are issued by TMShare peers either as a result of the user browsing to a particular topic in a local topic map or in the cache of topic map information received from previous requests; or as a result of a direct invocation of the query by user action. Requests fall into one of three categories:

  • A request for a specific topic by the topic's object ID.

  • A request for all topics which represent a specific subject by specifying a subject indicator or subject address URI.

  • A request for all topics which satisfy a tolog query.

  • Request Topic By ID

    This type of request requires that the sending peer be already aware of a particular topic map object that it requires more information on. This knowledge may have been acquired by some process external to the operation of TMShare (e.g. the object ID was provided as a link in an email from a colleague); or it may make use of information about topic map objects received on previous requests.

    The ability to specify one particular topic map object depends upon a scheme for globally unique identifiers for topic map objects. TMShare achieves this by use of the globally unique peer identity strings provided by JXTA — each topic map object ID is constructed from a concatenation of the JXTA peer ID; the topic map identifier assigned to the topic map by the peer which manages the topic map; and the ID assigned to the object within the topic map. As JXTA peer identifiers are universally unique; and topic map identifiers assigned by a peer are required to be unique across all topic maps managed by that peer, the individual identifiers assigned to topic map objects can be shown to also be universally unique.

    A useful side-effect of the object identification strategy is that by making use of the JXTA peer identifier as the root of the object identifier, a peer can quickly identify the source of a particular topic map object and if necessary, use that identifier to establish a direct connection to the source.

    Request Subject By Subject Indicator or Subject Address

    This type of request allows a requesting peer to specify one or more subject indicator or subject address URIs. In a topic map, any given subject may have one or more subject indicators, which are identifiers for the subject encoded as URIs. The subject indicator URI is independent of the address of the topic or topics that describe the subject. As a result, multiple peers may all possess different information regarding a subject identified using the same subject indicator URI. By broadcasting a request for information pertaining to a subject by using a subject indicator URI a peer may expect multiple responses, which may support or conflict with each other. It is the responsibility of the requesting peer to determine how to handle any conflicts that may arise.

    Due to the location-independent nature of subject indicators, these are the preferred means of requesting additional topic map information from remote peers when browsing a local topic map. By default, a TMShare application will send a request using this mechanism to all members of the TMShare network whenever a topic is selected in the browser interface that has one or more locally specified or cached subject indicators associated with it. To minimize traffic between nodes, a TMShare peer keeps track of the most recently sent requests and will not rebroadcast requests for those subject identifiers which were last requested within a configurable time-out period.

    Request By Tolog Query

    The final type of request allows a peer to retrieve a more restricted set of information pertaining to a given subject. The tolog language allows queries to be expressed in terms of the association relationships between subjects. For example the tolog query:

    select $A from performed-by($A:performer, i"http://www.songz.com/I/I_Fought_The_Law":song)

    is a request to peers to return a topic map fragment in which the principle subjects represented are all performers of the song "I Fought The Law" as identified by the (fictional) subject indicator URI " http://www.songz.com/I/I_Fought_The_Law"

    Tolog queries result in a two-dimensional table of results, much like that created by an SQL query of a relational database. The standard fragment construction algorithm loses the information about which topics in the fragment are actually part of the tolog query result set and which topics are merely added to the fragment as part of the fragment-building algorithm. In addition, a tolog query may actually request bindings for multiple variables. For example, the query:

    select $A, $B from recorded-by($A:song, $B:artist), written-by($A:song, i"http://www.songz.com/perry_lee")

    requests all songs written by Lee Perry and the artists who recorded those songs. The algorithm for fragment construction will lose the information which indicates which topics are bound to the variable $A and which are bound to the variable $B, and which combinations of the variable bindings match the query criteria.

    The topics that are bound to variable in the result set can be easily incorporated into the topic map fragment sent back as the result. However, the incorporation of the information about which combinations of variable bindings are valid is more complex. This is dealt with later.

    Fragment Construction

    When a peer receives a request for a topic map fragment, the request query is executed against the unified view of both locally held topic map information and cached topic map information received from other peers. In this way it is possible for topic map information to be propagated around a network of peers and to be stored on multiple peers, thus increasing the chance that it will be found when requested.

    Each of the requests resolves to a set of topics which meet the request criteria. This set is then processed to construct a topic map fragment to be returned. A topic map fragment is a subset of the topic map information maintained by the peer which is centred on the topic or topics in the result set of the request. The basic topic map fragment construction process is governed by the following inputs:

  • The set of topics that match the query criteria.

  • The breadth of the fragment requested by the querying peer.

  • The maximum breadth of fragment that the responding peer is prepared to send.

  • The responding peer first determines the breadth of fragment that will be sent. This is the lesser of the breadth of fragment requested by the querying peer and the maximum breadth of fragment that the responding peer is prepared to send.

    Fragment construction copies topic map objects from the source topic map or topic maps to the result fragment. The process uses the concept of a shallow copy of a topic and an association, and the concept of stub topics. A stub topic is a copy of the source topic which excludes any typing information for the topic, any occurrences of the topic, all roles played by the topic, and all names which are not in the unconstrained scope. A stub topic must always have at least one subject indicator URI which may be later used to retrieve all the information which is not copied to the stub. If the topic does not have a subject indicator URI assigned to it, then the address of the source topic is used as the stub's subject indicator URI.

    A shallow copy of a topic results in a copy of all of the characteristics of the topic, except for the roles played by the topic in any associations. In a shallow copy of a topic, only the topic's subject identifier; subject indicators and unscoped base names are copied. In addition, all topics used to define the type or types of the shallow-copied topic are copied as stub topics.

    A shallow copy of an association results in a copy being made of all roles of the association, but the association type, the themes in the scope of the association and all association role players are copied as stub topics.

    The algorithm for fragment construction is best described in pseudo code.

    let deepCopies be the empty set
    let shallowCopies be the empty set
    let maxRadius be the width of the fragment required
    let resultsSet be the set of topics which matches the query criteria
    
    for each r in resultsSet:
      addTopic(r, 1)
    
    addTopic(t, depth):
      if (t in deepCopies):
        return
      if there is a copy of t in shallowCopies:
        let cp be the shallow copy of t that was already made
      else:
        let cp be a new shallow copy of t
    
      if (depth < maxRadius):
        for each x which is a typing topic of t:
          addTopic(x, depth + 1)
        for each c which is a scoped name or variant name of t:
          add a copy of c to cp
          for each x which is a theme in the scope of c:
            addTopic(x, depth + 1)
        for each o which is an occurrence of t:
          add a copy of o to cp
          for each x which is a theme in the scope of o:
            addTopic(x, depth + 1)
          let ot be the topic defining the type of o
          addTopic(ot, depth + 1)
        for each a which is an association in which t plays some role:
          addAssociation(a, depth)
        if cp is in shallowCopies:
          remove cp from shallowCopies
        add cp to deepCopies
      else:
        add cp to shallowCopies
    
    addAssociation(a, depth)
      let cp be a shallow copy of a
      if (depth + 1 < maxRadius):
        let at be the topic which types a
        addTopic(at, depth + 1)
      for each role r in association a:
        let mt be the topic which defines the type of role r
        addTopic(mt, depth + 1)
        for each p which is a topic playing role r in association a:
          addTopic(p, depth + 1) 
          
    Example 1 - Pseudo code description of the fragment construction algorithm

    Because stub topics are always created with at least one subject indicator, this algorithm ensures that in the case where a topic is first copied as a stub and is then later in the fragment building process found to actually be within the breadth of the fragment then the stub and the full copy of the topic will be merged. In addition, this property of stub topics ensures that a peer is able to combine the fragments received from several sequential queries, again using the subject-based merging principle of the topic map paradigm.

    The diagram below shows the effects of this fragment construction algorithm. It shows a schematic view of a topic map as a collection of topics connected by associations. For the purposes of this explanation, this view does not show topics used for scope or types. The upper half of the diagram shows the topics and associations that are copied to a fragment with a breadth of (1); the lower half shows the topics and associations that are copied to a fragment with a breadth of (2). The numbers inside the circles represent the number of hops from the root topic of the fragment.

    Figure 4 - The Fragment Construction Algorithm

    Tolog Result Sets

    As already discussed, a tolog result set consists of a collection of topics structured into what can be thought of as a two-dimensional table of results. Simply adding the topics in the result set to the result fragment would not preserve the structure of the result set, which is important information in being able to present the results of the query meaningfully.

    To overcome this problem, two different approaches to enriching the result set were considered. One approach embeds the structure of the result set into the topic map fragment generated by the peer. The second approach separates the description of the result set structure from the topic map fragment generated from the results.

    Enrichment By Adding Topic Map Constructs

    This approach enriches the topic map fragment created from the result set by adding topics that represent the variables and associations that aggregate the topics on each row of the tolog result set.

    For each variable in the 'select...from' part of the tolog query, one topic is created to represent the variable. For each combination of variable bindings that match the query criteria, one association is created to represent that particular combination. The association created has one role for each variable binding in the select clause where the role player is the topic bound to that variable.

    For example, if the tolog query given above resulted in finding the song "I Fought The Law", performed by The Clash, the tolog result set could be represented as the table shown below (the URIs in parenthesis are fictional subject indicators for the result topics).

    A B
    I Fought The Law (http://www.songz.com/songs/I/I_Fought_The_Law) The Clash (http://www.songz.com/artists/C/Clash_The)

    In XTM syntax, this tolog result set would be represented as:

    <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE topicMap SYSTEM "xtm1.dtd" []> <topicMap> <!-- A topic to type the variable-binding association(s) --> <topic id="row"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/result-set-row"/> </subjectIdentity> </topic> <!-- Topics representing the variables in the select clause of the tolog query --> <topic id="var_A"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/variable/A"/> </subjectIdentity> </topic> <topic id="var_B"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://tm4j.org/tolog/variable/B"/> </subjectIdentity> </topic> <!-- Topics in the tolog result set --> <topic id="i_fought_the_law"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://www.songz.com/songs/I/I_Fought_The_Law"/> </subjectIdentity> <!-- Other topic characteristics --> </topic> <topic id="the_clash"> <subjectIdentity> <subjectIndicatorRef xlink:href="http://www.songz.com/artists/C/Clash_The"/> </subjectIdentity> </topic> <!-- This association represents one row in the result set --> <association id="row1"> <instanceOf> <topicRef xlink:href="#row"/> </instanceOf> <member> <roleSpec><topicRef xlink:href="#var_A"/></roleSpec> <topicRef xlink:href="#i_fought_the_law"/> </member> <member> <roleSpec><topicRef xlink:href="#var_B"/></roleSpec> <topicRef xlink:href="#the_clash"/> </member> </association> <!-- Other topic map constructs in the result set fragment --> </topicMap>

    This approach integrates the structure of the tolog result set directly into the topic map that is generated by the query processing. This has the advantage that the structure information can be extracted without the need to implement any new form of parsing. However, this approach also complicates the issue of caching topic map fragments received in response to a tolog query as it is unlikely that the information regarding the structure of a particular result set is required by the client once the tabular display of the results for a particular query is closed. For this reason, it was decided to examine a different approach which separates the result set topic map fragment from the information regarding the structure of the result set.

    Enrichment By A Parallel XML Document

    This approach to representing the results of a tolog query separates the topic map fragment created from the result set from an XML document which describes the result set structure. The topic map and the result set structure document are both then included in the response message generated by the peer after processing the request.

    Using this approach, the response message generated by a TMShare peer on processing a tolog query request has the structure shown below:

    <?xml version="1.0" encoding="UTF-8"?> <tmshare:TologResult xmlns:tmshare="http://jxta.org"> <query> <line>select $A, $B from recorded-by($A:song, $B:artist), written-by($A:song, i"http://www.songz.com/lee_perry") ?</line> </query> <results> <numCols>2</numCols> <vars> <var>A</var> <var>B</var> </vars> <row> <value>http://www.songz.com/songs/I/I_Fought_The_Law</value> <value>http://www.songz.com/artists/C/Clash_The</value> </row> </results> <resultsMap> &lt;xtm:topicMap xmlns:xtm="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink"/&gt; ... &lt;/topicMap&gt; </resultsMap> </tmshare:TologResult>

    It should be noted that the JXTA APIs used by TMShare do not allow the message to be constructed in such a way that it would parse as a well-formed name-spaced XML document. For this reason, the XTM syntax topic map is escaped in the response message, enabling it to be extracted as a single string by the receiving peer for processing through TM4J's XTM parser, while the XML description of the tolog result set structure is accessed through the JXTA APIs.

    This approach enables those subsystems of the TMShare application which are interested in presenting the structure of the tolog result set (such as the tabular display of results) to access that information while not importing that information into the general cache — thus preventing it from cluttering the browser view of the unified topic map.

    Receiving Peer Fragment Filtering and Enrichment

    When a peer receives a fragment in response to a request, the topic map fragment is imported into the local cache. This cache is itself a single topic map, so the import process is a merge of the received fragment with the cache topic map using the standard rules defined by the XTM specification.

    Before being imported, however, the received topic map fragment is enriched by adding to it topic map constructs which enable the provenance of the topic map information to be determined. This provenance information consists of a topic that represents the peer from which the fragment was received and a topic that represents the received message itself. This latter is called the message provenance topic.

    The message provenance topic is used as an added theme when the topic map fragment is merged with the cache topic map. This results in the provenance topic being used as a theme in the scope of every topic characteristic provided by the received topic map fragment. The topic that represents the peer is associated with the message provenance topic making it easy to locate all of the topic map information provided by a specific peer. In addition, a message provenance topic has a single occurrence that specifies the time at which the message was received. In combination, the peer and message provenance topics can be used to:

    The advantage of having this provenance information as part of the topic map used to cache received information is that the standard topic map filtering and extraction tools provided by the TM4J library can be used to implement all of these features and it also makes it possible for later extensions such as peer trust metrics to be implemented without requiring a change in the format of the request and response messages used by peers.

    Up: TMShare - Topic Map Fragment Exchange In a Peer-To-Peer Application
    Previous: The TMShare Application Next: Conclusions