Publishing Solutions

One of the stated goals of the XTM specification is "to improve the findability and manageability of information", and one of the earliest and most obvious applications of topic maps has been the organisation and navigation of large quantities of published information. In the solutions which fall into this category, the topic map typically acts as a highly structured index of statically published information, relating each piece of information to the topic or topics to which it is relevant and then interrelating those topics. For an end-user, the result is a highly intuitive set of cross-linked resources which makes it easier to browse and search for information contained within the corpus.

The IRS Tax Publications

The United States Internal Revenue Service (IRS) produce a large quantity of documentation on tax requirements and tax legislation which is important to individuals and to organisations both large and small. However, each publication is produced and indexed separately. When gathering a collection of publications on a single CD-ROM, the IRS found that they had the problem not only of unifying the indexes but also of providing access to the unified index from within the content of each of the publications. The "2001 Tax Products" CD-ROM produced by the IRS makes use of topic maps to create a navigation layer that spans all of the publications on the CD-ROM. The "entry" to the collection is a traditional contents list, showing the publications on the CD-ROM (see Figure 1).

Figure 1 - IRS 2001 Tax Products Contents

Each publication then has its own contents page, which includes a link to an index generated from the names of the topics found by processing the source of that document (see Figure 2). The index can also be entered from within the document, through links presented to the left of the content (see Figure 3). These links are generated wherever an occurrence of a topic is located within the content, and each link takes the user to a page generated for the topic itself. This page aggregates all of the occurrences of the topic across the entire CD-ROM, giving the user quick access to a whole range of information related to the index term. In the example shown below (Figure 4), browsing the term "Estimated Tax" lets the user jump to information for small businesses, the self-employed or business partners. The publication titles and the content fragments displayed in the index enable the user to quickly choose the most relevant documents.

Figure 2 - Contents Of A Single Publication
Figure 3 - Index Terms In Publication Content
Figure 4 - Links To Occurrences Of The Index Term (Topic)
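
The aggregation step described above, in which a topic page collects every occurrence of an index term across all publications, can be sketched roughly as follows. This is a minimal illustration in Python, not the method used to build the CD-ROM: the record layout, the function name `build_topic_pages` and the sample publication titles and fragments are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical occurrence records: (topic name, publication title, content fragment).
# The values are illustrative only and are not taken from the IRS CD-ROM.
OCCURRENCES = [
    ("Estimated Tax", "Publication A", "fragment on paying estimated tax"),
    ("Estimated Tax", "Publication B", "fragment on estimated tax for partners"),
    ("Depreciation", "Publication A", "fragment on depreciating equipment"),
]


def build_topic_pages(occurrences):
    """Group occurrences by topic so that each topic page lists every place it appears."""
    pages = defaultdict(list)
    for topic, publication, fragment in occurrences:
        pages[topic].append((publication, fragment))
    return dict(pages)


pages = build_topic_pages(OCCURRENCES)

# Render the entries a user would see on the page for one index term.
for publication, fragment in pages["Estimated Tax"]:
    print(f"{publication}: {fragment}")
```

Showing the publication title next to each fragment mirrors the way the index lets a reader pick the most relevant document before following the link.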

The flexibility of the topic map structure enables the index information to carry more than just the locations of occurrences of each index term. Additional relationships to other index terms can also be included within the topic map, allowing both a hierarchical organisation of the index terms as well as facilitating cross-references between terms (see Figure 5).

Figure 5 - Index Term Hierarchy
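
One way to picture these additional relationships is as typed associations between index terms. The sketch below is a simplified Python illustration under stated assumptions: the relation names `narrower` and `see-also`, the helper functions and the sample terms are invented for the example and do not come from the IRS topic map.

```python
# Hypothetical associations between index terms: (term, relation, other term).
TERM_ASSOCIATIONS = [
    ("Tax", "narrower", "Estimated Tax"),
    ("Tax", "narrower", "Self-Employment Tax"),
    ("Estimated Tax", "see-also", "Self-Employment Tax"),
]


def related(term, relation):
    """Return the terms linked from `term` by the given relation type."""
    return [
        other
        for source, kind, other in TERM_ASSOCIATIONS
        if source == term and kind == relation
    ]


def hierarchy_lines(term, depth=0):
    """Build an indented listing of a term and its narrower terms (assumes no cycles)."""
    lines = ["  " * depth + term]
    for child in related(term, "narrower"):
        lines.extend(hierarchy_lines(child, depth + 1))
    return lines


print("\n".join(hierarchy_lines("Tax")))
print("See also:", related("Estimated Tax", "see-also"))
```

Because both the hierarchy and the cross-references are stored as the same kind of record, a new relation type can be added without changing the structure of the data.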

The topic map and the rendering to HTML were created by consultant Michel Biezunski, using his tool Topic Map Loom. The starting point for generating the topic map was the original source documents, which are in SGML. Michel uses Topic Map Loom to extract topics and their occurrences automatically from the source documents. The resulting topic map is then used to generate the navigational aspects of the rendered HTML. In general, topic maps are particularly well suited to situations where a corpus is composed of a set of separately indexed pieces. Topic maps were designed from the start to be mergeable: the individual topic maps for each separate publication can be combined using the merging principles of the topic map paradigm to create a single global topic map.
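
The merging idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Topic Map Loom works: it assumes that two topics represent the same subject whenever their names match exactly, whereas real topic map merging also takes subject identity and name scope into account. The function name `merge_topic_maps` and the locator strings are hypothetical.

```python
def merge_topic_maps(*topic_maps):
    """Merge maps of topic name -> set of occurrence locators.

    Simplifying assumption: topics with identical names are the same subject.
    """
    merged = {}
    for topic_map in topic_maps:
        for name, occurrences in topic_map.items():
            # Union the occurrences of same-named topics; inputs are not modified.
            merged.setdefault(name, set()).update(occurrences)
    return merged


# Hypothetical per-publication topic maps with made-up locators.
pub_a = {"Estimated Tax": {"pub-a.html#s3"}, "Depreciation": {"pub-a.html#s9"}}
pub_b = {"Estimated Tax": {"pub-b.html#s1"}}

global_map = merge_topic_maps(pub_a, pub_b)
print(sorted(global_map["Estimated Tax"]))
```

After merging, the single "Estimated Tax" topic carries the occurrences from both publications, which is what allows one index entry to span the whole collection.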

The IRS were looking for a technology to enable them to smoothly upgrade their existing, single-document indexes into a collection index suitable for publishing on a CD-ROM. They were also looking for a way to represent not only the index terms and their occurrences but also more complex relationships between index terms, to help guide a user through the documentation. Both of these needs could be met by representing these structures with a topic map. The use of the topic map also gives the publishers the flexibility to later enrich the navigation with further cross-references and external references without the need to alter the underlying technology and processes used to generate that navigation.

Cogitech - Topic Maps And XSLT

Cogitech specialize in the production of low-cost intranet and internet web-sites using a combination of XML technologies in which topic maps are used extensively to create the structure of the web-site. As with the Quid web-site, the web pages are generated from topics in the web-site's topic map. However, on the Cogitech web-site (http://www.cogx.com), not only is each page represented by a topic in the topic map, but the pages themselves are constructed by traversing associations from the 'page' topic to other topics, each of which becomes an item on the page. Another difference here is the use of XSLT to transform the structural information in the topic map into navigable links in the published web-site. Topic maps were chosen by the site's designer and implementer, Nikita Ogievetsky, because of the ability not only to represent the web-site as a non-directed network of display elements but also to encode style and behaviour information within the same information structure. Additionally, he found the XML interchange syntax for topic maps to be sufficiently simple to be readily handled with XSLT.
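
The page-assembly approach described above can be illustrated with a small sketch. The Cogitech site itself uses XSLT over the XML topic map syntax; the Python below is only an analogy for the traversal, and the topic identifiers, the `contains` association type, the `style` property and the style-to-tag mapping are all assumptions made for the example.

```python
from html import escape

# Hypothetical topics: each carries a display name and a style hint.
TOPICS = {
    "home": {"name": "Home", "style": "page"},
    "intro": {"name": "Welcome", "style": "heading"},
    "news": {"name": "Latest news", "style": "paragraph"},
}

# Hypothetical associations linking a 'page' topic to the items shown on it.
PAGE_ASSOCIATIONS = [
    ("home", "contains", "intro"),
    ("home", "contains", "news"),
]

# Assumed mapping from the style information in the topic map to HTML elements.
STYLE_TO_TAG = {"heading": "h1", "paragraph": "p"}


def render_page(page_id):
    """Build a page body by following 'contains' associations from the page topic."""
    item_ids = [
        target
        for source, kind, target in PAGE_ASSOCIATIONS
        if source == page_id and kind == "contains"
    ]
    parts = []
    for item_id in item_ids:
        topic = TOPICS[item_id]
        tag = STYLE_TO_TAG[topic["style"]]
        parts.append(f"<{tag}>{escape(topic['name'])}</{tag}>")
    return "\n".join(parts)


html_page = render_page("home")
print(html_page)
```

Keeping the style hint on the topic itself reflects the point made above: structure, styling and behaviour can live in the same information structure, so the rendering code stays generic.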

In addition to using XSLT to generate web-site content from topic maps, Nikita has also developed reusable style-sheets for harvesting topic map information from other forms of metadata. The use of XSLT for producing the hyper-linked pages reduces the cost of production by enabling the use of free tools.

Before applying topic maps to this problem, Nikita investigated the use of XLink but found that it was not sufficiently expressive to represent the styling and behavioural aspects of the site. He also found that RDF and RDF Schema caused different problems: although both are sufficiently expressive, each different site organisation would require a different RDF schema with a different set of elements, and therefore more custom XSLT code.
