Thesauri

Most of us will be familiar with the thesaurus, from our school days of creative writing if nothing else. A thesaurus is a collection of terms which, rather than being arranged alphabetically, are arranged into groups of synonymous terms. With a thesaurus you can look up a word such as "food" and find synonyms such as "aliment", "board", "chow" and so on.

The thesaurus can also be used to relate terms which are not synonyms of one another. The most common relationship is between a broader term (often shortened to "BT" in thesaurus entries) and a narrower term ("NT") whose definition is in some way more constrained than that of the broader one. For example, the term "science" might have narrower terms such as "physics", "chemistry", "biology" and so on.

A thesaurus can also be used to express a controlled vocabulary: that is, a set of preferred terms, each listed with (as its synonyms) the terms which are deprecated in favour of it. For example, a controlled vocabulary for a repair manual might state that authors should use the term "torque wrench" rather than "torque-wrench". In a thesaurus, this is represented by the "USE" and "USE FOR" relationships between terms: "USE" appears on the entry for a deprecated term and refers to the preferred term, while "USE FOR" appears on the entry for the preferred term and refers to the deprecated terms.

Although the broader/narrower term relation and the use/use-for relation are the most commonly found term relationships, a thesaurus can also be used to represent other kinds of relationships between the terms such as a part/whole relationship.

In addition to synonyms and relationships, terms in a thesaurus may have associated metadata. Two common metadata items for thesaurus terms are the scope note and the warrant. The scope note is information about the term provided by the compiler or editor of the thesaurus. The warrant is a reference to source material which justifies the inclusion of the term in the thesaurus.
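To make the parts of a thesaurus entry concrete, here is a minimal Python sketch of the information described so far: broader and narrower terms, USE and USE FOR references, a scope note and warrants. The `Entry` class, its field names and the helper functions are assumptions made for illustration only; they are not part of any thesaurus standard or of the topic map patterns that follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One thesaurus entry. All field names here are illustrative assumptions."""
    term: str
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    use: Optional[str] = None                           # USE: set on a deprecated term
    use_for: List[str] = field(default_factory=list)    # USE FOR: set on a preferred term
    scope_note: Optional[str] = None                    # written by the compiler or editor
    warrants: List[str] = field(default_factory=list)   # references justifying the term


def build_index(entries: List[Entry]) -> Dict[str, Entry]:
    """Index entries by their term string."""
    return {entry.term: entry for entry in entries}


def preferred_term(index: Dict[str, Entry], term: str) -> str:
    """Follow a USE reference if the term is deprecated; otherwise return it unchanged."""
    entry = index.get(term)
    if entry is not None and entry.use is not None:
        return entry.use
    return term


index = build_index([
    Entry("science", narrower=["physics", "chemistry", "biology"]),
    Entry("physics", broader=["science"]),
    Entry("torque wrench", use_for=["torque-wrench"]),
    Entry("torque-wrench", use="torque wrench"),
])

print(preferred_term(index, "torque-wrench"))  # torque wrench
print(index["science"].narrower)               # ['physics', 'chemistry', 'biology']
```

Keeping USE on the deprecated entry and USE FOR on the preferred entry mirrors the two directions of the relationship as they appear in a printed thesaurus.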

Topic Map Patterns for Thesauri

There are two possible patterns for representing a thesaurus in a topic map. In some respects there is a trade-off to be made between compactness (measured in number of topics) and expressiveness, but your choice between the two may in the end be driven by which model best suits the application at hand.

Thesaurus Pattern 1: The Topic-Per-Term Pattern

In this pattern, the thesaurus is represented by creating a separate topic for each term. Each topic which represents a term should be typed as a "thesaurus term". The term string should be expressed as a base name of the topic. Relationships such as broader/narrower and use/use-for are expressed using associations. If the thesaurus being modelled simply groups synonyms without expressing a preferred term amongst them, this can be expressed as an association in which all of the term topics play the same role. Other thesaurus entry metadata such as scope notes and warrants can be specified as occurrence data on the topic representing the term.

Figure 9 - Thesaurus Pattern 1 - Topic-Per-Term Pattern
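The Topic-Per-Term pattern can be sketched with plain Python objects. This is an in-memory illustration under stated assumptions, not a real topic map engine API: the `Topic` and `Association` classes, the helper functions, and the role identifiers `broader-term` and `narrower-term` are hypothetical names chosen for this example. The other type identifiers reuse the fragment identifiers of the PSIs listed later in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    """A topic representing exactly one thesaurus term (hypothetical class)."""
    base_name: str
    topic_type: str = "thesaurus term"
    # occurrence type -> values, e.g. "scope-note" or "term-warrant"
    occurrences: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Association:
    """A typed association; each member is a (role type, player) pair."""
    assoc_type: str
    roles: List[Tuple[str, Topic]]


def narrower_terms(topic: Topic, associations: List[Association]) -> List[str]:
    """Names of topics playing the narrower role where `topic` plays the broader role."""
    found = []
    for assoc in associations:
        if assoc.assoc_type != "broader-narrower":
            continue
        if any(role == "broader-term" and player is topic for role, player in assoc.roles):
            found.extend(p.base_name for role, p in assoc.roles if role == "narrower-term")
    return found


def synonyms(topic: Topic, associations: List[Association]) -> List[str]:
    """Names of all other players in synonym associations that include `topic`."""
    found = []
    for assoc in associations:
        if assoc.assoc_type != "synonymous-terms":
            continue
        if any(player is topic for _, player in assoc.roles):
            found.extend(p.base_name for _, p in assoc.roles if p is not topic)
    return found


science, physics, chemistry = Topic("science"), Topic("physics"), Topic("chemistry")
food, aliment, chow = Topic("food"), Topic("aliment"), Topic("chow")

# Per-term metadata is possible because every term has its own topic.
physics.occurrences["scope-note"] = ["(placeholder scope note text)"]

associations = [
    Association("broader-narrower", [("broader-term", science), ("narrower-term", physics)]),
    Association("broader-narrower", [("broader-term", science), ("narrower-term", chemistry)]),
    # No preferred term is expressed, so every topic plays the same role.
    Association("synonymous-terms", [("synonym", food), ("synonym", aliment), ("synonym", chow)]),
]

print(narrower_terms(science, associations))  # ['physics', 'chemistry']
print(synonyms(food, associations))           # ['aliment', 'chow']
```

Note that listing the synonyms of a term requires walking the associations, which is the processing cost this pattern pays for being able to attach metadata to each individual term.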

Thesaurus Pattern 2: The Topic-Per-Concept Pattern

This alternative pattern for thesaurus representation eliminates the associations used to relate synonyms in the Topic-Per-Term pattern. Instead, one topic is used to represent the single concept which all of the synonymous entries share. In this model, the topic may have multiple base names, one for each synonym; where preferred terms are expressed, the names of the non-preferred synonyms should be scoped appropriately.

Note: Figure 10 shows the Concept class of topic as having two kinds of name, one name specified in the unconstrained scope, representing a preferred term for the concept, and the other names scoped as a Non-Preferred Term. However, if the thesaurus being modelled does not express preferred and non-preferred terms, all synonymous terms can instead be represented as topic base names in the unconstrained scope.

Figure 10 - Thesaurus Pattern 2 - Topic-Per-Concept Pattern

In this model of the thesaurus, one loses the ability to add warrants, scope notes or other metadata for individual terms. However, when there are several synonyms per concept, this model can lead to a much more compact topic map, and also one which is slightly easier to process: to list all synonyms, one need only enumerate the names of a topic rather than follow associations.
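A minimal Python sketch of the Topic-Per-Concept pattern follows. As before, this is an illustration under assumptions rather than a real topic map API: the `Concept` and `Name` classes and the `NON_PREFERRED` scope marker are hypothetical, and the unconstrained scope is modelled simply as `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNCONSTRAINED = None                  # stands in for the unconstrained scope
NON_PREFERRED = "non-preferred-term"  # hypothetical identifier for the scoping topic


@dataclass
class Name:
    """A base name, optionally scoped as a non-preferred term."""
    value: str
    scope: Optional[str] = UNCONSTRAINED


@dataclass
class Concept:
    """One topic per concept; every synonym is a base name of that topic."""
    names: List[Name] = field(default_factory=list)

    def all_terms(self) -> List[str]:
        """Listing synonyms is just enumerating names; no associations are followed."""
        return [name.value for name in self.names]

    def preferred_terms(self) -> List[str]:
        """Names in the unconstrained scope."""
        return [name.value for name in self.names if name.scope is UNCONSTRAINED]


# A controlled vocabulary: one preferred name, one scoped non-preferred name.
wrench = Concept([Name("torque wrench"), Name("torque-wrench", NON_PREFERRED)])

# A plain synonym group: every name is in the unconstrained scope.
food = Concept([Name("food"), Name("aliment"), Name("chow")])

print(wrench.preferred_terms())  # ['torque wrench']
print(wrench.all_terms())        # ['torque wrench', 'torque-wrench']
print(food.preferred_terms())    # ['food', 'aliment', 'chow']
```

Three terms need only one topic here, against three topics and an association in the Topic-Per-Term pattern, but there is no longer a per-term topic on which to hang a scope note or warrant.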

PSIs for the Thesaurus Pattern

Published Subject Indicators for Modelling Thesauri

Scope Note

http://www.techquila.com/psi/thesaurus/#scope-note

A type of occurrence of a Term which either references or contains information related to the thesaurus term. Typically scope notes are provided by the thesaurus compiler or editor. The scope note resource may be either contained inline as resource data or referenced from an occurrence of this type.

Term Warrant

http://www.techquila.com/psi/thesaurus/#term-warrant

A type of occurrence of a Term which references a source which justifies the use of the term in the thesaurus. A warrant may be referenced either by a link or by a citation entered as inline resource data in the occurrence. Multiple warrants should be modelled using one occurrence of this type per warrant.

Broader-Term / Narrower-Term

http://www.techquila.com/psi/thesaurus/#broader-narrower

A type of association between two Term instances. In an association of this type, one topic must play the role type Broader Term, and one topic must play the role type Narrower Term.

Synonymous Terms

http://www.techquila.com/psi/thesaurus/#synonymous-terms

A type of association between two or more Term instances which asserts that all of the terms represented by the topics which are role players in the association are considered synonymous by the thesaurus. In an association of this type, EITHER all topics must play the role type Synonym OR one topic must play the role type Preferred Term and one topic must play the role type Non-Preferred Term.

Synonym

http://www.techquila.com/psi/thesaurus/#synonym

A type of association role. In an association of the type Synonymous Terms, this role indicates that the role player or role players are all considered to be synonymous with one another.

Part

http://www.techquila.com/psi/thesaurus/#part

A type of association role. In an association of the type Part-Whole, the role player or role players are all considered to be components which go to make up the player of the Whole role.

Whole

http://www.techquila.com/psi/thesaurus/#whole

A type of association role. In an association of the type Part-Whole, the player of this role is the topic which represents the thing which is constructed of the components represented by players of the Part role.
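The role constraints stated above for the Broader-Term / Narrower-Term and Synonymous Terms association types lend themselves to a mechanical check. The sketch below is one possible reading of those constraints, not a normative validator. The function name and the role identifiers `broader-term`, `narrower-term`, `preferred-term` and `non-preferred-term` are assumptions for illustration. It also assumes that a preferred-term grouping has exactly one Preferred Term player and one or more Non-Preferred Term players, which the definition above does not spell out.

```python
from collections import Counter
from typing import List


def roles_are_valid(assoc_type: str, role_types: List[str]) -> bool:
    """Check the role types played in one association (one entry per role player)."""
    counts = Counter(role_types)

    if assoc_type == "broader-narrower":
        # Exactly two players: one broader term and one narrower term.
        return counts == Counter({"broader-term": 1, "narrower-term": 1})

    if assoc_type == "synonymous-terms":
        if len(role_types) < 2:
            return False  # the association relates two or more terms
        if set(counts) == {"synonym"}:
            return True   # EITHER every player is a plain synonym
        # OR (assumed reading) one preferred term plus one or more non-preferred terms.
        return (
            set(counts) == {"preferred-term", "non-preferred-term"}
            and counts["preferred-term"] == 1
        )

    raise ValueError(f"unknown association type: {assoc_type}")


print(roles_are_valid("broader-narrower", ["broader-term", "narrower-term"]))  # True
print(roles_are_valid("synonymous-terms", ["synonym", "synonym", "synonym"]))  # True
print(roles_are_valid("synonymous-terms", ["synonym", "preferred-term"]))      # False
```

Mixing the Synonym role with the preferred and non-preferred roles is rejected because the definition presents the two forms as alternatives.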
