A common modelling decision in creating a topic map is when to use an association with 3 or more roles (an n-ary association) and when to represent it as n-1 binary associations. Herewith a discussion on the relative merits of the two forms and some pointers (ok, opinions) on the Right Thing To Do.
In many cases in creating topic maps we are presented with the issue of how to represent n-way associations. Some examples could be:
The issue that comes up is whether to code these relationships in a single multi-legged association (an n-ary association) or several two-way associations (binary associations). There are trade-offs to be made, but in my opinion the first rule of thumb is:
Smaller is Better
Or more specifically, "More granular is better": the smaller the statements we make, the more control we have over them. Breaking statements up without creating new topics gives us the ability to apply metadata to those statements individually and to query, traverse and modify one statement without any impact on or concern for the others.
Of course, there is a point of diminishing returns, and it comes when you need to start adding new classes of entity to your model to be able to split up n-way associations. In general, if you can break up an n-way association without creating new topics, do it. If you need to create a new topic to break up an association, it is likely that you are creating a topic that represents the fact of the association. If you end up needing that, all well and good, but in most cases it is something to be avoided: once you start down this reification route, it's hard to know when to stop.
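A minimal Python sketch of the first rule, assuming a hypothetical in-memory representation (the `Association` class and `split_on_hub` function are illustrative only, not part of XTM or any topic map engine). An n-ary association with a single "hub" player, such as an author linked to several works, is split into n-1 binary associations that reuse the existing topics, so no new topic is needed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical minimal model: an association type plus (role type, player) pairs.
@dataclass(frozen=True)
class Association:
    type: str
    roles: Tuple[Tuple[str, str], ...]

def split_on_hub(assoc: Association, hub_role: str) -> List[Association]:
    """Split an n-ary association into n-1 binary associations that all
    share the single player of hub_role. Only existing topics are reused."""
    hub_indices = [i for i, (role, _) in enumerate(assoc.roles) if role == hub_role]
    if len(hub_indices) != 1:
        raise ValueError(f"expected exactly one player of role {hub_role!r}")
    h = hub_indices[0]
    hub = assoc.roles[h]
    # Pair the hub with every other role player, one association each.
    return [
        Association(assoc.type, (hub, other))
        for i, other in enumerate(assoc.roles)
        if i != h
    ]

# One 3-role association: an author and two works (topic ids are made up).
written = Association(
    "authorship",
    (("author", "kal"), ("work", "book-a"), ("work", "book-b")),
)
for binary in split_on_hub(written, "author"):
    print(binary.roles)
```

Each resulting association can then carry its own scope or other metadata, and removing one leaves the others untouched.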
The second rule of thumb I follow is to ask:
"Is the association divisible without creating another topic?"
In other words, would it make sense to divide up the association into smaller (typically binary) associations?
A third useful rule of thumb is:
"Does the presence of one player of a given role have any bearing on the presence of the other players?"
In other words, if one player were removed, would the statement being made suddenly become untrue (rather than just incomplete)?
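The rules of thumb can be summarised as a small decision function. This is a sketch only: the function name and the `Arity` enum are hypothetical, and both inputs are judgements the modeller has to make about the world, not something the code can work out. For example, an author linked to several books stays true (if incomplete) when one book is removed, which points towards binary associations.

```python
from enum import Enum

class Arity(Enum):
    BINARY = "split into binary associations"
    N_ARY = "keep a single n-ary association"

def choose_arity(divisible_without_new_topic: bool,
                 players_interdependent: bool) -> Arity:
    """Encode the rules of thumb as a decision.

    - If removing one player would make the statement untrue (not merely
      incomplete), the players are interdependent: keep one association.
    - If the association cannot be divided without inventing a topic that
      stands for the association itself, keep one association.
    - Otherwise, smaller is better: split it.
    """
    if players_interdependent or not divisible_without_new_topic:
        return Arity.N_ARY
    return Arity.BINARY

# Author and several books: divisible, players independent.
print(choose_arity(divisible_without_new_topic=True, players_interdependent=False))
```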
So, with those three rules of thumb in hand... let's play the "Binary or N-Ary Game"!
Conclusion
Modelling associations is best done with a bit of thought. Although the temptation is to just stuff as much as possible into a single association (especially when writing XTM syntax by hand), using small associations where possible gives you more flexibility in the long run as it allows greater control over attaching metadata to specific statements.
More granular associations also allow a great deal more clarity. Allowing the author to be explicit about whether role players are interdependent or not is important, and using standard topic map machinery to do so means that you need not depend on an ontology description to make clear what the topic map model is already capable of expressing.
Thinking about the arity of associations at the time you are constructing your topic map ontology will reap benefits in the long run.
Posted by Kal at August 13, 2003 03:20 PM
It is interesting to compare these cases with good modeling practices for relational databases. Specifically, I mean third normal form, which is about as far as most people will go in normalizing data. A comparison is very appropriate, since a topic map can be thought of as a collection of sparse relational tables.
A table is called a "relation" by theorists like C. J. Date, and each of its rows is a tuple. Each cell in a row of a properly normalized table is supposed to depend only on the row's primary key (which may be compound). Translating this to associations, the table name becomes the name of the association type, and the column names become the role type names.
Now in third normal form, none of the columns are supposed to depend on each other but only on the primary key, although they presumably are logically related somehow. For example, if there are columns for first and last names, for a given row a first name of "Bill" should not logically entail a last name of "Smith". They go together all right, but only because they both belong to that particular individual.
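The "Bill"/"Smith" point can be checked mechanically against sample data. The sketch below (the `fd_holds` function and the rows are hypothetical) tests whether one column functionally determines another: the key fixes the last name, but a first name of "Bill" does not entail a last name. Note that sample data can refute a dependency but never prove one for all future data.

```python
from typing import Dict, Hashable, Iterable, Mapping

def fd_holds(rows: Iterable[Mapping[str, Hashable]], lhs: str, rhs: str) -> bool:
    """Return True if, in these rows, every value of column lhs is paired
    with exactly one value of column rhs (the dependency lhs -> rhs)."""
    seen: Dict[Hashable, Hashable] = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        if key in seen and seen[key] != value:
            return False  # same lhs value, different rhs value
        seen[key] = value
    return True

# Hypothetical sample rows keyed on "id".
people = [
    {"id": 1, "first": "Bill", "last": "Smith"},
    {"id": 2, "first": "Bill", "last": "Jones"},
    {"id": 3, "first": "Anne", "last": "Smith"},
]
print(fd_holds(people, "id", "last"))     # each key fixes one last name
print(fd_holds(people, "first", "last"))  # "Bill" does not entail "Smith"
```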
Good practice in relational modeling does not call for reducing everything to binary relations, which would create a lot of tables and reduce performance. But it does call for good normalization (and sometimes denormalization later for practical reasons of performance).
For the case of the author who wrote several books, a relational design would have a separate table that held the list, since no cell in a classic relational table can contain a compound value like a list. Each row in the table holding the books would in essence represent a binary association.
The case of multiple members of the finance committee is actually just the same as the list of books, and a relational design would probably treat it the same way. So relational design in essence does convert this one to binary form, as well.
Well, actually, these so-called "binary" relations are _triples_ in the RDF sense, having two columns (two role players) and one table (one association or predicate).
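A sketch of the author/books case under stated assumptions (the data, function names and the predicate name "wrote" are hypothetical): a list-valued cell is flattened into a two-column join table, and each row is then read as an RDF-style triple, with the table name acting as the predicate and the two columns as subject and object.

```python
from typing import Dict, List, Tuple

# Hypothetical unnormalized data: one author "cell" holding a list of books.
authors_books: Dict[str, List[str]] = {
    "author-1": ["book-a", "book-b", "book-c"],
}

def to_join_rows(data: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten list-valued cells into a two-column join table:
    one (author, book) row per book."""
    return [(author, book) for author, books in data.items() for book in books]

def to_triples(rows: List[Tuple[str, str]], predicate: str) -> List[Tuple[str, str, str]]:
    """Read each join-table row as a (subject, predicate, object) triple."""
    return [(subject, predicate, obj) for subject, obj in rows]

rows = to_join_rows(authors_books)
for triple in to_triples(rows, "wrote"):
    print(triple)
```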
However, sometimes it is necessary to add additional columns to a table that would otherwise be a simple join table (i.e., a triple). Such columns are needed if there is something unique to the association of the two entities being related. I have seen this from time to time myself.
Also notice that for most ordinary relational tables, the cell contents are more like occurrences. It is the join tables that are most like topic map associations, and they are very like them indeed.
Posted by: Tom Passin at August 14, 2003 06:13 AM
The comparison with relational tables is an interesting one. Not being a big database-head, I am not aware of anything in the relational model that distinguishes between a divisible and an indivisible aggregation, and I think your comment that the author/books and committee/decision relations would be implemented in the same way bears that out.
Being explicit about the atomicity of a relation (by expressing those relations which are atomic in your world-view as a single association and those which are divisible as separate associations) gives your association constructs an extra level of semantics which could be useful in application/presentation terms.
Of course, these semantics could be made part of the ontology and the decision to structure associations one way or another is an ontological commitment (whether it is documented or not!) - but I can see some distinct advantages for interchange if we can establish a best practice of creating only associations which are atomic in the world-view of the author.
Posted by: Kal at August 14, 2003 09:44 AM
I think that the theory and practice of relational databases has a lot to offer would-be designers of topic maps. The relational folks have been developing their field for over 30 years, and it is well for us to get something out of it.
The best book I know about relational modeling is "Designing Quality Databases with IDEF1X Information Models", by Thomas Bruce. I will quote a bit from his discussion of normalization.
"According to formal mathematical theory, the goal of normalization is to ensure that there is only one way to know a fact. Thus, the technical process of normalizing a model removes all structures that provide more than one way to know the same fact."
First Normal Form requires all lists and other nested structures to be dismembered so that a single cell contains a single atomic value: no lists of committee members.
In Second Normal Form, each attribute (cell) that is not part of the primary key contains a fact about the entire key (which might be compound). For example, if the relation represents a marriage, the key might be the tuple (man, wife). Then no cell in a row should contain information about just the wife, for example.
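The marriage example can be turned into a small second-normal-form check. This is a sketch under stated assumptions: the function names, column names and sample rows are hypothetical, and checking single columns only covers every proper subset when the key has two columns, as here. A wife's birthplace depends on the wife alone, so it is flagged; the wedding year needs the whole (man, wife) key.

```python
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

def depends_on(rows: List[Mapping[str, Hashable]],
               determinant: Sequence[str], attr: str) -> bool:
    """True if, in these rows, the determinant columns always fix one value of attr."""
    seen: Dict[tuple, Hashable] = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        if seen.setdefault(key, row[attr]) != row[attr]:
            return False
    return True

def partial_dependencies(rows: List[Mapping[str, Hashable]], key: Sequence[str],
                         non_key: Sequence[str]) -> List[Tuple[str, str]]:
    """List (attribute, key column) pairs where a non-key attribute depends on
    a single column of a two-column key, which violates second normal form."""
    return [(attr, col) for attr in non_key for col in key
            if depends_on(rows, [col], attr)]

# Hypothetical rows; the same wife appears twice so the data can expose the dependency.
marriages = [
    {"man": "Carl", "wife": "Ann", "wedding_year": 1990, "wife_birthplace": "Leeds"},
    {"man": "Dave", "wife": "Ann", "wedding_year": 2001, "wife_birthplace": "Leeds"},
    {"man": "Carl", "wife": "Eve", "wedding_year": 1995, "wife_birthplace": "York"},
]
print(partial_dependencies(marriages, ["man", "wife"], ["wedding_year", "wife_birthplace"]))
```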
In Third normal form, every (non-key) attribute depends on the entire key and does not depend on any attribute of any other entity (e.g., topic or association).
Note that you can be rigorous about applying normalization rules but still get the underlying rules (the model or the business rules) wrong!
Relational database practice also emphasizes the importance of names (of tables, columns, and datatypes), which is a bit ironic for TM and RDF applications: the names are metadata to help humans understand the intent, whereas we might imagine that machine-readable metadata in the map or graph would be enough. In reality, we will probably never be able to express enough metadata, and so naming conventions will remain important.
There are a lot of subtleties in data modeling for practical applications, and I think that they are sure to arise in TM models as well. However, the (eventual) constraint language may turn out to be very helpful in handling some of the difficulties that relational folks have to handle with a combination of table design and relational integrity rules.
Posted by: Tom Passin at August 14, 2003 08:39 PM
Another rule of thumb I use is whether or not you want to say anything more about the group of topics. For example, does it have a name that you'd like to capture? If so, make a topic with binary associations to it. If you can't think of anything to say about it, maybe it is best to do it as an n-ary.
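A minimal sketch of this rule, assuming hypothetical `Topic`/`Association` classes and a made-up "membership" association type: when the group has a name worth capturing, it becomes a topic of its own and each member gets a binary association to it; otherwise the members stay in a single n-ary association.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Topic:
    id: str
    name: Optional[str] = None

@dataclass(frozen=True)
class Association:
    type: str
    roles: Tuple[Tuple[str, str], ...]

def model_group(members: List[str], group_id: str,
                group_name: Optional[str]) -> Tuple[List[Topic], List[Association]]:
    """Create a group topic plus one binary association per member when there is
    something to say about the group (here, a name); otherwise keep one n-ary."""
    if group_name is not None:
        group = Topic(group_id, group_name)
        links = [Association("membership", (("group", group.id), ("member", m)))
                 for m in members]
        return [group], links
    return [], [Association("membership", tuple(("member", m) for m in members))]

topics, assocs = model_group(["alice", "bob", "carol"], "finance-committee", "Finance Committee")
print(len(topics), [a.roles for a in assocs])
```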
I very much agree with the general principle that binary associations should be preferred over n-ary ones.
Posted by: Lars Marius Garshol at September 17, 2003 03:59 PM