InfoVis.net Magazine, message nº 104. Published 2002-10-21.
The digital magazine of InfoVis.net
Document visualisation is one of the "hot topics" of Information Visualisation. Robert Spence devotes a whole chapter to it in his book "Information Visualization". In it, Spence identifies the types of "questions to which the owners or users of a collection of documents may seek an answer".
In the end, these questions revolve around two main concepts:
Searching for information in a set of documents often becomes an iterative process, in which the answer to a query leads to a reformulation of the query itself and, hence, of what the searcher considers relevant.
Even if current algorithms could return only the truly relevant documents, a refinement mechanism would still be needed, and interaction and visualisation could play a decisive role in it. Spence reports studies showing that the probability that two people spontaneously use the same word for the same concept is only between 7% and 18%.
An example of such a visual tool is TileBars, proposed by Marti Hearst in 1995 to show, simultaneously and in compact form, the relative length of the retrieved documents together with the frequency, distribution and overlap of the query terms within those documents.
To make this clearer, let's suppose that we are searching for the words "Information" + "Visualization". TileBars lets you use terms made up of more than one word that operate as a single term, but for the sake of simplicity we'll use single-word terms. The result of the query is a set of documents retrieved from the database, together with a series of rectangles, each one with a length proportional to the length of the document it represents.
Each rectangle works like a table with one row per query term (two in our case). The columns represent the sections (chapters, paragraphs or other subdivisions) into which the text is divided. Each cell is shaded with an intensity proportional to the frequency of the term in the corresponding section of the text: the darker the cell, the more frequent the term.
With this simple scheme we can get quite a good idea of a document's relevance, its length and what we can find in it. In our example, we might find documents with only a few coloured "tiles" and no column in which both terms appear, which suggests that the document is probably of little relevance. Documents in which both terms appear in the same column with substantial intensity are probably the most relevant ones.
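The length rule can be illustrated with a short sketch. It assumes document length is measured in words and that the longest retrieved document gets a fixed maximum bar width; the function name `bar_lengths`, the width of 40 characters and the sample lengths are hypothetical, chosen only for illustration.

```python
def bar_lengths(doc_lengths, max_width=40):
    """Scale each document's length (in words) to a bar width.

    Assumption for illustration: the longest document gets max_width
    characters and every other bar is proportional to it (minimum 1).
    """
    longest = max(doc_lengths)
    return [max(1, round(max_width * n / longest)) for n in doc_lengths]


# Hypothetical word counts of three retrieved documents.
sample_lengths = [1200, 300, 600]
for words, width in zip(sample_lengths, bar_lengths(sample_lengths)):
    print(f"{words:5d} words |" + "#" * width)
```

A document half as long as the longest one gets a bar half as wide, which is all the eye needs to compare lengths at a glance.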
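The table described above can be sketched in a few lines. This is a minimal sketch, not Hearst's implementation: it assumes each document has already been split into sections (a list of strings), counts whole-word, case-insensitive occurrences of each term, and renders intensity with four hypothetical text shades instead of colours. The names `tilebar`, `render` and `SHADES` are illustrative.

```python
import math
import re

# Light-to-dark shades standing in for colour intensity (an assumption).
SHADES = " .:#"


def tilebar(sections, terms):
    """Return one row per query term; cell j counts the term in section j."""
    rows = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        rows.append([len(pattern.findall(s.lower())) for s in sections])
    return rows


def render(rows):
    """Map each count to a shade; the darker the character, the more frequent the term."""
    peak = max((c for row in rows for c in row), default=0) or 1
    top = len(SHADES) - 1
    return "\n".join(
        "[" + "".join(SHADES[math.ceil(c * top / peak)] for c in row) + "]"
        for row in rows
    )


# Hypothetical document split into four sections.
sections = [
    "Information visualization turns information into visual form.",
    "Visualization techniques for documents.",
    "Information retrieval ranks documents by relevance.",
    "Conclusions.",
]
terms = ["Information", "Visualization"]
print(render(tilebar(sections, terms)))
```

Each printed line is one row (one query term) and each character is one section, so the number of columns also grows with the document's length.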
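TileBars leaves the relevance judgement to the reader's eye, but the reading described above can be expressed as a simple heuristic. The sketch below is an assumption, not part of TileBars: it counts the sections (columns) in which every query term appears and breaks ties by the weakest term count in those columns, as a stand-in for "substantial intensity". The function names and sample matrices are hypothetical.

```python
def overlap_columns(rows):
    """Count sections (columns) in which every query term occurs at least once."""
    return sum(1 for column in zip(*rows) if all(c > 0 for c in column))


def overlap_strength(rows):
    """Sum, over overlapping sections, of the smallest term count in that section."""
    return sum(min(column) for column in zip(*rows) if all(c > 0 for c in column))


# Hypothetical term-by-section count matrices (one row per query term).
docs = {
    "doc_a": [[2, 0, 1, 0], [1, 1, 0, 0]],        # terms meet in one section only
    "doc_b": [[3, 2, 2], [2, 3, 1]],              # terms co-occur throughout
    "doc_c": [[0, 1, 0, 0, 0], [0, 0, 0, 1, 0]],  # scattered tiles, never together
}

ranking = sorted(
    docs,
    key=lambda name: (overlap_columns(docs[name]), overlap_strength(docs[name])),
    reverse=True,
)
print(ranking)
```

A document like `doc_c`, with a few isolated tiles and no shared column, ends up last, matching the intuition that it is probably of little relevance.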
This visualisation scheme is being put to practical use, for example in the freely accessible digital library of the University of California, Berkeley, where it helps users find documents in a government water resources database. You can try it and see the usefulness of TileBars for yourself.
What is most interesting about TileBars is how they differ from traditional information retrieval, which is governed by the concept of relevance. Here serendipity has its own place: there is no need to know the correct keyword in advance. You can enter many terms, perhaps synonyms, and evaluate the contents of the search visually, in a way that may keep changing, much like our own thinking process.