

The digital magazine of InfoVis.net

by Juan C. Dürsteler [message nº 104]

TileBar is a visualisation technique for document search that allows the user to get a clearer idea of what has been retrieved by the search engine, adding serendipity (accidental discovery) to the concept of relevance.
Fig. 1 TileBar. View of the output of a query to a medical database with three terms: osteoporosis, prevention and research. Documents appear on the right, with the corresponding TileBars aligned on the left.
Image as it appears in the examples on Marti Hearst's pages.

Document visualisation is one of the “hot topics” of Information Visualisation. Robert Spence devotes a whole chapter of his book “Information Visualization” to it. There Spence identifies the type of “questions to which the owners or users of a collection of documents may seek an answer”:

  • What documents can be of interest to me?

  • What other documents are close enough to my interest to be considered as well?

  • Are there any other documents whose titles could trigger valuable ideas for my search?

  • How are my keywords actually distributed in a particular document?

In the end these questions revolve around two main concepts:

  • Relevance* (what is directly interesting to me)

  • Serendipity* (what I wasn’t searching for but could also interest me)

Searching for information in a set of documents often becomes an iterative process, in which the answer to a query leads to a reformulation of the query itself and, hence, of what is relevant to the searcher.

Even if current algorithms could return only the truly relevant documents, a refinement mechanism would still be needed, one in which interaction and visualisation could play decisive roles. Spence reports that, according to some studies, the probability that two people spontaneously use the same word for the same concept is only between 7% and 18%.

An example of a visual tool in this sense is TileBar. It was originally proposed by Marti Hearst in 1995 with the goal of showing, simultaneously and in compact form, the relative length of the retrieved documents along with the frequency, distribution and overlap of the query terms within those documents.

To clarify this, let’s suppose that we are searching for the words “Information” + “Visualization”. TileBar also allows terms made up of more than one word that operate as a single term, but for the sake of simplicity we’ll use single-word terms. The result of the query would be a set of documents retrieved by the database along with a series of rectangles, each with a length proportional to the length of the document it represents.

Every rectangle is like a table with as many rows as the query terms we have entered, 2 in our case. Columns represent the sections (chapters, paragraphs or other subdivisions) into which the text is divided. Each cell is painted in a colour whose intensity is proportional to the frequency of the term in the corresponding section of the text: the darker the cell, the more frequent the term.
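The table described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Hearst's implementation: the function names `term_counts` and `render`, the four-character shade scale and the sample sections are all assumptions made for this example, and the document is assumed to be already split into sections.

```python
import re

def term_counts(sections, terms):
    """Return one row per query term and one column per section, holding raw counts."""
    grid = []
    for term in terms:
        # Whole-word, case-insensitive match; works for multi-word terms too.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        grid.append([len(pattern.findall(section)) for section in sections])
    return grid

SHADES = " .:#"  # lightest to darkest; an arbitrary four-level scale (assumption)

def render(grid):
    """Turn counts into shade characters: the darker the character, the more frequent the term."""
    levels = len(SHADES) - 1
    peak = max((count for row in grid for count in row), default=0)
    lines = []
    for row in grid:
        if peak == 0:
            lines.append(SHADES[0] * len(row))
        else:
            # Ceiling division, so that any non-zero count gets a visible shade.
            lines.append("".join(SHADES[(count * levels + peak - 1) // peak] for count in row))
    return lines

# Hypothetical three-section document and the two query terms used in the text.
sections = [
    "Information is everywhere. Information overload.",
    "Visualization helps.",
    "Information visualization.",
]
grid = term_counts(sections, ["information", "visualization"])
for line in render(grid):
    print("[" + line + "]")
```

Here the intensity is scaled to the largest count in the grid, which is one possible choice among several; a real tool might use fixed thresholds instead.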

TileBars of two among the many documents that you can get in a query like the one proposed in the text.
The upper row indicates the frequency of the word "Information" in each section of the document. The lower row shows the same for "Visualization".
In document 1 there is no section of the text where both words can be found together.
In document 2, which is shorter than document 1, there are three sections where both words coexist, which is probably more interesting for us if we are looking for data related to "Information Visualization".

With this simple scheme we can get quite a good idea of the relevance of a document, its length and what we can find in it. In the proposed example, we could find documents with few coloured “tiles” and no overlap of the terms in the same column, which indicates that the document is probably of little relevance. Those documents where both terms appear in the same column with substantial intensity are probably the most relevant ones.
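The judgement the eye makes here, looking for columns where all the terms are coloured at once, can be approximated in code. The following is a minimal sketch under stated assumptions: `overlap_columns` is a hypothetical helper, the two count grids are invented data that merely mimic the two documents of the figure, and ranking by the number of overlapping sections is just one simple heuristic, not a method taken from TileBars itself.

```python
def overlap_columns(grid):
    """Return the indices of the sections (columns) in which every query term occurs."""
    return [
        index
        for index, column in enumerate(zip(*grid))
        if all(count > 0 for count in column)
    ]

# Invented term counts: one row per query term, one column per section.
results = {
    "doc1": [[3, 1, 0, 0, 0, 0],   # "Information"
             [0, 0, 0, 2, 1, 0]],  # "Visualization"
    "doc2": [[1, 2, 0, 1],
             [2, 1, 0, 3]],
}

# Documents with more sections containing all the terms come first.
ranked = sorted(results, key=lambda name: len(overlap_columns(results[name])), reverse=True)

for name in ranked:
    print(name, overlap_columns(results[name]))
```

With these invented grids, doc1 has no section containing both terms while doc2 has three, so doc2 is listed first. A ranking like this captures only relevance; the visual display still leaves room for the reader to notice unexpected patterns.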

This visualisation scheme is being put to practical use, for example in the freely accessible digital library of the University of California, Berkeley, where it helps users find documents in a government water resources database. You can try it and see the usefulness of TileBars for yourself.

What is most interesting about TileBars is how they differ from conventional information retrieval, which is ruled by the concept of relevance alone. Here serendipity has its own place. There’s no need to know the correct keyword in advance: you can enter many terms, perhaps synonyms, and visually evaluate the results of the search in a way that may change constantly, much as our thinking does.

  • Relevance: see the definition in the glossary (linked below)
  • Serendipity: see the definition in the glossary (linked below)

Links of this issue:

http://www.sims.berkeley.edu/~hearst/tb-example.html   Examples in Marti Hearst's webpage
http://www.sims.berkeley.edu/~hearst   Marti Hearst's personal page
http://www.infovis.net/printRec.php?rec=llibre&lang=2#InformationVisualisation   The book "Information Visualization", by Robert Spence
http://www.sims.berkeley.edu/~hearst/tb-overview.html   Overview of what TileBars are
http://elib.cs.berkeley.edu/tilebars/   Digital Library of the University of California, Berkeley
http://www.infovis.net/printRec.php?rec=glosario&lang=2#Relevancia   Definition of Relevance in the glossary
http://www.infovis.net/printRec.php?rec=glosario&lang=2#Serendipia   Definition of Serendipity in the glossary
© Copyright InfoVis.net 2000-2018