InfoVis.net Magazine, message nº 103. Published 2002-10-09
The digital magazine of InfoVis.net
Reading a book is an inspiring but lengthy process, and dealing with many of them is by no means an easy task. Nowadays search engines make it easier to find the right book or the right part of a book, but they neither help us understand it nor let us discover patterns and concepts in arbitrary text.
TextArc is an experimental tool that produces an alternative visualisation of a text. It was designed by W. Bradford Paley, of Digital Image Design Incorporated, with the idea of allowing the user to “get an overview of a medium-sized body of raw text, eg. the amount one could receive in one single day”, that is, raw ASCII text such as e-mail, news, etc.
Indices, summaries, concordances, lexicons and other structured lists have been available and used for many years. Computational linguistics has produced many interesting techniques capable of automatically producing summaries and abstracts and of identifying key ideas.
Graphical techniques have also been developed to show the prevalence of certain words in large collections of documents. As examples we have treemaps and Kohonen maps (see issues 39 and 51). We have already seen other techniques devoted to showing focus and context in a single view (numbers 3 and 85).
TextArc, unlike other approaches, takes into account the original linear order of the text. To do this, it shows the entire text as two concentric spirals on the screen, made up of many lines written in a one-pixel-high font.
Each line corresponds to its counterpart in the text, including all its words. Spacing, chapters, sections, typography, poetry and all the “geometrical” features of a text are preserved so that they become visual landmarks helping the user to identify particular sections of the text.
The spiral occupies only the periphery of the drawing, leaving the centre for the most used words (see the attached drawings). Words that appear more than once are drawn inside the spiral at their average position, the “gravity centre” of the different places in the text where they occur.
A word that has more appearances on the left side of the spiral than on the right one, for example, will be closer to that side. Selecting one of these words with the mouse draws lines that link the word to the places where it occurs in the text, and the corresponding lines are highlighted in the outer spiral. Pointing to a line in the drawing shows its contents.
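The placement rule can be illustrated with a short Python sketch. It is not TextArc's actual code: it assumes a single closed arc instead of two concentric spirals, treats each line of text as one point on that arc, and splits words with a simple regular expression. The names `line_position` and `word_centroids` are hypothetical.

```python
import math
import re
from collections import defaultdict


def line_position(index, total, rx=1.0, ry=1.0):
    """Place line `index` of `total` on an ellipse, clockwise from 12 o'clock.

    Assumption: one closed arc stands in for TextArc's spirals.
    """
    angle = 2 * math.pi * index / total
    return (rx * math.sin(angle), ry * math.cos(angle))


def word_centroids(lines, min_count=2):
    """Return the average arc position of each word used at least `min_count` times."""
    positions = defaultdict(list)
    total = len(lines)
    for i, line in enumerate(lines):
        # Assumption: words are runs of letters and apostrophes, case-folded.
        for word in re.findall(r"[a-z']+", line.lower()):
            positions[word].append(line_position(i, total))
    return {
        word: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for word, pts in positions.items()
        if len(pts) >= min_count
    }
```

Under these assumptions, a word spread evenly through the text ends up near the centre of the drawing, while a word concentrated in one part of the text is pulled towards that part of the arc.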
Words get bolder and brighter in proportion to their use in the text. In the printed version, type size encodes the frequency of use. This intriguing piece of Java code has more features than we have space to describe here. It’s worth playing with it using any text from Project Gutenberg.
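As a rough illustration of the frequency encoding, the following Python sketch maps each word's count to a brightness between 0 and 1, relative to the most used word. The linear scale, the tokenisation and the function name `word_brightness` are assumptions; the description above only says that brightness grows in proportion to use.

```python
import re
from collections import Counter


def word_brightness(text):
    """Map each word to a brightness in (0, 1], proportional to its frequency."""
    # Assumption: words are runs of letters and apostrophes, case-folded.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    # The most used word gets brightness 1.0; the rest scale linearly.
    return {word: n / top for word, n in counts.items()}
```

Sorting the result by value, for example with `sorted(b, key=b.get, reverse=True)`, gives a list of the most used words.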
Particularly interesting is the front end for searching texts in Project Gutenberg’s database. Once you’ve selected the text you are interested in, don’t forget to drop it on the appropriate box to see it in TextArc mode.
After playing for some time with this elegant tool and several Project Gutenberg texts, a few impressions stand out: TextArc provides an unusual way to look at text. You can locate the relevant terms, search for word associations and build lists of the most used words in an instant. Seeing which characters appear most in a novel, and where in the text they appear, is simple and very intuitive.
You can see, for example, that in one book the most used word only appears in 3 chapters, while in other books the most used word is scattered more or less regularly throughout the text. Loading a large text and building the picture does take a while, but it is a price worth paying to get into the full text in a “random access mode” that lets you analyse the document visually and effectively.
I’m not sure whether this tool is the right one for indexing information on everybody’s desktop. The end users will tell us once it is released. What is certain is that its elegant clock metaphor and the ease with which it reveals patterns in documents make it a very good example of a fine piece of Information Visualisation.
See also issue number 25, which covers software visualisation. To some extent TextArc shares features with SeeSoft, a software visualisation tool.