InfoVis.net Magazine, message nº 105. Published 2002-10-31
The digital magazine of InfoVis.net
The need to extract knowledge automatically from large databases is becoming more and more pressing: data accumulates continuously, and processing it consumes an ever-increasing amount of resources.
Data mining is one of the answers to this problem. Usama Fayyad, in the article “From Data Mining to Knowledge Discovery in Databases” (available in PDF format), defines Knowledge Discovery [in Databases, or KDD] and Data Mining as:
Fayyad and his colleagues set out a series of important concepts that lead to an operational definition of Knowledge, in a way that can be formalised mathematically. It’s worth reviewing them (in abridged form; see the book “Information Visualisation in Data Mining and Knowledge Discovery”, chap. 21).
All these concepts finally lead to the important concept of the “interestingness” of a pattern. It is defined as a combination of Validity, Novelty, Utility and Understandability that allows us to assess and classify patterns.
i = I(E, F, N, U, S)
Needless to say, some aspects of this concept require human judgement, since they admit no objective quantification. Interestingness is fundamental to the definition of Knowledge.
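As a rough illustration of how such a combination might be computed, here is a minimal sketch in Python. It assumes, purely for the sake of the example, that each of the four qualities has already been scored between 0 and 1 (some of them by a person) and that the function I is a weighted average. The names `PatternScores` and `interestingness`, and the equal default weights, are hypothetical and are not taken from Fayyad's article.

```python
from dataclasses import dataclass


@dataclass
class PatternScores:
    """Scores in [0, 1] for one pattern (hypothetical structure).

    Some of these, such as understandability, would in practice
    be assigned by a human rather than computed.
    """
    validity: float
    novelty: float
    utility: float
    understandability: float


def interestingness(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four scores into a single number in [0, 1].

    Assumption: a weighted average stands in for the function I.
    The source does not prescribe any particular form.
    """
    values = (
        scores.validity,
        scores.novelty,
        scores.utility,
        scores.understandability,
    )
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("every score must lie between 0 and 1")
    if len(weights) != len(values) or sum(weights) <= 0:
        raise ValueError("four weights with a positive sum are required")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Rank two made-up patterns by their combined score.
candidates = {
    "pattern 1": PatternScores(0.8, 0.4, 0.6, 1.0),
    "pattern 2": PatternScores(0.9, 0.1, 0.2, 0.5),
}
ranking = sorted(candidates, key=lambda k: interestingness(candidates[k]), reverse=True)
```

Changing the weights changes the ranking, which is one way of making explicit the subjective part of the assessment.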
Although this may seem a definition far removed from our everyday experience of knowledge, in reality it is not. Knowledge is made up of the patterns we have learnt to detect and have stored because we can apply them to new data and, hence, predict the behaviour of phenomena or of the people around us.
This is where the utility of knowledge comes from. A clear example is medical diagnosis. Every illness has a set of symptoms, a pattern, that differentiates it from other illnesses and allows the physician to diagnose it and prescribe the appropriate treatment. It takes years to build up the repertoire of clinical patterns that makes him or her a good diagnostician.
Other fields offer similar examples. In financial databases, fraud follows patterns that deviate from the usual behaviour of legitimate transactions. In marketing, it is important to discover groupings of users and their behaviour in order to define specific products and/or services with predictable results. For example, users who buy both item A and item B will probably also buy item C.
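The "A and B, then probably C" example can be made concrete with the two measures usually attached to such a rule, support and confidence. The sketch below is only illustrative: the baskets are invented toy data, and the helper names `support` and `confidence` are assumptions of this example, not part of any particular library.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction
    that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Invented toy data: each set is one customer's purchase.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B"},
]

# Rule under test: buyers of A and B also buy C.
rule_support = support(baskets, {"A", "B", "C"})
rule_confidence = confidence(baskets, {"A", "B"}, {"C"})
```

In this toy data, three baskets contain both A and B, and two of those also contain C, so the rule holds for two out of three of the customers it applies to. Whether that is high enough to be useful is a judgement that depends on the application.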
In the end, Knowledge is not as magical as it sometimes appears: we have the means to approach it and to find interesting patterns in many fields.