New data-mining tools look for hidden information in your databases
Rainer Mauth
Know thy customer. This is a mantra successful companies chant constantly. The best marketing campaigns rely on comprehensive customer databases. The more a company knows about an individual customer -- intricate details about his or her buying patterns and personal preferences -- the more likely it will be able to sell a product or service to that person.
But being successful in business is not just a matter of finding the right target groups. It also means making the right decisions with the lowest risk and the best outcome. Take, for example, a bank evaluating the risk level of loan applications from small- and medium-size companies. Its overall objective is to accept the maximum number of applications while bearing the minimum risk. The analysis is based on already-accepted credit cases stored in a database. What the credit analysts need is a software package that computes the most pertinent criteria for determining the risk level of businesses making applications.
What these two examples have in common is their need for tools that can mine large databases, recognize patterns within the database entries, and map them to the appropriate discrete output levels. It is important to realize that these patterns are not sharply defined cases. Elements that belong to one pattern share a certain level of similarity: their properties are not identical but similar.
Many industrial problems (e.g., process monitoring or quality control) have the same kind of requirements. Comparable process states or quality defects need to be matched with discrete measures. Fuzzy-logic programs analyze and control the states of machines and entire plants.
In complex data-analysis cases, classic statistical programs fail because mathematical models to fit the data are often not available. Therefore, many of today's software packages for data mining and analysis are based on neural networks and fuzzy methods. But there are also systems on the market that feature sophisticated methods such as probabilistic decision-tree techniques.
Objects and Their Attributes
In general, data analysis considers objects and attributes that describe the objects. For example, objects can be time series, sensor signals, process states, or customers. Attributes are properties of the object. Each object has numerous representations -- the actual data to be analyzed -- with specific attribute values. For simplicity, representations of objects are also referred to as objects. The core of data analysis is grouping similar objects into one class and, correspondingly, assigning dissimilar objects to different classes. This formation of object classes reduces the complexity of the data and represents the actual search for structure in data. Data-mining tools automate this process.
"Data mining in itself is a `fuzzy' description, and definitions vary widely," asserts Damien Brenier of ISoft (Gif sur Yvette, France). "However, it is essential to distinguish between supervised and unsupervised techniques." Supervised techniques are applicable only if sample data is available that can serve as a source for the system to understand the correlation of attributes and to reveal relevant criteria for the evaluation of new sets of data.
The classification of loan risks mentioned above is an example of a supervised query. It requires a user's intelligence to determine certain risk levels (e.g., low, average, or high) through attributes such as cash flow, profit after tax, sales, or the reputation of the company. After this determination, the system is able to name the pertinent attributes for a low risk level. For example, say the decisive factors for risk evaluation are profit and cash flow. When evaluating new applications, the credit analyst can then focus on these two criteria.
Unsupervised mining automatically separates a given set of objects into clusters. It requires no additional information, but it does not allow for the reduction of attributes. An example of unsupervised analysis is the separation of target groups within a customer database, without any extra information about purchasing patterns.
One way to classify data-mining tools is by their underlying analysis methods. They can feature statistical methods, fuzzy logic, neural networks, decision-tree methods, and arbitrary combinations of them. Some methods may be more appropriate for certain applications, but there is no general relationship between analysis methods and fields of application. "It is important to have several methods merged into one tool," says Richard Weber of MIT (Aachen, Germany). "But it requires basic expertise to use the right combination of analysis methods."
Clearly, reducing complexity via data analysis can't produce an unambiguous mapping of objects to classes or clusters. There are always samples that belong only to a certain extent to one class or the other. Some data sets may also change their attributes over time, and different segmentation runs will consequently reveal a different classification of some cases.
Unclear Data Sets
Fuzzy-clustering methods often help to deal with these unclear data sets because they allow one object to belong, to a certain degree, to more than one cluster. This kind of fuzzy membership is the key behind data analysis based on fuzzy logic. Systems such as MIT's DataEngine, which feature fuzzy methods, offer additional information for the user: the probability for one object to belong to a certain cluster. However, once they are developed, fuzzy systems are not able to adapt to changing conditions and optimize the analysis by taking new data into account.
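To make graded membership concrete, here is a minimal Python sketch of the membership formula used by fuzzy c-means, one common fuzzy-clustering algorithm. It is not DataEngine's implementation. The function name, the two cluster centers, and the customer data points are invented for illustration, and the cluster centers are simply assumed to be known already.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Return the degree to which `point` belongs to each cluster center.

    Uses the standard fuzzy c-means membership formula with fuzzifier m > 1.
    The degrees lie between 0 and 1 and sum to 1.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# Hypothetical customer data: (orders per year, average order value in DM).
centers = [(2.0, 50.0), (12.0, 200.0)]
borderline_customer = (7.0, 125.0)   # halfway between the two centers
typical_customer = (2.5, 55.0)       # close to the first center

print(fuzzy_memberships(borderline_customer, centers))
print(fuzzy_memberships(typical_customer, centers))
```

The borderline customer receives an equal degree of membership in both segments, while the typical customer belongs almost entirely to the first. A sharp yes/no clustering would have to force the borderline case into one segment.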
The concept behind DataEngine is to compensate for these disadvantages by integrating neural networks. Neural networks can work in supervised and unsupervised modes. They can modify a classification on the fly by taking new data sets into consideration (i.e., learning). A further advantage is that neural networks work without a specific mathematical model and are therefore often used for forecasting nondeterministic systems such as traffic density on highways (see "The Road Less Traveled," October BYTE) or stocks and exchange rates.
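As a rough illustration of learning "on the fly," the following Python sketch updates a single artificial neuron (a perceptron) one sample at a time. This is the simplest possible neural model and is not meant to represent the networks DataEngine uses. The class name, the learning rate, and the loan-style data stream are assumptions made up for the example.

```python
class OnlinePerceptron:
    """A single artificial neuron that updates its weights one sample at a time."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, inputs):
        activation = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if activation > 0 else 0

    def learn(self, inputs, target):
        """Shift the weights toward the target; no change if the prediction is right."""
        error = target - self.predict(inputs)
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, inputs)]
        self.bias += self.learning_rate * error

# Hypothetical stream: (normalized cash flow, normalized profit) -> 1 means low risk.
stream = [((0.9, 0.8), 1), ((0.2, 0.1), 0), ((0.8, 0.9), 1), ((0.1, 0.3), 0)]

neuron = OnlinePerceptron(n_inputs=2)
for _ in range(20):                 # replay the stream a few times
    for inputs, target in stream:
        neuron.learn(inputs, target)

print([neuron.predict(inputs) for inputs, _ in stream])
```

Because every call to learn() adjusts the existing weights rather than rebuilding the model, new cases can be folded in as they arrive.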
DataEngine integrates neural networks, fuzzy logic, and statistics. It can be deployed for customer segmentation, forecasting, and on-line quality and process control, because it allows for the use of data acquisition boards to monitor processes or analyze, for example, acoustic frequency patterns. MIT developed a plug-in version that is called DataEngine V.I. for the data acquisition package LabView from National Instruments. Today, DataEngine lacks a direct connection to standard databases. However, MIT is working on an Open Database Connectivity (ODBC) interface to DataEngine that facilitates database mining. It will be available by mid-1996.
Neural networks also improve the gain from information-retrieval systems (e.g., in fraud investigations or intelligence gathering). Cambridge Neurodynamics' Dynamic Reasoning Engine (DRE), for example, uses a combination of probabilistic and neural-network methods to decide on the importance of information. This information weighting is used to rank documents in order of relevance to a certain question. Queries can be made in natural language because the system searches on the basis of symbolic pattern recognition.
The system is able to cluster data into groups and dynamically respond to users' inquiries. The dynamic-reasoning process is another form of supervised learning. If you look for the term Apollo in a large text database, the system will offer a set of documents about the Greek god and about the Apollo space program. You are then in a position to decide which branch of information you are more interested in and specify more details. The system returns a ranking of documents and displays only the most relevant ones. This kind of information filtering can also be used in a real-time environment such as a news wire or the Internet.
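The article gives no detail on how DRE weights information, so the following Python sketch substitutes a much simpler, well-known technique, TF-IDF term weighting, purely to illustrate what ranking documents by relevance to a query can look like. The documents, the query, and the function names are invented for the example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic words."""
    return "".join(ch if ch.isalpha() else " " for ch in text.lower()).split()

def rank_documents(query, documents):
    """Return (score, index) pairs, most relevant document first."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for words in tokenized for word in set(words))
    scores = []
    for index, words in enumerate(tokenized):
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(words)                # term frequency
                idf = math.log(n_docs / doc_freq[term]) + 1.0  # rarity bonus
                score += tf * idf
        scores.append((score, index))
    return sorted(scores, key=lambda pair: -pair[0])

documents = [
    "Apollo was the Greek god of music and prophecy.",
    "The Apollo program landed astronauts on the Moon.",
    "Quarterly sales figures for the mail-order division.",
]

for score, index in rank_documents("Apollo Moon landing program", documents):
    print(f"{score:.3f}  {documents[index]}")
```

Adding detail to the query ("Moon," "program") pushes the space-program document ahead of the one about the Greek god, which mirrors the refinement step described above.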
Building Decision Trees
French software developer ISoft's tool, called AC(2), is based on another approach. "The key strength of our tool is the combination of probabilistic-analysis methods and object-oriented decision-tree building," explains Cyril Way, a software engineer with ISoft. "This approach offers us more flexibility to process numerical, symbolic, structured, and even incomplete data at the same time." AC(2) features a conceptual representation language based on object-oriented principles.
According to Way, the definition of objects and sets in AC(2) is almost the same as that of the programming languages Smalltalk and Eiffel. Each type of data, numeric or symbolic (e.g., strings, names, or symbols with a hierarchical order), as well as an arbitrary mix of them, can be an object. A hierarchy editor allows you to define a hierarchy of object classes, including inheritance relations. To load data, the system taps relational databases via ODBC.
Because AC(2) is a supervised system, it allows you to add user-specific knowledge to evaluate certain attributes of the objects. In medical diagnosis, it may be useful to find out whether a new patient is a member of a risk population. Therefore, the correlation of relevant symptoms and the outbreak of the disease in the existing patients' database is required.
The data-mining system analyzes all examples in order to rank the attributes by decreasing explanatory power. AC(2) partitions the patient database along each symptom and computes the entropy of each configuration. The criterion that leads to the configuration with the lowest entropy is the most relevant for the outbreak of the disease. It becomes the first node in the decision tree. The system repeats this procedure for the next most important symptoms and creates a decision tree to classify a new patient. Each branch of the tree corresponds to a level of risk of outbreak.
The AC(2) algorithm is based on the work of J. R. Quinlan (1983) and Leo Breiman (1984). Here, entropy refers to a measure of disorder in a statistical system. Minimal entropy means minimal disorder. Concepts such as these are also used in computer simulations (e.g., simulated annealing).
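The attribute ranking described above can be sketched with Shannon entropy, the disorder measure commonly used in Quinlan-style tree induction. This is an illustration only, not AC(2)'s code: the patient records, the attribute names, and the function names are made up, and the sketch handles only symbolic attributes.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def split_entropy(records, attribute, target):
    """Weighted average entropy of `target` after grouping by `attribute`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[attribute]].append(rec[target])
    total = len(records)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def rank_attributes(records, attributes, target):
    """Attributes sorted from lowest to highest remaining entropy."""
    return sorted(attributes, key=lambda a: split_entropy(records, a, target))

# Hypothetical patient records.
patients = [
    {"fever": "yes", "cough": "yes", "outbreak": "yes"},
    {"fever": "yes", "cough": "no",  "outbreak": "yes"},
    {"fever": "no",  "cough": "yes", "outbreak": "no"},
    {"fever": "no",  "cough": "no",  "outbreak": "no"},
    {"fever": "yes", "cough": "yes", "outbreak": "yes"},
    {"fever": "no",  "cough": "yes", "outbreak": "no"},
]

ranking = rank_attributes(patients, ["fever", "cough"], "outbreak")
print(ranking)  # the first entry would become the root node of the tree
```

In this toy data set, splitting on fever leaves no disorder at all, so it is ranked first. Splitting on cough leaves the outcome as mixed as before. A tree builder would apply the same ranking again inside each branch.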
The decision-tree approach allows you to understand each step of the reasoning and prune the tree manually or by using statistical cutoff criteria. It is also possible to impede the use of particular criteria during tree building. This is often referred to as an advantage of decision trees over neural networks, because neural networks behave like black boxes that offer no user interaction during the analysis process.
While marketing managers of small- and medium-size companies believe that a well-organized database and incisive mailings ensure their company's success, mail-order houses, telecommunications organizations, and banks deploy data-mining tools to extract hidden customer information from their existing databases and to be more competitive. However, most of these companies are reluctant to admit they are digging for individual purchasing patterns or analyzing credit ratings via software. They don't want to be seen as violating their customers' privacy.
The prospects for data mining are good. Experts reckon that the database market is likely to more than double in size by the end of the century. And each copy of a database is the ground to dig for hidden data. But data-mining tools are not a panacea. As MIT's Weber puts it, "Don't look for a system that is able to correlate a customer's hair color and his or her income."
PRODUCT INFORMATION
AC(2).....................DM 14,500
ISoft
Gif sur Yvette, France
+33 1 69412777
fax: +33 1 69412532
ac@isoft.fr
DataEngine
Windows Version.........DM 6,000
Solaris Version.........DM 12,000
MIT
Aachen, Germany
+49 2408 194580
fax: +49 2408 194582
infor@mitgmbh.de
Dynamic Reasoning Engine..DM 14,400
Cambridge Neurodynamics
Cambridge, U.K.
+44 1223 421107
fax: +44 1223 421096
100117,3075@compuserve.com
Data-Mining Software Can Be Used For:
-- Market Research
-- Consumer Profile Surveys
-- Direct Marketing
-- Risk Evaluation
-- Quality Assessment
-- Medical Diagnosis
-- Fraud Detection
-- Forecasting of Time Series
                          DataEngine 1.5      Dynamic Reasoning    AC(2) 3.5
                                              Engine 3.2
=====================================================================================
Supported OSes            Windows 3.1/95/NT,  Windows 3.1/95,      Windows 3.1/95/NT,
                          Solaris 2.x         Unix, X/Motif        Unix, X/Motif
Fields of application     Segmentation,       Investigative        Segmentation,
                          quality control,    systems, fraud       decision data
                          forecasting         detection, news and  analysis,
                                              market analysis      forecasting

Classification of the Tool
--------------------------
Charting                  Y                   Y                    Y
Database mining           n                   Y                    Y
Data analysis             Y                   Y                    Y
Pattern recognition       Y                   Y                    n
Preprocessing modules     Y                   Y                    Y
  included
Input options             ASCII, Excel, data  ASCII, WinWord,      ASCII, ODBC
                          acquisition boards  several databases    databases
Output options            ASCII, data         Graphics, ASCII      Excel, graphics
                          acquisition boards,
                          graphics

Analysis methods
----------------
Statistical methods       Y                   Y                    Y
Fuzzy logic               Y                   Y                    n
Neural networks           Y                   Y                    n
Rule-based fuzzy methods  Y                   n                    n
Decision-tree methods     n                   n                    Y
Support of ODBC           n                   n                    Y
Support of SQL            n                   n                    Y
Support of OLE 2.0        n                   n                    Y
Support of client/server  n                   Y                    Y
  architecture

Support of applications-development interfaces
----------------------------------------------
Code generation           n                   n                    Y
C++ library               Y                   n                    Y
C library                 Y                   n                    Y
Others (e.g., Visual      DLL                 n                    n
  Basic)
Maximum data volume to    Unlimited           10 GB                Unlimited
  be processed

KEY
---
Y=Yes
n=No
AC(2) tests all possible combinations of database fields to find the criteria that best answer the question you put to your data and ranks all relevant criteria along a decision tree. Its Example Editor lets you evaluate single data objects.
DataEngine performs data segmentation with a fuzzy-clustering algorithm that reveals probabilities for individual objects to belong to certain segments. Its basis is fuzzy logic, which provides a means to model data with membership functions instead of sharp yes/no alternatives.
Dynamic Reasoning Engine retrieves several documents in large databases for such activities as intelligence gathering and fraud detection. A start query ("What was the effect of...") retrieves documents of interest. Those of particular interest can be clustered together. DRE then suggests additional pertinent documents.
Rainer Mauth is a senior editor in BYTE's Frankfurt bureau. You can reach him by sending E-mail to rmauth@bix.com or 75372.3464@compuserve.com.