means to the organization. "In a perfect world," says SAS Institute president Jim Goodnight, "companies would have business decision makers working alongside quantitative experts. But only a few companies can afford this ideal solution."
The new desktop data mining applications provide powerful but easy-to-use analysis tools for business professionals in marketing, finance, and strategic planning. One common thread of these tools is that they help users concentrate on business problems rather than on the nitty-gritty details of data analysis.
Desktop data mining tools deploy many of the same intelligent pattern-recognition technologies as the high-end tools from which they are often derived: neural nets, decision trees, rule induction, and fuzzy logic. (For more information on these analysis techniques, see "Endless Search," November 1995 BYTE international edition.)
Intuitive Interfaces
The key to most of these programs is the visual way they present data. For example, IVEE Development's Spotfire 1.0, a data mining system for Windows 95, NT, and Unix, has a number of sliders, one for each statistical variable. These sliders can be used to manipulate the data representation. This allows users to interactively scan through variables to detect interesting patterns (see the sidebar "Visual Data Mining"). It is also possible to display data along with background information, such as a map, to more easily detect interesting patterns.
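The slider idea can be pictured as a set of range filters, one per variable, applied together each time a slider moves. The following Python sketch is only an illustration of that principle, not Spotfire's implementation; the RangeSlider class, the visible_records function, and the customer records are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RangeSlider:
    """One slider per variable: keeps records whose value lies in [low, high]."""
    variable: str
    low: float
    high: float

    def accepts(self, record: dict) -> bool:
        return self.low <= record[self.variable] <= self.high


def visible_records(records, sliders):
    """Return the records that pass every slider's current range."""
    return [r for r in records if all(s.accepts(r) for s in sliders)]


# Hypothetical customer records, invented for illustration only.
customers = [
    {"name": "A", "age": 23, "income": 28000},
    {"name": "B", "age": 41, "income": 61000},
    {"name": "C", "age": 37, "income": 45000},
    {"name": "D", "age": 58, "income": 52000},
]

# Moving a slider amounts to changing low/high and filtering again.
sliders = [RangeSlider("age", 30, 60), RangeSlider("income", 40000, 55000)]
shown = [r["name"] for r in visible_records(customers, sliders)]
print(shown)  # ['C', 'D']
```

In an interactive tool the filtered set would be redrawn on screen after every slider movement, which is what lets a user scan through variables by eye.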
Data mining programs that have a visual interface are easier to use than typical algorithmic data mining based on the AI techniques mentioned earlier. They are more limited, though, especially when compared with the flexibility of neural networks: users are confined to their own knowledge of the particular data set. On the other hand, as Christopher Alberg of IVEE Development points out, "most professionals know their business better than a neural network does."
Indeed, the training of a neural network very often depends on the individual who trains it. That person can essentially lock an organization into an incomplete and subjective view of the data. This is why proponents of visual data mining argue that with personal attention and all the experience and knowledge front-line managers bring to the task, it is possible to quickly turn statistical results into a viable business strategy.
Some developers are adding enhanced visualization techniques to neural networks and decision tree-based systems. Cognos, a leading publisher of on-line analytical processing (OLAP) tools, recently acquired Right Information Systems, a developer of neural network-based software for business modeling and forecasting. The companies are now merging both approaches to make their software easier to use.
Another example is the SAS Institute's development project code-named DMINE, a combination of neural networks, decision trees, and visualization techniques aimed at predictive modeling in business areas. DMINE is expected to be released this fall. "Today," says Phil Winters, SAS Institute Europe's vice president of marketing, "business decision makers spend a lot of their time sending queries to databases. We propose a more proactive way of predictive modeling."
Five-Step Approach
In an effort to make data mining accessible to a wider audience and to help decision makers run data mining systems on their PCs, SAS Institute promotes a methodology that includes five basic steps: sampling, exploring, manipulating, modeling, and assessing. This methodology suggests sampling smaller portions of a database rather than the entire data set in order to reduce processing time. Data visualization helps the user find the right subsets to be sampled. If the data is too complex for graphical representation, then traditional statistical methods such as factor, cluster, and correspondence analysis are required.
Based on this kind of data exploration, a user can cleanse and update the selected portion and then run the actual mining process, which computes the most pertinent criteria that belong to a given set of data (modeling). In a direct marketing application, these criteria (e.g., age, gender, or income) could differentiate between customer groups. In the final stage of the process, the user assesses and evaluates these models and checks their value for the real-world business problem.
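To make the five steps concrete, here is a minimal Python sketch that walks a made-up customer table through sampling, exploring, manipulating, modeling, and assessing. It is not SAS Institute software: the function names, the synthetic data, and the very simple one-variable cut-off "model" are assumptions chosen only to show how the stages hand off to one another.

```python
import random
from statistics import mean


def sample(records, size, seed=0):
    """Step 1: draw a random subset instead of scanning the whole table."""
    return random.Random(seed).sample(records, size)


def explore(records, field):
    """Step 2: summarize one variable to judge whether it is worth modeling."""
    values = [r[field] for r in records if r[field] is not None]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def manipulate(records, field):
    """Step 3: cleanse the sample by dropping records with a missing value."""
    return [r for r in records if r[field] is not None]


def model(records, field, target):
    """Step 4: pick the cut-off on `field` that best separates the target groups."""
    best_cut, best_hits = None, -1
    for cut in sorted({r[field] for r in records}):
        hits = sum((r[field] >= cut) == r[target] for r in records)
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return best_cut


def assess(records, field, target, cut):
    """Step 5: measure how often the cut-off predicts the target correctly."""
    hits = sum((r[field] >= cut) == r[target] for r in records)
    return hits / len(records)


# Synthetic database, invented for illustration: customers aged 45 and over
# responded to a mailing, and every tenth record has a missing age.
rng = random.Random(42)
database = []
for i in range(1000):
    age = None if i % 10 == 0 else rng.randint(18, 80)
    database.append({"age": age, "responded": age is not None and age >= 45})

subset = sample(database, 100)
summary = explore(subset, "age")
clean = manipulate(subset, "age")
cut = model(clean, "age", "responded")
accuracy = assess(clean, "age", "responded", cut)
print(summary, cut, accuracy)
```

Because the model is fitted on a sample of 100 records rather than all 1,000, the modeling step touches a tenth of the data, which is the processing-time saving the methodology is after.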
ISoft launched a trimmed-down version of its product called Alice late last year. Alice uses a wide range of statistical algorithms to build predictive models. The program represents the model's results in decision trees, giving the user an immediate grasp of the basic data correlation and allowing for an easy check of hypotheses.
Part of what makes decision trees so attractive in desktop data mining programs is their ability to represent high volumes of data efficiently. With a decision tree, a user can force splits, merge nodes, collapse or expand branches, and determine the number of parameters in a tree. In addition, the user can easily select only those branches he or she is interested in.
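The operations mentioned above, forcing a split and collapsing a branch, are easy to picture on a small tree. The Python sketch below is a hypothetical illustration rather than the code of Alice or any other product; the Node class, its method names, and the six customer records are invented for the example.

```python
from collections import Counter


class Node:
    """A tree node that summarizes its records as class counts."""

    def __init__(self, records, target):
        self.records = records
        self.target = target
        self.counts = Counter(r[target] for r in records)
        self.rule = None      # (field, threshold) once the node is split
        self.children = []    # [below-threshold node, at-or-above node]

    def force_split(self, field, threshold):
        """Split on a variable the user chooses rather than one an algorithm picks."""
        low = [r for r in self.records if r[field] < threshold]
        high = [r for r in self.records if r[field] >= threshold]
        self.rule = (field, threshold)
        self.children = [Node(low, self.target), Node(high, self.target)]
        return self.children

    def collapse(self):
        """Fold a branch back into a single summarizing leaf."""
        self.rule, self.children = None, []

    def leaves(self):
        """Return the leaf nodes under this node, left to right."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hypothetical customer records, invented for illustration only.
customers = [
    {"age": 25, "income": 20000, "buys": "no"},
    {"age": 30, "income": 45000, "buys": "yes"},
    {"age": 35, "income": 25000, "buys": "no"},
    {"age": 45, "income": 50000, "buys": "yes"},
    {"age": 52, "income": 38000, "buys": "yes"},
    {"age": 60, "income": 42000, "buys": "no"},
]

root = Node(customers, "buys")
young, old = root.force_split("age", 40)
young.force_split("income", 30000)
expanded = len(root.leaves())   # 3 leaves while the branch is expanded
young.collapse()
collapsed = len(root.leaves())  # 2 leaves after collapsing it
print(expanded, collapsed, dict(young.counts))
```

Each node stores only class counts for display, so a collapsed branch still tells the user how many buyers and non-buyers it covers.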
"Explainable Documents"
Data mining systems can detect how some objects or variables affect others, locate changes over time, or spot trends in customer databases. However, the way they express this information is often arcane and too complicated for some business managers to comprehend. "That's not appropriate for business users," says Dr. Kamran Parsaye, CEO of Information Discovery Systems. Interactive OLAP systems, for example, allow users to zoom into details. But the zoom features present tabular information that may not be clear to nonexperts. "What we need is a language that helps machines express their knowledge for the direct benefit of business users," Parsaye says.
Output from data mining systems should be expressed in what Parsaye calls "explainable documents." Explainable documents automatically generate hyperlinked text, graphs, and data summaries to express the influences, affinities, comparisons, variations, and trends found in the data.
Explainable documents, as implemented in the company's Intra/Knowledge system, explain the results of a data mining process in plain statements, such as: "Sales went up last quarter because orange juice sales in Arizona were above expectations due to discounts."
Intra/Knowledge provides results that are automatically generated from a database and delivered over a company's intranet. In a financial institution, for example, the program may read through the raw data and documents generated each day, discover key trends, and then convert the analysis into easy-to-understand English text supplemented with 3-D graphics.
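One simple way to picture the step from numbers to a plain statement is a template filled in with the segment that deviates most from expectations. The Python sketch below assumes exactly that and nothing more; it is not how Intra/Knowledge works internally, and the explain function, the field names, and the sales figures are invented for illustration.

```python
def explain(segments, period):
    """Turn expected-versus-actual figures into one plain English sentence."""
    gap = sum(s["actual"] - s["expected"] for s in segments)
    if gap == 0:
        return f"Total sales matched expectations {period}."
    sign = 1 if gap > 0 else -1
    side = "above" if gap > 0 else "below"
    # The driver is the segment that moved furthest in the same direction
    # as the overall result.
    driver = max(segments, key=lambda s: sign * (s["actual"] - s["expected"]))
    delta = driver["actual"] - driver["expected"]
    return (
        f"Total sales were {side} expectations {period}, driven mainly by "
        f"{driver['product']} in {driver['region']} "
        f"({delta:+d} units versus expected)."
    )


# Hypothetical quarterly figures, invented for illustration only.
segments = [
    {"product": "orange juice", "region": "Arizona", "expected": 100, "actual": 140},
    {"product": "orange juice", "region": "Nevada", "expected": 80, "actual": 78},
    {"product": "apple juice", "region": "Arizona", "expected": 60, "actual": 63},
]

sentence = explain(segments, "last quarter")
print(sentence)
```

A real system would also have to find a cause such as the discounts in the example statement; this sketch only identifies which segment moved the total.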
An alternative approach, rule induction, works with nonhierarchical sets of conditions, in contrast to the strictly hierarchical nature of decision trees. Unlike neural networks, for example, rule induction allows for the determination of both the probability of a rule and the rule's error probability.
Define the Rules
Though it does not deliver reports in graphical formats, WizWhy, a rule induction-based data mining tool developed by WizSoft for Windows 95 and NT, gives operators the ability to tailor the analysis by defining a variety of parameters. These parameters include the minimum probabilities of the rules and the minimum number of cases in each rule. This is comparable to using decision trees, which allow you to limit queries by refining trees and the number of branches.
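The effect of those two parameters can be shown with a deliberately small rule finder. The Python sketch below is not WizWhy's algorithm: it only considers single-condition if-then rules, and the induce_rules function and the customer records are assumptions made for the example. It keeps a rule only when enough cases match its condition and when the outcome holds in a large enough share of them.

```python
from collections import Counter


def induce_rules(records, target, min_probability, min_cases):
    """Return single-condition if-then rules that meet both thresholds."""
    condition_counts = Counter()  # cases matching "field = value"
    rule_counts = Counter()       # of those, cases with a given outcome
    for record in records:
        for field, value in record.items():
            if field == target:
                continue
            condition_counts[(field, value)] += 1
            rule_counts[(field, value, record[target])] += 1

    rules = []
    for (field, value, outcome), hits in rule_counts.items():
        cases = condition_counts[(field, value)]
        probability = hits / cases
        if cases >= min_cases and probability >= min_probability:
            rules.append((field, value, outcome, probability, cases))
    # Most reliable rules first, then those backed by the most cases.
    return sorted(rules, key=lambda rule: (-rule[3], -rule[4]))


# Hypothetical customer records, invented for illustration only.
customers = [
    {"plan": "basic", "region": "north", "churned": "yes"},
    {"plan": "basic", "region": "north", "churned": "yes"},
    {"plan": "basic", "region": "south", "churned": "yes"},
    {"plan": "basic", "region": "south", "churned": "no"},
    {"plan": "premium", "region": "north", "churned": "no"},
    {"plan": "premium", "region": "north", "churned": "no"},
    {"plan": "premium", "region": "south", "churned": "no"},
    {"plan": "premium", "region": "south", "churned": "yes"},
]

rules = induce_rules(customers, "churned", min_probability=0.7, min_cases=3)
for field, value, outcome, probability, cases in rules:
    print(f"IF {field} = {value} THEN churned = {outcome} "
          f"(probability {probability:.2f}, {cases} cases)")
```

With these thresholds only the two rules about the plan survive; the region rules hold in just half of their cases and are dropped. Lowering min_probability to 0.5 would let them through, which is the kind of tailoring the parameters allow.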
These quantitative rule-induction methods have often been employed by specialists in medical research to discover, for example, patterns between symptoms and diseases. But with WizWhy, less experienced users can analyze rules in customer databases and check for deviations, the frequency of deviations, and the level of probability of detected rules without having to understand the complex underlying algorithm.
Although the emerging generation of desktop data mining products can't replace a data warehouse in a large enterprise, they enhance the effectiveness of many front-line managers. And they bring data mining to many small and medium organizations that couldn't afford a dedicated data-analysis specialist.
Where to Find
ISoft
Gif sur Yvette, France
Phone: +33-1-69412777
Fax: +33-1-69412532
E-mail: info@alice.fr
Internet: http://www.alice.fr