Data Mining at Your Desk


July 1997 / International Features / Data Mining at Your Desk

More intuitive data mining tools are helping middle managers make better business decisions.

Peter Hofland and Jim Utsler

As a result of flattened organizations within European companies, a new circle of employees now contributes to the decision-making process. Although they may not actually make the final decision, these people are responsible for giving recommendations based on their knowledge of the business. This increased responsibility is driving the demand among business professionals for data mining tools that allow them to more quickly and knowledgeably recommend actions to their management. Because these professionals are not data analysis specialists and are not trained as statisticians, they need tools that are easy to understand but nevertheless reliably reveal the multifaceted relationships in customer, product, or market databases. "Because of the explosion of computing power on desktop PCs, databases of more than a million records are often analyzed on standard Windows NT PCs," says Eric Tocatlian, marketing director at ISoft, a French developer of data mining software.

Until recently, most large organizations have employed statisticians to analyze the data crucial to their business. However, this is changing as marketing and product managers need to spot customer trends and patterns, find clusters and gaps in their data, create profiles, and detect anomalies without having to queue up for a statistician's time. And all too often the statistician's report results in piles of out-of-date information and arguments about what all the data means to the organization. "In a perfect world," says SAS Institute president Jim Goodnight, "companies would have business decision makers working alongside quantitative experts. But only a few companies can afford this ideal solution."

The new desktop data mining applications provide powerful but easy-to-use analysis tools for business professionals in marketing, finance, and strategic planning. One common thread of these tools is that they help users concentrate on business problems rather than on the nitty-gritty details of data analysis.

Desktop data mining tools deploy many of the same intelligent pattern-recognition technologies as the high-end tools from which they are often derived: neural nets, decision trees, rule induction, and fuzzy logic. (For more information on these analysis techniques, see "Endless Search," November 1995 BYTE international edition.)

Intuitive Interfaces

The key to most of these programs is the visual way they present data. For example, IVEE Development's Spotfire 1.0, a data mining system for Windows 95, NT, and Unix, has a number of sliders, one for each statistical variable. These sliders can be used to manipulate the data representation. This allows users to interactively scan through variables to detect interesting patterns (see the sidebar "Visual Data Mining"). It is also possible to display data along with background information, such as a map, to more easily detect interesting patterns.
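The slider interaction described above amounts to one range filter per variable, reapplied each time a slider moves. The following is a minimal Python sketch of that idea, not Spotfire's implementation; the records, the variable names (`age`, `income`), and the function `apply_sliders` are all hypothetical.

```python
# Hypothetical customer records; the field names are illustrative only.
records = [
    {"age": 25, "income": 30000},
    {"age": 41, "income": 72000},
    {"age": 58, "income": 51000},
    {"age": 33, "income": 64000},
]

def apply_sliders(rows, sliders):
    """Keep rows whose value for every variable lies within its slider range.

    `sliders` maps a variable name to an inclusive (low, high) pair.
    """
    return [
        row for row in rows
        if all(low <= row[name] <= high for name, (low, high) in sliders.items())
    ]

# Moving a slider corresponds to changing one (low, high) pair and re-filtering.
visible = apply_sliders(records, {"age": (30, 60), "income": (50000, 80000)})
```

A real tool would redraw the plot from `visible` after each change, which is what makes the scanning feel interactive.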

Unlike the typical algorithmic data mining that is based on the AI techniques we mentioned earlier, data mining programs that have a visual interface are easier to use. They are limited, though, especially when compared to the flexibility of neural networks. Users are confined to their own knowledge of the particular data set. On the other hand, as Christopher Alberg of IVEE Development points out, "most professionals know their business better than a neural network does."

Indeed, the training of a neural network very often depends on the individual who trains it. That person can essentially lock an organization into an incomplete and subjective view of the data. This is why proponents of visual data mining argue that with personal attention and all the experience and knowledge front-line managers bring to the task, it is possible to quickly turn statistical results into a viable business strategy.

Some developers are adding enhanced visualization techniques to neural networks and decision tree-based systems. Cognos, a leading publisher of on-line analytical processing (OLAP) tools, recently acquired Right Information Systems, a developer of neural network-based software for business modeling and forecasting. The companies are now merging both approaches to make their software easier to use.

Another example is the SAS Institute's development project code-named DMINE, a combination of neural networks, decision trees, and visualization techniques aimed at predictive modeling in business areas. DMINE is expected to be released this fall. "Today," says Phil Winters, SAS Institute Europe's vice president of marketing, "business decision makers spend a lot of their time sending queries to databases. We propose a more proactive way of predictive modeling."

Five-Step Approach

In an effort to make data mining accessible to a wider audience and to help decision makers run data mining systems on their PCs, SAS Institute promotes a methodology that includes five basic steps: sampling, exploring, manipulating, modeling, and assessing. This methodology suggests sampling smaller portions of a database rather than the entire data set in order to reduce processing time. Data visualization helps the user find the right subsets to be sampled. If the data is too complex for graphical representation, then traditional statistical methods such as factor, cluster, and correspondence analysis are required.

Based on this kind of data exploration, a user can cleanse and update the selected portion and then run the actual mining process, which computes the most pertinent criteria that belong to a given set of data (modeling). In a direct marketing application, these criteria could differentiate (e.g., by age, gender, income) between customer groups. In the final stage of the process, the user assesses and evaluates these models and checks their value for the real-world business problem.
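The five steps can be sketched as a small pipeline. This is an illustrative Python outline under stated assumptions, not SAS Institute's software: the customer table, the field names (`income`, `bought`), and the deliberately trivial threshold model are all invented for the example.

```python
import random
import statistics

# Hypothetical customer table: every 25th row has a missing income value.
customers = [
    {"income": None if i % 25 == 0 else 20000 + 500 * i, "bought": i >= 100}
    for i in range(200)
]

def sample(rows, fraction, seed=0):
    """Step 1: draw a random subset to cut processing time."""
    rng = random.Random(seed)
    return rng.sample(rows, max(1, int(len(rows) * fraction)))

def explore(rows):
    """Step 2: summarize a variable to see how it is distributed."""
    incomes = [r["income"] for r in rows if r["income"] is not None]
    return {"n": len(rows), "mean_income": statistics.mean(incomes)}

def manipulate(rows):
    """Step 3: cleanse the sample, here by dropping rows with no income."""
    return [r for r in rows if r["income"] is not None]

def model(rows):
    """Step 4: a toy model, an income threshold halfway between the
    mean incomes of buyers and non-buyers."""
    buyers = [r["income"] for r in rows if r["bought"]]
    others = [r["income"] for r in rows if not r["bought"]]
    return (statistics.mean(buyers) + statistics.mean(others)) / 2

def assess(rows, threshold):
    """Step 5: share of rows where the threshold rule predicts correctly."""
    hits = sum((r["income"] >= threshold) == r["bought"] for r in rows)
    return hits / len(rows)

subset = sample(customers, fraction=0.25)
summary = explore(subset)
clean = manipulate(subset)
threshold = model(clean)
accuracy = assess(clean, threshold)
```

In practice the assessment step would use data held back from modeling; the sketch reuses the cleansed sample only to stay short.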

ISoft launched a trimmed-down version of its product called Alice late last year. Alice uses a wide range of statistical algorithms to build predictive models. The program represents the model's results in decision trees, giving the user an immediate grasp of the basic data correlation and allowing for an easy check of hypotheses.

Part of what makes decision trees so attractive in desktop data mining programs is their ability to represent high volumes of data efficiently. With a decision tree, a user can force splits, merge nodes, collapse or expand branches, and determine the number of parameters in a tree. In addition, the user can easily select only those branches he or she is interested in.
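To make the idea of a split concrete, here is a minimal Python sketch that scores candidate splits on one variable and also scores a split forced by the user. The Gini impurity measure, the function names, and the sample data are assumptions chosen for illustration; the article does not say which criterion Alice or any other product uses.

```python
# Hypothetical mailing responses; field names are illustrative only.
responses = [
    {"age": 22, "responded": False},
    {"age": 25, "responded": False},
    {"age": 31, "responded": False},
    {"age": 38, "responded": True},
    {"age": 45, "responded": True},
    {"age": 52, "responded": True},
]

def gini(labels):
    """Gini impurity of a list of boolean labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_quality(rows, variable, threshold, target):
    """Weighted impurity after splitting rows on variable < threshold."""
    left = [r[target] for r in rows if r[variable] < threshold]
    right = [r[target] for r in rows if r[variable] >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, variable, target):
    """Try each observed value as a threshold and keep the purest split."""
    candidates = sorted({r[variable] for r in rows})
    return min(candidates, key=lambda t: split_quality(rows, variable, t, target))

automatic = best_split(responses, "age", "responded")
# A forced split: the user picks the threshold and sees what it costs.
forced = split_quality(responses, "age", 30, "responded")
```

Comparing `forced` with the score of the automatic split shows the user how much purity is given up by overriding the tool's choice.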

"Explainable Documents"

Data mining systems can detect how some objects or variables affect others, locate changes over time, or spot trends in customer databases. However, the way they express this information is often arcane and too complicated for some business managers to comprehend. "That's not appropriate for business users," says Dr. Kamran Parsaye, CEO of Information Discovery Systems. Interactive OLAP systems, for example, allow users to zoom into details. But the zoom features present tabular information that may not be clear to nonexperts. "What we need is a language that helps machines express their knowledge for the direct benefit of business users," Parsaye says.

Output from data mining systems should be expressed in what Parsaye calls "explainable documents." Explainable documents automatically generate hyperlinked text, graphs, and data summaries to express the influences, affinities, comparisons, variations, and trends found in the data.

Explainable documents, as implemented in the company's Intra/Knowledge system, explain the results of a data mining process in plain statements, such as: "Sales went up last quarter because orange juice sales in Arizona were above expectations due to discounts."
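A very rough Python sketch of how a numeric finding might be turned into such a statement follows. It is not how Intra/Knowledge works; the function `explain_sales`, the data layout, and the figures are hypothetical, and the sketch reports only the largest deviation, not its cause.

```python
def explain_sales(actual, expected):
    """Describe the (product, region) pair that deviates most above expectations.

    Both arguments map (product, region) to a unit count.
    """
    key = max(actual, key=lambda k: actual[k] - expected[k])
    product, region = key
    change = actual[key] - expected[key]
    direction = "above" if change >= 0 else "below"
    return f"{product} sales in {region} were {abs(change)} units {direction} expectations."

# Hypothetical quarterly figures.
actual = {("Orange juice", "Arizona"): 1400, ("Milk", "Texas"): 900}
expected = {("Orange juice", "Arizona"): 1000, ("Milk", "Texas"): 950}
sentence = explain_sales(actual, expected)
```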

Intra/Knowledge provides results that are automatically generated from a database and delivered over a company's intranet. In a financial institution, for example, the program may read through the raw data and documents generated each day, discover key trends, and then convert the analysis into easy-to-understand English text supplemented with 3-D graphics.

An alternative approach, rule induction, works with nonhierarchical sets of conditions, in contrast to the strictly hierarchical nature of decision trees. Unlike neural networks, for example, rule induction allows for the determination of both the probability of a rule and the rule's error probability.

Define the Rules

Though it does not deliver reports in graphical formats, WizWhy, a rule induction-based data mining tool for Windows 95 and NT developed by WizSoft, gives operators the ability to tailor the analysis by defining a variety of parameters. These parameters include the minimum probabilities of the rules and the minimum number of cases in each rule. This is comparable to using decision trees, which allow you to limit queries by refining trees and the number of branches.
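The two parameters named above can be illustrated with a minimal Python sketch that keeps only single-condition if-then rules meeting both thresholds. This is not WizWhy's algorithm; the function `induce_rules`, the field names, and the data are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical customer rows; field names are illustrative only.
rows = [
    {"region": "north", "plan": "basic", "churned": True},
    {"region": "north", "plan": "basic", "churned": True},
    {"region": "north", "plan": "premium", "churned": True},
    {"region": "south", "plan": "basic", "churned": False},
    {"region": "south", "plan": "premium", "churned": False},
    {"region": "south", "plan": "basic", "churned": True},
]

def induce_rules(rows, target, min_probability, min_cases):
    """Return rules of the form: if field == value, then target is true.

    A rule is kept only if at least `min_cases` rows match its condition
    and the share of those rows where the target holds is at least
    `min_probability`.
    """
    matched = defaultdict(int)   # rows matching each condition
    positive = defaultdict(int)  # matching rows where the target holds
    for row in rows:
        for field, value in row.items():
            if field == target:
                continue
            matched[(field, value)] += 1
            positive[(field, value)] += bool(row[target])
    rules = []
    for condition, cases in matched.items():
        probability = positive[condition] / cases
        if cases >= min_cases and probability >= min_probability:
            rules.append((condition, probability, cases))
    return rules

rules = induce_rules(rows, "churned", min_probability=0.7, min_cases=3)
```

Raising `min_cases` discards rules that rest on too few rows, and raising `min_probability` discards rules that hold too rarely, which is the kind of tailoring the paragraph describes.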

These quantitative rule-induction methods have often been employed by specialists in medical research to discover, for example, patterns between symptoms and diseases. But with WizWhy, less experienced users can analyze rules in customer databases and check for deviations, the frequency of deviations, and the level of probability of detected rules without having to understand the complex underlying algorithm.

Although the emerging generation of desktop data mining products can't replace a data warehouse in a large enterprise, they enhance the effectiveness of many front-line managers. And they bring data mining to many small and medium organizations that couldn't afford a dedicated data-analysis specialist.


Where to Find


ISoft

Gif sur Yvette, France
Phone:    +33-1-69412777
Fax:      +33-1-69412532
E-mail:   info@alice.fr
Internet: http://www.alice.fr


IVEE Development

Göteborg, Sweden
Phone:    +46-31-7014260
Fax:      +46-31-101987
E-mail:   info@ivee.com
Internet: http://www.ivee.com


WizSoft

Tel Aviv, Israel
Phone:    +972-3-5631919
Fax:      +972-3-5611945
E-mail:   abraham@wizsoft.com


SAS Institute

Heidelberg, Germany
Phone:    +49-6221-4160
Fax:      +49-6221-474850
Internet: http://www.sas.com/




Illustration: Desktop Data Mining

The decision-making process takes into account the relevant data and a manager's knowledge of the business.


Peter Hofland and Jim Utsler are technology journalists at The Visual Consultancy Corporation in Amsterdam. You can reach them at 100544.307@compuserve.com.
