
OLAP by Web


September 1997 / Features / OLAP by Web

Using Web-based applications to perform on-line analytical processing builds on the strengths of both technologies.

Udo Flohr

On-line analytical processing (OLAP) may be the most important new computing paradigm of the decade -- next to the Web. Joining the two makes for a powerful technology.

A data warehouse is a central, consolidated database repository for all the data in an organization. It typically allows access to this information by presenting it in a metaphorical data cube, a multidimensional storage model that allows many different views and combinations of the data. After correlating arbitrary parts of a corporation's data, managers should then be able to see previously hidden emerging patterns -- the trees in the forest, as it were.

OLAP programs make up a category of business software that lets users manipulate a data cube. Typical OLAP operations include consolidate, drill-down (i.e., query refinement), slice, dice, and pivot. Results can be reported in traditional or tabular database formats, as well as in graphical charts. Although this output might be in a fixed format, it often allows the user to directly manipulate the data for further analysis, such as identifying trends, correlations, or time series.
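To make these operations concrete, here is a minimal sketch in Python. It assumes a tiny, made-up fact table with three dimensions (region, product, quarter) and one sales measure; the function names and all figures are hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one cell of a three-dimensional cube
# (region, product, quarter) with a sales measure. All values are made up.
FACTS = [
    ("Northeast", "Widgets", "Q1", 120),
    ("Northeast", "Widgets", "Q2", 150),
    ("Northeast", "Gadgets", "Q1", 80),
    ("South", "Widgets", "Q1", 90),
    ("South", "Gadgets", "Q1", 60),
    ("South", "Gadgets", "Q2", 70),
]
DIMS = ("region", "product", "quarter")


def consolidate(facts, by):
    """Roll up: sum the measure over every dimension not listed in `by`."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(facts, rows, cols):
    """Pivot: arrange totals as a nested dict keyed by rows, then columns."""
    table = defaultdict(dict)
    for (r, c), total in consolidate(facts, (rows, cols)).items():
        table[r][c] = total
    return dict(table)


by_region = consolidate(FACTS, ("region",))
# Drill-down refines the query by adding a dimension to the grouping.
by_region_product = consolidate(FACTS, ("region", "product"))
q1_only = slice_cube(FACTS, "quarter", "Q1")
sub_cube = dice(FACTS, region={"South"}, quarter={"Q1", "Q2"})
region_by_quarter = pivot(FACTS, "region", "quarter")
```

In this sketch, drill-down is simply a consolidation over a finer grouping, which is why one function serves both purposes. A real OLAP engine would precompute and index such aggregates.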

A Marriage Made in Cyberspace

Most OLAP and data-warehousing packages either already do or shortly will have a Web interface, allowing users to access an organization's data via an intranet or the Internet. In a recent report, Wayne Eckerson, a senior consultant at the Patricia Seybold Group (Boston, MA), concluded that by 1998 a Web browser will be driving half of all OLAP and decision-support applications. Despite a number of related problems, Eckerson believes that "the Web is a perfect medium for business-intelligence activities." Here he applies what's called the 80/20 rule: 80 percent of all users have simple query and reporting requirements that Web applications can satisfy. The remaining 20 percent either need high-performance, interactive access to large data sets, or they're developers who require authoring capabilities. This 20 percent segment will, for the time being, continue to use dedicated tools.

One major benefit of deploying OLAP systems using a Web interface is the cost savings. Traditional OLAP packages typically start from $10,000. Web browsers, on the other hand, are ubiquitous. Furthermore, most organizations are at least starting to get some kind of intranet structure, complete with servers, in place. Thus, Web OLAP should allow almost all users in an organization access to at least some analysis functionality. Thanks to the Net, the universal access might also extend to outside users. For example, customers or suppliers could have access to some company information.

The universal Web-browser interface may also help reduce training costs. Most users are already familiar with the process of pointing and clicking on links, and the OLAP query-and-manipulation process is similar.

Another advantage is that the Web is a cross-platform environment for users and developers alike. Users find a familiar environment regardless of their OS, and developers are able to port Web applications.

Thanks to its centralized architecture, the Web helps reduce the cost for client-side distribution and support. The latest version of the client software -- the browser -- can be put on all desktops in an enterprise, and components, such as Java applets or ActiveX controls, take care of their own downloading.

Snags in the Web

A number of drawbacks balance these advantages, however. The Web was originally a medium for the distribution of static files, and HTTP, its main protocol, does not maintain the state of a session. It treats each interaction as a new, anonymous connection and does not intrinsically remember who you are or what query you were just refining. Programmers have to use tricks to help a server remember users' identities and how far they've progressed in their process. Initiatives are under way to remedy this problem in a standardized fashion.
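One common trick of this kind is a session token: the server hands the browser an opaque identifier and looks up the stored state whenever the identifier comes back. The sketch below is an illustration only. The `SESSIONS` store, the `handle_request` function, and the `session` and `drill` field names are all hypothetical, and plain dictionaries stand in for real HTTP requests and responses.

```python
import secrets

# Hypothetical in-memory session store; a real server would persist this
# and expire old entries.
SESSIONS = {}


def handle_request(params):
    """Handle one stateless request; `params` stands in for form fields."""
    token = params.get("session")
    if token not in SESSIONS:
        # First contact: mint an unguessable token and start a new session.
        token = secrets.token_hex(16)
        SESSIONS[token] = {"drill_path": []}
    state = SESSIONS[token]
    if "drill" in params:
        # Remember how far the user has refined the query.
        state["drill_path"].append(params["drill"])
    # The token travels back to the browser (for example in a hidden form
    # field or a cookie) so the next request can present it again.
    return {"session": token, "drill_path": list(state["drill_path"])}


first = handle_request({"drill": "Northeast"})
second = handle_request({"session": first["session"], "drill": "Q1"})
stranger = handle_request({"drill": "South"})
```

The second request carries the token from the first, so the server can continue the same drill-down. The third carries no token and is treated as a new, anonymous visitor.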

Another often-cited problem is security. The Net is open to virtually anyone, and Net traffic in its basic form is not encrypted. Companies have therefore been reluctant to put sensitive information on the Net. Since a data warehouse contains the crown jewels of a company's information about its business, there's understandable hesitation about making such data available on the Net.

But that's about to change with the introduction of secure communication tunnels across the Net. These tunnels will enable users to gain secure access to remote data. Firewalls and other authentication systems can also help cordon off internal intranets by restricting access to certain sites.

The Web consists of a number of protocols that make it open and easy to integrate. However, for some OLAP applications, this simple architecture may be too simple. For example, using a dedicated OLAP application to perform a drill-down operation leaves staggered windows on a user's screen that correspond to the stages of the stepwise refinement of the query. These are useful, since the user might want to zoom out again and focus elsewhere. But a simple Web browser does not lend itself easily to such a multiple-document approach: Each new page contains HTML code that is displayed, typically wiping out what was already there.

Web Generations

Some Web sites lend themselves to OLAP better than others. For instance, from the point of view of on-line query-and-analysis tools, Eckerson's study distinguishes four generations of Web architectures: file distribution, dynamic HTML publishing, Java-assisted publishing, and dynamic Java publishing. Most business-intelligence tools currently support first- and second-generation architectures.

First-generation Web sites (see the figure "First-Generation Web Sites") use a two-tier architecture to provide basic file distribution. Dedicated off-line OLAP tools create reports and store them as HTML files, which might contain text and bit-mapped images, on the Web server. From their standard browsers, users can view or print these static documents, but no interaction is possible. For an updated view, someone has to generate a new report. Hyperlinks might simulate a certain degree of interactivity. For example, by clicking on a link labeled "Northeast Region," a user could navigate to a report providing data for that particular geographical area.

Eventually, though, the growing collection of files will lead to the administrative headaches typically associated with large Web sites. The main disadvantages of this approach are that users can see only predefined reports, which age quickly, and that all operations have to be predefined.
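The first-generation approach can be sketched as a batch job that writes one static page per predefined report, plus an index page whose hyperlinks provide the only "interactivity." Everything in this Python sketch is assumed for illustration: the `REPORTS` figures, the `publish` function, and the file-naming scheme.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical per-region figures; in practice an off-line OLAP tool
# would supply them.
REPORTS = {
    "Northeast Region": [("Q1", 200), ("Q2", 150)],
    "South Region": [("Q1", 150), ("Q2", 70)],
}


def file_name(title):
    """Derive a file name such as 'northeast_region.html' from a title."""
    return title.lower().replace(" ", "_") + ".html"


def publish(reports, out_dir):
    """Write one static page per report plus an index page of hyperlinks."""
    out_dir = Path(out_dir)
    links = []
    for title, rows in reports.items():
        cells = "".join(
            f"<tr><td>{html.escape(q)}</td><td>{v}</td></tr>" for q, v in rows
        )
        page = (
            f"<html><body><h1>{html.escape(title)}</h1>"
            f"<table>{cells}</table></body></html>"
        )
        (out_dir / file_name(title)).write_text(page, encoding="utf-8")
        links.append(
            f'<li><a href="{file_name(title)}">{html.escape(title)}</a></li>'
        )
    index = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    (out_dir / "index.html").write_text(index, encoding="utf-8")
    return sorted(p.name for p in out_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    written = publish(REPORTS, tmp)
    index_html = (Path(tmp) / "index.html").read_text(encoding="utf-8")
```

Every new view of the data means another file and another run of the job, which is where the administrative burden comes from.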

Second-generation Web sites (see the figure section "Second-Generation Web Sites") employ dynamic HTML publishing: Applications create HTML documents on the fly in response to user requests. The environment is actually a four-tier architecture, consisting of Web browsers, Web servers, application servers, and databases.

To query databases and other resources, users fill out HTML forms that their browsers then submit to the Web server. The result is a dynamically generated, but still static, HTML file. Users get the latest data through reports executed live. They can customize the results to a certain extent by changing the values of parameters, which the site designer sets up for them. The Web server itself holds only templates and metadata. The metadata parameters tell the server which information to send to the browser. However, the metadata can also generate HTML tag information that, among other things, helps to maintain state and authentication data over a session.
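As a rough illustration of how generated HTML can carry state, the following Python sketch fills a form template with hidden fields, so that the session identifier and the current drill path return to the server with the next submission. The template, the `render_form` function, and the field names are all hypothetical.

```python
import html
from string import Template

# Hypothetical report-form template held on the Web server.
FORM_TEMPLATE = Template(
    '<form method="post" action="$action">'
    '<input type="hidden" name="session" value="$session">'
    '<input type="hidden" name="drill_path" value="$drill_path">'
    '<select name="region">$options</select>'
    '<input type="submit" value="Run report">'
    "</form>"
)


def render_form(action, session, drill_path, regions):
    """Fill the template; hidden fields carry state to the next request."""
    options = "".join(
        f'<option value="{html.escape(r)}">{html.escape(r)}</option>'
        for r in regions
    )
    return FORM_TEMPLATE.substitute(
        action=html.escape(action),
        session=html.escape(session),
        drill_path=html.escape("/".join(drill_path)),
        options=options,
    )


page = render_form(
    "/report", "abc123", ["Northeast", "Q1"], ["Northeast", "South"]
)
```

The user sees only the region selector. The hidden fields ride along invisibly when the form is submitted.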

The Web server submits the user's request to the application server through a gateway. This translates the HTML requests into SQL statements or other database calls. The application server also formats the result for the Web server. For linking such external programs to the Web server, most architectures use CGI.
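The gateway's translation step might look roughly like the Python sketch below, which turns submitted form fields into a parameterized SQL query and formats the rows as an HTML table. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the warehouse. The `sales` table, the `FILTERABLE` whitelist, and the function names are assumptions made for illustration.

```python
import html
import sqlite3

# Hypothetical schema and data, standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Northeast", "Q1", 200), ("Northeast", "Q2", 150), ("South", "Q1", 150)],
)

# Only columns named here may be filtered on; other form field names never
# reach the SQL text.
FILTERABLE = {"region", "quarter"}


def form_to_sql(form):
    """Translate submitted form fields into a parameterized SQL query."""
    clauses, args = [], []
    for field in sorted(form):
        if field in FILTERABLE:
            clauses.append(f"{field} = ?")
            args.append(form[field])
    sql = "SELECT region, quarter, amount FROM sales"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY region, quarter", args


def run_report(form):
    """Run the query and format the rows as an HTML table."""
    sql, args = form_to_sql(form)
    rows = conn.execute(sql, args).fetchall()
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


sql, args = form_to_sql({"region": "Northeast", "sort": "ignored"})
page = run_report({"region": "Northeast"})
```

User-supplied values are passed as query parameters and never pasted into the SQL string, and unknown field names are dropped, so a malformed form cannot alter the query's structure.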

This approach does not always yield the desired performance, especially in large installations where scalability is an issue. For this reason, native Web-server interfaces, such as Netscape Server API (NSAPI) or Microsoft's Internet Server API (ISAPI), are becoming popular. As opposed to CGI, which forks off a new process for each call, NSAPI and ISAPI use lightweight threads. The downside is that an application written for one of these APIs will not work with the other (or with other servers). CGI, on the other hand, is portable. FastCGI offers a middle path: It keeps many of the advantages of CGI, including portability, but, as its name implies, improves performance by cutting down on invocation time.
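The difference in invocation cost can be modeled with a toy example. The Python sketch below starts no real processes or threads. It only counts how often a stand-in application has to go through its start-up step under each model. The `Application` class and both dispatch functions are hypothetical.

```python
class Application:
    """Stand-in for an OLAP gateway with a costly start-up step."""

    startups = 0  # class-wide counter of start-up steps performed

    def __init__(self):
        # Imagine opening database connections and loading metadata here.
        Application.startups += 1

    def handle(self, request):
        return f"report for {request}"


def cgi_style(requests):
    """Classic CGI model: a fresh application instance for every request."""
    return [Application().handle(r) for r in requests]


def persistent_style(requests):
    """FastCGI or server-API model: start once, then serve many requests."""
    app = Application()
    return [app.handle(r) for r in requests]


requests = ["Northeast", "South", "West"]

Application.startups = 0
cgi_replies = cgi_style(requests)
cgi_startups = Application.startups

Application.startups = 0
persistent_replies = persistent_style(requests)
persistent_startups = Application.startups
```

Both models return the same replies, but the per-request model pays the start-up cost once per call, while the persistent model pays it once in total.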

Third-generation Web architectures, according to the Patricia Seybold Group model, follow the "Java-assisted publishing" approach (see the figure section "Third-Generation Web Architectures"). These architectures supplement second-generation frameworks with Java applets, ActiveX controls, plug-ins, or other client-side programs. These can provide a better, more interactive user experience that might support local processing of the downloaded data.

This architecture is able to communicate more user-interface events to the application servers on the other end, alleviating many of the shortcomings of HTTP and HTML. The supplementary client-side software can also be a helper application, such as a spreadsheet. The result should resemble a traditional client/server application more closely while retaining the thin-client, Web-based philosophy.

The jury is still out on whether ActiveX controls or Java applets are the right strategy to enhance a browser with more functionality and interactivity. Some perceive Java as slow. ActiveX components, which correspond to the basic building blocks of a Windows application used by hundreds of thousands of Visual Basic programmers, provide a richer and perhaps more mature development environment.

Being currently confined to Microsoft clients and servers, however, ActiveX components are not as portable as Java applets, since they are closely tied to the Windows (and specifically the Win32) architecture. ActiveX components are also heavier than Java because they may bring their own run-time environment of DLLs. At the moment, Java still seems to have the upper hand in this competition.

Finally, fourth-generation sites use a full-blown Java approach. They employ a standard three-tier architecture, dividing processing among a Java application server, Java applets downloaded to the client, and a database or resource manager (see the figure section "Fourth-Generation Web Sites"). The Web server's remaining task is to supply Java applets. After a download, these communicate with the Java server directly, mostly using remote procedure calls (RPCs). The Java server communicates with the back-end resources using Java-clad native database drivers or Java Database Connectivity (JDBC).

Since this type of application is entirely for and on the Web, it circumvents the constraints of HTTP and HTML. The Java server might generate HTML, but it typically outputs data in a proprietary format for direct viewing with the client-side Java browser. Using the latter approach, it has the ability to encrypt communications to improve security.

Points to Consider

Rich Carickhoff of the Application Consulting Group, an organization that specializes in custom OLAP solutions, suggests that when evaluating systems to move to the Web, you should check the application type. "Information-centric applications migrate nicely to the Web and stand to gain much value from its architecture," he explains. "Systems that present information with a low level of functionality, in multiple formats, and to a broad audience are the success stories."

But he advises against using the Web to deploy function-intensive applications for specialized users. There's too much additional software to administer on the server, too much data has to travel over the various servers to the browser, and there might still be too little interactivity.

In his recent report, Eckerson also concludes that interactivity, potentially including support for tables, charts, maps, and other visual output, is most important when specifying requirements for Web-based OLAP tools. Nearly as important are performance and the number of functions the Web-enabled version includes. Most tools, he says, currently don't allow browser users to apply new calculations to a result set. Other aspects to consider include scalability (which may require load-balancing) and support for a wide variety of back-end databases.

A final aspect to consider is the pricing model: Does the vendor charge per seat, or does it take the number of concurrent users into consideration? Eckerson concludes that the fourth-generation business intelligence tools to succeed will be the ones that simulate a client/server architecture over the Web.


Four Generations of Web Access

illustration_link (30 Kbytes)


Udo Flohr is a BYTE contributing editor based in Hannover, Germany. You can reach him by sending e-mail to flohr@dfn.de.
