
Wolfpack Howls Its Arrival


August 1997 / BYTE Software Lab Report / Wolfpack Howls Its Arrival

With Wolfpack, two NT servers can act as standbys for each other while both still do useful work.

BYTE Editors

Microsoft, the 800-pound gorilla of the software industry, is set to release a new extension for its NT Server OS that will dramatically change the server landscape and allow NT networks an unprecedented degree of reliability and fault tolerance. Popularly known as Wolfpack, this new product will, for the first time, allow built-in clustering--the ability to interconnect two or more servers so that one can automatically take over another's processing in case of failure, with minimal disruption to end users. To a user, clustered servers appear as a single entity, even when the client is accessing several servers in different locations.

Clustering NT servers (not to mention those using other OSes) isn't a brand-new idea, but it's never been hooked directly into the OS before--where it really belongs, in our judgment. Heretofore, there have been a variety of clustering solutions from a number of vendors, most of them requiring dedicated hardware links and proprietary hardware/software bundles. Many of these vendors have been working with Microsoft and are making plans and products to confront what will be the new market reality. Phase one of Wolfpack's release is scheduled for this month. It will support two-server clusters. The second phase will follow in 1998 and enable clustering of more than two servers.

This report is based on tests by both BYTE and NSTL of the second beta release of Wolfpack. In addition, we look at some important issues surrounding clustering technology, many of which involve limitations that have been ignored or glossed over by vendors. Finally, we take a quick survey of the existing products in the market, with a table summarizing their features and a sidebar describing their plans and positions vis-à-vis Wolfpack. (Early on, we planned to conduct a comparative look at cluster solutions, but because no common hardware configuration has been feasible, we couldn't conduct BYTE's usual apples-to-apples performance comparisons.) To help you better understand Wolfpack's capabilities and limitations, we'll quickly review the basics of clustering.

Why Cluster?

The whole point of clustering is to maintain "high availability" of computing resources to end users. To do this involves three essential functions: fault tolerance (called failover), load balancing, and centralized administration and monitoring. Fault tolerance ensures a backup to replace a failed resource (e.g., server, router, or network). Load balancing detects when processing overloads one resource to the point that it's virtually unavailable and distributes the load among less-burdened resources. Central management of clustered servers lets administrators monitor and control the cluster from a single console, both to troubleshoot failures and shift resources for routine maintenance.

Unfortunately, most clustering products, including Wolfpack, provide only automatic failover and management. Load balancing is a manual operation, though some third-party systems may provide additional software components or add-on products to help with this.

The heart of any clustering implementation is redundancy. Have two or more of everything, so that if any single resource on the network fails--whether it be a server, server network adapter, disk drive, application, router, or segment--the system will automatically detect this and swap in a standby component. Wolfpack knows about the following NT resource types: Fault-Tolerant Disk Set, File Share, Generic Application, Generic Service, Internet Information Server (IIS) Virtual Root, IP Address, Network Name, Physical Disk, Print Spooler, and Time Service.

While it's clearly possible to set up a cluster with an extra server standing by, connected to the network but idle, waiting to take over if it's needed, this configuration (called active/passive or asymmetric) is hardly cost-efficient and rarely justifiable. Instead, the usual practice is to have each server active, doing useful work but ready to take over the other's processing if it should fail. In addition to the servers' LAN connections, a second private connection, called the interconnect, is usually established so the two servers can monitor each other.

Achieving fault tolerance in a client/server information technology (IT) environment means addressing a number of hardware and software issues: continuing electrical power, multiple servers, redundant data storage, backup network links, and failover management software.

Power to the Process. All hardware required for continual services must be connected to an uninterruptible power supply (UPS) that allows time to switch to a backup generator or, if necessary, to conduct a fast but orderly shutdown.

Many Machines. You can reduce the possibility of downtime simply by dividing tasks up. A Web server on one machine and an e-mail server on another means that one server going down won't cause both applications to fail.

Share the Storage. Disk mirroring or replication techniques between servers ensure that data--and possibly applications--will be available should a disk drive or server fail. Right now, SCSI is the gold standard for shared-disk technologies, but it has limits (see the sidebar). One of them is that the distance between clustered servers is limited to only 25 meters. Also, non-SCSI failover systems can make the server cluster vulnerable to network partitioning. In the future, technologies such as Fibre Channel, Serial Storage Architecture (SSA), or I2O may provide dedicated disk sharing over longer distances.

The Dept. of Redundancy Dept. Adding an additional connection between servers helps reduce the possibility of communications failure over the network.

Manage the Monster. Failover management software offers a way to detect hardware and software failures and invoke backup, standby, or takeover technologies. Failure-detection parameters require some fine-tuning by the administrator. A too-sensitive failure test will cause needless switch-overs, but a test that's not sensitive enough risks the loss of services. A redundant dedicated interconnect between servers makes for more reliable failure detection. NSTL technicians had difficulties with NT's deadly "Blue Screen" after trying to uninstall some clustering packages. Thus, it's prudent to make an emergency repair disk prior to installation.

Simple stateless Web services are fairly straightforward to migrate, but stateful applications (e.g., database applications) are more difficult and may require special add-on kits. For greatest flexibility, failover software should offer an API to let in-house programmers add failover code to custom and homegrown applications.

What Wolfpack Does

To create a Wolfpack-based cluster, you need two (no more, no less) NT 4.0 servers (with Service Pack 3 installed) that share a SCSI bus supporting an external disk-storage subsystem (see the figure). Both servers must be members of the same NT domain, and each must have its own system disk on a local, unshared bus.

Wolfpack enables the two servers to exchange their status, resources being run, and activity with each other. Two components of the clustering software are the Cluster Service and the Resource Monitor. The Cluster Service, which runs on every clustered server, controls cluster activity, communication between servers, and failure operations. The Resource Monitor checks the assigned states of targeted resources (i.e., off-line, off-line pending, on-line, on-line pending, or failed) and reports any state changes to the Cluster Service. Each server can run one or more Resource Monitors.

The primary monitoring communication between Wolfpack nodes is called heartbeat synchronization. Basically, each node is always checking whether the other is still there and ticking. If a node's Resource Monitor determines that the other node has disappeared, the Cluster Service executes the predefined failover instructions. Because there is a separate Cluster Service and one or more Resource Monitors on each node, this cluster communication takes the form of interprocess communications (IPC) and requires little network overhead. This traffic is small enough that it can be run over a private Ethernet LAN (usually called an interconnect), a public LAN, a serial connection, or even the SCSI bus, though the last one isn't recommended.
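To make the heartbeat idea concrete, here is a minimal Python sketch of how one node might decide that its partner has gone silent. It is our illustration only, not Wolfpack's code: the class name, the `interval` and `missed_limit` parameters, and the failover callback are all assumptions. Time is passed in explicitly so the logic can be exercised without waiting on a real clock.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeats from a peer node (illustrative, not Wolfpack's code)."""
    interval: float          # seconds between expected heartbeats (assumed)
    missed_limit: int        # consecutive missed beats tolerated (assumed)
    last_seen: float = 0.0   # timestamp of the most recent heartbeat

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the peer at time `now`."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """The peer counts as alive until `missed_limit` intervals pass in silence."""
        return (now - self.last_seen) < self.interval * self.missed_limit


def check_peer(monitor: HeartbeatMonitor, now: float, start_failover) -> bool:
    """Invoke the failover callback if the peer has gone silent; report whether it ran."""
    if monitor.peer_alive(now):
        return False
    start_failover()
    return True
```

A small `missed_limit` makes the test more sensitive and risks needless switch-overs; a large one delays detection of a real failure. That is the tuning trade-off described earlier under failover management.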

The administrator can specify two polling intervals and a time-out value for resources. The polling intervals affect how often the Resource Monitor does its checks. There are two levels of polling, known in Wolfpack jargon as Looks Alive and Is Alive. In Looks Alive polling, Wolfpack performs a cursory check to determine if the resource is available and running. Is Alive polling is more thorough, with Wolfpack determining if the resource is fully operational. The time-out value specifies how long the Resource Monitor should wait for a response before it considers the resource failed.
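The two polling levels can be sketched as a scheduler that runs a cheap check often and a thorough check less often. The Python below is a hedged illustration under our own assumptions: `ResourcePoller`, its parameters, and the two callables are hypothetical names, and the time-out handling is left out (each check is assumed to return promptly with True or False).

```python
from enum import Enum
from typing import Callable


class State(Enum):
    ONLINE = "on-line"
    FAILED = "failed"


class ResourcePoller:
    """Runs a cursory check often and a thorough check less often (sketch)."""

    def __init__(self, looks_alive: Callable[[], bool], is_alive: Callable[[], bool],
                 looks_alive_every: float, is_alive_every: float):
        self.looks_alive = looks_alive
        self.is_alive = is_alive
        self.looks_alive_every = looks_alive_every
        self.is_alive_every = is_alive_every
        self.next_looks = 0.0   # time at which the next cursory check is due
        self.next_is = 0.0      # time at which the next thorough check is due
        self.state = State.ONLINE

    def tick(self, now: float) -> State:
        """Run whichever check is due at time `now` and return the resource state."""
        if self.state is State.FAILED:
            return self.state
        if now >= self.next_is:
            # The thorough check subsumes the cursory one, so reschedule both.
            ok = self.is_alive()
            self.next_is = now + self.is_alive_every
            self.next_looks = now + self.looks_alive_every
        elif now >= self.next_looks:
            ok = self.looks_alive()
            self.next_looks = now + self.looks_alive_every
        else:
            return self.state
        if not ok:
            self.state = State.FAILED
        return self.state
```

With a 5-second cursory interval and a 60-second thorough interval, for example, a resource that is running but not fully operational would pass the cursory checks and be marked failed only when the next thorough check comes due.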

Planning to Fail

The most significant advantage Wolfpack offers over current clustering solutions is its tight integration with NT. For example, Wolfpack lets you group NT resources with applications into failover groups. When a single resource fails, Wolfpack fails over the entire group to which the failing resource belongs. This provides a handy means of creating failover dependencies and ensures that a failed service will have the appropriate resources it needs to restart. Some systems require involved scripts to accomplish what Wolfpack allows via prompted dialog boxes and mouse-clicks.
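One way to picture a failover group is as a small dependency graph that moves as a unit and restarts in dependency order. The following Python sketch assumes a hypothetical group whose file share depends on a disk and a network name, which in turn depends on an IP address; the function names and node names are ours, and the ordering uses the standard library's `graphlib` (Python 3.9 or later) rather than anything Wolfpack-specific.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return resources ordered so each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def fail_over_group(owners: dict[str, str], group: str, failed_node: str,
                    nodes: tuple[str, str]) -> str:
    """Move the whole group to the surviving node and return that node's name."""
    survivor = nodes[1] if nodes[0] == failed_node else nodes[0]
    owners[group] = survivor
    return survivor


# Hypothetical group: each key maps a resource to the resources it depends on.
file_group = {
    "Network Name": {"IP Address"},
    "File Share": {"Physical Disk", "Network Name"},
}
```

Calling `start_order(file_group)` yields an order in which the IP address comes up before the network name, and the disk and network name come up before the file share, which is the guarantee a restarted service needs.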

Automatic failover isn't always possible, unfortunately. Some applications can run on only one node on the cluster and in case of failover would have to be manually started on the other node. Some applications (e.g., IIS, FTP) can be managed and configured to automatically start on the other node in the event of a failover.

When a server fails, Wolfpack migrates its functions and resources to the alternate server, which lets the IT staff troubleshoot and fix the problem. But how do you restore resources to the original, failed-but-fixed server (a process called failback)? Can you, and should you, automate it? It might seem that automatic failback is the best solution, but only if the problem is really fixed and unlikely to recur. If not, automatic failback can cause subsequently failed resources to bounce back and forth between servers, causing problems for users. Restricting failback to a deliberate manual action by IT personnel can eliminate this ping-pong effect.
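A middle ground between fully automatic and strictly manual failback is to allow automatic failback only while a group has not been bouncing. The Python sketch below is one possible policy under assumptions of our own (the class, the `max_failovers` limit, and the look-back `window` are hypothetical and are not Wolfpack settings).

```python
from dataclasses import dataclass, field


@dataclass
class FailbackPolicy:
    """Decides whether a group may return to its repaired home node (sketch)."""
    automatic: bool        # False means failback is always a manual action
    max_failovers: int     # failovers tolerated inside the window (assumed)
    window: float          # length of the look-back window in seconds (assumed)
    history: list[float] = field(default_factory=list)

    def record_failover(self, now: float) -> None:
        """Remember the time of each failover for this group."""
        self.history.append(now)

    def allow_failback(self, now: float) -> bool:
        """Permit automatic failback only if the group has not been bouncing."""
        if not self.automatic:
            return False
        recent = [t for t in self.history if now - t <= self.window]
        return len(recent) <= self.max_failovers
```

With `max_failovers=1` and a one-hour window, a second failover inside the hour blocks automatic failback and leaves the decision to the administrator, which stops the ping-pong effect without giving up automation entirely.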

Cluster Management

In an ordinary server environment, users employ a number of administrative tools to identify the servers and monitor their contents and activities. Wolfpack uses a single program, the Cluster Administrator, to centralize control over applications and services. You can run it as a client from any NT workstation attached to the cluster. All cluster resources appear as hierarchically organized objects that you can assign and configure with relative ease.

Cluster Administrator manages services, file shares, and directory replication. It lets you review the activities and failures of the computers in each cluster to determine which nodes are currently running applications and services. Color denotes resource ownership--that is, the colors change when a failover occurs, an instant notification that also tells you which server owns what resources. Cluster Administrator lets you specify the applications and related components that run on the servers and establish policies for availability monitoring, failure detection, and recovery. Manually taking individual nodes off-line for maintenance involves only a right mouse-click to fail services and resources over to the other server.

While failover and failback are handled well, load balancing is still a problem under Wolfpack. It's neither automatic nor dynamic; in fact, it's completely a manual process. Therefore, you need to carefully monitor cluster loads, because it's possible for one node on the cluster to be serving 200 users while the other handles only a few clients. And, unfortunately, there may be nothing you can do to fix it.
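Because the balancing is manual, the monitoring is the part you can script. The Python sketch below flags a lopsided cluster from per-node session counts; the function names, the session-count input, and the 75 percent threshold are assumptions for illustration, and the code does not move any resources.

```python
def load_imbalance(sessions: dict[str, int]) -> float:
    """Return the busiest node's share of all sessions (0.5 is even for two nodes)."""
    total = sum(sessions.values())
    if total == 0:
        return 0.0
    return max(sessions.values()) / total


def needs_rebalancing(sessions: dict[str, int], threshold: float = 0.75) -> bool:
    """Flag the cluster when one node carries more than `threshold` of the load."""
    return load_imbalance(sessions) > threshold
```

For the case above, `needs_rebalancing({"NODE_A": 200, "NODE_B": 5})` is true, since one node carries nearly all of the sessions; an even 100/100 split is not flagged.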

At BYTE, we installed Wolfpack on two Digital Equipment servers (200- and 166-MHz Pentium systems) sharing a single external SCSI cabinet with two 2-GB hard drives. Setup was quick and easy. The first node creates the cluster--cluster name, IP address, alias information, groups, etc. Once the second node joined the existing cluster, we could assign resources and define failover procedures.

We tested manual failover (of IIS server, SQL server, and disk resources) by moving resources back and forth using Cluster Administrator. We shut down one node to test automatic failover. In all cases, recovery seemed nearly instantaneous. Cluster Administrator was also smart enough to prevent us from assigning new resources to the now-missing node.

Pick the Pack?

The reality of clustering for NT, right now, is that neither Wolfpack nor any of the available clustering products for NT fully implements all the functions and concepts that BYTE believes constitute true clustering. Available products provide add-on kits to support a short list of programs, mostly databases. Wolfpack adds much of the required functionality directly into the OS and provides common APIs for custom solutions. But if you need to cluster more than two servers, you probably can't wait until Wolfpack grows up some more. Thus, one of the other products, including some non-NT clustering solutions, may be a better choice. Still, there seems little doubt that Microsoft will soon be the leader of the pack.




Where to Find


Digital Clusters for Windows NT.................$995

Digital Equipment Corp.
Maynard, MA
Phone:    800-344-4825
Internet: http://www.digital.com/

Enter 1013 on Inquiry Card.

FirstWatch for Windows NT Server..............$4,995

Veritas Software
Mountain View, CA
Phone:    800-258-8649
Phone:    415-335-8000
Internet: http://www.veritas.com/

Enter 1014 on Inquiry Card.

HACMP.......................Call for pricing options

IBM Corp.
Somers, NY
Phone:    800-225-5249
Internet: http://www.ibm.com

Enter 1015 on Inquiry Card.

Isis Availability Manager.....................$1,500 - $2,500, NT
..............................................$5,000 - $10,000, Unix

Isis Distributed Systems
Marlborough, MA
Phone:    800-258-0990
Phone:    508-460-2430 
Internet: http://www.isis.com

Enter 1016 on Inquiry Card.

LifeKeeper for Windows NT.....................$1,495

NCR
Dayton, OH
Phone:    800-774-7406
Phone:    937-445-5000
Internet: http://www.ncr.com

Enter 1017 on Inquiry Card.

Octopus and SASO..............................$1,499

Octopus Technologies
Langhorne, PA
Phone:    800-919-1009
Phone:    215-579-5600 
Internet: http://www.octopustech.com

Enter 1018 on Inquiry Card.

Standby & On-line Recovery Server.............$1,499

Compaq Computer
Houston, TX
Phone:    800-652-6672
Phone:    281-370-0670
Internet: http://www.compaq.com/

Enter 1019 on Inquiry Card.

Standby Server for NT.........................$2,999

Vinca Corp.
Orem, UT
Phone:    888-808-4622
Internet: http://www.vinca.com

Enter 1020 on Inquiry Card.

Wolfpack......................Price to be determined

Microsoft Corp.
Redmond, WA
Phone:    206-882-8080
Internet: http://www.microsoft.com

Enter 1021 on Inquiry Card.



Features

Feature | Digital Clusters for Windows NT | FirstWatch for Windows NT Server | HACMP | Isis Availability Manager | LifeKeeper for Windows NT | Octopus and SASO | Standby & On-line Recovery Server | Standby Server for NT | Wolfpack
Number of servers clustered | 2 | 2 | 16 | Up to 100 | 2 or 3 | 2 | 2 | 2 | 2
Supported OSes | NT 4.0 SP2 | NT 3.51 or 4.0; Solaris | AIX | NT; Solaris | NT | NT | NT 3.5x, 4.0 | NT 3.5 or higher; NetWare; OS/2 Warp | NT 4.0 SP3
Systems supported, if restricted | | | RS/6000 family | HP-UX | NCR Worldmark or S series | | ProLiant & ProSignia | |
Identical servers required | | | | | | | (Standby only) | |
Requires shared-disk subsystem | * | * | | | * | | * | | *
GUI-based management | * | * | * | * | * | | Use Compaq Insight Manager | | *
Load balancing | | | Use Load Leveler | * | | | | |
Client software required | * | | | | | | | |
Special API supplied | | * | | * | | | | |
Interconnect type | NIC | 2 NICs/server | NIC | NIC | NIC | NIC | Serial | NIC | NIC
Failover mode: A/A or A/P | A/A | A/A | A/A or A/P | A/A | A/A, A/P, three-way | (N/A) | A/A (on-line); A/P (standby) | A/P | A/A

Resources Protected
Shared disk | | | | | * | | | | *
Generic applications | | | | * | * | | | | *
Specific applications, via kits | | | | | * | | | |
Generic services | | | | | | | | | *
IP address | * | | * | * | * | * | | | *
Network name | | | | | | * | | | *
File sharing | | | * | | * | | | | *
Print services | | | | * | * | | | | *
Time service | | | | | | | | | *
Name service | | | | * | | | | |
Microsoft Exchange | | | | | | | | |

Protocols Supported
TCP/IP | * | * | * | | * | | | |
NetBEUI | * | | | | | | | |
IPX/SPX | * | | | | | | | |

Heartbeat Monitoring
Network connection | * | * | * | * | * | * | | * | *
Shared disk | | * (Unix) | | | * | | | |
Serial port | | | | | * | | * | |

Key: * = yes; A/A = Active/Active; A/P = Active/Passive; N/A = not applicable

Back to Basics

[Illustration] The basic configuration of a Wolfpack cluster is quite simple.


Leader of the Pack

[Screen shot] Wolfpack should handle failover automatically, but a lot of manual administration goes into configuration.

