
Managing Storage


June 1994 / Special Report / Managing Storage

The challenge of managing decentralized storage is to make it accessible with optimal performance at an affordable price

Andy Reinhardt

Storage management can mean anything from data backup and monitoring the health and performance of disk drives to the use of new distributed file systems. A better definition looks past the means to the ends of storage management: to provide users with optimal system performance and minimum downtime while protecting electronic assets.

As an organization makes data more available, it also must more aggressively protect it. Implementing rigorous media monitoring, backup, and file-grooming procedures, for instance, may seem like a burden to users, but it's the only way to ensure that the right data will be there when they need it. In fact, as organizations shift from hosts to distributed models--or as their LANs evolve upward to assume more enterprising functions--they are starting to demand storage management tools as robust and sophisticated as those created for mainframes.

The move to distributed computing is thus being matched by a concurrent shift toward centralized storage management. "The notion of distributed computing is a distinct issue from distributed storage," says Bill North, the director of advanced products for Storage Dimensions (Milpitas, CA). "Even in the most distributed environments, you're trying to put your data in a centralized place, so it can be managed."

Of course, the meaning of centralized can vary; hard drives and other devices may themselves be physically dispersed in a distributed environment, because storage management is implemented largely at a logical level. Whether data is consolidated on a single giant server or scattered across an array of smaller systems, centralization gives network managers a unified view of storage resources and lets them assert control over otherwise haphazard processes such as drive-performance monitoring and backup.

Asset protection is one of the factors driving storage management into LANs. "Sometimes data at the desktop is as valuable--as strategic--as the data in the server or the mainframe," says Jay Carlson, president of Vinca (Orem, UT), a start-up company pioneering new storage architectures. "The issue of recentralization of data will dominate distributed computing over the next decade." Another major driving force is cost-containment. Mike Peterson, president of research-firm Peripheral Strategies (Santa Barbara, CA), says: "The big myth is that storage in PC LANs is free; it may cost only $1 per megabyte, but the labor cost of managing that is pushing $8 per megabyte per year."

Fortunately, implementing storage management need not turn into another tug-of-war between MIS departments and freewheeling users. If properly executed, storage management is not only transparent to users but ultimately helps them retrieve data more easily, spares them the burden of looking after their own disks and data, and boosts system performance. Unfortunately, many of the storage management products available today for LANs address only part of the problem, and most don't work together in a unified framework.

Making the Best Use

Assuming that you already have adequate security, reliability, and physical management, the key challenge in storage management is to make the best use of your available media with the minimum need for human intervention. This is the fertile area in which HSM (Hierarchical Storage Management) resides.

Some people see HSM as an extension of backup and archiving, but in fact, its technology is much closer to that of a cache. As in a cache, the point of HSM is to keep the most current, the most frequently accessed, and the most urgent data as close as possible to the place it is being used, while intelligently discarding that which is no longer needed. The limited space in the cache is at a premium, and not everything needs to be there.

The algorithm that drives backup is based on whether a given file has already been archived and whether it has changed since the last archive. Backups don't delete the source files after copying them to the secondary media. And backup software, which operates in a batch mode, is optimized for backup speed so that the process takes the least possible time and interferes as little as possible with normal system operations. Restores can be done more leisurely because they're assumed to be infrequent.
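The backup rule described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the `FileEntry` record, its field names, and the `run_backup` function are hypothetical, and the actual copy to secondary media is reduced to appending a path to a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileEntry:
    path: str
    modified: float                # last-modified time (any consistent clock)
    last_backup: Optional[float]   # time of the last backup, None if never archived


def needs_backup(f: FileEntry) -> bool:
    """Copy a file if it was never archived or has changed since the last archive."""
    return f.last_backup is None or f.modified > f.last_backup


def run_backup(files: List[FileEntry], now: float) -> List[str]:
    """One batch pass: copy qualifying files and leave every source file in place."""
    copied = []
    for f in files:
        if needs_backup(f):
            copied.append(f.path)  # stand-in for the copy to secondary media
            f.last_backup = now
    return copied
```

Running the pass twice in a row with no intervening changes copies nothing the second time, which is what keeps the batch window short.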

HSM is nearly the opposite. Migrating files, which involves making a copy and deleting the original, takes place in continuous sweeps, as thresholds are crossed or trigger events occur (e.g., files pass the one-year age mark), and can thus proceed during idle processor cycles with no particular urgency. But demigration has to be instantaneous: If you require a file that is no longer on-line, the load operation ideally ought to take only marginally longer than it would have if the file were on disk.

The algorithms used in HSM to choose which files to migrate weigh several factors. The primary criterion is usually disk capacity; the administrator sets high and low thresholds, or watermarks, and the HSM engine keeps disk capacity between these levels. When the high threshold is crossed, the engine typically looks for the oldest eligible files to move. But to minimize the potential need for demigration, sophisticated algorithms will choose one large but younger file in preference to lots of small but older files.
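A minimal sketch of watermark-driven selection follows. The thresholds, the minimum age, and above all the scoring rule (size multiplied by age) are assumptions chosen for illustration; the article does not say how any product weighs size against age. `DiskFile` and `select_for_migration` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskFile:
    name: str
    size_mb: float
    age_days: float   # days since last access


def select_for_migration(files: List[DiskFile], capacity_mb: float,
                         high: float = 0.90, low: float = 0.70,
                         min_age_days: float = 30) -> List[DiskFile]:
    """Pick files to migrate once usage crosses the high watermark."""
    used = sum(f.size_mb for f in files)
    if used <= high * capacity_mb:
        return []                      # still below the high watermark
    target = low * capacity_mb
    eligible = [f for f in files if f.age_days >= min_age_days]
    # Assumed heuristic: rank by size * age, so one large, fairly old file
    # can outrank many small, very old files and fewer files leave the disk.
    eligible.sort(key=lambda f: f.size_mb * f.age_days, reverse=True)
    chosen = []
    for f in eligible:
        if used <= target:
            break                      # back under the low watermark
        chosen.append(f)
        used -= f.size_mb
    return chosen
```

With a 1,000-MB volume holding one 300-MB file last used 60 days ago, ten 10-MB files untouched for 400 days, and 580 MB of current data, this sketch migrates only the single large file, leaving the small files where a later request can still find them on disk.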

When a file is migrated off the primary storage medium, HSM systems often leave behind a placeholder, or token, that consists of a pointer to the file's new location or an index entry in a server database that tracks the actual location of the file. The latter approach is safer, especially if the database is redundant. Unless the HSM is tightly integrated with the underlying operating system, it's more difficult for it to trap disk calls and redirect them to a separate file manager.
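The two bookkeeping options can be put side by side in a small sketch. Everything here is hypothetical: primary and secondary storage are plain dictionaries, `Stub` stands for the placeholder left on disk, `LocationDatabase` stands for the server-side index, and the `volume` label is made up.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    """Placeholder left on the primary disk; it only points at the real data."""
    address: str


class LocationDatabase:
    """Server-side index that tracks where each migrated file now lives."""

    def __init__(self):
        self._index = {}

    def record(self, name, address):
        self._index[name] = address

    def lookup(self, name):
        return self._index.get(name)


def migrate(name, primary, secondary, database, volume="tape-A", leave_stub=True):
    """Move a file's data to secondary media and record its new location."""
    data = primary.pop(name)
    address = f"{volume}/{name}"       # hypothetical addressing scheme
    secondary[address] = data
    database.record(name, address)     # index entry survives even if the stub is lost
    if leave_stub:
        primary[name] = Stub(address)  # token that redirects later opens
    return address
```

If a stub is accidentally deleted or damaged, the database entry is the only remaining record of where the data went, which is one way to read the claim that the database approach is safer.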

The trickiest problem in HSM is how to handle the situation when you request a file that needs to be demigrated but is too large to fit onto the space available on the hard drive. According to Robert Wight, president of Avail Systems (Boulder, CO), his HSM system accommodates this problem by premigrating the files that are next in line for migration. Premigrating means that the files are left on the drive and also copied to the next level down; therefore, if you quickly need the space the files occupy, you can delete the files from the hard drive and restore them later.
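The value of premigration is that space can be reclaimed without waiting for a copy. The sketch below assumes a simple per-file state label ("resident", "premigrated", or "migrated"); the dictionary layout and the `make_room` function are hypothetical and do not describe Avail's actual design.

```python
def make_room(needed_mb, free_mb, files):
    """Free disk space for an incoming file by dropping premigrated copies.

    files maps a name to {"size_mb": ..., "state": ...}. A premigrated file
    already has a copy one level down, so releasing its disk copy is just a
    delete; no data has to be written first.
    """
    released = []
    for name, info in files.items():
        if free_mb >= needed_mb:
            break
        if info["state"] == "premigrated":
            info["state"] = "migrated"     # disk copy dropped, lower copy remains
            free_mb += info["size_mb"]
            released.append(name)
    return free_mb, released
```

Files that are merely resident are never touched here, because deleting them would first require a slow copy to the next level.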

Similarly, some HSM systems fail to quickly remigrate a file that has been demigrated because they see it as having been recently accessed, which fools the algorithm into thinking the file is current. Avail's software instead treats demigrated files as if they were premigrated, which means that they go back to a lower level in the hierarchy as soon as the next sweep occurs.
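One way to avoid being fooled by the fresh access time is to remember that a file was recalled and that its lower-level copy is still valid. The sketch below is an assumption about how such a check could look, not a description of Avail's code; the "recalled" state, the "modified" flag, and the one-year default are all hypothetical.

```python
SECONDS_PER_DAY = 86400


def eligible_for_sweep(info, now, min_age_days=365):
    """Decide whether the next sweep may move this file down the hierarchy.

    info holds "state", "modified" (changed since recall), and "last_access"
    (seconds). A recalled, unmodified file still has a valid copy below, so
    its recent access time is ignored.
    """
    if info["state"] == "recalled" and not info["modified"]:
        return True
    if info["state"] in ("resident", "recalled"):
        age_days = (now - info["last_access"]) / SECONDS_PER_DAY
        return age_days >= min_age_days
    return False   # already migrated; nothing on disk to move
```

A recalled file that the user then edits falls back to the ordinary age test, since the copy on the lower level no longer matches it.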

HSM is one of the hottest topics in storage management right now because it solves several problems at once. Migrating old or infrequently used files off onto inexpensive media such as removable optical disks or tape not only frees up space on the primary device for more current or important files but also reduces the average cost of your storage. "By implementing HSM, you put a stop to your on-line growth and grow into near-line media instead," says Mike Kidd, vice president of marketing for Palindrome (Naperville, IL). HSM is not a substitute for backup but rather complementary to it.

Another benefit of HSM, says Avail's Wight, is that it increases aggregate network performance by optimizing access time for the data you're most likely to need. "With HSM, the focus of network drives becomes speed, not storage," he says. "Your concern becomes performance, not capacity." In a complex HSM pyramid, you might have ultrafast cached hard drives that are layered above 10-GB single-spindle Seagate drives, which are layered on top of an optical jukebox for near-line storage, which is layered above a tape library.

HSM comes from the mainframe world, where it was used to minimize storage costs that were 10 times what they are in distributed networks, says Robert Hamilton, product manager for tape and optical-storage products for Storage Dimensions. In fact, the relatively low cost of storage in distributed systems has made implementing HSM less urgent and consequently has held back its market penetration, he argues. "Customers aren't asking for it because they're so delighted not to be paying $10 per megabyte."

Indeed, the low cost of distributed storage has encouraged users to buy more drives rather than to use existing drives more efficiently, and this has ultimately led them to seek HSM for a different reason: as an easy way to implement disk space management. "It's a way of prioritizing your data," says Igor Stenmark, program director for the software management strategies service of the Gartner Group (Stamford, CT). Only 25 percent of the benefit of using HSM accrues from lowered media costs, says Palindrome's Kidd; 75 percent of the benefit comes from reduced management overhead, because you no longer have to worry about space planning and file grooming.

The small number of vendors who now sell LAN-based HSM--primarily Conner Storage Solutions (Lake Mary, FL), which licensed its software from Avail, and Palindrome, whose HSM module layers on top of Network Archivist--argue over the fine points of their implementations, but they are functionally very similar. The basic requirements for an HSM are that it support a hierarchy of storage devices, ranging from the fastest and most expensive hard drives to low cost-per-megabyte tape autoloaders. Several analysts consider solutions that support only a fixed number of levels to be inadequate.

Using an algorithm that takes into consideration disk capacity thresholds, file aging, and sometimes file type (e.g., you can set the rules so that certain types of files such as executables or DLLs are never migrated), a rules engine does the following: It watches the drives to make sure they stay within their thresholds; it moves files from one medium to another as necessary; and it tracks file locations in a database. When you try to load a file, the engine intercepts the request and looks it up in the database; if the file needs to be demigrated from tape or an optical drive, it's copied back to the hard disk and given to you.
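The file-type exception and the intercepted load can be sketched together. This is a toy model under stated assumptions: storage is simulated with dictionaries, the `Engine` class and its method names are hypothetical, and a real product would hook the operating system's file calls rather than expose its own `open` method.

```python
import os

NEVER_MIGRATE = {".exe", ".dll"}   # assumed exception list, set by the administrator


def may_migrate(name):
    """File-type rule: executables and DLLs always stay on the hard disk."""
    return os.path.splitext(name)[1].lower() not in NEVER_MIGRATE


class Engine:
    def __init__(self):
        self.disk = {}       # primary storage: name -> data
        self.offline = {}    # tape or optical: name -> data
        self.database = {}   # name -> "disk" or "offline"

    def store(self, name, data):
        self.disk[name] = data
        self.database[name] = "disk"

    def migrate(self, name):
        """Move one file down the hierarchy unless a rule forbids it."""
        if not may_migrate(name) or self.database.get(name) != "disk":
            return False
        self.offline[name] = self.disk.pop(name)
        self.database[name] = "offline"
        return True

    def open(self, name):
        """Intercept a load request, demigrating the file first if necessary."""
        if self.database[name] == "offline":
            self.disk[name] = self.offline[name]   # copy back to the hard disk
            self.database[name] = "disk"
        return self.disk[name]
```

The caller of `open` never learns whether the data came straight from disk or had to be copied back, which is the transparency the rules engine is meant to provide.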

A function that is intimately related to file access might seem like an obvious candidate for inclusion in the operating system, and in fact, both Novell and Microsoft are moving to support HSM. However, as with other third-party functions that get added to the operating system, their support will consist of a simple "out-of-the-box" implementation coupled with an API that allows richer external products to plug in. "We believe that operating-system companies already have a lot on their plates, and they won't fill this niche completely," says Hinda Chalew, director of strategic marketing for Cheyenne Software of Roslyn Heights, New York.

Unified View

One problem faced by network administrators is the so-called "swivel chair effect," caused by a proliferation of network management consoles on their desktops. Eliminating multiple displays requires integrating management applications into a common framework. Because of the widespread adoption of SNMP (Simple Network Management Protocol), most internetworking devices such as hubs and routers can report their status to, and be managed from, a single console. But storage management systems have remained largely isolated.

Legato Systems (Palo Alto, CA) is working on an SNMP agent for its storage management products that will let the products be integrated into Novell's NMS (NetWare Management System) and into enterprise-wide IBM NetView environments. A similar capability is expected from Cheyenne this year. When these capabilities arrive, network managers will be able to monitor and configure storage resources with the same user interface and display they use to analyze network performance, manage network hardware, and even set up user accounts and permissions.

Intel (Hillsboro, OR) is also moving to integrate support for its StorageExpress, a dedicated NetWare-based backup server, into LANDesk Manager, its Windows-based network management system. StorageExpress now snaps into Novell NMS, and LANDesk communicates via SNMP to high-level frameworks such as NetView. The benefit of linking these capabilities together, says Ed Guzman, strategic marketing manager for networking storage products at Intel, is that storage events can trigger network management events, or vice versa. For instance, the failure of a local drive could prompt the hub to close access to that node and kick off a backup sequence to another device.

Future Files

If storage management in distributed environments raises problems and requires solutions that didn't exist in centralized systems, this is only the tip of the iceberg. Just around the corner are fundamental changes in the conception of file systems and documents that may require a rethinking of storage management.

The technological shifts underlying this trend are the emergence of "locationless" network services and the rise of objects and object file systems. In today's computing model, a file stands on its own and resides in a specific place. Advanced operating systems like Windows NT support rich data typing, yet files are still backed up and migrated using conventional attributes such as date/time stamp or archive bit.

But when software moves to a document-centric model, as Microsoft promises to do with the Cairo operating system, the meaning of a file changes. "Smart documents will require us to handle backup very differently," says Alan Adamson, director of product management for Symantec/Peter Norton Group (Santa Monica, CA). "A document will no longer be a single file but rather a book of pointers to text objects, data objects, images, fonts, and so on." Backing up with conventional approaches could cause the linked objects to be separated onto different media, making restoring a nightmare and endangering the integrity of the links.

Obviously, backup and HSM for compound documents will have to respect links, grouping related files together. Microsoft's Greg Lobdell, lead product manager for Microsoft's business systems division, says that this isn't difficult if the operating system supports object management. The tougher problem, he says, is giving storage management routines enough "smarts" that they won't try to back up a huge file over a 9600-bps modem line or make multiple copies of the same application program or repeatedly back up linked objects that have not changed.
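As a rough illustration of link-aware backup, the sketch below treats a compound document as a root object plus whatever it points to, keeps each document's objects together, and skips objects that are unchanged and already saved. The data layout (a dictionary of links, sets of changed and saved objects) and both function names are assumptions; no actual object file system is being modeled.

```python
def backup_group(root, links):
    """Return the root and every object reachable through its links, each once."""
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(links.get(obj, [])))
    return order


def plan_backup(roots, links, changed, already_saved):
    """Map each document to the objects that must be written with it.

    An object is skipped if it is unchanged and already saved, or if an
    earlier document in this run has already claimed it.
    """
    planned, plan = set(), {}
    for root in roots:
        group = []
        for obj in backup_group(root, links):
            if obj in planned:
                continue
            if obj in changed or obj not in already_saved:
                group.append(obj)
                planned.add(obj)
        plan[root] = group
    return plan
```

Writing each group to the same medium keeps linked objects from being scattered, and the skip rule prevents a shared font or application object from being copied again for every document that points to it.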

Windows NT and NetWare 4 offer locationless network services today, Lobdell says, and "tomorrow, we'll have locationless information access." However, he adds, making storage a more readily available resource "puts more stress on query mechanisms and tools to help you find it." Changes that occur in the user environment to provide access to the vast wealth of interconnected global data sources will have to be accommodated in storage management systems.

Another big change could come on a more physical level: the emergence of specialized storage hardware that assumes some of the role now played by software management schemes. "Long term, I see reduced importance for file servers, the explosion of application servers and storage servers, and the rise of dedicated backup servers," says Mike Peterson of Peripheral Strategies.

One interesting development in this area is coming from Vinca. The company is evangelizing an architecture for Storage Access Networks, or SANs, which consist of intelligent storage devices connected together on their own distributed network. The idea behind a SAN is to move responsibility for file access and storage management off the file server, freeing it up to handle user requests. The result could be higher storage bandwidth, improved reliability and manageability, and greater flexibility of configuration.

The combination of distributed, object-oriented file systems and intelligent storage servers will dramatically alter the landscape of storage management, but the net result for users will be greater ease of access to information and greater data reliability, with less of a need for any human intervention.

Whether you use a GUI or some other scheme for finding and manipulating files, and whether you back up via disk mirroring, RAID, optical jukeboxes, or tape libraries, the ultimate goal of storage management is to ensure that data is there when it's required. By implementing sophisticated backup, HSM, physical resource management, and access control, you are actually making data more available than if it goes unmanaged and has the potential to be lost. And implementing centralized storage management does not contradict a movement toward distributed computing; in the words of Stan Corker, an IDC analyst based in San Diego, "Virtually, the data is distributed to clients; physically, it's grouped at servers; and logically, it's centralized." The network makes these distinctions invisible to the user.


Trade-Off Between Performance and Cost Per Megabyte



In the classic storage pyramid, access speed is closely correlated to cost. The rarest and most expensive storage types, such as cache memory and flash RAM, cost orders of magnitude more than hard drives or removable media such as tapes and optical disks on a dollars-per-megabyte basis but also deliver data hundreds of times faster. Storage management optimizes the use of all media types.


                                Cost/megabyte
Medium                          (range)         Access time
Solid-state                     $60-100         less than 3 ms
RAID                            $2-10           9-20 ms
Hard drive                      $0.8-2          9-20 ms
Optical (single platter)        $1-4            50-100 ms
Optical (jukebox)               $0.4-1          15-30 seconds
Tape (single)                   $0.4-2          30 seconds to 3 minutes
Tape (autoloader)               $0.05-1         1-5 minutes
Tape (archived, off-site)       $0.05           Hours
Source: Dataquest
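To see how a hierarchy lowers the average cost of storage, a blended cost per megabyte can be computed as a weighted average. In the sketch below, the per-megabyte prices are single points picked from inside the Dataquest ranges above, and the split of data across tiers is entirely hypothetical; neither is a measured figure.

```python
def blended_cost_per_mb(tiers):
    """Weighted-average cost; tiers is a list of (share_of_data, cost_per_mb)."""
    total_share = sum(share for share, _ in tiers)
    return sum(share * cost for share, cost in tiers) / total_share


# Assumed layout: 20% of data on hard drives at $1.00/MB, 30% in an optical
# jukebox at $0.50/MB, and 50% in a tape autoloader at $0.10/MB.
hierarchy = [(0.20, 1.00), (0.30, 0.50), (0.50, 0.10)]

all_on_disk = blended_cost_per_mb([(1.0, 1.00)])
with_hsm = blended_cost_per_mb(hierarchy)
```

Under these assumptions the blended figure works out to about $0.40 per megabyte, against $1.00 when everything stays on hard drives.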




What to Look for in HSM



-- support for unlimited layers of media hierarchy
-- media independence
-- a rules engine that supports capacity and time thresholds,
   exceptions by file type, and forced migration
-- migration optimized to create minimal need for demigration
-- the ability to remigrate files quickly without manual intervention
-- support for data typing when it exists in operating systems
-- fast demigration


Illustration: Hierarchical Storage Layers

Hierarchical storage automatically shifts older or less frequently used files away from the fastest and most expensive storage media to other media, such as optical or tape, that are slower and less expensive. The physical location of the storage is unimportant: HSM can centrally manage media distributed throughout a network.

Andy Reinhardt is BYTE's West Coast bureau chief. You can reach him on the Internet or BIX at areinhardt@bix.com.

Copyright © 2005 CMP Media LLC