HSM saves costs, eases administration -- and beats document management systems.
Mike Hurwicz
A hierarchical storage management (HSM) system is like a robot housekeeper for network storage. It monitors hard drives for not-recently-used files and migrates them to a storage medium that has greater capacity and costs less per megabyte.
Typically, storage media are in this hierarchy:
- hard drive -- most expensive, fastest
- optical -- less expensive, slower
- tape -- least expensive, slowest
Robotic library systems manage both optical drives and tapes, so no manual intervention is necessary for access.
Save It Smart
Consider a typical word processing file. It starts out on a hard drive and remains there as long as someone actively works on it. After it has been idle for six months or a year (depending on how the administrator configures the HSM system), the HSM system migrates the file to an optical library.
A stub file or placeholder remains on the hard drive. To users browsing file directories, the stub file looks like the original file. If the file remains idle in the optical library long enough (again, the administrator defines how long that is), the HSM system migrates it once again -- to tape -- still leaving a stub file on the hard drive. (HSM systems can also write directly to tape.)
If you access the stub file, the HSM system transparently restores the original file to the hard drive. The only clue that anything unusual has happened is the extra time it takes. (Well-designed HSM systems display a message for long waits, so you don't think the computer has crashed.) Early HSM systems lacked automatic recall, and many people considered them only semifunctional because of this omission.
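The life cycle just described can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the `FileRecord` class, the function names, and the tape threshold are assumptions, and the 180-day optical threshold simply stands in for the "six months" an administrator might configure.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    name: str
    idle_days: int              # days since the file was last accessed
    tier: str = "hard drive"    # where the file's data currently lives
    stub_on_disk: bool = False  # True once only a placeholder remains on disk

def migrate(f, optical_after=180, tape_after=720):
    """Move an idle file down the hierarchy, leaving a stub on the hard drive.

    Both thresholds are administrator-defined; the defaults are placeholders.
    """
    if f.idle_days >= tape_after:
        f.tier = "tape"
    elif f.idle_days >= optical_after:
        f.tier = "optical"
    f.stub_on_disk = f.tier != "hard drive"
    return f

def access(f):
    """Opening the stub recalls the file to the hard drive and resets idle time."""
    f.tier = "hard drive"
    f.stub_on_disk = False
    f.idle_days = 0
    return f
```

Running `migrate` on a file idle for 200 days moves it to the optical tier and leaves a stub; calling `access` afterward brings it back to the hard drive.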
HSM Simplifies Administration
There are two ways to think about the benefits of an HSM system. One is to think in terms of reducing your average cost per megabyte of data stored. The second is to think in terms of automating routine data-migration tasks and thus saving on administrative effort.
The first perspective assumes that, without an HSM system, you would buy more hard drives as data accumulated. The second perspective assumes that you would migrate data to optical and tape storage manually. In the latter case, there would presumably be no stub files, and users -- or the administrator -- would have to keep track of the new location of the data, so they could retrieve it.
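The cost-per-megabyte view amounts to a weighted average across tiers. The sketch below shows the arithmetic only; the function name is an assumption, and the prices are placeholders chosen to show the ordering of the tiers, not market figures.

```python
def blended_cost_per_mb(tiers):
    """Average cost per megabyte over a list of (megabytes, cost_per_mb) pairs."""
    total_mb = sum(mb for mb, _ in tiers)
    total_cost = sum(mb * cost for mb, cost in tiers)
    return total_cost / total_mb

# Placeholder prices (hypothetical): disk > optical > tape.
all_on_disk = blended_cost_per_mb([(1000, 0.10)])
with_hsm = blended_cost_per_mb([(200, 0.10), (300, 0.05), (500, 0.01)])
```

With these made-up numbers, keeping only the active 20 percent of the data on disk lowers the blended figure; how much it matters in practice depends entirely on real prices.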
The dollars-per-megabyte argument has largely lost its punch, however, because of the plummeting cost of hard drive storage. It's true that the cost of optical and tape libraries has also come down. However, if you don't have any optical- or tape-library systems now, another hard drive will often be the easiest and most cost-effective fix to a shortage of disk space.
However, adding more disk space creates problems of its own. For instance, very large volumes take a long time to restore and mount after a crash. Backup becomes increasingly time-consuming. These kinds of considerations often lead IS professionals to delete old files that have been backed up. However, if users later need some of those files, retrieving them creates more work for the IS team.
But these arguments have not swayed huge numbers of organizations to buy HSM systems -- or even to use NetWare's free one. However, for some data-intensive applications and environments, an HSM system may be natural or inevitable. This is particularly true where many files must be conveniently accessible over an extended period, even though any individual file is accessed infrequently.
A perfect example would be an insurance company with digitized claim documents and photos. A typical claim is active briefly and then never seen again. A few claims may be referred to months or years later. Both the quantity of data and the access patterns would probably make an HSM system attractive.
Somewhere between 200 GB and a terabyte of data, you may hit a threshold where an HSM system becomes almost a necessity. A dedicated vertical application also makes an HSM system more practical. With a single dedicated application, it is easier to prevent software behavior that cancels the benefits of the HSM system by causing massive numbers of unneeded files to return to hard drives.
Typical bad behavior is a word processor searching for a file based on text in the file. As it searches each file, that file must return to the hard drive. A search may bring back thousands of files to find only the one you want.
Document Management or HSM?
You can avoid bringing back files unnecessarily by using a document management system that indexes all files. When you do a search, you access only the index, not the actual files. Only the required file actually returns to the hard drive. Also, the searches will go a lot faster.
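The difference between the two search styles can be shown with a toy example. This is a sketch under simplifying assumptions: files are modeled as a dictionary of name to text, the index is a plain word-to-files map, and all function names are invented for illustration.

```python
def scan_search(files, word):
    """Full-text scan: every file is read, so every migrated file is recalled."""
    recalled = list(files)  # each file must come back to disk to be read
    hits = [name for name, text in files.items() if word in text.split()]
    return hits, recalled

def build_index(files):
    """Map each word to the set of file names containing it (built once, up front)."""
    index = {}
    for name, text in files.items():
        for w in text.split():
            index.setdefault(w, set()).add(name)
    return index

def index_search(index, word):
    """Only the index is consulted; only the matching files need to be recalled."""
    hits = sorted(index.get(word, set()))
    return hits, list(hits)
```

For three files of which one contains the search word, `scan_search` recalls all three while `index_search` recalls one. Note that `build_index` itself reads every file, which is why re-indexing everything on a schedule undoes the benefit.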
Note, though, that some document management programs index all files regularly by default, in case something might have changed. While thoroughness is nice, it pretty much destroys the value of indexing as far as efficient storage is concerned.
Also, a document management system is likely to be a large, expensive project in any environment that might justify an HSM system. Again, it may be cheaper and easier just to leave all files on the hard drive until they are so old you can delete them permanently after backing them up.
Other programs may bring back even more files than a word processor and even require the actual file -- an index won't do. The classic example: a backup program that reads every file.
Use What Already Works
The most reliable solution to this problem is for the HSM system vendor to integrate backup, virus-scanning, and document management utilities. These utilities should recognize and ignore stub files. This eliminates your choice of separate backup, virus-scan, and document management products. Plus, nothing ensures that another program won't open every file on the disk -- and everywhere else.
One solution preserves choice, but only a few network-aware programs support it: using special programming calls that open normal files on the disk but fail when they encounter a stub file. Examples are Unix's sopen (stream open) and NetWare's FEsopen (file engines open). Such calls may also leave the last access date unaltered on stub files -- a plus, because compression and HSM systems often use this date to determine inactive files.
Few programs use such calls. Windows utilities, such as a file find done via My Computer in Windows 95, don't use them. Spreadsheets, word processors, and so on don't use them. However, some programs, such as network-aware virus scanners and backup programs, are increasingly likely to use them.
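The idea behind such calls, an open that fails rather than recalls, can be sketched as follows. This models the behavior only; it is not the signature of sopen or FEsopen, and the catalog structure, exception, and function names are all hypothetical. It also does not model the last-access date.

```python
class FileMigratedError(OSError):
    """Raised instead of recalling a file whose data has been migrated."""

def open_no_recall(catalog, name):
    """Return a file's contents if resident; fail fast if only a stub remains."""
    entry = catalog[name]
    if entry["stub"]:
        raise FileMigratedError(f"{name} is migrated; not recalling")
    return entry["data"]

def scan_volume(catalog):
    """A scanner built on the no-recall open: stubs are skipped, not demigrated."""
    scanned, skipped = [], []
    for name in catalog:
        try:
            open_no_recall(catalog, name)
            scanned.append(name)
        except FileMigratedError:
            skipped.append(name)
    return scanned, skipped
```

A virus scanner or backup program written this way touches only resident files, so a nightly pass does not pull the whole archive back onto disk.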
Unfortunately, compatibility with HSM systems is the last thing on the minds of Windows programmers. The result: HSM systems are best suited to highly structured environments with predictable client behavior. Such an environment may be possible within a loosely structured LAN. The HSM system may operate only on servers, directories, or file types off-limits to most programs.
OS Boosts for HSM
More general solutions will depend on OS support. The earliest example is the Data Migration Interface Group (DMIG) API, which the IEEE Storage Standards Working Group (http://www.arl.mil/IEEE/ssswg.html) has adopted. DMIG created the DMAPI specification (Data Migration APIs), now part of Posix P1244.
The user interface is up to the implementer, but most implementations allow several ways to open a file, one of which fails if the file has migrated. You can set this mode as the default before running programs such as virus scanners, with no need to modify the application.
NetWare Does HSM
NetWare already includes HSM: High Capacity Storage Service (HCSS). Although free, it's not widely used, perhaps because it does not limit demigration (except by modifying programs to use the FEsopen call).
Mature HSM systems from companies such as Cheyenne and Seagate are based on Real-Time Data Migration (RTDM) APIs in NetWare. Optional client-side software limits demigration or advises you when a demigration will take a while.
NetWare's next release, code-named Moab and expected in the third quarter, will contain Novell Storage Services (NSS), including an improved foundation for HSM. With NSS, file opens can include quality of service (QoS) parameters that determine whether a migrated file should be demigrated. With Novell's improved Client32 (and possibly some help from applications), customers will get three choices in a pop-up: 1) demigrate while I wait; 2) demigrate, but let this open request fail (I'll open the file later); or 3) cancel this request (don't demigrate).
Handling migrated data this way is not new. It happens now if you load the extra client software with Cheyenne's and Seagate's HSM products. The difference: You only have to load Novell's Client32.
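The three pop-up choices map naturally onto a small decision function. The sketch below is illustrative only: the enum, the dictionary keys, and the return convention are assumptions, not part of NSS or Client32.

```python
from enum import Enum

class RecallChoice(Enum):
    WAIT = 1     # demigrate while I wait
    DEFER = 2    # demigrate, but let this open fail; open the file later
    CANCEL = 3   # cancel this request; do not demigrate

def open_migrated(file_state, choice):
    """Apply the user's choice; return (open_succeeded, recall_started)."""
    if choice is RecallChoice.WAIT:
        file_state["migrated"] = False       # recall completes before returning
        return True, True
    if choice is RecallChoice.DEFER:
        file_state["recall_pending"] = True  # recall proceeds in the background
        return False, True
    return False, False                      # CANCEL: file stays migrated
```

Only the first choice blocks the user; the second lets the application move on while the file comes back, and the third leaves the file where it is.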
In addition, NSS will do away with some limitations of prior HSM systems (e.g., 16 million files per volume). Because mount times are only a few seconds, independent of the volume size, NSS may also eliminate the need for HSM for customers who implement it primarily to reduce volume sizes and mount times. With Novell's distributed file system (DFS), to be released later, the client can search remote replicas of a volume for one where a file is still on the hard drive.
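The replica search can be pictured as a simple lookup. This is a guess at the general shape, not Novell's design: the function name and the mapping of server names to resident files are hypothetical.

```python
def find_resident_replica(replicas, filename):
    """Return the first server whose replica still holds the file on disk.

    replicas: {server_name: set of file names resident on that server's hard drive}
    Returns None if every replica has migrated the file.
    """
    for server, resident_files in replicas.items():
        if filename in resident_files:
            return server
    return None
```

If the function returns a server name, the client can read the file from there without triggering a recall; if it returns `None`, a demigration is unavoidable.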
NT and HSM, Too
NT Server 5.0, expected this year, will be the first version with integrated HSM: Remote Storage Server (RSS), developed by Eastman Software. Eastman also makes Open/Stor and an advanced HSM system, code-named Phoenix, based on RSS. Open/Stor 2.1 for NT will contain a feature similar to the above pop-up.
Open/Stor offers two ways to prevent files from demigrating. Programmers can use special calls, similar to sopen and FEsopen, that do not demigrate files. Or the Open/Stor administrator can designate particular user IDs that cannot demigrate files. This prevents a virus scanner running under such a user name from demigrating files.
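The second approach, blocking demigration by user ID, needs no changes to the application. A minimal sketch of the policy check, with a hypothetical account name and function names that are not taken from Open/Stor:

```python
# Hypothetical service account under which the virus scanner runs.
NO_RECALL_USERS = {"virus_scanner"}

def may_demigrate(user_id):
    """Administrator policy: listed user IDs can never trigger a recall."""
    return user_id not in NO_RECALL_USERS

def open_as(user_id, is_stub):
    """Return 'opened', 'recalled', or 'denied' for an open request."""
    if not is_stub:
        return "opened"  # resident file: the policy does not apply
    return "recalled" if may_demigrate(user_id) else "denied"
```

An ordinary user opening a stub gets the file back; the scanner account opening the same stub is simply refused.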
With RSS, programs will have to use special calls to prevent demigration. Microsoft is now evangelizing the use of such calls more than it ever did.
Also, Microsoft has built new HSM-friendly facilities into NT 5.0. NT File System (NTFS) reparse points are "extensible file system building blocks providing additional directory and file functions." Reparse points are special files that hold metadata about another file. The reparse point will provide a standard format for a stub file that HSM vendors can hook into using filter drivers.
If a file open request fails because the file is a reparse point of type HSM, the OS can pass the request to a filter driver. The HSM application can then handle the request, perhaps giving you options. One option might demigrate the file and return the open to the OS with a valid file handle. If filter drivers don't handle the reparse point, the request simply fails.
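The hand-off from the OS to a filter driver can be modeled as a dispatch on the reparse point's type. This is a simulation of the flow described above, not the NT driver interface: the tag string, dictionary layout, and handler are all hypothetical.

```python
HSM_TAG = "HSM"  # hypothetical tag identifying an HSM reparse point

def open_file(entry, filter_drivers):
    """Open a file, passing reparse points to a registered filter driver.

    entry: dict with 'reparse_tag' (None for an ordinary file) and 'data'
    filter_drivers: {tag: handler(entry) -> file contents}
    """
    tag = entry.get("reparse_tag")
    if tag is None:
        return entry["data"]  # ordinary file: the open succeeds directly
    handler = filter_drivers.get(tag)
    if handler is None:
        raise OSError("reparse point not handled")  # no driver: request fails
    return handler(entry)

def hsm_recall(entry):
    """Stand-in HSM handler: demigrate the file and complete the open."""
    entry["data"] = "recalled contents"
    entry["reparse_tag"] = None
    return entry["data"]
```

With `hsm_recall` registered under `HSM_TAG`, opening a stub returns the recalled contents; with no driver registered, the same open raises an error.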
Today, both HSM and backup systems typically require a dedicated library system. Neither can share. NT Media Services (NTMS), which HighGround Systems is developing for NT 5.0, will allow applications to share tape and optical libraries. That's a big step forward.
Into the OS
Better HSM facilities in NT and NetWare will widen HSM's market significantly. The ultimate? To embed HSM so deeply in the OS that even LAN administrators could forget about it. "Novell is probably a year or two ahead of Microsoft in making all this a reality," says Daniel Blum, a principal with Rapport Communication (Silver Spring, MD).
However, integration with the OS is not the final step for HSM. "Ultimately, many users tell us that they want HSM integrated with applications," says Jeff Drescher, a product marketing manager with Eastman Software. Application integration will also mean new features such as migrating specific database tables. Now, if the database is a single large file, the HSM system must migrate either the whole file or nothing. Portions of the database not in use cannot be migrated.
In short, users want invisible HSM. "HSM as we know it may be a dinosaur," says Ron Anderson, manager of microcomputer network services at Syracuse University. Syracuse spent years looking at HSM systems for its NetWare-based LANs, only to conclude it was too costly and difficult. Anderson looks forward to HSM transparently embedded in the OS and applications. Until then, the university's strategy is more disk space.
Where to Find
Eastman Software
Billerica, MA
Phone: 800-229-2973
Phone: 508-967-8000
Internet: http://www.eastmansoftware.com
HighGround Systems
Boxborough, MA
Phone: 508-263-5588
Internet: http://www.highground.com
Microsoft
Redmond, WA
Phone: 800-426-9400
Phone: 425-882-8080
Internet: http://www.microsoft.com/ntserver/default.asp
Novell
Provo, UT
Phone: 800-453-1267
Internet: http://www.novell.com
| Flag name | What it means |
| FILE_FLAG_OPEN_REPARSE_POINT | When opening a file, inhibit any effects that may exist due to an associated reparse point. |
| FILE_ATTRIBUTE_REPARSE_POINT | The file has an associated reparse point. |
| FILE_SUPPORTS_REPARSE_POINTS | The volume supports reparse points. |
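FILE_ATTRIBUTE_REPARSE_POINT is a bit in a file's attribute mask, so a program can test for it before deciding whether to open a file. The sketch below uses the constant that Python's standard `stat` module provides under the same name; the two helper functions are illustrative names. The per-file attributes are reported only on Windows, so on other systems the path check simply returns False.

```python
import os
import stat

def has_reparse_point(attributes):
    """True if a Windows file-attribute bitmask has the reparse-point bit set."""
    return bool(attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT)

def is_reparse_point(path):
    """Check a real path without following links.

    st_file_attributes exists only on Windows; elsewhere this falls back to 0.
    """
    info = os.lstat(path)
    attributes = getattr(info, "st_file_attributes", 0)
    return has_reparse_point(attributes)
```

A scanner could call `is_reparse_point` on each file and skip those that return True, rather than opening them and risking a recall.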
Illustration: Australia's Commonwealth Department of Administrative Services' data backup/migration architecture uses disks and a tape library.
Illustration: Most HSM systems consist of these parts.
Illustration: NSS's improved foundation for HSM functionality includes a QoS parameter to show if the file should be demigrated.
Photo: "HSM as we know it may be a dinosaur." --Ron Anderson
Mike Hurwicz is a writer and consultant in Brooklyn, New York. You can contact him at mhurwicz@attmail.com.