Veritas provides flexible, secure data storage for Unix SVR4.2 systems
Tom Yager
When you save a file, you trust your system to put it in the right place. But what justifies that faith? PC users routinely depend on file systems that date back to CP/M. Easily disrupted, these data-layout schemes have spawned a whole subindustry for disk-repair and file-restoration utilities. But why should you have to pay extra for one of the most important and most basic functions of your system? An operating system's primary job is to store and retrieve disk-based data, and it should tackle that job with the determination of a celebrity attorney.
The best example I've seen of how a file system should work comes from a quiet company called Veritas. This vendor has created a well-rounded file-storage and management scheme built around vxfs (the Veritas file system). Designed for systems running System V Release 4.2 Unix, vxfs defines a flexible structure for file layout and provides a set of tools to manage that structure.
File System Woes
Despite Unix's maturity, most commercial implementations still use some variation of s5fs (the System V file system) or the BSD-derived ffs (Fast File System). Differences between the two have fueled operating-system wars for years. It's a battle ffs deserved to win. It accesses data much faster than s5fs, thanks to the replacement of linked-list free-space tables with randomly accessible bitmaps. ffs splits each logical drive into cylinder groups, distributing vital data structures across the drive's geography and keeping structures close to the data they manage. ffs offers users clear advantages, too, such as 255-character file names (vs. s5fs's 14-character limit) and symbolic links, which are pointers to files in other directories.
However, ffs isn't perfect. Let's say there's a power failure while your system is writing a file. Writing a file involves both writing the data blocks that contain the user-visible portion of the file and altering various file-system structures. The writing of new data may require changes to directories, free-space maps, inodes (which hold housekeeping data for the file), and resource counters. The system can't change all this data simultaneously. When the system needs to change several pieces of structural data, the premature interruption of that process--by, say, a power failure--corrupts the file system. Space is allocated in the free map, but no data is written. Or perhaps a new directory entry is created, but the inode for the directory hasn't been updated to reflect the directory's new size.
The traditional approach to fixing these interrupted data postings has been to run a utility called fsck, short for file system check. fsck walks through all the file system's data structures, locating trouble. It checks the inode and directory data, making sure that valid files are using all the space allocated to them. Except in cases of physical damage and unusual circumstances, such as sabotage, fsck can restore a file system to health, usually by discarding partially completed operations. However, on the multigigabyte drives common to modern systems, fsck takes a long time to run. The system remains unavailable until fsck finishes.
Another ffs shortcoming relates to management. Unix's file layout is superior to that of other operating systems, letting you mount a new drive at any location in the directory hierarchy. If your database outgrows the drive it shares with other applications, you can give it a drive of its own: Delete its previous directory, and replace it with a mount point (an empty directory) of the same name. Then attach the new, larger drive to that mount point. The change is transparent to users and to applications.
But what happens if that database outgrows the drive you allocated to it? That's a thorny problem, leaving most administrators with the unpleasant duty of backing up the old drive, replacing it with a still larger one, and running the database in the larger space. And what about drives with more than one file system? To keep errant (or irresponsible) users from chewing up all your disk space, you might put the users' home directories in a separate file system. As you add users to the system, however, you may need to make that file system larger. With ffs, you must restructure all the file systems on that drive, wiping out existing data, to change a file system's size.
A Transparent Solution
The Veritas file system started life with a dilemma: No matter how innovative its approach, it had to be transparent to be accepted. To manage this, Veritas capitalized on a feature of System V Release 4.2 known as the installable file system. This provides a means for extending the operating system's set of supported file systems. Once extended, the operating system--all the way down to the kernel--knows how to talk directly to the file system, and layout-sensitive storage-management tools (like mount and the fsck file-system check/repair tool) are infused with knowledge of the new layout. That's how vxfs gets in the door.
Once there, vxfs sets about solving the weaknesses of ffs and other Unix file-system layouts. Probably its best-known feature is its intent log. This is a circular buffer that holds a list of pending changes to the file-system structure. It adds a brief step: First you log it, then you do it. If the system loses power in the middle of changing a resource table, no problem. Instead of climbing through every structure in the entire system to sleuth out the missing link, Unix looks at the intent log. If a log entry is complete and valid, Unix replays it, painting the changes described in the log onto the file-system structures. If a log entry is incomplete or invalid, it is discarded.
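The log-then-do rule is easy to see in miniature. What follows is a toy sketch in Python, not Veritas' on-disk format: the record layout (a length and a CRC-32 checksum ahead of a JSON payload), the class and function names, and the structure keys are all assumptions made for illustration, and a plain list stands in for the circular buffer. It shows only the recovery decision described above: replay entries that are complete and valid, and discard the rest.

```python
import json
import zlib


class IntentLog:
    """Toy intent log. A plain list stands in for the on-disk circular buffer."""

    def __init__(self):
        self.records = []

    def append(self, changes, torn=False):
        """Log a group of structural changes before they are carried out.

        torn=True simulates power failing partway through the log write.
        """
        payload = json.dumps(changes, sort_keys=True)
        header = (len(payload), zlib.crc32(payload.encode()))
        if torn:
            payload = payload[: len(payload) // 2]  # only half reached the disk
        self.records.append((header, payload))


def replay(log, structures):
    """Apply complete, valid entries to the structures; discard the rest."""
    applied = 0
    for (length, checksum), payload in log.records:
        intact = len(payload) == length and zlib.crc32(payload.encode()) == checksum
        if not intact:
            continue  # incomplete or invalid entry: discard it
        structures.update(json.loads(payload))
        applied += 1
    return applied


# Hypothetical structure names, chosen only for the example.
log = IntentLog()
log.append({"free_map[17]": "allocated", "inode[42].size": 8192})
log.append({"dir[/home].size": 512}, torn=True)

structures = {"free_map[17]": "free", "inode[42].size": 0}
applied = replay(log, structures)
```

Because each entry here sets values outright, replaying the same log twice leaves the structures in the same state, so a second crash during recovery costs nothing but a second replay.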
The likelihood of lost data is somewhat diminished because the intent log absorbs data faster than the scattered data structures. But the biggest advantage of the intent log is increased system availability. Large systems doing failure recovery can spend several minutes or even hours slogging through every structure on every drive with fsck. The same system, using vxfs, need only spend a few seconds playing each file system's intent log. The system is back on-line in a flash.
Beyond intent logging, vxfs distinguishes itself by supporting spanning, mirroring, and striping. Veritas' Volume Manager extends the basic layout that vxfs provides. This lets you reach beyond the confines of physical drives in managing your storage and provides data-protection features.
To do its work, Volume Manager carves physical drives into subdisks, unmanaged blocks of disk sectors (see the figure ``Volume Manager''). To build a new file system, you first select one or more subdisks to contain the data. Volume Manager combines these into a plex and lets you initialize the file structure.
The subdisks you select for a file system can exist anywhere, even on different physical drives. Volume Manager spans subdisks transparently, building what appears to be a contiguous file space from storage scattered across two or more drives. What's more, Volume Manager lets you add subdisks to an existing file system without altering its data. You can shrink a file system by removing subdisks--all while the file system remains mounted and available.
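Spanning comes down to address arithmetic. The sketch below is an illustration in Python and not Volume Manager's actual data structures: the class names and block counts are invented, and disk03-01 is a hypothetical subdisk name in the style of the figure. It maps a block number within the volume to a subdisk and an offset, and shows why appending a subdisk leaves existing data where it was.

```python
from dataclasses import dataclass, field


@dataclass
class Subdisk:
    name: str    # e.g. "disk01-01"
    blocks: int  # size in blocks (illustrative figures only)


@dataclass
class SpannedVolume:
    subdisks: list = field(default_factory=list)

    def size(self):
        return sum(sd.blocks for sd in self.subdisks)

    def add(self, subdisk):
        # Growing appends address space at the end of the volume,
        # so every existing block keeps its old location.
        self.subdisks.append(subdisk)

    def locate(self, logical_block):
        """Map a volume block number to (subdisk name, block within subdisk)."""
        if logical_block < 0:
            raise IndexError("negative block number")
        remaining = logical_block
        for sd in self.subdisks:
            if remaining < sd.blocks:
                return sd.name, remaining
            remaining -= sd.blocks
        raise IndexError("block beyond end of volume")


payroll = SpannedVolume([Subdisk("disk01-01", 1000), Subdisk("disk02-03", 500)])
before = payroll.locate(1200)           # falls in the second subdisk
payroll.add(Subdisk("disk03-01", 2000)) # grow the volume
after = payroll.locate(1200)            # same place as before
new_space = payroll.locate(1500)        # first block of the added subdisk
```

Shrinking is the harder direction, since any data living on the departing subdisk has to be moved first; the sketch leaves that out.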
Volume Manager supports mirroring, the protection of data through duplication, the same way. You just set up two subdisks with one defined as the mirror for the other. If one disk in the mirrored set should fail, Volume Manager will sense the failure, report it, and continue running with the good drive. You can enable block-change logging to extend the fast recovery benefits of vxfs to mirrored volumes.
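As a rough picture of that behavior, here is a minimal Python sketch, assuming a mirror is nothing more than two block maps written in lockstep. The class, its method names, and the subdisk names are hypothetical; block-change logging and resynchronization of a replaced drive are not modeled.

```python
class Mirror:
    """Toy mirrored volume: every write goes to each healthy copy."""

    def __init__(self, names):
        self.copies = {name: {} for name in names}
        self.failed = set()

    def fail(self, name):
        # Stands in for Volume Manager sensing and reporting a dead drive.
        self.failed.add(name)

    def healthy(self):
        return [name for name in self.copies if name not in self.failed]

    def write(self, block_no, data):
        for name in self.healthy():
            self.copies[name][block_no] = data

    def read(self, block_no):
        for name in self.healthy():
            return self.copies[name][block_no]
        raise IOError("all copies of the mirror have failed")


mirror = Mirror(["disk01-01", "disk02-01"])
mirror.write(0, "payroll records")
mirror.fail("disk01-01")            # one drive in the mirrored set dies
survivor_data = mirror.read(0)      # served from the good drive
mirror.write(1, "new records")      # the volume keeps running
```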
Normally, when you combine multiple subdisks to create a file system, data is written to those subdisks in the order in which they were joined. Data is written to subdisk 1 until it is filled, then to subdisk 2, and so on. On striped file systems, the first block of data is written to block 1 of subdisk 1. The next block is written to block 1 of subdisk 2, and so on, with each subdisk filling ``from the top down'' in concert with the others. Each subdisk must be on a separate physical drive, and preferably, subdisks should be split equally among two or more disk controllers. The result is increased performance. At a minimum, seek time will be reduced relative to the number of subdisks applied to the volume. At best, with multiple disk controllers, the system will gain the ability to write to several subdisks of a striped volume simultaneously (or very nearly so).
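The round-robin placement just described reduces to a remainder and a quotient. This Python sketch assumes a stripe unit of exactly one block, as in the description above, and numbers blocks and subdisks from zero rather than one; the function name is made up for the example.

```python
def stripe_locate(logical_block, n_subdisks):
    """Map a volume block number to (subdisk index, block within subdisk).

    Assumes a one-block stripe unit and zero-based numbering.
    """
    if logical_block < 0 or n_subdisks < 1:
        raise ValueError("need a non-negative block and at least one subdisk")
    return logical_block % n_subdisks, logical_block // n_subdisks


# Where the first six blocks of a three-subdisk striped volume land.
layout = [stripe_locate(block, 3) for block in range(6)]
```

Consecutive blocks land on different subdisks, which is what lets separate drives and controllers work on one large transfer at the same time.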
One last vxfs facility worth mentioning is the snapshot. This creates a new, read-only file system that is a duplicate of an existing one. This is most often used to create on-line backups of vital data. vxfs doesn't copy an entire drive to create a snapshot. Instead, it creates a set of dummy file-system structures that point to the real file system. As the real file system changes, the snapshot is altered to ensure it retains an accurate image of the file system at the time the snapshot was taken. A deleted file, for example, is copied to the snapshot image before being deallocated from the real file system. Snapshots are a convenient and space-efficient way to protect yourself prior to making some potentially damaging change.
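This technique is generally called copy-on-write. The Python sketch below illustrates the idea at the level of numbered blocks and is not a description of vxfs internals: the class names, the block contents, and the use of dictionaries as block maps are all assumptions. The snapshot starts out empty and merely points at the live file system; old contents are copied into it only at the moment the live copy is about to change.

```python
class Snapshot:
    """Read-only view of a file system as it stood when the snapshot was taken."""

    def __init__(self, live):
        self.live = live  # a pointer to the real file system, not a copy
        self.saved = {}   # original contents of blocks changed since then

    def read(self, block_no):
        if block_no in self.saved:
            return self.saved[block_no]
        return self.live.blocks.get(block_no)  # unchanged: read through


class LiveFileSystem:
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def _preserve(self, block_no):
        # Copy the old contents into each snapshot once, before the first change.
        for snap in self.snapshots:
            if block_no not in snap.saved:
                snap.saved[block_no] = self.blocks.get(block_no)

    def write(self, block_no, data):
        self._preserve(block_no)
        self.blocks[block_no] = data

    def delete(self, block_no):
        self._preserve(block_no)
        self.blocks.pop(block_no, None)


fs = LiveFileSystem({0: "payroll-v1", 1: "ledger-v1", 2: "notes-v1"})
snap = fs.snapshot()
fs.write(0, "payroll-v2")  # old block 0 is copied to the snapshot first
fs.delete(1)               # deleted data is saved before being deallocated
```

The snapshot's cost grows with the amount of data changed after it was taken, not with the size of the file system, which is where the space efficiency comes from.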
How It Looks
Rounding out vxfs's versatility is its variety of front ends. You can manage vxfs volumes through command-line utilities, text-based menus, or graphical means. The commands that manage vxfs volumes are a superset of the standard Unix storage-management commands. The mount command, for example, has new options for managing the intent log and for forcing the zeroing out of newly allocated data blocks. Text menus ease administration for manual-fearing users, and X Window-based graphical tools give you point-and-click access to all the functions of the Volume Manager.
In a few years, it's likely that most operating systems, large and small, will incorporate some of the benefits present in vxfs. For now, Veritas' efforts stand as an impressive benchmark against which other file systems should be judged. If you're managing critical data, you've got to understand how your operating system stores that data. If the file system you're using doesn't measure up, replace it.
Volume Manager carves physical disks (disk01, disk02, disk03) into subdisks (disk01-01, disk02-03, etc.). These subdisks can then be combined into a volume that spans multiple physical drives (payroll). You can add or remove subdisks without altering existing data, and the file system remains mounted and available.
Tom Yager writes about Unix and other subjects from his home in north Texas. You can reach him through the Internet at tyager@maxx.lonestar.org or on BIX c/o ``editors.''