Finding what you want on the tangled Web could be much easier with this new, powerful data architecture
Udo Flohr
Hypertext and hyperlinks sounded like magic when Ted Nelson proposed them in his 1974 book Computer Lib. When the World Wide Web brought real hyperlinking to the Internet in 1991, it seemed to be fulfilling that promise of near-universal access to disparate data.
But then reality hit. An avalanche of servers, documents, and hyperlinks, compounded by the exponential growth in Web usage, has all but buried its usefulness for real work by any but the most determined. And the work needed to maintain a thriving Web site can become a nightmare, too. But that's nothing compared to the problems of organizing massive amounts of unstructured data on the Web. This calls for a new approach.
The answer may be Hyper-G, a second-generation hypermedia information system that tries to combine the advantages of the Web, WAIS (Wide Area Information Servers), and Gopher while minimizing their disadvantages. Hyper-G offers a number of direct advantages: a consistent search interface over a number of servers and services; the ability to know where you are in cyberspace; the power to not only access but contribute information; and the assurance of referential integrity (no dangling hyperlinks) and reliable links. Existing Web browsers can be clients of Hyper-G servers (they just won't get all the features), and Hyper-G browsers can be full-featured clients of normal Web servers.
Hyper-G was conceived at the Graz University of Technology, in Austria, by a team headed by Hermann Maurer and Frank Kappe; most development is still being done there. Some organizations with large Web servers are switching to Hyper-G, the best-known being the European Space Agency (ESA).
A New Way to Manage Web Documents
Hyper-G represents an advance over the Web as we've known it because it provides real hypermedia. It supports tools for structuring, maintaining, and serving heterogeneous multimedia data. Hyper-G guarantees automatic hyperlink consistency, and it supports hyperlinks among multimedia documents, full-text retrieval, a Unix-like security system, and client gateways to Gopher and Web browsers such as Netscape, Mosaic, and MacWeb. Similarly, it includes seamless access to popular Internet server technologies, such as WAIS and Gopher.
Hyper-G clients access Hyper-G servers across the Internet, allowing users to view and to manipulate information in multiple ways. Advanced navigation tools help keep surfers from getting lost in hyperspace.
Links to More Data Types
On the Web, links are restricted to anchors in text and, to a lesser extent, in graphics. Hyper-G, however, supports anchors in many data types: graphics, sounds, 3-D objects, PostScript documents, and video clips. Links in Hyper-G are bidirectional. Contrary to normal Web practice, links aren't stored in documents; they're stored in separate databases. This means links can be attached to any type of document, such as an MPEG file, and the document format doesn't have to know about the link or how it operates. On the Web, by contrast, adding a link from one document to another means changing the source document itself. One more advantage is that links can be attached to read-only documents -- on CD-ROM, for example. Here are some other significant features:
-- Support for multilanguage documents allows users to choose the language in which documents are presented.
-- Information Landscape offers an interactive, 3-D representation of the database structure. Users can "fly" over the information hierarchy, represented as a virtual landscape. (See the screen.) The color and height of specific landmarks, for example, represent document type and size. Two-dimensional maps are also standard. Any changes made to documents and databases are immediately reflected in both representations.
-- Documents have attributes -- for example, author, keywords, and creation date -- that can be used in searches.
-- An underlying object-oriented database ensures data consistency and integrity.
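The idea of keeping links in a separate database, rather than inside documents, can be pictured with a small sketch. It is not Hyper-G's implementation: the class `LinkStore` and its methods are hypothetical names chosen for illustration, and document identifiers are plain strings. It shows how links can be looked up in both directions, attached to documents that are never modified, and cleaned up when a document is removed.

```python
from collections import defaultdict


class LinkStore:
    """Toy link database: links live outside the documents they connect.

    Hypothetical illustration only; not Hyper-G's actual data structures.
    """

    def __init__(self):
        self._outgoing = defaultdict(set)  # source id -> set of target ids
        self._incoming = defaultdict(set)  # target id -> set of source ids

    def add_link(self, source, target):
        # The documents themselves are never touched, so a source can be
        # read-only (a file on CD-ROM) or an opaque format (an MPEG clip).
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, doc):
        return sorted(self._outgoing.get(doc, ()))

    def links_to(self, doc):
        # Bidirectional lookup: which documents point at this one?
        return sorted(self._incoming.get(doc, ()))

    def remove_document(self, doc):
        # Drop every link that starts or ends at doc, so that no
        # dangling links remain once the document is gone.
        for target in self._outgoing.pop(doc, set()):
            self._incoming[target].discard(doc)
        for source in self._incoming.pop(doc, set()):
            self._outgoing[source].discard(doc)


store = LinkStore()
store.add_link("intro.txt", "launch.mpg")
store.add_link("cdrom/manual.ps", "launch.mpg")
print(store.links_to("launch.mpg"))   # documents that link to the video
store.remove_document("launch.mpg")
print(store.links_from("intro.txt"))  # the link disappeared with its target
```

Because both directions are indexed, finding every link that points at a document is a single lookup, which is what makes this kind of cleanup cheap.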
You can appreciate some of Hyper-G's features only if you use a generic Hyper-G browser. Currently, two are available: Amadeus for Microsoft Windows and Harmony for the X Window System (see the screens). A client application for the Macintosh will be available soon. Generic clients are not really meant to compete with Web clients; besides the advanced navigation features, the main reason for using a generic client is authoring capability, so you can modify documents.
Like a Web browser, a Hyper-G client contains a component that communicates with the server plus a number of internal viewers for various document types. In Harmony, these include text, image formats, video, audio, and PostScript. It can also handle highly complex 3-D scenes and models specified in a description format such as Virtual Reality Modeling Language (VRML). External viewers can replace the native viewers.
A central component of Harmony's navigation aids is the session manager. It provides location feedback at all times, regardless of whether you reached an object by following a hyperlink, through a search, or by clicking on the local map.
For video, Hyper-G allows the definition of a link anchor that follows an object of interest in the video. In Harmony, these may be activated both during playback and when the video is paused.
Searches in the native clients, as well as from a Web or Gopher client, provide a full range of query refinement. You can do Boolean or fuzzy searches, as well as searches by attribute or content, and you can specify the scope. Results are provided as a list, ranked by the server's estimate of each document's relevance.
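As a rough illustration of a Boolean keyword search restricted to a scope and returned in ranked order, here is a minimal sketch. Everything in it is an assumption made for the example: the `Doc` record, the path-like collection names, and especially the scoring, which simply counts term occurrences and stands in for whatever relevance estimate a real server computes.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    collection: str                      # path-like name, e.g. "esa/news"
    keywords: set = field(default_factory=set)
    text: str = ""


def in_scope(collection, scope):
    # An empty scope means "search everywhere".
    return not scope or collection == scope or collection.startswith(scope + "/")


def search(docs, all_of=(), none_of=(), scope=""):
    """Boolean keyword search (AND / NOT) limited to one collection subtree."""
    hits = []
    for doc in docs:
        if not in_scope(doc.collection, scope):
            continue
        if not all(k in doc.keywords for k in all_of):
            continue
        if any(k in doc.keywords for k in none_of):
            continue
        # Naive relevance: how often the required terms occur in the text.
        score = sum(doc.text.lower().count(k.lower()) for k in all_of)
        hits.append((score, doc.title))
    # Highest score first; ties are broken alphabetically by title.
    return [title for score, title in sorted(hits, key=lambda h: (-h[0], h[1]))]


library = [
    Doc("Launch report", "esa/news", {"launch"}, "Launch day. The launch went well."),
    Doc("Launch draft", "esa/news", {"launch", "draft"}, "Notes on the launch."),
    Doc("Campus guide", "graz/info", {"campus"}, "Where to find things."),
]
print(search(library, all_of=["launch"], none_of=["draft"], scope="esa"))
```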
Document Structure and Server Interaction
When we talk about Hyper-G's architecture, we really have to consider two separate architectures: the structure of the documents, and the way in which servers interact with one another. These are related, of course, because data can be distributed across multiple servers.
Hyper-G structures data hierarchically. The basic item of a Hyper-G database is a document cluster rather than a single document. (See the figure "The Hyper-G Data Model.") This simplifies the implementation of such features as multiple languages or multiple graphic representations (a picture stored in different resolutions, for example).
Document clusters in Hyper-G are combined into collections. A collection can be part of one or more parent collections. This provides a big advantage: You can insert a document into a collection without having to first define the links. This is impossible in the Web, where a document with no link is inaccessible; in Hyper-G, it's simply part of the collection structure.
Collections (and document clusters) can have attributes, which are searchable. Collections can span multiple Hyper-G servers, providing a unified view of distributed resources. All servers worldwide are members of a virtual "root collection" called Hyper Root.
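The cluster-and-collection model described in the last few paragraphs can be sketched in a few lines. The classes below are hypothetical and far simpler than a real server's object database; they only illustrate that a cluster groups several representations of one document, that a collection may have more than one parent, and that a document is reachable through the collection structure even when no hyperlink points to it.

```python
class Cluster:
    """A document cluster: several representations of one document."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # language code -> document content

    def in_language(self, lang, fallback="en"):
        # Serve the requested language if present, otherwise the fallback.
        return self.versions.get(lang, self.versions.get(fallback))


class Collection:
    def __init__(self, name):
        self.name = name
        self.parents = []   # a collection may belong to several parents
        self.members = []   # sub-collections and clusters

    def add(self, member):
        self.members.append(member)
        if isinstance(member, Collection):
            member.parents.append(self)

    def clusters(self):
        """Yield every cluster reachable from this collection, once each."""
        seen = set()
        stack = [self]
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue  # shared sub-collections are visited only once
            seen.add(id(node))
            if isinstance(node, Collection):
                stack.extend(node.members)
            else:
                yield node


org = Collection("organization")
physics, library = Collection("physics"), Collection("library")
reports = Collection("reports")          # shared by two parents
org.add(physics)
org.add(library)
physics.add(reports)
library.add(reports)

annual = Cluster("annual-report")
annual.versions["en"] = "Annual report"
annual.versions["de"] = "Jahresbericht"
reports.add(annual)                       # no hyperlinks were defined

print([c.name for c in org.clusters()])
print(annual.in_language("de"))
```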
A Hyper-G server keeps track of object attributes, arranges collections, and connects clients with the link database. It also encompasses three separate server components: the full-text server indexes text documents, the document server manages documents, and the link server stores hyperlinks and ensures consistent link references. (See the figure "The Client/Server Architecture of Hyper-G.")
Hyper-G uses an efficient, connection-oriented protocol. Unlike Web or Gopher clients, which usually talk to many different servers during a session, a Hyper-G client connects to a single server. If documents from a remote server are needed, the local server fetches them and passes them along to the client. This approach has advantages: Users have to identify themselves to only one server, where accounts and access rights are maintained; the Hyper-G server, not the client, handles external protocols; and remote information can be cached in the local server. Users of commercial on-line services, such as America Online, get some of these benefits when they access the Web through their service, but only because the service has chosen to work that way. Hyper-G is completely decentralized and does its job regardless of which service provider the user has.
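The single-server arrangement, in which the local server fetches remote documents on the client's behalf and keeps a copy, can be sketched as follows. This is only a model of the idea, not the Hyper-G protocol: `LocalServer`, the `"local"` host marker, and the `fetch_remote` callable are all invented for the example, and a real cache would also need expiry and invalidation.

```python
class LocalServer:
    """Toy model of a server that fetches remote documents for its clients."""

    def __init__(self, local_docs, fetch_remote):
        self.local_docs = dict(local_docs)
        self.fetch_remote = fetch_remote  # callable: (host, doc_id) -> content
        self.cache = {}

    def get(self, host, doc_id):
        if host == "local":
            return self.local_docs[doc_id]
        key = (host, doc_id)
        if key not in self.cache:
            # Only the server talks to the remote host; the client never does.
            self.cache[key] = self.fetch_remote(host, doc_id)
        return self.cache[key]


requests = []


def fake_fetch(host, doc_id):
    # Stand-in for a network request, recording each call it receives.
    requests.append((host, doc_id))
    return "contents of %s from %s" % (doc_id, host)


server = LocalServer({"welcome": "Hello"}, fake_fetch)
server.get("remote.example", "report")
server.get("remote.example", "report")   # answered from the cache
print(requests)
```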
A server/server protocol ensures consistency across server boundaries in a distributed Hyper-G database, and a client/client protocol allows local Hyper-G browsers to communicate with one another via the server.
Hyper-G servers can store pointers to remote objects on Gopher and Web servers. Documents from those worlds are transformed into Hyper-G representations -- Gopher menus become collections. Similar gateways to WAIS and FTP will be added soon.
When Web clients access Hyper-G servers, specified levels of the collection hierarchy and documents are converted into Hypertext Markup Language (HTML) documents on the fly, complete with links. Most other Hyper-G functions, such as identification, language selection, and searching, are also made available to the Web user. Advanced features, such as 3-D navigation or modifying documents from within the browser, do not map to the Web; they require a Hyper-G client.
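To make the on-the-fly conversion concrete, here is a minimal sketch of rendering one level of a collection as an HTML page of links. The function name, the `(name, kind)` member format, and the URL layout are assumptions made for this example; the article does not describe how the actual gateway builds its pages.

```python
from html import escape
from urllib.parse import quote


def collection_to_html(title, members):
    """Render one level of a collection as an HTML page of links.

    members is a list of (name, kind) pairs, where kind is either
    "collection" or "document". The URL scheme is invented here.
    """
    items = []
    for name, kind in members:
        href = "/" + kind + "/" + quote(name)
        items.append('<li><a href="%s">%s</a></li>' % (href, escape(name)))
    return (
        "<html><head><title>%s</title></head><body>\n"
        "<h1>%s</h1>\n<ul>\n%s\n</ul>\n</body></html>"
        % (escape(title), escape(title), "\n".join(items))
    )


page = collection_to_html(
    "Physics Department",
    [("R&D news", "collection"), ("Annual report", "document")],
)
print(page)
```

Names are percent-encoded in the URL and HTML-escaped in the visible text, so a member called "R&D news" produces a valid link.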
Security Measures
Hyper-G has a sophisticated authorization mechanism. It specifies for each user the rights to read, link, modify, and annotate documents, and it supports anonymous users ("guests"). An administrator can assign access rights on a per-document or a per-collection basis. Each user has a home collection for storing pointers to resources and personal documents.
Hyper-G uses a hierarchical access scheme. Authors may grant individuals or groups the right to read, write, link, or delete documents. Some Web browsers allow users to make annotations, but these are implemented in the browser software and stored locally. In Hyper-G, you can set up authorization classes for annotations, permitting private, group, or public annotations to the primary document.
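A hierarchical scheme of this kind can be sketched as a table of grants that is checked on the object itself and then on each enclosing collection. The sketch makes several assumptions that the article does not confirm: rights granted on a collection are inherited by everything inside it, each object has a single parent for access purposes, and rights granted to the `"guest"` principal apply to every user. The class name and right names are likewise illustrative.

```python
RIGHTS = {"read", "link", "modify", "annotate"}


class AccessTable:
    def __init__(self, parent_of):
        self.parent_of = parent_of   # object name -> parent collection name
        self.grants = {}             # (object, principal) -> set of rights

    def grant(self, obj, principal, *rights):
        unknown = set(rights) - RIGHTS
        if unknown:
            raise ValueError("unknown rights: %s" % sorted(unknown))
        self.grants.setdefault((obj, principal), set()).update(rights)

    def allowed(self, user, groups, obj, right):
        # Assumption: anything granted to "guest" applies to everyone.
        principals = {user, "guest", *groups}
        node = obj
        while node is not None:
            # Check the object first, then walk up through its collections.
            for principal in principals:
                if right in self.grants.get((node, principal), ()):
                    return True
            node = self.parent_of.get(node)
        return False


table = AccessTable({"report.txt": "physics", "physics": "organization"})
table.grant("organization", "guest", "read")
table.grant("physics", "physics-staff", "modify", "annotate")

print(table.allowed("anna", ["physics-staff"], "report.txt", "modify"))
print(table.allowed("visitor", [], "report.txt", "modify"))
```

Granting rights once on a department's collection, rather than on every document, is what keeps modification rights with the department in the migration scenario described later.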
No secure payment mechanism -- for transmitting credit card information, for example -- is available within Hyper-G yet, but the European Union is about to start a project that will bring high security standards to Hyper-G. This is a prerequisite for commercial applications.
Compatibility and Conversion
One of the really big questions about a project like Hyper-G is this: Can the Web as we know it, whatever its shortcomings, be successfully challenged by a newcomer? Compatibility is crucial, and Hyper-G's developers have gone to great lengths to ensure compatibility between Hyper-G and the Web. In fact, you might never notice that your Web browser is accessing a Hyper-G server rather than a Web site.
Converting an existing Web system with a number of servers into a Hyper-G system is fairly straightforward. For example, an organization that maintains five different Web servers, each belonging to a different department, could convert them into five Hyper-G collections, which in turn would belong to a single collection for the organization as a whole. Using Hyper-G's authorization structure, modification rights would still remain with the individual departments, but users could now search for a particular piece of information across the database as a whole. No modifications to the original HTML documents would be required. People could still use their Web browsers to access the system, but they might eventually want to switch to Hyper-G client software to take advantage of Hyper-G's additional features.
Hyper-G's developers have used their experience from the Web and other large-scale networked multimedia systems, incorporating into the basic Hyper-G all those features that have been recognized as indispensable but cannot be easily implemented on the Web. Hyper-G provides a uniform and controlled environment; while similar features might be implemented on top of the Web using external applications, this approach would eventually lead to differences between sites.
High Performance, Low Overhead
Not much hard data on Hyper-G's performance is available yet, but experience suggests that database size is almost irrelevant to the speed of searches, since pre-indexing is used. The campus server at Graz manages about 85,000 documents and 130,000 sessions per month, with an average of 300 simultaneous users. This server is a rather ordinary SPARCstation 10/40 with 64 MB of RAM. The server runs at about the same speed on a Linux PC with a 100-MHz Pentium processor and 32 MB of RAM.
Until this fall, native Hyper-G documents had to be encoded in Hyper-G Text Format (HTF). Now you can use version 3.0 of HTML, the Web's native formatting language. HTML will probably supersede HTF. Overhead for non-native clients (i.e., Web browsers) doesn't seem significant now and will become negligible as HTML 3.0 comes into use. It will also soon be possible to annotate documents from within a Web client.
It takes only a few hours to set up a Hyper-G server. The standard distribution package includes a semiautomatic installation procedure, as well as tools for inserting, modifying, and deleting objects in the database. These are most often used in scripts (e.g., using the Perl language) for mass insertion of data. One utility, called hifimport, can be used to create stand-alone versions of a Hyper-G Interchange Format (HIF) file; this could, for example, be used for CD-ROM production. Another utility, called haradmin, helps maintain the user database.
The Future
The Gopher gateway (which makes Hyper-G accessible from Gopher clients such as GopherVR) will be enhanced to offer Gopher server administrators a migration path to Hyper-G. Also, there's a tool to import a Gopher server's data to a Hyper-G server. The University of Minnesota has embraced Hyper-G for the next generation of its Gopher information system and is working on a native Hyper-G client for the Macintosh.
Hyper-G isn't perfect. One of the problems seen so far is that clients like Harmony or Amadeus do not handle defective (or syntactically incorrect) HTML documents very well. In these cases, there can be problems getting a document up on the screen, and people would have more luck using a normal Web client.
Hyper-G is a stable, powerful, and, above all, available alternative to the World Wide Web that offers something no other Web-based technology does. It can organize the mass of unstructured data and unmanageable hyperlinks. Whether Hyper-G has a chance to make an impact in the decentralized and frankly anarchic environment of the Internet remains to be seen. But it's also possible that it could supersede the Web and we users won't even notice.
Figure: The Hyper-G Data Model

In Hyper-G, documents can stand alone but are most often considered as document clusters, which can group different representations or versions of the same document. Clusters are themselves organized in collections managed by the server. Searches can locate and access documents in the same collection, in parent collections, and in unconnected collections.
Figure: The Client/Server Architecture of Hyper-G

Hyper-G's efficiency stems in part from its division of labor among servers. The Hyper-G server keeps track of object attributes, arranges collections, and connects clients with the link database. The full-text server indexes text documents, the document server manages documents, and the link server stores hyperlinks and ensures consistent link references. Clients and outside services, such as the Web and WAIS, see only the single Hyper-G server.

Harmony, the client for Unix/X Window systems, shows off Hyper-G's impressive navigation features.
In the multiwindow screen shown on the left (a), the window at bottom right shows a search for "what's new," with a list of found objects of different types. Top right shows an object from that list, a text with anchors. Top left is the Harmony session manager, which pinpoints the position of the found object relative to the collection hierarchy. Behind it is the local map; it shows the link structure around the current document, which is highlighted in all views.
In the multiwindow screen shown on the right (b), we can see that with its native clients, Hyper-G supports an even greater variety of document types than Mosaic does. Here we can see the different types: a film (bottom right window), a PostScript document (bottom left), and a 3-D object in Virtual Reality Modeling Language (top right). Note the hypermedia links in the PostScript and 3-D objects. Links in films, images, and sounds are also possible (but not shown here).

Caution: low-flying searches. Harmony has 3-D navigation features, which the inventors of Hyper-G call Information Visualization.
The Information Landscape is at bottom left. The low blocks in the front are collections; behind them are the subcollections, with their documents on top of the corresponding block. Color indicates document type; height indicates size. Above it is a 2-D overview.
At top right is the selected text, while bottom right shows video with a clickable hypermedia anchor. The Information Landscape can also be enhanced with textures and patterns.
Udo Flohr is a science and technology journalist based in Hannover, Germany. He can be reached on the Internet by sending E-mail to flohr@dfn.de.