
The BRS Text-Retrieval System


May 1995 / Solutions Focus / The British Library's Catalog Is On-Line / The BRS Text-Retrieval System

BRS/Search is a full-text retrieval system from Dataware Technologies (Cambridge, MA). It runs on numerous platforms, from Unix, MS-DOS, Windows, and Windows NT all the way up to Cray Y-MP supercomputers and IBM mainframes. Under development for 12 years, BRS/Search has 2000 installations worldwide, including large corporations such as Boeing and government institutions such as the U.S. Department of Defense (DoD).

The "full" in full-text retrieval refers to the fact that BRS/Search indexes every significant word of your documents. Thus, the full text is available for you to search on rather than just a few keywords. A user-created list of undesirable common words (e.g., "and") defines what is considered significant. BRS/Search is also a free-text system in that it does not assume any fixed record structure, though it normally treats documents as being divisible into variable-length paragraphs, sentences, and words (the criteria for recognizing these divisions are also user-definable via a form file).
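To make the idea of "significant words" concrete, here is a minimal Python sketch of splitting free text into paragraphs, sentences, and words while skipping a stop-word list. It is an illustration only, not BRS/Search's own code: the `STOP_WORDS` contents, the `significant_words` function, and the splitting rules (blank line ends a paragraph, terminal punctuation ends a sentence) are all assumptions standing in for the user-defined list and form file.

```python
import re

# Hypothetical stop-word list; in BRS/Search this list is supplied by the user.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

def significant_words(text, stop_words=STOP_WORDS):
    """Yield (paragraph_no, sentence_no, word_no, word) for each significant word.

    Assumed division rules: a blank line ends a paragraph, and '.', '!' or '?'
    followed by whitespace ends a sentence.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_no, paragraph in enumerate(paragraphs):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        for s_no, sentence in enumerate(sentences):
            words = re.findall(r"[A-Za-z0-9']+", sentence)
            for w_no, word in enumerate(words):
                word = word.lower()
                if word not in stop_words:
                    # Stop words are skipped but still count toward word_no,
                    # so recorded positions match the original text.
                    yield (p_no, s_no, w_no, word)
```

Keeping the original word numbers even when stop words are dropped is a design choice of this sketch; it lets later proximity comparisons reflect true distances in the text.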

BRS/Search works by recording the location of every significant word in a document collection. When the British Library loads a batch of catalog entries into the system, BRS/Search creates a sorted dictionary containing every unique word it encounters, and from this it creates an inverted file that consists solely of pointers to the locations of each occurrence of every word in the dictionary. (There is actually a third level of indirection, because the inverted file points to a table of byte offsets of separate documents within a single compressed text file.)
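The three structures described above (sorted dictionary, inverted file of pointers, and a table of document byte offsets) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the BRS/Search format: the function names, the stop-word set, the whitespace tokenizer, and the uncompressed text blob are all hypothetical.

```python
from collections import defaultdict

def build_index(documents, stop_words=frozenset({"a", "and", "of", "the"})):
    """Return (dictionary, inverted) for a list of document strings.

    dictionary: sorted list of every unique significant word.
    inverted:   word -> list of (doc_id, word_position) pointers.
    """
    inverted = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for position, word in enumerate(text.lower().split()):
            word = word.strip(".,;:!?\"'()")
            if word and word not in stop_words:
                inverted[word].append((doc_id, position))
    return sorted(inverted), dict(inverted)

def build_offset_table(documents):
    """Concatenate documents into one byte string and record where each starts.

    The real system compresses the text file; this sketch leaves it
    uncompressed to keep the indirection easy to see.
    """
    blob = bytearray()
    offsets = []
    for text in documents:
        offsets.append(len(blob))
        blob += text.encode("utf-8") + b"\n"
    return bytes(blob), offsets

docs = ["The history of London", "London maps and atlases"]
dictionary, inverted = build_index(docs)
blob, offsets = build_offset_table(docs)
# inverted["london"] holds (doc_id, position) pairs; offsets[doc_id] then
# locates that document inside the single text blob.
```

The inverted file stores only document numbers and positions; the offset table is consulted only when the text itself has to be fetched for display.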

BRS/Search uses proprietary compression techniques to squeeze the inverted file as well as the text itself, so the whole indexed database is often only 20 percent to 50 percent larger than the uncompressed text (many full-text systems more than double the data volume). The use of an inverted file makes retrieval by multiple keywords and proximity searches fast, because the software needs only to compare their occurrence lists in the inverted file without searching the text file at all.
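The point that multi-keyword and proximity queries need only the occurrence lists can be shown with a short Python sketch. It assumes an inverted file shaped as a mapping from word to a list of `(doc_id, word_position)` pairs; that shape, the sample data, and the function names `search_all` and `search_near` are illustrative assumptions rather than the BRS/Search interface.

```python
def search_all(inverted, words):
    """Return sorted ids of documents containing every word (Boolean AND)."""
    doc_sets = [{doc_id for doc_id, _ in inverted.get(w, [])} for w in words]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

def search_near(inverted, first, second, max_distance):
    """Return sorted ids of documents where the two words occur within
    max_distance word positions of each other."""
    second_positions = {}
    for doc_id, pos in inverted.get(second, []):
        second_positions.setdefault(doc_id, []).append(pos)
    hits = set()
    for doc_id, pos in inverted.get(first, []):
        others = second_positions.get(doc_id, [])
        if any(abs(pos - other) <= max_distance for other in others):
            hits.add(doc_id)
    return sorted(hits)

# Hypothetical occurrence lists for three documents (ids 0, 1, 2).
sample_index = {
    "history": [(0, 1), (2, 0)],
    "london": [(0, 3), (1, 0)],
    "maps": [(1, 1), (2, 5)],
}
both = search_all(sample_index, ["london", "maps"])      # [1]
adjacent = search_near(sample_index, "london", "maps", 1)  # [1]
```

Neither function touches document text; both work entirely on the pointer lists, which is why this style of query stays fast as the text collection grows.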

Two features of BRS/Search were important for the OPAC (On-line Public Access Catalog) application. First, it makes all its functionality available to outside software through a procedure-call interface. This fit perfectly with OPAC's RPC-based (remote procedure call) communications method. Second, the software allows databases to reside across multiple disk volumes.

