BRS/Search is a full-text retrieval system from Dataware Technologies (Cambridge, MA). It's available for numerous OSes, from Unix, MS-DOS, Windows, and NT, all the way up to Cray YMP supercomputers and IBM mainframes. Under development for 12 years, BRS/Search has 2000 installations worldwide, including large corporations such as Boeing and government institutions such as the U.S. DoD (Department of Defense).
The "full" in full-text retrieval refers to the fact that BRS/Search indexes every significant word of your documents. Thus, the full text is available for you to search on rather than just a few keywords. A user-created list of undesirable common words (e.g., "and") defines what is considered significant. BRS/Search is also a free-text system in that it does not assume any fixed record structure, though it normally treats documents as being divisible into variable-length paragraphs, sentences, and words (the criteria for recognizing these divisions are also user-definable via a form file).
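As a rough illustration of the idea (not BRS/Search's actual algorithm or form-file syntax), the Python sketch below splits text into paragraphs, sentences, and words and skips any word on a stop list. The function name `significant_words`, the stop list itself, and the splitting rules (blank lines between paragraphs, terminal punctuation between sentences) are all assumptions made for this example.

```python
import re

# Hypothetical stop list; in BRS/Search this list is supplied by the user.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def significant_words(text, stop_words=STOP_WORDS):
    """Yield (paragraph, sentence, word_position, word) for each significant word.

    Assumed rules: paragraphs are separated by blank lines, sentences end
    with '.', '!' or '?', and words are runs of letters, digits or apostrophes.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_no, paragraph in enumerate(paragraphs, start=1):
        sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
        for s_no, sentence in enumerate(sentences, start=1):
            words = re.findall(r"[A-Za-z0-9']+", sentence)
            # Stop words are skipped but still counted, so the positions of
            # the remaining words reflect their true place in the sentence.
            for w_no, word in enumerate(words, start=1):
                if word.lower() not in stop_words:
                    yield (p_no, s_no, w_no, word.lower())
```

Calling `list(significant_words(some_text))` gives one tuple per indexed word. Keeping the original word positions, even for skipped words, is a design choice that lets later proximity tests measure real distances in the text.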
BRS/Search works by recording the location of every significant word in a document collection. When the British Library loads a batch of catalog entries into the system, BRS/Search creates a sorted dictionary containing every unique word it encounters, and from this it creates an inverted file that consists solely of pointers to the locations of each occurrence of every word in the dictionary. (There is actually a third level of indirection, because the inverted file points to a table of byte offsets of separate documents within a single compressed text file.)
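The structure described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not BRS/Search's file format: the function name `build_index` is hypothetical, words are split on whitespace, no stop list or compression is applied, and documents are assumed to be stored one after another in a single text file with a newline between them.

```python
from collections import defaultdict

def build_index(documents):
    """Return (dictionary, inverted_file, doc_offsets) for a list of strings.

    dictionary    -- sorted list of every unique word
    inverted_file -- maps each word to a list of (doc_number, word_position)
    doc_offsets   -- byte offset of each document in one concatenated text file
    """
    inverted_file = defaultdict(list)
    doc_offsets = []
    offset = 0
    for doc_no, text in enumerate(documents):
        doc_offsets.append(offset)
        # +1 accounts for the assumed newline separator between documents.
        offset += len(text.encode("utf-8")) + 1
        for position, word in enumerate(text.lower().split()):
            inverted_file[word].append((doc_no, position))
    dictionary = sorted(inverted_file)
    return dictionary, dict(inverted_file), doc_offsets
```

Here the pointers in the inverted file name a document number rather than a byte position, and `doc_offsets` supplies the extra level of indirection from document number to location in the text file.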
BRS/Search uses proprietary compression techniques to squeeze the inverted file as well as the text itself, so the whole indexed database is often only 20 percent to 50 percent larger than the uncompressed text (many full-text systems more than double the data volume). The use of an inverted file makes retrieval by multiple keywords and proximity searches fast, because the software needs only to compare their occurrence lists in the inverted file without searching the text file at all.
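To see why comparing occurrence lists is enough, consider the following Python sketch. It assumes each word's occurrence list holds (document number, word position) pairs; the function names `docs_with_all` and `within` and the sample data are hypothetical, and the code says nothing about how BRS/Search itself stores or compares its lists.

```python
def docs_with_all(index, words):
    """Return the sorted document numbers that contain every word in `words`."""
    doc_sets = [{doc for doc, _ in index.get(w, [])} for w in words]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

def within(index, first, second, max_distance):
    """Return sorted document numbers where `first` and `second` occur within
    `max_distance` word positions of each other, in either order."""
    second_positions = {}
    for doc, pos in index.get(second, []):
        second_positions.setdefault(doc, []).append(pos)
    hits = set()
    for doc, pos in index.get(first, []):
        others = second_positions.get(doc, [])
        if any(abs(pos - other) <= max_distance for other in others):
            hits.add(doc)
    return sorted(hits)

# Hypothetical occurrence lists: word -> [(document number, word position), ...]
index = {
    "library": [(0, 1), (1, 0), (2, 5)],
    "catalog": [(0, 2), (2, 0)],
    "entries": [(1, 2)],
}
```

Both functions answer their queries from the occurrence lists alone. The document text is never opened, which is the source of the speed advantage described above.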
Two features of BRS/Search were important for the OPAC (On-line Public Access Catalog) application. First, it makes all its functionality available to outside software through a procedure-call interface. This fit perfectly with OPAC's RPC-based (remote procedure call) communications method. Second, the software allows databases to reside across multiple disk volumes.