onal databases could not be searched natively using SQL.Such dual-system solutions are expensive, often requiring custom middleware to coordinate data changes and queries between the text-search engine and the database system.
Oracle resolved this dilemma with its ConText technology, which makes text a native data type of an Oracle RDBMS. This eliminates the costly dual-system approach, since the same scalable and secure relational database houses both the structured data and the text. It also provides a un
ified interface where textual information can be rapidly searched and retrieved through SQL queries, often combined with associated relational data.
Getting There from Here
Oracle's ConText Cartridge enables full text retrieval within Oracle's universal data server. ConText accomplishes this by processing text into a format usable by an RDBMS and accessible via SQL (
see the figure
). It also provides linguistic capabilities that deal with the ambiguities and language dependencies of text. The term
Cartridge
indicates that the ConText engine is a software module that extends a database's capabilities to manage the new text data type.
The ConText Cartridge supports several different document-storage methods. The simplest one is the
direct data store
, which enables text storage in a
text column
of a database, with one document -- or any logical body of text -- per row. Depending on the column types chosen, each row can contain a text string as brief
as a few names or as long as a book containing hundreds of thousands of pages.
Another storage method involves keeping the documents in a file system or anywhere on the Web. In this type of situation, the text column stores only the file-system or URL pointers to those documents. ConText searches and retrieves the text contents at the end of these pointers. While storing text in the database provides more robust data integrity, keeping files on the Web or in the file system may be more expedient. This feature is especially useful for constructing intranet search applications without having to copy existing Web pages to a central location.
As the database stores the text document, ConText builds a text index that contains critical information about the document's contents, as illustrated
in the figure
. Multiple ConText servers can run in parallel to reduce the index-generation time for large documents. The index is stored and managed as a set of relational tables in the databa
se, which enables ConText to search and retrieve text information as rapidly as it can retrieve relational data.
The index's relational tables contain information about every word contained in the documents, such as which documents a certain word comes from, how many times it occurs in each document, what other words it's near, and by how much. Because text searches execute against these indexes (rather than directly against the full text), they can be as rapid as relational queries.
The generation of a text index is governed by a user-designated or default text
policy
supplied with ConText. The policy tells ConText how to index the text. A group of user preferences defines the characteristics of a given index. For example, there are preferences for languages (e.g., English or Japanese), document format (e.g.,Word or WordPerfect), document-location information (e.g., in the RDBMS or pointed to by a URL), and so on. The defaults are usually sufficient, but you can create highly customized po
licies for special search requirements.
Because ConText Cartridge is integrated into the Oracle universal data server, all development and administration tasks can be accomplished through the use of straightforward SQL and PL/SQL calls. Oracle's SQL engine handles ConText queries, while ConText background processes automatically update text indexes and handle other administrative tasks. Developers don't need to use special API calls to manage text, and they can take advantage of the data server's capabilities, such as scalability and security.
Using ConText
The primary method of accessing ConText is through the
CONTAINS
function, which enables SQL to query the contents of a document. This function can be included in a
WHERE
clause that also includes conditions that are based on structured fields. Here's a sample SQL call that uses this function:
SELECT title
FROM docs_tbl
WHERE organization = 'Support'
AND CONTAINS(docs_tbl.text, 'WebServer') > 0;
The column
docs_tbl.text
holds the documents that you wish to search. Based on the text-query expression, the
CONTAINS
function returns a numeric "score" that indicates the degree to which a given document matches the expression. In this example, the expression requests an exact match on the word
WebServer
; the score will be higher for those documents in which
WebServer
appears the most times.
The SQL call above simply returns the title of every document in which the target word
WebServer
appears at least once, given that the document also meets the structured field condition (
organization = 'Support'
). When the user chooses from the title list, the application then retrieves and displays the specified document.
To rank the results of a query according to the score, the
SCORE
function can be used, as in the following example:
SELECT SCORE(0), title
FROM docs_tbl
WHERE organization = 'Support'
AND CONTAINS(docs_tbl.text, '
WebServer',0) > 0
ORDER BY SCORE(0)
This call returns an ordered list of documents. (The argument to
SCORE
is simply an index, which is included as the third argument to the
CONTAINS
function. It places no restriction on the rows returned.) For more complex ConText searches, the text-query expression can get more complicated, and the SQL call might have multiple
CONTAINS
clauses.
ConText has been designed to easily upgrade existing Oracle-based applications with text-search capability, thus offering users direct access to text that's already stored in Oracle databases. Electronic-commerce applications that manage quantities, prices, and product-ID numbers can be upgraded to include searches on text descriptions of content or to offer the text data itself for sale.
illustration_link (22 Kbytes)

ConText eliminates the dual-system design often used to integrate a text search with relational data.
illustration_link (28 Kbytes)

Text itself (or file/URL pointers to text files) can be stored in database columns.
Ann O'Leary, a senior manager at Oracle, is responsible for marketing the Oracle universal data server and the ConText Cartridge. She can be reached at
aoleary@us.oracle.com
.