2
Query Methods

This chapter describes the different query methods you can use in your ConText application. You can use these methods with text queries and theme queries. The following topics are covered:

Selecting a Query Method

Each of the query methods (two-step, one-step, and in-memory) provide advantages and disadvantages that you must consider when developing an application. The following table briefly describes each method and illustrates the various advantages and disadvantages to using each:

Query Method Use Advantage Disadvantage

One-step

Used in SQL*Plus. Best suited for interactive queries.

No pre-allocation of result tables

Uses standard SQL statements

Uses table and column names

Query results returned in a single step

Can retrieve all hits at once

Generally slower than two-step or in-memory queries

No access to result tables

Cannot use in PL/SQL applications

Two-step

Two-step queries are best suited for PL/SQL-based applications that require all the results to a query.

Result tables can be manipulated

Generally faster than one-step queries, especially for mixed queries

Can retrieve all hits at once

Query can include a structured condition

Requires pre-allocation of result tables

Uses policy names

Requires two steps to complete

Requires join to base text table to return document details

In-memory

In-memory queries are best suited for PL/SQL-based applications that might generate large hitlists, but where only a small portion of the hits are required at a time, such as World Wide Web applications.

No result tables

Faster response time than two-step, since you need not retrieve all hits in the hitlist.

Large hitlists generally faster than one-step and two-step queries

Can specify the number of hits returned

Query can include a structured condition

Uses policy names

Cannot retrieve all hits at once

With small hitlists, performance improvement over two-step is negligible

Requires three steps, including a loop, to complete

Max and first/next operators are not supported

Query Method	Use	Advantage	Disadvantage
One-step	Used in SQL*Plus. Best suited for interactive queries.	No pre-allocation of result tables Uses standard SQL statements Uses table and column names Query results returned in a single step Can retrieve all hits at once	Generally slower than two-step or in-memory queries No access to result tables Cannot use in PL/SQL applications
Two-step	Two-step queries are best suited for PL/SQL-based applications that require all the results to a query.	Result tables can be manipulated Generally faster than one-step queries, especially for mixed queries Can retrieve all hits at once Query can include a structured condition	Requires pre-allocation of result tables Uses policy names Requires two steps to complete Requires join to base text table to return document details
In-memory	In-memory queries are best suited for PL/SQL-based applications that might generate large hitlists, but where only a small portion of the hits are required at a time, such as World Wide Web applications.	No result tables Faster response time than two-step, since you need not retrieve all hits in the hitlist. Large hitlists generally faster than one-step and two-step queries Can specify the number of hits returned Query can include a structured condition	Uses policy names Cannot retrieve all hits at once With small hitlists, performance improvement over two-step is negligible Requires three steps, including a loop, to complete Max and first/next operators are not supported

Using Two-Step Queries

To perform a two-step query, do the following:

Execute CTX_QUERY.CONTAINS. The procedure selects all documents that match the specified search criteria (query expression) and generates a score for each document.

The document textkeys and scores are stored in the specified result table.

Note:

You must create the result table before you execute the CONTAINS procedure.

Use a SELECT statement on the result table (and the base text table, if desired) to return the specified columns as a hitlist for the rows (documents) that satisfy the query expression.

Two-Step Query Example

The following example shows a simple two-step query. The query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum. The CONTAINS procedure populates the CTX_TEMP results table with the document primary keys that satisfy the query.

The select statement then joins the results in CTX_TEMP with TEXTAB to create a list of document titles ordered by score.

Note that before the two-step query example is executed, the result table, CTX_TEMP, is created:

create table CTX_TEMP(
    textkey varchar2(64),
    score number,
    conid number);

execute ctx_query.contains('ARTICLE_POLICY','petroleum','CTX_TEMP')

SELECT SCORE, title 
FROM CTX_TEMP, TEXTTAB 
WHERE texttab.PK=ctx_temp.textkey 
ORDER BY SCORE DESC;

In this example, the articles with the highest scores appear first in the hitlist because the results are sorted by score in descending order.

Scoring

In a two-step query, the score results generated by the CONTAINS procedure are physically stored in a result table that has been allocated (either by the application developer or dynamically within the application).

If you want to include scores in the hitlist returned by a two-step query, select the from the result table in the second step of the query.

Note:

The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries.

To learn more about how ConText calculates relevance score for text queries, see Appendix B, "Scoring Algorithm".

To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4, "Theme Queries".

Hitlist Result Tables

In two-step queries, ConText uses result tables called hitlist tables to store intermediate results. Intermediate results can be merged into the standard SQL query through a join operation or a sub-query operation. The result tables must be created before the query is performed. A hitlist table can be created manually or allocated through the CTX_QUERY.GETTAB procedure.

Hitlist tables can be named anything; however, they must have the following structure:

Column Name Column Datatype Purpose

TEXTKEY

VARCHAR2(64)

Stores textkeys of the rows satisfying the query

SCORE

NUMBER
Stores the score for each row (document)

CONID

NUMBER

Stores the CONTAINS ID when multiple CONTAINS procedures utilize the same result table

Column Name	Column Datatype	Purpose
TEXTKEY	VARCHAR2(64)	Stores textkeys of the rows satisfying the query
SCORE	NUMBER	Stores the score for each row (document)
CONID	NUMBER	Stores the CONTAINS ID when multiple CONTAINS procedures utilize the same result table

See Also:

For more information about the structure of the hitlist result tables, see "Hitlist Table Structure" in Appendix A, "Result Tables".

Sharing a Hitlist Result Table

For applications that support multiple concurrent users, ConText allows for sharing a single result table among all the users rather than allocating a separate table for each user.

You control sharing of result tables with the sharelevel and the query_id parameters of the CTX_QUERY.CONTAINS procedure. If the result table is shared, the CONTAINS procedure must specify that sharelevel is equal to one and include a unique query_id so that each result can be distinguished from others in the result table.

When sharelevel is equal to 0:

the hitlist result table is intended for exclusive use
ConText truncates the hitlist result table at the start of each query
after the query is completed, CONID values are NULL

When sharelevel is equal to 1 then:

the hitlist result table is intended for shared use
specify a unique number for query_id in the CONTAINS procedure to identify which entries belong to you in the hitlist result table. This number will be assigned to the CONID for each row in the result table generated by the query.
before the query is run, you must delete existing rows in the result table with the same query_id as that specified in the CONTAINS procedure
after the query is complete, the CONID column for all rows returned by the query contains the query_id specified in the CONTAINS procedure
select your rows by specifying the appropriate CONID in the WHERE clause of the SELECT statement

Attention:

ConText does not verify that these rules are observed. You must control multiple concurrent usage by passing a different query_id to the requestor if the result table is shared.

Composite Textkey Result Tables

When you execute a two-step query on a table with a composite textkey, the number of textkey columns in the result table must match the composite keys count in the document table. For example, if you want to execute a query on a document table that has a two-column textkey, create a result table with the following schema: TEXTKEY, TEXTKEY2, SCORE, CONID.

The following SQL*Plus examples show two different ways in which to create a result table with a two-column composite textkey:

/* create composite textkey result table manually */
        create table ctx_temp(
                       textkey varchar2(64),
                textkey2 varchar2(64),
                score number,
                conid number);

/* allocate composite textkey result table with CTX_QUERY.GETTAB() */
exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)

See Also:

For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A, "Result Tables".

SELECT from a Pre-defined View

There is an alternative to the second step of a two-step query. Rather than joining the result table and text table in a SELECT statement, you can create a view to perform the join. Then use a SELECT statement to select the appropriate rows from that view. Use this approach when the development tool does not allow tables to be joined in a SELECT statement (e.g. Oracle Forms).

For example:

        CREATE VIEW SURVEY AS SELECT * FROM TEXTTAB, CTX_TEMP
        WHERE PK = TEXTKEY;


        SELECT SCORE, AUTHOR FROM SURVEY
        ORDER BY SCORE DESC;

In this example:

The CREATE VIEW statement joins the table of articles (TEXTTAB) and the result table (CTX_TEMP). The PK column holds the primary key of the documents.
The SELECT statement retrieves the scores from the view.

Composite Textkey Queries

To execute a two-step query on a table with a composite textkey, you first specify the multiple textkey columns when you create the policy for the text column.

See Also:

For more information about creating policies for composite textkey tables, see Oracle8 ConText Cartridge Administrator's Guide.

In addition, before the two-step query, create a result table in which the number of TEXTKEY columns match the number of columns in the composite textkey in the document table. You can create the result table manually or using the CTX_QUERY.GETTAB procedure.

See Also:

For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A, "Result Tables".

For example, to create a result table manually with a composite textkey consisting of two columns, issue the following SQL statement:

create table CTX_TEMP2(
     textkey varchar2(64),
     textkey2 varchar2(64),
     score number,
     conid number);

In the two-step query, use the AND operator in the WHERE condition when you join the result and text tables. For example:

exec ctx_query.contains('ARTICLE2_POLICY','petroleum','CTX_TEMP2')
SELECT SCORE, title 
FROM CTX_TEMP2, TEXTTAB2
WHERE texttab2.PK=ctx_temp2.textkey AND
      texttab2.PK2=ctx_temp2.textkey2
ORDER BY SCORE DESC;

Structured Queries

A structured query is a query based on a text column and a structured data column. The structured data column is usually in the same table as the text column. For example, you might use a structured query to retrieve documents on a certain subject that were written after a certain date, where the document content is in a text column and date information is in a structured data column.

The CTX_QUERY.CONTAINS procedure provides an additional parameter, struct_query, for specifying the WHERE condition in a structured query. For example, to select all news articles that contain the word Oracle that were written on or after October 1st, 1996, you might use:

exec ctx_query.contains('news_text','Oracle','res_tab', 
struct_query => 'issue_date >= (''1-OCT-1996'')')

Note:

Because the struct_query parameter expects a WHERE condition, you can specify a subquery. This is useful when the structured data column is in another table.

Executing a structured query with the struct_query parameter improves performance over processing a query on a text column and then refining the hitlist by applying a where condition against a structured column. This is especially so when the selectivity of the WHERE condition is high, because when you use the structured query parameter, the ConText server executes the entire query without first writing out a potentially large hitlist to be refined later by the Oracle server.

Note:

If the user who includes a structured query in a two-step query is not the owner of the table containing the structured and text columns, the user must have SELECT privilege with GRANT OPTION on the table. In addition, if the object being queried is a view, the user must have SELECT privilege with GRANT OPTION on the base table for the view. SELECT privilege with GRANT OPTION can be granted to a user using the GRANT command in SQL.

For more information, see Oracle8 SQL Reference.

Querying Columns in Remote Databases

If a database link has been created for a remote database, two-step queries support querying text columns in the remote database.

Note:

Database links are created using the CREATE DATABASE LINK command in SQL.

For more information about creating database links, see Oracle8 SQL Reference.

To perform a two-step query for a text column in a remote database, specify the database link for the remote database in the CONTAINS procedure as part of the policy for the column in the remote database.

In addition, the result table specified in CONTAINS must exist in the remote database, and you, the user performing the query, must have the appropriate privileges on the result table.

For example:

exec ctx_query.contains('MY_POL@DB1', 'petroleum','CTX_TEMP')

In this example, MY_POL exists in a remote database identified by the database link DB1. The CTX_TEMP result table exists in the same remote database.

See Also:

For more information about remote queries and distributed databases, see Oracle8 Concepts.

Two-Step Queries in Parallel

The CONTAINS procedure provides an argument for processing two-step queries in parallel. Processing queries in parallel helps balance the load between ConText servers and might improve query performance.

When the CONTAINS procedure is called in a two-step query, the PARALLEL argument can be used to specify the number of ConText servers, up to the total number of ConText servers running with the Query personality, that are used to process two-step queries and write the results to the result table.

For example:

exec ctx_query.contains('ARTICLE_POLICY','petroleum', 'CTX_TEMP', parallel=>2)

In this example, the text column in the ARTICLE_POLICY policy is queried for documents that contain the term petroleum. The query is processed in parallel by any two available ConText servers with the Query personality and the results are written to CTX_TEMP.

Using One-Step Queries

The one-step query uses the CONTAINS and SCORE functions in a SQL statement to execute a user's request for documents. Rows and columns containing the text and structured data for relevant documents are returned to the application program as a record set like any other query in SQL.

Note:

Before one-step queries can be executed, the database in which the text resides must be text enabled by setting the ConText initialization parameter TEXT_ENABLE = TRUE. This can be done by either setting it in the initsid.ora system initialization file, or by using the ALTER SESSION command.

For more information about initialization parameters and the initsid.ora file, see Oracle8 Administrator's Guide.

For more information about using the ALTER SESSION command, see Oracle8 SQL Reference.

One-Step Query Processing

After a user has submitted a one-step query, ConText performs the following tasks to return the results to the user:

The query is placed on the text queue (query pipe). The Oracle server intercepts the query and passes the text portion (CONTAINS) to ConText.
A ConText server with the Query personality picks up the text portion of the query, processes the CONTAINS function(s) and stores the results in an internal table created automatically for the user who submitted the query. This table (and the corresponding intermediate results) are not available to the application.
The ConText server rewrites the query as a standard SQL statement and passes it back to Oracle.
The rewritten query is executed by an Oracle server and the results are returned to the user.
The internal result table is truncated.

One-Step Query Example

The following SELECT statement shows a simple one-step query. This query searches a text table called TEXTTAB for any articles that contain the word petroleum.

SELECT *
FROM texttab 
WHERE CONTAINS (text, 'petroleum') > 0;

Because ConText functions execute within normal SQL statements, all of the capabilities for selecting and querying normal structured data fields, as well as text, are available. For instance, in the example, if the text table had a column listing the date the article was published, the user could select articles based on that date as well as the content of the text column.

Note:

The asterisk wildcard character ( *) in specifies that the record set returned by the query includes all the columns of the text table for the selected documents, as well as the scores generated for each document. If a query has more than one CONTAINS function, the asterisk wildcard does not return scores for the multiple CONTAINS and the SCORE function must be called explicitly. See "Scoring" in this chapter for an example.

Multiple CONTAINS

One-step queries support calling more than one CONTAINS functions in the WHERE clause of a SELECT statement. Multiple CONTAINS can be used in a one-step query to perform queries on multiple text columns located either in the same table or in separate tables.

If multiple ConText servers with the Query personality are running and a one-step query with multiple CONTAINS is executed, the query is processed in parallel. Each CONTAINS function is evaluated by one of the available ConText servers and the results from the servers are combined before they are returned to the user.

Suggestion:

If your application makes use of multiple CONTAINS in one-step queries, ensure that multiple ConText servers with the Query personality are running to optimize query performance. The number of ConText servers should be at least equal to the number of CONTAINS you support in one-step queries for the application.

Scoring

In a one-step query, the document scores are generated by the CONTAINS function and returned by the SCORE function.

Each CONTAINS function in a query produces a separate score. When there are multiple CONTAINS functions, each CONTAINS function must have a label (a number) so the SCORE value can be identified in other clauses of the SELECT statement.

The SCORE function can be used in the SELECT statement to order a hitlist as follows:

SELECT SCORE (10), title FROM DOCUMENTS
WHERE CONTAINS (TEXT, 'dog', 10) > 0
ORDER BY SCORE(10) DESC;

Note:

The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries.

To learn more about how ConText calculates relevance score for text queries, see Appendix B, "Scoring Algorithm".

To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4.

Restrictions

The CONTAINS function can only appear in the WHERE clause of a SELECT statement.

You cannot issue the CONTAINS function in the WHERE clause of an UPDATE, INSERT or DELETE statement.

Multiple Policies

For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause using the pol_hint parameter.

You might create two policies for a column when you want to perform both theme and text queries on the column, or in any application where you build two separate indexes for a text column.

See Also:

For more information on issuing one-step queries with multiple policies see "Theme Query Examples" in Chapter 4.

To learn more about using the pol_hint parameter, see the specification for the SELECT Statement in Chapter 9.

Composite Textkey Queries

You can perform one-step queries on text tables with composite textkeys. The syntax for the query is the same as the syntax for a query on a table with a single-column textkey.

Querying Columns in Remote Databases

If a database link has been created for a remote database, one-step queries support querying text columns in the remote database.

To perform a one-step query for a text column in a remote database, the database link for the remote database is specified as part of the table name in the SELECT clause.

For example:

SELECT * 
FROM texttab@db1 
WHERE CONTAINS (text, 'petroleum') > 0;

In this example, texttab exists in a remote database identified by the database link DB1

Note:

One-step queries do not support querying LONG and LONG RAW columns in remote database tables.

For more information about creating database links, see Oracle8 SQL Reference.

For more information about remote queries and distributed databases, see Oracle8 Concepts.

Using In-Memory Queries

In-memory queries use a buffer and a cursor to return query results. Returning query results to a buffer in memory improves performance over writing and reading query results to and from database result tables, which is typical of one- and two-step queries.

To perform an in-memory query, do the following:

Call the CTX_QUERY.OPEN_CON function. OPEN_CON performs the following operations:

opens a cursor to the query buffer
queries a text column using the specified policy and query expression
stores in the query buffer the document textkeys and scores for all the documents that meet the search criteria. Hits are stored in order that they are returned or ranked by score, depending on the argument specified for OPEN_CON

In addition, you can specify that OPEN_CON return additional columns (up to five) for the selected documents from the text table.

Call the CTX_QUERY.FETCH_HIT function for each textkey in the buffer to fetch the desired query results, one hit at a time, until the desired number of hits has been returned or no hits remain in the buffer.
Call the CTX_QUERY.CLOSE_CON procedure to release the cursor opened by OPEN_CON.

In-Memory Query Example

The following example shows a simple in-memory query. This query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum.

declare 
  score  char(5); 
  pk     char(5); 
  curid  number; 
  title  char(256);

begin 
  dbms_output.enable(100000); 
  curid := ctx_query.open_con(
                        policy_name  =>  'ARTICLES_POL',
                        text_query   =>  'petroleum',
                        score_sorted =>  true, 
                        other_cols   =>  'title');
  while (ctx_query.fetch_hit(curid, pk, score, title)>0) 
   loop 
    dbms_output.put_line(score||pk||substr(title,1,50));
   end loop; 
  ctx_query.close_con(curid);
end;

In this example, the TITLE column from the table is also returned by OPEN_CON, so a variable must be declared for TITLE.

DBMS_OUTPUT.ENABLE sets the buffer size to the maximum of 100000 bytes (1 Mb) to ensure that the buffer is large enough to hold the results of the query.

The SCORE_SORTED argument in OPEN_CON is set to true which causes OPEN_CON to store the hits in the query buffer in descending order by score.

FETCH_HIT is called in a loop to fetch SCORE, PK, and TITLE for each hit until a value less than zero is returned, indicating that the buffer is empty.

DBMS_OUTPUT.PUT_LINE prints the results to the standard output.

See Also:

For more information about the DBMS_OUTPUT PL/SQL package, see Oracle8 Application Developer's Guide.

In-Memory Queries and Composite Textkeys

You can perform in-memory queries on text tables that have multiple column textkeys. When you use CTX_QUERY.FETCH_HIT to retrieve each hit from the buffer, the PK argument is returned as an encoded string. To access an individual textkey, you must use CTX_QUERY.PKDECODE.

In-Memory Query Limitations

In-memory queries have the following limitation:

Max and First/Next Operators

You cannot use the max and first/next operators with in-memory queries.

Querying Columns in Remote Databases

If a database link has been created for a remote database, in-memory queries support querying text columns in the remote database.

Note:

Database links are created using the CREATE DATABASE LINK command in SQL.

For more information about creating database links, see Oracle8 SQL Reference.

To perform an in-memory query for a text column in a remote database, the database link for the remote database is specified in the CTX_QUERY.OPEN_CON procedure as part of the policy for the column in the remote database.

In addition, the result table specified in CTX_QUERY.CONTAINS must exist in the remote database and the user performing the query must have the appropriate privileges on the result table.

See Also:

For more information about remote queries and distributed databases, see Oracle8 Concepts.

Counting Query Hits

In addition to two-step, one-step, and in-memory queries, you can count the number of hits in a two-step or in-memory query. Counting query hits helps to audit queries to ensure large and unmanageable hitlists are not returned.

You can count the number of hits before or after you issue the query using one of the following functions:

CTX_QUERY.COUNT_HITS
CTX_QUERY.COUNT_LAST

Using COUNT_HITS Before the Query

Before you issue a two-step or in-memory query, you can use the CTX_QUERY.COUNT_HITS function to return the number of hits for the query without generating scores for the hits or returning the textkeys for the documents.

COUNT_HITS can be called in two modes, estimate and exact. The results in estimate mode may be inaccurate; however, the results are generally returned faster than in exact mode.

See Also:

CTX_QUERY.COUNT_HITS in Chapter 10.

Using COUNT_LAST After the Query

You can use the CTX.QUERY.COUNT_LAST function to obtain the number of hits in a two-step query and in-memory query after issuing CONTAINS or OPEN_CON.

COUNT_LAST returns the number of hits obtained from the last call to CTX_QUERY.CONTAINS or CTX_QUERY.OPEN_CON.

For two-step queries, the time it takes to issue the query with CONTAINS and then to call COUNT_LAST is not as fast as calling COUNT_HITS before the query. However, in the case where you need to process all hits in a two-step query, issuing the query with CONTAINS and then calling COUNT_LAST is more efficient than calling COUNT_HITS and then calling CONTAINS.

With in-memory queries, issuing OPEN_CON and then calling COUNT_LAST is always a more efficient way to obtain an estimate of the query hits over calling COUNT_HITS and then calling OPEN_CON, since COUNT_LAST returns a number faster than COUNT_HITS.

See Also:

CTX.QUERY.COUNT_LAST in Chapter 10.

2 Query Methods

Selecting a Query Method

Using Two-Step Queries

Two-Step Query Example

Scoring

Hitlist Result Tables

Sharing a Hitlist Result Table

Composite Textkey Result Tables

SELECT from a Pre-defined View

Composite Textkey Queries

Structured Queries

Querying Columns in Remote Databases

Two-Step Queries in Parallel

Using One-Step Queries

One-Step Query Processing

One-Step Query Example

Multiple CONTAINS

Scoring

Restrictions

Multiple Policies

Composite Textkey Queries

Querying Columns in Remote Databases

Using In-Memory Queries

In-Memory Query Example

In-Memory Queries and Composite Textkeys

In-Memory Query Limitations

Max and First/Next Operators

Querying Columns in Remote Databases

Counting Query Hits

Using COUNT_HITS Before the Query

Using COUNT_LAST After the Query

2
Query Methods