Oracle8
ConText Cartridge Application Developer's Guide
Release 2.4 A63821-01 |
|
This chapter describes the different query methods you can use in your ConText application. You can use these methods with text queries and theme queries. The following topics are covered:
Each of the query methods (two-step, one-step, and in-memory)
provide advantages and disadvantages that you must consider when developing
an application. The following table briefly describes each method and illustrates
the various advantages and disadvantages to using each:
To perform a two-step query, do the following:
The document textkeys and scores are stored in the specified
result table.
Note: You must create the result table before you execute the CONTAINS procedure. |
The following example shows a simple two-step query. The
query uses a policy named ARTICLES_POL to search the text column in a table
named TEXTTAB for any articles that contain the word petroleum.
The CONTAINS procedure populates the CTX_TEMP results table with the document
primary keys that satisfy the query.
The select statement then joins the results in CTX_TEMP with
TEXTAB to create a list of document titles ordered by score.
Note that before the two-step query example is executed, the result table, CTX_TEMP, is created:
create table CTX_TEMP( textkey varchar2(64), score number, conid number); execute ctx_query.contains('ARTICLE_POLICY','petroleum','CTX_TEMP') SELECT SCORE, title FROM CTX_TEMP, TEXTTAB WHERE texttab.PK=ctx_temp.textkey ORDER BY SCORE DESC;
In this example, the articles with the highest scores appear
first in the hitlist because the results are sorted by score in descending
order.
In a two-step query, the score results generated by the CONTAINS
procedure are physically stored in a result table that has been allocated
(either by the application developer or dynamically within the application).
If you want to include scores in the hitlist returned by
a two-step query, select the from the result table in the second step of
the query.
Note: The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries. To learn more about how ConText calculates relevance score for text queries, see Appendix B, "Scoring Algorithm". To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4, "Theme Queries". |
In two-step queries, ConText uses result tables called hitlist
tables to store intermediate results. Intermediate results can be merged
into the standard SQL query through a join operation or a sub-query operation.
The result tables must be created before the query is performed. A hitlist
table can be created manually or allocated through the CTX_QUERY.GETTAB
procedure.
Hitlist tables can be named anything; however, they must
have the following structure:
See
Also:
For more information about the structure of the hitlist result tables, see "Hitlist Table Structure" in Appendix A, "Result Tables". |
For applications that support multiple concurrent users,
ConText allows for sharing a single result table among all the users rather
than allocating a separate table for each user.
You control sharing of result tables with the sharelevel
and the query_id parameters of the CTX_QUERY.CONTAINS
procedure. If the result table is shared, the CONTAINS procedure must specify
that sharelevel is equal to one and include a unique query_id
so that each result can be distinguished from others in the result table.
When sharelevel is equal to 0:
When sharelevel is equal to 1 then:
Attention: ConText does not verify that these rules are observed. You must control multiple concurrent usage by passing a different query_id to the requestor if the result table is shared. |
When you execute a two-step query on a table with a composite
textkey, the number of textkey columns in the result table must match the
composite keys count in the document table. For example, if you want to
execute a query on a document table that has a two-column textkey, create
a result table with the following schema: TEXTKEY, TEXTKEY2, SCORE, CONID.
The following SQL*Plus examples show two different ways in which to create a result table with a two-column composite textkey:
/* create composite textkey result table manually */ create table ctx_temp( textkey varchar2(64), textkey2 varchar2(64), score number, conid number); /* allocate composite textkey result table with CTX_QUERY.GETTAB() */ exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)
See
Also:
For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A, "Result Tables". |
There is an alternative to the second step of a two-step
query. Rather than joining the result table and text table in a SELECT
statement, you can create a view to perform the join. Then use a SELECT
statement to select the appropriate rows from that view. Use this approach
when the development tool does not allow tables to be joined in a SELECT
statement (e.g. Oracle Forms).
For example:
CREATE VIEW SURVEY AS SELECT * FROM TEXTTAB, CTX_TEMP WHERE PK = TEXTKEY; SELECT SCORE, AUTHOR FROM SURVEY ORDER BY SCORE DESC;
In this example:
To execute a two-step query on a table with a composite textkey,
you first specify the multiple textkey columns when you create the policy
for the text column.
See
Also:
For more information about creating policies for composite textkey tables, see Oracle8 ConText Cartridge Administrator's Guide. |
In addition, before the two-step query, create a result table
in which the number of TEXTKEY columns match the number of columns in the
composite textkey in the document table. You can create the result table
manually or using the CTX_QUERY.GETTAB procedure.
See
Also:
For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A, "Result Tables". |
For example, to create a result table manually with a composite textkey consisting of two columns, issue the following SQL statement:
create table CTX_TEMP2( textkey varchar2(64), textkey2 varchar2(64), score number, conid number);
In the two-step query, use the AND operator in the WHERE condition when you join the result and text tables. For example:
exec ctx_query.contains('ARTICLE2_POLICY','petroleum','CTX_TEMP2') SELECT SCORE, title FROM CTX_TEMP2, TEXTTAB2 WHERE texttab2.PK=ctx_temp2.textkey AND texttab2.PK2=ctx_temp2.textkey2 ORDER BY SCORE DESC;
A structured query is a query based on a text column and
a structured data column. The structured data column is usually in the
same table as the text column. For example, you might use a structured
query to retrieve documents on a certain subject that were written after
a certain date, where the document content is in a text column and date
information is in a structured data column.
The CTX_QUERY.CONTAINS procedure provides an additional parameter, struct_query, for specifying the WHERE condition in a structured query. For example, to select all news articles that contain the word Oracle that were written on or after October 1st, 1996, you might use:
exec ctx_query.contains('news_text','Oracle','res_tab', struct_query => 'issue_date >= (''1-OCT-1996'')')
Note: Because the struct_query parameter expects a WHERE condition, you can specify a subquery. This is useful when the structured data column is in another table. |
Executing a structured query with the struct_query
parameter improves performance over processing a query on a text column
and then refining the hitlist by applying a where condition against a structured
column. This is especially so when the selectivity of the WHERE condition
is high, because when you use the structured query parameter, the ConText
server executes the entire query without first writing out a potentially
large hitlist to be refined later by the Oracle server.
Note: If the user who includes a structured query in a two-step query is not the owner of the table containing the structured and text columns, the user must have SELECT privilege with GRANT OPTION on the table. In addition, if the object being queried is a view, the user must have SELECT privilege with GRANT OPTION on the base table for the view. SELECT privilege with GRANT OPTION can be granted to a user using the GRANT command in SQL. For more information, see Oracle8 SQL Reference. |
If a database link has been created for a remote database,
two-step queries support querying text columns in the remote database.
Note: Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 SQL Reference. |
To perform a two-step query for a text column in a remote
database, specify the database link for the remote database in the CONTAINS
procedure as part of the policy for the column in the remote database.
In addition, the result table specified in CONTAINS must
exist in the remote database, and you, the user performing the query, must
have the appropriate privileges on the result table.
For example:
exec ctx_query.contains('MY_POL@DB1', 'petroleum','CTX_TEMP')
In this example, MY_POL exists in a remote database identified
by the database link DB1. The CTX_TEMP result table exists in the same
remote database.
See
Also:
For more information about remote queries and distributed databases, see Oracle8 Concepts. |
The CONTAINS procedure provides
an argument for processing two-step queries in parallel. Processing queries
in parallel helps balance the load between ConText servers and might improve
query performance.
When the CONTAINS procedure is called in a two-step query,
the PARALLEL argument can be used to specify the number of ConText servers,
up to the total number of ConText servers running with the Query personality,
that are used to process two-step queries and write the results to the
result table.
For example:
exec ctx_query.contains('ARTICLE_POLICY','petroleum', 'CTX_TEMP', parallel=>2)
In this example, the text column in the ARTICLE_POLICY policy
is queried for documents that contain the term petroleum. The query
is processed in parallel by any two available ConText servers with the
Query personality and the results are written to CTX_TEMP.
The one-step query uses the CONTAINS
and SCORE functions in a SQL statement to execute
a user's request for documents. Rows and columns containing the text and
structured data for relevant documents are returned to the application
program as a record set like any other query in SQL.
Note: Before one-step queries can be executed, the database in which the text resides must be text enabled by setting the ConText initialization parameter TEXT_ENABLE = TRUE. This can be done by either setting it in the initsid.ora system initialization file, or by using the ALTER SESSION command. For more information about initialization parameters and the initsid.ora file, see Oracle8 Administrator's Guide. For more information about using the ALTER SESSION command, see Oracle8 SQL Reference. |
After a user has submitted a one-step query, ConText performs the following tasks to return the results to the user:
The following SELECT statement shows a simple one-step query. This query searches a text table called TEXTTAB for any articles that contain the word petroleum.
SELECT * FROM texttab WHERE CONTAINS (text, 'petroleum') > 0;
Because ConText functions execute within normal SQL statements,
all of the capabilities for selecting and querying normal structured data
fields, as well as text, are available. For instance, in the example, if
the text table had a column listing the date the article was published,
the user could select articles based on that date as well as the content
of the text column.
Note: The asterisk wildcard character ( *) in specifies that the record set returned by the query includes all the columns of the text table for the selected documents, as well as the scores generated for each document. If a query has more than one CONTAINS function, the asterisk wildcard does not return scores for the multiple CONTAINS and the SCORE function must be called explicitly. See "Scoring" in this chapter for an example. |
One-step queries support calling more than one CONTAINS
functions in the WHERE clause of a SELECT statement. Multiple CONTAINS
can be used in a one-step query to perform queries on multiple text columns
located either in the same table or in separate tables.
If multiple ConText servers with the Query personality are
running and a one-step query with multiple CONTAINS is executed, the query
is processed in parallel. Each CONTAINS function is evaluated by one of
the available ConText servers and the results from the servers are combined
before they are returned to the user.
In a one-step query, the document scores are generated by
the CONTAINS function and returned by the SCORE
function.
Each CONTAINS function in a query produces a separate score.
When there are multiple CONTAINS functions, each CONTAINS function must
have a label (a number) so the SCORE value can be identified in other clauses
of the SELECT statement.
The SCORE function can be used in the SELECT statement to order a hitlist as follows:
SELECT SCORE (10), title FROM DOCUMENTS WHERE CONTAINS (TEXT, 'dog', 10) > 0 ORDER BY SCORE(10) DESC;
Note: The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries. To learn more about how ConText calculates relevance score for text queries, see Appendix B, "Scoring Algorithm". To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4. |
The CONTAINS function can only appear in the WHERE clause
of a SELECT statement.
You cannot issue the CONTAINS function in the WHERE clause
of an UPDATE, INSERT or DELETE statement.
For a text column that has more than one policy associated
with it, you must specify which policy to use in the CONTAINS
clause using the pol_hint parameter.
You might create two policies for a column when you want
to perform both theme and text queries on the column, or in any application
where you build two separate indexes for a text column.
See
Also:
For more information on issuing one-step queries with multiple policies see "Theme Query Examples" in Chapter 4. To learn more about using the pol_hint parameter, see the specification for the SELECT Statement in Chapter 9. |
You can perform one-step queries on text tables with composite
textkeys. The syntax for the query is the same as the syntax for a query
on a table with a single-column textkey.
If a database link has been created for a remote database,
one-step queries support querying text columns in the remote database.
To perform a one-step query for a text column in a remote
database, the database link for the remote database is specified as part
of the table name in the SELECT clause.
For example:
SELECT * FROM texttab@db1 WHERE CONTAINS (text, 'petroleum') > 0;
In this example, texttab exists in a remote database
identified by the database link DB1
Note: One-step queries do not support querying LONG and LONG RAW columns in remote database tables. For more information about creating database links, see Oracle8 SQL Reference. For more information about remote queries and distributed databases, see Oracle8 Concepts. |
In-memory queries use a buffer and a cursor to return query
results. Returning query results to a buffer in memory improves performance
over writing and reading query results to and from database result tables,
which is typical of one- and two-step queries.
To perform an in-memory query, do the following:
In addition, you can specify that OPEN_CON return additional columns (up to five) for the selected documents from the text table.
The following example shows a simple in-memory query. This query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum.
declare score char(5); pk char(5); curid number; title char(256); begin dbms_output.enable(100000); curid := ctx_query.open_con( policy_name => 'ARTICLES_POL', text_query => 'petroleum', score_sorted => true, other_cols => 'title'); while (ctx_query.fetch_hit(curid, pk, score, title)>0) loop dbms_output.put_line(score||pk||substr(title,1,50)); end loop; ctx_query.close_con(curid); end;
In this example, the TITLE column from the table is also
returned by OPEN_CON, so a variable must be declared for TITLE.
DBMS_OUTPUT.ENABLE sets the buffer size to the maximum of
100000 bytes (1 Mb) to ensure that the buffer is large enough to hold the
results of the query.
The SCORE_SORTED argument in OPEN_CON is set to true
which causes OPEN_CON to store the hits in the query buffer in descending
order by score.
FETCH_HIT is called in a loop to fetch SCORE, PK, and TITLE
for each hit until a value less than zero is returned, indicating that
the buffer is empty.
DBMS_OUTPUT.PUT_LINE prints the results to the standard output.
See
Also:
For more information about the DBMS_OUTPUT PL/SQL package, see Oracle8 Application Developer's Guide. |
You can perform in-memory queries on text tables that have
multiple column textkeys. When you use CTX_QUERY.FETCH_HIT
to retrieve each hit from the buffer, the PK argument is returned as an
encoded string. To access an individual textkey, you must use CTX_QUERY.PKDECODE.
In-memory queries have the following limitation:
You cannot use the max and first/next operators with
in-memory queries.
If a database link has been created for a remote database,
in-memory queries support querying text columns in the remote database.
Note: Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 SQL Reference. |
To perform an in-memory query for a text column in a remote
database, the database link for the remote database is specified in the
CTX_QUERY.OPEN_CON procedure as part of the
policy for the column in the remote database.
In addition, the result table specified in CTX_QUERY.CONTAINS
must exist in the remote database and the user performing the query must
have the appropriate privileges on the result table.
See
Also:
For more information about remote queries and distributed databases, see Oracle8 Concepts. |
In addition to two-step, one-step, and in-memory queries,
you can count the number of hits in a two-step or in-memory query. Counting
query hits helps to audit queries to ensure large and unmanageable hitlists
are not returned.
You can count the number of hits before or after you issue the query using one of the following functions:
Before you issue a two-step or in-memory query, you can use
the CTX_QUERY.COUNT_HITS function to return
the number of hits for the query without generating scores for the hits
or returning the textkeys for the documents.
COUNT_HITS can be called in two modes, estimate and exact.
The results in estimate mode may be inaccurate; however, the results are
generally returned faster than in exact mode.
See
Also:
CTX_QUERY.COUNT_HITS in Chapter 10. |
You can use the CTX.QUERY.COUNT_LAST
function to obtain the number of hits in a two-step query and in-memory
query after issuing CONTAINS or OPEN_CON.
COUNT_LAST returns the number of hits obtained from the last
call to CTX_QUERY.CONTAINS or CTX_QUERY.OPEN_CON.
For two-step queries, the time it takes to issue the query
with CONTAINS and then to call COUNT_LAST is not as fast as calling COUNT_HITS
before the query. However, in the case where you need to process all hits
in a two-step query, issuing the query with CONTAINS and then calling COUNT_LAST
is more efficient than calling COUNT_HITS and then calling CONTAINS.
With in-memory queries, issuing OPEN_CON and then calling
COUNT_LAST is always a more efficient way to obtain an estimate of the
query hits over calling COUNT_HITS and then calling OPEN_CON, since COUNT_LAST
returns a number faster than COUNT_HITS.
See
Also:
CTX.QUERY.COUNT_LAST in Chapter 10. |