5 Query Expression Feedback

This chapter describes query expression feedback. It covers the feedback process, ConText parse trees, the feedback table, and how to obtain query expression feedback.

The Feedback Process

Figure 5-1

Query expression feedback is a feature that shows you how ConText parses a text or theme query expression before you execute the query. Knowing how ConText evaluates the expression is useful for refining and debugging queries. You can also design your application to use the feedback information to help users write better queries.

The diagram above shows how you use query expression feedback. You execute the PL/SQL procedure CTX_QUERY.FEEDBACK, which generates feedback information and stores it in a table. From the data in this feedback table, you can visualize the ConText parse tree to examine how the expression was expanded and parsed. You can then refine the query and re-execute FEEDBACK, or you can execute the real query with CONTAINS for two-step queries, OPEN_CON for in-memory queries, or SELECT for one-step queries.

In text queries, query expression feedback is especially useful for seeing how ConText expands expressions that contain stem, wildcard, thesaurus, fuzzy, soundex, PL/SQL, or SQE operators before you execute the query. Such queries can expand into many tokens or result in very large hitlists.

In theme queries, query expression feedback is useful for knowing how ConText uses the knowledge catalog to normalize query expressions.

Understanding ConText Parse Trees

Before ConText executes a query, it parses the expression. The resulting expression can be represented as a parse tree. A ConText parse tree can show:

order of execution (precedence of operators)
stem, fuzzy, thesaurus, soundex, PL/SQL, SQE, and wildcard expansions
theme query normalization
query optimization
stop-word transformations
breakdown of composite-word tokens (German)

The output table of the FEEDBACK procedure is a graphical representation of a ConText parse tree.

Operator Precedence

Parse trees are read in a depth-first manner and from left to right. This means the first operation is always furthest to the left and at the bottom of the branch. In this way, parse trees illustrate operator precedence.

The example above shows the parse tree for the evaluation of a AND b OR c, where a, b, and c stand for three arbitrary words. Since the AND operation a AND b is the leftmost operation and at the bottom of the tree, it is executed first. The parse tree therefore shows correctly that the AND operator has higher precedence than the OR operator. The resulting query is (a AND b) OR c rather than a AND (b OR c).

Query Expansions

The above example shows how ConText expands the query comp% OR ?smith. The parse tree shows that before ConText executes the query, the token comp% is expanded to computer and comptroller, while ?smith is expanded to smith and smythe.

ConText parse trees show similar expansions with thesaurus, wildcard, soundex, stem, SQE, and PL/SQL operators. In the case of the wildcard, soundex, and fuzzy operators, ConText obtains the correct word expansions from the index.

Note:

When you include the SQE operator in the feedback expression, the feedback (expansion of the stored query expression) is based on the current state of the index and will take into account any inserts, updates, or deletes made to the base table; however, unlike a call to CONTAINS, the stored query expression is not updated or refreshed as a result of the call to FEEDBACK.

Theme Query Normalization

You can use query expression feedback to know how ConText interprets theme queries. The feedback information provides the normalized version of the query as obtained from the knowledge catalog.

The example above shows how ConText normalizes the theme query ratified laws to the themes ratification and law. The resulting expression is an AND operation with weights attached to the normal forms: ratification*0.561 AND law*0.438.

Note:

Because numbers are rounded off when displayed, weights might not always add up to 1.000 exactly.

See Also:

For more information about theme queries, see Chapter 4, "Theme Queries".

Query Optimization

The example above shows how ConText optimizes the expression a AND b AND c, where a, b, and c stand for three different words.

In the first step of the parse, ConText evaluates a AND b, then ANDs the result with c. With such a parse tree, ConText must search for all documents that contain a and b, then search for all documents that contain c, and then intersect the two result sets.

The ConText optimizer recognizes that this query is executed more efficiently by searching simultaneously for all the documents that contain a, b, and c, which is illustrated in the second step of the optimizing process.

Stopword Rewrite

The example above shows the parse sequence for the stopword transformation:

non_stopword NOT stopword => non_stopword

Assuming "that" is a stopword, ConText reduces the query dog NOT that to dog.

See Also:

To learn more about querying with stopwords, see "Querying with Stopwords" in Chapter 3.

For a list of all possible stopword transformations, see Appendix D, "Stopword Transformations".

Decompounding of Composite Word Tokens

When using a composite index with German or Dutch text, you can use query feedback to examine how ConText breaks down a composite word query into its subcomposites. Even though ConText does not return documents that contain only subcomposite words in a query, composite word query feedback is useful for verifying where ConText places word boundaries.

The above example shows that ConText breaks down the German composite word Hauptbahnhof into haupt, bahn, bahnen, and hof.

Note:

To obtain composite word query feedback, the lexer of the policy must have its COMPOSITE attribute set to 1.

For more information about defining policies, see the Oracle8 ConText Cartridge Administrator's Guide.

Understanding the Feedback Table

Before you issue a query, you can obtain the parse tree information for the query expression. The procedure CTX_QUERY.FEEDBACK creates a graphical representation of the parse tree and stores this information in a feedback table, which you create before executing CTX_QUERY.FEEDBACK. To reconstruct ConText parse trees, you must understand the structure of this table.

Table Structure

The feedback table has the following structure:

Table 5-1


Column Name	Datatype	Description
FEEDBACK_ID	VARCHAR2(30)	The value of the feedback_id argument specified in the FEEDBACK call.
ID	NUMBER	A number assigned to each node in the query execution tree. The root operation node has ID = 1. The nodes are numbered in a top-down, left-first manner as they appear in the parse tree.
PARENT_ID	NUMBER	The ID of the execution step that operates on the output of the ID step. Graphically, this is the parent node in the query execution tree. The root operation node (ID = 1) has PARENT_ID = 0.
OPERATION	VARCHAR2(30)	Name of the internal operation performed. Refer to Table 5-2 for possible values.
OPTIONS	VARCHAR2(30)	Characters that describe a variation on the operation described in the OPERATION column. When an OPERATION has more than one OPTIONS associated with it, OPTIONS values are concatenated in the order of processing. See Table 5-3 for possible values.
OBJECT_NAME	VARCHAR2(64)	Section name, wildcard term, or term to look up in the index.
POSITION	NUMBER	The order of processing for nodes that all have the same PARENT_ID. The positions are numbered in ascending order starting at 1.
CARDINALITY	NUMBER	Reserved for future use. You should create this column for forward compatibility.

OPERATION Column

Table 5-2 lists the possible values for the OPERATION column in the feedback table:

Table 5-2


Operation Value	Query Operator	Equivalent Symbol
ACCUMULATE	ACCUM	,
AND	AND	&
COMPOSITE	(none)	(none)
EQUIVALENCE	EQUIV	=
FIRST_NEXT_DOC	#	#
MAX_DOC	:	:
MINUS	MINUS	-
NEAR	NEAR	;
NOT	NOT	~
NO_HITS	(no hits will result from this query)
OR	OR	|
PHRASE	(a phrase term)
SECTION	(section)
THRESHOLD	>	>
WEIGHT	*	*
WITHIN	within	(none)
WORD	(a single term)

OPTIONS Column

Table 5-3 shows the values for the OPTIONS column in the feedback table. When an OPERATION has more than one OPTIONS associated with it, the OPTIONS values are concatenated in the order of processing.

Table 5-3


Options Value	Description
($)	Stem
(?)	Fuzzy
(!)	Soundex
(T)	Order for ordered Near.
(F)	Order for unordered Near.
(n)	A number associated with Threshold, Weight, Max, or the max_span parameter for the Near operator.
(m-n)	First next range (m and n are integers)
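
Because OPTIONS values can be concatenated, an exact match on the column can miss nodes that carry more than one option. As a minimal sketch, assuming a populated feedback table named test_feedback (the name used in the examples later in this chapter), a LIKE pattern finds every node that carries the fuzzy option:

```sql
-- List every node whose OPTIONS value includes the fuzzy marker (?).
-- LIKE treats only % and _ as wildcards, so the parentheses and the
-- question mark are matched literally.
select id, parent_id, operation, options, object_name
from test_feedback
where options like '%(?)%'
order by id;
```

Substituting ($) or (!) in the pattern finds stem or soundex nodes in the same way.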

Example

The figure above shows how ConText encodes the parse tree for the query comp% OR ?smith, which asks for all documents that contain words beginning with comp or words that are spelled like smith.

Each node is labeled with a value that corresponds to the OPERATION column in the feedback table. The tree above contains one OR node, two EQUIVALENCE nodes, and four WORD nodes.

The ID and PARENT_ID values are listed beside each node. For example, the OR node has an ID of 1 and PARENT_ID of 0, since it is the root node.

The EQUIVALENCE node with ID = 2, PARENT_ID = 1, has an OBJECT_NAME value of COMP%, because this equivalence operation is a result of wildcard term comp%.

The WORD node with ID = 3 has an OBJECT_NAME value of computer, because in this instance, computer is one of the words that match comp%.
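
The relationship between ID, PARENT_ID, and POSITION can also be checked directly in SQL. The following sketch assumes that the feedback table is named test_feedback and was populated with a feedback_id of 'Test', as in the examples later in this chapter. It lists the children of the root node in processing order:

```sql
-- Children of the root node (ID = 1), in the order given by POSITION.
select id, operation, options, object_name, position
from test_feedback
where feedback_id = 'Test'
  and parent_id = 1
order by position;
```

Judging by the sample output shown later in this chapter, this query returns the two EQUIVALENCE nodes for the example query. Changing the parent_id value walks the tree one level at a time.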

Obtaining Query Expression Feedback

To obtain query expression feedback information, you must do the following:

Create the feedback table.
Execute CTX_QUERY.FEEDBACK.
Retrieve data from the feedback table.
Optionally, construct the expansion tree from the table information.

Creating the Feedback Table

For example, to create a feedback table called test_feedback, use the following SQL statement:

create table test_feedback(
         feedback_id varchar2(30),
         id number,
         parent_id number,
         operation varchar2(30),
         options varchar2(30),
         object_name varchar2(64),
         position number,
         cardinality number);

Executing CTX_QUERY.FEEDBACK

To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.FEEDBACK as follows:

begin
  ctx_query.feedback(
         policy_name => 'scott.test_policy',
         text_query => 'comp% OR ?smith',
         feedback_table => 'test_feedback',
         sharelevel => 0,
         feedback_id => 'Test');
end;
/

Retrieving Data from Feedback Table

To read the feedback table, you can select the columns as follows:

select feedback_id, id, parent_id, operation, options, object_name, position
from test_feedback
order by id;

The output is ordered by ID to simulate a hierarchical query:

FEEDBACK_ID   ID PARENT_ID OPERATION    OPTIONS OBJECT_NAME POSITION 
----------- ---- --------- ------------ ------- ----------- -------- 
Test           1         0 OR           NULL    NULL          1 
Test           2         1 EQUIVALENCE  NULL    COMP%         1
Test           3         2 WORD         NULL    COMPTROLLER   1 
Test           4         2 WORD         NULL    COMPUTER      2 
Test           5         1 EQUIVALENCE  (?)     SMITH         2 
Test           6         5 WORD         NULL    SMITH         1 
Test           7         5 WORD         NULL    SMYTHE        2
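
Because expansions can produce many tokens, it can help to count them before running the real query. The following is a sketch, not part of the documented interface. It uses only the feedback table columns described in this chapter and assumes that each expanded term appears as a WORD node directly under its EQUIVALENCE node, as in the sample output above:

```sql
-- Count the WORD nodes produced by each expansion (EQUIVALENCE) node.
-- The self-join pairs each parent row (p) with its child rows (c).
select p.id, p.options, p.object_name, count(*) as expansions
from test_feedback p, test_feedback c
where c.parent_id = p.id
  and c.feedback_id = p.feedback_id
  and p.feedback_id = 'Test'
  and p.operation = 'EQUIVALENCE'
  and c.operation = 'WORD'
group by p.id, p.options, p.object_name
order by p.id;
```

For the rows shown above, each of the two EQUIVALENCE nodes (COMP% and SMITH) has two expansions. A much larger count suggests that the query may be worth narrowing before it is executed.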

Constructing the Parse Tree

You can optionally construct an approximate graphical representation of the parse tree using a hierarchical query. This type of query outputs rows in a hierarchical manner, where child nodes are indented under parent nodes.

The following statement selects from a populated feedback table, indenting the output according to level:

select lpad(' ',2*(level-1)) || operation operation, options, object_name, 
position
from test_feedback
start with id = 1
connect by prior id = parent_id;

This statement produces hierarchical output for the query comp% OR ?smith as follows:

OPERATION            OPTIONS    OBJECT_NAME          POSITION 
-------------------- ---------- -------------------- -------
OR                   NULL       NULL                        1 
  EQUIVALENCE        NULL       COMP%                       1 
    WORD             NULL       COMPTROLLER                 1 
    WORD             NULL       COMPUTER                    2 
  EQUIVALENCE        (?)        SMITH                       2 
    WORD             NULL       SMITH                       1 
    WORD             NULL       SMYTHE                      2
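
The statement above starts from every row with ID = 1. If the feedback table holds rows from more than one FEEDBACK call (an assumption, since the sample table here holds only one), ID values repeat across calls and the trees would be mixed together. A hedged variant restricts both the root and the parent-child link to a single feedback_id:

```sql
-- Build the tree for one FEEDBACK call only.
-- 'Test' is the feedback_id used in this chapter's example.
select lpad(' ', 2*(level-1)) || operation operation, options, object_name,
       position
from test_feedback
start with id = 1 and feedback_id = 'Test'
connect by prior id = parent_id
       and prior feedback_id = feedback_id;
```

With only the 'Test' rows in the table, this returns the same tree as the statement above.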
