Oracle8
ConText Cartridge Application Developer's Guide
Release 2.4 A63821-01
This chapter describes query expression feedback. The following topics are covered:
Query expression feedback is a feature that enables you to
know how ConText parses a text or theme query expression before
you execute the query. Knowing how ConText evaluates a text or theme query
expression is useful for refining and debugging queries. You can also design
your application so that it uses the feedback information to help users
write better queries.
The diagram above shows how you use query expression feedback.
You execute the PL/SQL procedure CTX_QUERY.FEEDBACK,
which generates and stores feedback information to a table. From the data
in this feedback table, you can visualize the ConText parse tree to examine
how the expression was expanded and parsed. You can then refine the query
and re-execute FEEDBACK, or you can execute the real query with CONTAINS
for two-step queries, OPEN_CON for in-memory queries, or SELECT for one-step
queries.
In text queries, query expression feedback is especially
useful for knowing how ConText expands expressions that contain stem, wildcard,
thesaurus, fuzzy, soundex, PL/SQL, or SQE operators before you execute
the query. This is because such queries can potentially expand into many
tokens or result in very large hitlists.
In theme queries, query expression feedback is useful for
knowing how ConText uses the knowledge catalog to normalize query expressions.
Before ConText executes a query, it parses the expression. The resulting expression can be represented as a parse tree. A ConText parse tree can show:
The output table of the FEEDBACK
procedure is a graphical representation of a ConText parse tree.
Parse trees are read in a depth-first manner and from left
to right. This means the first operation is always furthest to the left
and at the bottom of the branch. In this way, parse trees illustrate operator
precedence.
The example above shows the parse tree for the evaluation
of a AND b OR c, where a, b and c stand for three
arbitrary words. Because the AND operation a AND b is the leftmost
operation and at the bottom of the tree, it is executed first. In this
way, the parse tree above correctly indicates that the AND operator
has higher precedence than the OR operator. The resulting query
is therefore (a AND b) OR c rather than a AND (b OR c).
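The reading order described above can be illustrated outside the database. The following is a minimal Python sketch, not ConText code: it assumes a parse tree modeled as nested tuples (a hypothetical representation chosen for illustration) and lists the operations in depth-first, left-to-right order.

```python
# A node is either a word (str) or a tuple (operator, left, right).
# This tuple layout is an assumption for illustration; ConText stores
# parse trees in the feedback table, not in this form.
tree = ("OR", ("AND", "a", "b"), "c")


def render(node):
    """Render a subtree as text, parenthesizing nested operations."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({render(left)} {op} {render(right)})"
    return node


def evaluation_order(node, steps=None):
    """Collect operations depth-first and left to right (post-order)."""
    if steps is None:
        steps = []
    if isinstance(node, tuple):
        op, left, right = node
        evaluation_order(left, steps)
        evaluation_order(right, steps)
        steps.append(f"{render(left)} {op} {render(right)}")
    return steps


order = evaluation_order(tree)
print(order)  # ['a AND b', '(a AND b) OR c']
```

The operation at the bottom left of the tree, a AND b, appears first in the list, which matches the precedence the parse tree expresses.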
The above example shows how ConText expands the query comp%
OR ?smith. The parse tree shows that before ConText executes the query,
the token comp% is expanded to computer and comptroller,
while ?smith is expanded to smith and smythe.
ConText parse trees show similar expansions with thesaurus,
wildcard, soundex, stem, SQE, and PL/SQL operators. In the case of the
wildcard, soundex, and fuzzy operators, ConText obtains the correct word
expansions from the index.
Note: When you include the SQE operator in the feedback expression, the feedback (expansion of the stored query expression) is based on the current state of the index and will take into account any inserts, updates, or deletes made to the base table; however, unlike a call to CONTAINS, the stored query expression is not updated or refreshed as a result of the call to FEEDBACK.
You can use query expression feedback to know how ConText
interprets theme queries. The feedback information provides the normalized
version of the query as obtained from the knowledge catalog.
The example above shows how ConText normalizes the theme
query ratified laws to the themes ratification and law.
The resulting expression is an AND operation with weights attached to the
normal forms: ratification*0.561 AND law*0.438.
Note: Because numbers are rounded off when displayed, weights might not always add up to 1.000 exactly.
See Also: For more information about theme queries, see Chapter 4, "Theme Queries".
The example above shows how ConText optimizes the expression
a AND b AND c, where a, b, and c stand for
three different words.
In the first step of the parse, ConText evaluates a AND
b, then ANDs the result with c. With such a parse tree, ConText
must search for all documents that contain a and b, then
search for all documents that contain c, and then intersect the
two result sets.
The ConText optimizer realizes this query is more efficiently
executed by simultaneously searching for all the documents that contain
a and b and c, which is illustrated in the second
step of the optimizing process.
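The effect of this optimization on the shape of the tree can be sketched as follows. This is an illustrative Python model only, assuming the same nested-tuple representation of a parse tree (hypothetical, not a ConText structure). It merges nested AND nodes into a single AND node with three children. The source describes this rewrite only for AND, so the sketch leaves other operators untouched.

```python
def flatten_and(node):
    """Merge nested AND nodes into one n-ary AND node.

    A node is a word (str) or a tuple (operator, child, child, ...).
    Only AND is merged, because that is the only case described here.
    """
    if isinstance(node, str):
        return node
    op, *children = node
    merged = []
    for child in map(flatten_and, children):
        if op == "AND" and isinstance(child, tuple) and child[0] == "AND":
            merged.extend(child[1:])  # lift grandchildren up one level
        else:
            merged.append(child)
    return (op, *merged)


before = ("AND", ("AND", "a", "b"), "c")  # first step: (a AND b) AND c
after = flatten_and(before)               # second step: AND over a, b, c
print(after)  # ('AND', 'a', 'b', 'c')
```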
The example above shows the parse sequence for a stopword transformation.
Assuming the word that is a stopword, ConText reduces the query
dog NOT that to dog.
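The single reduction described above can be modeled in a few lines. The following Python sketch is an illustration under stated assumptions: the one-word stoplist and the nested-tuple tree are hypothetical, and only the NOT rule from this example is implemented. The other transformations listed in Appendix D are not modeled.

```python
STOPWORDS = {"that"}  # assumption: a one-word stoplist, as in the example


def drop_stopword_not(node):
    """Apply one rule only: `x NOT stopword` reduces to `x`."""
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = drop_stopword_not(left), drop_stopword_not(right)
    if op == "NOT" and isinstance(right, str) and right.lower() in STOPWORDS:
        return left
    return (op, left, right)


reduced = drop_stopword_not(("NOT", "dog", "that"))
print(reduced)  # dog
```

A query whose right operand is not a stopword, such as dog NOT cat, passes through this function unchanged.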
See Also: To learn more about querying with stopwords, see "Querying with Stopwords" in Chapter 3. For a list of all possible stopword transformations, see Appendix D, "Stopword Transformations".
When using a composite index with German or Dutch text, you
can use query feedback to examine how ConText breaks down a composite word
query into its subcomposites. Even though ConText does not return documents
that contain only subcomposite words in a query, composite word query feedback
is useful for verifying where ConText places word boundaries.
The above example shows that ConText breaks down the German
composite word Hauptbahnhof into haupt, bahn, bahnen,
and hof.
Note: To obtain composite word query feedback, the COMPOSITE attribute of the policy's lexer must be set to 1. For more information about defining policies, see the Oracle8 ConText Cartridge Administrator's Guide.
Before you issue a query, you can obtain the parse tree information
for the query expression. The procedure CTX_QUERY.FEEDBACK
creates a graphical representation of the parse tree and stores this information
in a feedback table, which you create before executing CTX_QUERY.FEEDBACK.
To reconstruct ConText parse trees, you must understand the structure of
this table.
The feedback table has the following structure:
Column Name | Datatype | Description |
---|---|---|
FEEDBACK_ID | VARCHAR2(30) | The value of the feedback_id argument specified in the FEEDBACK call. |
ID | NUMBER | A number assigned to each node in the query execution tree. The root operation node has ID = 1. The nodes are numbered in a top-down, left-first manner as they appear in the parse tree. |
PARENT_ID | NUMBER | The ID of the execution step that operates on the output of the ID step. Graphically, this is the parent node in the query execution tree. The root operation node (ID = 1) has PARENT_ID = 0. |
OPERATION | VARCHAR2(30) | Name of the internal operation performed. Refer to Table 5-2 for possible values. |
OPTIONS | VARCHAR2(30) | Characters that describe a variation on the operation described in the OPERATION column. When an OPERATION has more than one OPTIONS associated with it, OPTIONS values are concatenated in the order of processing. See Table 5-3 for possible values. |
OBJECT_NAME | VARCHAR2(64) | Section name, or wildcard term, or term to lookup in the index. |
POSITION | NUMBER | The order of processing for nodes that all have the same PARENT_ID. The positions are numbered in ascending order starting at 1. |
CARDINALITY | NUMBER | Reserved for future use. You should create this column for forward compatibility. |
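The column descriptions above imply a few rules that any set of feedback rows should satisfy: one root with ID = 1 and PARENT_ID = 0, every other PARENT_ID pointing at an existing ID, and POSITION values running 1, 2, 3, and so on among siblings. The following Python sketch checks those rules on rows already fetched into the client. The tuple layout and the function name check_rows are assumptions for illustration. The sample rows are the ones shown later in this chapter for comp% OR ?smith, with NULL represented as None.

```python
from collections import defaultdict

# (id, parent_id, operation, options, object_name, position)
ROWS = [
    (1, 0, "OR", None, None, 1),
    (2, 1, "EQUIVALENCE", None, "COMP%", 1),
    (3, 2, "WORD", None, "COMPTROLLER", 1),
    (4, 2, "WORD", None, "COMPUTER", 2),
    (5, 1, "EQUIVALENCE", "(?)", "SMITH", 2),
    (6, 5, "WORD", None, "SMITH", 1),
    (7, 5, "WORD", None, "SMYTHE", 2),
]


def check_rows(rows):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    ids = {row[0] for row in rows}
    root_ids = [row[0] for row in rows if row[1] == 0]
    if root_ids != [1]:
        problems.append("expected exactly one root with ID = 1 and PARENT_ID = 0")
    positions = defaultdict(list)
    for id_, parent_id, *_middle, position in rows:
        if parent_id != 0 and parent_id not in ids:
            problems.append(f"node {id_} has unknown parent {parent_id}")
        positions[parent_id].append(position)
    for parent_id, seen in positions.items():
        if sorted(seen) != list(range(1, len(seen) + 1)):
            problems.append(f"positions under parent {parent_id} are not 1..n")
    return problems


print(check_rows(ROWS))  # []
```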
Table 5-2 lists the possible values for the OPERATION column in the feedback table:
Table 5-3 shows the values for
the OPTIONS column in the feedback table. When an OPERATION has more than
one OPTIONS associated with it, the OPTIONS values are concatenated in
the order of processing.
The figure above shows how ConText encodes the parse tree
for the query comp% OR ?smith, which asks for all documents
that contain words beginning with comp or words that are
spelled like smith.
Each node is labeled with a value that corresponds to the
OPERATION column in the feedback table. The tree above contains one OR
node, two EQUIVALENCE nodes, and four WORD nodes.
The ID and PARENT_ID values are listed beside each node.
For example, the OR node has an ID of 1 and PARENT_ID of 0, since it is
the root node.
The EQUIVALENCE node with ID = 2 and PARENT_ID = 1 has an OBJECT_NAME
value of COMP%, because this equivalence operation results from the
wildcard term comp%.
The WORD node with ID = 3 has an OBJECT_NAME value
of computer, because in this instance, computer is one of
the words that satisfy comp%.
To obtain query expression feedback information, you must do the following:
To create a feedback table called test_feedback, for example, use the following SQL statement:

    create table test_feedback(
      feedback_id varchar2(30),
      id          number,
      parent_id   number,
      operation   varchar2(30),
      options     varchar2(30),
      object_name varchar2(64),
      position    number,
      cardinality number);
To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.FEEDBACK as follows:

    begin
      ctx_query.feedback(
        policy_name    => 'scott.test_policy',
        text_query     => 'comp% OR ?smith',
        feedback_table => 'test_feedback',
        sharelevel     => 0,
        feedback_id    => 'Test');
    end;
    /

To read the feedback table, you can select the columns as follows:

    select feedback_id, id, parent_id, operation, options, object_name, position
      from test_feedback
     order by id;

The output is ordered by ID to simulate a hierarchical query:

    FEEDBACK_ID   ID PARENT_ID OPERATION    OPTIONS OBJECT_NAME POSITION
    ----------- ---- --------- ------------ ------- ----------- --------
    Test           1         0 OR           NULL    NULL               1
    Test           2         1 EQUIVALENCE  NULL    COMP%              1
    Test           3         2 WORD         NULL    COMPTROLLER        1
    Test           4         2 WORD         NULL    COMPUTER           2
    Test           5         1 EQUIVALENCE  (?)     SMITH              2
    Test           6         5 WORD         NULL    SMITH              1
    Test           7         5 WORD         NULL    SMYTHE             2

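Once the rows are fetched into an application, they can be summarized to show how far each term expanded, which is the information you need before running a query that might produce a very large hitlist. The following Python sketch is one possible approach, not part of ConText. It assumes the rows above have been copied into tuples (NULL as None), counts the nodes by OPERATION, and groups each WORD node under the EQUIVALENCE term that produced it.

```python
from collections import Counter, defaultdict

# (id, parent_id, operation, options, object_name, position),
# copied from the sample output above.
ROWS = [
    (1, 0, "OR", None, None, 1),
    (2, 1, "EQUIVALENCE", None, "COMP%", 1),
    (3, 2, "WORD", None, "COMPTROLLER", 1),
    (4, 2, "WORD", None, "COMPUTER", 2),
    (5, 1, "EQUIVALENCE", "(?)", "SMITH", 2),
    (6, 5, "WORD", None, "SMITH", 1),
    (7, 5, "WORD", None, "SMYTHE", 2),
]

by_id = {row[0]: row for row in ROWS}

# How many nodes of each operation type the parse tree contains.
counts = Counter(row[2] for row in ROWS)

# Expanded words grouped under the term that produced them.
expansions = defaultdict(list)
for _id, parent_id, operation, _options, object_name, _position in ROWS:
    if operation == "WORD" and by_id[parent_id][2] == "EQUIVALENCE":
        expansions[by_id[parent_id][4]].append(object_name)

print(dict(counts))      # {'OR': 1, 'EQUIVALENCE': 2, 'WORD': 4}
print(dict(expansions))  # {'COMP%': ['COMPTROLLER', 'COMPUTER'], 'SMITH': ['SMITH', 'SMYTHE']}
```

An application could compare the length of each list against a limit of its own choosing and ask the user to narrow the expression before issuing the real query.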
You can optionally construct an approximate graphical representation
of the parse tree using a hierarchical query. This type of query outputs
rows in a hierarchical manner, where children nodes are indented under
parent nodes.
The following statement selects from a populated feedback table, indenting the output according to level:

    select lpad(' ',2*(level-1)) || operation operation,
           options, object_name, position
      from test_feedback
     start with id = 1
   connect by prior id = parent_id;

This statement produces hierarchical output for the query comp% OR ?smith as follows:

    OPERATION            OPTIONS    OBJECT_NAME          POSITION
    -------------------- ---------- -------------------- -------
    OR                   NULL       NULL                       1
      EQUIVALENCE        NULL       COMP%                      1
        WORD             NULL       COMPTROLLER                1
        WORD             NULL       COMPUTER                   2
      EQUIVALENCE        (?)        SMITH                      2
        WORD             NULL       SMITH                      1
        WORD             NULL       SMYTHE                     2

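An application that has already fetched the feedback rows can produce the same indented view on the client side. The following Python sketch is an illustration only. It assumes the rows are held as tuples (NULL as None), walks the tree from the root, orders siblings by POSITION, and indents each node by two spaces per level, mirroring the lpad expression in the statement above. The function name render_tree is hypothetical.

```python
from collections import defaultdict

# (id, parent_id, operation, options, object_name, position)
ROWS = [
    (1, 0, "OR", None, None, 1),
    (2, 1, "EQUIVALENCE", None, "COMP%", 1),
    (3, 2, "WORD", None, "COMPTROLLER", 1),
    (4, 2, "WORD", None, "COMPUTER", 2),
    (5, 1, "EQUIVALENCE", "(?)", "SMITH", 2),
    (6, 5, "WORD", None, "SMITH", 1),
    (7, 5, "WORD", None, "SMYTHE", 2),
]


def render_tree(rows):
    """Return one indented text line per node, parents before children."""
    children = defaultdict(list)
    for row in rows:
        children[row[1]].append(row)

    lines = []

    def walk(parent_id, level):
        # Siblings are visited in POSITION order.
        for node in sorted(children[parent_id], key=lambda r: r[5]):
            id_, _parent, operation, options, object_name, position = node
            indent = " " * (2 * (level - 1))
            lines.append(
                f"{indent}{operation} {options or 'NULL'} "
                f"{object_name or 'NULL'} {position}"
            )
            walk(id_, level + 1)

    walk(0, 1)  # the root node has PARENT_ID = 0
    return lines


lines = render_tree(ROWS)
print("\n".join(lines))
```

Sorting siblings by POSITION makes the order of the output explicit rather than dependent on the order in which the rows were fetched.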