Oracle8
ConText Cartridge Application Developer's Guide
Release 2.4 A63821-01
This chapter introduces the ConText features you can use to build a query application. It describes a typical query application and then discusses the options ConText provides at each step.
Figure 1-1 illustrates a basic
design of a ConText query application. It shows the modules required
to let the user enter a query and view the results. Each module
represents a step in the querying process: rectangular boxes indicate
application tasks and round boxes indicate user tasks.
As shown, the process of issuing a query can be modeled according to the following steps:
Generally, query applications assume the following tasks have been performed:
Documents must be loaded in a text column before you can
index the document set and issue queries. You can store documents directly
in the text column or you can store a pointer to an external file or URL.
See Also: For more information about loading and storing text, see Oracle8 ConText Cartridge Administrator's Guide.
How you index your document set affects how the user of an application can issue queries. With ConText, you can create the following basic types of indexes for documents stored in a text column:
A text index lets you issue text queries against
the document set. A text query searches for words or phrases.
A theme index lets you issue theme queries against
a document set. A theme query searches for the main ideas in a document.
You create either type of index by specifying
a text lexer or a theme lexer when you create the index preference.
See Also: For more information about creating preferences and text and theme indexes, see Oracle8 ConText Cartridge Administrator's Guide.
The options you can give the user for issuing text
queries are determined by how you create the text index. Table
1-1 describes the more frequently used options and which index preference
to set to enable each option. The Reference column in Table
1-1 gives the name of the section in this book that describes the query
feature in detail.
Once an index is created with these options, the options
cannot be changed unless a new index is created.
Text Query Option | Description | Index Preference | Reference |
---|---|---|---|
Stemming | Enables searches for words with the same root as the specified term. | Wordlist | |
Soundex | Enables searches for words that sound like the specified term. | Wordlist | |
Fuzzy Matching | Enables searches for words that have similar spelling to the specified term. | Wordlist | |
Section Searching | Enables searches for terms within pre-defined document sections. | Wordlist | |
Base-letter Matching | Enables queries to match words with or without diacritical marks such as tildes, accents, and umlauts. For example, in Spanish with a base-letter index, a query on mañana matches both manana and mañana in the index. | Lexer | |
Case Sensitivity | Enables case-sensitive searches. | Lexer | |
Composite Word Queries (German and Dutch only) | Enables searching on words that contain the specified term as a sub-composite. | Lexer | "Composite Word Queries (German and Dutch only)" in Chapter 3 |
See Also: For more information about creating index preferences, see Oracle8 ConText Cartridge Administrator's Guide.
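The idea behind base-letter matching in the table of text query options can be sketched in a few lines of Python. This is only an illustration of the concept, not how the ConText lexer is implemented; the function names `base_letter` and `matches` are hypothetical.

```python
import unicodedata

def base_letter(token: str) -> str:
    """Strip diacritical marks from a token (illustrative only)."""
    # NFD splits each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query_term: str, indexed_token: str) -> bool:
    """Compare two tokens after reducing both to their base letters."""
    return base_letter(query_term) == base_letter(indexed_token)
```

With this sketch, `matches("mañana", "manana")` and `matches("mañana", "mañana")` both hold, which mirrors the behavior described for a base-letter index.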
The options discussed in the previous section entitled "Text
Indexing Options" are not supported for theme indexes. ConText has no options
for creating theme indexes.
This section provides an overview of the options you can
build into your application for user queries.
In ConText, a text query is a search for a word or phrase
in an indexed text column. ConText returns the documents (or rows) that
satisfy the query, along with a score that indicates how relevant each document
is to the query.
For example, a text query on the term unify returns
all documents that contain the word unify.
The simplest text query is one in which the application user
enters a single word or phrase and ConText returns all documents that contain
the word or phrase. More sophisticated queries can include operators to
do logical searches, section searches, and wildcard searches. All of ConText's
operators are available with text queries.
You can use the standard query methods to perform text queries,
namely one-step, two-step, and in-memory.
In addition to querying English-language documents by words
(text query), you can query these documents by theme, or by their main
concepts.
Theme queries resemble text queries in that you
must create an index (in this case, a theme index) for the documents before you can query. Theme
queries differ from text queries in that you need not provide exact wording
for searches. ConText interprets your query conceptually according to its
view of the world and returns an appropriate document hitlist based on
theme, along with a measure of how relevant each document is to the query.
For example, a theme query on unify returns documents
about the concept of unification or unifying.
You can use the standard query methods to perform theme queries,
namely one-step, two-step, and in-memory. In a theme query, you can use
some of the operators you use in regular text queries.
See Also: For more information about theme queries, see Chapter 4, "Theme Queries".
Operators in ConText enable you to issue a wide variety of
queries including logical AND/OR searches, NOT searches, near searches,
document section searches, term weighted searches, and expanded term searches.
You can embed these operators within your application or
pass them on to the user. When you embed them within the application, you
allow users to enter only query terms. The application can then intelligently
process entered terms by combining operators to get different results.
You can also pass on the functionality of operators to users.
You can do this by allowing users to enter ConText operators directly or
with an interface of pull-down menus and radio buttons. Allowing users
to enter operators gives users the ability to tailor their queries.
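As a rough illustration of embedding operators in the application, the following Python sketch turns plain user-entered terms into a query expression. It assumes that the keywords AND and OR are accepted in a query expression (see Chapter 3 for the actual operator syntax); the function name `build_expression` and its `mode` parameter are hypothetical.

```python
def build_expression(terms, mode="all"):
    """Combine user-entered terms into one query expression (illustrative only)."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    # Ignore empty entries and surrounding whitespace.
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one query term is required")
    # 'all' requires every term (AND); 'any' accepts any one of them (OR).
    operator = " AND " if mode == "all" else " OR "
    return operator.join(cleaned)
```

The user types only the terms; the application decides which operator to apply, for example by retrying with `mode="any"` when the AND form returns no hits.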
See Also: Some operators work only if the index is enabled for them. For a complete list of these operators, see the previous section entitled "Text Indexing Options". For more information about ConText operators, see Chapter 3, "Understanding Query Expressions".
ConText supports case-sensitivity in both text and theme
queries.
By default, ConText creates text indexes without being sensitive
to the case of tokens in the documents. Because of this, text queries are
case-insensitive. That is, a query on United returns documents that
contain United and UNITED and united.
However, you can make text queries case-sensitive by using
a case-sensitive lexer when you or your ConText administrator indexes the
document set. With a case-sensitive index, a query on United
is distinct from a query on united, which is distinct from a query on UNITED.
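The difference between the two kinds of text index can be pictured with a toy token index in Python. This is a sketch of the general idea only and says nothing about how ConText stores tokens; `build_index`, `lookup`, and the sample documents are hypothetical.

```python
def build_index(documents, case_sensitive=False):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in text.split():
            # A case-insensitive index folds every token to one case.
            key = token if case_sensitive else token.lower()
            index.setdefault(key, set()).add(doc_id)
    return index

def lookup(index, term, case_sensitive=False):
    """Look up a query term using the same folding rule as the index."""
    key = term if case_sensitive else term.lower()
    return index.get(key, set())

docs = {1: "United Nations", 2: "UNITED we stand", 3: "stay united"}
```

Against the case-insensitive index, a lookup of United finds all three sample documents; against the case-sensitive index, it finds only the first.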
See Also: For more information about issuing case-sensitive text queries, see "Case-Sensitive Queries" in Chapter 3, "Understanding Query Expressions". For more information about creating case-sensitive text indexes for columns, see Oracle8 ConText Cartridge Administrator's Guide.
Theme queries are case-sensitive. This means that a query
on Turkey returns hits on Turkey the country and not turkey
the bird.
Even though ConText theme queries are case-sensitive, ConText
tolerates poorly formatted input for known themes.
For example, entering microsoft or microSoft
returns documents that include the theme of Microsoft, a known company.
Likewise, entering Currency Rates returns documents that include
a theme of currency rates, a standard classification in business
and economics.
Section searching enables users to narrow text queries down to sections within documents. Sections can be one of the following:
Sentence or paragraph searching enables users to search for
combinations of words within sentences or paragraphs.
Searching within user-defined sections enables users to search
for a term within sections they have defined prior to creating a text index.
To do this type of section searching, you or your ConText administrator
must define sections by specifying what tags delimit the section.
User-defined section searching is useful when your documents
have internal structure, such as HTML documents.
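To make the idea of tag-delimited sections concrete, here is a small Python sketch that looks for a term only inside the text enclosed by a given tag. It is an illustration of the concept, not the ConText WITHIN operator; the function name `find_in_section` is hypothetical, and the sketch assumes simple tags without attributes.

```python
import re

def find_in_section(document: str, tag: str, term: str) -> bool:
    """Return True if term occurs inside a <tag>...</tag> section."""
    # Capture the text between the opening and closing tag.
    section = re.compile(
        r"<{0}>(.*?)</{0}>".format(re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    # Match the term as a whole word, ignoring case.
    word = re.compile(r"\b{}\b".format(re.escape(term)), re.IGNORECASE)
    return any(word.search(body) for body in section.findall(document))
```

For an HTML document, this lets a search for a term in the title section ignore occurrences of the same term in the body.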
Note: Section searching is supported for text queries only.
See Also: For more information about section searching, see the "WITHIN Operator" section in Chapter 3.
For both text and theme queries, your application interface
can give the user the option of querying on structured fields such as
date or document author.
You can issue structured searches with one-step, two-step
and in-memory queries and subsequently present the structured information
related to each document in the hitlist.
See Also: For more information about issuing structured queries, see "Using Two-Step Queries" and "Using In-Memory Queries" in Chapter 2.
You can design your query interface to allow users to enter
ConText operators, either by allowing the user to enter operators directly
or by using a more sophisticated interface in which the user can choose
operators from a pull-down menu or radio button. In either case, your application
can refine the query expression further by adding operators or adding or
removing special words or symbols to achieve different results.
See Also: For more information about ConText operators, see Chapter 3, "Understanding Query Expressions".
After the user enters the query, you can either present expression
feedback or execute the query. See Figure 1-1.
Expression feedback allows the user to view how ConText executes
the query. Feedback is useful for understanding how ConText expands theme
queries as well as how it expands stem, fuzzy, thesaurus, soundex, or wildcard
text queries. By providing this additional information, query expression
feedback helps users refine queries that might return an unwanted result
set.
If the user requires feedback, the application presents the
expression feedback and gives the user the option of re-entering a refined
query. See Figure 1-1.
Your application can also present expression feedback after
executing the query, when you present the hitlist. See Figure
1-1.
See Also: For more information about query expression feedback, see Chapter 5, "Query Expression Feedback".
In a PL/SQL application, you can issue a two-step query or
an in-memory query, depending on your requirements. You can also count
the number of hits in a query.
A third type of query, the one-step query, is discussed in
this section for completeness, even though one-step queries cannot be used
in PL/SQL applications.
Two-step queries use the PL/SQL CONTAINS procedure in the
first step to store the results in a specified result table. The second
step uses a SELECT statement to select the results from the result table.
In the SELECT statement, you can join the result table with the original
text table to return more detailed document information.
Because two-step queries use tables to store the hits, they
are best suited for applications that require all the results of a query.
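The two-step pattern (fill a result table, then select from it with a join) can be imitated with Python's built-in sqlite3 module. This sketch does not call ConText: `step_one` is a hypothetical stand-in for the CONTAINS procedure, the table and column names are invented for the example, and the score is a simple occurrence count rather than a ConText relevance score.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (textkey INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE results (textkey INTEGER, score INTEGER);
    INSERT INTO docs VALUES (1, 'Merger news', 'The two firms unify operations');
    INSERT INTO docs VALUES (2, 'Weather', 'Rain is expected');
    INSERT INTO docs VALUES (3, 'Politics', 'Parties unify, then unify again');
""")

def step_one(term):
    """Step 1 (stand-in for CONTAINS): store the hits in the result table."""
    conn.execute("DELETE FROM results")
    rows = conn.execute("SELECT textkey, body FROM docs").fetchall()
    for textkey, body in rows:
        # Toy score: number of times the term occurs as a whole word.
        hits = re.findall(r"\w+", body.lower()).count(term.lower())
        if hits:
            conn.execute("INSERT INTO results VALUES (?, ?)", (textkey, hits))

def step_two():
    """Step 2: join the result table with the text table for document details."""
    return conn.execute("""
        SELECT d.textkey, d.title, r.score
        FROM results r JOIN docs d ON d.textkey = r.textkey
        ORDER BY r.score DESC
    """).fetchall()
```

Calling `step_one("unify")` followed by `step_two()` yields the titles of the matching documents alongside their scores, most relevant first.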
See Also: For more information about using two-step queries, see "Using Two-Step Queries" in Chapter 2.
In-memory queries use a cursor to return query results, rather
than the result tables used in two-step and one-step queries.
In an in-memory query, you open a cursor and issue the query.
ConText writes the results of the query to the cursor. You fetch the results
one row at a time, then close the cursor. Results can be returned unordered
or sorted by score.
Because in-memory queries store results in memory, they generally
return hits faster than two-step queries for large hitlists, since you
need not retrieve all hits at once. As such, in-memory queries are best
suited for applications that might return large hitlists but require only
a small portion of the hits at a time.
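The open, fetch, and close cycle described above maps naturally onto an iterator. The Python sketch below shows only the access pattern and does not use the CTX_QUERY package; `open_cursor`, `fetch_page`, and the sample hits are hypothetical.

```python
from itertools import islice

def open_cursor(hits, sort_by_score=True):
    """Return an iterator over (textkey, score) hits, standing in for a query cursor."""
    if sort_by_score:
        # Highest score first, as when results are requested sorted by score.
        hits = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return iter(list(hits))

def fetch_page(cursor, page_size):
    """Fetch up to page_size hits; an empty list means the cursor is exhausted."""
    return list(islice(cursor, page_size))

hits = [(101, 40), (102, 95), (103, 70), (104, 10)]
```

An application that shows ten hits per screen would call `fetch_page(cursor, 10)` only when the user asks for the next screen, so the remaining hits are never touched if the user stops early.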
See Also: For more information about using in-memory queries, see "Using In-Memory Queries" in Chapter 2.
In a one-step query, you create a single SQL SELECT statement
with a WHERE... CONTAINS clause to search for relevant documents. ConText
returns the rows and columns of the text table that satisfy the query.
Because PL/SQL does not recognize the CONTAINS function in
the SELECT statement, one-step queries are limited to interactive or ad-hoc
queries in SQL*Plus.
See Also: For more information about using one-step queries, see "Using One-Step Queries" in Chapter 2.
In addition to fully executing two-step, one-step, and in-memory
queries, you can count the number of hits in a two-step or in-memory query
before or after you issue the query. Counting query hits helps to analyze
queries to ensure large and unmanageable hitlists are not returned.
See Also: For more information about counting query hits, see "Counting Query Hits" in Chapter 2.
Your application presents a hitlist in one or more of the following ways:
Structured columns related to the text column can help identify
documents. When you present the hitlist, you can show related columns such
as document title or author, or any other combination of fields that identifies
the document.
In a two-step query, you can obtain the structured fields
by joining the result table with the base table.
In an in-memory query, you must specify what structured column
or columns to fetch into the cursor along with the textkey.
In a one-step query, you specify the name of the structured column
or columns in the SELECT statement.
When you issue either a text query or theme query, ConText
returns the hitlist of documents that satisfy the query with a relevance
score for each document returned. You can present these scores when you
return the hitlist to the user.
The score for each document is between one and one hundred
and indicates how relevant the document is to the query entered; the higher
the score, the more relevant the document. You can use scores to order
the hitlist to show the most relevant documents first.
In two-step queries, ConText calculates the score when you
call the CTX_QUERY.CONTAINS procedure. This
procedure stores the score in the result table.
In in-memory queries, ConText returns the score for a hit
as an out parameter with the CTX_QUERY.FETCH_HIT
function.
In one-step queries, ConText calculates scores when you use
the CONTAINS function. You obtain scores using
the SCORE function.
See Also: For more information about manipulating a result set, see "Result-Set Operators" in Chapter 3. For more information about how ConText scores text queries, see Appendix B, "Scoring Algorithm". For more information about scoring for theme queries, see "Theme Querying" in Chapter 4.
You can present the number of hits the query returned alongside
the hitlist by using CTX_QUERY.COUNT_LAST, which returns the number of hits
in the last two-step or in-memory query.
However, when the number is all that is required, you can
use CTX_QUERY.COUNT_HITS, which is more efficient than executing the two-step
or in-memory query and then counting the hits.
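One way an application might use a hit count is to check the size of the hitlist before running the full query. The Python sketch below shows that control flow only; `count_hits` and `execute` are hypothetical callables standing in for CTX_QUERY.COUNT_HITS and for whichever query method the application uses, and `max_hits` is an arbitrary threshold chosen for the example.

```python
def run_query(count_hits, execute, expression, max_hits=1000):
    """Count first; execute only if the hitlist is a manageable size."""
    total = count_hits(expression)
    if total > max_hits:
        # Too many hits: return the count so the user can refine the query.
        return total, None
    return total, execute(expression)
```

When the second element of the returned pair is `None`, the application can report the count and prompt the user for a narrower query instead of presenting a hitlist.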
You can accompany a query hitlist with expression feedback.
Using feedback in this way gives the user an opportunity to see the expanded
query alongside the results of the query.
When you present your hitlist with expression feedback, you
can give the user the option of selecting a document, or of refining and
then re-entering another query if the user is not satisfied with the results
in the hitlist.
See Also: For more information about query expression feedback, see Chapter 5, "Query Expression Feedback".
If presenting a hitlist is not enough information, you can
present a Gist for every document in the hitlist. A Gist is essentially
a document summary. However, the generation of a Gist requires an extra
processing step and is available for English only.
See Also: For more information about generating Gists and other CTX_LING output, see Chapter 8, "Using CTX_LING".
When your application obtains the results of a query, it can let the user select a document from the hitlist and then present one or more of the following ConText document services:
ConText enables you to present documents to the user with
query terms highlighted for text queries, or with the relevant paragraphs
highlighted for theme queries. You can do highlighting in PL/SQL as well
as with the ConText viewers for Windows 32-bit and world wide web applications.
With PL/SQL, you create the viewable output by calling the
highlighting procedure, CTX_QUERY.HIGHLIGHT, usually after you issue the
query. You can use this procedure to highlight documents stored as plain
text or documents stored in formats such as Microsoft Word.
With the highlighting procedure, you can obtain the document
plain-text, document plain-text with highlights, or the document in its
native format without highlights. This procedure outputs to result tables,
which you use to present the document. The highlighting procedure works
for text and theme queries (see Figure 1-2).
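For plain text, the effect of presenting a document with query terms highlighted can be approximated in Python as follows. This is not CTX_QUERY.HIGHLIGHT and does not use result tables; the function name `highlight` and the `<<` and `>>` markers are hypothetical choices for the example.

```python
import re

def highlight(text, terms, start="<<", end=">>"):
    """Wrap each whole-word occurrence of a query term in marker strings."""
    if not terms:
        return text
    # Build one alternation so each occurrence is wrapped exactly once.
    alternation = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: start + match.group(0) + end, text)
```

An application would replace the markers with whatever its display layer supports, such as bold text or a background color.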
See Also: For more information about presenting highlighted documents, see Chapter 6, "Document Presentation: Highlighting".
ConText provides a custom control that you can embed programmatically
in 32-bit Windows client-side applications. This custom control allows
users to query documents and then view them in their native formats (WYSIWYG),
such as Microsoft Word, with query terms or paragraphs highlighted. See
Figure 1-2.
You can use the ConText custom control to view documents in the following server-side supported formats:
For World Wide Web applications that use the Oracle Web Application Server, you can present documents in a Windows 32-bit environment using one of the following:
Both these configurations require that the ConText viewer
cartridge be installed on the Oracle Web Application Server.
See Also: For more information about highlighting with ConText viewers, see the Oracle8 ConText Cartridge Workbench User's Guide.
For English-language documents, the CTX_LING PL/SQL package
enables you to create different document summaries and lists of themes
on a per-document basis. These summaries and lists of
themes are shorter than the documents themselves and can help application
users quickly view the essential content of documents.
ConText can generate the following forms of CTX_LING output
on a per document basis:
You obtain linguistic output by submitting a linguistic request
using the CTX_LING PL/SQL package.
See Also: For more information about generating CTX_LING output, see Chapter 8, "Using CTX_LING".