1 Building a Query Application

This chapter introduces the ConText features you can use to build a query application. It describes a typical query application, then discusses the options ConText provides at each step.

Overview

Figure 1-1

Figure 1-1 illustrates the basic design of a ConText query application. It shows the modules required to let the user enter a query and then view the results. Each module represents a step in the querying process: rectangular boxes indicate application tasks and round boxes indicate user tasks.

As shown, the process of issuing a query can be modeled according to the following steps:

user enters query
application re-writes query (optional)
application presents expression feedback (optional)
user refines query expression (optional)
application executes query
application presents hitlist
user selects from hitlist
application presents document
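
The steps above, up to executing the query, can be sketched as a simple control flow. This is a minimal illustration in Python, not ConText code: every name in it (run_query_session, toy_execute, and the rewrite, feedback, and refine callbacks) is hypothetical and stands in for whatever your application and ConText do at each step.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, int]  # (document key, score); a hypothetical shape


def run_query_session(
    user_query: str,
    execute: Callable[[str], List[Hit]],
    rewrite: Optional[Callable[[str], str]] = None,
    feedback: Optional[Callable[[str], str]] = None,
    refine: Optional[Callable[[str, str], str]] = None,
) -> List[Hit]:
    """Model the query steps: rewrite, feedback, refine (all optional), then execute."""
    expression = user_query
    if rewrite is not None:          # application re-writes query
        expression = rewrite(expression)
    if feedback is not None:         # application presents expression feedback
        shown = feedback(expression)
        if refine is not None:       # user refines query expression
            expression = refine(expression, shown)
    return execute(expression)       # application executes query


# Toy stand-ins for the document set and the execution step.
docs = {"d1": "efforts to unify the market", "d2": "weather report"}


def toy_execute(expr: str) -> List[Hit]:
    return [(key, 100) for key, text in docs.items() if expr.lower() in text]


hitlist = run_query_session("  Unify ", toy_execute, rewrite=str.strip)
print(hitlist)  # [('d1', 100)]
```

Keeping the optional steps as separate callbacks mirrors the figure: an application can leave out rewriting or feedback without changing how the query is executed.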

Prerequisites

Generally, query applications assume the following tasks have been performed:

text is loaded in the database
text is indexed

Loading Text

Documents must be loaded in a text column before you can index the document set and issue queries. You can store documents directly in the text column or you can store a pointer to an external file or URL.

See Also:

For more information about loading and storing text, see Oracle8 ConText Cartridge Administrator's Guide.

Creating an Index

How you index your document set affects how the user of an application can issue queries. With ConText, you can create the following basic types of indexes for documents stored in a text column:

text index
theme index

A text index allows you to issue text queries against the document set. A text query is a search on words or phrases.

A theme index allows you to issue theme queries against the document set. A theme query is a search on the main ideas in a document.

You can create either type of index by specifying either a text or theme lexer when you create the index preference.

See Also:

For more information about creating preferences and text and theme indexes, see Oracle8 ConText Cartridge Administrator's Guide.

Text Indexing Options

The options you can give the user for issuing text queries are determined by how you create the text index. Table 1-1 describes the more frequently used options and which index preference to set to enable each option. The Reference column in Table 1-1 gives the name of the section in this book that describes the query feature in detail.

Once an index is created with these options, the options cannot be changed unless a new index is created.

Table 1-1


Text Query Option	Description	Index Preference	Reference
Stemming	Enables searches for words with same root as specified term.	Wordlist	"Stem Expansions" in Chapter 3
Soundex	Enables searches for words that sound like specified term.	Wordlist	"Soundex Expansions" in Chapter 3
Fuzzy Matching	Enables searches for words that have similar spelling to specified term.	Wordlist	"Fuzzy Expansions" in Chapter 3
Section Searching	Enables searches for terms within pre-defined document sections.	Wordlist	"WITHIN Operator" in Chapter 3
Base-letter Matching	Queries match words with or without diacritical marks such as tildes, accents, and umlauts. For example, in Spanish with a base-letter index, a query of mañana matches both manana and mañana in the index.	Lexer	"Base-Letter Queries" in Chapter 3
Case Sensitivity	Enables case-sensitive searches.	Lexer	"Case-Sensitive Queries" in Chapter 3
Composite word query (German and Dutch only)	Enables searching on words that contain specified term as sub-composite.	Lexer	"Composite Word Queries (German and Dutch only)" in Chapter 3
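
The base-letter row in Table 1-1 can be illustrated with a small sketch. The Python code below is not how ConText implements base-letter indexing; it only models the idea that tokens are reduced to their base letters before they are compared. The names base_letter and build_base_letter_index are hypothetical.

```python
import unicodedata
from collections import defaultdict


def base_letter(token: str) -> str:
    """Strip combining marks, so that 'ma\u00f1ana' (mañana) becomes 'manana'."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_base_letter_index(tokens):
    """Map each base-letter form to the original tokens that reduce to it."""
    index = defaultdict(set)
    for token in tokens:
        index[base_letter(token)].add(token)
    return index


# "ma\u00f1ana" is mañana and "monta\u00f1a" is montaña, written with escapes.
index = build_base_letter_index(["ma\u00f1ana", "manana", "monta\u00f1a"])

# A query on mañana is reduced the same way, so it finds both spellings.
matches = index[base_letter("ma\u00f1ana")]
print(sorted(matches))
```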

See Also:

For more information about creating index preferences, see Oracle8 ConText Cartridge Administrator's Guide.

Theme Indexing Options

The options discussed in the previous section entitled "Text Indexing Options" are not supported for theme indexes. ConText has no options for creating theme indexes.

Entering the Query

This section provides an overview of the options you can build into your application for user queries.

Text Queries

In ConText, a text query is a search for a word or phrase in an indexed text column. ConText returns the documents (or rows) that satisfy the query, along with a score that indicates how relevant each document is to the query.

For example, a text query on the term unify returns all documents that contain the word unify.

The simplest text query is one in which the application user enters a single word or phrase and ConText returns all documents that contain the word or phrase. More sophisticated queries can include operators to do logical searches, section searches, and wildcard searches. All of ConText's operators are available with text queries.

You can use the standard query methods to perform text queries, namely one-step, two-step, and in-memory.

Theme Queries

In addition to querying English-language documents by words (text query), you can query these documents by theme, or by their main concepts.

Theme queries resemble text queries in that you must create an index (in this case, a theme index) for the documents before you can query them. Theme queries differ from text queries in that you need not provide exact wording for searches. ConText interprets your query conceptually according to its view of the world and returns an appropriate document hitlist based on theme, along with a measure of how relevant each document is to the query.

For example, a theme query on unify returns documents about the concept of unification or unifying.

You can use the standard query methods to perform theme queries, namely one-step, two-step, and in-memory. In a theme query, you can use some of the operators you use in regular text queries.

See Also:

For more information about theme queries, see Chapter 4, "Theme Queries".

Using Operators

Operators in ConText enable you to issue a wide variety of queries including logical AND/OR searches, NOT searches, near searches, document section searches, term weighted searches, and expanded term searches.

You can embed these operators within your application or pass them on to the user. When you embed them within the application, you allow users to enter only query terms. The application can then intelligently process entered terms by combining operators to get different results.

You can also pass on the functionality of operators to users. You can do this by allowing users to enter ConText operators directly or with an interface of pull-down menus and radio buttons. Allowing users to enter operators gives users the ability to tailor their queries.

See Also:

Some operators can only work if the index is enabled for them. For a complete list of these operators, see the previous section entitled "Text Indexing Options".

For more information about ConText operators, see Chapter 3, "Understanding Query Expressions".

Case-Sensitive Searching

ConText supports case-sensitivity in both text and theme queries.

Text Queries

By default, ConText creates text indexes without being sensitive to the case of tokens in the documents. Because of this, text queries are case-insensitive. That is, a query on United returns documents that contain United and UNITED and united.

However, you can make text queries case-sensitive by using a case-sensitive lexer when you or your ConText administrator indexes the document set. When you create a case-sensitive index, a query on United is different from united, which is different from UNITED.
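
The difference between the two kinds of text index can be sketched as follows. This Python model is illustrative only and does not reflect ConText internals; build_index and lookup are hypothetical names, and tokens are simply split on whitespace.

```python
def build_index(documents, case_sensitive=False):
    """Map each token to the set of document keys that contain it."""
    index = {}
    for key, text in documents.items():
        for token in text.split():
            if not case_sensitive:
                token = token.lower()  # fold case when the index is case-insensitive
            index.setdefault(token, set()).add(key)
    return index


def lookup(index, term, case_sensitive=False):
    """Look up a term, folding its case the same way the index was built."""
    if not case_sensitive:
        term = term.lower()
    return sorted(index.get(term, set()))


documents = {1: "United Airlines", 2: "UNITED WE STAND", 3: "a united front"}

insensitive = build_index(documents)
sensitive = build_index(documents, case_sensitive=True)

print(lookup(insensitive, "United"))                       # [1, 2, 3]
print(lookup(sensitive, "United", case_sensitive=True))    # [1]
```

Because the case folding happens when the index is built, switching between the two behaviors means building a new index, which matches the earlier point that index options cannot be changed after the index is created.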

See Also:

For more information about issuing case-sensitive text queries, see "Case-Sensitive Queries" in Chapter 3, "Understanding Query Expressions".

For more information about creating case-sensitive text indexes for columns, see Oracle8 ConText Cartridge Administrator's Guide.

Theme Queries

Theme queries are case-sensitive. This means that a query on Turkey returns hits on Turkey the country and not turkey the bird.

Even though ConText theme queries are case-sensitive, ConText tolerates poorly formatted input for known themes.

For example, entering microsoft or microSoft returns documents that include the theme of Microsoft, a known company. Likewise, entering Currency Rates returns documents that include a theme of currency rates, a standard classification in business and economics.

Note:

For poorly formatted input, ConText always attempts to match the entered theme with themes in the index. For example, if you enter microsoft, ConText looks up both microsoft and Microsoft in the index. Likewise, if you enter Currency Rates as your theme, ConText looks up both Currency Rates and currency rates in the index.

Document Section Searching

Section searching enables users to narrow text queries down to sections within documents. Sections can be either of the following:

sentences or paragraphs
user-defined sections

Sentence or paragraph searching enables users to search for combinations of words within sentences or paragraphs.

Searching within user-defined sections enables users to search for a term within sections they have defined prior to creating a text index. To do this type of section searching, you or your ConText administrator must define sections by specifying what tags delimit the section.

User-defined section searching is useful when your documents have internal structure, such as HTML documents.

Note:

Section searching is supported for text queries only.

See Also:

For more information about section searching, see the "WITHIN Operator" section in Chapter 3.

Structured Field Searching

For both text and theme queries, your application interface can give the user the option of querying on structured fields such as date or document author.

You can issue structured searches with one-step, two-step and in-memory queries and subsequently present the structured information related to each document in the hitlist.

See Also:

For more information about issuing structured queries, see "Using Two-Step Queries" and "Using In-Memory Queries" in Chapter 2.

Rewriting the Query Expression

You can design your query interface to allow users to enter ConText operators, either by allowing the user to enter operators directly or by using a more sophisticated interface in which the user can choose operators from a pull-down menu or radio button. In either case, your application can refine the query expression further by adding operators or adding or removing special words or symbols to achieve different results.
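
As a minimal sketch of this kind of rewriting, the Python function below takes free-form user input, drops punctuation, and joins the remaining terms with a logical operator. It assumes that the operator is written as the word AND or OR and that stripping non-word characters is acceptable; check both assumptions against Chapter 3 before relying on them. The name rewrite_query is hypothetical.

```python
import re


def rewrite_query(user_input: str, operator: str = "AND") -> str:
    """Combine the user's terms into one expression joined by a logical operator."""
    # Keep only runs of word characters. Which symbols must really be removed
    # or escaped depends on the query syntax and is an assumption here.
    terms = re.findall(r"\w+", user_input)
    return f" {operator} ".join(terms)


print(rewrite_query("oracle, text  search!"))       # oracle AND text AND search
print(rewrite_query("oracle text", operator="OR"))  # oracle OR text
```

With a function like this in place, users enter only query terms and the application decides how the terms are combined.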

See Also:

For more information about ConText operators, see Chapter 3, "Understanding Query Expressions".

Presenting Expression Feedback

After the user enters the query, you can either present expression feedback or execute the query. See Figure 1-1.

Expression feedback allows the user to view how ConText executes the query. Feedback is useful for understanding how ConText expands theme queries as well as how it expands stem, fuzzy, thesaurus, soundex, or wildcard text queries. By providing this additional information, query expression feedback helps users refine queries that might return an unwanted result set.

If the user requires feedback, the application presents the expression feedback and gives the user the option of re-entering a refined query. See Figure 1-1.

Your application can also present expression feedback after executing the query, when you present the hitlist. See Figure 1-1.

See Also:

For more information about query expression feedback, see Chapter 5, "Query Expression Feedback".

Executing the Query

In a PL/SQL application, you can issue a two-step query or an in-memory query, depending on your requirements. You can also count the number of hits in a query.

A third type of query, the one-step query, is discussed in this section for completeness, even though one-step queries cannot be used in PL/SQL applications.

Two-step Queries

Two-step queries use the PL/SQL CONTAINS procedure in the first step to store the results in a specified result table. The second step uses a SELECT statement to select the results from the result table. In the SELECT statement, you can join the result table with the original text table to return more detailed document information.

Because two-step queries use tables to store the hits, they are best suited for applications that require all the results to a query.
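
The shape of a two-step query can be modeled with any SQL database. The Python sketch below uses an in-memory SQLite database purely as a stand-in: it does not call the ConText CONTAINS procedure, and the table names (docs, results), the column names (pk, textkey, score), and the toy scoring rule are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE docs (pk INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT);
    CREATE TABLE results (textkey INTEGER, score INTEGER);
    """
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        (1, "Unification", "Lee", "unify unify markets"),
        (2, "Weather", "Kim", "rain tomorrow"),
        (3, "Treaty", "Park", "states unify"),
    ],
)


def contains_step(conn, term):
    """Step 1 stand-in: store the key and a toy score of each hit in the result table."""
    conn.execute("DELETE FROM results")
    rows = conn.execute("SELECT pk, body FROM docs").fetchall()
    for pk, body in rows:
        count = body.lower().split().count(term.lower())
        if count:
            # Toy score: 10 per occurrence, capped at 100. Not ConText's algorithm.
            conn.execute("INSERT INTO results VALUES (?, ?)", (pk, min(100, count * 10)))


def select_step(conn):
    """Step 2: join the result table with the text table for document details."""
    return conn.execute(
        "SELECT d.title, d.author, r.score "
        "FROM results r JOIN docs d ON d.pk = r.textkey "
        "ORDER BY r.score DESC"
    ).fetchall()


contains_step(conn, "unify")
hitlist = select_step(conn)
print(hitlist)  # [('Unification', 'Lee', 20), ('Treaty', 'Park', 10)]
```

The join in the second step is what lets the application show structured columns such as title and author next to each score.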

See Also:

For more information about using two-step queries, see "Using Two-Step Queries" in Chapter 2.

In-memory Queries

In-memory queries use a cursor to return query results, rather than the result tables used in two-step and one-step queries.

In an in-memory query, you open a cursor and issue the query. ConText writes the results of the query to the cursor. You fetch the results one row at a time, then close the cursor. Results can be returned unordered or sorted by score.

Because in-memory queries store results in memory and you need not retrieve all hits at once, they generally return hits faster than two-step queries for large hitlists. As such, in-memory queries are best suited for applications that might return large hitlists but that require only a small portion of the hits at a time.
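
The open, fetch, and close pattern described above can be modeled with a small cursor class. This Python sketch is not the CTX_QUERY API: HitCursor, fetch_hit, and first_page are hypothetical names, and the hits are supplied as a plain list of (textkey, score) pairs.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Hit = Tuple[int, int]  # (textkey, score)


class HitCursor:
    """Toy cursor that hands out hits one at a time until exhausted or closed."""

    def __init__(self, hits: Iterable[Hit], sort_by_score: bool = False):
        ordered = sorted(hits, key=lambda h: h[1], reverse=True) if sort_by_score else hits
        self._hits: Optional[Iterator[Hit]] = iter(ordered)

    def fetch_hit(self) -> Optional[Hit]:
        """Return the next hit, or None when no hits remain."""
        if self._hits is None:
            raise RuntimeError("cursor is closed")
        return next(self._hits, None)

    def close(self) -> None:
        self._hits = None


def first_page(cursor: HitCursor, page_size: int) -> List[Hit]:
    """Fetch at most page_size hits, leaving the rest unread."""
    page: List[Hit] = []
    while len(page) < page_size:
        hit = cursor.fetch_hit()
        if hit is None:
            break
        page.append(hit)
    return page


cursor = HitCursor([(101, 40), (102, 95), (103, 70), (104, 10)], sort_by_score=True)
page = first_page(cursor, 2)
cursor.close()
print(page)  # [(102, 95), (103, 70)]
```

The application stops fetching as soon as it has enough hits for one screen, which is the case in-memory queries are suited for.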

See Also:

For more information about using in-memory queries, see "Using In-Memory Queries" in Chapter 2.

One-step Queries

In a one-step query, you create a single SQL SELECT statement with a WHERE... CONTAINS clause to search for relevant documents. ConText returns the rows and columns of the text table that satisfy the query.

Because PL/SQL does not recognize the CONTAINS function in the SELECT statement, one-step queries are limited to interactive or ad-hoc queries in SQL*Plus.

See Also:

For more information about using one-step queries, see "Using One-Step Queries" in Chapter 2.

Counting Query Hits

In addition to fully executing two-step, one-step, and in-memory queries, you can count the number of hits in a two-step or in-memory query before or after you issue the query. Counting query hits helps to analyze queries to ensure large and unmanageable hitlists are not returned.

See Also:

For more information about counting query hits, see "Counting Query Hits" in Chapter 2.

Presenting the Hitlist

Your application presents a hitlist in one or more of the following ways:

show structured fields related to document, such as title or author
show documents ordered by score
show document hit count
show query expression feedback
show document Gist (English only)

Presenting Structured Fields

Structured columns related to the text column can help identify documents. When you present the hitlist, you can show related columns such as document title or author, or any other combination of fields that identifies the document.

In a two-step query, you can obtain the structured fields by joining the result table with the base table.

In an in-memory query, you must specify what structured column or columns to fetch into the cursor along with the textkey.

In a one-step query, you specify the name of the structured column or columns in the SELECT statement.

Presenting Score

When you issue either a text query or theme query, ConText returns the hitlist of documents that satisfy the query with a relevance score for each document returned. You can present these scores when you return the hitlist to the user.

The score for each document is between one and one hundred and indicates how relevant the document is to the query entered; the higher the score, the more relevant the document. You can use scores to order the hitlist to show the most relevant documents first.

In two-step queries, ConText calculates the score when you call the CTX_QUERY.CONTAINS procedure. This procedure stores the score in the result table.

In in-memory queries, ConText returns the score for a hit as an out parameter with the CTX_QUERY.FETCH_HIT function.

In one-step queries, ConText calculates scores when you use the CONTAINS function. You obtain scores using the SCORE function.

See Also:

For more information about manipulating a result set, see "Result-Set Operators" in Chapter 3.

For more information about how ConText scores text queries, see Appendix B, "Scoring Algorithm".

For more information about scoring for theme queries, see "Theme Querying" in Chapter 4.

Presenting Document Hit Count

You can present the number of hits the query returned alongside the hitlist by using CTX_QUERY.COUNT_LAST, which returns the number of hits in the last two-step or in-memory query.

However, when the number is all that is required, you can use CTX_QUERY.COUNT_HITS, which is more efficient than executing the two-step or in-memory query and then counting the hits.

Presenting Expression Feedback in Hitlist

You can accompany a query hitlist with expression feedback. Using feedback in this way gives the user an opportunity to see the expanded query alongside the results of the query.

When you present your hitlist with expression feedback, you can give the user the option of selecting a document, or of refining and then re-entering another query if the user is not satisfied with the results in the hitlist.

See Also:

For more information about query expression feedback, see Chapter 5, "Query Expression Feedback".

Presenting Gists (English only)

If a hitlist alone does not give users enough information, you can present a Gist for every document in the hitlist. A Gist is essentially a document summary. However, generating a Gist requires an extra processing step and is available for English only.

See Also:

For more information about generating Gists and other CTX_LING output, see Chapter 8, "Using CTX_LING".

Presenting the Document

When your application obtains the results of a query, it can let the user select a document from the hitlist and then present one or more of the following ConText document services:

document with or without query terms highlighted (text and theme queries)
document Gist, theme summary, or list of themes (English only)

Presenting Highlighted Documents

Figure 1-2

ConText enables you to present documents to the user with query terms highlighted for text queries, or with the relevant paragraphs highlighted for theme queries. You can do highlighting in PL/SQL as well as with the ConText viewers for Windows 32-bit and World Wide Web applications.

Highlighting in PL/SQL

With PL/SQL, you create the viewable output by calling the highlighting procedure, CTX_QUERY.HIGHLIGHT, usually after you issue the query. You can use this procedure to highlight documents stored as plain text or documents stored in formats such as Microsoft Word.

With the highlighting procedure, you can obtain the document plain-text, document plain-text with highlights, or the document in its native format without highlights. This procedure outputs to result tables, which you use to present the document. The highlighting procedure works for text and theme queries (see Figure 1-2).
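
To show what plain text with highlights can look like, here is a minimal Python sketch that wraps query terms in marker strings. It is a stand-in for illustration only and does not call CTX_QUERY.HIGHLIGHT or use result tables; the function name highlight and the default markers << and >> are assumptions.

```python
import re


def highlight(text, terms, start_mark="<<", end_mark=">>"):
    """Wrap each whole-word occurrence of a query term in highlight markers."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,  # mirrors the default, case-insensitive text index
    )
    return pattern.sub(lambda m: f"{start_mark}{m.group(0)}{end_mark}", text)


print(highlight("Efforts to unify Europe; Unify now.", ["unify"]))
# Efforts to <<unify>> Europe; <<Unify>> now.
```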

See Also:

For more information about presenting highlighted documents, see Chapter 6, "Document Presentation: Highlighting".

Highlighting in ConText Viewers

ConText provides a custom control that you can embed programmatically in 32-bit Windows client-side applications. This custom control allows users to query documents and then view them in their native formats (WYSIWYG), such as Microsoft Word, with query terms or paragraphs highlighted. See Figure 1-2.

You can use the ConText custom control to view documents in the following server-side supported formats:

Microsoft Word for Windows 2, 6.x
WordPerfect for Windows 5.x, 6.x
WordPerfect for DOS 5.0, 5.1, 6.0

For World Wide Web applications that use the Oracle Web Application Server, you can present documents in a Windows 32-bit environment using one of the following:

ConText viewer plug-in with the Netscape browser
ConText custom control with Microsoft Internet Explorer

Both these configurations require that the ConText viewer cartridge be installed on the Oracle Web Application Server.

See Also:

For more information about highlighting with ConText viewers, see the Oracle8 ConText Cartridge Workbench User's Guide.

Presenting CTX_LING Output (English Only)

Figure 1-3

For English-language documents, the CTX_LING PL/SQL package enables you to create different document summaries and lists of themes, which you create on a per-document basis. These summaries and lists of themes are shorter than the documents themselves and can help application users quickly view the essential content of documents.

ConText can generate the following forms of CTX_LING output on a per document basis:

Output Type	Description
List of Themes	A list of the main concepts of a document.
Gist	Paragraph or paragraphs in a document that best represent what the document is about as a whole. You can also generate Gists at the sentence level.
Theme Summary	Paragraph or paragraphs in a document that best represent a given theme in the document. You can also generate theme summaries at the sentence level.

You obtain linguistic output by submitting a linguistic request using the CTX_LING PL/SQL package.

See Also:

For more information about generating CTX_LING output, see Chapter 8, "Using CTX_LING".
