Oracle8
ConText Cartridge Application Developer's Guide
Release 2.4 A63821-01
This chapter describes how ConText query applications can
present documents with highlighted information.
The following topics are covered in this chapter:
In a typical query application, users can issue text or theme
queries. The application executes the query and returns a hitlist to the
user, who can then select one or more documents from it.
When the user chooses a document, ConText enables you to
present the selected document with the query terms highlighted for text
queries, or with the relevant paragraphs highlighted for theme queries.
Your application can also present linguistic summaries of
the selected documents.
See Also: For more information about linguistic output, see Chapter 7, "ConText Linguistics".
When developing applications in PL/SQL, you use the CTX_QUERY.HIGHLIGHT
procedure to create various forms of highlighted documents that can be
presented to users. The source documents can be stored as plain text or
in any of the formats ConText supports for text indexing.
For World Wide Web applications, you can use the ConText
viewers to present highlighted documents.
See Also: For more information about highlighting with ConText viewers, see the Oracle8 ConText Cartridge Workbench User's Guide.
CTX_QUERY.HIGHLIGHT generates highlighting information for
text or theme queries. You typically call CTX_QUERY.HIGHLIGHT after executing
a text or theme query. With text queries, HIGHLIGHT marks the relevant
words or phrases in the document. With theme queries, HIGHLIGHT marks the
relevant paragraphs in the document.
Note: ConText does not do sentence-level theme highlighting.
As illustrated in Figure 6-1,
CTX_QUERY.HIGHLIGHT can be used to generate the following output for a
document:
Note: The filter ConText uses to create the plain text in the PLAINTAB and MUTAB tables is the same filter ConText uses to index the document. For more information about supported formats, see the Oracle8 ConText Cartridge Administrator's Guide.
See Also: For more information about the structure of the highlight output tables, see "Highlight Table Structures" in Appendix A, "Result Tables".
When you call CTX_QUERY.HIGHLIGHT,
you can specify the markup used to indicate the start and end of a highlighted
word or phrase for text queries, or the start and end of a highlighted
paragraph for theme queries.
If you do not specify markup, HIGHLIGHT uses default markup.
The default highlighting markup produced by HIGHLIGHT depends
on the format of the source document.
If the source document is an ASCII document or a formatted
document, the default highlighting markup is three angle brackets immediately
to the left (<<<) and right (>>>) of each term.
If the source document is an HTML document filtered through
an external filter, the default highlighting markup is the same as the
highlighting markup for plain text or formatted documents (<<<
and >>>).
If the source document is an HTML document filtered through the internal HTML filter, the default highlighting markup is the HTML tags used to indicate the start and end of a font change:
See Also: For more information about internal and external filters, see the Oracle8 ConText Cartridge Administrator's Guide.
To present highlighted documents in an application, do the following:
The result tables required by the HIGHLIGHT procedure can
be allocated manually using the CREATE TABLE command in SQL or using the
CTX_QUERY.GETTAB procedure.
For example, to create a MUTAB table to store highlighted ASCII markup, issue the following statement:
create table mu_ascii ( id number, document long );
To create a HIGHTAB table to store highlight offset information, issue the following statement:
create table highlight_ascii ( id number, offset number, length number, strength number );
See Also: For more information about the structure of the highlight output tables, see "Highlight Table Structures" in Appendix A, "Result Tables".
Issue a one-step, two-step, or in-memory query to return
a hitlist of documents. You can issue either a text or theme query. For
text queries, you call CONTAINS with a text policy; for theme queries,
you call CONTAINS with a theme policy. The hitlist provides the textkeys
that are used to generate highlight and display output for specified documents
in the hitlist.
Call CTX_QUERY.HIGHLIGHT with a pointer to a document (generally
the textkey obtained from the hitlist) and a text or theme query expression.
CTX_QUERY.HIGHLIGHT returns various forms of the specified
document that can be further processed or displayed by the application.
ConText uses the query expression specified in the HIGHLIGHT
procedure to generate the highlight offset information and marked-up ASCII
text. In addition, the offset information is based on the ASCII text version
of the document.
If the query expression contains a result set operator (first/next,
max, threshold), the result set operator is ignored. ConText returns highlight
information for the entire result set.
See Also: For more information about the query expression in HIGHLIGHT, see the CTX_QUERY.HIGHLIGHT specification in Chapter 10.
To create highlight markup for text queries, you must specify
a text policy, which is usually the policy you specify with the
CONTAINS procedure for the same query. With text queries, the HIGHLIGHT
procedure highlights the terms you specify in the query parameter.
For example, to highlight all occurrences of the term dog in the document identified by textkey 14, issue the following statement:
ctx_query.highlight ( cspec=> 'text_policy', textkey => '14', query => 'dog', id=> 14, hightab => 'highlight_ascii', mutab => 'mu_ascii' );
To create highlight markup for a theme query, you must specify
a theme policy, which is usually the policy you specify with the
CONTAINS procedure for the same query. With theme queries, the HIGHLIGHT
procedure highlights the relevant paragraphs in the document.
For example, to highlight all the paragraphs that are relevant to the theme query computers in the document with textkey 12, issue the following statement:
ctx_query.highlight ( cspec=> 'theme_policy', textkey => '12', query => 'computers', id=> 12, hightab => 'highlight_ascii', mutab => 'mu_ascii' );
You can use the MUTAB table to view highlighted ASCII text. For example, in SQL*Plus, you can issue the following statement to view a MUTAB table called mu_ascii:
select * from mu_ascii order by id;
You can also use the offset information in the HIGHTAB table
to highlight the document in ways that suit your application.
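As an illustration of application-side highlighting, the following Python sketch wraps spans of plain text in custom markup, given (offset, length) pairs such as those stored in a HIGHTAB table. It is a minimal sketch under stated assumptions, not part of ConText: the function name apply_highlights is hypothetical, the rows are assumed to have already been fetched from the HIGHTAB table, and the offsets are assumed to be 1-based character positions in the plain-text version of the document (set base=0 if your offsets start at zero).

```python
from typing import Iterable, List, Tuple


def apply_highlights(text: str,
                     spans: Iterable[Tuple[int, int]],
                     start_tag: str = "<<<",
                     end_tag: str = ">>>",
                     base: int = 1) -> str:
    """Wrap each (offset, length) span of `text` in start_tag/end_tag.

    `base` is the position of the first character in the offset scheme
    (assumed to be 1 here; this is an assumption, not a documented fact).
    """
    pieces: List[str] = []
    cursor = 0  # index of the first character not yet copied to the output
    for offset, length in sorted(spans):
        begin = offset - base
        end = begin + length
        # Reject spans that overlap an earlier span or fall outside the text.
        if begin < cursor or end > len(text):
            raise ValueError(f"invalid span: offset={offset}, length={length}")
        pieces.append(text[cursor:begin])                     # unhighlighted text
        pieces.append(start_tag + text[begin:end] + end_tag)  # highlighted span
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)


# Hypothetical data: one span covering the word "dog".
plain = "The quick brown dog jumped over the fox."
rows = [(17, 3)]
print(apply_highlights(plain, rows, "<b>", "</b>"))
```

Sorting the spans first lets the text be rebuilt in a single left-to-right pass, so inserting markup for one span never shifts the offsets of the spans that follow.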
With text queries, the word or phrase is highlighted. For example, a text query on dog might produce the following type of highlighted ASCII output for a document:
... The quick brown <<<dog>>> jumped over the fox. ...
With theme queries, the relevant paragraphs in the document are highlighted. For example, a theme query of computers produces the following type of highlighted ASCII output for a document:
<<< LAS VEGAS -- International Business Machines Corp. is using the huge computer trade show here this week to try to prove a much disputed marketing claim of the past year and a half: that its PS/2 line of personal computers really does offer unique benefits.>>>
In the battle for the hearts and minds of the 100,000 dealers, corporate customers and other spectators gathered here, IBM has set up a series of demonstrations of the Micro Channel, which is the PS/2's internal data pathway. The demonstrations seek to show that this pathway has extra flexibility that can translate into more speed. One demonstration uses an add-in circuit board that IBM claims allows data to be sent over a network about 60% faster. Another illustrates a quicker way to store the huge amounts of data handled by a so-called file server, the machine that controls a network of personal computers.
<<< While most personal computers contain just one "master" processor -- the chip that tells the various parts of the computer what to do -- the Micro Channel allows for more than one. That means that in Micro Channel machines, the workhorse central processor can dump lots of work onto another processor, freeing itself to go about other tasks.>>> ...
In this three-paragraph excerpt of a news article that satisfies
the theme query computers, ConText highlights (with angle brackets)
only the paragraphs that are about computers.
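An application that reads marked-up text from a MUTAB table may want to pull out only the highlighted passages, or convert the default markup into its own display format. The following Python sketch shows one way to do both. It assumes the default <<< and >>> markers described earlier, and that the document text has already been fetched as a single string. The function names highlighted_spans and to_html are hypothetical, and the choice of an HTML mark element is only an example.

```python
import html
import re
from typing import List

# Non-greedy match of everything between the default markers;
# DOTALL lets a highlighted paragraph span several lines.
MARKED = re.compile(r"<<<(.*?)>>>", re.DOTALL)


def highlighted_spans(marked_up: str) -> List[str]:
    """Return the text of each highlighted span, without the markers."""
    return [m.group(1).strip() for m in MARKED.finditer(marked_up)]


def to_html(marked_up: str, tag: str = "mark") -> str:
    """Convert <<<...>>> markup to an HTML element, escaping all other text."""
    out: List[str] = []
    cursor = 0  # index of the first character not yet copied to the output
    for m in MARKED.finditer(marked_up):
        out.append(html.escape(marked_up[cursor:m.start()]))
        out.append(f"<{tag}>{html.escape(m.group(1))}</{tag}>")
        cursor = m.end()
    out.append(html.escape(marked_up[cursor:]))
    return "".join(out)


# Hypothetical marked-up text, shortened from the kind of output shown above.
sample = "<<< First paragraph about computers.>>> Unrelated paragraph. <<< Third paragraph.>>>"
print(highlighted_spans(sample))
print(to_html("The quick brown <<<dog>>> jumped over the fox."))
```

This approach breaks if the source text itself contains runs of three angle brackets. In that case, passing distinctive custom markup to HIGHLIGHT, or working from the HIGHTAB offsets instead, is likely to be more reliable.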
After documents have been processed by the HIGHLIGHT procedure
and displayed to the user, drop the highlight result tables.
If the tables were allocated using CTX_QUERY.GETTAB,
use CTX_QUERY.RELTAB to release the tables.
If the tables were created manually, drop the tables using
the SQL command DROP TABLE.