Oracle8
ConText Cartridge QuickStart
Release 2.4 A63819-01
This chapter provides a quick description of the tasks that
must be performed to enable theme queries for ConText, as well as to generate
linguistic output for use in an application. It also provides examples
of theme queries and queries using linguistic output.
Note: Theme queries and linguistic output are only available for English-language text.
The following topics are covered in this chapter:
Note: Before you can perform the QuickStart tasks described in this chapter, ConText must be installed and certain implementation tasks must be completed. If the required installation and implementation tasks have not been completed, see Chapter 4, "Implementing ConText".
Perform the following tasks to set up a text column in a table, index the column, and perform theme queries on the column:
Perform the following tasks to generate linguistic output for the documents in the column and to query the generated output:
The first two setup tasks for theme queries and the Linguistics are:
Note: These tasks can be performed in any order.
Similar to text indexing and queries, theme indexing is performed
by ConText servers with the DDL (D) personality and theme queries are processed
by ConText servers with the Query (Q) personality.
However, to enable a ConText server to generate linguistic
output through the Linguistics, the Linguistic (L) personality must be
specified for the server.
The following command starts a ConText server with the appropriate personalities for creating a theme index and performing theme queries:
$ ctxsrv -user ctxsys/ctxsys -personality DQ -log ctx.log &
The following command starts a ConText server with the appropriate personality for generating linguistic output:
$ ctxsrv -user ctxsys/ctxsys -personality L -log ctx.log &
The procedure for upgrading a column for theme indexing is
the same as the procedure for text indexing, except that the lexer used
in the policy for theme indexing is not the same as the lexer used in a
text indexing policy.
To create a theme indexing policy, the Theme Lexer is specified
for the policy, using the predefined Lexer preference, THEME_LEXER.
The following example illustrates using a PL/SQL block to create a column policy named ctx_thidx for theme indexing:
begin
  ctx_ddl.create_policy('ctx_thidx', 'docs.text',
                        lexer_pref => 'THEME_LEXER');
end;
The only differences between this theme indexing policy and
the text indexing policy created in "Perform Hot
Upgrade of Columns" in Chapter 2 are the
policy names and the specification of THEME_LEXER as the Lexer preference
for the theme indexing policy.
Theme queries search the text column(s) of the queried table(s)
for specified themes and return all rows (i.e., documents) that have the
specified themes. A theme represents a major topic or developed subject
in a document.
Before you can perform theme queries, you must perform the following tasks:
Once these tasks have been performed, you can perform theme
queries for your documents. Some of the issues related to theme queries
are discussed in "Theme Query Examples" in
this chapter.
To create a theme index for a column, call the CTX_DDL.CREATE_INDEX
stored procedure and specify the theme indexing policy for the column.
The following example creates a theme index for the text column (ctxdev.docs.text) in the ctx_thidx policy:
exec ctx_ddl.create_index('ctx_thidx')
After a theme index is created for a column, ConText servers
with the Query personality can process theme queries for the column.
The structure of the result table for a theme query and the
method for creating the table are identical to those of the result table
used in a text query. In fact, you can use the same result table, or you
can create a separate result table for two-step theme queries.
Note: The two-step theme query example in "Theme Query Task Map" uses a different result table than the one used in the example for two-step text queries.
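A separate result table and a two-step theme query might be set up as in the following minimal sketch. The table name ctx_temp2 is hypothetical. The column layout (textkey, score, conid) and the argument order for CTX_QUERY.CONTAINS (policy, query, result table) are assumptions based on the text query result table and two-step method described in Chapter 2; confirm both against that chapter before use.

```sql
-- Hypothetical result table for two-step theme queries.
-- Column layout is assumed to match the text query result table.
create table ctx_temp2 (
  textkey varchar2(64),  -- primary key of the matching document
  score   number,        -- score based on theme weight
  conid   number         -- query ID (used when the table is shared)
);

-- Step 1 (SQL*Plus): store the hits for the theme Oracle in ctx_temp2.
-- Argument order (policy, query, result table) is an assumption.
exec ctx_query.contains('ctx_thidx', 'Oracle', 'ctx_temp2')

-- Step 2: join the hits back to the base table to display the documents.
select score, title
  from ctx_temp2, docs
 where ctx_temp2.textkey = docs.pk
 order by score desc;
```

Keeping theme hits in their own table avoids mixing theme scores with text query scores, which are calculated differently.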
The methods for performing theme queries are identical to
the three text query methods presented in "Text
Queries" in Chapter 2, with the exception
that theme queries are always case-sensitive and the scoring methods are
different.
"Theme Query Task Map" illustrates
how to perform theme queries using all three of the supported query methods.
In the examples provided, the theme Oracle is queried.
Unlike text queries, which can be case-sensitive or case-insensitive,
theme queries are always case-sensitive. As a result, theme queries for
places and names generally return different results than text queries for
the same words/phrases.
For example, a theme query for the term Oracle will
produce those documents in which ConText determined Oracle Corporation
(the canonical form of Oracle) to be a major theme in the document.
In contrast, a text query (in a case-insensitive index) for the term Oracle
will return all documents that contain occurrences of either Oracle
or oracle, regardless of how the term appears in the document.
Similar to text queries, the documents returned by theme
queries have a score. While scores in a theme query indicate the relevance
of the selected documents to the query, each score is based on the weight
of the queried theme in the document, rather than the number of occurrences
of the theme.
Theme weights are generated during theme indexing. Theme
weight measures the importance of a document theme relative to the other
themes in the document.
If a column has two or more indexes, which may be common
with text indexes and theme indexes, you must specify the name of the policy
for the appropriate index in the pol_hint argument for the CONTAINS
function of a one-step query.
For example, if the QuickStart tasks for both text and theme
queries have been performed, the column ctxdev.docs.text has both
a text indexing policy (ctx_docs) and a theme indexing policy (ctx_thidx)
and indexes for both.
The one-step theme query example in "Theme
Query Task Map" identifies ctx_thidx as the index to be searched.
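As a rough illustration, a one-step theme query that names the policy might look like the sketch below. The argument order shown for CONTAINS (column, query, label, pol_hint) and the label value 0 are assumptions and are not confirmed by this chapter; check the CONTAINS syntax in the Application Developer's Guide before relying on it.

```sql
-- One-step theme query against ctxdev.docs.text.
-- The fourth argument (assumed to be pol_hint) selects the theme index
-- created for the ctx_thidx policy rather than the ctx_docs text index.
select pk, title
  from docs
 where contains(text, 'Oracle', 0, 'ctx_thidx') > 0;
```

To run the same query against the text index instead, the hint would name ctx_docs.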
The setup tasks required for using the Linguistics to generate output for a document are:
Once these tasks have been performed, you can query the output
tables for themes and Gists. "Example Queries for
Linguistic Output" in this chapter provides examples of the types of
queries you can perform.
The Linguistics generate three types of output for a document:
The output is stored in tables specified by the user when requesting the Linguistics. The linguistic output tables can have any name; however, they must have the following structure:
create table ctx_themes (cid    number,
                         pk     varchar2(64),
                         theme  varchar2(256),
                         weight number);

create table ctx_gist (cid  number,
                       pk   varchar2(64),
                       pov  varchar2(256),
                       gist long);
In these examples, two tables (ctx_themes and ctx_gist)
are created for storing linguistic output. The pk column in each
table stores the primary keys (textkeys) for each document. The cid
column in each table stores policy IDs. The pov (point-of-view)
column in the ctx_gist table stores the theme for each theme summary.
See Also: Oracle8 SQL Reference, Oracle8 ConText Cartridge Application Developer's Guide
To request a list of themes for a document, call the REQUEST_THEMES procedure
in the CTX_LING PL/SQL package. To request a document Gist and/or theme
summaries for the document themes, call the REQUEST_GIST procedure in CTX_LING.
Then call the SUBMIT function in CTX_LING to submit the request(s)
to the Services Queue and return a handle for each submitted request. If
SUBMIT is called after REQUEST_THEMES and REQUEST_GIST are both called,
the requests are submitted as a single batch request and a single handle
is returned.
The requests in the Services Queue are picked up and processed
by the first available ConText servers with the Linguistic personality.
After linguistic output is generated, you can query for lists of themes
in documents and use Gists/theme summaries to view summarized versions
of your documents.
In the following example, themes and paragraph-level Gists/theme
summaries are requested for document 1 (pk). The document is stored
in ctxdev.docs.text for the ctx_thidx theme indexing policy.
The ctx_themes and ctx_gist tables are specified as the output
tables.
Then, the SUBMIT function is called for both requests.
exec ctx_ling.request_themes('ctx_thidx',1,'ctx_themes')
exec ctx_ling.request_gist('ctx_thidx',1,'ctx_gist')
variable handle number
exec :handle := ctx_ling.submit
print handle
Theme and Gist information is stored in the linguistic output
tables and can be used to present a specialized view of a document.
For example, you may want to display the themes for all the
documents in a column or for a single document. You may also want to display
the theme summary for all documents with a specific theme or the Gist for
a specific document.
The following example returns the themes (theme) and theme weight (weight), as well as the title (title) for each document in docs that has Smith as the author:
select theme, weight, title
  from ctx_themes, docs
 where docs.author = 'Smith'
   and ctx_themes.pk = docs.pk
 order by weight desc;
The following example returns the title (title) and theme summary (gist) for each document that has computer software industry as one of its themes (pov):
select title, gist
  from ctx_gist, docs
 where ctx_gist.pk = docs.pk
   and pov = 'computer software industry';
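The single-document cases mentioned earlier can be handled the same way. The following sketch assumes the ctx_themes and ctx_gist tables defined in this chapter and a document whose primary key is 1. The pk column is varchar2(64), so the key is compared as a string.

```sql
-- Themes for document 1, strongest theme first.
select theme, weight
  from ctx_themes
 where pk = '1'
 order by weight desc;

-- All Gist/theme summary rows stored for document 1,
-- with the point of view (pov) that each row belongs to.
select pov, gist
  from ctx_gist
 where pk = '1';
```

If several policies write to the same output tables, adding a condition on the cid column would keep rows from different policies apart.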