Oracle8
ConText Cartridge Application Developer's Guide
Release 2.4 A63821-01
This chapter explains how to use the CTX_LING PL/SQL package
in ConText to generate the different types of theme output for English
text. It also provides some tips and suggestions for using the output to
enhance query applications.
As shown in Figure 8-1, CTX_LING
output consists of lists of themes, theme summaries, and Gists. ConText
stores the output in either the theme or Gist table. The following table
describes each type of output and how to generate it:
Output Type | Description | How to Generate |
---|---|---|
List of Themes | The main concepts of a document. You can generate a list of themes in which each theme is a single word or phrase, or in which each theme is accompanied by its hierarchical list of parent themes. | Call CTX_LING.REQUEST_THEMES with a document textkey and a policy. Use CTX_LING.SET_FULL_THEMES to enable hierarchical lists of themes. |
Gist | Text in a document that best represents what the document is about as a whole. You can generate either paragraph-level or sentence-level Gists. | Call CTX_LING.REQUEST_GIST with a document textkey and a policy. Specify GENERIC for the pov parameter and specify either PARAGRAPH or SENTENCE for the glevel parameter. |
Theme Summary | Text in a document that best represents a given theme in the document. You can generate either paragraph-level or sentence-level theme summaries. | Call CTX_LING.REQUEST_GIST with a document textkey and a policy. Specify the required document theme with the pov parameter and specify either PARAGRAPH or SENTENCE for the glevel parameter. |
In a query application, you can use CTX_LING output as an
alternative to presenting the entire text of a document. For example, you
can present some form of CTX_LING output next to each title when you present
the hitlist to the user.
Likewise, after the user selects a document from the hitlist,
you can also give the user the option of viewing the Gist of a document
in addition to or as an alternative to viewing the entire text of a document.
You can use linguistic settings to enable case-conversion for all-uppercase or all-lowercase text, or to change the default size of Gists and theme summaries.
See Also: For more information about linguistic settings, see "Enabling Linguistic Settings" in this chapter.
You obtain CTX_LING output (lists of themes, theme summaries,
and Gists) by submitting a request using procedures in the CTX_LING PL/SQL
package. Table 8-1 describes which procedures
to use.
To generate CTX_LING output, the documents must be stored in a column (either directly or indirectly through a pathname to files), and a policy must be attached to the column.
Note: The setup requirements of having text in a column and having a policy for the column apply to ConText indexes (text/theme) as well as ConText linguistics. The procedures for storing text and creating policies are not discussed in this manual. For more information about storing text in columns and creating policies for the columns, see Oracle8 ConText Cartridge Administrator's Guide.
Requests for CTX_LING output can only be processed by ConText
servers running with the Linguistic personality. A ConText server with
the Linguistic personality can also have other personalities in its personality
mask. Starting up ConText servers is the task of the ConText administrator,
through the CTXSYS Oracle user.
See Also: For more information about the Linguistic personality and starting ConText servers, see Oracle8 ConText Cartridge Administrator's Guide.
The Services Queue is used for managing requests for CTX_LING
output. Such a request is cached in memory until the requestor uses the
CTX_LING.SUBMIT procedure to add the request to the Services Queue. If
more than one request for a single document is cached in memory when the
user submits the requests, ConText stores all of the requests as a single
batch request in the queue.
ConText servers with the Linguistic personality monitor the
Services Queue for requests and process the next request in the queue.
See Also: For more information about the Services Queue, see Oracle8 ConText Cartridge Administrator's Guide.
A list of themes is a list of the main ideas of a document.
With each theme, ConText returns a weight that measures the strength of
the theme relative to the other themes in the document.
You can use a list of themes in a query application as an
alternative to presenting the entire text of a document after a query.
When used with theme queries, a presentation of a list of themes for a
returned document can also help the user select other documents with the
same theme.
You generate a list of themes on a per-document basis with CTX_LING.REQUEST_THEMES. You can generate a list of themes in two ways: as single themes, or as themes accompanied by their hierarchical list of parent themes.
You can generate up to fifty themes for each document, using
the CTX_LING.REQUEST_THEMES procedure. This
procedure writes a single word or phrase that represents the theme to a
row in the theme table. The words or phrases that represent the themes
are normalized themes derived from the knowledge catalog.
You can also generate each document theme (up to 50) accompanied
by the hierarchical list of parent themes as defined in the knowledge catalog.
A theme is related to its parent theme usually by an "is-associated-with"
or "is-a-part-of" relationship. For example, a theme of insects
belongs to the hierarchical list of parent themes defined as zoology,
biology, hard sciences, and science and technology.
To enable hierarchical list-of-themes output, you must call
CTX_LING.SET_FULL_THEMES before you call CTX_LING.REQUEST_THEMES.
Generating hierarchical theme information in the theme table
helps to match themes with theme summaries generated with CTX_LING.REQUEST_GIST.
See Also: For more information about generating themes, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter.
A theme summary for a document provides a short summary of
the document from a specific point-of-view. You can use theme summaries
to present the relevant text (paragraph or sentence) of documents selected
by a theme query.
Because a theme summary provides a concise, focused summary
for a particular theme in a document, users of a query application can
use a theme summary to compare documents with similar themes.
You can generate two types of theme summaries: paragraph-level and sentence-level.
A paragraph-level theme summary consists of the paragraph
or paragraphs that best represent a single document theme. A sentence-level
theme summary consists of the sentence or sentences that best match a single
document theme.
To create either paragraph-level or sentence-level theme
summaries, use CTX_LING.REQUEST_GIST.
You can control the size of theme summaries with linguistic
settings.
Note: The size settings for theme summaries can only be modified by creating custom setting labels in the administration tool.
See Also: For more information about how to generate theme summaries, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter. For more information on specifying linguistic settings, see "Enabling Linguistic Settings" in this chapter.
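As a minimal sketch of a theme summary request, the following block asks for a sentence-level summary for one theme. The policy name, textkey, output table, and theme are illustrative values reused from the examples in this chapter, and the argument order follows the REQUEST_GIST example shown in this chapter; check it against the CTX_LING.REQUEST_GIST reference before relying on it.

```sql
declare
  handle number;
begin
  -- Illustrative names: DOC_POLICY policy, textkey '7039', CTX_GIST table.
  -- 'SENTENCE' is the glevel; the last argument is the pov, that is, the
  -- document theme to summarize.
  ctx_ling.request_gist('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_GIST',
                        'SENTENCE', 'Oracle Corporation');
  -- The request is only cached until SUBMIT places it in the Services Queue.
  handle := ctx_ling.submit;
end;
/
```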
A Gist for a document provides a summary that reflects all
of the themes in the document. In a query application, you can use a Gist
to give the user an overall summary of a document returned in a hitlist.
You can generate two types of Gists: paragraph-level and sentence-level.
A paragraph-level Gist consists of the document paragraphs
that best represent the themes in a document as a whole. A sentence-level
Gist is the sentence or sentences that best represent the themes in a document
as a whole.
To generate either a paragraph-level or sentence-level Gist,
use CTX_LING.REQUEST_GIST.
Note: The settings for Gist can only be modified by creating custom setting configurations in the GUI administration tool.
See Also: For more information about how to generate Gists, see "Generating Lists of Themes, Theme Summaries, and Gists" in this chapter. For more information on specifying linguistic settings, see "Enabling Linguistic Settings" in this chapter.
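The following sketch requests a paragraph-level Gist and then reads it back in SQL*Plus. It assumes a Gist table with the pk, pov, and gist columns used in this chapter, the illustrative policy and textkey from the other examples, and that the request has finished processing before the query runs.

```sql
declare
  handle number;
begin
  -- GENERIC as the pov requests a Gist of the document as a whole
  -- rather than a summary for one theme.
  ctx_ling.request_gist('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_GIST',
                        'PARAGRAPH', 'GENERIC');
  handle := ctx_ling.submit;
end;
/

-- The gist column is a LONG, so raise the SQL*Plus display limit first.
set long 2000
select pov, gist
  from ctxsys.ctx_gist
 where pk = '7039';
```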
You can present CTX_LING output (lists of themes, theme summaries, and Gists) as an alternative to presenting entire documents to users after a query. To generate theme and Gist information, follow these steps: create the output tables, request the output with CTX_LING.REQUEST_THEMES or CTX_LING.REQUEST_GIST, and submit the requests with CTX_LING.SUBMIT.
Note: For ConText to generate CTX_LING output, at least one server must be running with the Linguistic (L) personality. For more information about ConText Servers, see Oracle8 ConText Cartridge Administrator's Guide.
To create a theme table called CTX_THEMES to store the list of themes from REQUEST_THEMES, issue the following SQL statement:

    create table ctx_themes (
      cid    number,
      pk     varchar2(64),
      theme  varchar2(2000),
      weight number);

To create a Gist table called CTX_GIST to store the Gist or theme summaries from REQUEST_GIST, issue the following SQL statement:

    create table ctx_gist (
      cid  number,
      pk   varchar2(64),
      pov  varchar2(80),
      gist long);

See Also: For more information about the structure of CTX_LING output tables, see "CTX_LING Output Table Structures" in Appendix A, "Result Tables".
To create a theme table whose textkey has two columns, issue the following SQL statement:

    create table ctx_themes (
      cid    number,
      pk1    varchar2(64),
      pk2    varchar2(64),
      theme  varchar2(2000),
      weight number);

To create a Gist table whose textkey has two columns, issue the following SQL statement:

    create table ctx_gist (
      cid  number,
      pk1  varchar2(64),
      pk2  varchar2(64),
      pov  varchar2(80),
      gist long);

See Also: For more information about the structure of CTX_LING output tables, see "CTX_LING Output Table Structures" in Appendix A, "Result Tables".
Table 8-2 describes the different types of CTX_LING output and how to generate each type.
Output Type | Description | How to Generate |
---|---|---|
List of Themes | The main concepts of a document. You can generate a list of themes in which each theme is a single word or phrase, or in which each theme is accompanied by its hierarchical list of parent themes. | Call CTX_LING.REQUEST_THEMES with the document id. Use CTX_LING.SET_FULL_THEMES to enable hierarchical lists of themes. |
Gist | Text in a document that best represents what the document is about as a whole. You can generate either paragraph-level or sentence-level Gists. | Call CTX_LING.REQUEST_GIST. Specify GENERIC for the pov parameter and specify either paragraph or sentence for the glevel parameter. |
Theme Summary | Text in a document that best represents a given theme in the document. You can generate either paragraph-level or sentence-level theme summaries. | Call CTX_LING.REQUEST_GIST. Specify the required document theme with the pov parameter and specify either paragraph or sentence for the glevel parameter. |
To generate CTX_LING output for a document in a text column,
you first call CTX_LING.REQUEST_GIST or CTX_LING.REQUEST_THEMES
as described in Table 8-2, then call CTX_LING.SUBMIT
to enter these requests in the Services Queue as a single transaction for
that particular document.
Note: A policy must be defined for a column before you can generate CTX_LING output for the documents in the column.
The following example shows how to generate a list of themes and a paragraph-level theme summary. It assumes the tables ctx_themes and ctx_gist have already been created:

    declare
      handle number;
    begin
      ctx_ling.request_themes('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_THEMES');
      ctx_ling.request_gist('CTXSYS.DOC_POLICY', '7039', 'CTXSYS.CTX_GIST',
                            'PARAGRAPH', 'Oracle Corporation');
      handle := ctx_ling.submit;
    end;

The first call requests a list of themes from document 7039,
stored in a column identified by the DOC_POLICY policy. The second call
requests a paragraph-level theme summary for Oracle Corporation from
the same document. The list of themes and theme summary that ConText generates
are stored in the CTX_LING output tables (ctx_themes and ctx_gist),
which were created beforehand.
The call to CTX_LING.SUBMIT submits the requests as one batch request to the Services Queue and returns a handle that can be used to monitor the status of the request. Because the two requests are submitted as one batch request, ConText generates the theme and Gist output in only one linguistic processing cycle.
See Also: For more examples on generating Gists and theme summaries, refer to CTX_LING.REQUEST_GIST in Chapter 10. For more examples on generating lists of themes, refer to CTX_LING.REQUEST_THEMES in Chapter 10.
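Once the batch request has been processed, the theme table can be read with ordinary SQL. The query below is a sketch that assumes the ctx_themes table created earlier in this chapter and the illustrative textkey '7039'; ordering by descending weight rests on the assumption that a larger weight indicates a stronger theme.

```sql
-- List the themes of one document, strongest first (assumed ordering).
select theme, weight
  from ctxsys.ctx_themes
 where pk = '7039'
 order by weight desc;
```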
By default, ConText generates single themes when you request
a list of themes with CTX_LING.REQUEST_THEMES.
To generate the hierarchical theme information, you must set the full themes
flag to TRUE with CTX_LING.SET_FULL_THEMES.
A hierarchical list of themes contains single themes, each accompanied by its
parent themes as defined in the knowledge catalog. A theme is related to
its parent theme usually by an "is-a-part-of" relationship.
Generating hierarchical theme information helps to match
themes with the theme summaries generated with CTX_LING.REQUEST_GIST.
The following examples illustrate the difference between
single theme output and hierarchical theme output.
The following SQL statements generate and output single theme information for a document identified by pk:

    SQL> variable handle number
    SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
    SQL> exec :handle := ctx_ling.submit(200)
    SQL> select theme from ctx_themes;

    THEME
    -------------------------------------------------------------------------------
    NASDAQ - National Association of Securities Dealers Automated Quotation System
    stocks
    indexes
    weakness
    composites
    prices
    franchises
    shares
    cellularity
    declining issues
    measures
    analysts
    OTC
    purchases
    Wall Street
    lows

    16 rows selected.

However, when you set the full themes flag to TRUE, ConText generates theme hierarchical information:

    SQL> variable handle number
    SQL> exec ctx_ling.set_full_themes(TRUE)
    SQL> exec ctx_ling.request_themes('ctx_thidx', pk, 'ctx_themes')
    SQL> exec :handle := ctx_ling.submit(200)
    SQL> select theme from ctx_themes;

    THEME
    -------------------------------------------------------------------------------
    :stock market:NASDAQ - National Association of Securities Dealers Automated Quotation System:
    :stock market:stocks:
    :catalogs, itemization:indexes:
    :weakness, fatigue:weakness:
    :combination, mixture:composites:
    :retail trade industry:prices:
    :business fundamentals:franchises:
    :possession, ownership:shares:
    :cellularity:
    :stock market:declining issues:
    :analysis, evaluation:measures:
    :analysis, evaluation:analysts:
    :OTC:
    :general commerce:purchases:
    :general investment:Wall Street:
    :bottoms, undersides:lows:

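When an application needs only the theme itself from a hierarchical entry, the leaf can be extracted with standard SQL string functions. The query below is a sketch that assumes the colon-delimited format shown in the output above and that no theme contains a colon; it also returns single themes unchanged.

```sql
-- RTRIM drops the trailing colon, INSTR with a negative position finds the
-- last remaining colon, and SUBSTR returns everything after it.
-- For ':stock market:stocks:' the leaf is 'stocks'.
select substr(rtrim(theme, ':'),
              instr(rtrim(theme, ':'), ':', -1) + 1) as leaf_theme,
       theme as full_theme
  from ctx_themes;
```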
Generating a list of themes is a good way of extending theme
or text queries. For a document in a query hitlist, the user can learn
more about the document by reading a list of themes or a Gist.
For example, suppose a theme query on music returns
a hitlist containing 20 documents. If these documents are lengthy, the
user might not want to read every single document to find out what each
is about. Rather than return to the user the document text, you can return
a list of themes or a Gist for each document for the user to skim.
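As a sketch of that presentation step, the query below joins a hitlist table to a theme table so that each returned document is listed with its themes. It assumes a hitlist table named ctx_temp with a textkey column and a populated ctx_themes table, as used in the examples in this chapter; the ordering by weight is an illustrative choice.

```sql
-- One row per (document, theme); the application can group rows by textkey
-- to show a short theme list next to each hitlist entry.
select h.textkey, t.theme, t.weight
  from ctx_temp h, ctx_themes t
 where t.pk = h.textkey
 order by h.textkey, t.weight desc;
```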
Generally, you can generate CTX_LING output for a document set at two different times: at indexing time, before any queries are issued, or on demand, after a query is executed.
You can generate CTX_LING output at indexing time; that is,
generate output before queries are issued against the document set. When
you do so, the CTX_LING output is returned to the user immediately, since
the output was already created.
However, while the retrieval time for the CTX_LING output
is good, the drawback to this method is that you have to maintain a permanent
theme or Gist output table, using your own triggers to keep it updated.
A permanent output table for an entire document set also takes up system
disk space.
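One part of that maintenance can be sketched as a row-level trigger that removes stale output when a document changes or is deleted. The text table docs and its key column doc_id are hypothetical names used only for illustration; the trigger only deletes old rows, so new requests must still be submitted to regenerate output for updated documents.

```sql
-- Hypothetical text table: docs(doc_id number primary key, ...).
-- Output tables ctx_themes and ctx_gist are keyed by pk (varchar2).
create or replace trigger docs_ling_cleanup
after update or delete on docs
for each row
begin
  -- Remove output that no longer matches the stored document.
  delete from ctx_themes where pk = to_char(:old.doc_id);
  delete from ctx_gist   where pk = to_char(:old.doc_id);
end;
/
```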
You could also generate CTX_LING output after executing a
query. The advantage of generating themes as needed is that the output
table lasts only for the user session; you need not maintain a permanent
CTX_LING output table for all your documents.
However, generating CTX_LING output takes time depending
on the number of documents, the length of the documents, and how your linguistic
servers are configured. A user might not want to wait a few minutes for
a ConText query application to process a large number of documents.
The example below shows how to generate CTX_LING output after
a theme query.
The following PL/SQL code illustrates how to generate a list of themes for every document in a hitlist table returned from a theme query on birds. (You can use the same method to loop through any text table, once the text column table has a policy attached to it.)

    create or replace procedure get_theme is
      handle number;
      cursor ctx_cur is
        select textkey from ctx_temp;
    begin
      -- populate the ctx_temp hitlist table with a theme query on birds
      ctx_query.contains('DOWTHEME', 'birds', 'ctx_temp');
      -- request and submit a list of themes for each document in the hitlist
      for ctx_cur_rec in ctx_cur loop
        ctx_ling.request_themes('DOWPOLICY', ctx_cur_rec.textkey, 'ctx_themes');
        handle := ctx_ling.submit;
      end loop;
    end;
    /

This routine first declares a cursor that selects the rows
from the ctx_temp result table, which is populated by a theme query
on birds.
The cursor FOR loop opens the cursor, executing the select
statement that copies all textkeys in the ctx_temp table to the
cursor. The loop index ctx_cur_rec is implicitly defined as a cursor
record of type ctx_cur%ROWTYPE.
Every iteration of the loop calls the CTX_LING.REQUEST_THEMES
procedure with the document textkey derived from ctx_cur_rec. Each
request is submitted to the services queue with CTX_LING.SUBMIT,
which returns a handle.
The theme output is written to the ctx_themes table.
The default linguistic setting of GENERIC is active whenever
you initialize linguistics to create theme indexes, perform theme highlighting,
or generate CTX_LING output.
You can enable a linguistic setting other than the default
(GENERIC) when you want to process all lower-case or all upper-case text,
or when you want to change the sizes of Gists and theme summaries. When
you enable a linguistic setting for a session, the setting applies only
to that session.
The settings for case-conversion (GENERIC or SA) are pre-defined.
However, to change the size of Gists and theme summaries, you must create
a custom setting with the administration tool.
To enable either a case-conversion setting or a custom setting created with the administration tool, use the CTX_LING.SET_SETTINGS_LABEL procedure with a setting label. For example, to process all-uppercase or all-lowercase text for your current session:
execute ctx_ling.set_settings_label('SA')
The specified setting configuration is active for your session
until SET_SETTINGS_LABEL is called with a new setting configuration label.
You can use the CTX_LING.GET_SETTINGS_LABEL function to return
the label for the active setting configuration for the current session.
See Also: For more information about creating custom settings, refer to the online help system for the administration tool.
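Because a setting stays active until SET_SETTINGS_LABEL is called again, a block that switches settings temporarily may want to restore the previous label afterward. The sketch below assumes that GET_SETTINGS_LABEL returns the label as a character string of at most 80 characters; that return type and length are assumptions, not taken from this chapter.

```sql
declare
  l_saved varchar2(80);  -- assumed type and length for the returned label
begin
  -- Remember the label that is active for this session.
  l_saved := ctx_ling.get_settings_label;
  -- Switch to the pre-defined case-conversion setting.
  ctx_ling.set_settings_label('SA');
  -- ... request and submit output for all-uppercase or all-lowercase text ...
  -- Restore the earlier setting for the rest of the session.
  ctx_ling.set_settings_label(l_saved);
end;
/
```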
When you submit a request to the Services Queue with CTX_LING.SUBMIT, a handle is returned. With this handle, you can use procedures in the CTX_SVC package to perform the following tasks: monitor the status of a request, cancel a pending request, and clear a request that ended in error.
To monitor the status of requests in the Services Queue,
use the CTX_SVC.REQUEST_STATUS function. This
function returns one of the following statuses: PENDING, RUNNING, ERROR, or SUCCESS.
For example, the following PL/SQL procedure submits a request to generate themes and gist for a document with an id of 49. It then checks the status of the request.

    CREATE OR REPLACE PROCEDURE GENERATE_THEMES AS
      v_Handle number;
      v_Status varchar2(10);
      v_Time   date;
      v_Errors varchar2(60);
    BEGIN
      DBMS_OUTPUT.PUT_LINE('Begin generate_themes procedure');
      ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes');
      ctx_ling.request_gist('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_gist');
      v_Handle := ctx_ling.submit;
      DBMS_OUTPUT.PUT_LINE(v_Handle);
      v_Status := ctx_svc.request_status(v_Handle, v_Time, v_Errors);
      DBMS_OUTPUT.PUT_LINE(v_Status);
      DBMS_OUTPUT.PUT_LINE(v_Time);
      DBMS_OUTPUT.PUT_LINE(substr(v_Errors, 1, 20));
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE(' Exception handling');
    END GENERATE_THEMES;
    /

This procedure assigns the return value of REQUEST_STATUS to
v_Status for the linguistic request identified by v_Handle.
The value for v_Handle is returned by the call to CTX_LING.SUBMIT,
which placed the requests for the themes and Gist in the Services Queue.
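A single status check right after submission usually reports a request that is still waiting, so an application may prefer to poll. The sketch below reuses the REQUEST_STATUS call from the example above and waits with DBMS_LOCK.SLEEP, which requires EXECUTE privilege on DBMS_LOCK. The two-second interval, the limit of 30 attempts, and the size of the error buffer are arbitrary illustrative values.

```sql
declare
  l_handle number;
  l_status varchar2(10);
  l_time   date;
  l_errors varchar2(2000);
  l_tries  number := 0;
begin
  ctx_ling.request_themes('CTXDEMO.DEMO_POLICY', '49', 'CTXDEMO.ctx_themes');
  l_handle := ctx_ling.submit;
  loop
    l_status := ctx_svc.request_status(l_handle, l_time, l_errors);
    -- Stop once the request is neither waiting nor running, or after 30 tries.
    exit when l_status not in ('PENDING', 'RUNNING') or l_tries >= 30;
    dbms_lock.sleep(2);
    l_tries := l_tries + 1;
  end loop;
  dbms_output.put_line('Last status: ' || l_status);
end;
/
```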
To remove requests with a status of PENDING from the Services
Queue, use the CTX_SVC.CANCEL procedure.
For example:
execute ctx_svc.cancel(3321)
In this example, a pending request with handle 3321 is removed
from the Services Queue.
If a request has a status of RUNNING, ERROR, or SUCCESS,
it cannot be removed from the Services Queue.
To remove requests with a status of ERROR from the Services
Queue, use the CTX_SVC.CLEAR_ERROR procedure.
For example:
execute ctx_svc.clear_error(3321)
In this example, a request with handle 3321 is removed from
the Services Queue.
If a value of 0 (zero) is specified for the handle, all requests
with a status of ERROR are removed from the queue. If a request has a status
of PENDING, RUNNING, or SUCCESS, it cannot be removed from the queue using
CLEAR_ERROR.
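The two removal rules can be combined in a small helper that checks the status first and then calls the matching procedure. This is a sketch: remove_request is a hypothetical name, and the REQUEST_STATUS call follows the form used in the earlier example in this chapter.

```sql
create or replace procedure remove_request (p_handle in number) is
  l_status varchar2(10);
  l_time   date;
  l_errors varchar2(2000);
begin
  l_status := ctx_svc.request_status(p_handle, l_time, l_errors);
  if l_status = 'PENDING' then
    -- Only pending requests can be cancelled.
    ctx_svc.cancel(p_handle);
  elsif l_status = 'ERROR' then
    -- Only errored requests can be cleared.
    ctx_svc.clear_error(p_handle);
  else
    dbms_output.put_line('Request ' || p_handle || ' has status ' ||
                         l_status || ' and cannot be removed');
  end if;
end;
/
```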
To specify a procedure to be called when a linguistic request
completes or errors, use the SET_COMPLETION_CALLBACK
and SET_ERROR_CALLBACK procedures in CTX_LING.
ConText invokes the procedure defined by SET_COMPLETION_CALLBACK after
it processes a linguistic request; ConText invokes the procedure defined
by SET_ERROR_CALLBACK when it encounters an error.
The following is an example of how to define and use a completion
callback procedure. This example is taken from genling.sql in the ctxling
demonstration provided with the ConText installation.
For every linguistic request processed, ling_comp_callback keeps track of the number of articles processed by decrementing num_docs, previously defined as the number of articles in the table. The procedure also keeps track of any errors by incrementing num_errors.

    create or replace procedure LING_COMP_CALLBACK (
      p_handle in number,
      p_status in varchar2,
      p_errors in varchar2
    ) is
      l_total number;
      l_pk    varchar2(64);
    begin
      -- decrement the count in the tracking table
      update ling_tracking set num_docs = num_docs - 1;
      -- if the request errored, mark the errors in the pending table
      if (p_status = 'ERROR') then
        update ling_tracking set num_errors = num_errors + 1;
      end if;
      commit;
    end;
    /

The following code is an anonymous PL/SQL block that sets the linguistic completion callback procedure to ling_comp_callback and then generates CTX_LING output for every document in the articles table:

    declare
      cursor c1 is
        select article_id from articles;
      l_handle number;
    begin
      -- set the completion callback procedure to keep the pending table
      -- in sync with the number of documents processed (completed requests)
      -- and the number of errored requests.
      ctx_ling.set_completion_callback('LING_COMP_CALLBACK');

      -- loop through all articles in the article table, requesting themes
      -- and gists
      for crec in c1 loop
        ctx_ling.request_themes('DEMO_POLICY', crec.article_id, 'ARTICLE_THEMES');
        ctx_ling.request_gist('DEMO_POLICY', crec.article_id, 'ARTICLE_GISTS');
        l_handle := ctx_ling.submit;
      end loop;
    end;

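The callback relies on a ling_tracking table that is initialized before the requests are submitted. The statements below are a sketch of one way to set it up and watch progress; the column definitions are inferred from the callback code, and the table that genling.sql actually creates may differ.

```sql
-- Assumed structure, inferred from the columns the callback updates.
create table ling_tracking (
  num_docs   number,
  num_errors number
);

-- Start with the number of articles still to process and no errors.
insert into ling_tracking (num_docs, num_errors)
  select count(*), 0 from articles;
commit;

-- While requests are being processed, num_docs counts down toward zero.
select num_docs as remaining, num_errors
  from ling_tracking;
```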
At start-up of a ConText server, the logging of linguistic
parse information is disabled by default.
To enable logging of the parse information generated by ConText
linguistics during a session, use the CTX_LING.SET_LOG_PARSE
procedure.
For example:
execute ctx_ling.set_log_parse('TRUE')
Once you enable parse logging for a session, it is active
until you explicitly disable it during the session. You can use the CTX_LING.GET_LOG_PARSE
function to determine whether parse logging is enabled or disabled for the session.