Oracle® Ultra Search User's Guide 10g Release 1 (10.1) Part Number B10731-01
This section describes Oracle Ultra Search new features, with pointers to additional information. It also explains the Oracle Ultra Search release history.
Oracle Ultra Search provides secure crawling with the following types of authentication:
Oracle Ultra Search supports HTTP digest authentication: the Oracle Ultra Search crawler can authenticate itself to Web servers that use the HTTP digest authentication scheme. The scheme is a simple challenge-response exchange in which the password is never sent in clear text; instead, the crawler sends a hash computed from the password and a nonce supplied by the server.
HTML form-based authentication is the most commonly used authentication scheme on the Web. Oracle Ultra Search lets you register HTML forms that you want the Oracle Ultra Search crawler to automatically fill out during Web crawling. HTML form authentication requires that HTTP cookie functionality is enabled.
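To make the challenge-response exchange used by HTTP digest authentication concrete, the following Python sketch computes the response value a client returns to the server, following the MD5 and qop=auth calculation defined in RFC 2617. It is not Oracle Ultra Search crawler code: the function names are illustrative, and the sample credentials and nonce are the example values from RFC 2617.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the RFC 2617 'response' value for qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    # The server nonce and client nonce tie the hash to this one exchange.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Sample values taken from the example in RFC 2617.
response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)
```

Only the resulting hash travels over the network, so the password itself is never exposed to the server or to anyone observing the connection.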
The crawler can be configured to not index Web pages that are dynamically generated (for example, if a URL contains a question mark).
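As a rough illustration of the question-mark test mentioned above, the following Python sketch filters dynamically generated URLs out of a list of candidates. The function names and URLs are hypothetical; the actual crawler setting is configured through the Oracle Ultra Search administration tool, not through code like this.

```python
def is_dynamic_url(url: str) -> bool:
    """Treat any URL containing a question mark as dynamically generated."""
    return "?" in url


def static_urls_only(urls):
    """Return the URLs that would still be indexed when dynamic pages are skipped."""
    return [url for url in urls if not is_dynamic_url(url)]


candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/search?q=ultra",
    "http://www.example.com/docs/guide.html",
]
indexable = static_urls_only(candidates)
print(indexable)
```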
Oracle Ultra Search now supports HTTPS (HTTP over SSL). The Oracle Ultra Search crawler can now crawl HTTPS URLs (for example, https://www.foo.com).
Oracle Ultra Search now supports secure searches. Secure searches return only documents that the search user is allowed to view.
Each indexed document can be protected by an access control list (ACL). During searches, the ACL is evaluated. If the user performing the search has permission to read the protected document, then the document is returned by the query API. Otherwise, it is not returned.
Oracle Ultra Search stores ACLs in the Oracle XML DB repository. Oracle Ultra Search also uses Oracle XML DB functionality to evaluate ACLs.
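The search-time ACL check described above can be pictured with the following Python sketch. It is a simplified model, not the Oracle XML DB implementation: the Document class, the representation of an ACL as a set of user and group names, and the rule that a document without an ACL is visible to everyone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Document:
    url: str
    # Hypothetical ACL: names of users and groups allowed to read the document.
    # None means the document is not protected (assumption).
    acl: Optional[FrozenSet[str]] = None


def can_read(doc: Document, user: str, groups=()) -> bool:
    """Return True if the user, or one of the user's groups, is on the ACL."""
    if doc.acl is None:
        return True
    principals = {user, *groups}
    return bool(doc.acl & principals)


def secure_search(hits, user, groups=()):
    """Keep only the hits that the search user is allowed to view."""
    return [doc for doc in hits if can_read(doc, user, groups)]


hits = [
    Document("http://intranet/public.html"),
    Document("http://intranet/hr/salaries.html", frozenset({"hr_staff"})),
    Document("http://intranet/eng/design.html", frozenset({"alice", "eng"})),
]
visible = [doc.url for doc in secure_search(hits, "alice", groups=("eng",))]
print(visible)
```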
It is now possible to use the remote crawler without mounting the remote cache directory to the server machine. Instead, the cache files are sent over the crawler's JDBC connection to the server cache directory.
A schedule can be created with no scheduled launch time, so that it can only be started on demand.
For each data source, the crawler preserves the three most recent log files, so a recrawl no longer overwrites the log file from the previous crawl.
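The retention behavior for crawler log files can be sketched in Python as follows. The file naming scheme (source name, a sortable date stamp, and a .log extension) and the prune_logs function are assumptions made for this example; the actual log file names and cleanup logic used by the crawler may differ.

```python
import tempfile
from pathlib import Path


def prune_logs(log_dir, source, keep=3):
    """Delete all but the `keep` newest log files of one data source.

    Assumes names of the form <source>.<sortable stamp>.log, so that
    sorting by name also sorts by age.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    logs = sorted(Path(log_dir).glob(f"{source}.*.log"))
    for stale in logs[:-keep]:
        stale.unlink()
    return [path.name for path in logs[-keep:]]


# Demonstration in a throwaway directory with five crawl logs.
with tempfile.TemporaryDirectory() as log_dir:
    for stamp in ("20040101", "20040108", "20040115", "20040122", "20040129"):
        (Path(log_dir) / f"intranet.{stamp}.log").write_text("crawl log\n")
    kept = prune_logs(log_dir, "intranet")
    remaining = sorted(path.name for path in Path(log_dir).iterdir())

print(kept)
```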
Oracle Ultra Search now includes APIs for various administration tasks, such as crawler, schedule, and instance administration.
Oracle Internet Directory (OID) is Oracle's native LDAP v3-compliant directory service, built as an application on top of the Oracle Database. Oracle Ultra Search integrates with Oracle Internet Directory in the following areas:
Oracle Ultra Search administration groups and group membership are stored in Oracle Internet Directory.
Users are authenticated through the single sign-on (SSO) server and Oracle Internet Directory.
Oracle Internet Directory performs authorization on Oracle Ultra Search users' administration privileges.
Cookie support is enabled by default.
During crawling, documents are stored in the cache directory. Every time the preset size is reached, crawling stops and indexing starts. In previous releases, the cache file was always deleted when indexing was done. You can now specify not to delete the cache file when indexing is done. This option applies to all data sources. The default is to delete the cache file after indexing.
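The alternation between crawling and indexing described above can be modeled with the following Python sketch. It is a simplified illustration under stated assumptions: documents are plain byte strings, the cache is an in-memory list rather than files in a cache directory, and the names crawl_and_index, cache_limit_bytes, and keep_cache are hypothetical.

```python
def crawl_and_index(documents, cache_limit_bytes, keep_cache=False):
    """Cache fetched documents and index them whenever the cache fills up.

    Returns the batches handed to the indexer and the cached documents
    retained after indexing (empty unless keep_cache is True).
    """
    cache = []
    cached_bytes = 0
    indexed_batches = []
    retained = []

    def index_cache():
        nonlocal cache, cached_bytes
        indexed_batches.append(list(cache))  # crawling pauses, indexing runs
        if keep_cache:
            retained.extend(cache)           # cache content is preserved
        cache = []                           # default: cache is discarded
        cached_bytes = 0

    for doc in documents:
        cache.append(doc)
        cached_bytes += len(doc)
        if cached_bytes >= cache_limit_bytes:
            index_cache()
    if cache:
        index_cache()                        # index whatever is left over
    return indexed_batches, retained


docs = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 10]
batches, retained = crawl_and_index(docs, cache_limit_bytes=80)
print([len(batch) for batch in batches], len(retained))
```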
You can set URL boundary rules to refine the crawling space. You can now include or exclude Web sites with a specific port. For example, you can include www.oracle.com but not www.oracle.com:8080. By default, all ports are crawled.
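The port-specific rule above can be illustrated with a short Python sketch. The rule representation (a host name paired with a port, where None stands for all ports) and the in_boundary function are assumptions for this example and do not reflect how Oracle Ultra Search stores its boundary rules.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url):
    """Return (host, port), filling in the scheme's default port if omitted."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


def in_boundary(url, include, exclude):
    """Check a URL against (host, port) rules; a port of None matches all ports."""
    host, port = host_and_port(url)

    def matches(rules):
        return any(h == host and (p is None or p == port) for h, p in rules)

    return matches(include) and not matches(exclude)


# Include www.oracle.com on every port except 8080.
include = [("www.oracle.com", None)]
exclude = [("www.oracle.com", 8080)]

print(in_boundary("http://www.oracle.com/index.html", include, exclude))
print(in_boundary("http://www.oracle.com:8080/index.html", include, exclude))
```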
In previous releases, you could only specify suffix inclusion rules (for example, crawl only URLs ending with "oracle.com"). You can now also specify prefix rules (for example, crawl "oracle.com" but not "stores.oracle.com").
Oracle Ultra Search automatically creates a default Oracle Ultra Search instance based on the default Oracle Ultra Search test user. This lets you test Oracle Ultra Search functionality against the default instance immediately after installation.
You can use Enterprise Manager's Grid Control to monitor Oracle Ultra Search components. Using Grid Control, you can set up notification rules that automatically send an email notification whenever a schedule status reaches certain severity states. For more information on using Grid Control to monitor Oracle Ultra Search components, see the Oracle Enterprise Manager Concepts guide.
You can update the recrawl policy to process documents that have changed or to process all documents.
In previous releases, "process all documents" did not help when the crawling scope had been narrowed. For example, if crawling depth was reduced from seven to five, the PDF mimetype was deleted, or a host inclusion rule was removed, then you had to remove the affected documents manually in a SQL*Plus session.
With this release, all crawled URLs are subject to crawler setting enforcement, not just newly crawled URLs.
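The effect of enforcing crawler settings on every crawled URL can be sketched in Python as follows. The CrawledUrl and CrawlerSettings classes are hypothetical stand-ins for the crawler's internal records, and the three checks (depth, mimetype, host) simply mirror the examples given above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CrawledUrl:
    url: str
    host: str
    depth: int
    mimetype: str


@dataclass(frozen=True)
class CrawlerSettings:
    max_depth: int
    mimetypes: FrozenSet[str]
    hosts: FrozenSet[str]


def allowed(entry: CrawledUrl, settings: CrawlerSettings) -> bool:
    """Return True if a crawled URL still satisfies the current settings."""
    return (entry.depth <= settings.max_depth
            and entry.mimetype in settings.mimetypes
            and entry.host in settings.hosts)


def enforce(entries, settings):
    """Split already crawled URLs into those to keep and those to remove."""
    kept = [e for e in entries if allowed(e, settings)]
    removed = [e for e in entries if not allowed(e, settings)]
    return kept, removed


crawled = [
    CrawledUrl("http://a.example.com/1.html", "a.example.com", 3, "text/html"),
    CrawledUrl("http://a.example.com/deep.html", "a.example.com", 6, "text/html"),
    CrawledUrl("http://a.example.com/doc.pdf", "a.example.com", 2, "application/pdf"),
    CrawledUrl("http://b.example.com/2.html", "b.example.com", 1, "text/html"),
]
# Narrowed scope: depth reduced to five, PDF removed, b.example.com removed.
narrowed = CrawlerSettings(5, frozenset({"text/html"}), frozenset({"a.example.com"}))
kept, removed = enforce(crawled, narrowed)
print([e.url for e in kept])
```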
Traditionally, Oracle Ultra Search used centralized search to gather data on a regular basis and update one index that cataloged all searchable data. This provided fast searching, but it required the data source to be crawlable before it could be searched. Oracle Ultra Search now also provides federated search, which allows a single search to run across multiple indexes. Each index can be maintained separately. Because the data source is queried at search time, search results are always the latest results. User credentials can be passed to the data source and authenticated by the data source itself. Queries can be processed efficiently using the data's native format.
To use federated search, you must deploy an Oracle Ultra Search search adapter, or searchlet, and create an Oracle Database source. A searchlet is a Java module deployed in the middle tier (inside OC4J) that searches the data in an enterprise information system on behalf of a user. When a user's query is delegated to the searchlet, the searchlet runs the query on behalf of the user. Every searchlet is a JCA 1.0 compliant resource adapter.
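The delegation of one query to several sources can be pictured with the following Python sketch. It does not use the JCA resource adapter interfaces or any Oracle Ultra Search API: here a searchlet is modeled as a plain function that takes the query and the user name and returns scored hits, and the two in-memory sources are invented for the example.

```python
from typing import Callable, List, Tuple

Hit = Tuple[float, str]                      # (relevance score, document title)
Searchlet = Callable[[str, str], List[Hit]]  # hypothetical: (query, user) -> hits


def federated_search(query: str, user: str,
                     searchlets: List[Searchlet], limit: int = 10) -> List[Hit]:
    """Run the query against every source on behalf of the user and merge the hits."""
    hits: List[Hit] = []
    for searchlet in searchlets:
        hits.extend(searchlet(query, user))
    # Assumes scores from different sources are comparable, which a real
    # deployment would have to ensure.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


def mail_searchlet(query: str, user: str) -> List[Hit]:
    """Invented source: searches only the calling user's own mail."""
    mailbox = {"alice": [(0.9, "Mail: budget review"), (0.4, "Mail: lunch")]}
    return [hit for hit in mailbox.get(user, []) if query in hit[1].lower()]


def wiki_searchlet(query: str, user: str) -> List[Hit]:
    """Invented source: a shared wiki that every user may search."""
    pages = [(0.7, "Wiki: budget process"), (0.2, "Wiki: parking")]
    return [hit for hit in pages if query in hit[1].lower()]


results = federated_search("budget", "alice", [mail_searchlet, wiki_searchlet])
print(results)
```

Because each source receives the user name with the query, the source itself can decide what that user is allowed to see, which corresponds to passing user credentials to the data source.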
Oracle Ultra Search is released with the Oracle Database, Oracle Application Server, and Oracle Collaboration Suite. Because of different release numbers in the past, the Oracle Ultra Search release numbers are somewhat confusing.
Oracle Ultra Search release 9.0.4 is part of Oracle Application Server release 10g (9.0.4).
Oracle Ultra Search release 9.0.3 is part of the Oracle Collaboration Suite release 9.0.3.
Oracle Ultra Search release 9.2 is part of Oracle9i release 9.2. Oracle Ultra Search release 1.0.3 was part of Oracle9i release 1 (9.0.1).
Oracle Ultra Search release 9.0.2 is part of Oracle9iAS release 2 (9.0.2).