PL/SQL Packages and Types Reference
10g Release 1 (10.1)

Part Number B10802-01

23
DBMS_DATA_MINING

Oracle Data Mining (ODM) is designed for programmers, systems analysts, project managers, and others interested in developing database applications that use data mining to discover hidden patterns and use that knowledge to make predictions.

This chapter contains the following topics:

  • Using DBMS_DATA_MINING
  • Summary of DBMS_DATA_MINING Subprograms


Using DBMS_DATA_MINING


Overview

Oracle Data Mining (ODM) embeds data mining in the Oracle database. The data never leaves the database -- the data, its preparation, model building, and model scoring activities all remain in the database. This enables Oracle to provide an infrastructure for data analysts and application developers to integrate data mining seamlessly with database applications.

ODM provides two interfaces that support in-database data mining: a Java interface and a PL/SQL interface. The Java interface is described in Oracle Data Mining Application Developer's Guide.

You can use the package to build a mining model, test the model, and apply this model to your data to obtain predictive and descriptive information. For detailed examples of how to perform these tasks, see the sample programs in Oracle Data Mining Application Developer's Guide. The sample code is in the demo directory.

General information about both the Java and the PL/SQL interface is contained in Oracle Data Mining Application Developer's Guide. See Oracle Data Mining Concepts for a discussion of data mining concepts.

This chapter contains an overview of the development methodology, followed by information on data types, settings, and constants, as well as detailed descriptions of the PL/SQL subprograms. The DBMS_DATA_MINING_TRANSFORM package supports data pre-processing for data mining.


Constants

Table 23-1 through Table 23-8 list the constants to be used for various settings in the settings table.

Table 23-1  DBMS_DATA_MINING Constants Summary: Mining Function
Constant Purpose

association

Parameter value for mining function in CREATE_MODEL, representing association mining function.

attribute_importance

Parameter value for mining function in CREATE_MODEL, representing attribute importance mining function.

classification

Parameter value for mining_function in CREATE_MODEL, representing classification mining function.

regression

Parameter value for mining_function in CREATE_MODEL, representing regression mining function.

clustering

Parameter value for mining_function in CREATE_MODEL, representing clustering mining function.

feature_extraction

Parameter value for mining_function in CREATE_MODEL, representing feature extraction mining function.

Table 23-2  DBMS_DATA_MINING Constants Summary: Function Settings
Constant Purpose

clas_priors_table_name

Setting name representing prior probability table name for classification function.

clus_num_clusters

Setting name representing number of clusters for clustering function.

feat_num_features

Setting name representing number of features for feature extraction function.

asso_max_rule_length

Setting name representing maximum rule length.

asso_min_confidence

Setting name representing minimum confidence.

asso_min_support

Setting name representing minimum support.

Table 23-3  DBMS_DATA_MINING Constants Summary: Algorithm Settings
Constant Purpose

algo_name

Setting name representing the mining algorithm.

algo_apriori_association_rules

Setting value for Apriori algorithm for association rules.

algo_naive_bayes

Setting value for Naive Bayes (NB) algorithm for classification.

algo_support_vector_machines

Setting value for Support Vector Machines (SVM) algorithm for classification or regression.

algo_nonnegative_matrix_factor

Setting value for Non-Negative Matrix Factorization (NMF) for feature extraction.

algo_kmeans

Setting value for k-Means (KM) for clustering.

algo_ai_mdl

Setting value for Minimum Description Length based algorithm for Attribute Importance.

Table 23-4  DBMS_DATA_MINING Constants Summary: Adaptive Bayes Network
Constant Purpose

abns_model_type

Setting name representing ABN model type.

abns_single_feature

Setting value representing single feature ABN model.

abns_multi_feature

Setting value representing multi feature ABN model.

abns_naive_bayes

Setting value representing Naive Bayes ABN model.

abns_max_build_minutes

Setting name representing maximum time threshold to complete an ABN model build.

abns_max_nb_predictors

Setting name representing the maximum number of Naive Bayes predictors to be considered for building an ABN model of type abns_naive_bayes.

Table 23-5  DBMS_DATA_MINING Constants Summary: Naive Bayes
Constant Purpose

nabs_singleton_threshold

Setting name representing singleton threshold for Naive Bayes.

nabs_pairwise_threshold

Setting name representing pairwise threshold for Naive Bayes.

Table 23-6  DBMS_DATA_MINING Constants Summary: Support Vector Machines
Constant Purpose

svms_kernel_function

Setting name representing the kernel function for SVM.

svms_linear

Setting value for Linear Kernel for SVM.

svms_gaussian

Setting value for Gaussian Kernel for SVM.

svms_kernel_cache_size

Setting name representing kernel cache size for SVM.

svms_conv_tolerance

Setting name representing convergence tolerance for SVM.

svms_std_dev

Setting name representing standard deviation for SVM.

svms_complexity_factor

Setting name representing complexity factor for SVM.

svms_epsilon

Setting name representing epsilon for SVM Regression.

Table 23-7  DBMS_DATA_MINING Constants Summary: Non-Negative Matrix Factorization
Constant Purpose

nmfs_num_iterations

Setting name representing number of iterations.

nmfs_conv_tolerance

Setting name representing convergence tolerance.

nmfs_random_seed

Setting name representing random seed for NMF.

Table 23-8  DBMS_DATA_MINING Constants Summary: k-Means
Constant Purpose

kmns_distance

Setting name representing distance function.

kmns_euclidean

Setting value representing Euclidean distance function.

kmns_cosine

Setting value representing cosine distance function.

kmns_fast_cosine

Setting value representing fast cosine distance function.

kmns_iterations

Setting name representing number of iterations.

kmns_conv_tolerance

Setting name representing convergence tolerance.

kmns_split_criterion

Setting name representing split criterion.

kmns_variance

Setting value representing variance as the split criterion.

kmns_size

Setting value representing size as the split criterion.

kmns_block_growth

Setting name representing growth factor for memory allocated to hold cluster data.

kmns_num_bins

Setting name representing number of histogram bins.

kmns_min_pct_attr_support

Setting name representing minimum percentage support required for attributes in rules.


Data Types

The DBMS_DATA_MINING and the DBMS_DATA_MINING_TRANSFORM packages use the data types shown in Table 23-9.

Table 23-9 DBMS_DATA_MINING Summary of Data Types
Data Type Purpose

DM_ABN_Detail

This type represents each row of the model detail output generated by GET_MODEL_DETAILS_ABN.

DM_ABN_Details

This type represents the ABN model details generated by GET_MODEL_DETAILS_ABN.

DM_Centroid

This type represents the centroid of a cluster. It is used when retrieving cluster details using GET_MODEL_DETAILS_KM.

DM_Child

This type represents each child node of a cluster.

DM_Children

This type represents a set of children nodes for a given cluster identifier.

DM_Cluster

This type represents a cluster retrieved using GET_MODEL_DETAILS_KM.

DM_Clusters

This type represents a set of clusters.

DM_Conditional

This type represents each conditional probability from a set of conditional probabilities associated with each mining attribute used in a Naive Bayes or Adaptive Bayes Network model.

DM_Conditionals

This type represents conditional probabilities associated with a given mining attribute used in a Naive Bayes or Adaptive Bayes Network model. It is used when retrieving model details using GET_MODEL_DETAILS_NB or GET_MODEL_DETAILS_ABN respectively.

DM_Histogram_Bin

This type represents a histogram bin associated with a cluster identifier. It is used when retrieving cluster details using GET_MODEL_DETAILS_KM.

DM_Histograms

This type represents a set of histograms.

DM_Item

This type represents an item in a set of items.

DM_Items

This type represents the set of items in an ItemSet.

DM_ItemSet

This type represents an ItemSet.

DM_ItemSets

This type represents frequent itemsets in Association models.

DM_Model_Settings

This type represents the algorithm settings retrieved using the GET_MODEL_SETTINGS function.

DM_Model_Signature

This type represents a list of model signature attributes generated by GET_MODEL_SIGNATURE.

DM_Modelname_List

This type represents a list of model names provided as input for the parameter model_names in EXPORT_MODEL and IMPORT_MODEL procedures.

DM_NB_Detail

This type represents each row of the model detail output generated by GET_MODEL_DETAILS_NB.

DM_NB_Details

This type represents the NB model details generated by GET_MODEL_DETAILS_NB.

DM_Nested_Categoricals

This type represents a nested table of categorical attributes, used for representing wide data.

DM_Nested_Numericals

This type represents a nested table of numerical attributes, used for representing wide data.

DM_NMF_Attribute

This type represents each attribute in an attribute set for NMF model details.

DM_NMF_Attribute_Set

This type represents a set of attributes that correspond to a feature identifier, returned by GET_MODEL_DETAILS_NMF.

DM_NMF_Feature

This type represents a feature in a NMF model.

DM_NMF_Feature_Set

This type represents a set of features returned by GET_MODEL_DETAILS_NMF.

DM_Predicate

This type represents each predicate in the set of predicates in a rule.

DM_Predicates

This type represents a set of predicates that constitute either the antecedent or the consequent of a rule.

DM_Ranked_Attribute

This type represents an entry in the set of ranked attributes returned by GET_MODEL_DETAILS_AI, ranked by the attribute's importance.

DM_Ranked_Attributes

This type represents a list of ranked attributes returned by GET_MODEL_DETAILS_AI.

DM_Rule

This type represents each rule in a list of rules generated by either GET_ASSOCIATION_RULES or GET_MODEL_DETAILS_KM.

DM_Rules

This type represents rules retrieved for Association Rules or k-means models using GET_ASSOCIATION_RULES and GET_MODEL_DETAILS_KM respectively.

DM_SVM_Attribute

This type represents each attribute in an attribute set for SVM model details.

DM_SVM_Attribute_Set

This type represents a set of attributes returned by GET_MODEL_DETAILS_SVM for a linear model.

DM_SVM_Linear_Coeff

This type represents an entry in the set of linear coefficients returned by GET_MODEL_DETAILS_SVM.

DM_SVM_Linear_Coeff_Set

This type represents the set of linear coefficients returned by GET_MODEL_DETAILS_SVM for an SVM model built using the linear kernel.


Exceptions

Table 23-10 lists the errors generated by DBMS_DATA_MINING.

Table 23-10  DBMS DATA_MINING Errors Summary
Oracle Error Description

ORA-40201

Invalid input parameter %s.

ORA-40202

Column %s does not exist in the input table %s.

ORA-40203

Model %s does not exist.

ORA-40204

Model %s already exists.

ORA-40205

Invalid setting name %s.

ORA-40206

Invalid setting value for setting name %s.

ORA-40207

Duplicate or multiple function settings.

ORA-40208

Duplicate or multiple algorithm settings for function %s.

ORA-40209

Invalid setting: %s for function %s.

ORA-40211

Algorithm name: %s is invalid.

ORA-40212

Invalid target data type in input data for function: %s.

ORA-40213

Contradictory values for settings: %s, %s.

ORA-40214

Duplicate setting: %s.

ORA-40215

Model %s is incompatible with current operation.

ORA-40216

Feature not supported.

ORA-40219

Apply result table %s is incompatible with current operation.

ORA-40220

Maximum number of attributes exceeded.

ORA-40221

Maximum target cardinality exceeded.

ORA-40222

Data mining model export failed, job name=%s, error=%s.

ORA-40223

Data mining model import failed, job name=%s, error=%s.

ORA-40225

Model is currently in use by another process.

ORA-40251

No support vectors were found.

ORA-40252

No target values were found.

ORA-40253

No target counter examples were found.

ORA-40261

Input data for model build contains negative values.

ORA-40262

NMF: number of features not between [1, %s].

ORA-40271

No statistically significant features were found.

ORA-40272

Apply rules prohibited for this model mode.

ORA-40273

Invalid model type %s for Adaptive Bayes Network algorithm.




User Views

Table 23-11 lists the user views provided by Oracle to obtain information about the models generated using DBMS_DATA_MINING.

Table 23-11 DBMS_DATA_MINING Summary of User Views
User View Purpose

DM_USER_MODELS

Lists all models in a given user's schema.


Operational Notes

The development methodology for data mining using the DBMS_DATA_MINING interface is divided into two phases.

The first phase includes your application data analysis and design, where you perform the following two steps:

  1. Analyze your problem, and choose the mining function and algorithm.
  2. Analyze the data to be used for building mining models (build data), testing predictive models (test data), and the new data on which the model will be applied (scoring data).

The second phase involves developing a mining application using DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM packages.

  1. Prepare the build, test, and scoring data using the DBMS_DATA_MINING_TRANSFORM package, a third-party tool, or direct SQL or PL/SQL utility scripts, in a manner suitable for the chosen mining function and algorithm. An important caveat is that all three datasets must be prepared in an identical manner for mining results to be meaningful. This step is optional, required only if your data is not already prepared for mining.
  2. Prepare a settings table that overrides the default mining algorithm for a given mining function, and the default algorithm settings. This is also an optional step.
  3. Build a mining model for the given training dataset.
  4. For predictive models (classification and regression), test the model for its accuracy and measures of performance. This amounts to applying the model on the test data (that is, scoring the test data), and computing various metrics on the apply results.
  5. Retrieve the model signature to determine the mining attributes that will be used by a given model for scoring. This information will help ascertain that the scoring data is suitable for a given model. This is an optional step.
  6. Apply a classification, regression, clustering, or feature extraction model to new data to generate predictions and/or descriptive summaries and patterns about the data.
  7. Retrieve the model details to understand why a model scored the data in a particular manner. This is an optional step.
  8. Repeat steps 3 through 7 until you obtain satisfactory results.
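The core of this flow (building in step 3 and applying in step 6) can be sketched in PL/SQL as follows; all table, column, and model names here are placeholders:

BEGIN
-- Step 3: build a classification model using default settings
-- (omit settings_table_name to accept the defaults)
DBMS_DATA_MINING.CREATE_MODEL(
  model_name          => 'my_model',
  mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name     => 'my_build_data',
  case_id_column_name => 'case_id',
  target_column_name  => 'target_col');

-- Step 6: apply the model to new data; results are written
-- to the table named by result_table_name
DBMS_DATA_MINING.APPLY(
  model_name          => 'my_model',
  data_table_name     => 'my_score_data',
  case_id_column_name => 'case_id',
  result_table_name   => 'my_model_results');
END;
/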

See Oracle Data Mining Application Developer's Guide for more general discussion of the PL/SQL interface to Oracle Data Mining.

Settings Table

The settings table is a simple relational table with a fixed schema. You can choose the name of the settings table, but the column names and their types must be defined as specified:

(setting_name  VARCHAR2(30),
setting_value  VARCHAR2(128))

The values provided in the settings table override the default values assumed by the system. The values inserted into the setting_name column are one or more of several constants defined in the DBMS_DATA_MINING package. Depending on what the setting name denotes, the value for the setting_value column can be a predefined constant or the actual numerical or string value corresponding to the setting itself. The setting_value column is defined to be VARCHAR2, so you must cast numerical inputs to string using the TO_CHAR function before input into the settings table.
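For example, to override the default number of clusters (a numeric setting), cast the value with TO_CHAR before inserting it; the settings table name my_settings is illustrative:

INSERT INTO my_settings (setting_name, setting_value)
VALUES (DBMS_DATA_MINING.clus_num_clusters, TO_CHAR(20));
COMMIT;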

Table 23-12 through Table 23-17 list the various setting names and the valid setting values, with a brief explanation of each setting.

Table 23-12 DBMS_DATA_MINING Function Settings
Algorithm Settings Setting Value (with Permissible Value Ranges)

algo_name

Classification: One of:

algo_naive_bayes
algo_support_vector_machines
algo_adaptive_bayes_network

Regression:

algo_support_vector_machines

Association Rules:

algo_apriori_association_rules

Clustering:

algo_kmeans

Feature Extraction:

algo_nonnegative_matrix_factor

Attribute Importance:

algo_ai_mdl

clas_priors_table_name

VARCHAR2 string denoting the name of a relational table of fixed schema containing prior probabilities. The schema of this table is provided in "Prior Probabilities Table".

This input is applicable only for classification algorithms. The prior probabilities table must be present in the current user's schema.

clus_num_clusters

TO_CHAR(numeric_expr >= 1)

Number of clusters generated by a clustering algorithm.

Default value is 10.

feat_num_features

TO_CHAR(numeric_expr >= 1)

Number of features to be extracted.

Default value estimated from the data by the algorithm.

asso_max_rule_length

TO_CHAR(2 <= numeric_expr <= 20)

Maximum rule length for AR algorithm.

Default value is 4.

asso_min_confidence

TO_CHAR(0 <= numeric_expr <= 1)

Minimum confidence value for AR algorithm.

Default value is 0.1.

asso_min_support

TO_CHAR(0 <= numeric_expr <= 1)

Minimum support value for AR algorithm.

Default value is 0.1.

Table 23-13  Algorithm Settings for Adaptive Bayes Network
Setting Name Setting Value (with Permissible Value Ranges)

abns_model_type

Model type for Adaptive Bayes Network:

  • abns_single_feature
  • abns_multi_feature
  • abns_naive_bayes

Default value is abns_multi_feature.

abns_max_build_minutes

TO_CHAR(numeric_expr >= 0)

The maximum time threshold for completion of model build. Default value is 0, which implies no time limit.

abns_max_nb_predictors

TO_CHAR(numeric_expr > 0)

Maximum number of Naive Bayes predictors to be considered for model build, when the model type is chosen to be abns_naive_bayes. Default value is 10.

abns_max_predictors

TO_CHAR(numeric_expr > 0)

Default is 25.

Table 23-14  Algorithm Settings for Naive Bayes
Setting Name Setting Value (with Permissible Value Ranges)

nabs_singleton_threshold

TO_CHAR (0 <= numeric_expr <=1)

Value of singleton threshold for NB algorithm.

Default value is 0.01.

nabs_pairwise_threshold

TO_CHAR (0 <= numeric_expr <=1)

Value of pairwise threshold for NB algorithm.

Default value is 0.01.

Table 23-15  Algorithm Settings for Support Vector Machines
Setting Name Setting Value (with Permissible Value Ranges)

svms_kernel_function

Kernel for Support Vector Machines:

  • svms_linear (for Linear Kernel)
  • svms_gaussian (for Gaussian Kernel)

Default value is svms_linear.

svms_kernel_cache_size

TO_CHAR(numeric_expr > 0)

Value of kernel cache size for SVM algorithm. Applies to Gaussian kernel only.

Default value is 50000000 bytes.

svms_conv_tolerance

TO_CHAR(numeric_expr > 0)

Convergence tolerance for SVM algorithm.

Default value is 0.001.

svms_std_dev

TO_CHAR(numeric_expr > 0)

Value of standard deviation for SVM algorithm.

This is applicable only for Gaussian kernel.

Default value estimated from the data by the algorithm.

svms_complexity_factor

TO_CHAR(numeric_expr > 0)

Value of complexity factor for SVM algorithm.

Default value estimated from the data by the algorithm.

svms_epsilon

TO_CHAR(numeric_expr > 0)

Value of epsilon factor for SVM Regression.

Default value estimated from the data by the algorithm.

Table 23-16  Algorithm Settings for Non-Negative Matrix Factorization
Setting Name Setting Value (with Permissible Value Ranges)

nmfs_random_seed

TO_CHAR(numeric_expr)

Seed for random generator.

Default value is -1.

nmfs_num_iterations

TO_CHAR(1 <= numeric_expr <= 500)

Maximum number of iterations for NMF algorithm.

Default value is 50.

nmfs_conv_tolerance

TO_CHAR(0 < numeric_expr <= 0.5)

Convergence tolerance for NMF algorithm.

Default value is 0.05.

Table 23-17  Algorithm Settings for k-Means
Setting Name Setting Value (with Permissible Value Ranges)

kmns_distance

Distance Function for k-Means Clustering:

  • kmns_euclidean
  • kmns_cosine
  • kmns_fast_cosine

Default value is kmns_euclidean.

kmns_iterations

TO_CHAR(0 < numeric_expr <= 20)

Number of iterations for k-Means algorithm.

Default value is 3.

kmns_conv_tolerance

TO_CHAR(0 < numeric_expr <= 0.5)

Convergence tolerance for k-Means algorithm.

Default value is 0.01.

kmns_split_criterion

Split criterion for k-Means Clustering:

  • kmns_variance
  • kmns_size

Default value is kmns_variance.

kmns_num_bins

Number of histogram bins. Specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins with the exception of attributes with a single value that have only one bin.

Range > 0.

Default value is 10.

kmns_block_growth

TO_CHAR(1 < numeric_expr <= 5)

Growth factor for memory allocated to hold cluster data.

Default value is 2.

kmns_min_pct_attr_support

Minimum percentage support required for attributes in rules. Specifies the minimum percentage of values for an attribute in a given cluster required to include this attribute in the rule description of the cluster. That is, if the required support level is not met, the attribute would be omitted from the rule. This would allow retaining in the rule only the well-represented attributes. Setting the parameter value too high in data with missing values can result in very short or even empty rules.

Range >= 0 and <= 1.

Default is 0.1.

You can create a settings table as shown in the example that follows for an SVM classification model, and edit the individual values using SQL DML.

CREATE TABLE drugstore_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
-- override the default for convergence tolerance for SVM Classification
INSERT INTO drugstore_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_conv_tolerance, TO_CHAR(0.081));
COMMIT;
END;
/

The table function GET_DEFAULT_SETTINGS provides you all the default settings for mining functions and algorithms. If you intend to override all the default settings, you can create a seed settings table and edit them using SQL DML.

CREATE TABLE drug_store_settings AS
SELECT setting_name, setting_value
  FROM TABLE (DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'SVMS_%';
-- update the values using appropriate DML

You can also create a settings table based on another model's settings using GET_MODEL_SETTINGS, as shown in the following example:

CREATE TABLE my_new_model_settings AS
SELECT setting_name, setting_value
  FROM TABLE (DBMS_DATA_MINING.GET_MODEL_SETTINGS('my_other_model'));

Prior Probabilities Table

Consult Oracle Data Mining Concepts for an explanation of the prior probabilities table. You can specify a prior probabilities table as an optional function setting when building classification models.

You must create the prior probabilities table using the fixed schema shown in the following code samples. For numerical targets, use the following schema:

target_value      NUMBER
prior_probability NUMBER

For categorical targets, use the following schema:

target_value      VARCHAR2
prior_probability NUMBER
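For example, the census_priors table referenced in the settings example below could be created and populated as follows for a binary categorical target; the column width and probability values are illustrative:

CREATE TABLE census_priors (
  target_value      VARCHAR2(32),
  prior_probability NUMBER);

-- prior probabilities must reflect the expected class
-- distribution in the population being scored
INSERT INTO census_priors VALUES ('0', 0.7);
INSERT INTO census_priors VALUES ('1', 0.3);
COMMIT;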

Next, provide the name of the prior probabilities table as input to the setting_value column in the settings table, with the corresponding value for the setting_name column to be DBMS_DATA_MINING.clas_priors_table_name, as shown:

BEGIN
INSERT INTO drugstore_settings (setting_name, setting_value) 
VALUES (DBMS_DATA_MINING.clas_priors_table_name, 'census_priors');
COMMIT;
END;

Cost Matrix Table

Consult Oracle Data Mining Concepts for an explanation of the cost matrix. You must create a cost matrix table with the fixed schema shown in the following code samples. For numerical targets, use the following schema:

actual_target_value    NUMBER
predicted_target_value NUMBER
cost                   NUMBER

For categorical targets, use the following schema:

actual_target_value    VARCHAR2
predicted_target_value VARCHAR2
cost                   NUMBER
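As an illustration, a cost matrix for a binary categorical target that penalizes false negatives more heavily than false positives could be created as follows; the table name and cost values are assumptions:

CREATE TABLE census_cost_matrix (
  actual_target_value    VARCHAR2(32),
  predicted_target_value VARCHAR2(32),
  cost                   NUMBER);

-- correct predictions carry no cost
INSERT INTO census_cost_matrix VALUES ('0', '0', 0);
INSERT INTO census_cost_matrix VALUES ('1', '1', 0);
-- misclassifying an actual '1' as '0' is four times as costly
INSERT INTO census_cost_matrix VALUES ('1', '0', 4);
INSERT INTO census_cost_matrix VALUES ('0', '1', 1);
COMMIT;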

The DBMS_DATA_MINING package enables you to evaluate the cost of predictions from classification models in an iterative manner during the experimental phase of mining, and to eventually apply the optimal cost matrix to predictions on the actual scoring data in a production environment.

The data input to each COMPUTE procedure in the package is the result generated from applying the model on test data. In addition, if you also provide a cost matrix as an input, the COMPUTE procedure generates test results taking the cost matrix into account. This enables you to experiment with various costs for a given prediction against the same APPLY results, without rebuilding the model and applying it against the same test data for every iteration.

Once you arrive at an optimal cost matrix, you can then input this cost matrix to the RANK_APPLY procedure along with the results of APPLY on your scoring data. RANK_APPLY will provide your new data ranked by cost.


Summary of DBMS_DATA_MINING Subprograms

Table 23-18  DBMS_DATA_MINING Package Subprograms
Subprogram Purpose

APPLY Procedure

Applies a model to scoring data

CREATE_MODEL Procedure

Creates (builds) a mining model

COMPUTE_CONFUSION_MATRIX Procedure

Computes the confusion matrix from the APPLY results on test data for a classification model; also provides the accuracy of the model

COMPUTE_LIFT Procedure

Computes lift for a given positive target value from the APPLY results on test data for a classification model

COMPUTE_ROC Procedure

Computes Receiver Operating Characteristic for target attributes with binary class from the APPLY results on test data for a classification model

DROP_MODEL Procedure

Drops a model

EXPORT_MODEL Procedure

Exports one or more models from a schema

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model

GET_DEFAULT_SETTINGS Function

This table function returns all the default settings for all mining functions and algorithms

GET_FREQUENT_ITEMSETS Function

Returns a set of rows that represent the frequent itemsets from an Association model

GET_MODEL_DETAILS_ABN Function

Provides the details of an Adaptive Bayes Network model

GET_MODEL_DETAILS_KM Function

Provides the details of a k-Means model

GET_MODEL_DETAILS_NB Function

Provides the details of a Naive Bayes model

GET_MODEL_DETAILS_NMF Function

Provides the details of an NMF model

GET_MODEL_DETAILS_SVM Function

Provides the details of an SVM model with a linear kernel

GET_MODEL_SETTINGS Function

Provides the settings used to build a model

GET_MODEL_SIGNATURE Function

Provides the signature of a model

IMPORT_MODEL Procedure

Imports one or more models into the current schema

RANK_APPLY Procedure

Ranks the predictions from the APPLY results for a classification model

RENAME_MODEL Procedure

Renames a model


APPLY Procedure

This procedure applies a mining model to the data to be scored, and generates the APPLY results in a table. This operation is applicable for predictive models (classification, regression) and also for descriptive models (clustering, feature extraction).

Syntax

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-19  APPLY Procedure Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

data_table_name

Name of table or view representing data to be scored.

case_id_column_name

Name of the case identifier column.

result_table_name

Name of the table to store apply results.

data_schema_name

Name of the schema containing the data to be scored.

Usage Notes

The data provided for APPLY should match the data provided to CREATE_MODEL in terms of the schema definition and relevant content. The GET_MODEL_SIGNATURE function provides this information. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing. The case identifier is not considered to be a mining attribute during APPLY.

You must provide the name of the table in which the results of the apply operation are to be stored. APPLY creates a table with an algorithm-specific fixed schema in the user schema that owns the model.

The behavior of an APPLY operation is analogous to a SQL query operation, even though it is packaged as a procedure. It does not update the model contents and does not contend with CREATE_MODEL, DROP_MODEL, or RENAME_MODEL operations. The corollary is that if you drop or rename a model while it is being applied to scoring data, the APPLY operation may terminate with partial or unpredictable results.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will match the case identifier column name provided by you. The type of incoming case-id column is preserved in Apply output.

Classification Algorithms

The table containing the APPLY results for all classification models has the same schema. For numerical targets, the results table will have the schema as shown:

case_id      VARCHAR2/NUMBER 
prediction   NUMBER
probability  NUMBER

For categorical targets, the results table will have the following schema:

case_id      VARCHAR2/NUMBER 
prediction   VARCHAR2
probability  NUMBER

Regression using Support Vector Machines

The results table will have the following schema:

case_id     VARCHAR2/NUMBER 
prediction  NUMBER

Clustering using k-Means

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the cluster identifier corresponding to a case, and the associated probability. The results table will have the schema as shown:

case_id      VARCHAR2/NUMBER 
cluster_id   NUMBER
probability  NUMBER

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the feature identifier corresponding to a case, and the associated match quality. The results table will have the schema as shown:

case_id        VARCHAR2/NUMBER 
feature_id     NUMBER
match_quality  NUMBER

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on the
 * scoring data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_apply',
  case_id_column_name => 'person_id',
  result_table_name => 'census_apply_result');
END;
/

-- View Apply Results
SELECT case_id, prediction, probability
  FROM census_apply_result;

CREATE_MODEL Procedure

This procedure creates a mining model for a given mining function using a specified mining algorithm.

Syntax

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-20  CREATE_MODEL Procedure Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

mining_function

Constant representing the mining function.

data_table_name

Name of the table or view containing the training data.

case_id_column_name

Name of the case identifier column.

target_column_name

Name of the target column -- NULL for descriptive models.

settings_table_name

Name of the table or view containing algorithm settings.

data_schema_name

Name of the schema hosting the training data.

settings_schema_name

Name of the schema hosting the settings table/view.

Usage Notes

The data provided to all subsequent operations such as APPLY must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to subsequent operations such as APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing. The case identifier column is not considered to be a mining attribute during CREATE_MODEL.

You can view the default settings for each algorithm through GET_DEFAULT_SETTINGS. You can override the defaults by providing a settings table specifying your choice of mining algorithm and relevant overriding algorithm settings.

Once a model has been built, information about the attributes used for model build can be obtained from GET_MODEL_SIGNATURE. To inspect or review model contents, you can use any of the algorithm-specific GET_MODEL_DETAILS functions.
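
As a sketch of such an inspection (assuming a model named census_model, as in the examples in this chapter, and the attribute_name/attribute_type columns of the DM_Model_Signature return type), the model signature can be listed from SQL:

```sql
-- Hypothetical example: list the mining attributes used to build
-- census_model, as recorded in the model signature.
SELECT attribute_name, attribute_type
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SIGNATURE('census_model'))
 ORDER BY attribute_name;
```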

The behavior of CREATE_MODEL is analogous to that of a SQL DDL CREATE operation. It contends with RENAME_MODEL and DROP_MODEL operations on the same model.

Note: The CREATE_MODEL operation creates a set of system tables in the owner's schema to store the patterns and information that constitute a mining model for a particular algorithm. The names of these tables have the prefix DM$. The number, schema, and content of these tables are Oracle proprietary and may change from release to release. You must not direct any queries or updates against these system tables.

Examples

Assume that you need to build a classification model using the Support Vector Machines algorithm.

/* prepare a settings table to override default
 * settings (Naïve Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);

/* override the default value for complexity factor */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_complexity_factor, TO_CHAR(0.081));
COMMIT;

/* build a model with name census_model */
dbms_data_mining.create_model(
  model_name => 'census_model',
  mining_function => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name => 'census_2d_build',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  settings_table_name => 'census_settings');
END;
/

COMPUTE_CONFUSION_MATRIX Procedure

This procedure computes the confusion matrix for a classification model and also provides the accuracy of the model. See Oracle Data Mining Concepts for a description of confusion matrix.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy                     OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      confusion_matrix_table_name  IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      cost_matrix_table_name       IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-21  COMPUTE_CONFUSION_MATRIX Procedure Parameters
Parameter Description

accuracy

Accuracy of the model.

apply_result_table_name

Name of the table or view containing the results of an APPLY operation on the test dataset (see Usage Notes).

target_table_name

Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).

case_id_column_name

Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.

target_column_name

Name of the target column.

confusion_matrix_table_name

Name of the table into which the confusion matrix is to be generated.

score_column_name

Name of the column representing the score from the apply results table. In the fixed schema table generated by APPLY, this column has the name PREDICTION, which is the default.

score_criterion_column_name

Name of the column representing the ranking factor for the score from the apply results table. In the fixed schema table generated by APPLY for classification models, this column has the name PROBABILITY, which is the default. Values in this column must be represented numerically.

cost_matrix_table_name

Name of the fixed-schema cost matrix table.

apply_result_schema_name

Name of the schema hosting the APPLY results table.

target_schema_name

Name of the schema hosting the targets table.

cost_matrix_schema_name

Name of the schema hosting the cost matrix table.

Usage Notes

You can also provide a cost matrix as an optional input in order to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_CONFUSION_MATRIX do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the confusion matrix and accuracy as outputs. The quality of the results is dependent on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_CONFUSION_MATRIX procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_CONFUSION_MATRIX procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The schema for this view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

The schema for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            VARCHAR2)

You must provide the name of the table in which the confusion matrix is to be generated. The resulting fixed schema table will always be created in the schema owning the model.

For numerical target attributes, the confusion matrix table will have the schema:

(actual_target_value    NUMBER,
predicted_target_value  NUMBER,
value                   NUMBER)

For categorical target attributes, the confusion matrix table will have the schema:

actual_target_value     VARCHAR2,
predicted_target_value  VARCHAR2,
value                   NUMBER
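
Once generated, the confusion matrix table can be queried directly. The following sketch recomputes overall accuracy as the fraction of cases on the diagonal; it assumes a categorical target and the confusion matrix table census_confusion_matrix from the example below.

```sql
-- Hypothetical example: recompute accuracy from the confusion matrix
-- as (correctly predicted cases) / (all cases).
SELECT SUM(CASE WHEN actual_target_value = predicted_target_value
                THEN value ELSE 0 END) / SUM(value) accuracy
  FROM census_confusion_matrix;
```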

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test',
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');
/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute the confusion matrix from the two
 * data streams, also providing a cost matrix as input.
 */
dbms_data_mining.compute_confusion_matrix (
  accuracy => v_accuracy,
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  confusion_matrix_table_name => 'census_confusion_matrix',
  cost_matrix_table_name => 'census_cost_matrix');
dbms_output.put_line('Accuracy of the model: ' || v_accuracy);
END;
/

-- View the confusion matrix using Oracle SQL
SELECT actual_target_value, predicted_target_value, value
  FROM census_confusion_matrix;

COMPUTE_LIFT Procedure

This procedure computes a lift table for a given positive target for a classification model. See Oracle Data Mining Concepts for a description of lift.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_LIFT (
      apply_result_table_name      IN VARCHAR2,
      target_table_name            IN VARCHAR2,
      case_id_column_name          IN VARCHAR2,
      target_column_name           IN VARCHAR2,
      lift_table_name              IN VARCHAR2,
      positive_target_value        IN VARCHAR2,
      score_column_name            IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN VARCHAR2 DEFAULT 'PROBABILITY',
      num_quantiles                IN NUMBER DEFAULT 10,
      cost_matrix_table_name       IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN VARCHAR2 DEFAULT NULL,
      target_schema_name           IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-22  COMPUTE_LIFT Procedure Parameters
Parameter Description

apply_result_table_name

Name of the table or view containing the results of an APPLY operation on the test dataset (see Usage Notes).

target_table_name

Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).

case_id_column_name

Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.

target_column_name

Name of the target column.

lift_table_name

Name of the table into which the lift table is to be generated.

positive_target_value

Value of the positive target. If the target column is of NUMBER type, use the TO_CHAR() operator to provide the value as a string.

score_column_name

Name of the column representing the score in the apply results table. In the fixed schema table generated by APPLY, this column has the name PREDICTION, which is the default.

score_criterion_column_name

Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by APPLY for classification models, this column has the name PROBABILITY, which is the default. Values in this column must be represented numerically.

num_quantiles

Number of quantiles required in the lift table.

cost_matrix_table_name

Name of the cost matrix table.

apply_result_schema_name

Name of the schema hosting the APPLY results table.

target_schema_name

Name of the schema hosting the targets table.

cost_matrix_schema_name

Name of the schema hosting the cost matrix table.

Usage Notes

You can also provide a cost matrix as an optional input to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_LIFT do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the lift table as output. The quality of the results depends on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_LIFT procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_LIFT procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The schema for this view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name             NUMBER)

The schema for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name            VARCHAR2)

You must provide the name of the table in which the lift table is to be generated. The resulting fixed schema table is always created in the schema that owns the model.

The resulting lift table will have the following schema:

(quantile_number               NUMBER,
 quantile_total_count          NUMBER,
 quantile_target_count         NUMBER,
 percent_records_cumulative    NUMBER,
 lift_cumulative               NUMBER,
 target_density_cumulative     NUMBER,
 targets_cumulative            NUMBER,
 non_targets_cumulative        NUMBER,
 lift_quantile                 NUMBER,
 target_density                NUMBER)

The output columns are explained in Oracle Data Mining Concepts.
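
For example, cumulative lift can be examined quantile by quantile with a query along these lines (a sketch, assuming the lift table census_lift generated in the example below):

```sql
-- Hypothetical example: inspect cumulative lift and target density
-- per quantile from the generated lift table.
SELECT quantile_number, lift_cumulative, target_density_cumulative
  FROM census_lift
 ORDER BY quantile_number;
```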

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
BEGIN

/* apply the model census_model on test data */
dbms_data_mining.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test',
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute lift with the default 10 quantiles
 * from the two data streams
 */
dbms_data_mining.compute_lift (
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  lift_table_name => 'census_lift',
  positive_target_value => '1',
  cost_matrix_table_name => 'census_cost_matrix');
END;
/

-- View the lift table contents using SQL
SELECT *
  FROM census_lift;

COMPUTE_ROC Procedure

This procedure computes the receiver operating characteristic (ROC) for a binary classification model. See Oracle Data Mining Concepts for a description of receiver operating characteristic.

The inputs are a table containing the results of applying the model on the test data, and a table that contains only the target and case identifier columns from the test data.

Syntax

DBMS_DATA_MINING.COMPUTE_ROC (
      roc_area_under_curve         OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      roc_table_name               IN  VARCHAR2,
      positive_target_value        IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-23  COMPUTE_ROC Procedure Parameters
Parameter Description

roc_area_under_curve

A measure of model accuracy, specifically, the probability that the model will correctly rank a randomly chosen pair of rows of opposite classes.

apply_result_table_name

Name of the table or view containing the results of an APPLY operation on the test dataset (see Usage Notes).

target_table_name

Name of the table or view containing only the case identifier column and target column values (see Usage Notes for required schema specification).

case_id_column_name

Name of the case identifier column in the test data set.

target_column_name

Name of the target column.

roc_table_name

Name of the table into which ROC results are to be generated.

positive_target_value

Value of the positive target. If the target column is of NUMBER type, use the TO_CHAR() operator to provide the value as a string.

score_column_name

Name of the column representing the score in the apply results table. In the fixed schema table generated by APPLY, this column has the name PREDICTION, which is the default.

score_criterion_column_name

Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by APPLY for classification models, this column has the name PROBABILITY, which is the default. Values in this column must be represented numerically.

apply_result_schema_name

Name of the schema hosting the APPLY results table.

target_schema_name

Name of the schema hosting the targets table.

Usage Notes

It is important to note that the data inputs to COMPUTE_ROC do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the ROC table as output. The quality of the results depends on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_ROC procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_ROC procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The schema for this view or table name for a numerical target attribute is:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           NUMBER

The schema for this view or table name for a categorical target attribute is:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           VARCHAR2

You must provide the name of the table in which the ROC table is to be generated. The resulting fixed schema table will always be created in the schema that owns the model. The resulting ROC table will have the following schema:

(probability              NUMBER,
 true_positives           NUMBER,
 false_negatives          NUMBER,
 false_positives          NUMBER,
 true_negatives           NUMBER,
 true_positive_fraction   NUMBER,
 false_positive_fraction  NUMBER)

The output columns are explained in Table 23-24.

Table 23-24  COMPUTE_ROC Output
Output Column Description

probability

Minimum predicted positive class probability resulting in a positive class prediction. Different threshold values result in different hit rates (true_positive_fraction) and false alarm rates (false_positive_fraction).

true_negatives

Negative cases in the test data with predicted probabilities below the probability_threshold (correctly predicted).

true_positives

Positive cases in the test data with predicted probabilities above the probability_threshold (correctly predicted).

false_negatives

Positive cases in the test data with predicted probabilities below the probability_threshold (incorrectly predicted).

false_positives

Negative cases in the test data with predicted probabilities above the probability_threshold (incorrectly predicted).

true_positive_fraction

true_positives/(true_positives + false_negatives)

false_positive_fraction

false_positives/(false_positives + true_negatives)

The typical use scenario is to examine the true_positive_fraction and false_positive_fraction to determine the most desirable probability_threshold. This threshold is then used to predict class values in subsequent apply operations. For example, to identify positively predicted cases in probability rank order from an apply result table, given a probability_threshold:

select case_id_column_name from apply_result_table_name where probability > 
probability_threshold order by probability DESC;

There are two ways to identify the most desirable probability_threshold. One applies when the relative costs of positive class versus negative class prediction errors are known to the user; the other applies when such costs are not well known. In the first case, you can apply the relative costs to the ROC table to compute the minimum-cost probability_threshold. Suppose the relative cost ratio is Positive Class Error Cost / Negative Class Error Cost = 20. Then execute a query like:

WITH
cost AS (
  SELECT probability probability_threshold,
         20 * false_negatives + false_positives cost
    FROM ROC_table),
minCost AS (
  SELECT MIN(cost) minCost
    FROM cost)
SELECT MAX(probability_threshold) probability_threshold
  FROM cost, minCost
 WHERE cost = minCost;

If relative costs are not well known, the user simply scans the values in the table (in sorted order) and makes a determination about which of the displayed trade-offs (misclassified positives versus misclassified negatives) is most desirable:

select * from ROC_table order by probability;

Examples

Assume that you have built a classification model census_model using the SVM algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.apply(
  model_name => 'census_model',
  data_table_name => 'census_2d_test',
  case_id_column_name => 'person_id',
  result_table_name => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute the receiver operating characteristics from
 * the two data streams
 */
DBMS_DATA_MINING.compute_roc (
  roc_area_under_curve => v_accuracy,
  apply_result_table_name => 'census_test_result',
  target_table_name => 'census_2d_test_view',
  case_id_column_name => 'person_id',
  target_column_name => 'class',
  roc_table_name => 'census_roc',
  positive_target_value => '1');
END;
/

-- View the ROC results using Oracle SQL
SELECT *
  FROM census_roc;

DROP_MODEL Procedure

This procedure drops an existing mining model from the user's schema.

Syntax

DBMS_DATA_MINING.DROP_MODEL (model_name IN VARCHAR2);

Parameters

Table 23-25  DROP_MODEL Procedure Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Usage Notes

You can use DROP_MODEL to drop an existing mining model.

The behavior of DROP_MODEL is similar to that of a SQL DDL DROP operation. It blocks RENAME_MODEL and CREATE_MODEL operations. It neither blocks nor is blocked by APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to drop the model during that time, the DROP will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on Query operations.

Examples

Assume the existence of a model census_model. The following example shows how to drop this model.

BEGIN
  DBMS_DATA_MINING.drop_model(model_name => 'census_model');
END;
/

EXPORT_MODEL Procedure

This procedure exports specified data mining models into a dump file set.

Syntax

DBMS_DATA_MINING.EXPORT_MODEL (
      filename          IN VARCHAR2,
      directory         IN VARCHAR2,
      model_filter      IN VARCHAR2 DEFAULT NULL,
      filesize          IN VARCHAR2 DEFAULT NULL,
      operation         IN VARCHAR2 DEFAULT NULL,
      remote_link       IN VARCHAR2 DEFAULT NULL,
      jobname           IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-26  EXPORT_MODEL Procedure Parameters
Parameter Description

filename

Name of the dump file set; it must be unique within the directory. A dump file set may contain one or more files. The number of files in the set is determined by the size of the exported data and the specified file size (see filesize). If only one file is created, it is named filename01.dmp. If multiple files are created, they are named sequentially as filename01.dmp, filename02.dmp, and so forth.

directory

Name of the directory object where the dump file set is to be located. The directory object must be created before export. You must have WRITE privileges on the directory object and the corresponding file system directory.

model_filter

Filter that specifies models to be exported. See Usage Notes for details.

filesize

Size of the dump file; may be specified in bytes, kilobytes (K), megabytes (M), or gigabytes (G). Defaults to 50 MB.

operation

One of the literals 'EXPORT' or 'ESTIMATE'; defaults to 'EXPORT'.

remote_link

A valid database link or NULL. When a database link is provided, data mining models can be exported from a remote database. Defaults to NULL for local operations, that is, exporting models from the current database.

jobname

Name of the export job, limited to 30 characters. It must be unique in the schema; that is, no active job may be using the same name. If not supplied, a default name of the form username_exp_nnnn is provided, for example, SCOTT_exp_13410. A log file named jobname.log is created in the dump directory.

Usage Notes

Use EXPORT_MODEL to export all or specific data mining models from the source. This procedure creates a dump file set that includes one or more files. The location of the dump files is specified by parameter directory, which is the name of a directory object created before this procedure is called. The user must have WRITE privileges on this directory object. The dump file name must be unique. When the export operation completes successfully, the dump file name is automatically expanded to filename01.dmp even if there is only one file in the dump set.

A log file is created for every successful export operation in the directory mapped by directory. If jobname is specified, the log is named jobname.log. If jobname is set to NULL (the default), the log is named as USERNAME_exp_nnnn.log, where nnnn is a number. If jobname is provided, it must be unique. Parameter model_filter is used to specify models to be exported; its use is indicated in the following table:

Table 23-27  Parameter MODEL_FILTER for Export
Value Meaning

NULL

Default. Export all models from the user schema.

ALL

Export all models from the user schema. Same as NULL; useful to improve code readability.

A valid WHERE clause

Specifies which models are to be exported. Any string that can be appended to "select name from dm_user_models where" to form a valid SQL query. Examples:

   'name = ''super_model'''
   -- export the named model

   'name IN (''model1'',''model2'',''model3'')'
   -- export the three named models

   'ALGORITHM_NAME = ''NAIVE_BAYES'''
   -- export all Naive Bayes models

   'FUNCTION_NAME = ''CLASSIFICATION'''
   -- export all classification models

   'name LIKE ''NB\_MODEL%'' ESCAPE ''\'''
   -- export all models whose names start with NB_MODEL

EXPORT_MODEL is not mutually exclusive with DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. For example, if an export operation is underway and the model is dropped at that time, the results in the exported dump file are unpredictable.
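
Because model_filter is appended to a query against the dm_user_models view, you can preview which models a given filter would select before running the export. A sketch (the WHERE clause shown is one of the example filters from the table above):

```sql
-- Hypothetical example: preview the models that the filter
-- 'ALGORITHM_NAME = ''NAIVE_BAYES''' would export.
SELECT name
  FROM dm_user_models
 WHERE algorithm_name = 'NAIVE_BAYES';
```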

Examples

The following example shows exporting one model from the current user schema into a dump file. The dump file will be located in operating system directory /home/models, which is mapped to a directory object DM_DUMP. The directory object DM_DUMP must be created before executing the sample, and the user must be granted WRITE privileges on it.

DECLARE
  job_name                              VARCHAR2(32) := 'model_exp_001';
  filename                              VARCHAR2(32);
BEGIN     
  filename := job_name;
  DBMS_DATA_MINING.export_model(
                      filename => filename,
                      directory => 'DM_DUMP',
                      model_filter => 'name=''NB_MODEL_01''',
                      remote_link => NULL,
                      filesize => '30M',
                      operation => 'EXPORT',
                      jobname => job_name);
  dbms_output.put_line(
                      'Export_model '||job_name||' completed successfully!');
END;
/

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model. The rows are an enumeration of the rules generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_ASSOCIATION_RULES (
   model_name            IN VARCHAR2)
 RETURN DM_Rules pipelined;

Parameters

Table 23-28  GET_ASSOCIATION_RULES Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-29  GET_ASSOCIATION_RULES Function Return Values
Return Value Description

DM_Rules

Represents a set of rows with schema:

(rule_id          INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER)

DM_Predicates

Is a nested table with schema:

(attribute_name         VARCHAR2(30),
conditional_operator    CHAR(2),
attribute_num_value     NUMBER,
attribute_str_value     VARCHAR2(4000),
attribute_support       NUMBER,
attribute_confidence    NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

rule_id           INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER

DM_Predicates is a collection of DM_Predicate objects. When un-nested, each object maps to a row of the form:

attribute_name         VARCHAR2(30),
conditional_operator   CHAR(2),
attribute_num_value    NUMBER,
attribute_str_value    VARCHAR2,
attribute_support      NUMBER,
attribute_confidence   NUMBER
     

The significance of piped output is that each row is materialized by the table function as soon as it is read from model storage, without any latency or wait for the generation of the complete DM_Rules object. All GET operations use pipelining. For more information on pipelined, parallel table functions, consult the PL/SQL User's Guide and Reference.

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates an Association model build followed by an invocation of the GET_ASSOCIATION_RULES table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';

BEGIN
-- update the value of the minimum confidence
UPDATE market_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build an AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  mining_function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'market_settings');
END;
/

-- View the (unformatted) rules from SQL*Plus
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model'));

-- see ardemo.sql for retrieving formatted rules
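The antecedent and consequent columns are nested DM_Predicates tables, and can be expanded with correlated TABLE expressions. The following query is a sketch of this technique, assuming the market_model built above:

-- un-nest the antecedent predicates of each rule
SELECT r.rule_id,
       p.attribute_name,
       p.conditional_operator,
       p.attribute_str_value,
       p.attribute_support
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model')) r,
       TABLE(r.antecedent) p
 ORDER BY r.rule_id;

The same pattern applies to the consequent column. See ardemo.sql for fully formatted output.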

GET_DEFAULT_SETTINGS Function

This table function returns all the default settings for all mining functions and algorithms supported in the DBMS_DATA_MINING package.

Syntax

DBMS_DATA_MINING.GET_DEFAULT_SETTINGS
 RETURN DM_Model_Settings pipelined;

Return Values

Table 23-30  GET_DEFAULT_SETTINGS Function Return Values
Return Value Description

DM_Model_Settings

Represents a set of rows with schema:

(setting_name VARCHAR2(30),
setting_value  VARCHAR2(128))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(setting_name VARCHAR2(30),
setting_value  VARCHAR2(128))

This function is particularly useful if you do not know what settings are associated with a particular function or algorithm, and you want to override some or all of them.

Examples

For example, if you want to override some or all of the default k-Means clustering settings, you can create a settings table as shown, and then update individual settings as required.


-- create a settings table from the defaults
-- (CREATE TABLE is DDL and must run outside the PL/SQL block)
CREATE TABLE mysettings AS
SELECT * 
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'KMNS%';

-- now update individual settings as required
UPDATE mysettings
   SET setting_value = TO_CHAR(0.02)
 WHERE setting_name = dbms_data_mining.kmns_min_pct_attr_support;

GET_FREQUENT_ITEMSETS Function

This table function returns a set of rows that represent the frequent itemsets from an Association model. The rows are an enumeration of the frequent itemsets generated during the creation of the model. For a detailed description of frequent itemsets, consult Oracle Data Mining Concepts.

Syntax

DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS (
    model_name        IN VARCHAR2)
  RETURN DM_ItemSets pipelined;

Parameters

Table 23-31  GET_FREQUENT_ITEMSETS Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-32  GET_FREQUENT_ITEMSETS Function Return Values
Return Value Description

DM_ItemSets

Represents a set of rows with schema:

(itemsets_id      NUMBER,
items             DM_items,
support           NUMBER,
number_of_items   NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(itemsets_id      NUMBER,
items             DM_items,
support           NUMBER,
number_of_items   NUMBER)

DM_Items is a nested table of VARCHAR2 strings representing individual item names.

The examples shown in this section describe how to un-nest the values from each of the columns discussed in the preceding sections.

Examples

The following example demonstrates an Association model build followed by an invocation of the GET_FREQUENT_ITEMSETS table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';

BEGIN
-- update the value of the minimum confidence
UPDATE market_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build an AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  mining_function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'market_settings');
END;
/

-- View the (unformatted) itemsets from SQL*Plus
SELECT itemsets_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model'));
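Because items is a nested DM_Items table, each itemset can be expanded into its individual items with a correlated TABLE expression. The following query is a sketch, assuming the market_model built above; COLUMN_VALUE is the name Oracle gives to the scalar elements of a nested table:

-- un-nest the items of each frequent itemset
SELECT f.itemsets_id,
       i.column_value AS item_name,
       f.support,
       f.number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model')) f,
       TABLE(f.items) i
 ORDER BY f.itemsets_id;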

GET_MODEL_DETAILS_ABN Function

This table function returns a set of rows that provide the details of an Adaptive Bayes Network model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN (
    model_name         IN VARCHAR2)
  RETURN DM_ABN_Details pipelined;

Parameters

Table 23-33  GET_MODEL_DETAILS_ABN Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-34  GET_MODEL_DETAILS_ABN Function Return Values
Return Value Description

DM_ABN_Details

Represents a set of rows with schema:

(attribute_name       VARCHAR2(30),
attribute_num_value   NUMBER,
attribute_str_value   VARCHAR2(4000),
probability           NUMBER,
conditionals          DM_Conditionals)

DM_Conditionals

Represents a set of rows DM_Conditional with schema:

(attribute_name           VARCHAR2(30),
attribute_num_value       NUMBER,
attribute_str_value       VARCHAR2(4000),
conditional_probability   NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS.

Usage Notes

The table function pipes out rows with the schema:

(attribute_name       VARCHAR2(30),
attribute_num_value   NUMBER,
attribute_str_value   VARCHAR2(4000),
probability           NUMBER,
conditionals          DM_Conditionals)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

This function returns details only for a single-feature ABN model.

Examples

The following example demonstrates an ABN model build followed by an invocation of the GET_MODEL_DETAILS_ABN table function from Oracle SQL.

-- prepare a settings table to override the default algorithm
-- (CREATE TABLE is DDL and must run outside the PL/SQL block)
CREATE TABLE abn_settings (
  setting_name   VARCHAR2(30),
  setting_value  VARCHAR2(128));

INSERT INTO abn_settings VALUES (dbms_data_mining.algo_name,
  dbms_data_mining.algo_adaptive_bayes_network);
COMMIT;

BEGIN
  -- create a model
  DBMS_DATA_MINING.CREATE_MODEL (
    model_name => 'abn_model',
    mining_function => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name => 'abn_build',
    case_id_column_name => 'id',
    target_column_name => 'class',  -- classification requires a target;
                                    -- substitute the target column of abn_build
    settings_table_name => 'abn_settings');
END;
/
-- View the (unformatted) results from SQL*Plus
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN('abn_model'));
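To inspect the conditional probabilities, un-nest the DM_Conditionals column with a correlated TABLE expression. The following query is a sketch against the abn_model built above:

-- un-nest the conditionals of each attribute
SELECT d.attribute_name,
       d.probability,
       c.attribute_name AS cond_attribute,
       c.conditional_probability
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN('abn_model')) d,
       TABLE(d.conditionals) c;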



GET_MODEL_DETAILS_KM Function

This table function returns a set of rows that provide the details of a k-Means clustering model. The rows are an enumeration of the clustering patterns generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_KM (
  model_name         IN VARCHAR2)
RETURN DM_Clusters pipelined;

Parameters

Table 23-35  GET_MODEL_DETAILS_KM Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-36  GET_MODEL_DETAILS_KM Function Return Values
Return Value Description

DM_Clusters

Represents a set of rows of DM_Cluster with schema:

(id               INTEGER,
record_count      NUMBER,
parent            NUMBER,
tree_level        NUMBER,
dispersion        NUMBER,
child             DM_Children,
centroid          DM_Centroids,
histogram         DM_Histograms,
rule              DM_Rule)

DM_Children

Is a nested table of DM_Child with schema:

(id               NUMBER)

DM_Centroids

Is a nested table of DM_Centroid with schema:

(attribute_name   VARCHAR2(30),
mean              NUMBER,
mode_value        VARCHAR2(30),
variance          NUMBER)

DM_Histograms

Is a nested table of DM_Histogram_bin with schema:

(attribute_name   VARCHAR2(30),
bin_id            NUMBER,
lower_bound       NUMBER,
upper_bound       NUMBER,
label             VARCHAR2(4000),
count             NUMBER)

DM_Rule

Is an object with schema:

(rule_id         INTEGER,
antecedent       DM_Predicates,
consequent       DM_Predicates,
rule_support     NUMBER,
rule_confidence  NUMBER)

DM_Predicates

Is a nested table of DM_Predicate with schema:

(attribute_name       VARCHAR2(30),
conditional_operator  CHAR(2),
attribute_num_value   NUMBER,
attribute_str_value   VARCHAR2(4000),
attribute_support     NUMBER,
attribute_confidence  NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(id               INTEGER,
record_count      NUMBER,
parent            NUMBER,
tree_level        NUMBER,
dispersion        NUMBER,
child             DM_Children,
centroid          DM_Centroids,
histogram         DM_Histograms,
rule              DM_Rule)

Each DM_Children value is a nested table of ids.

Each DM_Centroids value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30),
mean              NUMBER,
mode_value        VARCHAR2(30),
variance          NUMBER)

Each DM_Histograms value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30),
bin_id            NUMBER,
lower_bound       NUMBER,
upper_bound       NUMBER,
label             VARCHAR2(4000),
count             NUMBER)

Each DM_Rule value is an object with two nested DM_Predicate columns:

(rule_id          INTEGER,
antecedent        DM_Predicates,
consequent        DM_Predicates,
rule_support      NUMBER,
rule_confidence   NUMBER)

DM_Predicates is a collection of DM_Predicate objects. When un-nested, each object maps to a row of the form:

(attribute_name        VARCHAR2(30),
conditional_operator   CHAR(2) ,
attribute_num_value    NUMBER,
attribute_str_value    VARCHAR2(4000),
attribute_support      NUMBER,
attribute_confidence   NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates a k-Means clustering model build followed by an invocation of the GET_MODEL_DETAILS_KM table function from Oracle SQL.

BEGIN
-- update the settings table created from the defaults
-- (see GET_DEFAULT_SETTINGS)
UPDATE cluster_settings
   SET setting_value = TO_CHAR(3)
 WHERE setting_name = DBMS_DATA_MINING.kmns_block_growth;

/* build a k-Means clustering model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'eight_clouds',
  mining_function => DBMS_DATA_MINING.CLUSTERING,
  data_table_name => 'eight_clouds_build',
  case_id_column_name => 'id',
  target_column_name => NULL,
  settings_table_name => 'cluster_settings');
END;
/

-- View the (unformatted) rules from SQL*Plus
SELECT id, record_count, parent, tree_level, dispersion,
       child, centroid, histogram, rule
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('eight_clouds'));
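The nested columns can be un-nested individually with correlated TABLE expressions. For example, the following sketch retrieves the centroid of each cluster from the eight_clouds model built above:

-- un-nest the centroid attributes of each cluster
SELECT t.id,
       c.attribute_name,
       c.mean,
       c.mode_value,
       c.variance
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_KM('eight_clouds')) t,
       TABLE(t.centroid) c
 ORDER BY t.id, c.attribute_name;

The histogram column can be un-nested the same way to obtain the per-attribute histogram bins.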

GET_MODEL_DETAILS_NB Function

This table function returns a set of rows that provide the details of a Naive Bayes model. The rows are an enumeration of the patterns generated during the creation of the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NB (
   model_name      IN       VARCHAR2)
 RETURN DM_NB_Details pipelined;

Parameters

Table 23-37  GET_MODEL_DETAILS_NB Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-38  GET_MODEL_DETAILS_NB Function Return Values
Return Value Description

DM_NB_Details

Represents a set of rows DM_NB_Detail with schema:

(target_attr_name       VARCHAR2(30),
target_attr_num_value   NUMBER,
target_attr_str_value   VARCHAR2(4000),
prior_probability       NUMBER,
conditionals            DM_Conditionals)

DM_Conditionals

Represents a set of rows DM_Conditional with schema:

(attribute_name          VARCHAR2(30),
attribute_num_value      NUMBER,
attribute_str_value      VARCHAR2(4000),
conditional_probability  NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(target_attr_name       VARCHAR2(30),
target_attr_num_value   NUMBER,
target_attr_str_value   VARCHAR2(4000),
prior_probability       NUMBER,
conditionals            DM_Conditionals)

Each DM_Conditionals value is itself a nested table that, when un-nested, can return rows of type DM_Conditional, of the form:

(attribute_name          VARCHAR2(30),
attribute_num_value      NUMBER,
attribute_str_value      VARCHAR2(4000),
conditional_probability  NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details as shown in this example.

-- You can view the Naive Bayes model details in many ways
-- Consult the Oracle Application Developer's Guide -
-- Object-Relational Features for different ways of 
-- accessing Oracle Objects.

-- View the (unformatted) details from SQL*Plus
SELECT target_attr_name, target_attr_num_value, target_attr_str_value,
       prior_probability, conditionals
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('census_model'));

See nbdemo.sql for generation of formatted rules.
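To expand the conditionals column, use a correlated TABLE expression. The following query is a sketch against census_model:

-- un-nest the conditional probabilities for each target value
SELECT d.target_attr_name,
       d.target_attr_str_value,
       d.prior_probability,
       c.attribute_name,
       c.conditional_probability
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('census_model')) d,
       TABLE(d.conditionals) c;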


GET_MODEL_DETAILS_NMF Function

This table function returns a set of rows that provide the details of a Non-Negative Matrix Factorization model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF (
   model_name        IN        VARCHAR2)
 RETURN DM_NMF_Details pipelined;

Parameters

Table 23-39  GET_MODEL_DETAILS_NMF Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-40  GET_MODEL_DETAILS_NMF Function Return Values
Return Value Description

DM_NMF_Feature_set

Represents a set of rows of DM_NMF_Feature with schema:

(feature_id     INTEGER,
attribute_set   DM_NMF_Attribute_Set)

DM_NMF_Attribute_Set

Is a nested table of DM_NMF_Attribute with schema:

(attribute_name    VARCHAR2,
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

The table function pipes out rows with the schema:

(feature_id      INTEGER,
attribute_set    DM_NMF_Attribute_Set)

Each DM_NMF_Attribute_Set value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name    VARCHAR2,
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

Assume you have built an NMF model called my_nmf_model. You can retrieve model details as shown:

-- View the (unformatted) details from SQL*Plus
SELECT feature_id, attribute_set 
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF(
        'my_nmf_model'));
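To view the individual attribute coefficients of each feature, un-nest attribute_set. The following sketch orders the attributes of each feature by the magnitude of their coefficients:

-- un-nest the attribute coefficients of each feature
SELECT f.feature_id,
       a.attribute_name,
       a.attribute_value,
       a.coefficient
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('my_nmf_model')) f,
       TABLE(f.attribute_set) a
 ORDER BY f.feature_id, ABS(a.coefficient) DESC;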

GET_MODEL_DETAILS_SVM Function

This table function returns a set of rows that provide the details of a Support Vector Machines model. This is applicable only for classification or regression models built using a linear kernel. For any other kernel, the table function raises error ORA-40215.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM (
  model_name      IN       VARCHAR2)
 RETURN DM_SVM_Linear_Coeff pipelined;

Parameters

Table 23-41  GET_MODEL_DETAILS_SVM Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-42  GET_MODEL_DETAILS_SVM Function Return Values
Return Value Description

DM_SVM_Linear_Coeff_Set

Represents a set of rows of DM_SVM_Linear_Coeff with schema:

(class             VARCHAR2(4000),
attribute_set     DM_SVM_Attribute_Set)

DM_SVM_Attribute_Set

Is a nested table of DM_SVM_Attribute with schema:

(attribute_name    VARCHAR2(30),
attribute_value    VARCHAR2(4000),
coefficient        NUMBER)

Usage Notes

The table function pipes out rows with the schema:

(class           VARCHAR2(4000), 
attribute_set    DM_SVM_Attribute_Set)

class represents classification target values. For regression targets, class is NULL. One row is returned for each target class of a classification model, and a single row for a regression model. In each row, the DM_SVM_Attribute_Set value is itself a nested table that, when un-nested, can return rows of the form:

(attribute_name   VARCHAR2(30),
attribute_value   VARCHAR2(4000),
coefficient       NUMBER)

The examples shown in this section describe how to un-nest the values from each of the columns discussed earlier.

Examples

The following example demonstrates an SVM model build followed by an invocation of the GET_MODEL_DETAILS_SVM table function from Oracle SQL:

 -- Create SVM model
BEGIN
  dbms_data_mining.create_model(
    model_name => 'SVM_Clas_sample',
    mining_function => dbms_data_mining.classification,
    data_table_name => 'svmc_sample_build_prepared',
    case_id_column_name => 'id',
    target_column_name => 'affinity_card',
    settings_table_name => 'svmc_sample_settings');
END;
/
-- Display model details
SELECT *
  FROM TABLE(dbms_data_mining.get_model_details_svm('SVM_Clas_sample'))
ORDER BY class;
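To list the individual linear coefficients, un-nest attribute_set with a correlated TABLE expression. The following query is a sketch against the SVM_Clas_sample model built above:

-- un-nest the linear coefficients for each target class
SELECT s.class,
       a.attribute_name,
       a.attribute_value,
       a.coefficient
  FROM TABLE(dbms_data_mining.get_model_details_svm('SVM_Clas_sample')) s,
       TABLE(s.attribute_set) a
 ORDER BY s.class, ABS(a.coefficient) DESC;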

GET_MODEL_SETTINGS Function

This table function returns the list of settings that were used to build the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_SETTINGS(
   model_name           IN VARCHAR2)
 RETURN DM_Model_Settings pipelined;

Parameters

Table 23-43  GET_MODEL_SETTINGS Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-44  GET_MODEL_SETTINGS Function Return Values
Return Value Description

DM_Model_Settings

Represents a set of rows with schema:

(setting_name    VARCHAR2(30),
setting_value    VARCHAR2(128))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

You can use this table function to determine the settings that were used to build the model. This is for informational purposes only -- you cannot alter the model to adopt new settings.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model settings using Oracle SQL as follows:

SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SETTINGS('census_model'));

GET_MODEL_SIGNATURE Function

This table function returns the model signature, which is a set of rows that provide the name and type of each attribute required as input to the APPLY operation.

The case identifier is not considered a mining attribute. For classification and regression models, the target attribute is also not considered part of the model signature.

Syntax

DBMS_DATA_MINING.GET_MODEL_SIGNATURE(
  model_name           IN VARCHAR2)
RETURN DM_Model_Signature pipelined;

Parameters

Table 23-45  GET_MODEL_SIGNATURE Function Parameters
Parameter Description

model_name

Name of the model (see Rules and Limitations).

Return Values

Table 23-46  GET_MODEL_SIGNATURE Function Return Values
Return Value Description

DM_Model_Signature

Represents a set of rows with schema:

(attribute_name    VARCHAR2(30),
attribute_type     VARCHAR2(106))

Pragmas

RNDS, WNDS, RNPS, WNPS

Usage Notes

You can use this table function to get the list of attributes used for building the model. This is particularly helpful to describe a model when an APPLY operation on test or scoring data is done a significant time period after the model is built, or after it is imported into another schema.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details using Oracle SQL as follows:

SELECT attribute_name, attribute_type
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SIGNATURE('census_model'));

IMPORT_MODEL Procedure

This procedure imports specified data mining models from a dump file set or from a remote database.

Syntax

DBMS_DATA_MINING.IMPORT_MODEL (
    filename             IN  VARCHAR2,
    directory            IN  VARCHAR2,
    model_names          IN  VARCHAR2 DEFAULT NULL,
    operation            IN  VARCHAR2 DEFAULT NULL,
    remote_link          IN  VARCHAR2 DEFAULT NULL,
    jobname              IN  VARCHAR2 DEFAULT NULL,
    schema_remap         IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 23-47  IMPORT_MODEL Procedure Parameters
Parameter Description

filename

Name of the dump file set. If there are multiple files in the set, you may use "%U" to specify the file set. See Usage Notes for details.

directory

Name of the directory object where the dump file is located. The directory object must be created before the import operation and you must be granted both READ and WRITE privileges.

model_names

Specify the names of models to be imported. See Usage Notes for details.

operation

Use one of the literals 'IMPORT' or 'SQL_FILE'; the default is 'IMPORT'. When set to 'SQL_FILE', the DDL statements that create the database objects in the dump file set are generated in a text file in the dump directory. The DDL file is named jobname.sql.

remote_link

A valid DB link or NULL. When a DB link is provided, this allows moving DM models from the remote database to the local database. Default value is NULL for importing into the current database from a dump file set.

jobname

Name of the import job, limited to 30 characters. It must be unique to that schema, that is, there is no active import job using the same name. If not supplied, a default name is provided in the form of username_imp_nnnn. For example, "SCOTT_imp_76102".

schema_remap

Specify schema remapping. Must be specified when importing from a dump file set created by a different user. It must be specified in the form of 'FROM_NAME:TO_NAME'. For example, to remap schema from SCOTT to MARY, set schema_remap=>'SCOTT:MARY'. You need IMP_FULL_DATABASE privileges to do schema remapping if you do not have a SYS role.

Usage Notes

Use IMPORT_MODEL to import all or specific data mining models from a dump or from a remote database by means of a DB link.

The dump file set must be created by expdp or by the EXPORT_MODEL procedure. The dump files must be located in the directory mapped by the directory object, and the user must have READ and WRITE privileges on the directory object. To import models from a dump created by another user, you must have the IMP_FULL_DATABASE privilege or the SYS role.

When the import operation completes successfully, a log is created in the directory mapped by the directory object. If jobname is provided, the log is named jobname.log. If jobname is NULL (the default), the log is named username_imp_nnnn.log, where nnnn is a number. If the dump file set contains multiple files, you can use "%U" in the filename. For example, if your dump file set contains 3 files, archive01.dmp, archive02.dmp, and archive03.dmp, you may specify filename=>'archive%U'.

Use the parameter model_names to specify which models to import. How model_names is interpreted depends on whether the source is a dump file set or a remote database. When importing from dump files, model names are the only useful reference; in this case, assign model_names either a comma-delimited list of model names or a WHERE clause that names them explicitly. When importing from a remote database, model_names can be specified in the same fashion as in the EXPORT_MODEL procedure. Details are listed in Table 23-48.

Table 23-48  Table MODEL_FILTER for Import
Value Source Meaning Notes

NULL

dump/remote

Default. Import all models from the user schema.

---

'ALL'

dump/remote

Import all models from the user schema.

Same as NULL; useful to improve code readability.

Comma-separated model names

dump/remote

Import the named models.

Examples: 'mymodel' or 'model1, model2, model3'

A valid WHERE clause

dump/remote

Specify which models are to be imported.

Any string that can be appended to 'select name from dm_user_models where' to make a valid SQL query. Examples follow.

   'name=''super_model'''

dump/remote

Import super_model.

---

   'name IN (''model1'',''model2'',''model3'')'

dump/remote

Import model1, model2, and model3.

---

   'ALGORITHM_NAME=''NAIVE_BAYES'''

remote

Import all NB models.

---

   'FUNCTION_NAME=''CLASSIFICATION'''

remote

Import all classification models.

---

   'name LIKE ''NB\_MODEL%'' ESCAPE ''\'''

remote

Import all models whose names start with 'NB_MODEL'.

---

IMPORT_MODEL is not mutually exclusive with DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL; the import is not serialized against them. If a model with the same name as one being imported is dropped or re-created while the import is underway, the results in the schema are unpredictable.

Examples

The following example shows how user MARY imports all models from a dump file, model_exp_001.dmp, created by user SCOTT. The dump file is located in the file system directory mapped to a directory object called DM_DUMP. Note that if user MARY does not have IMP_FULL_DATABASE privileges, IMPORT_MODEL raises an error.

-- import all models
declare
  file_name       VARCHAR2(40);
BEGIN
  file_name := 'model_exp_001.dmp';
  DBMS_DATA_MINING.import_model(
                 filename=>file_name,
                   directory=>'DM_DUMP',                             
                 schema_remap=>'SCOTT:MARY');
  dbms_output.put_line(
'DBMS_DATA_MINING.import_model of all models from SCOTT done!');
END;
/
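To import only specific models, supply model_names. The following sketch imports two hypothetically named models, model1 and model2, from the same dump file:

-- import two named models only
BEGIN
  DBMS_DATA_MINING.import_model(
    filename     => 'model_exp_001.dmp',
    directory    => 'DM_DUMP',
    model_names  => 'model1, model2',
    schema_remap => 'SCOTT:MARY');
END;
/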

RANK_APPLY Procedure

This procedure ranks the results of an APPLY operation based on a top-N specification for predictive and descriptive model results. For classification models, you can provide a cost matrix as input, and obtain the ranked results with costs applied to the predictions.

Syntax

DBMS_DATA_MINING.RANK_APPLY (
      apply_result_table_name        IN VARCHAR2,
      case_id_column_name            IN VARCHAR2,
      ranked_apply_result_tab_name   IN VARCHAR2,
      top_N                          IN INTEGER DEFAULT 1,
      cost_matrix_table_name         IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name       IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name        IN VARCHAR2 DEFAULT NULL);

Parameters

Table 23-49  RANK_APPLY Procedure Parameters
Parameter Description

apply_result_table_name

Name of the table or view containing the results of an APPLY operation on the test dataset (see Usage Notes).

case_id_column_name

Name of the case identifier column. This must be the same as the one used for generating APPLY results.

ranked_apply_result_tab_name

Name of the table containing the ranked apply results.

top_N

Top N predictions to be considered from the APPLY results for precision recall computation.

cost_matrix_table_name

Name of the cost matrix table.

apply_result_schema_name

Name of the schema hosting the APPLY results table.

cost_matrix_schema_name

Name of the schema hosting the cost matrix table.

Usage Notes

You can use RANK_APPLY to generate ranked apply results, based on a top-N filter and also with application of cost for predictions, if a cost matrix is provided.

The behavior of RANK_APPLY is similar to that of APPLY with respect to other DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. The procedure does not depend on the model; the only input of relevance is the apply results generated in a fixed schema table from APPLY.

The main intended use of RANK_APPLY is for the generation of the final APPLY results against the scoring data in a production setting. You can apply the model against test data using APPLY, compute various test metrics against various cost matrix tables, and use the candidate cost matrix for RANK_APPLY.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will be the same case identifier column as that of the apply results.

Classification Models -- NB, ABN, SVM

For numerical targets, the ranked results table will have the schema as shown:

(case_id       VARCHAR2/NUMBER,
prediction     NUMBER,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

For categorical targets, the ranked results table will have the following schema:

(case_id       VARCHAR2/NUMBER,
prediction     VARCHAR2,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)
Clustering using k-Means

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contain simply the cluster identifier corresponding to a case and the associated probability. A cost matrix is not considered here. The ranked results table will have the schema as shown, and contains the cluster ids ranked by top-N.

(case_id       VARCHAR2/NUMBER,
cluster_id     NUMBER,
probability    NUMBER,
rank           INTEGER)
Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contain simply the feature identifier corresponding to a case and the associated match quality. A cost matrix is not considered here. The ranked results table will have the schema as shown, and contains the feature ids ranked by top-N.

(case_id        VARCHAR2/NUMBER,
feature_id      NUMBER,
match_quality   NUMBER,
rank            INTEGER)

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on apply
 * data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
DBMS_DATA_MINING.rank_apply(
  apply_result_table_name => 'census_apply',
  case_id_column_name => 'person_id',
  ranked_apply_result_tab_name => 'census_ranked_apply',
  top_N => 3,
  cost_matrix_table_name => 'census_cost_matrix');
END;
/

-- View Apply Results
SELECT *
  FROM census_ranked_apply;

RENAME_MODEL Procedure

This procedure renames a mining model to a specified new name.

Syntax

DBMS_DATA_MINING.RENAME_MODEL (
     model_name            IN VARCHAR2,
     new_model_name        IN VARCHAR2);

Parameters

Table 23-50  RENAME_MODEL Procedure Parameters
Parameter Description

model_name

Old name of the model (see Rules and Limitations).

new_model_name

New name of the model (see Rules and Limitations).

Usage Notes

You can use RENAME_MODEL to rename an existing mining model.

The behavior of RENAME_MODEL is similar to a SQL DDL RENAME operation. It blocks DROP_MODEL and CREATE_MODEL operations. It does not block APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to rename the model during that time, the RENAME will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on query operations.

Examples

Assume the existence of a model census_model. The following example shows how to rename this model.

BEGIN
  DBMS_DATA_MINING.rename_model(
    model_name => 'census_model',
    new_model_name => 'census_new_model');
END;
/