Skip Headers

Oracle® OLAP Application Developer's Guide
10g Release 1 (10.1)

Part Number B10333-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback

Go to previous page
Previous
Go to next page
Next
View PDF

3 The Sample Schema

This guide uses the Global schema for its examples. This chapter describes this schema and explains how it will be mapped to multidimensional objects. It consists of the following topics:

3.1 Case Study Scenario

The fictional Global Computing Company was established in 1990. Global Computing distributes computer hardware and software components to customers on a worldwide basis. The Sales and Marketing department has not been meeting its budgeted numbers. As a result, this department has been challenged to develop a successful sales and marketing strategy.

Global Computing operates in an extremely competitive market. Competitors are numerous, customers are especially price-sensitive, and profit margins tend to be narrow. In order to grow profitably, Global Computing must increase sales of its most profitable products.

Various factors in Global Computing's current business point to a decline in sales and profits:

Global Computing needs to understand how each of these factors is affecting its business.

Current reporting is done by the IT department, which produces certain standard reports on a monthly basis. Any ad hoc reports are handled on an as-needed basis and are subject to the time constraints of the limited IT staff. Complaints have been widespread within the Sales and Marketing department, with regard to the delay in response to report requests. Complaints have also been numerous in the IT department, with regard to analysts who change their minds frequently or ask for further information.

The Sales and Marketing department has been struggling with a lack of timely information about what it is selling, who is buying, and how they are buying. In a meeting with the CIO, the VP of Sales and Marketing states, "By the time I get the information, it's no longer useful. I'm only able to get information at the end of each month, and it doesn't have the details I need to do my job."

3.1.1 Reporting Requirements

When asked to be more specific about what she needs, the Vice President of Sales and Marketing identifies the following requirements:

  • Trended sales data for specific customers, regions, and segments.

  • The ability to provide information and some analysis capabilities to the field sales force. A Web interface would be preferred, since the sales force is distributed throughout the world.

  • Detail regarding mail-order, phone, and e-mail sales on a weekly and monthly basis, as well as a comparison to past time periods. Information must identify when, how, and what is being sold by each channel.

  • Margin information on products in order to understand the dollar contribution for each sale.

  • Knowledge of percent change versus the prior and year-ago period for sales, units, and margin.

  • The ability to perform analysis of the data by ad hoc groupings.

The CIO has discussed these requirements with his team and has come to the conclusion that a standard reporting solution against the production order entry system would not be flexible enough to provide the required analysis capabilities. The reporting requirements for business analysis are so diverse that the projected cost of development, along with the expected turnaround time for requests, would make this solution unacceptable.

The CIO's team recommends using an analytic workspace to support analysis. The team suggests that the Sales and Marketing department's IT group work with Corporate IT to build an analytic workspace that meets their needs for information analysis.

3.1.2 Business Goals

The development team identifies the following high-level business goals that the project must meet:

  • Global Computing's strategic goal is to increase company profits by increasing sales of higher margin products and by increasing sales volume overall.

  • The Sales and Marketing department objectives are to:

    • Analyze industry trends and target specific market segments

    • Analyze sales channels and increase profits

    • Identify product trends and create a strategy for developing the appropriate channels

3.1.3 Information Requirements

Once you have established business goals, you can determine the type of information that will help achieve these goals. To understand how end users will examine the data in the analytic workspace, it is important to conduct extensive interviews. From interviews with key end users, you can determine how they look at the business, and what types of business analysis questions they want to answer

3.1.3.1 Business Analysis Questions

Interviews with the VP of Sales and Marketing, salespeople, and market analysts at Global Computing reveal the following business analysis questions:

  • What products are profitable?

  • Who are our customers, and what and how are they buying?

  • What accounts are most profitable? What is the performance of each distribution channel?

  • Is there still a seasonal variance to the business?

We can examine each of these business analysis questions in detail.

3.1.3.2 What products are profitable?

This business analysis question consists of the following questions:

  • What is the percent of total sales for any item, product family, or product class in any month, quarter or year, and in any distribution channel? How does this percent of sales differ from a year ago?

  • What is the unit price, unit cost, and margin for each unit for any item in any particular month? What are the price, cost, and margin trends for any item in any month?

  • What items were most profitable in any month, quarter, or year, in any distribution channel, and in any geographic area or market segment? How did profitability change from the prior period? What was the percent change in profitability from the prior period?

  • What items experienced the greatest change in profitability from the prior period?

  • What items contributed the most to total profitability in any month, quarter, or year, in any distribution channel, and in any geographic area or market segment?

  • What items have the highest per-unit margin for any particular month?

  • In summary, what are the trends?

3.1.3.3 Who are our customers, and what and how are they buying?

This business analysis question consists of the following questions:

  • What were sales for any item, product family, or product class in any month, quarter, or year?

  • What were sales for any item, product family, or product class in any distribution channel, geographic area, or market segment?

  • How did sales change from the prior period? What was the percent change in sales from the prior period?

  • How did sales change from a year ago? What was the percent change in sales from a year ago?

  • In summary, what are the trends?

3.1.3.4 What accounts are most profitable?

This business analysis question consists of the following questions:

  • What accounts are most profitable in any month, quarter, or year, in any distribution channel, by any item, product family, or product class?

  • What were sales and extended margin (gross profit) by account for any month, quarter, or year, for any distribution channel, and for any product?

  • How does account profitability compare to the prior time period?

  • Which accounts experienced the greatest increase in sales as compared to the prior period?

  • What is the percent change in sales from the prior period? Did the percent change in profitability increase at the same rate as the percent change in sales?

  • In summary, what are the trends?

3.1.3.5 What is the performance of each distribution channel?

This business analysis question consists of the following questions:

  • What is the percent of sales to total sales for each distribution channel for any item, product family, or product class, or for any geographic area or market segment?

  • What is the profitability of each distribution channel: direct sales, catalog sales, and the Internet?

  • Is the newest distribution channel, the Internet, "cannibalizing" catalog sales? Are customers simply switching ordering methods, or is the Internet distribution channel reaching additional customers?

  • In summary, what are the trends?

3.1.3.6 Is there still a seasonal variance to the business?

This business analysis question consists of the following questions:

  • Are there identifiable seasonal sales patterns for particular items or product families?

  • How do seasonal sales patterns vary by geographic location?

  • How do seasonal sales patterns vary by market segment?

  • Are there differences in seasonal sales patterns as compared to last year?

3.1.3.7 Summary of Information Requirements

By examining the types of analyses that users wish to perform, we can identify the following key requirements for analysis:

  • Global Computing has a strong need for profitability analysis. The company must understand profitability by product, account, market segment, and distribution channel. It also needs to understand profitability trends.

  • Global Computing needs to understand how sales vary by time of year. The company must understand these seasonal trends by product, geographic area, market segment, and distribution channel.

  • Global Computing has a need for ad hoc sales analysis. Analysis must identify what products are sold to whom, when these products are sold, and how customers buy these products.

  • The ability to perform trend analysis is important to Global Computing.

3.1.4 Identifying Required Business Facts

The key analysis requirements reveal the business facts that are required to support analysis requirements at Global Computing.

These facts are ordered by time, product, customer shipment or market segment, and distribution channel:


Sales
Units
Change in sales from prior period
Percent change in sales from prior period
Change in sales from prior year
Percent change in sales from prior year
Product share
Channel share
Market share
Extended cost
Extended margin
Extended margin change from prior period
Extended margin percent change from prior period
Margin, percent of total product sales
Units sold, change from prior period
Units sold, percent change from prior period
Units sold, change from prior year
Units sold, percent change from prior year

These facts are ordered by item and month:


Unit price
Unit cost
Margin per unit

3.1.5 Designing a Logical Data Model for Global Computing

"Business Goals" identifies the business facts that will support analysis requirements at Global Computing. Next, we will identify the dimensions, levels, and attributes in a logical data model. We will also identify the relationships within each dimension. The resulting data model will be used to design the Global star schema, the OLAP Catalog metadata, and the analytic workspace.

3.1.5.1 Identifying Dimensions

Four dimensions that will be used to organize the facts in the database.

  • Product shows how data varies by product.

  • Customer shows how data varies by customer or geographic area.

  • Channel shows how data varies according to each distribution channel.

  • Time how data varies over time.

3.1.5.2 Identifying Levels

Now that we have identified dimensions, we can identify the levels of summarization within each dimension. Analysis requirements at Global Computing reveal that:

  • There are three distribution channels: Sales, Catalog, and Internet. These three values are the lowest level of detail in the data warehouse and will be grouped in the Channel level. From the order of highest level of summarization to the lowest level of detail, levels will be All Channels and Channel.

  • Global performs customer and geographic analysis along the line of shipments to customers and by market segmentation. In each case, the lowest level of detail in the data model is the Ship To location.

    • When analyzing along the line of customer shipments, the levels of summarization will be (highest to lowest): All Customers, Region, Warehouse, and Ship To.

    • When analyzing by market segmentation, the levels of summarization will be (highest to lowest): Total Market, Market Segment, Account, and Ship To.

  • The product dimension will have four levels (highest to lowest): Total, Class, Family, and Item.

  • The time dimension will have three levels (highest to lowest): Year, Quarter, and Month.

Within the Channel, Customer, and Product dimensions, we added a Total or All level as the highest level of summarization. Adding this highest level will provide additional flexibility as application users analyze data.

3.1.5.3 Identifying Hierarchies

We will identify the hierarchies that organize the levels within each dimension. To identify hierarchies, we will group the levels in the correct order of summarization and in a way that supports the identified types of analysis.

For the Channel, Product, and Time dimensions, Global Computing requires only one hierarchy for each dimension. For the Customer dimension, however, Global Computing requires two hierarchies. Analysis within the Customer dimension tends to be either by geographic area or market segment. Therefore, we will organize levels into two hierarchies, Shipments and Market Segment.

3.1.5.4 Identifying Stored Measures

"Identifying Required Business Facts" lists 21 business facts that are required to support the analysis requirements of Global Computing. Of this number, only three facts need to be acquired from the transactional database:

  • Units

  • Unit Price

  • Unit Cost

All of the other facts can be derived from these basic facts. The derived facts can be calculated in the analytic workspace on demand. If experience shows that some of these derived facts are being used heavily and the calculations are putting a noticeable load on the system, then some of these facts can be calculated and stored in the analytic workspace as a data maintenance procedure.

3.2 The Global Star Schema

The Global schema consists of two fact tables and four dimension tables. The dimension tables use numeric surrogate keys for each level column to assure that dimension members are unique across levels. For example, a geography dimension can easily have identical values at different levels, for example, New York at the City level and New York at the state level. In an analytic workspace, dimension members at all levels are fetched into a single dimension, and duplicate values overwrite each other unless additional steps are taken to assure uniqueness.

Figure 3-1 shows the relationships among the tables. In addition, the Global schema contains update fact tables, which are omitted from the diagram but occupy the same logical position as the history fact tables.

Figure 3-1 Global Schema Diagram

Global Schema Diagram
Description of the illustration globdiag.gif

3.2.1 Dimension Table: TIME_DIM

The TIME_DIM table defines a time dimension with three levels. Each level is supported by four columns: a numeric surrogate key, a textual description, an end date (last day in time period), and a time span (number of days in time period). This is the most basic information required to define a time dimension.

The surrogate keys are artificial values with no meaning outside the context of the table. They assure that the same values are not repeated at different levels, and they provide the fastest processing speeds both in the relational tables and in the analytic workspace. The descriptive columns provide meaning to these numeric identifiers.

The end date and time span columns support time-series analysis, such as:

  • Change from a prior period

  • Change from a year ago

  • Year-to-date

  • Range of time

There are seven years defined, from 1998 to 2004, with data provided for 1998 to early 2003. The last year is available for forecasting.

In the standard hierarchy, the rollup sequence from the base to the top level is:

MONTH -> QUARTER -> YEAR

Table 3-1 describes the columns of the TIME_DIM table.

Table 3-1 TIME_DIM Column Descriptions

Column Datatype Role Unique Values Sample Value
MONTH_ID NUMBER
Primary key
Surrogate key
78 34
MONTH_DSC VARCHAR2 Attribute - Apr-99
QUARTER_ID NUMBER Surrogate key 26 10
QUARTER_DSC VARCHAR2 Attribute - Q2-99
YEAR_ID NUMBER Surrogate key 7 2
YEAR_DSC VARCHAR2 Attribute - 1999
MONTH_TIME_SPAN NUMBER Attribute - 30
QUARTER_TIMESPAN NUMBER Attribute - 91
YEAR_TIMESPAN NUMBER Attribute - 365
MONTH_END_DATE DATE Attribute - 30-Apr-1999
QUARTER_END_DATE DATE Attribute - 30-Jun-1999
YEAR_END_DATE DATE Attribute - 31-Dec-1999

3.2.2 Dimension Table: CUSTOMER_DIM

The CUSTOMER_DIM table defines seven levels that will be used to define two hierarchies. Each level has a numeric surrogate key and a textual description, which is the most basic information to define a "normal" dimension, that is, a dimension that is not time. SHIP_TO is the primary key, and its values will become the base-level members for both Customer hierarchies.

In the Market hierarchy, the rollup sequence from the base to the top level is:

SHIP_TO -> ACCOUNT -> MARKET_SEGMENT -> TOTAL_MARKET

In the Customer hierarchy, the rollup sequence is:

SHIP_TO -> WAREHOUSE -> REGION-> ALL_CUSTOMERS

Table 3-2 describes the columns of the CUSTOMER_DIM table.

Table 3-2 CUSTOMER_DIM Column Descriptions

Column Datatype Role Hierarchy Unique Values Sample Value
SHIP_TO_ID NUMBER
Primary key
Surrogate key
Both 61 89
SHIP_TO_DSC VARCHAR2 Attribute - - Monolith Motor Co. Knoxville
ACCOUNT_ID NUMBER Surrogate key Market 24 36
ACCOUNT_DSC VARCHAR2 Attribute - - Monolith Motor Company
MARKET_SEGMENT_ID NUMBER Surrogate key Market 5 5
MARKET_SEGMENT_DSC VARCHAR2 Attribute - - Manufacturing
TOTAL_MARKET_ID NUMBER Surrogate key Market 1 7
TOTAL_MARKET_DSC VARCHAR2 Attribute - - Total Market
WAREHOUSE_ID NUMBER Surrogate key Customers 11 21
WAREHOUSE_DSC VARCHAR2 Attribute - - United States
REGION_ID NUMBER Surrogate key Customers 3 10
REGION_DSC VARCHAR2 Attribute - - North America
ALL_CUSTOMERS_ID NUMBER Surrogate key Customers 1 1
ALL_CUSTOMERS_DSC VARCHAR2 Attribute Customers - All Customers

3.2.3 Dimension Table: PRODUCT_DIM

The PRODUCT_DIM table defines a product dimension with four levels. Each level has a numeric surrogate key and descriptive text. ITEM_ID is the primary key, so its values will become the base-level members of the Product dimension.

In the Product hierarchy, the rollup sequence from the base level to the top level is:

ITEM -> FAMILY -> CLASS -> TOTAL_PRODUCT

Table 3-3 describes the columns of the PRODUCT_DIM table.

Table 3-3 PRODUCT_DIM Column Descriptions

Column Datatype Role Unique Values Sample Value
ITEM_ID NUMBER
Primary key
Surrogate key
36 48
ITEM_DSC VARCHAR2 Attribute - Keyboard Wrist Rest
ITEM_PACKAGE_ID NUMBER Attribute 4 Laptop Value Pack
FAMILY_ID NUMBER Surrogate key 9 7
FAMILY_DSC VARCHAR2 Attribute - Accessories
CLASS_ID NUMBER Surrogate key 2 3
CLASS_DSC VARCHAR2 Attribute - Software/Other
TOTAL_PRODUCT_ID NUMBER Surrogate key 1 1
TOTAL_PRODUCT_DSC VARCHAR2 Attribute - Total Product

3.2.4 Dimension Table: CHANNEL_DIM

The CHANNEL_DIM table contains four columns. CHANNEL_ID is the primary key, and its values will become the base-level members of the Channel dimension. ALL_CHANNELS_ID defines a single value that represents all of the channels. In the OLAP Catalog, these two columns will define the two levels of a single Channel hierarchy. The rollup sequence from the base level to the top level is simply:

CHANNEL -> ALL_CHANNELS

The remaining columns, CHANNEL_DSC and ALL_CHANNELS_DSC, provide textual descriptions that give the surrogate keys meaning.

Table 3-4 describes the columns of the CHANNEL_DIM table.

Table 3-4 CHANNEL_DIM Column Descriptions

Column Datatype Role Unique Values Sample Value
CHANNEL_ID NUMBER
Primary key
Surrogate key
3 4
CHANNEL_DSC VARCHAR2 Attribute - Internet
ALL_CHANNELS_ID NUMBER Surrogate key 1 1
ALL_CHANNELS_DSC VARCHAR2 Attribute - All Channels

3.2.5 Fact Tables: UNITS_HISTORY_FACT and _UPDATE_FACT

The UNITS_HISTORY_FACT and UNITS_UPDATE_FACT tables contain four foreign key columns, which together comprise a multi-column primary key. The foreign keys are related to the primary keys of the four dimension tables.

In UNITS_HISTORY_FACT, every foreign key value for Product, Customer, and Channel is used at least once, and 65 time periods are used. The table contains 169,487 rows of a possible 513,864 unique key combinations.

UNITS_UPDATE_FACT adds data for month 91 (Jun-03). Every foreign key value for Product, Customer, and Channel is used at least once. The table contains 3,459 rows of a possible unique 6,588 key combinations.

Table 3-5 describes the columns in both tables.

Table 3-5 UNITS_HISTORY_FACT and UNITS_UPDATE_FACT Column Descriptions

Column Datatype Role Description
CHANNEL_ID NUMBER Key Related to CHANNEL_DIM
ITEM_ID NUMBER Key Related to PRODUCT_DIM
SHIP_TO_ID NUMBER Key Related to CUSTOMER_DIM
MONTH_ID NUMBER Key Related to TIME_DIM
UNITS NUMBER Facts Number of units sold

3.2.6 Fact Tables: PRICE_AND_COST_HISTORY_FACT and _UPDATE_FACT

The PRICE_AND_COST_HISTORY_FACT and PRICE_AND_COST_UPDATE_FACT tables contain two foreign key columns, which together comprise a multi-column primary key, and two fact columns.

In PRICE_AND_COST_HISTORY_FACT, data is provided for all products for 65 months.

PRICE_AND_COST_UPDATE_FACT adds data for month 91 (Jun-03) for all products.

Table 3-6 describes the columns of both tables.

Table 3-6 PRICE_AND_COST_HISTORY_FACT and PRICE_AND_COST_UPDATE_FACT Column Descriptions

Column Datatype Role Description
ITEM_ID NUMBER Key Related to PRODUCT_DIM
MONTH_ID NUMBER Key Related to TIME_DIM
UNIT_PRICE NUMBER Facts List price of a unit
UNIT_COST NUMBER Facts Cost to produce a unit

3.3 Mapping the Global Schema to an Analytic Workspace

The OLAP Catalog provides an interface for mapping the columns of relational tables to the multidimensional objects of an analytic workspace. The following tables identify the mapping for the PRICE_AND_COST_HISTORY_FACT fact table and its related dimension tables, PRODUCT_DIM and TIME_DIM. These tables are the source for the PRICE_CUBE cube.

The analytic workspace objects listed in these tables will be created in Chapter 6, " Creating an Analytic Workspace ", and are described in more detail in Chapter 8, " Exploring a Standard Form Analytic Workspace ".

3.3.1 Global Product Dimension Mapping

Table 3-7 shows how the columns of the PRODUCT_DIM dimension table are mapped to workspace objects to provide an embedded total PRODUCT dimension.

PRODUCT_DIM supports a single dimension hierarchy, PRODUCT_ROLLUP, with four levels: ITEM, FAMILY, CLASS, and TOTAL_PRODUCT. The descriptive columns are mapped to both the long and short descriptions, but this redundancy in the analytic workspace is not required.

Table 3-7 Mapping the Global Product Dimension

PRODUCT_DIM Table Columns OLAP Catalog Logical Objects GLOBAL Analytic Workspace Objects
ITEM_ID PRODUCT_ROLLUP hierarchy, ITEM level PRODUCT dimension

PRODUCT_PARENTREL parent relation

PRODUCT_LEVELLIST level dimension (ITEM, FAMILY, CLASS, TOTAL_PRODUCT)

PRODUCT_LEVELREL level relation

PRODUCT_HIERLIST hierarchy dimension (PRODUCT_ROLLUP)

FAMILY_ID PRODUCT_ROLLUP hierarchy, FAMILY level
CLASS_ID PRODUCT_ROLLUP hierarchy, CLASS level
TOTAL_PRODUCT_ID PRODUCT_ROLLUP hierarchy, TOTAL_PRODUCT level
ITEM_PACKAGE PACKAGE attribute PRODUCT_PACKAGE variable
ITEM_DSC ITEM Long Description attribute

ITEM Short Description attribute

PRODUCT_LONG_DESCRIPTION variable

PRODUCT_SHORT_DESCRIPTION variable

FAMILY_DSC FAMILY Long Description attribute

FAMILY Short Description attribute

CLASS_DSC CLASS Long Description attribute

CLASS Short Description attribute

TOTAL_PRODUCT_DSC CLASS Long Description attribute

CLASS Short Description attribute


3.3.2 Global Time Dimension Mapping

Table 3-8 shows how the columns of the TIME_DIM dimension table are mapped to workspace objects to provide an embedded total TIME dimension.

TIME_DIM supports a single dimension hierarchy, Calendar, with three levels: Month, Quarter, and Year. For time-based analysis in the analytic workspace, a Time dimension must have End_Date and Time_Span attributes, as it does here. The descriptive columns are mapped to both the long and short descriptions, but this redundancy in the analytic workspace is not required.

Table 3-8 Mapping the Global Time Dimension

TIME_DIM Table Columns OLAP Catalog Logical Objects GLOBAL Analytic Workspace Objects
MONTH_ID Calendar hierarchy, Month level TIME dimension

TIME_PARENTREL parent relation

TIME_LEVELLIST level dimension (Month, Quarter, Year))

TIME_LEVELREL level relation

TIME_HIERLIST hierarchy dimension (Calendar)

QUARTER_ID Calendar hierarchy, Quarter level
YEAR_ID Calendar hierarchy, Year level
MONTH_DSC Month Long Description attribute

Month Short Description attribute

TIME_LONG_DESCRIPTION variable

TIME_SHORT_DESCRIPTION variable

QUARTER_DSC Quarter Long attribute Attribute

Quarter Short attribute Attribute

YEAR_DSC Year Long Description attribute

Year Short Description attribute

MONTH_TIMESPAN Month Time_Span attribute TIME_TIME_SPAN variable
QUARTER_TIMESPAN Quarter Time_Span attribute
YEAR_TIMESPAN Year Time_Span attribute
MONTH_END_DATE Month End_Date attribute TIME_END_DATE variable
QUARTER_END_DATE Quarter End_Date attribute
YEAR_END_DATE Year End_Date attribute

3.3.3 Global Price Cube Mapping

Table 3-9 shows how the columns of the PRICE_AND_COST_HISTORY_FACT fact table are mapped to workspace objects to provide a PRICE_CUBE cube with two measures, UNIT_COST and UNIT_PRICE.

An aggregation operator is defined in the OLAP Catalog and is the basis for an initial aggmap for the cube. The aggmap provides the rules of aggregation. The measure formulas use the aggmap to aggregate the base-level data loaded into the measure variables.

Most variables are sparse and require a composite dimension, which is associated with the cube.

Table 3-9 Mapping the Global Price Cube

PRICE_AND_COST_HISTORY_FACT Table Columns OLAP Catalog Logical Objects GLOBAL Analytic Workspace Objects
ITEM_ID PRICE_CUBE cube

SUM aggregation operator

PRICE_CUBE dimension (TIME, PRODUCT)

PRICE_CUBE_COMPOSITE dimension

PRICE_CUBE_AGGMAP_AWCREATEDDEFAULT_1 aggmap

MONTH_ID
UNIT_PRICE UNIT_PRICE measure UNIT_PRICE formula

UNIT_PRICE_VARIABLE variable

UNIT_COST UNIT_COST measure UNIT_COST formula

UNIT_COST_VARIABLE variable