Oracle® XML DB Developer's Guide
10g Release 1 (10.1)

Part Number B10790-01

B XML Schema Primer

This appendix includes introductory information about the W3C XML Schema Recommendation.

XML Schema and Oracle XML DB

Support for the World Wide Web Consortium (W3C) XML Schema Recommendation is a key feature in Oracle XML DB. XML Schema specifies the structure, content, and certain semantics of a set of XML documents. It is described in detail at http://www.w3.org/TR/xmlschema-0/.

Namespaces

Two different XML schemas can each define an object, such as an element, attribute, complex type, or simple type, with the same name. Because the two objects are in different XML schemas, they cannot be treated as the same item. This means that an instance document must identify which XML schema a particular node is based on. The W3C Namespaces in XML Recommendation defines a mechanism that accomplishes this. An XML namespace is a collection of names identified by a URI reference. These names are used in XML documents as element types and attribute names.
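To illustrate, the following instance fragment is a hypothetical sketch: the element names order and name and both namespace URIs are invented for this example. The two name elements share a local name, but a namespace-aware processor treats cust:name and prod:name as different names because their prefixes are bound to different URIs. The prefixes themselves carry no meaning beyond that binding.

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespace URIs; the URI, not the prefix, identifies the namespace -->
<order xmlns:cust="http://www.example.com/customer"
       xmlns:prod="http://www.example.com/product">
  <!-- "name" from the customer vocabulary -->
  <cust:name>Alice Smith</cust:name>
  <!-- "name" from the product vocabulary: a different name -->
  <prod:name>Lawnmower</prod:name>
</order>
```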

XML Schema and Namespaces

This section describes the basics of using XML schemas and namespaces.

XML Schema Can Specify a targetNamespace Attribute

An XML schema uses the targetNamespace attribute to define the namespace associated with it. The attribute is specified on the schema element. If an XML schema:

  • Specifies a targetNamespace, all elements and types defined by the XML schema are associated with this namespace. This implies that any XML document containing these elements and types must identify which namespace they are associated with.

  • Does not specify a targetNamespace, elements and types defined by the XML schema are associated with the NULL namespace.
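The following schema is a minimal sketch of the first case. The namespace URI http://www.example.com/PO1 and the names order and OrderType are hypothetical. Because both the global element and the named complex type belong to the target namespace, the schema binds a prefix (po) to the same URI so that it can refer to its own type as po:OrderType.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:po="http://www.example.com/PO1"
            targetNamespace="http://www.example.com/PO1">
  <!-- Global element: its name belongs to http://www.example.com/PO1 -->
  <xsd:element name="order" type="po:OrderType"/>
  <!-- Named type: also in the target namespace, so it is referenced as po:OrderType -->
  <xsd:complexType name="OrderType">
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
</xsd:schema>
```

If the targetNamespace attribute were removed, order and OrderType would be in the NULL namespace instead, and the element declaration would refer to the type simply as type="OrderType".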

XML Instance Documents Declare Which XML Schema to Use in Their Root Element

The XML Schema Recommendation defines a mechanism that allows an XML instance document to identify which XML schemas are required to process or validate it. The XML schemas in question are identified, namespace by namespace, using attributes defined in the W3C XMLSchema-instance namespace. To use this mechanism, the instance document must declare the XMLSchema-instance namespace in its root element, as follows:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

schemaLocation Attribute

In addition, XML schemas that include a targetNamespace declaration are identified by including a schemaLocation attribute in the root element of the instance document. The schemaLocation attribute contains one entry for each XML schema used. Each entry consists of a pair of values:

  • The left-hand side of each pair is the value of the targetNamespace attribute of the XML schema.

  • The right-hand side of each pair is a hint, typically in the form of a URL, that describes where to find the XML schema definition document. This hint is often referred to as the "Document Location Hint".
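As a hedged illustration, the root element below identifies two XML schemas with a single xsi:schemaLocation attribute. All namespace URIs and schema URLs are hypothetical. The attribute value is a whitespace-separated list read two tokens at a time: a target namespace followed by its Document Location Hint.

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespaces and locations. Each schemaLocation entry is a pair:
     target namespace first, then the Document Location Hint -->
<po:order xmlns:po="http://www.example.com/PO1"
          xmlns:addr="http://www.example.com/Address"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.example.com/PO1 http://www.example.com/schemas/po1.xsd
                              http://www.example.com/Address http://www.example.com/schemas/address.xsd"
          orderDate="1999-10-20">
  <addr:city>Mill Valley</addr:city>
</po:order>
```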

noNamespaceSchemaLocation Attribute

XML schemas that do not include a targetNamespace declaration are identified by including the noNamespaceSchemaLocation attribute in the root element of the instance document. The noNamespaceSchemaLocation attribute contains a hint, typically in the form of a URL, that describes where to find the XML schema document in question.

Once the instance document has declared the XMLSchema-instance namespace, it must identify the set of XML schemas required to process it, using the schemaLocation and noNamespaceSchemaLocation attributes as appropriate.

Declaring and Identifying XML Schema Namespaces

Consider an XML schema with a defined root element PurchaseOrder. Assume that the XML schema does not declare a target namespace. The XML schema is registered under the following URL:

http://xmlns.oracle.com/demo/purchaseOrder.xsd

For an XML document to be recognized as an instance of this XML schema, specify the root element of the instance document as follows:

<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="http://xmlns.oracle.com/demo/purchaseOrder.xsd">

Registering an XML Schema

Before Oracle XML DB can make use of information in an XML schema, the XML schema must be registered with the database. You register an XML schema by calling the PL/SQL procedure DBMS_XMLSCHEMA.registerSchema(). The XML schema is registered under a URL, which is used internally as a unique key to identify the XML schema. Oracle XML DB does not require access to the target of the URL when registering your XML schema, or when processing documents that conform to it. Oracle XML DB assumes that any instance document associated with the XML schema provides, as its Document Location Hint, the URL used to register the XML schema.

Oracle XML DB Creates a Default Table

When an XML schema is registered with the database, a default table is created for each globally defined element declared in the XML schema. When an instance document is loaded into the Oracle XML DB repository, the content of the document is stored in the default table. The default tables created by registering an XML schema are XMLType tables; that is, they are object tables in which each row is represented as an instance of the XMLType data type.

Deriving an Object Model: Mapping the XML Schema Constructs to SQL Types

Oracle XML DB can also use the information contained in an XML schema to automatically derive an object model that allows XML content compliant with the XML schema to be decomposed and stored in the database as a set of objects. This is achieved by mapping the constructs defined by the XML schema directly into SQL types generated using the SQL 1999 Type framework that is part of Oracle Database.

Using the SQL 1999 type framework to manage XML provides several benefits:

  • It allows Oracle XML DB to leverage the full power of Oracle Database when managing XML.

  • It can lead to significant reductions in the space required to store the document.

  • It can reduce the memory required to query and update XML content.

  • Capturing the XML schema objects as SQL types helps share the abstractions across schemas, and also across their SQL storage counterparts.

  • It allows Oracle XML DB to support constructs defined by the XML schema standard that do not easily map directly into the conventional relational model.

Oracle XML DB and DOM Fidelity

Using SQL 1999 objects to persist XML allows Oracle XML DB to guarantee DOM fidelity. The Document Object Model (DOM) is a W3C standard that defines a set of platform- and language-neutral interfaces that allow a program to dynamically access and update the content, structure, and style of a document. To provide DOM fidelity, Oracle XML DB ensures that a DOM generated from a document that has been shredded and stored in Oracle XML DB is identical to a DOM generated from the original document.

Providing DOM fidelity requires Oracle XML DB to preserve all information contained in an XML document. This includes maintaining the order in which elements appear within a collection and within a document, as well as storing and retrieving out-of-band data such as comments, processing instructions, and mixed text. By guaranteeing DOM fidelity, Oracle XML DB ensures that no information is lost when the database is used to store and manage XML documents.

Annotating an XML Schema

Oracle XML DB gives the application developer or database administrator control over how much decomposition, or 'shredding', takes place when an XML document is stored in the database. The XML Schema Recommendation allows vendors to define schema annotations that add directives for specific schema processors. The Oracle XML DB schema processor recognizes a set of annotations that make it possible to customize the mapping between XML schema data types and SQL data types, control how collections are stored in the database, and specify how much of a document is shredded. If you do not add annotations to your XML schema to customize the mapping, Oracle XML DB uses default choices that may or may not be optimal for your application.
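The fragment below is a sketch of what such annotations can look like. It assumes the Oracle XML DB annotation namespace http://xmlns.oracle.com/xdb and the xdb:defaultTable, xdb:SQLType, and xdb:SQLName annotations; the names PURCHASEORDER, PURCHASEORDER_T, and PO_COMMENT are hypothetical, and the full set of annotations and their exact semantics are documented elsewhere in this guide. Because the XML Schema Recommendation allows attributes from other namespaces on schema elements, processors other than Oracle XML DB are not affected by these annotations.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xdb="http://xmlns.oracle.com/xdb">
  <!-- xdb:defaultTable names the default table created for this global element
       (PURCHASEORDER is a hypothetical name) -->
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"
               xdb:defaultTable="PURCHASEORDER"/>
  <!-- xdb:SQLType names the SQL object type generated for this complex type -->
  <xsd:complexType name="PurchaseOrderType" xdb:SQLType="PURCHASEORDER_T">
    <xsd:sequence>
      <!-- xdb:SQLName names the SQL attribute that stores this element -->
      <xsd:element name="comment" type="xsd:string" minOccurs="0"
                   xdb:SQLName="PO_COMMENT"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
</xsd:schema>
```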

Identifying and Processing Instance Documents

Oracle XML DB uses the Document Location Hint to determine which XML schemas are relevant to processing the instance document. It assumes that the Document Location Hint will map directly to the URL used when registering the XML schema with the database.

Introducing XML Schema

Parts of this introduction are extracted from W3C XML Schema notes.

An XML schema (referred to in this appendix as schema) defines a class of XML documents. The term "instance document" is often used to describe an XML document that conforms to a particular XML schema. However, neither instances nor schemas are required to exist as documents; they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset information items. To simplify the description in this appendix, however, instances and schemas are referred to as if they were documents and files.

Purchase Order, po.xml

Consider the following instance document in an XML file po.xml. It describes a purchase order generated by a home products ordering and billing application:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>

The purchase order consists of a main element, purchaseOrder, and the subelements shipTo, billTo, comment, and items. These subelements (except comment) in turn contain other subelements, and so on, until a subelement such as USPrice contains a number rather than any subelements.

  • Complex Type Elements. Elements that contain subelements or carry attributes are said to have complex types.

  • Simple Type Elements. Elements that contain numbers (and strings, and dates, and so on) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types.

The complex types in the instance document, and some simple types, are defined in the purchase order schema. The other simple types are defined as part of the XML Schema repertoire of built-in simple types.

Association Between the Instance Document and Purchase Order Schema

The purchase order schema is not mentioned in the XML instance document. An instance is not actually required to reference an XML schema, although many do. Here it is assumed that any processor of the instance document can obtain the purchase order XML schema without any information from the instance document. The explicit mechanisms for associating instances with XML schemas are the schemaLocation and noNamespaceSchemaLocation attributes described earlier in this appendix.

Purchase Order Schema, po.xsd

The purchase order schema is contained in the file po.xsd:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
     <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
     </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice"  type="xsd:decimal"/>
            <xsd:element ref="comment"   minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

The purchase order schema consists of a schema element and a variety of subelements, most notably element, complexType, and simpleType, which determine the appearance of elements and their content in XML instance documents.

Prefix xsd:

Each of the elements in the schema has a prefix xsd: which is associated with the XML Schema namespace through the declaration, xmlns:xsd="http://www.w3.org/2001/XMLSchema", that appears in the schema element. The prefix xsd: is used by convention to denote the XML Schema namespace, although any prefix can be used. The same prefix, and hence the same association, also appears on the names of built-in simple types, such as xsd:string. This identifies the elements and simple types as belonging to the vocabulary of the XML Schema language rather than the vocabulary of the schema author. For clarity, this description uses the names of elements and simple types, for example, simpleType, and omits the prefix.

XML Schema Components

Schema component is the generic term for the building blocks that make up the abstract data model of the schema. An XML schema is a set of schema components. There are 13 kinds of component in all, falling into three groups.

Primary Components

The primary components, which may (type definitions) or must (element and attribute declarations) have names, are as follows:

  • Simple type definitions

  • Complex type definitions

  • Attribute declarations

  • Element declarations

Secondary Components

The secondary components, which must have names, are as follows:

  • Attribute group definitions

  • Identity-constraint definitions

  • Model group definitions

  • Notation declarations

Helper Components

Finally, the helper components provide small parts of other components; they are not independent of their context:

  • Annotations

  • Model groups

  • Particles

  • Wildcards

  • Attribute Uses

Complex Type Definitions, Element and Attribute Declarations

In XML Schema, there is a basic difference between complex and simple types:

  • Complex types allow elements in their content and may carry attributes.

  • Simple types cannot have element content and cannot carry attributes.

There is also a major distinction between the following:

  • Definitions which create new types (both simple and complex)

  • Declarations which enable elements and attributes with specific names and types (both simple and complex) to appear in document instances

This section describes how to define complex types and how to declare the elements and attributes that appear within them.

New complex types are defined using the complexType element and such definitions typically contain a set of element declarations, element references, and attribute declarations. The declarations are not themselves types, but rather an association between a name and the constraints which govern the appearance of that name in documents governed by the associated schema. Elements are declared using the element element, and attributes are declared using the attribute element.

Defining the USAddress Type

For example, USAddress is defined as a complex type, and within the definition of USAddress you see five element declarations and one attribute declaration:

<xsd:complexType name="USAddress">
   <xsd:sequence>
     <xsd:element name="name"   type="xsd:string"/>
     <xsd:element name="street" type="xsd:string"/>
     <xsd:element name="city"   type="xsd:string"/>
     <xsd:element name="state"  type="xsd:string"/>
     <xsd:element name="zip"    type="xsd:decimal"/>
   </xsd:sequence>
   <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>

Hence any element appearing in an instance whose type is declared to be USAddress, such as shipTo in po.xml, must consist of five elements and one attribute. These elements must:

  • Be called name, street, city, state, and zip as specified by the values of the declarations' name attributes

  • Appear in the same sequence (order) in which they are declared.

The first four of these elements each contain a string, and the fifth contains a number. The element whose type is declared to be USAddress may appear with an attribute called country, which must contain the string US.

The USAddress definition contains only declarations involving the simple types: string, decimal, and NMTOKEN.

Defining PurchaseOrderType

In contrast, the PurchaseOrderType definition contains element declarations involving complex types, such as USAddress. Both kinds of declaration use the same type attribute to identify the type, regardless of whether the type is simple or complex.

<xsd:complexType name="PurchaseOrderType">
   <xsd:sequence>
     <xsd:element name="shipTo" type="USAddress"/>
     <xsd:element name="billTo" type="USAddress"/>
     <xsd:element ref="comment" minOccurs="0"/>
     <xsd:element name="items"  type="Items"/>
   </xsd:sequence>
   <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

In defining PurchaseOrderType, two of the element declarations, for shipTo and billTo, associate different element names with the same complex type, namely USAddress. The consequence of this definition is that any element appearing in an instance document, such as po.xml, whose type is declared to be PurchaseOrderType must consist of elements named shipTo and billTo, each containing the five subelements (name, street, city, state, and zip) that were declared as part of USAddress. The shipTo and billTo elements may also carry the country attribute that was declared as part of USAddress.

The PurchaseOrderType definition contains an orderDate attribute declaration which, like the country attribute declaration, identifies a simple type. In fact, all attribute declarations must reference simple types because, unlike element declarations, attributes cannot contain other elements or other attributes.

The element declarations we have described so far have each associated a name with an existing type definition. Sometimes it is preferable to use an existing element rather than declare a new element, for example:

<xsd:element ref="comment" minOccurs="0"/>

This declaration references an existing element, comment, declared elsewhere in the purchase order schema. In general, the value of the ref attribute must reference a global element, in other words, one that has been declared as a child of the schema element rather than as part of a complex type definition. The consequence of this declaration is that an element called comment may appear in an instance document, and its content must be consistent with that element's type, in this case, string.

Occurrence Constraints: minOccurs and maxOccurs

The comment element is optional in PurchaseOrderType because the value of the minOccurs attribute in its declaration is 0. In general, an element is required to appear when the value of minOccurs is 1 or more. The maximum number of times an element may appear is determined by the value of a maxOccurs attribute in its declaration. This value may be a positive integer such as 41, or the term unbounded to indicate there is no maximum number of occurrences. The default value for both the minOccurs and the maxOccurs attributes is 1.

Thus, when an element such as comment is declared without a maxOccurs attribute, the element may not occur more than once. If you specify a value for only the minOccurs attribute, then make certain that it is less than or equal to the default value of maxOccurs, that is, it is 0 or 1.

Similarly, if you specify a value for only the maxOccurs attribute, then it must be greater than or equal to the default value of minOccurs, that is, 1 or more. If both attributes are omitted, then the element must appear exactly once.
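The declarations below sketch the combinations just described. The contact element and its children are hypothetical, invented only to show how an omitted minOccurs or maxOccurs falls back to its default of 1.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element, used only to illustrate occurrence constraints -->
  <xsd:element name="contact">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Neither attribute given: both default to 1, so exactly once -->
        <xsd:element name="name" type="xsd:string"/>
        <!-- Only minOccurs given: optional, at most once -->
        <xsd:element name="nickname" type="xsd:string" minOccurs="0"/>
        <!-- Only maxOccurs given: at least once, at most three times -->
        <xsd:element name="phone" type="xsd:string" maxOccurs="3"/>
        <!-- Zero or more, with no upper limit -->
        <xsd:element name="email" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this schema, an instance such as <contact><name>Alice Smith</name><phone>555-0100</phone></contact> is valid, whereas one with no phone element is not, because phone has an implicit minOccurs of 1.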

Attributes may appear once or not at all, but no other number of times, so the syntax for specifying occurrences of attributes is different from the syntax for elements. In particular, attributes can be declared with a use attribute to indicate whether the attribute is required, optional, or even prohibited. Recall, for example, the partNum attribute declaration in po.xsd:

<xsd:attribute name="partNum" type="SKU" use="required"/>

Default Attributes

Default values of both attributes and elements are declared using the default attribute, although this attribute has a slightly different consequence in each case. When an attribute is declared with a default value, the value of the attribute is whatever value appears as the attribute value in an instance document; if the attribute does not appear in the instance document, then the schema processor provides the attribute with a value equal to that of the default attribute.


Note:

Default values for attributes only make sense if the attributes themselves are optional, and so it is an error to specify both a default value and anything other than a value of optional for use.
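The following sketch combines a required attribute with an optional, defaulted one. The shipment element and its attributes are hypothetical. The carrier declaration omits use, so the attribute is optional, which is what allows it to carry a default value.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element, used only to illustrate attribute defaults -->
  <xsd:element name="shipment">
    <xsd:complexType>
      <!-- Required: every shipment element must carry trackingNum -->
      <xsd:attribute name="trackingNum" type="xsd:string" use="required"/>
      <!-- Optional (the default for use) with a default value:
           if carrier is missing, the schema processor supplies carrier="UPS" -->
      <xsd:attribute name="carrier" type="xsd:string" default="UPS"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Given this schema, <shipment trackingNum="A1"/> is processed as if it also carried carrier="UPS", while <shipment trackingNum="A1" carrier="FedEx"/> keeps the value FedEx. Adding use="required" to the carrier declaration would be an error.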

Default Elements

The schema processor treats defaulted elements slightly differently. When an element is declared with a default value, the value of the element is whatever value appears as the element content in the instance document.

If the element appears without any content, then the schema processor provides the element with a value equal to that of the default attribute. However, if the element does not appear in the instance document, then the schema processor does not provide the element at all.

In summary, the differences between element and attribute defaults can be stated as:

  • Default attribute values apply when attributes are missing

  • Default element values apply when elements are empty
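A minimal sketch of an element default, using a hypothetical order element with an optional priority child:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element, used only to illustrate element defaults -->
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- If priority appears empty, its value is "normal";
             if it is omitted altogether, no priority element is supplied -->
        <xsd:element name="priority" type="xsd:string" default="normal" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this schema, <order><priority/></order> yields a priority value of normal, <order><priority>rush</priority></order> keeps rush, and <order/> is valid but contains no priority element, because element defaults apply only to empty elements.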

The fixed attribute is used in both attribute and element declarations to ensure that the attributes and elements are set to particular values. For example, po.xsd contains a declaration for the country attribute, which is declared with a fixed value US. This declaration means that the appearance of a country attribute in an instance document is optional (the default value of use is optional), although if the attribute does appear, then its value must be US, and if the attribute does not appear, then the schema processor will provide a country attribute with the value US.


Note:

The concepts of a fixed value and a default value are mutually exclusive, and so it is an error for a declaration to contain both fixed and default attributes.

Table B-1 summarizes the attribute values used in element and attribute declarations to constrain their occurrences.

Table B-1 Occurrence Constraints for XML Schema Elements and Attributes

Element (minOccurs, maxOccurs) | Element fixed, default | Attribute use, fixed, default | Notes
(1, 1) | -, - | required, -, - | Element or attribute must appear once. It may have any value.
(1, 1) | 37, - | required, 37, - | Element or attribute must appear once. Its value must be 37.
(2, unbounded) | 37, - | n/a | Element must appear twice or more. Its value must be 37. In general, minOccurs and maxOccurs values may be non-negative integers, and the maxOccurs value may also be unbounded.
(0, 1) | -, - | optional, -, - | Element or attribute may appear once. It may have any value.
(0, 1) | 37, - | optional, 37, - | Element or attribute may appear once. If it does appear, then its value must be 37. If the attribute does not appear, or the element appears empty, then its value is 37.
(0, 1) | -, 37 | optional, -, 37 | Element or attribute may appear once. If the attribute does not appear, or the element appears empty, then its value is 37. Otherwise its value is that given.
(0, 2) | -, 37 | n/a | Element may appear once, twice, or not at all. If the element does not appear, then it is not provided. If it does appear and it is empty, then its value is 37. Otherwise its value is that given.
(0, 0) | -, - | prohibited, -, - | Element or attribute must not appear.


Note:

Neither minOccurs, maxOccurs, nor use may appear in the declarations of global elements and attributes.

Global Elements and Attributes

Global elements, and global attributes, are created by declarations that appear as the children of the schema element. Once declared, a global element or a global attribute can be referenced in one or more declarations using the ref attribute as described in the preceding section.

A declaration that references a global element enables the referenced element to appear in the instance document in the context of the referencing declaration. So, for example, the comment element appears in po.xml at the same level as the shipTo, billTo and items elements because the declaration that references comment appears in the complex type definition at the same level as the declarations of the other three elements.

The declaration of a global element also enables the element to appear at the top-level of an instance document. Hence purchaseOrder, which is declared as a global element in po.xsd, can appear as the top-level element in po.xml.


Note:

This rationale also allows a comment element to appear as the top-level element in a document like po.xml.


Global Elements and Attributes Caveats

One caveat is that global declarations cannot contain references; global declarations must identify simple and complex types directly. In other words, a global declaration cannot contain the ref attribute; it must either use the type attribute or contain an anonymous type definition.

A second caveat is that cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations. In other words, global declarations cannot contain the attributes minOccurs, maxOccurs, or use.
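The sketch below respects both caveats. The comment declaration mirrors po.xsd; the notes element is hypothetical. The global declarations use either a type attribute or an anonymous type, and the cardinality constraints appear only on the local declaration that references comment.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration: identifies its type directly; no ref, minOccurs, or maxOccurs -->
  <xsd:element name="comment" type="xsd:string"/>

  <!-- Global declaration with an anonymous type definition
       (notes is a hypothetical element) -->
  <xsd:element name="notes">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local declaration that references the global comment element;
             cardinality constraints belong here -->
        <xsd:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```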

Naming Conflicts

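The preceding sections described how to:

  • Define new complex types, such as PurchaseOrderType

  • Declare elements, such as purchaseOrder

  • Declare attributes, such as orderDate

These activities all involve naming. If two things are given the same name, then in general, the more similar the two things are, the more likely there will be a naming conflict.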

For example:

  • If the two things are both types, say a complex type called USStates and a simple type called USStates, then there is a conflict.

  • If the two things are a type and an element or attribute, such as when defining a complex type called USAddress and declaring an element called USAddress, then there is no conflict.

  • If the two things are elements within different types (that is, not global elements), say you declare one element called name as part of the USAddress type and a second element called name as part of the Item type, then there is no conflict. Such elements are sometimes called local element declarations.

  • If the two things are both types and you define one while XML Schema has defined the other, say you define a simple type called decimal, then there is no conflict.

The reason for the apparent contradiction in the last example is that the two types belong to different namespaces. Namespaces are described in "Introducing the W3C Namespaces in XML Recommendation".
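The following schema sketch collects the no-conflict cases in one place. The Item type and its content are hypothetical. The schema has no targetNamespace, so its own decimal type is in the NULL namespace, while the built-in type remains xsd:decimal.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- No conflict: a type and an element may share the name USAddress -->
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <!-- Local element "name" inside USAddress -->
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="USAddress" type="USAddress"/>

  <!-- No conflict: a second local element "name", inside a different type -->
  <xsd:complexType name="Item">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- No conflict: this decimal is in the NULL namespace, unlike xsd:decimal -->
  <xsd:simpleType name="decimal">
    <xsd:restriction base="xsd:decimal"/>
  </xsd:simpleType>

  <!-- A conflict would arise if a simple type were also named USAddress -->
</xsd:schema>
```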

Simple Types

The purchase order schema declares several elements and attributes that have simple types. Some of these simple types, such as string and decimal, are built into XML Schema, while others are derived from the built-ins.

For example, the partNum attribute has a type called SKU (Stock Keeping Unit) that is derived from string. Both built-in simple types and their derivations can be used in all element and attribute declarations. Table B-2 lists all the simple types built into XML Schema, along with examples of the different types.

Table B-2 Simple Types Built into XML Schema

Simple Type | Examples (delimited by commas) | Notes
string | Confirm this is electric | --
normalizedString | Confirm this is electric | Note 3
token | Confirm this is electric | Note 4
byte | -1, 126 | Note 2
unsignedByte | 0, 126 | Note 2
base64Binary | GpM7 | --
hexBinary | 0FB7 | --
integer | -126789, -1, 0, 1, 126789 | Note 2
positiveInteger | 1, 126789 | Note 2
negativeInteger | -126789, -1 | Note 2
nonNegativeInteger | 0, 1, 126789 | Note 2
nonPositiveInteger | -126789, -1, 0 | Note 2
int | -1, 126789675 | Note 2
unsignedInt | 0, 1267896754 | Note 2
long | -1, 12678967543233 | Note 2
unsignedLong | 0, 12678967543233 | Note 2
short | -1, 12678 | Note 2
unsignedShort | 0, 12678 | Note 2
decimal | -1.23, 0, 123.4, 1000.00 | Note 2
float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | Equivalent to single-precision 32-bit floating point; NaN is "Not a Number". Note 2
double | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | Equivalent to double-precision 64-bit floating point. Note 2
boolean | true, false, 1, 0 | --
time | 13:20:00.000, 13:20:00.000-05:00 | Note 2
dateTime | 1999-05-31T13:20:00.000-05:00 | May 31st 1999 at 1.20pm Eastern Standard Time, which is 5 hours behind Coordinated Universal Time. Note 2
duration | P1Y2M3DT10H30M12.3S | 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds
date | 1999-05-31 | Note 2
gMonth | --05-- | May. Notes 2, 5
gYear | 1999 | 1999. Notes 2, 5
gYearMonth | 1999-02 | The month of February 1999, regardless of the number of days. Notes 2, 5
gDay | ---31 | The 31st day. Notes 2, 5
gMonthDay | --05-31 | Every May 31st. Notes 2, 5
Name | shipTo | XML 1.0 Name type
QName | po:USAddress | XML namespace QName
NCName | USAddress | XML namespace NCName, that is, a QName without the prefix and colon
anyURI | http://www.example.com/, http://www.example.com/doc.html#ID5 | --
language | en-GB, en-US, fr | Valid values for xml:lang as defined in XML 1.0
ID | -- | XML 1.0 ID attribute type. Note 1
IDREF | -- | XML 1.0 IDREF attribute type. Note 1
IDREFS | -- | XML 1.0 IDREFS attribute type. Note 1
ENTITY | -- | XML 1.0 ENTITY attribute type. Note 1
ENTITIES | -- | XML 1.0 ENTITIES attribute type. Note 1
NOTATION | -- | XML 1.0 NOTATION attribute type. Note 1
NMTOKEN | US, Canada | XML 1.0 NMTOKEN attribute type. Note 1
NMTOKENS | US UK, Canada Mexique | XML 1.0 NMTOKENS attribute type, that is, a whitespace-separated list of NMTOKEN values. Note 1

Notes:

(1) To retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes.

(2) A value of this type can be represented by more than one lexical format. For example, 100 and 1.0E2 are both valid float formats representing one hundred. However, rules have been established for this type that define a canonical lexical format, see XML Schema Part 2.

(3) Newline, tab and carriage-return characters in a normalizedString type are converted to space characters before schema processing.

(4) As for normalizedString; in addition, adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed.

(5) The "g" prefix signals time periods in the Gregorian calendar.

New simple types are defined by deriving them from existing simple types (built-in or derived). In particular, you can derive a new simple type by restricting an existing simple type; in other words, the legal range of values for the new type is a subset of the range of values of the existing type.

Use the simpleType element to define and name the new simple type. Use the restriction element to indicate the existing (base) type, and to identify the facets that constrain the range of values. A complete list of facets is provided in Appendix B of XML Schema Primer, http://www.w3.org/TR/xmlschema-0/.

Suppose you want to create a new type of integer called myInteger whose range of values is between 10000 and 99999 (inclusive). Base your definition on the built-in simple type integer, whose range of values also includes integers less than 10000 and greater than 99999.

To define myInteger, restrict the range of the integer base type by employing two facets called minInclusive and maxInclusive:


Defining myInteger, Range 10000-99999
<xsd:simpleType name="myInteger">
   <xsd:restriction base="xsd:integer">
     <xsd:minInclusive value="10000"/>
     <xsd:maxInclusive value="99999"/>
   </xsd:restriction>
</xsd:simpleType>

The example shows one particular combination of a base type and two facets used to define myInteger, but a look at the list of built-in simple types and their facets should suggest other viable combinations.
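As one example of another viable combination, the same range can be expressed with the exclusive-bound facets instead of the inclusive ones. The type name myIntegerAlt is hypothetical, used only for this sketch. Because the base type is integer, the values greater than 9999 and less than 100000 are exactly the values from 10000 to 99999.

```xml
<!-- Hypothetical alternative to myInteger: exclusive bounds one step
     outside the inclusive ones give the same integer range 10000-99999 -->
<xsd:simpleType name="myIntegerAlt">
   <xsd:restriction base="xsd:integer">
     <xsd:minExclusive value="9999"/>
     <xsd:maxExclusive value="100000"/>
   </xsd:restriction>
</xsd:simpleType>
```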

The purchase order schema contains another, more elaborate, example of a simple type definition. A new simple type called SKU is derived (by restriction) from the simple type string. Furthermore, you can constrain the values of SKU using a facet called pattern in conjunction with the regular expression \d{3}-[A-Z]{2} that is read "three digits followed by a hyphen followed by two upper-case ASCII letters":


Defining the Simple Type "SKU"
<xsd:simpleType name="SKU">
   <xsd:restriction base="xsd:string">
     <xsd:pattern value="\d{3}-[A-Z]{2}"/>
   </xsd:restriction>
</xsd:simpleType>

This regular expression language is described more fully in Appendix D of http://www.w3.org/TR/xmlschema-0/.
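To make the pattern facet concrete, the following sketch shows values checked against SKU, assuming a hypothetical element declared as <xsd:element name="sku" type="SKU"/>. An XML Schema pattern must match the entire value, so the expression is implicitly anchored at both ends.

```xml
<!-- Assumes the hypothetical declaration <xsd:element name="sku" type="SKU"/> -->
<sku>926-AA</sku>   <!-- valid: three digits, a hyphen, two upper-case letters -->
<sku>926-aa</sku>   <!-- invalid: the letters must be upper-case A-Z -->
<sku>9260-AA</sku>  <!-- invalid: the whole value must match, so four digits fail -->
<sku>926AA</sku>    <!-- invalid: the hyphen is required -->
```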

XML Schema defines twelve facets, which are listed in Appendix B of http://www.w3.org/TR/xmlschema-0/. Among these, the enumeration facet is particularly useful; it can be used to constrain the values of almost every simple type, except the boolean type. The enumeration facet limits a simple type to a set of distinct values. For example, you can use the enumeration facet to define a new simple type called USState, derived from string, whose value must be one of the standard US state abbreviations:


Using the Enumeration Facet
<xsd:simpleType name="USState">
   <xsd:restriction base="xsd:string">
     <xsd:enumeration value="AK"/>
     <xsd:enumeration value="AL"/>
     <xsd:enumeration value="AR"/>
     <!-- and so on ... -->
   </xsd:restriction>
</xsd:simpleType>

USState would be a good replacement for the string type currently used in the state element declaration. By making this replacement, the legal values of a state element, that is, the state subelements of billTo and shipTo, would be limited to one of AK, AL, AR, and so on. Note that the enumeration values specified for a particular type must be unique.
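A sketch of that replacement, assuming the state element is declared inside the USAddress type of the purchase order schema and that the full USState enumeration includes every standard abbreviation (such as CA):

```xml
<!-- Before: <xsd:element name="state" type="xsd:string"/> -->
<xsd:element name="state" type="USState"/>

<!-- Instance values under the new declaration -->
<state>CA</state>          <!-- valid: one of the enumerated abbreviations -->
<state>California</state>  <!-- invalid: not in the USState enumeration -->
```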

List Types

XML Schema has the concept of a list type, in addition to the so-called atomic types that constitute most of the types listed in Table B-3. Atomic types, list types, and the union types described in the next section are collectively called simple types. The value of an atomic type is indivisible from the XML Schema perspective. For example, the NMTOKEN value US is indivisible in the sense that no part of US, such as the character "S", has any meaning by itself. In contrast, list types are composed of sequences of atomic types, and consequently the parts of a sequence (the atoms) are themselves meaningful. For example, NMTOKENS is a list type, and an element of this type would be a white-space delimited list of NMTOKEN values, such as US UK FR. XML Schema has three built-in list types: NMTOKENS, IDREFS, and ENTITIES.
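As an illustration of a built-in list type, the following sketch declares an attribute of type NMTOKENS, in line with Note 1 of the built-in types table, which recommends using these types only in attributes. The attribute name shipCountries and the element name shipment are hypothetical.

```xml
<!-- Schema fragment: an attribute whose type is the built-in list type NMTOKENS -->
<xsd:attribute name="shipCountries" type="xsd:NMTOKENS"/>

<!-- Instance: three NMTOKEN items separated by white space -->
<shipment shipCountries="US UK FR"/>
```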

In addition to using the built-in list types, you can create new list types by derivation from existing atomic types. You cannot create list types from existing list types, nor from complex types. For example, to create a list of myInteger:

Creating a List of myInteger

<xsd:simpleType name="listOfMyIntType">
   <xsd:list itemType="myInteger"/>
</xsd:simpleType>

And an element in an instance document whose content conforms to listOfMyIntType is:

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>

Several facets can be applied to list types: length, minLength, maxLength, and enumeration. For example, to define a list of exactly six US states (SixUSStates), we first define a new list type called USStateList from USState, and then we derive SixUSStates by restricting USStateList to only six items:


List Type for Six US States
<xsd:simpleType name="USStateList">
  <xsd:list itemType="USState"/>
</xsd:simpleType>
<xsd:simpleType name="SixUSStates">
  <xsd:restriction base="USStateList">
    <xsd:length value="6"/>
  </xsd:restriction>
</xsd:simpleType>

Elements whose type is SixUSStates must have six items, and each of the six items must be one of the (atomic) values of the enumerated type USState, for example:

<sixStates>PA NY CA NY LA AK</sixStates>

Note that it is possible to derive a list type from the atomic type string. However, a string may contain white space, and white space delimits the items in a list type, so you should be careful using list types whose base type is string. For example, suppose we have defined a list type with a length facet equal to 3, and base type string, then the following 3 item list is legal:

Asie Europe Afrique

But the following 3 item list is illegal:

Asie Europe Amérique Latine

Even though "Amérique Latine" may exist as a single string outside of the list, when it is included in the list, the whitespace between Amérique and Latine effectively creates a fourth item, and so the latter example will not conform to the 3-item list type.
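The three-item string list assumed in this example could be declared as follows. Both type names are hypothetical; the declarations follow the same two steps as SixUSStates, first deriving a list type and then restricting its length.

```xml
<!-- Hypothetical list of strings: white space separates the items -->
<xsd:simpleType name="StringList">
  <xsd:list itemType="xsd:string"/>
</xsd:simpleType>

<!-- Hypothetical restriction to exactly three items -->
<xsd:simpleType name="ThreeStringList">
  <xsd:restriction base="StringList">
    <xsd:length value="3"/>
  </xsd:restriction>
</xsd:simpleType>
```

With these declarations, "Asie Europe Afrique" has three items, while "Asie Europe Amérique Latine" has four and is rejected.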

Union Types

Atomic types and list types enable an element or an attribute value to be one or more instances of one atomic type. In contrast, a union type enables an element or attribute value to be one or more instances of one type drawn from the union of multiple atomic and list types. To illustrate, we create a union type for representing American states as two-letter abbreviations or lists of numeric codes. The zipUnion union type is built from one atomic type and one list type:


Union Type for Zipcodes
<xsd:simpleType name="zipUnion">
  <xsd:union memberTypes="USState listOfMyIntType"/>
</xsd:simpleType>

When we define a union type, the memberTypes attribute value is a list of all the types in the union.

Now, assuming we have declared an element called zips of type zipUnion, valid instances of the element are:

<zips>CA</zips>
<zips>95630 95977 95945</zips>
<zips>AK</zips>

Two facets, pattern and enumeration, can be applied to a union type.
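A sketch of both points: the zips declaration assumed above, and a hypothetical type westCoastStates that applies the enumeration facet to zipUnion. It assumes the full USState enumeration contains OR and WA, since every enumerated value must itself be valid for the union.

```xml
<!-- The zips element assumed in the text -->
<xsd:element name="zips" type="zipUnion"/>

<!-- Hypothetical restriction of the union to three values -->
<xsd:simpleType name="westCoastStates">
  <xsd:restriction base="zipUnion">
    <xsd:enumeration value="CA"/>
    <xsd:enumeration value="OR"/>
    <xsd:enumeration value="WA"/>
  </xsd:restriction>
</xsd:simpleType>
```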

Anonymous Type Definitions

Schemas can be constructed by defining sets of named types such as PurchaseOrderType and then declaring elements such as purchaseOrder that reference the types using the type= construction. This style of schema construction is straightforward but it can be unwieldy, especially if you define many types that are referenced only once and contain very few constraints. In these cases, a type can be more succinctly defined as an anonymous type which saves the overhead of having to be named and explicitly referenced.

The definition of the type Items in po.xsd contains two element declarations that use anonymous types (item and quantity). In general, you can identify anonymous types by the lack of a type= in an element (or attribute) declaration, and by the presence of an un-named (simple or complex) type definition:

Two Anonymous Type Definitions

<xsd:complexType name="Items">
   <xsd:sequence>
     <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
       <xsd:complexType>
         <xsd:sequence>
           <xsd:element name="productName" type="xsd:string"/>
           <xsd:element name="quantity">
             <xsd:simpleType>
               <xsd:restriction base="xsd:positiveInteger">
                 <xsd:maxExclusive value="100"/>
               </xsd:restriction>
             </xsd:simpleType>
           </xsd:element>
           <xsd:element name="USPrice"  type="xsd:decimal"/>
           <xsd:element ref="comment"   minOccurs="0"/>
           <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
         </xsd:sequence>
         <xsd:attribute name="partNum" type="SKU" use="required"/>
       </xsd:complexType>
     </xsd:element>
   </xsd:sequence>
</xsd:complexType>

The item element has an anonymous complex type consisting of the elements productName, quantity, USPrice, comment, and shipDate, and an attribute called partNum. The quantity element has an anonymous simple type derived from positiveInteger whose value ranges from 1 to 99.
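For contrast, the following sketch gives the quantity type a name and references it with type=. The name QuantityType is hypothetical; the named form is worthwhile only if the type is reused in other declarations.

```xml
<!-- Hypothetical named type, equivalent to the anonymous type of quantity -->
<xsd:simpleType name="QuantityType">
  <xsd:restriction base="xsd:positiveInteger">
    <xsd:maxExclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- Inside the item type, the declaration then references the type by name -->
<xsd:element name="quantity" type="QuantityType"/>
```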

Element Content

The purchase order schema has many examples of elements containing other elements (for example, items), elements having attributes and containing other elements (such as shipTo), and elements containing only a simple type of value (for example, USPrice). However, we have not seen an element having attributes but containing only a simple type of value, nor have we seen an element that contains other elements mixed with character content, nor have we seen an element that has no content at all. In this section we will examine these variations in the content models of elements.

Complex Types from Simple Types

Let us first consider how to declare an element that has an attribute and contains a simple value. In an instance document, such an element might appear as:

<internationalPrice currency="EUR">423.46</internationalPrice>

The purchase order schema declares a USPrice element that is a starting point:

<xsd:element name="USPrice" type="xsd:decimal"/>

Now, how do we add an attribute to this element? As we have said before, simple types cannot have attributes, and decimal is a simple type.

Therefore, we must define a complex type to carry the attribute declaration. We also want the content to be simple type decimal. So our original question becomes: How do we define a complex type that is based on the simple type decimal? The answer is to derive a new complex type from the simple type decimal:


Deriving a ComplexType from a SimpleType
<xsd:element name="internationalPrice">
   <xsd:complexType>
     <xsd:simpleContent>
       <xsd:extension base="xsd:decimal">
         <xsd:attribute name="currency" type="xsd:string"/>
       </xsd:extension>
     </xsd:simpleContent>
   </xsd:complexType>
</xsd:element>

We use the complexType element to start the definition of a new (anonymous) type. To indicate that the content model of the new type contains only character data and no elements, we use a simpleContent element. Finally, we derive the new type by extending the simple decimal type. The extension consists of adding a currency attribute using a standard attribute declaration. (Type derivation is covered in detail in Section 4 of the XML Schema Primer.) The internationalPrice element declared in this way appears in an instance as shown in the example at the beginning of this section.
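If several elements need the same structure, the anonymous type can be given a name instead. A sketch with the hypothetical type name PriceWithCurrency; unlike the declaration above, it also makes the currency attribute mandatory with use="required", which is an assumption of this sketch rather than part of the original example.

```xml
<!-- Hypothetical named complex type: decimal content plus a currency attribute -->
<xsd:complexType name="PriceWithCurrency">
  <xsd:simpleContent>
    <xsd:extension base="xsd:decimal">
      <!-- use="required" is an addition; the original leaves currency optional -->
      <xsd:attribute name="currency" type="xsd:string" use="required"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<xsd:element name="internationalPrice" type="PriceWithCurrency"/>
```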

Mixed Content

The construction of the purchase order schema may be characterized as elements containing subelements, and the deepest subelements contain character data. XML Schema also provides for the construction of schemas where character data can appear alongside subelements, and character data is not confined to the deepest subelements.

To illustrate, consider the following snippet from a customer letter that uses some of the same elements as the purchase order:


Snippet of Customer Letter
<letterBody>
  <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
  Your order of <quantity>1</quantity> <productName>Baby
  Monitor</productName> shipped from our warehouse on
  <shipDate>1999-05-21</shipDate>. ....
</letterBody>

Notice the text appearing between elements and their child elements. Specifically, text appears between the elements salutation, quantity, productName and shipDate which are all children of letterBody, and text appears around the element name which is the child of a child of letterBody. The following snippet of a schema declares letterBody:


Snippet of Schema for Customer Letter
<xsd:element name="letterBody">
   <xsd:complexType mixed="true">
     <xsd:sequence>
       <xsd:element name="salutation">
         <xsd:complexType mixed="true">
           <xsd:sequence>
             <xsd:element name="name" type="xsd:string"/>
           </xsd:sequence>
         </xsd:complexType>
       </xsd:element>
       <xsd:element name="quantity"    type="xsd:positiveInteger"/>
       <xsd:element name="productName" type="xsd:string"/>
       <xsd:element name="shipDate"    type="xsd:date" minOccurs="0"/>
       <!-- and so on -->
     </xsd:sequence>
   </xsd:complexType>
</xsd:element>

The elements appearing in the customer letter are declared, and their types are defined using the element and complexType element constructions seen previously. To enable character data to appear between the child-elements of letterBody, the mixed attribute on the type definition is set to true.

Note that the mixed model in XML Schema differs fundamentally from the mixed model in XML 1.0. Under the XML Schema mixed model, the order and number of child elements appearing in an instance must agree with the order and number of child elements specified in the model. In contrast, under the XML 1.0 mixed model, the order and number of child elements appearing in an instance cannot be constrained. In summary, XML Schema provides full validation of mixed models in contrast to the partial schema validation provided by XML 1.0.
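For comparison, a sketch of the closest XML 1.0 DTD declarations for the customer letter. XML 1.0 only permits mixed content in the form (#PCDATA | a | b ...)*, so a DTD can list which child elements may appear but cannot fix their order or number, nor type quantity as a positive integer.

```xml
<!-- XML 1.0 mixed content: any order, any number of occurrences -->
<!ELEMENT letterBody (#PCDATA | salutation | quantity | productName | shipDate)*>
<!ELEMENT salutation (#PCDATA | name)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT productName (#PCDATA)>
<!ELEMENT shipDate (#PCDATA)>
```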

Empty Content

Now suppose that we want the internationalPrice element to convey both the unit of currency and the price as attribute values rather than as separate attribute and content values. For example:

<internationalPrice currency="EUR" value="423.46"/>

Such an element has no content at all; its content model is empty.


An Empty Complex Type

To define a type whose content is empty, we essentially define a type that allows only elements in its content, but we do not actually declare any elements and so the type content model is empty:

<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value"    type="xsd:decimal"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

In this example, we define an (anonymous) type having complexContent, that is, only elements. The complexContent element signals the intent to restrict or extend the content model of a complex type, and the restriction of anyType declares two attributes but does not introduce any element content (see Section 4.4 of the XML Schema Primer for more details on restriction). The internationalPrice element declared in this way may legitimately appear in an instance as shown in the preceding example.


Shorthand for an Empty Complex Type

The preceding syntax for an empty-content element is relatively verbose, and it is possible to declare the internationalPrice element more compactly:

<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value"    type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>

This compact syntax works because a complex type defined without any simpleContent or complexContent is interpreted as shorthand for complex content that restricts anyType.
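Whichever of the two declarations is used, the following instances show where the empty content model draws the line:

```xml
<!-- Valid: attributes only, no content -->
<internationalPrice currency="EUR" value="423.46"/>

<!-- Also valid: a start/end tag pair with nothing between them -->
<internationalPrice currency="EUR" value="423.46"></internationalPrice>

<!-- Invalid: the content model is empty, so character data is not allowed -->
<internationalPrice currency="EUR" value="423.46">423.46</internationalPrice>
```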

AnyType

The anyType represents an abstraction called the ur-type which is the base type from which all simple and complex types are derived. An anyType type does not constrain its content in any way. It is possible to use anyType like other types, for example:

<xsd:element name="anything" type="xsd:anyType"/>

The content of the element declared in this way is unconstrained, so the element value may be 423.46, but it may be any other sequence of characters as well, or indeed a mixture of characters and elements. In fact, anyType is the default type when none is specified, so the preceding could also be written as follows:

<xsd:element name="anything"/>

If unconstrained element content is required, for example in the case of elements containing prose that requires embedded markup to support internationalization, then the default declaration or a slightly restricted form of it may be suitable. The text type described in Section 5.5 of the XML Schema Primer is an example of such a type.
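A minimal sketch of a "slightly restricted" form of anyType. The type name proseText is hypothetical and this is not the text type from the XML Schema Primer: it allows character data mixed with any child elements, validating a child only when a declaration for it is available (processContents="lax"), but unlike anyType it allows no attributes.

```xml
<!-- Hypothetical prose type: mixed content, any child elements, no attributes -->
<xsd:complexType name="proseText" mixed="true">
  <xsd:sequence>
    <xsd:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```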

Annotations

XML Schema provides three elements for annotating schemas for the benefit of both human readers and applications. In the purchase order schema, we put a basic schema description and copyright information inside the documentation element, which is the recommended location for human readable material. We recommend you use the xml:lang attribute with any documentation elements to indicate the language of the information. Alternatively, you may indicate the language of all information in a schema by placing an xml:lang attribute on the schema element.

The appinfo element, which we did not use in the purchase order schema, can be used to provide information for tools, style sheets, and other applications. An interesting example using appinfo is a schema that describes the simple types in XML Schema Part 2: Datatypes.

Information describing this schema, for example, which facets are applicable to particular simple types, is represented inside appinfo elements, and this information was used by an application to automatically generate text for the XML Schema Part 2 document.

Both documentation and appinfo appear as subelements of annotation, which may itself appear at the beginning of most schema constructions. To illustrate, the following example shows annotation elements appearing at the beginning of an element declaration and a complex type definition:


Annotations in Element Declaration and Complex Type Definition
<xsd:element name="internationalPrice">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      element declared with anonymous type
    </xsd:documentation>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        empty anonymous type with 2 attributes
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value"    type="xsd:decimal"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

The annotation element may also appear at the beginning of other schema constructions such as those indicated by the elements schema, simpleType, and attribute.
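A sketch of an annotation on a simpleType that carries both kinds of material. Note that the element name in a schema is spelled appinfo. Its content is free-form XML for applications; the ui namespace URI and the displayWidth element below are hypothetical, standing in for whatever a particular tool expects.

```xml
<xsd:simpleType name="SKU">
  <xsd:annotation>
    <!-- Human-readable material -->
    <xsd:documentation xml:lang="en">
      Stock Keeping Unit: three digits, a hyphen, two upper-case letters
    </xsd:documentation>
    <!-- Machine-readable material; namespace and element are hypothetical -->
    <xsd:appinfo>
      <ui:displayWidth xmlns:ui="http://www.example.com/hypothetical-ui">6</ui:displayWidth>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
```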

Building Content Models

The definitions of complex types in the purchase order schema all declare sequences of elements that must appear in the instance document. The occurrence of individual elements declared in the so-called content models of these types may be optional, as indicated by a 0 value for the attribute minOccurs (for example, in comment), or be otherwise constrained depending upon the values of minOccurs and maxOccurs.

XML Schema also provides constraints that apply to groups of elements appearing in a content model. These constraints mirror those available in XML 1.0 plus some additional constraints. Note that the constraints do not apply to attributes.

XML Schema enables groups of elements to be defined and named, so that the elements can be used to build up the content models of complex types (thus mimicking common usage of parameter entities in XML 1.0). Unnamed groups of elements can also be defined, and along with elements in named groups, they can be constrained to appear in the same order (sequence) in which they are declared. Alternatively, they can be constrained so that only one of the elements may appear in an instance.

To illustrate, we introduce two groups into the PurchaseOrderType definition from the purchase order schema so that purchase orders may contain either separate shipping and billing addresses, or a single address for those cases in which the shippee and billee are co-located:


Nested Choice and Sequence Groups
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:choice>
      <xsd:group ref="shipAndBill"/>
      <xsd:element name="singleUSAddress" type="USAddress"/>
    </xsd:choice>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
  </xsd:sequence>
</xsd:group>

The choice group element allows only one of its children to appear in an instance. One child is an inner group element that references the named group shipAndBill, consisting of the element sequence shipTo, billTo, and the second child is a singleUSAddress. Hence, in an instance document, the purchaseOrder element must contain either a shipTo element followed by a billTo element, or a singleUSAddress element. The choice group is followed by the comment and items element declarations, and both the choice group and the element declarations are children of a sequence group. The effect of these various groups is that the address element(s) must be followed by comment and items elements in that order.
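The following instance sketches show how the nested groups are satisfied. The address and item content is elided as placeholder comments; in a real document each address element needs the children required by USAddress. The orderDate value is illustrative.

```xml
<!-- Valid: the shipAndBill sequence satisfies the choice; comment is omitted -->
<purchaseOrder orderDate="1999-10-20">
  <shipTo><!-- USAddress content --></shipTo>
  <billTo><!-- USAddress content --></billTo>
  <items><!-- item elements --></items>
</purchaseOrder>

<!-- Valid: a single address satisfies the choice -->
<purchaseOrder orderDate="1999-10-20">
  <singleUSAddress><!-- USAddress content --></singleUSAddress>
  <comment>Ship and bill to the same address</comment>
  <items><!-- item elements --></items>
</purchaseOrder>

<!-- Invalid: shipTo must be followed by billTo, and the two branches cannot be mixed -->
<purchaseOrder orderDate="1999-10-20">
  <shipTo><!-- USAddress content --></shipTo>
  <singleUSAddress><!-- USAddress content --></singleUSAddress>
  <items><!-- item elements --></items>
</purchaseOrder>
```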

There exists a third option for constraining elements in a group: All the elements in the group may appear once or not at all, and they may appear in any order. The all group (which provides a simplified version of the SGML &-Connector) is limited to the top-level of any content model.

Moreover, the group children must all be individual elements (no groups), and no element in the content model may appear more than once, that is, the permissible values of minOccurs and maxOccurs are 0 and 1.

For example, to allow the child elements of purchaseOrder to appear in any order, we could redefine PurchaseOrderType as:


An 'All' Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items"  type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

By this definition, a comment element may optionally appear within purchaseOrder, and it may appear before or after any shipTo, billTo and items elements, but it can appear only once. Moreover, the stipulations of an all group do not allow us to declare an element such as comment outside the group as a means of enabling it to appear more than once. XML Schema stipulates that an all group must appear as the sole child at the top of a content model. In other words, the following is not permitted:


'All' Group Example: Not Permitted
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:all>
          <xsd:element name="shipTo" type="USAddress"/>
          <xsd:element name="billTo" type="USAddress"/>
          <xsd:element name="items"  type="Items"/>
    </xsd:all>
    <xsd:sequence>
      <xsd:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

Finally, named and un-named groups that appear in content models (represented by group and choice, sequence, all respectively) may carry minOccurs and maxOccurs attributes. By combining and nesting the various groups provided by XML Schema, and by setting the values of minOccurs and maxOccurs, it is possible to represent any content model expressible with an XML 1.0 Document Type Definition (DTD). Furthermore, the all group provides additional expressive power.
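For example, the DTD content model (comment | shipDate)* can be written as a choice group that carries occurrence attributes. The element name notes is hypothetical, used only for this sketch.

```xml
<!-- DTD equivalent: <!ELEMENT notes (comment | shipDate)*> -->
<xsd:element name="notes">
  <xsd:complexType>
    <!-- minOccurs/maxOccurs on the group play the role of the DTD "*" -->
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="comment"/>
      <xsd:element name="shipDate" type="xsd:date"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
```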

Attribute Groups

To provide more information about each item in a purchase order, for example, each item weight and preferred shipping method, you can add weightKg and shipBy attribute declarations to the item element (anonymous) type definition:

Adding Attributes to the Inline Type Definition

<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element   name="productName" type="xsd:string"/>
       <xsd:element   name="quantity">
         <xsd:simpleType>
           <xsd:restriction base="xsd:positiveInteger">
             <xsd:maxExclusive value="100"/>
           </xsd:restriction>
         </xsd:simpleType>
       </xsd:element>
       <xsd:element name="USPrice"  type="xsd:decimal"/>
       <xsd:element ref="comment"   minOccurs="0"/>
       <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>
     <xsd:attribute name="partNum"  type="SKU" use="required"/>
     <!-- add weightKg and shipBy attributes -->
     <xsd:attribute name="weightKg" type="xsd:decimal"/>
     <xsd:attribute name="shipBy">
       <xsd:simpleType>
         <xsd:restriction base="xsd:string">
           <xsd:enumeration value="air"/>
           <xsd:enumeration value="land"/>
           <xsd:enumeration value="any"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:attribute>
   </xsd:complexType>
</xsd:element>

Alternatively, you can create a named attribute group containing all the desired attributes of an item element, and reference this group by name in the item element declaration:

Adding Attributes Using an Attribute Group

<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element name="productName" type="xsd:string"/>
       <xsd:element name="quantity">
         <xsd:simpleType>
           <xsd:restriction base="xsd:positiveInteger">
             <xsd:maxExclusive value="100"/>
           </xsd:restriction>
         </xsd:simpleType>
       </xsd:element>
       <xsd:element name="USPrice"  type="xsd:decimal"/>
       <xsd:element ref="comment"   minOccurs="0"/>
       <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>

     <!-- attributeGroup replaces individual declarations -->
     <xsd:attributeGroup ref="ItemDelivery"/>
   </xsd:complexType>
</xsd:element>

<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum"  type="SKU" use="required"/>
  <xsd:attribute name="weightKg" type="xsd:decimal"/>
  <xsd:attribute name="shipBy">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="air"/>
        <xsd:enumeration value="land"/>
        <xsd:enumeration value="any"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>

Using an attribute group in this way can improve the readability of schemas, and facilitates updating schemas because an attribute group can be defined and edited in one place and referenced in multiple definitions and declarations. These characteristics of attribute groups make them similar to parameter entities in XML 1.0. Note that an attribute group may contain other attribute groups. Note also that both attribute declarations and attribute group references must appear at the end of complex type definitions.
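A sketch of one attribute group containing another. The group name ShippingOptions is hypothetical, and this version of ItemDelivery would replace the earlier definition, since two attribute groups in a schema cannot share a name. Elements that reference ItemDelivery still get all three attributes.

```xml
<!-- Hypothetical group holding only the shipping-related attributes -->
<xsd:attributeGroup name="ShippingOptions">
  <xsd:attribute name="weightKg" type="xsd:decimal"/>
  <xsd:attribute name="shipBy">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="air"/>
        <xsd:enumeration value="land"/>
        <xsd:enumeration value="any"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>

<!-- ItemDelivery references ShippingOptions instead of repeating its declarations -->
<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="SKU" use="required"/>
  <xsd:attributeGroup ref="ShippingOptions"/>
</xsd:attributeGroup>
```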

Nil Values

One of the purchase order items listed in po.xml, the Lawnmower, does not have a shipDate element. Within the context of our scenario, the schema author may have intended such absences to indicate items not yet shipped. But in general, the absence of an element does not have any particular meaning: It may indicate that the information is unknown, or not applicable, or the element may be absent for some other reason. Sometimes it is desirable to represent an unshipped item, unknown information, or inapplicable information explicitly with an element, rather than by an absent element.

For example, it may be desirable to represent a null value being sent to or from a relational database with an element that is present. Such cases can be represented using the XML Schema nil mechanism which enables an element to appear with or without a non-nil value.

The XML Schema nil mechanism involves an out of band nil signal. In other words, there is no actual nil value that appears as element content, instead there is an attribute to indicate that the element content is nil. To illustrate, we modify the shipDate element declaration so that nils can be signalled:

<xsd:element name="shipDate" type="xsd:date" nillable="true"/>

And to explicitly represent that shipDate has a nil value in the instance document, we set the nil attribute (from the XML Schema namespace for instances) to true:

<shipDate xsi:nil="true"></shipDate>

The nil attribute is defined as part of the XML Schema namespace for instances, http://www.w3.org/2001/XMLSchema-instance, and so it must appear in the instance document with a prefix (such as xsi:) associated with that namespace. (As with the xsd: prefix, the xsi: prefix is used by convention only.) Note that the nil mechanism applies only to element values, and not to attribute values. An element with xsi:nil="true" may not have any element content but it may still carry attributes.
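A sketch of an item whose shipDate is explicitly nil, assuming shipDate is declared with nillable="true" as shown above. The product values are illustrative. The two shorter fragments assume the same xsi binding.

```xml
<!-- The xsi prefix must be bound to the XML Schema instance namespace -->
<item partNum="926-AA"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <productName>Baby Monitor</productName>
  <quantity>1</quantity>
  <USPrice>39.98</USPrice>
  <!-- Explicitly nil: the element is present but has no content -->
  <shipDate xsi:nil="true"/>
</item>

<!-- Invalid: xsi:nil="true" forbids content -->
<shipDate xsi:nil="true">1999-05-21</shipDate>

<!-- Invalid: without xsi:nil, empty content is not a valid xsd:date -->
<shipDate/>
```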

How DTDs and XML Schema Differ

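A DTD (Document Type Definition) is a mechanism provided by XML 1.0 for declaring constraints on XML markup. DTDs enable you to specify which elements can appear in a document, the content model of each element (which child elements it can contain, in what order, and how many times), and the attributes each element can carry, together with their types and default values.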

The XML Schema language serves a similar purpose to DTDs, but it is more flexible in specifying XML document constraints and potentially more useful for certain applications.


XML Example

Consider the XML document:

<?xml version="1.0"?>
<publisher pubid="ab1234">
  <publish-year>2000</publish-year>
  <title>The Cat in the Hat</title>
  <author>Dr. Seuss</author>
  <artist>Ms. Seuss</artist>
  <isbn>123456781111</isbn>
</publisher>

DTD Example

Consider a typical DTD for the foregoing XML document:

<!ELEMENT publisher (publish-year, title, author+, artist?, isbn)>
<!ELEMENT publish-year (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
...

XML Schema Example

The XML schema definition equivalent to the preceding DTD example is:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="publisher">
    <complexType>
      <sequence>
       <element name="publish-year" type="short"/>
       <element name="title" type="string"/>
       <element name="author" type="string" maxOccurs="unbounded"/>
       <element name="artist" type="string" nillable="true" minOccurs="0"/>
       <element name="isbn" type="long"/>
      </sequence>
      <attribute name="pubid" type="hexBinary" use="required"/>
    </complexType>
  </element>
</schema>
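A sketch of an instance document that names this schema for validation. Because the schema declares no targetNamespace, the instance uses xsi:noNamespaceSchemaLocation; the file name publisher.xsd is hypothetical. The artist element is shown as explicitly nil, which the schema permits through nillable="true".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "publisher.xsd" is a hypothetical file name for the schema above -->
<publisher pubid="ab1234"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="publisher.xsd">
  <publish-year>2000</publish-year>
  <title>The Cat in the Hat</title>
  <author>Dr. Seuss</author>
  <!-- artist is nillable, so it may be present but explicitly nil -->
  <artist xsi:nil="true"/>
  <isbn>123456781111</isbn>
</publisher>
```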

DTD Limitations

DTDs, also known as XML Markup Declarations, are considered deficient in handling certain applications which include the following:

  • Document authoring and publishing

  • Exchange of metadata

  • E-commerce

  • Inter-database operations

DTD limitations include:

  • No integration with XML namespaces, so users cannot import and reuse declarations defined elsewhere.

  • No support for datatypes other than character data, which limits DTDs for describing metadata standards and database schemas.

  • Insufficient flexibility: many applications need to specify document structure constraints more flexibly than a DTD allows.
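The datatype limitation can be seen by validating the same document both ways. In the following Java sketch, which uses the JDK's JAXP parsers, a DTD based on "DTD Example" accepts an isbn value that is not a number, because #PCDATA admits any character data, while an XML schema that declares isbn as long rejects it. The DtdVersusSchemaDemo class is hypothetical, and the ATTLIST declaration for pubid is an assumption added so that the DTD covers the attribute used in "XML Example".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdVersusSchemaDemo {
    // Internal DTD subset based on "DTD Example"; the ATTLIST line is assumed.
    static final String DTD =
        "<!DOCTYPE publisher ["
        + "<!ELEMENT publisher (publish-year, title, author+, artist?, isbn)>"
        + "<!ATTLIST publisher pubid CDATA #REQUIRED>"
        + "<!ELEMENT publish-year (#PCDATA)>"
        + "<!ELEMENT title (#PCDATA)>"
        + "<!ELEMENT author (#PCDATA)>"
        + "<!ELEMENT artist (#PCDATA)>"
        + "<!ELEMENT isbn (#PCDATA)>"
        + "]>";

    // Same structure as an XML schema, with isbn typed as xs:long.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='publisher'><xs:complexType><xs:sequence>"
        + "<xs:element name='publish-year' type='xs:short'/>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='author' type='xs:string' maxOccurs='unbounded'/>"
        + "<xs:element name='artist' type='xs:string' minOccurs='0'/>"
        + "<xs:element name='isbn' type='xs:long'/>"
        + "</xs:sequence><xs:attribute name='pubid' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    static final String BAD_ISBN =
        "<publisher pubid='ab1234'><publish-year>2000</publish-year>"
        + "<title>The Cat in the Hat</title><author>Dr. Seuss</author>"
        + "<isbn>not a number</isbn></publisher>";

    static final Schema SCHEMA = compileSchema();

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // DTD validation with a validating DOM parser.
    static boolean isDtdValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler ignores validity errors unless error() is overridden
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean isSchemaValid(String body) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DTD valid:    " + isDtdValid(BAD_ISBN));    // accepted: #PCDATA is untyped
        System.out.println("schema valid: " + isSchemaValid(BAD_ISBN)); // rejected: not an xs:long
    }
}
```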

XML Schema Features Compared to DTD Features

Table B-3 lists XML Schema features. Note that XML Schema features include DTD features.

Table B-3 XML Schema Features Compared to DTD Features

XML Schema Feature | DTD Features
Built-In Datatypes

XML Schema: XML schemas specify a set of built-in datatypes. Some of them, called primitive datatypes, form the basis of the type system: string, boolean, float, decimal, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gMonth, gDay, base64Binary, hexBinary, anyURI, NOTATION, and QName.

The others are derived datatypes, defined in terms of the primitive types.

DTD: DTDs do not support datatypes other than character strings.

User-Defined Datatypes

XML Schema: Users can derive their own datatypes from the built-in datatypes in three ways: restriction, list, and union. Restriction defines a more restricted datatype by applying constraining facets to the base type; list defines a type whose values are whitespace-separated lists of values of its item type; and union defines a type whose values can be values of any of its member types.

For example, to constrain the value of publish-year to a specific range:

<element name="publish-year">
  <simpleType>
    <restriction base="short">
      <minInclusive value="1970"/>
      <maxInclusive value="2000"/>
    </restriction>
  </simpleType>
</element>

The constraining facets are: length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits, and fractionDigits. Some facets apply only to certain base types.

DTD: The publish-year element in the DTD example cannot be constrained further.
Occurrence Indicators (Content Model or Structure)

XML Schema: In XML Schema, the structure of an element, called its complexType, is defined in terms of model groups and attribute groups. A model group can contain further model groups or element particles, while an attribute group contains attributes.

Wildcards can be used in both model and attribute groups. There are three kinds of model group: sequence, all, and choice, representing the sequence, conjunction, and disjunction relationships among particles, respectively. The range of the number of occurrences of each particle can be specified with the minOccurs and maxOccurs attributes.

Like a simple datatype, a complexType can be derived from other types, by either restriction or extension. The derived type inherits the content of the base type plus the corresponding modifications. In addition to inheritance, a type definition can refer to other components, so a component can be defined once and used in many other structures.

The type declaration and definition mechanism in XML Schema is therefore much more flexible and powerful than that of DTDs.

DTD: DTDs control the number of occurrences of a child element with the following symbols:

  • ? = zero or one. In "DTD Example", artist? indicates that artist is optional.

  • * = zero or more.

  • + = one or more. In "DTD Example", author+ indicates that there must be at least one author and that there can be more.

  • (none) = exactly one.

Identity Constraints

XML Schema: XML Schema extends the concept of the XML ID/IDREF mechanism with unique, key, and keyref declarations. These are declared as part of an element declaration, and they allow not only attributes but also element content to serve as keys. Each constraint is scoped to the element in which it is declared, and values are compared as typed values rather than as lexical strings.

DTD: None beyond the ID, IDREF, and IDREFS attribute types.
Import or Export Mechanisms (Schema Import, Inclusion and Modification)

XML Schema: The components of a schema need not all be defined in a single schema file. XML Schema provides mechanisms for assembling multiple XML schemas: import integrates XML schemas that use different namespaces, while inclusion adds components that have the same namespace. Included components can be modified using redefinition.

DTD: You cannot use constructs defined in external schemas.
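The User-Defined Datatypes and Identity Constraints rows of Table B-3 can be tried out with a schema validator. The following Java sketch uses the JDK's JAXP validation API. All type, element, and constraint names in it (yearRange, yearList, yearOrUnknown, reprint-years, catalog, pubKey) are hypothetical. It derives a type by restriction (the publish-year range from the table), builds a list type and a union type from it, and declares a key on the pubid attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class DerivedTypesAndKeysDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        // Restriction: facets narrow xs:short to 1970..2000.
        + "<xs:simpleType name='yearRange'><xs:restriction base='xs:short'>"
        + "<xs:minInclusive value='1970'/><xs:maxInclusive value='2000'/>"
        + "</xs:restriction></xs:simpleType>"
        // List: a whitespace-separated list of yearRange values.
        + "<xs:simpleType name='yearList'><xs:list itemType='yearRange'/></xs:simpleType>"
        // Union: either a yearRange value or the literal string "unknown".
        + "<xs:simpleType name='yearOrUnknown'><xs:union memberTypes='yearRange'>"
        + "<xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='unknown'/></xs:restriction></xs:simpleType>"
        + "</xs:union></xs:simpleType>"
        + "<xs:element name='publish-year' type='yearOrUnknown'/>"
        + "<xs:element name='reprint-years' type='yearList'/>"
        // Identity constraint: every publisher in a catalog needs a distinct pubid.
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='publisher' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='pubid' type='xs:string' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:key name='pubKey'><xs:selector xpath='publisher'/><xs:field xpath='@pubid'/></xs:key>"
        + "</xs:element>"
        + "</xs:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<publish-year>1999</publish-year>"));        // in range
        System.out.println(isValid("<publish-year>2001</publish-year>"));        // above maxInclusive
        System.out.println(isValid("<publish-year>unknown</publish-year>"));     // union member
        System.out.println(isValid("<reprint-years>1985 2010</reprint-years>")); // one list item out of range
        System.out.println(isValid(
            "<catalog><publisher pubid='ab1234'/><publisher pubid='ab1234'/></catalog>")); // duplicate key
    }
}
```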


An XML schema can be used to define a class of XML documents.


Instance XML Documents

An instance XML document is an XML document that conforms to a particular XML schema. Although instances and XML schemas need not exist as documents, they are commonly referred to as files. They may, however, exist as any of the following:

  • Streams of bytes

  • Fields in a database record

  • Collections of XML Infoset information items
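Because a validator works on an XML source rather than on a file, the same schema check applies however the instance is held. The following Java sketch assumes the JDK's JAXP API and a hypothetical one-element schema. It validates an instance supplied as a stream of bytes (standing in, as an assumption, for data read from a network connection or a database column) and an instance built in memory as a DOM tree that is never serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class InstanceSourcesDemo {
    // Hypothetical schema with a single element of type xs:long.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='isbn' type='xs:long'/>"
        + "</xs:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Validates any JAXP Source: a stream, a DOM tree, and so on.
    static boolean isValid(Source source) {
        try {
            SCHEMA.newValidator().validate(source);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Instance held as a stream of bytes.
    static boolean validateBytes(byte[] bytes) {
        return isValid(new StreamSource(new ByteArrayInputStream(bytes)));
    }

    // Instance held only as an in-memory DOM tree.
    static boolean validateDom(String isbnText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element isbn = doc.createElementNS(null, "isbn"); // no-namespace element, as declared
            isbn.appendChild(doc.createTextNode(isbnText));
            doc.appendChild(isbn);
            return isValid(new DOMSource(doc));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validateBytes("<isbn>123456781111</isbn>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(validateDom("123456781111"));
        System.out.println(validateDom("not a number"));
    }
}
```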

Oracle XML DB supports the W3C XML Schema Recommendation specifications of May 2, 2001: http://www.w3.org/2001/XMLSchema.

Converting Existing DTDs to XML Schema?

Some XML editors, such as XMLSpy, can convert existing DTDs to XML schemas. However, you still need to add typing and validation declarations to the resulting XML schema definition file before it is useful as an XML schema.

XML Schema Example, PurchaseOrder.xsd

The following example, PurchaseOrder.xsd, is a W3C XML schema shown in its native form, as an XML document. The PurchaseOrder.xsd XML schema is used for the examples described in Chapter 3, "Using Oracle XML DB":

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:complexType name="ActionsType">
     <xs:sequence>
       <xs:element name="Action" maxOccurs="4">
         <xs:complexType>
           <xs:sequence>
             <xs:element ref="User"/>
             <xs:element ref="Date"/>
           </xs:sequence>
         </xs:complexType>
       </xs:element>
     </xs:sequence>
   </xs:complexType>
   <xs:complexType name="RejectType">
     <xs:all>
       <xs:element ref="User" minOccurs="0"/>
       <xs:element ref="Date" minOccurs="0"/>
       <xs:element ref="Comments" minOccurs="0"/>
     </xs:all>
   </xs:complexType>
   <xs:complexType name="ShippingInstructionsType">
     <xs:sequence>
       <xs:element ref="name"/>
       <xs:element ref="address"/>
       <xs:element ref="telephone"/>
     </xs:sequence>
   </xs:complexType>
   <xs:complexType name="LineItemsType">
     <xs:sequence>
       <xs:element name="LineItem" type="LineItemType" maxOccurs="unbounded"/>
     </xs:sequence>
   </xs:complexType>
   <xs:complexType name="LineItemType">
     <xs:sequence>
       <xs:element ref="Description"/>
       <xs:element ref="Part"/>
     </xs:sequence>
     <xs:attribute name="ItemNumber" type="xs:integer"/>
   </xs:complexType>
   <!--

   -->
   <xs:element name="PurchaseOrder">
     <xs:complexType>
       <xs:sequence>
         <xs:element ref="Reference"/>
         <xs:element name="Actions" type="ActionsType"/>
         <xs:element name="Reject" type="RejectType" minOccurs="0"/>
         <xs:element ref="Requestor"/>
         <xs:element ref="User"/>
         <xs:element ref="CostCenter"/>
         <xs:element name="ShippingInstructions" 
                     type="ShippingInstructionsType"/>
         <xs:element ref="SpecialInstructions"/>
         <xs:element name="LineItems" type="LineItemsType"/>
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   <xs:simpleType name="money">
     <xs:restriction base="xs:decimal">
       <xs:fractionDigits value="2"/>
       <xs:totalDigits value="12"/>
     </xs:restriction>
   </xs:simpleType>
   <xs:simpleType name="quantity">
     <xs:restriction base="xs:decimal">
       <xs:fractionDigits value="4"/>
       <xs:totalDigits value="8"/>
     </xs:restriction>
   </xs:simpleType>
   <xs:element name="User">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="10"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Requestor">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="0"/>
         <xs:maxLength value="128"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Reference">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="26"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="CostCenter">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="4"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Vendor">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="0"/>
         <xs:maxLength value="20"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="PONumber">
     <xs:simpleType>
       <xs:restriction base="xs:integer"/>
     </xs:simpleType>
   </xs:element>
   <xs:element name="SpecialInstructions">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="0"/>
         <xs:maxLength value="2048"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="name">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="20"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="address">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="256"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="telephone">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="24"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Date" type="xs:date"/>
   <xs:element name="Comments">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="2048"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Description">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:minLength value="1"/>
         <xs:maxLength value="256"/>
       </xs:restriction>
     </xs:simpleType>
   </xs:element>
   <xs:element name="Part">
     <xs:complexType>
       <xs:attribute name="Id">
         <xs:simpleType>
           <xs:restriction base="xs:string">
             <xs:minLength value="12"/>
             <xs:maxLength value="14"/>
           </xs:restriction>
         </xs:simpleType>
       </xs:attribute>
       <xs:attribute name="Quantity" type="money"/>
       <xs:attribute name="UnitPrice" type="quantity"/>
     </xs:complexType>
   </xs:element>
</xs:schema>
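The simple types in PurchaseOrder.xsd are enforced by any conforming validator. The following Java sketch is a hypothetical illustration using the JDK's JAXP API: it copies the money and quantity types and the Part element from the schema above, unchanged apart from quoting, and checks several Part elements against them. As declared, the Quantity attribute uses the money type (at most 2 fractional digits) and UnitPrice uses the quantity type (at most 4 fractional digits).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PurchaseOrderPartDemo {
    // Excerpt of PurchaseOrder.xsd: money, quantity, and the Part element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='money'><xs:restriction base='xs:decimal'>"
        + "<xs:fractionDigits value='2'/><xs:totalDigits value='12'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:simpleType name='quantity'><xs:restriction base='xs:decimal'>"
        + "<xs:fractionDigits value='4'/><xs:totalDigits value='8'/>"
        + "</xs:restriction></xs:simpleType>"
        + "<xs:element name='Part'><xs:complexType>"
        + "<xs:attribute name='Id'><xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:minLength value='12'/><xs:maxLength value='14'/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        + "<xs:attribute name='Quantity' type='money'/>"
        + "<xs:attribute name='UnitPrice' type='quantity'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Part Id='037429135020' Quantity='3' UnitPrice='19.95'/>"));    // valid
        System.out.println(isValid("<Part Id='1234' Quantity='3' UnitPrice='19.95'/>"));            // Id too short
        System.out.println(isValid("<Part Id='037429135020' Quantity='3.125' UnitPrice='19.95'/>")); // 3 fraction digits
    }
}
```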