Oracle Consulting Oracle Training Oracle Support Development
Oracle Books
SQL Server Books
IT Books
Job Interview Books
Rampant Horse Books
911 Series
Pedagogue Books

Oracle Software
Write for Rampant
Publish with Rampant
Rampant News
Rampant Authors
Rampant Staff
Oracle News
Oracle Forum
Oracle Tips
Articles by our Authors
Press Releases
SQL Server Books

Oracle 11g Books

Oracle tuning

Oracle training

Oracle support

Remote Oracle


Privacy Policy



Managing XML Data in Oracle XML DB 11g

by V.J. Jain - From Oracle Technical Network

Over the past few years, XML has emerged as the new standard for data transmission, and its use is becoming more prevalent as companies continue to adopt XML-based solutions. As more organizations begin to enforce XML standards for all data transmission, increasingly complex XML formats are emerging. These complex formats can include multiple namespaces, thousands of elements, and recursive definitions. As the XML documents produced from these formats grow in size and complexity, managing this content has become increasingly challenging, with limited information available on how to address this challenge.

In this article, you will learn how to use the XML DB feature in Oracle Database 11g to manage complex XML content as well as its advantages over commercial ETL products. You will see an example of a complex XML schema that demonstrates the following:

  • Registration of a complex XML schema

  • Insertion of XML files into the database

  • Retrieval of XML data via relational query

  • In-Place Evolution for XML schema modifications

Furthermore, you will get an overview of strategies for maximizing the performance and throughput of Oracle XML DB solutions and practical applications of complex XML formats.

Oracle XML DB Background

Oracle XML DB is a feature of Oracle Database that provides a powerful tool for managing XML content, including storage, manipulation, and retrieval. It offers different storage options to meet the unique requirements of different XML formats. These options include unstructured, binary, and structured storage:

  • Unstructured (character large object, or CLOB). By treating the document as one large object and storing it in the database, this method allows for the best insertion times. However, this storage method also consumes the most space and has the worst performance for relational access to the data. This is an impractical solution for managing large, complex XML documents if relational access is required. Unstructured storage can be a practical solution if disk space is not an issue and the objective is to archive the documents in their original format.

  • Binary storage. This option, new in Oracle Database 11g, stores data in a postparse binary format designed specifically for XML data. This option has several advantages over unstructured storage, in that it is XML-schema-aware, allowing better disk space efficiency and query performance. Although this option offers incredible performance compared to that of unstructured storage, it does not have the same query performance as structured storage.

  • Structured storage. Also known as schema-based storage, this option uses an object-relational model to store XML documents in the database. This storage option is the most efficient in terms of disk space and relational access. It also has the highest overhead during file insertion and requires additional preparation for schema registration.

The perception of the size and complexity of an XML document can differ greatly, depending on the organization. On one hand, for online transaction processing (OLTP) databases using XML for their electronic data interchange (EDI) or other transactional data exchange, a file with several thousand lines might be regarded as a very large file. On the other hand, a multiterabyte data warehouse might regularly process XML documents measured in gigabytes and not consider a file to be large unless it contains millions of lines. The same concept holds for the perceived complexity of an XML document.

For purposes of this article, a document is considered "complex" if it has the following properties:

  • It is single-rooted, with multiple namespaces.

  • It has flexible XML definitions, allowing for great variations while maintaining validity.

  • It has recursive or circular/cyclical references.

  • It has nonstatic XML schemas.

In this article, XML documents are considered "large" if they are single-rooted and are greater than 20MB. These properties introduce certain scalability and management considerations that must be addressed for a robust enterprise solution.

There are no golden rules for choosing the best storage option. Based on the file structure, performance objectives, available resources, and expected volume of data, the best option will vary. If you cannot decide which storage option is the best for your particular requirement, it is worthwhile to try the different formats and determine which is optimal for your specific needs.

Maximizing Throughput with Structured Storage

Although there is an overhead cost for inserting files when you are using the structured storage option, you can reduce this cost by splitting large documents into smaller pieces. For example, if there is a 700MB single-rooted XML file, it may be possible to split it into 10 smaller files that are valid with the XML schema. Inserting 10 different 70MB files is much faster than inserting a single 700MB file and achieves the same end result. The level of concurrency should be determined by the available processing power of the database. This strategy takes advantage of the database concurrency and maximizes throughput.

Another consideration when using Oracle XML DB is that the insertion time for a single XML file is generally limited by a single CPU's speed. In other words, having multiple processors does not help the throughput of a single document. For example, consider a situation in which a complex, single-rooted 700MB document needs to be inserted by use of schema-based storage. The total time to insert this file might be 10 minutes with a 1.35GHz processor, whether the database has 12 or 72 CPUs (assuming that at least one CPU is available in each scenario).

Unfortunately, calculating these comparisons is not an exact science. Each database may perform differently, depending on several factors, including the operating system, available processing power, memory, and schema definition. The best way to optimize performance in your environment is to implement these strategies and determine the benefits of each of them.

Oracle XML DB Versus Commercial Off-the-Shelf (COTS) ETL Products

Several commercial extract, transform, and load (ETL) tools are available for loading data from files into the database. These tools usually feature a simple front end with drag-and-drop capability and can hide the complexity of the actual process. When you are creating an XML loading process in a commercial ETL tool, it is necessary to define the fields that need to be extracted. If a certain XPath is not specified, the data will not be collected from the document.

In the case of complex XML documents, the advantage of Oracle XML DB is that it takes a database-centric view of documents and shreds each document into an object-relational model. When the file is successfully inserted into the database, any data contained within that file is readily accessible without reparsing.

One of the best benefits of Oracle XML DB is that it is a standard feature of Oracle Database and does not require additional licensing. But if licensing fees were not a concern for an organization, why would it use Oracle XML DB for its XML content management instead of a tool from a company with ETL as its primary focus? To understand the answer, it is necessary to understand how Oracle XML DB technology meets the unique requirements of managing very complex XML.

The Challenges of Complex XML Content

When XML content is flexible, frequently changing, recursive, and very large, a developer will undoubtedly encounter certain challenges that might be absent from simpler formats. One possibility with a complex XML document is that its XML schema might be very large and allow a great amount of flexibility while remaining valid. The use of flexible XML schemas is a common strategy for supporting industrywide standards while accommodating company-specific requirements.

For example, consider an XML schema that is adopted as a standard for three companies. In addition to the shared or standard elements, this XML schema will need to include all of the company-specific definitions for each of the companies. To support this requirement, the XML schema is designed with a great deal of flexibility by use of generic container elements that reference all possible company-specific elements and allow most element references to occur zero or more times. As a result, documents with completely different elements are valid with the same XML schema. From a development perspective, this flexibility makes it difficult to write XML parsers to extract the necessary data, because the occurrence of elements is difficult to predict. Because COTS ETL tools and custom parsers require a specific XPath for extracting data, every possible XPath would need to be checked to guarantee the capture of all data from the document.

This XML schema could be used as an industry standard for transmitting sales and inventory data. Note that in the standardData.xsd namespace, both the childContainer element and the companySpecificContainer element are optionally self-referencing. In this particular definition, this design enables each individual company to decide the granularity of its data using the parent/child relationship. This schema also gives each company the option of including inventory data, sales data, or both. It further enables each company to include zero or more stores, products, and sales, based on its individual needs but within the same flexible format.

These documents are very different, but both are valid in the XML schema. This design facilitates the use of industry wide standards across different companies, by using a single, flexible format in the XML schema. Furthermore, the use of recursive references for generic container elements enables each company to decide how much detail to include while maintaining compliance with the XML schema. For example, one XPath for extracting quantity-on-hand data from CompanyABC.xml would be


However, if this company wanted to include store data with a finer granularity, it could add another child element for substores, such as


The design of this XML schema enables the files to be rich with content that is determined by each individual company. By using the generic container elements and optional references, each document can begin to resemble a database of its own. Due to the unpredictability of each file, managing this XML with custom code or a COTS ETL tool would be an intense development effort that would likely require continuous maintenance and support.


The Oracle XML DB Solution

This problem with XPath mapping, as demonstrated in the above example, does not exist with Oracle XML DB, because the entire file is stored in the database. As soon as the document has been inserted, the content from the document is immediately available for query. Regardless of how flexible the XML schema might be, any data contained in the document can be accessed with the appropriate XPath. This provides an unmatched advantage that maximizes availability and minimizes the cost of maintenance.

To implement the XML DB solution, start by registering the XML schema (the definition files and documents are in a directory called XML_TEST):

        schemaurl => 'CompanySpecific.1.0.xsd',
        schemadoc => BFILENAME ('XML_TEST','CompanySpecific.1.0.xsd')
        schemaurl => 'standardData.xsd',
        schemadoc => BFILENAME ('XML_TEST','standardData.xsd')
        schemaurl => 'startData.xsd',
        schemadoc => BFILENAME ('XML_TEST','startData.xsd')

Now that you have created the object-relational structure for our XML schema, you are ready to insert XML files into the default table (specified in the annotation of the root element).


The file is successfully inserted in a fraction of a second. Immediately the data is available for relational access. Here is an example of a query access that describes the current inventory:


SELECT extractValue(object_value,'/rootElement/fileFooter/@timeStamp')  start_date, 
extractValue(object_value,'/rootElement/fileHeader/@companyName') companyName, 
extractValue(object_value,'/rootElement/fileHeader/@fileFormat') fileFormat,
   'xmlns:xn="standardData.xsd"') childLabel,
   'xmlns:xn="standardData.xsd"') childType,
'xmlns:xn="standardData.xsd"') csDataType,
   'xmlns:xn="standardData.xsd"') csDataFormat,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreLabel,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreName,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csStoreValue,
   'xmlns:xn="standardData.xsd"') csDataTypeL2,
   'xmlns:xn="standardData.xsd"') csDataFormatL2,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductLabel,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductName,
   'xmlns:xn="standardData.xsd" xmlns:cs="CompanySpecific.1.0.xsd"') csProductValue
,TABLE(XMLSequence(Extract(object_value, '/rootElement/fileData/xn:childContainer', 
   'xmlns:xn="standardData.xsd" xmlns="startData.xsd"'))) b
,TABLE(XMLSequence(Extract(value(b), '/xn:childContainer/xn:CompanySpecificContainer', 
   'xmlns:xn="standardData.xsd"'))) c
,TABLE(XMLSequence(Extract(value(c), '/xn:CompanySpecificContainer/xn:CompanySpecificContainer', 
   'xmlns:xn="standardData.xsd"'))) d

This solution allows for the management of any documents that conform to the XML schema definition. Oracle XML DB's database-centric view of XML documents enables an XML document to be content-rich while providing a layer of abstraction from the complexities of development that would be unavoidable with any other ETL technology.

This example demonstrates the unmatched XML storage-and-retrieval capabilities of Oracle XML DB, but a complex XML schema is also characterized as having frequent changes to the XML schema definition. How does Oracle XML DB facilitate changes to the XML schema?

Managing XML Schema Changes

Changes to the schema definitions have always been problematic for XML-based solutions. This is expected, because an XML schema is supposed to be used to validate an XML document. If the document contains elements that are not defined in the schema, the document is not valid. In the past, this has been a weakness of Oracle XML DB, because changes to registered XML schemas required a complex and costly operation using the copyEvolve procedure.

To address this limitation, Oracle has introduced in Oracle XML DB 11g a new procedure called inPlaceEvolve, which allows the same schema modifications to be an online operation that does not require data movement. Instead, this procedure modifies the database objects created during the schema registration while leaving the related data in place. This enhancement is crucial in addressing the frequent definition changes common with complex XML schemas.

With Oracle XML DB 11g, this modification is fast and requires minimal resources. The new procedure requires the schema URL and an XMLType document (XMLDiff) that conforms to the xdiff XML schema. The XMLDiff document is a specially formatted document that reflects the changes in the XML schema. Instead of your having to manually create the XMLDiff document, it can be done automatically with the xmldiff function in Oracle Database.

First, create a new schema file with the updated XML schema definition CompanySpecific.1.1.xsd. Then use the Oracle Database xmldiff function with the old schema as the first parameter and the new schema as the second parameter:

var oldSchemaDoc clob;
var newSchemaDoc clob;
     :oldSchemaDoc := xmltype( bfilename ( 'XML_TEST', 'CompanySpecific.1.0.xsd') , 
        nls_charset_id('AL32UTF8') ).getClobVal();
      :newSchemaDoc := xmltype( bfilename ( 'XML_TEST', 'CompanySpecific.1.1.xsd') ,
        nls_charset_id('AL32UTF8') ).getClobVal();

You can view the XMLDiff document by using

select xmldiff(xmltype(:oldSchemaDoc),xmltype(:newSchemaDoc)).getClobVal() from dual;

For simplicity, use a CLOB variable to store the XMLDiff document.

var diffXMLDoc clob;
      select xmldiff(xmltype(:oldSchemaDoc),xmltype(:newSchemaDoc)).getClobVal() 
into :diffXMLDoc from dual;

Now that the XMLDiff document is available, the inPlaceEvolve procedure can be called to modify the XML schema online.

SQL> alter session set events = '31150 trace name context forever, level 0x200000';

Session altered.

  2  DBMS_XMLSCHEMA.inPlaceEvolve('CompanySpecific.1.0.xsd',  xmltype(:diffXMLDoc) );
  3  END;
  4  /

PL/SQL procedure successfully completed.

An examination of the trace file shows that the database type that represents the modified complex element was altered to include the new element:

change to ct  sqltype = csProduct1046_T
 ------------ QMTS Executing SQL ------------ 

The introduction of this new feature makes it possible to efficiently manage changes to XML schemas, even with a large volume of documents, without requiring a window of unavailability during the schema evolution. This is a major enhancement that improves the feasibility of using Oracle XML DB for large enterprise solutions.

Practical Applications

Oracle XML DB 11g offers a complete, efficient content management solution that is far more useful and practical than any of the alternatives. By taking a database-centric view of each document, Oracle XML DB provides an innovative and powerful method for storing and retrieving XML content. The various storage options provide the ability to efficiently manage all documents, regardless of size and complexity. The new feature for efficiently modifying XML schema definitions makes Oracle XML DB an ideal solution for frequently changing content. These features meet the challenges of managing complex XML content and provide a solution that deserves serious consideration by all organizations.



 Copyright © 1996 -2016 by Burleson. All rights reserved.

Oracle® is the registered trademark of Oracle Corporation. SQL Server® is the registered trademark of Microsoft Corporation. 
Many of the designations used by computer vendors to distinguish their products are claimed as Trademarks