Oracle RAC Global Block Management

by Donald K. Burleson

In the last installment of this article, we reviewed the evolution and internal mechanisms of data cache management in multi-instance Oracle10g Grid databases. As you may recall, Oracle instances require three main types of concurrency locking:

  • Concurrent reads on multiple instances — Users on two different instances need to read the same set of blocks.
  • Concurrent reads and writes on different instances — A user intends to read a data block that was recently modified, and the read can be for either the current version of the block or for a read-consistent previous version.
  • Concurrent writes on different instances — The same set of data blocks is modified by different users on different instances.
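The three cases above can be sketched as a simple classification function. This is purely illustrative modeling (not Oracle code): it assumes two sessions touching the same block from different instances and names the coordination each case requires.

```python
# Hypothetical sketch: classify the three inter-instance concurrency
# cases for two sessions touching the same block on different instances.

def coordination_case(op1: str, op2: str) -> str:
    """op1 and op2 are each 'read' or 'write' on the same data block."""
    if op1 == "read" and op2 == "read":
        return "concurrent reads: share the block image across caches"
    if "write" in (op1, op2) and "read" in (op1, op2):
        return "read/write: ship a current or read-consistent image to the reader"
    return "concurrent writes: serialize via exclusive access, ship the dirty block"

print(coordination_case("read", "read"))
print(coordination_case("read", "write"))
print(coordination_case("write", "write"))
```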

The Global Cache Service (GCS) is the RAC mechanism for maintaining cache coherency. Next, let’s dive into the internals of multi-instance cache invalidation and cache coherency mechanisms.

RAC Invalidation Mechanism

One important aspect of the cache coherency and cache fusion architectures is the concept of block invalidation. In general, block invalidation is the process by which in-memory blocks are flagged as “invalid.” Block invalidation occurs in RAC under the following conditions:

  • Block changes on other nodes — As blocks are changed, the Global Cache Service sends messages via cache fusion to change the status of the block.
  • Failure of nodes — Upon instance failure, Oracle RAC manages the recovery of all other instances, maintaining the status of updated blocks to ensure that no updates are lost.

However, regardless of the reason, invalidation only happens at a data block level, not at the cache level. Essentially, block invalidation involves “status changes” in the data block buffers of each RAC instance. These status changes are based on the messages transmitted across the interconnect via the cache fusion processes.

In Oracle OPS, the DLM and IDLM processes handled the invalidation of blocks by controlling the latches placed on those blocks. The latches were controlled via messages sent over the high-speed interconnect to the DLM or IDLM processes. Remember, in OPS, an invalid block has to be re-read by the instance to become valid, and this extra disk I/O was a major bottleneck.

As we already know, with cache fusion (OPS 8.1.6 and later, and Oracle9i RAC), blocks can be passed back and forth between the nodes, and the images of the blocks can be merged to maintain the correct image. This reduces the need to totally invalidate a block and force a re-read from disk.

As individual nodes make changes to blocks and issue commits, each node is responsible for writing its blocks back to disk and signaling the state changes on the appropriate latches via semaphores.

Prior to Oracle writing its own lock manager software (for Windows and Linux platforms), each hardware vendor was responsible for implementing a layer of software that allowed cluster database processing, known as the Operating System-Dependent (OSD) layer. These layers provide communication links between the operating system and the Real Application Clusters software. Each vendor’s OSD is proprietary code that coordinates the activities of the cluster independently of Oracle, but which Oracle depends on to track node status in the cluster.

RAC Data Block Transfer Mechanism

Oracle treats data blocks as resources that must be synchronized while database processing is in progress. This coordination of concurrent tasks is called synchronization.

GCS resources such as data blocks and enqueues are synchronized as nodes within a cluster acquire and release ownership of blocks. The synchronization provided by real application clusters maintains a cluster-wide concurrency of resources, and in turn, ensures the integrity of the shared data.

The following is a description of the RAC lock mechanism, taken from the Ault and Tumma book, Oracle9i RAC:

“The data block (or GCS resource) can be held in different resource modes, depending on whether a resource holder intends to modify the data or read the data. The modes are as follows:
  • Null (N) mode — Holding a resource at this level conveys that there are no access rights. Null mode is usually held as a placeholder, even if the resource is not actively used. [Steve Adams]
  • Shared (S) mode — When a resource is held at this level, it will ensure that the data block is not modified by another session, but will allow concurrent shared access.
  • Exclusive (X) mode — This level grants the holding process exclusive access. Other processes cannot write to the resource. It may have consistent read blocks.”
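The three resource modes quoted above imply a pairwise compatibility matrix, which can be sketched as follows. This is my own illustrative modeling of the rules as described, not Oracle internals: null is compatible with everything, shared is compatible with null and shared, and exclusive is compatible only with null.

```python
# Illustrative sketch of GCS resource-mode compatibility (assumption
# derived from the mode descriptions above, not Oracle code).
# N = null, S = shared, X = exclusive.

COMPATIBLE = {
    ("N", "N"): True,  ("N", "S"): True,  ("N", "X"): True,
    ("S", "N"): True,  ("S", "S"): True,  ("S", "X"): False,
    ("X", "N"): True,  ("X", "S"): False, ("X", "X"): False,
}

def can_coexist(holder_mode: str, requested_mode: str) -> bool:
    """True if a new request can be granted alongside the current holder."""
    return COMPATIBLE[(holder_mode, requested_mode)]

assert can_coexist("S", "S")      # concurrent shared readers are fine
assert not can_coexist("X", "S")  # an exclusive holder blocks readers
```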

The resource mode is an important mechanism to maintain data integrity, perform escalation and avoid data corruption issues. Within RAC, the GCS resources are allowed to have global roles or local roles. These roles are mutually exclusive and serve very different purposes:

1. When a block is first read into the cache of an instance and other instances have not read the same block, then the block is said to be locally managed and is therefore assigned a local role.

2. After the block has been modified by the local instance and transmitted to another instance, it is considered to be globally managed, and is therefore assigned a global role.
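The two role rules above amount to a small state transition, sketched below. This is hypothetical modeling for illustration only: a block starts with a local role on first read, and takes on a global role once a modified copy has been shipped to another instance.

```python
# Minimal sketch of the local/global role rule described above
# (my own modeling, not Oracle code).

class BlockResource:
    def __init__(self):
        self.role = "local"   # first read into a single instance's cache
        self.dirty = False

    def modify(self):
        self.dirty = True

    def ship_to_other_instance(self):
        if self.dirty:
            self.role = "global"   # now globally managed by the GCS

b = BlockResource()
b.modify()
b.ship_to_other_instance()
print(b.role)
```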

Thus, Oracle treats the data buffer or data block as a resource and coordinates the shipping of buffers among all the instances (within each node’s db_cache_size). This mechanism is sometimes referred to as the buffer state, and it is an important way to categorize a block.

Internally, lock escalation and data block transfer are accomplished by sending lock messages to the Cache Fusion layer via the GCS processes on each node. No instructions or code are shipped to other instances to perform a task; instead, the Cache Fusion layer manages the state of every data block in every instance.

As we will see in the following example, the total functioning of data block movement, and the provision of a single system image to the application user connections, is based on a series of escalations of buffer states. Let’s take a closer look at the block transfer steps:

1. When an instance needs a data block, it makes a request to the GCS, which keeps track of resources, their locations, and their status. In this example, Instance 1 intends to modify the data block and submits the request to the GCS.

2. The GCS then sends a message and forwards the request to the owning instance.

3. The holding instance (Instance 2) then transmits a copy of the block to the requesting instance. Because the requestor intends to modify the block, Instance 2 downgrades its resource to null mode before sending the block and keeps the changed (dirty) buffer as a past image (PI). The role changes to global (G) because the block is dirty. Along with the block, Instance 2 informs the requestor that it retained a PI copy and a null resource, and that the requestor can take the block in exclusive mode with a global role (X, G).

4. The receiving instance then informs the GCS of its own resource disposition (X, G) and that of the instance that sent the block (N, G).

Hence, the block transfer involves no disk I/O; it takes place entirely over the high-speed private interconnect. This is a key feature of the Oracle RAC system and the major reason OPS was renamed RAC. Next, let’s explore the global resource directory and see how it manages the state of all data blocks.
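The four-step transfer above can be sketched as a toy message trace. This is hypothetical modeling only; the real GCS messages are internal to Oracle. Here Instance 2 holds the block and Instance 1 requests it for modification.

```python
# Toy trace of the four block-transfer steps described above
# (illustrative assumption, not Oracle code).

def transfer_block():
    log = []
    holder = {"mode": "X", "role": "local", "pi": False}    # Instance 2
    requestor = {"mode": None, "role": None}                # Instance 1

    # 1. The requestor asks the GCS, which tracks location and status.
    log.append("Instance 1 -> GCS: request block for modification")
    # 2. The GCS forwards the request to the owning instance.
    log.append("GCS -> Instance 2: forward request")
    # 3. The holder downgrades to null, keeps a past image (PI),
    #    and the role becomes global because the block is dirty.
    holder.update(mode="N", role="global", pi=True)
    requestor.update(mode="X", role="global")
    log.append("Instance 2 -> Instance 1: ship block; grant (X, G); PI kept")
    # 4. The requestor reports both dispositions back to the GCS.
    log.append("Instance 1 -> GCS: I hold (X, G); sender holds (N, G) + PI")
    return holder, requestor, log

holder, requestor, log = transfer_block()
print(holder)
print(requestor)
```

Note that the disk is never touched in this trace: every message travels over the interconnect, which is the point the article is making.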

Inside the Global Resource Directory

As we noted, a major shortcoming of OPS was the requirement that “master” instances manage the global locks for the cluster. In RAC this changed: a global view of the blocks in the clustered cache is maintained in the RAC Global Resource Directory (GRD). This is an important management feature that keeps track of all the data block resources of multiple instances. The following is a description of how the GCS and GES control data block information, also taken from the Ault and Tumma book, Oracle9i RAC:

“The GES and GCS together maintain a global resource directory (GRD) to record information about resources and enqueues. The GRD remains in the memory and is stored on all the instances. Each instance manages a portion of the directory. The distributed nature of the GRD is a key point for the fault tolerance of RAC.
The GRD is an internal database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The following resource information is available in GRD:
  • Data Block Addresses (DBA).
  • Location of most current versions.
  • Modes of the data blocks ((N)Null, (S)Shared, (X)Exclusive ).
  • The Roles of the data blocks (local or global).
The GRD is similar to the previous version of the lock directory from a functional perspective, but it has been expanded with more components. It contains an accurate inventory of resource status and location.”
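The GRD bookkeeping described in the quote can be sketched as a directory keyed by data block address. This is an assumption for illustration (the real directory is distributed across all instances, with each instance managing a portion): each entry records the block’s current location, mode, and role, and is updated on every cross-cache transfer.

```python
# Sketch of GRD bookkeeping (illustrative assumption, not Oracle code):
# one record per data block address (DBA), updated on every transfer.

grd = {}  # DBA -> resource record

def record_transfer(dba: int, to_instance: str, mode: str, role: str):
    """Update the directory when a block moves to another instance's cache."""
    grd[dba] = {"current_location": to_instance, "mode": mode, "role": role}

# A hypothetical block address and node name, for illustration only.
record_transfer(dba=0x4A3F21, to_instance="node2", mode="X", role="global")
print(grd[0x4A3F21])
```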

Using this directory, Oracle RAC implementations always keep a global view of all the caches involved in the cluster. In OPS, Oracle9i RAC, and Oracle10g Grid, nodes may join and leave the cluster at will (via the Grid control mechanism in OEM), but the global view of the data blocks participating in multi-instance caches is maintained through the GRD.

Now that we understand the mechanism for lock escalation and global block management in RAC, we are ready to conclude this series with a discussion of inter-instance node management. This is a core feature of Oracle10g Grid computing and an area that every DBA must understand at an intimate level.

Copyright © 1996-2016 by Burleson. All rights reserved.


Oracle® is the registered trademark of Oracle Corporation. SQL Server® is the registered trademark of Microsoft Corporation. 
Many of the designations used by computer vendors to distinguish their products are claimed as trademarks.