Oracle RAC Cache Coherency

by Donald K. Burleson

As we noted in the first installment of this RAC series, cache coherency is the mechanism that keeps multiple RAM data caches (sized by the db_cache_size and db_block_buffers parameters) synchronized. This is especially critical when dozens of Oracle 10g instances (SGA regions) share a single copy of the Oracle 10g Grid database.

In an Oracle RAC system, concurrency and consistency are maintained as if the cluster were a single-image system. Even though the same set of blocks (read from the I/O device or storage) may be brought into the cache of each node, data integrity must be maintained.

From the Ault and Tumma book, Oracle RAC and Grid, we see this excellent description of the cache coherency mechanism:

“In a RAC system, users can connect with multiple instances to run database queries. Typically, users will be connected to different nodes but access the same set of data or data blocks. This situation demands that the data consistency, formerly confined to a single instance, be effectively extended to multiple instances. Therefore, buffer cache coherence from multiple instances must be maintained.

Instances require three main types of concurrency:

  • Concurrent reads on multiple instances: When users on two different instances need to read the same set of blocks.
  • Concurrent reads and writes on different instances: A user intends to read a data block that was recently modified, and the read can be for either the current version of the block, or for a read-consistent previous version.
  • Concurrent writes on different instances: When the same set of data blocks are modified by different users on different instances.”

Cache coherency demands that block consistency be maintained even though data blocks can reside in, or be brought into, multiple instances (each with a separate db_cache_size data buffer region). Oracle RAC achieves this by transferring blocks between instances through the Cache Fusion mechanism. The Global Cache Service (GCS), which is implemented as a set of processes, coordinates these transfers. GCS also ensures that only one instance modifies a block at any given time. Even when the same data block is cached in different instances at the same time, global consistency is maintained.
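To make the single-writer rule concrete, here is a minimal Python sketch of the kind of bookkeeping described above. It is a toy model, not Oracle's implementation: the class GlobalCacheDirectory and its methods are hypothetical names invented for this illustration, and the real GCS tracks far more state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class GlobalCacheDirectory:
    """Toy stand-in for GCS bookkeeping (hypothetical, for illustration only)."""

    # block id -> the one instance currently allowed to modify the block
    writer: Dict[int, str] = field(default_factory=dict)
    # block id -> every instance that holds a cached copy of the block
    holders: Dict[int, Set[str]] = field(default_factory=dict)

    def read(self, instance: str, block: int) -> None:
        """Record that an instance has cached a copy of the block."""
        self.holders.setdefault(block, set()).add(instance)

    def acquire_for_write(self, instance: str, block: int) -> Optional[str]:
        """Make `instance` the only writer of `block`.

        Returns the instance that gave up write access, or None if nobody
        else held it. At most one writer per block exists at any time.
        """
        previous = self.writer.get(block)
        self.writer[block] = instance
        self.holders.setdefault(block, set()).add(instance)
        return previous if previous != instance else None


if __name__ == "__main__":
    directory = GlobalCacheDirectory()
    directory.read("inst1", 42)
    directory.read("inst2", 42)
    directory.acquire_for_write("inst1", 42)
    handed_over_by = directory.acquire_for_write("inst2", 42)
    print("write access moved from", handed_over_by, "to", directory.writer[42])
```

The point of the sketch is only that several instances may cache the same block while the directory names a single writer for it.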

Let’s take a closer look at the data block writing mechanism in Oracle RAC.

Data Block Writing Method

Oracle uses the concepts of the dirty block and the past image of a block. Let’s look at what they are.

Whenever a server process modifies a data block, it becomes a dirty block. After the change, the user may commit the transaction right away, or the transaction may remain uncommitted for quite some time. In either case, the dirty block is not immediately written back to disk.

Writing dirty blocks to disk takes place under the following two conditions:

  • When a server process cannot find a clean, reusable buffer after scanning a threshold number of buffers, the database writer process writes the dirty blocks to disk.
  • When a checkpoint takes place, the database writer process writes the dirty blocks to disk.
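The two write triggers above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Oracle's buffer cache: the BufferCache class, its scan_threshold parameter, and the oldest-first scan order are all hypothetical choices made for the illustration.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache (hypothetical); assumes scan_threshold >= 1."""

    def __init__(self, capacity, scan_threshold):
        self.capacity = capacity
        self.scan_threshold = scan_threshold
        self.buffers = OrderedDict()  # block id -> dirty flag, oldest first
        self.disk_writes = []         # block ids written, in write order

    def _write_dirty(self):
        """Stand-in for the database writer: flush every dirty buffer."""
        for block in list(self.buffers):
            if self.buffers[block]:
                self.disk_writes.append(block)
                self.buffers[block] = False

    def checkpoint(self):
        """Trigger 2: a checkpoint writes the dirty blocks to disk."""
        self._write_dirty()

    def _find_reusable(self):
        """Scan from the oldest buffer, giving up after scan_threshold buffers."""
        for scanned, (block, dirty) in enumerate(self.buffers.items()):
            if scanned >= self.scan_threshold:
                return None
            if not dirty:
                return block
        return None

    def load(self, block, dirty=False):
        """Bring a block into the cache, reusing a clean buffer when full."""
        if block in self.buffers:
            self.buffers[block] = self.buffers[block] or dirty
            return
        if len(self.buffers) >= self.capacity:
            victim = self._find_reusable()
            if victim is None:
                # Trigger 1: no clean buffer found within the scan threshold.
                self._write_dirty()
                victim = self._find_reusable()
            del self.buffers[victim]
        self.buffers[block] = dirty


if __name__ == "__main__":
    cache = BufferCache(capacity=3, scan_threshold=2)
    cache.load(1, dirty=True)
    cache.load(2, dirty=True)
    cache.load(3)
    cache.load(4)  # the scan hits two dirty buffers first, forcing a write
    print("blocks written to disk:", cache.disk_writes)
```

Note that committing a transaction never appears in the model: as the text says, a commit alone does not cause the dirty block to be written.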

As noted above, a data block is not written to disk immediately, even after an update makes it dirty. When another instance requests the same dirty block for reading or writing, the owning instance retains an image of the block in memory and ships the block itself to the requesting instance. This retained image is called the past image (PI).

In the event of instance failure, Oracle can reconstruct the current version of the block by reading the PIs from RAM. More than one past image can exist in memory, depending on how many times the data block was requested while dirty. When the blocks are written back to the I/O device (disk storage unit) depends on the checkpoint schedule that the DBA defines for the RAC cluster. Once the checkpoint interval is reached, Oracle’s Database Writer (DBWR) process initiates an asynchronous write of the dirty blocks to disk.

When the write takes place, a message is sent across Cache Fusion to change the status of the block in the other instances, and the past images (PIs) on all other instances are invalidated and discarded.
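The past image life cycle described above can be sketched as follows. This is a rough Python model, assuming a hypothetical Cluster class with made-up method names; it ignores redo, locking, and messaging, and only shows when PIs are created and when they are discarded.

```python
class Instance:
    """One RAC instance in the toy model (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.current = {}      # block id -> contents of the current version
        self.past_images = {}  # block id -> list of retained past images


class Cluster:
    """Toy cluster that tracks current blocks, past images, and disk."""

    def __init__(self, names):
        self.instances = {name: Instance(name) for name in names}
        self.disk = {}

    def ship_dirty_block(self, block, owner, requester):
        """Owner keeps a past image and ships the block to the requester."""
        source = self.instances[owner]
        target = self.instances[requester]
        contents = source.current.pop(block)
        source.past_images.setdefault(block, []).append(contents)
        target.current[block] = contents

    def write_to_disk(self, block, writer):
        """Write the current version, then discard every past image of it."""
        self.disk[block] = self.instances[writer].current[block]
        for instance in self.instances.values():
            instance.past_images.pop(block, None)


if __name__ == "__main__":
    cluster = Cluster(["A", "B", "C"])
    cluster.instances["A"].current[10] = "v1"     # A dirties block 10
    cluster.ship_dirty_block(10, owner="A", requester="B")
    cluster.instances["B"].current[10] = "v2"     # B changes it again
    cluster.ship_dirty_block(10, owner="B", requester="C")
    for name, inst in cluster.instances.items():
        print(name, "past images:", inst.past_images.get(10, []))
    cluster.write_to_disk(10, writer="C")
    print("on disk:", cluster.disk[10])
```

In this run two past images exist before the write, one on each instance that shipped the dirty block, and none remain afterward.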

For more details, refer to Oracle Metalink Document Note # 139436.1 titled, “Understanding 9i Real Application Clusters Cache Fusion.”

Internal Lock Messaging in RAC

Remember, Oracle uses a lock escalation mechanism to maintain cache coherency. Only one copy of a block can be buffered in the “xcur” (exclusive current) state in the cluster at any one time, and to modify a block, an instance must assign the xcur state to the buffer containing that block.

For example, if another instance requests the most current version of the same block for reading, Oracle sends a message to change the access mode from exclusive to shared and keeps a past image (PI) buffer if the buffer contained a dirty (changed) block. It then sends a “current read” version of the block to the requesting instance. The original instance keeps a copy in current mode, but the overall status of the block becomes global. Unlike xcur, multiple copies of the block can be cached in shared current (scur) mode at any time.
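The state change in this example can be sketched as a small Python state tracker for a single block. It is only an illustration of the transitions described above: the BlockStates class and its method names are hypothetical, and the model tracks buffer modes only, not block contents or messages.

```python
from enum import Enum


class Mode(Enum):
    XCUR = "xcur"  # exclusive current: the holder may modify the block
    SCUR = "scur"  # shared current: the holder may only read the block


class BlockStates:
    """Tracks, for one block, the buffer mode held by each instance (toy model)."""

    def __init__(self):
        self.mode = {}            # instance name -> Mode
        self.dirty = set()        # instances that changed the block
        self.past_image = set()   # instances keeping a past image (PI)

    def acquire_exclusive(self, instance):
        """Grant xcur to one instance; other cached copies are dropped."""
        self.mode = {instance: Mode.XCUR}

    def modify(self, instance):
        """Only the xcur holder may change the block."""
        if self.mode.get(instance) is not Mode.XCUR:
            raise PermissionError(f"{instance} does not hold the block in xcur")
        self.dirty.add(instance)

    def request_current_read(self, requester):
        """Downgrade any xcur holder to scur, then give the requester scur."""
        for holder in list(self.mode):
            if self.mode[holder] is Mode.XCUR:
                self.mode[holder] = Mode.SCUR
                if holder in self.dirty:
                    self.past_image.add(holder)  # PI kept only if dirty
        self.mode[requester] = Mode.SCUR


if __name__ == "__main__":
    block = BlockStates()
    block.acquire_exclusive("inst1")
    block.modify("inst1")
    block.request_current_read("inst2")
    block.request_current_read("inst3")
    print({name: mode.value for name, mode in block.mode.items()})
    print("instances holding a past image:", sorted(block.past_image))
```

After the read requests, every holder is in scur mode, so an instance that wants to change the block again must first reacquire xcur.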

In early versions of Oracle OPS, one master instance kept track of the lock status, so if the master instance crashed, the entire OPS system went down. This serious shortcoming was remedied in later versions of OPS and in RAC, where only the uncommitted transactions on the instance that goes down are lost. The other instances stay active.

RAC still has a master node: the first node to start up becomes the “master” node. However, this is strictly a bookkeeping role, and there are no repercussions to the cluster if the master node dies. The Cache Fusion mechanisms, the Global Cache Service (GCS) and the Global Enqueue Service (GES), are global resources that run on all nodes in the cluster and serve to maintain copies of the global dictionary.

Now that we understand the RAC block updating process, we are ready to move even deeper into RAC internals. Our next installment will examine RAC invalidation mechanisms.




 Copyright © 1996 -2016 by Burleson. All rights reserved.
