Oracle RAC Internals - Evolution
by Donald K. Burleson
It is imperative for the Oracle DBA to fully understand the
internal mechanisms of Oracle RAC, especially with the
advent of Oracle10g Grid computing. In Oracle10g Grid,
multiple database instances share a data cache, and the DBA
must fully understand the internal mechanisms for cache
coherency, data block writing, block invalidation, and node
maintenance.
Oracle databases have always had a data cache, but the cache
mechanism has become radically more complex once OPS and RAC
allowed multiple instances to share their RAM data caches.
This article explains the basic mechanisms for maintaining
data cache coherency across multiple dynamic instances in an
Oracle RAC or Oracle10g Grid environment.
Cache coherency is the mechanism that keeps multiple RAM data caches (db_cache_size and db_block_buffers) consistent while dozens of Oracle instances (SGA regions) share a single copy of the Oracle10g database.
Oracle developed multiple instance support for cache
coherency in the decade-old Oracle Parallel Server (OPS) and
later the Oracle Real Application Clusters (RAC) products.
Maintaining consistent images of data on both RAM and disk
is a formidable challenge, especially when we consider the
maintenance overhead of undo data images for flashback query
and rollback behavior. This cache coherency mechanism becomes even more complex when a RAC database has dozens of Oracle instances, each constantly updating data blocks.
Regardless of the complexity, it is the job of all Oracle
DBAs to completely understand these internal mechanisms for
data cache coherency across multiple instances in an
Oracle10g Grid system. Let’s start by examining the
evolution of multi-instance SGA coherency and see how Oracle
has evolved over the past decade.
The Evolution of Multi-instance Cache Coherency
All modern database systems use RAM caching to accelerate access to data and to process queries. Oracle’s use of multi-instance cache coherency for OPS began in Oracle version 6.0.35 (6.2), as described in the Oracle6 documentation and in a 1991 paper by Kevin Jernigan of Oracle Corporation, “ORACLE V6.2: Oracle Parallel Server Technology for VAXclusters and Clustered UNIX Environments,” presented at the International Oracle User's Group (IOUG).
In Oracle RAC and OPS, the software requires a high-speed interconnect that allows communication between the Oracle nodes. For example, the Massively Parallel Processing (MPP) architecture is ideally suited to OPS and RAC implementations. In early computer systems, this interconnect was limited in bandwidth, starting at 10 megabits per second, moving to 100 megabits per second, and eventually exceeding one gigabit per second. As interconnect speeds improved, so did the Oracle coherency mechanism.
In early releases of OPS, cache coherency was maintained through lock management over the interconnect, causing data blocks to be written to disk and then read back into the db_block_buffers of each Oracle instance. This disk write was required whenever control of an Oracle data block had to be transferred because of insert, update, or delete operations in an instance other than the original owning instance.
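This disk-based transfer can be sketched in a few lines of Python. The sketch is purely illustrative (the class names Instance and SharedDisk are invented; this is not Oracle code): it shows why every ownership transfer in early OPS cost one physical write plus one physical read.

```python
# Illustrative sketch of OPS-era block "pinging": the holder of a dirty
# block must flush it to shared disk before another instance may read it.
# All names here are hypothetical, invented for this example.

class SharedDisk:
    """Stand-in for the shared database files."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0   # count physical writes
        self.reads = 0    # count physical reads

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes += 1

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]

class Instance:
    """Stand-in for one OPS instance and its local buffer cache."""
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}   # block_id -> data

    def update(self, block_id, data):
        self.cache[block_id] = data          # dirty block held locally

    def release(self, block_id):
        # The "ping": flush the dirty block to disk on ownership transfer.
        self.disk.write(block_id, self.cache.pop(block_id))

    def acquire(self, block_id):
        self.cache[block_id] = self.disk.read(block_id)

disk = SharedDisk()
a, b = Instance("A", disk), Instance("B", disk)

a.update(7, "row data v1")   # instance A dirties block 7
a.release(7)                 # transfer forces a disk write...
b.acquire(7)                 # ...and a disk read on the other node

print(disk.writes, disk.reads)   # → 1 1: each transfer costs a write and a read
```

Multiplied across thousands of block transfers per second, those forced disk I/Os are exactly the "pinging" overhead discussed below.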
This historical perspective is critical to the Oracle DBA’s understanding of today’s sophisticated cache coherency mechanisms. Let’s explore the evolution of cache coherency.
OPS and Cache Coherency
In Oracle6 through Oracle8, the mechanisms for Oracle OPS changed very little. OPS on VAX-VMS clusters was released in 1989, and the caching changed with the introduction of the Integrated Distributed Lock Manager (IDLM) in 1997. Prior to the introduction of the IDLM, each vendor provided its own distributed lock manager (DLM) for its own server operating system. There was a DLM for VAX (VAXclusters), Sun (SunClusters), and so on.
The core of cache coherency was present in early releases of 6.0, but there were no distributed lock managers other than VAX-VMS’s and some limited UNIX clustering to take advantage of it (Norgaard, 2003). Back in release 6.0.35, Oracle did not write its own DLM software. Instead, it integrated with the proprietary Distributed Lock Manager (DLM) that was available in the VAX-VMS system. Unfortunately, the VAX DLM was far too slow, primarily because it locked and shared at the file level, too coarse a granularity to support database activity.
The DLM’s basic function is to coordinate locking between the nodes of the Oracle cluster. Oracle OPS communicated with the DLMs of the various computers in the cluster through the cluster interconnect, a high-speed direct connection between the instances.
The Oracle OPS SGA provides several cache areas for data, for code, and for messages passed between the Oracle OPS processes. For OPS, additional areas were configured using the “GC_XXX” init.ora parameters, which defined how the cache areas for data within Oracle mapped to specific latching structures inside OPS.
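As a rough illustration, an init.ora fragment using these parameters might look like the following. The parameter names and syntax are recalled from OPS-era documentation and should be verified against your specific release; the file-to-lock mapping shown is hypothetical.

```
# init.ora fragment (illustrative only -- OPS-era parameters; verify
# names and syntax against your release's documentation)
GC_FILES_TO_LOCKS = "1=100:2-5=200"   # PCM locks covering datafiles 1 through 5
GC_ROLLBACK_LOCKS = 20                # locks reserved for rollback segment blocks
```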
OPS Transitions to Cache Fusion
The intermediate disk write used to transfer data blocks between OPS instances caused a major slowdown, and many OPS DBAs would partition their data to avoid inter-instance block transfer “pinging” (refer to figure 1).
Figure 1: OPS data partitioning to avoid pinging.
Oracle7 through Oracle8 used the same mechanisms for maintaining cache coherency. In later versions of Oracle7, Oracle provided the integrated distributed lock manager (IDLM). With Oracle8i (version 8.1.6), starting in 1999, Oracle “cache fusion” became available on selected platforms to replace the system-specific DLMs and Oracle’s IDLM.
Oracle8i (release 8.1.6) was a transition version between the IDLM version of Oracle OPS and the cache fusion-driven version using global cache services (GCS), global enqueue services (GES), and the global data dictionary that is now present in Oracle9i and Oracle10g (Oracle9i Real Application Clusters Concepts, Release 1 (9.0.1), Section 5-2).
If we examine the Oracle7 Parallel Server Administrator's Guide, we can see that the description of cache coherency is virtually identical for the 7.1 and 7.3 versions of Oracle. Remember, the IDLM wasn’t introduced until 1997, and the concept of “cache fusion,” in which all transfers are done via the high-speed interconnect, was not introduced until version 8.1.6 in 1999.
RAC and Cache Fusion
Starting with Oracle8i, the archaic DLM was replaced with the concept of Cache Fusion (CF). Unlike the IDLM, with its disk write mechanism, Cache Fusion uses high-speed interconnects (one gigabit per second and faster) to bond the various instance buffer cache areas into a larger virtual cache area (refer to figure 2).
Figure 2: Oracle RAC and cache fusion.
Rather than using the older method of writing blocks out to disk and then reading them back into a neighboring cache, the blocks are transferred directly from one cache to another through the high-speed interconnect.
Any end-user application that uses RAC cache fusion can read blocks into any of the caches, since reads can share the inter-instance locks; however, if one of the instances must modify data in a specific block, it must send a message to RAC to obtain an exclusive lock and take control of the data block. Internally, there are two initial states for blocks:
- Virgin blocks — A “virgin” version of a block is a data block that has been read, but not changed. This is a “global” block to RAC.
- Active blocks — An actively updated block is a previously-modified block (a read-consistent image) that is transferred in from the instance that previously modified the block.
In either case, if a block is to be modified, it must be
transferred into the modifying instance and be owned by it
before any modifications are allowed.
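The ownership-transfer protocol described above can be sketched in Python. This is a deliberately simplified, hypothetical model (the class names GlobalCacheService and Instance are invented, and the real GCS protocol involves resource masters, lock modes, and past images): it shows only the key contrast with pinging, namely that a dirty block moves cache-to-cache over the interconnect with no intermediate disk write.

```python
# Illustrative sketch of cache fusion: a coordinator (standing in for
# the global cache services) tracks which instance owns each block, and
# ships dirty blocks directly between caches over the interconnect.
# All names are hypothetical, invented for this example.

class Instance:
    """Stand-in for one RAC instance and its local buffer cache."""
    def __init__(self, name):
        self.name = name
        self.cache = {}   # block_id -> data

class GlobalCacheService:
    """Stand-in coordinator granting exclusive block ownership."""
    def __init__(self):
        self.owner = {}             # block_id -> owning Instance
        self.interconnect_msgs = 0  # count cache-to-cache transfers

    def request_exclusive(self, block_id, requester):
        holder = self.owner.get(block_id)
        if holder is not None and holder is not requester:
            # Ship the current image directly between caches --
            # no disk write, unlike OPS pinging.
            self.interconnect_msgs += 1
            requester.cache[block_id] = holder.cache.pop(block_id)
        self.owner[block_id] = requester

gcs = GlobalCacheService()
a, b = Instance("A"), Instance("B")

gcs.request_exclusive(7, a)      # A takes ownership of block 7
a.cache[7] = "row data v1"       # A modifies it in its own cache
gcs.request_exclusive(7, b)      # B takes over: block moves A -> B directly

print(7 in b.cache, 7 in a.cache, gcs.interconnect_msgs)   # → True False 1
```

Note that in this toy model, as in the description above, a block must be owned by the modifying instance before any change is allowed; the coordinator serializes ownership so only one instance holds the current image at a time.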
In RAC, the global directory services keep track of the current and past image blocks, allowing for recovery in case of instance failure. The details of this process are contained in Oracle9i Real Application Clusters Concepts, Release 1 (9.0.1), Section 5-2.
Bibliography
Jernigan, Kevin. “ORACLE V6.2: Oracle Parallel Server Technology for VAXclusters and Clustered UNIX Environments.” IOUG (International Oracle User's Group) Proceedings, 1991.
Oracle7 Parallel Server Concepts and Administrator's Guide, Version 7.3, Part No. A42522-1, January 8, 1996.
Oracle9i Real Application Clusters Concepts, Release 1 (9.0.1), Part No. A89867-02, July 2001.
Oracle7 Parallel Server Administrator's Guide, Release 7.1, Part No. A12852-1, Copyright © Oracle Corporation 1992, 1993, 1994.
Shared Data Clusters: Scaleable, Manageable, and Highly Available Systems (VERITAS Series), ISBN 047118070X.
Oracle Parallel Processing, ISBN 1-56592-701-X.
Ault, Michael, and Madhu Tumma. Oracle9i RAC: Oracle Real Application Clusters Configuration and Internals, 2003, ISBN 0-9727513-0-0.