Why Top-Down RAC Tuning Works
By: Bert Scalzo
Database Domain Expert
Quest Software
Sometimes such an obvious and simplistic
approach seems to offer a questionable return on investment in
time and effort – can it actually make a worthwhile
difference? Let me demonstrate with a very dumb example how and
why top-down RAC tuning works.

When I first moved to Texas, I had occasion to drive
from Dallas to Austin. Now instead of simply looking at a map,
I asked my neighbor where Austin was located – to which he
replied just a little west of Houston. So like an idiot I drove
south on highway 45 until reaching Houston, took the 610 Houston
bypass west, and looked for any signs saying Austin – which led
me to take 290. It took me 7 hours to make the trip!
The third trip was the charm. By
taking the obvious best route, the trip took 50% less time, or
just 3.5 hours. Now laugh all you like at me (and quite
deservedly so), but it's not uncommon for me to find RAC
optimization efforts where equally simple or dumb questions have
never been asked. It's amazing what performance
issues a little humility and stupidity can resolve.
Step 1 – Application Nature
The most logical place to start is with
fully comprehending the nature of the application and any
special demands to support that nature. Often the knowledge
about the nature of the application seems to stop with the
application architect. However, the smart DBA should always be
fully informed about what is being attempted, and not simply how
it has been implemented within the database objects or code.
Remember that in order to understand the
“big picture”, we
really need to start at the very top and examine the entire
system. In order to provide context for that system and all its
subsystems, we need to know what it’s trying to do. Otherwise
the best we can do is to apply theoretical
“golden rules” that
may or may not apply. So we would simply be making hypothetical
best guesses based on feeling rather than fact. Such an approach
will generally not lead to optimal results.
I cannot overstate just how important this
first step is. Everything that is discovered here will flow
downstream to every subsystem being optimized. This information
will usually help make key hardware and software architectural
and configuration decisions that will directly define the limits
of what can be achieved during the optimization process. So if
we paint ourselves into a corner by skimping on this step, there
may well be no way to solve the performance
problems we experience. And since RAC is by its nature more
complex than single-instance deployments, the ramifications are
magnified even further.
There are numerous well-known
“best practices” for
keeping a database application humming along. We all know to
write more efficient SQL and/or PL/SQL, use bind variables to
reduce unnecessary parsing, maintain up-to-date statistics,
partition large tables and their indexes, examine explain plans
for inefficiencies, examine extended trace files for deeper
problems, and so on. But many times we have zero ability to
change or otherwise modify the application code. For example,
I'm running a TPC-C benchmark. That is defined by a spec (www.tpc.org/tpcc)
and implemented by a software package (Quest's Benchmark
Factory:
www.quest.com/benchmark-factory) – so I have no ability to
change the application. You may have third-party applications
such as ERP and CRM, so you'd be in the same boat. Therefore
this is neither an unreasonable nor uncommon example. We often
are stuck with the application as is.
While I know that TPC-C is OLTP
in nature, I can also identify some critical application
characteristics by examining the spec – and/or capturing the SQL
sent to the database. Either way, I'll discover that it's
primarily reads with some writes, very small average transaction
sizes (i.e. it works on a few rows from a few tables at a time),
there is high concurrency with some potential for deadlock, and
I have to meet ACID compliance relating to the transactions
(i.e. logging=yes). That's actually quite a lot of very useful
information for the optimization process, so I'll carry these
findings forward and note where they might apply across all the
various subsystems.
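One quick way to act on "capturing the SQL sent to the database" is to tally the captured statements into reads versus writes. Here is a minimal sketch in plain Python, assuming the statements have already been captured (say, from a trace file); the sample statements are hypothetical TPC-C-style SQL, not taken from the actual spec:

```python
def classify_workload(statements):
    """Tally SQL statements into reads vs. writes by their leading verb."""
    counts = {"read": 0, "write": 0, "other": 0}
    for stmt in statements:
        words = stmt.strip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            counts["read"] += 1
        elif verb in ("INSERT", "UPDATE", "DELETE", "MERGE"):
            counts["write"] += 1
        else:
            counts["other"] += 1
    return counts

# Hypothetical TPC-C-style statements captured from the database:
sample = [
    "SELECT w_tax FROM warehouse WHERE w_id = :1",
    "UPDATE district SET d_next_o_id = d_next_o_id + 1 WHERE d_id = :1",
    "INSERT INTO orders VALUES (:1, :2, :3)",
    "SELECT c_balance FROM customer WHERE c_id = :1",
    "SELECT i_price FROM item WHERE i_id = :1",
]
mix = classify_workload(sample)  # primarily reads with some writes
```

Crude as it is, a tally like this confirms the "primarily reads with some writes" finding before any tuning decision is based on it.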
Step 2 – Public Network
As with the database application, there are
numerous well-known public networking
“best practices”.
Once again, we all know to isolate and/or size a public
network’s capacity based upon cumulative application needs, use
gigabit or possibly ten-gigabit networking technology, utilize
only layer 2 or 3 switches, employ switches’ VLAN (virtual LAN)
capabilities to further segment loads,
“bond” network
channels for increased throughput, and several other techniques.
How could application knowledge have
divulged a public network problem? One site I visited had a
database application driving manufacturing robots tooling
titanium – with a cost of $20,000 per tooling-instruction
mistake. The application made hundreds to even thousands of
tooling mistakes per day, so in effect it was a total failure.
The DBAs and some very high-priced consultants had spent weeks
tuning this database to no avail. Then I arrived and asked some
networking "dumb
questions" – and ignored the DBAs and their consultants
telling me what a waste of time that line of questioning was. The
result? They had placed the entire shop floor on one public
network segment – even though the robots were not dependent on
each other. I did not even waste time monitoring the network
for collisions. I simply suggested running a few extra cheap
Ethernet cables, and the problem was entirely solved – with zero
Oracle or RAC optimization effort on my part. It's true;
sometimes we're too close to the problem to see the forest for
the trees. All those DBAs and consultants actually knew more
about RAC than I did. I simply looked for
"low hanging fruit".
It does not always work out that way – but it only took a couple
of minutes to play stupid and then solve their problem. So don't
overlook gift horses if they happen to exist.
Step 3 – Storage Network
As with the public network, there also are
many well-known storage networking
“best practices”.
Once again, we all know to isolate and/or size a storage
network’s capacity based upon cumulative application IO needs,
employ Fibre Channel or InfiniBand for SANs, utilize
ten-gigabit networking technology for NAS and iSCSI, deploy
multiple pathways per storage controller and HBA, consider
TCP/IP offload engines (i.e. TOEs) or specialized iSCSI HBAs,
and several other techniques.
So how could application knowledge have
divulged a storage network problem? I once visited a site where
for some unknown reason, the RAC performance just tanked. And
just like the prior section’s example, extraordinary tuning
efforts had been attempted with zero success. So once again I
entered the picture and asked a whole lot of relatively dumb
questions. I was told that the storage network was leveraging
"jumbo frames" – it
was initially set up that way, and was that way now. I simply
asked them to prove it. Imagine their surprise when they found that a
recent firmware upgrade on a switch had wiped out the
"jumbo frames"
settings to the NAS – and the VLAN setting that segregated the
storage network. So two minutes to reset the switch setup fixed
the problem – which once again had nothing to do with Oracle or
RAC.
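To see why the lost "jumbo frames" setting hurt so much, consider how many Ethernet frames it takes to move one database block across the storage network. This back-of-the-envelope sketch assumes a rough 78 bytes of per-frame protocol overhead (an assumption for illustration; the exact figure varies by protocol stack):

```python
import math

# Rough per-frame header overhead for storage traffic over TCP/IP on
# Ethernet (an assumption for illustration; the exact figure varies).
FRAME_OVERHEAD_BYTES = 78

def frames_per_block(block_size, mtu):
    """Ethernet frames needed to carry one database block at a given MTU."""
    payload = mtu - FRAME_OVERHEAD_BYTES
    return math.ceil(block_size / payload)

standard = frames_per_block(8192, 1500)  # MTU silently reset by the firmware upgrade
jumbo = frames_per_block(8192, 9000)     # "jumbo frames" as originally configured
```

At MTU 1500 an 8 KB block is split across six frames versus a single jumbo frame, so the silent reset multiplied the per-block framing work roughly sixfold – exactly the kind of invisible tax no amount of Oracle tuning could remove.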
I know these examples may seem hard to
believe, but never pass up the opportunity to quickly fix a
nasty problem by simply correcting an obvious external yet
critical factor. Sometimes life throws us a bone – an obvious
yet overlooked problem that is subtle yet highly critical. So
just accept whatever easy ones come your way, because we all
know that life will throw us more than our share of tough ones.
Step 4 – Storage Subsystem
This topic could easily support a paper all
on its own. Storage subsystem “golden
rules” are both numerous and can vary by vendor and/or type,
so it’s impossible to do this topic justice within this one
paper. Therefore here’s just a very simple list of some of the
more universal storage rules that most people agree upon and
that apply across technologies:
- More, smaller disks = generally higher overall throughput
- More memory cache = generally higher overall throughput (but expensive)
- Avoid "write-back" mode if no backup power source (e.g. battery)
- Align stripe boundaries: drive, OS block, LVM, file system, database block, etc.
- Stripe depth (i.e. size) from 256 KB to 1 MB
- Stripe width (i.e. # disks) between 4 and 16
- Stripe Depth = Stripe Width X Drive IO Size = one IO per disk per IO request
- Average I/O <= Stripe Width X Stripe Depth
- Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered)
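The striping rules of thumb above lend themselves to a quick sanity check. This is a minimal sketch that simply flags violations of the listed ranges; the thresholds come straight from the list, and your array vendor's guidance may differ:

```python
def check_stripe_geometry(depth_kb, width, avg_io_kb):
    """Flag violations of the universal striping rules of thumb."""
    warnings = []
    if not 256 <= depth_kb <= 1024:
        warnings.append("stripe depth outside the 256 KB - 1 MB range")
    if not 4 <= width <= 16:
        warnings.append("stripe width outside the 4-16 disk range")
    if avg_io_kb > width * depth_kb:
        warnings.append("average I/O exceeds one full stripe (width x depth)")
    return warnings

ok = check_stripe_geometry(256, 4, 1024)   # a 1 MB I/O exactly fills a 4 x 256 KB stripe
bad = check_stripe_geometry(128, 2, 2048)  # violates all three rules
```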
But all this kind of advice merely focuses
on the database-oriented storage subsystem configuration
settings at an obvious level. It's the slightly less obvious
settings that can often yield significant improvements for
little effort – it's just that too many DBAs don't feel
comfortable asking some
"dumb questions" such
as these (due to the separation of duties between database, OS
and storage administrators):
- What's the disk array's memory cache size?
- How many IO controllers or processors are there?
- Why did we choose a stripe depth of size X?
- Why did we choose a stripe width of size Y?
- Do my LUNs overlap or share drives with other databases?
- How many actual drives or spindles are my LUNs spread across?
But even these kinds of questions are at too
high a level. Remember that we want to make sure to examine the
obvious yet overlooked, and the subtle yet highly critical. Far
too many sites I visit cannot answer the following questions –
which are at the lowest level and define the maximum
performance they can expect to eke out of that storage
subsystem:
- What are the storage array's read-ahead or pre-fetch settings?
- What are the storage array's cache operation/algorithm percentages?
Step 5 – Oracle Instance
As with the prior topic, there have been
hundreds of books and tens of thousands of papers written on how
to tune an Oracle database instance. So once again it would be
arrogant to think that I could add any purely technical new
insight here. But when we instead ask database
"dumb questions", we
find that one of the most fundamental and critical factors for
performance often has no well-considered answer. Namely, when
asked why the database block size is 8K, many people say "I don't
know" – meaning that they just accepted the default in DBCA
(Database Configuration Assistant) or have it hard-coded in
their database creation scripts. Yet this is one of the most
important items to base upon the application's nature. And this
issue is even more paramount in a RAC setup, but we'll examine
that issue in the following section on cluster best
practices.
So what if anything can we do at the
instance level that should apply just as well within a cluster?
In our example I'm running a TPC-C OLTP benchmark. Furthermore
the amount and nature of the transactions are straightforward and
well defined, and thus easy to examine. So once again we'll just
look for some "low
hanging fruit" within the Oracle init.ora or spfile
parameter settings. Obvious TPC-C analysis ramifications include
results such as the following:
- Primarily reads = optimizer_index_caching=80, optimizer_index_cost_adj=20
- Small transactions = size redo logs correctly for small size X high load
- High concurrency = cursor_space_for_time=true, cursor_sharing=similar
- No logging, etc. = turn off "Recycle Bin", but must keep logging=yes
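The "size redo logs correctly" item can be made concrete with simple arithmetic: pick a target log-switch interval and multiply by the peak redo generation rate. The 20-minute switch target below is a common rule of thumb, not a TPC-C requirement, and the 0.5 MB/s redo rate is a hypothetical figure:

```python
def redo_log_size_mb(peak_redo_mb_per_sec, target_switch_minutes=20):
    """Size each redo log so switches occur roughly every N minutes at peak load."""
    return peak_redo_mb_per_sec * target_switch_minutes * 60

# 0.5 MB/s of peak redo with a 20-minute switch target suggests ~600 MB logs:
size = redo_log_size_mb(0.5)
```

The point is that the log size falls out of the measured workload, rather than being whatever the creation script happened to hard-code.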
Step 6 – Oracle Cluster
There are many people much brighter than I when it
comes to RAC. Fortunately for me, Quest employs two such
experts in Mike Ault and Muralli Valleth – both well-known
authors and RAC experts. I learn more in an hour working
alongside these guys than typically can be taught in a week-long
class. But even when I'm working alongside such RAC gurus, my
simplistic approach still adds value to and complements their
approach. Between my asking the
"dumb questions"
versus their smarter ones, and my attacking the problem from
above versus their more thorough and scientific approach from below
– we completely cover the entire spectrum of performance
optimization possibilities. Therefore little to nothing gets
missed. At least that's the little white lie I keep telling
myself so that I can feel worthwhile next to my infinitely
brighter workmates.
Some well-known and generally agreed-upon
RAC "best practices"
include doubling the ASM instance default parameters for SGA
memory size (64MB simply does not scale too well), remembering that
the interconnect is the single most important limiting factor or
bottleneck, and considering hash partitions and reverse-key indexes
to spread IO across cluster nodes. Beyond this, many of the
instance-specific optimization
"golden rules" may not apply when doing RAC. So test any and
all assumptions beyond these just to be safe.
Remember the block size question from the
prior section? Let's examine in more detail why both block size
and concurrency can become critical performance factors for RAC.

The above diagram depicts how both block
and row level contention happen on a single-instance database.
If two Oracle server processes request access to Row X, then we
have row level contention or deadlock. But we know from the
application's nature that's a fairly rare event. However
it's not unlikely that two Oracle server processes might seek to
access different rows within the same block (e.g. Rows X and Y).
And as the block size increases, the likelihood of concurrent
block access increases – but on a single instance there's no real
cost in performance terms. Now let's examine the exact same
scenarios in a RAC database as shown below for a simple two node
cluster.

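A simple probability model shows why block size matters so much here. If rows are accessed uniformly at random (a deliberate simplification – real workloads are skewed), the chance that two concurrent sessions land in the same block grows linearly with the number of rows packed into each block; the row counts and lengths below are hypothetical:

```python
def p_same_block(block_size, avg_row_len, total_rows):
    """Approximate probability that two concurrent sessions touch the
    same block, assuming uniformly random row access (a simplification)."""
    rows_per_block = block_size // avg_row_len
    return rows_per_block / total_rows

small = p_same_block(4096, 128, 1_000_000)   # 4 KB blocks
large = p_same_block(16384, 128, 1_000_000)  # 16 KB blocks: 4x the collision odds
```

On a single instance that extra block-level collision is nearly free; in RAC each one can mean a block ping across the interconnect, which is why the block size default deserves a real answer.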
Step 7 – RAC Interconnect
As with the other networking issues, there
are numerous well-known private networking or interconnect
“best practices”.
Once again, we all know to isolate and/or size a private
network’s capacity based upon cumulative application needs, use
gigabit or possibly ten-gigabit networking technology, always
utilize “jumbo frames”,
“bond” multiple
network channels for increased throughput, and several other
techniques.
But once again our application's nature adds
a critical piece to the puzzle: we must account for high
concurrency. So as the last section identified, we can reduce
block contention by reducing the block size and multi-block read
count. But sometimes those steps alone may not be enough.
Sometimes we can project that the concurrency is so high that
block level contention cannot be eliminated, only reduced –
so we need a way to further optimize handling for that scenario.
That might lead one to increase the priority of (i.e. re-nice) the
global cache service processes to mitigate this unavoidable cost. As
before, we also might be able to make that call up front for
very little cost. Just be careful with whatever you may
preemptively attempt to handle – because the interconnect is
truly RAC's Achilles' heel.
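To gauge whether the interconnect is approaching that Achilles' heel, a rough capacity estimate helps: cache-fusion traffic is approximately block transfers per second times block size, padded for GCS/GES messaging. The 20% messaging overhead factor and the transfer rate below are assumptions for illustration only:

```python
def interconnect_mbps(block_transfers_per_sec, block_size_bytes, overhead=1.2):
    """Rough cache-fusion bandwidth in Mbps: transfers/sec x block size,
    padded ~20% for GCS/GES messaging (the factor is an assumption)."""
    return block_transfers_per_sec * block_size_bytes * 8 * overhead / 1_000_000

# 5,000 8 KB block transfers/sec needs roughly 400 Mbps of interconnect:
need = interconnect_mbps(5000, 8192)
```

An estimate like this, made up front from the application's nature, tells you whether gigabit suffices or whether ten-gigabit or bonded channels are warranted – before the interconnect becomes the bottleneck.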