Blocksize and Oracle Performance
By Bert Scalzo
As it does for many people today, the Linux movement
enthralls me. I'm interested not only because I'm more of a
UNIX-based DBA but also because of the amazing speed with which
major vendors, such as HP, Compaq, Dell, IBM and Oracle, have
embraced this open-source operating system. Last year Linux
server sales accounted for approximately 30% of Compaq's, 13.7%
of Dell's and 13.5% of IBM's total server sales, according to
eWeek. Moreover, IBM spent a billion dollars on Linux
development in 2001, after having ported Linux to all its
hardware platforms in 2000. Furthermore, Intel's new 64-bit
Itanium CPU lists only four supported operating systems:
Windows, Linux, AIX and HP-UX. And let's not forget that Oracle
released 9i on Linux months ahead of the Windows port. Then
again, maybe I just like the underdog--I mean I'm writing this
article on my AMD Athlon-based PC.
No matter how fashionable Linux may be, that
popularity does not automatically translate into nor
does it guarantee performance. Even though Linux
runs on everything from IBM S/390s to Sun SPARC-based
boxes, most people at this point are still probably
running Linux on Intel-based server platforms. Now
without sounding condescending, let me state that
the PC architecture was never really intended to
scale to the heights Linux makes possible. Thus we
need to make sure that we squeeze every last drop of
blood out of the turnip when we deploy an Intel
based Linux server--especially for enterprise
databases like DB2 and Oracle. Believe it or not, it's quite
easy to get upwards of a 1,000% improvement in database
performance through proper Linux tuning and database
configuration for Linux.
As with any scientific endeavor, in this article we will
attempt to evaluate different tuning techniques by establishing
a controlled environment where we can ascertain a baseline,
identify all the changeable relevant variables, modify one
variable at a time and obtain a reliable measurement of the
effects of that one change. Wow, I haven't written
techno-babble like this since I was a Physics student at Ohio State.
In plain English, we must test one tuning concept at a time in
order to accurately measure the observable effects of only that
change.
First you need a test environment. I used a Compaq quad-CPU
server with 512MB of memory and eight 7,200 RPM ultra-wide
SCSI disks. Then I ran the exact same tests on a single-CPU
Athlon system with the same amount of memory but a single
7,200 RPM Ultra100 IDE disk drive. Although the raw numbers
and percentages were not identical, the observed improvement
pattern was: each change improved both systems in the same
general direction and by a similar magnitude.
Linux servers are truly universal in function, utilized
easily as web servers, application servers, database servers,
routers, firewalls, e-mail servers, file servers, print servers
and combinations of the above. But we need to pick one such
usage; remember our single-variable requirement.
For simplicity, I chose the TPC-C benchmark as my testing
methodology. It's widely recognized as a reliable OLTP workload
benchmark, it has both on-line and deferred transactions, it's
non-uniform in nature, and it applies to numerous databases,
including Oracle and DB2. Plus, the benchmark can be configured
to stress all aspects of the hardware: CPU, memory, bus and disk.
And to be totally honest, I'm a DBA, and Quest has a wonderful
tool called Benchmark Factory that makes defining, running and
measuring TPC-C tests as simple as sending e-mail.
Remember, I said that we'd start by looking at some very
high-ROI approaches. That means we're looking for items so easy
and apparent in terms of applicability and impact that we only
need to observe the runtime differences in the TPC-C results to
see if we're on the right track. So we'll look only at what I
call the OS and DB low-hanging fruits.
NOTE: If you're positive your Oracle database has been
created perfectly for Linux, you might choose to skip the
database low-hanging fruits section. For many DBAs, it may
serve as a refresher on some obvious database configuration and
tuning issues.
DB Low-Hanging Fruits
Let's begin by looking at a typical initial database.
Often, people start with either the default database created by
the Oracle Installer or a database they created using the
Database Configuration Assistant. Either way, the default
settings are generally quite useless. Worse, a novice DBA or a
consultant posing as a DBA might select values that actually
make things worse. The point is that databases set up with poor
initialization parameters and dictionary-managed tablespaces,
as shown in table DB1, are not uncommon.
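For reference, the DB1 baseline corresponds roughly to an
init.ora along these lines. This is a hypothetical sketch using
Oracle 8i-style parameters, where the buffer cache is sized in
blocks via db_block_buffers:

    # init.ora -- hypothetical DB1 baseline (Oracle 8i-style parameters)
    db_block_size    = 2048        # 2K blocks
    db_block_buffers = 32768       # 32768 x 2K = 64M buffer cache
    shared_pool_size = 67108864    # 64M shared pool
    log_buffer       = 4194304     # 4M redo log buffer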
DB1: Initial Database

    Database Block Size    | 2K
    SGA Buffer Cache       | 64M
    SGA Shared Pool        | 64M
    SGA Redo Cache         | 4M
    Redo Log Files         | 4M
    Tablespaces            | Dictionary

TPC Results (our baseline)

    Load Time (Seconds)    | 49.41
    Transactions / Second  | 8.152
The obvious first choice is to increase the SGA size. So
we increase the buffer cache and shared pool as shown in table
DB2.
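In the init.ora, this is just a matter of doubling two
parameters; a minimal sketch, again assuming 8i-style parameters
(with 2K blocks, 128M works out to 65,536 buffers):

    # init.ora -- double the buffer cache and shared pool to 128M each
    db_block_buffers = 65536       # 65536 x 2K = 128M buffer cache
    shared_pool_size = 134217728   # 128M shared pool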
DB2: Cache & Pool

    Database Block Size    | 2K
    SGA Buffer Cache       | 128M
    SGA Shared Pool        | 128M
    SGA Redo Cache         | 4M
    Redo Log Files         | 4M
    Tablespaces            | Dictionary

TPC Results

    Load Time (Seconds)    | 48.57
    Transactions / Second  | 9.147
Not quite what we had hoped for: only a 1.73% improvement
in load time and a 10.88% increase in TPS. Okay, so maybe we
should have increased the redo log buffer as well. Of course, we
don't want the redo log files to be smaller than the SGA memory
allocation, so we'll need to bump up the redo log file size to
match. This is shown in table DB3.
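The log buffer itself is a one-line init.ora change, but
resizing the online redo logs means adding new 16M groups and
dropping the old 4M ones once they're inactive. A sketch, with
hypothetical group numbers and file paths:

    # init.ora -- enlarge the redo log buffer to 16M
    log_buffer = 16777216

    -- SQL: add 16M redo log groups, then retire the old 4M groups
    ALTER DATABASE ADD LOGFILE GROUP 4 ('/u02/oradata/tpc/redo04.log') SIZE 16M;
    ALTER DATABASE ADD LOGFILE GROUP 5 ('/u02/oradata/tpc/redo05.log') SIZE 16M;
    ALTER SYSTEM SWITCH LOGFILE;          -- cycle off the old groups
    ALTER DATABASE DROP LOGFILE GROUP 1;  -- only once the group is inactive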
DB3: Log Buffer

    Database Block Size    | 2K
    SGA Buffer Cache       | 128M
    SGA Shared Pool        | 128M
    SGA Redo Cache         | 16M
    Redo Log Files         | 16M
    Tablespaces            | Dictionary

TPC Results

    Load Time (Seconds)    | 41.39
    Transactions / Second  | 10.088
Now we're getting somewhere. Notice the load time improved
by over 7 seconds, or 17.35%. And once again the TPS improved by
about the same proportion, 9.33%. This makes sense, because the
load and the simultaneous inserts, updates and deletes needed
much more room than the original 4M redo allocation. But the
memory increases seem to be yielding fairly small improvements;
I/O appears to be where the current problem lies. So even though
it's an OLTP system, let's try increasing the block size, as
shown in table DB4.
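One caveat: db_block_size is fixed when the database is created,
so this isn't a parameter you can simply edit on a running
database. The usual route is export, recreate with the new block
size, and import. The relevant init.ora lines might look like
this hypothetical sketch (note the buffer count is halved to
keep the cache at 128M):

    # init.ora -- 4K blocks; requires recreating the database
    db_block_size    = 4096        # 4K blocks
    db_block_buffers = 32768       # 32768 x 4K = 128M buffer cache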
DB4: 4K Block

    Database Block Size    | 4K
    SGA Buffer Cache       | 128M
    SGA Shared Pool        | 128M
    SGA Redo Cache         | 16M
    Redo Log Files         | 16M
    Tablespaces            | Dictionary

TPC Results

    Load Time (Seconds)    | 17.35
    Transactions / Second  | 10.179
Now we're cooking. Even a PC with its limited bus and I/O
capabilities can reap huge benefits from a larger block size.
The load time improved over 138%, with no detriment to the TPS.
For the moment, let's assume we don't want to try the next block
size increase, for whatever reason. The next simple idea that
comes to mind is to switch from dictionary-managed to locally
managed tablespaces, something Oracle has been touting pretty
hard. Thus we end up with what is shown in table DB5.
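Local extent management is declared when a tablespace is
created, so the switch means rebuilding the tablespaces with a
clause along these lines (the tablespace name, datafile path and
sizes here are hypothetical):

    -- Locally managed tablespace with uniform extents
    CREATE TABLESPACE tpc_data
      DATAFILE '/u03/oradata/tpc/tpc_data01.dbf' SIZE 500M
      EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;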
DB5: Local Tablespaces

    Database Block Size    | 4K
    SGA Buffer Cache       | 128M
    SGA Shared Pool        | 128M
    SGA Redo Cache         | 16M
    Redo Log Files         | 16M
    Tablespaces            | Local

TPC Results

    Load Time (Seconds)    | 15.07
    Transactions / Second  | 10.425
So Oracle is right: locally managed tablespaces are
definitely the way to go. We got over a 15% improvement on the
load and about 2% on the TPS. That's okay, but we would really
like to see more results like those we saw with the 4K block
size. So let's try 8K, as shown in table DB6. It worked before,
so maybe it will work again.
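As with the 4K move, changing to 8K blocks means recreating the
database. Once it's back up, it's worth a quick sanity check
that the instance really is running the new block size:

    -- Confirm the running block size
    SELECT value FROM v$parameter WHERE name = 'db_block_size';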
DB6: 8K Block

    Database Block Size    | 8K
    SGA Buffer Cache       | 128M
    SGA Shared Pool        | 128M
    SGA Redo Cache         | 16M
    Redo Log Files         | 16M
    Tablespaces            | Local

TPC Results

    Load Time (Seconds)    | 11.42
    Transactions / Second  | 10.683
Not too bad. As before, the larger block size yielded
improvements to the load (almost 32%) with no detriment to the
TPS; in fact, the TPS improved over 2%. But notice that we have
reached a critical juncture in block size increases: the load
time improvement dropped quite significantly, from 138% to 32%,
even though the TPS gain was nearly three times that of the 4K
block size. Further block size increases are not likely to be a
good source of no-brainer gains (i.e., gains so obvious that we
don't need other performance measuring tools to see them).
So we're rapidly approaching the end of the DB low-hanging
fruits. The only other thought that comes to mind is that since
we have multiple CPUs, maybe we can set up I/O slaves to
leverage them. It's worth a try; the result is shown in table DB7.
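The init.ora change here is a single line; in this release,
lgwr_io_slaves isn't set directly but is derived, as the table
notes:

    # init.ora -- offload DBWR writes to four I/O slave processes
    dbwr_io_slaves = 4             # lgwr_io_slaves is derived, not set directly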
DB7: I/O Slaves

    Database Block Size       | 8K
    SGA Buffer Cache          | 128M
    SGA Shared Pool           | 128M
    SGA Redo Cache            | 16M
    Redo Log Files            | 16M
    Tablespaces               | Local
    dbwr_io_slaves            | 4
    lgwr_io_slaves (derived)  | 4

TPC Results

    Load Time (Seconds)    | 10.48
    Transactions / Second  | 10.717
That's another 9% improvement on the load but almost
nothing for the TPS. It appears we've gotten all there is from
the low-hanging fruits. Overall, we got improvements of about
371% for the load time and 24% for the TPS; not bad for requiring absolutely no
extensive or detailed performance monitoring. The results are
summarized below.

Summary of TPC Results

    Configuration           | Load Time (s) | TPS
    DB1: Initial Database   | 49.41         | 8.152
    DB2: Cache & Pool       | 48.57         | 9.147
    DB3: Log Buffer         | 41.39         | 10.088
    DB4: 4K Block           | 17.35         | 10.179
    DB5: Local Tablespaces  | 15.07         | 10.425
    DB6: 8K Block           | 11.42         | 10.683
    DB7: I/O Slaves         | 10.48         | 10.717