Optimizing Linux for Oracle
Article by author Bert Scalzo
OS
Low-Hanging Fruits
So you've just installed Linux. It's smart enough to recognize
hardware details, such as the manufacturer, speed and number of
CPUs, the amount of system memory available, and the type, speed
and number of disk drives. Nonetheless, many simple, no-brainer
opportunities for performance improvement remain to be leveraged.
In this case, we'll start with a typical Red Hat 6.2 install. Note
that this means we'll be starting with kernel 2.2.14-5smp, the one
that shipped with 6.2.
The first thing anyone should do to Linux after the install is to
create a monolithic kernel (i.e., recompile the kernel to statically
include the drivers and features you intend to use and to turn off
dynamically loaded modules). The idea is that a smaller kernel with
just the features you need is superior to a fat kernel supporting
things you don't need. Sounds reasonable to me, so we'll cd over
to /usr/src/linux and issue the
make clean xconfig
command (use
make clean config
if you boot to the command line instead of X).
There are
literally hundreds of parameters to set, and I could recommend
any one of a dozen good books or web sites to reference on the
subject. Some key ones that stick out in my mind include CPU
type, SMP support, APIC support, DMA support, IDE DMA default
enabled and quota support. My advice: go through them all and
read the xconfig help if you're unsure.
Since we know we're going to recompile the kernel, we might as well
fix the IPC (interprocess communication) settings, as documented in
the Oracle installation guide. For the 2.2 kernel, shared memory
settings are located in /usr/src/linux/include/asm/shmparam.h. I
suggest setting the SHMMAX parameter value to at least
0x13000000. The semaphore settings are located in
/usr/src/linux/include/linux/sem.h. I recommend setting SEMMNI,
SEMMSL and SEMOPN to at least 100, 512 and 100, respectively.
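To put the suggested SHMMAX value in familiar units, the hexadecimal constant can be converted with plain shell arithmetic. This is only a worked conversion; the variable names are illustrative and nothing here touches the kernel headers.

```shell
# Convert the suggested SHMMAX value from hexadecimal to bytes and MB.
# Variable names are illustrative, not taken from the kernel source.
shmmax_bytes=$((0x13000000))
shmmax_mb=$((shmmax_bytes / 1024 / 1024))
echo "SHMMAX = $shmmax_bytes bytes ($shmmax_mb MB)"
# prints: SHMMAX = 318767104 bytes (304 MB)
```

In other words, the suggested minimum allows a single shared memory segment of about 304 MB.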
Now we recompile the kernel by typing
make dep clean bzImage
Copy the link map (System.map) and kernel image to your boot
directory, edit /etc/lilo.conf, run lilo and reboot. If you've done
everything correctly, the machine will boot using your new, leaner
and meaner kernel.
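For the /etc/lilo.conf edit, a minimal sketch of the stanza to add might look like the following. The helper name lilo_stanza and every value passed to it (image path, label, root device) are hypothetical placeholders, not settings from the test machine; on an i386 build the image normally ends up in arch/i386/boot/bzImage and the link map in System.map at the top of the source tree.

```shell
# Emit a lilo.conf stanza for a newly built kernel.
# All three arguments are placeholders; adjust them to your system.
lilo_stanza() {
    image=$1   # kernel image copied into /boot
    label=$2   # name shown at the LILO prompt
    root=$3    # root filesystem device
    cat <<EOF
image=$image
    label=$label
    read-only
    root=$root
EOF
}

# Hypothetical values for illustration only.
lilo_stanza /boot/vmlinuz-custom linux-custom /dev/hda1
```

Appending the output to /etc/lilo.conf while leaving the original stanza in place keeps the stock kernel available as a fallback if the new one fails to boot. Remember to run lilo afterward so the change takes effect.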
In my case, the
monolithic kernel with properly sized IPC settings improved the
load by nearly 10% and the TPS by nearly 8%, as shown in table
OS1.
OS1: Mono Kernel & IPC
| TPC Results           |        |
| Load Time (Seconds)   | 9.54   |
| Transactions / Second | 11.511 |
If simply recompiling a specific version of the kernel can yield
such improvements, then it stands to reason that a newer version of
the same kernel family will also provide improvements. So I
obtained the latest stable kernel source within the same family
from www.Linux.org (in my case 2.2.16-3smp). But improvements were
a paltry 1.5% for the load and practically nothing for the TPS, as
shown in table OS2.
OS2: Newer minor version kernel
| TPC Results           |        |
| Load Time (Seconds)   | 9.40   |
| Transactions / Second | 11.522 |
Since many Linux
distributions now use kernel 2.4.x as their base, it made
sense to try this next. So I downloaded the kernel source
2.4.1smp, and the new kernel was worth the wait. It yielded
improvements of almost 13% on the load and over 10% on the TPS,
as shown in table OS3.
OS3: Newer major version kernel
| TPC Results           |        |
| Load Time (Seconds)   | 8.32   |
| Transactions / Second | 12.815 |
Although these are not bad results so far, in my mind tuning the OS
should provide some big hitters, like those we had with the database
low-hanging fruits. During our discussion of low-hanging fruits for
the database, we found that items reducing I/O, such as block size
and locally managed tablespaces, made big improvements. So the goal
is to find a Linux technique to reduce the I/O. That's when it hit
me: there's a dirt simple way to cut the I/O in half. By default,
Linux updates the last-access-time (atime) attribute of any file
during a read operation. It updates the corresponding last-modified
time on writes too, but that makes sense. We really don't care when
Oracle reads its data files, so we can turn that off. This is
known as setting the noatime file attribute (a similar setting
exists for Windows 2000 and Windows NT).
If you want to do it for only the Oracle data files, the command is
chattr +A file_name
If you want to do an entire directory, the command is
chattr -R +A directory_name
But the best method would be to edit /etc/fstab, and for each
entry, add the noatime keyword to the filesystem parameter list
(i.e., the fourth column). This ensures that the entire set of
filesystems benefits from this technique and, more importantly,
that the settings persist across reboots. The results are
amazing: improvements of nearly 50% for loads and 8% for the
TPS, as shown in table OS4.
OS4: noatime file attribute
| TPC Results           |        |
| Load Time (Seconds)   | 5.58   |
| Transactions / Second | 13.884 |
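The /etc/fstab edit described above can be sketched as a small filter that appends noatime to the fourth column of each entry. The function name add_noatime is hypothetical, and skipping swap and proc entries is my assumption (the option has no meaning for them), not something the measurements above depended on.

```shell
# Print a copy of an fstab file with "noatime" appended to the options
# (fourth column) of every entry that does not already carry it.
add_noatime() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/     { print; next }  # comments, blank lines
        NF < 4                       { print; next }  # malformed: leave alone
        $3 == "swap" || $3 == "proc" { print; next }  # assumption: skip these
        $4 !~ /(^|,)noatime(,|$)/    { $4 = $4 ",noatime" }
        { print }
    ' "$1"
}
```

A cautious way to use it is to write the result to a scratch file, for example add_noatime /etc/fstab > /tmp/fstab.new, compare the two files, and only then copy the new one into place. Lines that awk modifies come out with single spaces between columns, which mount accepts.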
Another area that comes to mind regarding I/O is the Linux virtual
memory subsystem. And as is the beauty of Linux, that too is
controllable. We simply need to edit the /etc/sysctl.conf file
and add an entry to improve filesystem performance, as follows:
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
Where, according to /usr/src/linux/Documentation/sysctl/vm.txt:
The first parameter, 100 (a percentage): governs the maximum number
of dirty buffers in the buffer cache. Dirty means that the contents
of the buffer still have to be written to disk, as opposed to a
clean buffer, which can just be forgotten about. Setting this to a
high value means that Linux can delay disk writes for a long time,
but it also means that it will have to do a lot of I/O at once when
memory becomes short. A low value will spread out disk I/O more
evenly.
The second parameter, 1200 (ndirty): gives the maximum number of
dirty buffers that bdflush can write to the disk at one time. A
high value will mean delayed, bursty I/O, while a small value can
lead to memory shortage when bdflush isn't woken up often enough.
The third parameter, 128 (nrefill): is the number of buffers that
bdflush will add to the list of free buffers when refill_freelist()
is called. It is necessary to allocate free buffers beforehand, as
the buffers often are of a different size than the memory pages,
and some bookkeeping needs to be done beforehand. The higher the
number, the more memory will be wasted and the less often
refill_freelist() will need to run.
The fourth parameter, 512 (nref_dirt): when refill_freelist() comes
across more than nref_dirt dirty buffers, it will wake up bdflush.
The sixth and seventh parameters, 5000 (age_buffer, 50*HZ) and 500
(age_super, 5*HZ): govern the maximum time Linux waits before
writing out a dirty buffer to disk. The value is expressed in
jiffies (clock ticks); the number of jiffies per second (HZ) is 100.
age_buffer is the maximum age for data blocks, while age_super is
for filesystem metadata.
The fifth parameter, 15, and the last two, 1884 and 2: are unused by
the system, so we don't need to change their defaults.
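As a quick cross-check of the list above, the nine values can be paired with their names and the two age values converted from jiffies to seconds. The function name describe_bdflush is hypothetical; the name nfract for the first field is what I recall from the 2.2 kernel documentation, and the others are the names quoted above. HZ=100 is assumed, as stated in the text.

```shell
# Label the nine vm.bdflush fields and convert the two age values
# from jiffies to seconds (assumes HZ=100, as the text states).
describe_bdflush() {
    hz=100
    set -- $1   # split the value string into positional parameters
    echo "nfract=$1 ndirty=$2 nrefill=$3 nref_dirt=$4"
    echo "age_buffer=$(( $6 / $hz ))s age_super=$(( $7 / $hz ))s"
    echo "unused=$5,$8,$9"
}

describe_bdflush "100 1200 128 512 15 5000 500 1884 2"
# prints:
#   nfract=100 ndirty=1200 nrefill=128 nref_dirt=512
#   age_buffer=50s age_super=5s
#   unused=15,1884,2
```

So with these settings a dirty data buffer may wait up to 50 seconds before being written out, and filesystem metadata up to 5 seconds.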
The performance
improvements were 26% for loads and 7% for TPS. That brings our
final results to less than 5 seconds to load what took 50
seconds and nearly double the TPS rate. And remember, we never
had to monitor anything; these were the no-brainer or
low-hanging fruit improvements.
OS5: bdflush settings
| TPC Results           |        |
| Load Time (Seconds)   | 4.43   |
| Transactions / Second | 14.988 |
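The load-time percentages quoted with tables OS2 through OS5 appear to be the ratio of the old time to the new time, not the fraction of time saved. That reading is an inference from the numbers, not something stated outright, and the helper name ratio_pct is hypothetical. A small sketch reproduces the figures from the tables:

```shell
# Percentage by which a exceeds b: (a / b - 1) * 100, one decimal place.
ratio_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# Load time: old seconds first, new seconds second.
ratio_pct 9.54 9.40   # OS1 -> OS2, prints 1.5
ratio_pct 9.40 8.32   # OS2 -> OS3, prints 13.0
ratio_pct 8.32 5.58   # OS3 -> OS4, prints 49.1
ratio_pct 5.58 4.43   # OS4 -> OS5, prints 26.0
```

For transactions per second, where higher is better, the same function applies with the new rate first and the old rate second.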
The summarized
results were as follows:

|