Becoming an access path guru requires time spent learning how optimization parameters, statistics and hints affect SQL access paths and statement performance. This article will provide you with a few hints and tips to help you begin your scientific analysis of the Oracle optimization process.
In my experience, most good tuners share a common set of traits: they are inquisitive by nature, understand that there is no substitute for experience, and dedicate a great deal of time to the scientific analysis of SQL statements and database performance.
Our objective is to identify the effects that initialization parameters, statistics and hints have on access paths and, ultimately, on SQL performance. You also need to spend some time changing a statement's access path and noting the impact that different access paths have on performance.
A well-known tuning guru, Jonathan Lewis, uses terms like "scientific analysis" in his discussions of SQL and database tuning. It's a very appropriate description of his activities. After reading many of his works, I would describe him as a database performance scientist. He identifies something he wants to learn more about, creates a test environment, executes a series of tests, analyzes the data and provides documented results to reinforce his theories.
That's what we all MUST DO to expand our knowledge of the Oracle optimization process. We need to become database performance scientists. We do that by creating a test environment, running and documenting performance baselines, changing the environment to influence the statement's access path, and documenting the results.
Setting up a Test Environment
Running test cases to identify the effects that different database environment settings have on a statement's access path is not as monumental a task as it may seem.
Most production databases have a test counterpart. Select an environment that is actively used by your application developers. Don't worry: if one of our tests causes a statement to "run longer than anticipated", we can use the trusty ALTER SYSTEM KILL SESSION command to kill our session (a quick example follows below). We can also run our workloads during lull times. Lastly, the majority of changes we will make to influence the statement's access path will be made in our own session; we won't be making changes that affect the entire database environment.
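As a sketch, here is one way to identify and then terminate a runaway test session. The substitution variables are placeholders, and the ALTER SYSTEM privilege (or a friendly DBA) is assumed:

-- Find the session running the long test query
select sid, serial#, username, status
  from v$session
 where username = '&test_username';

-- Terminate it (requires the ALTER SYSTEM privilege)
alter system kill session '&sid,&serial#';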
I prefer to use an active test
environment because it allows me to
easily select SQL statements to use
as test cases, the data is usually
more representative of what will be
found in production, and the
developers will most likely have a
firm understanding of the active
workload being executed.
You need to talk to the developers who are responsible for running workloads on that test environment to ensure that the changes being made to the test data don't skew your results from one test execution to the next. You can't run a test, have a load job insert another 100,000 rows into the table, run another test and expect a fair comparison. You want your test bed to be active, but not so active that it complicates your testing process or causes your tests to generate misleading results. Use common sense when selecting the test environment as well as when determining the best time to run your test cases.
We also want to choose a test
environment that is providing
adequate performance for the
workloads being executed upon it. We
really don’t want to use an
environment that isn’t performing
well to begin with.
There are myriad options available to you. If you don't want to impact any of your shared test environments, create a test environment of your own: clone one of your test databases to a sandbox environment if you can.
You'll also want to make sure that statistics are up to date on the database you will be using as a test bed. If you are running a 10g or 11g database, it will gather statistics for you automatically through its scheduled maintenance job (isn't 10g/11g great?). If statistics aren't gathered automatically, it will be up to you to gather them on the data objects to ensure that the statistics accurately represent what is stored in the data structures.
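If you do need to gather statistics yourself, a minimal sketch using the DBMS_STATS package might look like the following; the owner and table name are placeholders, and cascade => TRUE simply gathers the index statistics along with the table's:

-- Gather statistics for one table and its indexes
begin
  dbms_stats.gather_table_stats(
    ownname => '&table_owner',
    tabname => '&table_name',
    cascade => TRUE);
end;
/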
Documenting the Parameters That Affect the Optimization Process
The next step is to document the environment. There are a couple of dozen parameters that affect optimization and SQL statement performance. To begin, we are going to choose the basic parameters that are easy to change and have the biggest impact on optimization. These are not all of the parameters that can influence the optimization process, just the ones that give us the best chance of successfully achieving an access path change.
It is important that we first read the documentation for these parameters for the specific Oracle release we are using in our test environment. Each Oracle release may contain enhancements that change both the effect a parameter has on the optimization process and the values we are able to set it to.
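If you're not sure exactly which release your test database is running, a quick query against v$version will tell you before you dive into the documentation:

select banner from v$version;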
We’ll want to document the
following parameters to begin our
scientific analysis:
optimizer_features_enable
- This parameter allows you
to make the optimizer behave as it
would for a specific release of the
Oracle database. You set the value
to a release identifier (the listing
of the optimizer releases that you
can set this parameter to is
provided in the Reference Manual)
and the optimizer will act as it
would if the database were at that
release.
cursor_sharing – For our initial tests, we'll hardcode values in our selection criteria to ensure that our statements aren't affected by cursor sharing. We will review cursor sharing, and the impact it has on statements that use bind variables, in depth in upcoming blogs.
optimizer_index_caching
– Administrators will often set this
parameter in conjunction with
optimizer_index_cost_adj to
influence the optimizer to use more
indexes and Nested Loop joins.
Setting this parameter makes Nested
Loop joins look less expensive when
compared to Hash or Sort-Merge
joins.
optimizer_index_cost_adj
- Allows the administrator
to make index access more, or less,
costly to the optimizer. The default
value of 100 tells the optimizer to
evaluate the index using the regular
cost. The lower you set this value,
the less costly the indexes will
look to the optimizer. Conversely,
the higher you set this parameter,
the more expensive the indexes will
look.
db_file_multiblock_read_count - The number of blocks that Oracle will read in a single I/O when performing a scan of data (i.e. a table scan). Be aware that in 10g/11g, Oracle will choose an optimal value for this parameter on our behalf and may adjust it based on the database workload.
optimizer_mode - Sets the approach the optimizer will use when analyzing queries. Since there have been a few changes between Oracle 9i and Oracle 10g, I'll provide information on both sets of values.
optimizer_use_pending_statistics - A new parameter (and feature) in Oracle Database 11g. In previous releases, when you gathered optimizer statistics, the statistics were automatically published as soon as the gather completed. 11g gives administrators the option of separating statistics gathering from statistics publishing. This feature allows us to test newly gathered statistics before they are published.
optimizer_capture_sql_plan_baselines and optimizer_use_sql_plan_baselines – Another new 11g feature that we will be testing in the near future. When these two parameters are set, the 11g optimizer will look for a SQL plan (access path) baseline for the SQL statement being optimized. If one is found in the SQL Management Base, the optimizer will review the data access costs of the baseline plans and pick the one with the lowest cost.
Oracle 9i provides the following settings for optimizer_mode:
choose - Allows the optimizer to switch between rule-based and cost-based optimization. When optimizer_mode is set to CHOOSE, the optimizer uses the all_rows cost-based approach for a SQL statement if there are statistics in the dictionary for at least one table accessed in the statement. If you generate statistics on one table, every query that accesses that table will use the cost-based optimizer. What happens if other tables in the query do not have statistics collected? The optimizer will make an educated guess at the statistics for those tables. The problem is that Oracle isn't always a good statistics guesser, and the end result is often a "less than optimal" access path.
first_rows -
Influences the optimizer to choose
an access path that minimizes
response time. Most often used for
online transaction processing
systems that return small result
sets. The optimizer favors Nested
Loop joins and index access.
rule - The optimizer will use the rule-based approach during analysis. The rule-based optimizer does not use data statistics as input when generating the access path. Each access path is assigned a numerical ranking, and the rule-based optimizer chooses the access path that has the most favorable ranking. The rule-based optimizer has been superseded by the cost-based approach. There are a few cases where I have seen the rule-based optimizer choose a better access path than the cost-based method – but not many.
all_rows - Influences the optimizer to choose an access path that minimizes total statement execution time (best overall throughput). Most often used for decision support and data warehouse environments. The optimizer tends to favor full table scans and Hash and Sort-Merge joins.
Oracle 10g/11g settings for optimizer_mode:
first_rows - Has the same influence on the optimizer as it did in Oracle 9i.
first_rows_n - Where "n" is 1, 10, 100 or 1000. Influences the optimizer to optimize queries for the fastest response when returning the first "n" rows. Acts as a throttle that allows you to better balance the optimization process.
all_rows – Has the same influence on the optimizer as it did in Oracle 9i.
We will use the ALTER SESSION SQL statement to alter these parameters during our scientific analysis of the effects they have on the Oracle optimization process.
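As a sketch of what those session-level changes might look like, you could first capture the current settings and then override a couple of them for your session only. The values below are arbitrary test values, not recommendations:

-- Document the current settings for the parameters we care about
select name, value, isdefault
  from v$parameter
 where name in ('optimizer_mode',
                'optimizer_index_cost_adj',
                'optimizer_index_caching',
                'db_file_multiblock_read_count',
                'cursor_sharing',
                'optimizer_features_enable');

-- Change them for this session only
alter session set optimizer_mode = first_rows_10;
alter session set optimizer_index_cost_adj = 50;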
Documenting Our Test Tables
After we document some of the parameters that affect optimization, let's turn our attention to documenting the tables we will be accessing. In my next blog, I'll provide you with a few hints and tips on how to select or create SQL statements to use in your test cases, but for now let's continue our discussion of documentation.
The following information will provide you with a good base of knowledge about the data objects our statements will be accessing. Since we are just beginning our scientific analysis, we'll use basic storage objects (tables and B-tree indexes). We'll discuss some of the more complex objects (bitmap indexes, partitioning, etc.) in later blogs.
Row counts for all tables accessed by our test queries can be found in the num_rows column of dba_tables, and the number of blocks each table is using can be found in the blocks column of the same view, provided we have statistics generated for our tables. The query below returns both figures.
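For example, substituting your own owner and table name (last_analyzed simply confirms when statistics were last gathered):

select owner, table_name, num_rows, blocks, last_analyzed
  from dba_tables
 where owner = '&table_owner'
   and table_name = '&table_name';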
We'll also want a listing of all indexes on our tables. The query below will provide you with a listing of the indexes for a given table:
select b.index_name, b.column_name, a.uniqueness, b.column_position
  from sys.dba_indexes a,
       sys.dba_ind_columns b
 where a.owner = b.index_owner
   and a.index_name = b.index_name
   and a.table_owner = '&table_owner'
   and a.table_name = '&table_name'
 order by b.index_name, b.column_position
/
Once we find all of the indexes and the columns in those indexes, let's check them for both selectivity and data skew. Selectivity refers to the number of distinct values in a column relative to the total number of rows; the more distinct values, the more selective the column. Skew means that some values occur only a few times in a column while other values occur many, many times. Since data skew affects optimization, that information will also be important to us.
We can easily find the overall selectivity of a single-column or multi-column index by looking at the distinct_keys column in the dba_indexes view, provided we have statistics generated (a query for this is shown a little further below). For multi-column indexes, we will also want to check the individual selectivity of each column in the index. We can do this with the following query:
select count(distinct index_colname) from owner.table_name;
Where index_colname is one of the columns in our multi-column index, owner is the table owner and table_name is the name of our table. We'll need to do this for all columns in our multi-column indexes.
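To see the composite selectivity the optimizer already has recorded for each index (the distinct_keys figure mentioned above), a quick look at dba_indexes, again assuming statistics exist and with the owner and table name as placeholders, looks something like this:

select index_name, distinct_keys, num_rows, last_analyzed
  from dba_indexes
 where table_owner = '&table_owner'
   and table_name = '&table_name';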
We’ll use the following queries
to identify data skew and find out
some information on histograms:
select index_colname, count(*)
from owner.table_name
group by
index_colname;
Where index_colname is the column name, owner is the table owner and table_name is the name of our table. We'll need to do this for all columns in our indexes, both single-column and multi-column.
select * from dba_tab_histograms
 where owner = '&owner'
   and table_name = '&table_name';
Where owner is the table owner and table_name is the name of our table. We'll need to do this for all tables that our queries will be accessing.
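If statistics are already in place, a quick way to see which columns have histograms and how much skew the optimizer has recorded is a query against dba_tab_col_statistics (available in 10g/11g); the owner and table name are placeholders:

select column_name, num_distinct, density, num_buckets, histogram
  from dba_tab_col_statistics
 where owner = '&owner'
   and table_name = '&table_name';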