Robin Schumacher is the author of
Oracle
Performance Troubleshooting by Rampant TechPress
Exploring the Oracle Database 10g Wait Interface
by Robin Schumacher
With the new and enhanced wait interfaces built into the Oracle
kernel, DBAs who choose to explore beyond 10g's performance-tuning recommendations should be able to more quickly
get to the bottom of performance issues
In the pursuit of optimal database performance, Oracle
professionals have been slowly migrating away from performance
analysis techniques that focused on blanket system ratios to a
methodology that instead zeros in on where a system, application,
or session is spending its time. Referred to as wait-based
analysis or bottleneck analysis, its goal is to uncover the true
nature of performance degradation by determining the exact causes
of throughput hindrances.
The wait interface of Oracle was introduced many versions ago, and
with each new release, Oracle has added enhancements to the wait
interface that helps DBAs better understand where database
bottlenecks are occurring. Oracle Database 10g is no
exception, and in fact, it offers perhaps the most impressive set
of new additions to the wait interface ever. 10g also
provides a new built-in diagnostic monitor (ADDM) and other tools
that survey, among other things, how wait events affect SQL
performance and more.
Let's take a look at these new additions and explore how, if
you're a DBA, you can practically use them to troubleshoot the
performance of an ailing database.
A Wait Interface
Primer
When a database is up and running, every connected process is
either busy performing work or waiting to do so. A process that is
waiting may mean nothing in the overall scheme of things—the user
may have left to get a cup of coffee, for example—or it can be an
indicator that a database bottleneck exists. This is where
wait-based or bottleneck analysis comes into play.
If latch contention or heavy table-scan activity has been dragging
a database's performance down, you can use the wait interface to
confirm the actual root cause. Once one or more wait events or
other bottlenecks have been pinpointed as possible performance
vampires, you can drill down and often discover a fair amount of
detail about the sessions and objects that are causing the
problem.
How do you correctly practice bottleneck or wait-based analysis?
First, it is imperative that you obtain not only the number of
waits for a particular event but also the times involved for each
wait. Having both lets you form a complete picture regarding the
magnitude of wait-initiated performance degradation. (Of course,
the CPU may be the actual bottleneck, but that's a subject for a
different article.)
The next prerequisite to using bottleneck analysis is that certain
wait events should sometimes be filtered out of any metrics used
to diagnose performance bottlenecks. For example, Oracle records a
wait statistic that represents how long a particular user sits at
the SQL tool prompt between every issued database request and the
next one. Such a statistic provides no real value to a DBA who is
trying to figure out where a database bottleneck exists but is
useful for determining if a bottleneck is inside or outside the
database. Any SQL scripts that are used to collect database wait
statistics can exclude such events.
Depending on the database engine, there are several levels of
detail you can work through when collecting wait statistics. The
first level is the system view, which provides a global,
cumulative snapshot of all the waits that have occurred on a
system. Viewing these numbers can help you determine which wait
events have caused the most commotion in a database thus far.
After looking at system-level wait activity, you can drill down
further to discover which current connections may be responsible
for any reported waits that have been observed at the system
level. Such a drill-down, for example, can indicate the processes
responsible for most of the index scan waits reported in a global
system overview query. For systems that have been up for an
extended period, it's hard to spot recent wait activities or
trends. This is where it's nice to be able to automatically
capture snapshots of wait-based metrics and view the delta
measurements for the time periods you're interested in.
Standard
Wait-Interface Queries
Whether you are using Oracle Database 10g or Oracle7, there
are some SQL queries you can use to broadly interrogate Oracle's
wait interface. To get a global handle on wait activity in a
database, you can use the query in
Listing 1, which prioritizes
waits by their wait time and filters out most "idle" waits. Its
result is in Figure 1.
Drilling down further, you can get an idea of historical and
current wait activity for currently logged-on sessions. For
metadata regarding historical wait occurrences, you can use the
query in
Listing 2, which prioritizes
session waits by the session waiting the longest. The result
appears in Figure 2.
Note that the query in
Listing 2 shows the Oracle
background processes, which you may not want for a system that has
been up for a long time. You can alter the query to show just user
processes, by removing the references to v_$bgprocess and
v_$process and including the filter 'and
b.username is not NULL'
in each of the inline views.
10g Wait
Interface Enhancements
Oracle Database 10g has introduced several new wait-based
performance views and has also added new wait columns to several
existing performance views. The most interesting new performance
views include
-
V$EVENT_HISTOGRAM
-
V$SYSTEM_WAIT_CLASS
-
V$WAITCLASSMETRIC AND V$WAITCLASSMETRIC_HISTORY
-
V$SESSION_WAIT_CLASS
-
V$SESSION_WAIT_HISTORY
-
V$SESS_TIME_MODEL
-
V$SYS_TIME_MODEL
-
V$ACTIVE_SESSION_HISTORY
Interesting wait-based enhancements made to existing performance
views include
-
V$EVENT_NAME
-
V$SESSION
-
V$SESSION_WAIT
-
V$SQLAREA
New Performance Views
The V$EVENT_HISTOGRAM view contains a histogram of system waits
over various wait duration intervals. For example, it might show
that 100 latch-free events occurred where the wait time was 16 to
32 milliseconds and that 200 latch-free events occurred where the
wait time was 32 to 64 milliseconds. You can use the following
query (which filters out idle events) to view this type of
information:
select EVENT,
WAIT_TIME_MILLI,
WAIT_COUNT
from V$EVENT_HISTOGRAM
where EVENT IN
(select NAME
from V$EVENT_NAME
where CLASS not in ('Idle'))
order by 1,2
The output is shown in Figure 3.
Oracle has provided broad wait "classes" in Oracle 10g at
the system and session level that help you identify wait activity
in less-cryptic terms than has been the case in the past. These
new categories (or classes) are
-
Administrative
-
Application
-
Cluster
-
Commit
-
Concurrency
-
Configuration
-
Idle
-
Network
-
Other
-
Scheduler
-
System I/O
-
User I/O
The first new view that makes use of these new classes is
V$SYSTEM_WAIT_CLASS. This view simply shows the wait times and
counts across the new wait classes. A query you can use is
select WAIT_CLASS,
TOTAL_WAITS,
round(100 * (TOTAL_WAITS / SUM_WAITS),2) PCT_WAITS,
TIME_WAITED,
round(100 * (TIME_WAITED / SUM_TIME),2) PCT_TIME
from
(select WAIT_CLASS,
TOTAL_WAITS,
TIME_WAITED
from V$SYSTEM_WAIT_CLASS
where WAIT_CLASS != 'Idle'),
(select sum(TOTAL_WAITS) SUM_WAITS,
sum(TIME_WAITED) SUM_TIME
from V$SYSTEM_WAIT_CLASS
where WAIT_CLASS != 'Idle')
order by 5 desc
which results in Figure 4.
This is a particularly nice query, in that it helps you get an
idea of overall wait problems. For example, in Figure 4, you can
see that 73 percent of all the wait time can be attributed to
system I/O chores. Because system I/O may not affect user
activity, you can exclude it from the query if you want.
To get a session summary by using the new wait class categories,
you can use V$WAITCLASSMETRIC and V$WAITCLASSMETRIC_HISTORY. The
V$WAITCLASSMETRIC view shows you a summary of session wait
activity for the last minute, and V$WAITCLASSMETRIC_HISTORY gives
you a historical retrospective of the last hour, in one-minute
intervals. Suppose you want to look back over the last hour to see
the number of user sessions waiting each minute for user I/O,
along with the wait counts and times. You can use a query like
this:
select BEGIN_TIME,
END_TIME,
NUM_SESS_WAITING,
A.WAIT_COUNT,
A.TIME_WAITED
from V$WAITCLASSMETRIC_HISTORY A,
V$SYSTEM_WAIT_CLASS B
where A.CLASS_ID = B.WAIT_CLASS# and
B.WAIT_CLASS = 'User I/O'
order by 1
whose result is seen in Figure 5.
New Session Wait Views
Oracle Database 10g contains several new views that focus
on session wait activity. One of them is V$SESSION_WAIT_CLASS.
This view is helpful in identifying whether the session is bound
mostly by user I/O and the like. You can get a complete overview
of all logged-on sessions by using the query below or focus on
just one session by adding an extra clause onto the query that
specifies a particular SID:
select A.SID,
B.USERNAME,
A.WAIT_CLASS,
A.TOTAL_WAITS,
A.TIME_WAITED
from V$SESSION_WAIT_CLASS A,
V$SESSION B
where B.SID = A.SID and
B.USERNAME IS NOT NULL
order by 1,2,3
The result is shown in Figure 6.
The V$SESSION_WAIT_HISTORY view gives you the last 10 wait events
for each session, along with their wait time and other
information. A query to get the last 10 wait events for all user
sessions might be
select A.SID,
B.USERNAME,
A.SEQ#,
A.EVENT,
A.WAIT_TIME
from V$SESSION_WAIT_HISTORY A,
V$SESSION B
where A.SID = B.SID and
B.USERNAME IS NOT NULL
order by 1,3
See Figure 7 for the result.
Next, we come to V$SESS_TIME_MODEL. Ever wonder if a session that
was executing a large procedure was spending most of its time on
PL/SQL logic or just straight SQL execution? Well, wonder no more.
Although not a "wait" view in the strict sense, because it
includes more than wait metrics, the V$SESS_TIME_MODEL view
provides a window into
-
DB CPU
-
DB time
-
Java
execution elapsed time
-
PL/SQL
compilation elapsed time
-
PL/SQL
execution elapsed time
-
Application
wait time
-
Background
CPU time
-
Bind/define
call elapsed time
-
Cluster wait
time
-
Concurrency
wait time
-
Connection
management call elapsed time
-
Failed parse
elapsed time
-
Global cache
CR block receive time
-
Global cache
current block receive time
-
Global cache
get time
-
Hard parse
(bind mismatch) elapsed time
-
Hard parse
(sharing criteria) elapsed time
-
Hard parse
elapsed time
-
Inbound
PL/SQL RPC elapsed time
-
Parse time
elapsed
-
Sequence load
elapsed time
-
SQL execute
elapsed time
-
User I/O wait
time
Let's say you have a user who is complaining about poor response
times. Once you grab the SID for the user session, you can use a
query such as this to find out exactly what areas of work are
causing the degradation:
select A.SID,
B.USERNAME,
A.STAT_NAME,
ROUND((A.VALUE / 1000000),3) TIME_SECS
from V$SESS_TIME_MODEL A,
V$SESSION B
where A.SID = B.SID and
B.SID = < input SID here >
order by 4 DESC
The result is shown in Figure 8.
You can also obtain the global time values for the entire database
by querying V$SYS_TIME_MODEL:
select STAT_NAME,
ROUND((VALUE / 1000000),3) TIME_SECS
from V$SYS_TIME_MODEL
order by 2 DESC
See Figure 9 for the result.
Saving the new performance "whopper" view for last, we come to
V$ACTIVE_SESSION_HISTORY. Imagine Oracle taking snapshots of
session activity once a second without seriously affecting system
response time, and that's what you have in
V$ACTIVE_SESSION_HISTORY. The view mirrors a lot of what is in
V$SESSION but contains a SAMPLE_TIME column that allows you to
look back in time. There are a lot of things you can do with this
view, but I'll focus on one or two aspects that should be of
interest to you.
For a long time, the only way DBAs could get SQL statement
information correlated with wait events was through Oracle's
extended 10046 trace. The new Oracle 10g
V$ACTIVE_SESSION_HISTORY view now makes it possible to get this
type of data—past and present—through queries alone. Suppose you
have identified a session that is experiencing delays or "hangs"
and you want to see what SQL statement(s) the session is issuing,
along with the wait events being experienced for a particular time
period. You can issue a query similar to this:
SELECT C.SQL_TEXT,
B.NAME,
COUNT(*),
SUM(TIME_WAITED)
FROM V$ACTIVE_SESSION_HISTORY A,
V$EVENT_NAME B,
V$SQLAREA C
WHERE A.SAMPLE_TIME BETWEEN '29-JAN-04 02:57:00 PM' AND
'29-JAN-04 02:59:00 PM' AND
A.EVENT# = B.EVENT# AND
A.SESSION_ID= 36 AND
A.SQL_ID = C.SQL_ID
GROUP BY C.SQL_TEXT, B.NAME
See the result in Figure 10.
In the above example, the session had issued a single SQL
statement that caused 1 table scan (db file scattered read) and 13
index scans (db sequential reads are often index scans), which
took a considerable amount of time. Want the actual objects
involved in the waits or database files? You can join the
CURRENT_OBJ# column in V$ACTIVE_SESSION_HISTORY to DBA_OBJECTS to
get the objects or CURRENT_FILE# to DBA_DATA_FILES to get the
files.
To be fair to the 10046 trace, V$ACTIVE_SESSION_HISTORY does not
catch session activity that's extremely fast, but it should catch
activity that causes most waits and resource consumption and will
therefore be useful to you. And statistically, it does catch
extremely fast operations if they occur sufficiently often to
contribute to user time.
Enhancements to
Existing Performance Views
The first notable wait-related change to existing performance
views occurs in V$EVENT_NAME, where a wait class column is
included to identify which wait events correspond to which wait
classes. This helps you see, for example, that a db file scattered
read belongs to the User I/O wait class.
The next change can be found in V$SESSION, where several wait
event columns have been added so you can see the current wait
event being experienced by the session, along with pertinent
wait-related information. This often removes the requirement to
join V$SESSION and V$SESSION_WAIT. This join gets written a lot in
systems preceding Oracle 10g.
In the V$SESSION_WAIT view, you'll find a wait class column that
can help you identify which wait class each individual wait event
belongs to.
By far the most interesting enhancement I've found is the new wait
columns added to the V$SQLAREA view, which houses SQL statement
execution information. Six new wait-related and timing columns
have been added:
-
APPLICATION_WAIT_TIME
-
CONCURRENCY_WAIT_TIME
-
CLUSTER_WAIT_TIME
-
USER_IO_WAIT_TIME
-
PLSQL_EXEC_TIME
-
JAVA_EXEC_TIME
The new columns can definitely help you determine the amount of
time, say, that a procedure spends in PL/SQL code versus standard
SQL execution and whether the statement experienced any particular
user wait time.
Conclusion
Those practicing wait-based or bottleneck performance analysis
have a lot to cheer about in Oracle Database 10g. Armed
with the new and enhanced wait interfaces built into the Oracle
kernel, DBAs who choose to explore beyond 10g's
performance-tuning recommendations should be able to more quickly
get to the bottom of performance issues and properly focus their
tuning efforts on what is most likely to produce a tangible system
benefit.
Listing 1
select event,
total_waits,
round(100 * (total_waits / sum_waits),2) pct_waits,
time_wait_sec,
round(100 * (time_wait_sec /
greatest(sum_time_waited,1)),2)
pct_time_waited,
total_timeouts,
round(100 * (total_timeouts /
greatest(sum_timeouts,1)),2)
pct_timeouts,
average_wait_sec
from
(select event,
total_waits,
round((time_waited / 100),2) time_wait_sec,
total_timeouts,
round((average_wait / 100),2) average_wait_sec
from sys.v_$system_event
where event not in
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'rdbms ipc reply',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep',
'jobq slave wait',
'Queue Monitor Wait',
'wakeup time manager',
'PX Idle Wait') AND
event not like 'DFS%' AND
event not like 'KXFX%'),
(select sum(total_waits) sum_waits,
sum(total_timeouts) sum_timeouts,
sum(round((time_waited / 100),2)) sum_time_waited
from sys.v_$system_event
where event not in
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'rdbms ipc reply',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep',
'jobq slave wait',
'Queue Monitor Wait',
'wakeup time manager',
'PX Idle Wait') AND
event not like 'DFS%' AND
event not like 'KXFX%')
order by 4 desc, 1 asc
Listing 2
SELECT sid,
username,
event,
total_waits,
100 * round((total_waits / sum_waits),2)
pct_of_total_waits,
time_wait_sec,
total_timeouts,
average_wait_sec,
max_wait_sec
FROM
(SELECT a.event,
b.sid sid,
decode (b.username,null,c.name,b.username) username,
a.total_waits total_waits,
round((a.time_waited / 100),2) time_wait_sec,
a.total_timeouts total_timeouts,
round((average_wait / 100),2)
average_wait_sec,
round((a.max_wait / 100),2) max_wait_sec
FROM sys.v_$session_event a,
sys.v_$session b,
sys.v_$bgprocess c,
sys.v_$process d
WHERE a.event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
AND a.event NOT LIKE 'DFS%'
AND a.event NOT LIKE 'KXFX%'
AND a.sid = b.sid
AND d.addr = b.paddr
AND c.paddr (+) = b.paddr
),
(select sum(total_waits) sum_waits
FROM sys.v_$session_event a,
sys.v_$session b
WHERE a.event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net more data from client',
'SQL*Net message to client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
AND a.event NOT LIKE 'DFS%'
AND a.event NOT LIKE 'KXFX%'
AND a.sid = b.sid)
order by 6 desc, 1 asc
|