Oracle File Global I/O
The first step in unraveling any I/O puzzles in a database is to
make a quick check of some of the global database I/O metrics. A
query such as the
globiostats.sql script
can be used to get a bird's eye view of a database's I/O:
select
name,
value
from
sys.v_$sysstat
where
name in
('consistent changes',
'consistent gets',
'db block changes',
'db block gets',
'physical reads',
'physical writes',
'sorts (disk)',
'user commits',
'user rollbacks'
)
order by
1;
The script queries the
sys.v_$sysstat view and output from the
query might look like the following:
NAME VALUE
------------------------------
consistent changes 1
consistent gets 70983
db block changes 243
db block gets 612
physical reads 11591
physical writes 52
sorts (disk) 0
user commits 26
user rollbacks 1
Although there are some database experts who do not believe the
buffer cache hit ratio is of much value anymore, a cursory check can
still be performed to get an idea of overall disk I/O activity by
using the
buffratio.sql script:
select
100 -
100 *
(round ((sum (decode (name, 'physical reads',
value, 0))
-
sum (decode (name, 'physical reads direct',
value, 0)) -
sum (decode (name,
'physical reads direct (lob)',
value, 0))) /
(sum (decode (name, 'session logical reads',
value, 1))
),3)) hit_ratio
from
sys.v_$sysstat
where
name in
('session logical reads',
'physical reads direct (lob)',
'physical reads',
'physical reads direct');
SEE CODE DEPOT FOR FULL SCRIPTS
This script also queries the sys.v_$sysstat view and some quick items to look for in the statistics include:
§
Increasing numbers of physical reads and a low hit
ratio may indicate insufficient settings for
db_block_buffers or
db_cache_size . The hit ratio
reading, in particular, should be observed over a time period
sufficient to see if the ratio is representative of the database’s
personality. Readings below the normal rule of thumb of 90% can be
OK.
§
High volumes of disk sorts could be indicative of a
too low setting for
sort_area_size (Oracle8i and below) or
unnecessary sort activities. Large numbers of user rollbacks can be
undesirable, since it indicates that user transactions are not
completing for one reason or another.
The
fileio.sql script
listed below shows an example select statement to generate both
regular and temporary file IO.
-- *************************************************
-- Copyright © 2005 by Rampant TechPress
-- This script is free for non-commercial purposes
-- with no warranties. Use at your own risk.
--
-- To license this script for a commercial purpose,
-- contact info@rampant.cc
-- *************************************************
rem
rem NAME: fileio.sql
rem
rem FUNCTION: Reports on the file io status of all of the
rem FUNCTION: datafiles in the database.
rem
column sum_io1 new_value st1 noprint
column sum_io2 new_value st2 noprint
column sum_io new_value divide_by noprint
column Percent format 999.999 heading 'Percent|Of IO'
column brratio format 999.99 heading 'Block|Read|Ratio'
column bwratio format 999.99 heading 'Block|Write|Ratio'
column phyrds heading 'Physical | Reads'
column phywrts heading 'Physical | Writes'
column phyblkrd heading 'Physical|Block|Reads'
column phyblkwrt heading 'Physical|Block|Writes'
column name format a45 heading 'File|Name'
column file# format 9999 heading 'File'
column dt new_value today noprint
select to_char(sysdate,'ddmonyyyyhh24miss') dt from dual;
set feedback off verify off lines 132 pages 60 sqlbl on trims on
rem
select
nvl(sum(a.phyrds+a.phywrts),0) sum_io1
from
sys.v_$filestat a;
select nvl(sum(b.phyrds+b.phywrts),0) sum_io2
from
sys.v_$tempstat b;
select &st1+&st2 sum_io from dual;
rem
@title132 'File I/O Statistics Report'
spool rep_out\&db\fileio&&today
select
a.file#,b.name, a.phyrds, a.phywrts,
(100*(a.phyrds+a.phywrts)/÷_by) Percent,
a.phyblkrd, a.phyblkwrt, (a.phyblkrd/greatest(a.phyrds,1))
brratio,
(a.phyblkwrt/greatest(a.phywrts,1)) bwratio
from
sys.v_$filestat a, sys.v_$dbfile b
where
a.file#=b.file#
union
select
c.file#,d.name, c.phyrds, c.phywrts,
(100*(c.phyrds+c.phywrts)/÷_by) Percent,
c.phyblkrd, c.phyblkwrt,(c.phyblkrd/greatest(c.phyrds,1))
brratio,
(c.phyblkwrt/greatest(c.phywrts,1)) bwratio
from
sys.v_$tempstat c, sys.v_$tempfile d
where
c.file#=d.file#
order by
1
/
spool off
pause Press enter to continue
set feedback on verify on lines 80 pages 22
clear columns
ttitle off
SEE CODE DEPOT FOR FULL SCRIPTS
The output from the above script is shown below. This report is
important because it shows the percent of total I/O for each
datafile.
Date:
11/09/03
Page: 1
Time: 01:24 PM File I/O
Statistics Report PERFSTAT
testdb
database
Physical Physical Block Block
File Physical
Physical Percent Block Block Read Write
File Name Reads
Writes Of I/O Reads Writes Ratio Ratio
----- --------------------------------------------- ----------
---------- -------- ---------- ---------- ------- -------
1 /data001/oradata/testdb/system01.dbf 27396
2992 53.526 55735 2992 2.03 1.00
1 /data001/oradata/testdb/temp01.dbf 1703
1357 5.390 7184 7177 4.22 5.29
2 /data001/oradata/testdb/undotbs01.dbf 151
18034 32.032 151 18034 1.00 1.00
3 /data001/oradata/testdb/drsys01.dbf 116
107 .393 116 107 1.00 1.00
4 /data001/oradata/testdb/indx01.dbf 117
107 .395 117 107 1.00 1.00
5 /data001/oradata/testdb/tools01.dbf 890
1403 4.039 1137 1403 1.28 1.00
6 /data001/oradata/testdb/users01.dbf 115
107 .391 115 107 1.00 1.00
7 /data001/oradata/testdb/xdb01.dbf 183
107 .511 194 107 1.06 1.00
8 /data001/oradata/testdb/olof_data01.dbf 1045
620 2.933 1242 620 1.19 1.00
9 /data001/oradata/testdb/olof_idx01.dbf 116
107 .393 116 107 1.00 1.00
Another important measurement is the actual timing. On some systems
and some disk subsystems, the I/O timing data can be bad, so it
should always be compared against actual
iostat numbers. An example I/O timing
report is shown below.
Date:
11/21/03
Page: 1
Time: 09:56 AM I/O Timing
Analysis PERFSTAT
testdb
database
FILE#
NAME
PHYRDS PHYWRTS READTIM/PHYRDS WRITETIM/PHYWRTS
----------
----------------------------------------------------------
---------- ---------- -------------- ----------------
5
/oracle/oradata/testdb/tools01_01.dbf
318 153 .377358491 .150326797
1 /oracle/oradata/testdb/system01.dbf
3749 806 .332622033 2.3101737
9
/oracle/oradata/testdb/tcmd_data01_03.dbf
442389 1575 .058064283 6.90095238
8
/oracle/oradata/testdb/tcmd_data01_02.dbf
540596 2508 .057647485 5.11961722
7
/oracle/oradata/testdb/tcmd_data01_01.dbf
14446868 1177 .036516842 2.62531861
10
/oracle/oradata/testdb/tcmd_idx01_02.dbf
15694 5342 .035746145 6.50074878
3
/oracle/oradata/testdb/rbs01_01.dbf
757 25451 .034346103 10.7960002
11 /oracle/oradata/testdb/tcmd_data01_04.dbf
1391 606 .023005032 6.66336634
6 /oracle/oradata/testdb/tcmd_idx01_01.dbf
1148402 10220 .015289942 6.35831703
2 /oracle/oradata/testdb/temp01_01.dbf
34961 8835 0 0
4
/oracle/oradata/testdb/users01_01.dbf
78 76 0 0
11 rows selected.
The output in listing above shows that all of the I/O timing is at
or below 10 milliseconds. Normally, this would be considered to be
good performance for disks; however, most modern arrays can give
sub-millisecond response times by use of caching and by spreading
I/O across multiple platters. While many experts say anything less
than 10-20 milliseconds is good, that was based on old disk
technology. If the disk system is not giving response times that are
at five milliseconds or less, one should consider tuning the I/O
subsystems.
Another interesting statistic is the overall I/O rate for the system
as it relates to Oracle. This is easily calculated using PL/SQL as
shown in the
get_io.sql script
below.
-- *************************************************
-- Copyright © 2005 by Rampant TechPress
-- This script is free for non-commercial purposes
-- with no warranties. Use at your own risk.
--
-- To license this script for a commercial purpose,
-- contact info@rampant.cc
-- *************************************************
set serveroutput on
declare
cursor get_io is select
nvl(sum(a.phyrds+a.phywrts),0) sum_io1,to_number(null)
sum_io2
from sys.gv_$filestat a
union
select
to_number(null) sum_io1, nvl(sum(b.phyrds+b.phywrts),0)
sum_io2
from
sys.gv_$tempstat b;
now date;
elapsed_seconds number;
sum_io1 number;
sum_io2 number;
sum_io12 number;
sum_io22 number;
tot_io number;
tot_io_per_sec number;
fixed_io_per_sec number;
temp_io_per_sec number;
begin
open get_io;
for i in 1..2 loop
fetch get_io into sum_io1, sum_io2;
if i = 1 then sum_io12:=sum_io1;
else
sum_io22:=sum_io2;
end if;
end loop;
select sum_io12+sum_io22 into tot_io from dual;
select sysdate into now from dual;
select ceil((now-max(startup_time))*(60*60*24)) into
elapsed_seconds from gv$instance;
fixed_io_per_sec:=sum_io12/elapsed_seconds;
temp_io_per_sec:=sum_io22/elapsed_seconds;
tot_io_per_sec:=tot_io/elapsed_seconds;
dbms_output.put_line('Elapsed Sec :'||to_char(elapsed_seconds,
'9,999,999.99'));
dbms_output.put_line('Fixed IO/SEC:'||to_char(fixed_io_per_sec,'9,999,999.99'));
dbms_output.put_line('Temp IO/SEC :'||to_char(temp_io_per_sec,
'9,999,999.99'));
dbms_output.put_line('Total IO/SEC:'||to_char(tot_io_Per_Sec,
'9,999,999.99'));
end;
/
SEE CODE DEPOT FOR FULL SCRIPTS
An example of the output from this report is shown below.
SQL> @io_sec
Elapsed Sec : 43,492.00
Fixed IO/SEC: 588.33
Temp IO/SEC : 95.01
Total IO/SEC: 683.34
PL/SQL procedure successfully executed
By examining the total average IO/SEC for the database, one can
determine if the I/O subsystem is capable of handling the load. For
example, if the above listing was for a RAID 10 system
with 10 disks, in a five-way stripe in a two-way mirror array, then
the DBA will know that they don’t have any problems with I/O rate:
(10 DISKS * 110 IO/SEC/DISK = ~1100 IO/SEC max rate)
However, if there are only 6 disks in a RAID5, then
there are probably periods when I/O is saturated ( 5 DISKS * 90
IO/SEC/DISK = ~ 450 IO/SEC max rate). The above I/O rate is an
average, which means that if there is an equal distribution about
the mean, then 50% of the time the I/O rate was higher than this
reported value of 683 IO/SEC.
The other indications of possible I/O related problems with Oracle
are examination of I/O related wait events from the wait interface
of the Oracle kernel. If the STATSPACK reports or home-grown reports
show that any of the following waits are the majority wait on the
system, look at tuning the I/O system in relationship to Oracle:
§
db file sequential read : An event generated
during index and other short term read events.
§
db file scattered read : An event generated
during full-table and index scans.
§
db file parallel write
: An event generated during writes to multiple
extents across multiple datafiles.
§
Log file or control file writes: Events generated
during waits to write to a log or control file.
§
Direct path read or write: Events generated during
hash, sort, global temporary table I/O or other direct operations.
§
LGWR waits: Events generated during writes to the redo
logs.
These events must be reviewed in relationship to their overall
contribution to the total service time. The following is an example
Wait Report in Comparison to CPU Time.
Date: 02/03/04
Page: 1
Time: 08:31 AM System Events
Percent MAULT
testdb database
Percent Percent
Total
Average Of of Total
Event Name Waits Waits Time Waited
Non-Idle Waits Uptime
------------------------------ --------- ------------ ------------
-------------- ---------
CPU used when call started 0 0
3,580,091 52.659 3.8499
db file sequential read 9,434,983 0
1,278,929 18.811 1.3753
enqueue 302 2,899
875,552 12.878 .9415
wait for stopper event to be 1,526 194
295,860 4.352 .3182
increased
db file scattered read 430,041 1
261,103 3.841 .2808
log file parallel write 339 590
199,881 2.940 .2149
db file parallel write 32,240 5
170,070 2.502 .1829
… ------------
sum 6,798,684
So, even if all I/O related wait times in the above report are
eliminated, the service time would only be reduced by about 27%. It
cannot always be assumed that fixing I/O will give large performance
returns. If there are CPU usage issues, adding the fastest disks in
the world may not help performance that much. This directs the focus
to the major tuning point for any Oracle or other database system:
Tune the code first! Make the SQL as optimized as possible for both
Logical and Physical IO, then tackle other issues.
The Wait Report shown above was generated using the following
script.
-- *************************************************
-- Copyright © 2005 by Rampant TechPress
-- This script is free for non-commercial purposes
-- with no warranties. Use at your own risk.
--
-- To license this script for a commercial purpose,
-- contact info@rampant.cc
-- *************************************************
col event format a30 heading 'Event Name'
col waits format 999,999,999 heading 'Total|Waits'
col average_wait format 999,999,999 heading 'Average|Waits'
col time_waited format 999,999,999 heading 'Time Waited'
col total_time new_value divide_by noprint
col value new_value val noprint
col percent format 999.990 heading 'Percent|Of|Non-Idle Waits'
col duration new_value millisec noprint
col p_of_total heading 'Percent|of Total|Uptime' format 999.9999
set lines 132 feedback off verify off pages 50
select to_number(sysdate-startup_time)*86400*1000 duration from
v$instance;
select
sum(time_waited) total_time
from v$system_event
where total_waits-total_timeouts>0
and event not like 'SQL*Net%'
and event not like 'smon%'
and event not like 'pmon%'
and event not like 'rdbms%'
and event not like 'PX%'
and event not like 'sbt%'
and event not in ('gcs remote message','ges remote
message','virtual circuit status','dispatcher timer') ;
select value from v$sysstat where name ='CPU used when call started';
@title132 'System Events Percent'
break on report
compute sum of time_waited on report
spool rep_out/&db/sys_events
select name event,
0 waits,
0 average_wait,
value time_waited,
value/(&÷_by+&&val)*100 Percent,
value/&&millisec*100 p_of_total
from v$sysstat
where name ='CPU used when call started'
union
select event,
total_waits-total_timeouts waits,
time_waited/(total_waits-total_timeouts) average_wait,
time_waited,
time_waited/(&÷_by+&&val)*100 Percent,
time_waited/&&millisec*100 P_of_total
from v$system_event
where total_waits-total_timeouts>0
and event not like 'SQL*Net%'
and event not like 'smon%'
and event not like 'pmon%'
and event not like 'rdbms%'
and event not like 'PX%'
and event not like 'sbt%'
and event not in ('gcs remote message','ges remote
message','virtual circuit status','dispatcher timer')
and time_waited>0
order by percent desc
/
spool off
clear columns
ttitle off
clear computes
clear breaks
SEE CODE DEPOT FOR FULL SCRIPTS
The following section shows how one can track down disk I/O against
specific tables and indexes. This can give great insight into the
internal operations of the Oracle database application.
SEE CODE DEPOT FOR FULL SCRIPTS