Engineering Better PL/SQL
Bert Scalzo
PL/SQL is a great language. It’s relatively
simple to learn, is well integrated with the Oracle database,
and can often be the most efficient way to perform complex or
large-scale database operations. In fact, PL/SQL is so useful
that it’s difficult to believe its origin is SQL*Forms – and
that PL/SQL was once an optional cost add-on to the
database.
However, contrary to logical and reasonable expectations, a
simple-to-learn yet robust language like PL/SQL does not
automatically guarantee programs that are readable,
maintainable, effective (i.e. correct) and efficient. In fact,
some of the worst programs I’ve seen over the past twenty years
of Oracle development were written in PL/SQL. I have often been
quite amazed at just how easily one can “shoot themselves in the
foot” with PL/SQL – and how often it goes undetected until a
major production crisis occurs.
So the question is: how do we engineer better PL/SQL? This paper
will examine some commonly used manual methods and their
shortcomings, and then offer some more scientific advice
for improving the PL/SQL development process. And
while this paper will demonstrate techniques using Quest
Software’s TOAD for Oracle product, the practices espoused
within are actually based on industry standards (which are not
yet universally prevalent – a situation that papers like
this may help to correct).
The Cost of Software Defects
In order to see the full value of engineering better PL/SQL, we
must first understand and appreciate the ramifications of not
doing so. Thus, looking beyond the simple yet highly relevant
fact that our jobs may well depend on it, what does poorly
engineered code cost these days? The answers are quite
staggering. One survey estimates that inferior software
engineering and inadequate testing in 2002 alone cost the US
$59.5 billion.
Now I’ve seen current quotes of about 2.5 million information
technology (IT) workers in the US.
Moreover, it seems like only about half of those working in IT
these days design and write code – so dividing $59.5 billion by
roughly 1.25 million developers translates into nearly
$48,000 in bug costs per PL/SQL developer. What if companies
someday legally held programmers accountable for the costs of
their mistakes (don’t laugh – it’s been discussed before, and
it’s one reason programmers might someday need to unionize)?
I for one would not want to see my salary reduced by that figure!
Of course, that’s really just interesting observation and
speculation. Closer to earth, the facts break down as follows:
• Developers spend 40% of their time fixing software defects
• Between 60% and 70% of the cost of software is attributable to
maintenance
The Failure of PL/SQL “Best Practices”
Most PL/SQL development shops I go to subscribe to some kind of
PL/SQL guidelines and best practices. I often see the popular
PL/SQL series of books on their shelves in a place of reverent
prominence, as well as articles pinned to their cubicle walls.
Moreover many Oracle user groups and conferences are replete
with sessions on novel PL/SQL development guidelines and best
practices. Add to that the many articles, webinars and blogs on
the topic – and it just seems like everyone has fully bought
into this approach.
But how many of us drive the speed limit? That may seem like a
silly question, but isn’t a road sign stating “speed limit 55
MPH” really just a recommendation – what I’d even go so far as
to call a traffic guideline or best practice? Until the
state trooper pulls up behind our car and turns on his flashing
lights, we’re all going to drive as we naturally think is proper
for the circumstances.
Don’t we all slow down for rain and snow as part of our basic
instinct for self-preservation? We don’t go 55 just because the
sign says so. The problem is that many of us also take liberties
in the other direction for various reasons. Plus, herding
instinct has lots of us just zooming along because everyone
else is doing it. So traffic guidelines don’t seem to work all
that well.
Well, writing programs is not all that different. We all may
genuinely have the best of intentions in mind when we start
coding, but it’s a long and laborious process. It’s only natural
to sometimes forget to apply all 900 best practice rules to
every line of code. In fact, it’s quite reasonable to expect
that we’re only generally following the spirit of best
practices. But from the project manager’s perspective, that’s
simply not good enough. We cannot build and roll into production
database applications that merely have good intentions as their
basis. As large and complex as today’s database
applications are, there is just no reasonable way we can or
should expect quality assurance (QA) to catch everything.
Application code should be inherently and intrinsically sound –
with QA merely catching the exceptions.
That’s why I see best practices as basically flawed – because as
a methodology they cannot address the following major concerns
(which will be our benchmark going forward):
• Reliability
Do we have the ideal set of best practice rules identified?
Is everyone following all the prescribed rules all the time?
• Consistency
Is everyone interpreting all the rules the exact same way?
Will different people apply the same rules to different ends?
• Measurability
How do we measure and attribute success of this methodology?
Can we quantify the cost versus savings of this methodology?
• Effectiveness
Will we simply and easily get results of better engineered code?
The Challenges of Code Reviews
Now at the best PL/SQL development shops I’ve worked in, the
project managers have instituted mandatory peer code reviews –
and we really did them. The process generally broke down as
follows:
1. Developer writes their program unit(s)
2. Developer performs basic unit testing (pre-QA)
3. Developer submits code to project manager for code review
4. Project manager schedules 2-4 peers to meet and review code
5. Peer review occurs, with minutes recording recommendations
6. If only minor issues, developer corrects and then moves to QA
7. If major issues, then repeat the entire process to ensure quality
The good news is that under ideal conditions, peer code reviews
can often produce stellar results in terms of code quality. The
bad news however is the increased costs – both in terms of time
and money. This approach is a very person-hour intensive
process. While the results might easily warrant the increased
costs, many shops won’t even give it a try because of that
expense alone. But the peer code review process can work.
I’ve witnessed many shops reduce their production database
errors by an order of magnitude or more. I’ve also seen a few
shops eliminate the need for expensive hardware upgrades through
more efficient coding discovered via code reviews. Sometimes
higher costs should simply be borne.
But even for those shops willing to spend the extra time and
money, peer code reviews have another drawback that’s harder to
quantify – team dynamics. Imagine that you’re a junior developer
submitting your code for review, and the most senior guy on your
team finds a really stupid mistake in your code. How would you
feel as the junior guy? How would you feel as the senior guy?
Without project management’s commitment and good team dynamics,
the peer code review process can strain relations on many teams.
I’ve seen people who’ve needed to iterate their code through the
review process more than a few times – and both the author and
the reviewers were stressed by the affair.
Finally the project manager needs to keep the peer code review
process and iteration counts separate from developer
productivity for issues like annual salary reviews. I’ve seen a
case where the most critical piece of application code took five
iterations even though it was written by our most proficient
PL/SQL developer. Complex logic can easily require more review
iterations – regardless of the person authoring it and their
skill level.
Returning to our benchmark for success, I feel that peer code
reviews also fall short – though only because, as a methodology,
they still cannot adequately address one major concern:
• Consistency
Is everyone participating in code reviews to the fullest of their ability?
Will different people review the same code to different ends?
Software Engineering to the Rescue
Back in the mid-1980s, the US Air Force funded a study at
Carnegie Mellon University’s Software Engineering Institute
(SEI) for the objective evaluation of software.
That study resulted in the publication of “Managing the
Software Process” in 1989, which first introduced a
revolutionary and soon-to-be widely accepted model for organizing
and improving the software development process – known as the
Capability Maturity Model (CMM). The actual CMM v1.0
specification was later published in 1991, then updated
throughout the 1990s, and finally supplanted in 2000 by the
newer Capability Maturity Model Integration (CMMI).
But in a nutshell, both models basically espouse a simple
framework for an effective software development process –
with five levels of accomplishment. People can readily measure
and rate their own software development process against that
framework. Once your maturity rating is known, the model
provides recommendations for improving your development
processes. Below is a very simplified explanation of the five
levels within the CMM/CMMI:
1. Initial – ad hoc processes; success often depends on the competence and heroics of people
2. Repeatable – organization begins using project management to schedule & track costs
3. Defined – true organizational standards emerge & are applied across different projects
4. Managed – management controls all processes via statistical & quantitative techniques
5. Optimizing – agile, innovative, and continuous incremental improvements are applied
While it’s quite interesting to ponder where a shop may lie
within that universe, there is nonetheless an underlying theme
that’s easy to spot here – one can mature one’s software
development processes by implementing project management,
development standards, conformance measurement and ongoing
improvements. That should actually seem quite natural, because
it’s simply what any good project manager would do – even if
they did not know its fancy name (i.e. CMM/CMMI).
But there’s another less obvious aspect implied by the maturity
model – that automation tools to better manage, measure and
improve these processes can aid with the maturation process. How
many project managers could effectively do their job without
software like Microsoft Project or Open Workbench? But how many
developers actually and routinely use tools to automate the
formatting, analysis and tuning of their SQL and PL/SQL code?
And even for those who might, how many do so within a defined,
structured process with standard measurements for effectiveness
and efficiency?
The answer is simply to combine good project management with
development tools that foster and support superior software
engineering techniques – as well as their automation.
Step 1 – Automate PL/SQL Best Practices
Let’s first start by re-examining some of the earlier mentioned
techniques, namely best practices and code reviews. These
techniques did not fail due to any fundamental flaw in their
intent; it’s just that both methods relied heavily on entirely
manual processes – which were not guaranteed to yield reliable
or consistent results. These shortcomings can quite easily be
overcome.
First, you’ll need a robust database development tool like Quest
Software’s TOAD for Oracle to write your SQL and PL/SQL code.
Then you can easily leverage its CodeXpert technology – which
can fully automate both a comprehensive best practices check and
a basic code review.
Second, you’ll need to select a
collection of rules (referred to as a “rule set”) that you’re
going to use to scan your PL/SQL code. If one of the pre-canned
or existing rule sets will not suffice as a possible corporate
standard, then you’ll first need to create a custom rule set. To
do that, you simply press the bottom panel toolbar icon that
looks like a folder with a red flag in it (and is located just
to the right of the rule set selection drop-down).
Third, you now simply invoke CodeXpert to perform a scan – also
known as a review.
Fourth and finally, you simply examine the results of the
CodeXpert scan – and fix all those issues that violate your
standard. Even a few simple lines of PL/SQL can harbor serious
defects – such as opening and never closing a cursor, as the
sketch below illustrates. CodeXpert truly gives you fully
automated, best practice code reviews – that are both reliable
and consistent.
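To illustrate, here is a short, hypothetical procedure (the names
are invented for this example, against the classic EMP demo
table) containing exactly the kind of defect such a scan flags
instantly:

   CREATE OR REPLACE PROCEDURE get_emp_name (p_empno IN NUMBER) IS
      CURSOR c_emp IS
         SELECT ename FROM emp WHERE empno = p_empno;
      v_ename emp.ename%TYPE;
   BEGIN
      OPEN c_emp;
      FETCH c_emp INTO v_ename;
      DBMS_OUTPUT.PUT_LINE('Employee: ' || v_ename);
      -- Defect: c_emp is never closed, so every call leaks an
      -- open cursor - unnoticed until sessions start hitting the
      -- OPEN_CURSORS limit in production
   END get_emp_name;
   /

A human reviewer might well overlook the missing CLOSE; an
automated, rule-based scan never will.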
Step 2 – Automate Individual SQL Statement Tuning
Now this is where development process automation tools like
TOAD’s CodeXpert really begin to pay off. Typical code reviews
focus on the business logic and language constructs – basically
just the items we covered and caught in the previous section. So
there’s often very little time and effort expended on reviewing
the efficiency of the SQL statements within that code, because
developers in code reviews just don’t have enough time to review
every explain plan – as well as the numerous other related and
sometimes intangible tuning issues. Thus, unless the QA process
runs the PL/SQL code in question against a database of
sufficient size, you won’t know a problem exists until the code
has moved into production. So we have a challenge – identifying
what SQL needs tuning.
Furthermore, the science of reading and deciphering an Oracle
explain plan is both highly subjective (based upon people’s SQL
tuning acumen and database object knowledge) and somewhat
enigmatic. With all the various database versions, intricate
optimizer nuances, and the plethora of database object
constructs, the task of finding an optimal explain plan is a
tall order to fill. Far too often this task is simply relegated
to the DBA, either as a late development deliverable or as a
problem to correct during testing. Either way, it makes far
better sense to equip the developer to handle the issue. The
developer knows the business requirements and the code itself
much better. Plus, the developer often has better insight into
code interactions (i.e. how the code in question participates in
the overall application, and what other code it affects or is
affected by). The developer simply needs tools to help them
focus on the true problems, and to find optimal solutions with
minimal effort.
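For reference, here is the standard way a developer generates an
explain plan to decipher in the first place – using the
PLAN_TABLE and the DBMS_XPLAN package shipped with the database
(the query itself is just an illustration against the classic
demo tables):

   EXPLAIN PLAN FOR
      SELECT e.ename, d.dname
        FROM emp e
        JOIN dept d ON d.deptno = e.deptno;

   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Reading the resulting plan correctly – join methods, access
paths, cardinality estimates – is where the subjectivity comes
in, and where automated assistance pays for itself.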
That’s where technology like CodeXpert shines – because it can
fully automate both the finding and fixing of complex,
problematic and invalid SQL statements. It’s also a very simple
four-step process, as detailed below.
First, you need to define what SQL scanner tuning options should
apply during the code review scan. To do that, you simply press
the bottom panel toolbar icon that looks like a toolbox with
checkmarks (and is located just to the right of the flashlight
shining on the word “SQL”). This screen contains two tabs’ worth
of SQL tuning and optimization options. Note how I’ve chosen to
exclude SQL statements that are contained within comments or
reference only the SYS.DUAL table. I’ve also supplied my
preferences for what detailed characteristics constitute simple
vs. complex vs. problematic SQL statements. For my needs, any
SQL statement with between three and five joins is complex – or
just beginning to become a challenge. And those SQL statements
with six or more joins are problematic – or worth serious effort
on my part to optimize to their fullest, as the sketch below
shows.
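For instance, here is a hypothetical query (all table names
invented for illustration) that crosses my six-join threshold
and would therefore be flagged as problematic:

   SELECT e.last_name, d.dept_name, l.city, c.country_name,
          r.region_name, j.job_title, g.grade_level
     FROM employees   e
     JOIN departments d ON d.dept_id    = e.dept_id
     JOIN locations   l ON l.loc_id     = d.loc_id
     JOIN countries   c ON c.country_id = l.country_id
     JOIN regions     r ON r.region_id  = c.region_id
     JOIN jobs        j ON j.job_id     = e.job_id
     JOIN job_grades  g ON e.salary BETWEEN g.low_sal AND g.high_sal;

With seven tables and six joins, the optimizer has thousands of
possible join orders to weigh – exactly the kind of statement
worth a developer’s focused tuning time.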
Second, you again simply invoke the CodeXpert scan, just as in
Step 1. Third, you merely examine the results of the CodeXpert
optimization scan – and tune all those SQL statements that you
feel warrant your attention.
I cannot adequately stress just
how valuable this can be. It’s like having a tuning mentor
sitting at your side and pointing out all your optimization
candidates – or where you’d be best rewarded for spending more
time tuning. And again, it does all this without adding undue
stress or burden to your system. Thus for very little additional
developer effort or system overhead, CodeXpert gives you SQL
tuning and analysis reviews – that are both reliable and
consistent.
Step 3 – Improve Code via Proven Scientific Metrics
These first two steps are clear movement in the right direction.
Remember that the SEI maturity models advocate increased
software development process accomplishment via implementing
project management, development standards, conformance
measurement and continuing improvements. Since many development
shops these days have embraced both good project management
techniques and tools, they have thus matured their ratings from
level-1 (initial) to level-2 (repeatable). However, in their
quest to improve further, far too many have relied primarily on
the manual application of both best practices and code reviews –
which lack reliability and consistency.
The critical and initial step in
obtaining SEI maturity level-4 (managed) is to understand,
embrace and implement quantitative analysis. But what exactly is
quantitative analysis? According to one definition:
Quantitative analysis is an
analysis technique that seeks to understand behavior by using
complex mathematical and statistical modeling, measurement and
research. By assigning a numerical value to variables,
quantitative analysts try to decipher reality mathematically.
Even though I have my PhD, I’ve never been great with math or
fancy definitions. That’s really just a pretty academic way to
overstate a rather simple idea. There exist some very well
published and accepted standards (i.e. formulas) for examining
source code such as PL/SQL and assigning it a numeric rating.
Furthermore, these ratings are simple numeric values that map
against ranges of values – and those ranges have been
categorized.
One such standard is the Halstead Complexity Measure, which
simply assigns a numerical complexity rating based upon the
number of operators and operands in the source code. Below is a
quick summarization of how it’s derived:
The code is tokenized and counted, where:
n1 = the number of distinct operators
n2 = the number of distinct operands
N1 = the total number of operators
N2 = the total number of operands
The Halstead calculations are then applied as follows:

Measure              Symbol   Formula
Program length       N        N = N1 + N2
Program vocabulary   n        n = n1 + n2
Volume               V        V = N * LOG2(n)
Difficulty           D        D = (n1/2) * (N2/n2)
Effort               E        E = D * V
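To make these formulas concrete, here is a minimal sketch in
PL/SQL – using made-up token counts, since the precise counting
rules vary by tool. (The little_/big_ prefixes are necessary
because PL/SQL identifiers are case-insensitive, so n1 and N1
would otherwise collide.)

   DECLARE
      -- Hypothetical token counts for some small program unit
      little_n1  NUMBER := 10;  -- n1: distinct operators
      little_n2  NUMBER := 8;   -- n2: distinct operands
      big_n1     NUMBER := 25;  -- N1: total operators
      big_n2     NUMBER := 20;  -- N2: total operands
      len        NUMBER;        -- N: program length
      vocab      NUMBER;        -- n: program vocabulary
      volume     NUMBER;
      difficulty NUMBER;
      effort     NUMBER;
   BEGIN
      len        := big_n1 + big_n2;                         -- N = 45
      vocab      := little_n1 + little_n2;                   -- n = 18
      volume     := len * (LN(vocab) / LN(2));               -- V = 45 * log2(18), about 187.7
      difficulty := (little_n1 / 2) * (big_n2 / little_n2);  -- D = 5 * 2.5 = 12.5
      effort     := difficulty * volume;                     -- E, about 2,346
      DBMS_OUTPUT.PUT_LINE('Volume=' || ROUND(volume, 1) ||
                           ', Difficulty=' || difficulty ||
                           ', Effort=' || ROUND(effort));
   END;
   /

The higher these numbers, the harder the code is to comprehend
and maintain – which is exactly the kind of objective,
repeatable measurement that maturity level-4 (managed) calls
for.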