Improving Performance in an Oracle Database

Abstract

The purpose of this document is to provide a quick reference to a number of techniques available for improving Oracle database performance. The intended reader is a database designer or consultant who is knowledgeable about Relational Databases and at least familiar with Oracle. The table of contents is organized as an outline to provide an instant overview of the techniques covered. Some of the techniques are examined in detail in this document, but most are presented at the conceptual level, with details that can be pursued in the Oracle documentation.

   Introduction


  I. Configuration Techniques
    A) Use Raw Partitions instead of the file system.
    B) Eliminate Fragmentation
      1) Defining Fragmentation
      2) Preventing Fragmentation
      3) Dealing with Existing Fragmentation
    C) Table Clustering.
      1) Table Index Clustering
       a) Clustering about the join of 2 or more tables.
       b) Clustering about 1 table.
      2) Hash Clustering
    D) Maximize Size of System Global Area
  
  II. SQL Tuning and Stored Procedures
     A) Reducing Number of Indexed Keys.
      1) Index Performance Tradeoffs.
       2) Alternatives to Indexes
     B) Tuning individual SQL Queries
     C) Using Stored Procedures
  
  III. Application Level Techniques
     A) Application Caching of Smaller Tables
     B) Report Data Warehousing
     C) Collection of Statistics

  IV. Techniques Affecting Data Model
    A) Third Normal Form
    B) Collections - Implement Many <=> 1 relationships
      1) VARRAYS - Atomic MultiValues within a single column.
      2) Nested Tables - MultiValues of another table's rows
       a) Evaluating Nested Table Candidates
       b) Performance Considerations of Nested Tables

Introduction

Although this document is targeted at the Oracle database, most of the techniques listed here, with the exception of Table Clustering, can also be applied to other database packages.

This document takes a full system approach to improving database performance. A single technique approach, like tuning SQL queries using tools such as Explain Plan, can yield some significant improvements in the short term. However, optimizing SQL queries will only help to the extent that the SQL queries are the problem. As this document will show, there are many performance tuning options that range from the relatively quick fix of Eliminating Fragmentation to the more involved Report Data Warehousing.

Role of Database Designer

Since this web page is targeted to Database Designers, the first subject covered will be to define just what it means to do database design.

The role of the relational database designer is ambiguous. He or she is expected to design the Data Model, consisting of the Entity/Relationship model, if one is used, and the Relational Model (the table definitions). Outside of the Data Model, the other expectations of the database designer tend to vary with the particular project.

The diagram below is intended to illustrate the role of the database designer in relation to the various tasks associated with database design and implementation. Figure 1 is intended to be a Venn diagram using boxes instead of circles. The area enclosed within the larger gray box represents what should be the realm of responsibility, or at least involvement, of the database designer (or architect). The area enclosed within the green box represents the responsibilities of the DBA. Thus, figure 1 is intended to illustrate that Configuration should be a shared responsibility. (Please note that in this diagram the size of the separate boxes has no implications. The DBA got a small box, but also got top billing.)

The arrows in figure 1 are intended to show dependency, such as the Application Code being dependent upon the Data Model and/or the Functional Model. But even in the application code dependency there is ambiguity, because there may or may not be a distinct Functional Model. On some projects, the application developers may access the tables directly by using ODBC and SQL, or some similar method. If there is a Functional Model, some of the application code might bypass it. Or, as another alternative, there is no Functional Model per se, but the database designer may be expected to assist the application developers with database access, or just code it up himself on an ad hoc basis.

The dotted line around the Business Rules is meant to show that although the database would never be expected to enforce all the business rules, some of the business rules can be enforced by the database. For example, encapsulating separate credit and debit operations within a single transaction is applying a business rule. Stored procedures can be used to apply business rules.

Consequently, before getting into the particulars of increasing database performance by using the listed techniques, this document advocates clear role definitions for the various players that are involved with the database and its associated application. This document also promotes the idea that the Database Designer should at least be involved in, if not directly responsible for, all the areas within the gray outline of figure 1. Achieving optimum database performance will usually require a holistic approach to all the factors affecting the database performance. Without someone given the means to coordinate all the factors affecting database performance, the odds of achieving optimum performance are remote.

Dependency Considerations

Although role definitions and development processes may vary or be ambiguous, functional dependency is clear. For example, one cannot design the SQL for queries or updates (ie the Functional Model) to a database before the tables are defined (the Relational Model). Since dependencies are very relevant to the effort needed to implement a particular technique, this document will start with the least intrusive performance techniques and progress to the most intrusive. Since database configuration is independent of the dependency tree, changes to the configuration are the least intrusive techniques. The most intrusive techniques are those at the base of the dependency tree, those involving modification of the Data Model.

Configuration Techniques are Dependency Independent

Referring again to figure 1, database Administration and Configuration, typically the duties of the DBA, are independent of the dependency tree. This independence implies that any techniques applied to the Configuration to improve performance can be done without affecting the application design. Thus, techniques applied to the Configuration, the least intrusive techniques, will be covered first.

RAW Partitions

Using raw partitions instead of a file system simply removes a layer (the file system) between the raw disk blocks and the database package. The most profound effect of using raw partitions is most likely to be the lessening of fragmentation.

Eliminate Fragmentation

Defining Fragmentation

The optimal physical layout of a database is for all of its tables (and other objects) to be stored in contiguous disk blocks. Any time there is a gap between the disk blocks of a given table, fragmentation has occurred. The more gaps that exist within the disk blocks of tables, the more fragmentation there is.

Fragmentation causes additional disk head movements, which significantly increase the amount of time it takes to perform database operations. These head movements are in the form of seeks, which are the moving of the disk head from one disk track to another. A seek will typically take at least 5 milliseconds (1/200th of a second) or longer.

1/200th of a second may not seem like much, but it snowballs. Not only can many seeks be involved in a single database transaction, but the disk driver software may not immediately perform a required seek. The optimization algorithms in the disk driver may interrupt your disk operation to do other disk operations in order to minimize disk head movement. The moral of the story is that fragmentation is a bad thing.

Preventing Fragmentation

The best prevention of fragmentation is the proper setting of the INITIAL and NEXT parameters of the storage clause, which is itself part of the extent specs. The storage clause is a clause of Data Definition Language (DDL) SQL statements. The storage clause, or the extent specs that contain it, is found in the CREATE CLUSTER, CREATE INDEX, CREATE TABLE and CREATE TABLESPACE DDL SQL statements. The NEXT parameter is also modifiable in the ALTER CLUSTER, ALTER INDEX, ALTER TABLE and ALTER TABLESPACE DDL SQL statements.

The INITIAL parameter specifies the size of the storage originally allocated to a database object (table, index, tablespace, or cluster), while the NEXT parameter specifies the size of the Extent that the object will grow by when it needs more space. (Both sizes are given in bytes, optionally with a K or M suffix.) An Extent is a logical unit of database storage and is made up of contiguous disk blocks.

The less often new extents are added to a database object, the less fragmentation there is. Thus, large INITIAL and NEXT extent sizes are better than small ones, provided disk space that may be needed elsewhere is not wasted. As is true with all database configuration issues, the setting of the INITIAL and NEXT parameters is a matter of understanding the nature of the data and of the application.
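
As a rough sketch of the idea, the DDL below pre-allocates a large initial extent and a large growth extent for a hypothetical order_line table (the table, its columns, and the sizes are all illustrative; actual sizes should be derived from the expected data volume):

```sql
-- Hypothetical table; the INITIAL and NEXT sizes are illustrative only.
CREATE TABLE order_line (
    order_no VARCHAR2(10),
    item_no  VARCHAR2(6),
    qty      NUMBER
)
STORAGE (
    INITIAL     50M   -- allocate 50 MB up front, before any rows arrive
    NEXT        10M   -- grow in 10 MB extents, so new extents are rare
    PCTINCREASE 0     -- keep every subsequent extent the same size
);
```

Fewer, larger extents mean fewer gaps between the table's disk blocks, and therefore fewer seeks.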

Dealing With Existing Fragmentation

In an existing system with fragmentation, the NEXT parameter can be modified via the ALTER CLUSTER, ALTER INDEX, ALTER TABLE and ALTER TABLESPACE DDL SQL statements to prevent the fragmentation from getting worse.
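
For example, an existing table that has been growing in many small extents could have its NEXT size raised (the table name and size are illustrative):

```sql
-- Future extents will now be 20 MB each, so they are allocated
-- less often and fragmentation accumulates more slowly.
ALTER TABLE order_line STORAGE (NEXT 20M PCTINCREASE 0);
```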

To eliminate existing system fragmentation, the database can be exported and then imported. If the database is 24/7 and can never be scheduled for down time, that database would need to be rebuilt in a separate space and put online using some switchover mechanism.

Table Clustering

The Oracle database gives the database designer or DBA 3 distinct techniques to store, or cluster, table rows together within disk blocks. Those 3 techniques are Table Clustering about 1 table, Table Clustering about 2 or more tables, and Hash Clustering. The primary purpose of Table Clustering about 2 or more tables is to expedite joins, and it is discussed in a separate web page.

The factors that determine if a table or tables are good candidates for Table Clustering are the variation in the size of the row clusters and the Disk Block Size. An illustration that sums up these 2 factors can be found in the separate web page on Oracle Table Clustering.

The variation of the size of the row clusters is a property of the data, and can't be controlled. However, the Disk Block Size can be controlled, and for Table Clustering the bigger the better. Unfortunately, the disk block size can only be defined once for an entire database instance, but a larger disk block size in general shouldn't cause many adverse effects.

Table Clustering about 1 table

If one or more columns of a table tend to have the same value repeated for many rows at a time, it may be advantageous to define that column or columns as a key and use that key as a Table Cluster Index. In other words, candidates for this technique have a duplicate key where most of the key values have a high number of duplicates.

The way it works is that the key value, possibly comprising a number of columns, will only be stored once for all the rows that share that same key value. Additionally, all the rows that share that same key value will be stored contiguously within the same disk block. Obviously, it makes no sense to apply this technique to unique keys.

The primary advantage to Table Clustering about 1 table is that when the Table Cluster Index key is used in queries, those queries will be faster since all the rows with the same key value are physically stored together. In addition, that duplicate key will only need to be stored once by the associated index, a significant factor in index traversal and maintenance performance.
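
A minimal sketch of Table Clustering about 1 table, using a hypothetical order_line table whose duplicate key order_no repeats across many rows (all names and the SIZE estimate are illustrative):

```sql
-- Create the cluster keyed on the duplicate key.
CREATE CLUSTER order_no_cluster (order_no VARCHAR2(10))
    SIZE 2K;   -- estimated space consumed by the rows of one key value

-- The cluster index must be created before rows can be inserted.
CREATE INDEX order_no_cluster_idx ON CLUSTER order_no_cluster;

-- Rows sharing an order_no are now stored together in the same
-- disk block, and each order_no value is stored only once.
CREATE TABLE order_line (
    order_no VARCHAR2(10),
    item_no  VARCHAR2(6),
    qty      NUMBER
)
CLUSTER order_no_cluster (order_no);
```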

Hash Clustering

A database index can be defined as an auxiliary mechanism to a table that is used to rapidly locate particular rows given a particular key value. By this definition it would be possible to design an index based upon a Hashtable object. However, in Oracle, the term 'index' applies only to B-Tree indexes.

In Oracle, hashtables per se do not exist, but key hashing can be used to locate rows via Hash Clustering. Hash Clustering is similar in concept to Table Index Clustering, but uses a hashing function to determine in which disk block a row of a given key value resides, instead of a B-Tree index.

Although a Hash Cluster is similar in concept to an Index Cluster, it has its own criteria for use. Hash Clustering is not aimed at the efficiency of doing joins, nor at index keys of low cardinality; its advantage is that it replaces a B-Tree index. Like all Oracle Table Clustering, properly determining the optimal configuration parameters can be complicated, but it only has to be done once.
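
A minimal Hash Cluster sketch for a hypothetical customer table (the names and the SIZE/HASHKEYS estimates are illustrative; those two parameters are the complicated part that only has to be determined once):

```sql
-- The disk block holding a row is computed by hashing customer_no,
-- so lookups by that key need no B-Tree index at all.
CREATE CLUSTER customer_hash (customer_no VARCHAR2(8))
    SIZE     512       -- estimated bytes needed per key value
    HASHKEYS 10000;    -- estimated number of distinct key values

CREATE TABLE customer (
    customer_no   VARCHAR2(8),
    customer_name VARCHAR2(40)
)
CLUSTER customer_hash (customer_no);
```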

Maximize Size of System Global Area

The System Global Area (SGA) is a shared memory region allocated by Oracle that contains, among other things, the database buffer cache, the shared pool, and the redo log buffers. In short, the more RAM allocated to the SGA, the better Oracle will run. Oracle provides extensive documentation on the SGA. By studying the SGA with Oracle's many analytical utilities, one can take an educated guess as to the effect on database performance of increasing the SGA to a certain size. It may turn out that installing some extra megs of RAM is a relatively painless way to significantly improve database performance.
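
One quick way to see the current SGA allocation before taking that educated guess is to query the v$sga view (or issue SHOW SGA in SQL*Plus):

```sql
-- Lists the SGA components and their sizes in bytes.
SELECT name, value FROM v$sga;
```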

Reducing the Number of Indexed Keys

Indexing Performance Tradeoffs

The use of B-Tree indexes presents a tradeoff. In general, a query operation that searches by a particular key will be faster if that key is indexed. However, there are also downsides to indexes.

In regard to performance, B-Tree indexes are expensive. Indexes can improve the performance of searching by indexed keys, but they degrade the performance of the SQL INSERT, UPDATE and DELETE operations. Consequently, the importance of the queries that benefit from the indexing of a particular key must be weighed against all the other operations that are hurt by it. A common cause of performance degradation is having too many indexes. Indexes should not be, but often are, created in a willy-nilly fashion.

A basic criterion for the creation of an index is whether the query that benefits from it is a fundamental part of the application, or whether it exists only for report generation. If the query that benefits from an index is not a fundamental part of the application (ie reports), an alternative to indexing may be a better solution.

Alternatives to B-Tree Indexes

Alternatives to B-Tree indexes include the Hash Clustering described above and the Report Data Warehousing technique described below. Another alternative is to simply eliminate the index. Simple elimination of an index will most likely hurt some query operations, but the overall effect on performance may compensate for it.
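
As one concrete illustration, Oracle's bitmap index (an addition here, since the original list of alternatives is not reproduced) can replace a B-Tree index on a low-cardinality key that is searched mainly by reports; the index and column names are hypothetical:

```sql
-- A bitmap index on a low-cardinality column is far smaller than a
-- B-Tree index, though it is poorly suited to heavy concurrent updates.
CREATE BITMAP INDEX order_status_bix ON order_line (order_status);
```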

Tuning Individual SQL Queries

How to optimize individual SQL queries is well documented by Oracle. Oracle offers tools such as Explain Plan to help analyze SQL queries and show how they can be made more efficient. There are also tools offered by independent companies.

Since SQL optimization is well covered by Oracle, it will not be expanded upon here. However, it is still well worth looking into.

Using Stored Procedures

The use of Stored Procedures provides 3 main advantages: the elimination of per-call SQL compilation, the reduction of I/O between the application and the database, and a central place within the database to enforce business rules.

A common method for an application to access the database is to use ODBC or its Java counterpart, JDBC. Every time an ODBC operation is done, the SQL script is passed to the database. This SQL script must be compiled before it can be used. A precompiled stored procedure eliminates compiling overhead during transactions.

SQL itself is a stateless language. A stored procedure contains SQL within the environment of a logical and algorithmic program. Thus, all the interaction that may be involved between the database and the application's program logic takes place within the database. I/O between the database and the application is greatly reduced.
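
A minimal PL/SQL sketch of these points, using the credit and debit example mentioned earlier (the table, columns, and procedure name are hypothetical): one precompiled call from the application performs both updates, and the all-or-nothing business rule is enforced inside the database.

```sql
CREATE OR REPLACE PROCEDURE transfer_funds (
    p_from_acct IN NUMBER,
    p_to_acct   IN NUMBER,
    p_amount    IN NUMBER
) AS
BEGIN
    -- Debit one account and credit the other in a single transaction.
    UPDATE account SET balance = balance - p_amount
     WHERE acct_no = p_from_acct;
    UPDATE account SET balance = balance + p_amount
     WHERE acct_no = p_to_acct;
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        ROLLBACK;   -- neither operation survives if either fails
        RAISE;
END;
/
```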

Previously, all stored procedures were coded in PL/SQL. PL/SQL is a distinct programming language that requires its own learning curve. PL/SQL was modelled after Ada, a language most developers today have probably never heard of.

Now that Oracle supports Java, stored procedures can be developed, supported and understood by anyone who understands Java. Java programmers are everywhere and Java itself may soon become the most common programming language.

Application Table Caching

A huge boost in database performance can be achieved by storing particular tables completely in RAM. This technique involves developing application code to read entire tables, store those tables completely in RAM, and access those RAM-based tables for processing. Indexing support for these in-RAM tables can be provided by RAM-based objects like Hashtables.

Since system RAM has become so cheap and abundant, this technique has become feasible for larger and larger tables. The main criterion for choosing tables for Application Table Caching is whether those tables are essentially nonTransactional (they don't often change).

A common example of a table that is nonTransactional yet core to its application is an Item Description table in an inventory oriented application. The Item Description table, the table of the descriptions and parameters of the standard items offered for sale by department, mail order and web based stores, is essentially nonTransactional. Most stores do not change their standard product line on a minute by minute, or even hour by hour, basis.

A mail order business must have its items defined and stable before its catalogs or flyers are printed. A web site has to update the content equivalent of the mail order catalog before a new item can be used. A department store must coordinate a new item with supply, display, stocking, etc. The point is, an Item Description table does not change often. If a change occurs, such a table can be refreshed in its RAM buffer on a scheduled basis during off hours.

In inventory oriented applications, the Item Description table is involved in almost every transaction: sales, inventory updates, order generation, invoice generation, etc. By having the Item Description table in RAM, not only is access to it much faster, but it also puts no burden on the I/O channel from the application to the database, on disk head movement, or on the disk block cache.

Report Data Warehousing

A data warehouse is a nonTransactional database used for data mining, the obtaining of data by running elaborate queries. Transactional support is very expensive in terms of resources and performance, so the data warehouse exists outside of the transactional system and is used to support data mining.

The basic idea of using data warehousing to increase performance is to separate, at least conceptually, which operations need to be part of the transactional system, and which do not. For example, scheduled daily reports do not need to be transactional. Such reports can be run in off hours. If a particular index exists solely to support certain reports, it is possible that in off hours the system can tolerate a less efficient search, even a sequential search, and that index can be eliminated.

It may seem that a read only query, like a report, is not a transaction. But it is. Oracle automatically supports read consistency. Read consistency essentially takes a snapshot of all the tables involved in a query and guarantees that the entire query will see all the data as of the point in time that the query started. Thus, if any UPDATEs, INSERTs or DELETEs are done to the targeted tables during a query, that query essentially becomes a read transaction.

It is common that a number of scheduled reports access the same data, but use different indexed keys for querying. Instead of querying the transactional database separately for each report, intermediate tables can be designed that contain the information needed for those reports. These intermediate tables, which essentially become a temporary little data warehouse, can be optimized for report generation without the need for transactional considerations. To create the intermediate tables, usually a single query, possibly a sequential search, is the only burden placed on the transactional system. Again, creation of intermediate tables would best be done in off hours.
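
A sketch of the idea, with hypothetical table names: one pass over the transactional tables builds the intermediate table, which reports can then index and query freely.

```sql
-- Build the intermediate reporting table in a single query;
-- best run in off hours.
CREATE TABLE report_orders AS
    SELECT oc.order_no, oc.customer_no, o.item_no, o.qty
      FROM order_customer oc, order_detail o
     WHERE oc.order_no = o.order_no;

-- Index it purely for report access; no transactional cost applies.
CREATE INDEX report_orders_item_ix ON report_orders (item_no);

-- When the reports are done, simply drop the mini warehouse.
DROP TABLE report_orders;
```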

Since, after their creation, no UPDATEs, INSERTs or DELETEs are done to these intermediate tables (the mini data warehouse), the report queries that run against them do not generate transactions. These reports can then be run on an ad hoc basis against their data warehouse. After the reports that required certain information are generated, the rows of the intermediate tables that contain that information, or the entire tables, can be deleted.

Collection of Statistics

Oracle automatically accumulates a number of statistics on the database. But more useful information never hurts. Instead of spending time pursuing actions based upon guesses about some aspects of a database, that time may be better spent deciding exactly what information is needed, often in the form of statistics, and adding code and/or tables to obtain that information.

An example of using custom statistics can be found in the web page on Oracle Table Clustering. Table clustering is usually not recommended, even by Oracle reps, not because it doesn't work (it does work and can be very effective), but because it is difficult to determine when to properly apply table clustering. In short, there is more complexity involved than the average rep wants to deal with. As can be seen in the Oracle Table Clustering web page, a methodical approach to obtaining and analyzing pertinent information can make the mysterious become clear. This increased understanding will provide a heightened sense of control which will yield successful results in many performance related areas.

Techniques Affecting Data Model

As stated in the Introduction, any performance improvements that involve modifying the Data Model will affect everything downstream of it on the dependency tree. The best thing to do, obviously, is to consider changes to the Data Model early in the development phase. The next best thing to do, at least in theory, is to encapsulate the Data Model within a Functional Model so that any subtle Data Model changes are transparent to the Application Code.

Third Normal Form

A fundamental tenet of Relational Databases is to ensure Data Integrity. One of the means used to ensure Data Integrity is Normalization. Although some purists may argue that a database should be designed to its maximum level of normalization, anything higher than Third Normal Form will probably be unnecessary and serve only to hurt performance.

Collections - Implement Many <=> 1 relationships

Oracle has become an Object Relational database. In other words, Oracle now supports objects. It would appear that if any of the new Object Relational features would benefit performance, that feature would be the support of Collections. Collections in Oracle consist of VARRAYs and Nested Tables.

VARRAYS - Atomic MultiValues within a single column.

The VARRAY data type is essentially an array with the same properties as an array in any programming language. It has a fixed size, a given number of slots for storing values, and each slot is addressable via its whole number index. The index of an array is not to be confused with a database index.

VARRAYs are useful in cases where there is a 1 to MANY relationship between a table row and a single value. Simple examples might be a criminal's alias names, or the people listed to drive a particular car. In each case, the array of values is strictly one dimensional. Conversely, a list of phone numbers a person might have would not be one dimensional, since a phone number is incomplete without knowing if a given phone number is for home, work, cell phone, etc.

In regard to performance, the pertinent property of VARRAYs is that unless they are very large (greater than 4K), they are stored in place within the data row. Thus, VARRAYs can significantly improve the performance of one dimensional multivalues.
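
A minimal sketch of the alias-name example above (the type and table names, and the maximum of 5 aliases, are illustrative):

```sql
-- A one-dimensional, fixed-size collection of atomic values.
CREATE TYPE alias_list AS VARRAY(5) OF VARCHAR2(40);
/

CREATE TABLE criminal (
    criminal_id NUMBER,
    full_name   VARCHAR2(40),
    aliases     alias_list   -- stored in place within the row,
                             -- unless it grows very large
);
```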

The downside of VARRAYs is that their empty data slots must store a NULL value.

Nested Tables - MultiValues of another table's rows

Evaluating Nested Table Candidates

Similar to VARRAYs, Nested Tables exist to conveniently store a 1 to MANY relationship between a table row and another entity, in this case the rows of a nested table. The term nested is significant. Nesting is only intended for cases where the nested table rows are accessed solely as part of the row that contains them. If the nested rows are also expected to be accessed independently, those rows should not be nested.

Following is an example of 4 normalized tables that comprise the information contained within an Order. These tables are the Item Description Table, the Customer Description Table, the Order Customer Table, and the Order Table:

              Item Description Table
      Item #  Item Name  Misc Item Parameters ~
      ------  ---------  ---- ---- ---- ----- ~
      066364  thingagig         ....          ~
      087905  dodad             ....          ~
      037617  gizmo             ....          ~
      053135  thingie           ....          ~
       ....    ....             ....          ~
    
              Customer Description Table              
      Customer #  Customer    Other Customer Info ~
                  Name        Addr City State Zip ~
      ----------  ----------  ---- ---- ---- ---- ~
       AA23487    Widget Inc         ....         ~
       GV12464    zipTech            ....         ~
       MT57612    A-OK Corp          ....         ~
    
      Order Customer Table        Order Table
      Order #  Customer #     Order #  Item #  Qty
      -------  ----------     -------  ------  ---
      02-1123   AA23487       02-1123  066364   39
      02-1154   GV12464       02-1123  087905   32
      02-2321   MT57612       02-1154  037617   81
                              02-1154  053135   17
                              02-2321  006776    6
                              02-2321  055662  144
                              02-2321  067587   37
                              02-2321  087905   55
The base of an order is the join between the Order Customer Table and the Order Table, which have a 1 to many relationship, as shown below:
                 Joined Order Table
           Order #  Customer # Item #  Qty
           -------  ---------- ------  ---
           02-1123   AA23487   066364   39
                               087905   32
           02-1154   GV12464   037617   81
                               053135   17
           02-2321   MT57612   006776    6
                               055662  144
                               067587   37
                               087905   55
The Item Description and Customer Description tables simply add descriptive parameters to this join on a 1 to 1 basis.

In this example, the Order Table exists only to normalize the 1 to many relationship between an order and its items. There is no reason to access the Order Table other than as part of its join with the Order Customer Table. Thus, the Order Table is a candidate to be a Nested Table.
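
A sketch of how the Order Table could be nested within the Order Customer Table (the type and storage-table names are illustrative):

```sql
-- Each order row carries its own collection of item/qty line rows.
CREATE TYPE order_line_t AS OBJECT (
    item_no VARCHAR2(6),
    qty     NUMBER
);
/
CREATE TYPE order_line_tab AS TABLE OF order_line_t;
/
CREATE TABLE order_customer (
    order_no    VARCHAR2(10),
    customer_no VARCHAR2(8),
    lines       order_line_tab
)
NESTED TABLE lines STORE AS order_lines_store;
```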

Performance Considerations of Nested Tables

Unfortunately, the nested rows of Nested Tables are stored separately, just as they would be in the conventional normalized scenario. When the combined table, the Joined Order Table in our example, is accessed, a dynamic join must still be done. Consequently, Nested Tables are really just a fancy View mechanism and don't offer a performance benefit. A better solution is to use Table Clustering. Table Clustering not only actually stores the parent row and the nested rows together, but is even less restrictive, since the nested rows can efficiently be accessed independently.
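
A sketch of that alternative: clustering the two Order example tables about their join key, so that an order's parent row and line rows share a disk block (the names and the SIZE estimate are illustrative):

```sql
CREATE CLUSTER order_cluster (order_no VARCHAR2(10))
    SIZE 1K;   -- estimated space for one order and its line rows

CREATE INDEX order_cluster_ix ON CLUSTER order_cluster;

-- Both tables share the cluster, so rows with the same order_no
-- are physically stored together; the join costs no extra seeks,
-- yet each table can still be queried independently.
CREATE TABLE order_customer (
    order_no    VARCHAR2(10),
    customer_no VARCHAR2(8)
)
CLUSTER order_cluster (order_no);

CREATE TABLE order_detail (
    order_no VARCHAR2(10),
    item_no  VARCHAR2(6),
    qty      NUMBER
)
CLUSTER order_cluster (order_no);
```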

Copyright (C) 2002 by Cary Rhode,
Software Developer and self proclaimed database guru. E-mail caryrhode@yahoo.com