Statistical Analysis of Oracle Table Clustering

Abstract

This web page will describe a technique to determine if a candidate for Oracle Table Clustering (1 or more tables) is a good fit. This technique uses Statistical Analysis to predict and balance the tradeoff between disk space utilization and disk block chaining of the data stored in Oracle clustered table disk blocks. Programs are supplied to calculate the numbers, and the theory behind those calculations is explained.

INTRODUCTION - Oracle Table Clustering

NOTE: To eliminate ambiguity, the term row will be reserved for individual table rows. A group of these individual rows will be called a Cluster Row, which will always be capitalized.

Oracle Table Clustering is an optional technique for storing table data. The primary purpose of Clustering is to take the rows from 1 or more tables that are associated with each other about a common data entity (see figure 1), and physically store those rows adjacent to each other in the same disk block (see figure 2). This common data entity is defined as the 'Cluster Key'. Identifying the Cluster Key is the first step to implementing Oracle Table Clustering.

The advantage of clustering is twofold. First, by storing the field or fields comprising the Cluster Key once instead of multiple times, storage is saved. In addition, the fields of the Cluster Key are often indexed, so the high-maintenance indexes for those fields are also reduced from containing multiple key entries to a single key entry.

The arguably more significant advantage of Clustering is to expedite join queries. When 2 (or more) tables repeatedly have their rows accessed together via a join query, those tables are good candidates for Clustering. Since the rows of both tables would be stored adjacently in the same disk block if they were clustered, a query that joins these 2 tables about the Cluster Key can fetch the joined rows with a single I/O operation (see lists of I/O operations).

It should be noted that the clustering of tables in Oracle does not affect the relational model (table schemas). It is a transparent 'backend' data storage technique.

An example of exploiting Oracle Clustering in a join query is found in the 'Clusters' section of Chapter 10 of the Concepts Manual for Oracle8i (online at www.oracle.com). Consider the 2 tables from figure 1, the EMP_Table and the DEPT_Table.

The EMP_Table and the DEPT_Table are associated with each other via DeptNum, the primary key of the DEPT_Table. A common query to list the current employees at the various departments combines (SQL 'join') the 2 tables using a SQL query such as:

   SELECT   EmpNum, EmpName, FName, EmpSSN, HireDate, Salary,
            EMP_Table.DeptNum, DeptName, Location
   FROM     EMP_Table, DEPT_Table
   WHERE    EMP_Table.DeptNum = DEPT_Table.DeptNum
   ORDER BY DEPT_Table.DeptNum;

To implement this query, producing each row of the combined columns will involve the following I/O operations:

           List of I/O Operations for Join Query

 1) Traversing DEPT_Table.DeptNum index, get next DeptNum.
 2) Fetch row from DEPT_Table that corresponds to DeptNum,
    using ROW_ID from DEPT_Table.DeptNum index.
 3) Find each row from EMP_Table that corresponds to DeptNum.
    Unless DeptNum is a secondary index of EMP_Table, finding
    these rows will require a costly sequential search.
 4) Fetch row from EMP_Table that corresponds to DeptNum.

As can be seen from the above I/O operations, this query could be costly for a large number of rows. I/O operation #3 implies that either EMP_Table.DeptNum is an indexed secondary key, which adds another maintenance burden to EMP_Table, or the sequential search will make this query very costly.

However, if these 2 tables are clustered as shown in figure 2, the contents of these 2 tables will be stored adjacent to each other in the same disk block. Essentially, clustering nests the EMP_Table within the DEPT_Table, whose primary key is also the Cluster Key. The result is that the previously described join about the DeptNum results in the following IO operations:

  List of I/O Operations for Join Query from a Cluster Row

 1) Traversing DEPT_Table.DeptNum (Cluster Key) index,
    get the next DeptNum.
 2) Fetch Cluster Row, consisting of the row from
    DEPT_Table and all the rows from EMP_Table
    that correspond to DeptNum, the Cluster Key.

Obstacles to Using Oracle Clustered Tables

Once a good candidate for Clustering has been identified, there are 2 possible obstacles to its success. The first possible obstacle is that the combined size of all the table rows in a single Cluster Row will exceed the size of a disk block. A single Cluster Row exceeding the size of a disk block would result in the immediate Chaining of every disk block. Chaining is the linking together of 2 separate disk blocks, which tends to reduce or eliminate the advantages of using Oracle Clustered Tables.

The second possible obstacle is that the average size of a Cluster Row is less than a disk block but still relatively large, and the individual sizes vary a great deal (large variance, or standard deviation). In this case, either significant Disk Block Chaining or wasted disk space is likely to occur. Exactly why this is so is explained in the Dynamics of Oracle Clustered Tables.

Dynamics of Oracle Clustered Tables

When creating an Oracle Cluster, an optional SIZE parameter should be used to set the expected size of an average Cluster Row. Oracle divides the available disk block space (in bytes) by this SIZE parameter (in bytes); the resulting quotient determines the pseudo-parameter MAXIMUM. MAXIMUM is the maximum number of Cluster Keys, with their corresponding rows, that are allowed within a disk block.

In other words, when a new Cluster Row is INSERTed into the database, it is put into the first disk block found in the Free Block List with enough space to store that Cluster Row. Once a Clustering disk block contains the MAXIMUM number of Cluster Keys (and corresponding Cluster Rows), that disk block is removed from the Free Block List, regardless of how much space remains in it. Thus, in an indirect manner, the SIZE parameter reserves disk block space for each Cluster Row.
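
For illustration, a minimal DDL sketch of clustering the demo tables follows. The cluster name, column types, and constraints are assumptions for this sketch (the actual table definitions are in bldTables.sql); the SIZE of 704 bytes is the mean Cluster Row size computed later in this page.

 -- Hypothetical sketch; names and column types are assumed, not from demo.zip
 CREATE CLUSTER Emp_Dept_Cluster (DeptNum NUMBER)
     SIZE 704;                 -- expected bytes per average Cluster Row

 -- An index cluster requires a cluster index before any rows are inserted
 CREATE INDEX Emp_Dept_Cluster_Idx ON CLUSTER Emp_Dept_Cluster;

 CREATE TABLE DEPT_Table (
     DeptNum   NUMBER PRIMARY KEY,
     DeptName  VARCHAR2(30),
     Location  VARCHAR2(30)
 ) CLUSTER Emp_Dept_Cluster (DeptNum);

 CREATE TABLE EMP_Table (
     EmpNum    NUMBER PRIMARY KEY,
     EmpName   VARCHAR2(30),
     FName     VARCHAR2(30),
     EmpSSN    VARCHAR2(11),
     HireDate  DATE,
     Salary    NUMBER(9,2),
     DeptNum   NUMBER REFERENCES DEPT_Table (DeptNum)
 ) CLUSTER Emp_Dept_Cluster (DeptNum);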

The complicating factor is the UPDATE operation. In a static situation, such as a data warehouse, reserving space via the SIZE parameter would not be necessary. Static Cluster Rows could simply be put in any block with enough space to hold them. However, in the typical dynamic world of transaction processing, UPDATE operations are expected. Often, an UPDATE operation to a table row results in that row becoming larger. Adding an individual table row to a Cluster Row (via INSERT) also makes that Cluster Row larger; an INSERT of a new individual table row is essentially an UPDATE to its Cluster Row. Considering that a Cluster Row might contain numerous individual table rows, it is quite possible that a Cluster Row will significantly increase in size during its lifespan.

If the accumulated increases in size of all the Cluster Rows within a disk block consume all the available remaining space within that disk block, disk block Chaining will occur. As stated before, Chaining is undesirable since the disk block being chained to is likely to be on a different disk track, requiring an expensive disk seek operation to access it.

The opposite effect occurs when a number of Cluster Rows remain or become (via UPDATEs or row DELETEs) significantly smaller than expected. Due to the MAXIMUM limit indirectly defined by the SIZE parameter, even if the accumulated decreases in aggregate size result in enough additional free space within the cluster disk block to easily store another Cluster Row, that disk block will not accept any more Cluster Rows. Thus if the group of Cluster Rows within a disk block has a net reduction in the disk space used, that liberated disk space will be wasted.

Referring back to the 2nd obstacle, that a large variance in the size of the Cluster Rows can cause either disk chaining or a large amount of wasted disk space, it may become clearer why the average size of a Cluster Row plays a role. If the average size of a Cluster Row is small, there will be many Cluster Rows within a Cluster disk block. The more entries within a disk block, the more likely that increases in the size of some Cluster Rows will be offset by size decreases in others (and vice versa). However, if the Cluster Rows are large, and thus have a small MAXIMUM limit, it is less likely that any significant size change will be balanced by an offsetting (opposite) size change within that disk block.
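
This offsetting effect can be quantified. Assuming the n Cluster Row sizes within a block are roughly independent, each with mean mu_row and standard deviation sigma_row:

$$\mu_{group} = n\,\mu_{row}, \qquad \sigma_{group} = \sqrt{n}\,\sigma_{row}, \qquad \frac{\sigma_{group}}{\mu_{group}} = \frac{1}{\sqrt{n}} \cdot \frac{\sigma_{row}}{\mu_{row}}$$

The relative spread of the group size therefore shrinks as 1/sqrt(n): the more Cluster Rows per disk block, the more predictable the total size of the group.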

Evaluation of Clustered Table Candidates via Statistical Analysis

The questions about what is 'relatively large' and how much the Cluster Rows will vary can be answered by using fundamental Statistical Analysis. Please note that it is not required for the reader to deal with the details of this analysis. A supplied PL/SQL procedure (getSigma.sql) and a Perl program (CDFtable.pl) will do the analysis. Currently, getSigma.sql is hard coded to process the provided demo. To adapt getSigma() to another 2 table join, the 2 SELECT queries that define cursors C1 and C2 must be modified. In addition, the disk block size is hard coded to 4096 bytes, which also needs to be modified if your disk block size is different.

CDFtable.pl, getSigma.sql, bldTables.sql, bldDept.ctl and bldEmp.ctl are contained in demo.zip. Unzip each of these files into a work directory on your machine. Please note that CDFtable.pl must have execute permissions.

To load the demo data, enter sqlplus, then:

 SQL> @bldTables   -- This will Create EMP_Table & DEPT_Table
 SQL> quit         -- Exit to shell

From the shell, run the SQL loader:

 > sqlldr userid=youruserID/passwd control=bldDept.ctl log=bldDept.log
 > sqlldr userid=youruserID/passwd control=bldEmp.ctl  log=bldEmp.log

The generic Statistical parameters that getSigma.sql and CDFtable.pl deal with and their meanings are in Useful Statistical Formulas. The specific statistical values we are interested in are:

     Statistical Parameters as Used to Analyze Clustering

 Mean of Cluster Row - Size in bytes of average Cluster Row.

 Mean of Group of Cluster Rows (mu) - Average size in bytes
          of a group of Cluster Rows within a disk block.
          This value will conversely show how much space is
          available for INSERTs and UPDATEs in the average
          Clustered Table disk block.

 Standard Deviation of Group of Cluster Rows (sigma) - 
          In combination with the Group Mean, determines the % of
          Clustered disk blocks that will chain.  Inversely, also
          shows how much disk space will be unused.

How these parameters are used is illustrated in figure 3: a group of n Cluster Rows, where n is MAXIMUM, is allocated to each disk block. The average size in bytes of a disk block's group of Cluster Rows is mu. The standard deviation of this group size is sigma.

Referring to figure 4, an illustration of the Bell Curve, the significance of the values of mu and sigma can be seen. In a bell curve distribution, only 2.5% of occurrences lie beyond mu + 2*sigma, within the area of the 'upper tail' of the curve. In our case, referring back to figure 3, the area within the upper tail represents the % of time when the size of a Cluster Group is greater than (overflows) the available space in the disk block, which results in disk block chaining. (The lower tail simply represents unused disk space, which is already accounted for in our Slack Space calculation (via SIZE/MAXIMUM).)

In figure 3, the Bell Curve crosses the end of the usable available space, a combination of Slack Space and PCTFREE (see Accuracy of PCTFREE), at about mu + 2*sigma. This point can also be expressed as mu + X*sigma, where X = 2. It is the value of X that determines how much disk block chaining will occur. We will refer to this X value later as NumSigmas.

In other words, the % of time disk block chaining occurs is determined by how many sigmas are contained within the Slack Space and PCTFREE space. An initial goal of selecting a SIZE parameter is to allow for enough Slack Space and PCTFREE to contain at least 2*sigma, which will result in disk block chaining 2 1/2% or less of the time.
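
Reconstructed from the worked example later in this page, the relationship that getSigma() and CDFtable.pl implement is:

$$NumSigmas = \frac{(dBlkSize - header) - \mu}{\sigma}, \qquad \%\ chaining = \left(1 - \Phi(NumSigmas)\right) \times 100$$

where Phi is the standard Normal CDF and the header is 192 bytes. Plugging in the demo values below, (4096 - 192 - 3520) / 214.96 = 1.79, matching the NumSigmas that getSigma(5) reports.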

Obtaining the Statistical Values Using PL/SQL Procedure

The 1st step is to determine the average size (statistical mean) of the proposed Cluster Row. The easy way to do this is to run the getSigma() PL/SQL procedure with a passed parameter of 1 in SQLplus. The getSigma() procedure is currently written to analyze the join of the 2 tables, DEPT_Table and EMP_Table, from our sample. To adapt getSigma() to any other 2 table join, modify the 2 SELECT queries that define cursors C1 and C2.

In getSigma(), all the columns that comprise the joined tables must have their sizes accumulated. Thus, to analyze a different set of joined tables, the column names from the tables to be analyzed must replace the columns from DEPT_Table and EMP_Table. The Cluster Key reference in the inner FOR loop must also be changed. Replacing the alias variable names empSize and deptSize is optional.
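
The shipped getSigma.sql in demo.zip is the reference; purely as an illustration, the cursor and loop structure just described might resemble the following sketch. The column lists, the use of VSIZE() to measure column bytes, and everything other than the empSize/deptSize aliases are assumptions, not the actual procedure:

 -- Hypothetical sketch only; see getSigma.sql in demo.zip for the real code
 DECLARE
   CURSOR C1 IS
     SELECT DeptNum,
            VSIZE(DeptNum) + VSIZE(DeptName) + VSIZE(Location) AS deptSize
     FROM   DEPT_Table;
   CURSOR C2 (key NUMBER) IS
     SELECT VSIZE(EmpNum) + VSIZE(EmpName) + VSIZE(FName) + VSIZE(EmpSSN)
          + VSIZE(HireDate) + VSIZE(Salary) AS empSize
     FROM   EMP_Table
     WHERE  DeptNum = key;            -- the Cluster Key reference to change
   rowSize NUMBER;
   sumX    NUMBER := 0;               -- running sum of Cluster Row sizes
   sumX2   NUMBER := 0;               -- running sum of squared sizes
   numRows NUMBER := 0;
 BEGIN
   FOR d IN C1 LOOP                   -- one Cluster Row per Cluster Key
     rowSize := NVL(d.deptSize, 0);
     FOR e IN C2(d.DeptNum) LOOP      -- inner FOR loop over the nested rows
       rowSize := rowSize + NVL(e.empSize, 0);
     END LOOP;
     sumX    := sumX  + rowSize;
     sumX2   := sumX2 + rowSize * rowSize;
     numRows := numRows + 1;
   END LOOP;
   -- sumX and sumX2 feed the one-pass (computational) standard deviation
   DBMS_OUTPUT.PUT_LINE('Average Cluster Row size: ' || ROUND(sumX / numRows));
 END;
 /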

The getSigma() procedure calculates the Cluster Group mean (mu) and sigma in one pass. (To see how this is done, see The Derivation of the Computational form of Standard Deviation.) To execute getSigma(), load getSigma.sql into sqlplus:

>sqlplus userID/password -- Enter SQLplus

SQL> @getSigma.sql       -- Load & compile getSigma() into SQLplus

SQL> call getSigma(1);

# Clusters TotalSize Average
    100      70400     704
 NumGroups AvgGroup Sigma
    100        704   93.51

The meanings of the getSigma() output values are:
# Clusters - Total number of all Cluster Rows processed
Total Size - Total size in bytes of all Cluster Rows processed
Average    - Mean (average) size of Cluster Row
NumGroups  - Total number of all  Clustered Table disk block groups
AvgGroup   - Average size of a Clustered Table disk block group (mu)
Sigma      - Standard Deviation of Clustered disk block group size

Once the average size of the proposed Cluster Row has been determined, an initial guess can be made of how many Cluster Rows (the MAXIMUM parameter) will comfortably fit in a disk block, allowing for Slack Space. This MAXIMUM parameter is determined by the following steps (restated as a single formula after the worked example):

 a) Determining the Disk Block Size, dBlkSize.
 b) From dBlkSize, Subtract 192 bytes for the
    Disk Block Header. 
 c) Starting with an initial Free Space value of 20%,
    multiply the remaining dBlkSize by 80%.
 d) Divide the result of c by the average size of the
    proposed Cluster Row.
In our test case, our values are:
 a) 4096 bytes
 b) 4096 - 192 = 3904
 c) 3904 * .80 = 3123
 d) 3123 / 704 = 4.44, rounded up to 5 for an initial trial
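
Summarized as a single formula (using the 192-byte header and the 20% free space allowance from steps b and c):

$$MAXIMUM \approx \frac{0.80 \times (dBlkSize - 192)}{mean\ Cluster\ Row\ size} = \frac{0.80 \times (4096 - 192)}{704} \approx 4.4 \rightarrow 5$$
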
With an initial value of MAXIMUM, the variance, or standard deviation, of the size of a group of that many Cluster Rows can be determined by running getSigma() again with the value from d. From the data in our example DEPT_Table and EMP_Table tables, we get the result:
SQL> call getSigma(5);

# Clusters TotalSize ClusterMean
>  100      70400      704
NumGroups GroupMean Sigma   BlockSize
>  20       3520    214.96  4096
NumSigmas within Block's Available Space
>      1.79
for Predicted disk block Chaining:
Run CDFtable.pl with NumSigma value

How to Interpret the Results

As the last line of output from getSigma() suggests, from the shell run the CDFtable.pl perl program, entering the NumSigmas value:

> CDFtable.pl
Please Enter your NumSigmas floating point value: 1.79
For 1.79 Sigma, the Area within the Upper Tail,
or the Predicted disk block Chaining is: 3.67%

Although we did not make our target of 5 Cluster Rows yielding a NumSigmas of 2 or more, which would have resulted in 2 1/2% chaining, we were close. Our NumSigmas of 1.79 results in predicted chaining 3.67% of the time. The alternatives are to live with these results, or to try again with a smaller MAXIMUM value of 4:

SQL> call getSigma(4);
# Clusters TotalSize ClusterMean
>  100      70400      704
NumGroups GroupMean Sigma   BlockSize
>  25       2816    200.5  4096
NumSigmas within Block's Available Space
>      5.43
for Predicted disk block Chaining:
Run CDFtable.pl with NumSigma value

> CDFtable.pl
Please Enter your NumSigmas floating point value: 5.43
For 5.43 Sigma, the Area within the Upper Tail,
or the Predicted disk block Chaining is: 0.00%

Accuracy of Clustered Table Calculations

There are 2 significant areas of inaccuracy in these analytical calculations. The first inaccuracy deals with the PCTFREE value.

The PCTFREE space is only used for UPDATEs to individual table rows. As stated earlier, INSERTing additional individual rows into a Cluster Row is still technically an INSERT, although it is a logical UPDATE to that Cluster Row. Thus, unless the PCTFREE value perfectly matches the balance of individual row UPDATEs versus individual row INSERTs, the actual usable available space in the disk block is likely to be less than the sum of the Slack Space and PCTFREE, and chaining is likely to exceed the predicted value.

The second inaccuracy is the question of how closely the Bell Curve actually fits our real-world data. It is a common perception that all probability curves are a form of the Bell Curve. This perception is only partially true.

Most probability curves do not fit the Bell Curve. However, by the Central Limit Theorem, if random data is taken in groups, then the larger the group, the more closely the probability curve of that group data will fit the Bell Curve.
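
Stated for our setting: if the individual Cluster Row sizes X_i are roughly independent, each with mean mu_row and variance sigma_row^2, the Central Limit Theorem says the total size of a group of n of them is approximately Normally distributed:

$$S_n = \sum_{i=1}^{n} X_i \;\approx\; N\!\left(n\,\mu_{row},\; n\,\sigma_{row}^{2}\right) \quad \text{for sufficiently large } n$$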

The Central Limit Theorem Applet demonstrates this principle very clearly with a dice-rolling experiment. As can be seen, when as few as 4 dice are rolled on the order of 10,000 times, the output fits the Bell Curve very well.

In our case, the # of dice represents the # of Cluster Rows in a disk block group, our MAXIMUM value. The number of rolls represents the total number of disk block groups. Thus, we can finally answer the question 'How large is relatively large?'. The answer is: larger than 1/4 of the available disk block space, since a Cluster Row that large forces MAXIMUM below 4, the equivalent of fewer than 4 dice.

Theory Behind getSigma.sql and CDFtable.pl

The overall theory of how this statistical analysis is applied to Oracle Table Clustering is illustrated in figure 3 and explained in the previous sections. As pointed out via the Central Limit Theorem, the Bell Curve can be used here because we are dealing with groups of Cluster Rows.

As previously stated, the getSigma() PL/SQL procedure utilizes the Computational Form of the Standard Deviation, which allows the standard deviation to be calculated in one pass. The derivation of this form is in The Derivation of the Computational form of Standard Deviation.
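
For reference, the computational form for n group sizes x_1..x_n is shown below (population divisor n; getSigma() may instead use the n-1 sample divisor):

$$\sigma = \sqrt{\frac{1}{n}\left(\sum_{i=1}^{n} x_i^{2} \;-\; \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}\right)}$$

Only the running sum and the running sum of squares need to be accumulated, which is why a single pass over the data suffices.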

The CDFtable.pl perl program implements the complement of the Cumulative Distribution Function (CDF) for the Normal Distribution (Bell Curve). The CDF function returns how much area is contained under the Bell Curve up to a point given in terms of # of Sigmas, our NumSigmas parameter. CDFtable.pl returns the complement to the CDF function, or the area under the upper tail given a point expressed in terms of # of Sigmas. As was hopefully explained, this upper tail area, expressed as a % of the total area, represents the % of time that disk block chaining can be expected.

The math behind the CDF function involves integrating the function defining the Normal Distribution from minus infinity to the point identified by the # of sigmas parameter. Since this integral has no closed-form solution, the CDFtable.pl program is simply table driven. Any value larger than 3.4 sigmas is simply returned as zero %.
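
In symbols, for a point x given in sigmas (our NumSigmas), CDFtable.pl returns the upper tail area of the standard Normal Distribution:

$$1 - \Phi(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^{2}/2}\, dt$$

For x = 1.79 this area is 0.0367, the 3.67% predicted chaining seen earlier.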

Oracle8i Nested Tables Feature

The Nested Tables feature of Oracle8i is not a replacement for Oracle Table Clustering. Nested Tables appear to accomplish the same thing as Table Clustering without all the analysis. It is true that the analysis is not needed, but that is because the nested table is still stored separately, as it is in a conventional normalized relational model.

It is the opinion of this author that the Nested Table feature exists because it may be important to a certain market segment. But since the tables are still stored separately and must be dynamically joined to present their nested view, it is hard to see what the point of it is other than to make a straightforward join complex and mysterious.


Copyright (C) 2000 by Cary Rhode,
Software Developer and self proclaimed database guru. caryrhode@yahoo.com