Exploring Table and Index Partitioning in SQL Server 2005
By Rob Garrison
Published: 12/10/2004
Reader Level: Beginner Intermediate
Rated: 3.17 by 6 member(s).
Tell a Friend
Rate this Article
Printable Version
Discuss in the Forums

Sample code can be downloaded here.

Introduction

Table partitioning is a powerful new feature in SQL Server 2005.  It is primarily a manageability feature, but my interest here is in the performance and scalability of table and index partitioning for use with very large tables.

When I started looking for table partitioning resources, Books Online seemed to be the only resource, period.  It covers the basics pretty well (see the References section for topics).  Then Kimberly Tripp published her excellent white paper.  I definitely recommend reading it.  If you like to understand the theory before trying out a new feature, read Kimberly's paper first, then use the example code here to expand your understanding of partitioning.  If you want to get your feet wet right away, read this and run the sample code first, then follow up by reading Kimberly's paper.  I will not cover the same things here as Kimberly did but will give step-by-step examples and test code, as well as compare the performance of different partitioning schemes.

Enough of that, let's get right to the code.  All of these examples use a simple work database.

USE master
GO

CREATE DATABASE [TestPart] ON PRIMARY (
    NAME       = N'TestPartPrimary',
    FILENAME   = N'C:\TestPart_Primary.MDF',
    SIZE       = 20,
    FILEGROWTH = 10%
)
LOG ON (
    NAME       = N'TestPart_Log',
    FILENAME   = N'C:\TestPart_Log.LDF',
    SIZE       = 2,
    FILEGROWTH = 10%
)
GO

USE [TestPart]
GO
 
Create a Simple Partitioned Table and Index

This table uses a bit column for the partition key and will create two partitions.

Create a Partition Function and Partition Scheme

CREATE PARTITION FUNCTION PFN_Partition2(bit)
AS RANGE RIGHT FOR VALUES (
    1 -- Partition 1 gets records where int <  1 (0)
      -- Partition 2 gets records where int >= 1 (1)
)
GO

CREATE PARTITION SCHEME PS_Partition2 AS PARTITION PFN_Partition2
ALL TO ([PRIMARY])
GO

After executing the above script, you should see this message:

Partition scheme 'PS_Partition2' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'PS_Partition2'.

Create a Partitioned Table and Index

CREATE TABLE dbo.TblPartBit (
    Col1     varchar(32)  NOT NULL,
    PartKey  bit          NOT NULL
) ON [PS_Partition2] (PartKey)
GO

CREATE NONCLUSTERED INDEX TblPartBit_CustomerID_IDX
ON dbo.TblPartBit (Col1, PartKey)
ON PS_Partition2 (PartKey)
GO

Done.  But how do you know that it's partitioned?  Look at the table properties screen.  Right-click on the table in the Object Explorer and select Properties.

partitioned table properties

When a table is partitioned, the partition scheme is listed after "Partition Scheme" and "Table Is Partitioned" is True.

For the index, right-click on the index in the Object Explorer, select Properties, then click on Storage in the left pane.

partitioned index properties

Create Customer/Order/Order-Item Partitioned Tables and Indexes

Let's move on to more-involved examples.  We will compare partitioned tables that have the same columns and indexes but different partitioning schemes.  The questions here are:

  1. If partitioning is good, how does it affect performance?
  2. What affect does the number of partitions have on performance?

These tables implement a standard customer/order/order-item database. The examples assume that the database is OLTP and heavily favors queries by customer.  The schema uses a "PartKey" column that is a persisted computed column based on a mod of the integer CustomerID.  This is a type of hash partitioning.  This allows stored procedures to take a CustomerID as input and direct the query to just the partition on which that customer will reside.  Understand that this is not pure, transparent partitioning.  It is, however, what I see as the most powerful use of table partitioning in an OLTP environment.  It allows even distribution of records within the tables and indexes.

partitioned tables e/r diagram

To understand the benefits here, take the largest example: one million customers, an average of ten orders per customer (ten million orders), and two items per order (20 million items).  Using a 100-partition table and index scheme, each order partition will have on average 1/100th of the total, or 100,000 records.  Note that 1,000 is the maximum number of partitions allowed.

One important caveat here is that to partition a unique index, the index must include the partition key.  In this example, creating a unique index on the combination of CustomerName and PartKey is not at all the same as creating a unique index on CustomerName alone.  So in these examples, the unique index is not partitioned.

Example code below is for ten-way partitions only.  The full code to create and populate all of these tables is in Code.zip.

CREATE PARTITION FUNCTION PFN_Partition10(int)
AS RANGE RIGHT FOR VALUES (
    1, -- Partition 1  holds records where int <  1             (0)
    2, -- Partition 2  holds records where int >= 1 and int < 2 (1)
    3, -- Partition 3  holds records where int >= 2 and int < 3 (2)
    4, -- Partition 4  holds records where int >= 3 and int < 4 (3)
    5, -- Partition 5  holds records where int >= 4 and int < 5 (4)
    6, -- Partition 6  holds records where int >= 5 and int < 6 (5)
    7, -- Partition 7  holds records where int >= 6 and int < 7 (6)
    8, -- Partition 8  holds records where int >= 7 and int < 8 (7)
    9  -- Partition 9  holds records where int >= 8 and int < 9 (8)
       -- Partition 10 holds records where int >= 9             (9)
)
GO

CREATE PARTITION SCHEME PS_Partition10 AS
PARTITION PFN_Partition10
ALL TO ([PRIMARY])
GO

CREATE TABLE dbo.TblCustomer10 (
    CustomerID       int               NOT NULL,
    CustomerName     varchar(80)       NOT NULL,
    PartKey AS (CustomerID % 10) PERSISTED,
    DateCreated      datetime          NOT NULL,

    CONSTRAINT TblCustomer10_PK PRIMARY KEY CLUSTERED (CustomerID, PartKey)
) ON [PS_Partition10] (PartKey)

CREATE UNIQUE NONCLUSTERED INDEX TblCustomer10_CustomerName_UIDX
ON dbo.TblCustomer10 (CustomerName)
WITH FILLFACTOR = 80
ON [PRIMARY]                          -- Note that this index is not partitioned

CREATE TABLE dbo.TblOrder10 (
    OrderID          bigint            NOT NULL   IDENTITY(1,1),
    PartKey          int               NOT NULL,
    CustomerID       int               NOT NULL,
    OrderGUID        uniqueidentifier  NOT NULL,
    OrderDate        datetime          NOT NULL,

    CONSTRAINT TblOrder10_PK PRIMARY KEY CLUSTERED (OrderID, PartKey),
    CONSTRAINT TblOrder10_CustomerID_FK FOREIGN KEY (CustomerID, PartKey)
        REFERENCES dbo.TblCustomer10 (CustomerID, PartKey)
) ON [PS_Partition10] (PartKey)

CREATE NONCLUSTERED INDEX TblOrder10_CustomerID_OrderID_IDX
ON dbo.TblOrder10 (CustomerID, PartKey)
WITH FILLFACTOR = 80
ON [PS_Partition10] (PartKey)

CREATE TABLE dbo.TblOrderItem10 (
    OrderItemID      bigint            NOT NULL   IDENTITY(1,1),
    PartKey          int               NOT NULL,
    OrderID          bigint            NOT NULL,
    ProductID        int               NOT NULL,

    CONSTRAINT TblOrderItem10_PK PRIMARY KEY CLUSTERED (OrderItemID, PartKey),
    CONSTRAINT TblOrderItem10_CustomerID_FK FOREIGN KEY (OrderID, PartKey)
        REFERENCES dbo.TblOrder10 (OrderID, PartKey)
) ON [PS_Partition10] (PartKey)

CREATE NONCLUSTERED INDEX TblOrderItem10_OrderID_IDX
ON dbo.TblOrderItem10 (OrderID, PartKey)
WITH FILLFACTOR = 80
ON [PS_Partition10] (PartKey)
GO

Performance

Read Performance

What is the read-performance benefit or penalty of table and index partitioning?  Well, let's just say we're hoping to see better results from subsequent betas and the final RTM product.  The execution plan statistics look great.  The actual scripted read tests show partitioned tables to be significantly slower than non-partitioned tables.

Read Performance - Execution Plan Statistics

In each test, there were ten orders per customer and two items per order.  Orders were placed against random customers, so some customers have no orders and some have several.

The percentages here are out of 100% using the "Include Actual Execution Plan" statistics from SQL Server Management Studio.  All tests were run on a single-processor desktop with a 2.66 GHz P4, 1 GB RAM, and a single-spindle 80 GB IDE drive running Windows 2003 Server and SQL Server 2005 Beta 2.  Your mileage may vary.

Query Partition
Count
Customers (x 1000)
1 10 100 1000
1 1 93% 93% 93% 93%
10 2% 1% 1% 2%
100 6% 5% 5% 5%
2* 1 42% 36% 30% 23%
10 26% 28% 24% 38%
100 31% 35% 46% 39%
3 1 89% 90% 90% 93%
10 4% 3% 3% 2%
100 7% 7% 7% 5%

  * Query 2 uses the non-partitioned index.

Read Performance - Scripted Random Reads

Each test performs 1,000 queries against random customers.  The query is similar to Query 1 above but returns just the CustomerName to a local variable so that there is no result set.

SELECT @customerName = c.CustomerName
FROM dbo.TblCustomer10  AS c
JOIN dbo.TblOrder10     AS o
ON    c.CustomerID = o.CustomerID
  AND c.PartKey    = o.PartKey
JOIN dbo.TblOrderItem10 AS i
ON    o.OrderID = i.OrderID
  AND o.PartKey = i.PartKey
WHERE c.CustomerID = @customerID
  AND c.PartKey    = @partKey

Partition
Count
Customers (x 1000)
1 10 100 1000
Time Factor Time Factor Time Factor Time Factor
1 141 ms baseline 160 ms baseline 168 ms baseline 65.3 sec baseline
10 445 ms 3.2x 1981 ms 12.5x 16067 ms 95.6x 226.3 sec 3.6x
100 3192 ms 22.7x 4520 ms 28.3x 21821 ms 129.9x 230.9 sec 3.5x

Write Performance

What is the write-performance penalty of table and index partitioning?

Each test writes 1,000 records to each of the three tables used above.  Recovery model is set to "Full".  The tests were run four times and results were averaged.

Partition
Count
System with
No Load
System with
Heavy Load
Time Factor Time Factor
1 1.40 sec baseline 44.99 sec baseline
10 2.32 sec 1.7x 52.19 sec 1.2x
100 2.61 sec 1.9x 60.47 sec 1.3x
1000 111.44 sec 79.5x 472.95 sec 10.5x

Conclusion

If your tables are relatively small, you likely haven't even read this far.  For large tables, partitioning of tables and indexes in the final version of SQL Server 2005 may be a win.

For large tables, consider between 10 and 100 partitions.  Avoid using more than 100 partitions unless given specific guidance.

As always, experiment and test.  Then wait for the next version of the beta and experiment and test again.  The optimal solution will depend greatly on your access patterns, your hardware, and your configuration.  It will be very interesting to see how these performance numbers change as SQL Server 2005 gets closer to RTM.

Left as an Exercise for the Reader ...

There is much more to partitioning than is covered here.  All these examples use the PRIMARY filegroup.  In a real implementation, you would want to use multiple filegroups for manageability and possibly performance.  Kimberly's paper has examples with multiple filegroups.  Also, a real implementation would have much more powerful hardware than I used here.  This could greatly affect the performance statistics.

References
  • "SQL Server 2005 Partitioned Tables and Indexes" by Kimberly L. Tripp
  • SQL Server 2005 Books Online
    • "Partition Functions"
    • "Partition Schemes"
    • "Partitioned Tables"
    • "Partitioned Tables and Indexes"
    • "Partitioned Indexes"
    • "Taking Advantage of Multiple Disk Drives"
    • $PARTITION
    • sys.data_spaces
    • sys.destination_data_spaces
    • sys.partition_*
    • sys.index_columns.partition_ordinal
    • sys.partitions


Marketplace
(Sponsored Links)
What are the green links?
   



 
Copyright 2007 CMP Tech LLC | Hosted By SecureWebs.com
Privacy Policy (4/10/06) | Your California Privacy Rights (4/10/06) | Terms of Service | Advertising Info | About Us | Help