SQL Server Performance - Common Symptoms and Tools (Part II)
By Microsoft Team
Published: 11/12/2003
Reader Level: Beginner Intermediate
Rated: 4.00 by 3 member(s).
Tell a Friend
Rate this Article
Printable Version
Discuss in the Forums

Customers are normally good at identifying processor and memory bottlenecks, but they rarely start a support call with a statement like, "My disk subsystem is a bottleneck." Therefore, identifying and resolving disk (I/O) subsystem bottlenecks are difficult cases. The cases usually start with, "The overall performance has degraded over time", "It is just slow", or "The workload has increased and everything is slow." The customer can't pinpoint one query or workload that is causing the slow performance.

There are three areas of data collection to troubleshoot disk I/O bottlenecks. The first is System Monitor's Physical Disk object. System Monitor should capture all counters and all instances prior, during, and after the slow performance periods, if possible. (Please note that the Physical Disk counters are not enabled by default on Windows NT. But on Windows 2000, Windows XP, and Windows 2003 servers they are enabled by default. For enabling disk counters on NT systems, run "diskperf –y" from a command prompt and restart the server box for it to take affect.) Secondly, the database file layout—what drives are the .mdf, .ndf, and .ldf files residing on for each database? Thirdly, what is the configuration of each drive? This includes what type of raid configuration and how many underlying spindles (disk drives) make up the raid configuration. Information about the number and type of disk controllers can also be helpful.

Disk Layout Best Practices

As a rule of thumb, there are two guidelines for laying your disks:

    1. Database files that are accessed randomly. Typically data files (.mdf and .ndf) service random I/O. SQL Server does not know what query will be requested next or where on the disk platter the data resides. This causes random I/O for which the disk platter will spin to find data.
    2. Database files that are accessed sequentially. Transaction log (.ldf) activity is sequential. Sequential I/O's access data sectors that are adjacent to each other and have a faster seek time, and therefore have a higher throughput than random I/O.

If the database files are placed on separate disks, another distinct advantage is that it will benefit from parallel I/O. The data files that are accessed via random I/O take priority over sequential I/O.

When the data files and transaction log files of a database are on the same physical disk, the transaction log writes (sequential I/O) should wait till the random I/O is done, which reduces the throughput of the transaction logging. The recommended configuration for good performance is to separate the log files and data files onto separate dedicated disks.

On a side note, it is also not recommended to have transaction log files on RAID 5 because this slows down logging performance. Depending on your business reasons you can choose RAID 5 or any other RAID for the data files. But for log files, RAID 10 is always recommended.

Disk drive performance can be broken down into four areas:

    1. Seek time: The time required to move to the track that holds the data.
    2. Rotational latency: The time required for that platter to rotate under the head.
    3. Time required to transfer data from the disk drive to the disk controller.
    4. Time for device drivers in the operating system, which is usually negligible.

The sum of these components is the time an I/O takes.

Disk drives have physical limitations on how many I/Os a disk is capable of handling. Depending on the workload, this limit can be reached. When the number of I/O requests exceeds the disks I/O capacity, the I/O requests will take longer and be queued for its turn on the disk. Because of this, spreading out data files across multiple physical drives allows for parallel I/O. Adding more physical disks and spreading the data files among them can resolve a lot of disk subsystem bottlenecks. As a general rule, a disk drive can handle about 125 random I/O's per second and 225 sequential I/O's per second. These numbers are general guidelines and can differ based on the hardware. See the disk drives specifications to determine actual numbers. Also note that a disk drive's performance degrades when it is at 85 percent of its capacity.

Suppose from the disk drive specifications you find out that the average seek time is 3.9 ms. The average rotational latency is 2 ms. Then the theoretical disk I/O capacity is (3.9ms + 2ms) / 1000 = 169 I/Os per second. The theoretical limit is almost never reached, so take 75 percent of the theoretical limit (169 * .75) = 127 I/O's per second. This gives you a disk I/O capacity to work with.

Investigating the System Monitor

In the System Monitor there are five Physical Disk counters for each physical disk contained in the Logical Drive that are key to identifying I/O bottleneck. Each disk should be investigated individually.

    1. Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
    2. Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
    3. Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.
    4. Disk Reads/sec is the rate of read operations on the disk.
    5. Disk Writes/sec is the rate of write operations on the disk.

First, on the System Monitor properties window, slide the time interval to only include the period of "slow performance." Then calculate the I/Os per disk as follows.

Suppose you have a RAID 5 configuration with four physical disks, and you had the following information from the System Monitor.

Disk Reads/sec            120
Disk Writes/sec           150
Avg. Disk Queue Length     12
Avg. Disk Sec/Read       .035
Avg. Disk Sec/Write      .045

Raid Calculations:

Raid 0 -- I/Os per disk = (reads + writes) / number of disks
Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2
Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks

You can use the above RAID calculations to determine the I/Os per disk. 120 reads + (4 * 150 writes) / 4 physical disks = 180 I/Os per disk. This is over the general rule of 125 I/Os per disk capacity. This system has a disk bottleneck.

You can determine that with 720 total I/Os, six physical disks would be required to resolve the disk bottleneck. (720 / 125= 5.76). Adding even more disks will provide room for growth. You should calculate for peak workloads and future growth.

The Average Disk queue length is the total for all physical disks in the RAID configuration. In the example above, 12 for avg. disk queue length / 4 physical disk = 3 queued I/Os per disk. Anything over 2 is reason to investigate further.

Next look at average disk per seconds. In general, Average Disk Sec/Read of 11 ms -15 ms or lower is good. Average Disk Sec/Write of 12 ms or lower are good. Anything above this number is reason to investigate further.

Additional Comments

You should not rely on one counter to determine a bottleneck. Look for multiple counters to confirm your analysis.

See if SQL Server is doing a lot of I/O. Tools like SQL Profiler and Blocker Script output can tell you if the query(s) are resulting in these large I/Os, and tuning or optimizing these individual queries from a statistics and indexes point of view may help from the disk I/O perspective. An improper index strategy can add to a disk subsystem workload. For example, instead of doing an Index Seek, the SQL Server Optimizer may choose to do a Table/Index Scan, resulting in a lot of unnecessary I/O.

If the disk subsystem is a bottleneck, use the Index Tuning Wizard to go through a proper workload of the production load (captured in the form of a trace file) and see if it recommends any additional indexes.

You have seen that adding appropriate indexes reduces load on disks, thus reducing the disk I/O. If the proper indexes do exist, adding more spindles to the RAID configuration may also help.

Your disk subsystem should be able to handle your peak workload. Suppose that the slow performance is for thirty minutes each day; the thirty-minute interval should be investigated. If you use System Monitor counters for the entire day, the averages will show that there is no disk bottleneck.

© 2003 Microsoft



Marketplace
(Sponsored Links)
What are the green links?
   



 
Copyright 2007 CMP Tech LLC | Hosted By SecureWebs.com
Privacy Policy (4/10/06) | Your California Privacy Rights (4/10/06) | Terms of Service | Advertising Info | About Us | Help