Using Ranking and Windowing Functions in SQL Server 2005
By Stephen Forte
Published: 3/24/2005
Reader Level: Beginner Intermediate
Rated: 4.67 by 6 member(s).
Tell a Friend
Rate this Article
Printable Version
Discuss in the Forums

SQL Server 2005 is chock full of new features. You may have heard that you can write stored procedures in a CLR language like C# or VB .NET, but TSQL is here to stay. One new great feature adds the functionality of a Ranking expression that can be added to your result set that is based on a ranking algorithm being applied to a column that you specify. This will come in handy in .NET applications for paging and sorting in a grid as well as many other scenarios. We are going to take a look at the Ranking functions new to SQL Server 2005 using the new AdventureWorks database on the February 2005 Community Tech Preview.

ROW_NUMBER()

The most basic ranking function is ROW_NUMBER(). ROW_NUMBER() returns a column as an expression that contains the row’s number within the result set. This is only a number used in the context of the result set, if the result changes, the ROW_NUMBER() will change. The ROW_NUMBER() expression takes an ORDER BY statement with the column you want to use for the row count with an OVER operator as shown here:

Select
  SalesOrderID, CustomerID, Row_Number() Over (Order By SalesOrderID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  SalesOrderID

Results are shown here:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43659        676         1
43660        117         2
43661        442         3
43662        227         4
43663        510         5
43664        397         6
43665        146         7
43666        511         8
43667        646         9
...More

Alternatively if you have an ORDER BY clause in your result set different than your ORDER BY in your ROW_NUMBER() expression

--Row_Number using a unique value, different order by
Select
  SalesOrderID, CustomerID, Row_Number() Over (Order By SalesOrderID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  CustomerID --Different ORDER BY than in Row_NUMBER

The result is shown here:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43860        1           202
44501        1           843
45283        1           1625
46042        1           2384
46976        2           3318
47997        2           4339
49054        2           5396
...More

If you choose the ROW_NUMBER() function to run against a non-unique column, it will break the tie and still produce a running count so no rows will have the same number. For example, CUSTOMERID can repeat in this example and there will be several ties, SQL Server will just produce a monotonically increasing number, which means nothing other than the number in the result set as shown here:

Select
  SalesOrderID, CustomerID, Row_Number() Over (Order By CustomerID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  CustomerID

The result are shown here:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43860        1           1
44501        1           2
45283        1           3
46042        1           4
46976        2           5
47997        2           6
49054        2           7
50216        2           8
51728        2           9
57044        2           10
63198        2           11
69488        2           12
44124        3           13
. . . More

You can also use RANK as an expression in a WHERE clause. SQL Server returns an error when you try to refer to a Ranking function in a WHERE clause. You can still accomplish this by using a Common Table Expression (a new feature, a non-persisted view).

--use a common table expression if you want
--to filter by one of the rows that contain a
--ranking function since ranking functions
--are not allowed in where or having clauses

With NumberRows
As
(
  Select
    SalesOrderID, CustomerID, Row_Number() Over (Order By SalesOrderID) as RowNumber
  From
    Sales.SalesOrderHeader
)
Select
  *
From
  NumberRows
Where
  RowNumber Between 100 And 200
Order By
  SalesOrderID

The result is shown here:

SalesOrderID CustomerID  RowNumber
------------ ----------- --------------------
43758        27646       100
43759        13257       101
43760        16352       102
43761        16493       103
43762        27578       104
43763        16525       105
43764        16612       106
43765        11010       107
43766        16518       108
43767        11001       109
43768        27649       110
43769        21659       111
43770        27614       112
…More

You also have to use a Common Table Expression (CTE) if you want to rank by a GROUP BY aggregate. In this case we will use a SUM() in a CTE and then apply the ROW_NUMBER() after the aggregation.

--rank by totaldue, summed
--need a CTE to do a sum
--so this example will have a
--customerID summed with all of
--their orders

With CustomerSum
As
(
  Select
    CustomerID, Sum(totaldue) As totalamt
  From
    Sales.SalesOrderHeader
  Group By
    CustomerID

--this appends a row_number to the end of the resultset
Select
  *, Row_Number() Over (Order By totalamt Desc) as RowNumber
From
  CustomerSum

The results look like this:

CustomerID  totalamt              RowNumber
----------- --------------------- --------------------
678         1179857.4657          1
697         1179475.8399          2
170         1134747.4413          3
328         1084439.0265          4
514         1074154.3035          5
155         1045197.0498          6
72          1005539.7181          7
227         984324.0473           8
433         983871.933            9
166         979881.3491           10
146         964134.7777           11
670         946105.7121           12
506         937466.3027           13
...More

RANK()

RANK() works a lot like ROW_NUMBER() except that it will not break ties, you will not get a unique value for ties.

Select
  SalesOrderID, CustomerID, RANK() Over (Order By CustomerID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  CustomerID

The results look like this:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43860        1           1
44501        1           1
45283        1           1
46042        1           1
46976        2           5
47997        2           5
49054        2           5
50216        2           5
51728        2           5
57044        2           5
63198        2           5
69488        2           5
44124        3           13
. . . More

The next example will round the TOTALDUE field to the nearest hundredth and uses a CTE to RANK() over that derived field.

--rank by totaldue, summed and rounded (nearest 100)
--need a CTE to do a sum & rounding
--so this example will have a
--customerID summed with all of
--their orders

With CustomerSum
As
(
  Select
    CustomerID, Round(Convert(Int, Sum(totaldue))/100,8) *100 As totalamt
  From
    Sales.SalesOrderHeader
  Group By
    CustomerID
)
Select
  *, Rank() Over (Order By totalamt Desc) as Rank
From
  CustomerSum

The results look like this:

CustomerID  totalamt    Rank
----------- ----------- --------------------
678         1179800     1
697         1179400     2
170         1134700     3
328         1084400     4
514         1074100     5
155         1045100     6
72          1005500     7
227         984300      8
433         983800      9
166         979800      10
146         964100      11
670         946100      12
...More

DENSE_RANK() and NTILE(n)

DENSE_RANK() works exactly like RANK() but will remove the skipping of numbers in the tie.

Select
  SalesOrderID, CustomerID, DENSE_RANK() Over (Order By CustomerID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID>10000
Order By
  CustomerID

The results look like this:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43860        1           1
44501        1           1
45283        1           1
46042        1           1
46976        2           2
47997        2           2
49054        2           2
50216        2           2
51728        2           2
57044        2           2
63198        2           2
69488        2           2
44124        3           3
. . . More

NTile(n) will evenly divide all the results into approximately even pieces and assign each piece by the same number in the result set. A perfect example is the percent of 100 (for an examination in University for example, or a percentile of runners in a road race).

Select
  SalesOrderID, CustomerID, NTILE(10000) Over (Order By CustomerID) as RunningCount
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  CustomerID

The results look like this:

SalesOrderID CustomerID  RunningCount
------------ ----------- --------------------
43860        1           1
44501        1           1
45283        1           1
46042        1           1
46976        2           2
47997        2           2
49054        2           2
50216        2           2
51728        2           3
57044        2           3
63198        2           3
69488        2           3
44124        3           4
. . . More

One last example will bring these all together in one TSQL statement and show the difference between all four ranking functions.

--Ranking All
Use AdventureWorks
Select
  SalesOrderID as OrderID,
  CustomerID,
  Row_Number() Over (Order By CustomerID) as RowNum,
  RANK() Over (Order By CustomerID) as Rank,
  DENSE_RANK() Over (Order By CustomerID) as DRank,
  NTILE(10000) Over (Order By CustomerID) as NTile
From
  Sales.SalesOrderHeader
Where
  SalesOrderID > 10000
Order By
  CustomerID

The results look like this:

OrderID     CustomerID  RowNum  Rank   DRank  NTile
----------- ----------- ------- ------ ------ --------
43860       1           1       1      1      1
44501       1           2       1      1      1
45283       1           3       1      1      1
46042       1           4       1      1      1
46976       2           5       5      2      2
47997       2           6       5      2      2
49054       2           7       5      2      2
50216       2           8       5      2      2
51728       2           9       5      2      3
57044       2           10      5      2      3
63198       2           11      5      2      3
69488       2           12      5      2      3
44124       3           13      13     3      4
44791       3           13      13     3      4
...More

PARTITION BY

The ranking functions can also combine with windowing functions. A windowing function will divide a resultset into equal partitions based on the values of your PARTITION BY statement in conjunction with the OVER clause in your ranking function. It is like applying a GROUP BY to your ranking function-you get a separate ranking for each partition. The example below uses ROW_NUMBER with PARTITION BY to count the number of orders by order date by salesperson. We do this with a PARTITION BY SalesPersonID OVER OrderDate. This can be used with any of the four ranking functions.

Select
  SalesOrderID, SalesPersonID, OrderDate, Row_NUMBER() Over (Partition By SalesPersonID Order By OrderDate) as OrderRank
From
  Sales.SalesOrderHeader
Where
  SalesPersonID is not null

The results look like this:

SalesOrderID SalesPersonID OrderDate                OrderRank
------------ ------------- ------------------------ ---------
43659        279           2001-07-01 00:00:00.000  1
43660        279           2001-07-01 00:00:00.000  2
43681        279           2001-07-01 00:00:00.000  3
43684        279           2001-07-01 00:00:00.000  4
43685        279           2001-07-01 00:00:00.000  5
43694        279           2001-07-01 00:00:00.000  6
43695        279           2001-07-01 00:00:00.000  7
43696        279           2001-07-01 00:00:00.000  8
43845        279           2001-08-01 00:00:00.000  9
43861        279           2001-08-01 00:00:00.000  10
. . . More
48079        287           2002-11-01 00:00:00.000  1
48064        287           2002-11-01 00:00:00.000  2
48057        287           2002-11-01 00:00:00.000  3
47998        287           2002-11-01 00:00:00.000  4
48001        287           2002-11-01 00:00:00.000  5
48014        287           2002-11-01 00:00:00.000  6
47982        287           2002-11-01 00:00:00.000  7
47992        287           2002-11-01 00:00:00.000  8
48390        287           2002-12-01 00:00:00.000  9
48308        287           2002-12-01 00:00:00.000  10
. . . More

PARTITION BY supports other SQL Server aggregate functions including MIN and MAX.



Marketplace
(Sponsored Links)
What are the green links?
   



 
Copyright 2007 CMP Tech LLC | Hosted By SecureWebs.com
Privacy Policy (4/10/06) | Your California Privacy Rights (4/10/06) | Terms of Service | Advertising Info | About Us | Help