SQL Server 2005 is chock full of new features. You may have heard that you can
write stored procedures in a CLR language like C# or VB .NET, but TSQL is here
to stay. One new great feature adds the functionality of a Ranking expression
that can be added to your result set that is based on a ranking algorithm being
applied to a column that you specify. This will come in handy in .NET
applications for paging and sorting in a grid as well as many other scenarios.
We are going to take a look at the Ranking functions new to SQL Server 2005
using the new AdventureWorks database on the February 2005 Community Tech
Preview.
ROW_NUMBER()
The most basic ranking function is ROW_NUMBER(). ROW_NUMBER() returns a column
as an expression that contains the row’s number within the result set. This is
only a number used in the context of the result set, if the result changes, the
ROW_NUMBER() will change. The ROW_NUMBER() expression takes an ORDER BY
statement with the column you want to use for the row count with an OVER
operator as shown here:
Select
SalesOrderID, CustomerID, Row_Number() Over (Order By SalesOrderID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
SalesOrderID
Results are shown here:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43659 676
1
43660 117
2
43661 442
3
43662 227
4
43663 510
5
43664 397
6
43665 146
7
43666 511
8
43667 646
9
...More
Alternatively if you have an ORDER BY clause in your result set different than
your ORDER BY in your ROW_NUMBER() expression
--Row_Number using a unique value, different order
by
Select
SalesOrderID, CustomerID, Row_Number() Over (Order By SalesOrderID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
CustomerID --Different ORDER BY than in Row_NUMBER
The result is shown here:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43860 1
202
44501 1
843
45283 1
1625
46042 1
2384
46976 2
3318
47997 2
4339
49054 2
5396
...More
If you choose the ROW_NUMBER() function to run against a non-unique column, it
will break the tie and still produce a running count so no rows will have the
same number. For example, CUSTOMERID can repeat in this example and there will
be several ties, SQL Server will just produce a monotonically increasing
number, which means nothing other than the number in the result set as shown
here:
Select
SalesOrderID, CustomerID, Row_Number() Over (Order By CustomerID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
CustomerID
The result are shown here:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43860 1
1
44501 1
2
45283 1
3
46042 1
4
46976 2
5
47997 2
6
49054 2
7
50216 2
8
51728 2
9
57044 2
10
63198 2
11
69488 2
12
44124 3
13
. . . More
You can also use RANK as an expression in a WHERE clause. SQL Server returns an
error when you try to refer to a Ranking function in a WHERE clause. You can
still accomplish this by using a Common Table Expression (a new feature, a
non-persisted view).
--use a common table expression if you want
--to filter by one of the rows that contain a
--ranking function since ranking functions
--are not allowed in where or having clauses
With NumberRows
As
(
Select
SalesOrderID, CustomerID, Row_Number() Over (Order By
SalesOrderID) as RowNumber
From
Sales.SalesOrderHeader
)
Select
*
From
NumberRows
Where
RowNumber Between 100 And 200
Order By
SalesOrderID
The result is shown here:
SalesOrderID CustomerID RowNumber
------------ ----------- --------------------
43758 27646 100
43759 13257 101
43760 16352 102
43761 16493 103
43762 27578 104
43763 16525 105
43764 16612 106
43765 11010 107
43766 16518 108
43767 11001 109
43768 27649 110
43769 21659 111
43770 27614 112
…More
You also have to use a Common Table Expression (CTE) if you want to rank by a
GROUP BY aggregate. In this case we will use a SUM() in a CTE and then apply
the ROW_NUMBER() after the aggregation.
--rank by totaldue, summed
--need a CTE to do a sum
--so this example will have a
--customerID summed with all of
--their orders
With CustomerSum
As
(
Select
CustomerID, Sum(totaldue) As totalamt
From
Sales.SalesOrderHeader
Group By
CustomerID
)
--this appends a row_number to the end of the resultset
Select
*, Row_Number() Over (Order By totalamt Desc) as RowNumber
From
CustomerSum
The results look like this:
CustomerID
totalamt
RowNumber
----------- --------------------- --------------------
678 1179857.4657
1
697 1179475.8399
2
170 1134747.4413
3
328 1084439.0265
4
514 1074154.3035
5
155 1045197.0498
6
72 1005539.7181
7
227 984324.0473
8
433 983871.933
9
166 979881.3491
10
146 964134.7777
11
670 946105.7121
12
506 937466.3027
13
...More
RANK()
RANK() works a lot like ROW_NUMBER() except that it will not break ties, you
will not get a unique value for ties.
Select
SalesOrderID, CustomerID, RANK() Over (Order By CustomerID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
CustomerID
The results look like this:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43860 1
1
44501 1
1
45283 1
1
46042 1
1
46976 2
5
47997 2
5
49054 2
5
50216 2
5
51728 2
5
57044 2
5
63198 2
5
69488 2
5
44124 3
13
. . . More
The next example will round the TOTALDUE field to the nearest hundredth and
uses a CTE to RANK() over that derived field.
--rank by totaldue, summed and rounded (nearest
100)
--need a CTE to do a sum & rounding
--so this example will have a
--customerID summed with all of
--their orders
With CustomerSum
As
(
Select
CustomerID, Round(Convert(Int, Sum(totaldue))/100,8) *100 As
totalamt
From
Sales.SalesOrderHeader
Group By
CustomerID
)
Select
*, Rank() Over (Order By totalamt Desc) as Rank
From
CustomerSum
The results look like this:
CustomerID totalamt Rank
----------- ----------- --------------------
678
1179800 1
697
1179400 2
170
1134700 3
328
1084400 4
514
1074100 5
155
1045100 6
72
1005500 7
227
984300 8
433
983800 9
166
979800 10
146
964100 11
670
946100 12
...More
DENSE_RANK() and NTILE(n)
DENSE_RANK() works exactly like RANK() but will remove the skipping of numbers
in the tie.
Select
SalesOrderID, CustomerID, DENSE_RANK() Over (Order By CustomerID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID>10000
Order By
CustomerID
The results look like this:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43860 1
1
44501 1
1
45283 1
1
46042 1
1
46976 2
2
47997 2
2
49054 2
2
50216 2
2
51728 2
2
57044 2
2
63198 2
2
69488 2
2
44124 3
3
. . . More
NTile(n) will evenly divide all the results into approximately even pieces and
assign each piece by the same number in the result set. A perfect example is
the percent of 100 (for an examination in University for example, or a
percentile of runners in a road race).
Select
SalesOrderID, CustomerID, NTILE(10000) Over (Order By CustomerID) as
RunningCount
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
CustomerID
The results look like this:
SalesOrderID CustomerID RunningCount
------------ ----------- --------------------
43860 1
1
44501 1
1
45283 1
1
46042 1
1
46976 2
2
47997 2
2
49054 2
2
50216 2
2
51728 2
3
57044 2
3
63198 2
3
69488 2
3
44124 3
4
. . . More
One last example will bring these all together in one TSQL statement and show
the difference between all four ranking functions.
--Ranking All
Use AdventureWorks
Select
SalesOrderID as OrderID,
CustomerID,
Row_Number() Over (Order By CustomerID) as RowNum,
RANK() Over (Order By CustomerID) as Rank,
DENSE_RANK() Over (Order By CustomerID) as DRank,
NTILE(10000) Over (Order By CustomerID) as NTile
From
Sales.SalesOrderHeader
Where
SalesOrderID > 10000
Order By
CustomerID
The results look like this:
OrderID CustomerID RowNum
Rank DRank NTile
----------- ----------- ------- ------ ------ --------
43860
1
1 1
1 1
44501
1
2 1
1 1
45283
1
3 1
1 1
46042 1
4 1
1 1
46976
2
5 5
2 2
47997
2
6 5
2 2
49054
2
7 5
2 2
50216
2
8 5
2 2
51728
2
9 5
2 3
57044
2
10 5
2 3
63198
2
11 5
2 3
69488
2
12 5
2 3
44124
3
13 13
3 4
44791
3
13 13
3 4
...More
PARTITION BY
The ranking functions can also combine with windowing functions. A windowing
function will divide a resultset into equal partitions based on the values of
your PARTITION BY statement in conjunction with the OVER clause in your ranking
function. It is like applying a GROUP BY to your ranking function-you get a
separate ranking for each partition. The example below uses ROW_NUMBER with
PARTITION BY to count the number of orders by order date by salesperson. We do
this with a PARTITION BY SalesPersonID OVER OrderDate. This can be used with
any of the four ranking functions.
Select
SalesOrderID, SalesPersonID, OrderDate, Row_NUMBER() Over (Partition By
SalesPersonID Order By OrderDate) as OrderRank
From
Sales.SalesOrderHeader
Where
SalesPersonID is not null
The results look like this:
SalesOrderID SalesPersonID
OrderDate
OrderRank
------------ ------------- ------------------------ ---------
43659
279 2001-07-01
00:00:00.000 1
43660
279 2001-07-01
00:00:00.000 2
43681
279 2001-07-01
00:00:00.000 3
43684
279 2001-07-01
00:00:00.000 4
43685
279 2001-07-01
00:00:00.000 5
43694
279 2001-07-01
00:00:00.000 6
43695
279 2001-07-01
00:00:00.000 7
43696
279 2001-07-01
00:00:00.000 8
43845
279 2001-08-01
00:00:00.000 9
43861
279 2001-08-01
00:00:00.000 10
. . . More
48079
287 2002-11-01
00:00:00.000 1
48064
287 2002-11-01
00:00:00.000 2
48057
287 2002-11-01
00:00:00.000 3
47998
287 2002-11-01
00:00:00.000 4
48001
287 2002-11-01
00:00:00.000 5
48014
287 2002-11-01
00:00:00.000 6
47982
287 2002-11-01
00:00:00.000 7
47992
287 2002-11-01
00:00:00.000 8
48390
287 2002-12-01
00:00:00.000 9
48308
287 2002-12-01
00:00:00.000 10
. . . More
PARTITION BY supports other SQL Server aggregate functions including MIN and
MAX.