posted on Thursday, August 04, 2005 3:05 PM
by
amachanic
Swinging From Tree to Tree Using CTEs, Part 1: Adjacency to Nested Sets
I'm not sure how many times over the last several years I've seen the
same tired article titles... "Climbing Trees in SQL," "Climbing Up the
SQL Tree," or maybe, "Naked Coeds Playing in the Trees!" ... Oh
wait, I think that last one might be something else.
But anyway, the point is, I'm going to adhere to that standard.
But I'm adding a Tarzan-esque theme to this post, because we're not
going to just climb a tree. We're going to swing about between
trees. Different types of tree representations are appropriate
for different scenarios. Which is why, as I pointed out in my
recent SearchSQLServer article on recursive Common Table Expressions,
we have so many different ways of representing them. Adjacency
Lists, Materialized Paths, Nested Sets, and Nested Intervals spring to
mind. And there are probably others.
My article shows how to use the CTEs, in conjunction with a dynamically
generated materialized path, to manipulate an Adjacency List, getting
many of the benefits associated with using the Nested Sets Model.
And that's great. But the Nested Sets Model itself might be
useful in your endeavors. So in this post, I will show how to
extend the CTE discussed in that article.
The end-result CTE in the article, which can be run in the
AdventureWorks database, looks something like this (renamed for the
sake of this post):
WITH EmployeeLevels AS
(
SELECT
EmployeeId,
CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
1 AS Level
FROM HumanResources.Employee
WHERE ManagerId IS NULL
UNION ALL
SELECT
e.EmployeeId,
x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
x.Level + 1 AS Level
FROM EmployeeLevels x
JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
)
thePath is the materialized path, a '.' delimited breadcrumb from the
root to each node. We also end up with a level, which represents
how many nodes away from the root in the hierarchy each employee sits
(very important for those upwardly-mobile junior execs, no doubt!)
I'm going to assume that readers of this post are familiar with the
Nested Sets Model, as popularized by Joe Celko. If not, read the link.
After staring at the Nested Sets for a while, I arrived at the following mathematical properties:
- Value of Lft for the root node is 1
- Value of Rgt for the root node is 2 * (Number of nodes)
- Value of Lft for any node is ((Number of nodes visited) * 2) - (Level of current node)
- Value of Rgt for any node is (Lft value) + ((Number of subnodes) * 2) + 1
I think the only factor here that requires further explanation is
"(Number of nodes visited)". By this, I mean the number of
nodes that would have been visited (including the current node) if one
were doing a
preorder traversal of the tree. Luckily, this number is quite easy
to determine. The row number for each row, as determined by
ordering by the materialized path,
is
this number. I
encourage readers to validate this with pencil and paper if it doesn't
quite make sense reading on the screen. Draw a simple tree and
traverse, starting from the lefthand side of the root node.
But how to translate that into T-SQL? Luckily, SQL Server 2005, in addition to CTEs, also includes the very useful
ROW_NUMBER() function.
So to get the row number, we simply need to add another CTE into the
chain (did you know that successive CTEs can use the results of
previous CTEs?):
WITH EmployeeLevels AS
(
SELECT
EmployeeId,
CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
1 AS Level
FROM HumanResources.Employee
WHERE ManagerId IS NULL
UNION ALL
SELECT
e.EmployeeId,
x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
x.Level + 1 AS Level
FROM EmployeeLevels x
JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
),
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
ROW_NUMBER() OVER (ORDER BY thePath) AS Row
FROM EmployeeLevels
)
We now have current level and number of nodes visited, which gives us
the Lft value for each node. But how to determine the number of
subnodes, in order to get the Rgt value? Luckily, the materialized
path also gives us that capability...
For any given node, number of subnodes can be determined by counting
all nodes whose path value is LIKE the current path value + '.%'.
The resultant query should be fairly obvious at this point:
WITH EmployeeLevels AS
(
SELECT
EmployeeId,
CONVERT(VARCHAR(MAX), EmployeeId) AS thePath,
1 AS Level
FROM HumanResources.Employee
WHERE ManagerId IS NULL
UNION ALL
SELECT
e.EmployeeId,
x.thePath + '.' + CONVERT(VARCHAR(MAX), e.EmployeeId) AS thePath,
x.Level + 1 AS Level
FROM EmployeeLevels x
JOIN HumanResources.Employee e on e.ManagerId = x.EmployeeId
),
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
ROW_NUMBER() OVER (ORDER BY thePath) AS Row
FROM EmployeeLevels
)
SELECT
ER.EmployeeId,
ER.thePath,
ER.Level,
ER.Row,
(ER.Row * 2) - ER.Level AS Lft,
((ER.Row * 2) - ER.Level) +
(
SELECT COUNT(*) * 2
FROM EmployeeRows ER2
WHERE ER2.thePath LIKE ER.thePath + '.%'
) + 1 AS Rgt
FROM EmployeeRows ER
ORDER BY thePath
... And that's it! We have now converted the Adjacency List tree into a Nested Sets tree.
In the next installment, I will show how to determine the Nested
Intervals encoding from an Adjacency List tree, also using recursive
CTEs.
Questions?