2008: Converting XML to HIERARCHYID
While working on a new book, I have been experimenting with the new HIERARCHYID data type. And since I've seen comparisons drawn between using XML versus using HIERARCHYID to manage a hierarchy of data, I decided that it would likely be very useful if I could convert an XML structure into an equivalent relational structure using HIERARCHYID. Here is what I came up with:
DECLARE @x XML
=
'<A id="1">
<B id="2">
<C id="3"/>
<D id="4"/>
</B>
<E id="5"/>
<F id="6"/>
</A>';
WITH Folders AS
(
SELECT
t.c.value('@id',
'int')
AS ID
,
NULLIF(t.c.value('../@id',
'nvarchar(50)'),
'')
AS ParentID
, t.c.value('local-name(.)',
'nvarchar(50)')
AS
Description
, t.c.value('for $s in . return count(../*[. << $s]) + 1',
'int')
AS RowNum
FROM @x.nodes('//*')
AS t(c)
)
, FolderTree AS
(
SELECT ID, ParentID,
Description, RowNum,
HIERARCHYID::GetRoot()
AS FolderNode
FROM Folders
WHERE ParentID IS
NULL
UNION ALL
SELECT F.ID, F.ParentID, F.Description, F.RowNum,
CAST(FT.FolderNode.ToString()
+
CAST(F.RowNum AS
varchar(50))
+
'/'
AS HIERARCHYID)
FROM Folders AS F
INNER
JOIN FolderTree AS FT ON F.ParentID = FT.ID
)
SELECT
ID,
ParentID,
Description,
FolderNode,
FolderNode.ToString() AS Path
FROM FolderTree
ORDER
BY FolderNode;
Which results in:
ID | ParentID | Description | FolderNode | Path |
1 | NULL | A | 0x | / |
2 | 1 | B | 0x58 | /1/ |
3 | 2 | C | 0x5AC0 | /1/1/ |
4 | 2 | D | 0x5B40 | /1/2/ |
5 | 1 | E | 0x68 | /2/ |
6 | 1 | F | 0x78 | /3/ |
The basis of this is the same as if converting an existing adjacency model to use HIERARCHYID – use a combination of a recursive CTE and row numbers partitioned by parent to generate the new path of the node and then cast that path to the HIERARCHYID data type. The parent path is retrieved using the ToString method of the parent node, and this is concatenated with the RowNum value and a forward slash (/) to result in the path for the current node. And finally, the newly generated path is then cast to HIERARCHYID, which uses its static Parse method behind the scenes to create a new instance of HIERARCHYID. This table shows the additional detail and should reveal how the new path is created.
ID | ParentID | Description | FolderNode | Path | Parent Path | RowNum |
1 | NULL | A | 0x | / | N/A | 1 |
2 | 1 | B | 0x58 | /1/ | / | 1 |
3 | 2 | C | 0x5AC0 | /1/1/ | /1/ | 1 |
4 | 2 | D | 0x5B40 | /1/2/ | /1/ | 2 |
5 | 1 | E | 0x68 | /2/ | / | 2 |
6 | 1 | F | 0x78 | /3/ | / | 3 |
On the XML side of things, I generated a partitioned row number by using the nodes method to return all nodes in the XML structure, as shown here: FROM @x.nodes('//*')
AS t(c). The double forward slash (//) translates to "relative path" and the asterisk means any node, so this returns all nodes at all levels. Then, to return the row number, I use one of my favorite XQuery examples:
t.c.value('for $s in . return count(../*[. << $s]) + 1',
'int')
AS RowNum
This translates to "count the number of immediate child nodes of my parent node that are positioned before me." Now that I have all the nodes, and a partitioned row number for these nodes, I can construct the new path of the node.
I will post more as I continue writing about and exploring SQL Server 2008.
Enjoy!
Cross-posted from SQLBlog! -
http://www.sqlblog.com