The new HierarchyId data type introduced in SQL 2008 simplifies querying this data hugely.
Let’s start with a scenario. Consider the hierarchical category list in any popular auction website.
Home
|--Art
| |--Art Supplies
| |--Carvings & Sculptures
| |--Drawings
| |--Paintings
| \--Photographs
\--Books
|--Audio Books
|--Childrens Books
|--Comic Books
|--Fiction
|--Magazines
\--Non fiction
|--Biography
|--Cooking
|--Crafts
|--History
| |--Ancient
| |--Asia
| |--Australia
| |--Britain
| |--Europe
| \--New Zealand
\--Hobbies
This is only a tiny segment of the categories. You can imagine the hierarchy could be both wide and deep. SQL 2008 introduces the new HierarchyId SQLCLR data type that stores the information about where the node lies within the tree of categories, as well as providing various methods which we can use to find information about the node.
First we’ll create a table to store our categories:
CREATE TABLE Categories
(
CategoryId INT IDENTITY(1,1) NOT NULL,
OrgNode HIERARCHYID NOT NULL,
CategoryName VARCHAR(200) NOT NULL,
CONSTRAINT
[PK_Categories] PRIMARY KEY CLUSTERED
(
[CategoryId] ASC
)
);
Next we’ll populate our root node:
INSERT INTO Categories (OrgNode,
CategoryName)
VALUES (hierarchyid::GetRoot(), 'Home')
HierarchyId::GetRoot() is a static method that returns the root of the hierarchy tree. If we want to add a child node to this then we’ll need to grab the next available node value.
We can do this with the GetDescendant() method on an existing HierarchyId value. If we pass two NULL values, it’ll give you a new child node value one level deeper.
DECLARE
@ChildOrgNode HierarchyId
SELECT @parentOrgNode
= OrgNode
FROM Categories
WHERE CategoryId=1
INSERT INTO Categories (OrgNode,
CategoryName)
VALUES (@parentOrgNode, 'Art')
We can only add the first child node like this however. If there are existing child nodes then we’ll need to pass different values in to determine which existing child node it’s going before/after.
GetDescendant(@nodeX, NULL) will get a new node value after child node X.
GetDescendant(@nodeX, @nodeY) will get a new node value between child nodes X and Y.
GetDescendant(NULL, @nodeY) will get a new node value before child node y.
We can turn this into a stored procedure to simplify inserts in the future:
CREATE PROCEDURE AddCategory(
@ParentCategoryId INT,
@CategoryName VARCHAR(200)
)
AS
BEGIN
DECLARE @childOrgNode HierarchyId
DECLARE @parentOrgNode HierarchyId
-- Get the OrgNode value of our parent
node
SELECT @parentOrgNode = OrgNode
FROM Categories
WHERE CategoryId=@ParentCategoryId
SET TRANSACTION
ISOLATION LEVEL
SERIALIZABLE
BEGIN TRANSACTION
-- Determine the last child
node below our parent
-- (this could be null if no
child nodes yet exist)
SELECT @childOrgNode = max(OrgNode)
FROM Categories
WHERE OrgNode.GetAncestor(1) =@parentOrgNode;
-- insert our new node after
the existing last child
INSERT INTO Categories (OrgNode,
CategoryName)
VALUES (@parentOrgNode.GetDescendant(@ChildOrgNode, NULL), @CategoryName)
COMMIT
RETURN SCOPE_IDENTITY()
END;
And we can call this quite simply:
DECLARE @booksId INT
EXEC @BooksId = AddCategory 1, 'Books'
EXEC
AddCategory @BooksId,
'Audio Books'
EXEC
AddCategory @BooksId,
'Comic Books'
Now, let’s have a look at our data.
SELECT * FROM
Categories;
As you can see, our OrgNode value looks like a hexadecimal value. We can see a friendlier view if we use the ToString() method
SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM
Categories
We can get even fancier if you need, and determine the ParentId for each category and a bit more information. This makes it look more like the old mechanisms that we used to use for hierarchical lists, and can sometimes be useful in our front end code.
SELECT
C.CategoryId,
C.OrgNode,
C.OrgNode.ToString() AS OrgNodeString,
P.CategoryId AS ParentId,
C.OrgNode.GetLevel() AS Depth,
C.CategoryName
FROM Categories
C
LEFT JOIN Categories P ON
P.OrgNode = C.OrgNode.GetAncestor(1)
The GetAncestor method will return the HierarchyId value of the nth parent. GetAncestor(1) will return the immediate parent, GetAncestor(2) will return the parents parent, etc.
Let’s generate a whole bunch more data, just so that we can see how querying larger trees works
DECLARE
@categoryId INT;
SELECT
@categoryId = CategoryId FROM Categories WHERE
CategoryName = 'Art'
EXEC
AddCategory @categoryId,
'Art Supplies';
EXEC
AddCategory @categoryId,
'Carvings & Sculptures';
EXEC
AddCategory @categoryId,
'Drawings';
EXEC
AddCategory @categoryId,
'Paintings';
EXEC
AddCategory @categoryId,
'Photographs';
SELECT
@categoryId = CategoryId FROM Categories WHERE
CategoryName = 'Books'
EXEC
AddCategory @categoryId,
'Fiction';
EXEC
AddCategory @categoryId,
'Magazines';
EXEC
@categoryId = AddCategory @categoryId, 'Non fiction';
EXEC
AddCategory @categoryId,
'Biography';
EXEC
AddCategory @categoryId,
'Cooking';
EXEC
AddCategory @categoryId,
'Crafts';
EXEC
AddCategory @categoryId,
'Hobbies';
EXEC
@categoryId = AddCategory @categoryId, 'History';
EXEC
AddCategory @categoryId,
'Ancient';
EXEC
AddCategory @categoryId,
'Asia';
EXEC
AddCategory @categoryId,
'Australia';
EXEC
AddCategory @categoryId,
'Europe';
EXEC AddCategory @categoryId, 'New Zealand';
SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM
Categories
Now comes the really clever part. What if we want to know all of the categories that are in the books section?
DECLARE
@parentOrgNode HierarchyId;
-- Get the
OrgNode value of our parent category
SELECT
@parentOrgNode = OrgNode
FROM Categories
WHERE
CategoryName='Non
Fiction';
SELECT
CategoryId,
OrgNode,
OrgNode.ToString() AS OrgNodeString,
CategoryName
FROM Categories
WHERE
OrgNode.IsDescendantOf(@parentOrgNode) = 1
The IsDescendantOf method will return a Boolean value (1 = true) if the given node is a descendant of the parameter node provided. Any node is considered a descendant of itself, which is why we can see the Books category as well.
The beauty of this is that it is '''FAST'''. Compare this with our traditional method where we would use a ParentId field, and recursively walk through the list of categories adding them to a temporary table.
If we want to build a breadcrumb trail of our category we could do something similar:
DECLARE
@parentOrgNode HierarchyId;
-- Get the
OrgNode value of our parent node
SELECT
@parentOrgNode = OrgNode
FROM Categories
WHERE
CategoryName='Non
Fiction';
SELECT
CategoryId,
CategoryName,
OrgNode.GetLevel() AS Depth
FROM Categories
WHERE
@parentOrgNode.IsDescendantOf(OrgNode) = 1
ORDER BY OrgNode.GetLevel()
Imagine that we also have a table called Products. Each product would belong to a Category. Querying all products that were in any of the book categories would be simple:
SELECT
P.*
FROM Categories
C
INNER JOIN Products P
ON C.CategoryId
= P.CategoryId
WHERE
OrgNode.IsDescendantOf(@parentOrgNode) = 1
For our Windows and Web applications the HierarchyId data type is a full CLR type, so we can use all of these methods in our .NET projects simply by adding a reference to the Microsoft.SqlServer.Types class library and using the SqlHierarchyId type.
One tip for people using Entity Framework (which doesn’t natively support the HierarchyId type) is that you need to query it with a .ToString() method to get the string representation, and then use the Parse() method to turn this string back into a HierarchyId. This can be used for updates in both directions, ie: both in SQL and in C#.
For example, this will return a valid HierarchyId value:
SELECT HierarchyId::Parse('/23/4/')
Now, all of this simplicity on querying the data does come with a downside. If you want to shuffle entire branches of your hierarchy around, then you need to do a lot more work re-parenting the HierarchyId values. We can certainly achieve this with a stored procedure, but I’m not going to get into here because it does get a lot more complex.
It can however be simpler to just provide simple Add and Delete methods, and Edits will only allow you to edit the non-hierarchy details such as title etc.