Monday, May 13, 2013

Handling Hierarchies with the SQL HierarchyId datatype

Traditionally when we’ve dealt with hierarchical data in SQL we’ve need to jump through a few hoops to query it. Usually we’d set up a our data table with a parent Id reference, and to query it we would use a temp table and populate it in a recursive loop. This isn’t very efficient.

The new HierarchyId data type introduced in SQL 2008 simplifies querying this data hugely.

Let’s start with a scenario. Consider the hierarchical category list in any popular auction website.

Home
|--Art
|  |--Art Supplies
|  |--Carvings & Sculptures
|  |--Drawings
|  |--Paintings
|  \--Photographs
\--Books
   |--Audio Books
   |--Childrens Books
   |--Comic Books
   |--Fiction
   |--Magazines
   \--Non fiction
      |--Biography
      |--Cooking
      |--Crafts
      |--History
      |  |--Ancient
      |  |--Asia
      |  |--Australia
      |  |--Britain
      |  |--Europe
      |  \--New Zealand
      \--Hobbies

This is only a tiny segment of the categories. You can imagine the hierarchy could be both wide and deep. SQL 2008 introduces the new HierarchyId SQLCLR data type that stores the information about where the node lies within the tree of categories, as well as providing various methods which we can use to find information about the node.

First we’ll create a table to store our categories:


CREATE TABLE Categories
(
      CategoryId INT IDENTITY(1,1) NOT NULL,
      OrgNode HIERARCHYID NOT NULL,
      CategoryName VARCHAR(200) NOT NULL,
CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED
(
      [CategoryId] ASC
)
);


Next we’ll populate our root node:


INSERT INTO Categories (OrgNode, CategoryName)
VALUES (hierarchyid::GetRoot(), 'Home')


HierarchyId::GetRoot() is a static method that returns the root of the hierarchy tree. If we want to add a child node to this then we’ll need to grab the next available node value.

We can do this with the GetDescendant() method on an existing HierarchyId value. If we pass two NULL values, it’ll give you a new child node value one level deeper.


DECLARE @ChildOrgNode HierarchyId

SELECT @parentOrgNode = OrgNode
FROM Categories WHERE CategoryId=1

INSERT INTO Categories (OrgNode, CategoryName)
VALUES (@parentOrgNode, 'Art')


We can only add the first child node like this however. If there are existing child nodes then we’ll need to pass different values in to determine which existing child node it’s going before/after.

GetDescendant(@nodeX, NULL) will get a new node value after child node X.
GetDescendant(@nodeX, @nodeY) will get a new node value between child nodes X and Y.
GetDescendant(NULL, @nodeY) will get a new node value before child node y.

We can turn this into a stored procedure to simplify inserts in the future:


CREATE PROCEDURE AddCategory(
      @ParentCategoryId INT,
      @CategoryName VARCHAR(200)
)
AS
BEGIN
      DECLARE @childOrgNode HierarchyId
      DECLARE @parentOrgNode HierarchyId
     
      -- Get the OrgNode value of our parent node
      SELECT @parentOrgNode = OrgNode
      FROM Categories
      WHERE CategoryId=@ParentCategoryId
     
      SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
      BEGIN TRANSACTION
            -- Determine the last child node below our parent
            -- (this could be null if no child nodes yet exist)
            SELECT @childOrgNode = max(OrgNode)
            FROM Categories
            WHERE OrgNode.GetAncestor(1) =@parentOrgNode;

            -- insert our new node after the existing last child
            INSERT INTO Categories (OrgNode, CategoryName)
            VALUES (@parentOrgNode.GetDescendant(@ChildOrgNode, NULL), @CategoryName)
      COMMIT

      RETURN SCOPE_IDENTITY()
END;


And we can call this quite simply:


DECLARE @booksId INT
EXEC @BooksId = AddCategory 1, 'Books'
EXEC AddCategory @BooksId, 'Audio Books'
EXEC AddCategory @BooksId, 'Comic Books'


Now, let’s have a look at our data.


SELECT * FROM Categories;


SELECT * FROM Categories;

As you can see, our OrgNode value looks like a hexadecimal value.  We can see a friendlier view if we use the ToString() method


SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM Categories


SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM Categories

We can get even fancier if you need, and determine the ParentId for each category and a bit more information.  This makes it look more like the old mechanisms that we used to use for hierarchical lists, and can sometimes be useful in our front end code.


SELECT
      C.CategoryId,
      C.OrgNode,
      C.OrgNode.ToString() AS OrgNodeString,
      P.CategoryId AS ParentId,
      C.OrgNode.GetLevel() AS Depth,
      C.CategoryName
FROM Categories C
LEFT JOIN Categories P ON P.OrgNode = C.OrgNode.GetAncestor(1)


Select more details

The GetAncestor method will return the HierarchyId value of the nth parent.  GetAncestor(1) will return the immediate parent, GetAncestor(2) will return the parents parent, etc.

Let’s generate a whole bunch more data, just so that we can see how querying larger trees works


DECLARE @categoryId INT;
SELECT @categoryId = CategoryId FROM Categories WHERE CategoryName = 'Art'
EXEC AddCategory @categoryId, 'Art Supplies';
EXEC AddCategory @categoryId, 'Carvings & Sculptures';
EXEC AddCategory @categoryId, 'Drawings';
EXEC AddCategory @categoryId, 'Paintings';
EXEC AddCategory @categoryId, 'Photographs';
SELECT @categoryId = CategoryId FROM Categories WHERE CategoryName = 'Books'
EXEC AddCategory @categoryId, 'Fiction';
EXEC AddCategory @categoryId, 'Magazines';
EXEC @categoryId = AddCategory @categoryId, 'Non fiction';
EXEC AddCategory @categoryId, 'Biography';
EXEC AddCategory @categoryId, 'Cooking';
EXEC AddCategory @categoryId, 'Crafts';
EXEC AddCategory @categoryId, 'Hobbies';
EXEC @categoryId = AddCategory @categoryId, 'History';
EXEC AddCategory @categoryId, 'Ancient';
EXEC AddCategory @categoryId, 'Asia';
EXEC AddCategory @categoryId, 'Australia';
EXEC AddCategory @categoryId, 'Europe';
EXEC AddCategory @categoryId, 'New Zealand';

SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM Categories


SELECT CategoryId, OrgNode, OrgNode.ToString()AS OrgNodeString, CategoryName FROM Categories

Now comes the really clever part.  What if we want to know all of the categories that are in the books section?


DECLARE @parentOrgNode HierarchyId;

-- Get the OrgNode value of our parent category
SELECT @parentOrgNode = OrgNode
FROM Categories
WHERE CategoryName='Non Fiction';

SELECT
      CategoryId,
      OrgNode,
      OrgNode.ToString() AS OrgNodeString,
      CategoryName
FROM Categories
WHERE
      OrgNode.IsDescendantOf(@parentOrgNode) = 1


Select Non Fiction branch

The IsDescendantOf method will return a Boolean value (1 = true) if the given node is a descendant of the parameter node provided.  Any node is considered a descendant of itself, which is why we can see the Books category as well.

The beauty of this is that it is '''FAST'''.  Compare this with our traditional method where we would use a ParentId field, and recursively walk through the list of categories adding them to a temporary table.

If we want to build a breadcrumb trail of our category we could do something similar:


DECLARE @parentOrgNode HierarchyId;

-- Get the OrgNode value of our parent node
SELECT @parentOrgNode = OrgNode
FROM Categories
WHERE CategoryName='Non Fiction';

SELECT
      CategoryId,
      CategoryName,
      OrgNode.GetLevel() AS Depth
FROM Categories
WHERE
      @parentOrgNode.IsDescendantOf(OrgNode) = 1
ORDER BY OrgNode.GetLevel()


Select Non Fiction breadcrumb chain

Imagine that we also have a table called Products.  Each product would belong to a Category.  Querying all products that were in any of the book categories would be simple:


SELECT
      P.*
FROM Categories C
INNER JOIN Products P
      ON C.CategoryId = P.CategoryId
WHERE
      OrgNode.IsDescendantOf(@parentOrgNode) = 1


For our Windows and Web applications the HierarchyId data type is a full CLR type, so we can use all of these methods in our .NET projects simply by adding a reference to the Microsoft.SqlServer.Types class library and using the SqlHierarchyId type.

One tip for people using Entity Framework (which doesn’t natively support the HierarchyId type) is that you need to query it with a .ToString() method to get the string representation, and then use the Parse() method to turn this string back into a HierarchyId.  This can be used for updates in both directions, ie: both in SQL and in C#.

For example, this will return a valid HierarchyId value:


SELECT HierarchyId::Parse('/23/4/')


Now, all of this simplicity on querying the data does come with a downside. If you want to shuffle entire branches of your hierarchy around, then you need to do a lot more work re-parenting the HierarchyId values. We can certainly achieve this with a stored procedure, but I’m not going to get into here because it does get a lot more complex.

It can however be simpler to just provide simple Add and Delete methods, and Edits will only allow you to edit the non-hierarchy details such as title etc.