Wednesday, September 9, 2009

Trees in SQL Databases

The Tree is a widely used data structure.  It’s found all over the place from industry (e.g., organizational charts are usually trees) to geneology (e.g., family trees) to biology (e.g., phylogenetic trees).

Trees have been heavily studied by Computer Scientists since data stored in trees can often be located very quickly (relative to the total amount of data).  Trees can also be used as a design tool to identify highly parallelizable components of an algorithm.

SQL Databases are designed around the Entity-Relational model.  The benefits of SQL are many but if we’re to reap these benefits we’ve got to find a way to get the world’s data into a form congenial to the Entity-Relational model.

Converting an existing data model into relational form usually entails identifying the entities and their relationships.  These are then mapped to tables, keys and foreign keys.

Trees are a little bit different in that they are typically modeled as an entity with a relationship to itself.

familymemberstable

The nodes of a tree exhibit a 1-to-many relationship.  Each child has a single parent; each parent can have 0 or more children.  In the relational model each node is represented as a row.  The parent of a node, if any, is stored in the ParentSSN field.  The SSN is the primary key for the table so each row is required to have one.

The ParentSSN field allows nulls because every node does not have a parent (e.g., the root node has no parent).

While trees can be represented in SQL databases in this way, the hierarchical nature of the tree makes querying it via SQL somewhat more difficult than querying other kinds of data.

Oracle supports hierarchical queries via the CONNECT BY keyword.  SQL Server 2008 added support for hierarchical queries.

No comments :

Post a Comment