Thursday, October 4, 2007

When RDBMS theory hits practice

There's a project I'm working on that uses MSDE to store its data. I inherited a schema that was totally non-relational; pretty much all of the data model objects are stored in binary serialized IMAGE fields. While this probably made it much easier, initially, to save data it rules out any meaningful querying via SQL.

So I've finally gotten the go ahead to "refactor" this monstrosity. With notions of Relations, Entities (both strong and weak) and Joins in mind I merrily produce a schema that's totally normalized, heavily query-able and makes for quite pretty ER diagrams.

What I neglected to consider was that part of the model relied on data in another part of the model but needed to survive mutations/deletions to that data. Well, I didn't fully consider it. A first cut at the schema wasn't foreign keyed to the data in the other part of the model but, as my boss pointed out, that wasn't sufficient. What good is an ID that refers to a row that no longer exists (even if the deletion didn't cascade)? It's a dangling reference. What I needed was to duplicate some of the data from the other part of the model.

I think the general principle here is that duplication of data can be necessary if some part of the model needs that data to survive mutations/deletions in the source. Probably applies to logging (where strings usually handle the duplication) but also to any derived data objects that need to exist apart from their source model components.

No comments :

Post a Comment