June 30, 2008

SE Radio 102: Relational Databases

Recording Venue:
Guest(s):
Host(s): Bernd Arno
In this espisode we take a closer look at relational database systems and the concepts behind them. We start by discussing the relational paradigm, its concepts and ramifications, and go on to architectural aspects.

Show Notes

Links:

Tagged as:databases Technology Talk

Join the discussion

You must be logged in to post a comment.

4 comments

tarvin says:

September 1, 2008 at 10:47 pm

Thanks for another SE Radio podcast. I use SE Radio for refreshing my knowledge about important IT subjects, like database technology, and for catching up with emerging trends like REST.

I really liked the episode about transactions, but I have to say that I believe episode 102 contains a number of errors, and excessive bias.

The SQL standard actually specifies sequences, triggers and stored procedures (“SQL/PSM”). And it has done so for quite a while. In other words, many features which the podcast describes as proprietary or non-standardized is actually described in ISO’s SQL standard.

About primary keys:
The podcast is very much in favor of artificial keys, describing that approach as “modern”. I believe that trustworthy coverage of the subject would require less biased statements. Have a look at this Joe Celko article for a different (and more sensible, in my opion) view on the subect: http://www.intelligententerprise.com/030320/605celko1_1.jhtml
Also: The podcast claims that it’s very hard to change a primary key; it doesn’t have to be: Within a database, the change can easily propagate if foreign keys are defined with a CASCADE referential action. If changes to such values need to be recorded, triggers can be used to maintain a change history.

Finally, I believe that the podcast under-states the explosion of complexity associated with clustering (and hence, the increased risk of long service disruptions when someone gets it wrong).

—
Regards,
Troels Arvin
ArnoHaase says:

September 23, 2008 at 10:09 am

Thank you for the thorough and detailed feedback. And sorry I took so long to respond – I was on a lengthy vacation. Originally coming from the middle tier world rather than the relational world, I may occasionally tend to some bias, and your comments highlight a complementary perspective which is definitely a good thing.

That said, I had reasons for making the above mentioned statements, and I will proceed to explain them.

With regard to sequences, triggers and stored procedures, you correctly point out that they are part of the ISO standard. What I should have said is that the process of standard conforming adoption in t.

With regard to artificial vs. natural primary keys, I think the article by Joe Celko gives a good overview and introduces a useful taxonomy, but I see good reasons for having a different opinion. First of all, “natural” primary keys may not be known all the time. Using a natural primary key makes it difficult to support use cases where data is entered piecemeal – think presales where data is entered as it becomes available, and call center agents have the job of getting the missing data. Using natural primary keys means that no data can be entered unless this data is available.

Even worse, they may be available all the time in an initial version of a system, but later new requirements invalidate this assumption – making the database schema less robust with regard to changes in business *processes* that otherwise would have less impact on the data model.

Let’s look at an American employee management system that uses the social security number as a primary key. That makes it difficult to extend the system later to be able to manage foreign workers on a non-permanant visa who do not have an SSN. From the perspective of the system, they are employees like all others, but the choice of a “natural” primary key forces them to be handled differently. Or to build on top of the Universal Product Code example from Joe’s article – what about planned products that may not have a UPC yet? In both cases, using a natural primary key causes additional friction at the technical level.

And finally, a primary key that is exposed to a user automatically assumes business meaning and often eventually leaks to other systems, causing unnecessary and unpleasant coupling. The American social security number is a prominent example of this problem.

And yes, changing a primary key value in the database is possible (and even easy using CASCADE) as you point out. My perspective however was from the perspective of a client program accessing the database. Keeping data structures in memory – let alone caches – in sync with the database is very hard and tricky if referential fields are changed implicitly, and for this reason e.g. O/R mappers tend to forbid such changes in the first place.

To sum my opinion on natural primary keys up, I hold them to add very little benefit while adding potential risk, fragility and effort – at least for systems where significant business logic is implemented outside the database.

And regarding clusters, if the podcast gave the impression that clustering is simple or involves little risk, that was definitely not my intention. Thanks for pointing this out – clustering involves real complexity, and you should know what you are doing and plan for additional time to get it up and running.

– Arno Haase
hablutzel1 says:

January 24, 2009 at 2:25 am

Extremely helpful thank you!
Mitch says:

July 15, 2009 at 7:28 pm

How would you implement a type 2 slowly changing dimension using only natural keys? Concatenate every column in the table into a primary key? Sounds like a performance nightmare.