Tuesday, October 26, 2004 - Posts

No, stored procedures are NOT bad

I recently found a rather old post from Frans Bouma's blog, "Stored procedures are bad, m'key?".

Since comments are closed (he posted it almost a year ago), I have to reply here.

I'll state my bottom line at the top: Stored procedures are not only not bad, they're necessary for maintaining loosely coupled, application agnostic, flexible databases. And they're also necessary for maintaining loosely coupled, database agnostic, flexible database-based applications.

Mr. Bouma's argument centers around three claims: Ad-hoc queries are easier to construct in application code, stored procedures don't provide greater security than the application already does, and in certain cases stored procedure execution plan caching can be detrimental to performance. And all three of these arguments are true! But none of them has any bearing whatsoever on the real benefits of stored procedures.

Let's have a quick refresher on Computer Science 101, as it applies to object-oriented design: Loose coupling and high cohesion. What does that mean in plain English? Each module, or object, should have one very specific job, which it can perform for a VARIETY of other modules or objects on request (high cohesion). Furthermore, no module or object (or as few as possible) should depend upon the inner workings of another module or object. This latter problem is known as tight coupling, and its consequences are dire. Change one piece of code, end up changing every other piece of code that uses it.

News flash for Mr. Bouma: This is the reason object-oriented programming was invented. We try to separate components from one another so that their functions can be re-used and re-applied to other modules, other objects, and if we've done our job really well, even other applications entirely.

So how does this apply to the relationship between a database and an application?

The database, undoubtedly, should be completely application unaware. The database is serving up data. It has no idea what application is requesting the data, or whether an application is requesting the data. It needs to have no such idea. All it needs to do is keep serving and all is happy. Furthermore, the database doesn't care if multiple applications request data, or if those applications request the same data. The database is set up to provide a single point of access for all data requests in its domain of knowledge (i.e. the data in the database being queried).

And now to the other side, the application itself. I am an advocate of the application being as database agnostic as possible. I don't believe this is entirely possible, but it is certainly a goal to which we can aspire. Applications should request data from the database using standardized, documented interfaces, after which that data should be composed as quickly as possible (at the lowest level) into native objects. This allows for changes in the data interface to have as little ripple effect as possible in the application.

Mr. Bouma makes the claim that, "changes to a relational model will have always an impact on the application that targets that model". Again, he is correct. Which is why applications should NOT target a model. Applications should have absolutely no knowledge of database schema, including table names, column names, data types, or any other information. This is the role of stored procedures. We can, using stored procedures, completely encapsulate this metadata and provide standardized interfaces into the data.

Need to change a datatype in the database? Perhaps you won't need to change the output values that the application receives from the stored procedures. Perhaps you won't need to change the parameter input values. No change is needed in application code. Need to change a column name? Same thing. Need to re-architect the entire schema? Again, just change your stored procedures. The application will keep running as if nothing changed.

Contrast this to Mr. Bouma's suggestion that we drop stored procedures altogether and instead build all queries within application code. Suddenly, any small change in the database needs to be completely regression tested throughout not only one app, but every application that uses the database. You've created an extremely tight, perhaps unbreakable coupling between the application and the database. Large-scale changes to the schema will most likely never be possible. Will there ever really be time to re-write all of that application code?

Now, back to the three central arguments: Yes, ad-hoc queries are easier to construct in application code; but it's also argued by many database experts that ad-hoc queries are a sign of either poor database design, poor application design, or both. Either way, dynamic SQL isn't too hard to work with, and I've seen plenty of extremely ad-hoc applications in which its use is hardly a stumbling block. Next, security. It's true that if a user isn't authenticated in the application, he or she won't be able to use that application to access the data. So in that case, the stored procedure does not provide greater access control. But the same is certainly not true for every other application that uses the same database. Unless, of course, code is duplicated across every application. The database should be the final word on its data. This includes data security and data integrity. Application code simply cannot do this if the database need ever be shared. Finally, Bouma's assertions about cache plans are simply not worth touching. The WITH RECOMPILE option has been around long enough that DBAs and developers know it's there and know how to use it.

Now that we're at the bottom of this post, I'll re-state my conclusion: Stored procedures are necessary for proper object-oriented database-based application development. Not using them will force code duplication, tight coupling between applications and databases, and greatly increased man hours for every change made to either application or database.