I'm finally back to having fun, which for me means writing code. Went
to work at 3AM to tackle today's assignment: put together some scripts
to simplify the backup and restore process for a SQL Server database.
Since changing jobs, its seemed like ages since I've been so enthralled
with a problem that I'll bounced out of bed that early.
Being all about SQL Server 2005, I started by efforts in earnest and wrote many of backups and restores like this:
backup log @dbname
to disk = @logFileName
with checksum,stop_on_error...
restore database @dbName
from disk = @lastBackupName
with checksum,stop_on_error...
So you may be saying to yourself, self, what do CHECKSUM and STOP_ON_ERROR mean? Ah, these are new additions for SQL Server 2005 designed to make the backup and restore process a bit more reliable.
CHECKSUM looks at pages being backed up and validates them. It then
rolls these checksums into a checksum for the whole file for a
backup. If used with a restore, the checksums are compared after
operation as a validation of the restore. Page checksums are also
validated on restore. Of course, both the backup and the restore have
to be done with this option for that magic to full work. But when it
does work, you can pretty darned sure you have a high-fidelity restore
of that backup.
STOP_ON_ERROR does that it implies: when the first error occurs during
a backup or restore, that operation halts and throws an error. The
counterpoint to this option is CONTINUE_ON_ERROR. Note that
STOP_ON_ERROR is the default for both backups and restores (the duh
factor should be high on that,) and its pretty obvious what this means
during a restore. But there's some interesting interplay between
CHECKSUM and STOP_ON_ERROR for backups. If both of these options are
set for a given backup command, a backup will abort if and when a torn
page is detected or if a page checksum goes Pear-shapped for some
reason.
The practical upshot of this is that if you a database that's going bad
you, using the combination of CHECKSUM and STOP_ON_ERROR might save
from thinking you have a good backup when, in fact, you might not. And
it aborts the process strongly enough that you can capture it as an
event to go have an operator or DBA go look at to see what repairs they
can make before bigger problems come up to bite you.
Just be careful with this though, because the default for a backup is
to NOT compute these checksums -- you have to explicitly tell it to.
Also, there's no such thing as a free lunch -- computing these checksum
takes a bit longer for the backups. Be prepared for that.
And sure, 3AM is a bit early even for me to attack a task like this.
But then, I like being "back up" early in the morning... (sorry, I
couldn't help myself...)