Monday, February 14, 2005 - Posts

Integration Services Checkpoints...

A few questions have been cropping up regularly about checkpoint restartability. Here are some fundamentals.

Checkpoints are a feature found at the control flow level only

Checkpoints cannot be set in the dataflow. In fact, checkpoints cannot be set anywhere in the package. Precedence constraints are automatically the checkpoints in a package. This is the granularity for this release. So, anytime you create a new precedence constraint, you're essentially creating a checkpoint as well.

Checkpoints are dataflow agnostic, (but there are ways to use them in dataflow)

This is just another way of saying that checkpoints are a control flow feature. So, why be redundant then, Kirk? Well, it's not always clear to folks and some people are still just getting used to the idea of data flow and control flow being separated. Especially, the press. :)

Once a checkpoint file is used to run a previously failed package, the IS runtime deletes it

When the checkpoint file is used in the package, it will be cleaned up.

IS creates a checkpoint file only when the package fails

This is important. If the package doesn't fail, it won't write out a checkpoint file.

Task failure does not necessarily mean that the package failed

On a related note, tasks may fail in a package and the package still not fail. Why? Well, the rationale goes something like this. If a package were to always fail whenever any task failed, then the package could never recover from a task failure. That's why there is an OnFailure precedence constraint option. To handle failed tasks. So, you must cause the package to fail if you wish for a checkpoint file to be created. There are many ways to make a package fail. See below.

There are three properties that control checkpoint restartability, they are:

  • CheckpointFileName - Name of file where IS saves checkpoint information.
  • CheckpointUsage - Enumerated choice IfExists, Never, and Always. Controls if the package uses the checkpoint file. If Exists will use the file if it is found in the location specified by the CheckpointFileName. If not, it will attempt to run the package as though it never failed previously, ie. from the start to finish. If “Never” is specified, then the package will always run as though there were no checkpoint file. Finally, “Always“ is a curious choice. If “Always“ is specified, the package will refuse to run if it doesn't find a checkpoint file. This is useful in the case where it would be destructive to run an entire package subsequent to a failed partial execution. For example, if you had a package that did a simple load. If the package succeeded in loading some data, but then failed on a subsequent operation, running the entire package again would likely result in duplicated data. So, specifying “Always“ ensures that if the checkpoint file isn't available for some reason, the package will refuse to execute.
  • SaveCheckpoints - The package should save a checkpoint file if it fails.

Ways to make a package fail

  • All containers and tasks have a property called FailPackageOnFailure. Set that to true, and if the task or container fails, it will also fail the package. Use this for critical tasks, like a dataflow task, that if it fails, you're sure that the package won't succeed. Peripheral tasks like send mail or other informative type tasks, may not be a good choice here.
  • All containers and tasks also have a property called FailParentOnFailure. This property if set on a direct descendant of a package will fail the package if it fails, just like the FailPackageOnFailure property will. This property can be used in packages to create cascading failures from leaf (task) containers up to the package.
  • ForceExecutionResult is a property you can set in conjunction with the above named properties to simulate a task or container failure. Set it to Failure and it will appear to the IS runtime that the task has failed. ForceExecutionResult is available on all containers as well.
  • Set the MaxErrorCount (maximum number of errors allowed before failure). When the number of errors hits that threshold, the package will fail. This is useful is cases where you want to execute a package that has known errors, but they are handled within the package or are inconsequential. However, you also know that if you get more than n number of errors, something is very wrong and the package should just terminate. This is the sanity check setting for errors.

Hope this helps a bit.

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden