Dataflow Task (RSS)

Blogs about the dataflow task

Next CTP...

The next CTP is close and I wanted to give you all a preview of some improvements that will be available there.

I'm happy to write that the new transfer tasks will be available. Through a series of planning changes and priority shifts, these tasks were pushed off until the very last minute, but they will be available in the next CTP. They will be backward compatible and integrated with the migration wizard to support migrating the DTS 2000 transfer task incarnations.

Another exciting development in the next CTP will be property expressions in the dataflow task. I'll blog more about this later, it deserves it's own dedicated blog. But, in a limited manner, some components will support property expressions on certain key properties. This will vastly simplify the solution to some difficult dataflow task configuration issues.

Look for these and other improvements in the next CTP.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Normalizing flat files

A common question asked on newsgroups is how to normalize a flat file with DTS.

With DTS the answer was usually, you load the file into a table and then perform queries on the table to extract the colums into normalized tables. For large flat files this can be a real pain because you must use a different set of tools to perform a simple operation and it may be unnecessarily time consuming because you have to load the data before you can even begin to transform it. Yes, ELT.  Finally, you would need to use one process to extract the data and another to transform it, decentralizing the process.

It would be great if it were possible to normalize the data as it flows through the pipeline for a one time load sequence. With SSIS, you can transform the data as you load it, ETL.

For example, maybe a flat file contains header/detail information in the same row. Likely, since the flat file isn't normalized there's some redundant header data. One way to do this in Integration Services is by splitting the flow from a flat file source adapter. For the simple case, where you only need to split the flat file into two tables, use the multi-cast to split the flow into two. On the one flow, the one to extract header information, use the sort t\ransform to remove duplicates. Then, use a lookup that references the header destination table. If the header data (row) is already there, based on the key, then you'll want to discard that row. Here's a case where the error flow comes in handy. For the lookup, flow the rows not found (errors) into the header destination table. Then, only those rows that aren't already in the table will get inserted.

On the other flow, the details flow, you can just flow this directly to the details table. No need to remove duplicates etc. if the details are unique transaction line items etc.and there shouldn't be any redundency. If there is, you can build a flow similar to the one for header information to eliminate dups.

If there is information that needs to be separated into more than 2 tables similar flows can be created along the same lines.

The point here is that it can all be done in the pipe and within one process keeping it centralized and easier to debug and maintain. In some cases, it may also perform better, especially in cases where the header row count is low or there is a low threshold of deviance from a small set of header data.

Update: Just wanted to add one last thing. ETL scenarios all have one thing in common, they're all different. So the approach to them will be different as well. Ultimately, how you approach your solution is going to depend largely on the requirements of the particular project. This post is not about establishing _the_ correct SSIS methodology, but more to explain the expanded possibilities that are available with SSIS vs. DTS.

Universe.Earth.Software.Microsoft.SQLServer.SSIS.KirkHaselden

Raw adapters to use variable access mode

We received feedback that the raw adapters were difficult if impossible to configure. We've modified the raw adapters so that it's possible to easiliy configure them. They now use a variable access mode. The variable points to the location of the raw file and we all know how easy it is to configure variables, right?

This should make it easier to deploy packages with the raw file adapters in them.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

Adding headers to flat files

So, you want to record some information in a flat file about when it was created, what machine it was created on etc.?

The flat file destination adapter has a field in the UI that allows you to enter some text that will be included in the header, but then it's static and won't change from package execution to package execution. There's a lot of information stored in the variables and other locations in the package. It would be nice to be able to place that information into the header.

You can this information into the header and you can build the header to be as complicated as you like.
Here's how:

Create a variable for specifying the location of the destination flat file.
I called it FileLocation. (You can also retrieve the Flat File Connection Manager and get the connection string, which is more robust)
Drop a script task onto the designer.
Specify the FileLocation and System::StartTime variables on the ReadOnlyVariables property in the property grid.
Put the following code into it:
Public Class ScriptMain

Public Sub Main()

Dim time As Variable = Dts.Variables("System::StartTime")

Dim fileLocation As Variable = Dts.Variables("FileLocation")

Dim file As New System.IO.StreamWriter(fileLocation.Value().ToString())

file.WriteLine(time.Value.ToString())

file.Close()

Dts.TaskResult = Dts.Results.Success

End Sub

End Class

Now, drop a pipeline on the designer surface and build it ensuring that the destination flat file adapter has "Overwrite data in the file" _unchecked_.

Drag a success precedence constraint from the script task to the pipeline.

This will put a nice header in the file and allow the pipeline to write all the data after it.

Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden