posted on Thursday, January 27, 2005 8:40 AM by Knight_Reign

Ragged right clarification

In my SQL Server Magazine article on the Import/Export Wizard, I used ragged right format for importing a flat file with the wizard. I mentioned ragged right format but didn't give a clear enough explanation for some people as is evident from the emails I've received asking for clarification. I'll try to clarify a little better here some of the advantages and attributes of each file format.

Fixed Width
First let's start with fixed width format files. Fixed width files, as their name implies, use the same width for all the columns regardless of the contents of those columns. Typically, this meens that there is a lot of wasted space for columns with data that varies, like names for example. So, these types of files tend to be a little larger on disk than delimited or ragged right. If you care, they're also more difficult to view in notepad or other text editors and more difficult to get right when importing into Excel, Access or other desktop data software. So why would someone use this format? Well, if you don't know why, you probably don't need to use it. Those who do need it know that they need it because they're using legacy software or mainframe software that takes the format. It's usually for backwards compatibility.

Delimited
Delimited files use a delimiter like a comma or tab to separate the columns and typically use a character combination to delimit rows/records. This format is usually the easiest to view in notepad etc. and is very easy to import into software packages like Excel because the delimiter makes it very obvious where the columns are. These are much more compact than fixed width files typically. The only case where these files will be larger is if the data in the fixed width file is exactly conformed and uniform in all columns. In other words, it's the same size for all columns, then there is no need for the extra delimiter or the row/record delimiters. This is rare. There is almost always at least one non-uniformly sized column. Which brings us to Ragged Right.

Ragged Right
Ragged right files are similar to fixed width except, they use a delimiter to mark the end of a row/record. This is a compromise between the two file types that eliminates the column delimiters and makes the file easier to work with, for example, when importing to a different data application or when viewing in a text editor. The difference between ragged right and fixed width means that the last column may be of variable size. Ragged right uses a row/record delimiter, usually <cr><lf>, to mark the end of a row. The row delimiter makes the file easier to deal with in common text editors and when importing, it's easier to find the end of the row. So it's ideal for cases where you only have one non-uniform column etc. or if you need such a compromise to save space.

File Sizes
Of course it really depends on the data, but generally speaking, delimited files come in smaller because less space is wasted on whitespace. There are cases like I mentioned above where that isn't true. I personally like working with delimited files the most, especially comma delimited files. Again, there's a reason for having so many different formats and depending on your requirements, one will work better than another. Many times it comes down to what your legacy software will support. Then you don't have a choice in the matter.

Package
I've created a package that illustrates the differences. It uses deployment, so you can configure it before you run it to point to the correct location for dropping the output flat files. Basically, it sends the employees table through a multicast to three flat file destination adapters. Each one uses one of the three different formats described here. Playing with the column widths in the flat file connection managers gives you a good feel for how setting column widths effects the resulting size and format of the files. I encourage you to use a larger dataset as well and get a feel for how changing the column types effects the performance.
Here's what the package looks like:

And here is where you can download the package.

Let me know how it goes.

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

Comments

# flat file thoughts @ Monday, February 26, 2007 9:30 PM

Hey Kirk -
I was googlin' for "flat file fixed width etl SSIS" in an effort to see if any tools had been developed to deal w/ large, fixed width flat files.

your blog came up on page1, so I thought I may point you to this thread that had some interesting details -

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=603748&SiteID=1

I wonder how many customers who use (or plan to use) SSIS face this issue?

arjun