This topic was discussed a bit in a forum post recently that has quite a bit of interest in my opinion. Since I posted about the use of the GUID recently I thought I would do some some testing on the performance differences and the downfalls of using the CHAR and VARCHAR data types.
First we'll consider the functional differences between the 2 data types. The CHAR is obviously a static length character datatype while the VARCHAR is a variable length character datatype. The obviously positive and negative (we’ll cover this later) thing about a VARCHAR, is that you only store the amount of data you need to and no more, well, to get a little deeper you store an additional 2 bytes with every VARCHAR value.
FUNDAMENTAL CONCEPT
When inserting into a CHAR(100) column regardless of the length of the data going into the column, SQL Server will store 100 bytes for this column. When inserting into a VARCHAR(100) column depending on the length of the data going into the column, SQL Server will store the number of bytes of data going into this column.
SELECTS
If you are using the same data to insert into 2 separate tables, one utilizing a char and the other utilizing a varchar, and the length of the data going into those tables are of fixed length, then there will be no performance difference in the retrieval of that data as long as no functions are used as part of the data retrieval. So in other words if I had a char(100) column in the first table and a varchar(100) column in the same spot on the second table and all of the data that went into those columns were exactly 100 characters in length then both data types would perform equally as well. If the data that was going into the char and varchar columns was of variable length (values varying from 1-100 characters equally) then the varchar would perform better. This is not because the varchar data type is more efficient but because there will be fewer data pages to retrieve from disk (I'll get into this some more when I go over the inserts).
INSERTS
Comparing inserts for these data types is where the difference in performance lies because of the storage of data. If the same 2 tables are used as in the select example and the data that is going into those tables is of variable length then the inserts will perform faster on the table with varchar. If those same 2 tables are used and the data is of a fixed 100 character length then the inserts using both data types will be equal.
UPDATES
Now, lets take a future scenario, lets say you have your variable length data in your 2 tables, and you want to do an update. If your varchar(100) value is only storing 10 characters in a particular row. And your data pages are full. And you update it with a value that is 100 characters long, a page split will occur. This will cause this update to be slower than updating your char column value that is 10 characters in length with a 100 character value. Why you may ask? Well because when you insert a 10 character value into a char(100) column, the database will store a value of 100 characters, the original 10 characters plus a padding of 90 spaces on the end. When you insert a 10 character value into a varchar(100) column, the database will store a value only giving space on the data page for the 10 characters. After the page fills there is no room for the additional 90 bytes of data so the system splits the page and moves to row to a new data page, placing a pointer to the new page and row. Evidence of this activity can be seen in the test below.
TEST RESULTS
In these tests I cover a couple of different scenarios.
- Inserting 200,000 rows into a table called T1, which contained an integer identity column and a char(100) column
- Inserting 200,000 rows into a table called T2, which contained an integer identity column and a varchar(100) column
- Selecting all rows from table T1
- Selecting all rows from table T2
- Selecting a range of 15,000 rows from table T1
- Selecting a range of 15,000 rows from table T2
- Selecting a single row from table T1
- Selecting a single row from table T2
- Updating 200,000 rows from table T1
- Updating 200,000 rows from table T1
Inserting 200,000 NULL rows into the char and varchar tables T1 and T2 proves the amount of space allocated to store the data for each table. During these two tests table T1 (char table) required 2817 data pages to store the data. Table T2 (varchar table) required only 322 data pages to store all of the data. This was because the T1 table reserved the full 100 character value for the char column regardless of the fact that the data inserted into the char column was a NULL value and therefore required 2817 pages to store the data. Table T2 did not reserve the 100 characters of space when inserting the NULL value into the table therefore only needing 322 pages to store the data. The inserts for both tables took 60 seconds for T1 and 61 seconds for T2. The inserts were run multiple times and the averages were used for reporting. After the inserts on table T1 the Scan Density and Page Density were 99.44% and 99.09% respectively. After the inserts on table T2 the Scan Density and Page Density were 95.35% and 99.73% respectively.
Selecting all of the rows from both tables only 174 milliseconds for T1 and 152 milliseconds for T2. This is because of the drastic difference in the number of pages being returned by both queries with no WHERE clause. The number of logical reads that required by the system for table T1 were 2819 and the number logical reads required to retrieve all pages for T2 was 323. When returning a range of 15,000 rows from T1 the query completed in 21 milliseconds and retrieved 214 data pages. When returning the 15,000 rows from table T2 the query completed in 17 milliseconds and retrieved only 26 data pages. When returning a single row from T1 the query completed in 0 milliseconds and retrieved 3 data pages. When returning a single row from table T2 the query completed in 0 milliseconds and retrieved only 2 data pages.
The performance hit when using a varchar in these test cases can be seen doing updates on these two tables. When updating all rows in these two tables from a NULL value to a 100 character value the results vary greatly. It took 1032 milliseconds to update all 200,000 rows in table T1, while taking 1911 milliseconds to update all rows in T2. After doing a SHOWCONTIG on the two tables after the update you can see the cost of the updates to be page splits caused by the updates against T2. There are still only 2817 pages for table T1 but now there are 3859 pages for T2 instead of the 322 pages we saw earlier. There is also still a Scan Density and Page Density of 99.44% and 99.09% for T1 but now there is a 46.04% and 74.25% Scan Density and Page Density for T2.
CONCLUSION
There is no performance difference fundamentally retrieving data from the char and varchar datatypes. The difference in performance comes from how you will be utilizing the columns.
Make sure that you plan for the way you will be using the data, be it static or dynamic.
-Zach